PHP, Composer, and a Modern Dev Environment

PHP 8.x is a serious language. Here's how to set it up cleanly and use Composer the way modern PHP projects expect.

What you’ll learn

  • Install a recent PHP (8.2+) without relying on the OS-bundled version.
  • Initialise a Composer project and understand `composer.json` vs `composer.lock`.
  • Install packages, autoload your own code via PSR-4, and run scripts.
  • Use a sane local dev setup: the PHP CLI, the built-in server, or Docker (pick one).

PHP has a reputation problem from its 2005-era past. Modern PHP (8.2 onwards) is fast and has strict types, named arguments, enums, readonly properties, and fibers, plus an ecosystem (Symfony, Laravel, Guzzle, API Platform) that's competitive with any other language's. This sub-path treats it as first-class.

Install a recent PHP

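Platform | Recommended
macOS | Homebrew: brew install php@8.3 (versioned formulae are keg-only, so follow the PATH instructions Homebrew prints)
Linux (Ubuntu) | Ondřej Surý's PPA: sudo add-apt-repository ppa:ondrej/php && sudo apt install php8.3 php8.3-cli php8.3-curl php8.3-xml php8.3-mbstring
Linux (Debian) | The same packages from the same maintainer's packages.sury.org repository (PPAs are Ubuntu-only)
Windows | windows.php.net; pick the non-thread-safe (NTS) build for command-line use (the thread-safe build is meant for Apache's mod_php)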

Verify:

php -v
# PHP 8.3.x (cli)

Extensions you'll want for scraping:

  • curl: HTTP client (Guzzle uses it under the hood)
  • dom and xml: for DOMDocument / DomCrawler
  • mbstring: multi-byte string handling, including Unicode-aware case conversion
  • intl: internationalisation (locale-aware collation, normalisation, transliteration)
  • pdo_mysql / pdo_pgsql: database access
  • openssl: TLS

Most are bundled by default; verify with php -m.
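
To check this from PHP itself rather than reading the php -m output by eye, a short script can test each extension with extension_loaded(). This is a minimal sketch: the function name missingExtensions and the REQUIRED_EXTENSIONS constant are illustrative, and the list simply mirrors the bullets above (add pdo_mysql or pdo_pgsql if you use a database).

```php
<?php
declare(strict_types=1);

// Illustrative list taken from the bullets above; adjust it to your project.
const REQUIRED_EXTENSIONS = ['curl', 'dom', 'xml', 'mbstring', 'intl', 'openssl'];

/**
 * Return the names from $required that are not loaded in this PHP process.
 *
 * @param list<string> $required
 * @return list<string>
 */
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        static fn (string $name): bool => !extension_loaded($name)
    ));
}

$missing = missingExtensions(REQUIRED_EXTENSIONS);
echo $missing === []
    ? 'All required extensions are loaded.' . PHP_EOL
    : 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
```

Run it with the same php binary you intend to scrape with, since another PHP installation on the machine may have a different extension set.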

Composer, the dependency manager

Composer is to PHP what npm is to Node.js and pip is to Python: one canonical dependency manager, used by every modern PHP project.

Install

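# macOS / Linux: download the installer, run it with PHP, then put composer on the PATH
# (getcomposer.org/download also documents a checksum-verified variant of this step)
curl -sS https://getcomposer.org/installer | php
sudo mv composer.phar /usr/local/bin/composer
composer --version

# Or via Homebrew
brew install composer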

Windows: download Composer-Setup.exe from getcomposer.org.

Start a new project

mkdir my-scraper && cd my-scraper
composer init  # interactive, answer prompts; defaults are fine

composer init produces composer.json:

{
  "name": "you/my-scraper",
  "description": "A web scraper",
  "type": "project",
  "require": {},
  "autoload": {
    "psr-4": {
      "MyScraper\\": "src/"
    }
  }
}

Adding dependencies

composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector

Composer:

  1. Resolves the latest compatible versions.
  2. Downloads them into vendor/.
  3. Adds the constraint to composer.json.
  4. Generates composer.lock with the exact resolved versions.

Commit both composer.json (the constraints) and composer.lock (the resolved versions). Don't commit vendor/; composer install regenerates it from the lock.

# .gitignore
/vendor/

composer.json vs composer.lock

File | Purpose
composer.json | Your intent: "I want Guzzle ^7.5 and DomCrawler ^7.0"
composer.lock | The exact versions Composer resolved the last time you ran composer require or composer update
vendor/ | The actual code, fetched per the lock; gitignored

composer install reads the lock and produces a deterministic vendor tree. composer update re-resolves to latest within constraints and rewrites the lock. In a team, you run install on every checkout; only one person runs update deliberately when bumping dependencies.
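
To see what "exact resolved versions" means in practice, you can read the lock file directly: it is plain JSON with a packages array (and a packages-dev array for require-dev entries), each item carrying a name and a version. The sketch below is illustrative only; lockedVersions is a hypothetical helper, and it assumes the script sits in the project root next to composer.lock.

```php
<?php
declare(strict_types=1);

/**
 * Extract "package name => exact version" pairs from composer.lock contents.
 *
 * @return array<string, string>
 */
function lockedVersions(string $lockJson): array
{
    $lock = json_decode($lockJson, true, 512, JSON_THROW_ON_ERROR);

    $versions = [];
    // "packages" holds runtime dependencies, "packages-dev" the require-dev ones.
    foreach (['packages', 'packages-dev'] as $section) {
        foreach ($lock[$section] ?? [] as $package) {
            $versions[$package['name']] = $package['version'];
        }
    }
    ksort($versions);

    return $versions;
}

// Assumption: this file lives in the project root, next to composer.lock.
$lockFile = __DIR__ . '/composer.lock';
if (is_file($lockFile)) {
    foreach (lockedVersions((string) file_get_contents($lockFile)) as $name => $version) {
        echo $name . ' ' . $version . PHP_EOL;
    }
}
```

Running it before and after composer update shows which pins moved, while composer.json stays unchanged as long as the new versions still satisfy its constraints.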

Autoloading

The autoload.psr-4 section in composer.json maps namespaces to directories:

"autoload": {
  "psr-4": {
    "MyScraper\\": "src/"
  }
}

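This means a class MyScraper\Client\HttpClient lives at src/Client/HttpClient.php; the path must match the namespace and class name exactly, including case. After editing the autoload section of composer.json, run composer dump-autoload to regenerate the autoloader. Simply adding a new class under an existing PSR-4 prefix doesn't require it, unless you generated an optimised class map.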

In every entry point, you load Composer's autoloader once:

<?php
require __DIR__ . '/../vendor/autoload.php';

use MyScraper\Client\HttpClient;

$client = new HttpClient();

PSR-4 is the modern PHP convention. Older code pulls in each file with a manual require line; don't do that in new projects.
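
The mapping rule itself is small enough to sketch in a few lines. The function below is a simplified illustration of how a PSR-4 prefix turns a class name into a file path; psr4Path is a hypothetical name, and it returns the first matching prefix, so it does not reproduce everything a real autoloader such as Composer's does (no fallback directories, no class map).

```php
<?php
declare(strict_types=1);

/**
 * Resolve a fully qualified class name to a file path using PSR-4 prefixes.
 * Returns null when no prefix matches.
 *
 * @param array<string, string> $prefixes namespace prefix => base directory
 */
function psr4Path(string $class, array $prefixes): ?string
{
    foreach ($prefixes as $prefix => $baseDir) {
        if (str_starts_with($class, $prefix)) {
            // Strip the prefix, then turn namespace separators into directory separators.
            $relative = substr($class, strlen($prefix));

            return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
        }
    }

    return null;
}

echo psr4Path('MyScraper\\Client\\HttpClient', ['MyScraper\\' => 'src/']) . PHP_EOL;
// prints src/Client/HttpClient.php
```

If a class fails to autoload, working through this mapping by hand (prefix, remaining namespace, file name) is usually the quickest way to spot a misplaced file or a case mismatch.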

Running a script

For a CLI scraper:

php scripts/crawl.php

For a local web app (e.g. when developing Catalog108):

php -S 127.0.0.1:8080 -t public public/index.php
# http://127.0.0.1:8080

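PHP's built-in server (-S) is fine for development: it is single-threaded by default, serves static files, and runs PHP files. When you pass a router script as above (public/index.php), every request goes through that script first, and a static file is only served as-is if the script returns false. It is not for production; production uses nginx/Apache + php-fpm.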
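
Because the command above passes public/index.php as a router script, that script decides whether the built-in server serves a file itself. A minimal sketch of such a router follows; isStaticAsset is a hypothetical helper, and the "application" part is just a placeholder echo standing in for your real front controller.

```php
<?php
declare(strict_types=1);

/**
 * True when $requestUri points at an existing non-PHP file inside $docroot,
 * i.e. something the built-in server should serve as-is.
 */
function isStaticAsset(string $requestUri, string $docroot): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || $path === '/') {
        return false;
    }

    $root = realpath($docroot);
    $file = realpath($docroot . $path);

    return $root !== false
        && $file !== false
        && str_starts_with($file, $root . DIRECTORY_SEPARATOR) // stay inside the docroot
        && is_file($file)
        && pathinfo($file, PATHINFO_EXTENSION) !== 'php';
}

if (PHP_SAPI === 'cli-server') {
    if (isStaticAsset($_SERVER['REQUEST_URI'], __DIR__)) {
        return false; // the built-in server serves the file as-is
    }

    // Placeholder for the real application entry point.
    header('Content-Type: text/plain; charset=utf-8');
    echo 'Handled by index.php: ' . $_SERVER['REQUEST_URI'] . PHP_EOL;
}
```

The PHP_SAPI check keeps the router logic out of the way when the same file is executed under php-fpm in production, where the web server handles static files itself.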

A typical project layout

my-scraper/
├── composer.json
├── composer.lock
├── .gitignore  ← /vendor/
├── vendor/  ← composer install output
├── src/
│  ├── Client/
│  │  └── HttpClient.php  ← namespaced MyScraper\Client\HttpClient
│  ├── Parser/
│  │  └── ProductParser.php
│  └── Store/
│     └── CsvWriter.php
├── scripts/
│  └── crawl.php  ← entry point
├── public/  ← web docroot (when serving)
│  └── index.php
└── tests/
   └── ParserTest.php

Composer scripts

composer.json can define named tasks:

"scripts": {
  "serve": "php -S 127.0.0.1:8080 -t public public/index.php",
  "test":  "phpunit",
  "lint":  "php-cs-fixer fix"
}

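Then run composer serve, composer test, and so on. Composer stops scripts that exceed its process timeout (300 seconds by default), so for a long-running task like serve, disable the timeout, for example by setting COMPOSER_PROCESS_TIMEOUT=0. The Catalog108 codebase uses this pattern; see its composer.json.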

Production deployment notes

PHP traditionally deploys as files-on-Apache (the CGI / mod_php / php-fpm model), not as a long-running process. This makes it perfect for shared hosting like Hostinger Premium:

  • Upload files via FTP.
  • The hosting's Apache + php-fpm executes them on each request.
  • No daemons to keep running.

Hence Catalog108's choice of pure PHP: it's the stack that runs most cleanly on typical shared hosting. Production deployment is covered in DEPLOY.md.

Docker (when needed)

For more complex setups (multiple PHP versions, isolated extensions), Docker is the modern answer:

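FROM php:8.3-cli
WORKDIR /app
# zip lets Composer unpack dist archives; dom already ships with the official image
RUN apt-get update && apt-get install -y --no-install-recommends libzip-dev unzip && \
  docker-php-ext-install zip && \
  rm -rf /var/lib/apt/lists/*
# Copy the Composer binary from the official composer image
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
# Install dependencies first so this layer stays cached until the lock changes
COPY composer.json composer.lock ./
RUN composer install --no-interaction --no-progress
# Assumes vendor/ is listed in .dockerignore so it doesn't overwrite the install above
COPY . .
CMD ["php", "scripts/crawl.php"]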

Docker is overkill for "I want to run a scraper on my laptop", but essential when production has a specific PHP version + extension set you can't match locally.

Common gotchas

  1. php vs php-cli vs php-fpm. The CLI binary (php) is what you run scripts with. php-fpm is the server-side worker pool. Same language, different process model. Hostinger Premium runs php-fpm.

  2. Different php.ini files. CLI and FPM often load different config files (on Debian/Ubuntu, for example, /etc/php/8.3/cli/php.ini vs /etc/php/8.3/fpm/php.ini). A memory_limit set in one won't apply to the other. php --ini shows the CLI's; check the hosting panel for FPM's.

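  3. Composer flags for production. When CI builds the production artefact, run composer install --no-dev --optimize-autoloader: it skips dev dependencies (PHPUnit etc.) and dumps an optimised class map. The test stage still needs a plain composer install, because it relies on those dev dependencies.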
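
For the first two gotchas, the quickest check is to ask PHP which SAPI it is running under and which php.ini it loaded. The sketch below uses only built-in functions; runtimeSummary is an illustrative name, not something from the Catalog108 codebase.

```php
<?php
declare(strict_types=1);

/**
 * Collect the runtime facts that typically differ between CLI and FPM.
 *
 * @return array{sapi: string, ini: string, memory_limit: string}
 */
function runtimeSummary(): array
{
    $ini = php_ini_loaded_file(); // false when no php.ini was loaded

    return [
        'sapi' => PHP_SAPI,
        'ini' => $ini === false ? '(none)' : $ini,
        'memory_limit' => (string) ini_get('memory_limit'),
    ];
}

foreach (runtimeSummary() as $key => $value) {
    echo $key . ': ' . $value . PHP_EOL;
}
```

Run it once with php on the command line and once through the web server, then compare the two outputs. Remove the file from the server afterwards, since it reveals configuration details.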

Hands-on lab

In a new directory, run composer init (accept defaults). Then:

composer require guzzlehttp/guzzle

Create scripts/hello.php:

<?php
require __DIR__ . '/../vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();
$res = $client->get('https://practice.scrapingcentral.com/');
echo $res->getStatusCode() . PHP_EOL;
echo substr((string) $res->getBody(), 0, 200) . PHP_EOL;

Run php scripts/hello.php. You should see 200 and the first 200 chars of Catalog108's homepage. You now have a working PHP scraping environment.
