PHP, Composer, and a Modern Dev Environment
PHP 8.x is a serious language. Here's how to set it up cleanly and use Composer the way modern PHP projects expect.
What you’ll learn
- Install a recent PHP (8.2+) without relying on the OS-bundled version.
- Initialise a Composer project and understand `composer.json` vs `composer.lock`.
- Install packages, autoload your own code via PSR-4, and run scripts.
- Use a sane local dev setup: PHP CLI, built-in server, or Docker; pick one.
PHP has a reputation problem from its 2005-era past. Modern PHP, 8.2 onwards, is fast, has strict types, named arguments, enums, readonly properties, fibers, and an ecosystem (Symfony, Laravel, Guzzle, API Platform) that's competitive with any language. This sub-path treats it as first-class.
Install a recent PHP
| Platform | Recommended |
|---|---|
| macOS | Homebrew: brew install php@8.3 |
| Linux (Debian/Ubuntu) | Ondrej PPA: sudo add-apt-repository ppa:ondrej/php && sudo apt install php8.3 php8.3-cli php8.3-curl php8.3-xml php8.3-mbstring |
| Windows | windows.php.net; pick the non-thread-safe (NTS) build for CLI/FPM use, thread-safe (TS) only for Apache mod_php |
Verify:
php -v
# PHP 8.3.x (cli)
Extensions you'll want for scraping:
- curl: HTTP client (Guzzle uses it under the hood)
- dom and xml: for DOMDocument / DomCrawler
- mbstring: multi-byte string handling
- intl: internationalisation; needed for unicode-aware lowercasing
- pdo_mysql / pdo_pgsql: database access
- openssl: TLS
Most are bundled by default; verify with php -m.
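To check the whole list at once, a small sketch (save it anywhere and run with `php`; the filename is illustrative):

```php
<?php
// check-extensions.php: report which scraping-related extensions are loaded.
$needed = ['curl', 'dom', 'xml', 'mbstring', 'intl', 'openssl'];

$results = [];
foreach ($needed as $ext) {
    $results[$ext] = extension_loaded($ext);
    printf("%-9s %s\n", $ext, $results[$ext] ? 'OK' : 'MISSING');
}
```

Anything reported MISSING needs the corresponding package (e.g. php8.3-intl on Debian/Ubuntu) before the scraping exercises will work.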
Composer, the dependency manager
Composer is to PHP what npm is to JavaScript and pip is to Python: one canonical dependency manager, used by every modern PHP project.
Install
# macOS / Linux
curl -sS https://getcomposer.org/installer | php
sudo mv composer.phar /usr/local/bin/composer
composer --version
# Or via Homebrew
brew install composer
Windows: download Composer-Setup.exe from getcomposer.org.
Start a new project
mkdir my-scraper && cd my-scraper
composer init # interactive, answer prompts; defaults are fine
composer init produces composer.json:
{
"name": "you/my-scraper",
"description": "A web scraper",
"type": "project",
"require": {},
"autoload": {
"psr-4": {
"MyScraper\\": "src/"
}
}
}
Adding dependencies
composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector
Composer:
- Resolves the latest compatible versions.
- Downloads them into vendor/.
- Adds the constraints to composer.json.
- Generates composer.lock with the exact resolved versions.
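The constraints Composer writes are caret ranges by default. After the command above, composer.json's require block will look something like this (version numbers illustrative):

```json
"require": {
    "guzzlehttp/guzzle": "^7.8",
    "symfony/dom-crawler": "^7.0",
    "symfony/css-selector": "^7.0"
}
```

^7.8 means "≥7.8.0 and <8.0.0": any release that semver considers backwards-compatible.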
Commit both composer.json (the constraints) and composer.lock (the resolved versions). Don't commit vendor/; it's regenerated by composer install.
# .gitignore
/vendor/
composer.json vs composer.lock
| File | Purpose |
|---|---|
| composer.json | Your intent: "I want Guzzle ^7.5 and DomCrawler ^7.0" |
| composer.lock | The exact versions Composer resolved when you ran composer install |
| vendor/ | The actual code, fetched per the lock; gitignored |
composer install reads the lock and produces a deterministic vendor tree. composer update re-resolves to latest within constraints and rewrites the lock. In a team, you run install on every checkout; only one person runs update deliberately when bumping dependencies.
Autoloading
The autoload.psr-4 section in composer.json maps namespaces to directories:
"autoload": {
"psr-4": {
"MyScraper\\": "src/"
}
}
This means a class MyScraper\Client\HttpClient lives at src/Client/HttpClient.php. After editing, run composer dump-autoload to refresh the autoloader map.
In every entry point, you load Composer's autoloader once:
<?php
require __DIR__ . '/../vendor/autoload.php';
use MyScraper\Client\HttpClient;
$client = new HttpClient();
PSR-4 is the modern PHP convention. Older code uses require lines manually; don't.
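For completeness, the class file on the other side of that mapping, a minimal sketch (HttpClient is a placeholder name for your own class, not a library class):

```php
<?php
// src/Client/HttpClient.php
// The namespace must be the PSR-4 prefix (MyScraper\ => src/) plus the
// sub-directory path (Client/); the class name must match the file name.
namespace MyScraper\Client;

class HttpClient
{
    public function __construct(
        // promoted readonly property (PHP 8.1+)
        private readonly string $userAgent = 'my-scraper/1.0'
    ) {
    }

    public function userAgent(): string
    {
        return $this->userAgent;
    }
}
```

With this file in place, `new MyScraper\Client\HttpClient()` works from any entry point that has required vendor/autoload.php; no manual require needed.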
Running a script
For a CLI scraper:
php scripts/crawl.php
For a local web app (e.g. when developing Catalog108):
php -S 127.0.0.1:8080 -t public public/index.php
# http://127.0.0.1:8080
PHP's built-in server (-S) is fine for development: it's single-threaded, serves static files, and runs PHP scripts. It is not for production; production uses nginx/Apache + php-fpm.
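The last argument to -S is an optional router script. A minimal public/index.php that lets the built-in server deliver static files itself and handles everything else, a sketch:

```php
<?php
// public/index.php: router script for PHP's built-in server.
// Returning false from the router tells the built-in server to serve
// the requested file as-is (CSS, images, ...); anything else is
// treated as an application route and handled here.
function handle(string $uri, string $docroot): string|false
{
    $path = parse_url($uri, PHP_URL_PATH);
    if ($path !== '/' && is_file($docroot . $path)) {
        return false; // static asset: let the server deliver it
    }
    return 'Hello from PHP ' . PHP_VERSION . "\n";
}

if (PHP_SAPI === 'cli-server') {
    $response = handle($_SERVER['REQUEST_URI'], __DIR__);
    if ($response === false) {
        return false;
    }
    echo $response;
}
```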
A typical project layout
my-scraper/
├── composer.json
├── composer.lock
├── .gitignore ← /vendor/
├── vendor/ ← composer install output
├── src/
│ ├── Client/
│ │ └── HttpClient.php ← namespaced MyScraper\Client\HttpClient
│ ├── Parser/
│ │ └── ProductParser.php
│ └── Store/
│ └── CsvWriter.php
├── scripts/
│ └── crawl.php ← entry point
├── public/ ← web docroot (when serving)
│ └── index.php
└── tests/
└── ParserTest.php
Composer scripts
composer.json can define named tasks:
"scripts": {
"serve": "php -S 127.0.0.1:8080 -t public public/index.php",
"test": "phpunit",
"lint": "php-cs-fixer fix"
}
Then composer serve, composer test, etc. The Catalog108 codebase uses this pattern; see its composer.json.
Production deployment notes
PHP traditionally deploys as files-on-Apache (the CGI / mod_php / php-fpm model), not as a long-running process. This makes it perfect for shared hosting like Hostinger Premium:
- Upload files via FTP.
- The hosting's Apache + php-fpm executes them on each request.
- No daemons to keep running.
Hence Catalog108's choice of pure PHP: it's the only stack that runs cleanly on shared hosting. Production deployment is covered in DEPLOY.md.
Docker (when needed)
For more complex setups (multiple PHP versions, isolated extensions), Docker is the modern answer:
FROM php:8.3-cli
# dom/xml are already bundled in the official image; only zip needs installing
RUN apt-get update && apt-get install -y libzip-dev && \
    docker-php-ext-install zip
WORKDIR /app
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
# Install deps first so this layer caches; dump the autoloader after copying src/
COPY composer.json composer.lock ./
RUN composer install --no-dev --no-autoloader
COPY . .
RUN composer dump-autoload --optimize
CMD ["php", "scripts/crawl.php"]
Docker is overkill for "I want to run a scraper on my laptop", but essential when production has a specific PHP version + extension set you can't match locally.
Common gotchas
- php vs php-cli vs php-fpm. The CLI binary (php) is what you run scripts with. php-fpm is the server-side worker pool. Same language, different process model. Hostinger Premium runs php-fpm.
- Different php.ini files. CLI and FPM often have different config files (php-cli.ini vs php.ini for FPM). A memory_limit set in one won't apply to the other. php --ini shows the CLI's; check the hosting panel for FPM's.
- Composer in CI must use composer install --no-dev --optimize-autoloader for production: skip dev dependencies (PHPUnit etc.) and dump an optimised class map.
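For the CLI-vs-FPM config gotcha, a small diagnostic script helps; run it with php for the CLI view, and serve the same file over FPM for the other (filename illustrative):

```php
<?php
// which-ini.php: show the active SAPI, its loaded php.ini, and a key limit.
echo 'SAPI:         ' . PHP_SAPI . PHP_EOL;
echo 'Loaded ini:   ' . (php_ini_loaded_file() ?: '(none)') . PHP_EOL;
echo 'memory_limit: ' . ini_get('memory_limit') . PHP_EOL;
```

If the two runs print different ini paths, you know exactly why a setting "didn't take".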
Hands-on lab
In a new directory, run composer init (accept defaults). Then:
composer require guzzlehttp/guzzle
Create scripts/hello.php:
<?php
require __DIR__ . '/../vendor/autoload.php';
use GuzzleHttp\Client;
$client = new Client();
$res = $client->get('https://practice.scrapingcentral.com/');
echo $res->getStatusCode() . PHP_EOL;
echo substr((string) $res->getBody(), 0, 200) . PHP_EOL;
Run php scripts/hello.php. You should see 200 and the first 200 chars of Catalog108's homepage. You now have a working PHP scraping environment.
Quiz, check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.