Symfony Console: Building Scraper CLI Commands
Every Symfony scraper starts as a Console command. Arguments, options, progress bars, dependency injection: the right way to ship a CLI scraper.
What you’ll learn
- Define a Console command with arguments and options.
- Inject services (HttpClient, EntityManager) via the constructor.
- Show progress, log structured output, exit with the right code.
A scraper is a long-running batch job. Console is Symfony's CLI framework: the right entry point for batch jobs. It gives you arguments, options, DI, progress bars, signal handling, structured logging, and exit codes, without you writing any of that infrastructure.
A minimal command
<?php
// src/Command/ScrapeProductsCommand.php
namespace App\Command;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Contracts\HttpClient\HttpClientInterface;
#[AsCommand(
    name: 'scrape:products',
    description: 'Scrape product listings from Catalog108',
)]
class ScrapeProductsCommand extends Command
{
    public function __construct(
        private readonly HttpClientInterface $http,
    ) {
        parent::__construct();
    }
    protected function configure(): void
    {
        $this
            // Optional positional argument; falls back to 'all' when omitted.
            ->addArgument('category', InputArgument::OPTIONAL, 'Category slug to scrape', 'all')
            ->addOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Max items', 100)
            ->addOption('dry-run', null, InputOption::VALUE_NONE, 'Skip persistence');
    }
    protected function execute(InputInterface $i, OutputInterface $o): int
    {
        $io = new SymfonyStyle($i, $o);
        $category = $i->getArgument('category');
        $limit = (int) $i->getOption('limit');
        $dryRun = (bool) $i->getOption('dry-run');
        $io->title(sprintf('Scraping %s (limit=%d, dry-run=%s)', $category, $limit, $dryRun ? 'yes' : 'no'));
        // The 'query' option URL-encodes the category and appends it to the URL.
        $resp = $this->http->request('GET', 'https://practice.scrapingcentral.com/products', [
            'query' => ['cat' => $category],
        ]);
        $io->success(sprintf('Got %d bytes', strlen($resp->getContent())));
        return Command::SUCCESS;
    }
}
Run:
php bin/console scrape:products electronics --limit=50 --dry-run
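Symfony registers the command automatically: with the default services.yaml (autowire and autoconfigure enabled), a class under src/ that extends Command becomes a console command, and #[AsCommand] supplies its name. The container injects the HttpClient, and SymfonyStyle gives you a clean output API.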
Arguments vs options
- Arguments are positional and usually required. In scrape:products electronics, electronics is the category argument.
- Options are named flags written as --name. They are optional and can have short aliases (-l).
When in doubt: required positional thing = argument; optional flag = option.
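You can watch Console tell the two apart by feeding a fake argv to the same argument and option definition the minimal command declares (the 'all' fallback is expressed here as an argument default). A minimal sketch, assuming symfony/console is installed with Composer and vendor/autoload.php sits next to the script:

```php
<?php
// Parse fake command lines against the scrape:products definition, outside any command.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Input\ArgvInput;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputDefinition;
use Symfony\Component\Console\Input\InputOption;

require __DIR__ . '/vendor/autoload.php';

$definition = new InputDefinition([
    // Positional: identified by its position, not by a name on the command line.
    new InputArgument('category', InputArgument::OPTIONAL, 'Category slug to scrape', 'all'),
    // Named: --limit=50, --limit 50 and -l 50 are all accepted.
    new InputOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Max items', 100),
    // Flag without a value: true when present, false when absent.
    new InputOption('dry-run', null, InputOption::VALUE_NONE, 'Skip persistence'),
]);

// The first element is the script name; ArgvInput strips it before parsing.
$explicit = new ArgvInput(['bin/console', 'electronics', '-l', '50', '--dry-run'], $definition);
$defaults = new ArgvInput(['bin/console'], $definition);

var_dump($explicit->getArgument('category')); // string(11) "electronics"
var_dump($explicit->getOption('limit'));      // string(2) "50" (values typed on the CLI are strings)
var_dump($explicit->getOption('dry-run'));    // bool(true)
var_dump($defaults->getArgument('category')); // string(3) "all"
var_dump($defaults->getOption('limit'));      // int(100) (the default keeps its PHP type)
var_dump($defaults->getOption('dry-run'));    // bool(false)
```

The explicit --limit arrives as the string "50" while the default stays an int, which is why the command casts with (int).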
Service injection
Symfony's container auto-wires services into the command constructor. No setup beyond declaring the type.
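use Doctrine\ORM\EntityManagerInterface;
use Psr\Log\LoggerInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;
// ...
public function __construct(
    private readonly HttpClientInterface $http,
    private readonly EntityManagerInterface $em,
    private readonly LoggerInterface $logger,
) {
    parent::__construct();
}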
The container resolves each one: LoggerInterface to the framework's logger (Monolog when MonologBundle is installed) and EntityManagerInterface to Doctrine's default entity manager (provided by DoctrineBundle). Custom services follow the same pattern.
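Because the command only declares its dependencies, anything that satisfies the type can be passed in, including a fake. The sketch below does by hand what the container does for you, then runs the command through Console's CommandTester. ProductSource, StubProductSource and scrape:count are hypothetical names for illustration; symfony/console is assumed to be installed with Composer.

```php
<?php
// Manual constructor injection: the container's job, done by hand with a stub service.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Tester\CommandTester;

require __DIR__ . '/vendor/autoload.php';

// Hypothetical service contract; in an app the container would autowire an implementation.
interface ProductSource
{
    /** @return list<string> product URLs for a category */
    public function fetch(string $category): array;
}

// Hypothetical stub: fixed data, no network.
final class StubProductSource implements ProductSource
{
    public function fetch(string $category): array
    {
        return ["/p/$category-1", "/p/$category-2", "/p/$category-3"];
    }
}

#[AsCommand(name: 'scrape:count', description: 'Count products in a category')]
final class CountProductsCommand extends Command
{
    // The command declares what it needs; it never instantiates the service itself.
    public function __construct(private readonly ProductSource $source)
    {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this->addArgument('category', InputArgument::REQUIRED, 'Category slug');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $urls = $this->source->fetch($input->getArgument('category'));
        $output->writeln((string) count($urls));

        return Command::SUCCESS;
    }
}

// Outside the framework you construct the dependency graph yourself.
$tester = new CommandTester(new CountProductsCommand(new StubProductSource()));
$tester->execute(['category' => 'electronics']);

echo $tester->getDisplay(); // 3
```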
Progress bars
For long crawls, show progress:
$urls = $this->loadUrls();          // your own URL source
$io->progressStart(count($urls));
foreach ($urls as $url) {
    $this->scrape($url);            // your own per-URL logic
    $io->progressAdvance();
}
$io->progressFinish();
progressStart(N) initializes; progressAdvance() ticks; progressFinish() cleans up. For unknown totals, omit the count.
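When URLs come from a generator (a paginated listing, say) there is no count to pass up front. A sketch of the indeterminate mode, assuming symfony/console via Composer; urlPages() and its example.com URLs are hypothetical, and a BufferedOutput stands in for the terminal so the script runs anywhere:

```php
<?php
// Progress bar with an unknown total: progressStart() called without a count.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Input\ArrayInput;
use Symfony\Component\Console\Output\BufferedOutput;
use Symfony\Component\Console\Style\SymfonyStyle;

require __DIR__ . '/vendor/autoload.php';

// Hypothetical lazy URL source: the total is not known until it is exhausted.
function urlPages(int $pages): \Generator
{
    for ($p = 1; $p <= $pages; $p++) {
        yield "https://example.com/products?page=$p";
    }
}

$output = new BufferedOutput();
$io = new SymfonyStyle(new ArrayInput([]), $output);

$done = 0;
$io->progressStart();      // no max: the bar shows a running count instead of a percentage
foreach (urlPages(5) as $url) {
    // ... fetch and parse $url here ...
    $done++;
    $io->progressAdvance();
}
$io->progressFinish();     // takes the final step count as the max and completes the bar

echo "processed $done URLs\n";
```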
Exit codes
Always return one of:
- Command::SUCCESS (0): happy path
- Command::FAILURE (1): generic failure
- Command::INVALID (2): bad input
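Exit codes matter because the tools around your scraper act on them: systemd marks a unit as failed and GitHub Actions fails the step on a non-zero code, and cron wrappers and monitoring scripts check the code to decide whether to alert.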
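One way to map outcomes onto the three constants is sketched below, assuming symfony/console via Composer. The scrape:prices command, its --limit validation and the simulated per-URL results passed to its constructor are hypothetical; CommandTester::execute() returns the same code a shell would see as $?.

```php
<?php
// Mapping outcomes to Command::SUCCESS / FAILURE / INVALID.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Tester\CommandTester;

require __DIR__ . '/vendor/autoload.php';

#[AsCommand(name: 'scrape:prices')]
final class ScrapePricesCommand extends Command
{
    /** @param list<bool> $fetchResults hypothetical per-URL outcomes standing in for real HTTP calls */
    public function __construct(private readonly array $fetchResults)
    {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this->addOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Max items', 100);
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // Bad input: validate before doing any work.
        $limit = filter_var($input->getOption('limit'), FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
        if ($limit === false) {
            $output->writeln('<error>--limit must be a positive integer</error>');
            return Command::INVALID; // 2
        }

        // Count successful fetches among the first $limit URLs.
        $ok = count(array_filter(array_slice($this->fetchResults, 0, $limit)));
        if ($ok === 0) {
            $output->writeln('<error>every fetch failed</error>');
            return Command::FAILURE; // 1
        }

        $output->writeln("$ok item(s) scraped");
        return Command::SUCCESS; // 0
    }
}

$cases = [
    'bad input'  => [[true, true], ['--limit' => 'abc']],
    'all failed' => [[false, false], []],
    'happy path' => [[true, false, true], []],
];
$codes = [];
foreach ($cases as $case => [$results, $args]) {
    $tester = new CommandTester(new ScrapePricesCommand($results));
    $codes[$case] = $tester->execute($args);
}

var_dump($codes); // bad input => 2, all failed => 1, happy path => 0
```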
Signal handling
For long-running scrapers, you want graceful shutdown on Ctrl+C. Implement SignalableCommandInterface (signal handling requires PHP's pcntl extension):
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Command\SignalableCommandInterface;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
class ScrapeProductsCommand extends Command implements SignalableCommandInterface
{
    private bool $stop = false;
    public function getSubscribedSignals(): array
    {
        return [\SIGINT, \SIGTERM]; // constants defined by the pcntl extension
    }
    public function handleSignal(int $signal, int|false $previousExitCode = 0): int|false
    {
        $this->stop = true;
        return false; // continue execution; we'll exit the loop ourselves
    }
    protected function execute(InputInterface $i, OutputInterface $o): int
    {
        $urls = $this->loadUrls();
        foreach ($urls as $url) {
            if ($this->stop) {
                $o->writeln('graceful shutdown');
                break;
            }
            $this->scrape($url);
        }
        return Command::SUCCESS;
    }
}
The handler flips a flag; the main loop checks it. Clean shutdown is the difference between losing the last 1000 scrapes and a tidy restart.
Verbosity
Console supports -v, -vv, -vvv verbosity flags. Use them.
if ($o->isVerbose()) {
    $o->writeln("Detailed log line");
}
if ($o->isVeryVerbose()) {
    $o->writeln(json_encode($debugData));
}
Default output is the happy path. -v shows progress. -vv shows internal state. -vvv shows everything. Don't print debug noise at default verbosity.
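Besides isVerbose() and isVeryVerbose(), writeln() accepts a verbosity constant as its second argument and silently drops the line when the output runs below that level. A sketch comparing default and -v output, assuming symfony/console via Composer; logStep() is a hypothetical helper and BufferedOutput stands in for the terminal:

```php
<?php
// Verbosity-gated output via writeln()'s second argument.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Output\BufferedOutput;
use Symfony\Component\Console\Output\OutputInterface;

require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: logs one scrape step at three levels of detail.
function logStep(OutputInterface $o, string $url, int $status, array $debugData): void
{
    $o->writeln("scraped $url");                                                    // default level (hidden only by -q)
    $o->writeln("  HTTP $status", OutputInterface::VERBOSITY_VERBOSE);                // -v and above
    $o->writeln('  ' . json_encode($debugData), OutputInterface::VERBOSITY_VERY_VERBOSE); // -vv and above
}

$normal = new BufferedOutput(OutputInterface::VERBOSITY_NORMAL);   // no flag
$verbose = new BufferedOutput(OutputInterface::VERBOSITY_VERBOSE); // -v

logStep($normal, '/products/1', 200, ['bytes' => 5120]);
logStep($verbose, '/products/1', 200, ['bytes' => 5120]);

$normalText = $normal->fetch();
$verboseText = $verbose->fetch();
echo $normalText;  // scraped /products/1
echo $verboseText; // scraped /products/1, then "  HTTP 200"; the JSON line would need -vv
```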
Output as JSON
For pipeline-friendly output, use the --format=json convention:
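// in configure():
$this->addOption('format', null, InputOption::VALUE_REQUIRED, 'Output format (text or json)', 'text');
// in execute(), with $result holding the scraped items and $rows the same items as table rows:
if ($i->getOption('format') === 'json') {
    $o->writeln(json_encode($result, JSON_THROW_ON_ERROR));
} else {
    $io->table(['url', 'price'], $rows);
}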
A scraper that emits JSON is one pipe into jq away from being part of a bigger pipeline.
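A runnable sketch of the switch, assuming symfony/console via Composer. The scrape:demo command and its hard-coded rows are hypothetical. Two details keep stdout clean for jq: status messages go to stderr through SymfonyStyle::getErrorStyle(), and the JSON is written with OUTPUT_RAW so Console's formatter never interprets tag-like text from scraped pages.

```php
<?php
// --format=json vs text table, with status output kept off stdout.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Component\Console\Tester\CommandTester;

require __DIR__ . '/vendor/autoload.php';

#[AsCommand(name: 'scrape:demo')]
final class ScrapeDemoCommand extends Command
{
    protected function configure(): void
    {
        $this->addOption('format', null, InputOption::VALUE_REQUIRED, 'Output format (text or json)', 'text');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);
        // Hypothetical scrape results.
        $rows = [
            ['url' => '/products/1', 'price' => 19.99],
            ['url' => '/products/2', 'price' => 5.5],
        ];

        $format = $input->getOption('format');
        if ($format === 'json') {
            // Status to stderr, so stdout carries nothing but the JSON document.
            $io->getErrorStyle()->note(sprintf('%d rows', count($rows)));
            // OUTPUT_RAW bypasses the formatter's <tag> handling.
            $output->writeln(json_encode($rows, JSON_THROW_ON_ERROR), OutputInterface::OUTPUT_RAW);
            return Command::SUCCESS;
        }
        if ($format !== 'text') {
            $io->error("Unknown format \"$format\"");
            return Command::INVALID;
        }

        $io->table(['url', 'price'], array_map(fn (array $r) => [$r['url'], sprintf('%.2f', $r['price'])], $rows));
        return Command::SUCCESS;
    }
}

// Capture stdout and stderr separately to confirm stdout is pure JSON.
$tester = new CommandTester(new ScrapeDemoCommand());
$tester->execute(['--format' => 'json'], ['capture_stderr_separately' => true, 'decorated' => false]);
$decoded = json_decode($tester->getDisplay(), true, 512, JSON_THROW_ON_ERROR);

echo $decoded[1]['price'], "\n"; // 5.5
```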
Multiple sub-commands
For a scraper family, namespace commands:
scrape:products
scrape:reviews
scrape:categories
scrape:all
Each is its own class. The colon convention groups them. Run bin/console list scrape to see the family.
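A scrape:all command can delegate to its siblings instead of duplicating their logic, by looking each one up on the application and calling run() on it. A sketch assuming symfony/console via Composer; the stub commands and their output are hypothetical, and a FactoryCommandLoader stands in for the framework's automatic command registration:

```php
<?php
// scrape:all delegating to sibling commands in the same namespace.
// Assumes symfony/console is installed with Composer.
use Symfony\Component\Console\Application;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\CommandLoader\FactoryCommandLoader;
use Symfony\Component\Console\Input\ArrayInput;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Tester\CommandTester;

require __DIR__ . '/vendor/autoload.php';

// Hypothetical stubs standing in for the real scrape:products and scrape:reviews.
#[AsCommand(name: 'scrape:products')]
final class StubProductsCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln('products done');
        return Command::SUCCESS;
    }
}

#[AsCommand(name: 'scrape:reviews')]
final class StubReviewsCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln('reviews done');
        return Command::SUCCESS;
    }
}

#[AsCommand(name: 'scrape:all')]
final class ScrapeAllCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        foreach (['scrape:products', 'scrape:reviews'] as $name) {
            // run() fills in the application's required "command" argument itself.
            $code = $this->getApplication()->find($name)->run(new ArrayInput([]), $output);
            if ($code !== Command::SUCCESS) {
                return $code; // stop at the first failing step and propagate its exit code
            }
        }
        return Command::SUCCESS;
    }
}

// Registration by hand; in a Symfony app, autoconfiguration does this for you.
$app = new Application();
$app->setCommandLoader(new FactoryCommandLoader([
    'scrape:products' => fn () => new StubProductsCommand(),
    'scrape:reviews' => fn () => new StubReviewsCommand(),
    'scrape:all' => fn () => new ScrapeAllCommand(),
]));

$tester = new CommandTester($app->find('scrape:all'));
$tester->execute([]);
echo $tester->getDisplay(); // "products done", then "reviews done"

// The same namespace grouping that bin/console list scrape displays.
$names = array_keys($app->all('scrape'));
sort($names);
echo implode(', ', $names), "\n"; // scrape:all, scrape:products, scrape:reviews
```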
Hands-on lab
Build a scrape:products command that:
- Takes a --limit option and a category argument.
- Hits /products on Catalog108 with HttpClient.
- Parses with DomCrawler.
- Outputs a SymfonyStyle table by default, JSON when --format=json.
- Handles SIGINT cleanly.
- Returns appropriate exit codes.
Run it three ways: default, -v, --format=json | jq '.[] | .price'. You should feel the difference between a debug-spew script and a clean composable CLI.