Symfony Console: Building Scraper CLI Commands

Every Symfony scraper starts as a Console command. Arguments, options, progress bars, dependency injection: the right way to ship a CLI scraper.

What you’ll learn

  • Define a Console command with arguments and options.
  • Inject services (HttpClient, EntityManager) via the constructor.
  • Show progress, log structured output, exit with the right code.

A scraper is a long-running batch job. Console is Symfony's CLI framework: the right entry point for batch jobs. It gives you arguments, options, DI, progress bars, signal handling, structured logging, and exit codes, without you writing any of that infrastructure.

A minimal command

<?php
// src/Command/ScrapeProductsCommand.php
namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Contracts\HttpClient\HttpClientInterface;

#[AsCommand(
    name: 'scrape:products',
    description: 'Scrape product listings from Catalog108',
)]
class ScrapeProductsCommand extends Command
{
    public function __construct(
        private readonly HttpClientInterface $http,
    ) {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this
            // Optional argument: falls back to "all" in execute() when omitted.
            ->addArgument('category', InputArgument::OPTIONAL, 'Category slug to scrape')
            ->addOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Max items', 100)
            ->addOption('dry-run', null, InputOption::VALUE_NONE, 'Skip persistence');
    }

    protected function execute(InputInterface $i, OutputInterface $o): int
    {
        $io = new SymfonyStyle($i, $o);
        $category = $i->getArgument('category') ?: 'all';
        $limit = (int) $i->getOption('limit');
        $dryRun = $i->getOption('dry-run');

        $io->title("Scraping $category (limit=$limit, dry-run=" . ($dryRun ? 'yes' : 'no') . ")");

        // Pass the category through the "query" option so it is URL-encoded.
        $resp = $this->http->request('GET', 'https://practice.scrapingcentral.com/products', [
            'query' => ['cat' => $category],
        ]);
        $io->success(sprintf('Got %d bytes', strlen($resp->getContent())));

        return Command::SUCCESS;
    }
}

Run:

php bin/console scrape:products electronics --limit=50 --dry-run

With the default service configuration (autoconfigure enabled), Symfony discovers the command automatically, so no manual registration is needed. DI injects the HttpClient, and SymfonyStyle gives you a clean output API.

Arguments vs options

  • Arguments are positional and usually required. In scrape:products electronics, electronics is the category argument.
  • Options are flags passed as --name. They are optional, named, and can have short aliases (-l).

When in doubt: required positional thing = argument; optional flag = option.
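To see how the two kinds of input are parsed, here is a minimal standalone sketch. It assumes symfony/console is installed through Composer and that the script sits next to vendor/. The token list and the choice to mark category as REQUIRED are illustrative, not taken from the command above.

```php
<?php
// Sketch: parse one argument and two options outside a full command.
// Assumption: vendor/autoload.php exists next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Input\ArgvInput;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputDefinition;
use Symfony\Component\Console\Input\InputOption;

$definition = new InputDefinition([
    // Positional and required: parsing fails if it is missing.
    new InputArgument('category', InputArgument::REQUIRED, 'Category slug to scrape'),
    // Named, optional, with a short alias and a default value.
    new InputOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Max items', 100),
    // Boolean flag: present means true, absent means false.
    new InputOption('dry-run', null, InputOption::VALUE_NONE, 'Skip persistence'),
]);

// The first element stands in for the script name, as in $_SERVER['argv'].
$input = new ArgvInput(['scrape', 'electronics', '-l', '50', '--dry-run'], $definition);

$category = $input->getArgument('category');
$limit = (int) $input->getOption('limit');
$dryRun = $input->getOption('dry-run');

printf("category=%s limit=%d dry-run=%s\n", $category, $limit, $dryRun ? 'yes' : 'no');
```

Dropping 'electronics' from the token list makes the constructor throw, because a required argument is missing. Dropping '-l', '50' simply falls back to the default of 100.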

Service injection

Symfony's container auto-wires services into the command constructor. No setup beyond declaring the type.

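use Doctrine\ORM\EntityManagerInterface;
use Psr\Log\LoggerInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;

// Each type-hinted parameter is resolved by the container.
public function __construct(
  private readonly HttpClientInterface $http,
  private readonly EntityManagerInterface $em,
  private readonly LoggerInterface $logger,
) {
  parent::__construct();
}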

The container resolves each dependency by its type. LoggerInterface is auto-wired to the framework's logger, and EntityManagerInterface to Doctrine's entity manager. Custom services follow the same pattern.

Progress bars

For long crawls, show progress:

$urls = $this->loadUrls();
$io->progressStart(count($urls));
foreach ($urls as $url) {
  $this->scrape($url);
  $io->progressAdvance();
}
$io->progressFinish();

progressStart(N) initializes; progressAdvance() ticks; progressFinish() cleans up. For unknown totals, omit the count.
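As an alternative to the three explicit calls, SymfonyStyle also offers progressIterate(), which wraps an iterable and advances the bar on each step. The sketch below runs outside a command, assuming symfony/console is installed via Composer. The URL list is hypothetical and the loop body is a stand-in for a real scrape.

```php
<?php
// Sketch: drive a progress bar from the loop itself with progressIterate().
// Assumption: vendor/autoload.php exists next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Input\ArrayInput;
use Symfony\Component\Console\Output\BufferedOutput;
use Symfony\Component\Console\Style\SymfonyStyle;

// BufferedOutput keeps the demo quiet; a real command would use its own $o.
$io = new SymfonyStyle(new ArrayInput([]), new BufferedOutput());

// Hypothetical URLs for illustration only.
$urls = [
    'https://practice.scrapingcentral.com/products?cat=electronics',
    'https://practice.scrapingcentral.com/products?cat=books',
];

$scraped = [];
foreach ($io->progressIterate($urls) as $url) {
    $scraped[] = $url; // stand-in for $this->scrape($url)
}
```

Because the bar is tied to the iteration, there is no progressFinish() call to forget.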

Exit codes

Always return one of:

  • Command::SUCCESS (0): happy path
  • Command::FAILURE (1): generic failure
  • Command::INVALID (2): bad input

Exit codes matter for cron jobs, systemd, and GitHub Actions: every wrapper checks the exit code to decide whether to alert.
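One way to map outcomes to these constants is sketched below. The command name demo:exit-codes, the limit argument, and the private scrape() helper with its 1000-item threshold are all hypothetical, chosen only to trigger each path. The script assumes symfony/console is installed via Composer and uses CommandTester to run the command without a full application.

```php
<?php
// Sketch: return SUCCESS, INVALID, or FAILURE depending on what happened.
// Assumption: vendor/autoload.php exists next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Tester\CommandTester;

final class ExitCodeDemoCommand extends Command
{
    protected function configure(): void
    {
        $this
            ->setName('demo:exit-codes')
            ->addArgument('limit', InputArgument::REQUIRED, 'Max items');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $limit = filter_var(
            $input->getArgument('limit'),
            FILTER_VALIDATE_INT,
            ['options' => ['min_range' => 1]]
        );
        if ($limit === false) {
            $output->writeln('<error>limit must be a positive integer</error>');
            return Command::INVALID; // bad input
        }

        try {
            $this->scrape($limit);
        } catch (\RuntimeException $e) {
            $output->writeln('<error>' . $e->getMessage() . '</error>');
            return Command::FAILURE; // runtime problem
        }

        return Command::SUCCESS;
    }

    // Hypothetical stand-in for real scraping work.
    private function scrape(int $limit): void
    {
        if ($limit > 1000) {
            throw new \RuntimeException('upstream refused the request');
        }
    }
}

$tester = new CommandTester(new ExitCodeDemoCommand());
$ok = $tester->execute(['limit' => '10']);
$invalid = $tester->execute(['limit' => 'abc']);
$failed = $tester->execute(['limit' => '5000']);

printf("ok=%d invalid=%d failed=%d\n", $ok, $invalid, $failed);
```

Keeping validation failures and runtime failures on different codes lets a wrapper tell a misconfigured cron line apart from a site that is down.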

Signal handling

For long-running scrapers, you want graceful shutdown on Ctrl+C. Implement SignalableCommandInterface:

use Symfony\Component\Console\Command\SignalableCommandInterface;

class ScrapeProductsCommand extends Command implements SignalableCommandInterface
{
    private bool $stop = false;

    public function getSubscribedSignals(): array
    {
        // These constants come from the pcntl extension.
        return [SIGINT, SIGTERM];
    }

    // Signature as in recent Symfony versions; older releases declare it differently.
    public function handleSignal(int $signal, int|false $previousExitCode = 0): int|false
    {
        $this->stop = true;
        return false; // continue execution; we'll exit the loop ourselves
    }

    protected function execute(InputInterface $i, OutputInterface $o): int
    {
        $urls = $this->loadUrls();
        foreach ($urls as $url) {
            if ($this->stop) {
                $o->writeln('graceful shutdown');
                break;
            }
            $this->scrape($url);
        }
        return Command::SUCCESS;
    }
}

The handler flips a flag; the main loop checks it. Clean shutdown is the difference between losing the last 1000 scrapes and a tidy restart.

Verbosity

Console supports -v, -vv, -vvv verbosity flags. Use them.

if ($o->isVerbose()) {
  $o->writeln("Detailed log line");
}
if ($o->isVeryVerbose()) {
  $o->writeln(json_encode($debugData));
}

Default output is the happy path. -v shows progress. -vv shows internal state. -vvv shows everything. Don't print debug noise at default verbosity.
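Instead of wrapping each line in an if, writeln() accepts a verbosity level as its second argument and drops the message when the output is less verbose than that. A minimal standalone sketch, assuming symfony/console is installed via Composer; the messages are made up, and BufferedOutput is constructed at the level that -v would select:

```php
<?php
// Sketch: let writeln() filter messages by verbosity.
// Assumption: vendor/autoload.php exists next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Output\BufferedOutput;
use Symfony\Component\Console\Output\OutputInterface;

// Equivalent to running a command with -v.
$output = new BufferedOutput(OutputInterface::VERBOSITY_VERBOSE);

// No level given: shown at normal verbosity and above.
$output->writeln('Scraped 50 products');
// Shown with -v and above.
$output->writeln('Fetched page 3', OutputInterface::VERBOSITY_VERBOSE);
// Shown only with -vv and above, so it is dropped here.
$output->writeln('Response headers dumped', OutputInterface::VERBOSITY_VERY_VERBOSE);

$shown = $output->fetch();
echo $shown;
```

The isVerbose() checks remain useful when building the message is itself expensive, since the string is never constructed at lower verbosity.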

Output as JSON

For pipeline-friendly output, use the --format=json convention:

$this->addOption('format', null, InputOption::VALUE_REQUIRED, 'Output format', 'text');

// in execute:
if ($i->getOption('format') === 'json') {
  $o->writeln(json_encode($result));
} else {
  $io->table(['url', 'price'], $rows);
}

A scraper that can emit JSON is one pipe into jq away from being part of a bigger pipeline.
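For a jq filter such as .price to work, each row has to be encoded as an object with named keys rather than a positional list. The plain-PHP sketch below shows the shape; the two rows and their URLs are hypothetical sample data, and the flags are one reasonable choice rather than a requirement.

```php
<?php
// Sketch: encode scraped rows as a JSON array of objects.
// The rows below are hypothetical sample data.
$rows = [
    ['url' => 'https://practice.scrapingcentral.com/products/1', 'price' => 19.99],
    ['url' => 'https://practice.scrapingcentral.com/products/2', 'price' => 5.5],
];

// JSON_THROW_ON_ERROR surfaces encoding problems as exceptions instead of
// returning false; JSON_UNESCAPED_SLASHES keeps URLs readable.
$json = json_encode($rows, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);

echo $json, PHP_EOL;
```

Piping that output through jq '.[] | .price' would then yield one price per line.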

Multiple sub-commands

For a scraper family, namespace commands:

scrape:products
scrape:reviews
scrape:categories
scrape:all

Each is its own class. The colon convention groups them. Run bin/console list scrape to see the family.
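A scrape:all command can delegate to its siblings through the application rather than duplicating their logic. The sketch below is a standalone approximation, assuming symfony/console is installed via Composer. StubScrapeCommand is a hypothetical placeholder for the real scraper classes, and the commands are registered by hand because there is no container here.

```php
<?php
// Sketch: scrape:all runs each sibling command and stops on the first failure.
// Assumption: vendor/autoload.php exists next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Application;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\ArrayInput;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\BufferedOutput;
use Symfony\Component\Console\Output\OutputInterface;

// Hypothetical stand-in for a real scraper command.
final class StubScrapeCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln('ran ' . $this->getName());
        return Command::SUCCESS;
    }
}

final class ScrapeAllCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        foreach (['scrape:products', 'scrape:reviews', 'scrape:categories'] as $name) {
            // Run the sibling through the application so its own input
            // definition and exit code are honoured.
            $code = $this->getApplication()->doRun(new ArrayInput(['command' => $name]), $output);
            if ($code !== Command::SUCCESS) {
                return $code; // propagate the first failure
            }
        }
        return Command::SUCCESS;
    }
}

$app = new Application();
$app->setAutoExit(false); // return the exit code instead of calling exit()
$app->add(new StubScrapeCommand('scrape:products'));
$app->add(new StubScrapeCommand('scrape:reviews'));
$app->add(new StubScrapeCommand('scrape:categories'));
$app->add(new ScrapeAllCommand('scrape:all'));

$output = new BufferedOutput();
$exit = $app->run(new ArrayInput(['command' => 'scrape:all']), $output);
$log = $output->fetch();
echo $log;
```

In a full Symfony project the manual add() calls are unnecessary, since the framework registers each command class for you.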

Hands-on lab

Build a scrape:products command that:

  1. Takes a --limit option and a category argument.
  2. Hits /products on Catalog108 with HttpClient.
  3. Parses with DomCrawler.
  4. Outputs a SymfonyStyle table by default, JSON when --format=json.
  5. Handles SIGINT cleanly.
  6. Returns appropriate exit codes.

Run it three ways: default, -v, --format=json | jq '.[] | .price'. You should feel the difference between a debug-spew script and a clean composable CLI.
