
Lesson 2.13 · Intermediate · 5 min read

Building a Headless Scraper as a Symfony Console Command

Wrap Panther in a Symfony Console command for cron-friendly, configurable, observable PHP scrapers.

What you’ll learn

  • Scaffold a `bin/console` command that drives Panther.
  • Use Symfony's input/output abstractions for arguments, options, and progress bars.
  • Inject services (logger, DB, HTTP client) via Symfony's container.
  • Run the command on cron with environment-aware configuration.

A one-off PHP script is fine for prototyping. For production (cron jobs, parameterised runs, logging into central infrastructure), wrap your Panther scraper in a Symfony Console command. It's the same Panther underneath, with production-ready ergonomics on top.

Project setup

Assuming you have a Symfony project (symfony new myproject or composer create-project symfony/skeleton myproject):

cd myproject
composer require symfony/panther symfony/console

If you're not using full-stack Symfony, the standalone Symfony Console works too. The lesson uses the full-stack assumption (bin/console) because it's the common deployment shape.
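With the standalone component there's no bin/console, so you supply your own entry script. A minimal sketch (the file name bin/scrape is an assumption; without the container you pass the command's dependencies by hand):

```php
#!/usr/bin/env php
<?php
// bin/scrape — standalone Console entry point, no Symfony kernel required.

require __DIR__ . '/../vendor/autoload.php';

use App\Command\ScrapeProductsCommand;
use Symfony\Component\Console\Application;

$app = new Application('scraper', '1.0.0');
// No autowiring here: construct the command (and its dependencies) yourself.
$app->add(new ScrapeProductsCommand());
$app->run();
```

Make it executable with chmod +x bin/scrape and invoke it as ./bin/scrape app:scrape:products.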

Generate the command

php bin/console make:command app:scrape:products

This requires MakerBundle (composer require --dev symfony/maker-bundle) and creates src/Command/ScrapeProductsCommand.php:

<?php
namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

#[AsCommand(name: 'app:scrape:products', description: 'Scrape product catalog')]
class ScrapeProductsCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln('Starting scrape...');

        return Command::SUCCESS;
    }
}

Run it:

php bin/console app:scrape:products

You'll see "Starting scrape..." and a clean exit. That's your scaffold.

Adding Panther

<?php
namespace App\Command;

use Facebook\WebDriver\WebDriverBy;
use Psr\Log\LoggerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Helper\ProgressBar;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Component\Panther\Client;

#[AsCommand(name: 'app:scrape:products', description: 'Scrape product catalog')]
class ScrapeProductsCommand extends Command
{
    public function __construct(private LoggerInterface $logger)
    {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this
            ->addArgument('url', InputArgument::OPTIONAL, 'Target URL',
                'https://practice.scrapingcentral.com/challenges/dynamic/infinite-scroll/button-jsappend')
            ->addOption('limit', 'l', InputOption::VALUE_REQUIRED,
                'Maximum products to fetch', 100);
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io    = new SymfonyStyle($input, $output);
        $url   = $input->getArgument('url');
        $limit = (int) $input->getOption('limit');

        $io->title("Scraping $url (limit: $limit)");

        $client = Client::createChromeClient();

        try {
            $client->request('GET', $url);
            $client->waitFor('.product-card', 10);

            $progress = new ProgressBar($output);
            $progress->start();

            $products = [];
            while (count($products) < $limit) {
                // Re-fetch the crawler so we see cards appended since the last click.
                $cards = $client->getCrawler()->filter('.product-card');

                // Start at count($products): cards before that were already collected.
                for ($i = count($products); $i < $cards->count() && count($products) < $limit; $i++) {
                    $card = $cards->eq($i);
                    $products[] = [
                        'name'  => trim($card->filter('h2')->text('')),
                        'price' => trim($card->filter('.price')->text('')),
                    ];
                    $progress->advance();
                }

                if (count($products) >= $limit) {
                    break;
                }

                $buttons = $client->findElements(WebDriverBy::cssSelector('button.load-more'));
                if ($buttons === []) {
                    break; // no more pages to load
                }
                $buttons[0]->click();
                $client->waitFor('.product-card:nth-of-type(' . (count($products) + 1) . ')', 10);
            }

            $progress->finish();
            $io->newLine(2);

            $io->success(sprintf('Scraped %d products', count($products)));
            $this->logger->info('scrape.done', ['count' => count($products), 'url' => $url]);
            $io->table(['Name', 'Price'], array_slice($products, 0, 5));

            return Command::SUCCESS;
        } catch (\Throwable $e) {
            $this->logger->error('scrape.failed', ['exception' => $e->getMessage()]);
            $io->error($e->getMessage());

            return Command::FAILURE;
        } finally {
            $client->quit();
        }
    }
}

A few things to notice:

  • Constructor injection. LoggerInterface is autowired. Same pattern for Doctrine, HTTP clients, your own services.
  • SymfonyStyle. Wraps OutputInterface with title, success, error, table helpers. Clean visual output.
  • InputArgument + InputOption. A positional url and a named --limit. Invocation: php bin/console app:scrape:products https://... --limit=50.
  • ProgressBar. A 100-product scrape with a visible progress bar feels much shorter than the same scrape with silent stdout.
  • try/finally with quit(). Cleanup is guaranteed even on errors.
  • Return codes. Command::SUCCESS (0) and Command::FAILURE (1). Cron and CI systems care about these.

Verbosity levels

$io->writeln("Detail message", OutputInterface::VERBOSITY_VERBOSE);

Run with -v, -vv, -vvv for progressively more output. Production cron uses default; debugging uses -vv. No code changes needed.
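You can also branch on verbosity explicitly inside execute(), for instance to dump the raw page only when someone asked for debug output. A sketch, assuming $client, $io, and $output from the command above:

```php
// Only pay the cost of serialising the page when -vv or higher was passed.
if ($output->isVeryVerbose()) {
    $io->section('Raw page source');
    $io->writeln($client->getPageSource());
}

// Or attach a verbosity level to individual messages:
$output->writeln('Fetched batch', OutputInterface::VERBOSITY_VERBOSE); // shown at -v and above
$output->writeln('Card HTML dump', OutputInterface::VERBOSITY_DEBUG);  // shown only at -vvv
```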

Environment-aware config

Put the target URL and credentials in .env:

SCRAPE_BASE_URL=https://practice.scrapingcentral.com
SCRAPE_USER=demo@example.com
SCRAPE_PASS=password

Inject via %env(SCRAPE_BASE_URL)% in services.yaml, or read from $_ENV in the command. Symfony layers .env, .env.local, and .env.$APP_ENV files, so dev, staging, and prod can each supply their own values without code changes.
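One way to wire the injection (a sketch; the parameter name $scrapeBaseUrl is an assumption, match it to your constructor argument):

```yaml
# config/services.yaml
services:
    _defaults:
        autowire: true
        autoconfigure: true
        bind:
            # Any autowired constructor parameter named $scrapeBaseUrl gets this value.
            string $scrapeBaseUrl: '%env(SCRAPE_BASE_URL)%'
```

The command then declares private string $scrapeBaseUrl in its constructor and never touches $_ENV directly, which keeps it testable.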

Persistence: writing results

The command above prints; in reality you'll write to a database via Doctrine, a CSV file, or a message queue. Inject the entity manager:

public function __construct(
  private LoggerInterface $logger,
  private EntityManagerInterface $em,
) {
  parent::__construct();
}

// ...inside execute, after scraping each product:
$product = new Product();
$product->setName($name)->setPrice($price);
$this->em->persist($product);

// after the loop:
$this->em->flush();

Flush in batches if you've scraped hundreds of items, otherwise the unit-of-work gets huge.
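A batched version of the loop above might look like this (a sketch; the batch size of 50 and the $scraped array of name/price rows are assumptions):

```php
$batchSize = 50;

foreach ($scraped as $i => $row) {
    $product = new Product();
    $product->setName($row['name'])->setPrice($row['price']);
    $this->em->persist($product);

    // Flush and detach periodically so the unit-of-work stays small.
    if (($i + 1) % $batchSize === 0) {
        $this->em->flush();
        $this->em->clear();
    }
}

$this->em->flush(); // flush the final partial batch
$this->em->clear();
```

Note that clear() detaches all managed entities, so don't hold references to them across batches.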

Running on cron

The whole point of wrapping the scrape in a console command is operational cleanness:

# /etc/cron.d/scrapes
0 */6 * * * www-data /usr/bin/php /var/www/app/bin/console app:scrape:products --env=prod --no-interaction >> /var/log/scrape.log 2>&1
  • --env=prod sets APP_ENV=prod, so Symfony loads .env.prod and .env.prod.local on top of .env.
  • --no-interaction guarantees the command never blocks waiting on an interactive prompt.
  • >> /var/log/scrape.log 2>&1 captures stdout and stderr in a log file, which is far easier to monitor than cron's default email delivery.
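That log file grows forever unless something rotates it. A minimal logrotate sketch (the path matches the cron line above; the retention numbers are arbitrary):

```
# /etc/logrotate.d/scrape
/var/log/scrape.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```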

Locking

Two cron runs overlapping is a bug. Symfony Console ships a LockableTrait for exactly this (it relies on the symfony/lock component, so run composer require symfony/lock first):

use Symfony\Component\Console\Command\LockableTrait;

class ScrapeProductsCommand extends Command
{
    use LockableTrait;

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        if (!$this->lock()) {
            $output->writeln('Already running, exiting.');

            return Command::SUCCESS;
        }

        try {
            // ... scrape ...
            return Command::SUCCESS;
        } finally {
            $this->release();
        }
    }
}

Standard file lock, automatic cleanup. No more "two scrapers racing for the same database" bug at 3 a.m.

Hands-on lab

Open /challenges/dynamic/infinite-scroll/button-jsappend. Generate a Symfony Console command called app:scrape:button-jsappend. Wire in Panther, a progress bar, and a --limit option. Run with --limit=10, then --limit=50, and confirm you get exactly that many products. Add LockableTrait and run two instances simultaneously to verify only one actually executes.

