Building a Headless Scraper as a Symfony Console Command
Wrap Panther in a Symfony Console command for cron-friendly, configurable, observable PHP scrapers.
What you’ll learn
- Scaffold a `bin/console` command that drives Panther.
- Use Symfony's input/output abstractions for arguments, options, and progress bars.
- Inject services (logger, DB, HTTP client) via Symfony's container.
- Run the command on cron with environment-aware configuration.
A one-off PHP script is fine for prototyping. For production use (cron jobs, parameterised runs, logging into central infrastructure), wrap your Panther scraper in a Symfony Console command. It is the same Panther underneath, with production-ready ergonomics on top.
Project setup
Assuming you have a Symfony project (created with `symfony new myproject` or `composer create-project symfony/skeleton myproject`):
cd myproject
composer require symfony/panther symfony/console
composer require --dev symfony/maker-bundle
The MakerBundle is a dev dependency; it provides the `make:command` generator used below.
If you're not using full-stack Symfony, the standalone `symfony/console` component works too. This lesson assumes the full-stack layout (`bin/console`) because it's the most common deployment shape.
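If you do go standalone, the wiring is only a few lines. A minimal sketch, assuming `symfony/console` is installed via Composer; the file name `console.php` and the application name are our own choices:

```php
#!/usr/bin/env php
<?php
// console.php - standalone runner when you aren't on full-stack Symfony.

require __DIR__.'/vendor/autoload.php';

use App\Command\ScrapeProductsCommand;
use Symfony\Component\Console\Application;

$app = new Application('scraper', '1.0.0');
// Without the framework's container, you construct and register
// command objects (and their dependencies) yourself.
$app->add(new ScrapeProductsCommand());
$app->run();
```

Run it as `php console.php app:scrape:products`. The trade-off is that you lose autowiring: any service the command needs must be passed to its constructor by hand.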
Generate the command
php bin/console make:command app:scrape:products
This creates src/Command/ScrapeProductsCommand.php, trimmed here to the essentials:
<?php

namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

#[AsCommand(name: 'app:scrape:products', description: 'Scrape product catalog')]
class ScrapeProductsCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln('Starting scrape...');

        return Command::SUCCESS;
    }
}
Run it:
php bin/console app:scrape:products
You'll see "Starting scrape..." and a clean exit. That's your scaffold.
Adding Panther
<?php

namespace App\Command;

use Psr\Log\LoggerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Helper\ProgressBar;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Component\Panther\Client;

#[AsCommand(name: 'app:scrape:products', description: 'Scrape product catalog')]
class ScrapeProductsCommand extends Command
{
    public function __construct(private LoggerInterface $logger)
    {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this
            ->addArgument('url', InputArgument::OPTIONAL, 'Target URL',
                'https://practice.scrapingcentral.com/challenges/dynamic/infinite-scroll/button-jsappend')
            ->addOption('limit', 'l', InputOption::VALUE_REQUIRED, 'Maximum products to fetch', 100);
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);
        $url = $input->getArgument('url');
        $limit = (int) $input->getOption('limit');
        $io->title("Scraping $url (limit: $limit)");

        $client = Client::createChromeClient();

        try {
            $crawler = $client->request('GET', $url);
            $client->waitFor('.product-card', 10);

            $progress = new ProgressBar($output, $limit);
            $progress->start();

            $products = [];
            while (count($products) < $limit) {
                $cards = $crawler->filter('.product-card');

                // Only process cards we haven't seen on a previous pass.
                for ($i = count($products); $i < $cards->count(); ++$i) {
                    $card = $cards->eq($i);
                    $products[] = [
                        'name' => trim($card->filter('h2')->text('')),
                        'price' => trim($card->filter('.price')->text('')),
                    ];
                    $progress->advance();
                    if (count($products) >= $limit) {
                        break 2;
                    }
                }

                $loadMore = $crawler->filter('button.load-more');
                if ($loadMore->count() === 0) {
                    break; // no more pages to load
                }
                $loadMore->click();
                $client->waitFor('.product-card:nth-of-type(' . (count($products) + 1) . ')', 10);
            }
            $progress->finish();
            $io->newLine(2);

            $io->success(sprintf('Scraped %d products', count($products)));
            $this->logger->info('scrape.done', ['count' => count($products), 'url' => $url]);
            $io->table(['Name', 'Price'], array_slice($products, 0, 5));

            return Command::SUCCESS;
        } catch (\Throwable $e) {
            $this->logger->error('scrape.failed', ['exception' => $e->getMessage()]);
            $io->error($e->getMessage());

            return Command::FAILURE;
        } finally {
            $client->quit();
        }
    }
}
A few things to notice:
- Constructor injection. `LoggerInterface` is autowired; the same pattern works for Doctrine, HTTP clients, and your own services.
- SymfonyStyle. Wraps `OutputInterface` with `title`, `success`, `error`, and `table` helpers for clean visual output.
- InputArgument + InputOption. Positional `url` and named `--limit`; the CLI reads `php bin/console app:scrape:products https://... --limit=50`.
- ProgressBar. A 100-product scrape with a visible progress bar feels much shorter than the same scrape with silent stdout.
- try/finally with `quit()`. Cleanup is guaranteed even on errors.
- Return codes. `Command::SUCCESS` (0) and `Command::FAILURE` (1); cron and CI systems care about these.
Verbosity levels
$io->writeln("Detail message", OutputInterface::VERBOSITY_VERBOSE);
Run with -v, -vv, -vvv for progressively more output. Production cron uses default; debugging uses -vv. No code changes needed.
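The idiom generalises: tag each message with a minimum verbosity, or branch on the current level. A sketch (the message strings are illustrative):

```php
// Inside execute(): printed only with -v and above.
$output->writeln('Fetched page 3', OutputInterface::VERBOSITY_VERBOSE);

// Printed only with -vv and above.
$output->writeln('Waiting for .product-card to appear', OutputInterface::VERBOSITY_VERY_VERBOSE);

// Or branch explicitly for expensive debug output (-vvv):
if ($output->isDebug()) {
    $output->writeln(var_export($products, true));
}
```

Default cron runs stay quiet; the same binary becomes chatty the moment you add flags.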
Environment-aware config
Put the target URL and credentials in .env:
SCRAPE_BASE_URL=https://practice.scrapingcentral.com
SCRAPE_USER=demo@example.com
SCRAPE_PASS=password
Inject via %env(SCRAPE_BASE_URL)% in services.yaml, or read from $_ENV in the command. Different environments (dev, staging, prod) get different .env.local files automatically.
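One way to wire the injection, sketched with the container's `bind` feature; the parameter name `$scrapeBaseUrl` is our own choice:

```yaml
# config/services.yaml
services:
    _defaults:
        autowire: true
        bind:
            # Any autowired service with a `string $scrapeBaseUrl`
            # constructor parameter receives the env value.
            string $scrapeBaseUrl: '%env(SCRAPE_BASE_URL)%'
```

The command then declares `private string $scrapeBaseUrl` in its constructor and never touches `$_ENV` directly, which keeps it testable.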
Persistence: writing results
The command above prints; in reality you'll write to a database via Doctrine, to a CSV file, or to a message queue. Inject the entity manager:
use Doctrine\ORM\EntityManagerInterface; // new import at the top of the file

public function __construct(
    private LoggerInterface $logger,
    private EntityManagerInterface $em,
) {
    parent::__construct();
}

// ...inside execute(), after scraping each product:
$product = new Product();
$product->setName($name)->setPrice($price);
$this->em->persist($product);

// after the loop:
$this->em->flush();
Flush in batches if you scrape hundreds of items; otherwise Doctrine's unit of work grows without bound and memory balloons.
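A common batching pattern, sketched here with an arbitrary batch size of 50:

```php
$batchSize = 50;
foreach ($products as $i => $data) {
    $product = new Product();
    $product->setName($data['name'])->setPrice($data['price']);
    $this->em->persist($product);

    if (($i + 1) % $batchSize === 0) {
        $this->em->flush();
        $this->em->clear(); // detach flushed entities so the unit of work stays small
    }
}
$this->em->flush(); // flush the remainder
```

Note that `clear()` detaches every managed entity, so re-fetch anything you still need after the loop.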
Running on cron
The whole point of wrapping the scrape in a console command is operational cleanliness:
# /etc/cron.d/scrapes
0 */6 * * * www-data /usr/bin/php /var/www/app/bin/console app:scrape:products --env=prod --no-interaction >> /var/log/scrape.log 2>&1
- `--env=prod` runs the command in the production environment, so the `.env.prod` overrides (if any) are loaded.
- `--no-interaction` prevents the command from prompting interactively if it's wired to.
- `>> /var/log/scrape.log 2>&1` captures both stdout and stderr; relying on cron emailing you the output is fragile and best avoided.
Locking
Two cron runs overlapping is a bug. Symfony Console has a LockableTrait:
use Symfony\Component\Console\Command\LockableTrait;

class ScrapeProductsCommand extends Command
{
    use LockableTrait;

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        if (!$this->lock()) {
            $output->writeln('Already running, exiting.');

            return Command::SUCCESS;
        }

        try {
            // ... scrape ...
        } finally {
            $this->release();
        }

        return Command::SUCCESS;
    }
}
A standard file or semaphore lock with automatic cleanup. No more "two scrapers racing for the same database" bug at 3 a.m. Note that the default lock store is machine-local; if the cron can run on more than one host, configure a shared store via the `symfony/lock` component.
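An OS-level alternative (or a belt-and-braces complement) is util-linux `flock` directly in the crontab; the lock file path here is arbitrary:

```shell
# /etc/cron.d/scrapes - skip this run if the previous one still holds the lock
0 */6 * * * www-data flock -n /tmp/scrape-products.lock /usr/bin/php /var/www/app/bin/console app:scrape:products --env=prod >> /var/log/scrape.log 2>&1
```

With `-n`, `flock` exits immediately instead of queueing behind the running instance, which is usually what you want for a periodic scrape.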
Hands-on lab
Open /challenges/dynamic/infinite-scroll/button-jsappend. Generate a Symfony Console command called app:scrape:button-jsappend. Wire in Panther, a progress bar, and a --limit option. Run with --limit=10, then --limit=50, and confirm you get exactly that many products. Add LockableTrait and run two instances simultaneously to verify only one actually executes.
Practice this lesson on Catalog108, our first-party scraping sandbox.