Symfony BrowserKit: Simulating a Browser in Pure PHP
BrowserKit gives you browser-like navigation in pure PHP: cookies, history, form submission, and redirect following, all without launching a real browser. It's the right tool for stateful scraping flows.
What you’ll learn
- Use `HttpBrowser` to navigate as a browser would (with cookies and history).
- Submit forms by name or button label.
- Inspect cookies, history, and the last response/request.
- Compose BrowserKit with DomCrawler for end-to-end PHP scraping.
When a flow requires multiple stateful steps (log in, follow a link, fill a form, submit), manually shepherding cookies through Guzzle or Symfony HttpClient gets old. BrowserKit's HttpBrowser is exactly the abstraction you want: it behaves like a headless browser without JavaScript or rendering (just HTTP + DOM) while keeping cookies and history for you.
Install
composer require symfony/browser-kit symfony/http-client symfony/dom-crawler symfony/css-selector
HttpBrowser wraps an HttpClient; you'll also want dom-crawler and css-selector for filtering the parsed responses.
Your first browser session
<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
$browser = new HttpBrowser(HttpClient::create([
'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; learning-scraper)'],
]));
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/');
echo $crawler->filter('h1')->text() . "\n";
echo $browser->getResponse()->getStatusCode() . "\n";
$browser->request() returns a Crawler over the response body, already parsed, ready to query. Cookies are captured automatically; subsequent requests carry them.
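Nothing extra is needed for persistence, but you can watch it happen. A minimal sketch; the /products path is only an assumed second page on the practice site, used for illustration:

// Cookies from the first response are already in the jar.
foreach ($browser->getCookieJar()->all() as $cookie) {
    echo $cookie->getName() . ' => ' . $cookie->getValue() . "\n";
}

// This second request sends those cookies back automatically.
// (/products is an assumed path, purely for illustration.)
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/products');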
Logging in via form submission
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
$form = $crawler->selectButton('Login')->form([
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
]);
$browser->submit($form);
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/dashboard');
echo $crawler->filter('h1')->text() . "\n"; // "Welcome, student"
What just happened:
- GET the login page; Crawler parses the response.
- selectButton('Login') finds the submit button by its visible label.
- ->form([...]) returns a Form object with the page's existing fields plus your overrides.
- $browser->submit($form) POSTs the form with all the right fields (including hidden ones like CSRF tokens) AND follows the redirect to the post-login destination (see the sketch below).
- Cookies set during login persist on the browser; the next request to /dashboard is authenticated.
You didn't have to write a single line about cookies, redirects, or finding the form's action URL. That's the BrowserKit value.
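If you want to confirm where the submit actually landed, you can peek at the last request and response right after submitting. A small sketch reusing the $form from above:

$crawler = $browser->submit($form);
echo $browser->getInternalRequest()->getUri() . "\n"; // the post-redirect URL
echo $browser->getResponse()->getStatusCode() . "\n"; // 200 once the redirect has been followed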
Selecting forms, three ways
// By button label (visible text)
$form = $crawler->selectButton('Login')->form();
// By form attribute via filter
$form = $crawler->filter('form#login')->form();
// By generic selector + form(), works on any node inside the form
$form = $crawler->filter('input[name="username"]')->form();
The button-label form is the most "user-like": you find the form the user would interact with by what they'd click.
Hidden fields, including CSRF tokens
When the page has hidden fields (most CSRF setups), ->form() includes them automatically with their current values. You only need to override the ones the user would actually type:
$form = $crawler->selectButton('Submit')->form([
'username' => '...',
'password' => '...',
// _csrf_token field is included automatically with its existing value
]);
This is the killer feature compared to manual Guzzle: no need to parse the CSRF token out yourself and re-attach it.
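If you want to see exactly what will be sent, Form::getValues() returns every field the browser would submit, hidden ones included. A quick sketch; the _csrf_token name is just an example, whatever the page calls its token will show up:

$form = $crawler->selectButton('Submit')->form();
// Dumps all fields, including hidden ones such as a CSRF token,
// exactly as they would be POSTed.
print_r($form->getValues());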
Follow redirects manually
By default, BrowserKit follows redirects. To inspect them:
$browser->followRedirects(false);
$browser->submit($form);
echo $browser->getResponse()->getStatusCode(); // 302
echo $browser->getResponse()->getHeader('Location');
$browser->followRedirect(); // explicitly follow
Useful when you want to capture intermediate state (cookies set on the redirect hop, the redirect URL itself).
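A minimal sketch of capturing a whole redirect chain this way, hop by hop, continuing from the $form submit above:

$browser->followRedirects(false);
$browser->submit($form);

$hops = [];
// Walk the chain manually, recording each Location header as we go.
while (in_array($browser->getResponse()->getStatusCode(), [301, 302, 303, 307, 308], true)) {
    $hops[] = $browser->getResponse()->getHeader('Location');
    $browser->followRedirect();
}

print_r($hops);                  // every intermediate redirect target
$browser->followRedirects(true); // restore automatic following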
Cookies and history
print_r($browser->getCookieJar()->all()); // array of cookies
print_r($browser->getHistory()); // navigation history
// Restore a previous request
$browser->back();
$browser->forward();
$browser->reload();
The cookie jar is shared across all requests on this browser instance; that's how persistence works. You can also seed it manually:
use Symfony\Component\BrowserKit\Cookie;
$browser->getCookieJar()->set(new Cookie('session_id', 'abc123', null, '/', 'practice.scrapingcentral.com'));
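Reading a single cookie back out of the jar works too. A small sketch using the cookie we just seeded:

$cookie = $browser->getCookieJar()->get('session_id', '/', 'practice.scrapingcentral.com');
if ($cookie !== null) {
    echo $cookie->getValue() . "\n"; // abc123
}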
Inspecting the last response
$resp = $browser->getResponse();
echo $resp->getStatusCode();
echo $resp->getHeader('Content-Type');
echo $resp->getContent(); // raw body
// The DomCrawler corresponding to the last response:
$crawler = $browser->getCrawler();
$browser->getCrawler() always returns the Crawler from the most recent navigation. Useful when you've called submit() and want to inspect the result page without re-parsing.
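For example, after a submit you can grab the already-parsed result page. A quick sketch reusing the login form from earlier:

$browser->submit($form);
$crawler = $browser->getCrawler(); // the Crawler for the post-submit page
echo $crawler->filter('h1')->text() . "\n";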
Click links by visible text
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');
$link = $crawler->selectLink('Next page')->link();
$crawler = $browser->click($link);
selectLink('label') finds an <a> by its visible text, and click(link) follows it as a GET for ordinary anchors (a Form passed to click() is submitted instead, since Form extends Link). For pagination flows this is dramatically cleaner than parsing URL patterns manually.
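Here's a minimal pagination sketch built on that idea: keep clicking "Next page" until the link disappears. The .post h2 selector is an assumed bit of markup for the practice blog:

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');

$titles = [];
while (true) {
    // Collect titles on the current page (selector assumed for illustration).
    foreach ($crawler->filter('.post h2') as $node) {
        $titles[] = $node->textContent;
    }

    // Stop when there is no "Next page" link left.
    $next = $crawler->selectLink('Next page');
    if ($next->count() === 0) {
        break;
    }

    $crawler = $browser->click($next->link());
}

print_r($titles);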
A complete login + scrape
<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
$browser = new HttpBrowser(HttpClient::create());
// 1. Land on login
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
// 2. Submit credentials
$form = $crawler->selectButton('Login')->form([
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
]);
$crawler = $browser->submit($form);
// 3. Confirm we landed somewhere logged-in
if (!str_contains($browser->getResponse()->getContent(), 'Welcome')) {
throw new RuntimeException('Login failed');
}
// 4. Scrape protected pages
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/orders');
$orders = $crawler->filter('tr.order')->each(fn($c) => [
'id' => $c->filter('.id')->text(),
'total' => $c->filter('.total')->text(),
]);
print_r($orders);
That's a working, cookie-aware, CSRF-handling, redirect-following PHP scraper in 25 lines. No browser, no JS.
When BrowserKit isn't enough
BrowserKit doesn't run JavaScript. If a site renders content client-side (React, Vue, etc.), the Crawler will see a stripped-down shell. That's the Dynamic Web sub-path's territory (Panther for PHP, Playwright for Python, both drive real browsers). For server-rendered sites with stateful flows, BrowserKit is the sweet spot.
Hands-on lab
Use HttpBrowser to log in at /account/login with the demo credentials, then navigate to /dashboard and confirm the response indicates an authenticated session. Inspect $browser->getCookieJar()->all() to see what cookies are now stored. Then call $browser->getHistory() to see the navigation chain so far.
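One possible shape of the lab solution, assuming the same demo credentials and page structure used throughout this lesson:

<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());

// 1. Log in with the demo credentials.
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
$form = $crawler->selectButton('Login')->form([
    'username' => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
]);
$browser->submit($form);

// 2. Confirm the dashboard reflects an authenticated session.
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/dashboard');
echo $crawler->filter('h1')->text() . "\n"; // e.g. "Welcome, student"

// 3. Inspect the stored cookies and the navigation chain.
print_r($browser->getCookieJar()->all());
print_r($browser->getHistory());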