
1.19 · Intermediate · 5 min read

Symfony BrowserKit: Simulating a Browser in Pure PHP

BrowserKit gives you browser-like navigation in pure PHP: cookies, history, form submission, and redirect following, all without launching a real browser. It's the right tool for stateful scraping flows.

What you’ll learn

  • Use `HttpBrowser` to navigate as a browser would (with cookies and history).
  • Submit forms by name or button label.
  • Inspect cookies, history, and the last response/request.
  • Compose BrowserKit with DomCrawler for end-to-end PHP scraping.

When a flow requires multiple steps with shared state (log in, follow a link, fill a form, submit), manually shepherding cookies through Guzzle or Symfony HttpClient gets old. BrowserKit's HttpBrowser is exactly the abstraction you want: it behaves like a stateful headless browser (no JS, no rendering, just HTTP + DOM).

Install

composer require symfony/browser-kit symfony/http-client symfony/dom-crawler symfony/css-selector

HttpBrowser wraps an HttpClient; you'll also want dom-crawler and css-selector for filtering the parsed responses.
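Because HttpBrowser accepts any HttpClient instance, transport concerns (timeouts, proxies, default headers) are configured on the client, not on the browser. A minimal sketch; the proxy address here is a placeholder:

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

// All transport options live on the HttpClient; BrowserKit only adds
// the browser-like behavior (cookies, history, forms) on top.
$client = HttpClient::create([
    'timeout' => 10,                       // seconds before the request fails
    'proxy'   => 'http://127.0.0.1:8080',  // placeholder proxy address
    'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; learning-scraper)'],
]);

$browser = new HttpBrowser($client);
```

Every request the browser makes from here on inherits these client settings.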

Your first browser session

<?php
require 'vendor/autoload.php';

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create([
  'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; learning-scraper)'],
]));

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/');
echo $crawler->filter('h1')->text() . "\n";
echo $browser->getResponse()->getStatusCode() . "\n";

$browser->request() returns a Crawler over the response body, already parsed, ready to query. Cookies are captured automatically; subsequent requests carry them.

Logging in via form submission

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');

$form = $crawler->selectButton('Login')->form([
  'username' => 'student@practice.scrapingcentral.com',
  'password' => 'practice123',
]);

$browser->submit($form);

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/dashboard');
echo $crawler->filter('h1')->text() . "\n";  // "Welcome, student"

What just happened:

  1. GET the login page; Crawler parses the response.
  2. selectButton('Login') finds the submit button by its visible label.
  3. ->form([...]) returns a Form object with the page's existing fields plus your overrides.
  4. $browser->submit($form) POSTs the form with all the right fields (including hidden ones like CSRF tokens) and follows the redirect to the post-login destination.
  5. Cookies set during login persist on the browser; the next request to /dashboard is authenticated.

You didn't have to write a single line about cookies, redirects, or finding the form's action URL. That's the BrowserKit value.

Selecting forms: three ways

// By button label (visible text)
$form = $crawler->selectButton('Login')->form();

// By form attribute via filter
$form = $crawler->filter('form#login')->form();

// By generic selector + form(), works on any node inside the form
$form = $crawler->filter('input[name="username"]')->form();

The button-label form is the most "user-like": you locate the form the way a user would, by what they'd click.
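Whichever selector you use, it can come back empty if the page layout changes or you were served something unexpected, and calling ->form() on an empty Crawler throws. A defensive sketch:

```php
<?php
// Assumes $crawler holds the login page and $browser is the HttpBrowser
// that fetched it (both from a previous request).
$button = $crawler->selectButton('Login');

if ($button->count() === 0) {
    // Layout changed, or we got a block/maintenance page instead.
    // Fail loudly with context rather than letting ->form() throw a
    // generic "current node list is empty" error.
    throw new RuntimeException(
        'Login button not found on ' . $browser->getRequest()->getUri()
    );
}

$form = $button->form();
```

The same count() check works for selectLink() and filter() results.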

Hidden fields, including CSRF tokens

When the page has hidden fields (most CSRF setups), ->form() includes them automatically with their current values. You only need to override the ones the user would actually type:

$form = $crawler->selectButton('Submit')->form([
  'username' => '...',
  'password' => '...',
  // _csrf_token field is included automatically with its existing value
]);

This is the killer feature compared to manual Guzzle: no need to parse the CSRF token out yourself and re-attach it.

Follow redirects manually

By default, BrowserKit follows redirects. To inspect them:

$browser->followRedirects(false);
$browser->submit($form);
echo $browser->getResponse()->getStatusCode();  // 302
echo $browser->getResponse()->getHeader('Location');

$browser->followRedirect();  // explicitly follow

Useful when you want to capture intermediate state (cookies set on the redirect hop, the redirect URL itself).
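With auto-follow disabled, you can walk a redirect chain hop by hop and record each Location header. A sketch, assuming $browser has followRedirects(false) set and a request or submit has just been made:

```php
<?php
// Walk the redirect chain manually, recording every hop.
$hops = [];
while (in_array($browser->getResponse()->getStatusCode(), [301, 302, 303, 307, 308], true)) {
    $hops[] = $browser->getResponse()->getHeader('Location');
    $browser->followRedirect();  // performs the next request, cookies included
}

print_r($hops);  // every intermediate Location header, in order
```

After the loop, $browser->getResponse() holds the final non-redirect response, and the cookie jar contains anything set along the way.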

Cookies and history

print_r($browser->getCookieJar()->all());  // array of cookies
print_r($browser->getHistory());  // navigation history

// Restore a previous request
$browser->back();
$browser->forward();
$browser->reload();

The cookie jar is shared across all requests on this browser instance; that's how persistence works. You can also seed it manually:

use Symfony\Component\BrowserKit\Cookie;

$browser->getCookieJar()->set(new Cookie('session_id', 'abc123', null, '/', 'practice.scrapingcentral.com'));

Inspecting the last response

$resp = $browser->getResponse();
echo $resp->getStatusCode();
echo $resp->getHeader('Content-Type');
echo $resp->getContent();  // raw body

// The DomCrawler corresponding to the last response:
$crawler = $browser->getCrawler();

$browser->getCrawler() always returns the Crawler from the most recent navigation. Useful when you've called submit() and want to inspect the result page without re-parsing.

Click links by visible text

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');

$link = $crawler->selectLink('Next page')->link();
$crawler = $browser->click($link);

selectLink('label') finds an `<a>` element by its visible text, and click(link) follows it as a GET request. For pagination flows this is dramatically cleaner than parsing URL patterns manually.
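This composes naturally into a pagination loop: keep clicking "Next page" until the link disappears. A sketch, assuming the listing pages use `h2.post-title` headings (swap in the real selector for your target):

```php
<?php
// Assumes $browser is an HttpBrowser instance (see the Install section).
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');

$titles = [];
do {
    // Collect this page's titles before moving on.
    foreach ($crawler->filter('h2.post-title') as $node) {
        $titles[] = trim($node->textContent);
    }

    $next = $crawler->selectLink('Next page');
    if ($next->count() === 0) {
        break;  // last page: no "Next page" link present
    }
    $crawler = $browser->click($next->link());
} while (true);

print_r($titles);
```

Because the browser carries cookies across clicks, this same loop works unchanged behind a login.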

A complete login + scrape

<?php
require 'vendor/autoload.php';

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());

// 1. Land on login
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');

// 2. Submit credentials
$form = $crawler->selectButton('Login')->form([
  'username' => 'student@practice.scrapingcentral.com',
  'password' => 'practice123',
]);
$crawler = $browser->submit($form);

// 3. Confirm we landed somewhere logged-in
if (!str_contains($browser->getResponse()->getContent(), 'Welcome')) {
  throw new RuntimeException('Login failed');
}

// 4. Scrape protected pages
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/orders');
$orders = $crawler->filter('tr.order')->each(fn($c) => [
  'id'  => $c->filter('.id')->text(),
  'total' => $c->filter('.total')->text(),
]);

print_r($orders);

That's a working, cookie-aware, CSRF-handling, redirect-following PHP scraper in 25 lines. No browser, no JS.

When BrowserKit isn't enough

BrowserKit doesn't run JavaScript. If a site renders content client-side (React, Vue, etc.), the Crawler will see a stripped-down shell. That's the Dynamic Web sub-path's territory (Panther for PHP, Playwright for Python, both drive real browsers). For server-rendered sites with stateful flows, BrowserKit is the sweet spot.

Hands-on lab

Use HttpBrowser to log in at /account/login with the demo credentials, then navigate to /dashboard and confirm the response indicates an authenticated session. Inspect $browser->getCookieJar()->all() to see what cookies are now stored. Then call $browser->getHistory() to see the navigation chain so far.

