Symfony BrowserKit: Simulating a Browser in Pure PHP
BrowserKit gives you browser-like navigation in pure PHP: cookies, history, form submission, and redirect following, all without launching a real browser. It's the right tool for stateful scraping flows.
What you’ll learn
- Use `HttpBrowser` to navigate as a browser would (with cookies and history).
- Submit forms by name or button label.
- Inspect cookies, history, and the last response/request.
- Compose BrowserKit with DomCrawler for end-to-end PHP scraping.
When a flow requires multiple stateful steps (log in, follow a link, fill a form, submit), manually shepherding cookies through Guzzle or Symfony HttpClient gets old. BrowserKit's HttpBrowser is exactly the abstraction you want: it behaves like a headless browser without JavaScript or rendering (just HTTP + DOM) while keeping cookies and history for you.
Install
composer require symfony/browser-kit symfony/http-client symfony/dom-crawler symfony/css-selector
HttpBrowser wraps an HttpClient; you'll also want dom-crawler and css-selector for filtering the parsed responses.
Your first browser session
<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
$browser = new HttpBrowser(HttpClient::create([
'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; learning-scraper)'],
]));
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/');
echo $crawler->filter('h1')->text() . "\n";
echo $browser->getResponse()->getStatusCode() . "\n";
$browser->request() returns a Crawler over the response body, already parsed, ready to query. Cookies are captured automatically; subsequent requests carry them.
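Nothing extra is needed for persistence, but you can watch it happen. A minimal sketch; the /products path is only an assumed second page on the practice site, used for illustration:

// Cookies from the first response are already in the jar.
foreach ($browser->getCookieJar()->all() as $cookie) {
    echo $cookie->getName() . ' => ' . $cookie->getValue() . "\n";
}

// This second request sends those cookies back automatically.
// (/products is an assumed path, purely for illustration.)
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/products');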
Logging in via form submission
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
$form = $crawler->selectButton('Login')->form([
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
]);
$browser->submit($form);
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/dashboard');
echo $crawler->filter('h1')->text() . "\n"; // "Welcome, student"
What just happened:
- GET the login page; Crawler parses the response.
- selectButton('Login') finds the submit button by its visible label.
- ->form([...]) returns a Form object with the page's existing fields plus your overrides.
- $browser->submit($form) POSTs the form with all the right fields (including hidden ones like CSRF tokens) AND follows the redirect to the post-login destination (see the sketch below).
- Cookies set during login persist on the browser; the next request to /dashboard is authenticated.
You didn't have to write a single line about cookies, redirects, or finding the form's action URL. That's the BrowserKit value.
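If you want to confirm where the submit actually landed, you can peek at the last request and response right after submitting. A small sketch reusing the $form from above:

$crawler = $browser->submit($form);
echo $browser->getInternalRequest()->getUri() . "\n"; // the post-redirect URL
echo $browser->getResponse()->getStatusCode() . "\n"; // 200 once the redirect has been followed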
Selecting forms, three ways
// By button label (visible text)
$form = $crawler->selectButton('Login')->form();
// By form attribute via filter
$form = $crawler->filter('form#login')->form();
// By generic selector + form(), works on any node inside the form
$form = $crawler->filter('input[name="username"]')->form();
The button-label form is the most "user-like": you find the form the user would interact with by what they'd click.
Hidden fields, including CSRF tokens
When the page has hidden fields (most CSRF setups), ->form() includes them automatically with their current values. You only need to override the ones the user would actually type:
$form = $crawler->selectButton('Submit')->form([
'username' => '...',
'password' => '...',
// _csrf_token field is included automatically with its existing value
]);
This is the killer feature compared to manual Guzzle: no need to parse the CSRF token out yourself and re-attach it.
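If you want to see exactly what will be sent, Form::getValues() returns every field the browser would submit, hidden ones included. A quick sketch; the _csrf_token name is just an example, whatever the page calls its token will show up:

$form = $crawler->selectButton('Submit')->form();
// Dumps all fields, including hidden ones such as a CSRF token,
// exactly as they would be POSTed.
print_r($form->getValues());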
Follow redirects manually
By default, BrowserKit follows redirects. To inspect them:
$browser->followRedirects(false);
$browser->submit($form);
echo $browser->getResponse()->getStatusCode(); // 302
echo $browser->getResponse()->getHeader('Location');
$browser->followRedirect(); // explicitly follow
Useful when you want to capture intermediate state (cookies set on the redirect hop, the redirect URL itself).
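A minimal sketch of capturing a whole redirect chain this way, hop by hop, continuing from the $form submit above:

$browser->followRedirects(false);
$browser->submit($form);

$hops = [];
// Walk the chain manually, recording each Location header as we go.
while (in_array($browser->getResponse()->getStatusCode(), [301, 302, 303, 307, 308], true)) {
    $hops[] = $browser->getResponse()->getHeader('Location');
    $browser->followRedirect();
}

print_r($hops);                  // every intermediate redirect target
$browser->followRedirects(true); // restore automatic following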
Cookies and history
print_r($browser->getCookieJar()->all()); // array of cookies
print_r($browser->getHistory()); // navigation history
// Restore a previous request
$browser->back();
$browser->forward();
$browser->reload();
The cookie jar is shared across all requests on this browser instance; that's how persistence works. You can also seed it manually:
use Symfony\Component\BrowserKit\Cookie;
$browser->getCookieJar()->set(new Cookie('session_id', 'abc123', null, '/', 'practice.scrapingcentral.com'));
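Reading a single cookie back out of the jar works too. A small sketch using the cookie we just seeded:

$cookie = $browser->getCookieJar()->get('session_id', '/', 'practice.scrapingcentral.com');
if ($cookie !== null) {
    echo $cookie->getValue() . "\n"; // abc123
}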
Inspecting the last response
$resp = $browser->getResponse();
echo $resp->getStatusCode();
echo $resp->getHeader('Content-Type');
echo $resp->getContent(); // raw body
// The DomCrawler corresponding to the last response:
$crawler = $browser->getCrawler();
$browser->getCrawler() always returns the Crawler from the most recent navigation. Useful when you've called submit() and want to inspect the result page without re-parsing.
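For example, after a submit you can grab the already-parsed result page. A quick sketch reusing the login form from earlier:

$browser->submit($form);
$crawler = $browser->getCrawler(); // the Crawler for the post-submit page
echo $crawler->filter('h1')->text() . "\n";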
Click links by visible text
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');
$link = $crawler->selectLink('Next page')->link();
$crawler = $browser->click($link);
selectLink('label') finds an <a> by its visible text, and click(link) follows it as a GET for ordinary anchors (a Form passed to click() is submitted instead, since Form extends Link). For pagination flows this is dramatically cleaner than parsing URL patterns manually.
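Here's a minimal pagination sketch built on that idea: keep clicking "Next page" until the link disappears. The .post h2 selector is an assumed bit of markup for the practice blog:

$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/blog');

$titles = [];
while (true) {
    // Collect titles on the current page (selector assumed for illustration).
    foreach ($crawler->filter('.post h2') as $node) {
        $titles[] = $node->textContent;
    }

    // Stop when there is no "Next page" link left.
    $next = $crawler->selectLink('Next page');
    if ($next->count() === 0) {
        break;
    }

    $crawler = $browser->click($next->link());
}

print_r($titles);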
A complete login + scrape
<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
$browser = new HttpBrowser(HttpClient::create());
// 1. Land on login
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
// 2. Submit credentials
$form = $crawler->selectButton('Login')->form([
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
]);
$crawler = $browser->submit($form);
// 3. Confirm we landed somewhere logged-in
if (!str_contains($browser->getResponse()->getContent(), 'Welcome')) {
throw new RuntimeException('Login failed');
}
// 4. Scrape protected pages
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/orders');
$orders = $crawler->filter('tr.order')->each(fn($c) => [
'id' => $c->filter('.id')->text(),
'total' => $c->filter('.total')->text(),
]);
print_r($orders);
That's a working, cookie-aware, CSRF-handling, redirect-following PHP scraper in 25 lines. No browser, no JS.
When BrowserKit isn't enough
BrowserKit doesn't run JavaScript. If a site renders content client-side (React, Vue, etc.), the Crawler will see a stripped-down shell. That's the Dynamic Web sub-path's territory (Panther for PHP, Playwright for Python, both drive real browsers). For server-rendered sites with stateful flows, BrowserKit is the sweet spot.
Hands-on lab
Use HttpBrowser to log in at /account/login with the demo credentials, then navigate to /dashboard and confirm the response indicates an authenticated session. Inspect $browser->getCookieJar()->all() to see what cookies are now stored. Then call $browser->getHistory() to see the navigation chain so far.
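One possible shape of the lab solution, assuming the same demo credentials and page structure used throughout this lesson:

<?php
require 'vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());

// 1. Log in with the demo credentials.
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/account/login');
$form = $crawler->selectButton('Login')->form([
    'username' => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
]);
$browser->submit($form);

// 2. Confirm the dashboard reflects an authenticated session.
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/dashboard');
echo $crawler->filter('h1')->text() . "\n"; // e.g. "Welcome, student"

// 3. Inspect the stored cookies and the navigation chain.
print_r($browser->getCookieJar()->all());
print_r($browser->getHistory());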