Guzzle: The Industry-Standard PHP HTTP Client
Guzzle wraps cURL with a clean, modern API. PSR-7 messages, sessions, async, middleware. The default choice for any serious PHP scraper.
What you’ll learn
- Install Guzzle via Composer and configure a Client.
- Send GET, POST, and JSON requests with the modern Guzzle API.
- Reuse cookies and headers across requests via Client defaults.
- Handle HTTP and connection errors via Guzzle exceptions.
Guzzle is to PHP what requests is to Python: the de facto HTTP library. It's built on top of cURL (or PHP streams when cURL isn't available), exposes PSR-7 message objects, supports async via promises, and has a middleware system that makes adding retries, logging, and auth trivial. Every modern PHP scraper, Symfony-based or otherwise, ends up using Guzzle either directly or indirectly.
Install
composer require guzzlehttp/guzzle
The Client object
A Client is the equivalent of requests.Session: it holds defaults (base URI, headers, cookies, timeouts) and reuses connections:
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
$client = new Client([
'base_uri' => 'https://practice.scrapingcentral.com',
'timeout' => 10,
'connect_timeout' => 5,
'headers' => [
'User-Agent' => 'Mozilla/5.0 (compatible; my-scraper)',
'Accept-Language' => 'en-US,en;q=0.9',
],
'cookies' => true, // enable cookie jar
'http_errors' => true, // throw on 4xx/5xx (default)
]);
With base_uri set, every request URL is resolved relative to it: $client->get('/products') hits https://practice.scrapingcentral.com/products.
GET requests
$response = $client->get('/products', [
'query' => ['page' => 2, 'category' => 'kitchen'],
]);
echo $response->getStatusCode(); // 200
echo $response->getHeaderLine('Content-Type'); // text/html; charset=UTF-8
$body = (string) $response->getBody(); // cast PSR-7 stream to string
The query option is the equivalent of Python's params=; Guzzle URL-encodes the parameters for you.
$response is a PSR-7 ResponseInterface. The body is a stream (Psr\Http\Message\StreamInterface); cast it to string with (string) for the full content, or read it incrementally for big payloads.
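For large payloads, the body stream can be consumed in chunks instead of buffering everything in memory. A minimal sketch (the 8 KB chunk size and the output file path are illustrative choices, not Guzzle requirements):

```php
<?php
// Read the PSR-7 body stream in 8 KB chunks and write each chunk
// to disk, so the full response never has to fit in memory.
$body = $response->getBody();
$out  = fopen('page.html', 'w');

while (!$body->eof()) {
    $chunk = $body->read(8192); // read up to 8 KB per iteration
    fwrite($out, $chunk);
}
fclose($out);
```

read() and eof() are part of the PSR-7 StreamInterface, so this works with any response Guzzle returns.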
POST requests, three body formats
Form-encoded:
$response = $client->post('/account/login', [
'form_params' => [
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
],
]);
Multipart (for file uploads):
$response = $client->post('/upload', [
'multipart' => [
['name' => 'file', 'contents' => fopen('/path/to/file.png', 'r')],
['name' => 'caption', 'contents' => 'My photo'],
],
]);
JSON:
$response = $client->post('/api/products', [
'json' => ['name' => 'New product', 'price' => 9.99],
]);
Like Python requests, the json option both serializes the body AND sets the Content-Type: application/json header. Don't pass body => json_encode(...) and forget the header; that's a common bug.
Cookies and sessions
The 'cookies' => true option on the Client enables an in-memory cookie jar that persists across requests:
$client = new Client([
'base_uri' => 'https://practice.scrapingcentral.com',
'cookies' => true,
]);
$client->get('/'); // server sets a session cookie
$client->post('/account/login', [ // login sends that cookie back
'form_params' => ['username' => '...', 'password' => '...'],
]);
$response = $client->get('/dashboard'); // authenticated request
For more control, pass a CookieJar instance instead of true:
use GuzzleHttp\Cookie\CookieJar;
$jar = new CookieJar();
$client = new Client(['cookies' => $jar]);
// Inspect, share, or persist:
print_r($jar->toArray());
FileCookieJar and SessionCookieJar are also available if you need on-disk or PHP-session-backed persistence.
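As a sketch of on-disk persistence with FileCookieJar (the cookies.json filename is an assumption; the second constructor argument controls whether session cookies without an explicit expiry are stored as well):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\FileCookieJar;

// Loads cookies from cookies.json if the file exists;
// true = also persist session cookies (no explicit expiry).
$jar = new FileCookieJar('cookies.json', true);

$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies'  => $jar,
]);

$client->get('/'); // any Set-Cookie headers land in the jar
// The jar writes itself back to cookies.json when it is destroyed,
// so a later run of the script resumes the same session.
```

This is handy for scrapers that log in once and then run repeatedly from cron.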
Error handling
By default, 4xx and 5xx responses throw an exception:
use GuzzleHttp\Exception\ClientException; // 4xx
use GuzzleHttp\Exception\ServerException; // 5xx
use GuzzleHttp\Exception\ConnectException; // network failure
use GuzzleHttp\Exception\RequestException; // parent of all above
try {
$response = $client->get('/account/missing');
} catch (ClientException $e) {
echo "404 or 4xx: " . $e->getResponse()->getStatusCode();
} catch (ServerException $e) {
echo "5xx: " . $e->getResponse()->getStatusCode();
} catch (ConnectException $e) {
echo "Network failure: " . $e->getMessage();
}
To opt out of exceptions and inspect manually (helpful for scrapers that don't want exceptions for normal 404s):
$response = $client->get('/some/url', ['http_errors' => false]);
if ($response->getStatusCode() === 200) {
// process body
}
Async and concurrency
Where Guzzle really shines: send many requests in parallel without threads.
use GuzzleHttp\Promise\Utils;
$promises = [
'p1' => $client->getAsync('/products?page=1'),
'p2' => $client->getAsync('/products?page=2'),
'p3' => $client->getAsync('/products?page=3'),
];
$responses = Utils::unwrap($promises);
foreach ($responses as $key => $response) {
echo "$key: " . $response->getStatusCode() . "\n";
}
Three requests fire in parallel; unwrap waits for all to complete. For bulk scraping, this is dramatically faster than serial loops, often 5-10x.
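To see the speedup yourself, a rough timing sketch using microtime (the page range is illustrative, and actual ratios depend on server latency):

```php
<?php
use GuzzleHttp\Promise\Utils;

// Serial: one request at a time, total time is the sum of all latencies.
$start = microtime(true);
foreach (range(1, 5) as $page) {
    $client->get("/products?page=$page");
}
printf("serial:   %.2fs\n", microtime(true) - $start);

// Parallel: all five in flight at once, total time is roughly
// the slowest single request.
$start = microtime(true);
$promises = [];
foreach (range(1, 5) as $page) {
    $promises[$page] = $client->getAsync("/products?page=$page");
}
Utils::unwrap($promises);
printf("parallel: %.2fs\n", microtime(true) - $start);
```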
For controlled concurrency (e.g., max 5 in flight at any time), use GuzzleHttp\Pool:
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
$requests = function () {
for ($page = 1; $page <= 100; $page++) {
yield new Request('GET', "/products?page=$page");
}
};
$pool = new Pool($client, $requests(), [
'concurrency' => 5,
'fulfilled' => function ($response, $index) {
echo "page $index: " . $response->getStatusCode() . "\n";
},
'rejected' => function ($reason, $index) {
echo "page $index FAILED: " . $reason . "\n";
},
]);
$pool->promise()->wait();
100 pages, 5 in flight at any moment, with per-result and per-failure callbacks. This is production scraper territory.
Middleware
Guzzle uses a middleware stack you can extend: retries, logging, caching, and custom auth are all written as middleware that wraps the request/response cycle. We'll cover retries and middleware in the Production sub-path; for now, know that the extension point exists.
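As a small taste of that extension point, a minimal sketch that pushes a request-logging middleware onto the default handler stack (the error_log destination is an assumption; Guzzle also ships a ready-made Middleware::retry helper for retries):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

// Start from the default stack (cURL/stream handler + core middleware).
$stack = HandlerStack::create();

// Log every outgoing request's method and URI before it is sent.
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
    error_log($request->getMethod() . ' ' . $request->getUri());
    return $request;
}));

$client = new Client(['handler' => $stack]);
```

Every request made through $client now passes through the logger before reaching the network, without touching any call sites.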
Proxies, TLS, auth
$client = new Client([
'proxy' => 'http://user:pass@proxy.example.com:8000',
'verify' => true,
'auth' => ['student', 'practice123'], // HTTP Basic
'headers' => ['Authorization' => 'Bearer ' . $token],
]);
All the same concepts as Python requests: same names, similar semantics. If you know requests, you know Guzzle's defaults.
Hands-on lab
Use Guzzle to fetch /products?page=2. Confirm getStatusCode() returns 200, print the first 500 bytes of (string) $response->getBody(), and verify getHeaderLine('Content-Type') says HTML. Then call getAsync for pages 1-5 and use Utils::unwrap to fetch them in parallel. Note the total elapsed time vs. a serial loop.
Practice this lesson on Catalog108, our first-party scraping sandbox.