Lesson 3.7 · Beginner · 5 min read

Copy as cURL → Working PHP Request (Guzzle)

The same captured curl command, translated to idiomatic PHP with Guzzle: the minimum viable client an experienced PHP scraper would write.

What you’ll learn

  • Convert curl headers, query strings, and bodies to Guzzle options.
  • Use base_uri, query, headers, json, and form_params correctly.
  • Handle cookies via Guzzle's CookieJar.
  • Recognise the most common Guzzle translation bugs.

This is the PHP companion to the previous lesson. Guzzle is the de facto standard HTTP client for PHP: Laravel's HTTP client wraps it, and many Symfony and standalone projects depend on it. The translation mechanics differ from Python's requests, but the workflow is identical.

The captured curl, again

curl 'https://practice.scrapingcentral.com/api/products?page=1&category=mugs' \
  -H 'accept: application/json' \
  -H 'cookie: session=abc123' \
  -H 'referer: https://practice.scrapingcentral.com/products' \
  -H 'user-agent: Mozilla/5.0' \
  --compressed

The Guzzle translation

<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client([
  'base_uri' => 'https://practice.scrapingcentral.com',
  'timeout'  => 10,
]);

$res = $client->get('/api/products', [
  'query'   => ['page' => 1, 'category' => 'mugs'],
  'headers' => [
    'Accept'     => 'application/json',
    'User-Agent' => 'Mozilla/5.0',
    'Referer'    => 'https://practice.scrapingcentral.com/products',
    'Cookie'     => 'session=abc123',
  ],
]);

$data = json_decode($res->getBody()->getContents(), true);
echo count($data['products']) . " products\n";

What changed:

  • base_uri set once on the client → URLs in calls become relative paths. Cleaner.
  • Query string in the captured URL → query option (associative array).
  • Each -H line → entry in the headers option.
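  • --compressed needs no direct translation. Guzzle's decode_content option defaults to true, so gzip or deflate responses are decoded automatically. To also advertise compression the way curl does, send an Accept-Encoding header or set 'decode_content' => 'gzip'.
  • json_decode(..., true) returns associative arrays; without the true argument you get stdClass objects instead.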
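
The URL split described in the first two bullets can be done mechanically. A minimal sketch using only PHP's built-in parse_url() and parse_str(), not a Guzzle API; the variable names are illustrative:

```php
<?php
// Split a captured URL into the pieces Guzzle wants: base_uri, path, query.
$captured = 'https://practice.scrapingcentral.com/api/products?page=1&category=mugs';

$parts = parse_url($captured);

// Scheme + host become the client's base_uri.
$baseUri = $parts['scheme'] . '://' . $parts['host'];

// The path is what gets passed to $client->get().
$path = $parts['path'];

// parse_str() turns "page=1&category=mugs" into an associative array.
// Every value comes back as a string, which is fine for the 'query' option.
$query = [];
parse_str($parts['query'] ?? '', $query);

echo $baseUri, "\n"; // https://practice.scrapingcentral.com
echo $path, "\n";    // /api/products
print_r($query);     // page => 1, category => mugs
```

Doing the split by hand is fine for a single request; a helper like this only pays off when you translate many captured URLs.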

POST body translations

For a JSON POST:

curl 'https://.../api/auth/login' \
  -X POST \
  -H 'content-type: application/json' \
  --data-raw '{"email":"...","password":"..."}'

PHP:

$res = $client->post('/api/auth/login', [
  'json' => [
    'email'    => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
  ],
]);
// Cast the body stream to a string before decoding (required under strict_types).
$token = json_decode((string) $res->getBody(), true)['access_token'];

For a form-encoded POST (application/x-www-form-urlencoded):

$res = $client->post('/api/login-form', [
  'form_params' => [
    'email'    => '...',
    'password' => '...',
  ],
]);

For multipart (multipart/form-data, e.g. file upload):

$res = $client->post('/api/upload', [
  'multipart' => [
    ['name' => 'file', 'contents' => fopen('/path/to/file', 'r')],
    ['name' => 'description', 'contents' => 'My file'],
  ],
]);
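
To check what each body option actually puts on the wire, you can record outgoing requests without touching the network. The sketch below assumes Guzzle 7 installed via Composer and uses its MockHandler and history middleware; the example.test host and the /json and /form paths are placeholders, not sandbox endpoints.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;

// Queue two canned responses so nothing is sent over the network.
$mock = new MockHandler([new Response(200), new Response(200)]);

// The history middleware appends every request/response pair to $history.
$history = [];
$stack = HandlerStack::create($mock);
$stack->push(Middleware::history($history));

// example.test is a placeholder host; it is never contacted.
$client = new Client(['handler' => $stack, 'base_uri' => 'https://example.test']);

$client->post('/json', ['json' => ['a' => 1]]);
$client->post('/form', ['form_params' => ['a' => 'b']]);

// Collect the Content-Type and raw body Guzzle generated for each call.
$sent = [];
foreach ($history as $transaction) {
    $request = $transaction['request'];
    $sent[] = [
        'type' => $request->getHeaderLine('Content-Type'),
        'body' => (string) $request->getBody(),
    ];
}

print_r($sent);
```

The first entry should show application/json with a JSON-encoded body, the second application/x-www-form-urlencoded with a=b. Comparing these against the content-type header and --data-raw value in the captured curl is a quick way to confirm you picked the right option.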

Mapping table, curl flags to Guzzle options

curl                         | Guzzle option
-H 'name: value'             | 'headers' => ['Name' => 'value']
-b / -H 'cookie: ...'        | 'cookies' => $jar (see below), or a hard-coded Cookie header
--data-raw 'a=b' (form)      | 'form_params' => ['a' => 'b']
--data-raw '{"a":1}' (JSON)  | 'json' => ['a' => 1]
-X POST / -X PUT             | $client->post(...) / $client->put(...)
-u user:pass                 | 'auth' => ['user', 'pass']
-L (follow redirects)        | 'allow_redirects' => true (default)
--max-time 10                | 'timeout' => 10
-k (insecure TLS)            | 'verify' => false (don't ship this)
--proxy http://...           | 'proxy' => 'http://...'
?a=1&b=2 (query string)      | 'query' => ['a' => 1, 'b' => 2]
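
The first row of the table is the one you apply most often, so it is worth mechanising. A minimal sketch in plain PHP; curlHeadersToGuzzle() is an illustrative helper written for this lesson, not part of Guzzle:

```php
<?php
/**
 * Convert captured "-H 'name: value'" strings into a Guzzle 'headers' array.
 * Hypothetical helper for illustration only.
 */
function curlHeadersToGuzzle(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        // Split on the first colon only: values such as URLs contain colons too.
        $parts = explode(':', $line, 2);
        $name  = trim($parts[0]);
        $value = trim($parts[1] ?? '');
        $headers[$name] = $value;
    }
    return $headers;
}

$headers = curlHeadersToGuzzle([
    'accept: application/json',
    'referer: https://practice.scrapingcentral.com/products',
    'user-agent: Mozilla/5.0',
]);

print_r($headers);
```

Header names are case-insensitive in HTTP, so the lowercase names that DevTools emits can be passed to Guzzle unchanged. The cookie header is left out here on purpose, since a CookieJar is usually the better home for it.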

CookieJar for session-aware scraping

When the scraper needs to log in, then make subsequent requests with the issued cookies, use Guzzle's CookieJar:

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

$jar = new CookieJar();

$client = new Client([
  'base_uri' => 'https://practice.scrapingcentral.com',
  'cookies'  => $jar,
]);

// Login, sets cookies on $jar
$client->post('/api/auth/login', [
  'json' => [
    'email'    => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
  ],
]);

// Subsequent calls automatically include the cookies
$res = $client->get('/api/auth/me');
// Cast the body stream to a string before decoding (required under strict_types).
$me = json_decode((string) $res->getBody(), true);
print_r($me);

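A Client configured with a shared CookieJar is the closest PHP equivalent of Python's requests.Session(): cookies set by one response are sent with every later request.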
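
If the captured curl already carries a session cookie, you can seed the jar with it instead of hard-coding a Cookie header. A short sketch, assuming Guzzle 7 and that the cookie from the capture above belongs to the practice.scrapingcentral.com domain:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

// Seed a jar with the cookie from the captured curl command.
// fromArray() needs the domain the cookies belong to.
$jar = CookieJar::fromArray(
    ['session' => 'abc123'],
    'practice.scrapingcentral.com'
);

$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies'  => $jar,
]);

// Inspect what the jar holds; no request is sent here.
$cookie = $jar->getCookieByName('session');
echo $cookie->getValue(), "\n";
echo count($jar), " cookie(s) in the jar\n";
```

Unlike a fixed header, a seeded jar keeps updating: any Set-Cookie the server returns later replaces or extends what you started with.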

Four most common Guzzle translation bugs

  1. json vs body. Use 'json' => $array to send a JSON body. 'body' => json_encode($array) works but forces you to set Content-Type manually. Stick with json for JSON endpoints.

  2. form_params vs multipart. form_params produces application/x-www-form-urlencoded (cheap, key=value). multipart produces multipart/form-data (heavier, supports files). Pick based on what the server expects.

  3. Hard-coded Cookie header vs CookieJar. Setting 'Cookie' => 'session=abc' in headers works for one-shot calls, but it never picks up Set-Cookie headers from responses, so the value goes stale after a login or a redirect. Use CookieJar when there's a real session.

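  4. Forgetting that ->getContents() does not rewind the stream. After $res->getBody()->getContents(), the stream pointer is at the end, so a second getContents() call returns an empty string. Cache the string in a variable, or cast with (string) $res->getBody(), which rewinds a seekable stream before reading.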
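
Bug 4 is easy to reproduce offline. The sketch below uses Utils::streamFor() from guzzlehttp/psr7 (installed alongside Guzzle) to build an in-memory stream that stands in for $res->getBody(); the JSON payload is made up for illustration.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Psr7\Utils;

// Stand-in for $res->getBody(): an in-memory PSR-7 stream.
$body = Utils::streamFor('{"products":[]}');

$first  = $body->getContents(); // reads from the pointer to the end
$second = $body->getContents(); // pointer is already at the end

var_dump($first);  // the full JSON string
var_dump($second); // an empty string

// Casting rewinds a seekable stream before reading, so it is repeatable.
$again = (string) $body;
var_dump($again === $first); // bool(true)
```

In practice, read the body once into a variable and pass that variable around rather than the stream.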

A complete, production-shaped sample

<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Exception\RequestException;

class Catalog108 {
  private Client $client;
  private CookieJar $jar;

  public function __construct() {
    // One jar shared by every request, so login cookies persist.
    $this->jar = new CookieJar();
    $this->client = new Client([
      'base_uri' => 'https://practice.scrapingcentral.com',
      'timeout'  => 10,
      'cookies'  => $this->jar,
      'headers'  => ['Accept' => 'application/json'],
    ]);
  }

  // POSTs JSON credentials and returns the decoded response body.
  public function login(string $email, string $password): array {
    $res = $this->client->post('/api/auth/login', [
      'json' => ['email' => $email, 'password' => $password],
    ]);
    return json_decode($res->getBody()->getContents(), true);
  }

  // Fetches one page of products as a decoded array.
  public function products(int $page = 1, int $perPage = 12): array {
    $res = $this->client->get('/api/products', [
      'query' => ['page' => $page, 'per_page' => $perPage],
    ]);
    return json_decode($res->getBody()->getContents(), true);
  }
}

$api = new Catalog108();
$api->login('student@practice.scrapingcentral.com', 'practice123');
$page = $api->products(1, 50);
echo count($page['products']) . " products on page 1\n";

This is the shape a real PHP scraper takes, covered in much more detail in lesson 3.12.
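
The sample imports RequestException but never catches it. One way the error handling might look is sketched below, assuming Guzzle 7 and its default 'http_errors' => true behaviour; a MockHandler simulates a 503 so the example runs offline, and it is not how the sandbox itself responds.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// Simulate a failing server so the example needs no network access.
$mock = new MockHandler([new Response(503, [], 'Service Unavailable')]);
$client = new Client([
    'handler'  => HandlerStack::create($mock),
    'base_uri' => 'https://practice.scrapingcentral.com',
]);

$status = null;
try {
    $client->get('/api/products');
} catch (ConnectException $e) {
    // DNS failure, refused connection, timeout: there is no response at all.
    echo "network error: {$e->getMessage()}\n";
} catch (RequestException $e) {
    // With http_errors enabled, 4xx and 5xx responses are thrown and land here.
    $status = $e->hasResponse() ? $e->getResponse()->getStatusCode() : null;
    echo "request failed with status {$status}\n";
}
```

Remove the 'handler' option to run the same try/catch against the real endpoint. Catching ConnectException separately matters because it carries no response to inspect.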

Hands-on lab

Start a new project and install Guzzle: composer require guzzlehttp/guzzle. Use Copy as cURL on the /api/products request, translate it to PHP using the mapping table, run it, and confirm you get JSON back. Then wrap the request in a for ($page = 1; $page <= 5; $page++) loop to fetch the first five pages. You've just built the PHP equivalent of the Python scraper from lesson 3.6.
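
If you want something to compare your loop against, here is one possible shape. It is a sketch under assumptions: the five canned pages and their fake product ids are invented so the code runs offline through a MockHandler, and only the products key is taken from this lesson.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// Five canned pages with two fake products each (assumed response shape).
$pages = [];
for ($i = 1; $i <= 5; $i++) {
    $payload = ['products' => [['id' => "p{$i}a"], ['id' => "p{$i}b"]]];
    $pages[] = new Response(200, ['Content-Type' => 'application/json'], json_encode($payload));
}

$client = new Client([
    'handler'  => HandlerStack::create(new MockHandler($pages)),
    'base_uri' => 'https://practice.scrapingcentral.com',
]);

// Fetch pages 1 to 5 and merge their product lists.
$all = [];
for ($page = 1; $page <= 5; $page++) {
    $res  = $client->get('/api/products', ['query' => ['page' => $page]]);
    $data = json_decode((string) $res->getBody(), true);
    $all  = array_merge($all, $data['products'] ?? []);
}

echo count($all), " products across 5 pages\n";
```

Drop the 'handler' option to send the same loop to the sandbox instead of the canned responses.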

Practice this lesson on Catalog108, our first-party scraping sandbox.
