Copy as cURL → Working PHP Request (Guzzle), APIs, SERPs & Reverse Engineering

Same captured curl, translated to idiomatic PHP with Guzzle. The minimum-viable client a senior PHP scraper writes.

The PHP companion to the previous lesson. Guzzle is the de facto HTTP client for PHP. It is widely used in Symfony, Laravel, and standalone projects, and Laravel's built-in HTTP client is a wrapper around it. The translation mechanics differ from Python's requests, but the workflow is identical.

The captured curl, again

curl 'https://practice.scrapingcentral.com/api/products?page=1&category=mugs' \
  -H 'accept: application/json' \
  -H 'cookie: session=abc123' \
  -H 'referer: https://practice.scrapingcentral.com/products' \
  -H 'user-agent: Mozilla/5.0' \
  --compressed

The Guzzle translation

<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'timeout'  => 10,
]);

$res = $client->get('/api/products', [
    'query'   => ['page' => 1, 'category' => 'mugs'],
    'headers' => [
        'Accept'     => 'application/json',
        'User-Agent' => 'Mozilla/5.0',
        'Referer'    => 'https://practice.scrapingcentral.com/products',
        'Cookie'     => 'session=abc123',
    ],
]);

$data = json_decode($res->getBody()->getContents(), true);
echo count($data['products']) . " products\n";

What changed:

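base_uri set once on the client → calls use short relative paths.
Query string in the captured URL → query option (associative array). Guzzle builds and URL-encodes the string for you.
Each -H line → one entry in the headers option. Header names are case-insensitive, so curl's lowercase names work as-is.
--compressed has no one-to-one equivalent. Guzzle's decode_content option defaults to true, so compressed responses are decoded automatically. To request compression the way --compressed does, add an Accept-Encoding header such as 'gzip, deflate'.
json_decode(..., true) returns an associative array. Without the true argument you get stdClass objects instead.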
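Before pointing the translation at the live API, you can check it offline. The sketch below assumes Guzzle 7 installed via Composer. It swaps the network for Guzzle's MockHandler (the JSON payload is made up) and prints the request exactly as it reached the transport, so you can compare it line by line with the captured curl.

```php
<?php
// Sketch: verify the curl → Guzzle translation offline, without touching the real API.
// Assumes Guzzle 7 installed via Composer; the JSON payload below is made up.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// MockHandler answers from a queue instead of opening a network connection.
$mock = new MockHandler([
    new Response(200, ['Content-Type' => 'application/json'], '{"products":[{"id":1},{"id":2}]}'),
]);

$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'timeout'  => 10,
    'handler'  => HandlerStack::create($mock), // keeps the default middleware (cookies, redirects, errors)
]);

$res = $client->get('/api/products', [
    'query'   => ['page' => 1, 'category' => 'mugs'],
    'headers' => [
        'Accept'     => 'application/json',
        'User-Agent' => 'Mozilla/5.0',
        'Referer'    => 'https://practice.scrapingcentral.com/products',
        'Cookie'     => 'session=abc123',
    ],
]);

// The request as the (mock) transport received it: compare it with the captured curl.
$sent = $mock->getLastRequest();
echo $sent->getMethod() . ' ' . $sent->getUri() . "\n";
foreach (['Accept', 'User-Agent', 'Referer', 'Cookie'] as $name) {
    echo $name . ': ' . $sent->getHeaderLine($name) . "\n";
}

$data = json_decode((string) $res->getBody(), true);
echo count($data['products']) . " products\n";
```

Once the printed method, URL, and headers match the curl, remove the handler option and the same client talks to the real server.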

POST body translations

For a JSON POST:

curl 'https://.../api/auth/login' \
  -X POST \
  -H 'content-type: application/json' \
  --data-raw '{"email":"...","password":"..."}'

PHP:

$res = $client->post('/api/auth/login', [
    'json' => [
        'email'    => 'student@practice.scrapingcentral.com',
        'password' => 'practice123',
    ],
]);
// Cast the body stream to a string explicitly (also works under strict_types).
$token = json_decode((string) $res->getBody(), true)['access_token'];

For a form-encoded POST (application/x-www-form-urlencoded):

$res = $client->post('/api/login-form', [
    'form_params' => [
        'email'    => '...',
        'password' => '...',
    ],
]);

For multipart (multipart/form-data, e.g. file upload):

$res = $client->post('/api/upload', [
    'multipart' => [
        ['name' => 'file', 'contents' => fopen('/path/to/file', 'r')],
        ['name' => 'description', 'contents' => 'My file'],
    ],
]);
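To see what each body option actually puts on the wire, the sketch below sends one request of each kind to a mock transport and reads back the Content-Type header and body. It assumes Guzzle 7 via Composer; the credentials are placeholders, and lastSent() is a hypothetical helper written for this example.

```php
<?php
// Sketch: compare the json, form_params, and multipart encodings using a mock transport.
// Assumes Guzzle 7 via Composer; credentials are placeholders.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

$mock = new MockHandler([new Response(200), new Response(200), new Response(200)]);
$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'handler'  => HandlerStack::create($mock),
]);

// Hypothetical helper: [Content-Type, body] of the last request the mock received.
function lastSent(MockHandler $mock): array {
    $req = $mock->getLastRequest();
    return [$req->getHeaderLine('Content-Type'), (string) $req->getBody()];
}

$client->post('/api/auth/login', ['json' => ['email' => 'a@example.com', 'password' => 'x']]);
[$jsonType, $jsonBody] = lastSent($mock);

$client->post('/api/login-form', ['form_params' => ['email' => 'a@example.com', 'password' => 'x']]);
[$formType, $formBody] = lastSent($mock);

$client->post('/api/upload', ['multipart' => [['name' => 'description', 'contents' => 'My file']]]);
$multiType = lastSent($mock)[0]; // includes a generated boundary

printf("json:        %s  %s\n", $jsonType, $jsonBody);
printf("form_params: %s  %s\n", $formType, $formBody);
printf("multipart:   %s\n", $multiType);
```

Note that form_params URL-encodes values (the @ becomes %40), while json sends them verbatim inside a JSON document.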

Mapping table, curl flags to Guzzle options

curl	Guzzle option
`-H 'name: value'`	`'headers' => ['Name' => 'value']`
`-b` / `-H 'cookie: ...'`	`'cookies' => $jar` (see below) or set Cookie header
`--data-raw 'a=b'` (form)	`'form_params' => ['a' => 'b']`
`--data-raw '{"a":1}'` (JSON)	`'json' => ['a' => 1]`
`-X POST` / `-X PUT`	`$client->post(...)` / `$client->put(...)`
`-u user:pass`	`'auth' => ['user', 'pass']`
`-L` (follow redirects)	`'allow_redirects' => true` (default)
`--max-time 10`	`'timeout' => 10`
`-k` (insecure TLS)	`'verify' => false` (don't ship this)
`--proxy http://...`	`'proxy' => 'http://...'`
`?a=1&b=2` (query string)	`'query' => ['a' => 1, 'b' => 2]`
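The -H row is the one you apply most often, and it is mechanical enough to script. Below is a hypothetical helper, curlHeaders(), that pulls the single-quoted -H 'name: value' pairs out of a pasted curl string and returns them in the shape the headers option expects. It assumes the bash-style quoting that Chrome's Copy as cURL produces; it does not handle double quotes, $'...' escapes, or -b cookies.

```php
<?php
// Hypothetical helper: extract -H 'name: value' pairs from a "Copy as cURL" string
// and return them as a Guzzle 'headers' array.
// Assumption: headers are single-quoted; double quotes and $'...' escapes are not handled.
function curlHeaders(string $curl): array {
    preg_match_all("/-H\s+'([^':]+):\s*([^']*)'/", $curl, $m, PREG_SET_ORDER);
    $headers = [];
    foreach ($m as [, $name, $value]) {
        // Guzzle treats header names case-insensitively; ucwords only makes them readable.
        $headers[ucwords(strtolower(trim($name)), '-')] = $value;
    }
    return $headers;
}

$curl = <<<'CURL'
curl 'https://practice.scrapingcentral.com/api/products?page=1&category=mugs' \
  -H 'accept: application/json' \
  -H 'cookie: session=abc123' \
  -H 'referer: https://practice.scrapingcentral.com/products' \
  -H 'user-agent: Mozilla/5.0' \
  --compressed
CURL;

print_r(curlHeaders($curl));
```

The value part only stops at the closing quote, so colons inside values such as the Referer URL survive intact.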

CookieJar for session-aware scraping

When the scraper needs to log in, then make subsequent requests with the issued cookies, use Guzzle's CookieJar:

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

$jar = new CookieJar();

$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies'  => $jar,
]);

// Login, sets cookies on $jar
$client->post('/api/auth/login', [
    'json' => [
        'email'    => 'student@practice.scrapingcentral.com',
        'password' => 'practice123',
    ],
]);

// Subsequent calls automatically include the cookies
$res = $client->get('/api/auth/me');
$me = json_decode((string) $res->getBody(), true);
print_r($me);

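A Client configured with a shared CookieJar is the closest PHP equivalent of Python's requests.Session(): the client carries default options (base URI, headers, timeout), and the jar stores cookies from every response and replays them on later requests. Passing 'cookies' => true instead of a jar object also gives the client its own shared jar.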
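You can confirm the jar round-trip without a live server. In this sketch (assuming Guzzle 7 via Composer), the mock login response carries a made-up Set-Cookie header; the jar should capture it, and the next request should send it back without any manual Cookie header.

```php
<?php
// Sketch: prove the CookieJar round-trip offline.
// Assumes Guzzle 7 via Composer; the cookie value and responses are made up.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

$mock = new MockHandler([
    new Response(200, ['Set-Cookie' => 'session=xyz789; Path=/; HttpOnly'], '{"ok":true}'),
    new Response(200, ['Content-Type' => 'application/json'], '{"email":"student@practice.scrapingcentral.com"}'),
]);

$jar = new CookieJar();
$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies'  => $jar,
    'handler'  => HandlerStack::create($mock), // create() includes the cookie middleware
]);

$client->post('/api/auth/login', [
    'json' => ['email' => 'student@practice.scrapingcentral.com', 'password' => 'practice123'],
]);
$stored = $jar->getCookieByName('session'); // captured from the Set-Cookie header

$client->get('/api/auth/me');
$sentCookie = $mock->getLastRequest()->getHeaderLine('Cookie'); // replayed by the jar

echo 'jar holds: ' . $stored->getValue() . "\n";
echo 'sent:      ' . $sentCookie . "\n";
```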

Four most common Guzzle translation bugs

json vs body. Use 'json' => $array to send a JSON body. 'body' => json_encode($array) works but forces you to set Content-Type manually. Stick with json for JSON endpoints.
form_params vs multipart. form_params produces application/x-www-form-urlencoded (cheap, key=value). multipart produces multipart/form-data (heavier, supports files). Pick based on what the server expects.
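Hard-coded Cookie header vs CookieJar. Setting 'Cookie' => 'session=abc' in headers works for one-shot calls, but the value is static: it never picks up the Set-Cookie headers the server sends back, for example after a login redirect. Use CookieJar when there's a real session.
Assuming ->getContents() rewinds the stream. It doesn't: getContents() reads from the current position to the end. After $res->getBody()->getContents(), the stream pointer is at the end, so a second getContents() returns an empty string. Read the body once and cache the string in a variable, or use (string) $res->getBody(), which seeks to the start before reading.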
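The stream-pointer bug is easy to reproduce with a bare PSR-7 response, no HTTP call needed. This sketch assumes guzzlehttp/psr7, which Composer installs alongside Guzzle 7; the body is a made-up payload.

```php
<?php
// Sketch: why a second getContents() comes back empty, and two ways around it.
// Assumes guzzlehttp/psr7 (installed with Guzzle 7); the body is a made-up payload.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Psr7\Response;

$res = new Response(200, [], '{"products":[1,2,3]}');

$first  = $res->getBody()->getContents(); // reads to the end; pointer now at EOF
$second = $res->getBody()->getContents(); // empty: nothing is left after the pointer

$cast = (string) $res->getBody();         // __toString() seeks to 0 before reading

$res->getBody()->rewind();                // or rewind explicitly...
$again = $res->getBody()->getContents();  // ...and read the full body again

// The robust habit: read once, keep the string, decode that.
$json = (string) $res->getBody();
$data = json_decode($json, true);
var_dump($second);
```

getBody() returns the same stream object on every call, which is why the pointer position carries over between reads.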

A complete, production-shaped sample

<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Exception\RequestException;

class Catalog108 {
    private Client $client;
    private CookieJar $jar;

    public function __construct() {
        $this->jar = new CookieJar();
        $this->client = new Client([
            'base_uri' => 'https://practice.scrapingcentral.com',
            'timeout'  => 10,
            'cookies'  => $this->jar,                       // session cookies persist across calls
            'headers'  => ['Accept' => 'application/json'], // default header for every request
        ]);
    }

    public function login(string $email, string $password): array {
        $res = $this->client->post('/api/auth/login', [
            'json' => ['email' => $email, 'password' => $password],
        ]);
        // JSON_THROW_ON_ERROR: fail loudly instead of returning null on a non-JSON body.
        return json_decode((string) $res->getBody(), true, 512, JSON_THROW_ON_ERROR);
    }

    public function products(int $page = 1, int $perPage = 12): array {
        $res = $this->client->get('/api/products', [
            'query' => ['page' => $page, 'per_page' => $perPage],
        ]);
        return json_decode((string) $res->getBody(), true, 512, JSON_THROW_ON_ERROR);
    }
}

try {
    $api = new Catalog108();
    $api->login('student@practice.scrapingcentral.com', 'practice123');
    $page = $api->products(1, 50);
    echo count($page['products']) . " products on page 1\n";
} catch (RequestException $e) {
    // 4xx/5xx responses surface as RequestException subclasses (http_errors is on by default).
    fwrite(STDERR, 'Request failed: ' . $e->getMessage() . "\n");
    exit(1);
}

This is the shape a real PHP scraper takes, covered in much more detail in lesson 3.12.

Hands-on lab

Create a new project directory and run composer require guzzlehttp/guzzle (this creates composer.json and vendor/autoload.php). In DevTools, use Copy as cURL on the /api/products request, translate it to PHP using the mapping table, run it, and confirm you get JSON. Then wrap the call in a for ($page = 1; $page <= 5; $page++) loop. You've just built the PHP equivalent of the Python scraper from lesson 3.6.
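If you get stuck on the loop, here is one possible shape. It is a sketch, not the course's reference solution: scrapePages() is a hypothetical function, it assumes /api/products returns {"products": [...]} and stops early when a page comes back empty, and it is exercised here against a MockHandler so it runs offline. Pass a real Client (without the handler option) to scrape the live site.

```php
<?php
// Sketch of the lab's pagination loop with an early stop on an empty page.
// Assumes Guzzle 7 via Composer and a {"products": [...]} response shape;
// scrapePages() is a hypothetical name, and the mock stands in for the server.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

function scrapePages(Client $client, int $maxPages): array {
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $res = $client->get('/api/products', ['query' => ['page' => $page]]);
        $products = json_decode((string) $res->getBody(), true)['products'] ?? [];
        if ($products === []) {
            break; // ran past the last page
        }
        array_push($all, ...$products);
    }
    return $all;
}

// Three canned pages: two with data, then an empty one.
$mock = new MockHandler([
    new Response(200, [], '{"products":[{"id":1},{"id":2}]}'),
    new Response(200, [], '{"products":[{"id":3}]}'),
    new Response(200, [], '{"products":[]}'),
]);
$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'handler'  => HandlerStack::create($mock),
]);

$products = scrapePages($client, 5);
echo count($products) . " products collected\n";
```

The early break matters on real sites: without it, a five-page loop against a three-page catalog wastes requests on empty pages.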

Quiz, check your understanding

Which Guzzle option correctly sends a JSON body for an endpoint with `Content-Type: application/json`?