Raw cURL in PHP: Foundations Every PHP Dev Must Know
The libcurl bindings ship with every PHP install. Master them, and every HTTP library you use later makes more sense.
What you’ll learn
- Initialise a cURL handle and set options with `curl_setopt`.
- Execute requests and read response data and headers.
- Handle errors, timeouts, and HTTP status codes.
- Understand why higher-level clients (Guzzle, Symfony) exist.
PHP ships with bindings to libcurl, the same library the curl command-line tool wraps. It's verbose, but it's everywhere: every PHP install has it, every host enables it, and every higher-level client builds on it. You should know how to use it raw at least once.
The four-step pattern
Every cURL call in PHP follows the same shape:
<?php
// 1. Init
$ch = curl_init('https://practice.scrapingcentral.com/products');
// 2. Configure
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; learning-scraper)');
// 3. Execute
$body = curl_exec($ch);
// 4. Close
curl_close($ch);
echo substr($body, 0, 500);
That's it. The verbosity comes from the dozens of options you can set; the core pattern is always init → setopt → exec → close.
The options that matter
| Option | Purpose |
|---|---|
| `CURLOPT_URL` | Target URL (also settable via `curl_init()`) |
| `CURLOPT_RETURNTRANSFER` | Return the body from `curl_exec` instead of printing it |
| `CURLOPT_TIMEOUT` | Total time limit in seconds |
| `CURLOPT_CONNECTTIMEOUT` | Time limit for establishing the connection |
| `CURLOPT_USERAGENT` | The User-Agent header |
| `CURLOPT_FOLLOWLOCATION` | Follow 3xx redirects automatically |
| `CURLOPT_MAXREDIRS` | Cap on redirect chain length |
| `CURLOPT_HTTPHEADER` | Array of custom request headers |
| `CURLOPT_COOKIEJAR` | File to write Set-Cookie data to |
| `CURLOPT_COOKIEFILE` | File to read cookies from |
| `CURLOPT_POST` | Switch to the POST method |
| `CURLOPT_POSTFIELDS` | Body for POST (array → multipart/form-data, string → sent as-is) |
| `CURLOPT_HEADER` | Include response headers in the returned body |
| `CURLOPT_NOBODY` | HEAD request, headers only |
| `CURLOPT_SSL_VERIFYPEER` | Verify the server's TLS cert (default true; leave it on) |
| `CURLOPT_PROXY` | Route through a proxy |
CURLOPT_RETURNTRANSFER is the one you forget once and then never forget again. Without it, curl_exec prints the response directly to output and returns just true/false. With it (which is what you want 99% of the time), the body is returned as a string.
Inspect what you got back
$body = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_error($ch);
$errno = curl_errno($ch);
print_r($info);
// Array
// (
// [url] => https://practice.scrapingcentral.com/products
// [http_code] => 200
// [total_time] => 0.234
// [primary_ip] => 185.93.228.150
// [size_download] => 14823
// [content_type] => text/html; charset=UTF-8
// ...
// )
curl_getinfo($ch) returns an associative array with everything you need: status code, final URL, timing, content type, byte counts, IP info. Use it before curl_close.
curl_errno($ch) returns 0 on success. Non-zero with a non-empty curl_error($ch) means the request failed at the network or protocol layer (DNS, connection refused, timeout, TLS), not a 4xx/5xx (which IS a successful HTTP exchange that returned an error status).
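That distinction is worth encoding explicitly in your own code. A minimal sketch (the function name and return labels are mine, not part of the cURL API):

```php
<?php
// Classify the outcome of a cURL exchange.
// $errno comes from curl_errno(), $httpCode from curl_getinfo()['http_code'].
function classify_result(int $errno, int $httpCode): string {
    if ($errno !== 0) {
        return 'transport-error';   // DNS failure, connection refused, timeout, TLS...
    }
    if ($httpCode >= 400) {
        return 'http-error';        // the exchange succeeded; the server said no
    }
    return 'ok';
}

echo classify_result(28, 0), "\n";    // 28 = CURLE_OPERATION_TIMEDOUT → transport-error
echo classify_result(0, 404), "\n";   // → http-error
echo classify_result(0, 200), "\n";   // → ok
```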
Headers in, headers out
Send custom headers:
curl_setopt($ch, CURLOPT_HTTPHEADER, [
'Accept: application/json',
'X-Requested-With: XMLHttpRequest',
'Authorization: Bearer ' . $token,
]);
Note the format: an array of "Name: value" strings, not an associative array. This is a quirk of the cURL bindings.
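If you prefer keeping headers as an associative array in your own code, converting at the boundary takes a few lines; a sketch (the helper name is mine):

```php
<?php
// Convert ['Accept' => 'application/json'] into ['Accept: application/json'],
// the flat "Name: value" format CURLOPT_HTTPHEADER expects.
function to_curl_headers(array $assoc): array {
    $lines = [];
    foreach ($assoc as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

print_r(to_curl_headers(['Accept' => 'application/json', 'X-Token' => 'abc']));
```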
Read response headers, two options:
- Set `CURLOPT_HEADER => true` and `curl_exec` returns headers + body concatenated. You then have to split them yourself (the `header_size` field from `curl_getinfo` tells you where the split point is).
- Set `CURLOPT_HEADERFUNCTION` to a callback that's invoked once per header line. Cleaner for programmatic access:
$headers = [];
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$headers) {
$headers[] = $line;
return strlen($line);
});
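The collected lines are raw CRLF-terminated strings, including the status line and a blank terminator line. Turning them into a lookup map takes a few more lines; a sketch (the helper name is mine; header names are lowercased because HTTP header names are case-insensitive, and repeated headers like Set-Cookie overwrite earlier values in this simple version):

```php
<?php
function parse_header_lines(array $lines): array {
    $headers = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip the status line ("HTTP/1.1 200 OK") and the blank terminator:
        // neither contains a colon-separated name/value pair.
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

$parsed = parse_header_lines([
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/html; charset=UTF-8\r\n",
    "\r\n",
]);
echo $parsed['content-type'];  // text/html; charset=UTF-8
```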
POST with form data
$ch = curl_init('https://practice.scrapingcentral.com/account/login');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
'username' => 'student@practice.scrapingcentral.com',
'password' => 'practice123',
]));
$body = curl_exec($ch);
http_build_query() URL-encodes the array into username=...&password=... form. Pass it as a string to CURLOPT_POSTFIELDS and cURL sends Content-Type: application/x-www-form-urlencoded automatically.
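You can see the encoding without any network call; `http_build_query` is pure string manipulation:

```php
<?php
// Reserved characters are percent-encoded ('@' → %40); with the default
// RFC 1738 encoding, spaces become '+'.
echo http_build_query([
    'username' => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
]);
// username=student%40practice.scrapingcentral.com&password=practice123
```

Note that passing the PHP array itself to CURLOPT_POSTFIELDS makes cURL send multipart/form-data instead, which many form endpoints don't expect; building the string yourself keeps the request as plain application/x-www-form-urlencoded.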
For JSON, set the header AND pass a JSON string:
$payload = json_encode(['name' => 'New product', 'price' => 9.99]);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
Cookies, the file-based approach
Persist cookies across multiple cURL calls:
$cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
// First request, log in, write cookies to file
$ch = curl_init('https://practice.scrapingcentral.com/account/login');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['username' => '...', 'password' => '...']));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
// Second request, read cookies from file
$ch = curl_init('https://practice.scrapingcentral.com/dashboard');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$dashboard = curl_exec($ch);
Inelegant compared to Python's Session, but it works. You can also use one cURL handle for the entire session: set both COOKIEJAR and COOKIEFILE to the same file (or set CURLOPT_COOKIEFILE to the empty string to enable an in-memory cookie engine), and reuse the handle across calls with curl_setopt($ch, CURLOPT_URL, $newUrl).
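The jar file cURL writes uses the Netscape cookie format: one cookie per tab-separated line (domain, subdomain flag, path, secure flag, expiry, name, value), with `#` comment lines. A rough inspection-only parser, assuming well-formed lines (the helper name is mine):

```php
<?php
function parse_cookie_jar(string $contents): array {
    $cookies = [];
    foreach (explode("\n", $contents) as $line) {
        $line = trim($line);
        // Skip blanks and pure comment lines. Lines prefixed "#HttpOnly_"
        // are real cookies and still contain tabs, so they pass through.
        if ($line === '' || ($line[0] === '#' && strpos($line, "\t") === false)) {
            continue;
        }
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];  // name => value
        }
    }
    return $cookies;
}

$jar = ".practice.scrapingcentral.com\tTRUE\t/\tFALSE\t0\tsession\tabc123";
print_r(parse_cookie_jar($jar));  // [session] => abc123
```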
When to leave raw cURL behind
Raw cURL is fine for:
- Quick scripts where adding a Composer dep feels like overkill.
- Hosts where you genuinely can't install Composer packages.
- Debugging, understanding what higher-level libraries actually do.
For anything bigger, switch to Guzzle (Lesson 1.9) or Symfony HttpClient (Lesson 1.10). They wrap cURL but add structure, retries, middleware, async, and dramatically cleaner ergonomics.
A reusable wrapper
If you must stay on raw cURL, at least extract a helper:
function http_get(string $url, array $headers = [], int $timeout = 10): array {
$ch = curl_init($url);
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_USERAGENT => 'Mozilla/5.0 (compatible; my-scraper)',
CURLOPT_HTTPHEADER => $headers,
]);
$body = curl_exec($ch);
$info = curl_getinfo($ch);
$err = curl_error($ch);
curl_close($ch);
if ($body === false) {
throw new RuntimeException("cURL error: $err");
}
return ['body' => $body, 'status' => $info['http_code'], 'url' => $info['url']];
}
That gives you a Python-requests-shaped API in 15 lines.
Hands-on lab
Fetch /products with raw cURL. Print the HTTP status code from curl_getinfo, the content type, and the first 500 bytes of the body. Then add a CURLOPT_USERAGENT and confirm the response changes if you switch between a realistic browser UA and a deliberate bot-like one like 'curl/7.0'.
Practice this lesson on Catalog108, our first-party scraping sandbox; the lab target is /products.

Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you'll see the explanation right after.