F17 · beginner · 6 min read

PHP Crash-Course for Scrapers

The PHP you'll actually use to write scrapers. Arrays, strings, file I/O, error handling, JSON, and the modern PHP 8 features that make it pleasant.

What you’ll learn

  • Manipulate strings and arrays in modern PHP 8.
  • Use associative arrays as the PHP equivalent of Python dicts.
  • Read and write files, JSON, and CSV without surprises.
  • Handle errors with try/catch and named exception types, not the @ silencer.

PHP 8.x is genuinely good. If your last PHP was 5.x or 7.x, much has changed: strict types, named arguments, enums, readonly properties, match expressions, the null-safe operator. This crash course assumes none of that history; it's just the PHP you'll write in scrapers, in 2026 idiom.

Strings

$url = "https://practice.scrapingcentral.com/products?page=2";

str_starts_with($url, "https://");  // true (PHP 8.0+)
str_ends_with($url, ".pdf");  // false
str_contains($url, "products");  // true
strtolower($url);  // lowercase
trim($url);  // strip whitespace

explode("?", $url);  // ["https://practice.scrapingcentral.com/products", "page=2"]
str_replace("page=2", "page=3", $url);  // rewrite
sprintf("page=%d&limit=%d", 2, 20);  // formatted string, "page=2&limit=20"

PHP 8.0 added str_starts_with, str_ends_with, and str_contains; before that you used strpos(...) === 0 for a prefix check and strpos(...) !== false for a substring check. Use the new ones; they're clearer.
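As a rough sketch of how these string functions combine in a scraper, the snippet below bumps the page number of a listing URL. The helper name nextPageUrl is hypothetical, and it assumes the site paginates with a page query parameter, as the practice URL above does.

```php
<?php
// Hypothetical helper (not part of the lesson): return the URL of the next listing page.
function nextPageUrl(string $url): string {
    // No query string yet: assume we are on page 1 and ask for page 2.
    if (!str_contains($url, "?")) {
        return $url . "?page=2";
    }
    // A limit of 2 keeps any later "?" inside the query part.
    [$base, $query] = explode("?", $url, 2);
    parse_str($query, $params);                       // "page=2" becomes ["page" => "2"]
    $params["page"] = (int) ($params["page"] ?? 1) + 1;
    return $base . "?" . http_build_query($params);
}

echo nextPageUrl("https://practice.scrapingcentral.com/products?page=2"), "\n";
```

Parsing the query with parse_str instead of calling str_replace("page=2", ...) keeps the helper working when other parameters such as limit are present.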

String interpolation

$page = 2;
$url = "https://practice.scrapingcentral.com/products?page=$page";

// More explicit with curly braces (preferred when expressions get complex):
$url = "https://practice.scrapingcentral.com/products?page={$page}";

// printf-style for formatting:
$formatted = sprintf('$%.2f', 14.99);  // '$14.99'

Double-quoted strings interpolate; single-quoted don't. For literal-text strings (no variables), single quotes signal intent; any speed difference is negligible.
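A small illustration of the quoting rule, with made-up values. The variable names are only for this example.

```php
<?php
$page = 3;
$product = ["title" => "Yellow ceramic mug"];

$double = "page=$page";                  // interpolated: page=3
$single = 'page=$page';                  // literal text, the variable is not expanded
$braced = "Title: {$product['title']}";  // braces allow array access inside the string

echo $double, "\n", $single, "\n", $braced, "\n";
```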

Arrays

PHP arrays are unusual: they're both list and dict in one structure. An array with integer keys behaves like a list; with string keys, like a dict.

List-style

$prices = [14.99, 24.95, 9.50, 49.00];
$prices[0];  // 14.99
$prices[count($prices) - 1];  // 49.00, no negative indexing
end($prices);  // 49.00, alternative
$prices[] = 7.00;  // append
sort($prices);  // mutates in place
asort($prices);  // sort, preserve keys
array_sum($prices) / count($prices);  // average

Dict-style (associative)

$product = [
  "id"  => 42,
  "title" => "Yellow ceramic mug",
  "price" => 14.99,
  "tags"  => ["kitchen", "ceramic"],
];

$product["title"];  // direct access, Warning if missing
$product["title"] ?? "n/a";  // null coalesce, "n/a" if missing
isset($product["title"]);  // true
array_key_exists("title", $product);  // true (and: works for null values too)
$product["stock"] = 15;  // assign
array_keys($product);  // ["id", "title", "price", "tags", "stock"]

The ?? null-coalescing operator is the PHP equivalent of Python's dict.get(k, default). It returns the right side when the left side is null or undefined.
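The sketch below shows ?? on the cases scraped data tends to produce: a missing key, a key that is present but null, and a missing nested key. The sample record is invented for illustration.

```php
<?php
// Invented sample record; "discount" is present but null, "stock" and "meta" are absent.
$product = ["id" => 42, "title" => "Yellow ceramic mug", "discount" => null];

$title    = $product["title"] ?? "n/a";              // key exists, value used as-is
$stock    = $product["stock"] ?? 0;                  // key missing, falls back to 0
$discount = $product["discount"] ?? 0.0;             // key present but null, falls back too
$brand    = $product["meta"]["brand"] ?? "unknown";  // nested lookup, no warning raised

// isset() treats a null value as "not set"; array_key_exists() only checks the key.
$hasDiscountValue = isset($product["discount"]);
$hasDiscountKey   = array_key_exists("discount", $product);

var_dump($title, $stock, $discount, $brand, $hasDiscountValue, $hasDiscountKey);
```

If you need to tell "field absent" apart from "field present but empty", use array_key_exists rather than ?? or isset.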

Iterating

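foreach ($product as $key => $value) {
  // "tags" holds a nested array, so encode each value instead of interpolating it
  echo $key . " = " . json_encode($value) . "\n";
}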

// Just values
foreach ($product as $value) {
  var_dump($value);
}

Array transformations

$prices = [14.99, 24.95, 9.50, 49.00];

array_map(fn($p) => $p * 0.9, $prices);  // 10% off all
array_filter($prices, fn($p) => $p < 20);  // [0 => 14.99, 2 => 9.50], original keys are preserved
array_reduce($prices, fn($c, $p) => $c + $p, 0);  // sum

PHP 7.4+ has short arrow functions (fn($x) => ...), single-expression closures, with automatic capture from the enclosing scope. Use them; they're terser than function ($x) use ($outer) { return ...; }.
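Here is a minimal sketch of the three functions chained over scraped records. The product data and the $maxPrice threshold are invented; note how the arrow function reads $maxPrice from the enclosing scope without a use clause.

```php
<?php
// Invented sample data standing in for scraped records.
$products = [
    ["title" => "Mug",    "price" => 14.99],
    ["title" => "Teapot", "price" => 24.95],
    ["title" => "Spoon",  "price" => 9.50],
];
$maxPrice = 20.0;

// Keep the cheap items; array_filter preserves the original keys (0 and 2 here).
$cheap = array_filter($products, fn($p) => $p["price"] < $maxPrice);

// Extract the titles, then reindex from 0 with array_values.
$titles = array_values(array_map(fn($p) => $p["title"], $cheap));

// Sum the prices of the kept items.
$total = array_reduce($cheap, fn($carry, $p) => $carry + $p["price"], 0.0);

print_r($titles);
printf("total: %.2f\n", $total);
```

Calling array_values after filtering matters when you later json_encode the result: an array with gaps in its keys is encoded as a JSON object rather than a JSON list.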

Functions

function fetchPage(string $url, int $timeout = 10): string {
  $ch = curl_init($url);
  curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => $timeout,
    CURLOPT_FOLLOWLOCATION => true,
  ]);
  $body = curl_exec($ch);
  if ($body === false) {
    throw new RuntimeException(curl_error($ch));
  }
  curl_close($ch);
  return $body;
}

// Named arguments (PHP 8.0+)
$body = fetchPage(url: "https://example.com", timeout: 30);

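Type declarations (string $url, : string) are optional in PHP but recommended. They turn type mismatches into a TypeError at call time instead of silent bugs. By default PHP still coerces compatible scalars (the string "10" is accepted for an int parameter); put declare(strict_types=1); at the top of a file to turn that coercion off for calls made from it.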
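To make the call-time check concrete, here is a small sketch with a hypothetical pageUrl helper. It runs in PHP's default (coercive) mode, where a non-numeric string passed to an int parameter still raises a TypeError.

```php
<?php
// Hypothetical helper used only to demonstrate type declarations.
function pageUrl(string $base, int $page = 1): string {
    return sprintf("%s?page=%d", $base, $page);
}

// Named argument, correct type.
echo pageUrl("https://example.com/products", page: 3), "\n";

// Wrong type: "two" cannot be converted to int, so PHP throws before the body runs.
$caught = false;
try {
    pageUrl("https://example.com/products", "two");
} catch (TypeError $e) {
    $caught = true;
    echo "TypeError: ", $e->getMessage(), "\n";
}
```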

Modern PHP 8 features you'll use

// match expression
$status = match (true) {
  $code >= 500 => 'server-error',
  $code >= 400 => 'client-error',
  $code >= 300 => 'redirect',
  $code >= 200 => 'success',
  default  => 'unknown',
};

// Null-safe operator
$avg = $response?->reviews?->average ?? 0;

// Constructor property promotion (PHP 8.0; the readonly modifier needs 8.1)
class HttpClient {
  public function __construct(
    private readonly int $timeout = 10,
    private readonly string $userAgent = 'my-scraper/1.0',
  ) {}
}

// Enums (PHP 8.1)
enum HttpStatus: int {
  case OK = 200;
  case NotFound = 404;
  case ServerError = 500;
}

match replaces unwieldy if/elseif chains. The null-safe ?-> is for traversing nullable objects. Constructor promotion turns 15-line classes into 5-line ones.
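One way these features might fit together in a scraper: an enum names the possible decisions and a match (true) block maps a status code to one of them. The Outcome enum, the decide function, and the retry policy are assumptions made for this sketch, not rules from the lesson. It needs PHP 8.1 or later.

```php
<?php
// Hypothetical set of decisions a scraper could take after a response.
enum Outcome: string {
    case Parse = 'parse';
    case Retry = 'retry';
    case Skip  = 'skip';
}

// Assumed policy: retry on 429 and 5xx, parse 2xx, skip everything else.
function decide(int $code): Outcome {
    return match (true) {
        $code === 429, $code >= 500   => Outcome::Retry,
        $code >= 400                  => Outcome::Skip,
        $code >= 200 && $code < 300   => Outcome::Parse,
        default                       => Outcome::Skip,
    };
}

foreach ([200, 404, 429, 503] as $code) {
    echo $code, " -> ", decide($code)->value, "\n";
}
```

Returning an enum case instead of a bare string means a typo such as 'retyr' becomes an error rather than a value that silently matches nothing.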

File I/O

// Write
file_put_contents("output.txt", "Hello\n");
file_put_contents("output.txt", "World\n", FILE_APPEND);

// Read all at once (small files)
$contents = file_get_contents("input.txt");

// Read line by line (big files)
$f = fopen("input.txt", "r");
while (($line = fgets($f)) !== false) {
  $line = trim($line);
  process($line);
}
fclose($f);

For text files, file_put_contents and file_get_contents are the one-liners you'll use 80% of the time.
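A sketch of the line-by-line pattern applied to a typical scraper input, a file of seed URLs. The function name readUrlList and the convention that lines starting with # are comments are assumptions for this example.

```php
<?php
// Hypothetical helper: read a seed list, skipping blank lines and "#" comments.
function readUrlList(string $path): array {
    $f = fopen($path, "r");
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $urls = [];
    try {
        while (($line = fgets($f)) !== false) {
            $line = trim($line);
            if ($line === "" || str_starts_with($line, "#")) {
                continue;
            }
            $urls[] = $line;
        }
    } finally {
        fclose($f);  // runs even if the loop body throws
    }
    return $urls;
}

// Demo against a temporary file so the example has no external dependencies.
$path = tempnam(sys_get_temp_dir(), "urls");
file_put_contents($path, "# seed list\nhttps://example.com/a\n\n");
file_put_contents($path, "https://example.com/b\n", FILE_APPEND);
$urls = readUrlList($path);
unlink($path);
print_r($urls);
```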

JSON

$data = json_decode($responseBody, associative: true);  // PHP 8.0 named arg
// or: $data = json_decode($responseBody, true);

// JSON encode, pretty-printed, preserves UTF-8
$json = json_encode($products, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
file_put_contents("products.json", $json);

Three flags worth knowing on json_encode:

  • JSON_UNESCAPED_UNICODE, don't escape non-ASCII to \uXXXX
  • JSON_UNESCAPED_SLASHES, don't escape / (cleaner URLs in output)
  • JSON_PRETTY_PRINT, indented output
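To see what the flags change, the sketch below encodes the same record with and without them. The record is invented, and the \u{e9} escape is used only so the example does not depend on the source file's encoding.

```php
<?php
// Invented record containing a non-ASCII character and a URL.
$product = ["title" => "Caf\u{e9} mug", "url" => "https://example.com/p/1"];

// Default output escapes both the accented character and the slashes.
$default = json_encode($product);

// With the flags, the text is written as-is.
$clean = json_encode($product, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $default, "\n", $clean, "\n";

// Both forms decode back to the same array.
$roundTrip = json_decode($clean, associative: true);
```

Either form is valid JSON; the flags only affect how readable the file is when you open it.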

CSV

$f = fopen("products.csv", "w");
fputcsv($f, ["id", "title", "price"]);  // header
foreach ($products as $p) {
  fputcsv($f, [$p["id"], $p["title"], $p["price"]]);
}
fclose($f);

// Read
$f = fopen("products.csv", "r");
$headers = fgetcsv($f);
while (($row = fgetcsv($f)) !== false) {
  $product = array_combine($headers, $row);  // turn list into dict
  // ...
}
fclose($f);

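fputcsv and fgetcsv handle quoting correctly; never construct CSV with string concatenation. On PHP 8.4 and later, pass the escape argument explicitly (for example escape: ""), because relying on its default value is deprecated.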
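A round-trip sketch showing why the built-in functions matter: the title below contains both a comma and double quotes, which naive concatenation would corrupt. The data is invented, php://memory stands in for a real file, and escape: "" is passed explicitly to disable PHP's non-standard escape character (relying on its default is deprecated as of PHP 8.4).

```php
<?php
// Invented row with a comma and embedded quotes in the title.
$rows = [["id" => 1, "title" => 'Mug, "large"', "price" => 14.99]];

$f = fopen("php://memory", "w+");
fputcsv($f, ["id", "title", "price"], escape: "");
foreach ($rows as $r) {
    fputcsv($f, [$r["id"], $r["title"], $r["price"]], escape: "");
}

// Read everything back from the start of the stream.
rewind($f);
$headers = fgetcsv($f, escape: "");
$back = [];
while (($row = fgetcsv($f, escape: "")) !== false) {
    $back[] = array_combine($headers, $row);
}
fclose($f);

print_r($back);
```

Values come back as strings, so cast id and price yourself if you need numbers.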

Error handling

PHP has two error mechanisms: old warnings/notices (loose) and exceptions (clean). Modern code uses exceptions:

try {
  $body = fetchPage($url);
  $data = json_decode($body, true, flags: JSON_THROW_ON_ERROR);
} catch (RuntimeException $e) {
  // network / curl failure
  error_log("Fetch failed for $url: " . $e->getMessage());
} catch (JsonException $e) {
  // malformed JSON
  error_log("Bad JSON from $url");
}

Two practices to adopt:

  1. Pass JSON_THROW_ON_ERROR so json_decode raises on bad JSON instead of returning null and leaving you to figure out why.
  2. Never use the @ silencer. Code like @file_get_contents(...) swallows the error entirely, making bugs invisible.
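The first practice can be sketched without any network access. The wrapper name parseJsonBody is hypothetical; it returns null on malformed input, but only after logging the reason.

```php
<?php
// Hypothetical wrapper: decode a response body, logging the reason when it is malformed.
function parseJsonBody(string $body): ?array {
    try {
        $data = json_decode($body, true, flags: JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        error_log("Bad JSON: " . $e->getMessage());
        return null;
    }
    // Valid JSON can still be a bare scalar such as 42; treat that as unusable here.
    return is_array($data) ? $data : null;
}

// Without the flag, a truncated document just yields null with no explanation.
var_dump(json_decode('{"id": 42'));

var_dump(parseJsonBody('{"id": 42}'));
var_dump(parseJsonBody('{"id": 42'));
```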

Useful built-in functions

// URL handling
parse_url("https://example.com/products?page=2");
http_build_query(["page" => 2, "limit" => 20]);  // "page=2&limit=20"
urlencode("hello world");  // "hello+world"

// Time
time();  // unix timestamp
date("Y-m-d", time());  // today's date, e.g. "2026-05-12"
strtotime("2026-04-12");  // parse to timestamp
$dt = new DateTimeImmutable("now");  // modern OO date

// Regex
preg_match('/\d{4}-\d{2}-\d{2}/', $text, $m);
preg_match_all('/\$(\d+\.\d{2})/', $text, $matches);
preg_replace('/\s+/', ' ', $text);

// Sort
sort($items);  // values, reindex
asort($items);  // values, preserve keys
ksort($items);  // by keys
usort($items, fn($a, $b) => $a["price"] <=> $b["price"]);  // custom comparator

The <=> "spaceship" operator (PHP 7.0+) returns -1/0/+1, ideal for custom sort comparators.
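A short sketch combining several of these built-ins on invented input: pull the query parameters out of a URL, extract prices from a text fragment with a regex, and sort them in descending order with the spaceship operator.

```php
<?php
// Query string to array: parse_url isolates the query, parse_str splits it.
$url = "https://example.com/products?page=2&limit=20";
parse_str(parse_url($url, PHP_URL_QUERY), $query);   // values stay strings

// Invented text fragment standing in for scraped content.
$text = 'Mug $14.99, Teapot $24.95, Spoon $9.50';
preg_match_all('/\$(\d+\.\d{2})/', $text, $matches); // $matches[1] holds the captured numbers
$prices = array_map('floatval', $matches[1]);

// Swapping $a and $b around <=> reverses the order (highest price first).
usort($prices, fn($a, $b) => $b <=> $a);

print_r($query);
print_r($prices);
```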

Hands-on lab

Write a 20-line PHP script that:

  1. Fetches https://practice.scrapingcentral.com/ via Guzzle.
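  2. Casts the response body to a string and counts the opening anchor tags with substr_count, using "<a " as the needle (a literal "<a>" almost never occurs, because real anchors carry attributes).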
  3. Writes the count to a JSON file using json_encode with JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE.
  4. Wraps the network call in try / catch (\GuzzleHttp\Exception\RequestException $e).

If you can do that in 20 lines, you're fluent enough to follow the rest of the curriculum's PHP track.

