
PHP: Symfony HttpClient Async Streaming

Often the easiest way to get PHP concurrency: Symfony HttpClient's `stream()` API. No fibers, no promises, just sync-looking code that multiplexes underneath.

What you’ll learn

  • Use HttpClient->stream() to consume responses as they arrive.
  • Stream large response bodies chunk-by-chunk.
  • Decide when stream() beats ReactPHP/Amp.

For most PHP scraping needs, you don't need ReactPHP or Amp. Symfony HttpClient's stream() API achieves curl-multi concurrency without any async/await syntax. Sync-looking code; concurrent execution underneath.

The pattern

use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create([
  'max_host_connections' => 10,
  'timeout' => 15,
  'headers' => ['User-Agent' => 'Scraper/1.0'],
]);

$urls = [];
for ($i = 1; $i <= 20; $i++) {
  $urls[] = "https://practice.scrapingcentral.com/api/products?page=$i";
}

// 1. Fire all requests (non-blocking, returns lazy responses immediately)
$responses = array_map(
  fn(string $url) => $client->request('GET', $url),
  $urls,
);

// 2. Stream chunks as they arrive across ALL responses, in completion order
foreach ($client->stream($responses) as $response => $chunk) {
  if ($chunk->isFirst()) {
    // Headers arrived; abort non-200 responses so their bodies aren't decoded later
    if ($response->getStatusCode() !== 200) {
      $response->cancel();
      continue;
    }
  }
  if ($chunk->isLast()) {
    $data = json_decode($response->getContent(), true);
    echo $response->getInfo('url') . ': ' . count($data) . " items\n";
  }
}

$client->request() doesn't block; the response is "lazy." Iterating $client->stream(...) advances all responses in lockstep, yielding chunks as they arrive on the wire.

Under the hood: curl multi-handle. Single process, multiplexed I/O, no fibers.

When stream() wins

For these patterns, stream() is the simplest tool:

  1. Batch fetch, gather all responses. Fire N requests, process when complete. One loop.
  2. Streaming large responses. Get the first chunks of a multi-MB JSON without loading the whole body into memory.
  3. First-N-complete patterns. Process responses in the order they finish, not the order they were issued.

Stream a single large response

$response = $client->request('GET', $url);
$buffer = '';
foreach ($client->stream($response) as $chunk) {
  $buffer .= $chunk->getContent();
  // Once we have a complete JSON object boundary, decode and process it.
  // Useful for newline-delimited JSON streams.
}

For Content-Type: application/x-ndjson or similar streams, you can decode line-by-line as data arrives:

$leftover = '';
foreach ($client->stream($response) as $chunk) {
  $data = $leftover . $chunk->getContent();
  $lines = explode("\n", $data);
  $leftover = array_pop($lines);  // keep partial line for next chunk
  foreach ($lines as $line) {
    if ($line === '') continue;
    $obj = json_decode($line, true);
    process($obj);
  }
}

Memory stays flat; you process as data arrives. The same pattern works for streaming CSV.
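Because the line-splitting logic is pure PHP, it can be pulled into a helper and verified without a live stream. A sketch (`splitLines` is our name, not a library function), fed simulated chunks that split a JSON object across a boundary:

```php
// Reusable version of the chunk-boundary line splitter above.
// Returns complete lines; keeps the trailing partial line in $leftover.
function splitLines(string $incoming, string &$leftover): array
{
  $data = $leftover . $incoming;
  $lines = explode("\n", $data);
  $leftover = array_pop($lines);  // partial trailing line waits for the next chunk
  return array_filter($lines, fn($l) => $l !== '');
}

// Simulated chunks: the second object is split mid-record
$chunks = ["{\"id\":1}\n{\"id\"", ":2}\n{\"id\":3}\n"];
$leftover = '';
$objects = [];
foreach ($chunks as $chunk) {
  foreach (splitLines($chunk, $leftover) as $line) {
    $objects[] = json_decode($line, true);
  }
}
// $objects now holds three decoded records despite the mid-object chunk split
```

The same helper works for any line-delimited format; only the per-line decode step changes for CSV.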

Per-host connection limits

$client = HttpClient::create([
  'max_host_connections' => 8,
]);

Caps parallel TCP connections to a single host at 8. Combined with concurrent request() calls, you get parallelism without overwhelming the target.

Retries

RetryableHttpClient (or the retry_failed config in framework.yaml) handles transient failures automatically:

use Symfony\Component\HttpClient\RetryableHttpClient;

$client = new RetryableHttpClient(HttpClient::create(), maxRetries: 3);

Retries integrate with stream() transparently: failed responses are retried without manual intervention.
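If the defaults don't fit, backoff behavior can be tuned with `GenericRetryStrategy`. A sketch with assumed values; which statuses retry, the base delay, the multiplier, and the cap are all choices you'd adjust per target:

```php
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\Retry\GenericRetryStrategy;
use Symfony\Component\HttpClient\RetryableHttpClient;

// Retry 429 and 5xx responses, starting at 500 ms,
// doubling each attempt, capped at 10 seconds
$strategy = new GenericRetryStrategy(
  statusCodes: [429, 500, 502, 503, 504],
  delayMs: 500,
  multiplier: 2.0,
  maxDelayMs: 10_000,
);

$client = new RetryableHttpClient(HttpClient::create(), $strategy, maxRetries: 3);
```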

Concurrent inserts to a queue

A common pattern: scrape with stream(), push results to a Messenger queue or a database:

foreach ($client->stream($responses) as $response => $chunk) {
  if ($chunk->isLast()) {
    $data = json_decode($response->getContent(), true);
    $bus->dispatch(new StoreScrapedDataMessage($data));
  }
}

The scrape is fast (concurrent fetches). The expensive parts (DB writes, enrichment) live in workers consuming the queue. Decoupled, scalable.
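`StoreScrapedDataMessage` above isn't a Symfony class; a minimal sketch of what the message and its worker-side handler might look like:

```php
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

// Plain DTO carried through the Messenger transport
final class StoreScrapedDataMessage
{
  public function __construct(public readonly array $data) {}
}

// Runs in a worker process (messenger:consume), not in the scrape loop
#[AsMessageHandler]
final class StoreScrapedDataHandler
{
  public function __invoke(StoreScrapedDataMessage $message): void
  {
    // Persist $message->data here: DB writes, enrichment, dedup, etc.
  }
}
```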

stream() vs ReactPHP vs Amp

  • HttpClient + stream(): sync-looking code, no async syntax. Best for "fire N, gather all," batch scraping, and simple flows.
  • ReactPHP: promise chains. Best for long-lived event-driven processes and JS-familiar teams.
  • Amp v3: fiber-based, sync-looking code. Best for complex async flows, cancellation, and mixed I/O.

For 80% of PHP scraping concurrency needs, stream() is enough; reach for ReactPHP or Amp when you need more.

Limits

  • No timer/scheduler primitives. stream() is just HTTP concurrency; if you need event-loop-based scheduling, use ReactPHP or Amp.
  • No long-lived server pattern. stream() runs to completion; a WebSocket client or pub/sub consumer wants ReactPHP or Amp.
  • No fiber-style nested awaits. Sequential dependencies between requests in a complex flow are awkward with stream(): chain them as separate phases, or move to Amp.
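The third limitation has a workable shape: run dependent requests as sequential stream() phases. A sketch, assuming a hypothetical index endpoint whose items each carry a `url` field:

```php
use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();

// Phase 1: fetch index pages concurrently, collect detail URLs
$indexResponses = [
  $client->request('GET', 'https://example.com/api/products?page=1'),
  $client->request('GET', 'https://example.com/api/products?page=2'),
];

$detailUrls = [];
foreach ($client->stream($indexResponses) as $response => $chunk) {
  if ($chunk->isLast()) {
    foreach ($response->toArray() as $item) {
      $detailUrls[] = $item['url'];  // assumed field name
    }
  }
}

// Phase 2: fetch all detail pages concurrently
$detailResponses = array_map(
  fn(string $url) => $client->request('GET', $url),
  $detailUrls,
);
foreach ($client->stream($detailResponses) as $response => $chunk) {
  if ($chunk->isLast()) {
    // process $response->toArray()
  }
}
```

Each phase is fully concurrent internally; only the phase boundary is sequential.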

A complete production pattern

use Psr\Log\LoggerInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;

class ConcurrentScraper
{
  public function __construct(
    private readonly HttpClientInterface $client,
    private readonly LoggerInterface $logger,
  ) {}

  public function scrape(array $urls, int $maxConcurrent = 10): iterable
  {
    $responses = [];
    foreach ($urls as $url) {
      $responses[] = $this->client->request('GET', $url);
      if (count($responses) >= $maxConcurrent) {
        // Drain until we drop below the limit, then resume issuing requests
        yield from $this->drain($responses, resumeBelow: $maxConcurrent);
      }
    }
    yield from $this->drain($responses);
  }

  private function drain(array &$responses, ?int $resumeBelow = null): iterable
  {
    foreach ($this->client->stream($responses) as $response => $chunk) {
      try {
        if (!$chunk->isLast()) {
          continue;
        }
        if ($response->getStatusCode() === 200) {
          yield $response->getInfo('url') => $response->toArray();
        }
      } catch (\Throwable $e) {
        $this->logger->warning('fetch failed', [
          'url' => $response->getInfo('url'),
          'error' => $e->getMessage(),
        ]);
      }
      // This response is finished (or failed): drop it from the pool
      $key = array_search($response, $responses, true);
      if ($key !== false) {
        unset($responses[$key]);
      }
      if ($resumeBelow !== null && count($responses) < $resumeBelow) {
        return;
      }
    }
  }
}

Bounded concurrency, error handling, lazy result yielding. Under fifty lines, no async/await syntax, fully concurrent.

Hands-on lab

Build a concurrent scraper using stream() against /api/products:

  1. Fetch 50 pages concurrently with max_host_connections=10.
  2. Process JSON responses as they complete (don't wait for all).
  3. Write each item to a JSONL file.

Compare the runtime to a sequential getContent() loop. The stream() version should be 5–10x faster at typical API latencies; it's the simplest PHP concurrency story available.

Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


What does `$client->request('GET', $url)` return in Symfony HttpClient?
