Symfony HttpClient in Production Context
Beyond `HttpClient::create()`, retries, timeouts, concurrent batching, decorators, and the mocking story for tests.
What you’ll learn
- Configure HttpClient with sensible production defaults.
- Run requests concurrently with stream() for high throughput.
- Use RetryableHttpClient and TraceableHttpClient decorators.
You used HttpClient in Sub-Path 2. Now we use it the way real Symfony scrapers do: with retries, batched concurrency, and decorators that you can swap out per environment.
Service configuration in framework.yaml
```yaml
# config/packages/framework.yaml
framework:
    http_client:
        default_options:
            timeout: 10
            max_redirects: 5
            headers:
                User-Agent: 'CatalogScraper/1.0 (mailto:ops@example.com)'
                Accept-Encoding: 'gzip, deflate, br'
        scoped_clients:
            catalog108.client:
                base_uri: 'https://practice.scrapingcentral.com/'
                timeout: 15
                retry_failed:
                    max_retries: 3
                    delay: 1000
                    multiplier: 2
                    jitter: 0.1
                    http_codes:
                        - 429
                        - 500
                        - 502
                        - 503
                        - 504
```
Three things to notice:
- `default_options` apply to every HttpClient injection.
- `scoped_clients` create per-target clients; `catalog108.client` gets injected via `HttpClientInterface $catalog108Client`.
- `retry_failed` wires up `RetryableHttpClient` automatically, no manual decoration needed.
Auto-wiring scoped clients
```php
public function __construct(
    private readonly HttpClientInterface $catalog108Client,
) {}
```
The parameter name matches the scoped client name (`catalog108.client` → `$catalog108Client`), and Symfony autowires it. No more passing base URIs around.
Concurrent batching with stream()
The single biggest HttpClient feature for production scraping: you can fire 100 requests and consume responses as they complete.
```php
public function scrape(array $urls): iterable
{
    $responses = [];
    foreach ($urls as $url) {
        $responses[] = $this->catalog108Client->request('GET', $url);
    }

    foreach ($this->catalog108Client->stream($responses) as $response => $chunk) {
        if ($chunk->isFirst()) {
            // Headers are available at the first chunk.
            if ($response->getStatusCode() >= 400) {
                // Stop downloading the body; otherwise isLast() would
                // still fire for this response later in the stream.
                $response->cancel();
                continue;
            }
        }
        if ($chunk->isLast()) {
            yield [
                'url' => $response->getInfo('url'),
                'body' => $response->getContent(),
            ];
        }
    }
}
```
The `stream()` method returns an iterator that yields chunks as they arrive across all responses. Multiplexed I/O, one process, dozens of in-flight connections. This is the equivalent of `asyncio.gather` or `Promise.all`, without leaving sync PHP.
For a simpler version that doesn't stream chunks:
```php
$responses = array_map(
    fn ($url) => $this->client->request('GET', $url),
    $urls,
);

foreach ($responses as $response) {
    // getContent() blocks until this response is fully downloaded,
    // but the others keep downloading in parallel.
    $body = $response->getContent();
}
```
Under the hood (with ext-curl), HttpClient uses a curl multi-handle, so concurrency works without async/await syntax.
Tuning concurrency
`max_host_connections` controls per-host parallelism:

```yaml
catalog108.client:
    base_uri: 'https://practice.scrapingcentral.com/'
    max_host_connections: 8
```
The default is 6. Increase it carefully: too high and you trigger rate limiting. Combine it with the RateLimiter component for explicit politeness (covered in §4.16).
RetryableHttpClient, the decorator pattern
`retry_failed` in YAML is the easy path. The manual equivalent:
```php
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\RetryableHttpClient;
use Symfony\Component\HttpClient\Retry\GenericRetryStrategy;

$client = new RetryableHttpClient(
    HttpClient::create(),
    new GenericRetryStrategy(
        statusCodes: [429, 500, 502, 503, 504],
        delayMs: 1000,
        multiplier: 2.0,
        maxDelayMs: 60_000,
        jitter: 0.1,
    ),
    maxRetries: 3,
);
```
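To see what schedule those numbers produce, here is a minimal sketch of the exponential-backoff arithmetic the options imply (jitter omitted; the function name and the retry-count indexing are ours for illustration, not Symfony's exact internals):

```php
// delay(n) = min(delayMs * multiplier^n, maxDelayMs), jitter left out.
function retryDelayMs(int $retryCount, int $delayMs = 1000, float $multiplier = 2.0, int $maxDelayMs = 60_000): int
{
    $delay = (int) ($delayMs * $multiplier ** $retryCount);

    return min($delay, $maxDelayMs);
}
```

With these defaults the waits double each attempt (1s, 2s, 4s, …) until the 60-second cap takes over, which keeps a long outage from producing hour-long sleeps.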
The retry strategy is pluggable. Custom strategy:
```php
use Symfony\Component\HttpClient\Response\AsyncContext;
use Symfony\Component\HttpClient\Retry\RetryStrategyInterface;
use Symfony\Contracts\HttpClient\Exception\TransportExceptionInterface;

class ScraperRetryStrategy implements RetryStrategyInterface
{
    public function shouldRetry(AsyncContext $context, ?string $responseContent, ?TransportExceptionInterface $exception): ?bool
    {
        if ($exception) {
            return true; // network-level failure: always retry
        }

        $status = $context->getStatusCode();
        if (429 === $status) {
            return true; // always retry rate limits
        }

        return $status >= 500 && $status < 600;
    }

    public function getDelay(AsyncContext $context, ?string $responseContent, ?TransportExceptionInterface $exception): int
    {
        // Honour the Retry-After header when the server sends one.
        $headers = $context->getHeaders();
        if (isset($headers['retry-after'][0])) {
            return (int) $headers['retry-after'][0] * 1000;
        }

        return 1000 * (2 ** $context->getInfo('retry_count'));
    }
}
```
Honouring Retry-After on 429 is good citizenship and improves success rates.
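Note that RFC 9110 allows `Retry-After` to carry either delay-seconds (`120`) or an HTTP-date; the cast above only handles the first form. A hedged sketch of a parser covering both (the helper name is ours, not Symfony's):

```php
// Hypothetical helper: convert a Retry-After header value into milliseconds.
function retryAfterToMs(string $value, ?int $nowTs = null): int
{
    // Form 1: delay-seconds, e.g. "120".
    if (ctype_digit($value)) {
        return (int) $value * 1000;
    }

    // Form 2: an HTTP-date, e.g. "Fri, 31 Dec 1999 23:59:59 GMT".
    $ts = strtotime($value);
    if (false === $ts) {
        return 0; // unparseable: fall back to no extra delay
    }

    return max(0, ($ts - ($nowTs ?? time())) * 1000);
}
```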
TraceableHttpClient, for debugging
In dev, wrap HttpClient in TraceableHttpClient. It records every request and shows them in Symfony's profiler. You don't need any configuration for this: Symfony decorates the client automatically when WebProfilerBundle is installed, and only in the dev environment. In prod you don't want the overhead, and the framework defaults are right.
Mocking for tests
```php
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

$client = new MockHttpClient([
    new MockResponse('<html>...</html>', ['http_code' => 200]),
    new MockResponse('', ['http_code' => 404]),
]);
```
MockHttpClient queues responses. Inject it in tests. Verify the request count, the URLs hit, the bodies sent, without touching the network.
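Beyond a fixed queue, MockHttpClient also accepts a callable, which is handy when you want to assert on each outgoing request as it is made. A sketch, assuming symfony/http-client is installed; the URL is hypothetical:

```php
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

$client = new MockHttpClient(function (string $method, string $url, array $options): MockResponse {
    // Assert on the outgoing request instead of pre-queueing responses.
    assert('GET' === $method);
    assert(str_contains($url, '/api/products'));

    return new MockResponse(json_encode(['ok' => true]), ['http_code' => 200]);
});

$response = $client->request('GET', 'https://example.test/api/products');
```

`$client->getRequestsCount()` then reports how many requests the code under test fired.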
Cookies and sessions
Unlike a browser, HttpClient does NOT persist cookies between calls by default. For sessions:
```php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());
$browser->request('GET', 'https://target.com/login');
$browser->submitForm('Login', ['email' => '...', 'password' => '...']);
$browser->request('GET', 'https://target.com/dashboard');
// Cookies persist across $browser->request() calls.
```
HttpBrowser wraps HttpClient with a CookieJar and a form-submission helper; it is the PHP equivalent of `requests.Session`.
Hands-on lab
Configure a scoped client `catalog108` against Catalog108. Set retries on 429/5xx. Write a console command that:

- Loads 50 product URLs.
- Fires them all via `stream()`.
- Logs the success/failure breakdown.

Compare against the same scrape with sequential `getContent()` calls; the concurrent version should finish roughly 5–10x faster in wall-clock time.