PHP: ReactPHP for Async Scraping
ReactPHP is an async runtime for PHP: an event loop, promises, and an HTTP client. It is the PHP analog to Node.js or Python's asyncio.
What you’ll learn
- Write a ReactPHP async fetcher with concurrent requests.
- Use promises (then/all) to compose async operations.
- Decide between ReactPHP and Symfony HttpClient concurrency.
PHP is famously synchronous. ReactPHP is the most established async runtime for PHP, a non-blocking event loop with HTTP, DNS, sockets, file I/O, and timers. If you've used Node.js or Python's asyncio, the model will feel familiar.
When to use ReactPHP
PHP has two async stories: ReactPHP (older, promise-based) and Amp (newer, fiber-based, covered in §4.24). Both work. Symfony HttpClient also achieves concurrency via curl multi-handle without async syntax.
| Tool | Style | Ergonomics |
|---|---|---|
| Symfony HttpClient | Sync API, internally concurrent | Cleanest for "fire N requests, await all" |
| ReactPHP | Promise-based async | Like JavaScript |
| Amp | Fiber-based async (PHP 8.1+) | Like Go / async-await |
For scraping specifically, Symfony HttpClient is usually fine. Reach for ReactPHP when you need a long-lived event loop (servers, WebSocket clients, schedulers) or when integrating with libraries that already use it.
Install
composer require react/http react/event-loop
A concurrent fetcher
<?php
require 'vendor/autoload.php';

use React\Http\Browser;
use React\EventLoop\Loop;
use Psr\Http\Message\ResponseInterface;

$browser = new Browser();

$urls = [];
for ($i = 1; $i <= 10; $i++) {
    $urls[] = "https://practice.scrapingcentral.com/api/products?page=$i";
}

// Each get() returns a promise immediately, so all ten requests are in flight at once.
$promises = array_map(fn($url) => $browser->get($url), $urls);

\React\Promise\all($promises)->then(
    function (array $responses) {
        foreach ($responses as $i => $r) {
            $body = (string) $r->getBody();
            $data = json_decode($body, true);
            echo "page " . ($i + 1) . ": " . count($data) . " items\n";
        }
    },
    function (Throwable $e) {
        // all() rejects as soon as any single request fails.
        echo "failed: " . $e->getMessage() . "\n";
    }
);

Loop::run();
Key concepts:
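- Browser issues HTTP requests via the event loop.
- $browser->get($url) returns a promise immediately.
- \React\Promise\all([...]) resolves when all promises resolve.
- Loop::run() drives the event loop. No I/O happens until the loop runs.
With react/event-loop 1.2 or later, the default loop also starts automatically when the script ends, so the explicit Loop::run() call is optional there; it is kept in the example to make the control flow visible. On older versions, or when you create a loop instance yourself, the script exits without doing anything unless you call run(). The event loop is the heartbeat.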
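To see the loop at work without any HTTP involved, here is a minimal sketch using two timers. It assumes react/event-loop 1.2 or later, where the static Loop::addTimer() accessor is available. The callbacks only run once the loop is running, and they fire in due-time order rather than registration order.

```php
<?php
require 'vendor/autoload.php';

use React\EventLoop\Loop;

$order = [];

// Registered first, but due later.
Loop::addTimer(0.2, function () use (&$order) {
    $order[] = 'slow';
});

// Registered second, but due earlier.
Loop::addTimer(0.1, function () use (&$order) {
    $order[] = 'fast';
});

// Nothing has fired yet: timers only run inside the loop.
$order[] = 'scheduled';

Loop::run(); // returns once no timers or streams remain

echo implode(', ', $order), "\n"; // prints "scheduled, fast, slow"
```

The same scheduling applies to HTTP requests: get() only registers work, and the loop completes it.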
Bounded concurrency
ReactPHP's Browser has no built-in concurrency cap. You add it manually with a "pool" pattern:
class Pool {
    private int $running = 0;
    private array $pending = [];

    public function __construct(private int $limit) {}

    public function run(callable $task): \React\Promise\PromiseInterface {
        $deferred = new \React\Promise\Deferred();
        $this->pending[] = [$task, $deferred];
        $this->tick();
        return $deferred->promise();
    }

    private function tick(): void {
        while ($this->running < $this->limit && $this->pending) {
            [$task, $deferred] = array_shift($this->pending);
            $this->running++;
            $task()->then(
                function ($v) use ($deferred) {
                    $this->running--;
                    $deferred->resolve($v);
                    $this->tick();
                },
                function ($e) use ($deferred) {
                    $this->running--;
                    $deferred->reject($e);
                    $this->tick();
                }
            );
        }
    }
}

$pool = new Pool(5);
$promises = array_map(fn($url) => $pool->run(fn() => $browser->get($url)), $urls);
Five in-flight at a time, the rest queued. This is the ReactPHP equivalent of asyncio.Semaphore(5).
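The cap can be checked without touching the network. The sketch below repeats the Pool class so the file runs on its own, and replaces $browser->get() with a hypothetical $fakeFetch that resolves after 50 ms via a timer. The $active and $peak counters are illustration-only names that record how many tasks are in flight.

```php
<?php
require 'vendor/autoload.php';

use React\EventLoop\Loop;
use React\Promise\Deferred;
use React\Promise\PromiseInterface;
use function React\Promise\all;

// Same pool as above, repeated so this file is self-contained.
final class Pool
{
    private int $running = 0;
    private array $pending = [];

    public function __construct(private int $limit) {}

    public function run(callable $task): PromiseInterface
    {
        $deferred = new Deferred();
        $this->pending[] = [$task, $deferred];
        $this->tick();
        return $deferred->promise();
    }

    private function tick(): void
    {
        while ($this->running < $this->limit && $this->pending) {
            [$task, $deferred] = array_shift($this->pending);
            $this->running++;
            $task()->then(
                function ($v) use ($deferred) {
                    $this->running--;
                    $deferred->resolve($v);
                    $this->tick();
                },
                function ($e) use ($deferred) {
                    $this->running--;
                    $deferred->reject($e);
                    $this->tick();
                }
            );
        }
    }
}

$active = 0; // tasks currently in flight
$peak = 0;   // highest value $active ever reached

// Hypothetical stand-in for $browser->get($url): resolves after 50 ms.
$fakeFetch = function (int $n) use (&$active, &$peak): PromiseInterface {
    $active++;
    $peak = max($peak, $active);
    $deferred = new Deferred();
    Loop::addTimer(0.05, function () use ($deferred, $n, &$active) {
        $active--;
        $deferred->resolve("page $n");
    });
    return $deferred->promise();
};

$pool = new Pool(5);
$promises = array_map(fn($n) => $pool->run(fn() => $fakeFetch($n)), range(1, 12));

$results = [];
all($promises)->then(function (array $pages) use (&$results) {
    $results = $pages;
});

Loop::run();

echo count($results), " results, peak concurrency ", $peak, "\n";
// prints "12 results, peak concurrency 5"
```

Swapping $fakeFetch back for $browser->get($url) gives the real scraper; the pool logic is unchanged.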
Timeouts
$browser = (new Browser())
->withTimeout(10)
->withFollowRedirects(5);
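Explicit timeouts are critical. withTimeout() applies to every request sent through that Browser instance, and without it the Browser falls back to PHP's default_socket_timeout (60 seconds unless changed). A slow target does not block the event loop itself, but it holds a connection (and a pool slot) open for that long and delays anything waiting on all().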
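To watch a timeout fire without depending on a real slow site, the following sketch starts a local server that answers after 0.5 s and requests it with a 0.1 s timeout. It assumes recent react/http and react/socket releases that provide HttpServer and SocketServer; the local server is purely a test fixture, not part of a scraper.

```php
<?php
require 'vendor/autoload.php';

use Psr\Http\Message\ServerRequestInterface;
use React\EventLoop\Loop;
use React\Http\Browser;
use React\Http\HttpServer;
use React\Http\Message\Response;
use React\Promise\Deferred;
use React\Socket\SocketServer;

// Local stand-in for a slow target: responds only after 0.5 s.
$server = new HttpServer(function (ServerRequestInterface $request) {
    $deferred = new Deferred();
    Loop::addTimer(0.5, function () use ($deferred) {
        $deferred->resolve(new Response(200, [], "late\n"));
    });
    return $deferred->promise();
});

$socket = new SocketServer('127.0.0.1:0'); // port 0 lets the OS pick a free port
$server->listen($socket);
$url = str_replace('tcp://', 'http://', $socket->getAddress());

$outcome = null;
$browser = (new Browser())->withTimeout(0.1); // give up after 100 ms

$browser->get($url)->then(
    function () use (&$outcome, $socket) {
        $outcome = 'response';
        $socket->close();
    },
    function (Throwable $e) use (&$outcome, $socket) {
        // The timeout surfaces as a rejected promise, not a thrown exception.
        $outcome = 'rejected: ' . $e->getMessage();
        $socket->close(); // stop listening so the loop can finish
    }
);

Loop::run();

echo $outcome, "\n";
```

Because Browser is immutable, withTimeout() returns a new instance, so a stricter limit for one request can be derived from a shared browser without affecting the others.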
Error handling
then() takes two callbacks: the first runs when the promise is fulfilled, the second when it is rejected.
$browser->get($url)->then(
    function (ResponseInterface $r) {
        echo "ok: " . $r->getStatusCode() . "\n";
    },
    function (Throwable $e) {
        echo "error: " . $e->getMessage() . "\n";
    }
);
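Or attach the rejection handler separately with catch() (react/promise 3.x; on 2.x the equivalent method is otherwise()):
// process() and log_error() are placeholders for your own functions.
$browser->get($url)
    ->then(fn($r) => process($r))
    ->catch(fn($e) => log_error($e));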
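\React\Promise\all rejects on the first failure. As far as I can tell, react/promise does not ship a settle() or allSettled() helper, so for "wait for all, collecting errors," give each promise its own rejection handler that turns the failure into a value, then pass the results to all().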
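One way to get "wait for all, collecting errors" using only then() and all() is sketched below. settleAll() is a hypothetical helper written for this example, not a react/promise function, and the ok/value/error array shape is likewise an assumption. Already-settled promises stand in for real requests so the snippet runs without a network.

```php
<?php
require 'vendor/autoload.php';

use React\Promise\PromiseInterface;
use function React\Promise\all;
use function React\Promise\reject;
use function React\Promise\resolve;

// Hypothetical helper: never rejects; each entry reports its own outcome.
function settleAll(array $promises): PromiseInterface
{
    return all(array_map(
        fn(PromiseInterface $p) => $p->then(
            fn($value) => ['ok' => true, 'value' => $value],
            fn(\Throwable $e) => ['ok' => false, 'error' => $e->getMessage()]
        ),
        $promises
    ));
}

$outcomes = [];
settleAll([
    resolve('page 1'),
    reject(new RuntimeException('HTTP 500')),
    resolve('page 3'),
])->then(function (array $settled) use (&$outcomes) {
    $outcomes = $settled;
});

// These promises were already settled, so the handler above has run by now.
foreach ($outcomes as $i => $o) {
    echo $i, ': ', $o['ok'] ? $o['value'] : 'failed: ' . $o['error'], "\n";
}
// 0: page 1
// 1: failed: HTTP 500
// 2: page 3
```

With real requests, replace the three sample promises with the array returned by array_map over $browser->get() and run the loop before reading $outcomes.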
ReactPHP vs Symfony HttpClient
Symfony HttpClient with stream() already runs requests concurrently via curl multi-handle, without async syntax:
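use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
// request() is lazy and returns immediately, so the requests run concurrently.
$responses = array_map(fn($url) => $client->request('GET', $url), $urls);
foreach ($client->stream($responses) as $resp => $chunk) {
    if ($chunk->isLast()) {
        // ...
    }
}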
For straight-line "fire N requests, gather responses" patterns, Symfony HttpClient is shorter and simpler. ReactPHP earns its place when:
- The application is a long-lived process (event-driven server, queue consumer, scheduler).
- You need cooperative multitasking with timers, sockets, file I/O, not just HTTP.
- You want JavaScript-style promise composition.
Integration with Symfony
ReactPHP runs alongside a Symfony Console command:
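protected function execute(InputInterface $i, OutputInterface $o): int
{
    // Assumes configure() declares a "url" argument; adjust to your command.
    $url = $i->getArgument('url');
    $browser = new Browser();
    $browser->get($url)->then(
        function ($r) use ($o) {
            $o->writeln((string) $r->getBody());
        },
        function (\Throwable $e) use ($o) {
            $o->writeln('error: ' . $e->getMessage());
        }
    );
    Loop::run(); // returns once all pending I/O has finished
    return Command::SUCCESS;
}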
The Console command provides DI, args, output formatting. The body uses ReactPHP for I/O. Two ecosystems, one process.
Limits and gotchas
- Blocking PHP code stalls the loop. A long usleep() or a file_get_contents() call to a slow URL freezes everything. Use ReactPHP-native libraries where possible.
- Most PHP libraries are sync. Database drivers (PDO), file libraries, even some HTTP wrappers. Bridging is awkward and usually means moving the blocking work to a worker thread or child process, or routing I/O through ReactPHP equivalents.
- Debugging is harder. Stack traces span event loop ticks. Tools like clue/php-async-debug help.
These are the reasons many PHP scrapers don't bother with ReactPHP. The performance ceiling is high; the entry cost is real.
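The first gotcha is easy to reproduce. In this minimal sketch a timer is due after 50 ms, but a blocking usleep() inside another callback holds the loop for 300 ms, so the timer fires late. The 50 ms and 300 ms figures are arbitrary illustration values.

```php
<?php
require 'vendor/autoload.php';

use React\EventLoop\Loop;

$start = microtime(true);
$firedAt = null;

// This timer is due 50 ms from now.
Loop::addTimer(0.05, function () use ($start, &$firedAt) {
    $firedAt = microtime(true) - $start;
});

// A blocking call inside a callback: nothing else runs until it returns.
Loop::futureTick(function () {
    usleep(300000); // 300 ms of synchronous sleep
});

Loop::run();

printf("timer due at 0.05s fired after %.2fs\n", $firedAt);
```

On a typical machine the timer reports roughly 0.30 s instead of 0.05 s. Any synchronous call, such as a PDO query or file_get_contents(), has the same effect on every pending request.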
Hands-on lab
Against /api/products:
- Write a ReactPHP script that fetches 20 pages concurrently.
- Add the Pool pattern with concurrency=5.
- Compare wall-clock time to the same script using Symfony HttpClient's stream().
For most scrapers, Symfony HttpClient wins on simplicity for similar throughput. ReactPHP wins when the rest of your app is already async.