
Lesson 2.11 · Intermediate · 5 min read

Async/Await Patterns in Node Scrapers

Async is the model, but the wrong patterns leak browsers, deadlock on shared state, and silently swallow errors. Four idioms to internalise.

What you’ll learn

  • Distinguish sequential, parallel, and bounded-concurrency patterns and when each is right.
  • Handle errors in async scrapers without losing partial results.
  • Bound concurrency with `p-limit` or hand-rolled semaphores to avoid resource exhaustion.
  • Avoid the three classic async bugs: unawaited promises, shared mutable state, and unhandled rejections.

Node's async model lets one process drive dozens of browser contexts in parallel. That power comes with sharp edges: orphan browsers, silent failures, exhausted memory. Four patterns cover every scraper you'll write. Internalise them and you'll never write a brittle async scraper again.

Pattern 1: sequential

Process URLs one at a time. Simple, predictable, slow:

const { chromium } = require("playwright");

async function scrapeOne(page, url) {
  await page.goto(url);
  return await page.locator("h1").innerText();
}

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  const urls = ["/products?page=1", "/products?page=2", "/products?page=3"];
  const results = [];
  for (const u of urls) {
    results.push(await scrapeOne(page, `https://practice.scrapingcentral.com${u}`));
  }

  await browser.close();
  console.log(results);
})();

One page reused across all URLs. Cheap on resources, easy to debug. Use this when:

  • You're learning the API.
  • Order matters (each URL depends on the previous).
  • You're rate-limited and need polite spacing anyway.

Pattern 2: full parallel

Fire all URLs simultaneously and wait for all to finish:

const browser = await chromium.launch();
const context = await browser.newContext();

const urls = [...Array(20).keys()].map(i => `/products?page=${i+1}`);

const results = await Promise.all(
  urls.map(async (path) => {
    const page = await context.newPage();
    try {
      await page.goto(`https://practice.scrapingcentral.com${path}`);
      return await page.locator(".product-card").count();
    } finally {
      await page.close();
    }
  })
);

Twenty pages open at once inside one context. Fast on a few URLs, lethal on many. Each page is 30–50 MB plus the goto memory spike. Twenty pages might fit; two hundred won't.

Promise.all also fails fast: if one page errors, the whole batch rejects (though the other tasks keep running to completion). For partial-failure tolerance, use Promise.allSettled:

const settled = await Promise.allSettled(urls.map(scrape));
const ok = settled.filter(s => s.status === "fulfilled").map(s => s.value);
const failed = settled.filter(s => s.status === "rejected").map(s => s.reason);

Pattern 3: bounded concurrency

The pattern for real scraping: cap parallelism at N. Use p-limit:

npm install p-limit

const pLimit = require("p-limit");
const limit = pLimit(5);  // max 5 in flight

const urls = [...Array(200).keys()].map(i => `/products?page=${i+1}`);

const browser = await chromium.launch();
const context = await browser.newContext();

const results = await Promise.all(
  urls.map(path => limit(async () => {
    const page = await context.newPage();
    try {
      await page.goto(`https://practice.scrapingcentral.com${path}`);
      return await page.locator(".product-card").count();
    } finally {
      await page.close();
    }
  }))
);

await browser.close();

Five pages in flight at any moment, two hundred URLs total. The limit function queues tasks; as one completes, the next starts.
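
If you'd rather not add a dependency, the same behaviour can be hand-rolled. A minimal sketch, using nothing beyond plain promises (makeLimiter is an illustrative name, not a library API):

function makeLimiter(max) {
  let active = 0;
  const waiting = [];

  // Start the next queued task if there's a free slot.
  const next = () => {
    if (active >= max || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    task().then(resolve, reject).finally(() => { active--; next(); });
  };

  // Returned function queues a task and resolves with its result.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Drop-in for the p-limit call above:
const limit = makeLimiter(5);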

Pick N based on:

  • Memory. ~50 MB per page; with 8 GB available, ~100 pages max, but stay well below.
  • Target's rate limit. Even if your machine can handle 50, the site might throttle at 10.
  • Network bandwidth. Each page pulls megabytes; 50 concurrent goto calls saturate most home connections.

For most scraping work, N = 3 to 10. Higher is rarely better.

Pattern 4: queue + workers

The "professional" pattern: a queue of work and a fixed pool of long-lived workers pulling from it.

async function worker(id, queue, context, results) {
  while (queue.length) {
    const url = queue.shift();
    if (!url) break;
    const page = await context.newPage();
    try {
      await page.goto(url);
      const count = await page.locator(".product-card").count();
      results.push({ url, count });
    } catch (e) {
      results.push({ url, error: String(e) });
    } finally {
      await page.close();
    }
  }
  console.log(`worker ${id} done`);
}

const urls = [...Array(50).keys()].map(i => `https://practice.scrapingcentral.com/products?page=${i+1}`);
const browser = await chromium.launch();
const context = await browser.newContext();
const results = [];
await Promise.all([1, 2, 3, 4, 5].map(id => worker(id, urls, context, results)));
await browser.close();

Five concurrent workers, each pulling URLs from the shared array until it's empty. More overhead than p-limit, but easier to instrument (per-worker logging, per-worker retry counts, etc.). At scale you'll prefer this; see Lesson 2.26.
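
To make "easier to instrument" concrete, a sketch, assuming worker() is extended to also return a small per-worker summary like { id, scraped, errors }:

// Each worker reports its own tallies; aggregate them at the end.
const stats = await Promise.all(
  [1, 2, 3, 4, 5].map(id => worker(id, urls, context, results))
);
for (const s of stats) {
  console.log(`worker ${s.id}: ${s.scraped} scraped, ${s.errors} errors`);
}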

Error handling: no silent failures

Two classic mistakes:

Mistake 1: missing await.

async function scrape(url) {
  page.goto(url);  // missing await
  return page.locator("h1").innerText();
}

page.goto returns a Promise that is never awaited, so the next line runs before navigation completes and you silently scrape the previous page's content. Lint rules (no-floating-promises) catch this; turn them on.
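
The fix is the await itself (page here is the same outer-scope page the broken version relied on):

async function scrape(url) {
  await page.goto(url);                        // navigation completes before the read below
  return await page.locator("h1").innerText();
}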

Mistake 2: unhandled rejection.

urls.forEach(u => scrape(u));  // promise rejections go nowhere

Each scrape(u) returns a promise. forEach doesn't await them. If any reject, Node logs "UnhandledPromiseRejection" and (on recent versions) exits. Use Promise.all or a for...of with await.
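
Either shape keeps each rejection attached to something that handles it (scrape is the same helper as above):

// Collect everything, including failures:
const settled = await Promise.allSettled(urls.map(u => scrape(u)));

// Or go one URL at a time; a rejection surfaces as a normal catchable error:
for (const u of urls) {
  try {
    await scrape(u);
  } catch (e) {
    console.error(`failed: ${u}`, e);
  }
}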

Shared mutable state

Multiple workers writing to the same array is a hazard:

const seen = new Set();
async function worker(queue) {
  while (queue.length) {
    const url = queue.shift();  // looks racy, but safe: see below
    // ...
    seen.add(extractedId);      // safe: Set ops are atomic in Node
  }
}

Node is single-threaded; synchronous JS runs atomically within a turn of the event loop. But anything spanning an await is interruptible. The queue.shift() example above is safe because no await sits between the length check and the shift, but if you replaced shift with an async read, two workers might grab the same URL. Be precise about which lines yield.
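
For instance, inserting a hypothetical async check (checkRobotsTxt here is illustrative) between reading the head of the queue and removing it reopens the race:

async function racyWorker(queue, seen) {
  while (queue.length) {
    const url = queue[0];          // workers A and B can both read the same head...
    await checkRobotsTxt(url);     // ...because this await yields to the other worker
    queue.shift();                 // ...so the second shift removes a different URL
    seen.add(url);                 // and the same URL ends up scraped twice
  }
}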

Pattern selection cheat sheet

Need                             Pattern
1–10 URLs, learning              Sequential
1–20 URLs, no rate limit         Full parallel
20+ URLs, rate-limit-aware       Bounded with p-limit
100+ URLs, professional scrape   Queue + workers

If unsure, default to bounded with N=5. Adjust based on observed memory and target tolerance.

Hands-on lab

Open /challenges/dynamic/heavy-dom/10k-items. The page is intentionally large (10,000 DOM nodes), perfect for testing your patterns. Scrape it three times: once sequentially, once with Promise.all on the per-page extraction blocks, and once with p-limit(5) over a list of 20 URLs to the same page. Measure memory and time. You should see bounded concurrency win on both.

Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

What's the main risk of `Promise.all(urls.map(scrape))` when `urls` has 200 entries and each opens a Playwright page?
