Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

2.16 · intermediate · 5 min read

Puppeteer in Node.js

Google's own browser-automation library and the Chromium-first predecessor of Playwright (recent releases can also drive Firefox). Smaller, simpler, and still excellent for Chrome-specific scrapes.

What you’ll learn

  • Install Puppeteer and run a script that drives Chromium.
  • Translate Playwright Node concepts to Puppeteer equivalents.
  • Choose Puppeteer over Playwright when CDP-native features (devtools recording, performance traces) matter.
  • Use `puppeteer-extra` and stealth plugins when needed.

Playwright was built at Microsoft by a team that included several of Puppeteer's original developers, who left Google to do it. Puppeteer is still maintained by Google, still excellent, and still the right tool when you're targeting Chromium and want raw CDP access. If you're inheriting a Puppeteer codebase or specifically want DevTools Protocol features, this is where you start.

Install

npm install puppeteer

During npm install, Puppeteer downloads a known-good browser build (~150 MB; recent versions fetch Chrome for Testing). To skip that and use your system Chrome instead:

npm install puppeteer-core

puppeteer-core is the same library without the bundled browser. You point it at an existing Chrome or Chromium installation with the executablePath launch option, or attach to an already-running browser with puppeteer.connect().
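With puppeteer-core, something has to supply executablePath. Below is a minimal sketch that tries a few candidate locations and returns the first one that exists. The helper name findChrome, the CHROME_PATH environment variable (our own convention, not something Puppeteer reads), and the candidate paths are all illustrative assumptions; adjust them for your machines.

```javascript
const fs = require("fs");

// Hypothetical helper: return the first Chrome/Chromium binary path that exists.
// CHROME_PATH and the default candidates are assumptions, not an exhaustive list.
function findChrome(candidates = [
  process.env.CHROME_PATH,
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/usr/bin/chromium-browser",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
]) {
  for (const candidate of candidates) {
    // Skip unset entries (e.g. CHROME_PATH not defined) and missing files
    if (candidate && fs.existsSync(candidate)) return candidate;
  }
  return null; // nothing found: the caller decides how to fail
}

// Usage with puppeteer-core:
// const puppeteer = require("puppeteer-core");
// const executablePath = findChrome();
// if (!executablePath) throw new Error("No Chrome found; set CHROME_PATH");
// const browser = await puppeteer.launch({ executablePath });
```

Failing loudly when nothing is found beats letting launch() throw a less specific error later.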

Your first scraper

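const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto("https://practice.scrapingcentral.com/");
    console.log(await page.title());
    // $eval runs the callback inside the page and returns its (serializable) result
    console.log(await page.$eval("h1", el => el.innerText));
  } finally {
    await browser.close(); // always release the browser process, even on errors
  }
})();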

Almost identical to Playwright Node. The semantic shifts:

Playwright (Node) | Puppeteer
--- | ---
chromium.launch() | puppeteer.launch() (Chromium by default)
page.locator(sel).innerText() | page.$eval(sel, el => el.innerText)
page.locator(sel).all() | page.$$eval(sel, els => ...)
page.waitForResponse(...) | page.waitForResponse(...)
page.route(...) | page.setRequestInterception(true) + page.on('request', req => ...)
BrowserContext | browser.createBrowserContext() (createIncognitoBrowserContext() before v22)

You'll notice Puppeteer is less declarative. Classic Puppeteer code has no locator abstraction: you query elements with $ (single) and $$ (multiple) and act on the returned handles. (Recent versions add a page.locator() API, but most existing codebases predate it.) That keeps Puppeteer's surface smaller, but complex scripts get more verbose.

$eval vs $$eval

The Puppeteer idiom: run JS in the page context, returning serializable values.

// Single element → run function in page, return result
const title = await page.$eval("h1", el => el.innerText);

// Multiple elements → run function with array of elements
const links = await page.$$eval("a[href]", anchors =>
  anchors.map(a => ({ text: a.innerText, href: a.href }))
);

// More control via $ / $$ that return ElementHandles
const handle = await page.$(".product-card");
await handle.click();

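$eval/$$eval are convenient for read-only extraction; $/$$ give you handles for interaction. Note that page.$ returns null when nothing matches, so check the handle before acting on it. ElementHandles share the same stale-element risk as Selenium WebElements: they're bound to the node found at query time, not lazy queries, so if the page re-renders that node the handle goes stale.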
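One consequence of "run JS in the page context": Puppeteer serializes the callback and executes it in the browser, so it cannot close over Node-side variables (pass those as extra arguments after the function; $$eval forwards them). A side benefit is that a self-contained extractor is a plain function you can test offline. A minimal sketch, where extractLinks is a hypothetical helper and plain objects stand in for DOM elements:

```javascript
// Hypothetical extractor. It must be self-contained: Puppeteer serializes the
// function and runs it in the page, so it cannot reference Node-side variables.
function extractLinks(anchors) {
  return anchors
    .filter(a => /^https?:/.test(a.href)) // keep http(s) links; drops mailto:, javascript:, etc.
    .map(a => ({ text: a.innerText.trim(), href: a.href }));
}

// In a real script:
//   const links = await page.$$eval("a[href]", extractLinks);
// Offline check, with plain objects standing in for <a> elements:
const sample = [
  { innerText: "  Home ", href: "https://practice.scrapingcentral.com/" },
  { innerText: "Email us", href: "mailto:hello@example.com" },
];
console.log(extractLinks(sample));
```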

Waiting

Puppeteer has the same wait primitives as Playwright, spelled slightly differently:

await page.waitForSelector(".product-card", { timeout: 10000 });
await page.waitForSelector(".spinner", { hidden: true });
await page.waitForFunction(() => document.querySelectorAll(".item").length >= 20);
await page.waitForNavigation({ waitUntil: "domcontentloaded" });
await page.waitForResponse(r => r.url().includes("/api/products"));

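waitForSelector with { hidden: true } is the spinner-wait pattern. waitForFunction is the same JS-predicate poll as Playwright. For waitForNavigation, start waiting before you trigger the navigation, typically with await Promise.all([page.waitForNavigation(), page.click(selector)]); if you await the click first, the navigation can finish before the wait begins, and the wait then times out.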
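waitForFunction polls a predicate inside the page. Sometimes the condition lives in your Node process instead, for example how many API responses a page.on("response") handler has collected so far. A small Node-side poll loop covers that case. This is a minimal sketch; pollUntil and its timeout/interval defaults are our own, not a Puppeteer API.

```javascript
// Hypothetical helper: poll a (possibly async) predicate until it returns a
// truthy value, or throw once the timeout has elapsed.
async function pollUntil(predicate, { timeout = 10000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const value = await predicate();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil: condition not met within ${timeout} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, interval));
  }
}

// Usage idea (".load-more" is a hypothetical selector):
// const seen = [];
// page.on("response", r => { if (r.url().includes("/api/products")) seen.push(r); });
// await page.click(".load-more");
// await pollUntil(() => seen.length >= 3, { timeout: 15000 });
```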

Actions

await page.click(".add-to-cart");
await page.type("#search", "yellow mug");
await page.hover(".tooltip-target");
await page.focus("#email");
await page.keyboard.press("Enter");

page.click(selector) is the Puppeteer shortcut: it finds the element and clicks it in one call. The same goes for page.type, page.hover, and page.focus. You can also act on handles:

const button = await page.$(".add-to-cart");
await button.click();

For drag-and-drop:

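// Helper (defined here; not built into Puppeteer): centre of an element's bounding box
async function elementCenter(handle) {
  const box = await handle.boundingBox(); // null if the element isn't rendered
  return [box.x + box.width / 2, box.y + box.height / 2];
}

const [sx, sy] = await elementCenter(await page.$("#item-1"));
const [tx, ty] = await elementCenter(await page.$("#drop-zone"));
await page.mouse.move(sx, sy);
await page.mouse.down();
await page.mouse.move(tx, ty, { steps: 10 });
await page.mouse.up();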

Puppeteer doesn't have a high-level dragTo. You drive the mouse manually.
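Why the { steps: 10 } matters: many drag-and-drop libraries only register a drag after seeing several mousemove events between mousedown and mouseup. As far as we can tell from Puppeteer's source, mouse.move(x, y, { steps }) splits one move into `steps` evenly spaced positions along a straight line from the current cursor position. The sketch below reproduces that arithmetic in plain Node so you can reason about it; interpolatePath is a hypothetical helper, not Puppeteer API.

```javascript
// Hypothetical helper: the evenly spaced points a linear move of `steps`
// steps passes through, ending exactly at `to`.
function interpolatePath(from, to, steps = 1) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    points.push({
      x: from.x + (to.x - from.x) * (i / steps),
      y: from.y + (to.y - from.y) * (i / steps),
    });
  }
  return points;
}

console.log(interpolatePath({ x: 0, y: 0 }, { x: 100, y: 50 }, 4));

// If you need pauses between moves (some widgets debounce), step manually:
// for (const p of interpolatePath(from, to, 10)) {
//   await page.mouse.move(p.x, p.y);
//   await new Promise(r => setTimeout(r, 20));
// }
```

Calling interpolatePath({ x: 0, y: 0 }, { x: 100, y: 50 }, 4) yields (25, 12.5), (50, 25), (75, 37.5), (100, 50).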

Network interception

await page.setRequestInterception(true);

page.on("request", req => {
  if (req.resourceType() === "image") {
    req.abort();
  } else {
    req.continue();
  }
});

Puppeteer's request interception works at the protocol level. Aborting images cuts a typical page load by 50-70% (Lesson 2.24). Note: once interception is enabled, every request must be explicitly continued, aborted, or answered with req.respond(); any request you leave unresolved stalls indefinitely.
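Real scrapers usually block more than images. One way to keep the handler readable is to move the decision into a pure function and keep the handler a one-liner that always calls exactly one of abort() or continue(). A minimal sketch: shouldAbort, BLOCKED_TYPES, and the tracker hostnames are illustrative assumptions; "image", "media", and "font" are values that req.resourceType() can return.

```javascript
// Hypothetical blocklists: tune these per target site.
const BLOCKED_TYPES = new Set(["image", "media", "font"]);
const BLOCKED_HOSTS = ["ads.example.com", "tracker.example.net"]; // illustrative only

function shouldAbort(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // leave unparseable URLs alone rather than guess
  }
  // Match the host itself and any subdomain of it
  return BLOCKED_HOSTS.some(h => host === h || host.endsWith("." + h));
}

// Wiring it into Puppeteer (every request gets exactly one decision):
// await page.setRequestInterception(true);
// page.on("request", req =>
//   shouldAbort(req.resourceType(), req.url()) ? req.abort() : req.continue());
```

Stylesheets are deliberately left out of the default set: on some sites, blocking CSS changes layout enough to break clicks.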

CDP access: Puppeteer's real strength

Puppeteer exposes the raw Chrome DevTools Protocol session:

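const client = await page.target().createCDPSession();

// Network throttling (throughput in bytes per second, latency in ms)
await client.send("Network.emulateNetworkConditions", {
  offline: false,
  downloadThroughput: 500 * 1024,  // 500 KB/s
  uploadThroughput: 500 * 1024,
  latency: 100,                    // extra round-trip latency, ms
});

// Performance traces: data arrives as Tracing.dataCollected events,
// then Tracing.tracingComplete fires once Tracing.end has flushed everything
const traceEvents = [];
client.on("Tracing.dataCollected", ({ value }) => traceEvents.push(...value));
const traceDone = new Promise(resolve => client.once("Tracing.tracingComplete", resolve));
await client.send("Tracing.start");
// ... actions ...
await client.send("Tracing.end");
await traceDone;
console.log(`collected ${traceEvents.length} trace events`);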

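Anything Chrome DevTools can do is reachable this way: performance traces, heap snapshots, coverage analysis, accessibility-tree extraction. Puppeteer also wraps some of it in higher-level APIs (page.tracing, page.coverage, page.accessibility). Playwright exposes CDP too, but only for Chromium and through context.newCDPSession(page); in Puppeteer the protocol is the foundation everything else is built on.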
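Raw traces are large (easily tens of thousands of events in Chrome's Trace Event Format), so a quick summary helps before you reach for a full viewer. A minimal sketch, assuming you already have the events as an array (for example from Tracing.dataCollected chunks, or the traceEvents field of the JSON file page.tracing writes): it totals "complete" events (ph === "X", whose dur is in microseconds) by name. summarizeTrace is a hypothetical helper; because nested events overlap, a parent task's total includes time spent in its children.

```javascript
// Hypothetical helper: rank trace event names by total duration.
function summarizeTrace(events, top = 5) {
  const totals = new Map();
  for (const e of events) {
    // Only complete events ("X") carry a duration
    if (e.ph !== "X" || typeof e.dur !== "number") continue;
    totals.set(e.name, (totals.get(e.name) || 0) + e.dur);
  }
  return [...totals.entries()]
    .map(([name, micros]) => ({ name, ms: micros / 1000 }))
    .sort((a, b) => b.ms - a.ms)
    .slice(0, top);
}

// Usage: console.table(summarizeTrace(events, 10));
```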

Stealth: puppeteer-extra

The community wraps Puppeteer with plugins, the most important being stealth:

npm install puppeteer-extra puppeteer-extra-plugin-stealth

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch();
  // ... use as normal puppeteer ...
  await browser.close();
})();

Stealth applies ~17 fingerprint patches: webdriver flag, navigator.plugins, languages, chrome runtime, etc. Lesson 2.28 covers this in depth. Playwright has its own stealth ecosystem; Puppeteer's is older, more mature, and the canonical reference.
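It's worth confirming the patches actually took effect on your setup. One rough self-check is to read a few of the navigator properties stealth targets inside the page, then flag the classic headless tells in Node. A minimal sketch: flagTells and the snapshot shape are our own assumptions, and passing this check says nothing about more sophisticated detection.

```javascript
// Hypothetical checker: flags classic headless-automation tells in a
// snapshot of navigator properties collected from the page.
function flagTells(snapshot) {
  const tells = [];
  if (snapshot.webdriver === true) tells.push("navigator.webdriver is true");
  if (snapshot.pluginCount === 0) tells.push("navigator.plugins is empty");
  if (!snapshot.languages || snapshot.languages.length === 0) {
    tells.push("navigator.languages is empty");
  }
  if (/HeadlessChrome/.test(snapshot.userAgent || "")) {
    tells.push("user agent contains HeadlessChrome");
  }
  return tells;
}

// Collecting the snapshot with Puppeteer (runs in the page, returns plain data):
// const snapshot = await page.evaluate(() => ({
//   webdriver: navigator.webdriver,
//   pluginCount: navigator.plugins.length,
//   languages: navigator.languages,
//   userAgent: navigator.userAgent,
// }));
// console.log(flagTells(snapshot)); // [] means none of these tells showed up
```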

When Puppeteer beats Playwright

  • CDP-native work. Performance traces, coverage, and network-condition emulation are exposed directly.
  • Smaller surface. Less to learn for one-off scripts.
  • Established stealth ecosystem. puppeteer-extra-plugin-stealth is the gold standard.
  • Google's first-party Chromium tooling. Puppeteer is developed by the Chrome team, so new DevTools Protocol features tend to surface here early.

When Playwright is the better default:

  • Multi-browser. Playwright runs Firefox and WebKit too.
  • Locator API. Auto-waiting eliminates whole categories of bugs.
  • Trace viewer and codegen. First-class developer tooling.

For new projects in 2026, Playwright is usually the right pick. For Chromium-specific deep work, Puppeteer remains relevant.

Hands-on lab

Open /challenges/dynamic/drag-drop/list-reorder. Write a Puppeteer script that drags the second item to the first position using page.mouse primitives. Compare the line count with a Playwright dragTo() equivalent: you'll see why high-level abstractions matter.


Practice this lesson on Catalog108, our first-party scraping sandbox.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


What's the difference between `puppeteer` and `puppeteer-core`?
