
4.15, intermediate, 4 min read

Building a Scraping API with API Platform

Expose scraped data as a queryable REST/GraphQL API in a few hours. API Platform turns Doctrine entities into production endpoints with filters, pagination, and OpenAPI docs.

What you’ll learn

  • Expose a Product entity as a REST collection and item endpoint.
  • Add filters, ordering, and pagination.
  • Secure the API and add a custom search endpoint.

You've scraped 200,000 products. Now what? If clients (internal apps, partners, your own SaaS) need to query the data, an API beats handing out database dumps. API Platform generates that API from the Doctrine entities you already have.

Install

composer require api

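In a Symfony Flex project, that alias pulls in API Platform and wires up the bundle. Out of the box you get REST endpoints serving JSON-LD, plus an interactive Swagger UI at /api. Other formats (JSON:API, HAL) can be switched on in the bundle configuration, and GraphQL becomes available once you install webonyx/graphql-php.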

Expose an entity

<?php
// src/Entity/Product.php
namespace App\Entity;

use ApiPlatform\Metadata\ApiResource;
use ApiPlatform\Metadata\GetCollection;
use ApiPlatform\Metadata\Get;
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ApiResource(
  operations: [
    new GetCollection(),
    new Get(),
  ],
  paginationItemsPerPage: 30,
  paginationMaximumItemsPerPage: 100,
)]
class Product
{
  #[ORM\Id, ORM\GeneratedValue, ORM\Column]
  public ?int $id = null;

  #[ORM\Column(length: 500)]
  public string $url;

  #[ORM\Column]
  public string $title;

  #[ORM\Column(type: 'decimal', precision: 10, scale: 2, nullable: true)]
  public ?string $price = null;
}

That's it. You now have:

  • GET /api/products, paginated collection
  • GET /api/products/{id}, single resource
  • OpenAPI 3 spec at /api/docs.json
  • Swagger UI at /api

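Because the operations list names only Get and GetCollection, the resource is read-only. A bare #[ApiResource] with no operations argument would expose the full set of CRUD operations instead. Add Post, Patch, or Delete to the list to expose writes; for scraped data you usually don't.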

Filters

use ApiPlatform\Doctrine\Orm\Filter\SearchFilter;
use ApiPlatform\Doctrine\Orm\Filter\RangeFilter;
use ApiPlatform\Doctrine\Orm\Filter\OrderFilter;
use ApiPlatform\Metadata\ApiFilter;

#[ApiResource(...)]
#[ApiFilter(SearchFilter::class, properties: ['title' => 'partial', 'sku' => 'exact'])]
#[ApiFilter(RangeFilter::class, properties: ['price'])]
#[ApiFilter(OrderFilter::class, properties: ['price', 'scrapedAt'])]
class Product { ... }

Now clients can do:

GET /api/products?title=keyboard&price[gt]=50&price[lt]=200&order[price]=desc

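API Platform translates query parameters into Doctrine QueryBuilder predicates. Index the columns used by exact, range, and order filters and you have a real product-search API. Note that the partial strategy produces a LIKE '%...%' condition, which a plain B-tree index cannot serve; for large tables, plan on a trigram or full-text index for title.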
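
As a client-side illustration, the bracketed query parameters shown above can be generated with PHP's built-in http_build_query rather than assembled by hand. This is only a sketch: the function name buildProductQuery is made up for the example, and the filter names assume the SearchFilter, RangeFilter, and OrderFilter declarations from this section.

```php
<?php
declare(strict_types=1);

/**
 * Build a URL-encoded query string for the /api/products collection.
 * Nested arrays become bracketed parameters such as price[gt]=50.
 * (buildProductQuery is a hypothetical helper, not part of API Platform.)
 */
function buildProductQuery(array $filters): string
{
  return http_build_query($filters);
}

$encoded = buildProductQuery([
  'title' => 'keyboard',                  // SearchFilter, partial match
  'price' => ['gt' => 50, 'lt' => 200],   // RangeFilter bounds
  'order' => ['price' => 'desc'],         // OrderFilter
]);

// Decoded copy for logging only; send the encoded form over the wire.
$readable = urldecode($encoded);

echo '/api/products?' . $readable . PHP_EOL;
// prints: /api/products?title=keyboard&price[gt]=50&price[lt]=200&order[price]=desc
```

Building the parameters from an array keeps the escaping correct when a search term contains spaces or ampersands.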

Custom search endpoints

For complex queries beyond filter combos, add a custom operation backed by a Doctrine repository method:

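#[ApiResource(
  operations: [
    new GetCollection(),
    // Declared before Get so /products/search is not matched as /products/{id}
    new GetCollection(
      uriTemplate: '/products/search',
      controller: ProductSearchController::class,
      paginationEnabled: false,
      // Skip the default data provider; the controller loads its own results
      read: false,
    ),
    new Get(),
  ],
)]
class Product { ... }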

The controller:

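namespace App\Controller;

use App\Repository\ProductRepository;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;

class ProductSearchController extends AbstractController
{
  public function __invoke(
    Request $request,
    ProductRepository $repo,
  ): JsonResponse {
    // Search term from ?q=...; empty string when the parameter is absent
    $q = $request->query->get('q', '');
    // fullTextSearch() is your own repository method, not a Doctrine built-in
    return $this->json($repo->fullTextSearch($q, limit: 50));
  }
}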

Use this when filters become awkward: full-text search, complex joins, computed fields.
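
The controller calls a fullTextSearch() method that the lesson does not show. One possible shape for it is sketched below, under loud assumptions: it uses a case-insensitive LIKE on title as a stand-in for real full-text search, and it assumes the standard DoctrineBundle repository base class. Swap the query for your database's full-text operator if you need ranking or stemming.

```php
<?php
// src/Repository/ProductRepository.php (hypothetical sketch)
namespace App\Repository;

use App\Entity\Product;
use Doctrine\Bundle\DoctrineBundle\Repository\ServiceEntityRepository;
use Doctrine\Persistence\ManagerRegistry;

class ProductRepository extends ServiceEntityRepository
{
  public function __construct(ManagerRegistry $registry)
  {
    parent::__construct($registry, Product::class);
  }

  /**
   * Stand-in for full-text search: substring match on title.
   *
   * @return Product[]
   */
  public function fullTextSearch(string $q, int $limit = 50): array
  {
    $q = trim($q);
    if ($q === '') {
      // An empty term would match every row; return nothing instead.
      return [];
    }

    return $this->createQueryBuilder('p')
      ->where('LOWER(p.title) LIKE :pattern')
      // Bound parameter, so the term is never interpolated into the query
      ->setParameter('pattern', '%' . mb_strtolower($q) . '%')
      ->orderBy('p.id', 'DESC')
      ->setMaxResults($limit)
      ->getQuery()
      ->getResult();
  }
}
```

The hard limit matters here because the custom operation disables pagination.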

Pagination

Defaults to 30 items per page. Configure per resource:

#[ApiResource(
  paginationItemsPerPage: 30,
  paginationMaximumItemsPerPage: 100,
  paginationClientItemsPerPage: true,  // ?itemsPerPage=50 allowed
)]

For very large collections, cursor pagination outperforms offset pagination, because the database no longer has to read and discard every skipped row. API Platform supports both; cursor mode is configured with paginationViaCursor.
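
A sketch of cursor pagination keyed on id, modeled on the pattern in the API Platform documentation. It assumes the cursor field also has a RangeFilter and an OrderFilter declared, which API Platform uses to build the next and previous links; verify the exact requirements against the version you run.

```php
<?php
// src/Entity/Product.php (fragment)
namespace App\Entity;

use ApiPlatform\Doctrine\Orm\Filter\OrderFilter;
use ApiPlatform\Doctrine\Orm\Filter\RangeFilter;
use ApiPlatform\Metadata\ApiFilter;
use ApiPlatform\Metadata\ApiResource;
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ApiResource(
  // Partial pagination skips the COUNT query over the whole table
  paginationPartial: true,
  // Walk the collection by id, newest first
  paginationViaCursor: [
    ['field' => 'id', 'direction' => 'DESC'],
  ],
)]
#[ApiFilter(RangeFilter::class, properties: ['id'])]
#[ApiFilter(OrderFilter::class, properties: ['id' => 'DESC'])]
class Product
{
  #[ORM\Id, ORM\GeneratedValue, ORM\Column]
  public ?int $id = null;
}
```

Clients then follow the links in the collection response instead of incrementing a page number.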

Security

The minimum:

# config/packages/security.yaml
security:
  firewalls:
    api:
      pattern: ^/api
      stateless: true
      custom_authenticator: App\Security\ApiKeyAuthenticator

A simple API-key authenticator validates X-API-Key headers against a database table. For production, layer:

  1. Per-key rate limits (Symfony RateLimiter).
  2. Per-operation security expressions: security: "is_granted('ROLE_API')".
  3. Per-property visibility groups (don't expose raw_data to external clients).
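
Items 2 and 3 applied to the resource:

// Symfony 6.4+; older versions use Symfony\Component\Serializer\Annotation\Groups
use Symfony\Component\Serializer\Attribute\Groups;

#[ApiResource(
  operations: [
    new Get(security: "is_granted('ROLE_API')"),
    new GetCollection(security: "is_granted('ROLE_API')"),
  ],
  normalizationContext: ['groups' => ['product:read']],
)]
class Product
{
  #[Groups(['product:read'])]
  public string $title;

  // No Groups attribute: internal only, not serialized
  public ?array $rawData = null;
}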
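
The App\Security\ApiKeyAuthenticator class named in security.yaml is not shown in this lesson. A minimal sketch follows, built on Symfony's custom authenticator API. It assumes the firewall's user provider can load a user from the raw key (for example, by looking it up in an api_keys table); that lookup is not part of this code and you would need to supply it.

```php
<?php
// src/Security/ApiKeyAuthenticator.php (hypothetical sketch)
namespace App\Security;

use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Security\Core\Authentication\Token\TokenInterface;
use Symfony\Component\Security\Core\Exception\AuthenticationException;
use Symfony\Component\Security\Core\Exception\CustomUserMessageAuthenticationException;
use Symfony\Component\Security\Http\Authenticator\AbstractAuthenticator;
use Symfony\Component\Security\Http\Authenticator\Passport\Badge\UserBadge;
use Symfony\Component\Security\Http\Authenticator\Passport\Passport;
use Symfony\Component\Security\Http\Authenticator\Passport\SelfValidatingPassport;

final class ApiKeyAuthenticator extends AbstractAuthenticator
{
  public function supports(Request $request): ?bool
  {
    // Run on every request behind the firewall so a missing key is rejected too.
    return true;
  }

  public function authenticate(Request $request): Passport
  {
    $apiKey = $request->headers->get('X-API-Key');
    if ($apiKey === null || $apiKey === '') {
      throw new CustomUserMessageAuthenticationException('Missing X-API-Key header.');
    }

    // Assumption: the configured user provider resolves the key to a user
    // and throws if the key is unknown.
    return new SelfValidatingPassport(new UserBadge($apiKey));
  }

  public function onAuthenticationSuccess(Request $request, TokenInterface $token, string $firewallName): ?Response
  {
    // Returning null lets the request continue to the API Platform operation.
    return null;
  }

  public function onAuthenticationFailure(Request $request, AuthenticationException $exception): ?Response
  {
    return new JsonResponse(
      ['message' => strtr($exception->getMessageKey(), $exception->getMessageData())],
      Response::HTTP_UNAUTHORIZED,
    );
  }
}
```

Because supports() always returns true, the Swagger UI under /api is also locked behind the key; narrow the firewall pattern or the supports() check if the docs should stay public.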

Serialization groups

Different consumers see different shapes: a public API hides raw_data, while an internal API includes everything. Put Groups attributes on properties, and choose which groups apply in each operation's normalizationContext.
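
For instance, a list view and a detail view can share one entity while returning different fields. The sketch below is illustrative only: the group names product:list and product:detail and the description property are assumptions, not part of the lesson's entity.

```php
<?php
// src/Entity/Product.php (fragment)
namespace App\Entity;

use ApiPlatform\Metadata\ApiResource;
use ApiPlatform\Metadata\Get;
use ApiPlatform\Metadata\GetCollection;
use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Serializer\Attribute\Groups;

#[ORM\Entity]
#[ApiResource(
  operations: [
    // Collection responses stay small: list fields only
    new GetCollection(normalizationContext: ['groups' => ['product:list']]),
    // Item responses add the detail fields
    new Get(normalizationContext: ['groups' => ['product:list', 'product:detail']]),
  ],
)]
class Product
{
  #[ORM\Id, ORM\GeneratedValue, ORM\Column]
  #[Groups(['product:list'])]
  public ?int $id = null;

  #[ORM\Column]
  #[Groups(['product:list'])]
  public string $title;

  // Hypothetical field, shown only on the item endpoint
  #[ORM\Column(type: 'text', nullable: true)]
  #[Groups(['product:detail'])]
  public ?string $description = null;

  // No Groups attribute: never serialized by either operation
  #[ORM\Column(type: 'json', nullable: true)]
  public ?array $rawData = null;
}
```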

GraphQL

GraphQL support switches on once the webonyx/graphql-php package is installed (composer require webonyx/graphql-php). Then visit /api/graphql. Queries:

{
  products(first: 10, title: "keyboard") {
    edges {
      node {
        id
        title
        price
      }
    }
  }
}

For data APIs where clients want flexible queries, GraphQL is often easier than dozens of REST filter combos.

Caching

Add HTTP cache headers via cacheHeaders:

#[ApiResource(
  cacheHeaders: [
    'max_age' => 60,
    'shared_max_age' => 3600,
  ],
)]

For read-heavy public APIs, an HTTP cache (Varnish, Cloudflare) in front of API Platform cuts cost dramatically. The framework cooperates by sending appropriate Cache-Control and ETag headers.

When NOT to use API Platform

  • The API is purely internal between two services you control. A small custom Symfony controller is less abstraction.
  • You need extreme write-path performance. API Platform's abstractions add ms-level overhead per request; for raw write paths sometimes a hand-rolled controller is faster.
  • You want full control over response shape. API Platform's serialization is opinionated.

For most scraping data products, those concerns don't apply. API Platform is the fastest way to ship a real API.

Hands-on lab

In your scraping project:

  1. Expose Product as an API Platform resource with GetCollection and Get.
  2. Add SearchFilter on title (partial) and category, RangeFilter on price, OrderFilter on scrapedAt.
  3. Visit /api/products?title=keyboard&price[gt]=50&order[scrapedAt]=desc&page=1 and verify results.
  4. Add an API key authenticator. Block requests without X-API-Key.

In one afternoon you've turned a Postgres table into a documented, queryable, secured API. The framework leverage is real.
