Building a Scraping API with API Platform
Expose scraped data as a queryable REST/GraphQL API in a few hours. API Platform turns Doctrine entities into production endpoints with filters, pagination, and OpenAPI docs.
What you’ll learn
- Expose a Product entity as a REST collection and item endpoint.
- Add filters, ordering, and pagination.
- Secure the API and add a custom search endpoint.
You've scraped 200,000 products. Now what? If clients (internal apps, partners, your own SaaS) need to query the data, an API beats handing out database dumps.
Install
composer require api
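That pulls in api-platform/core and wires up the bundle. Out of the box you get a REST API in JSON-LD plus an interactive Swagger UI at /api. Other formats (JSON:API, HAL) can be enabled in configuration, and GraphQL becomes available once the GraphQL dependency (webonyx/graphql-php) is installed.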
Expose an entity
<?php
// src/Entity/Product.php
namespace App\Entity;

use ApiPlatform\Metadata\ApiResource;
use ApiPlatform\Metadata\GetCollection;
use ApiPlatform\Metadata\Get;
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ApiResource(
    operations: [
        new GetCollection(),
        new Get(),
    ],
    paginationItemsPerPage: 30,
    paginationMaximumItemsPerPage: 100,
)]
class Product
{
    #[ORM\Id, ORM\GeneratedValue, ORM\Column]
    public ?int $id = null;

    #[ORM\Column(length: 500)]
    public string $url;

    #[ORM\Column]
    public string $title;

    #[ORM\Column(type: 'decimal', precision: 10, scale: 2, nullable: true)]
    public ?string $price = null;
}
That's it. You now have:
- GET /api/products, paginated collection
- GET /api/products/{id}, single resource
- OpenAPI 3 spec at /api/docs.json
- Swagger UI at /api
Because operations lists only Get and GetCollection, only the read endpoints exist. Add Post, Patch, or Delete to expose write operations; for scraped data you usually don't need them.
Filters
use ApiPlatform\Doctrine\Orm\Filter\SearchFilter;
use ApiPlatform\Doctrine\Orm\Filter\RangeFilter;
use ApiPlatform\Doctrine\Orm\Filter\OrderFilter;
use ApiPlatform\Metadata\ApiFilter;
// sku and scrapedAt are assumed to be mapped properties on Product (not shown in the entity above)
#[ApiResource(...)]
#[ApiFilter(SearchFilter::class, properties: ['title' => 'partial', 'sku' => 'exact'])]
#[ApiFilter(RangeFilter::class, properties: ['price'])]
#[ApiFilter(OrderFilter::class, properties: ['price', 'scrapedAt'])]
class Product { ... }
Now clients can do:
GET /api/products?title=keyboard&price[gt]=50&price[lt]=200&order[price]=desc
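API Platform translates the query parameters into Doctrine QueryBuilder predicates. Index the columns you filter and sort on (here price and scrapedAt) and you have a real product-search API. One caveat: a partial match on title compiles to a LIKE '%keyboard%' condition, which an ordinary B-tree index cannot serve; on PostgreSQL a trigram (pg_trgm) index is the usual remedy.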
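To make that translation concrete, here is a minimal, framework-free sketch of the mapping from query parameters to DQL-style conditions. It is an illustration only, not API Platform's internal code: the function name sketchFilterTranslation, the alias p, and the parameter names are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Illustration only: map filter query parameters to DQL-style fragments.
 * Not API Platform's implementation; every name here is hypothetical.
 *
 * @return array{where: list<string>, params: array<string, string>, order: list<string>}
 */
function sketchFilterTranslation(string $queryString): array
{
    parse_str($queryString, $query); // understands price[gt]=50 style nesting

    $where = [];
    $params = [];
    $order = [];

    // SearchFilter with the "partial" strategy behaves like LIKE %value%
    if (isset($query['title']) && is_string($query['title'])) {
        $where[] = 'p.title LIKE :title';
        $params['title'] = '%' . $query['title'] . '%';
    }

    // RangeFilter: price[gt], price[gte], price[lt], price[lte]
    $operators = ['gt' => '>', 'gte' => '>=', 'lt' => '<', 'lte' => '<='];
    foreach ((array) ($query['price'] ?? []) as $op => $value) {
        if (isset($operators[$op]) && is_numeric($value)) {
            $where[] = sprintf('p.price %s :price_%s', $operators[$op], $op);
            $params['price_' . $op] = (string) $value;
        }
    }

    // OrderFilter: order[field]=asc|desc, limited to whitelisted fields
    foreach ((array) ($query['order'] ?? []) as $field => $direction) {
        if (in_array($field, ['price', 'scrapedAt'], true)) {
            $isDesc = is_string($direction) && strtolower($direction) === 'desc';
            $order[] = sprintf('p.%s %s', $field, $isDesc ? 'DESC' : 'ASC');
        }
    }

    return ['where' => $where, 'params' => $params, 'order' => $order];
}
```

Values are always bound as parameters rather than concatenated into the query, and only whitelisted fields can be sorted on, which is the same reason the real filters require you to list properties explicitly.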
Custom search endpoints
For complex queries beyond filter combos, add a custom operation backed by a Doctrine repository method:
#[ApiResource(
    operations: [
        new GetCollection(),
        // Declared before Get so /products/search is not captured by /products/{id}
        new GetCollection(
            uriTemplate: '/products/search',
            controller: ProductSearchController::class,
            paginationEnabled: false,
        ),
        new Get(),
    ],
)]
class Product { ... }
The controller:
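// src/Controller/ProductSearchController.php
namespace App\Controller;

use App\Repository\ProductRepository; // assumed location of your repository
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpKernel\Attribute\AsController;

#[AsController]
class ProductSearchController extends AbstractController
{
    public function __invoke(Request $request, ProductRepository $repo): JsonResponse
    {
        $q = $request->query->get('q', '');

        // fullTextSearch() is a custom repository method you implement yourself
        return $this->json($repo->fullTextSearch($q, limit: 50));
    }
}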
Use this when filters become awkward: full-text search, complex joins, computed fields.
Pagination
Defaults to 30 items per page. Configure per resource:
#[ApiResource(
    paginationItemsPerPage: 30,
    paginationMaximumItemsPerPage: 100,
    paginationClientItemsPerPage: true, // ?itemsPerPage=50 allowed
)]
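For very large collections, cursor (keyset) pagination outperforms offset pagination, because the database can seek to the cursor position through an index instead of reading and discarding every skipped row. API Platform supports both; cursor pagination is configured with paginationViaCursor and, per the API Platform docs, expects a RangeFilter and an OrderFilter on the cursor field.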
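The idea behind cursor pagination can be sketched without the framework. The example below is a plain-PHP illustration over an in-memory array, assuming rows sorted by an ascending integer id; the function name pageAfter is hypothetical, and in a real database the loop corresponds to WHERE id > :cursor ORDER BY id LIMIT :limit.

```php
<?php
declare(strict_types=1);

/**
 * Keyset ("cursor") pagination over rows sorted by ascending id.
 * SQL equivalent: WHERE id > :cursor ORDER BY id LIMIT :limit
 *
 * @param list<array{id: int, title: string}> $rows  rows already sorted by id
 * @param int                                 $limit page size, expected to be >= 1
 * @return array{items: list<array{id: int, title: string}>, nextCursor: ?int}
 */
function pageAfter(array $rows, ?int $cursor, int $limit): array
{
    $items = [];
    foreach ($rows as $row) {
        if ($cursor !== null && $row['id'] <= $cursor) {
            continue; // already delivered on an earlier page
        }
        $items[] = $row;
        if (count($items) === $limit) {
            break;
        }
    }

    // Only hand out a cursor when the page is full; a short page is the last one.
    $last = end($items);
    $nextCursor = (count($items) === $limit && $last !== false) ? $last['id'] : null;

    return ['items' => $items, 'nextCursor' => $nextCursor];
}
```

The client passes back nextCursor instead of a page number, so the cost of fetching page 5,000 is the same as fetching page 1, and rows inserted between requests do not shift the window.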
Security
The minimum:
# config/packages/security.yaml
security:
    firewalls:
        api:
            pattern: ^/api
            stateless: true
            custom_authenticator: App\Security\ApiKeyAuthenticator
A simple API-key authenticator validates the X-API-Key header against a database table. For production, layer:
- Per-key rate limits (Symfony RateLimiter).
- Per-operation security expressions: security: "is_granted('ROLE_API')".
- Per-property visibility groups (don't expose raw_data to external clients).
use Symfony\Component\Serializer\Attribute\Groups; // Symfony 6.4+; older versions use ...\Annotation\Groups

#[ApiResource(
    operations: [
        new Get(security: "is_granted('ROLE_API')"),
        new GetCollection(security: "is_granted('ROLE_API')"),
    ],
    normalizationContext: ['groups' => ['product:read']],
)]
class Product
{
    #[Groups(['product:read'])]
    public string $title;

    // No Groups attribute: internal only, not serialized
    public ?array $rawData = null;
}
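Returning to the API-key authenticator mentioned at the start of this section: the lookup at its core might look like the sketch below. It is framework-free and rests on stated assumptions: keys are stored as SHA-256 hashes rather than in plain text, the array stands in for the database table, and the function name resolveClient is hypothetical. A real Symfony authenticator would wrap logic like this and return a passport or throw an authentication exception.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the lookup an API-key authenticator might perform.
 * Assumption: the table stores SHA-256 hashes of keys, never the raw keys.
 *
 * @param array<string, string> $storedHashes key hash => client name (stand-in for a DB table)
 */
function resolveClient(?string $headerValue, array $storedHashes): ?string
{
    if ($headerValue === null || $headerValue === '') {
        return null; // missing X-API-Key header: reject
    }

    $candidate = hash('sha256', $headerValue);

    foreach ($storedHashes as $hash => $client) {
        // hash_equals compares in constant time to avoid timing leaks
        if (hash_equals((string) $hash, $candidate)) {
            return $client;
        }
    }

    return null; // unknown key: reject
}
```

Storing only hashes means a leaked table does not reveal usable keys, and returning the client name gives you an identifier to key rate limits on.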
Serialization groups
Different consumers see different shapes: the public API hides raw_data, while an internal API can include everything. Put Groups attributes on properties, then choose the active groups per resource or per operation through normalizationContext.
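The mechanism is easy to picture with a small stand-alone sketch. The code below does not use the Symfony Serializer; DemoGroups, DemoProduct, and normalizeForGroups are hypothetical stand-ins that mimic the behavior with PHP 8 attributes and reflection, assuming an extra product:internal group for the internal API.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Symfony's Groups attribute, so the sketch runs without the framework.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class DemoGroups
{
    /** @param list<string> $groups */
    public function __construct(public array $groups) {}
}

final class DemoProduct
{
    #[DemoGroups(['product:read', 'product:internal'])]
    public string $title = 'Mechanical keyboard';

    #[DemoGroups(['product:internal'])]
    public ?array $rawData = ['html' => '<div>...</div>'];
}

/**
 * Keep only the properties tagged with at least one of the active groups.
 *
 * @param list<string> $activeGroups
 * @return array<string, mixed>
 */
function normalizeForGroups(object $object, array $activeGroups): array
{
    $out = [];
    foreach ((new ReflectionObject($object))->getProperties() as $property) {
        foreach ($property->getAttributes(DemoGroups::class) as $attribute) {
            $groups = $attribute->newInstance()->groups;
            if (array_intersect($groups, $activeGroups) !== []) {
                $out[$property->getName()] = $property->getValue($object);
            }
        }
    }

    return $out;
}
```

Calling normalizeForGroups with ['product:read'] yields only the title, while ['product:internal'] yields both properties, which mirrors how one entity can back a public and an internal API.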
GraphQL
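GraphQL support is optional: install the GraphQL dependency first (webonyx/graphql-php; check the API Platform GraphQL docs for the exact package for your version), then visit /api/graphql. A query looks like this: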
{
  products(first: 10, title: "keyboard") {
    edges {
      node {
        id
        title
        price
      }
    }
  }
}
For data APIs where clients want flexible queries, GraphQL is often easier than dozens of REST filter combos.
Caching
Add HTTP cache headers via cacheHeaders:
#[ApiResource(
    cacheHeaders: [
        'max_age' => 60,
        'shared_max_age' => 3600,
    ],
)]
For read-heavy public APIs, an HTTP cache (Varnish, Cloudflare) in front of API Platform cuts cost dramatically. The framework cooperates by sending appropriate Cache-Control and ETag headers.
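As a rough illustration of what those two settings mean on the wire, the sketch below builds a Cache-Control value and shows how an ETag enables a 304 response. It is plain PHP with hypothetical function names (buildCacheControl, statusFor); the exact directive order and ETag algorithm the framework uses may differ.

```php
<?php
declare(strict_types=1);

/**
 * Illustration of how max_age / shared_max_age map onto Cache-Control directives.
 *
 * @param array{max_age?: int, shared_max_age?: int} $config
 */
function buildCacheControl(array $config): string
{
    $directives = ['public'];
    if (isset($config['max_age'])) {
        $directives[] = 'max-age=' . $config['max_age']; // browsers and API clients
    }
    if (isset($config['shared_max_age'])) {
        $directives[] = 's-maxage=' . $config['shared_max_age']; // shared caches (Varnish, CDN)
    }

    return implode(', ', $directives);
}

/** Returns 304 when the client's If-None-Match equals the body's ETag, else 200. */
function statusFor(string $body, ?string $ifNoneMatch): int
{
    $etag = '"' . md5($body) . '"'; // assumption: a content hash used as a strong ETag

    return $ifNoneMatch === $etag ? 304 : 200;
}
```

With the configuration above, clients may reuse a response for 60 seconds while a shared cache may serve it for an hour, so most requests never reach PHP at all.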
When NOT to use API Platform
- The API is purely internal between two services you control. A small custom Symfony controller involves less abstraction.
- You need extreme write-path performance. API Platform's abstractions add millisecond-level overhead per request, so a hand-rolled controller is sometimes faster on hot write paths.
- You want full control over response shape. API Platform's serialization is opinionated.
For most scraping data products, those concerns don't apply. API Platform is the fastest way to ship a real API.
Hands-on lab
In your scraping project:
- Expose Product as an API Platform resource with GetCollection and Get.
- Add SearchFilter on title (partial) and category, RangeFilter on price, OrderFilter on scrapedAt.
- Visit /api/products?title=keyboard&price[gt]=50&order[scrapedAt]=desc&page=1 and verify results.
- Add an API key authenticator. Block requests without X-API-Key.
In one afternoon you've turned a Postgres table into a versioned, documented, queryable, secured public API. The framework leverage is real.