CSRF Tokens, Capturing Dynamically
POST endpoints often require a one-time CSRF token. Static capture breaks immediately; you must fetch the token, then use it, in a single flow.
What you’ll learn
- Identify CSRF protection on a POST endpoint.
- Capture a fresh CSRF token from the page or a /csrf endpoint.
- Submit it correctly via header or hidden form field.
- Handle token rotation per-request.
CSRF protection works like this: the server issues a random token, embeds it in the page, and requires it back on any state-changing POST. If a malicious site forges a POST from your browser, it cannot include the secret token, because the same-origin policy stops it from reading the page that carries the token. The server therefore rejects the request.
For scrapers, CSRF is rarely about security; it is an obstacle. Capture the token, send it back, done. The catch is that each token is often single-use or short-lived, so hard-coding one doesn't work.
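To make the mechanism concrete, here is a toy model of the server side. It is a sketch under assumptions, not how any particular framework stores tokens: the class name ToyCsrfServer and its methods are hypothetical, and tokens are treated as strictly single-use.

```python
import hmac
import secrets


class ToyCsrfServer:
    """Hypothetical in-memory model of single-use CSRF tokens (not a real framework API)."""

    def __init__(self):
        self._issued = set()  # tokens handed out and not yet consumed

    def issue_token(self):
        # Random, unguessable value that the server would embed in the page.
        token = secrets.token_urlsafe(32)
        self._issued.add(token)
        return token

    def accept_post(self, token):
        if token is None:
            return False  # no token sent at all
        # Constant-time comparison against every outstanding token.
        for known in list(self._issued):
            if hmac.compare_digest(known, token):
                self._issued.discard(known)  # single-use: consume on success
                return True
        return False


server = ToyCsrfServer()
token = server.issue_token()
print(server.accept_post(token))    # True: first use is accepted
print(server.accept_post(token))    # False: replaying the same token is rejected
print(server.accept_post("guess"))  # False: a forged value is rejected
```

The second call is the case that bites scrapers: a token copied from an earlier capture has already been consumed.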
How to spot CSRF protection
Signs in DevTools:
- A request header like X-CSRF-Token, X-XSRF-Token, X-CSRFToken, or csrf-token.
- A hidden form field: <input type="hidden" name="_csrf" value="...">, or name="authenticity_token" (Rails), or name="__RequestVerificationToken" (.NET).
- A meta tag in <head>: <meta name="csrf-token" content="...">.
- A dedicated endpoint: /csrf, /api/csrf-token, /sanctum/csrf-cookie (Laravel).
Captured cURL commands usually include the header; if you strip it, the POST typically returns 419, 403, or 422.
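The same check can be automated against a page's HTML. The sketch below uses only the standard library and looks for the meta tag and hidden field names mentioned in this lesson; the names CsrfFinder and find_csrf_tokens are hypothetical, and the name lists are assumptions you would extend per site.

```python
from html.parser import HTMLParser

# Names taken from this lesson; other frameworks may use different ones.
KNOWN_FIELD_NAMES = {
    "_csrf",
    "authenticity_token",
    "csrfmiddlewaretoken",
    "__RequestVerificationToken",
}
KNOWN_META_NAMES = {"csrf-token"}


class CsrfFinder(HTMLParser):
    """Collects (location, name, value) for every CSRF-looking tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        input_type = (a.get("type") or "").lower()
        if tag == "meta" and a.get("name") in KNOWN_META_NAMES:
            self.found.append(("meta", a["name"], a.get("content")))
        elif tag == "input" and input_type == "hidden" and a.get("name") in KNOWN_FIELD_NAMES:
            self.found.append(("hidden-field", a["name"], a.get("value")))


def find_csrf_tokens(html):
    finder = CsrfFinder()
    finder.feed(html)
    return finder.found


sample = """
<head><meta content="abc123" name="csrf-token"></head>
<form><input type="hidden" name="_csrf" value="xyz789"></form>
"""
for location, name, value in find_csrf_tokens(sample):
    print(location, name, value)
```

Because this parses attributes rather than matching a fixed string, it still finds the token when the attributes appear in a different order, as in the sample's meta tag.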
The pattern
1. GET the page (or /csrf endpoint) to capture the token.
2. POST with the token in the right place (header or body).
3. (Sometimes) repeat, because the token may rotate per request.
Catalog108 example
The /challenges/api/auth/csrf-form lab issues a CSRF token via a <meta name="csrf-token"> tag in the HTML and requires X-CSRF-Token: <token> on the form POST.
import re
import requests

s = requests.Session()

# 1. Fetch the form page; extract the token
r = s.get("https://practice.scrapingcentral.com/challenges/api/auth/csrf-form")
m = re.search(r'<meta name="csrf-token" content="([^"]+)"', r.text)
if m is None:
    raise RuntimeError("csrf-token meta tag not found")
token = m.group(1)

# 2. POST with the token in the header
r = s.post(
    "https://practice.scrapingcentral.com/challenges/api/auth/csrf-form",
    headers={"X-CSRF-Token": token},
    data={"name": "Scraper", "message": "Hello"},
)
print(r.status_code, r.json())
Use the same session for both requests: the token is sometimes paired with a cookie (XSRF-TOKEN) so the server can correlate the two.
PHP version
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

// One cookie jar shared by both requests
$jar = new CookieJar();
$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies' => $jar,
]);

// Fetch the form page and extract the token
$html = $client->get('/challenges/api/auth/csrf-form')->getBody()->getContents();
if (!preg_match('/<meta name="csrf-token" content="([^"]+)"/', $html, $m)) {
    throw new \RuntimeException('csrf-token meta tag not found');
}
$token = $m[1];

// POST with the token in the header
$res = $client->post('/challenges/api/auth/csrf-form', [
    'headers' => ['X-CSRF-Token' => $token],
    'form_params' => ['name' => 'Scraper', 'message' => 'Hello'],
]);
echo $res->getBody()->getContents();
Token-in-body pattern
Some servers expect the token in the POST body, not a header:
r = s.post(
    "...",
    data={
        "name": "Scraper",
        "message": "Hello",
        "_csrf": token,  # hidden form field
    },
)
The actual field name varies: _csrf, authenticity_token, csrfmiddlewaretoken (Django), __RequestVerificationToken (ASP.NET). Check the form's HTML.
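One way to avoid guessing the field name is to copy every hidden input from the page into the POST body, which is what a browser does when it submits the form. The sketch below is a swapped-in convenience technique, not something the lab requires; HiddenFieldCollector and build_form_body are hypothetical names, and it assumes the page contains only one form (it does not separate fields by form).

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Gathers name -> value for every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        input_type = (a.get("type") or "").lower()
        if tag == "input" and input_type == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def build_form_body(html, payload):
    collector = HiddenFieldCollector()
    collector.feed(html)
    # Hidden fields first, so explicit payload values win on a name clash.
    return {**collector.fields, **payload}


page = '<form><input type="hidden" name="authenticity_token" value="tok-1"><input name="message"></form>'
print(build_form_body(page, {"name": "Scraper", "message": "Hello"}))
```

The resulting dict can be passed as data= to the POST, whatever the token field happens to be called.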
The Double-Submit Cookie pattern
Some sites issue the token as a cookie (XSRF-TOKEN) and also require it back as a header (X-XSRF-TOKEN). The server compares the two, and they must match.
s.get("https://example.com/")  # sets XSRF-TOKEN cookie
token = s.cookies.get("XSRF-TOKEN")
r = s.post(
    "https://example.com/api/action",
    headers={"X-XSRF-TOKEN": token},
    json={"...": "..."},
)
This is the Laravel/Angular default. Session() stores and resends the cookie automatically; your only extra step is reading the cookie value and copying it into the header.
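A common snag is that the cookie value may be percent-encoded (for example a trailing = stored as %3D), while the server expects the decoded value in the header. The helper below is a minimal sketch under that assumption; xsrf_headers is a hypothetical name, and whether decoding is needed depends on the site, so compare against what the browser sends in DevTools.

```python
from urllib.parse import unquote


def xsrf_headers(cookie_value, header_name="X-XSRF-TOKEN"):
    """Build the header dict for a double-submit POST from the raw cookie value."""
    if not cookie_value:
        # No cookie yet: a GET must happen first so the server can set it.
        raise ValueError("XSRF-TOKEN cookie not set; GET a page first")
    # unquote() leaves values without percent-escapes unchanged.
    return {header_name: unquote(cookie_value)}


print(xsrf_headers("eyJpdiI6abc%3D"))
```

With requests, the call would look like headers=xsrf_headers(s.cookies.get("XSRF-TOKEN")).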
SPA-style: fetch the token via endpoint
Modern SPAs often have a dedicated GET:
csrf = s.get("https://example.com/api/csrf").json()["token"]
r = s.post("...", headers={"X-CSRF-Token": csrf}, json=payload)
Laravel Sanctum: hit /sanctum/csrf-cookie first. It sets the XSRF-TOKEN cookie, which you then send as X-XSRF-TOKEN on POSTs.
Token rotation, single-use tokens
The trickiest case: each token is consumed on use. A second POST with the same token fails. Scrapers must fetch a fresh token before every POST:
def post_with_csrf(s, url, **kw):
    # Fetch the page again so every POST gets its own token
    page = s.get(url)
    m = re.search(r'name="_csrf"\s+value="([^"]+)"', page.text)
    if m is None:
        raise RuntimeError("_csrf field not found")
    return s.post(url, headers={"X-CSRF-Token": m.group(1)}, **kw)

for record in records:
    post_with_csrf(s, "https://example.com/submit", json=record)
Yes, this means an extra GET per POST. The site's frontend does the same thing. Live with it.
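Even with a fresh token per POST, a short-lived token can occasionally be rejected. A small retry wrapper is one way to handle that. This is a sketch, not part of the lab: post_with_retry and its two callables are hypothetical, and the set of status codes treated as a CSRF rejection is an assumption to adjust per site.

```python
CSRF_FAILURE_CODES = {403, 419}  # assumption: statuses this site uses for a rejected token


def post_with_retry(fetch_token, send, max_attempts=2):
    """fetch_token() returns a token string; send(token) returns an object with .status_code."""
    response = None
    for _ in range(max_attempts):
        token = fetch_token()   # always a fresh token, never a cached one
        response = send(token)
        if response.status_code not in CSRF_FAILURE_CODES:
            break               # success or a non-CSRF error: stop retrying
    return response


# Demo with stand-in callables: the first token is rejected, the second accepted.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


tokens = iter(["stale", "fresh"])
result = post_with_retry(
    fetch_token=lambda: next(tokens),
    send=lambda t: FakeResponse(200 if t == "fresh" else 419),
)
print(result.status_code)  # 200
```

With requests, fetch_token would GET the page and extract the token, and send would call s.post with that token in the header or body.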
Wrapping CSRF in a client class
import re
import requests


class CsrfClient:
    BASE = "https://practice.scrapingcentral.com"

    def __init__(self):
        # One session so cookies persist between the GET and the POST
        self.s = requests.Session()

    def _fresh_token(self, page_path):
        r = self.s.get(f"{self.BASE}{page_path}")
        m = re.search(r'<meta name="csrf-token" content="([^"]+)"', r.text)
        if m is None:
            raise RuntimeError("csrf-token meta tag not found")
        return m.group(1)

    def submit_form(self, payload):
        # Fetch a new token for every submission
        token = self._fresh_token("/challenges/api/auth/csrf-form")
        r = self.s.post(
            f"{self.BASE}/challenges/api/auth/csrf-form",
            headers={"X-CSRF-Token": token},
            data=payload,
        )
        r.raise_for_status()
        return r.json()
Errors and what they mean
- 419 Page Expired (Laravel): CSRF token missing or invalid.
- 422 Unprocessable Entity: form validation failed, possibly including CSRF.
- 403 Forbidden with "CSRF verification failed": Django.
- 400 Bad Request with "invalid token": generic.
Always check the response body; servers usually say what failed.
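The list above can be turned into a small triage helper for scraper logs. This is a rough sketch: diagnose_csrf_failure is a hypothetical name, and the keyword matching on the body is a heuristic assumption, not something any framework guarantees.

```python
def diagnose_csrf_failure(status_code, body_text):
    """Return a short human-readable guess at why a POST failed."""
    text = (body_text or "").lower()
    mentions_token = "csrf" in text or "token" in text
    if status_code == 419:
        return "likely CSRF: token missing or invalid (Laravel-style 419)"
    if status_code == 403 and mentions_token:
        return "likely CSRF: verification failed (Django-style 403)"
    if status_code == 400 and mentions_token:
        return "likely CSRF: invalid token (generic 400)"
    if status_code == 422:
        return "validation error: check the body, CSRF may be one of the failures"
    return "probably not CSRF: inspect the response body"


print(diagnose_csrf_failure(403, "CSRF verification failed. Request aborted."))
print(diagnose_csrf_failure(403, "You do not have permission."))
```

A plain 403 without any mention of a token is deliberately not labelled as CSRF, since it is more often an authentication or blocking issue.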
Hands-on lab
Open /challenges/api/auth/csrf-form in your browser. Inspect the POST in DevTools: note the X-CSRF-Token header and find where the token lives in the HTML (meta tag, hidden field, or both). Write a scraper that GETs the page, extracts the token, then POSTs the form. Confirm it works, then try POSTing twice with the same token; most likely the second attempt fails. Adjust your scraper to fetch a fresh token each time.
Practice this lesson on Catalog108, our first-party scraping sandbox.