Form Submission with CSRF Tokens, Static Scraping

Many forms carry a hidden CSRF token, and the server rejects submissions that lack it. Fetch the form, extract the token, and submit it back along with your real fields: that is the canonical scraper pattern.

A CSRF (Cross-Site Request Forgery) token is a random per-session value the server embeds in a form. To submit the form, the client must include it. Browsers do this automatically because they fetched the form first. Scrapers must replicate the flow.

How to spot one

In DevTools, on a form page, look for a hidden input like:

<form method="post" action="/submit">
  <input type="hidden" name="csrf_token" value="a1b2c3d4...">
  <input type="hidden" name="_csrf" value="...">
  <input type="text" name="username">
  ...
</form>

Common field names: csrf_token, _csrf, _token (Laravel), csrfmiddlewaretoken (Django), __RequestVerificationToken (.NET).

Often, the token is ALSO set in a cookie (XSRF-TOKEN) and sometimes in a meta tag (<meta name="csrf-token">). In the double-submit pattern the client sends the token twice, once in a cookie and once in the form body or a request header, and the server checks that the two copies match.
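To make the spotting step concrete, here is a minimal sketch that scans a page for the field names and the meta tag mentioned above. It uses the standard library's html.parser rather than BeautifulSoup so it runs without extra installs; TokenFinder, find_csrf_tokens and the sample HTML are illustrative names and data, not part of any library or of the practice site.

```python
from html.parser import HTMLParser

# Field names listed above; extend this set for the site you are working with.
COMMON_NAMES = {
    "csrf_token", "_csrf", "_token",
    "csrfmiddlewaretoken", "__RequestVerificationToken",
}

class TokenFinder(HTMLParser):
    """Collect candidate CSRF tokens from inputs and meta tags."""

    def __init__(self):
        super().__init__()
        self.found = {}  # "input:<name>" or "meta:<name>" -> token value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("name") or ""
        if tag == "input" and name in COMMON_NAMES:
            self.found["input:" + name] = a.get("value") or ""
        elif tag == "meta" and name == "csrf-token":
            self.found["meta:" + name] = a.get("content") or ""

def find_csrf_tokens(html):
    finder = TokenFinder()
    finder.feed(html)
    return finder.found

# Made-up sample page for illustration
sample = """
<meta name="csrf-token" content="m3ta">
<form method="post" action="/submit">
  <input type="hidden" name="csrf_token" value="a1b2c3d4">
  <input type="text" name="username">
</form>
"""
print(find_csrf_tokens(sample))
```

An empty result suggests the token lives somewhere else (a cookie, or a field name not in the set), which is the cue to go back to DevTools.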

The flow

1. GET the form page  → server sets Set-Cookie: csrf=ABC
  → response HTML contains <input name=csrf value=ABC>
2. POST with same csrf value → server validates body token == cookie token → accept

Skip step 1 and you have no token to send. Reuse the same Session/Client/Browser so the cookie carries.
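It can help to see the same two steps from the server's side. The sketch below is a toy model, not how any particular framework implements it: issue_token and is_valid are hypothetical helpers that mimic "mint a token on GET, compare cookie and body on POST".

```python
import hmac
import secrets

def issue_token():
    """Step 1 (server side): mint a random token for the cookie and the hidden field."""
    return secrets.token_hex(16)

def is_valid(cookie_token, body_token):
    """Step 2 (server side): accept only when both copies are present and equal."""
    if not cookie_token or not body_token:
        return False
    # Constant-time comparison, as a careful server would do
    return hmac.compare_digest(cookie_token, body_token)

token = issue_token()                    # what a GET of the form would hand out
print(is_valid(token, token))            # True: client echoed the token
print(is_valid(token, None))             # False: client skipped step 1
print(is_valid(token, "guessed-value"))  # False: client invented a token
```

The scraper's whole job is to end up in the first case: same token in the cookie jar and in the POST body.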

Python: full implementation

import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers["User-Agent"] = "Mozilla/5.0 ..."

# 1. GET the form
r = s.get("https://practice.scrapingcentral.com/challenges/static/forms/csrf")
soup = BeautifulSoup(r.content, "lxml")

# 2. Extract token
token_input = soup.select_one('input[name="csrf_token"]')
if token_input is None:
  raise RuntimeError("No csrf_token input found; check the field name")
token = token_input["value"]
print("Token:", token)

# 3. Submit
r = s.post(
  "https://practice.scrapingcentral.com/challenges/static/forms/csrf",
  data={
    "csrf_token": token,
    "username": "alice",
    "color": "blue",
  },
)
print(r.status_code, r.text[:300])

Three things to verify:

The form's action URL: sometimes different from the page URL. Get it from <form action="...">.
The form's method: POST (almost always, for CSRF-protected forms).
The hidden field's exact name: csrf_token vs _csrf vs whatever the site uses.
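The action attribute is often relative or empty, so it needs resolving against the page URL before you POST. A minimal sketch using the standard library's urljoin; resolve_action is a hypothetical helper, and the "/submit" and "confirm" values are made-up examples.

```python
from urllib.parse import urljoin

page_url = "https://practice.scrapingcentral.com/challenges/static/forms/csrf"

def resolve_action(page_url, action):
    """Return the absolute URL a form submission should go to."""
    # An empty or missing action means "submit to the page's own URL".
    return urljoin(page_url, action or "")

print(resolve_action(page_url, ""))         # same as page_url
print(resolve_action(page_url, "/submit"))  # path from the site root
print(resolve_action(page_url, "confirm"))  # path relative to the page's directory
```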

A more robust helper

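def extract_form(soup, form_selector="form"):
  """Return (action, method, fields_dict) from a form in the page."""
  form = soup.select_one(form_selector)
  if not form:
    raise ValueError(f"No form matching {form_selector}")
  action = form.get("action", "")
  method = form.get("method", "get").lower()
  fields = {}
  for inp in form.select("input, select, textarea"):
    name = inp.get("name")
    if not name:
      continue  # unnamed controls are never submitted
    if inp.name == "textarea":
      fields[name] = inp.get_text()
    elif inp.name == "select":
      # use the pre-selected option, else the first one
      opt = inp.select_one("option[selected]") or inp.select_one("option")
      fields[name] = opt.get("value", opt.get_text()) if opt else ""
    else:
      fields[name] = inp.get("value", "")
  return action, method, fields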

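# Usage
from urllib.parse import urljoin

action, method, fields = extract_form(soup)
fields["username"] = "alice"
fields["color"] = "blue"
# CSRF tokens, hidden state, everything else is already in `fields`
# action may be relative or empty, so resolve it against the page URL
r = s.post(urljoin(r.url, action), data=fields)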

This pattern handles dozens of hidden state fields without enumerating them. Especially useful for legacy ASP.NET sites with their giant __VIEWSTATE and __EVENTVALIDATION blobs.

PHP: BrowserKit does it for you

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());
$crawler = $browser->request('GET', 'https://practice.scrapingcentral.com/challenges/static/forms/csrf');

$form = $crawler->selectButton('Submit')->form([
  'username' => 'alice',
  'color'  => 'blue',
]);
// csrf_token is automatically populated with its existing value
$browser->submit($form);

echo $browser->getResponse()->getStatusCode();

This is why BrowserKit is great for stateful flows: the CSRF token handling is invisible. ->form([...]) captures every field, including hidden ones; you only override the visible ones.

When the token is in a cookie or header

Some sites use the double-submit pattern: the token is in a cookie, and you must send it back as a HEADER (often X-CSRF-Token or X-XSRF-TOKEN) rather than as a form field.

# 1. Get the token from the cookie
s.get("https://practice.scrapingcentral.com/...")
token = s.cookies.get("XSRF-TOKEN")

# 2. Send it as a header on the POST
s.post(url, json=payload, headers={"X-XSRF-TOKEN": token})

Inspect DevTools to see which pattern the site uses: body field, cookie, header, or all three.
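One detail that can trip up the cookie-to-header variant: some frameworks percent-encode the cookie value, and the site's own JavaScript decodes it before putting it in the header. Whether that applies depends on the site, so treat the sketch below as an assumption to check against a real browser request. csrf_headers is a hypothetical helper; it accepts any mapping with a .get() method, which includes a requests session's cookie jar.

```python
from urllib.parse import unquote

def csrf_headers(cookies, cookie_name="XSRF-TOKEN", header_name="X-XSRF-TOKEN"):
    """Build the CSRF header from a cookie mapping, or return {} if the cookie is absent."""
    raw = cookies.get(cookie_name)
    if raw is None:
        return {}
    # Decode percent-escapes such as %3D; a value without escapes passes through unchanged.
    return {header_name: unquote(raw)}

print(csrf_headers({"XSRF-TOKEN": "abc%3D%3D"}))  # {'X-XSRF-TOKEN': 'abc=='}
print(csrf_headers({}))                           # {}
```

With a session this would be used as headers=csrf_headers(s.cookies) on the POST. If the server rejects the decoded value, compare against the header the browser actually sends.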

When the token rotates

Some sites issue a new token after every form submission. The previous token becomes invalid. For multi-step flows (e.g. checkout: cart → address → payment), re-extract the token from each response:

def get_csrf(soup):
  inp = soup.select_one('input[name="csrf_token"]')
  return inp["value"] if inp else None

# checkout_url and address_url stand for the site's real step URLs
r = s.get(checkout_url)
soup = BeautifulSoup(r.content, "lxml")
token = get_csrf(soup)

r = s.post(checkout_url, data={"csrf_token": token})  # plus this step's own fields
soup = BeautifulSoup(r.content, "lxml")
token = get_csrf(soup)  # NEW token for the next step

r = s.post(address_url, data={"csrf_token": token})  # plus this step's own fields
# ... and so on

Forget to re-extract and you get "CSRF validation failed" on step 2.
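To see why a stale token fails, here is a toy in-memory model of a rotating-token server. RotatingTokenServer is a made-up stand-in for illustration; real sites differ in when and how they rotate.

```python
import secrets

class RotatingTokenServer:
    """Toy stand-in for a site that issues a fresh token after every accepted POST."""

    def __init__(self):
        self.current = secrets.token_hex(8)

    def get_form(self):
        # The token a GET of the form page would embed
        return self.current

    def post(self, token):
        if token != self.current:
            return 403, None          # stale or missing token: rejected
        self.current = secrets.token_hex(8)
        return 200, self.current      # accepted: response carries the next token

server = RotatingTokenServer()
first = server.get_form()
status, second = server.post(first)
print(status)                  # 200
print(server.post(first)[0])   # 403: the first token is now stale
print(server.post(second)[0])  # 200: the re-extracted token works
```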

Common mistakes

Submitting to the page URL instead of the form's action. Always read the form's action attribute.
Using data= when the server wants JSON. Inspect the browser's submission Content-Type.
Reusing a stale token. Re-extract per step in multi-step flows.
Wrong cookie scope. If the form is on app.example.com but the cookie is set on auth.example.com, your scraper Session may not include it. Visit the cookie's origin first.
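The data= versus JSON mistake is easier to spot when you look at the bytes each option puts on the wire. A small sketch with the standard library only; the field values are made up, and requests' exact JSON spacing may differ slightly from json.dumps.

```python
import json
from urllib.parse import urlencode

fields = {"csrf_token": "a1b2c3d4", "username": "alice"}

form_body = urlencode(fields)   # roughly what data=fields sends
json_body = json.dumps(fields)  # roughly what json=fields sends

print("application/x-www-form-urlencoded:", form_body)
print("application/json:", json_body)
```

A server that parses one format will typically see no csrf_token at all in the other, which shows up as a CSRF failure rather than an obvious content-type error.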

Why CSRF exists (briefly)

The CSRF token defends against an attacker getting a logged-in user's browser to submit a forged form. The attacker can't read the token (same-origin policy), so they can't forge a valid submission. A scraper follows the same flow as the user's browser, so you DO have access to the token. CSRF was never meant to block legitimate automation; it incidentally blocks lazy bots that skip step 1.

Hands-on lab

Hit /challenges/static/forms/csrf. Try submitting the form without a token first and confirm you get rejected (likely 403 or a "missing CSRF" error page). Then fetch the form, extract the hidden token, and submit with it included. Confirm the success response. Finally, try submitting the SAME token twice: does the site rotate tokens, or accept the same one twice?

Quiz: check your understanding

What's the canonical flow for submitting a CSRF-protected form from a scraper?