Tutorial
Web Scraping with Rust - Fast and Efficient
Learn web scraping with Rust using reqwest and scraper crates. Covers async scraping, HTML parsing, and building high-performance scrapers.
Rust offers the best performance for CPU-intensive scraping tasks. With zero-cost abstractions and no garbage collector, Rust scrapers process data faster and use less memory than any alternative.
Setup
Add dependencies to Cargo.toml:
[dependencies]
reqwest = { version = "0.12", features = ["json", "cookies"] }
scraper = "0.21"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
Basic Scraper
use reqwest;
use scraper::{Html, Selector};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let response = reqwest::get("https://example.com")
.await?
.text()
.await?;
let document = Html::parse_document(&response);
let title_selector = Selector::parse("h1").unwrap();
for element in document.select(&title_selector) {
println!("Title: {}", element.text().collect::<String>());
}
Ok(())
}
Scraping with Structured Output
use serde::Serialize;
use scraper::{Html, Selector};
#[derive(Serialize, Debug)]
struct Product {
name: String,
price: String,
url: String,
}
async fn scrape_products(html: &str) -> Vec<Product> {
let document = Html::parse_document(html);
let card_selector = Selector::parse(".product-card").unwrap();
let name_selector = Selector::parse(".product-name").unwrap();
let price_selector = Selector::parse(".product-price").unwrap();
let link_selector = Selector::parse("a").unwrap();
let mut products = Vec::new();
for card in document.select(&card_selector) {
let name = card.select(&name_selector)
.next()
.map(|e| e.text().collect::<String>())
.unwrap_or_default();
let price = card.select(&price_selector)
.next()
.map(|e| e.text().collect::<String>())
.unwrap_or_default();
let url = card.select(&link_selector)
.next()
.and_then(|e| e.value().attr("href"))
.unwrap_or_default()
.to_string();
products.push(Product { name, price, url });
}
products
}
Concurrent Scraping
use futures::stream::{self, StreamExt};
use reqwest::Client;
async fn scrape_urls(urls: Vec<String>, concurrency: usize) -> Vec<String> {
let client = Client::new();
let results: Vec<String> = stream::iter(urls)
.map(|url| {
let client = client.clone();
async move {
match client.get(&url).send().await {
Ok(resp) => resp.text().await.unwrap_or_default(),
Err(e) => {
eprintln!("Error fetching {}: {}", url, e);
String::new()
}
}
}
})
.buffer_unordered(concurrency)
.collect()
.await;
results
}
#[tokio::main]
async fn main() {
let urls: Vec<String> = (1..=100)
.map(|i| format!("https://example.com/page/{}", i))
.collect();
let pages = scrape_urls(urls, 10).await;
println!("Scraped {} pages", pages.len());
}
Using with ScraperAPI
use reqwest::Client;
async fn scrape_with_api(url: &str) -> Result<String, reqwest::Error> {
let client = Client::new();
let api_key = "YOUR_SCRAPERAPI_KEY";
let response = client.get("http://api.scraperapi.com")
.query(&[
("api_key", api_key),
("url", url)
])
.send()
.await?
.text()
.await?;
Ok(response)
}
Performance Comparison
| Language | 10K pages parse time | Memory usage |
|---|---|---|
| Rust | ~2 seconds | ~50 MB |
| Go | ~5 seconds | ~100 MB |
| Python | ~30 seconds | ~300 MB |
| Node.js | ~15 seconds | ~200 MB |
HTML parsing benchmarks on a standard VPS. Network time excluded.
When to Choose Rust for Scraping
- You are processing millions of pages and need maximum throughput
- Memory is constrained (edge computing, small VPS)
- You are building a long-running scraping service
- The scraping logic is stable and will not change frequently
For rapid prototyping or sites requiring browser rendering, Python with ScraperAPI is more practical. Use Rust when raw performance justifies the development time.