How to Scrape Mobile App Data (API Reverse Engineering)
Learn how to reverse engineer mobile app APIs to extract data. Covers traffic interception with mitmproxy, API analysis, and Python implementation.
Many valuable datasets are only accessible through mobile apps. By intercepting and reverse engineering their API calls, you can extract this data programmatically.
The Process Overview
- Intercept traffic between the mobile app and its servers
- Analyze the API endpoints, authentication, and data format
- Reproduce the requests in Python
Step 1: Set Up mitmproxy
mitmproxy is a widely used open-source proxy for intercepting and inspecting mobile app traffic.
pip install mitmproxy
mitmproxy --listen-port 8080
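Configure your phone to use your computer as a proxy (both devices must be on the same network):
- In the phone's Wi-Fi settings, set the HTTP proxy to your computer's IP address, port 8080
- With the proxy active, open http://mitm.it in the phone's browser and install the mitmproxy CA certificate
- On iOS, also enable full trust for the certificate under Settings > General > About > Certificate Trust Settings. On Android 7 and later, most apps ignore user-installed CAs by default, so some traffic may stay unreadable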
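To fill in the proxy setting you need the computer's address on the local network. A minimal sketch for looking it up, assuming an IPv4 network; the function name `get_lan_ip` and the probe address are illustrative choices, not part of mitmproxy:

```python
import socket


def get_lan_ip(probe_host: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the local IPv4 address the OS would use for outbound traffic.

    Connecting a UDP socket sends no packets. It only asks the OS to pick
    a route, which reveals the address of the active network interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    ip = get_lan_ip()
    print(f"Set the phone's HTTP proxy to {ip}, port 8080")
```

If the computer has several interfaces (VPN, virtual machines), the reported address may not be the one on the phone's Wi-Fi network, so check it against your system's network settings.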
Step 2: Capture and Analyze API Calls
Open the target app and perform the actions you want to scrape. mitmproxy records every request that passes through the proxy and decrypts HTTPS traffic from apps that trust its CA certificate.
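# save_requests.py
# mitmproxy addon that appends matching requests to a JSON Lines file
import json

from mitmproxy import http


def response(flow: http.HTTPFlow) -> None:
    # Match on the host rather than the full URL to avoid false positives
    if "api.targetapp.com" in flow.request.pretty_host:
        # get_text(strict=False) tolerates bodies that fail to decode;
        # it returns None when there is no body
        response_text = flow.response.get_text(strict=False) or ""
        data = {
            "url": flow.request.url,
            "method": flow.request.method,
            "headers": dict(flow.request.headers),
            "request_body": flow.request.get_text(strict=False),
            "response_status": flow.response.status_code,
            "response_body": response_text[:2000],  # truncate large bodies
        }
        with open("captured_apis.jsonl", "a") as f:
            f.write(json.dumps(data) + "\n")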
Run it: mitmproxy -s save_requests.py
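After a capture session, a quick summary of captured_apis.jsonl helps with the analysis step: which endpoints the app calls, how often, and which headers it sends (a hint at authentication and signing). The sketch below assumes the record layout written by save_requests.py; `summarize_capture` is a hypothetical helper name, and the sample records are made up for illustration.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_capture(lines):
    """Count calls per (method, path) and collect the header names seen."""
    endpoints = Counter()
    header_names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Drop the query string so paginated calls group under one endpoint
        path = urlsplit(record["url"]).path
        endpoints[(record["method"], path)] += 1
        header_names.update(name.lower() for name in record["headers"])
    return endpoints, sorted(header_names)


# Made-up sample records in the same shape as captured_apis.jsonl
sample_lines = [
    json.dumps({
        "url": "https://api.targetapp.com/v2/listings?page=1",
        "method": "GET",
        "headers": {"Authorization": "Bearer x", "X-App-Version": "3.5.1"},
    }),
    json.dumps({
        "url": "https://api.targetapp.com/v2/listings?page=2",
        "method": "GET",
        "headers": {"Authorization": "Bearer x", "X-App-Version": "3.5.1"},
    }),
    json.dumps({
        "url": "https://api.targetapp.com/v2/listings/42",
        "method": "GET",
        "headers": {"Authorization": "Bearer x"},
    }),
]

endpoints, header_names = summarize_capture(sample_lines)
for (method, path), count in endpoints.most_common():
    print(f"{count:4d}  {method:6s} {path}")
print("Headers seen:", ", ".join(header_names))
```

For a real capture, pass the open file instead of the sample list, for example `summarize_capture(open("captured_apis.jsonl"))`.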
Step 3: Reproduce in Python
Once you have identified the endpoints, headers, and parameters the app uses, replay the same requests from Python with the requests library.
import hashlib  # used by the request-signing helper under "Request Signing"

import requests


class MobileAppScraper:
    def __init__(self):
        self.session = requests.Session()
        self.base_url = "https://api.targetapp.com/v2"
        # Headers captured from mitmproxy
        self.session.headers.update({
            "User-Agent": "TargetApp/3.5.1 (iPhone; iOS 17.4)",
            "X-App-Version": "3.5.1",
            "X-Device-Id": "unique-device-id",
            "Accept": "application/json",
            "Authorization": "Bearer YOUR_TOKEN",
        })

    def get_listings(self, page=1):
        """Fetch one page of listings."""
        response = self.session.get(
            f"{self.base_url}/listings",
            params={"page": page, "per_page": 50},
            timeout=30,
        )
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        return response.json()

    def get_details(self, item_id):
        """Fetch the detail record for a single listing."""
        response = self.session.get(
            f"{self.base_url}/listings/{item_id}", timeout=30
        )
        response.raise_for_status()
        return response.json()


scraper = MobileAppScraper()
listings = scraper.get_listings(page=1)
print(f"Found {len(listings.get('data', []))} listings")
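Fetching a single page is rarely enough. The sketch below walks through pages until one comes back empty, pausing between requests. It assumes the response shape used above (a "data" list per page) and that an empty list marks the end; the real API may signal the last page differently. `iter_listings` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the network call so the example runs offline.

```python
import time
from typing import Callable, Iterator


def iter_listings(
    fetch_page: Callable[[int], dict],
    max_pages: int = 100,
    delay: float = 1.0,
) -> Iterator[dict]:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        items = payload.get("data", [])
        if not items:
            break  # assumed end-of-data signal: an empty "data" list
        yield from items
        time.sleep(delay)  # pause between pages to keep the request rate low


def fake_fetch(page: int) -> dict:
    """Stand-in for a real page fetch, returning made-up data."""
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return {"data": pages.get(page, [])}


items = list(iter_listings(fake_fetch, delay=0))
print(f"Collected {len(items)} items")
```

With the class above, the same loop would be driven by `iter_listings(scraper.get_listings)`. The `max_pages` cap guards against an endpoint that never returns an empty page.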
Handling Common Protections
Request Signing
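Many apps sign each request with an HMAC or a custom hash so the server can reject modified or replayed calls. To reproduce the signature you need the same inputs the app uses, typically the path, a timestamp, and a secret embedded in the app.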
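import hashlib


def sign_request(path, timestamp, secret):
    """Reproduce the app's request signing logic."""
    # Example scheme: SHA-256 over path + timestamp + secret.
    # The real inputs and their order must match what the app does.
    message = f"{path}{timestamp}"
    signature = hashlib.sha256(f"{message}{secret}".encode()).hexdigest()
    return signature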
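When the app uses a true HMAC rather than a plain hash over concatenated strings, Python's standard hmac module covers it. This is a sketch under assumptions: the message layout (method, path, and timestamp joined by newlines), the SHA-256 digest, and the secret are all hypothetical and must be replaced with whatever the target app actually does.

```python
import hashlib
import hmac


def sign_request_hmac(method: str, path: str, timestamp: int, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a hypothetical message layout."""
    message = f"{method.upper()}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def signature_matches(candidate: str, method: str, path: str,
                      timestamp: int, secret: bytes) -> bool:
    """Compare against a captured signature using a constant-time check."""
    expected = sign_request_hmac(method, path, timestamp, secret)
    return hmac.compare_digest(candidate, expected)


# Placeholder values for illustration only
secret = b"hypothetical-secret"
signature = sign_request_hmac("get", "/v2/listings", 1700000000, secret)
print(len(signature), signature_matches(signature, "GET", "/v2/listings", 1700000000, secret))
```

A practical way to validate a guessed scheme is to feed `signature_matches` a signature and timestamp captured with mitmproxy: if it returns True, the layout and secret are right. The signature is then usually sent in a request header whose name you read from the capture.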
Certificate Pinning
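Apps that pin their server's certificate reject mitmproxy's certificate even when its CA is installed on the device, so the connection fails and nothing is captured. Common workarounds: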
- Use Frida to bypass pinning at runtime
- Use an older app version without pinning
- Use an Android emulator with a patched system image
Alternative: ScraperAPI for Web Versions
Many mobile apps have web counterparts. When available, scraping the web version with ScraperAPI is simpler than reverse engineering the mobile API.
Key Takeaways
- mitmproxy shows exactly which endpoints, headers, and payloads a mobile app uses
- Most mobile APIs return clean JSON, which is easier to parse than HTML
- Watch for request signing, token refresh, and certificate pinning
- Always check if a web version or official API exists first