How to Scrape Mobile App Data (API Reverse Engineering)

Learn how to reverse engineer mobile app APIs to extract data. Covers traffic interception with mitmproxy, API analysis, and Python implementation.

Many valuable datasets are only accessible through mobile apps. By intercepting and reverse engineering their API calls, you can extract this data programmatically.

The Process Overview

  1. Intercept traffic between the mobile app and its servers
  2. Analyze the API endpoints, authentication, and data format
  3. Reproduce the requests in Python

Step 1: Set Up mitmproxy

mitmproxy is a widely used open-source proxy for intercepting and inspecting HTTP and HTTPS traffic, including traffic from mobile apps.

pip install mitmproxy
mitmproxy --listen-port 8080

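Configure your phone to use your computer as a proxy. Both devices must be on the same network:

  • In the phone's Wi-Fi settings, set the HTTP proxy to your computer's IP address, port 8080
  • With the proxy active, open http://mitm.it in the phone's browser and install the mitmproxy CA certificate
  • On iOS, also enable full trust for the certificate under Settings > General > About > Certificate Trust Settings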
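
The proxy address you enter on the phone is your computer's address on the local network. If you are not sure what that is, a small sketch like the following can print a best guess. The helper name local_ip and the PROXY_PORT constant are illustrative, and on machines with several network interfaces the result may not be the one your phone can reach.

```python
import socket

PROXY_PORT = 8080  # must match the --listen-port value passed to mitmproxy


def local_ip() -> str:
    """Return a best-effort guess at this machine's LAN address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose an outgoing interface, whose address we then read back.
        sock.connect(("192.0.2.1", 80))  # documentation-only address, never contacted
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, for example when offline
    finally:
        sock.close()


print(f"Set the phone's HTTP proxy to {local_ip()}:{PROXY_PORT}")
```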

Step 2: Capture and Analyze API Calls

Open the target app and perform the actions whose data you want to scrape. mitmproxy records the HTTP and HTTPS requests that pass through the proxy, provided the app trusts the mitmproxy certificate.

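# save_requests.py
# mitmproxy addon that appends matching request/response pairs to a JSONL file
import json

from mitmproxy import http

TARGET_HOST = "api.targetapp.com"  # placeholder: replace with the app's API host


def response(flow: http.HTTPFlow) -> None:
    # Match on the host only, so the name appearing in a query string is ignored
    if TARGET_HOST not in flow.request.pretty_host:
        return
    # get_text(strict=False) decodes leniently and may return None if no body is available
    request_body = flow.request.get_text(strict=False)
    response_body = flow.response.get_text(strict=False) or ""
    data = {
        "url": flow.request.url,
        "method": flow.request.method,
        "headers": dict(flow.request.headers),
        "request_body": request_body,
        "response_status": flow.response.status_code,
        "response_body": response_body[:2000],  # truncate large payloads
    }
    with open("captured_apis.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(data) + "\n")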

Run it: mitmproxy -s save_requests.py
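
Once captured_apis.jsonl has some entries, a quick summary makes the analysis step easier. The sketch below counts captured calls per method and path, relying on the "url" and "method" fields written by the addon above. Collapsing numeric path segments into {id} is an assumption about how the API builds its URLs, and summarize_capture is an illustrative name.

```python
import json
import re
from collections import Counter
from urllib.parse import urlsplit


def summarize_capture(path: str) -> Counter:
    """Count captured calls per (method, normalized URL path)."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            url_path = urlsplit(entry["url"]).path  # drops host and query string
            # Collapse numeric segments so /listings/17 and /listings/42 group together
            template = re.sub(r"/\d+(?=/|$)", "/{id}", url_path)
            counts[(entry["method"], template)] += 1
    return counts
```

Calling summarize_capture("captured_apis.jsonl").most_common() lists the most frequently hit endpoints first, which is usually a reasonable place to start reading request and response bodies.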

Step 3: Reproduce in Python

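Once you have identified the endpoints, headers, and parameters the app uses, reproduce the requests with the requests library. The host, paths, and header values below are placeholders for whatever you captured.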

import requests
import hashlib
import time

class MobileAppScraper:
    def __init__(self):
        self.session = requests.Session()
        self.base_url = "https://api.targetapp.com/v2"
        # Headers captured from mitmproxy
        self.session.headers.update({
            "User-Agent": "TargetApp/3.5.1 (iPhone; iOS 17.4)",
            "X-App-Version": "3.5.1",
            "X-Device-Id": "unique-device-id",
            "Accept": "application/json",
            "Authorization": "Bearer YOUR_TOKEN"
        })

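    def get_listings(self, page=1):
        response = self.session.get(
            f"{self.base_url}/listings",
            params={"page": page, "per_page": 50},
            timeout=30,  # avoid hanging forever on a stalled connection
        )
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        return response.json()

    def get_details(self, item_id):
        response = self.session.get(f"{self.base_url}/listings/{item_id}", timeout=30)
        response.raise_for_status()
        return response.json()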

scraper = MobileAppScraper()
listings = scraper.get_listings(page=1)
print(f"Found {len(listings.get('data', []))} listings")
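
Fetching a single page is rarely enough. A minimal sketch of walking every page could look like this, assuming the API returns its items under a "data" key (as the example above does) and signals the end with an empty list. Real APIs may use cursors or a total count instead, so check the captured responses first. The name iter_listings and its parameters are illustrative.

```python
import time
from typing import Callable, Iterator


def iter_listings(
    fetch_page: Callable[[int], dict],
    max_pages: int = 100,
    delay: float = 1.0,
) -> Iterator[dict]:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        items = payload.get("data", [])  # assumed response shape
        if not items:
            break  # an empty page is treated as the end of the results
        yield from items
        time.sleep(delay)  # pause between pages to keep the request rate low
```

With the class above, this would be used as: for item in iter_listings(scraper.get_listings): ... Passing the fetch function in as an argument keeps the loop independent of the HTTP client, which also makes it easy to try out against canned responses.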

Handling Common Protections

Request Signing

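Many apps sign each request, typically with an HMAC or a custom hash over fields such as the path and a timestamp. The exact recipe and the secret are specific to each app and have to be recovered from captured traffic or from the app itself. The example below reproduces a simple custom scheme that hashes the path, timestamp, and secret with SHA-256.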

def sign_request(path, timestamp, secret):
    """Reproduce the app's request signing logic."""
    message = f"{path}{timestamp}"
    signature = hashlib.sha256(f"{message}{secret}".encode()).hexdigest()
    return signature
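
The function above is a plain SHA-256 over concatenated strings. If the app uses a keyed HMAC instead, the equivalent sketch uses Python's hmac module. The message layout (path followed by timestamp) and the header names X-Timestamp and X-Signature are assumptions for illustration; the real ones have to come from the captured requests.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request_hmac(path: str, timestamp: int, secret: str) -> str:
    """Return the hex HMAC-SHA256 of path + timestamp, keyed with secret."""
    message = f"{path}{timestamp}".encode()  # assumed message layout
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def signed_headers(path: str, secret: str, timestamp: Optional[int] = None) -> dict:
    """Build the extra headers for one request (header names are hypothetical)."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "X-Timestamp": str(ts),
        "X-Signature": sign_request_hmac(path, ts, secret),
    }
```

A quick way to validate a reconstructed scheme is to feed it the path and timestamp from a captured request and compare the result with the signature the app actually sent.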

Certificate Pinning

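Apps that use certificate pinning accept only specific certificates or public keys, so they reject mitmproxy's certificate and the connection fails. Options:

  • Use Frida to hook the app's certificate checks and bypass pinning at runtime
  • Use an older app version that predates pinning
  • Use an Android emulator with a patched system image that trusts the mitmproxy CA as a system certificate, which helps when the app only rejects user-installed CAs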

Alternative: ScraperAPI for Web Versions

Many mobile apps have web counterparts. When available, scraping the web version with ScraperAPI is simpler than reverse engineering the mobile API.
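
As a rough sketch, routing a web page through ScraperAPI usually means wrapping the target URL in a request to its endpoint. The endpoint and the api_key and url parameter names below are assumptions to confirm against ScraperAPI's current documentation, and www.targetapp.com is a hypothetical web counterpart of the app.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; confirm in ScraperAPI's documentation
SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def scraperapi_url(api_key: str, target_url: str) -> str:
    """Build the URL that asks ScraperAPI to fetch target_url."""
    query = urlencode({"api_key": api_key, "url": target_url})  # percent-encodes target_url
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


print(scraperapi_url("YOUR_API_KEY", "https://www.targetapp.com/listings?page=1"))
```

The resulting URL can be fetched with requests.get like any other page, and the response body is the HTML of the target page.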

Key Takeaways

  • mitmproxy shows exactly which requests a mobile app sends, which is the basis for reproducing them
  • Most mobile APIs return clean JSON, which is easier to parse than HTML
  • Watch for request signing, token refresh, and certificate pinning
  • Always check if a web version or official API exists first