HTTP in Go: net/http Without a Framework, Go for Scrapers

The standard library is the framework. Build, send, and customise HTTP requests in Go with net/http: headers, cookies, timeouts, proxies, and redirects.

Go's standard library is the framework. net/http is mature and fast, and most Go scraping tools build on it or on a fork of it (tls-client, for instance, pairs a fork of net/http with the utls TLS library). You don't need a third-party HTTP client for the vast majority of scraping work; the niche where you reach for tls-client is TLS fingerprinting, covered in GO5.

The quickest possible GET

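package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("https://practice.scrapingcentral.com/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close() // release the connection when main returns

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    // Print the status and at most the first 200 bytes of the body;
    // slicing past len(body) would panic on a short response.
    preview := body
    if len(preview) > 200 {
        preview = preview[:200]
    }
    fmt.Println(resp.StatusCode, string(preview))
}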

Three things to internalise:

Always defer resp.Body.Close() right after the error check. Skipping it leaks connections and file descriptors; it's the Go equivalent of opening a file in Python without a with block.
io.ReadAll reads the whole body into memory. For huge bodies, stream it with io.Copy instead.
http.Get uses http.DefaultClient under the hood. For real scrapers you'll build your own client (see below).

Why you should always build your own Client

http.DefaultClient has no timeout. If a server hangs, your scraper hangs forever. Build a Client with sensible defaults:

import (
    "net/http"
    "time"
)

client := &http.Client{
    Timeout: 30 * time.Second,
}

resp, err := client.Get(url)

Client.Timeout covers the entire exchange: connecting, the TLS handshake, any redirects, and reading the response headers and body. For finer control, use a Transport (below) and context.Context for per-request deadlines.

Adding headers

http.Get doesn't let you set headers. Build a Request and call client.Do(req):

req, err := http.NewRequest("GET", url, nil)
if err != nil {
    return err
}
req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; my-scraper/1.0)")
req.Header.Set("Accept", "text/html,application/xhtml+xml")
req.Header.Set("Accept-Language", "en-US,en;q=0.9")

resp, err := client.Do(req)

This is the form you'll use most often. If you don't set a User-Agent, Go sends Go-http-client/1.1 (Go-http-client/2.0 over HTTP/2), which is instantly identifiable as a bot. Set a realistic browser UA.

POST with a body

For form POSTs:

import (
    "net/url"
    "strings"
)

form := url.Values{}
form.Set("username", "scraper")
form.Set("password", "letmein")

req, err := http.NewRequest("POST", loginURL, strings.NewReader(form.Encode()))
if err != nil {
    return err
}
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

resp, err := client.Do(req)
if err != nil {
    return err
}
defer resp.Body.Close()

For JSON POSTs:

import (
    "bytes"
    "encoding/json"
)

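payload, _ := json.Marshal(map[string]string{ // cannot fail for a map[string]string
    "query": "laptops",
    "page":  "2",
})

req, err := http.NewRequest("POST", apiURL, bytes.NewReader(payload))
if err != nil {
    return err
}
req.Header.Set("Content-Type", "application/json")

resp, err := client.Do(req)
if err != nil {
    return err
}
defer resp.Body.Close()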

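bytes.NewReader and strings.NewReader both implement io.Reader, which is what http.NewRequest wants for the body. For these readers (and *bytes.Buffer), NewRequest also sets Content-Length and GetBody automatically, so the body can be re-sent if the client follows a 307 or 308 redirect.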

Parsing JSON responses

type SearchResult struct {
    Total int `json:"total"`
    Items []struct {
        ID    string `json:"id"`
        Title string `json:"title"`
    } `json:"items"`
}

var result SearchResult
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
    return err
}
if len(result.Items) > 0 { // indexing an empty slice would panic
    fmt.Println(result.Total, result.Items[0].Title)
}

The json:"total" struct tags map JSON field names to Go field names. The Go fields must be exported (capitalised); encoding/json silently ignores unexported fields.

For deeply nested or unknown JSON shapes, use map[string]any:

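var data map[string]any
if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
    return err
}
// data["items"].([]any)[0].(map[string]any)["title"].(string)
// Each unchecked assertion here panics if the JSON shape differs.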

It works but it's painful. Define a struct when you can.

Parsing HTML

Go's stdlib doesn't include an HTML parser. Use golang.org/x/net/html (maintained by the Go team, but outside the standard library) or the more ergonomic github.com/PuerkitoBio/goquery (jQuery-style selectors).

import "github.com/PuerkitoBio/goquery"

doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
    return err
}
doc.Find(".product").Each(func(i int, s *goquery.Selection) {
    title := s.Find("h2").Text()
    price, _ := s.Find(".price").Attr("data-price")
    fmt.Println(title, price)
})

goquery's API is BeautifulSoup-shaped. If you can write BS4, you can write goquery.

Cookies, persistence

By default, http.Client does not persist cookies. Each request is stateless. To survive logins and session-based sites, attach a cookie jar:

import (
    "net/http"
    "net/http/cookiejar"
    "time"
)

jar, _ := cookiejar.New(nil)
client := &http.Client{
    Jar:     jar,
    Timeout: 30 * time.Second,
}

// Now any Set-Cookie on a response is stored, and any subsequent request
// to the same host carries those cookies.

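If you want to inspect or set cookies manually, use the jar's Cookies and SetCookies methods with a parsed URL (the cookie name and value below are placeholders):

u, _ := url.Parse("https://practice.scrapingcentral.com/")
for _, c := range jar.Cookies(u) {
    fmt.Println(c.Name, c.Value)
}
jar.SetCookies(u, []*http.Cookie{{Name: "session", Value: "abc123"}})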

Proxies

You configure proxies on the Transport, not the Client:

import (
    "net/http"
    "net/url"
    "time"
)

proxyURL, _ := url.Parse("http://user:pass@proxy.example.com:8080")
transport := &http.Transport{
    Proxy: http.ProxyURL(proxyURL),
}
client := &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

For dynamic per-request proxies (rotation):

transport.Proxy = func(req *http.Request) (*url.URL, error) {
    return pickRandomProxy(), nil // pickRandomProxy: your own selection logic, returning a *url.URL
}

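http.Transport picks the proxy protocol from the URL scheme. http:// and https:// proxies both work, with HTTPS targets tunnelled through them via CONNECT, and socks5:// URLs are supported natively too, so you don't need golang.org/x/net/proxy for an ordinary SOCKS5 proxy.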

Timeouts at three levels

A robust scraper sets timeouts at multiple levels:

Per-client overall timeout (Client.Timeout).
Per-request context (context.WithTimeout), if you want different timeouts on different requests sharing one client.
Per-stage transport timeouts (Transport.DialContext, Transport.ResponseHeaderTimeout, etc.) for fine control.

A reasonable default:

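import (
    "net"
    "net/http"
    "time"
)

transport := &http.Transport{
    Proxy: http.ProxyFromEnvironment, // DefaultTransport does this; a bare Transport doesn't
    DialContext: (&net.Dialer{
        Timeout: 5 * time.Second,
    }).DialContext,
    ForceAttemptHTTP2:     true, // a custom DialContext otherwise disables HTTP/2
    TLSHandshakeTimeout:   5 * time.Second,
    ResponseHeaderTimeout: 10 * time.Second,
    IdleConnTimeout:       30 * time.Second,
    MaxIdleConns:          100,
    MaxIdleConnsPerHost:   10,
}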

client := &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

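Tune to your workload. The stdlib defaults don't suit scrapers: http.DefaultClient has no overall timeout, http.DefaultTransport sets no ResponseHeaderTimeout, and it keeps only 2 idle connections per host (DefaultMaxIdleConnsPerHost), which forces reconnects when you hit one site concurrently.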

Connection reuse

Like most HTTP clients, net/http keeps connections alive across requests by default. Use one Client (or at least one Transport) for the entire lifetime of your program, not a new one per request: a new Transport per request defeats keep-alive and triggers a fresh TCP + TLS handshake every time. A connection also only returns to the pool once its response body has been read to EOF and closed.

This is the same lesson as Foundations' "Connection: keep-alive": reuse, don't reconnect.

Following redirects

By default, Client.Do follows up to 10 redirects, then gives up with an error. You can override that policy with CheckRedirect:

client := &http.Client{
    CheckRedirect: func(req *http.Request, via []*http.Request) error {
        if len(via) >= 5 {
            return http.ErrUseLastResponse
        }
        return nil
    },
}

// Or to disable redirects entirely:
client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
    return http.ErrUseLastResponse
}

Disabling redirects is sometimes useful when scraping URL shorteners or when you want to capture every intermediate URL.

What this all looks like together

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/cookiejar"
    "time"
)

type APIItem struct {
    ID    string `json:"id"`
    Title string `json:"title"`
}

func main() {
    jar, _ := cookiejar.New(nil)
    client := &http.Client{
        Jar:     jar,
        Timeout: 30 * time.Second,
    }

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, "GET",
        "https://practice.scrapingcentral.com/api/items", nil)
    if err != nil {
        fmt.Println("request:", err)
        return
    }
    req.Header.Set("User-Agent", "Mozilla/5.0")
    req.Header.Set("Accept", "application/json")

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("err:", err)
        return
    }
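    defer resp.Body.Close()

    // Don't try to decode error pages as data.
    if resp.StatusCode != http.StatusOK {
        fmt.Println("status:", resp.Status)
        return
    }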

    var items []APIItem
    if err := json.NewDecoder(resp.Body).Decode(&items); err != nil {
        fmt.Println("decode:", err)
        return
    }

    for _, it := range items {
        fmt.Println(it.ID, it.Title)
    }
}

A few dozen lines. Cookies persisted, timeouts at two levels (client-wide and per-request), a browser-style UA, and JSON decoded into a typed struct. This is the skeleton most production Go scrapers share, with tls-client swapped in (GO5) when fingerprinting matters.

Where to practice

Convert your favourite Python requests-based scraper to Go. Don't over-optimise; just match it function-for-function.
Add a cookiejar.Jar to the client and verify cookies persist across requests against practice.scrapingcentral.com.
Read Go by Example: HTTP Client and Go by Example: JSON. Both are short, both are exactly the patterns above.

Next: GO5 covers the scraping-specific reason this sub-path exists, TLS fingerprinting with utls and tls-client.
