
GO4 · intermediate · 6 min read

HTTP in Go: net/http Without a Framework

The standard library is the framework. Build, send, and customise HTTP requests in Go with net/http: headers, cookies, timeouts, proxies, and TLS settings.

What you’ll learn

  • Construct GET, POST, and custom-header requests using net/http.
  • Configure a Client with timeouts, transports, and proxy URLs.
  • Persist cookies across requests with `cookiejar.Jar`.
  • Read and parse JSON and HTML response bodies.

Go's standard library is the framework. net/http is mature, fast, and the base that most Go scraping tools build on or plug into (utls and tls-client included). You don't need a third-party HTTP client for the vast majority of scraping work; the niche where you reach for tls-client is specifically TLS fingerprinting, covered in GO5.

The quickest possible GET

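package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("https://practice.scrapingcentral.com/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    // Print at most the first 200 bytes; slicing past len(body) would panic.
    if len(body) > 200 {
        body = body[:200]
    }
    fmt.Println(resp.StatusCode, string(body))
}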

Three things to internalise:

  • Always defer resp.Body.Close() right after the error check. Skipping this leaks connections and file descriptors. It's the Go equivalent of forgetting with in Python.
  • io.ReadAll is how you consume the whole body. For huge bodies, stream with io.Copy instead.
  • http.Get uses http.DefaultClient under the hood. For real scrapers you'll build your own client (see below).
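The io.Copy point is easier to see in code. Here is a minimal sketch, assuming you want the body on disk rather than in memory. The helper names downloadTo and demoDownload are made up for this example, and a local httptest server stands in for a real site.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// downloadTo (hypothetical helper) streams the response body for url into a
// file at path and returns the number of bytes written. The body is copied
// in chunks, so it is never held in memory as a whole.
func downloadTo(client *http.Client, url, path string) (int64, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	// io.Copy reads from the body and writes to the file until EOF.
	return io.Copy(f, resp.Body)
}

// demoDownload serves a 1 MiB body from a local test server and streams it
// into a temporary file, which is removed afterwards.
func demoDownload() (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, strings.Repeat("x", 1<<20))
	}))
	defer srv.Close()

	tmp, err := os.CreateTemp("", "body-*.bin")
	if err != nil {
		return 0, err
	}
	tmp.Close()
	defer os.Remove(tmp.Name())

	return downloadTo(srv.Client(), srv.URL, tmp.Name())
}

func main() {
	n, err := demoDownload()
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes written:", n)
}
```

The same shape works for any io.Writer: swap the file for a hash, a gzip writer, or io.Discard when you only need the side effects of the request.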

Why you should always build your own Client

http.DefaultClient has no timeout. If a server hangs, your scraper hangs forever. Build a Client with sensible defaults:

import (
    "net/http"
    "time"
)

client := &http.Client{
    Timeout: 30 * time.Second,
}

resp, err := client.Get(url)

The Timeout covers the entire exchange: connecting, the TLS handshake, any redirects, and reading the headers and body. If you want finer control, use Transport (below) and context.Context for per-request deadlines.
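When the timeout fires, you usually want to tell it apart from other failures so you can retry or skip. One way to do that, sketched here under the assumption that a net.Error check is enough for your scraper: isTimeout and demoTimeout are illustrative names, and the slow handler is a local stand-in for a hanging server.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// isTimeout (hypothetical helper) reports whether err was caused by a
// deadline or timeout. The error returned by Client.Get is a *url.Error,
// which satisfies net.Error.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// demoTimeout calls a deliberately slow local server with a short
// Client.Timeout and reports whether the failure was a timeout.
func demoTimeout() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Second): // pretend to be a hanging server
		case <-r.Context().Done(): // the client gave up
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 100 * time.Millisecond}
	resp, err := client.Get(srv.URL)
	if err == nil {
		resp.Body.Close()
	}
	return isTimeout(err)
}

func main() {
	fmt.Println("timed out:", demoTimeout())
}
```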

Adding headers

http.Get doesn't let you set headers. Build a Request and call client.Do(req):

req, err := http.NewRequest("GET", url, nil)
if err != nil {
    return err
}
req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; my-scraper/1.0)")
req.Header.Set("Accept", "text/html,application/xhtml+xml")
req.Header.Set("Accept-Language", "en-US,en;q=0.9")

resp, err := client.Do(req)

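This is the form you'll use most often. Note that Go's default User-Agent is Go-http-client/1.1 (Go-http-client/2.0 over HTTP/2), which is instantly identifiable as a bot. For sites that filter on it, send a full browser UA string rather than the short placeholder above.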
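If every request needs the same headers, a small constructor saves repetition. This is only a sketch: newGET, defaultHeaders, and demoHeaders are hypothetical names, the header values are the ones used above, and the local test server simply echoes back what it received.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// defaultHeaders is an illustrative header set reused for every request.
var defaultHeaders = map[string]string{
	"User-Agent":      "Mozilla/5.0 (compatible; my-scraper/1.0)",
	"Accept":          "text/html,application/xhtml+xml",
	"Accept-Language": "en-US,en;q=0.9",
}

// newGET (hypothetical helper) builds a GET request with defaultHeaders set.
func newGET(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	for name, value := range defaultHeaders {
		req.Header.Set(name, value)
	}
	return req, nil
}

// demoHeaders sends one request to a local echo server and returns the
// User-Agent the server saw.
func demoHeaders() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, r.UserAgent())
	}))
	defer srv.Close()

	req, err := newGET(srv.URL)
	if err != nil {
		return "", err
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	seen, err := io.ReadAll(resp.Body)
	return string(seen), err
}

func main() {
	ua, err := demoHeaders()
	if err != nil {
		panic(err)
	}
	fmt.Println("server saw:", ua)
}
```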

POST with a body

For form POSTs:

import (
    "net/url"
    "strings"
)

form := url.Values{}
form.Set("username", "scraper")
form.Set("password", "letmein")

req, err := http.NewRequest("POST", loginURL, strings.NewReader(form.Encode()))
if err != nil {
    return err
}
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

resp, err := client.Do(req)

For JSON POSTs:

import (
    "bytes"
    "encoding/json"
)

payload, err := json.Marshal(map[string]string{
    "query": "laptops",
    "page":  "2",
})
if err != nil {
    return err
}

req, err := http.NewRequest("POST", apiURL, bytes.NewReader(payload))
if err != nil {
    return err
}
req.Header.Set("Content-Type", "application/json")

resp, err := client.Do(req)

bytes.NewReader and strings.NewReader both implement io.Reader, which is what http.NewRequest wants for the body.
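Put together with error handling, a JSON POST might look like the sketch below. postJSON, searchRequest, and searchReply are hypothetical names, the status check is an extra that the snippets above leave out, and the endpoint is a local test server that echoes the request back.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// searchRequest and searchReply are made-up shapes for this illustration.
type searchRequest struct {
	Query string `json:"query"`
	Page  int    `json:"page"`
}

type searchReply struct {
	Query       string `json:"query"`
	Page        int    `json:"page"`
	ContentType string `json:"content_type"`
}

// postJSON (hypothetical helper) marshals in, POSTs it as JSON, and decodes
// the JSON response into out.
func postJSON(client *http.Client, url string, in, out any) error {
	payload, err := json.Marshal(in)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// demoPost runs postJSON against a local server that echoes the request.
func demoPost() (searchReply, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in searchRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(searchReply{
			Query:       in.Query,
			Page:        in.Page,
			ContentType: r.Header.Get("Content-Type"),
		})
	}))
	defer srv.Close()

	var reply searchReply
	err := postJSON(srv.Client(), srv.URL, searchRequest{Query: "laptops", Page: 2}, &reply)
	return reply, err
}

func main() {
	reply, err := demoPost()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", reply)
}
```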

Parsing JSON responses

type SearchResult struct {
    Total int `json:"total"`
    Items []struct {
        ID    string `json:"id"`
        Title string `json:"title"`
    } `json:"items"`
}

var result SearchResult
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
    return err
}
fmt.Println(result.Total)
if len(result.Items) > 0 { // indexing an empty slice would panic
    fmt.Println(result.Items[0].Title)
}

The json:"total" struct tags map JSON field names to Go field names. The Go fields must be exported (start with a capital letter); encoding/json silently ignores unexported fields.
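A quick way to convince yourself of the export rule is to decode into a struct with one field of each kind. The product type and decodeProduct function below are invented for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// product is a hypothetical shape used only for this illustration.
type product struct {
	Title string `json:"title"` // exported: filled from the "title" key
	price string // unexported: encoding/json never touches it
}

// decodeProduct decodes raw and returns both fields so the difference
// between them is visible.
func decodeProduct(raw string) (title, price string, err error) {
	var p product
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", "", err
	}
	return p.Title, p.price, nil
}

func main() {
	title, price, err := decodeProduct(`{"title": "Laptop", "price": "999"}`)
	if err != nil {
		panic(err)
	}
	// price stays empty even though the JSON contains a "price" key.
	fmt.Printf("title=%q price=%q\n", title, price)
}
```

No error is reported for the skipped field, which is why a lowercase field name is such an easy bug to miss.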

For deeply nested or unknown JSON shapes, use map[string]any:

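var data map[string]any
if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
    return err
}
// Every level needs a type assertion, and each one panics on a mismatch:
// data["items"].([]any)[0].(map[string]any)["title"].(string)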

It works but it's painful. Define a struct when you can.
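If you do have to walk a map[string]any, the two-value form of the type assertion avoids panics when the shape is not what you expected. A minimal sketch, with firstTitle as a made-up helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firstTitle (hypothetical helper) extracts items[0].title from raw JSON.
// It returns false instead of panicking when any level has the wrong shape.
func firstTitle(raw []byte) (string, bool) {
	var data map[string]any
	if err := json.Unmarshal(raw, &data); err != nil {
		return "", false
	}
	items, ok := data["items"].([]any)
	if !ok || len(items) == 0 {
		return "", false
	}
	first, ok := items[0].(map[string]any)
	if !ok {
		return "", false
	}
	title, ok := first["title"].(string)
	return title, ok
}

func main() {
	fmt.Println(firstTitle([]byte(`{"items": [{"title": "Laptop"}]}`)))
	fmt.Println(firstTitle([]byte(`{"items": "not a list"}`)))
}
```

Four checks for one string is exactly the pain the struct version spares you.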

Parsing HTML

Go's standard library has no HTML parser (the html package only escapes and unescapes text). Use golang.org/x/net/html, which the Go team maintains outside the standard library, or the more ergonomic github.com/PuerkitoBio/goquery (jQuery-style selectors).

import "github.com/PuerkitoBio/goquery"

doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
    return err
}
doc.Find(".product").Each(func(i int, s *goquery.Selection) {
    title := s.Find("h2").Text()
    price, _ := s.Find(".price").Attr("data-price")
    fmt.Println(title, price)
})

goquery's API mirrors jQuery, and it will feel familiar if you've used CSS selectors in BeautifulSoup. If you can write BS4's select(), you can write goquery.

Cookies and persistence

By default, http.Client does not persist cookies. Each request is stateless. To survive logins and session-based sites, attach a cookie jar:

import "net/http/cookiejar"

jar, _ := cookiejar.New(nil)
client := &http.Client{
    Jar:     jar,
    Timeout: 30 * time.Second,
}

// Now any Set-Cookie on a response is stored, and any subsequent request
// to the same host carries those cookies.

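To inspect cookies manually, ask the jar for the cookies it would send to a given URL (parsedURL is a *url.URL). To add cookies by hand, call jar.SetCookies(parsedURL, cookies):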

for _, c := range jar.Cookies(parsedURL) {
    fmt.Println(c.Name, c.Value)
}
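Setting a cookie by hand uses the same jar. Here is a small sketch with no network involved; the cookie name and value are invented, and demoCookies is an illustrative function name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// demoCookies seeds a jar with one hand-made cookie and then lists what the
// jar would send to the same URL, as "name=value" strings.
func demoCookies() ([]string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	u, err := url.Parse("https://practice.scrapingcentral.com/")
	if err != nil {
		return nil, err
	}

	// "session" / "abc123" are placeholder values for this example.
	jar.SetCookies(u, []*http.Cookie{
		{Name: "session", Value: "abc123", Path: "/"},
	})

	var pairs []string
	for _, c := range jar.Cookies(u) {
		pairs = append(pairs, c.Name+"="+c.Value)
	}
	return pairs, nil
}

func main() {
	pairs, err := demoCookies()
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs)
}
```

This is handy when you copy a session cookie out of a browser and want the scraper to start already logged in.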

Proxies

You configure proxies on the Transport, not the Client:

import (
    "net/url"
    "net/http"
)

proxyURL, _ := url.Parse("http://user:pass@proxy.example.com:8080")
transport := &http.Transport{
    Proxy: http.ProxyURL(proxyURL),
}
client := &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

For dynamic per-request proxies (rotation):

transport.Proxy = func(req *http.Request) (*url.URL, error) {
    return pickRandomProxy(), nil
}

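http.ProxyURL handles HTTP proxies and HTTPS targets tunnelled through CONNECT. A socks5:// proxy URL also works here, because Transport supports the socks5 scheme natively; golang.org/x/net/proxy is only needed when you want a SOCKS dialer outside of net/http.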
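The rotation snippet leaves pickRandomProxy undefined. One possible implementation is sketched below, using round-robin instead of random selection so the behaviour is predictable. The proxyPool type and the proxy hostnames are hypothetical, and the demo only calls the Proxy function directly, so no traffic is sent.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// proxyPool (hypothetical type) hands out proxies in round-robin order and
// is safe to call from many goroutines.
type proxyPool struct {
	next    uint64 // kept first so 64-bit atomic access is aligned on 32-bit platforms
	proxies []*url.URL
}

// newProxyPool parses the given proxy URLs up front so bad input fails early.
func newProxyPool(raw ...string) (*proxyPool, error) {
	if len(raw) == 0 {
		return nil, errors.New("proxy pool needs at least one URL")
	}
	p := &proxyPool{}
	for _, s := range raw {
		u, err := url.Parse(s)
		if err != nil {
			return nil, err
		}
		p.proxies = append(p.proxies, u)
	}
	return p, nil
}

// pick has the signature Transport.Proxy expects.
func (p *proxyPool) pick(*http.Request) (*url.URL, error) {
	n := atomic.AddUint64(&p.next, 1) - 1
	return p.proxies[n%uint64(len(p.proxies))], nil
}

// demoRotation asks the transport which proxy it would use for three
// consecutive requests and returns the proxy hosts in order.
func demoRotation() ([]string, error) {
	pool, err := newProxyPool(
		"http://proxy1.example.com:8080", // placeholder proxies
		"http://proxy2.example.com:8080",
	)
	if err != nil {
		return nil, err
	}
	transport := &http.Transport{Proxy: pool.pick}

	req, err := http.NewRequest(http.MethodGet, "https://practice.scrapingcentral.com/", nil)
	if err != nil {
		return nil, err
	}
	var hosts []string
	for i := 0; i < 3; i++ {
		u, err := transport.Proxy(req)
		if err != nil {
			return nil, err
		}
		hosts = append(hosts, u.Host)
	}
	return hosts, nil
}

func main() {
	hosts, err := demoRotation()
	if err != nil {
		panic(err)
	}
	fmt.Println(hosts)
}
```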

Timeouts at three levels

A robust scraper sets timeouts at multiple levels:

  1. Per-client overall timeout (Client.Timeout).
  2. Per-request context (context.WithTimeout), if you want different timeouts on different requests sharing one client.
  3. Per-stage transport timeouts (Transport.DialContext, Transport.ResponseHeaderTimeout, etc.) for fine control.

A reasonable default:

import (
    "net"
    "net/http"
    "time"
)

transport := &http.Transport{
    DialContext: (&net.Dialer{
        Timeout: 5 * time.Second,
    }).DialContext,
    TLSHandshakeTimeout:   5 * time.Second,
    ResponseHeaderTimeout: 10 * time.Second,
    IdleConnTimeout:       30 * time.Second,
    MaxIdleConns:          100,
    MaxIdleConnsPerHost:   10,
}

client := &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

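Tune to your workload. The defaults are too generous for scrapers: http.DefaultClient has no overall timeout, and http.DefaultTransport allows 30 seconds to dial, 10 seconds for the TLS handshake, and sets no ResponseHeaderTimeout at all.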
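The second level, a per-request context, is the one that lets a single shared client give different requests different budgets. A minimal sketch follows; fetchWithDeadline and demoDeadlines are hypothetical names, and the /fast and /slow paths exist only on the local test server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithDeadline (hypothetical helper) performs a GET that must finish
// within d, regardless of the client's own, longer Timeout.
func fetchWithDeadline(client *http.Client, url string, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// demoDeadlines shares one client between a fast request with a roomy
// deadline and a slow request with a tight one.
func demoDeadlines() (fastOK, slowTimedOut bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			select {
			case <-time.After(time.Second):
			case <-r.Context().Done(): // the client gave up
				return
			}
		}
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}

	fastErr := fetchWithDeadline(client, srv.URL+"/fast", 2*time.Second)
	slowErr := fetchWithDeadline(client, srv.URL+"/slow", 100*time.Millisecond)

	return fastErr == nil, errors.Is(slowErr, context.DeadlineExceeded)
}

func main() {
	fastOK, slowTimedOut := demoDeadlines()
	fmt.Println("fast ok:", fastOK, "slow timed out:", slowTimedOut)
}
```

Whichever limit is shorter wins, so Client.Timeout acts as a ceiling and the context narrows it per request.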

Connection reuse

Like most HTTP clients, net/http keeps connections alive across requests by default. Use one Client (or one Transport) for the entire lifetime of your program, not a new one per request. Creating a new Transport per request defeats keep-alive and triggers a fresh TCP + TLS handshake every time. A connection also only returns to the pool once its response body has been read to the end and closed, so drain bodies you don't need.

This is the same lesson as Foundations' "Connection: keep-alive": reuse, don't reconnect.
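You can watch reuse happen with net/http/httptrace, which reports whether each request got a fresh or a pooled connection. The sketch below is a diagnostic aid rather than something you would ship; fetchAndReport and demoReuse are invented names, and the target is a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// fetchAndReport (hypothetical helper) performs a GET, drains the body, and
// reports whether the request ran on a reused connection.
func fetchAndReport(client *http.Client, url string) (bool, error) {
	var reused bool
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	// Reading to EOF is what lets the connection go back into the pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return false, err
	}
	return reused, nil
}

// demoReuse sends two sequential requests through one client and returns
// the "reused" flag observed for each.
func demoReuse() ([]bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	transport := &http.Transport{}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport, Timeout: 5 * time.Second}

	var flags []bool
	for i := 0; i < 2; i++ {
		reused, err := fetchAndReport(client, srv.URL)
		if err != nil {
			return nil, err
		}
		flags = append(flags, reused)
	}
	return flags, nil
}

func main() {
	flags, err := demoReuse()
	if err != nil {
		panic(err)
	}
	fmt.Println("reused:", flags)
}
```

The first request has to dial; the second should pick up the idle connection left behind by the first.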

Following redirects

By default, the client follows redirects and gives up with an error after 10 consecutive requests. You can override that policy with CheckRedirect:

client := &http.Client{
    CheckRedirect: func(req *http.Request, via []*http.Request) error {
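        // via holds the requests already made. After five, stop and hand
        // back the most recent (redirect) response without an error.
        if len(via) >= 5 {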
            return http.ErrUseLastResponse
        }
        return nil
    },
}

// Or to disable redirects entirely:
client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
    return http.ErrUseLastResponse
}

Disabling redirects is sometimes useful when scraping URL shorteners or when you want to capture every intermediate URL.
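Both uses can be sketched against a local redirect chain. Everything named here is made up for the example: the /a, /b, /c paths, and the helpers firstHop (redirects disabled) and allHops (redirects followed, each hop recorded).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// newChainServer starts a local server where /a redirects to /b, /b
// redirects to /c, and anything else serves a page.
func newChainServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/a":
			http.Redirect(w, r, "/b", http.StatusFound)
		case "/b":
			http.Redirect(w, r, "/c", http.StatusFound)
		default:
			io.WriteString(w, "final page")
		}
	}))
}

// firstHop disables redirects and returns the status code and Location
// header of the very first response.
func firstHop(url string) (int, string, error) {
	client := &http.Client{
		Timeout: 5 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Location"), nil
}

// allHops follows redirects as usual but records the path of every
// follow-up request the client is about to make.
func allHops(url string) ([]string, error) {
	var hops []string
	client := &http.Client{
		Timeout: 5 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			hops = append(hops, req.URL.Path)
			return nil
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return hops, nil
}

// demoRedirects runs both helpers against the same local chain.
func demoRedirects() (int, string, []string, error) {
	srv := newChainServer()
	defer srv.Close()

	status, location, err := firstHop(srv.URL + "/a")
	if err != nil {
		return 0, "", nil, err
	}
	hops, err := allHops(srv.URL + "/a")
	return status, location, hops, err
}

func main() {
	status, location, hops, err := demoRedirects()
	if err != nil {
		panic(err)
	}
	fmt.Println(status, location, hops)
}
```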

What this all looks like together

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/cookiejar"
    "time"
)

type APIItem struct {
    ID    string `json:"id"`
    Title string `json:"title"`
}

func main() {
    jar, _ := cookiejar.New(nil)
    client := &http.Client{
        Jar:     jar,
        Timeout: 30 * time.Second,
    }

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    req, _ := http.NewRequestWithContext(ctx, "GET",
        "https://practice.scrapingcentral.com/api/items", nil)
    req.Header.Set("User-Agent", "Mozilla/5.0")
    req.Header.Set("Accept", "application/json")

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("err:", err)
        return
    }
    defer resp.Body.Close()

    var items []APIItem
    if err := json.NewDecoder(resp.Body).Decode(&items); err != nil {
        fmt.Println("decode:", err)
        return
    }

    for _, it := range items {
        fmt.Println(it.ID, it.Title)
    }
}

About 50 lines. Cookies persisted. Timeouts at two levels. A browser-style UA. JSON decoded into a typed struct. This is the skeleton that production Go scrapers are built on, with tls-client swapped in (GO5) when fingerprinting matters.

Where to practice

  • Convert your favourite Python requests-based scraper to Go. Don't over-optimise; just match it function-for-function.
  • Add a cookiejar.Jar to the client and verify cookies persist across requests against practice.scrapingcentral.com.
  • Read Go by Example: HTTP Client and Go by Example: JSON. Both are short, both are exactly the patterns above.

Next: GO5 covers the scraping-specific reason this sub-path exists: TLS fingerprinting with utls and tls-client.
