Python, pip, venv, uv: the Modern Toolchain
Install Python correctly, isolate every project, and meet the tooling that actually makes Python pleasant in 2026.
What you’ll learn
- Install a recent Python (3.11+) without breaking your system Python.
- Create a virtual environment per project and understand why.
- Install dependencies with pip, and know when uv is the better choice.
- Pin versions in a requirements file so your scraper is reproducible.
The Python landscape is messy. There are multiple installers, two package managers, a half-dozen ways to make a virtual environment, and contradictory advice everywhere. Here's the short version that works.
Install a recent Python
Don't use your operating system's bundled Python. macOS ships an old one for backward compatibility; Linux distros ship one for the OS itself; Windows often ships none. Install a fresh one alongside.
| Platform | Recommended installer | Why |
|---|---|---|
| macOS | Homebrew: brew install python@3.12 | Easy upgrades, no admin password |
| Linux | Distro package (e.g. apt install python3.12) or pyenv | OS package works; pyenv lets you have multiple versions |
| Windows | python.org installer, check "Add Python to PATH" | Cleanest; or use Microsoft Store version |
Verify:
python3 --version
# Python 3.12.x
For scraping you want 3.10 at minimum (for structural pattern matching) and ideally 3.11+ for performance: Python 3.11 is ~25% faster than 3.10 on average.
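If you want your scraper to fail fast on an older interpreter, a two-line guard at the top of the entry point does it (a minimal sketch):
import sys
# Refuse to run on an interpreter older than the one we target.
if sys.version_info < (3, 11):
    raise SystemExit(f"Python 3.11+ required, found {sys.version.split()[0]}")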
Why virtual environments
If you pip install requests globally, every Python project on your machine shares that exact version of requests. The moment two projects need different versions, you're stuck.
A virtual environment is a private directory with its own python and pip that installs into a per-project library folder. One project, one venv, no version collisions.
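Concretely, a fresh venv is just a directory that looks roughly like this (exact layout varies by OS and Python version):
.venv/
├── bin/ ← python, pip, activate (Scripts\ on Windows)
├── lib/python3.12/site-packages/ ← where pip installs packages
└── pyvenv.cfg ← points back at the base interpreter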
Create one with venv (built-in)
cd /path/to/my-scraper-project
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
pip install requests beautifulsoup4
.venv (the dot-prefixed name) is the modern convention; it's git-ignored by default in most project templates.
Once activated, your terminal prompt usually shows (.venv), and python/pip point at the venv's binaries. To leave, run deactivate.
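If you're ever unsure which interpreter a shell is using, ask Python itself:
import sys
# Inside an activated venv this prints a path under .venv/;
# otherwise it prints the system interpreter's location.
print(sys.executable)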
Always git-ignore the venv
# .gitignore
.venv/
__pycache__/
*.pyc
Never commit a venv: it contains absolute paths and OS-specific binaries, so it can't be shared between machines. What you commit is the list of packages (next section).
pip and requirements files
pip install is the standard installer. Pin what you've installed:
pip install requests beautifulsoup4 lxml
pip freeze > requirements.txt
requirements.txt now contains exact versions:
beautifulsoup4==4.12.3
certifi==2024.2.2
charset-normalizer==3.3.2
idna==3.6
lxml==5.1.0
requests==2.31.0
soupsieve==2.5
urllib3==2.2.0
On another machine (or in CI):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Same versions, reproducible install. The pinning matters: requests 2.31 and 2.32 are not guaranteed to behave identically.
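If you want a script to verify at runtime that its environment matches the pins, the standard library can report installed versions. A small sketch (the expected pins just echo the example requirements above):
from importlib.metadata import version
# Compare installed versions against the pins we expect.
expected = {"requests": "2.31.0", "beautifulsoup4": "4.12.3"}
for package, pin in expected.items():
    installed = version(package)
    status = "ok" if installed == pin else f"MISMATCH (pinned {pin})"
    print(f"{package}=={installed}  {status}")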
uv: the modern alternative
uv (from Astral, makers of ruff) is a drop-in replacement for pip + venv that's 10–100× faster. As of late 2025 it's mature enough to recommend for new projects.
# Install uv (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh
# or: pip install uv
# Create + activate a venv
uv venv
# Install dependencies (no need to activate first)
uv pip install requests beautifulsoup4 lxml
# Lock to a file
uv pip freeze > requirements.txt
uv is compatible with pip's command flags: you can mechanically replace pip with uv pip in existing commands. The speed difference matters most when you have a dozen scrapers each rebuilding their venv in CI.
A typical project layout
my-scraper/
├── .venv/ ← gitignored
├── .gitignore
├── README.md
├── requirements.txt ← pinned dependencies
├── pyproject.toml ← (optional) project metadata
├── src/
│ └── my_scraper/
│ ├── __init__.py
│ ├── client.py ← HTTP session, retries
│ ├── parsers.py ← BeautifulSoup / lxml selectors
│ └── store.py ← write to CSV / SQLite
├── scripts/
│ └── crawl.py ← entry point
└── tests/
└── test_parsers.py
This isn't mandatory: small one-file scrapers don't need it. But as a project grows past one file, this is the shape most professional scraping projects converge on.
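To make the split concrete, here's a minimal sketch of what src/my_scraper/client.py might contain (the retry settings are reasonable starting points to tune, not part of the layout):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session() -> requests.Session:
    """A requests Session with retries and a stable User-Agent."""
    retry = Retry(
        total=3,                     # up to 3 retries per request
        backoff_factor=0.5,          # exponential backoff between retries
        status_forcelist=[429, 500, 502, 503, 504],  # retry on these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers["User-Agent"] = "my-scraper/0.1"
    return session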
pyproject.toml (when ready)
Beyond requirements.txt, pyproject.toml is the modern packaging manifest. Useful when:
- You want to install your scraper as a CLI command (pip install -e .)
- You're publishing to PyPI
- You're using tools that respect pyproject (ruff, black, mypy, pytest)
Minimal version:
[project]
name = "my-scraper"
version = "0.1.0"
dependencies = [
"requests>=2.31",
"beautifulsoup4>=4.12",
"lxml>=5.0",
]
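If you take the CLI route, one extra table maps a command name onto a function; my_scraper.cli:main below is a hypothetical module and function, shown for illustration:
[project.scripts]
my-scraper = "my_scraper.cli:main"
After pip install -e ., typing my-scraper in the shell calls that function.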
For now, requirements.txt is enough. Move to pyproject.toml when the project warrants it.
Common gotchas
- python vs python3. On many macOS/Linux installs, python is missing or points at an old interpreter (sometimes still Python 2). Always use python3 and pip3 unless you've explicitly set up an alias.
- System-wide pip install. If you ever see error: externally-managed-environment on a fresh Python install, that's the OS protecting itself. Use a venv. Never sudo pip install.
- Venv doesn't auto-activate. Each new terminal needs source .venv/bin/activate (or use direnv/mise to auto-activate per directory); a cheap runtime guard is sketched after this list.
- pip freeze vs pip list. freeze outputs in requirements.txt format; list is human-readable. Always freeze for the file you commit.
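The runtime guard mentioned above, as a minimal sketch for the top of an entry point such as scripts/crawl.py:
import sys
# In a venv, sys.prefix points at the venv while sys.base_prefix
# points at the interpreter the venv was created from; equal means no venv.
if sys.prefix == sys.base_prefix:
    raise SystemExit("No virtual environment active; run: source .venv/bin/activate")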
Hands-on lab
Create a directory, set up a venv, install requests and beautifulsoup4, and save a requirements.txt. Then write a short script:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://practice.scrapingcentral.com/")
soup = BeautifulSoup(r.text, "html.parser")
print(soup.title.string)
Run it inside your venv. You should see the page title printed. You now have a working Python scraping environment.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.