

Maintaining Your Own Composer / PyPI Package

Publishing and maintaining a small scraping utility on PyPI or Packagist is one of the highest-leverage career moves a scraping engineer can make. This lesson covers how to do it, and the responsibilities that come with it.

What you’ll learn

  • Publish a Python package to PyPI.
  • Publish a PHP package to Packagist.
  • Set up versioning, CHANGELOG, and a sustainable maintenance cadence.

Contributing to other people's libraries is good for your career. Publishing your own, even small, package is a different signal: that you can design, document, version, and support code others depend on. Most scraping engineers never do this, which makes the ones who do stand out.

Pick a small useful thing

Don't start with "the next Scrapy." Start with a utility you'd reuse across your own projects:

  • A wrapper around requests that adds Retry-After parsing.
  • A dataclasses_json-style decoder optimised for HTML-table data.
  • A Symfony bundle that ships your preferred Messenger config defaults.
  • A small Roach extension for a specific anti-bot pattern.
  • A parser for a particular SERP response format.

If it scratches your own itch, it'll scratch someone else's. 200 stars on a focused utility is a much stronger signal than 0 stars on an ambitious framework.

Python: publish to PyPI

Project layout:

my-scraper-utils/
├── pyproject.toml
├── README.md
├── LICENSE
├── CHANGELOG.md
├── src/my_scraper_utils/
│   ├── __init__.py
│   └── retry_after.py
└── tests/
    └── test_retry_after.py

pyproject.toml:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-scraper-utils"
version = "0.1.0"
description = "Small utilities for production HTTP scrapers."
authors = [{name = "Your Name", email = "you@example.com"}]
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = ["httpx>=0.27"]

[project.urls]
Homepage = "https://github.com/you/my-scraper-utils"
Issues = "https://github.com/you/my-scraper-utils/issues"

Publish:

pip install build twine
python -m build
twine upload dist/*  # credentials via ~/.pypirc or TWINE_USERNAME/TWINE_PASSWORD (username __token__ + an API token)

That's it. Within 5 minutes anyone can pip install my-scraper-utils.

PHP: publish to Packagist

my-scraper-utils/
├── composer.json
├── README.md
├── LICENSE
├── CHANGELOG.md
├── src/RetryAfter.php
└── tests/RetryAfterTest.php

composer.json:

{
  "name": "yourname/scraper-utils",
  "description": "Small utilities for production HTTP scrapers.",
  "type": "library",
  "license": "MIT",
  "require": {
    "php": "^8.2",
    "symfony/http-client": "^7.0"
  },
  "require-dev": {
    "phpunit/phpunit": "^10.0"
  },
  "autoload": {
    "psr-4": {"YourName\\ScraperUtils\\": "src/"}
  }
}

Publish:

  1. Push to GitHub.
  2. Sign in to packagist.org, submit the repo URL.
  3. Tag a release: git tag v0.1.0 && git push --tags.
  4. Set up a GitHub webhook (Packagist tells you how) so future tags auto-update.

composer require yourname/scraper-utils now works globally.

Semantic versioning

Major.Minor.Patch:

  • Patch (0.1.1): bug fix, no behaviour change.
  • Minor (0.2.0): new feature, backward compatible.
  • Major (1.0.0): breaking change; the jump to 1.0.0 also signals that the package is "stable".

Stick to this religiously. Users base their dependency constraints on it. A "minor" release that breaks existing users will produce angry issues fast.

Version 0.x is "anything goes" for breaking changes; make that explicit in the README. Many projects stay on 0.x for years until their APIs solidify.
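The bump rules above can be made mechanical. Here is a small hypothetical helper (not part of any package in this lesson) that classifies a version change under Major.Minor.Patch:

```python
def classify_bump(old: str, new: str) -> str:
    """Classify a semver version change as 'major', 'minor', or 'patch'.

    Assumes plain Major.Minor.Patch strings, no pre-release suffixes.
    """
    o = tuple(int(part) for part in old.split("."))
    n = tuple(int(part) for part in new.split("."))
    if n[0] != o[0]:
        return "major"   # breaking change (or the jump to 1.0.0)
    if n[1] != o[1]:
        return "minor"   # new feature, backward compatible
    return "patch"       # bug fix, no behaviour change
```

For example, `classify_bump("0.1.1", "0.2.0")` returns `"minor"`: exactly the release your users' `^0.1` or `~=0.1` constraints are trusting you to keep backward compatible.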

CHANGELOG.md

# Changelog

## [Unreleased]

## [0.2.0] - 2026-04-15
### Added
- `parse_retry_after()` now supports HTTP-date format.

### Fixed
- Off-by-one in delay calculation when header is "0".

## [0.1.0] - 2026-01-10
### Added
- Initial release with `parse_retry_after()`.

Keep a Changelog is the convention. Update on every release. Saves you and your users hours over the package's lifetime.
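One payoff of the regular Keep a Changelog layout is that tooling can consume it. A release script might pull the notes for the version being tagged; this is a sketch under the assumption that your file follows the exact layout shown above:

```python
import re

def changelog_section(text: str, version: str) -> str:
    """Return the body of the `## [version]` section of a
    Keep-a-Changelog file, up to the next `## ` heading."""
    pattern = rf"^## \[{re.escape(version)}\].*?$(.*?)(?=^## |\Z)"
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    if match is None:
        raise KeyError(f"version {version} not found in changelog")
    return match.group(1).strip()
```

Paired with `git tag`, a helper like this lets you paste the right notes into a GitHub release without hand-editing.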

Documentation

The minimum:

  • README: what it does, install instructions, a 10-line code example, link to full docs.
  • Code-level docstrings / phpdoc: every public function.
  • A "Why this exists" paragraph: helps users decide quickly.

For larger packages, MkDocs (Python) or phpDocumentor / the Symfony docs theme (PHP) can build a hosted docs site. GitHub Pages hosts it for free.

CI

A package without CI is a package waiting to break.

# .github/workflows/test.yml
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "${{ matrix.python }}"}
      - run: pip install -e ".[dev]"  # assumes a [project.optional-dependencies] dev extra
      - run: pytest

The PHP workflow is analogous; swap the matrix to cover multiple PHP versions:

strategy:
  matrix:
    php: ["8.2", "8.3", "8.4"]

Run on every commit; show passing badges in the README.

The maintenance bargain

When you publish, you implicitly promise:

  • Respond to issues within a reasonable timeframe (1–2 weeks for an acknowledgment is fine).
  • Patch critical bugs.
  • Be clear about scope and what you'll accept.

You do not promise:

  • 24/7 support.
  • To implement every feature request.
  • To never deprecate things.

Be explicit in the README: "Maintained on a volunteer basis. Issues triaged on weekends." This sets expectations before anyone files an issue.

If you can no longer maintain the package, say so. Archive the repository, or find a successor. Don't ghost a package thousands of people depend on.

Career outcomes

Maintaining your own package signals to employers:

  • You can design APIs.
  • You can document.
  • You take long-term responsibility for code.
  • You can interact with users.

Concretely: in interview rankings, a Python developer with one popular niche package often lands roughly alongside a candidate without one but with 1–2 years more experience.

Small package examples (look at these)

  • python-dateutil
  • tenacity (a maintained successor to retrying)
  • python-slugify
  • parse (inverse of format)
  • symfony/string (PHP)
  • webmozart/assert (PHP)

Each is tightly scoped, well-documented, and used in millions of projects.

What to try

This month:

  1. Identify one utility you've copied between 3+ scraping projects.
  2. Extract it into a standalone repo with the layout above.
  3. Write the README, CHANGELOG, and tests.
  4. Publish to PyPI or Packagist.
  5. Tweet/post about it once. See if anyone uses it.

Even if no one stars the repo, you now have a published package on your profile. The act of publishing teaches you packaging, versioning, and CI in ways nothing else does.
