Skip to content

Quick Start

This walkthrough takes you from installation through a complete download-resume-retry cycle. It takes about 3 minutes.

Install

pyhaul has zero required dependencies. Pick an HTTP client extra that matches what you already use:

pip install pyhaul[httpx]
pip install pyhaul[requests]
pip install pyhaul[niquests]
pip install pyhaul[aiohttp]
pip install pyhaul[urllib3]

Download a file

The entire API surface fits in one function: haul() (or haul_async() for async code). Pass a URL, your HTTP client, and a destination path:

import httpx
from pyhaul import haul

with httpx.Client() as client:
    result = haul("https://example.com/big.zip", client, dest="big.zip")
    print(f"done: sha256={result.sha256[:16]}…")

haul() returns a CompleteHaul on success, which carries the SHA-256 tree hash, ETag, and content type.

What happens on interruption

If the download is interrupted — network drop, process kill, Ctrl-C — two sidecar files remain on disk:

  • big.zip.part — the bytes downloaded so far
  • big.zip.part.ctrl — a binary checkpoint with the cursor position, ETag, and block-level hashes

The destination file (big.zip) does not exist at this point. There is no state where a partially-written file sits at the final path.

Resume

To resume, call haul() again with the same destination. pyhaul reads the checkpoint and sends Range for the tail of the object. When the checkpoint holds a strong ETag, pyhaul also sends If-Range with that validator; weak or missing validators omit If-Range (byte-range preconditioning only applies to strong validators). It then appends from where it left off:

# Just call haul() again — it resumes automatically
result = haul("https://example.com/big.zip", client, dest="big.zip")

If the remote file changed between attempts, pyhaul detects the ETag mismatch and restarts from byte 0 — no silent corruption.

Add retry logic

One haul() = one HTTP request. When the stream ends early, pyhaul raises PartialHaulError and saves progress. Wrap it in a retry loop:

import time
from pyhaul import haul, PartialHaulError, HaulState

state = HaulState()

with httpx.Client() as client:
    for attempt in range(1, 11):
        try:
            result = haul(
                "https://example.com/big.zip",
                client,
                dest="big.zip",
                state=state,
            )
            print(f"done: {state.valid_length:,} bytes")
            break
        except PartialHaulError as exc:
            print(f"attempt {attempt}: {exc.reason} "
                  f"({state.valid_length:,} bytes so far)")
            time.sleep(min(2**attempt, 30))

HaulState is an optional mutable bag updated in-place throughout the download — useful for progress reporting or deciding whether to retry.

Track progress

Pass on_progress to get called after each chunk lands on disk:

def show_progress(state: HaulState) -> None:
    if state.reported_length:
        pct = state.valid_length / state.reported_length * 100
        print(f"\r{pct:.1f}%", end="", flush=True)

result = haul(url, client, dest="big.zip", state=state, on_progress=show_progress)

Use the CLI

pyhaul also works as a command-line tool for quick smoke tests:

python -m pyhaul -o big.zip https://example.com/big.zip

The CLI handles retries automatically (up to 20 attempts with exponential backoff). See CLI Reference for all options. Note that the CLI is not a stable interface — for scripting and automation, use the Python API directly.

Next steps

  • Bulk Downloads — parallel downloads, interruption handling, and destination file safety
  • HTTP Client Adapters — integrate with your existing auth, session pooling, and proxy configuration
  • Retry Patterns — advanced retry strategies with tenacity and backoff
  • Async Usagehaul_async() with TaskGroup and semaphore-based concurrency