Quick Start¶
This walkthrough takes you from installation through a complete download-resume-retry cycle. It takes about 3 minutes.
Install¶
pyhaul has zero required dependencies. Pick an HTTP client extra that matches what you already use:
Download a file¶
The entire API surface fits in one function: haul() (or haul_async() for
async code). Pass a URL, your HTTP client, and a destination path:
import httpx
from pyhaul import haul
with httpx.Client() as client:
result = haul("https://example.com/big.zip", client, dest="big.zip")
print(f"done: sha256={result.sha256[:16]}…")
haul() returns a CompleteHaul on success, which carries the SHA-256 tree
hash, ETag, and content type.
What happens on interruption¶
If the download is interrupted — network drop, process kill, Ctrl-C — two
sidecar files remain on disk:
big.zip.part— the bytes downloaded so farbig.zip.part.ctrl— a binary checkpoint with the cursor position, ETag, and block-level hashes
The destination file (big.zip) does not exist at this point. There is no
state where a partially-written file sits at the final path.
Resume¶
To resume, call haul() again with the same destination. pyhaul reads the
checkpoint and sends Range for the tail of the object. When the checkpoint
holds a strong ETag, pyhaul also sends If-Range with that validator;
weak or missing validators omit If-Range (byte-range preconditioning only
applies to strong validators). It then appends from where it left off:
# Just call haul() again — it resumes automatically
result = haul("https://example.com/big.zip", client, dest="big.zip")
If the remote file changed between attempts, pyhaul detects the ETag mismatch and restarts from byte 0 — no silent corruption.
Add retry logic¶
One haul() = one HTTP request. When the stream ends early, pyhaul raises
PartialHaulError and saves progress. Wrap it in a retry loop:
import time
from pyhaul import haul, PartialHaulError, HaulState
state = HaulState()
with httpx.Client() as client:
for attempt in range(1, 11):
try:
result = haul(
"https://example.com/big.zip",
client,
dest="big.zip",
state=state,
)
print(f"done: {state.valid_length:,} bytes")
break
except PartialHaulError as exc:
print(f"attempt {attempt}: {exc.reason} "
f"({state.valid_length:,} bytes so far)")
time.sleep(min(2**attempt, 30))
HaulState is an optional mutable bag updated in-place throughout the
download — useful for progress reporting or deciding whether to retry.
Track progress¶
Pass on_progress to get called after each chunk lands on disk:
def show_progress(state: HaulState) -> None:
if state.reported_length:
pct = state.valid_length / state.reported_length * 100
print(f"\r{pct:.1f}%", end="", flush=True)
result = haul(url, client, dest="big.zip", state=state, on_progress=show_progress)
Use the CLI¶
pyhaul also works as a command-line tool for quick smoke tests:
The CLI handles retries automatically (up to 20 attempts with exponential backoff). See CLI Reference for all options. Note that the CLI is not a stable interface — for scripting and automation, use the Python API directly.
Next steps¶
- Bulk Downloads — parallel downloads, interruption handling, and destination file safety
- HTTP Client Adapters — integrate with your existing auth, session pooling, and proxy configuration
- Retry Patterns — advanced retry strategies with tenacity and backoff
- Async Usage —
haul_async()with TaskGroup and semaphore-based concurrency