API Reference

Download functions

pyhaul.engine.haul(url: str, client: TransportSession | object, *, dest: str | Path, headers: Mapping[str, str] | None = None, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul

Download a single byte range to dest, resumably.

client is your HTTP session — requests.Session, httpx.Client, niquests.Session, or urllib3.PoolManager.

url is validated on entry; invalid schemes or missing hosts raise :class:ValueError.

headers, when provided, are merged with pyhaul's structural requirements.

state, when provided, is a :class:HaulState updated in-place throughout the download — always accurate regardless of how the function exits.

on_progress, if set, is called after each chunk is written with the current state (synchronous; keep it fast — e.g. progress UI, metrics).

Returns :class:CompleteHaul on success. Raises :class:PartialHaulError when the stream ends before all bytes arrive (the .part and .part.ctrl files remain on disk for the next call to resume from). Transport errors from the underlying HTTP library propagate unwrapped.
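Because the .part and .part.ctrl files stay on disk after a partial download, calling the function again with the same dest is what resumes it. The retry loop below is a minimal sketch of that pattern, not part of pyhaul. The helper name haul_with_retries and its attempts and delay parameters are hypothetical. The download function and the exception class are passed in as arguments so the snippet runs without pyhaul installed; in real use they would be pyhaul.engine.haul and PartialHaulError (whose import path is not shown in this reference).

```python
import time
from typing import Any, Callable, Type


def haul_with_retries(
    haul_fn: Callable[..., Any],
    partial_error: Type[BaseException],
    url: str,
    client: object,
    dest: str,
    *,
    attempts: int = 3,
    delay: float = 0.0,
) -> Any:
    """Call haul_fn again after each partial download, up to `attempts` times.

    Only `partial_error` is retried. Any other exception (for example a
    transport error) propagates to the caller on the first occurrence.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # Same url and dest on every call, so the next call can pick up
            # from the partial files left by the previous one.
            return haul_fn(url, client, dest=dest)
        except partial_error:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Transport errors are deliberately not caught here; whether to retry those depends on the HTTP library in use.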

pyhaul.async_engine.haul_async(url: str, client: AsyncTransportSession | object, *, dest: str | Path, headers: Mapping[str, str] | None = None, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: AsyncProgressCallback | None = None) -> CompleteHaul async

Async equivalent of :func:pyhaul.engine.haul.

client is your async HTTP session — httpx.AsyncClient, niquests.AsyncSession, or aiohttp.ClientSession.

url is validated on entry; invalid schemes or missing hosts raise :class:ValueError.

headers, when provided, are merged with pyhaul's structural requirements.

state, when provided, is a :class:HaulState updated in-place throughout the download — always accurate regardless of how the function exits.

on_progress, if set, is called after each chunk. It may be an ordinary synchronous callable or return an awaitable (for example an async def); when the return value is awaitable, it is awaited before the next chunk is read. :func:pyhaul.engine.haul accepts only synchronous callbacks.

Returns :class:CompleteHaul on success. Raises :class:PartialHaulError when the stream ends before all bytes arrive (the .part and .part.ctrl files remain on disk for the next call to resume from). Transport errors from the underlying HTTP library propagate unwrapped.
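The on_progress rule for haul_async (call the hook, then await the result only if it is awaitable) can be sketched with the standard library alone. This illustrates the dispatch pattern and is not pyhaul's actual implementation; notify, sync_hook, async_hook, and the dict used in place of a HaulState are all hypothetical names for this example.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, List, Optional, Tuple, Union

# Stand-in for AsyncProgressCallback; a plain dict replaces HaulState here.
ProgressHook = Callable[[dict], Union[None, Awaitable[None]]]


async def notify(hook: Optional[ProgressHook], state: dict) -> None:
    """Call the hook and await its result only when it is awaitable."""
    if hook is None:
        return
    result = hook(state)
    if inspect.isawaitable(result):
        await result


seen: List[Tuple[str, int]] = []


def sync_hook(state: dict) -> None:
    seen.append(("sync", state["bytes_read"]))


async def async_hook(state: dict) -> None:
    await asyncio.sleep(0)  # yield to the event loop once
    seen.append(("async", state["bytes_read"]))


async def demo() -> None:
    # Simulate two chunks; each hook finishes before the next chunk is "read".
    for bytes_read in (1024, 2048):
        state = {"bytes_read": bytes_read}
        await notify(sync_hook, state)
        await notify(async_hook, state)


asyncio.run(demo())
```

Since an awaitable hook is awaited before the next chunk is read, a slow hook delays the download in the same way a slow synchronous one would.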

Types

pyhaul._types.AsyncProgressCallback = Callable[[HaulState], None | Awaitable[None]] module-attribute

Progress hook for :func:~pyhaul.async_engine.haul_async.

May be an ordinary function that returns None, or a callable that returns an awaitable (for example an async def function). The engine awaits the result when it is awaitable, so callers need not schedule an asyncio.create_task for each chunk.

pyhaul._types.CompleteHaul(*, elapsed: float, sha256: str, etag: ETag, content_type: str) dataclass

Returned by haul() and haul_async() on success.

Carries completion metadata that only exists once the file is done. Progress counters (bytes_read, valid_length) live in :class:HaulState, not here.

pyhaul._types.HaulState(is_complete: bool = False, bytes_read: int = 0, valid_length: int = 0, reported_length: int | None = None, block_size: int = 8 * 1024 * 1024, hashes: list[bytes] = list[bytes]()) dataclass

Mutable progress bag updated in-place by haul() / haul_async().

Pass an instance as the optional state parameter; the engine updates it throughout the download. After the call — whether it returned, raised :class:PartialHaulError, or let a transport exception fly — the bag reflects the state at the point of exit.

valid_length is not monotonic across attempts: a server response that invalidates prior progress rewinds the cursor to zero.

reported_length is the total size of the resource as claimed by the server, derived from the Content-Range (complete-length) or Content-Length headers. It may be None if the server omits the total, and may not match the final on-disk size if the server misbehaves. (Useful for progress UIs.)
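As a sketch of the progress-UI use mentioned above, the helper below formats a label from valid_length and reported_length and copes with a missing or inconsistent total. StateView is a hypothetical stand-in that copies only two of the documented HaulState fields so the example runs without pyhaul. progress_label is likewise a made-up name, and using valid_length as the numerator is an assumption about which counter suits a progress display.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateView:
    """Stand-in carrying the two HaulState fields used below."""

    valid_length: int = 0
    reported_length: Optional[int] = None


def progress_label(state: StateView) -> str:
    """Format progress, falling back to a byte count when no total is known."""
    total = state.reported_length
    if total is None or total <= 0:
        return f"{state.valid_length} bytes"
    # Clamp to 100% because the server-reported total may be wrong.
    fraction = min(state.valid_length / total, 1.0)
    return f"{fraction:.0%} of {total} bytes"
```

A function like this could be called from an on_progress hook, which receives the current state after each chunk.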

pyhaul._types.ServerMeta(*, etag: ETag = EMPTY_ETAG, total_length: int | None = None, last_modified: str = '', content_type: str = '') dataclass

Typed view of the response headers pyhaul cares about.

total_length is None when the server omits Content-Length or returns a non-integer value. Every string field is "" when the corresponding header is absent; is_file_changed treats missing last_modified on either side as "cannot prove unchanged".
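The total_length rule (None when Content-Length is absent or not an integer) might look roughly like the following. This is an illustrative sketch rather than pyhaul's parser: parse_total_length is a hypothetical name, the plain mapping stands in for the library's header type, and rejecting signed values such as "-1" is an assumption.

```python
from typing import Mapping, Optional


def parse_total_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if it is absent or malformed."""
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length")
    if raw is None:
        return None
    text = raw.strip()
    # Accept ASCII digits only; signs, decimals, and empty values give None.
    if not text or not text.isascii() or not text.isdigit():
        return None
    return int(text)
```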

pyhaul._types.Url = NewType('Url', str) module-attribute

An http(s) URL that has passed :func:parse_url.

pyhaul.etag.EntityTag(opaque_tag: str | None, is_weak: bool, is_wildcard: bool, raw_value: str) dataclass

Immutable HTTP entity-tag with explicit weak / wildcard discrimination.

When opaque_tag is None, the tag is absent (no validator). Otherwise opaque_tag is the opaque string inside the RFC grammar (possibly "" for ETag: "").

Use :meth:parse_header_value for wire values (ETag response header). Use :meth:from_canonical for pyhaul .ctrl TLV UTF-8 payloads.

Instances compare equal by (opaque_tag, is_weak, is_wildcard) only.

usable_for_byte_range_precondition: bool property

Strong, non-wildcard tag suitable for If-Range / strict 206 equality.

__post_init__() -> None

Validate invariants for wildcard vs absent vs weak-strong tags.

parse_header_value(raw: str) -> EntityTag classmethod

Parse an ETag / If-Match style field-value fragment.

Accepts RFC quoted-string and liberal unquoted tokens. A strong empty quoted opaque ("") is valid grammar and is preserved on the instance (including :attr:raw_value). W/"" yields :data:EMPTY_ETAG because weak validators require a non-empty opaque here. Malformed or whitespace-only input also yields :data:EMPTY_ETAG.

from_canonical(value: str) -> EntityTag classmethod

Rebuild from UTF-8 stored by :meth:to_canonical.

Expects quoted RFC entity-tag text for values written by current pyhaul. Leading / trailing ASCII OWS is stripped once (surrounding the whole blob). Legacy control files may hold bare tokens (no quotes); those still parse when they satisfy the same rules as :meth:parse_header_value.

__bool__() -> bool

True when this tag is present (wildcard or any opaque, including empty).

__eq__(other: object) -> bool

Equality by semantic triple; :attr:raw_value is ignored.

__hash__() -> int

Hash matches :meth:__eq__ (semantic triple only).

__repr__() -> str

Debug representation without :attr:raw_value (can be large).

__str__() -> str

Canonical HTTP entity-tag field-value (empty when absent).

strong_equals(other: object) -> bool

Strong comparison: both strong validators and opaque_tags byte-equal.

weak_equals(other: object) -> bool

Weak comparison: opaque_tags equal; weak/strong flags ignored (wildcards excluded).
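The difference between the two comparisons is easiest to see on a small stand-in. The sketch below follows the descriptions above and the comparison table in RFC 9110. Tag is a hypothetical NamedTuple, not pyhaul's EntityTag, and treating absent tags (opaque_tag is None) as never equal is an assumption of this example.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    """Stand-in for an entity-tag: opaque text plus weak/wildcard flags."""

    opaque_tag: Optional[str]
    is_weak: bool = False
    is_wildcard: bool = False


def _comparable(tag: Tag) -> bool:
    # Assumption: absent tags and wildcards never take part in comparisons.
    return tag.opaque_tag is not None and not tag.is_wildcard


def strong_equals(a: Tag, b: Tag) -> bool:
    """Both tags must be strong and their opaque text identical."""
    return (
        _comparable(a)
        and _comparable(b)
        and not a.is_weak
        and not b.is_weak
        and a.opaque_tag == b.opaque_tag
    )


def weak_equals(a: Tag, b: Tag) -> bool:
    """Opaque text must match; the weak/strong flags are ignored."""
    return _comparable(a) and _comparable(b) and a.opaque_tag == b.opaque_tag
```

For example, W/"1" and "1" are equal under the weak comparison but not under the strong one, which is why a weak tag cannot guard a byte-range resume.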

to_http_field_value() -> str

RFC entity-tag for outbound headers (If-Range, If-Match, …).

to_canonical() -> str

RFC-shaped entity-tag text for .part.ctrl TLV UTF-8 (or "" when absent).

Uses the same quoting as :meth:to_http_field_value so the opaque is always framed by quoted-string rules (escaping \ and " inside the opaque).
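The quoting rule described above (wrap the opaque in double quotes and backslash-escape any \ or " inside it) can be sketched in a few lines. quote_entity_tag is a hypothetical helper written for this illustration and is not pyhaul's serializer; the W/ prefix for weak tags follows the standard entity-tag syntax.

```python
def quote_entity_tag(opaque: str, *, weak: bool = False) -> str:
    """Frame an opaque tag as quoted entity-tag text."""
    # Escape backslashes first so the backslashes added for quotes
    # are not themselves escaped a second time.
    escaped = opaque.replace("\\", "\\\\").replace('"', '\\"')
    prefix = "W/" if weak else ""
    return f'{prefix}"{escaped}"'
```

The order of the two replace calls matters: escaping quotes first would double the backslashes that the quote escaping had just introduced.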

pyhaul.etag.EMPTY_ETAG: Final[EntityTag] = EntityTag(opaque_tag=None, is_weak=False, is_wildcard=False, raw_value='') module-attribute

TransportHeaders

The immutable header type used for normalized responses and for merged outbound requests inside adapters. See TransportHeaders for constructors and methods.

Utility functions

pyhaul._types.parse_url(raw: str) -> Url

Validate raw as an http(s) URL and brand it as :data:Url.

Empty / whitespace-only input returns :data:EMPTY_URL ("no URL yet") — callers that forbid that case (e.g. the CLI) must check separately. Non-empty input must have an http or https scheme and a host, or :class:ValueError is raised.
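The validation rules above map naturally onto urllib.parse. The function below is a rough sketch of that contract, not pyhaul's parse_url. check_http_url and EMPTY are hypothetical names, the empty string stands in for EMPTY_URL (whose real value is not shown here), and stripping surrounding whitespace from the returned value is an assumption.

```python
from urllib.parse import urlsplit

# Stand-in for EMPTY_URL; the real constant's value is an assumption.
EMPTY = ""


def check_http_url(raw: str) -> str:
    """Return the URL if it is http(s) with a host; raise ValueError otherwise."""
    text = raw.strip()
    if not text:
        # Empty input is not an error: it means "no URL yet".
        return EMPTY
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return text
```

Callers that must reject an empty URL compare the result against the empty sentinel themselves, as the description above notes for the CLI.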

pyhaul.etag.parse_etag(raw: object) -> EntityTag

Backward-compatible wrapper around :meth:EntityTag.parse_header_value.

Non-string raises :class:TypeError.

pyhaul.etag.format_entity_tag_for_http_header(etag: EntityTag) -> str

Serialize for merged request headers; absent tags yield "".

pyhaul.etag.is_weak_validator(etag: EntityTag) -> bool

True if etag is a weak validator (W/…).

Weak validators are not used for byte-range preconditioning in pyhaul.

pyhaul._types.HashBuilder(block_size: int, initial_hashes: list[bytes] | None = None)

Incremental SHA-256 block-level accumulator.

current_digest: bytes | None property

Return the digest of the current partial block, if any.

update(data: bytes) -> list[bytes]

Feed data. Returns any hashes completed during this update.

finalize() -> str

Finish the current block (if any) and return the tree hash.

hash_file(path: str | Path, block_size: int = 8 * 1024 * 1024) -> str staticmethod

Compute the tree hash of a complete file on disk.
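To make the block-level accumulation concrete, here is a small self-contained sketch. BlockHasher is a hypothetical class, not pyhaul's HashBuilder. In particular, the way finalize combines the block digests (SHA-256 over their concatenation) is an assumption chosen for illustration; this reference does not specify how pyhaul derives its tree hash.

```python
import hashlib
from typing import List


class BlockHasher:
    """Hash a byte stream in fixed-size blocks, one SHA-256 digest per block."""

    def __init__(self, block_size: int) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.hashes: List[bytes] = []
        self._current = hashlib.sha256()
        self._filled = 0  # bytes fed into the current block so far

    def update(self, data: bytes) -> List[bytes]:
        """Feed data; return the digests of blocks completed by this call."""
        completed: List[bytes] = []
        view = memoryview(data)
        while len(view) > 0:
            take = min(self.block_size - self._filled, len(view))
            self._current.update(view[:take])
            self._filled += take
            view = view[take:]
            if self._filled == self.block_size:
                completed.append(self._current.digest())
                self._current = hashlib.sha256()
                self._filled = 0
        self.hashes.extend(completed)
        return completed

    def finalize(self) -> str:
        """Close any partial block and combine all block digests."""
        if self._filled:
            self.hashes.append(self._current.digest())
            self._current = hashlib.sha256()
            self._filled = 0
        # Assumed combining rule: SHA-256 over the concatenated block digests.
        return hashlib.sha256(b"".join(self.hashes)).hexdigest()
```

Because blocks are cut by byte offset rather than by call boundaries, the result does not depend on how the stream was chunked. That property lets per-block digests saved from an earlier attempt be reused on resume.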

Adapter registration

pyhaul._session_dispatch.register_sync_adapter(factory: SyncAdapterFactory) -> None

Append a sync adapter factory.

factory is called with a raw client object and must return a :class:~pyhaul.transport.protocols.TransportSession or None. Factories are tried in registration order; the first non-None result wins.

pyhaul._session_dispatch.register_async_adapter(factory: AsyncAdapterFactory) -> None

Append an async adapter factory.

factory is called with a raw client object and must return an :class:~pyhaul.transport.protocols.AsyncTransportSession or None.
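The "first non-None result wins" rule amounts to a short loop over a list of factories. The registry below is a self-contained sketch of that dispatch pattern and does not use pyhaul's internals: register, resolve, FakeClient, and FakeAdapter are hypothetical names, and raising TypeError when no factory accepts the client is an assumption.

```python
from typing import Callable, List, Optional

AdapterFactory = Callable[[object], Optional[object]]

_factories: List[AdapterFactory] = []


def register(factory: AdapterFactory) -> None:
    """Append a factory; earlier registrations are tried first."""
    _factories.append(factory)


def resolve(client: object) -> object:
    """Return the first non-None adapter produced for this client."""
    for factory in _factories:
        adapter = factory(client)
        if adapter is not None:
            return adapter
    # Assumed behavior when every factory declines the client.
    raise TypeError(f"no adapter registered for {type(client).__name__}")


class FakeClient:
    """Placeholder for a third-party HTTP session object."""


class FakeAdapter:
    """Placeholder for an object implementing the transport protocol."""

    def __init__(self, client: FakeClient) -> None:
        self.client = client


def fake_factory(client: object) -> Optional[FakeAdapter]:
    # Return None for unknown client types so later factories get a turn.
    return FakeAdapter(client) if isinstance(client, FakeClient) else None


register(lambda client: None)  # a factory that declines every client
register(fake_factory)
```

A factory should return None rather than raise for clients it does not recognize; otherwise it would block factories registered after it.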

Transport protocols

pyhaul.transport.protocols.TransportSession

Bases: Protocol

Sync transport: prepare_headers, stream_get, and stream_head.

prepare_headers(headers: TransportHeaders) -> TransportHeaders

Optionally mutate headers before they are sent.

The engine calls this after merging user headers and pyhaul structural requirements. Adapters can use this to suppress headers (e.g. for conformance testing) or add session-specific metadata.

.. warning:: Removing structural headers like Range or If-Range may break pyhaul's resume logic.

stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]

Context manager yielding a response whose body must be streamed.

stream_head(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]

Context manager yielding a HEAD response (entity body must not be read).

pyhaul.transport.protocols.TransportResponse

Bases: Protocol

Read-only HTTP response: status, normalized headers, raw body iterator.

status_code: int property

HTTP status line code (e.g. 200, 206, 416).

headers: TransportHeaders property

Normalized response headers.

raise_for_status() -> None

Raise an error if the response status is a client or server error.

iter_raw_bytes(*, chunk_size: int) -> Iterator[bytes]

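Yield raw entity-body chunks (post-TE, pre-CE): transfer-encoding framing such as chunked has been removed, but content-encoding such as gzip has not been decoded.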

chunk_size is a hint; the last chunk may be shorter.
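For tests, or for a custom adapter, the response shape described above can be satisfied by a small in-memory object. The class below is a hypothetical sketch and not one of pyhaul's adapters. It uses a plain dict where the protocol expects TransportHeaders, and raising RuntimeError from raise_for_status is an arbitrary choice for this example.

```python
from typing import Dict, Iterator, Mapping


class InMemoryResponse:
    """Minimal response-like object serving a fixed body from memory."""

    def __init__(self, status_code: int, headers: Mapping[str, str], body: bytes) -> None:
        self._status_code = status_code
        self._headers = dict(headers)
        self._body = body

    @property
    def status_code(self) -> int:
        return self._status_code

    @property
    def headers(self) -> Dict[str, str]:
        # A plain dict stands in for the library's header type here.
        return self._headers

    def raise_for_status(self) -> None:
        if self._status_code >= 400:
            raise RuntimeError(f"HTTP {self._status_code}")

    def iter_raw_bytes(self, *, chunk_size: int) -> Iterator[bytes]:
        """Yield the body in slices of at most chunk_size bytes."""
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        for start in range(0, len(self._body), chunk_size):
            yield self._body[start:start + chunk_size]
```

With a 10-byte body and chunk_size=4, this yields chunks of 4, 4, and 2 bytes, which shows the shorter final chunk mentioned above.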

pyhaul.transport.protocols.AsyncTransportSession

Bases: Protocol

Async transport: prepare_headers, stream_get, and stream_head.

prepare_headers(headers: TransportHeaders) -> TransportHeaders

Optionally mutate headers before they are sent (see sync version).

stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]

Async context manager yielding a response whose body must be streamed.

stream_head(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]

Async context manager yielding a HEAD response.

pyhaul.transport.protocols.AsyncTransportResponse

Bases: Protocol

Async equivalent of :class:TransportResponse.

status_code: int property

HTTP status code of the response.

headers: TransportHeaders property

Normalized response headers.

raise_for_status() -> None

Raise an error if the response status is a client or server error.

aiter_raw_bytes(*, chunk_size: int) -> AsyncIterator[bytes]

Async version of :meth:TransportResponse.iter_raw_bytes.

Transport session proxy

Layer header policy around an existing adapter without subclassing each backend:

pyhaul.transport.proxy_transport_session.transport_session_proxy() -> TransportSessionProxyPlanning

Start a fluent chain for wrapping a synchronous :class:TransportSession.

pyhaul.transport.proxy_transport_session.async_transport_session_proxy() -> AsyncTransportSessionProxyPlanning

Start a fluent chain for wrapping an :class:AsyncTransportSession.