Skip to content

API Reference

Download functions

pyhaul.engine.haul(url: str, client: TransportSession | object, *, dest: str | Path, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul

Download a single byte range to dest, resumably.

client is your HTTP session — requests.Session, httpx.Client, niquests.Session, or urllib3.PoolManager.

url is validated on entry; invalid schemes or missing hosts raise :class:ValueError.

state, when provided, is a :class:HaulState updated in-place throughout the download — always accurate regardless of how the function exits.

on_progress, if set, is called after each chunk is written with the current state (synchronous; keep it fast — e.g. progress UI, metrics).

Returns :class:CompleteHaul on success. Raises :class:PartialHaulError when the stream ends before all bytes arrive (the .part and .part.ctrl files remain on disk for the next call to resume from). Transport errors from the underlying HTTP library propagate unwrapped.

pyhaul.async_engine.haul_async(url: str, client: AsyncTransportSession | object, *, dest: str | Path, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul async

Async equivalent of :func:pyhaul.engine.haul.

client is your async HTTP session — httpx.AsyncClient, niquests.AsyncSession, or aiohttp.ClientSession.

url is validated on entry; invalid schemes or missing hosts raise :class:ValueError.

state, when provided, is a :class:HaulState updated in-place throughout the download — always accurate regardless of how the function exits.

on_progress, if set, is called after each chunk (synchronous callback; same contract as :func:pyhaul.engine.haul).

Returns :class:CompleteHaul on success. Raises :class:PartialHaulError when the stream ends before all bytes arrive (the .part and .part.ctrl files remain on disk for the next call to resume from). Transport errors from the underlying HTTP library propagate unwrapped.

Types

pyhaul._types.CompleteHaul(*, elapsed: float, sha256: str, etag: ETag, content_type: str) dataclass

Returned by haul() on success.

Carries completion metadata that only exists once the file is done. Progress counters (bytes_read, valid_length) live in :class:HaulState, not here.

pyhaul._types.HaulState(is_complete: bool = False, bytes_read: int = 0, valid_length: int = 0, reported_length: int | None = None, block_size: int = 8 * 1024 * 1024, hashes: list[bytes] = list[bytes]()) dataclass

Mutable progress bag updated in-place by haul() / haul_async().

Pass an instance as the optional state parameter; the engine updates it throughout the download. After the call — whether it returned, raised :class:PartialHaulError, or let a transport exception fly — the bag reflects the state at the point of exit.

valid_length is not monotonic across attempts: a server response that invalidates prior progress rewinds the cursor to zero.

reported_length is the total size of the resource as claimed by the server, derived from the Content-Range (complete-length) or Content-Length headers. It may be None if the server omits the total, and may not match the final on-disk size if the server misbehaves. (Useful for progress UIs.)

pyhaul._types.ServerMeta(*, etag: ETag = EMPTY_ETAG, total_length: int | None = None, last_modified: str = '', content_type: str = '') dataclass

Typed view of the response headers pyhaul cares about.

total_length is None when the server omits Content-Length or returns a non-integer value. Every string field is "" when the corresponding header is absent; is_file_changed treats missing last_modified on either side as "cannot prove unchanged".

pyhaul._types.Url = NewType('Url', str) module-attribute

An http(s) URL that has passed :func:parse_url.

pyhaul._types.ETag = NewType('ETag', str) module-attribute

An HTTP ETag value (possibly empty) that has passed :func:parse_etag.

Stored verbatim as the server sent it — quotes and optional W/ prefix included — so that it can be echoed back in If-Range / If-Match byte-for-byte. An empty string means "no ETag for this resource".

pyhaul.transport._headers.TransportHeaders(items: Pairs = ())

Bases: Mapping[str, str]

Immutable, case-insensitive HTTP headers with multi-value support.

Field names are matched case-insensitively per HTTP semantics. Duplicate names are preserved in wire order.

Construct via build, from_pairs, or from_mapping.

Utility functions

pyhaul._types.parse_url(raw: str) -> Url

Validate raw as an http(s) URL and brand it as :data:Url.

Empty / whitespace-only input returns :data:EMPTY_URL ("no URL yet") — callers that forbid that case (e.g. the CLI) must check separately. Non-empty input must have an http or https scheme and a host, or :class:ValueError is raised.

pyhaul._types.parse_etag(raw: object) -> ETag

Normalize a raw ETag header value into an :data:ETag.

Strips surrounding whitespace. Empty input becomes ETag("") to represent "no ETag". Values that clearly aren't well-formed entity-tags (per RFC 7232: optional W/ then a quoted string) are also treated as absent, since echoing them back in If-Range / If-Match would be unsafe. Non-string input raises :class:TypeError.

pyhaul._types.HashBuilder(block_size: int, initial_hashes: list[bytes] | None = None)

Incremental SHA-256 block-level accumulator.

current_digest: bytes | None property

Return the digest of the current partial block, if any.

update(data: bytes) -> list[bytes]

Feed data. Returns any hashes completed during this update.

finalize() -> str

Finish the current block (if any) and return the tree hash.

hash_file(path: str | Path, block_size: int = 8 * 1024 * 1024) -> str staticmethod

Compute the tree hash of a complete file on disk.

Adapter registration

pyhaul._session_dispatch.register_sync_adapter(factory: SyncAdapterFactory) -> None

Append a sync adapter factory.

factory is called with a raw client object and must return a :class:~pyhaul.transport.protocols.TransportSession or None. Factories are tried in registration order; the first non-None result wins.

pyhaul._session_dispatch.register_async_adapter(factory: AsyncAdapterFactory) -> None

Append an async adapter factory.

factory is called with a raw client object and must return an :class:~pyhaul.transport.protocols.AsyncTransportSession or None.

Transport protocols

pyhaul.transport.protocols.TransportSession

Bases: Protocol

Sync session: one method (stream_get), no lifecycle, no mutation.

stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]

Context manager yielding a response whose body must be streamed.

pyhaul.transport.protocols.TransportResponse

Bases: Protocol

Read-only HTTP response: status, normalized headers, raw body iterator.

status_code: int property

HTTP status line code (e.g. 200, 206, 416).

headers: TransportHeaders property

Normalized response headers.

raise_for_status() -> None

Raise an error if the response status is a client or server error.

iter_raw_bytes(*, chunk_size: int) -> Iterator[bytes]

Yield raw entity-body chunks (post-TE, pre-CE).

chunk_size is a hint; the last chunk may be shorter.

pyhaul.transport.protocols.AsyncTransportSession

Bases: Protocol

Async equivalent of :class:TransportSession.

stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]

Async context manager yielding a response whose body must be streamed.

pyhaul.transport.protocols.AsyncTransportResponse

Bases: Protocol

Async equivalent of :class:TransportResponse.

status_code: int property

HTTP status line code (e.g. 200, 206, 416).

headers: TransportHeaders property

Normalized response headers.

raise_for_status() -> None

Raise an error if the response status is a client or server error.

aiter_raw_bytes(*, chunk_size: int) -> AsyncIterator[bytes]

Async version of :meth:TransportResponse.iter_raw_bytes.