API Reference
Download functions
pyhaul.engine.haul(url: str, client: TransportSession | object, *, dest: str | Path, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul
Download a single byte range to dest, resumably.
client is your HTTP session — requests.Session,
httpx.Client, niquests.Session, or urllib3.PoolManager.
url is validated on entry; an invalid scheme or a missing host raises
ValueError.
state, when provided, is a HaulState updated in place
throughout the download, so it stays accurate however the
function exits.
on_progress, if set, is called synchronously with the current state after each chunk is written, so keep it fast (e.g. a progress UI update or a metrics counter).
Returns CompleteHaul on success. Raises
PartialHaulError when the stream ends before all bytes
arrive (the .part and .part.ctrl files remain on disk for
the next call to resume from). Transport errors from the
underlying HTTP library propagate unwrapped.
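Because a partial download leaves its .part and .part.ctrl files behind, calling the function again with the same url and dest is enough to resume. The following is a minimal sketch of such a retry loop, not part of pyhaul itself. The helper name haul_with_retries, its parameters, and the linear backoff are assumptions; the download function and the partial-error class are passed in, so the sketch makes no claim about where PartialHaulError is imported from.

```python
import time
from typing import Any, Callable, Type


def haul_with_retries(
    haul_fn: Callable[..., Any],
    partial_error: Type[BaseException],
    url: str,
    client: object,
    dest: str,
    *,
    max_attempts: int = 5,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call haul_fn until it returns or the attempt budget runs out.

    Hypothetical helper: haul_fn stands in for pyhaul.engine.haul and
    partial_error for PartialHaulError.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # Same url and dest on every attempt, so the engine can pick up
            # from the files left on disk by the previous attempt.
            return haul_fn(url, client, dest=dest)
        except partial_error:
            if attempt == max_attempts:
                raise  # budget exhausted: let the caller see the error
            sleep(backoff_seconds * attempt)  # linear backoff, arbitrary choice
    raise AssertionError("unreachable")
```

Only the partial-download error is retried here; transport errors still propagate to the caller, matching the documented behaviour.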
pyhaul.async_engine.haul_async(url: str, client: AsyncTransportSession | object, *, dest: str | Path, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul
async
Async equivalent of pyhaul.engine.haul.
client is your async HTTP session: httpx.AsyncClient,
niquests.AsyncSession, or aiohttp.ClientSession.
url is validated on entry; an invalid scheme or a missing host raises
ValueError.
state, when provided, is a HaulState updated in place
throughout the download, so it stays accurate however the
function exits.
on_progress, if set, is called after each chunk is written. The callback
is synchronous and follows the same contract as in pyhaul.engine.haul.
Returns CompleteHaul on success. Raises
PartialHaulError when the stream ends before all bytes
arrive (the .part and .part.ctrl files remain on disk for
the next call to resume from). Transport errors from the
underlying HTTP library propagate unwrapped.
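One reason to prefer the async entry point is running several downloads at once. The sketch below shows one way to bound that concurrency with a semaphore. The helper haul_many, its limit parameter, and the rule for deriving a file name from the URL are all assumptions for illustration; haul_fn stands in for pyhaul.async_engine.haul_async and is injected rather than imported.

```python
import asyncio
from pathlib import Path, PurePosixPath
from typing import Any, Awaitable, Callable, List
from urllib.parse import urlsplit


async def haul_many(
    haul_fn: Callable[..., Awaitable[Any]],
    client: object,
    urls: List[str],
    dest_dir: str,
    *,
    limit: int = 4,
) -> List[Any]:
    """Run at most `limit` downloads at once; results follow `urls` order."""
    gate = asyncio.Semaphore(limit)

    async def one(url: str) -> Any:
        # Hypothetical naming rule: last path segment, or a fixed fallback.
        name = PurePosixPath(urlsplit(url).path).name or "download"
        async with gate:
            return await haul_fn(url, client, dest=Path(dest_dir) / name)

    # return_exceptions=True hands failures back in the result list
    # instead of raising the first one.
    return await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)
```

Each entry in the returned list is either the value returned by haul_fn or the exception that download raised, so the caller can decide per URL whether to retry.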
Types
pyhaul._types.CompleteHaul(*, elapsed: float, sha256: str, etag: ETag, content_type: str)
dataclass
Returned by haul() and haul_async() on success.
Carries completion metadata that only exists once the file is done.
Progress counters (bytes_read, valid_length) live in
HaulState, not here.
pyhaul._types.HaulState(is_complete: bool = False, bytes_read: int = 0, valid_length: int = 0, reported_length: int | None = None, block_size: int = 8 * 1024 * 1024, hashes: list[bytes] = list[bytes]())
dataclass
Mutable progress bag updated in-place by haul() / haul_async().
Pass an instance as the optional state parameter; the engine
updates it throughout the download. After the call, whether it
returned, raised PartialHaulError, or let a transport
exception propagate, the bag reflects the state at the point of exit.
valid_length is not monotonic across attempts: a server
response that invalidates prior progress rewinds the cursor to
zero.
reported_length is the total size of the resource as claimed by
the server, derived from the Content-Range (complete-length) or
Content-Length headers. It may be None if the server omits
the total, and may not match the final on-disk size if the server
misbehaves. (Useful for progress UIs.)
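As an illustration of the progress-UI use case, here is a small sketch of a formatter that could be called from an on_progress callback. It relies only on the valid_length and reported_length fields described above; the function name format_progress is hypothetical, and treating valid_length (rather than bytes_read) as the position in the file is an assumption.

```python
from types import SimpleNamespace


def format_progress(state) -> str:
    """Render one progress line from valid_length and reported_length."""
    done = state.valid_length
    total = state.reported_length
    if total is None or total <= 0:
        # The server did not report a usable total: show a plain count.
        return f"{done} bytes"
    # Clamp at 100%: reported_length is only the server's claim.
    pct = min(100.0, 100.0 * done / total)
    return f"{done}/{total} bytes ({pct:.1f}%)"


# Stand-in object with the same two attributes as a HaulState.
example = SimpleNamespace(valid_length=50, reported_length=200)
line = format_progress(example)  # "50/200 bytes (25.0%)"
```

Because the callback is synchronous, a formatter like this should do no I/O beyond a cheap write to the terminal or a counter update.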
pyhaul._types.ServerMeta(*, etag: ETag = EMPTY_ETAG, total_length: int | None = None, last_modified: str = '', content_type: str = '')
dataclass
Typed view of the response headers pyhaul cares about.
total_length is None when the server omits Content-Length
or returns a non-integer value. Every string field is "" when
the corresponding header is absent; is_file_changed treats
missing last_modified on either side as "cannot prove unchanged".
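The total_length rule can be pictured with a short sketch. This is not pyhaul's parser; parse_total_length is a hypothetical helper that reproduces only the documented outcome: an integer when Content-Length is a plain non-negative integer, None otherwise.

```python
from typing import Mapping, Optional


def parse_total_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if absent or not an integer."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length", "").strip()
    # ASCII digits only: rejects signs, decimals, and the empty string.
    if raw.isascii() and raw.isdigit():
        return int(raw)
    return None
```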
pyhaul._types.Url = NewType('Url', str)
module-attribute
An http(s) URL that has passed parse_url.
pyhaul._types.ETag = NewType('ETag', str)
module-attribute
An HTTP ETag value (possibly empty) that has passed parse_etag.
Stored verbatim as the server sent it — quotes and optional W/ prefix
included — so that it can be echoed back in If-Range / If-Match
byte-for-byte. An empty string means "no ETag for this resource".
pyhaul.transport._headers.TransportHeaders(items: Pairs = ())
Bases: Mapping[str, str]
Immutable, case-insensitive HTTP headers with multi-value support.
Field names are matched case-insensitively per HTTP semantics. Duplicate names are preserved in wire order.
Construct via build, from_pairs, or from_mapping.
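To make the case-insensitive, multi-value behaviour concrete, here is a self-contained sketch of a mapping with those two properties. It is not the TransportHeaders implementation: the class name CIHeaders, the get_all method, and the choice to comma-join duplicates on lookup are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Mapping, Tuple


class CIHeaders(Mapping[str, str]):
    """Read-only headers: case-insensitive names, duplicates kept in order."""

    def __init__(self, pairs: Iterable[Tuple[str, str]] = ()):
        self._pairs = tuple((str(k), str(v)) for k, v in pairs)

    def get_all(self, name: str) -> List[str]:
        """Every value sent under `name`, in wire order."""
        key = name.lower()
        return [v for k, v in self._pairs if k.lower() == key]

    def __getitem__(self, name: str) -> str:
        values = self.get_all(name)
        if not values:
            raise KeyError(name)
        return ", ".join(values)  # assumption: duplicates are comma-joined

    def __iter__(self) -> Iterator[str]:
        seen = set()
        for k, _ in self._pairs:
            if k.lower() not in seen:
                seen.add(k.lower())
                yield k.lower()

    def __len__(self) -> int:
        return len({k.lower() for k, _ in self._pairs})
```

Storing the pairs in a tuple and exposing no mutating methods is what keeps the sketch immutable; lookups scan the pairs linearly, which is adequate for the handful of headers on a typical response.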
Utility functions
pyhaul._types.parse_url(raw: str) -> Url
Validate raw as an http(s) URL and brand it as Url.
Empty or whitespace-only input returns EMPTY_URL ("no URL
yet"); callers that forbid that case (e.g. the CLI) must check
separately. Non-empty input must have an http or https
scheme and a host, or ValueError is raised.
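The validation rules above can be approximated with the standard library. The sketch below is not pyhaul's parse_url; check_url and the EMPTY constant are hypothetical stand-ins, and stripping surrounding whitespace from non-empty input is an assumption.

```python
from urllib.parse import urlsplit

EMPTY = ""  # stand-in for the documented EMPTY_URL sentinel


def check_url(raw: str) -> str:
    """Return the URL if it is http(s) with a host, EMPTY for blank input."""
    text = raw.strip()
    if not text:
        return EMPTY  # "no URL yet": callers that forbid this check separately
    parts = urlsplit(text)  # the scheme comes back lower-cased
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return text
```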
pyhaul._types.parse_etag(raw: object) -> ETag
Normalize a raw ETag header value into an ETag.
Strips surrounding whitespace. Empty input becomes ETag("") to
represent "no ETag". Values that clearly aren't well-formed
entity-tags (per RFC 7232: optional W/ then a quoted string) are
also treated as absent, since echoing them back in If-Range /
If-Match would be unsafe. Non-string input raises
TypeError.
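A sketch of the normalization described above, using the entity-tag grammar from RFC 7232. The function name normalize_etag is hypothetical and the regular expression is one reading of the grammar, not necessarily the check pyhaul performs.

```python
import re

# entity-tag = [ "W/" ] DQUOTE *etagc DQUOTE
# etagc      = %x21 / %x23-7E / obs-text (%x80-FF)
_ENTITY_TAG = re.compile(r'(?:W/)?"[\x21\x23-\x7e\x80-\xff]*"')


def normalize_etag(raw: object) -> str:
    """Return a well-formed entity-tag verbatim, or "" for "no ETag"."""
    if not isinstance(raw, str):
        raise TypeError(f"ETag must be str, got {type(raw).__name__}")
    text = raw.strip()
    if _ENTITY_TAG.fullmatch(text):
        return text  # kept byte-for-byte, quotes and W/ prefix included
    return ""  # empty or malformed input is treated as absent
```

Returning the tag unchanged matters because If-Range and If-Match compare the value the client sends against the server's own, so any rewriting of quotes or the weak prefix would break the match.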
pyhaul._types.HashBuilder(block_size: int, initial_hashes: list[bytes] | None = None)
Incremental SHA-256 block-level accumulator.
current_digest: bytes | None
property
Return the digest of the current partial block, if any.
update(data: bytes) -> list[bytes]
Feed data. Returns any hashes completed during this update.
finalize() -> str
Finish the current block (if any) and return the tree hash.
hash_file(path: str | Path, block_size: int = 8 * 1024 * 1024) -> str
staticmethod
Compute the tree hash of a complete file on disk.
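The block-level accumulation can be sketched as follows. This is an illustration, not the HashBuilder source: the class name BlockHasher is hypothetical, and the way block digests are combined into the final value (SHA-256 over the concatenated digests) is an assumption, since the reference above does not specify the tree-hash rule.

```python
import hashlib
from typing import List, Optional


class BlockHasher:
    """Hash a byte stream in fixed-size blocks, one SHA-256 digest per block."""

    def __init__(self, block_size: int, initial_hashes: Optional[List[bytes]] = None):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.hashes = list(initial_hashes or [])  # digests of finished blocks
        self._current = hashlib.sha256()
        self._filled = 0  # bytes fed into the current partial block

    @property
    def current_digest(self) -> Optional[bytes]:
        return self._current.digest() if self._filled else None

    def update(self, data: bytes) -> List[bytes]:
        completed = []
        view = memoryview(data)
        while len(view):
            take = min(self.block_size - self._filled, len(view))
            self._current.update(view[:take])
            self._filled += take
            view = view[take:]
            if self._filled == self.block_size:
                completed.append(self._current.digest())
                self._current, self._filled = hashlib.sha256(), 0
        self.hashes.extend(completed)
        return completed

    def finalize(self) -> str:
        if self._filled:  # close the trailing partial block
            self.hashes.append(self._current.digest())
            self._current, self._filled = hashlib.sha256(), 0
        # Assumed combination rule: SHA-256 over the concatenated digests.
        return hashlib.sha256(b"".join(self.hashes)).hexdigest()
```

Under this reading, initial_hashes lets a resumed download seed the accumulator with the digests of blocks already on disk, so only the new bytes need to be hashed.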
Adapter registration
pyhaul._session_dispatch.register_sync_adapter(factory: SyncAdapterFactory) -> None
Append a sync adapter factory.
factory is called with a raw client object and must return a
pyhaul.transport.protocols.TransportSession or None.
Factories are tried in registration order; the first non-None
result wins.
pyhaul._session_dispatch.register_async_adapter(factory: AsyncAdapterFactory) -> None
Append an async adapter factory.
factory is called with a raw client object and must return an
pyhaul.transport.protocols.AsyncTransportSession or None.
Transport protocols
pyhaul.transport.protocols.TransportSession
Bases: Protocol
Sync session: one method (stream_get), no lifecycle, no mutation.
stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]
Context manager yielding a response whose body must be streamed.
pyhaul.transport.protocols.TransportResponse
Bases: Protocol
Read-only HTTP response: status, normalized headers, raw body iterator.
status_code: int
property
HTTP status line code (e.g. 200, 206, 416).
headers: TransportHeaders
property
Normalized response headers.
raise_for_status() -> None
Raise an error if the response status is a client or server error.
iter_raw_bytes(*, chunk_size: int) -> Iterator[bytes]
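Yield raw entity-body chunks (post-TE, pre-CE: after any Transfer-Encoding
such as chunked is removed, before any Content-Encoding such as gzip is decoded).
chunk_size is a hint; the last chunk may be shorter.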
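Since both protocols are structural, any object with the right members can serve as a session. The sketch below is a minimal in-memory pair that could back a unit test. MemorySession and MemoryResponse are hypothetical names; the sketch uses a plain dict where the documented type is TransportHeaders, raises RuntimeError as a placeholder error, and ignores any Range header by always answering 200 with the full body.

```python
from contextlib import contextmanager
from typing import Iterator, Mapping, Optional


class MemoryResponse:
    def __init__(self, status_code: int, headers: Mapping[str, str], body: bytes):
        self._status = status_code
        self._headers = dict(headers)
        self._body = body

    @property
    def status_code(self) -> int:
        return self._status

    @property
    def headers(self) -> Mapping[str, str]:
        return self._headers  # plain dict, a simplification for the sketch

    def raise_for_status(self) -> None:
        if self._status >= 400:
            raise RuntimeError(f"HTTP {self._status}")  # placeholder error type

    def iter_raw_bytes(self, *, chunk_size: int) -> Iterator[bytes]:
        # Slice the stored body; only the last chunk can be short.
        for start in range(0, len(self._body), chunk_size):
            yield self._body[start:start + chunk_size]


class MemorySession:
    def __init__(self, body: bytes):
        self._body = body

    @contextmanager
    def stream_get(
        self,
        url: str,
        *,
        headers: Mapping[str, str],
        options: Optional[object] = None,
    ) -> Iterator[MemoryResponse]:
        # Request headers are ignored: always the whole body with status 200.
        yield MemoryResponse(200, {"Content-Length": str(len(self._body))}, self._body)
```

Whether such an object is accepted directly as the client argument depends on how the engine dispatches sessions, which this reference does not spell out.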
pyhaul.transport.protocols.AsyncTransportSession
Bases: Protocol
Async equivalent of TransportSession.
stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]
Async context manager yielding a response whose body must be streamed.
pyhaul.transport.protocols.AsyncTransportResponse
Bases: Protocol
Async equivalent of TransportResponse.
status_code: int
property
HTTP status line code (e.g. 200, 206, 416).
headers: TransportHeaders
property
Normalized response headers.
raise_for_status() -> None
Raise an error if the response status is a client or server error.
aiter_raw_bytes(*, chunk_size: int) -> AsyncIterator[bytes]
Async version of TransportResponse.iter_raw_bytes.
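The async protocols have the same shape, with an async context manager and an async iterator in place of the sync ones. Below is a minimal in-memory sketch. AsyncMemorySession, AsyncMemoryResponse, and read_all are hypothetical names, headers are a plain dict rather than TransportHeaders, and RuntimeError is a placeholder error type.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Mapping, Optional


class AsyncMemoryResponse:
    def __init__(self, status_code: int, body: bytes):
        self._status = status_code
        self._body = body

    @property
    def status_code(self) -> int:
        return self._status

    @property
    def headers(self) -> Mapping[str, str]:
        return {"Content-Length": str(len(self._body))}

    def raise_for_status(self) -> None:
        if self._status >= 400:
            raise RuntimeError(f"HTTP {self._status}")  # placeholder error type

    async def aiter_raw_bytes(self, *, chunk_size: int) -> AsyncIterator[bytes]:
        for start in range(0, len(self._body), chunk_size):
            await asyncio.sleep(0)  # yield to the event loop between chunks
            yield self._body[start:start + chunk_size]


class AsyncMemorySession:
    def __init__(self, body: bytes):
        self._body = body

    @asynccontextmanager
    async def stream_get(
        self,
        url: str,
        *,
        headers: Mapping[str, str],
        options: Optional[object] = None,
    ) -> AsyncIterator[AsyncMemoryResponse]:
        yield AsyncMemoryResponse(200, self._body)


async def read_all(session: AsyncMemorySession, url: str, chunk_size: int) -> bytes:
    """Stream the whole body through the protocol methods and join it."""
    async with session.stream_get(url, headers={}) as response:
        response.raise_for_status()
        chunks = [c async for c in response.aiter_raw_bytes(chunk_size=chunk_size)]
    return b"".join(chunks)
```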