API Reference¶
Download functions¶
pyhaul.engine.haul(url: str, client: TransportSession | object, *, dest: str | Path, headers: Mapping[str, str] | None = None, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: Callable[[HaulState], None] | None = None) -> CompleteHaul
¶
Download a single byte range to dest, resumably.
client is your HTTP session — requests.Session,
httpx.Client, niquests.Session, or urllib3.PoolManager.
url is validated on entry; invalid schemes or missing hosts raise
:class:ValueError.
headers, when provided, are merged with pyhaul's structural requirements.
state, when provided, is a :class:HaulState updated in-place
throughout the download — always accurate regardless of how the
function exits.
on_progress, if set, is called after each chunk is written with the current state (synchronous; keep it fast — e.g. progress UI, metrics).
Returns :class:CompleteHaul on success. Raises
:class:PartialHaulError when the stream ends before all bytes
arrive (the .part and .part.ctrl files remain on disk for
the next call to resume from). Transport errors from the
underlying HTTP library propagate unwrapped.
pyhaul.async_engine.haul_async(url: str, client: AsyncTransportSession | object, *, dest: str | Path, headers: Mapping[str, str] | None = None, state: HaulState | None = None, chunk_size: int = DEFAULT_CHUNK, flush_every: int = DEFAULT_FLUSH, on_progress: AsyncProgressCallback | None = None) -> CompleteHaul
async
¶
Async equivalent of :func:pyhaul.engine.haul.
client is your async HTTP session — httpx.AsyncClient,
niquests.AsyncSession, or aiohttp.ClientSession.
url is validated on entry; invalid schemes or missing hosts raise
:class:ValueError.
headers, when provided, are merged with pyhaul's structural requirements.
state, when provided, is a :class:HaulState updated in-place
throughout the download — always accurate regardless of how the
function exits.
on_progress, if set, is called after each chunk. It may be an
ordinary synchronous callable or return an awaitable (for example
an async def); when the return value is awaitable, it is
awaited before the next chunk is read. :func:pyhaul.engine.haul
accepts only synchronous callbacks.
Returns :class:CompleteHaul on success. Raises
:class:PartialHaulError when the stream ends before all bytes
arrive (the .part and .part.ctrl files remain on disk for
the next call to resume from). Transport errors from the
underlying HTTP library propagate unwrapped.
Types¶
pyhaul._types.AsyncProgressCallback = Callable[[HaulState], None | Awaitable[None]]
module-attribute
¶
Progress hook for :func:~pyhaul.async_engine.haul_async.
May be an ordinary function (returns None) or return an awaitable
(e.g. async def without awaiting it yourself); the engine awaits it
when applicable so callers need not asyncio.create_task each chunk.
pyhaul._types.CompleteHaul(*, elapsed: float, sha256: str, etag: ETag, content_type: str)
dataclass
¶
Returned by haul() on success.
Carries completion metadata that only exists once the file is done.
Progress counters (bytes_read, valid_length) live in
:class:HaulState, not here.
pyhaul._types.HaulState(is_complete: bool = False, bytes_read: int = 0, valid_length: int = 0, reported_length: int | None = None, block_size: int = 8 * 1024 * 1024, hashes: list[bytes] = list[bytes]())
dataclass
¶
Mutable progress bag updated in-place by haul() / haul_async().
Pass an instance as the optional state parameter; the engine
updates it throughout the download. After the call — whether it
returned, raised :class:PartialHaulError, or let a transport
exception fly — the bag reflects the state at the point of exit.
valid_length is not monotonic across attempts: a server
response that invalidates prior progress rewinds the cursor to
zero.
reported_length is the total size of the resource as claimed by
the server, derived from the Content-Range (complete-length) or
Content-Length headers. It may be None if the server omits
the total, and may not match the final on-disk size if the server
misbehaves. (Useful for progress UIs.)
pyhaul._types.ServerMeta(*, etag: ETag = EMPTY_ETAG, total_length: int | None = None, last_modified: str = '', content_type: str = '')
dataclass
¶
Typed view of the response headers pyhaul cares about.
total_length is None when the server omits Content-Length
or returns a non-integer value. Every string field is "" when
the corresponding header is absent; is_file_changed treats
missing last_modified on either side as "cannot prove unchanged".
pyhaul._types.Url = NewType('Url', str)
module-attribute
¶
An http(s) URL that has passed :func:parse_url.
pyhaul.etag.EntityTag(opaque_tag: str | None, is_weak: bool, is_wildcard: bool, raw_value: str)
dataclass
¶
Immutable HTTP entity-tag with explicit weak / wildcard discrimination.
opaque_tag is None means absent (no validator). Otherwise opaque_tag is the
opaque string inside the RFC grammar (possibly "" for ETag: "").
Use :meth:parse_header_value for wire values (ETag response header).
Use :meth:from_canonical for pyhaul .ctrl TLV UTF-8 payloads.
Instances compare equal by (opaque_tag, is_weak, is_wildcard) only.
usable_for_byte_range_precondition: bool
property
¶
Strong, non-wildcard tag suitable for If-Range / strict 206 equality.
__post_init__() -> None
¶
Validate invariants for wildcard vs absent vs weak-strong tags.
parse_header_value(raw: str) -> EntityTag
classmethod
¶
Parse an ETag / If-Match style field-value fragment.
Accepts RFC quoted-string and liberal unquoted tokens. A strong empty
quoted opaque ("") is valid grammar and preserved on the instance
(including :attr:raw_value). W/"" yields :data:EMPTY_ETAG because weak
validators require a non-empty opaque here. Malformed input yields
:data:EMPTY_ETAG. Whitespace-only → :data:EMPTY_ETAG.
from_canonical(value: str) -> EntityTag
classmethod
¶
Rebuild from UTF-8 stored by :meth:to_canonical.
Expects quoted RFC entity-tag text for values written by current pyhaul.
Leading / trailing ASCII OWS is stripped once (surrounding the whole blob).
Legacy control files may hold bare tokens (no quotes); those still parse when
they satisfy the same rules as :meth:parse_header_value.
__bool__() -> bool
¶
True when this tag is present (wildcard or any opaque, including empty).
__eq__(other: object) -> bool
¶
Equality by semantic triple; :attr:raw_value is ignored.
__hash__() -> int
¶
Hash matches :meth:__eq__ (semantic triple only).
__repr__() -> str
¶
Debug representation without :attr:raw_value (can be large).
__str__() -> str
¶
Canonical HTTP entity-tag field-value (empty when absent).
strong_equals(other: object) -> bool
¶
Strong comparison: both strong validators and opaque_tags byte-equal.
weak_equals(other: object) -> bool
¶
Weak comparison: opaque_tags equal; weak/strong flags ignored (wildcards excluded).
to_http_field_value() -> str
¶
RFC entity-tag for outbound headers (If-Range, If-Match, …).
to_canonical() -> str
¶
RFC-shaped entity-tag text for .part.ctrl TLV UTF-8 (or "" when absent).
Uses the same quoting as :meth:to_http_field_value so the opaque is always
framed by quoted-string rules (escaping \ and " inside the opaque).
pyhaul.etag.EMPTY_ETAG: Final[EntityTag] = EntityTag(opaque_tag=None, is_weak=False, is_wildcard=False, raw_value='')
module-attribute
¶
TransportHeaders¶
The immutable header type used for normalized responses and for merged outbound requests inside adapters. See TransportHeaders for constructors and methods.
Utility functions¶
pyhaul._types.parse_url(raw: str) -> Url
¶
Validate raw as an http(s) URL and brand it as :data:Url.
Empty / whitespace-only input returns :data:EMPTY_URL ("no URL
yet") — callers that forbid that case (e.g. the CLI) must check
separately. Non-empty input must have an http or https
scheme and a host, or :class:ValueError is raised.
pyhaul.etag.parse_etag(raw: object) -> EntityTag
¶
Backward-compatible wrapper around :meth:EntityTag.parse_header_value.
Non-string raises :class:TypeError.
pyhaul.etag.format_entity_tag_for_http_header(etag: EntityTag) -> str
¶
Serialize for merged request headers; absent tags yield "".
pyhaul.etag.is_weak_validator(etag: EntityTag) -> bool
¶
True if etag is a weak validator (W/…).
Weak validators are not used for byte-range preconditioning in pyhaul.
pyhaul._types.HashBuilder(block_size: int, initial_hashes: list[bytes] | None = None)
¶
Incremental SHA-256 block-level accumulator.
current_digest: bytes | None
property
¶
Return the digest of the current partial block, if any.
update(data: bytes) -> list[bytes]
¶
Feed data. Returns any hashes completed during this update.
finalize() -> str
¶
Finish the current block (if any) and return the tree hash.
hash_file(path: str | Path, block_size: int = 8 * 1024 * 1024) -> str
staticmethod
¶
Compute the tree hash of a complete file on disk.
Adapter registration¶
pyhaul._session_dispatch.register_sync_adapter(factory: SyncAdapterFactory) -> None
¶
Append a sync adapter factory.
factory is called with a raw client object and must return a
:class:~pyhaul.transport.protocols.TransportSession or None.
Factories are tried in registration order; the first non-None
result wins.
pyhaul._session_dispatch.register_async_adapter(factory: AsyncAdapterFactory) -> None
¶
Append an async adapter factory.
factory is called with a raw client object and must return an
:class:~pyhaul.transport.protocols.AsyncTransportSession or None.
Transport protocols¶
pyhaul.transport.protocols.TransportSession
¶
Bases: Protocol
Sync transport: prepare_headers, stream_get, and stream_head.
prepare_headers(headers: TransportHeaders) -> TransportHeaders
¶
Optionally mutate headers before they are sent.
The engine calls this after merging user headers and pyhaul structural requirements. Adapters can use this to suppress headers (e.g. for conformance testing) or add session-specific metadata.
.. warning::
Removing structural headers like Range or If-Range may
break pyhaul's resume logic.
stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]
¶
Context manager yielding a response whose body must be streamed.
stream_head(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractContextManager[TransportResponse]
¶
Context manager yielding a HEAD response (entity body must not be read).
pyhaul.transport.protocols.TransportResponse
¶
Bases: Protocol
Read-only HTTP response: status, normalized headers, raw body iterator.
status_code: int
property
¶
HTTP status line code (e.g. 200, 206, 416).
headers: TransportHeaders
property
¶
Normalized response headers.
raise_for_status() -> None
¶
Raise an error if the response status is a client or server error.
iter_raw_bytes(*, chunk_size: int) -> Iterator[bytes]
¶
Yield raw entity-body chunks (post-TE, pre-CE).
chunk_size is a hint; the last chunk may be shorter.
pyhaul.transport.protocols.AsyncTransportSession
¶
Bases: Protocol
Async transport: prepare_headers, stream_get, and stream_head.
prepare_headers(headers: TransportHeaders) -> TransportHeaders
¶
Optionally mutate headers before they are sent (see sync version).
stream_get(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]
¶
Async context manager yielding a response whose body must be streamed.
stream_head(url: Url, *, headers: Mapping[str, str], options: TransportRequestOptions | None = None) -> AbstractAsyncContextManager[AsyncTransportResponse]
¶
Async context manager yielding a HEAD response.
pyhaul.transport.protocols.AsyncTransportResponse
¶
Bases: Protocol
Async equivalent of :class:TransportResponse.
status_code: int
property
¶
HTTP status code of the response.
headers: TransportHeaders
property
¶
Normalized response headers.
raise_for_status() -> None
¶
Raise an error if the response status is a client or server error.
aiter_raw_bytes(*, chunk_size: int) -> AsyncIterator[bytes]
¶
Async version of :meth:TransportResponse.iter_raw_bytes.
Transport session proxy¶
Layer header policy around an existing adapter without subclassing each backend:
pyhaul.transport.proxy_transport_session.transport_session_proxy() -> TransportSessionProxyPlanning
¶
Start a fluent chain for wrapping a synchronous :class:TransportSession.
pyhaul.transport.proxy_transport_session.async_transport_session_proxy() -> AsyncTransportSessionProxyPlanning
¶
Start a fluent chain for wrapping an :class:AsyncTransportSession.