Skip to content

API reference

Auto-extracted from yarlpattern's docstrings by mkdocstrings. Items are grouped by audience: the primary public API, the escape helpers most callers eventually need, and the lower-level building blocks reserved for advanced use.

Primary API

The 95% surface: one class, one result type, one tuple of component names.

yarlpattern.URLPattern(input: URLPatternInit | str | None = None, base_url_or_options: str | Mapping[str, Any] | None = None, options: Mapping[str, Any] | None = None, /, *, engine: RegexEngine | None = None)

A compiled WHATWG URL pattern.

Construction accepts either a :class:URLPatternInit dict or a constructor string (e.g. "https://example.com/:foo") plus optional base URL and options. Each URL component has an independent parser-options bundle (delimiter / prefix code points) defined in the spec; components not present in the input default to the "*" pattern (matches anything).

The compiled regex matcher behind every component is produced by a regex engine selected via :mod:yarlpattern._regex_engine. The default engine auto-picks Matthew Barnett's regex package when importable (handles JS v-flag set operations like [a&&b] correctly) and falls back to stdlib :mod:re otherwise. Pass engine=... to override per-instance — useful for benchmarking or for shipping a custom backend.

has_regexp_groups: bool property

Whether any compiled component contains a custom regex body.

Spec definition: true iff any component's part list has a part whose type is regexp and whose value is not the segment-wildcard or full-wildcard regexp value. The per-component bool was set at compile time from the parts list; aggregating here is a tight any() over the matchers (returns on first hit, no allocations).

test(input: URLPatternInput | None = None, base_url: str | _YarlURL | None = None) -> bool

Return True iff input matches every component pattern.

Accepts a URL string, a per-component dict, or a :class:yarl.URL. A bare yarl URL is the fast path: its already-parsed components are read directly, skipping the str-form reparse — ideal for pat.test(request.url) in aiohttp / yarl-based applications.

Per the WHATWG spec, exceptions raised while processing the input (URL parsing failures, invalid scheme characters, malformed IDNA, etc.) are caught and surface as a "no match". Exceptions during pattern construction still propagate; only the input-side processing is forgiving.

exec(input: URLPatternInput | None = None, base_url: str | _YarlURL | None = None) -> URLPatternResult | None

Return a :class:URLPatternResult for input or None on miss.

Accepts the same input shapes as :meth:test. A bare :class:yarl.URL is the fast path — its already-parsed components are read directly.

Like :meth:test, exceptions during input processing surface as None (no match) rather than propagating to the caller.

compare_component(component: str, left: URLPattern, right: URLPattern) -> int staticmethod

Three-way ordering of two patterns along a single component.

Returns -1 / 0 / 1 per the (tentative) WHATWG URLPattern.compareComponent specification. The ordering ranks patterns by specificity: a fully literal component outranks one with a regex group, which outranks a segment wildcard, which outranks a full wildcard. Within a type, modifiers further refine (mandatory > one-or-more > optional > zero-or-more), and within a (type, modifier) the prefix / value / suffix strings are compared lexicographically. Names are not part of the order.

The comparison runs on the parsed part list, not the user's raw input — so /foo/{bar}/baz and /foo/bar/baz compare equal even though the surface strings differ. An entirely empty component is treated as *.

Raises :class:TypeError if component is not one of the eight spec-defined names.

generate(component: str, groups: Mapping[str, str] | None = None) -> str

Produce the URL-component string that this pattern would have matched.

generate reverses :meth:exec: given a component name (one of the eight WHATWG components) and a mapping of named-group values, emit the canonical-form string that, fed back through :meth:exec, would yield the same groups.

This is a tentative spec feature — see urlpattern-generate.tentative.any.js in the upstream WPT suite. The 19 WPT conformance cases for generate() ship with yarlpattern at reference/wpt/urlpattern/resources/urlpattern-generate-test-data.json; the public algorithm is anchored in those cases. yarlpattern's implementation is a direct walk over the per-component parsed part-list built at construction time.

Raises :class:TypeError for any of these conditions:

  • component is not one of the eight known names;
  • a part with a modifier (?, *, +) — those are not uniquely reversible;
  • a standalone full wildcard (*) or an anonymous regex group — there is no named group to substitute into;
  • a required named group is missing from groups;
  • the encoded substitution would violate the part's own constraint (e.g. "bar/baz" substituted into :foo in a pathname component, where the segment-wildcard rejects /).

with_(**components: str) -> Self

Return a copy of this pattern with selected components replaced.

with_protocol(protocol: str) -> Self

Return a copy with protocol replaced.

with_username(username: str) -> Self

Return a copy with username replaced.

with_password(password: str) -> Self

Return a copy with password replaced.

with_hostname(hostname: str) -> Self

Return a copy with hostname replaced.

with_port(port: str) -> Self

Return a copy with port replaced.

with_pathname(pathname: str) -> Self

Return a copy with pathname replaced.

Return a copy with search replaced.

with_hash(hash: str) -> Self

Return a copy with hash replaced.

yarlpattern.URLPatternResult(inputs: list[URLPatternInput], protocol: dict[str, Any] | None = None, username: dict[str, Any] | None = None, password: dict[str, Any] | None = None, hostname: dict[str, Any] | None = None, port: dict[str, Any] | None = None, pathname: dict[str, Any] | None = None, search: dict[str, Any] | None = None, hash: dict[str, Any] | None = None) dataclass

Result of a successful :meth:URLPattern.exec match.

inputs echoes the arguments originally supplied to :meth:URLPattern.exec. Each component attribute holds a {"input": str, "groups": dict[str, str]} mapping. Optional groups that did not capture anything are omitted from groups (rather than carrying a None value) so dict-equality comparisons against the WPT expectations work without translation — the WPT data uses JSON null to mean "undefined / key absent in JS".

yarlpattern.COMPONENTS: Final[tuple[str, ...]] = ('protocol', 'username', 'password', 'hostname', 'port', 'pathname', 'search', 'hash') module-attribute

Escape helpers

When you're building a pattern from a string whose contents might contain pattern metacharacters (:, *, (, ), {, }), escape it first.

yarlpattern.escape_pattern_string(input_: str) -> str

Implement §2.3 "escape a pattern string".

Single C-level :meth:str.translate call. Like escape_regexp_string we skip the ASCII assertion — the canonicalization layer is responsible for the input being ASCII before reaching here.

yarlpattern.escape_regexp_string(input_: str) -> str

Implement §2.2 "escape a regexp string".

The spec asserts the input is ASCII; we don't enforce that at the call boundary because the canonicalization layer above us already guarantees it, and adding an assertion would only show up as overhead. If a future caller feeds non-ASCII text through here, the resulting regex will still be syntactically valid — just potentially semantically off, which we'd rather surface in the matcher than re-validate on every call.

Low-level building blocks

The spec-aligned tokenizer, parser, and regex generator are public because they're useful for tools that compose URLPatterns programmatically — a route-table linter, a static analyzer for overlapping patterns, a code generator emitting JavaScript URLPattern strings from a Python source of truth.

Most users never need these.

Tokenizer

yarlpattern.tokenize(input_: str, policy: TokenizePolicy = TokenizePolicy.STRICT) -> list[Token]

Tokenize input_ per §2.1.2.

Returns the token list including the trailing end token. The token's index field uses the spec convention — position of the first code point of the token. value is the payload substring.

On a malformed token, raises :class:TypeError under strict and emits an invalid-char token under lenient (matching the constructor- string parser's expectations).

yarlpattern.Token

Bases: NamedTuple

Tokenizer output entry.

index is the position of the first code point of the token in the original input. value is the substring of the input that the token represents — for escaped-char / regexp / name this is the payload without the surrounding syntax characters (\, (), :).

yarlpattern.TokenType

Bases: StrEnum

One of the ten token kinds in §2.1.1.

StrEnum members compare equal to their string values, which keeps spec text and our checks in lockstep without forcing every comparison to go through enum machinery.

yarlpattern.TokenizePolicy

Bases: StrEnum

Tokenize policy from §2.1.2.

strict raises :class:TypeError on the first malformed token. The constructor-string parser uses lenient so that ambiguous protocol / pathname boundaries (e.g. the : in https://host:port) can be resolved by later phases rather than rejected at tokenization.

Parts

A part is one syntactic element of a pattern: a literal text run, a named segment-wildcard, a regex group, or a full wildcard.

yarlpattern.parse_pattern_string(input_: str, options: Options, encoding_callback: EncodingCallback = identity_encoding_callback) -> list[Part]

Parse input_ into a :class:Part list per §2.1.5.

Mirrors the spec algorithm exactly. The two top-level branches of the outer loop correspond to the two grouping forms the spec illustrates:

  • <prefix-char><name><regexp><modifier> — an inline matching group, possibly with a single literal prefix character (the spec's "automatic prefix" — only kicks in when that character equals options.prefix_code_point).
  • <open><prefix-text><name><regexp><suffix-text><close><modifier> — an explicit {...} grouping, with arbitrary literal text on either side of the matching group and a required modifier afterwards.

yarlpattern.parts_to_pattern_string(part_list: list[Part], options: Options) -> str

Implement §2.3 "generate a pattern string".

Inverse of :func:parse_pattern_string — turns a part list back into its canonical pattern-string form. This is what the URLPattern instance exposes via pattern.<component>: not the user's raw input, but the parser's "normalized" understanding of it. That means /foo/(.*) round-trips as /foo/*, and /foo/([^\/]+?) round-trips as /foo/:0 — collapsing equivalent forms onto the shorter pattern syntax.

The function is one tight loop with several "needs grouping" heuristics that handle ambiguity cases — e.g. /:a followed by literal b would parse as the name :ab if emitted unwrapped, so we wrap it as {/:a}b. Those heuristics live inline because each one references properties of the current / previous / next part.

yarlpattern.Part(type: PartType, value: str, modifier: PartModifier, name: str = '', prefix: str = '', suffix: str = '') dataclass

A single piece of a parsed pattern.

Carries at most one matching group plus its surrounding fixed prefix / suffix and a modifier. See §2.1.3.

yarlpattern.PartType

Bases: StrEnum

Four kinds of part, per §2.1.3.

yarlpattern.PartModifier

Bases: StrEnum

Per-part modifier from §2.1.3.

yarlpattern.Options(delimiter_code_point: str = '', prefix_code_point: str = '', ignore_case: bool = False) dataclass

Pattern-parser options from §2.1.4.

The fields correspond to the spec's struct entries. They are frozen because a single Options instance is shared across all parts emitted from one parse, and we never need to mutate it.

yarlpattern.EncodingCallback = Callable[[str], str] module-attribute

Per §2.1.5 — validate and encode a literal text slice from a pattern.

Each URL component supplies its own callback so canonicalization runs while the parser already knows which slices are literal vs. matching-group syntax.

Regex generation

yarlpattern.parts_to_regex(part_list: list[Part], options: Options) -> tuple[str, list[str]]

Implement §2.2 "generate a regular expression and name list".

Returns (regex_source, name_list). regex_source is anchored (^...$) and uses positional capture groups in document order. name_list has one entry per capture group, in the same order — caller is expected to zip these with re.Match.groups() to build a names dict.

The body is a tight list.append + "".join loop rather than result += ... because long pattern strings with many parts can otherwise quadratic on CPython.

yarlpattern.generate_segment_wildcard_regexp(options: Options) -> str

Per §2.1.5 "generate a segment wildcard regexp": [^<delim>]+?.

yarlpattern.FULL_WILDCARD_REGEXP_VALUE: Final = '.*' module-attribute