API reference
Auto-extracted from yarlpattern's docstrings by mkdocstrings. Items are grouped by audience: the primary public API, the escape helpers most callers eventually need, and the lower-level building blocks reserved for advanced use.
Primary API
The surface most callers need: one class, one result type, one tuple of component names.
yarlpattern.URLPattern(input: URLPatternInit | str | None = None, base_url_or_options: str | Mapping[str, Any] | None = None, options: Mapping[str, Any] | None = None, /, *, engine: RegexEngine | None = None)
A compiled WHATWG URL pattern.
Construction accepts either a :class:URLPatternInit dict or a
constructor string (e.g. "https://example.com/:foo") plus
optional base URL and options. Each URL component has an independent
parser-options bundle (delimiter / prefix code points) defined in the
spec; components not present in the input default to the "*"
pattern (matches anything).
The compiled regex matcher behind every component is produced by a
regex engine selected via :mod:yarlpattern._regex_engine. The
default engine auto-picks Matthew Barnett's regex package when
importable (handles JS v-flag set operations like [a&&b]
correctly) and falls back to stdlib :mod:re otherwise. Pass
engine=... to override per-instance — useful for benchmarking
or for shipping a custom backend.
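The auto-pick described above follows a common optional-dependency pattern. A minimal sketch of that pattern, assuming nothing about yarlpattern's internals (pick_engine is a hypothetical helper, not part of the library):

```python
import importlib
import re
from types import ModuleType


def pick_engine() -> ModuleType:
    """Return the third-party `regex` module if importable, else stdlib `re`."""
    try:
        return importlib.import_module("regex")
    except ImportError:
        return re


engine = pick_engine()
# For basic patterns both modules expose the same compile()/fullmatch() surface.
matcher = engine.compile(r"^/users/([^/]+?)$")
match = matcher.fullmatch("/users/42")
```

Either module can compile the simple pattern used here; the difference only shows up for syntax that stdlib re does not support.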
has_regexp_groups: bool
property
Whether any compiled component contains a custom regex body.
Spec definition: true iff any component's part list has a part whose
type is regexp and whose value is not the segment-wildcard or
full-wildcard regexp value. The per-component bool was set at
compile time from the parts list; aggregating here is a tight
any() over the matchers (returns on first hit, no allocations).
test(input: URLPatternInput | None = None, base_url: str | _YarlURL | None = None) -> bool
Return True iff input matches every component pattern.
Accepts a URL string, a per-component dict, or a :class:yarl.URL.
A bare yarl URL is the fast path: its already-parsed components
are read directly, skipping the str-form reparse — ideal for
pat.test(request.url) in aiohttp / yarl-based applications.
Per the WHATWG spec, exceptions raised while processing the input (URL parsing failures, invalid scheme characters, malformed IDNA, etc.) are caught and surface as a "no match". Exceptions during pattern construction still propagate; only the input-side processing is forgiving.
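The "input failures become no match" rule can be illustrated with the standard library alone. This is a sketch of the idea, not yarlpattern's code: forgiving_test and is_https are hypothetical names, and urllib.parse stands in for the library's own input processing.

```python
from typing import Callable
from urllib.parse import SplitResult, urlsplit


def forgiving_test(raw_url: str, predicate: Callable[[SplitResult], bool]) -> bool:
    """Treat any input-parsing failure as "no match" instead of raising."""
    try:
        parts = urlsplit(raw_url)
        _ = parts.port  # property access raises ValueError for a non-numeric port
    except ValueError:
        return False
    return predicate(parts)


def is_https(parts: SplitResult) -> bool:
    return parts.scheme == "https"


ok = forgiving_test("https://example.com/a", is_https)
bad_port = forgiving_test("https://example.com:notaport/", is_https)
```

Only the input side is wrapped in try/except; a broken predicate (the stand-in for pattern construction) would still raise.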
exec(input: URLPatternInput | None = None, base_url: str | _YarlURL | None = None) -> URLPatternResult | None
Return a :class:URLPatternResult for input or None on miss.
Accepts the same input shapes as :meth:test. A bare
:class:yarl.URL is the fast path — its already-parsed
components are read directly.
Like :meth:test, exceptions during input processing surface as
None (no match) rather than propagating to the caller.
compare_component(component: str, left: URLPattern, right: URLPattern) -> int
staticmethod
Three-way ordering of two patterns along a single component.
Returns -1 / 0 / 1 per the (tentative) WHATWG
URLPattern.compareComponent specification. The ordering ranks
patterns by specificity: a fully literal component outranks one
with a regex group, which outranks a segment wildcard, which
outranks a full wildcard. Within a type, modifiers further refine
(mandatory > one-or-more > optional > zero-or-more), and within
a (type, modifier) the prefix / value / suffix strings are
compared lexicographically. Names are not part of the order.
The comparison runs on the parsed part list, not the user's
raw input — so /foo/{bar}/baz and /foo/bar/baz compare
equal even though the surface strings differ. An entirely empty
component is treated as *.
Raises :class:TypeError if component is not one of the eight
spec-defined names.
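The ranking described above maps naturally onto tuple comparison. The following is a simplified sketch for a single part, not the library's algorithm: SimplePart, TYPE_RANK, MODIFIER_RANK and compare_parts are hypothetical names, and the sign convention (1 means the left part is more specific) is an assumption made for illustration.

```python
from typing import NamedTuple

# Higher rank = more specific, following the order described in the text.
TYPE_RANK = {"fixed-text": 3, "regexp": 2, "segment-wildcard": 1, "full-wildcard": 0}
# "none" stands for the mandatory case (no modifier).
MODIFIER_RANK = {"none": 3, "one-or-more": 2, "optional": 1, "zero-or-more": 0}


class SimplePart(NamedTuple):
    type: str
    modifier: str = "none"
    prefix: str = ""
    value: str = ""
    suffix: str = ""


def sort_key(part: SimplePart) -> tuple:
    # Type first, then modifier, then the three strings lexicographically.
    # The group name is deliberately absent from the key.
    return (TYPE_RANK[part.type], MODIFIER_RANK[part.modifier],
            part.prefix, part.value, part.suffix)


def compare_parts(left: SimplePart, right: SimplePart) -> int:
    a, b = sort_key(left), sort_key(right)
    return (a > b) - (a < b)
```

A full comparison would walk two part lists in step; this sketch only shows how one pair of parts is ordered.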
generate(component: str, groups: Mapping[str, str] | None = None) -> str
Produce the URL-component string that this pattern would have matched.
generate reverses :meth:exec: given a component name (one of
the eight WHATWG components) and a mapping of named-group values,
emit the canonical-form string that, fed back through :meth:exec,
would yield the same groups.
This is a tentative spec feature — see
urlpattern-generate.tentative.any.js in the upstream WPT suite.
The 19 WPT conformance cases for generate() ship with
yarlpattern at
reference/wpt/urlpattern/resources/urlpattern-generate-test-data.json;
the public algorithm is anchored in those cases. yarlpattern's
implementation is a direct walk over the per-component parsed
part-list built at construction time.
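The reversal described above can be illustrated with a deliberately small sketch. It handles only literal text and :name groups in a pathname, skips percent-encoding entirely, and is not yarlpattern's implementation; generate_pathname is a hypothetical name. It raises TypeError for a missing group or a value containing "/", two of the failure modes this entry documents.

```python
import re
from typing import Mapping

_NAME = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def generate_pathname(pattern: str, groups: Mapping[str, str]) -> str:
    """Substitute values into the `:name` groups of a simple pathname pattern."""
    if any(ch in pattern for ch in "?*+(){}"):
        # Modifiers, wildcards, regex groups and {...} are out of scope here.
        raise TypeError("pattern syntax not handled by this sketch")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in groups:
            raise TypeError(f"missing value for group {name!r}")
        value = groups[name]
        if not value or "/" in value:
            # A segment wildcard cannot match an empty string or a "/".
            raise TypeError(f"value for {name!r} violates the segment constraint")
        return value

    return _NAME.sub(substitute, pattern)


generated = generate_pathname("/users/:id/posts/:post", {"id": "42", "post": "7"})
```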
Raises :class:TypeError for any of these conditions:
- component is not one of the eight known names;
- a part with a modifier (?, *, +): those are not uniquely reversible;
- a standalone full wildcard (*) or an anonymous regex group: there is no named group to substitute into;
- a required named group is missing from groups;
- the encoded substitution would violate the part's own constraint (e.g. "bar/baz" substituted into :foo in a pathname component, where the segment-wildcard rejects /).
with_(**components: str) -> Self
Return a copy of this pattern with selected components replaced.
with_protocol(protocol: str) -> Self
Return a copy with protocol replaced.
with_username(username: str) -> Self
Return a copy with username replaced.
with_password(password: str) -> Self
Return a copy with password replaced.
with_hostname(hostname: str) -> Self
Return a copy with hostname replaced.
with_port(port: str) -> Self
Return a copy with port replaced.
with_pathname(pathname: str) -> Self
Return a copy with pathname replaced.
with_search(search: str) -> Self
Return a copy with search replaced.
with_hash(hash: str) -> Self
Return a copy with hash replaced.
yarlpattern.URLPatternResult(inputs: list[URLPatternInput], protocol: dict[str, Any] | None = None, username: dict[str, Any] | None = None, password: dict[str, Any] | None = None, hostname: dict[str, Any] | None = None, port: dict[str, Any] | None = None, pathname: dict[str, Any] | None = None, search: dict[str, Any] | None = None, hash: dict[str, Any] | None = None)
dataclass
Result of a successful :meth:URLPattern.exec match.
inputs echoes the arguments originally supplied to :meth:URLPattern.exec.
Each component attribute holds a {"input": str, "groups": dict[str, str]}
mapping. Optional groups that did not capture anything are omitted from
groups (rather than carrying a None value) so dict-equality
comparisons against the WPT expectations work without translation — the
WPT data uses JSON null to mean "undefined / key absent in JS".
yarlpattern.COMPONENTS: Final[tuple[str, ...]] = ('protocol', 'username', 'password', 'hostname', 'port', 'pathname', 'search', 'hash')
module-attribute
Escape helpers
When you're building a pattern from a string whose contents might
contain pattern metacharacters (:, *, (, ), {, }), escape
it first.
yarlpattern.escape_pattern_string(input_: str) -> str
Implement §2.3 "escape a pattern string".
Single C-level :meth:str.translate call. Like escape_regexp_string
we skip the ASCII assertion — the canonicalization layer is responsible
for the input being ASCII before reaching here.
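A single str.translate call with a precomputed table is enough to backslash-escape a fixed set of characters. The sketch below shows the technique; the character set is taken from a reading of the URL Pattern spec's "escape a pattern string" step and should be treated as an assumption, and escape_pattern_sketch is a hypothetical stand-in rather than the library function.

```python
# Characters that carry meaning in pattern syntax; each gets a backslash prefix.
_PATTERN_SPECIALS = "+*?:{}()\\"
_PATTERN_TABLE = str.maketrans({ch: "\\" + ch for ch in _PATTERN_SPECIALS})


def escape_pattern_sketch(text: str) -> str:
    """Escape pattern metacharacters in one pass over the string."""
    return text.translate(_PATTERN_TABLE)


escaped_pattern = escape_pattern_sketch("/files/:id(v1)*")
```

Building the table once at import time keeps the per-call cost to the translate call itself.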
yarlpattern.escape_regexp_string(input_: str) -> str
Implement §2.2 "escape a regexp string".
The spec asserts the input is ASCII; we don't enforce that at the call boundary because the canonicalization layer above us already guarantees it, and adding an assertion would only show up as overhead. If a future caller feeds non-ASCII text through here, the resulting regex will still be syntactically valid — just potentially semantically off, which we'd rather surface in the matcher than re-validate on every call.
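The claim that escaped output stays syntactically valid even for non-ASCII input can be checked with a small sketch. The character set below follows a reading of the spec's "escape a regexp string" step and is an assumption; escape_regexp_sketch is a hypothetical stand-in, and stdlib re is used only to confirm the result compiles.

```python
import re

# Regex metacharacters, plus "/" which the spec also escapes.
_REGEXP_SPECIALS = ".+*?^${}()[]|/\\"
_REGEXP_TABLE = str.maketrans({ch: "\\" + ch for ch in _REGEXP_SPECIALS})


def escape_regexp_sketch(text: str) -> str:
    """Backslash-escape regex metacharacters; other characters pass through."""
    return text.translate(_REGEXP_TABLE)


literal = "a.b/c(1)"
escaped_literal = escape_regexp_sketch(literal)

# Non-ASCII text is left untouched, so the result still compiles.
non_ascii = escape_regexp_sketch("é.")
compiled = re.compile(non_ascii)
```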
Low-level building blocks
The spec-aligned tokenizer, parser, and regex generator are public because they're useful for tools that compose URLPatterns programmatically — a route-table linter, a static analyzer for overlapping patterns, a code generator emitting JavaScript URLPattern strings from a Python source of truth.
Most users never need these.
Tokenizer
yarlpattern.tokenize(input_: str, policy: TokenizePolicy = TokenizePolicy.STRICT) -> list[Token]
Tokenize input_ per §2.1.2.
Returns the token list including the trailing end token. The token's
index field uses the spec convention — position of the first code
point of the token. value is the payload substring.
On a malformed token, raises :class:TypeError under strict and emits
an invalid-char token under lenient (matching the constructor-string
parser's expectations).
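The difference between the two policies is easiest to see on a tiny tokenizer. The sketch below handles only :name groups and plain characters, a small subset of what the spec tokenizer does; Tok and mini_tokenize are hypothetical names, and the name rule (letters, digits, underscore) is a simplification of the spec's identifier rules.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    type: str
    index: int  # position of the token's first code point
    value: str  # payload, without surrounding syntax characters


def mini_tokenize(text: str, strict: bool = True) -> List[Tok]:
    tokens: List[Tok] = []
    i = 0
    while i < len(text):
        if text[i] == ":":
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            if j == i + 1:  # ":" with no name after it is malformed
                if strict:
                    raise TypeError(f"missing parameter name at index {i}")
                tokens.append(Tok("invalid-char", i, text[i]))
                i += 1
                continue
            tokens.append(Tok("name", i, text[i + 1:j]))  # payload excludes ":"
            i = j
        else:
            tokens.append(Tok("char", i, text[i]))
            i += 1
    tokens.append(Tok("end", i, ""))  # trailing end token
    return tokens


lenient_tokens = mini_tokenize("https://", strict=False)
```

Under the lenient setting the ":" in "https://" survives as an invalid-char token that a later phase can interpret; under strict the same input raises.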
yarlpattern.Token
Bases: NamedTuple
Tokenizer output entry.
index is the position of the first code point of the token in the
original input. value is the substring of the input that the token
represents — for escaped-char / regexp / name this is the
payload without the surrounding syntax characters (\, (), :).
yarlpattern.TokenType
Bases: StrEnum
One of the ten token kinds in §2.1.1.
StrEnum members compare equal to their string values, which keeps spec
text and our checks in lockstep without forcing every comparison to go
through enum machinery.
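The equality behaviour described above is a property of StrEnum itself. A short sketch, with a fallback for interpreters older than Python 3.11; Kind and its member names are hypothetical, and only a few of the ten token kinds are shown.

```python
try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # older interpreters: a str-mixin enum compares the same way
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class Kind(StrEnum):
    OPEN = "open"
    NAME = "name"
    ESCAPED_CHAR = "escaped-char"
    END = "end"


# A member compares equal to the plain string used in the spec text.
same_as_string = Kind.NAME == "name"
# Lookup by value returns the singleton member.
looked_up = Kind("escaped-char")
```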
yarlpattern.TokenizePolicy
Bases: StrEnum
Tokenize policy from §2.1.2.
strict raises :class:TypeError on the first malformed token. The
constructor-string parser uses lenient so that ambiguous protocol /
pathname boundaries (e.g. the : in https://host:port) can be
resolved by later phases rather than rejected at tokenization.
Parts
A part is one syntactic element of a pattern: a literal text run, a named segment-wildcard, a regex group, or a full wildcard.
yarlpattern.parse_pattern_string(input_: str, options: Options, encoding_callback: EncodingCallback = identity_encoding_callback) -> list[Part]
Parse input_ into a :class:Part list per §2.1.5.
Mirrors the spec algorithm exactly. The two top-level branches of the outer loop correspond to the two grouping forms the spec illustrates:
- <prefix-char><name><regexp><modifier>: an inline matching group, possibly with a single literal prefix character (the spec's "automatic prefix", which only kicks in when that character equals options.prefix_code_point).
- <open><prefix-text><name><regexp><suffix-text><close><modifier>: an explicit {...} grouping, with arbitrary literal text on either side of the matching group and a required modifier afterwards.
yarlpattern.parts_to_pattern_string(part_list: list[Part], options: Options) -> str
Implement §2.3 "generate a pattern string".
Inverse of :func:parse_pattern_string — turns a part list back into its
canonical pattern-string form. This is what the URLPattern instance
exposes via pattern.<component>: not the user's raw input, but the
parser's "normalized" understanding of it. That means /foo/(.*)
round-trips as /foo/*, and /foo/([^\/]+?) round-trips as
/foo/:0 — collapsing equivalent forms onto the shorter pattern
syntax.
The function is one tight loop with several "needs grouping" heuristics
that handle ambiguity cases — e.g. /:a followed by literal b
would parse as the name :ab if emitted unwrapped, so we wrap it
as {/:a}b. Those heuristics live inline because each one references
properties of the current / previous / next part.
yarlpattern.Part(type: PartType, value: str, modifier: PartModifier, name: str = '', prefix: str = '', suffix: str = '')
dataclass
A single piece of a parsed pattern.
Carries at most one matching group plus its surrounding fixed prefix / suffix and a modifier. See §2.1.3.
yarlpattern.PartType
Bases: StrEnum
Four kinds of part, per §2.1.3.
yarlpattern.PartModifier
Bases: StrEnum
Per-part modifier from §2.1.3.
yarlpattern.Options(delimiter_code_point: str = '', prefix_code_point: str = '', ignore_case: bool = False)
dataclass
Pattern-parser options from §2.1.4.
The fields correspond to the spec's struct entries. They are frozen
because a single Options instance is shared across all parts
emitted from one parse, and we never need to mutate it.
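What "frozen" buys can be shown with a stand-in dataclass that copies the field names from the signature above. OptionsSketch is hypothetical and is not the library class; the "/" values are the delimiter and prefix the spec assigns to the pathname component.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class OptionsSketch:
    delimiter_code_point: str = ""
    prefix_code_point: str = ""
    ignore_case: bool = False


pathname_options = OptionsSketch(delimiter_code_point="/", prefix_code_point="/")

# Assigning to a field of a frozen dataclass raises instead of mutating.
try:
    pathname_options.ignore_case = True
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the instance cannot change, sharing one object across every part of a parse is safe, and frozen instances are hashable as well.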
yarlpattern.EncodingCallback = Callable[[str], str]
module-attribute
Per §2.1.5 — validate and encode a literal text slice from a pattern.
Each URL component supplies its own callback so canonicalization runs while the parser already knows which slices are literal vs. matching-group syntax.
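Any function from str to str fits this alias. The sketch below shows two hypothetical callbacks of that shape, one that validates and one that encodes. Neither is the library's real callback: port_callback and pathname_callback are illustrative names, and urllib.parse.quote stands in for the spec's canonicalization.

```python
from typing import Callable
from urllib.parse import quote

EncodingCallbackSketch = Callable[[str], str]


def port_callback(text: str) -> str:
    """Validate: a literal port slice must be ASCII digits (or empty)."""
    if text and not (text.isascii() and text.isdigit()):
        raise TypeError(f"invalid port literal: {text!r}")
    return text


def pathname_callback(text: str) -> str:
    """Encode: percent-encode characters that are not safe in a path."""
    return quote(text, safe="/")


def encode_literal(text: str, callback: EncodingCallbackSketch) -> str:
    # A parser would call this only on literal slices, never on group syntax.
    return callback(text)


encoded_path = encode_literal("/a b", pathname_callback)
```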
Regex generation
yarlpattern.parts_to_regex(part_list: list[Part], options: Options) -> tuple[str, list[str]]
Implement §2.2 "generate a regular expression and name list".
Returns (regex_source, name_list). regex_source is anchored
(^...$) and uses positional capture groups in document order.
name_list has one entry per capture group, in the same order — caller
is expected to zip these with re.Match.groups() to build a names dict.
The body is a tight list.append + "".join loop rather than
result += ... because long pattern strings with many parts can
otherwise go quadratic on CPython.
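The zip step mentioned above might look like the following. The regex source and name list here are written by hand for illustration and are not actual output of parts_to_regex; match_groups is a hypothetical helper. Groups that did not participate in the match are dropped rather than stored as None.

```python
import re
from typing import Dict, List, Optional

# Hand-written stand-ins for the (regex_source, name_list) pair.
regex_source = r"^/users/([^/]+?)(?:/posts/([^/]+?))?$"
name_list = ["user", "post"]


def match_groups(source: str, names: List[str], text: str) -> Optional[Dict[str, str]]:
    match = re.fullmatch(source, text)
    if match is None:
        return None
    # Positional groups line up with names; skip groups that captured nothing.
    return {
        name: value
        for name, value in zip(names, match.groups())
        if value is not None
    }


both = match_groups(regex_source, name_list, "/users/7/posts/9")
only_user = match_groups(regex_source, name_list, "/users/7")
```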
yarlpattern.generate_segment_wildcard_regexp(options: Options) -> str
Per §2.1.5 "generate a segment wildcard regexp": [^<delim>]+?.
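The shape of that expression can be reproduced in a few lines. This sketch assumes a non-empty delimiter and uses re.escape, so the exact escaping may differ from what the library emits; segment_wildcard_sketch is a hypothetical name.

```python
import re


def segment_wildcard_sketch(delimiter: str) -> str:
    """Build a lazy "one or more non-delimiter characters" expression."""
    if not delimiter:
        raise ValueError("this sketch requires a non-empty delimiter")
    # re.escape keeps the delimiter literal inside the character class.
    return "[^" + re.escape(delimiter) + "]+?"


pathname_segment = segment_wildcard_sketch("/")
hostname_segment = segment_wildcard_sketch(".")
```

With "/" as the delimiter the expression matches one path segment such as "bar" but not "bar/baz".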