woob.browser.url

exception UrlNotResolvable

Bases: Exception

Raised when trying to navigate to a URL instance whose URL pattern cannot be resolved into a real URL.

class URL(*args, base='BASEURL')

Bases: object

A description of a URL on the PagesBrowser website.

It takes one or more regexps to match URLs, and an optional Page class which is instantiated by PagesBrowser.open if the page matches a regex.

Parameters:

base (str) – The name of the browser’s property containing the base URL. (default: 'BASEURL')

is_here(**kwargs)

Returns True if the browser’s current page matches this URL. If arguments are provided (and only then), they are checked against the arguments that were used to build the current page URL.

Return type:

bool
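The following is a minimal, stdlib-only sketch of the is_here() semantics, not woob’s implementation. It assumes the current page URL is passed in explicitly as a string (the real method works from the browser’s current page) and that the arguments are compared with the named groups captured from that URL; FakeURL is a hypothetical name.

```python
import re


class FakeURL:
    """Hypothetical stand-in for URL: one or more regexps, no browser."""

    def __init__(self, *patterns):
        self.patterns = [re.compile(p) for p in patterns]

    def is_here(self, current_url, **kwargs):
        # Assumption: the current URL is passed explicitly instead of read from a browser.
        for pattern in self.patterns:
            m = pattern.fullmatch(current_url)
            if m is None:
                continue
            # Without arguments, matching the pattern is enough.
            if not kwargs:
                return True
            # With arguments, every one must equal the value captured by its group.
            groups = m.groupdict()
            if all(groups.get(name) == value for name, value in kwargs.items()):
                return True
        return False


url = FakeURL(r'https://example\.org/(?P<pagename>\w+)\.html')
print(url.is_here('https://example.org/index.html'))                    # True
print(url.is_here('https://example.org/index.html', pagename='index'))  # True
print(url.is_here('https://example.org/index.html', pagename='about'))  # False
```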

stay_or_go(params=None, data=None, json=None, method=None, headers=None, **kwargs)

Navigate to this URL only if the browser is not already on it.

Extra keyword arguments are used as parameters to build the URL.

Return type:

Response | Page

>>> url = URL(r'https://example.org/(?P<pagename>\w+)\.html')
>>> url.stay_or_go(pagename='index')
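A stdlib-only sketch of the stay-or-go decision, assuming a hypothetical TinyBrowser that records navigations instead of sending HTTP requests and a hypothetical TinyURL that builds its address from a str.format template. Unlike the real method, it returns the URL string rather than a Response or Page.

```python
import re


class TinyBrowser:
    """Hypothetical browser that records navigations instead of sending HTTP requests."""

    def __init__(self):
        self.current_url = None
        self.requests = []  # every URL that was actually "fetched"

    def location(self, url):
        self.requests.append(url)
        self.current_url = url
        return url


class TinyURL:
    """Hypothetical URL with one regexp plus a str.format template to build it."""

    def __init__(self, browser, pattern, template):
        self.browser = browser
        self.regex = re.compile(pattern)
        self.template = template

    def is_here(self, **kwargs):
        if self.browser.current_url is None:
            return False
        m = self.regex.fullmatch(self.browser.current_url)
        if m is None:
            return False
        groups = m.groupdict()
        return all(groups.get(name) == value for name, value in kwargs.items())

    def go(self, **kwargs):
        return self.browser.location(self.template.format(**kwargs))

    def stay_or_go(self, **kwargs):
        # Only navigate when the browser is not already on the requested page.
        if self.is_here(**kwargs):
            return self.browser.current_url
        return self.go(**kwargs)


browser = TinyBrowser()
index = TinyURL(browser, r'https://example\.org/(?P<pagename>\w+)\.html',
                'https://example.org/{pagename}.html')
index.stay_or_go(pagename='index')  # not there yet: navigates
index.stay_or_go(pagename='index')  # already there: no new request
index.stay_or_go(pagename='about')  # different page: navigates again
print(browser.requests)
# ['https://example.org/index.html', 'https://example.org/about.html']
```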

go(*, params=None, data=None, json=None, method=None, headers=None, **kwargs)

Navigate to this URL.

Extra keyword arguments are used as parameters to build the URL.

Return type:

Response | Page

>>> url = URL(r'https://example.org/(?P<pagename>\w+)\.html')
>>> url.go(pagename='index')

open(*, params=None, data=None, json=None, method=None, headers=None, is_async=False, callback=lambda response: ..., **kwargs)

Open this URL.

Extra keyword arguments are used as parameters to build the URL.

Return type:

Response | Page

>>> url = URL(r'https://example.org/(?P<pagename>\w+)\.html')
>>> url.open(pagename='index')

get_base_url(browser=None, for_pattern=None)

Get the browser’s base URL for the instance.

The for_pattern argument is optional and is only used to add information to the ValueError message (its purpose is unclear and it may be removed).

Return type:

str
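A sketch of how a base URL can be looked up by property name. It assumes, without confirmation from these docs, that the ValueError is raised when the browser lacks the property or the property is None; DemoBrowser and API_BASEURL are hypothetical names.

```python
class DemoBrowser:
    """Hypothetical browser; BASEURL is the property URL reads by default."""
    BASEURL = 'https://example.org/'


def get_base_url(browser, base='BASEURL', for_pattern=None):
    """Sketch: return the browser property whose name is stored in `base`."""
    base_url = getattr(browser, base, None)
    if base_url is None:
        # Assumption: a missing or None property is the error case.
        # for_pattern only makes the ValueError message more informative.
        message = f'browser has no usable {base!r} property'
        if for_pattern is not None:
            message += f' (needed to resolve {for_pattern!r})'
        raise ValueError(message)
    return base_url


print(get_base_url(DemoBrowser()))  # https://example.org/
try:
    get_base_url(DemoBrowser(), base='API_BASEURL', for_pattern='/v1/items')
except ValueError as exc:
    print(exc)  # browser has no usable 'API_BASEURL' property (needed to resolve '/v1/items')
```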

build(**kwargs)

Build a URL from the URL’s regexps, using the given arguments.

Parameters:

params – Query string parameters

Return type:

str

Raises:

UrlNotResolvable if no valid URL can be resolved with the given arguments.
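A simplified sketch of resolving regexps into a URL; it is not woob’s algorithm. It assumes patterns whose only regex syntax outside non-nested named groups is backslash escapes, tries the patterns in order, checks each candidate against its own pattern, and appends params as a query string. UrlNotResolvable is redefined locally so the block runs without woob.

```python
import re
from urllib.parse import urlencode


class UrlNotResolvable(Exception):
    """Local stand-in for woob.browser.url.UrlNotResolvable."""


# A simple named group such as (?P<pagename>\w+); nested parentheses are not supported.
GROUP_RE = re.compile(r'\(\?P<(\w+)>[^()]*\)')


def build(patterns, base='', params=None, **kwargs):
    """Sketch: return the first pattern that can be filled with kwargs."""
    for pattern in patterns:
        if not all(name in kwargs for name in GROUP_RE.findall(pattern)):
            continue  # this pattern needs an argument that was not given
        # split() alternates literal text (even indexes) and group names (odd indexes).
        parts = GROUP_RE.split(pattern)
        path = ''.join(
            str(kwargs[part]) if i % 2 else re.sub(r'\\(.)', r'\1', part)  # drop escapes like \.
            for i, part in enumerate(parts)
        )
        if re.fullmatch(pattern, path) is None:
            continue  # a value does not satisfy its group's regexp
        if params:
            path += ('&' if '?' in path else '?') + urlencode(params)
        return base + path
    raise UrlNotResolvable(f'no pattern resolvable with arguments {sorted(kwargs)}')


patterns = [r'/(?P<pagename>\w+)\.html', r'/']
print(build(patterns, base='https://example.org', pagename='index'))
# https://example.org/index.html
print(build(patterns, base='https://example.org', pagename='index', params={'lang': 'en'}))
# https://example.org/index.html?lang=en
print(build(patterns, base='https://example.org'))
# https://example.org/
```

Checking the filled-in path against the pattern is a choice of this sketch: it turns a value such as 'not valid' (which \w+ rejects) into UrlNotResolvable instead of a malformed URL.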

match(url, base=None)

Check whether the given URL matches this object.

Returns None if no pattern matches.

Return type:

Match | None

handle(response)

Handle an HTTP response and return an instance of the page class (klass) if the response matches.

Return type:

Page | None
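The match() and handle() pair can be sketched as below. SketchURL, Response and IndexPage are hypothetical stand-ins: in this sketch, patterns starting with / are treated as relative and prefixed with the escaped base URL, re.match anchors only at the start of the URL, and the page receives the named groups as params. Real woob pages are constructed differently.

```python
import re
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical minimal response: only its final URL matters here."""
    url: str


class IndexPage:
    """Hypothetical page class; real woob pages take other constructor arguments."""

    def __init__(self, response, params):
        self.response = response
        self.params = params


class SketchURL:
    """Hypothetical URL: regexps, an optional page class and a base URL."""

    def __init__(self, *patterns, klass=None, base='https://example.org'):
        self.patterns = patterns
        self.klass = klass
        self.base = base

    def match(self, url, base=None):
        base = self.base if base is None else base
        for pattern in self.patterns:
            # Assumption: patterns starting with '/' are relative to the base URL.
            if pattern.startswith('/'):
                pattern = re.escape(base) + pattern
            m = re.match(pattern, url)
            if m:
                return m
        return None  # no pattern matched

    def handle(self, response):
        m = self.match(response.url)
        if m is None or self.klass is None:
            return None
        # Named groups captured from the URL are handed to the page as params.
        return self.klass(response, m.groupdict())


index = SketchURL(r'/(?P<pagename>\w+)\.html', klass=IndexPage)
page = index.handle(Response('https://example.org/index.html'))
print(type(page).__name__, page.params)  # IndexPage {'pagename': 'index'}
print(index.handle(Response('https://example.org/')))  # None
```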

id2url(func)

Helper decorator that converts the first parameter to a URL when it is given as an ID.
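A sketch of an id2url-style decorator under stated assumptions: an argument starting with http:// or https:// is treated as a URL and must match the pattern (otherwise the wrapped function is skipped and None is returned), while anything else is treated as an ID and passed to a build helper. make_id2url, ItemBrowser and the /item/<id> layout are hypothetical.

```python
import functools
import re


def make_id2url(build, match):
    """Hypothetical factory: build(id) -> URL string, match(url) -> bool."""

    def id2url(func):
        @functools.wraps(func)
        def inner(browser, id_or_url, *args, **kwargs):
            if re.match(r'https?://', id_or_url):
                # Assumption: a URL that does not match the pattern is rejected.
                if not match(id_or_url):
                    return None
                url = id_or_url
            else:
                # Anything else is treated as an ID and turned into a URL.
                url = build(id_or_url)
            return func(browser, url, *args, **kwargs)

        return inner

    return id2url


ITEM_RE = re.compile(r'https://example\.org/item/(?P<id>\d+)')
id2url = make_id2url(
    build=lambda item_id: f'https://example.org/item/{item_id}',
    match=lambda url: ITEM_RE.fullmatch(url) is not None,
)


class ItemBrowser:
    @id2url
    def get_item(self, url):
        return f'fetching {url}'


browser = ItemBrowser()
print(browser.get_item('42'))                             # fetching https://example.org/item/42
print(browser.get_item('https://example.org/item/7'))     # fetching https://example.org/item/7
print(browser.get_item('https://elsewhere.test/item/7'))  # None
```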

with_page(cls)

Get a new URL with the same path but a different page class.

Parameters:

cls (Page) – The new page class to use.

Return type:

URL

with_urls(*urls, clear=True, match_new_first=True)

Get a new URL object with the same page but with different paths.

Parameters:
  • urls (str) – List of urls handled by the page

  • clear (bool) – If True, the page will only handle the given urls. Otherwise, the urls are added to the already handled urls.

  • match_new_first (bool) – If True, new paths will be matched first for this URL; this parameter is ignored when clear is True.

Return type:

URL
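The copy semantics of with_page() and with_urls() can be sketched with an immutable dataclass; URLSpec is hypothetical and its page class is a plain string for brevity. The ordering follows the parameter descriptions: with clear=True only the new patterns remain, otherwise match_new_first decides whether they are tried before or after the existing ones.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class URLSpec:
    """Hypothetical immutable stand-in for URL; klass is a string for brevity."""
    urls: tuple
    klass: object = None

    def with_page(self, cls):
        # Same patterns, different page class; self is left untouched.
        return replace(self, klass=cls)

    def with_urls(self, *urls, clear=True, match_new_first=True):
        if clear:
            new = urls  # only the given patterns; match_new_first is ignored
        elif match_new_first:
            new = urls + self.urls  # new patterns are tried first
        else:
            new = self.urls + urls  # new patterns are tried last
        return replace(self, urls=tuple(new))


index = URLSpec(urls=('/index',), klass='IndexPage')
print(index.with_page('OtherPage').klass)                                 # OtherPage
print(index.with_urls('/home').urls)                                      # ('/home',)
print(index.with_urls('/home', clear=False).urls)                         # ('/home', '/index')
print(index.with_urls('/home', clear=False, match_new_first=False).urls)  # ('/index', '/home')
```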

class BrowserParamURL(*args, base='BASEURL')

Bases: URL

A URL that automatically fills some params from browser attributes.

URL patterns having groups named “browser_*” will pick the relevant attribute from the browser. For example:

foo = BrowserParamURL(r'/foo\?bar=(?P<browser_token>\w+)')

The browser is expected to have a .token attribute, which will be passed automatically when simply calling foo.go(); this is equivalent to foo.go(browser_token=browser.token).

Warning: all browser_* params will be passed, so having multiple patterns with different groups in a BrowserParamURL is risky.
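A sketch of how browser_* groups could be filled from browser attributes before building; fill_browser_params and TokenBrowser are hypothetical names. In this sketch an explicitly passed argument takes precedence over the browser attribute, which is an assumption rather than documented behavior.

```python
import re


class TokenBrowser:
    """Hypothetical browser holding a session token."""
    token = 'abc123'


def fill_browser_params(pattern, browser, **kwargs):
    """Sketch: add a browser_<attr> argument for each browser_* group of pattern."""
    for name in re.findall(r'\(\?P<(browser_\w+)>', pattern):
        attr = name[len('browser_'):]  # browser_token -> token
        # Assumption: an explicitly passed argument is kept as-is.
        kwargs.setdefault(name, getattr(browser, attr))
    return kwargs


pattern = r'/foo\?bar=(?P<browser_token>\w+)'
print(fill_browser_params(pattern, TokenBrowser()))
# {'browser_token': 'abc123'}
print(fill_browser_params(pattern, TokenBrowser(), page=2))
# {'page': 2, 'browser_token': 'abc123'}
```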

build(**kwargs)

Build a URL from the URL’s regexps, using the given arguments.

Parameters:

params – Query string parameters

Return type:

str

Raises:

UrlNotResolvable if no valid URL can be resolved with the given arguments.

normalize_url(url)

Normalize a URL by lower-casing the domain and applying a few other fixes.

Lower-cases the domain, removes the default port and a trailing dot.

Return type:

str

>>> normalize_url('http://EXAMPLE:80')
'http://example'
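A stdlib sketch of the same three rules (lower-case domain, default port removed, trailing dot removed) using urllib.parse. It is not woob’s implementation; it assumes http and https are the only schemes with a known default port and, for brevity, drops any user:password@ credentials.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only http and https have a known default port here.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def normalize_url_sketch(url):
    parts = urlsplit(url)
    # .hostname is already lower-cased; strip trailing dots ("example.org.").
    netloc = (parts.hostname or '').rstrip('.')
    # Keep the port only when it is not the scheme's default one.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        netloc += f':{parts.port}'
    # Note: user:password@ credentials are dropped in this simplified sketch.
    return urlunsplit(parts._replace(netloc=netloc))


print(normalize_url_sketch('http://EXAMPLE:80'))              # http://example
print(normalize_url_sketch('https://Example.ORG.:443/Path'))  # https://example.org/Path
print(normalize_url_sketch('https://example.org:8443/'))      # https://example.org:8443/
```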