weboob.browser.pages

class weboob.browser.pages.AbstractPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

BROWSER_ATTR = None
PARENT = None
PARENT_URL = None
exception weboob.browser.pages.AbstractPageError

Bases: exceptions.Exception

class weboob.browser.pages.ChecksumPage

Bases: object

Compute a checksum of raw content before parsing it.

build_doc(content)
checksum = None
hashfunc()

Returns an MD5 hash object, optionally initialized with a string.

hashlib = <module 'hashlib' from '/usr/lib/python2.7/hashlib.pyc'>
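The idea behind ChecksumPage can be sketched outside weboob: hash the raw bytes first, then hand them to the parser. In the sketch below, BaseParser, ChecksumMixin and ChecksumParser are hypothetical stand-ins, and storing the hex digest in checksum is an assumption rather than documented behaviour.

```python
import hashlib


class BaseParser(object):
    """Hypothetical stand-in for a page class that parses raw content."""

    def build_doc(self, content):
        return content.decode('utf-8').upper()


class ChecksumMixin(object):
    """Illustrative mixin: record a digest of the raw bytes, then parse."""

    hashfunc = staticmethod(hashlib.md5)
    checksum = None

    def build_doc(self, content):
        # Hash the untouched bytes before any parsing happens.
        self.checksum = self.hashfunc(content).hexdigest()
        return super(ChecksumMixin, self).build_doc(content)


class ChecksumParser(ChecksumMixin, BaseParser):
    pass


page = ChecksumParser()
doc = page.build_doc(b'hello')
```

Listing the mixin first in the bases makes its build_doc() run before the parser's, which is what lets the digest describe the raw content.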
class weboob.browser.pages.CsvPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

Page which parses CSV files.

DIALECT = 'excel'

Dialect given to the csv module.

ENCODING = 'utf-8'

Encoding of the file.

FMTPARAMS = {}

Parameters given to the csv module.

HEADER = None

If not None, the line at this index is treated as a header, and the rows are also available as dictionaries.

NEWLINES_HACK = True

Convert all strange newlines to unix ones.

build_doc(content)
decode_row(row, encoding)

Method called by CsvPage.parse() to decode a row using the given encoding.

parse(data, encoding=None)

Method called by the constructor of CsvPage to parse the document.

Parameters:
  • data (BytesIO) – file stream
  • encoding (str) – if given, use it to decode cell strings
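As a rough illustration of how DIALECT, HEADER and NEWLINES_HACK interact, here is a minimal sketch built only on the standard csv module. The function parse_csv is hypothetical and is not CsvPage's implementation; the real class may expose rows differently.

```python
import csv
import io

DIALECT = 'excel'
HEADER = 0  # index of the header row, or None for no header


def parse_csv(raw, encoding='utf-8', header=HEADER):
    # Normalise "\r\n" and lone "\r" line endings to "\n".
    text = raw.decode(encoding).replace('\r\n', '\n').replace('\r', '\n')
    rows = list(csv.reader(io.StringIO(text, newline=''), dialect=DIALECT))
    if header is None:
        return rows
    names = rows[header]
    # Rows after the header become dictionaries keyed by column name.
    return [dict(zip(names, row)) for row in rows[header + 1:]]


rows = parse_csv(b'name,qty\r\napple,3\rpear,5\n')
plain = parse_csv(b'a,b\n1,2\n', header=None)
```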
class weboob.browser.pages.Form(page, el, submit_el=None)

Bases: collections.OrderedDict

Represents a form of an HTML page.

It is used as a dict with pre-filled values from HTML. You can set new values as strings by setting an item value.

Rather than instantiating this class directly, call HTMLPage.get_form().

Parameters:
  • page (Page) – the page where the form is located
  • el – the form element on the page
  • submit_el – allows you to only consider one submit button (which is what browsers do). If set to None, it takes all of them, and if set to False, it takes none.
request

Get the Request object from the form.

submit(**kwargs)

Submit the form and make the browser move to the resulting page.
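The "dict pre-filled from HTML" behaviour of Form can be approximated with the standard library. The sketch below is only an illustration: FormFields is a hypothetical class, and it selects the submit button by name, whereas Form receives an element through submit_el.

```python
from collections import OrderedDict
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect <input> values; keep only the chosen submit button, if any."""

    def __init__(self, submit_name=None):
        HTMLParser.__init__(self)
        self.submit_name = submit_name
        self.fields = OrderedDict()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != 'input' or 'name' not in attrs:
            return
        if attrs.get('type') == 'submit':
            # None keeps every submit button, False keeps none,
            # a name keeps only the matching one.
            if self.submit_name is False:
                return
            if self.submit_name is not None and attrs['name'] != self.submit_name:
                return
        self.fields[attrs['name']] = attrs.get('value', '')


HTML = ('<form action="/login">'
        '<input name="user" value="guest">'
        '<input type="password" name="pass">'
        '<input type="submit" name="ok" value="OK">'
        '<input type="submit" name="cancel" value="Cancel">'
        '</form>')

parser = FormFields(submit_name='ok')
parser.feed(HTML)
form = parser.fields
form['pass'] = 'secret'  # set a new value as a string, as with Form
```

Keeping a single submit button mirrors what a web browser sends when the user clicks one of several buttons.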

exception weboob.browser.pages.FormNotFound

Bases: exceptions.Exception

Raised when HTMLPage.get_form() can’t find a form.

exception weboob.browser.pages.FormSubmitWarning

Bases: exceptions.UserWarning

A form has more than one submit element selected, and will likely generate an invalid request.

class weboob.browser.pages.GWTPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

GWT page where the “doc” attribute is a list

More information about the GWT protocol: https://goo.gl/GP5dv9

build_doc(content)

The response starts with “//” followed by “OK” or “EX”. The last two elements of the list are the protocol and the flag. The list must be read in reverse order.
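The response layout described for build_doc() can be sketched as follows. This is a simplified illustration, not GWTPage's code: parse_gwt is a hypothetical helper, the sample payload is made up, and it assumes the body after the prefix is valid JSON, which real GWT responses do not always guarantee.

```python
import json


def parse_gwt(body):
    status, payload = body[2:4], body[4:]
    if not body.startswith('//') or status not in ('OK', 'EX'):
        raise ValueError('not a GWT-RPC response')
    items = json.loads(payload)
    # The last two items are protocol metadata; the rest is read backwards.
    trailer = items[-2:]
    values = list(reversed(items[:-2]))
    return status, values, trailer


status, values, trailer = parse_gwt(
    '//OK[3,2,1,["java.lang.String/2004016611","hello"],0,7]')
```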

get_date(data)

Get date from string

get_elements(type='String')

Get elements of specified type

class weboob.browser.pages.HTMLPage(*args, **kwargs)

Bases: weboob.browser.pages.Page

HTML page.

Parameters:
  • browser (weboob.browser.browsers.Browser) – browser used to go on the page
  • response (Response) – response object
  • params (dict) – optional dictionary containing parameters given to the page (see weboob.browser.url.URL)
  • encoding (basestring) – optional parameter to force the encoding of the page
FORM_CLASS

The class to instantiate when using HTMLPage.get_form(). Defaults to Form.

alias of Form

REFRESH_MAX = None

When handling a “Refresh” meta header, the page follows it only if the sleep time is less than this value.

The default value is None, which means refreshes aren’t handled.
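The REFRESH_MAX rule can be illustrated with a small standalone function. The helper refresh_target and its regular expression are assumptions made for this sketch, and the strict comparison simply follows the wording above; HTMLPage.handle_refresh() may differ in its details.

```python
import re

REFRESH_MAX = 10  # seconds; None would disable refresh handling


def refresh_target(content, refresh_max=REFRESH_MAX):
    """Return the URL to follow, or None if the refresh should be ignored."""
    m = re.match(r'\s*(\d+)\s*;\s*url=(.+)', content, re.I)
    if m is None or refresh_max is None:
        return None
    delay = int(m.group(1))
    url = m.group(2).strip().strip('\'"')
    return url if delay < refresh_max else None


followed = refresh_target('5; url=/next')
too_slow = refresh_target('30; URL=/next')
disabled = refresh_target('5; url=/next', refresh_max=None)
```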

build_doc(content)

Method to build the lxml document from response and given encoding.

define_xpath_functions(ns)

Define XPath functions on the given lxml function namespace.

This method is called in the constructor of HTMLPage and can be overridden by child classes to add extra functions.

detect_encoding()

Look for encoding in the document “http-equiv” and “charset” meta nodes.
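A minimal sketch of this kind of detection, using a regular expression instead of an lxml document. The function sniff_charset is hypothetical and far less thorough than a real HTML parser; it only shows the two meta forms being recognised.

```python
import re


def sniff_charset(html_bytes):
    # Only look at the beginning of the document, where <meta> tags live.
    head = html_bytes[:2048].decode('ascii', 'ignore')
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', head, re.I)
    return m.group(1).lower() if m else None


html5 = sniff_charset(b'<html><head><meta charset="utf-8"></head></html>')
legacy = sniff_charset(
    b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">')
missing = sniff_charset(b'<html><body>no declaration</body></html>')
```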

get_form(xpath='//form', name=None, id=None, nr=None, submit=None)

Get a Form object from a selector. The form will be analyzed and its parameters extracted. In the case there is more than one “submit” input, only one of them should be chosen to generate the request.

Parameters:
  • xpath (str) – xpath string to select forms
  • name (str) – if supplied, select a form with the given name
  • nr (int) – if supplied, take the form at this zero-based index among the selected forms (nr=0 is the first form)
  • submit (str) – if supplied, xpath string to select the submit element from the form
Return type:

Form

Raises:

FormNotFound if no form is found

handle_refresh()
on_load()
class weboob.browser.pages.JsonPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

Json Page.

build_doc(text)
data
get(path)
path(path, context=None)
class weboob.browser.pages.LoggedPage

Bases: object

A page that only logged-in users can reach. If we did not get a redirection for this page, we are sure that the login is still active.

Do not use this class for pages with mixed content (logged/anonymous) or for pages with a login form.

logged = True
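LoggedPage is meant to be combined with a page class through multiple inheritance. The sketch below uses hypothetical stand-in classes (SketchPage, SketchLoggedPage, AccountsPage, LoginPage) to show why the marker class should come first in the list of bases.

```python
class SketchPage(object):
    """Stand-in for Page: pages are anonymous by default."""
    logged = False


class SketchLoggedPage(object):
    """Stand-in for LoggedPage: only overrides the flag."""
    logged = True


class AccountsPage(SketchLoggedPage, SketchPage):
    # The marker class comes first, so its attribute wins in the MRO.
    pass


class LoginPage(SketchPage):
    # A page with a login form keeps the default.
    pass
```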
exception weboob.browser.pages.NextPage(request)

Bases: exceptions.Exception

Exception raised, for example from a Page, to tell PagesBrowser.pagination() to go to the next page.

See PagesBrowser.pagination() or decorator pagination().

class weboob.browser.pages.PDFPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

Parse a PDF and write raw data in the “doc” attribute as a string.

build_doc(content)
class weboob.browser.pages.Page(browser, response, params=None, encoding=None)

Bases: object

Represents a page.

Encoding can be forced by setting the ENCODING class-wide attribute, or by passing an encoding keyword argument, which overrides ENCODING. It can also be changed manually afterwards by assigning a new value to the encoding instance attribute. A unicode version of the response content is accessible in text, decoded with the specified encoding.

Parameters:
  • browser (weboob.browser.browsers.Browser) – browser used to go on the page
  • response (Response) – response object
  • params (dict) – optional dictionary containing parameters given to the page (see weboob.browser.url.URL)
  • encoding (basestring) – optional parameter to force the encoding of the page, overrides ENCODING
ENCODING = None

Force a page encoding. It is recommended to use None for autodetection.
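The precedence between the ENCODING attribute, the encoding keyword argument and the encoding instance attribute can be sketched with a toy class. SketchPage and LatinPage are hypothetical, and the 'utf-8' fallback is an assumption that stands in for autodetection.

```python
class SketchPage(object):
    ENCODING = None

    def __init__(self, content, encoding=None):
        self.content = content
        # The keyword argument wins over the class-wide ENCODING.
        self.encoding = encoding or self.ENCODING or 'utf-8'

    @property
    def text(self):
        # Decoded on access, so later changes to encoding are honoured.
        return self.content.decode(self.encoding)


class LatinPage(SketchPage):
    ENCODING = 'latin-1'


page = LatinPage(u'caf\xe9'.encode('latin-1'))
class_wide = page.text           # decoded with ENCODING

forced = LatinPage(u'caf\xe9'.encode('utf-8'), encoding='utf-8').text

page.encoding = 'cp1252'         # changed manually after construction
reassigned = page.text
```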

absurl(url)

Get an absolute URL from a partial URL, relative to the Page URL.
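The resolution rule is the same one the standard library applies in urljoin(). The snippet below illustrates it with a made-up page URL; it does not call absurl() itself.

```python
from urllib.parse import urljoin

page_url = 'https://example.com/accounts/list.html'  # hypothetical page URL

relative = urljoin(page_url, 'detail.html?id=3')
rooted = urljoin(page_url, '/logout')
already_absolute = urljoin(page_url, 'https://other.example.org/x')
```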

build_doc(content)

Abstract method to be implemented by subclasses to build structured data (HTML, JSON, CSV...) from the data property. It can also be overridden in module pages to preprocess or postprocess data. It must return an object, which will be assigned to doc.

content

Raw content from response.

data

Data passed to build_doc().

detect_encoding()

Override this method to implement detection of a document-level encoding declaration, if any (e.g. HTML5’s <meta charset="some-charset">).

encoding
logged = False

If True, the page is in a restricted area of the website. Useful with LoginBrowser and the need_login() decorator.

on_leave()

Event called when browser leaves this page.

on_load()

Event called when browser loads this page.

text

Content of the response, in unicode, decoded with encoding.

class weboob.browser.pages.PartialHTMLPage(*args, **kwargs)

Bases: weboob.browser.pages.HTMLPage

HTML page for broken pages with multiple roots.

This class should typically be used for requests which return only a part of a full document, to insert in another document. Such a sub-document can have multiple root tags, so this class is required in this case.

build_doc(content)
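The multiple-roots problem is easy to reproduce with a strict parser. The sketch below uses the standard xml.etree module and a synthetic wrapper element purely as an illustration; it is not how PartialHTMLPage builds its document.

```python
import xml.etree.ElementTree as ET

fragment = '<li>One</li><li>Two</li>'  # two root tags, no enclosing element

try:
    ET.fromstring(fragment)
    parsed_directly = True
except ET.ParseError:
    # A strict parser rejects a second root element.
    parsed_directly = False

# Wrapping the fragment in a single synthetic root makes it parseable.
root = ET.fromstring('<root>%s</root>' % fragment)
items = [li.text for li in root]
```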
class weboob.browser.pages.RawPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

Raw page where the “doc” attribute is the content string.

build_doc(content)
class weboob.browser.pages.XLSPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

XLS Page.

HEADER = None

If not None, will consider the line represented by this index as a header.

SHEET_INDEX = 0

Specify the index of the worksheet to use.

build_doc(content)
parse(data)

Method called by the constructor of XLSPage to parse the document.

class weboob.browser.pages.XMLPage(browser, response, params=None, encoding=None)

Bases: weboob.browser.pages.Page

XML Page.

build_doc(content)
detect_encoding()
weboob.browser.pages.pagination(func)

This helper decorator can be used to handle pagination pages easily.

When the decorated function raises a NextPage exception, the decorator goes to the requested page and calls the function again.

The NextPage constructor can take a URL or a Request object.

>>> class Page(HTMLPage):
...     @pagination
...     def iter_values(self):
...         for el in self.doc.xpath('//li'):
...             yield el.text
...         for next in self.doc.xpath('//a'):
...             raise NextPage(next.attrib['href'])
...
>>> from .browsers import PagesBrowser
>>> from .url import URL
>>> class Browser(PagesBrowser):
...     BASEURL = 'https://people.symlink.me'
...     list = URL(r'/~rom1/projects/weboob/list-(?P<pagenum>\d+).html', Page)
...
>>> b = Browser()
>>> b.list.go(pagenum=1) 
<weboob.browser.pages.Page object at 0x...>
>>> list(b.page.iter_values())
['One', 'Two', 'Three', 'Four']
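The mechanism behind the decorator can be sketched without any network access. Everything below is a simplified stand-in: SketchNextPage, FakeBrowser, PAGES and sketch_pagination are hypothetical, and the real decorator wraps a page method and uses the page's browser rather than a function taking a browser.

```python
class SketchNextPage(Exception):
    def __init__(self, request):
        super(SketchNextPage, self).__init__()
        self.request = request


# Fake site: page number -> (values on the page, next page number or None)
PAGES = {1: (['One', 'Two'], 2), 2: (['Three', 'Four'], None)}


class FakeBrowser(object):
    def __init__(self):
        self.current = 1

    def go(self, number):
        self.current = number


def sketch_pagination(func):
    def wrapper(browser):
        while True:
            try:
                for value in func(browser):
                    yield value
            except SketchNextPage as exc:
                # Move to the requested page, then call func again.
                browser.go(exc.request)
            else:
                return
    return wrapper


@sketch_pagination
def iter_values(browser):
    values, following = PAGES[browser.current]
    for value in values:
        yield value
    if following is not None:
        raise SketchNextPage(following)


result = list(iter_values(FakeBrowser()))
```

Using an exception to signal the next page lets the decorated function stay a plain generator that only knows about the current page.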