weboob.browser.filters.html

class weboob.browser.filters.html.CSS(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base._Selector

Select HTML elements with a CSS selector

For example:

obj_foo = CleanText(CSS('div.main'))

will take the text of every <div> element that has the CSS class “main”.

Parameters: default – default value in case the filter fails to find or parse the requested value
select(selector, item)
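To make the matching rule of a selector such as 'div.main' concrete, here is a minimal standard-library sketch. It does not use weboob: select_by_class is a hypothetical helper written for illustration, and it only covers the "tag plus class" case, not CSS selectors in general.

```python
import xml.etree.ElementTree as ET


def select_by_class(root, tag, css_class):
    """Return every <tag> element under root whose class list contains css_class.

    Hypothetical helper for illustration only; not part of weboob.
    """
    return [
        el for el in root.iter(tag)
        # The class attribute is a space-separated list of class names.
        if css_class in el.get("class", "").split()
    ]


doc = ET.fromstring(
    "<body>"
    "<div class='main intro'>Hello</div>"
    "<div class='side'>Ignored</div>"
    "<div class='main'>World</div>"
    "</body>"
)

# Equivalent in spirit to the selector 'div.main'.
texts = [el.text for el in select_by_class(doc, "div", "main")]
```

Both <div> elements carrying the class “main” are selected, including the one that also carries a second class.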
class weboob.browser.filters.html.XPath(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base._Selector

Select HTML elements with an XPath selector

Parameters: default – default value in case the filter fails to find or parse the requested value
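The role of the default parameter can be sketched without weboob. In the snippet below, NO_DEFAULT is a stand-in sentinel, select is a hypothetical function, and LookupError is an arbitrary choice of exception; the sketch only assumes that a missing match either raises or falls back to the supplied default.

```python
import xml.etree.ElementTree as ET

# Stand-in sentinel meaning "the caller supplied no default" (assumption:
# it only mimics the idea behind the NO_DEFAULT shown in the signatures).
NO_DEFAULT = object()


def select(root, path, default=NO_DEFAULT):
    """Return the elements matching path, or default when nothing matches."""
    found = root.findall(path)
    if found:
        return found
    if default is NO_DEFAULT:
        # No fallback was given, so the failure is reported to the caller.
        raise LookupError("no element matches %r" % path)
    return default


doc = ET.fromstring("<body><p>one</p><p>two</p></body>")

paragraphs = [el.text for el in select(doc, ".//p")]
fallback = select(doc, ".//span", default=None)
```

A sentinel object is used instead of None so that None itself remains a valid default value.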
exception weboob.browser.filters.html.XPathNotFound

Bases: weboob.browser.filters.base.ItemNotFound

exception weboob.browser.filters.html.AttributeNotFound

Bases: weboob.browser.filters.base.ItemNotFound

class weboob.browser.filters.html.Attr(selector, attr, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Get the text value of an HTML attribute.

Get the value of the attribute attr from the HTML element matched by selector.

For example:

obj_foo = Attr('//img[@id="thumbnail"]', 'src')

will take the “src” attribute of the <img> element whose “id” is “thumbnail”.

Parameters:
  • selector – selector targeting the element
  • attr – name of the attribute to take
filter(value)
Raises:
  • XPathNotFound – if no element is found
  • AttributeNotFound – if the element doesn’t have the requested attribute
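The two failure modes described above can be illustrated with a small standalone sketch. The exception classes below are local stand-ins that reuse the documented names, and attr is a hypothetical function built on the standard library; none of this is weboob's own code.

```python
import xml.etree.ElementTree as ET


class XPathNotFound(Exception):
    """Local stand-in: the selector matched no element."""


class AttributeNotFound(Exception):
    """Local stand-in: the element lacks the requested attribute."""


def attr(root, path, name):
    """Return attribute name of the first element matching path."""
    elements = root.findall(path)
    if not elements:
        raise XPathNotFound(path)
    element = elements[0]
    if name not in element.attrib:
        raise AttributeNotFound(name)
    return element.attrib[name]


doc = ET.fromstring(
    "<body><img id='thumbnail' src='/t.png'/><img id='plain'/></body>"
)

src = attr(doc, ".//img[@id='thumbnail']", "src")
```

Keeping the two exceptions distinct lets a caller tell "the page layout changed" apart from "the element is there but has no such attribute".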

class weboob.browser.filters.html.Link(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.html.Attr

Get the link URI of an element.

If the <a> tag is not found, an IndexError exception is raised.

class weboob.browser.filters.html.AbsoluteLink(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.html.Link

Get the absolute link URI of an element.
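Turning a link into an absolute URI generally means resolving it against the URL of the page it was found on. The sketch below shows that resolution with the standard library's urljoin; the page URL is a made-up example, and the snippet is only an illustration of the concept, not a statement about how the filter is implemented.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the links.
page_url = "https://example.com/a/b.html"

# A relative link that climbs one directory.
parent_link = urljoin(page_url, "../c.html")

# A relative link in the same directory as the page.
sibling_link = urljoin(page_url, "img.png")

# A link that is already absolute is returned unchanged.
external_link = urljoin(page_url, "https://other.example/x")
```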

class weboob.browser.filters.html.CleanHTML(selector=None, options=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Convert HTML to text (Markdown) using html2text.

See also

html2text site

Parameters: options (dict) – options suitable for html2text
classmethod clean(txt, options=None)
filter(value)

This method must be overridden by child classes.

class weboob.browser.filters.html.FormValue(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Extract a Python value from a form element.

Checkboxes and radio buttons return booleans, while all other elements return text. For <select> tags, the user-visible text of the selected option is returned.

Parameters: default – default value in case the filter fails to find or parse the requested value
filter(value)

This method must be overridden by child classes.
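The rules above (booleans for checkboxes and radio buttons, visible text for <select>, text otherwise) can be sketched as follows. form_value is a hypothetical function built on the standard library, and details such as falling back to the first option when none is marked as selected are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def form_value(element):
    """Return a Python value for a form element (illustrative only)."""
    if element.tag == "input" and element.get("type") in ("checkbox", "radio"):
        # Checked state becomes a boolean.
        return "checked" in element.attrib
    if element.tag == "select":
        options = element.findall("option")
        selected = [o for o in options if "selected" in o.attrib]
        # Assumption: with no explicit selection, the first option applies.
        candidates = selected or options
        return candidates[0].text if candidates else None
    if element.tag == "textarea":
        return element.text or ""
    return element.get("value", "")


form = ET.fromstring(
    "<form>"
    "<input type='checkbox' name='news' checked='checked'/>"
    "<input type='radio' name='plan' value='pro'/>"
    "<input type='text' name='user' value='alice'/>"
    "<select name='lang'>"
    "<option value='fr'>French</option>"
    "<option value='en' selected='selected'>English</option>"
    "</select>"
    "</form>"
)

# Map each field name to its extracted value.
values = {el.get("name"): form_value(el) for el in form}
```

Note that the <select> yields “English”, the text a user sees, rather than the submitted value “en”.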

class weboob.browser.filters.html.HasElement(selector, yesvalue=True, novalue=False)

Bases: weboob.browser.filters.base.Filter

Returns yesvalue if the selector finds elements, novalue otherwise.

filter(value)

This method must be overridden by child classes.
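The yesvalue/novalue behaviour amounts to a presence test. A minimal standalone sketch, in which has_element is a hypothetical function using the standard library rather than weboob:

```python
import xml.etree.ElementTree as ET


def has_element(root, path, yesvalue=True, novalue=False):
    """Return yesvalue if path matches at least one element, else novalue."""
    return yesvalue if root.findall(path) else novalue


doc = ET.fromstring("<body><p class='error'>Oops</p></body>")

# Custom values can stand for richer states than True/False.
status = has_element(doc, ".//p[@class='error']", yesvalue="failed", novalue="ok")

# With the defaults, the result is a plain boolean.
has_table = has_element(doc, ".//table")
```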

class weboob.browser.filters.html.TableCell(*names, **kwargs)

Bases: weboob.browser.filters.base._Filter

Used with TableElement; gets the cell element of a column from the column's name.

For example:

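>>> from weboob.capabilities.bank import Transaction
>>> from weboob.browser.elements import TableElement, ItemElement
>>> from weboob.browser.filters.standard import CleanText, Date
>>> from weboob.browser.filters.html import TableCell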
>>> class table(TableElement):
...     head_xpath = '//table/thead/th'
...     item_xpath = '//table/tbody/tr'
...     col_date =    u'Date'
...     col_label =   [u'Name', u'Label']
...     class item(ItemElement):
...         klass = Transaction
...         obj_date = Date(TableCell('date'))
...         obj_label = CleanText(TableCell('label'))
...

The ‘colspan’ variable enables the handling of table cells whose “colspan” attribute changes the width of a column. For example, <td colspan="2"> occupies two columns instead of one, which shifts every following column. That shift must be taken into account when matching cell values to column heads.
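The column shift can be worked through on a small example. The sketch below is a standalone illustration under simplifying assumptions: each row is represented as a list of (text, colspan) pairs, and cell_for_column is a hypothetical helper, not the library's implementation.

```python
def cell_for_column(row_cells, column):
    """Return the text of the cell covering the given logical column.

    row_cells is a list of (text, colspan) pairs in document order.
    """
    position = 0  # logical column where the current cell starts
    for text, colspan in row_cells:
        if position <= column < position + colspan:
            return text
        position += colspan
    return None  # the row does not reach that column


headers = ["Date", "Label", "Category", "Amount"]

# The second cell spans two columns, so the row has only three cells.
row = [("2024-01-05", 1), ("Coffee", 2), ("-3.50", 1)]

amount = cell_for_column(row, headers.index("Amount"))
category = cell_for_column(row, headers.index("Category"))
```

Indexing the row directly with the header position would fail here, since the row holds three cells while “Amount” is the fourth column.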

call_with_colspan(item)
call_without_colspan(item)
exception weboob.browser.filters.html.ColumnNotFound

Bases: weboob.browser.filters.base.FilterError

class weboob.browser.filters.html.ReplaceEntities(selector=None, symbols='', replace=[], children=True, newlines=True, normalize='NFC', **kwargs)

Bases: weboob.browser.filters.standard.CleanText

Filter to replace HTML entities like “&eacute;” or “&#x42;” with their Unicode counterparts.

Parameters:
  • symbols (list) – list of strings to remove from text
  • replace (list[tuple[str, str]]) – optional pairs of text replacements to perform
  • children (bool) – whether to get text from child elements of the selected elements
  • newlines (bool) – if True, newlines will be converted to spaces too
  • normalize (str or None) – Unicode normalization to perform
filter(data)

This method must be overridden by child classes.
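Entity replacement followed by Unicode normalization can be illustrated with the standard library alone. replace_entities below is a hypothetical function; it covers only the entity decoding and the normalize parameter, not the other text-cleaning options listed above.

```python
import html
import unicodedata


def replace_entities(text, normalize="NFC"):
    """Decode HTML entities, then optionally apply a Unicode normalization."""
    text = html.unescape(text)
    if normalize:
        text = unicodedata.normalize(normalize, text)
    return text


# A named entity and a hexadecimal numeric entity.
decoded = replace_entities("caf&eacute; &#x42;")

# "e" followed by a combining acute accent is composed into one code point.
composed = replace_entities("e\u0301")
```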