weboob.browser.filters.standard

exception weboob.browser.filters.standard.FilterError

Bases: weboob.exceptions.ParseError

exception weboob.browser.filters.standard.ColumnNotFound

Bases: weboob.browser.filters.base.FilterError

exception weboob.browser.filters.standard.RegexpError

Bases: weboob.browser.filters.base.FilterError

exception weboob.browser.filters.standard.ItemNotFound

Bases: weboob.browser.filters.base.FilterError

class weboob.browser.filters.standard.Filter(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base._Filter

Class used to filter on an HTML element given as call parameter to return matching elements.

Filters can be chained, so the parameter supplied to the constructor can be either an XPath selector string or another filter that is called first.

>>> from lxml.html import etree
>>> f = CleanDecimal(CleanText('//p'), replace_dots=True)
>>> f(etree.fromstring('<html><body><p>blah: <span>229,90</span></p></body></html>'))
Decimal('229.90')
Parameters: default – default value in case the filter fails to find or parse the requested value
filter(value)

This method has to be overridden by children classes.

select(selector, item)
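
The chaining described for Filter can be illustrated without weboob or lxml. The following is a minimal sketch, not the library's implementation: SketchFilter, DropSuffix and ToInt are hypothetical names, and a plain dict lookup stands in for XPath selection.

```python
class SketchFilter:
    """Toy stand-in for a filter: the selector is a dict key or another filter."""

    def __init__(self, selector=None):
        self.selector = selector

    def select(self, item):
        # If the selector is itself a filter, run it first (this is the chaining).
        if isinstance(self.selector, SketchFilter):
            return self.selector(item)
        # Assumption: a dict key lookup replaces XPath selection in this sketch.
        return item[self.selector]

    def __call__(self, item):
        return self.filter(self.select(item))

    def filter(self, value):
        # Children classes override this method.
        raise NotImplementedError


class DropSuffix(SketchFilter):
    def filter(self, value):
        return value.replace('EUR', '').strip()


class ToInt(SketchFilter):
    def filter(self, value):
        return int(value)


# The inner filter runs first, then the outer one converts its result.
price = ToInt(DropSuffix('price'))
result = price({'price': ' 42 EUR'})
```

Here result is the integer 42: DropSuffix selects and cleans the raw string, and ToInt receives the cleaned string as its input.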
class weboob.browser.filters.standard.Base(base, selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Change the base element used in filters.

>>> Base(Env('header'), CleanText('./h1'))  
class weboob.browser.filters.standard.Env(name, default=NO_DEFAULT)

Bases: weboob.browser.filters.base._Filter

Filter to get environment value of the item.

It is used for example to get page parameters, or when there is a parse() method on ItemElement.

class weboob.browser.filters.standard.TableCell(*names, **kwargs)

Bases: weboob.browser.filters.base._Filter

Used with TableElement, gets the cell element from its name.

For example:

>>> from weboob.capabilities.bank import Transaction
>>> from weboob.browser.elements import TableElement, ItemElement
>>> class table(TableElement):
...     head_xpath = '//table/thead/th'
...     item_xpath = '//table/tbody/tr'
...     col_date =    u'Date'
...     col_label =   [u'Name', u'Label']
...     class item(ItemElement):
...         klass = Transaction
...         obj_date = Date(TableCell('date'))
...         obj_label = CleanText(TableCell('label'))
...
class weboob.browser.filters.standard.RawText(selector=None, children=False, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Get raw text from an element.

Unlike CleanText, whitespace is kept as is.

Parameters: children (bool) – whether to get text from child elements of the selected elements
filter(value)
class weboob.browser.filters.standard.CleanText(selector=None, symbols='', replace=[], children=True, newlines=True, normalize='NFC', **kwargs)

Bases: weboob.browser.filters.base.Filter

Get a cleaned text from an element.

It first replaces all tabs and runs of multiple spaces (including newlines if newlines is True) with a single space, then strips the resulting string.

The result is coerced into unicode, and optionally normalized according to the normalize argument.

Then it replaces all symbols given in the symbols argument.

>>> CleanText().filter('coucou ')
u'coucou'
>>> CleanText().filter(u'coucou coucou')
u'coucou coucou'
>>> CleanText(newlines=True).filter(u'coucou\r\n coucou ')
u'coucou coucou'
>>> CleanText(newlines=False).filter(u'coucou\r\n coucou ')
u'coucou\ncoucou'
Parameters:
  • symbols (list) – list of strings to remove from text
  • replace (list[tuple[str, str]]) – optional pairs of text replacements to perform
  • children (bool) – whether to get text from child elements of the selected elements
  • newlines (bool) – if True, newlines will be converted to space too
  • normalize (str or None) – Unicode normalization to perform
classmethod clean(txt, children=True, newlines=True, normalize='NFC')
filter(value)
classmethod remove(txt, symbols)
classmethod replace(txt, replace)
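
The cleaning steps described for CleanText can be approximated on plain strings with the standard library. This is a sketch under assumptions, not the library's code: clean_text is a hypothetical helper, it handles only strings (not elements), and it ignores the replace and children arguments.

```python
import re
import unicodedata


def clean_text(txt, symbols='', newlines=True, normalize='NFC'):
    """Collapse whitespace, normalize Unicode, then remove the given symbols."""
    if newlines:
        # Newlines count as whitespace and are collapsed too.
        txt = re.sub(r'\s+', ' ', txt)
    else:
        # Keep line breaks, but clean the inside of each line.
        txt = '\n'.join(re.sub(r'\s+', ' ', line).strip()
                        for line in txt.splitlines())
    txt = txt.strip()
    if normalize:
        txt = unicodedata.normalize(normalize, txt)
    for symbol in symbols:
        txt = txt.replace(symbol, '')
    return txt.strip()


one_line = clean_text('coucou\r\n coucou ')
two_lines = clean_text('coucou\r\n coucou ', newlines=False)
amount = clean_text('  12 \u20ac ', symbols='\u20ac')
```

With these inputs, one_line is 'coucou coucou', two_lines is 'coucou\ncoucou', and amount is '12'.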
class weboob.browser.filters.standard.Lower(selector=None, symbols='', replace=[], children=True, newlines=True, normalize='NFC', **kwargs)

Bases: weboob.browser.filters.standard.CleanText

Extract text with CleanText and convert to lower-case.

Parameters:
  • symbols (list) – list of strings to remove from text
  • replace (list[tuple[str, str]]) – optional pairs of text replacements to perform
  • children (bool) – whether to get text from child elements of the selected elements
  • newlines (bool) – if True, newlines will be converted to space too
  • normalize (str or None) – Unicode normalization to perform
filter(value)
class weboob.browser.filters.standard.Upper(selector=None, symbols='', replace=[], children=True, newlines=True, normalize='NFC', **kwargs)

Bases: weboob.browser.filters.standard.CleanText

Extract text with CleanText and convert to upper-case.

Parameters:
  • symbols (list) – list of strings to remove from text
  • replace (list[tuple[str, str]]) – optional pairs of text replacements to perform
  • children (bool) – whether to get text from child elements of the selected elements
  • newlines (bool) – if True, newlines will be converted to space too
  • normalize (str or None) – Unicode normalization to perform
filter(value)
class weboob.browser.filters.standard.Capitalize(selector=None, symbols='', replace=[], children=True, newlines=True, normalize='NFC', **kwargs)

Bases: weboob.browser.filters.standard.CleanText

Extract text with CleanText and capitalize it.

Parameters:
  • symbols (list) – list of strings to remove from text
  • replace (list[tuple[str, str]]) – optional pairs of text replacements to perform
  • children (bool) – whether to get text from child elements of the selected elements
  • newlines (bool) – if True, newlines will be converted to space too
  • normalize (str or None) – Unicode normalization to perform
filter(value)
class weboob.browser.filters.standard.CleanDecimal(selector=None, replace_dots=False, sign=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.standard.CleanText

Get a cleaned Decimal value from an element.

replace_dots is False by default. A dot is interpreted as a decimal separator.

If replace_dots is set to True, all dots are removed and ‘,’ is used as the decimal separator (often useful for French values).

If replace_dots is a tuple, the first element will be used as the thousands separator, and the second as the decimal separator.

See http://en.wikipedia.org/wiki/Thousands_separator#Examples_of_use

For example, for the UK style (as in 1,234,567.89):

>>> CleanDecimal('./td[1]', replace_dots=(',', '.'))  
Parameters: sign – function accepting the text as param and returning the sign
filter(value)
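
The three replace_dots modes can be sketched on plain strings as follows. This is an illustration under assumptions rather than the library's implementation: parse_decimal is a hypothetical helper, and the sign parameter is left out.

```python
import re
from decimal import Decimal


def parse_decimal(text, replace_dots=False):
    """Turn a formatted number into a Decimal, following the replace_dots rules."""
    if replace_dots is True:
        # French style: '.' separates thousands, ',' is the decimal separator.
        thousands, decimal_sep = '.', ','
    elif replace_dots:
        # Tuple form: (thousands separator, decimal separator).
        thousands, decimal_sep = replace_dots
    else:
        # Default: a dot is already the decimal separator.
        thousands, decimal_sep = None, '.'

    if thousands:
        text = text.replace(thousands, '')
    if decimal_sep != '.':
        text = text.replace(decimal_sep, '.')
    # Drop everything that is not a digit, a minus sign or a dot.
    return Decimal(re.sub(r'[^\d\-.]', '', text))


french = parse_decimal('blah: 229,90', replace_dots=True)
uk = parse_decimal('1,234,567.89', replace_dots=(',', '.'))
```

Here french is Decimal('229.90') and uk is Decimal('1234567.89').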
class weboob.browser.filters.standard.Field(name)

Bases: weboob.browser.filters.base._Filter

Get an attribute of the object being filled.

Example:

obj_foo = CleanText('//h1')
obj_bar = Field('foo')

will make “bar” field equal to “foo” field.

class weboob.browser.filters.standard.Regexp(selector=None, pattern=None, template=None, nth=0, flags=0, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Apply a regex.

>>> from lxml.html import etree
>>> doc = etree.fromstring('<html><body><p>Date: <span>13/08/1988</span></p></body></html>')
>>> Regexp(CleanText('//p'), r'Date: (\d+)/(\d+)/(\d+)', '\\3-\\2-\\1')(doc)
u'1988-08-13'
>>> (Regexp(CleanText('//body'), r'(\d+)', nth=1))(doc)
u'08'
>>> (Regexp(CleanText('//body'), r'(\d+)', nth=-1))(doc)
u'1988'
>>> (Regexp(CleanText('//body'), r'(\d+)', template='[\\1]', nth='*'))(doc)
[u'[13]', u'[08]', u'[1988]']
>>> (Regexp(CleanText('//body'), r'Date:.*'))(doc)
u'Date: 13/08/1988'
>>> (Regexp(CleanText('//body'), r'^(?!Date:).*', default=None))(doc)
>>>
expand(m)
filter(value)
Raises: RegexpError if pattern was not found
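
The template and nth behaviour shown in the doctests above can be reproduced on plain strings with the re module. This is a sketch, not the library's code: apply_regexp and expand_match are hypothetical helpers, and a ValueError stands in for RegexpError.

```python
import re


def expand_match(match, template=None):
    """Return the templated text, or the first group, or the whole match."""
    if template is not None:
        return match.expand(template)
    return match.group(1) if match.groups() else match.group(0)


def apply_regexp(text, pattern, template=None, nth=0):
    matches = list(re.finditer(pattern, text))
    if not matches:
        raise ValueError('pattern %r not found' % pattern)
    if nth == '*':
        # '*' means: return every match, each one expanded.
        return [expand_match(m, template) for m in matches]
    try:
        return expand_match(matches[nth], template)
    except IndexError:
        raise ValueError('no match at index %r' % nth)


text = 'Date: 13/08/1988'
iso = apply_regexp(text, r'Date: (\d+)/(\d+)/(\d+)', r'\3-\2-\1')
second = apply_regexp(text, r'(\d+)', nth=1)
last = apply_regexp(text, r'(\d+)', nth=-1)
every = apply_regexp(text, r'(\d+)', template=r'[\1]', nth='*')
```

The results mirror the doctests: iso is '1988-08-13', second is '08', last is '1988', and every is ['[13]', '[08]', '[1988]'].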
class weboob.browser.filters.standard.Map(selector, map_dict, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Map selected value to another value using a dict.

Example:

TYPES = {
    'Concert': CATEGORIES.CONCERT,
    'Cinéma': CATEGORIES.CINE,
}

obj_type = Map(CleanText('./li'), TYPES)
Parameters: selector – key from map_dict to use
filter(value)
Raises: ItemNotFound if key does not exist in dict
class weboob.browser.filters.standard.DateTime(selector=None, default=NO_DEFAULT, dayfirst=False, translations=None, parse_func=<function parse>, fuzzy=False)

Bases: weboob.browser.filters.base.Filter

Parse date and time.

Parameters:
  • dayfirst (bool) – if True, the day is the first element in the string to parse
  • parse_func – the function to use for parsing the datetime
  • translations (list[tuple[str, str]]) – string replacements from site locale to English
filter(value)
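
The role of translations can be sketched as a pre-processing step applied before parsing. The code below rests on assumptions: translate is a hypothetical helper using plain substring replacement, datetime.strptime stands in for parse_func, and the %B directive assumes the default English (C) locale.

```python
from datetime import datetime


def translate(text, translations):
    """Apply each (site-locale, English) replacement pair in order."""
    for search, repl in translations:
        text = text.replace(search, repl)
    return text


# Hypothetical French-to-English month names for a site in French.
FRENCH_MONTHS = [('janvier', 'January'), ('ao\u00fbt', 'August')]

english = translate('13 ao\u00fbt 1988', FRENCH_MONTHS)
# Assumption: strptime replaces the configurable parse_func in this sketch.
parsed = datetime.strptime(english, '%d %B %Y')
```

After translation the string reads '13 August 1988', which an English-only parser can handle.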
class weboob.browser.filters.standard.Date(selector=None, default=NO_DEFAULT, dayfirst=False, translations=None, parse_func=<function parse>, fuzzy=False)

Bases: weboob.browser.filters.standard.DateTime

Parse date.

filter(value)
class weboob.browser.filters.standard.Time(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Parse time.

filter(value)
klass

alias of time

kwargs = {'second': 'ss', 'minute': 'mm', 'hour': 'hh'}
class weboob.browser.filters.standard.DateGuesser(selector, date_guesser, **kwargs)

Bases: weboob.browser.filters.base.Filter

class weboob.browser.filters.standard.Duration(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.standard.Time

Parse a duration as timedelta.

klass

alias of timedelta

kwargs = {'hours': 'hh', 'seconds': 'ss', 'minutes': 'mm'}
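
The klass and kwargs attributes of Time and Duration suggest a mapping from named regular-expression groups to constructor arguments. The sketch below shows that idea only. The hh:mm[:ss] pattern is an assumption (the input format is not documented here), and parse_with is a hypothetical helper.

```python
import re
from datetime import time, timedelta

# Assumed input format: 'hh:mm' with optional ':ss'.
PATTERN = re.compile(r'(?P<hh>\d+):(?P<mm>\d+)(?::(?P<ss>\d+))?')

TIME_KWARGS = {'hour': 'hh', 'minute': 'mm', 'second': 'ss'}
DURATION_KWARGS = {'hours': 'hh', 'minutes': 'mm', 'seconds': 'ss'}


def parse_with(klass, kwargs, text):
    """Build klass(**values), reading each value from its named group."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError('unable to parse %r' % text)
    # A missing optional group (seconds) is treated as 0.
    values = {name: int(match.group(group) or 0)
              for name, group in kwargs.items()}
    return klass(**values)


clock = parse_with(time, TIME_KWARGS, '14:05')
length = parse_with(timedelta, DURATION_KWARGS, '1:30:15')
```

The same parsing code yields a datetime.time for the Time mapping and a datetime.timedelta for the Duration mapping, which is why Duration only needs to override klass and kwargs.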
class weboob.browser.filters.standard.MultiFilter(*args, **kwargs)

Bases: weboob.browser.filters.base.Filter

filter(values)
class weboob.browser.filters.standard.CombineDate(date, time)

Bases: weboob.browser.filters.standard.MultiFilter

Combine separate Date and Time filters into a single datetime.

filter(value)
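
Combining a date and a time into one datetime can be done with datetime.combine from the standard library. The sketch below assumes the two sub-filters have already produced a date and a time; combine is a hypothetical helper, not the library's method.

```python
from datetime import date, datetime, time


def combine(values):
    """Merge a (date, time) pair, as produced by two sub-filters."""
    day, clock = values
    return datetime.combine(day, clock)


combined = combine([date(1988, 8, 13), time(14, 5)])
```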
class weboob.browser.filters.standard.Format(fmt, *args)

Bases: weboob.browser.filters.standard.MultiFilter

Combine multiple filters with string-format.

Example:

obj_title = Format('%s (%s)', CleanText('//h1'), CleanText('//h2'))

will concatenate the text from all <h1> and all <h2> (but put the latter between parentheses).

Parameters:
  • fmt (str) – string format suitable for “%”-formatting
  • args – other filters to insert in fmt string. There should be as many args as there are “%” in fmt.
filter(value)
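
The requirement that args match the placeholders in fmt follows from how “%”-formatting works. A minimal sketch, assuming the sub-filters have already returned their values (format_values is a hypothetical helper):

```python
def format_values(fmt, values):
    """Insert the already-filtered values into the format string."""
    return fmt % tuple(values)


title = format_values('%s (%s)', ['Title', 'Subtitle'])

# Too few values for the placeholders raises TypeError.
try:
    format_values('%s (%s)', ['Title'])
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

Here title is 'Title (Subtitle)', and the second call fails because one value cannot fill two placeholders.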
class weboob.browser.filters.standard.Join(pattern, selector=None, textCleaner=<class 'weboob.browser.filters.standard.CleanText'>, newline=False, addBefore='', addAfter='')

Bases: weboob.browser.filters.base.Filter

filter(value)
class weboob.browser.filters.standard.Type(selector=None, type=None, minlen=0, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Get a cleaned value of any type from an element text. The type argument can be any callable (class, function, etc.). By default an empty string is not parsed; this can be changed by specifying minlen=False. Otherwise, a minimal length can be specified.

>>> Type(CleanText('./td[1]'), type=int)  
>>> Type(type=int).filter(42)
42
>>> Type(type=int).filter('42')
42
>>> Type(type=int, default='NaN').filter('')
'NaN'
>>> Type(type=list, minlen=False, default=list('ab')).filter('')
[]
>>> Type(type=list, minlen=0, default=list('ab')).filter('')
['a', 'b']
filter(value)
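
The interaction between minlen and default in the doctests above can be sketched as follows. This is an approximation under assumptions, not the library's code: convert is a hypothetical helper, and the rule "length less than or equal to minlen means too short" is inferred from the examples.

```python
_NO_DEFAULT = object()  # sentinel: no default was supplied


def convert(value, type_func, minlen=0, default=_NO_DEFAULT):
    """Convert value with type_func unless it is too short to parse."""
    if isinstance(value, type_func):
        return value  # already of the requested type
    # minlen=False disables the length check entirely.
    too_short = minlen is not False and len(value) <= minlen
    if not too_short:
        try:
            return type_func(value)
        except ValueError:
            pass  # fall through to the default handling
    if default is _NO_DEFAULT:
        raise ValueError('unable to convert %r' % (value,))
    return default


parsed_empty = convert('', list, minlen=False, default=list('ab'))
skipped_empty = convert('', list, minlen=0, default=list('ab'))
```

With minlen=False the empty string is passed to list and gives [], while with minlen=0 it is considered too short and the default ['a', 'b'] is returned.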
class weboob.browser.filters.standard.Eval(func, *args)

Bases: weboob.browser.filters.standard.MultiFilter

Evaluate a function with given ‘deferred’ arguments.

>>> F = Field; Eval(lambda a, b, c: a * b + c, F('foo'), F('bar'), F('baz')) 
>>> Eval(lambda x, y: x * y + 1).filter([3, 7])
22

Example:

obj_ratio = Eval(lambda x: x / 100, Env('percentage'))
Parameters: func – function to apply to all filters. The function should accept as many args as there are filters passed to Eval.
filter(value)
class weboob.browser.filters.standard.BrowserURL(url_name, **kwargs)

Bases: weboob.browser.filters.standard.MultiFilter

filter(value)
class weboob.browser.filters.standard.Async(name, selector=None)

Bases: weboob.browser.filters.base.Filter

Selector that uses another page fetched earlier.

Often used in combination with the AsyncLoad filter. Requires that the other page’s URL is matched with a Page by the Browser.

Example:

class item(ItemElement):
    load_details = Field('url') & AsyncLoad

    obj_description = Async('details') & CleanText('//h3')
filter(*args)
loaded_page(item)
class weboob.browser.filters.standard.AsyncLoad(selector=None, default=NO_DEFAULT)

Bases: weboob.browser.filters.base.Filter

Load a page asynchronously for later use.

Often used in combination with the Async filter.

Parameters: default – default value in case the filter fails to find or parse the requested value