weboob.tools.regex_helper

Functions for reversing a regular expression (used in reverse URL resolving). Used internally by Django and not intended for external use.

This is not, and is not intended to be, a complete reg-exp decompiler. It should be good enough for a large class of URLs, however.

class weboob.tools.regex_helper.Choice

Bases: list

Used to represent multiple possibilities at this point in a pattern string. We use a distinguished type, rather than a plain list, so that the usage in the code is clear.

class weboob.tools.regex_helper.Group

Bases: list

Used to represent a capturing group in the pattern string.

class weboob.tools.regex_helper.NonCapture

Bases: list

Used to represent a non-capturing group in the pattern string.
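The three marker classes can be pictured with the minimal sketch below. The class bodies and the `fragment` value are assumptions made for illustration; they are not copied from the module, and the real parsed structures may be shaped differently.

```python
class Choice(list):
    """Alternatives that may appear at one point in the pattern."""


class Group(list):
    """A capturing group."""


class NonCapture(list):
    """A non-capturing group."""


# Hypothetical parsed fragment: a literal followed by an optional
# non-capturing group (None stands for the "absent" alternative).
fragment = ["page/", Choice([None, NonCapture(["extra/"])])]

for node in fragment:
    # The marker type, not the list contents, tells the caller what
    # kind of pattern element the node stands for.
    print(type(node).__name__, node)
```

Because each class subclasses list, the nodes still support indexing and iteration; only `isinstance` checks tell them apart.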

weboob.tools.regex_helper.contains(source, inst)

Returns True if “source” contains an instance of “inst”, and False otherwise.
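A recursive membership test of this kind might look like the sketch below. `contains_sketch` is a hypothetical stand-in: it descends into every nested list, whereas the module's own function may descend into fewer node types.

```python
class Group(list):
    """Stand-in marker for a capturing group."""


class NonCapture(list):
    """Stand-in marker for a non-capturing group."""


def contains_sketch(source, inst):
    """Return True if source is, or contains at any depth, an instance of inst."""
    if isinstance(source, inst):
        return True
    if isinstance(source, list):
        # Assumption: recurse into any list-like node, not only NonCapture.
        return any(contains_sketch(item, inst) for item in source)
    return False


nested = NonCapture(["a", NonCapture([Group(["%(id)s", "id"])])])
print(contains_sketch(nested, Group))
print(contains_sketch(NonCapture(["a"]), Group))
```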

weboob.tools.regex_helper.flatten_result(source)

Turns the given source sequence into a list of reg-exp possibilities and their arguments. Returns a list of strings and a list of argument lists; the two lists have the same length.
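The shape of the return value can be illustrated with a simplified flattener. Everything here is an assumption for the sake of the example: `flatten_sketch` is hypothetical, parameters are modelled as `(format, name)` tuples, and a choice is modelled as a plain list of alternative sub-sequences rather than the module's marker classes.

```python
def flatten_sketch(source):
    """Expand a sequence into parallel lists of format strings and argument names."""
    results = [("", [])]  # each entry: (format string so far, argument names so far)
    for elt in source:
        if isinstance(elt, str):
            # Literal text is appended to every possibility.
            results = [(s + elt, args) for s, args in results]
        elif isinstance(elt, tuple):
            # Hypothetical parameter node: (format placeholder, argument name).
            fmt, name = elt
            results = [(s + fmt, args + [name]) for s, args in results]
        else:
            # A list of alternatives; each alternative is itself a sequence.
            expanded = []
            for alt in elt:
                sub_strings, sub_args = flatten_sketch(alt)
                for s, args in results:
                    for sub_s, sub_a in zip(sub_strings, sub_args):
                        expanded.append((s + sub_s, args + sub_a))
            results = expanded
    return [s for s, _ in results], [a for _, a in results]


# "page/" followed by an optional "<num>/" segment.
strings, arg_lists = flatten_sketch(["page/", [[], [("%(num)s", "num"), "/"]]])
print(strings)    # ['page/', 'page/%(num)s/']
print(arg_lists)  # [[], ['num']]
```

Entry i of the first list pairs with entry i of the second, which is why the two lists always have equal length.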

weboob.tools.regex_helper.get_quantifier(ch, input_iter)

Parse a quantifier from the input, where “ch” is the first character in the quantifier.

Returns the minimum number of occurrences permitted by the quantifier, together with either None or the next character from input_iter when that character is not part of the quantifier.
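The documented behaviour can be approximated as follows. `min_occurrences` is a hypothetical sketch that reads from a plain character iterator; the module's function may consume a different kind of iterator and handle the `{m,n}` form differently.

```python
def min_occurrences(ch, chars):
    """Return (minimum repeat count, look-ahead character or None)."""
    if ch in "*?+":
        nxt = next(chars, None)
        if nxt == "?":
            # A trailing "?" makes the quantifier lazy and belongs to it.
            nxt = None
        return (1 if ch == "+" else 0), nxt
    # Otherwise ch is "{": read up to the closing brace.
    body = []
    for c in chars:
        if c == "}":
            break
        body.append(c)
    minimum = "".join(body).split(",")[0]
    # Assumption: an omitted lower bound, as in "{,3}", means zero.
    return (int(minimum) if minimum else 0), None


print(min_occurrences("+", iter("a")))      # (1, 'a')
print(min_occurrences("*", iter("?b")))     # (0, None)
print(min_occurrences("{", iter("2,5}x")))  # (2, None)
```

Returning the look-ahead character lets the caller process it next, since a plain iterator cannot push a character back.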

weboob.tools.regex_helper.next_char(input_iter)

An iterator that yields the next character from “input_iter”, respecting escape sequences. An escaped character is replaced by a representative of its class (e.g. \w -> “x”). If the escaped character is one that is skipped, it is not returned (the next character is returned instead).

Yields the next character, along with a boolean indicating whether it is a raw (unescaped) character or not.
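A generator along these lines could be sketched as below. The `REPRESENTATIVES` table is an assumption: only the `\w` -> "x" entry comes from the description above, and the other two entries are illustrative guesses. The flag yielded here is True for escaped characters, which is also an assumption about the flag's polarity.

```python
# Hypothetical mapping from escape letter to a representative character.
# None marks an escape that matches no text and is therefore skipped.
REPRESENTATIVES = {"w": "x", "d": "0", "b": None}


def escaped_chars(pattern):
    """Yield (character, escaped) pairs, resolving backslash escapes."""
    it = iter(pattern)
    for ch in it:
        if ch != "\\":
            yield ch, False
            continue
        ch = next(it, None)
        if ch is None:
            return  # dangling backslash at the end of the pattern
        representative = REPRESENTATIVES.get(ch, ch)
        if representative is None:
            continue  # skipped escape: move on to the next character
        yield representative, True


print(list(escaped_chars(r"a\w\b\.")))
# [('a', False), ('x', True), ('.', True)]
```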

weboob.tools.regex_helper.normalize(pattern)

Given a reg-exp pattern, normalizes it to a list of forms that suffice for reverse matching. This does the following:

  1. For any repeating sections, keeps the minimum number of occurrences permitted (this means zero for optional groups).
  2. If an optional group includes parameters, include one occurrence of that group (along with the zero occurrence case from step (1)).
  3. Select the first (essentially an arbitrary) element from any character class. Select an arbitrary character for any unordered class (e.g. ‘.’ or ‘\w’) in the pattern.
  4. Ignore comments and any of the reg-exp flags that won’t change what we construct (“iLmsu”). “(?x)” is an error, however.
  5. Raise an error on all other non-capturing (?...) forms (e.g. look-ahead and look-behind matches) and any disjunctive (‘|’) constructs.

Django’s URLs for forward resolving are either all positional arguments or all keyword arguments. That is assumed here, as well. Although reverse resolving can be done using positional args when keyword args are specified, the two cannot be mixed in the same reverse() call.
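To show how normalized forms might be consumed during reverse resolving, here is a small sketch. It assumes each form is a `(format string, parameter names)` pair and supports keyword arguments only; `reverse_sketch` and the `candidates` list are hypothetical and do not reproduce Django's or weboob's resolver.

```python
def reverse_sketch(candidates, **kwargs):
    """Pick the first candidate whose parameters match kwargs and fill it in."""
    for fmt, params in candidates:
        if set(params) == set(kwargs):
            # printf-style formatting with a mapping fills the named placeholders.
            return fmt % kwargs
    raise LookupError("no candidate accepts these arguments")


# Assumed normalized forms for a pattern with an optional "<num>/" group:
# the zero-occurrence case and the one-occurrence case.
candidates = [("page/", []), ("page/%(num)s/", ["num"])]

print(reverse_sketch(candidates))         # page/
print(reverse_sketch(candidates, num=3))  # page/3/
```

Keeping both the zero-occurrence and one-occurrence forms is what allows the same pattern to be reversed with or without the optional argument.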

weboob.tools.regex_helper.walk_to_end(ch, input_iter)

The iterator is currently inside a capturing group. We want to walk to the close of this group, skipping over any nested groups and handling escaped parentheses correctly.
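The walk can be sketched with a nesting counter. Both functions below are hypothetical: `skip_group` assumes the iterator yields `(character, escaped)` pairs, and `tag` is a small helper written only to produce such pairs for the example.

```python
def skip_group(ch, pairs):
    """Advance pairs just past the parenthesis that closes the current group."""
    depth = 1 if ch == "(" else 0
    for ch, escaped in pairs:
        if escaped:
            continue  # an escaped parenthesis is literal text
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return  # this parenthesis closes the group we started in
            depth -= 1


def tag(text):
    """Hypothetical helper: pair each character with an 'escaped' flag."""
    it = iter(text)
    for c in it:
        if c == "\\":
            nxt = next(it, None)
            if nxt is None:
                return
            yield nxt, True
        else:
            yield c, False


# We are inside a group whose first character "a" was already read.
pairs = tag(r"(b\))c)/tail")
skip_group("a", pairs)
remaining = "".join(c for c, _ in pairs)
print(remaining)  # /tail
```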
