mincer package

Submodules

mincer.utils module

exception mincer.utils.MultipleMatchError[source]

Raised by extract_content_from_html() and extract_node_from_html() when multiple divs are found.

exception mincer.utils.NoMatchError[source]

Raised by extract_content_from_html() and extract_node_from_html() when no div is found.

mincer.utils.extract_content_from_html(selector, expected_content, html)[source]

Extract the content of an HTML node from an HTML document according to a jQuery selector and a string matching that content.

Parameters:
  • selector (str) – a jQuery selector query that defines how the desired div is selected in the document.
  • expected_content (str) – a string that must be present in the selected node.
  • html (str) – a string containing an HTML document.
Returns:

the selected content wrapped in a div. The returned string contains only one top-level div. It may seem strange to return exactly the expected_content parameter wrapped in a div, but this keeps the interface similar to that of extract_node_from_html().

Return type:

str

Raises:
  • MultipleMatchError – Multiple divs matched the selector query and the expected string in the document.
  • NoMatchError – No div matched the selector query and the expected string in the document.

Examples

>>> PAGE = '<!DOCTYPE html><html><div id="hop">hip</div></html>'
>>> extract_content_from_html("#hop", 'hip', PAGE)
'<div>hip</div>'
>>> extract_content_from_html("#hop", 'popopo', PAGE) 
Traceback (most recent call last):
NoMatchError
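
The matching rule described above (exactly one node must contain expected_content) can be sketched in plain Python. This is only an illustration under stated assumptions, not Mincer's implementation: select_single_content is a hypothetical helper, the two exception classes are local stand-ins for the ones in mincer.utils, and the candidate node texts are assumed to have been selected already.

```python
class NoMatchError(Exception):
    """Local stand-in for mincer.utils.NoMatchError (illustration only)."""


class MultipleMatchError(Exception):
    """Local stand-in for mincer.utils.MultipleMatchError (illustration only)."""


def select_single_content(candidate_texts, expected_content):
    """Return expected_content wrapped in a div if exactly one candidate contains it.

    candidate_texts is assumed to hold the text of every node the selector
    matched; how Mincer obtains those nodes is not shown here.
    """
    matches = [text for text in candidate_texts if expected_content in text]
    if not matches:
        raise NoMatchError(expected_content)
    if len(matches) > 1:
        raise MultipleMatchError(expected_content)
    return "<div>" + expected_content + "</div>"
```

With a single candidate "hip", asking for 'hip' yields '<div>hip</div>', while asking for 'popopo' raises the stand-in NoMatchError, mirroring the doctest above.
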
mincer.utils.extract_node_from_html(selector, html)[source]

Extract one div from an HTML document according to a jQuery selector.

Parameters:
  • selector (str) – a jQuery selector query that defines how the desired div is selected in the document.
  • html (str) – a string containing an HTML document.
Returns:

the selected div. The returned string contains only one top-level div.

Return type:

str

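Raises:
  • MultipleMatchError – Multiple divs matched the selector query in the document.
  • NoMatchError – No div matched the selector query in the document.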

Examples

>>> PAGE = '<!DOCTYPE html><html><div id="hop">hip</div></html>'
>>> extract_node_from_html("#hop", PAGE)
'<div id="hop">hip</div>'
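
To make the "exactly one div" contract concrete, here is a rough standard-library sketch. It is not how Mincer works internally: it assumes the selector is a bare id (no jQuery syntax at all), extract_div_by_id and _DivByIdCollector are hypothetical names, the exception classes are local stand-ins, and character references or self-closing tags inside the div are not reproduced faithfully.

```python
from html.parser import HTMLParser


class NoMatchError(Exception):
    """Local stand-in for mincer.utils.NoMatchError (illustration only)."""


class MultipleMatchError(Exception):
    """Local stand-in for mincer.utils.MultipleMatchError (illustration only)."""


class _DivByIdCollector(HTMLParser):
    """Collect the serialized form of every <div> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = []     # serialized matching divs
        self._parts = None  # pieces of the div being captured, or None
        self._depth = 0     # <div> nesting depth inside the captured div

    def handle_starttag(self, tag, attrs):
        if self._parts is None:
            # Not capturing yet: start only on a div with the wanted id.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self._parts = [self.get_starttag_text()]
                self._depth = 1
            return
        self._parts.append(self.get_starttag_text())
        if tag == "div":
            self._depth += 1

    def handle_endtag(self, tag):
        if self._parts is None:
            return
        self._parts.append("</%s>" % tag)
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # The captured div is closed: store it and stop capturing.
                self.found.append("".join(self._parts))
                self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def extract_div_by_id(div_id, html):
    """Return the single <div id=div_id> found in html, or raise."""
    collector = _DivByIdCollector(div_id)
    collector.feed(html)
    collector.close()
    if not collector.found:
        raise NoMatchError(div_id)
    if len(collector.found) > 1:
        raise MultipleMatchError(div_id)
    return collector.found[0]
```

Called as extract_div_by_id("hop", PAGE) with the PAGE value from the example above, the sketch returns the same string as the doctest; a document with two divs sharing that id raises the stand-in MultipleMatchError.
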
mincer.utils.once()[source]

Return True if one and only one element of the sequence is True.

Examples

True if only one element is True...

>>> once([True, False, False])
True

...whatever the position...

>>> once([False, True, False])
True

...but False in all other cases:

>>> once([True, True, True])
False
>>> once([False, False, False])
False
>>> once([True, True, False])
False
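
One possible way to obtain this behaviour is to count the truthy elements. The sketch below is an assumption about the logic, not the actual source of once(); exactly_one is a hypothetical name, and treating any truthy value like True is also an assumption.

```python
def exactly_one(sequence):
    """Return True if one and only one element of sequence is truthy."""
    return sum(1 for element in sequence if element) == 1
```

It agrees with the examples above: exactly_one([False, True, False]) is True, while exactly_one([True, True, False]) and exactly_one([False, False, False]) are both False.
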

Module contents

Mincer: stuff your websites with the best ingredients

Mincer is a web server that extracts results from a web service by parsing its HTML result page.

Mincer is developed for the BULAC library in Paris.

mincer.home()[source]

Provide the home page of the server.

mincer.koha_book_list(booklist_id)[source]

Retrieve a book list from the KOHA server of the BULAC.

Query int booklist_id: numeric id of the book list for KOHA
Status 200: everything was OK
Status 404: when no booklist_id is provided

Retrieve a search result list from the KOHA server of the BULAC.

Query string search_query: the search terms, already URL-encoded (spaces and special characters are replaced; see urllib for reference)
Status 200: everything was OK
Status 404: when no search_query is provided
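
As a sketch of what "already URL-encoded" can mean on the client side, the standard urllib.parse module offers two common encodings. Which of the two Mincer expects is not stated here, so both are shown; raw_terms is an arbitrary example value.

```python
from urllib.parse import quote, quote_plus

raw_terms = "le petit prince & café"

# quote_plus() encodes spaces as '+', as HTML form submissions do.
form_style = quote_plus(raw_terms)

# quote() encodes spaces as '%20', which suits URL path segments.
path_style = quote(raw_terms)

print(form_style)  # le+petit+prince+%26+caf%C3%A9
print(path_style)  # le%20petit%20prince%20%26%20caf%C3%A9
```
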
mincer.status()[source]

Provide a status page showing whether all the adaptations work.