Module: Infoboxer::Navigation

Defined in:
lib/infoboxer/navigation.rb,
lib/infoboxer/navigation/lookup.rb,
lib/infoboxer/navigation/sections.rb,
lib/infoboxer/navigation/selector.rb,
lib/infoboxer/navigation/wikipath.rb,
lib/infoboxer/navigation/shortcuts.rb

Overview

Navigation is one of the things Infoboxer is proud of. It tries to be logical, unobtrusive and compact.

There are several levels of navigation:

  • simple tree navigation;
  • navigational shortcuts;
  • logical structure navigation (sections).

Simple tree navigation

It is somewhat similar to the XPath/CSS selectors you would use to navigate an HTML DOM. It is implemented (and documented) in the Lookup::Node module. To give you a taste of it:

document.
 lookup(:Wikilink, text: /Chile/).
 lookup_parents(:Table){|t| t.params[:class] == 'wikitable'}.
 lookup_children(size: 3)
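To make the matching semantics concrete, here is a minimal, self-contained sketch of lookup-style navigation. It does not use Infoboxer: `ToyNode`, its `attrs` hash and the node type symbols are hypothetical stand-ins, and only the method names `lookup`, `lookup_parents` and `lookup_children` echo the ones above. The sketch assumes a node matches when its type equals the requested one, every attribute pattern matches via `===` (so Regexps and plain values both work), and the optional block returns a truthy value.

```ruby
# Hypothetical toy node: NOT Infoboxer's classes, only an illustration of
# lookup-style matching by node type plus attribute patterns.
class ToyNode
  attr_reader :type, :attrs, :children
  attr_accessor :parent

  def initialize(type, attrs = {}, children = [])
    @type = type
    @attrs = attrs
    @children = children
    children.each { |child| child.parent = self }
  end

  # True if the type matches (nil = any type), every pattern matches the
  # attribute via ===, and the optional block returns truthy.
  def matches?(type = nil, **patterns, &block)
    return false if type && self.type != type
    return false unless patterns.all? { |key, pattern| pattern === attrs[key] }
    block ? block.call(self) : true
  end

  # All nodes below this one, depth-first, excluding self.
  def descendants
    children.flat_map { |child| [child, *child.descendants] }
  end

  # Parent, grandparent, ... up to the root.
  def ancestors
    parent ? [parent, *parent.ancestors] : []
  end

  def lookup(type = nil, **patterns, &block)
    descendants.select { |node| node.matches?(type, **patterns, &block) }
  end

  def lookup_parents(type = nil, **patterns, &block)
    ancestors.select { |node| node.matches?(type, **patterns, &block) }
  end

  def lookup_children(type = nil, **patterns, &block)
    children.select { |node| node.matches?(type, **patterns, &block) }
  end
end

# Document > Table(class: wikitable) > TableRow > 3 cells, Chile link in cell 1.
link  = ToyNode.new(:Wikilink, text: 'Chile')
cells = [ToyNode.new(:TableCell, {}, [link]), ToyNode.new(:TableCell), ToyNode.new(:TableCell)]
row   = ToyNode.new(:TableRow, {}, cells)
table = ToyNode.new(:Table, { class: 'wikitable' }, [row])
doc   = ToyNode.new(:Document, {}, [table])

links  = doc.lookup(:Wikilink, text: /Chile/)
tables = links.flat_map { |l| l.lookup_parents(:Table) { |t| t.attrs[:class] == 'wikitable' } }.uniq
rows   = tables.flat_map { |t| t.lookup_children(:TableRow) }
```

Here `tables` ends up containing only the `wikitable` node, found by walking upward from the matched link, and `rows` holds its single row.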

Navigational shortcuts

There is nothing complicated here: these are just convenient shortcuts over the lookup_* methods, so you can write just

document.paragraphs.last.wikilinks('Category')

...instead of

document.lookup(:Paragraph).last.lookup(:Wikilink, namespace: 'Category')

...and so on.

See the Shortcuts::Node documentation for the full list of shortcuts.
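A shortcut in this sense can be as thin as a named method that pre-fills the lookup arguments. The sketch below is hypothetical and independent of Infoboxer (`ShortcutNode` and its `attrs` hash are invented for illustration); it only shows that the short and the long form select the same node. Letting `wikilinks` without an argument mean "any namespace" is a design choice of this sketch, not necessarily Infoboxer's behavior.

```ruby
# Hypothetical node class (not Infoboxer's): a generic lookup plus a couple
# of named shortcuts that simply pre-fill its arguments.
class ShortcutNode
  attr_reader :type, :attrs, :children

  def initialize(type, attrs = {}, children = [])
    @type = type
    @attrs = attrs
    @children = children
  end

  # Generic depth-first search over descendants (self excluded).
  def lookup(type, **patterns)
    children.flat_map do |child|
      hit = child.type == type && patterns.all? { |key, pattern| pattern === child.attrs[key] }
      (hit ? [child] : []) + child.lookup(type, **patterns)
    end
  end

  # Shortcut: same as lookup(:Paragraph).
  def paragraphs
    lookup(:Paragraph)
  end

  # Shortcut: same as lookup(:Wikilink, namespace: ...); nil means any namespace.
  def wikilinks(namespace = nil)
    namespace ? lookup(:Wikilink, namespace: namespace) : lookup(:Wikilink)
  end
end

plain    = ShortcutNode.new(:Wikilink, { namespace: '' })
category = ShortcutNode.new(:Wikilink, { namespace: 'Category' })
doc = ShortcutNode.new(:Document, {}, [
  ShortcutNode.new(:Paragraph, {}, [plain]),
  ShortcutNode.new(:Paragraph, {}, [category])
])

short = doc.paragraphs.last.wikilinks('Category')
long  = doc.lookup(:Paragraph).last.lookup(:Wikilink, namespace: 'Category')
short == long # => true
```

Keeping shortcuts this thin means they add no new semantics: anything a shortcut returns can also be obtained with the underlying lookup call.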

Wikipath

WikiPath is an XPath-like query language you can use to navigate the tree:

document.wikipath('//paragraph//wikilink[namespace=Category]')

It may look more or less verbose than pure-Ruby navigation, but its big advantage is that a WikiPath query is pure data: you can, for example, store paths in a YAML file.

See the #wikipath method documentation for the full reference.
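Because a path is just a string, it can live in configuration and be interpreted later. The following is a toy interpreter, not Infoboxer's: it assumes a hypothetical hash-based tree (`type`, `attrs`, `children` keys) and understands only `//type` steps with an optional single `[attr=value]` filter, which is enough to run the example path above after loading it from YAML.

```ruby
require 'yaml'

# Hypothetical toy grammar: one or more "//type" steps, each with an
# optional "[attr=value]" filter. Real WikiPath supports much more.
STEP = %r{//(\w+)(?:\[(\w+)=([^\]]*)\])?}

# Turns a path string into plain data: [{ type:, filter: }, ...].
def parse_path(path)
  steps = path.scan(STEP).map do |type, attr, value|
    { type: type.downcase, filter: attr && { attr.to_sym => value } }
  end
  if steps.empty? || !path.gsub(STEP, '').empty?
    raise ArgumentError, "unsupported path: #{path}"
  end
  steps
end

# All nodes below `node`, depth-first, excluding `node` itself.
def descendants(node)
  node[:children].flat_map { |child| [child, *descendants(child)] }
end

def matches_step?(node, step)
  return false unless node[:type] == step[:type]
  return true if step[:filter].nil?
  step[:filter].all? { |attr, value| node[:attrs][attr].to_s == value }
end

# "//" means "any descendant", so each step searches below every current node.
def apply_path(root, steps)
  steps.reduce([root]) do |nodes, step|
    candidates = nodes.flat_map { |node| descendants(node) }.uniq(&:object_id)
    candidates.select { |node| matches_step?(node, step) }
  end
end

config = YAML.safe_load(<<~CONFIG)
  categories: "//paragraph//wikilink[namespace=Category]"
CONFIG

doc = {
  type: 'document', attrs: {},
  children: [
    {
      type: 'paragraph', attrs: {},
      children: [
        { type: 'wikilink', attrs: { namespace: 'Category', link: 'Chile' }, children: [] },
        { type: 'wikilink', attrs: { namespace: '', link: 'Santiago' }, children: [] }
      ]
    }
  ]
}

found = apply_path(doc, parse_path(config['categories']))
found.map { |node| node[:attrs][:link] } # => ["Chile"]
```

The parsed steps are themselves plain hashes, so they could be serialized back to YAML or JSON just as easily as the original path string.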

Logical structure navigation

MediaWiki page structure is flat, like HTML's: there is just a sequence of headings and paragraphs. For most information-extraction tasks, however, it is useful to think of a page as a structure of nested sections. The Sections module provides this ability. It treats a document as an intro followed by a set of sections of the same level, each of which, in turn, has its own intro and subsections. In addition, each node has an #in_sections method that returns all the sections it is nested in.

Code that works with sections can look like this:

page.sections('Culture' => 'Music').tables
# or like this
page.wikilinks.select{|link| link.in_sections.first.heading.text.include?('Culture')}

See Sections::Container for downward section navigation, and Sections::Node for upward navigation.
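The flat-to-nested idea can be sketched in a few lines. This is a hypothetical model, not the Sections module itself: a page is a flat array of `[:heading, level, title]` and `[:para, text]` items, a section starts at each heading of the smallest level present, and everything before the first such heading is the intro. The helper `dig_section` mimics downward navigation like `sections('Culture' => 'Music')`, and `in_sections` mimics upward navigation, returning enclosing section titles innermost first (an ordering assumed by this sketch).

```ruby
# Hypothetical flat page model (not Infoboxer's node classes):
#   [:heading, level, title] or [:para, text]
Section = Struct.new(:title, :level, :intro, :sections)

def top_heading?(item, level)
  item[0] == :heading && item[1] == level
end

# Returns [intro, sections]. Sections start at each heading of the smallest
# level present; their bodies are split the same way, recursively.
def build_sections(items)
  levels = items.filter_map { |kind, level| level if kind == :heading }
  return [items, []] if levels.empty?

  top = levels.min
  groups = items.slice_before { |item| top_heading?(item, top) }.to_a
  intro = top_heading?(groups.first.first, top) ? [] : groups.shift
  sections = groups.map do |(_, level, title), *body|
    sub_intro, subsections = build_sections(body)
    Section.new(title, level, sub_intro, subsections)
  end
  [intro, sections]
end

# Downward: follow a chain of titles, e.g. 'Culture', 'Music'.
def dig_section(sections, *titles)
  titles.reduce(nil) do |found, title|
    pool = found ? found.sections : sections
    match = pool.find { |s| s.title == title }
    return nil unless match
    match
  end
end

# Upward: titles of the sections whose intro holds this paragraph, innermost first.
def in_sections(sections, text, path = [])
  sections.each do |section|
    here = [section.title, *path]
    return here if section.intro.include?([:para, text])
    found = in_sections(section.sections, text, here)
    return found if found
  end
  nil
end

page = [
  [:para, 'Chile is a country.'],
  [:heading, 2, 'History'],
  [:para, 'Early history.'],
  [:heading, 2, 'Culture'],
  [:para, 'Culture intro.'],
  [:heading, 3, 'Music'],
  [:para, 'Cueca is the national dance.'],
  [:heading, 3, 'Cuisine'],
  [:para, 'Empanadas.']
]

intro, sections = build_sections(page)
dig_section(sections, 'Culture', 'Music').intro # => [[:para, "Cueca is the national dance."]]
in_sections(sections, 'Cueca is the national dance.') # => ["Music", "Culture"]
```

A paragraph in the document intro belongs to no section, so `in_sections` returns `nil` for it in this sketch.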

Defined Under Namespace

Modules: Helpers, Lookup, Sections, Shortcuts, Wikipath