Microformat Parser
MicroformatParser is a Ruby module for creating microformat parsers. A microformat parser is a class with a set of rules for extracting interesting content from (X)HTML documents. You create your own parser by writing a class and declaring its rules. The magic happens in the parse method, which takes an (X)HTML document or element, runs all the rules on it, and returns a new object that holds the extracted values.
Here’s a simple example to find all links and all tags in a document:
class MyParser
  include MicroformatParser
  rule :links, "a", "a@href"
  rule :tags, "a[rel~=tag]", "text()"
end
content = MyParser.parse(doc)
puts "Found #{content.links.size} links" if content.links
puts "Tagged with #{content.tags.join(', ')}" if content.tags
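To make the rule-and-parse idea concrete, here is a minimal, self-contained sketch of how such a class-level DSL could work. It is not the MicroformatParser implementation: the module name TinyRuleParser, the class LinkParser, and the block-based rules are hypothetical, and the regular expressions stand in for real selectors and extractors purely to keep the example runnable without an HTML parser.

```ruby
# Hypothetical sketch of a rule-based parser DSL.
# Not the MicroformatParser implementation; all names here are assumptions.
module TinyRuleParser
  # When the module is included, add the class-level DSL to the host class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Rules registered so far, stored as name => extractor block.
    def rules
      @rules ||= {}
    end

    # Register a named rule and define an accessor for its result.
    # The block receives the document and returns the extracted value.
    def rule(name, &extractor)
      rules[name] = extractor
      attr_accessor name
    end

    # Run every rule on the document and return a new object holding
    # the extracted values. Attributes stay nil when nothing matched.
    def parse(doc)
      result = new
      rules.each do |name, extractor|
        value = extractor.call(doc)
        result.public_send("#{name}=", value) unless value.nil? || value.empty?
      end
      result
    end
  end
end

class LinkParser
  include TinyRuleParser

  # Regex extraction is for illustration only; it is far too fragile
  # for real (X)HTML and is not how selectors work in the library.
  rule(:links) { |doc| doc.scan(/<a\b[^>]*\bhref="([^"]*)"/).flatten }
  rule(:tags)  { |doc| doc.scan(/<a\b[^>]*\brel="tag"[^>]*>([^<]*)<\/a>/).flatten }
end

doc = '<p><a href="/one">One</a> <a href="/t/ruby" rel="tag">ruby</a></p>'
content = LinkParser.parse(doc)
puts "Found #{content.links.size} links" if content.links
puts "Tagged with #{content.tags.join(', ')}" if content.tags
```

The sketch leaves an attribute as nil when a rule matches nothing, which is why the usage lines guard each puts with an if, mirroring the example above.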
Documentation
You may want to read the documentation for a more detailed discussion of selectors, extractors, compound rules, and (X)HTML parsing, along with examples:
trac.labnotes.org/cgi-bin/trac.cgi/wiki/Ruby/MicroformatParser
Download
The latest version can be found at
rubyforge.org/projects/uformatparser/
License
This package is licensed under the MIT license and/or the Creative Commons Attribution-ShareAlike license.
:include: MIT-LICENSE