csteamer (content steamer)
CSteamer is RIDICULOUSLY EARLY IN ITS DEVELOPMENT. IT'S LESS THAN 24 HOURS OLD AND NOT EVEN VAGUELY DONE!! The demo below will certainly work though ;-)
CSteamer extracts metadata and machine-usable data from otherwise unstructured HTML documents.
For example, if you have a blog post HTML file, CSteamer should, in theory, be able to extract the title, the actual “content”, images relating to the content, look up Delicious tags, and analyze for keywords.
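To give a feel for this kind of extraction, here is a rough sketch in plain Ruby. It is not CSteamer's API: the method names `extract_title` and `extract_image_urls` are hypothetical, and the regex approach is an assumption chosen to keep the example dependency-free. A real extractor would more likely use a proper HTML parser.

```ruby
# Hypothetical illustration only; these helpers are NOT part of CSteamer.
# They use regular expressions from Ruby's core library, which is
# fragile on messy real-world HTML but fine for a small demo.

# Returns the page title, preferring <title> and falling back to the
# first <h1>. Returns nil if neither is present.
def extract_title(html)
  match = html.match(%r{<title[^>]*>(.*?)</title>}im) ||
          html.match(%r{<h1[^>]*>(.*?)</h1>}im)
  return nil unless match

  # Drop any nested tags and collapse runs of whitespace.
  match[1].gsub(/<[^>]+>/, '').gsub(/\s+/, ' ').strip
end

# Returns the unique src values of all <img> tags, in document order.
def extract_image_urls(html)
  html.scan(/<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i).flatten.uniq
end

SAMPLE_HTML = <<~HTML
  <html>
    <head><title>My First Post</title></head>
    <body>
      <h1>My First Post</h1>
      <p>Hello.</p>
      <img src="/images/cat.png" alt="A cat">
    </body>
  </html>
HTML

title  = extract_title(SAMPLE_HTML)
images = extract_image_urls(SAMPLE_HTML)
```

Extracting the "actual content", looking up tags, and keyword analysis are considerably harder problems and are left out of this sketch.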
Note on Patches/Pull Requests
Fork the project.
Make your feature addition or bug fix.
Add tests for it. This is important so I don't break it in a future version unintentionally.
Commit your changes, but do not mess with the Rakefile, version, or history.
Send me a pull request.
Copyright © 2010 Peter Cooper. See LICENSE for details.