Module: SimpleCrawler

Defined in:
lib/simplecrawler.rb,
lib/document.rb

Overview

Simple Crawler

SimpleCrawler - a generic web crawler library in Ruby

Author

Peter Krantz (www.peterkrantz.com)

License

LGPL (See LICENSE file)

The SimpleCrawler module is a library for crawling web sites. The crawler provides comprehensive data about each page crawled, which can be used for page analysis, indexing, accessibility checks, and similar tasks. Restrictions can be specified to limit crawling of binary files.

Output

The SimpleCrawler::Crawler class yields a SimpleCrawler::Document object instance. This object contains information about a specific URI, such as HTTP headers and response data.
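
The yield-per-page pattern described here can be illustrated without the library itself. The following is a minimal, self-contained sketch and not SimpleCrawler's actual API: `SketchDocument`, `SketchCrawler`, and their fields are hypothetical stand-ins, and pages come from an in-memory hash rather than being fetched over HTTP.

```ruby
# Hypothetical stand-in for a document object: one record per URI.
SketchDocument = Struct.new(:uri, :http_status, :headers, :data)

# Hypothetical stand-in for a crawler that yields one document per page.
class SketchCrawler
  # pages: Hash mapping a URI string to [status, headers hash, body string].
  def initialize(pages)
    @pages = pages
  end

  # Yields a SketchDocument for every known page, in insertion order.
  def crawl
    return enum_for(:crawl) unless block_given?

    @pages.each do |uri, (status, headers, body)|
      yield SketchDocument.new(uri, status, headers, body)
    end
  end
end

# Assumed sample data; nothing is fetched from the network.
pages = {
  "http://example.com/" => [200, { "content-type" => "text/html" }, "<html></html>"],
  "http://example.com/logo.png" => [200, { "content-type" => "image/png" }, ""]
}

# The caller's block decides what to do with each yielded document.
crawled = []
SketchCrawler.new(pages).crawl do |doc|
  crawled << doc.uri if doc.headers["content-type"] == "text/html"
end
```

Because each document is handed to the block as soon as it is available, the caller can index, analyze, or discard pages one at a time instead of holding the whole crawl in memory.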

Contributions

None yet :-) Why don’t you go ahead and be first?

Example usage

See the “Simple Crawler wiki”.

Defined Under Namespace

Classes: Crawler, Document

Constant Summary

MARKUP_MIME_TYPES =
["text/html", "text/xml", "application/xml", "application/xhtml+xml"]
VERSION =
"0.1.8"
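
How the library consults MARKUP_MIME_TYPES internally is not shown on this page. As a hedged illustration, a list like this could be used to decide whether a response is markup worth parsing. In the sketch below the constant is redefined locally with the values listed above, and `markup_content_type?` is a hypothetical helper, not part of SimpleCrawler.

```ruby
# Values copied from the constant summary; redefined here so the sketch
# runs standalone.
MARKUP_MIME_TYPES = ["text/html", "text/xml", "application/xml", "application/xhtml+xml"]

# Hypothetical helper: returns true if a Content-Type header value names
# one of the markup MIME types.
def markup_content_type?(content_type)
  return false if content_type.nil?

  # Drop parameters such as "; charset=utf-8" and normalize case.
  mime = content_type.split(";").first.to_s.strip.downcase
  MARKUP_MIME_TYPES.include?(mime)
end
```

A check of this kind would let a crawler skip link extraction for images, PDFs, and other binary responses.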