Module: SimpleCrawler
- Defined in: lib/simplecrawler.rb, lib/document.rb
Overview
Simple Crawler
SimpleCrawler - a generic web crawler library in Ruby
- Author: Peter Krantz (www.peterkrantz.com)
- License: LGPL (See LICENSE file)
The SimpleCrawler module is a library for crawling web sites. The crawler provides comprehensive data about each page it crawls, which can be used for page analysis, indexing, accessibility checks, and similar tasks. Restrictions can be specified to limit crawling of binary files.
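As a rough illustration of the kind of restriction mentioned above, the sketch below filters out URIs by file extension before they would be fetched. The `BINARY_EXTENSIONS` list and the `skip_binary?` helper are hypothetical names made up for this example. They are not part of SimpleCrawler's documented API, and the library's own mechanism may work differently.

```ruby
require "uri"

# Hypothetical list of extensions to treat as binary (an assumption for this sketch).
BINARY_EXTENSIONS = %w[.pdf .zip .jpg .png .gif .doc].freeze

# Hypothetical helper: true when the URI path ends in one of the extensions above.
def skip_binary?(uri_string)
  path = URI.parse(uri_string).path.to_s
  BINARY_EXTENSIONS.include?(File.extname(path).downcase)
end

urls = ["http://example.com/index.html", "http://example.com/report.PDF"]

# Keep only the URIs that do not look like binary files.
crawlable = urls.reject { |u| skip_binary?(u) }
puts crawlable
```

Matching on the extension is cheap because it needs no request, but it is only a heuristic: a server can return binary content from any path.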
Output
The SimpleCrawler::Crawler class yields a SimpleCrawler::Document instance. This object contains information about a specific URI, such as HTTP headers and response data.
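The yielding pattern described above can be sketched without the library or a network connection. In this minimal example, `FakeDocument` and `each_document` are stand-ins invented for illustration, and the field names `uri`, `headers` and `data` are assumptions rather than the confirmed attributes of SimpleCrawler::Document.

```ruby
# Stand-in for SimpleCrawler::Document; the field names are assumptions for this sketch.
FakeDocument = Struct.new(:uri, :headers, :data)

# Stand-in for a crawler: yields one document object per page to the caller's block.
def each_document(pages)
  pages.each do |uri, body|
    yield FakeDocument.new(uri, { "content-type" => "text/html" }, body)
  end
end

pages = { "http://example.com/" => "<html><title>Home</title></html>" }

seen = []
each_document(pages) do |doc|
  # The block receives the document and can inspect its headers and body.
  seen << "#{doc.uri} (#{doc.headers['content-type']}, #{doc.data.length} chars)"
end
puts seen
```

Yielding each document to a block lets the caller process pages one at a time instead of collecting the whole crawl in memory first.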
Contributions
None yet :-) Why don’t you go ahead and be first?
Example usage
See the “Simple Crawler wiki”.
Defined Under Namespace
Constant Summary
- MARKUP_MIME_TYPES = ["text/html", "text/xml", "application/xml", "application/xhtml+xml"]
- VERSION = "0.1.8"
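One plausible use for a list like MARKUP_MIME_TYPES is deciding whether a response body is worth parsing as markup. The sketch below assumes that purpose. The `markup?` helper is hypothetical and is not taken from the library, and the constant is redefined locally only so the example runs on its own.

```ruby
# Local copy of the constant listed above, so the sketch runs without the library.
MARKUP_MIME_TYPES = ["text/html", "text/xml", "application/xml", "application/xhtml+xml"]

# Hypothetical helper: drops parameters such as "; charset=UTF-8" from a
# Content-Type value and checks the remaining media type against the list.
def markup?(content_type)
  media_type = content_type.to_s.split(";").first.to_s.strip.downcase
  MARKUP_MIME_TYPES.include?(media_type)
end

puts markup?("text/html; charset=UTF-8")
puts markup?("image/png")
```

Stripping the parameters first matters because Content-Type headers often carry a charset, which would otherwise make an exact comparison against the list fail.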