Module: Wgit
- Defined in:
- lib/wgit/web_crawler.rb,
lib/wgit/url.rb,
lib/wgit/utils.rb,
lib/wgit/crawler.rb,
lib/wgit/version.rb,
lib/wgit/document.rb,
lib/wgit/assertable.rb,
lib/wgit/database/model.rb,
lib/wgit/database/database.rb,
lib/wgit/database/mongo_connection_details.rb
Overview
Defined Under Namespace
Modules: Assertable, Model, Utils
Classes: Crawler, Database, Document, Url, WebCrawler
Constant Summary
- VERSION =
"0.0.1".freeze
- DB_PROVIDER =
:MongoLabs.freeze
- CONNECTION_DETAILS =
MongoLabs (MongoDB 3.0)
{ :host => "ds037205.mongolab.com", :port => "37205", :db => "crawler", :uname => "rubyapp", :pword => "R5jUKv1fessb", }.freeze
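The page does not show how CONNECTION_DETAILS is consumed. As a minimal sketch, assuming the hash is meant to be assembled into a standard MongoDB connection string, a helper along these lines would do it. The `mongo_uri` method and the placeholder values below are hypothetical and are not part of Wgit.

```ruby
require "erb"

# Hypothetical helper (not part of Wgit): builds a standard
# mongodb:// connection string from a hash shaped like CONNECTION_DETAILS.
def mongo_uri(details)
  # Percent-encode the credentials so characters such as "/" or "@"
  # cannot be mistaken for URI delimiters.
  user = ERB::Util.url_encode(details.fetch(:uname))
  pass = ERB::Util.url_encode(details.fetch(:pword))
  host = details.fetch(:host)
  port = details.fetch(:port)
  db   = details.fetch(:db)
  "mongodb://#{user}:#{pass}@#{host}:#{port}/#{db}"
end

# Placeholder values for illustration only.
example_details = {
  :host  => "db.example.com",
  :port  => "27017",
  :db    => "crawler",
  :uname => "app_user",
  :pword => "s3cret/pw"
}.freeze

puts mongo_uri(example_details)
```

Using `fetch` rather than `[]` makes a missing key fail loudly with a KeyError instead of silently producing a malformed URI.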
Class Method Summary
- .crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1048576000) ⇒ Object
Convenience method to crawl the World Wide Web.
Class Method Details
.crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1048576000) ⇒ Object
Convenience method to crawl the World Wide Web. The default max_sites_to_crawl of -1 leaves the number of sites unrestricted. The default max_data_size is 1048576000 bytes (1000 MiB, roughly 1 GB).
# File 'lib/wgit/web_crawler.rb', line 12
def self.crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1048576000)
  db = Wgit::Database.new
  web_crawler = Wgit::WebCrawler.new(db, max_sites_to_crawl, max_data_size)
  web_crawler.crawl_the_web
end
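The delegation pattern in this method can be tried without a database using the self-contained sketch below. Everything under `WgitSketch` is a stand-in written for illustration: the real Wgit::Database and Wgit::WebCrawler are defined elsewhere in the gem, and the limit checks shown here (`site_limit_reached?`, `data_limit_reached?`) are assumptions about how the two arguments might be interpreted, not the library's actual logic. Unlike the real method, the sketch returns the configured crawler instead of starting a crawl.

```ruby
# Stand-ins for illustration only; not the real Wgit classes.
module WgitSketch
  UNLIMITED = -1                          # sentinel: no cap on sites
  DEFAULT_MAX_DATA_SIZE = 1_048_576_000   # bytes (1000 MiB)

  class Database; end

  class WebCrawler
    attr_reader :max_sites_to_crawl, :max_data_size

    def initialize(database, max_sites_to_crawl = UNLIMITED,
                   max_data_size = DEFAULT_MAX_DATA_SIZE)
      @database = database
      @max_sites_to_crawl = max_sites_to_crawl
      @max_data_size = max_data_size
    end

    # Assumed reading of the -1 default: never stop on site count.
    def site_limit_reached?(sites_crawled)
      max_sites_to_crawl != UNLIMITED && sites_crawled >= max_sites_to_crawl
    end

    # Assumed reading of max_data_size: stop once this many bytes are stored.
    def data_limit_reached?(bytes_stored)
      bytes_stored >= max_data_size
    end
  end

  # Mirrors the shape of Wgit.crawl_the_web, but returns the crawler
  # rather than running it.
  def self.crawl_the_web(max_sites_to_crawl = UNLIMITED,
                         max_data_size = DEFAULT_MAX_DATA_SIZE)
    db = Database.new
    WebCrawler.new(db, max_sites_to_crawl, max_data_size)
  end
end

unrestricted = WgitSketch.crawl_the_web
puts unrestricted.site_limit_reached?(1_000_000)  # => false

capped = WgitSketch.crawl_the_web(2, 500)
puts capped.site_limit_reached?(2)                # => true
```

Keeping the defaults on the module-level method as well as the constructor means callers of the convenience method see the same limits whether or not they pass arguments.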