Class: Elasticrawl::CrawlSegment
- Inherits:
- ActiveRecord::Base (Elasticrawl::CrawlSegment < ActiveRecord::Base < Object)
- Defined in:
- lib/elasticrawl/crawl_segment.rb
Overview
Represents a segment of a web crawl released by the Common Crawl Foundation. Each segment contains archive, metadata and text files.
Class Method Summary
- .create_segment(crawl, segment_name, file_count) ⇒ Object
Creates a crawl segment based on its S3 path if it does not exist.
Instance Method Summary
- #segment_desc ⇒ Object
Returns a description showing the segment name and the number of files it contains.
Class Method Details
.create_segment(crawl, segment_name, file_count) ⇒ Object
Creates a crawl segment based on its S3 path if it does not exist.
    # File 'lib/elasticrawl/crawl_segment.rb', line 14

    def self.create_segment(crawl, segment_name, file_count)
      s3_uri = build_s3_uri(crawl.crawl_name, segment_name)

      segment = CrawlSegment.where(:crawl_id => crawl.id,
                                   :segment_name => segment_name,
                                   :segment_s3_uri => s3_uri,
                                   :file_count => file_count).first_or_create
    end
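The find-or-create behaviour of `where(...).first_or_create` can be illustrated without a database. The sketch below is a plain-Ruby stand-in, not Elasticrawl's implementation: `FakeSegment`, `SegmentStore`, and every attribute value (including the S3 URI, whose real format comes from `build_s3_uri` and is not shown here) are hypothetical. It assumes only what the method above shows: the lookup matches on all four attributes, so a call with a different `file_count` would not find the earlier row.

```ruby
# Plain-Ruby stand-in for where(...).first_or_create; not Elasticrawl code.
# FakeSegment and SegmentStore are hypothetical names used for illustration.
FakeSegment = Struct.new(:crawl_id, :segment_name, :segment_s3_uri, :file_count)

class SegmentStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Returns the first row whose attributes all match, or creates a new one.
  def first_or_create(attrs)
    found = @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
    return found if found

    row = FakeSegment.new(attrs[:crawl_id], attrs[:segment_name],
                          attrs[:segment_s3_uri], attrs[:file_count])
    @rows << row
    row
  end
end

store = SegmentStore.new

# Placeholder values; the S3 URI is made up and does not reflect build_s3_uri.
attrs = { :crawl_id => 1,
          :segment_name => 'example-segment',
          :segment_s3_uri => 's3://example-bucket/placeholder/',
          :file_count => 3 }

first  = store.first_or_create(attrs)                          # creates a row
second = store.first_or_create(attrs)                          # reuses that row
third  = store.first_or_create(attrs.merge(:file_count => 4))  # no match, new row
```

Because `file_count` is part of the match, the third call adds a second row for the same segment name in this sketch. Whether that can occur in practice depends on how callers obtain `file_count`, which this page does not describe.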
Instance Method Details
#segment_desc ⇒ Object
Returns a description showing the segment name and the number of files it contains.
    # File 'lib/elasticrawl/crawl_segment.rb', line 9

    def segment_desc
      "Segment: #{segment_name} Files: #{file_count}"
    end
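As a quick illustration of the string this method produces, the sketch below copies the method body into a plain `Struct` so it runs without ActiveRecord. `DemoSegment` and its values are hypothetical stand-ins, not part of Elasticrawl.

```ruby
# Hypothetical stand-in for a CrawlSegment record; only the two attributes
# used by segment_desc are modelled.
DemoSegment = Struct.new(:segment_name, :file_count) do
  # Same body as CrawlSegment#segment_desc above.
  def segment_desc
    "Segment: #{segment_name} Files: #{file_count}"
  end
end

demo = DemoSegment.new('example-segment', 3)
description = demo.segment_desc
puts description  # prints "Segment: example-segment Files: 3"
```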