Class: Elasticrawl::CrawlSegment

Inherits:
ActiveRecord::Base
  • Object
Defined in:
lib/elasticrawl/crawl_segment.rb

Overview

Represents a segment of a web crawl released by the Common Crawl Foundation. Each segment contains archive, metadata and text files.

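Class Method Summary

  • .create_segment(crawl, segment_name, file_count) ⇒ Object
    Creates a crawl segment based on its S3 path if it does not exist.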

Instance Method Summary

  • #segment_desc ⇒ Object
    Description shows name and number of files in the segment.

Class Method Details

.create_segment(crawl, segment_name, file_count) ⇒ Object

Creates a crawl segment based on its S3 path unless a matching record already exists. Returns the existing or newly created record.



# File 'lib/elasticrawl/crawl_segment.rb', line 14

def self.create_segment(crawl, segment_name, file_count)
  s3_uri = build_s3_uri(crawl.crawl_name, segment_name)

  # Return the record matching all four attributes, creating it if none exists.
  CrawlSegment.where(:crawl_id => crawl.id,
                     :segment_name => segment_name,
                     :segment_s3_uri => s3_uri,
                     :file_count => file_count).first_or_create
end
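
The find-or-create behaviour above can be illustrated without a database. The following is a minimal sketch, not Elasticrawl code: SegmentStore and SegmentRow are hypothetical in-memory stand-ins for the ActiveRecord relation and its table, and the segment name and S3 URI are made-up placeholder values. It shows that a second call with identical attributes returns the existing row, while a call with a different file_count creates a new one, because file_count is part of the lookup.

```ruby
# Hypothetical in-memory stand-in for the crawl segments table.
SegmentRow = Struct.new(:crawl_id, :segment_name, :segment_s3_uri, :file_count)

class SegmentStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Mimics where(attrs).first_or_create: return the first row matching
  # every attribute, or build and store a new row with those attributes.
  def first_or_create(attrs)
    found = @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
    return found if found

    row = SegmentRow.new(*attrs.values_at(*SegmentRow.members))
    @rows << row
    row
  end
end

store = SegmentStore.new
attrs = { :crawl_id => 1,
          :segment_name => 'example-segment',                        # placeholder name
          :segment_s3_uri => 's3://example-bucket/example-segment/', # placeholder URI
          :file_count => 3 }

first  = store.first_or_create(attrs)
second = store.first_or_create(attrs)                         # same row, no duplicate
third  = store.first_or_create(attrs.merge(:file_count => 4)) # differs, so a new row
```

One consequence of this design, assuming the real relation behaves like the sketch, is that re-registering a segment with a changed file count would add a second record rather than update the first.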

Instance Method Details

#segment_desc ⇒ Object

Returns a description showing the segment name and the number of files in the segment.



# File 'lib/elasticrawl/crawl_segment.rb', line 9

def segment_desc
  "Segment: #{segment_name} Files: #{file_count}"
end
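
As a quick illustration of the string this method produces, here is a small sketch that uses a plain Struct in place of the ActiveRecord model. DemoSegment and the values passed to it are hypothetical; only the body of segment_desc is taken from the source above.

```ruby
# Hypothetical stand-in exposing the two attributes segment_desc reads.
DemoSegment = Struct.new(:segment_name, :file_count) do
  def segment_desc
    "Segment: #{segment_name} Files: #{file_count}"
  end
end

demo = DemoSegment.new('example-segment', 3) # placeholder values
puts demo.segment_desc
# prints "Segment: example-segment Files: 3"
```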