Class: IncludedInFile

Inherits:
Object
Defined in:
lib/spider/included_in_file.rb

Overview

A specialized class that uses a plain text file to track stored items. It supports three operations: new, <<, and include?. Together these can be used to add items to the text file and then determine whether an item has already been added.
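To illustrate how the three operations fit together, the sketch below copies the source shown on this page into a local IncludedInFile class, so it runs without the spider gem installed. It then writes to a temporary file; the path and URLs are illustrative only. A second instance opened on the same file sees the items written by the first, which is what lets the log carry over between crawls.

```ruby
require 'tmpdir'

# Local copy of the class documented on this page, so the snippet runs
# without the spider gem installed.
class IncludedInFile
  def initialize(filepath)
    @filepath = filepath
    # create the file if it does not exist
    File.write(@filepath, '') unless File.file?(@filepath)
    # load previously stored items into memory, one per line
    @urls = File.readlines(@filepath).map(&:chomp)
  end

  # Add an item to the in-memory array and append it to the file.
  def <<(v)
    @urls << v.to_s
    File.write(@filepath, "#{v}\r\n", File.size(@filepath), mode: 'a')
  end

  # True if the item has been stored.
  def include?(v)
    @urls.include? v.to_s
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')

  first = IncludedInFile.new(path)
  first << 'http://example.com/'
  puts first.include?('http://example.com/')   # prints true
  puts first.include?('http://example.com/a')  # prints false

  # A new instance on the same file reloads what the first one wrote.
  second = IncludedInFile.new(path)
  puts second.include?('http://example.com/')  # prints true
end
```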

To use it with Spider, pass an instance to the check_already_seen_with method:

Spider.start_at('http://example.com/') do |s|
  s.check_already_seen_with IncludedInFile.new('/tmp/crawled.log')
end

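Instance Method Summary

  • #<<(v) ⇒ Object
    Add an item to the file and to the in-memory array of URLs.
  • #include?(v) ⇒ Boolean
    True if the item has been stored.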

Constructor Details

#initialize(filepath) ⇒ IncludedInFile

Construct a new IncludedInFile instance.

Parameters:

  • filepath (String)

    path of the file used to store crawled URLs



# File 'lib/spider/included_in_file.rb', line 15

def initialize(filepath)
  @filepath = filepath
  # create the file if it does not exist
  File.write(@filepath, '') unless File.file?(@filepath)
  # load previously stored items into memory, one per line
  @urls = File.readlines(@filepath).map(&:chomp)
end
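The constructor reads the whole file into memory once. Because String#chomp with no argument strips a trailing "\r\n", "\n" or "\r", the CRLF-terminated lines written by #<< load back as plain strings. The sketch below reproduces just those two steps as a standalone helper. The name load_seen is hypothetical and not part of the gem.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the gem) mirroring the constructor's
# two steps: create the file if missing, then load one item per line.
def load_seen(filepath)
  # create the file if it does not exist, as the constructor does
  File.write(filepath, '') unless File.file?(filepath)
  # chomp removes a trailing "\r\n", "\n" or "\r" from each line
  File.readlines(filepath).map(&:chomp)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')
  p load_seen(path)  # prints []
  File.write(path, "http://example.com/\r\nhttp://example.com/about\r\n")
  p load_seen(path)  # prints ["http://example.com/", "http://example.com/about"]
end
```

Since every stored item is kept in an array for the object's lifetime, memory use grows with the size of the crawl log.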

Instance Method Details

#<<(v) ⇒ Object

Add an item to the file and to the in-memory array of URLs.



# File 'lib/spider/included_in_file.rb', line 23

def <<(v)
  # remember the item in memory as a string
  @urls << v.to_s
  # append it to the file, terminated by CRLF
  File.write(@filepath, "#{v}\r\n", File.size(@filepath), mode: 'a')
end
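The method above passes File.size(@filepath) as a write offset while also requesting mode: 'a'. In append mode each write lands at the end of the file regardless of the offset, so the offset appears to be redundant. The sketch below shows an equivalent append using File.open in append mode. The helper name append_item is hypothetical, and the "\r\n" terminator matches the one #<< writes.

```ruby
require 'tmpdir'

# Hypothetical helper: appends one item per line with the same "\r\n"
# terminator as IncludedInFile#<<. Mode 'a' creates the file if needed
# and sends every write to the end of the file, so no offset is passed.
def append_item(filepath, v)
  File.open(filepath, 'a') { |f| f.write("#{v}\r\n") }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')
  append_item(path, 'http://example.com/')
  append_item(path, 'http://example.com/about')
  p File.read(path)  # prints "http://example.com/\r\nhttp://example.com/about\r\n"
  p File.readlines(path).map(&:chomp)
end
```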

#include?(v) ⇒ Boolean

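True if the item has been stored. The check runs against the in-memory list loaded from the file when the instance was created, plus any items added with #<< since then.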

Returns:

  • (Boolean)


# File 'lib/spider/included_in_file.rb', line 29

def include?(v)
  # checks the in-memory copy, not the file on disk
  @urls.include? v.to_s
end
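Because @urls is an Array, each include? call scans every stored URL, so the cost of a lookup grows with the size of the log. If that matters for a large crawl, one possible alternative keeps the same file format but holds items in a Set for hash-based lookups. The class below, SetBackedIncludedInFile, is a hypothetical sketch and not part of the spider gem.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical variant (not part of the spider gem): same file format and
# interface as IncludedInFile, but items are kept in a Set so include?
# does not scan every stored URL.
class SetBackedIncludedInFile
  def initialize(filepath)
    @filepath = filepath
    # create the file if it does not exist
    File.write(@filepath, '') unless File.file?(@filepath)
    # load previously stored items; duplicate lines collapse into one entry
    @urls = Set.new(File.readlines(@filepath).map(&:chomp))
  end

  # Record the item in memory and append it to the file, one per line.
  def <<(v)
    @urls << v.to_s
    File.open(@filepath, 'a') { |f| f.write("#{v}\r\n") }
  end

  # Hash-based membership test on the string form of the item.
  def include?(v)
    @urls.include?(v.to_s)
  end
end

Dir.mktmpdir do |dir|
  seen = SetBackedIncludedInFile.new(File.join(dir, 'crawled.log'))
  seen << 'http://example.com/'
  puts seen.include?('http://example.com/')  # prints true
end
```

The trade-off is unchanged from the original: the whole log still lives in memory. Only the lookup cost differs.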