Module: Spidr::Sanitizers

Included in:
Agent
Defined in:
lib/spidr_epg/sanitizers.rb

Overview

The Sanitizers module adds methods to Agent that control how incoming links are sanitized.

Instance Attribute Details

#strip_fragments ⇒ Object

Specifies whether the Agent will strip URI fragments.



# File 'lib/spidr_epg/sanitizers.rb', line 10

def strip_fragments
  @strip_fragments
end

#strip_query ⇒ Object

Specifies whether the Agent will strip URI queries.



# File 'lib/spidr_epg/sanitizers.rb', line 13

def strip_query
  @strip_query
end

Instance Method Details

#sanitize_url(url) ⇒ URI::HTTP, URI::HTTPS

Sanitizes a URL based on filtering options.

Parameters:

  • url (URI::HTTP, URI::HTTPS, String)

    The URL to be sanitized

Returns:

  • (URI::HTTP, URI::HTTPS)

    The new sanitized URL.

Since:

  • 0.2.2



# File 'lib/spidr_epg/sanitizers.rb', line 26

def sanitize_url(url)
  url = URI(url.to_s) unless url.kind_of?(URI)

  url.fragment = nil if @strip_fragments
  url.query    = nil if @strip_query

  return url
end
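
Example:

The following is a minimal sketch of how the two flags affect the result of sanitize_url. It does not load the library itself: SanitizersSketch and SketchAgent are hypothetical stand-ins that copy the method body shown above, and the writable attributes (attr_accessor) are an assumption, since only the readers are listed on this page.

```ruby
require 'uri'

# Hypothetical stand-in mirroring the sanitize_url listing above, so the
# example runs without the library installed.
module SanitizersSketch
  # Assumption: the real attributes are writable; only readers are documented.
  attr_accessor :strip_fragments, :strip_query

  def sanitize_url(url)
    # Strings (or anything responding to to_s) are parsed into a URI first.
    url = URI(url.to_s) unless url.kind_of?(URI)

    url.fragment = nil if @strip_fragments
    url.query    = nil if @strip_query

    url
  end
end

# Hypothetical host class standing in for Agent.
class SketchAgent
  include SanitizersSketch
end

agent = SketchAgent.new
agent.strip_fragments = true
agent.strip_query     = false

# Only the fragment is removed while strip_query is false.
kept_query = agent.sanitize_url('http://example.com/page?id=1#top').to_s

# With both flags set, the query is removed as well.
agent.strip_query = true
bare = agent.sanitize_url('http://example.com/page?id=1#top').to_s

# A URI argument is modified in place and returned as the same object.
original  = URI('http://example.com/a#frag')
same_uri  = agent.sanitize_url(original).equal?(original)

puts kept_query
puts bare
puts same_uri
```

Note that, as written, the method only builds a new object when given a non-URI argument; a URI passed in is stripped in place, so callers that need the original URL intact may want to pass a copy (for example url.dup).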