Class: PublicSuffix::List

Inherits: Object

Includes: Enumerable

Defined in: lib/public_suffix/list.rb

Overview

A List is a collection of rules.

Given a List, you can add rules, clear it, iterate over all items in the list, or search for the rule that best matches a specific domain name.

# Create a new list
list = PublicSuffix::List.new

# Push two rules to the list
list << PublicSuffix::Rule.factory("it")
list << PublicSuffix::Rule.factory("com")

# Get the size of the list
list.size
# => 2

# Search for the rule matching given domain
list.find("example.com")
# => #<PublicSuffix::Rule::Normal>
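list.find("example.org")
# => the default rule (see #default_rule), because no rule in the list matches
list.find("example.org", default: nil)
# => nil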

You can create as many lists as you want. The List.default rule list is the one used to tokenize and validate domains.

List includes the Enumerable module.
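
List gets its collection behaviour by defining #each and including Enumerable, which then supplies methods such as map, count and include?. The sketch below reproduces that pattern with a hypothetical MiniList class holding plain strings instead of Rule objects. It also illustrates a side effect visible in the method summary: List defines its own #select(name, ignore_private: false) and #find(name, ...), both of which require a domain name, so they shadow Enumerable#select and Enumerable#find. Block-only calls such as list.select { ... } raise ArgumentError, while the other Enumerable names for the same operations, find_all and detect, still work.

```ruby
# MiniList is a hypothetical stand-in for PublicSuffix::List,
# holding plain strings instead of Rule objects.
class MiniList
  include Enumerable

  def initialize(rules)
    @rules = rules
  end

  # Enumerable builds map, count, include?, find_all, ... on top of #each.
  def each(*args, &block)
    @rules.each(*args, &block)
  end

  # Like List#select, this requires a domain name, so it shadows
  # Enumerable#select. The suffix check is deliberately naive.
  def select(name)
    @rules.select { |rule| name == rule || name.end_with?(".#{rule}") }
  end
end

list = MiniList.new(%w[it com co.uk])
list.map(&:upcase)                          # => ["IT", "COM", "CO.UK"]
list.count                                  # => 3
list.include?("com")                        # => true
list.select("example.com")                  # => ["com"]
list.find_all { |rule| rule.include?(".") } # => ["co.uk"]
```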

Constant Summary

DEFAULT_LIST_PATH = File.join(File.dirname(__FILE__), "..", "..", "data", "list.txt")
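
DEFAULT_LIST_PATH is built relative to the location of list.rb: two directories up, then data/list.txt. One way to see where that lands is to evaluate the same expression with a hypothetical install path substituted for __FILE__ and normalize it with File.expand_path, which resolves the ".." segments lexically for an absolute path (a POSIX-style path is assumed here).

```ruby
# Hypothetical install location standing in for __FILE__ inside list.rb.
source_file = "/gems/public_suffix/lib/public_suffix/list.rb"

# Same expression as DEFAULT_LIST_PATH, with __FILE__ replaced.
raw_path = File.join(File.dirname(source_file), "..", "..", "data", "list.txt")
# raw_path == "/gems/public_suffix/lib/public_suffix/../../data/list.txt"

# expand_path collapses the ".." segments.
resolved = File.expand_path(raw_path)
# resolved == "/gems/public_suffix/data/list.txt"
```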

Instance Attribute Summary

#rules ⇒ Array<PublicSuffix::Rule::*> readonly

Gets the array of rules.

Class Method Summary

.clear ⇒ self

Sets the default rule list to nil.
.default(**options) ⇒ PublicSuffix::List

Gets the default rule list.
.default=(value) ⇒ PublicSuffix::List

Sets the default rule list to value.
.parse(input, private_domains: true) ⇒ PublicSuffix::List

Parses the given input, treating the content as a Public Suffix List.

Instance Method Summary

#==(other) ⇒ Boolean (also: #eql?)

Checks whether two lists are equal.
#add(rule, reindex: true) ⇒ self (also: #<<)

Adds the given object to the list and optionally refreshes the rule index.
#clear ⇒ self

Removes all elements.
#default_rule ⇒ PublicSuffix::Rule::*

Gets the default rule.
#each(*args, &block) ⇒ Object

Iterates over each rule in the list.
#empty? ⇒ Boolean

Checks whether the list is empty.
#find(name, default: default_rule, **options) ⇒ PublicSuffix::Rule::*

Finds and returns the most appropriate rule for the domain name.
#indexes ⇒ Object

Gets the naive index: a hash whose keys are the last (top-level) label of every rule, each pointing to an array of integers (the indexes of the matching rules in @rules).
#initialize {|self| ... } ⇒ List (constructor)

Initializes an empty List.
#reindex! ⇒ Object

Creates a naive index for @rules.
#select(name, ignore_private: false) ⇒ Array<PublicSuffix::Rule::*>

Selects all the rules matching the given domain.
#size ⇒ Integer

Gets the number of elements in the list.

Constructor Details

#initialize {|self| ... } ⇒ `List`

Initializes an empty PublicSuffix::List.

Yields:

(self) —

Yields self to the block, before the rule index is built.

Yield Parameters:

self (PublicSuffix::List) —

The newly created instance.

# File 'lib/public_suffix/list.rb', line 126

def initialize
  @rules = []
  yield(self) if block_given?
  reindex!
end
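
The constructor yields the empty list to an optional block and only then calls reindex!, so rules added inside the block with reindex: false are picked up by a single rebuild once the block returns (this is how .parse uses it). A minimal sketch of the same pattern, using a hypothetical BuilderList that stores plain strings and indexes them by last label:

```ruby
# BuilderList is a hypothetical stand-in for PublicSuffix::List.
class BuilderList
  attr_reader :rules

  def initialize
    @rules = []
    yield(self) if block_given?  # let the caller populate the list first
    reindex!                     # then build the index over everything added
  end

  # Mirrors add(rule, reindex: true); rules are plain strings here.
  def add(rule, reindex: true)
    @rules << rule
    reindex! if reindex
    self
  end

  def indexes
    @indexes.dup
  end

  # Same shape as List#reindex!: last label => positions in @rules.
  def reindex!
    @indexes = {}
    @rules.each_with_index do |rule, index|
      (@indexes[rule.split(".").last] ||= []) << index
    end
  end
end

list = BuilderList.new do |l|
  # Skip per-rule reindexing; the constructor reindexes once at the end.
  l.add("com", reindex: false)
  l.add("co.uk", reindex: false)
  l.add("uk", reindex: false)
end
list.rules          # => ["com", "co.uk", "uk"]
list.indexes["uk"]  # => [1, 2]
```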

Instance Attribute Details

#rules ⇒ `Array<PublicSuffix::Rule::*>` (readonly)

Gets the array of rules.

Returns:

(Array<PublicSuffix::Rule::*>)

# File 'lib/public_suffix/list.rb', line 118

def rules
  @rules
end

Class Method Details

.clear ⇒ `self`

Sets the default rule list to nil.

Returns:

(self)

# File 'lib/public_suffix/list.rb', line 68

def self.clear
  self.default = nil
  self
end

.default(**options) ⇒ `PublicSuffix::List`

Gets the default rule list.

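Initializes a new PublicSuffix::List by parsing the file at DEFAULT_LIST_PATH, if no default list has been set yet. Any options are forwarded to .parse (for example, private_domains: false); they only take effect when the list is actually parsed.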

Returns:

(PublicSuffix::List)

# File 'lib/public_suffix/list.rb', line 51

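def self.default(**options)
  # Double-splat so keywords such as private_domains: reach .parse;
  # passing the hash positionally raises ArgumentError on Ruby 3.
  @default ||= parse(File.read(DEFAULT_LIST_PATH), **options)
end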
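
Together, .default, .default= and .clear form a memoized class-level singleton. With ||=, the first call parses the bundled list and every later call returns the same object, so options passed after the first call are silently ignored until .clear resets the value to nil. A sketch of that behaviour using a hypothetical Registry class, with a small array standing in for a parsed list:

```ruby
# Registry is a hypothetical stand-in for the class-level state behind
# List.default, List.default= and List.clear.
class Registry
  @loads = 0

  class << self
    attr_reader :loads

    # Mirrors @default ||= parse(File.read(DEFAULT_LIST_PATH), **options);
    # the array literals stand in for the parsed list.
    def default(**options)
      @default ||= begin
        @loads += 1
        options.fetch(:private_domains, true) ? %w[com blogspot.com] : %w[com]
      end
    end

    def default=(value)
      @default = value
    end

    def clear
      self.default = nil
      self
    end
  end
end

Registry.default(private_domains: false)  # => ["com"]
Registry.default                          # => ["com"] (memoized)
Registry.clear
Registry.default                          # => ["com", "blogspot.com"]
Registry.loads                            # => 2
```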

.default=(value) ⇒ `PublicSuffix::List`

Sets the default rule list to value.

Parameters:

value (PublicSuffix::List) —

The new rule list.

Returns:

(PublicSuffix::List)

# File 'lib/public_suffix/list.rb', line 61

def self.default=(value)
  @default = value
end

.parse(input, private_domains: true) ⇒ `PublicSuffix::List`

Parses the given input, treating the content as a Public Suffix List.

See publicsuffix.org/format/ for more details about the input format.

Parameters:

input (#each_line)

The list to parse.
private_domains (Boolean) (defaults to: true)

Whether to include the private domains section. When false, parsing stops at the ===BEGIN PRIVATE DOMAINS=== marker.

Returns:

(PublicSuffix::List)

# File 'lib/public_suffix/list.rb', line 82

def self.parse(input, private_domains: true)
  comment_token = "//".freeze
  private_token = "===BEGIN PRIVATE DOMAINS===".freeze
  section = nil # 1 == ICANN, 2 == PRIVATE

  new do |list|
    input.each_line do |line|
      line.strip!
      case # rubocop:disable Style/EmptyCaseCondition

      # skip blank lines
      when line.empty?
        next

      # include private domains or stop scanner
      when line.include?(private_token)
        break if !private_domains
        section = 2

      # skip comments
      when line.start_with?(comment_token)
        next

      else
        list.add(Rule.factory(line, private: section == 2), reindex: false)

      end
    end
  end
end
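
The scanner above makes one pass over the input: it strips each line, skips blanks and // comments, and treats the ===BEGIN PRIVATE DOMAINS=== marker (itself a comment line) either as a section switch or, with private_domains: false, as a stop signal. The marker check has to come before the comment check; otherwise the marker line would simply be skipped as a comment. The sketch below reproduces that control flow with a hypothetical ParsedRule struct in place of Rule.factory and a plain array in place of the List.

```ruby
# ParsedRule is hypothetical; the gem builds Rule objects via Rule.factory.
ParsedRule = Struct.new(:value, :private)

COMMENT_TOKEN = "//"
PRIVATE_TOKEN = "===BEGIN PRIVATE DOMAINS==="

def parse_rules(input, private_domains: true)
  rules = []
  private_section = false
  input.each_line do |line|
    line = line.strip
    next if line.empty?                      # skip blank lines
    if line.include?(PRIVATE_TOKEN)          # checked before comments
      break unless private_domains           # stop before private rules
      private_section = true
      next
    end
    next if line.start_with?(COMMENT_TOKEN)  # skip comments
    rules << ParsedRule.new(line, private_section)
  end
  rules
end

data = <<~PSL
  // ===BEGIN ICANN DOMAINS===
  com
  *.ck
  !www.ck

  // ===BEGIN PRIVATE DOMAINS===
  blogspot.com
PSL

parse_rules(data).map(&:value)
# => ["com", "*.ck", "!www.ck", "blogspot.com"]
parse_rules(data, private_domains: false).map(&:value)
# => ["com", "*.ck", "!www.ck"]
parse_rules(data).last.private
# => true
```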

Instance Method Details

#==(other) ⇒ `Boolean` Also known as: eql?

Checks whether two lists are equal.

Two lists are equal if other is an instance of PublicSuffix::List and both contain equal PublicSuffix::Rule::* objects, in the same order. A list is always equal to itself.

Parameters:

other (PublicSuffix::List) —

The List to compare.

Returns:

(Boolean)

# File 'lib/public_suffix/list.rb', line 166

def ==(other)
  return false unless other.is_a?(List)
  equal?(other) || rules == other.rules
end
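
List#== compares the rule arrays with Array#==, so two lists are equal only when they hold equal rules in the same order; the equal? check is just an identity shortcut. A sketch with a hypothetical EqList that copies that logic onto plain strings. One caveat: #eql? is aliased to #==, but the method summary lists no #hash override, so two equal lists should not be expected to behave as the same Hash key.

```ruby
# EqList is a hypothetical stand-in that copies the shown List#== logic.
class EqList
  attr_reader :rules

  def initialize(*rules)
    @rules = rules
  end

  def ==(other)
    return false unless other.is_a?(EqList)  # only lists can be equal
    equal?(other) || rules == other.rules    # identity, then ordered compare
  end
  alias_method :eql?, :==
end

a = EqList.new("it", "com")
a == EqList.new("it", "com")  # => true
a == EqList.new("com", "it")  # => false, order matters
a == ["it", "com"]            # => false, not a list
```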

#add(rule, reindex: true) ⇒ `self` Also known as: <<

Adds the given object to the list and optionally refreshes the rule index.

Parameters:

rule (PublicSuffix::Rule::*) —

The rule to add to the list.
reindex (Boolean) (defaults to: true) —

Set to true to recreate the rule index after the rule has been added to the list.

Returns:

(self)
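
Because reindex! walks every rule, adding n rules one at a time with the default reindex: true rebuilds the index n times, roughly O(n²) work; this is why .parse passes reindex: false and leaves a single rebuild to the constructor. The source of #add is not reproduced here, so the sketch assumes, as the parameter description suggests, that reindex: true triggers a full rebuild. CountingList is hypothetical and only counts rebuilds.

```ruby
# CountingList is hypothetical; it only tracks how often the index would be rebuilt.
class CountingList
  attr_reader :rules, :rebuilds

  def initialize
    @rules = []
    @rebuilds = 0
  end

  # Assumed contract of List#add: append, then optionally rebuild the index.
  def add(rule, reindex: true)
    @rules << rule
    reindex! if reindex
    self
  end
  alias_method :<<, :add

  # Stands in for a full pass over @rules.
  def reindex!
    @rebuilds += 1
  end
end

one_by_one = CountingList.new
%w[it com uk].each { |rule| one_by_one << rule }
one_by_one.rebuilds  # => 3, one full rebuild per rule

bulk = CountingList.new
%w[it com uk].each { |rule| bulk.add(rule, reindex: false) }
bulk.reindex!
bulk.rebuilds        # => 1
```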

#clear ⇒ `self`

Removes all elements.

Returns:

(self)

# File 'lib/public_suffix/list.rb', line 214

def clear
  @rules.clear
  reindex!
  self
end

#default_rule ⇒ `PublicSuffix::Rule::*`

Gets the default rule.

Returns:

(PublicSuffix::Rule::*)

#each(*args, &block) ⇒ `Object`

Iterates over each rule in the list.

# File 'lib/public_suffix/list.rb', line 173

def each(*args, &block)
  @rules.each(*args, &block)
end

#empty? ⇒ `Boolean`

Checks whether the list is empty.

Returns:

(Boolean)

# File 'lib/public_suffix/list.rb', line 207

def empty?
  @rules.empty?
end

#find(name, default: default_rule, **options) ⇒ `PublicSuffix::Rule::*`

Finds and returns the most appropriate rule for the domain name.

From the Public Suffix List documentation:

If a hostname matches more than one rule in the file, the longest matching rule (the one with the most levels) will be used.
An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule. An exception rule takes priority over any other matching rule.

## Algorithm description

1. Match the domain against all rules and take note of the matching ones.
2. If no rules match, the prevailing rule is “*”.
3. If more than one rule matches, the prevailing rule is the one that is an exception rule.
4. If there is no matching exception rule, the prevailing rule is the one with the most labels.
5. If the prevailing rule is an exception rule, modify it by removing the leftmost label.
6. The public suffix is the set of labels from the domain that directly match the labels of the prevailing rule (joined by dots).
7. The registered domain is the public suffix plus one additional label.
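
The seven steps above can be traced with a small, self-contained model. SimpleRule and the helper methods are hypothetical: matching is a plain right-to-left label comparison where * matches any single label. Unlike #find below, which folds over the matches with inject, the sketch searches for an exception rule explicitly before falling back to the longest match.

```ruby
# SimpleRule is hypothetical; it is not one of the gem's Rule classes.
class SimpleRule
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def exception?
    text.start_with?("!")
  end

  def labels
    text.delete_prefix("!").split(".")
  end

  # Number of labels ("levels") in the rule.
  def length
    labels.size
  end

  # Compare right to left; "*" matches any single label.
  def match?(domain)
    domain_labels = domain.split(".")
    return false if domain_labels.size < labels.size
    labels.reverse.zip(domain_labels.reverse).all? { |r, d| r == "*" || r == d }
  end
end

# Steps 1-4: pick the prevailing rule.
def prevailing_rule(rules, domain)
  matching = rules.select { |rule| rule.match?(domain) }
  return SimpleRule.new("*") if matching.empty?
  matching.find(&:exception?) || matching.max_by(&:length)
end

# Steps 5-6: derive the public suffix.
def public_suffix(rules, domain)
  rule = prevailing_rule(rules, domain)
  size = rule.exception? ? rule.length - 1 : rule.length
  domain.split(".").last(size).join(".")
end

# Step 7: one more label than the public suffix, if the domain has one.
def registered_domain(rules, domain)
  labels = domain.split(".")
  size = public_suffix(rules, domain).split(".").size
  labels.size > size ? labels.last(size + 1).join(".") : nil
end

rules = %w[com *.ck !www.ck].map { |text| SimpleRule.new(text) }
public_suffix(rules, "example.com")        # => "com"
public_suffix(rules, "foo.bar.ck")         # => "bar.ck"
public_suffix(rules, "www.ck")             # => "ck"
public_suffix(rules, "example.org")        # => "org"
registered_domain(rules, "a.example.com")  # => "example.com"
registered_domain(rules, "bar.ck")         # => nil
```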

Parameters:

name (String, #to_s) —

The domain name.
default (PublicSuffix::Rule::*) (defaults to: default_rule) —

The default rule to return in case no rule matches.

Returns:

(PublicSuffix::Rule::*)

# File 'lib/public_suffix/list.rb', line 243

def find(name, default: default_rule, **options)
  rule = select(name, **options).inject do |l, r|
    return r if r.class == Rule::Exception
    l.length > r.length ? l : r
  end
  rule || default
end

#indexes ⇒ `Object`

Gets the naive index: a hash whose keys are the last (top-level) label of every rule, each pointing to an array of integers (the indexes of the matching rules in @rules).

# File 'lib/public_suffix/list.rb', line 151

def indexes
  @indexes.dup
end
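
#indexes returns @indexes.dup, and Hash#dup is a shallow copy: adding or removing keys on the returned hash leaves the list untouched, but the arrays of positions are still shared with the list. A sketch with a hypothetical IndexHolder that exposes its index the same way; a caller who needs to mutate the result safely can copy the values too, for example with transform_values(&:dup).

```ruby
# IndexHolder is hypothetical; its #indexes mirrors List#indexes.
class IndexHolder
  def initialize
    @indexes = { "com" => [0, 3] }
  end

  def indexes
    @indexes.dup  # new Hash, but the same Array objects as values
  end
end

holder = IndexHolder.new
copy = holder.indexes
copy["org"] = [7]  # new key: invisible to holder
copy["com"] << 99  # shared Array: visible to holder

holder.indexes.key?("org")  # => false
holder.indexes["com"]       # => [0, 3, 99]

# Copying the values as well avoids the shared-Array surprise.
safe = holder.indexes.transform_values(&:dup)
safe["com"] << 100
holder.indexes["com"]       # => [0, 3, 99]
```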

#reindex! ⇒ `Object`

Creates a naive index for @rules: a hash that records where the elements of @rules are, keyed by the last (top-level) label of each rule.

For instance, if @rules[4] and @rules[5] are the only elements of the list whose last label is "us", then @indexes["us"] is [4, 5]. That way, #select can avoid matching every single rule against the candidate domain.

# File 'lib/public_suffix/list.rb', line 140

def reindex!
  @indexes = {}
  @rules.each_with_index do |rule, index|
    tld = Domain.name_to_labels(rule.value).last
    @indexes[tld] ||= []
    @indexes[tld] << index
  end
end
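
The index maps each rule's last label to the positions of the rules that end in it. A standalone sketch over plain rule strings; it assumes Domain.name_to_labels splits a name on dots, so .last is the top-level label. Wildcard and exception rules land under the same key as the label they end with.

```ruby
# Builds { last_label => [positions in rule_values] }, like reindex!.
# Assumption: splitting on "." matches Domain.name_to_labels.
def build_index(rule_values)
  index = {}
  rule_values.each_with_index do |value, position|
    tld = value.split(".").last
    index[tld] ||= []
    index[tld] << position
  end
  index
end

values = ["com", "co.uk", "it", "uk", "ac.uk", "*.ck", "!www.ck"]
index = build_index(values)
index["uk"]   # => [1, 3, 4]
index["ck"]   # => [5, 6]
index["org"]  # => nil, so select falls back to an empty candidate list

# Only these rules would be scanned for a domain ending in ".uk".
values.values_at(*index["uk"])  # => ["co.uk", "uk", "ac.uk"]
```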

#select(name, ignore_private: false) ⇒ `Array<PublicSuffix::Rule::*>`

Selects all the rules matching given domain.

Internally, the lookup relies heavily on `@indexes`. The input is split into labels, and only the rules that end with the input's last label are retrieved from the index. A sequential scan is then performed over those candidates. In most cases, where the number of rules sharing the same last label is limited, this algorithm is efficient enough.

If `ignore_private` is set to true, the algorithm skips the rules that are flagged as private domains. Note that those rules are still part of the loop. If you frequently need to access lists while ignoring the private domains, you should instead create a list that doesn't include them, by passing the `private_domains: false` option to parse.

Parameters:

name (String, #to_s) —

The domain name.
ignore_private (Boolean) (defaults to: false)

Set to true to skip rules flagged as private domains.

Returns:

(Array<PublicSuffix::Rule::*>)

# File 'lib/public_suffix/list.rb', line 266

def select(name, ignore_private: false)
  name = name.to_s
  indices = (@indexes[Domain.name_to_labels(name).last] || [])

  finder = @rules.values_at(*indices).lazy
  finder = finder.select { |rule| rule.match?(name) }
  finder = finder.select { |rule| !rule.private } if ignore_private
  finder.to_a
end
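
Put together, select narrows the candidates with a single hash lookup on the domain's last label, then filters them lazily by match? and, optionally, by the private flag. A self-contained sketch of that pipeline; LookupRule and its suffix-based match? are hypothetical simplifications of the gem's Rule classes, with no wildcard or exception handling.

```ruby
# LookupRule is hypothetical; match? is a plain whole-label suffix check.
LookupRule = Struct.new(:value, :private) do
  def match?(name)
    name == value || name.end_with?(".#{value}")
  end
end

def select_rules(rules, index, name, ignore_private: false)
  indices = index[name.split(".").last] || []          # one hash lookup
  finder = rules.values_at(*indices).lazy              # only candidate rules
  finder = finder.select { |rule| rule.match?(name) }  # sequential scan
  finder = finder.reject(&:private) if ignore_private  # drop private rules
  finder.to_a
end

rules = [
  LookupRule.new("com", false),
  LookupRule.new("uk", false),
  LookupRule.new("co.uk", false),
  LookupRule.new("blogspot.com", true)
]
index = {}
rules.each_with_index { |rule, i| (index[rule.value.split(".").last] ||= []) << i }

select_rules(rules, index, "example.blogspot.com").map(&:value)
# => ["com", "blogspot.com"]
select_rules(rules, index, "example.blogspot.com", ignore_private: true).map(&:value)
# => ["com"]
select_rules(rules, index, "example.org")
# => []
```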

#size ⇒ `Integer`

Gets the number of elements in the list.

Returns:

(Integer)

# File 'lib/public_suffix/list.rb', line 200

def size
  @rules.size
end
