Class: PublicSuffix::List
- Inherits:
- Object
- Includes:
- Enumerable
- Defined in:
- lib/public_suffix/list.rb
Overview
A List is a collection of one or more Rule objects.
Given a List, you can add or remove a Rule, iterate over all items in the list, or search for the first rule that matches a specific domain name.
# Create a new list
list = PublicSuffix::List.new
# Push two rules to the list
list << PublicSuffix::Rule.factory("it")
list << PublicSuffix::Rule.factory("com")
# Get the size of the list
list.size
# => 2
# Search for the rule matching given domain
list.find("example.com")
# => #<PublicSuffix::Rule::Normal>
list.find("example.org")
# => nil
You can create as many List objects as you want. The List.default rule list is used to tokenize and validate a domain.
List includes the Enumerable module.
Constant Summary
- DEFAULT_LIST_PATH =
File.join(File.dirname(__FILE__), "..", "..", "data", "list.txt")
Instance Attribute Summary
-
#rules ⇒ Array<PublicSuffix::Rule::*>
readonly
Gets the array of rules.
Class Method Summary
-
.clear ⇒ self
Sets the default rule list to nil.
-
.default(**options) ⇒ PublicSuffix::List
Gets the default rule list.
-
.default=(value) ⇒ PublicSuffix::List
Sets the default rule list to value.
-
.parse(input, private_domains: true) ⇒ PublicSuffix::List
Parses the given input, treating the content as a Public Suffix List.
Instance Method Summary
-
#==(other) ⇒ Boolean
(also: #eql?)
Checks whether two lists are equal.
-
#add(rule, reindex: true) ⇒ self
(also: #<<)
Adds the given object to the list and optionally refreshes the rule index.
-
#clear ⇒ self
Removes all elements.
-
#default_rule ⇒ PublicSuffix::Rule::*
Gets the default rule.
-
#each(*args, &block) ⇒ Object
Iterates each rule in the list.
-
#empty? ⇒ Boolean
Checks whether the list is empty.
-
#find(name, default: default_rule, **options) ⇒ PublicSuffix::Rule::*
Finds and returns the most appropriate rule for the domain name.
-
#indexes ⇒ Object
Gets the naive index: a hash whose keys are the last label of each rule and whose values are arrays of integers (the positions of those rules in @rules).
-
#initialize {|self| ... } ⇒ List
constructor
Initializes an empty List.
-
#reindex! ⇒ Object
Creates a naive index for @rules.
-
#select(name, ignore_private: false) ⇒ Array<PublicSuffix::Rule::*>
Selects all the rules matching given domain.
-
#size ⇒ Integer
Gets the number of elements in the list.
Constructor Details
#initialize {|self| ... } ⇒ List
Initializes an empty PublicSuffix::List.
# File 'lib/public_suffix/list.rb', line 126
def initialize
  @rules = []
  yield(self) if block_given?
  reindex!
end
Instance Attribute Details
#rules ⇒ Array<PublicSuffix::Rule::*> (readonly)
Gets the array of rules.
# File 'lib/public_suffix/list.rb', line 118
def rules
  @rules
end
Class Method Details
.clear ⇒ self
Sets the default rule list to nil.
# File 'lib/public_suffix/list.rb', line 68
def self.clear
  self.default = nil
  self
end
.default(**options) ⇒ PublicSuffix::List
Gets the default rule list.
Initializes a new PublicSuffix::List by parsing the file at DEFAULT_LIST_PATH, if the default list has not been set yet.
# File 'lib/public_suffix/list.rb', line 51
def self.default(**options)
  @default ||= parse(File.read(DEFAULT_LIST_PATH), **options)
end
.default=(value) ⇒ PublicSuffix::List
Sets the default rule list to value.
# File 'lib/public_suffix/list.rb', line 61
def self.default=(value)
  @default = value
end
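The three class methods above form a lazily built, replaceable default. The following is a minimal sketch of that pattern only; the `Catalog` class, its `loads` counter, and `load_default` are hypothetical names invented for illustration and are not part of the library.

```ruby
# Hypothetical Catalog class illustrating a lazily built, replaceable default.
class Catalog
  @loads = 0

  class << self
    attr_reader :loads

    # Builds the default on first access and reuses it afterwards.
    def default
      @default ||= load_default
    end

    # Replaces the default with a caller-supplied value.
    def default=(value)
      @default = value
    end

    # Drops the cached default so the next call to .default rebuilds it.
    def clear
      self.default = nil
      self
    end

    private

    # Stand-in for reading and parsing the bundled data file.
    def load_default
      @loads += 1
      %w[com it org]
    end
  end
end

first = Catalog.default
second = Catalog.default
puts first.equal?(second) # prints true: the same object is reused
puts Catalog.loads        # prints 1

Catalog.clear
Catalog.default
puts Catalog.loads        # prints 2: clearing forces a rebuild
```

One consequence of the `||=` memoization in the listing for .default is that options only matter on the call that actually builds the list; later calls return the cached object unchanged.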
.parse(input, private_domains: true) ⇒ PublicSuffix::List
Parses the given input, treating the content as a Public Suffix List.
See publicsuffix.org/format/ for more details about the input format.
# File 'lib/public_suffix/list.rb', line 82
def self.parse(input, private_domains: true)
  comment_token = "//".freeze
  private_token = "===BEGIN PRIVATE DOMAINS===".freeze
  section = nil # 1 == ICANN, 2 == PRIVATE

  new do |list|
    input.each_line do |line|
      line.strip!
      case # rubocop:disable Style/EmptyCaseCondition

      # skip blank lines
      when line.empty?
        next

      # include private domains or stop scanner
      when line.include?(private_token)
        break if !private_domains
        section = 2

      # skip comments
      when line.start_with?(comment_token)
        next

      else
        list.add(Rule.factory(line, private: section == 2), reindex: false)

      end
    end
  end
end
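To make the scan easier to follow, here is a simplified sketch of the same line-by-line logic. It is an illustration under stated assumptions: `scan_suffix_lines`, `PRIVATE_MARKER`, and `SAMPLE_LIST` are hypothetical names, and the helper returns plain hashes rather than PublicSuffix::Rule objects.

```ruby
# Hypothetical helper mirroring the line-by-line scan; it returns plain
# hashes instead of PublicSuffix::Rule objects.
PRIVATE_MARKER = "===BEGIN PRIVATE DOMAINS==="

def scan_suffix_lines(input, private_domains: true)
  rules = []
  in_private_section = false

  input.each_line do |raw|
    line = raw.strip
    next if line.empty?                  # skip blank lines

    if line.include?(PRIVATE_MARKER)     # section marker line
      break unless private_domains       # stop before the private section
      in_private_section = true
      next
    end

    next if line.start_with?("//")       # skip comments

    rules << { value: line, private: in_private_section }
  end

  rules
end

SAMPLE_LIST = <<~LIST
  // ===BEGIN ICANN DOMAINS===
  com
  *.uk

  // ===BEGIN PRIVATE DOMAINS===
  blogspot.com
LIST

p scan_suffix_lines(SAMPLE_LIST).map { |r| r[:value] }
# prints ["com", "*.uk", "blogspot.com"]
p scan_suffix_lines(SAMPLE_LIST, private_domains: false).map { |r| r[:value] }
# prints ["com", "*.uk"]
```

With `private_domains: false` the scan stops at the marker, so rules after it are never created at all.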
Instance Method Details
#==(other) ⇒ Boolean Also known as: eql?
Checks whether two lists are equal.
List one is equal to two if two is an instance of PublicSuffix::List and each PublicSuffix::Rule::* in list one is available in list two, in the same order.
# File 'lib/public_suffix/list.rb', line 166
def ==(other)
  return false unless other.is_a?(List)
  equal?(other) || rules == other.rules
end
#add(rule, reindex: true) ⇒ self Also known as: <<
Adds the given object to the list and optionally refreshes the rule index.
# File 'lib/public_suffix/list.rb', line 190
def add(rule, reindex: true)
  @rules << rule
  reindex! if reindex
  self
end
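The `reindex:` flag matters when many rules are added in a row, since each rebuild walks the whole rule array. The sketch below illustrates the difference with a hypothetical `CountingList` class (not the library's class) that stores rule strings and counts how often its index is rebuilt.

```ruby
# Hypothetical CountingList: counts index rebuilds to show the effect of
# the reindex flag. It is not the library's class.
class CountingList
  attr_reader :rebuilds

  def initialize
    @rules = []
    @rebuilds = 0
  end

  # Appends a rule; rebuilds the index unless the caller opts out.
  def add(rule, reindex: true)
    @rules << rule
    reindex! if reindex
    self
  end
  alias << add

  # Groups rule positions by the rule's last label.
  def reindex!
    @rebuilds += 1
    @index = @rules.each_index.group_by { |i| @rules[i].split(".").last }
  end
end

suffixes = %w[com co.uk org.uk it]

eager = CountingList.new
suffixes.each { |s| eager << s }
puts eager.rebuilds # prints 4: one rebuild per insertion

batched = CountingList.new
suffixes.each { |s| batched.add(s, reindex: false) }
batched.reindex!
puts batched.rebuilds # prints 1: a single rebuild at the end
```

This is the same approach the .parse listing takes: it adds every rule with `reindex: false` and relies on the constructor to build the index once.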
#clear ⇒ self
Removes all elements.
# File 'lib/public_suffix/list.rb', line 214
def clear
  @rules.clear
  reindex!
  self
end
#default_rule ⇒ PublicSuffix::Rule::*
Gets the default rule.
# File 'lib/public_suffix/list.rb', line 281
def default_rule
  PublicSuffix::Rule.default
end
#each(*args, &block) ⇒ Object
Iterates each rule in the list.
# File 'lib/public_suffix/list.rb', line 173
def each(*args, &block)
  @rules.each(*args, &block)
end
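Because #each delegates to the underlying array and the class includes Enumerable, the usual Enumerable methods become available. A minimal sketch of that mechanism, using a hypothetical `Shelf` class rather than the library's List:

```ruby
# Hypothetical Shelf class: defining #each and including Enumerable is
# enough to gain map, sort, include?, and similar methods.
class Shelf
  include Enumerable

  def initialize(items)
    @items = items
  end

  # Delegates iteration to the underlying array.
  def each(*args, &block)
    @items.each(*args, &block)
  end
end

shelf = Shelf.new(%w[it com org])
p shelf.map(&:upcase)   # prints ["IT", "COM", "ORG"]
p shelf.sort            # prints ["com", "it", "org"]
p shelf.include?("com") # prints true
```

Note that List defines its own #find and #select, so those take precedence over the Enumerable versions of the same names.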
#empty? ⇒ Boolean
Checks whether the list is empty.
# File 'lib/public_suffix/list.rb', line 207
def empty?
  @rules.empty?
end
#find(name, default: default_rule, **options) ⇒ PublicSuffix::Rule::*
Finds and returns the most appropriate rule for the domain name.
From the Public Suffix List documentation:
- If a hostname matches more than one rule in the file, the longest matching rule (the one with the most levels) will be used.
- An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule. An exception rule takes priority over any other matching rule.
Algorithm description:
1. Match domain against all rules and take note of the matching ones.
2. If no rules match, the prevailing rule is “*”.
3. If more than one rule matches, the prevailing rule is the one which is an exception rule.
4. If there is no matching exception rule, the prevailing rule is the one with the most labels.
5. If the prevailing rule is an exception rule, modify it by removing the leftmost label.
6. The public suffix is the set of labels from the domain which directly match the labels of the prevailing rule (joined by dots).
7. The registered domain is the public suffix plus one additional label.
# File 'lib/public_suffix/list.rb', line 243
def find(name, default: default_rule, **options)
  rule = select(name, **options).inject do |l, r|
    return r if r.class == Rule::Exception
    l.length > r.length ? l : r
  end
  rule || default
end
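The algorithm description above can be sketched on plain rule strings. This is an illustration under assumptions, not the library's implementation: `rule_matches?`, `prevailing_rule`, `public_suffix_for`, and `SAMPLE_RULES` are hypothetical names, and matching is done by comparing labels from right to left.

```ruby
# Hypothetical helpers working on rule strings; the library uses Rule objects.

# True when every label of the rule matches the corresponding rightmost
# label of the domain ("*" matches any single label).
def rule_matches?(rule, domain)
  rule_labels = rule.delete_prefix("!").split(".").reverse
  domain_labels = domain.split(".").reverse
  return false if rule_labels.size > domain_labels.size

  rule_labels.each_with_index.all? do |label, i|
    label == "*" || label == domain_labels[i]
  end
end

# Picks the prevailing rule: an exception wins outright, otherwise the rule
# with the most labels; "*" is used when nothing matches.
def prevailing_rule(rules, domain)
  matches = rules.select { |rule| rule_matches?(rule, domain) }
  exception = matches.find { |rule| rule.start_with?("!") }
  return exception if exception

  matches.max_by { |rule| rule.count(".") } || "*"
end

# Derives the public suffix from the prevailing rule.
def public_suffix_for(rules, domain)
  rule = prevailing_rule(rules, domain)
  labels = rule.delete_prefix("!").split(".")
  labels.shift if rule.start_with?("!") # exception: drop the leftmost label
  domain.split(".").last(labels.size).join(".")
end

SAMPLE_RULES = ["com", "*.ck", "!www.ck", "uk", "co.uk"].freeze

puts public_suffix_for(SAMPLE_RULES, "example.co.uk") # prints co.uk
puts public_suffix_for(SAMPLE_RULES, "foo.bar.ck")    # prints bar.ck
puts public_suffix_for(SAMPLE_RULES, "www.ck")        # prints ck
puts public_suffix_for(SAMPLE_RULES, "example.org")   # prints org
```

The `www.ck` case shows steps 3 and 5 together: the exception rule beats the wildcard, and removing its leftmost label leaves `ck` as the public suffix.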
#indexes ⇒ Object
Gets the naive index: a hash whose keys are the last label of each rule and whose values are arrays of integers (the positions of those rules in @rules).
# File 'lib/public_suffix/list.rb', line 151
def indexes
  @indexes.dup
end
#reindex! ⇒ Object
Creates a naive index for @rules: a hash that records where the elements of @rules are located, keyed by the last label of each rule's value.
For instance, if @rules[4] and @rules[5] are the only elements of the list whose last label is "us", then @indexes["us"] #=> [4, 5]. That way, select can avoid matching every single rule against the candidate domain.
# File 'lib/public_suffix/list.rb', line 140
def reindex!
  @indexes = {}
  @rules.each_with_index do |rule, index|
    tld = Domain.name_to_labels(rule.value).last
    @indexes[tld] ||= []
    @indexes[tld] << index
  end
end
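As a small worked example of the index shape, the sketch below builds the same kind of hash from plain rule strings. `build_index` is a hypothetical helper, and splitting on "." stands in for Domain.name_to_labels.

```ruby
# Hypothetical helper: builds the same kind of index from rule strings.
def build_index(rule_values)
  index = {}
  rule_values.each_with_index do |value, position|
    tld = value.split(".").last # rightmost label of the rule
    (index[tld] ||= []) << position
  end
  index
end

index = build_index(["com", "uk", "co.uk", "*.ck", "!www.ck"])
p index["uk"]  # prints [1, 2]
p index["ck"]  # prints [3, 4]
p index["org"] # prints nil
```

Positions are appended in iteration order, so each array is ascending.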
#select(name, ignore_private: false) ⇒ Array<PublicSuffix::Rule::*>
Selects all the rules matching given domain.
Internally, the lookup relies heavily on `@indexes`. The input is split into labels, and we retrieve from the index only the rules that end with the last label of the input. After that, a sequential scan is performed. In most cases, where the number of rules for the same label is limited, this algorithm is efficient enough.
If `ignore_private` is set to true, the algorithm will skip the rules that are flagged as private domains. Note that the rules will still be part of the loop. If you frequently need to access lists ignoring the private domains, you should create a list that doesn’t include these domains by setting the `private_domains: false` option when calling parse.
# File 'lib/public_suffix/list.rb', line 266
def select(name, ignore_private: false)
  name = name.to_s
  indices = (@indexes[Domain.name_to_labels(name).last] || [])

  finder = @rules.values_at(*indices).lazy
  finder = finder.select { |rule| rule.match?(name) }
  finder = finder.select { |rule| !rule.private } if ignore_private
  finder.to_a
end
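The index-then-scan lookup can be sketched in isolation. Everything named here is hypothetical (`SketchRule`, `SKETCH_RULES`, `SKETCH_INDEX`, `select_rules`), and the matching is a simple suffix comparison rather than the library's Rule#match?.

```ruby
# Hypothetical SketchRule and helpers; the matching here is a simple suffix
# comparison, not the library's Rule#match?.
SketchRule = Struct.new(:value, :private_domain) do
  def match?(name)
    name == value || name.end_with?(".#{value}")
  end
end

SKETCH_RULES = [
  SketchRule.new("com", false),
  SketchRule.new("uk", false),
  SketchRule.new("co.uk", false),
  SketchRule.new("blogspot.com", true)
].freeze

# Index keyed by the rightmost label, pointing at positions in SKETCH_RULES.
SKETCH_INDEX = SKETCH_RULES.each_index.group_by do |i|
  SKETCH_RULES[i].value.split(".").last
end

def select_rules(name, ignore_private: false)
  name = name.to_s
  positions = SKETCH_INDEX[name.split(".").last] || []

  # Only the candidates sharing the last label are scanned.
  finder = SKETCH_RULES.values_at(*positions).lazy
  finder = finder.select { |rule| rule.match?(name) }
  finder = finder.reject(&:private_domain) if ignore_private
  finder.to_a
end

p select_rules("foo.blogspot.com").map(&:value)
# prints ["com", "blogspot.com"]
p select_rules("foo.blogspot.com", ignore_private: true).map(&:value)
# prints ["com"]
p select_rules("example.org").map(&:value)
# prints []
```

The private rule is still visited when `ignore_private: true` is passed; it is only filtered out of the result, which matches the note above about the rules remaining part of the loop.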
#size ⇒ Integer
Gets the number of elements in the list.
# File 'lib/public_suffix/list.rb', line 200
def size
  @rules.size
end