Class: Linguist::Heuristics
- Inherits: Object
  - Object
  - Linguist::Heuristics
- Defined in: lib/linguist/heuristics.rb
Overview
A collection of simple heuristics that can be used to better analyze languages.
Constant Summary
- HEURISTICS_CONSIDER_BYTES = 50 * 1024
Class Method Summary
- .all ⇒ Object
  Public: Get all heuristic definitions.
- .call(blob, candidates) ⇒ Object
  Public: Use heuristics to detect language of the blob.
- .load ⇒ Object
  Internal: Load heuristics from ‘heuristics.yml’.
- .parse_rule(named_patterns, rule) ⇒ Object
- .to_regex(str) ⇒ Object
  Internal: Converts a string or array of strings to regexp.
Instance Method Summary
- #call(data) ⇒ Object
  Internal: Perform the heuristic.
- #extensions ⇒ Object
  Internal: Return the heuristic’s target extensions.
- #initialize(exts, rules) ⇒ Heuristics (constructor)
  Internal.
- #languages ⇒ Object
  Internal: Return the heuristic’s candidate languages.
- #matches?(filename, candidates) ⇒ Boolean
  Internal: Check if this heuristic matches the candidate filenames or languages.
Constructor Details
#initialize(exts, rules) ⇒ Heuristics
Internal.

    # File 'lib/linguist/heuristics.rb', line 95
    def initialize(exts, rules)
      @exts = exts
      @rules = rules
    end
Class Method Details
.all ⇒ Object
Public: Get all heuristic definitions.
Returns an Array of heuristic objects.

    # File 'lib/linguist/heuristics.rb', line 38
    def self.all
      self.load()
      @heuristics
    end
.call(blob, candidates) ⇒ Object
Public: Use heuristics to detect the language of the blob.

blob       - An object that quacks like a blob.
candidates - Array of Language objects.

Examples

  Heuristics.call(FileBlob.new("path/to/file"), [
    Language["Ruby"], Language["Python"]
  ])

Returns an Array of languages, or an empty Array if no heuristic matched or the result was inconclusive.

    # File 'lib/linguist/heuristics.rb', line 20
    def self.call(blob, candidates)
      return [] if blob.symlink?
      self.load()

      data = blob.data[0...HEURISTICS_CONSIDER_BYTES]
      @heuristics.each do |heuristic|
        if heuristic.matches?(blob.name, candidates)
          return Array(heuristic.call(data))
        end
      end

      [] # No heuristics matched
    end
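
The control flow of .call can be sketched in isolation. This is a minimal illustration, not Linguist code: FakeBlob, ExtensionHeuristic, CONSIDER_BYTES, and detect are hypothetical stand-ins, and plain strings take the place of Language objects. It shows three behaviours visible in the listing: symlinks short-circuit to an empty Array, only the leading slice of the data is inspected, and the first heuristic whose extensions match decides the result, even when that result is empty.

```ruby
# Hypothetical stand-ins for illustration; none of these are Linguist classes.
FakeBlob = Struct.new(:name, :data, :symlink) do
  def symlink?
    symlink
  end
end

class ExtensionHeuristic
  def initialize(exts, &detector)
    @exts = exts
    @detector = detector
  end

  # Suffix test on the downcased filename, as in #matches?.
  def matches?(filename, _candidates)
    @exts.any? { |ext| filename.downcase.end_with?(ext) }
  end

  def call(data)
    @detector.call(data)
  end
end

CONSIDER_BYTES = 50 * 1024

def detect(blob, candidates, heuristics)
  return [] if blob.symlink?

  data = blob.data[0...CONSIDER_BYTES]
  heuristics.each do |heuristic|
    # The first heuristic that applies wins; Array(nil) becomes [].
    return Array(heuristic.call(data)) if heuristic.matches?(blob.name, candidates)
  end

  [] # no heuristic applied
end

heuristics = [
  ExtensionHeuristic.new([".h"]) { |data| data.include?("#import") ? "Objective-C" : nil },
  ExtensionHeuristic.new([".pl"]) { |data| data.include?("use strict") ? "Perl" : nil }
]

objc = detect(FakeBlob.new("Foo.h", "#import <Foundation/Foundation.h>", false), [], heuristics)
none = detect(FakeBlob.new("foo.h", "int main(void);", false), [], heuristics)
link = detect(FakeBlob.new("foo.pl", "use strict;", true), [], heuristics)
```

Wrapping the result in Array() is what lets callers treat a single language, several languages, and no match uniformly.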
.load ⇒ Object
Internal: Load heuristics from ‘heuristics.yml’.

    # File 'lib/linguist/heuristics.rb', line 44
    def self.load()
      if @heuristics.any?
        return
      end

      data = YAML.load_file(File.expand_path("../heuristics.yml", __FILE__))
      named_patterns = data['named_patterns'].map { |k,v| [k, self.to_regex(v)] }.to_h

      data['disambiguations'].each do |disambiguation|
        exts = disambiguation['extensions']
        rules = disambiguation['rules']
        rules.map! do |rule|
          rule['pattern'] = self.parse_rule(named_patterns, rule)
          rule
        end
        @heuristics << new(exts, rules)
      end
    end
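
To make the expected shape of the YAML concrete, here is a hedged sketch. The document content below is invented for illustration (the real heuristics.yml is not shown on this page); only the keys named_patterns, disambiguations, extensions, rules, language, pattern, and named_pattern are taken from the listings. The sketch compiles each rule's matcher in place, roughly as .load does, but handles only the simple rule kinds.

```ruby
require "yaml"

# Invented sample; a quoted heredoc keeps the backslashes literal.
SAMPLE_HEURISTICS_YAML = <<~'YAML'
  named_patterns:
    c_include: '^\s*#\s*include <(cstdint|string|vector)>'
  disambiguations:
    - extensions: ['.h']
      rules:
        - language: Objective-C
          pattern: '^\s*@interface'
        - language: C++
          named_pattern: c_include
        - language: C
YAML

data = YAML.safe_load(SAMPLE_HEURISTICS_YAML)

# Compile the shared patterns once so rules can refer to them by name.
named_patterns = data["named_patterns"].map { |name, source| [name, Regexp.new(source)] }.to_h

rule_sets = data["disambiguations"].map do |disambiguation|
  rules = disambiguation["rules"].map do |rule|
    pattern =
      if rule["pattern"]
        Regexp.new(rule["pattern"])
      elsif rule["named_pattern"]
        named_patterns.fetch(rule["named_pattern"])
      else
        Regexp.new("") # a rule with no pattern matches anything
      end
    rule.merge("pattern" => pattern)
  end
  { "extensions" => disambiguation["extensions"], "rules" => rules }
end
```

A trailing rule with no pattern acts as a fallback for its extension group, which is why rule order matters.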
.parse_rule(named_patterns, rule) ⇒ Object

    # File 'lib/linguist/heuristics.rb', line 63
    def self.parse_rule(named_patterns, rule)
      if !rule['and'].nil?
        rules = rule['and'].map { |block| self.parse_rule(named_patterns, block) }
        return And.new(rules)
      elsif !rule['pattern'].nil?
        return self.to_regex(rule['pattern'])
      elsif !rule['negative_pattern'].nil?
        pat = self.to_regex(rule['negative_pattern'])
        return NegativePattern.new(pat)
      elsif !rule['named_pattern'].nil?
        return named_patterns[rule['named_pattern']]
      else
        return AlwaysMatch.new()
      end
    end
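
The listing refers to And, NegativePattern, and AlwaysMatch without showing them. The sketch below assumes only that each responds to match(data), which is how #call uses a rule's pattern. The classes inside the Sketch module are hypothetical stand-ins, not the library's definitions, and the sketch uses Regexp.new directly instead of to_regex.

```ruby
module Sketch
  # Matches when every sub-matcher matches.
  class And
    def initialize(patterns)
      @patterns = patterns
    end

    def match(input)
      @patterns.all? { |pattern| pattern.match(input) }
    end
  end

  # Matches when the wrapped pattern does not.
  class NegativePattern
    def initialize(pattern)
      @pattern = pattern
    end

    def match(input)
      !@pattern.match(input)
    end
  end

  # Fallback for rules that carry no pattern at all.
  class AlwaysMatch
    def match(_input)
      true
    end
  end

  # Same branch order as the listing: and, pattern, negative_pattern,
  # named_pattern, then the always-matching fallback.
  def self.parse_rule(named_patterns, rule)
    if rule["and"]
      And.new(rule["and"].map { |block| parse_rule(named_patterns, block) })
    elsif rule["pattern"]
      Regexp.new(rule["pattern"])
    elsif rule["negative_pattern"]
      NegativePattern.new(Regexp.new(rule["negative_pattern"]))
    elsif rule["named_pattern"]
      named_patterns[rule["named_pattern"]]
    else
      AlwaysMatch.new
    end
  end
end

named = { "shebang" => /\A#!/ }
rule = { "and" => [{ "named_pattern" => "shebang" }, { "negative_pattern" => "python" }] }
matcher = Sketch.parse_rule(named, rule)

shell_hit = matcher.match("#!/bin/sh\necho hi")
python_hit = matcher.match("#!/usr/bin/python")
fallback_hit = Sketch.parse_rule(named, { "language" => "C" }).match("x")
```

Because every matcher exposes the same match method as Regexp, the caller never needs to know which kind of rule it is evaluating.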
.to_regex(str) ⇒ Object
Internal: Converts a string or array of strings to a regexp.
str - A String or an Array of Strings. If it is an Array, each element is
      compiled and the results are combined with Regexp.union.

    # File 'lib/linguist/heuristics.rb', line 83
    def self.to_regex(str)
      if str.kind_of?(Array)
        Regexp.union(str.map { |s| Regexp.new(s) })
      else
        Regexp.new(str)
      end
    end
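
A short usage sketch of the same conversion, written as a top-level method so it runs on its own. The pattern strings are invented examples.

```ruby
# Standalone copy of the conversion for experimentation.
def to_regex(str)
  if str.is_a?(Array)
    # Compile each alternative, then combine them into one alternation.
    Regexp.union(str.map { |s| Regexp.new(s) })
  else
    Regexp.new(str)
  end
end

single = to_regex('^\s*module\b')
either = to_regex(['^#include', '^\s*@interface'])

single_hit = single.match?("module Foo")
first_alt = either.match?("#include <stdio.h>")
second_alt = either.match?("  @interface Foo")
no_alt = either.match?("int x;")
```

Compiling each element before the union means the strings are treated as regular expression source; passing raw strings to Regexp.union would escape them and match them literally instead.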
Instance Method Details
#call(data) ⇒ Object
Internal: Perform the heuristic.

    # File 'lib/linguist/heuristics.rb', line 121
    def call(data)
      matched = @rules.find do |rule|
        rule['pattern'].match(data)
      end
      if !matched.nil?
        languages = matched['language']
        if languages.is_a?(Array)
          languages.map{ |l| Language[l] }
        else
          Language[languages]
        end
      end
    end
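
The first-match semantics can be illustrated with a small sketch. LANGUAGE_REGISTRY is a hypothetical Hash standing in for the Language[...] lookup, and run_rules and the rule data are invented for the example. The point is that rules are tried in order, the first matching rule decides, and the return value is a single entry, an Array of entries, or nil.

```ruby
# Hypothetical stand-in for the Language[...] lookup.
LANGUAGE_REGISTRY = { "Objective-C" => :objc, "C++" => :cpp, "C" => :c }.freeze

def run_rules(rules, data)
  # find stops at the first rule whose pattern matches.
  matched = rules.find { |rule| rule["pattern"].match(data) }
  return nil if matched.nil?

  names = matched["language"]
  if names.is_a?(Array)
    names.map { |name| LANGUAGE_REGISTRY.fetch(name) }
  else
    LANGUAGE_REGISTRY.fetch(names)
  end
end

ordered_rules = [
  { "language" => "Objective-C", "pattern" => /^\s*@interface/ },
  { "language" => ["C++", "C"], "pattern" => /^\s*#\s*include/ }
]

one = run_rules(ordered_rules, "@interface Foo\n#include <x.h>")
many = run_rules(ordered_rules, "#include <stdio.h>")
nothing = run_rules(ordered_rules, "int x;")
```

In the first call both rules would match, but only the earlier one is consulted.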
#extensions ⇒ Object
Internal: Return the heuristic’s target extensions.

    # File 'lib/linguist/heuristics.rb', line 101
    def extensions
      @exts
    end
#languages ⇒ Object
Internal: Return the heuristic’s candidate languages.

    # File 'lib/linguist/heuristics.rb', line 106
    def languages
      @rules.map do |rule|
        [rule['language']].flatten(2).map { |name| Language[name] }
      end.flatten.uniq
    end
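
The wrap, flatten, and uniq steps handle rules whose language entry is either a single name or an Array of names. A minimal sketch with invented rule data, using plain strings instead of Language objects:

```ruby
sample_rules = [
  { "language" => "Objective-C" },
  { "language" => ["C++", "C"] },
  { "language" => "C" }
]

# Wrapping in an Array and flattening normalizes both shapes to a list,
# then the per-rule lists are merged and de-duplicated in order.
candidate_names = sample_rules.map do |rule|
  [rule["language"]].flatten(2)
end.flatten.uniq
# candidate_names == ["Objective-C", "C++", "C"]
```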
#matches?(filename, candidates) ⇒ Boolean
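Internal: Check whether this heuristic applies to the given filename. In the code as listed, only the filename's suffix is compared against the heuristic's extensions; candidates is reduced to language names but does not affect the result.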

    # File 'lib/linguist/heuristics.rb', line 114
    def matches?(filename, candidates)
      filename = filename.downcase
      candidates = candidates.compact.map(&:name)
      @exts.any? { |ext| filename.end_with?(ext) }
    end
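
A small sketch of the extension test on its own, with a hypothetical helper name. It shows that the filename is downcased before comparison while the extension list is used as given, so this check assumes extensions are stored in lowercase.

```ruby
# Hypothetical helper mirroring the suffix test in #matches?.
def extension_match?(exts, filename)
  filename = filename.downcase
  exts.any? { |ext| filename.end_with?(ext) }
end

upper_name = extension_match?([".h"], "Foo.H")    # filename is downcased first
other_ext = extension_match?([".h"], "foo.hpp")   # ".hpp" does not end with ".h"
upper_ext = extension_match?([".H"], "foo.h")     # the extension list is not downcased
```

Because end_with? is a plain suffix test, multi-part extensions need no special handling.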