Class: Linguist::Heuristics
- Inherits: Object
- Defined in: lib/linguist/heuristics.rb
Overview
A collection of simple heuristics that can be used to better analyze languages.
Constant Summary
- HEURISTICS_CONSIDER_BYTES = 50 * 1024
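The constant caps how much of a blob's content the heuristics inspect. A minimal sketch of that slicing follows, assuming a hypothetical in-memory string in place of real blob data:

```ruby
# Same value as the constant above: 50 KiB.
HEURISTICS_CONSIDER_BYTES = 50 * 1024

# Hypothetical stand-in for blob.data: 60 KiB of filler text.
data = "a" * (60 * 1024)

# An exclusive range slice keeps at most the first 51,200 characters.
head = data[0...HEURISTICS_CONSIDER_BYTES]
puts head.length

# Shorter input is returned unchanged.
short = "abc"[0...HEURISTICS_CONSIDER_BYTES]
puts short
```

Note that a range slice on a Ruby String counts characters rather than bytes, so depending on the string's encoding the slice of multibyte text may be larger than 50 KiB.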
Class Method Summary
- .call(blob, candidates) ⇒ Object
  Public: Use heuristics to detect language of the blob.
- .load ⇒ Object
  Internal: Load heuristics from ‘heuristics.yml’.
- .parse_rule(named_patterns, rule) ⇒ Object
- .to_regex(str) ⇒ Object
  Internal: Converts a string or array of strings to regexp.
Instance Method Summary
- #call(data) ⇒ Object
  Internal: Perform the heuristic.
- #initialize(exts_and_langs, rules) ⇒ Heuristics (constructor)
  Internal.
- #matches?(filename, candidates) ⇒ Boolean
  Internal: Check if this heuristic matches the candidate filenames or languages.
Constructor Details
#initialize(exts_and_langs, rules) ⇒ Heuristics
Internal.
# File 'lib/linguist/heuristics.rb', line 87
def initialize(exts_and_langs, rules)
  @exts_and_langs = exts_and_langs
  @rules = rules
end
Class Method Details
.call(blob, candidates) ⇒ Object
Public: Use heuristics to detect language of the blob.
blob       - An object that quacks like a blob.
candidates - Array of Language objects.
Examples
Heuristics.call(FileBlob.new("path/to/file"), [
  Language["Ruby"], Language["Python"]
])
Returns an Array of languages, or an empty Array if no heuristic matched or the match was inconclusive.
# File 'lib/linguist/heuristics.rb', line 20
def self.call(blob, candidates)
  return [] if blob.symlink?
  self.load()

  data = blob.data[0...HEURISTICS_CONSIDER_BYTES]

  @heuristics.each do |heuristic|
    if heuristic.matches?(blob.name, candidates)
      return Array(heuristic.call(data))
    end
  end

  [] # No heuristics matched
end
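The control flow of .call can be tried without Linguist installed. The sketch below is only an approximation: FakeHeuristic, the HEURISTICS table, and detect are hypothetical stand-ins, and plain strings take the place of Language objects.

```ruby
# Hypothetical stand-in for a Linguist::Heuristics instance.
FakeHeuristic = Struct.new(:exts, :pattern, :language) do
  def matches?(filename, _candidates)
    exts.any? { |ext| filename.downcase.end_with?(ext) }
  end

  def call(data)
    # Returns nil when the pattern does not match.
    language if pattern.match(data)
  end
end

# Made-up rules, for illustration only.
HEURISTICS = [
  FakeHeuristic.new([".pl"], /^\s*use\s+strict/, "Perl"),
  FakeHeuristic.new([".m"], /^\s*@interface/, "Objective-C"),
]

def detect(filename, data, candidates)
  HEURISTICS.each do |heuristic|
    if heuristic.matches?(filename, candidates)
      # Array(nil) is [], so an inconclusive heuristic yields an empty result.
      return Array(heuristic.call(data))
    end
  end
  [] # no heuristic applies to this filename
end

p detect("script.pl", "use strict;\n", [])
p detect("script.pl", "print 1\n", [])
p detect("notes.txt", "use strict;\n", [])
```

As in the method above, the first heuristic whose matches? returns true decides the result, even when its call returns nil; later heuristics are not consulted.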
.load ⇒ Object
Internal: Load heuristics from ‘heuristics.yml’.
# File 'lib/linguist/heuristics.rb', line 36
def self.load()
  if @heuristics.any?
    return
  end

  data = YAML.load_file(File.expand_path("../heuristics.yml", __FILE__))
  named_patterns = data['named_patterns'].map { |k,v| [k, self.to_regex(v)] }.to_h

  data['disambiguations'].each do |disambiguation|
    exts = disambiguation['extensions']
    rules = disambiguation['rules']
    rules.map! do |rule|
      rule['pattern'] = self.parse_rule(named_patterns, rule)
      rule
    end
    @heuristics << new(exts, rules)
  end
end
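The keys read by .load ('named_patterns', 'disambiguations', 'extensions', 'rules') suggest the shape of the YAML file. The following sketch parses a small inline document with the same keys; the values are invented for illustration, and rule parsing is simplified to named patterns only.

```ruby
require "yaml"

# Hypothetical YAML using the keys that .load reads; the values are made up.
SAMPLE_YAML = <<~'YAML'
  disambiguations:
    - extensions: ['.h']
      rules:
        - language: Objective-C
          named_pattern: objectivec
        - language: C
  named_patterns:
    objectivec: '^\s*@(interface|implementation)'
YAML

data = YAML.load(SAMPLE_YAML)

# Compile each named pattern once so rules can share it.
named_patterns = data["named_patterns"].map { |name, src| [name, Regexp.new(src)] }.to_h

# Build [extensions, rules] pairs, attaching a compiled pattern to each rule.
table = data["disambiguations"].map do |disambiguation|
  rules = disambiguation["rules"].map do |rule|
    pattern = if rule["named_pattern"]
                named_patterns.fetch(rule["named_pattern"])
              else
                Regexp.new("") # simplified fallback: matches any input
              end
    rule.merge("pattern" => pattern)
  end
  [disambiguation["extensions"], rules]
end

table.each do |exts, rules|
  puts "#{exts.inspect}: #{rules.map { |r| r['language'] }.inspect}"
end
```

Each entry pairs a list of extensions with an ordered list of rules, which corresponds to the two arguments passed to new in the method above.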
.parse_rule(named_patterns, rule) ⇒ Object
# File 'lib/linguist/heuristics.rb', line 55
def self.parse_rule(named_patterns, rule)
  if !rule['and'].nil?
    rules = rule['and'].map { |block| self.parse_rule(named_patterns, block) }
    return And.new(rules)
  elsif !rule['pattern'].nil?
    return self.to_regex(rule['pattern'])
  elsif !rule['negative_pattern'].nil?
    pat = self.to_regex(rule['negative_pattern'])
    return NegativePattern.new(pat)
  elsif !rule['named_pattern'].nil?
    return named_patterns[rule['named_pattern']]
  else
    return AlwaysMatch.new()
  end
end
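The rule kinds handled by .parse_rule can be exercised in isolation. In the sketch below, the And, NegativePattern, and AlwaysMatch classes are assumed stand-ins (this page does not show their definitions), and the top-level parse_rule is a simplified copy that accepts single pattern strings only.

```ruby
# Assumed stand-ins for the helper classes; each responds to #match like a Regexp.
class And
  def initialize(patterns)
    @patterns = patterns
  end

  def match(input)
    @patterns.all? { |pat| pat.match(input) }
  end
end

class NegativePattern
  def initialize(pattern)
    @pattern = pattern
  end

  def match(input)
    !@pattern.match(input)
  end
end

class AlwaysMatch
  def match(_input)
    true
  end
end

# Simplified copy of the dispatch: the first recognized key decides the matcher.
def parse_rule(named_patterns, rule)
  if rule["and"]
    And.new(rule["and"].map { |block| parse_rule(named_patterns, block) })
  elsif rule["pattern"]
    Regexp.new(rule["pattern"])
  elsif rule["negative_pattern"]
    NegativePattern.new(Regexp.new(rule["negative_pattern"]))
  elsif rule["named_pattern"]
    named_patterns[rule["named_pattern"]]
  else
    AlwaysMatch.new
  end
end

# Made-up rule: must contain an import line and no #include line.
rule = {
  "and" => [
    { "pattern" => "^import " },
    { "negative_pattern" => "^#include" },
  ],
}
matcher = parse_rule({}, rule)
puts matcher.match("import foo\n") ? "match" : "no match"
puts matcher.match("#include <x>\nimport foo\n") ? "match" : "no match"
```

A rule with none of the four keys falls through to AlwaysMatch, which makes it usable as a default at the end of a rule list.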
.to_regex(str) ⇒ Object
Internal: Converts a string or array of strings to regexp.
str - A String or an Array of Strings. If it is an Array, Regexp.union will be used.
# File 'lib/linguist/heuristics.rb', line 75
def self.to_regex(str)
  if str.kind_of?(Array)
    Regexp.union(str.map { |s| Regexp.new(s) })
  else
    Regexp.new(str)
  end
end
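A standalone copy of this conversion shows why each string is compiled before the union. The to_regex function below is a top-level copy for experimentation, and the sample patterns are invented.

```ruby
# Top-level copy of the conversion, for experimentation.
def to_regex(str)
  if str.is_a?(Array)
    Regexp.union(str.map { |s| Regexp.new(s) })
  else
    Regexp.new(str)
  end
end

single = to_regex("^\\s*def ")
either = to_regex(["^#include", "^\\s*@interface"])

puts single.match?("  def foo")
puts either.match?("@interface Foo")
puts either.match?("hello")

# Regexp.union escapes plain strings, so compiling first keeps metacharacters live.
puts Regexp.union(["a.b"]).match?("axb")
puts to_regex(["a.b"]).match?("axb")
```

Passing raw strings to Regexp.union would treat them as literal text, which is presumably why the method maps them through Regexp.new first.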
Instance Method Details
#call(data) ⇒ Object
Internal: Perform the heuristic.
# File 'lib/linguist/heuristics.rb', line 101
def call(data)
  matched = @rules.find do |rule|
    rule['pattern'].match(data)
  end
  if !matched.nil?
    languages = matched['language']
    if languages.is_a?(Array)
      languages.map{ |l| Language[l] }
    else
      Language[languages]
    end
  end
end
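The rule lookup in #call is first-match-wins over an ordered list. The sketch below reproduces that selection with invented rules; pick is a hypothetical helper, and language names are returned as plain strings rather than looked up through Language[].

```ruby
# Made-up rules in priority order; a rule may name one language or several.
RULES = [
  { "language" => "Objective-C", "pattern" => /^\s*@interface/ },
  { "language" => ["C", "C++"],  "pattern" => /^\s*#include/ },
]

# Hypothetical helper mirroring the selection logic of #call.
def pick(rules, data)
  matched = rules.find { |rule| rule["pattern"].match(data) }
  return nil if matched.nil?

  # The Language[...] lookup is omitted; names are returned as-is.
  matched["language"]
end

p pick(RULES, "@interface Foo\n")
p pick(RULES, "#include <stdio.h>\n")
p pick(RULES, "hello\n")
```

The result can be a single value, an Array, or nil, which is why .call wraps it in Array() before returning.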
#matches?(filename, candidates) ⇒ Boolean
Internal: Check if this heuristic matches the candidate filenames or languages.
# File 'lib/linguist/heuristics.rb', line 94
def matches?(filename, candidates)
  filename = filename.downcase
  candidates = candidates.compact.map(&:name)
  @exts_and_langs.any? { |ext| filename.end_with?(ext) }
end
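In the listing above, the result depends only on a case-insensitive suffix check of the filename. A small sketch of that check follows; ext_match? is a hypothetical helper name.

```ruby
# Hypothetical helper mirroring the suffix check in #matches?.
def ext_match?(exts, filename)
  name = filename.downcase
  exts.any? { |ext| name.end_with?(ext) }
end

puts ext_match?([".h"], "FOO.H")
puts ext_match?([".h"], "foo.hh")
puts ext_match?([".h"], "foo.h.in")
```

Only the filename is downcased, so under this logic the configured extensions need to be lowercase themselves in order to match.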