Class: Linguist::Heuristics

Inherits:
Object
Defined in:
lib/linguist/heuristics.rb

Overview

A collection of simple heuristics used to disambiguate between candidate languages, for example when several languages share a file extension.

Constant Summary

HEURISTICS_CONSIDER_BYTES = 50 * 1024

The number of leading bytes of a blob's data (50 KiB) that .call passes to a heuristic.

Constructor Details

#initialize(exts_and_langs, rules) ⇒ Heuristics

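Internal: Create a heuristic from the file extensions it applies to (exts_and_langs) and its ordered list of rules.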



# File 'lib/linguist/heuristics.rb', line 87

def initialize(exts_and_langs, rules)
  @exts_and_langs = exts_and_langs
  @rules = rules
end

Class Method Details

.call(blob, candidates) ⇒ Object

Public: Use heuristics to detect language of the blob.

blob       - An object that quacks like a blob.
candidates - Array of Language objects.

Examples

Heuristics.call(FileBlob.new("path/to/file"), [
  Language["Ruby"], Language["Python"]
])

Returns an Array of Language objects, or an empty Array if the blob is a symlink, no heuristic matched, or the matching heuristic was inconclusive.



# File 'lib/linguist/heuristics.rb', line 20

def self.call(blob, candidates)
  return [] if blob.symlink?
  self.load()

  data = blob.data[0...HEURISTICS_CONSIDER_BYTES]

  @heuristics.each do |heuristic|
    if heuristic.matches?(blob.name, candidates)
      return Array(heuristic.call(data))
    end
  end

  [] # No heuristics matched
end
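
The control flow of .call can be illustrated in isolation. The following is a minimal sketch, not Linguist's code: SketchHeuristic, CONSIDER_BYTES, detect, and the Objective-C pattern are hypothetical stand-ins, and plain strings take the place of Language objects. It shows two details of the listing above: only the first 50 KiB of data is examined, and the first heuristic whose extensions match decides the result, even when that result is empty.

```ruby
# Hypothetical stand-in for a heuristic; not Linguist's actual class.
class SketchHeuristic
  def initialize(exts, pattern, language)
    @exts = exts
    @pattern = pattern
    @language = language
  end

  # Applies when the lowercased filename ends with one of the extensions.
  def matches?(filename, _candidates)
    name = filename.downcase
    @exts.any? { |ext| name.end_with?(ext) }
  end

  # Returns the language name, or nil when the pattern does not match.
  def call(data)
    @language if @pattern.match(data)
  end
end

CONSIDER_BYTES = 50 * 1024 # mirrors HEURISTICS_CONSIDER_BYTES

def detect(name, data, candidates, heuristics)
  head = data[0...CONSIDER_BYTES] # exclusive range: at most 51,200 characters
  heuristics.each do |h|
    # The first applicable heuristic decides; Array(nil) becomes [].
    return Array(h.call(head)) if h.matches?(name, candidates)
  end
  [] # no heuristic applied to this filename
end

heuristics = [SketchHeuristic.new([".h"], /^\s*@interface\b/, "Objective-C")]

objc  = detect("Foo.h", "@interface Foo\n@end\n", [], heuristics)
plain = detect("foo.h", "int main(void);\n", [], heuristics)
other = detect("foo.txt", "@interface Foo\n", [], heuristics)
late  = detect("foo.h", (" " * CONSIDER_BYTES) + "@interface Foo", [], heuristics)
```

In this sketch, `objc` holds one entry, while `plain`, `other`, and `late` are empty: the pattern does not match, the extension does not match, and the matching text lies beyond the truncation point, respectively.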

.load ⇒ Object

Internal: Load heuristics from ‘heuristics.yml’.



# File 'lib/linguist/heuristics.rb', line 36

def self.load()
  if @heuristics.any?
    return
  end

  data = YAML.load_file(File.expand_path("../heuristics.yml", __FILE__))
  named_patterns = data['named_patterns'].map { |k,v| [k, self.to_regex(v)] }.to_h

  data['disambiguations'].each do |disambiguation|
    exts = disambiguation['extensions']
    rules = disambiguation['rules']
    rules.map! do |rule|
      rule['pattern'] = self.parse_rule(named_patterns, rule)
      rule
    end
    @heuristics << new(exts, rules)
  end
end
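
The listing above implies a particular shape for the YAML file: a top-level 'named_patterns' mapping and a 'disambiguations' list whose entries carry 'extensions' and 'rules'. The sketch below parses a small inline document of that shape. The entries themselves are illustrative assumptions and are not copied from Linguist's heuristics.yml; SAMPLE, named, and entry are hypothetical names.

```ruby
require "yaml"

# Illustrative YAML only. A single-quoted heredoc keeps backslashes literal.
SAMPLE = <<~'YAML'
  named_patterns:
    cpp: '^\s*#\s*include <(cstdint|string|vector)>'
  disambiguations:
    - extensions: ['.h']
      rules:
        - language: Objective-C
          pattern: '^\s*@(interface|protocol)\b'
        - language: C++
          named_pattern: cpp
        - language: C
YAML

data = YAML.safe_load(SAMPLE)

# Compile each named pattern once so rules can refer to it by name.
named = data["named_patterns"].map { |name, src| [name, Regexp.new(src)] }.to_h

entry = data["disambiguations"].first
extensions = entry["extensions"]
rules = entry["rules"]

# A rule with only a 'language' key has no pattern of its own.
fallback_rule = rules.last
```

Under this reading, the last rule acts as a fallback: it names a language but supplies no pattern, which corresponds to the final branch of .parse_rule.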

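.parse_rule(named_patterns, rule) ⇒ Object

Internal: Build the matcher for a single rule. Checks the rule's 'and', 'pattern', 'negative_pattern', and 'named_pattern' keys in that order and falls back to a matcher that always matches.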



# File 'lib/linguist/heuristics.rb', line 55

def self.parse_rule(named_patterns, rule)
  if !rule['and'].nil?
    rules = rule['and'].map { |block| self.parse_rule(named_patterns, block) }
    return And.new(rules)
  elsif !rule['pattern'].nil?
    return self.to_regex(rule['pattern'])
  elsif !rule['negative_pattern'].nil?
    pat = self.to_regex(rule['negative_pattern'])
    return NegativePattern.new(pat)
  elsif !rule['named_pattern'].nil?
    return named_patterns[rule['named_pattern']]
  else
    return AlwaysMatch.new()
  end
end
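
And, NegativePattern, and AlwaysMatch are referenced above but not shown on this page. Since #call invokes match(data) on whatever .parse_rule returns, each of them presumably responds to match just as a Regexp does. The sketch below makes that assumption explicit with hypothetical stand-ins (the Sketch* classes and sketch_parse_rule are not Linguist's definitions), and it handles only string patterns, not arrays.

```ruby
# Hypothetical stand-ins; each responds to #match(data) like a Regexp.
class SketchAnd
  def initialize(patterns)
    @patterns = patterns
  end

  # Matches only if every sub-pattern matches.
  def match(data)
    @patterns.all? { |pat| pat.match(data) }
  end
end

class SketchNegativePattern
  def initialize(pattern)
    @pattern = pattern
  end

  # Matches only if the wrapped pattern does not.
  def match(data)
    !@pattern.match(data)
  end
end

class SketchAlwaysMatch
  def match(_data)
    true
  end
end

# Same branch order as the listing above.
def sketch_parse_rule(named_patterns, rule)
  if rule["and"]
    SketchAnd.new(rule["and"].map { |sub| sketch_parse_rule(named_patterns, sub) })
  elsif rule["pattern"]
    Regexp.new(rule["pattern"])
  elsif rule["negative_pattern"]
    SketchNegativePattern.new(Regexp.new(rule["negative_pattern"]))
  elsif rule["named_pattern"]
    named_patterns[rule["named_pattern"]]
  else
    SketchAlwaysMatch.new
  end
end

rule = { "and" => [{ "pattern" => "^import " }, { "negative_pattern" => ";\\s*$" }] }
matcher = sketch_parse_rule({}, rule)

python_like = matcher.match("import os\n")
java_like   = matcher.match("import java.util.List;\n")
fallback    = sketch_parse_rule({}, { "language" => "C" }).match("anything")
```

Here `python_like` and `fallback` are true, while `java_like` is false because the trailing semicolon satisfies the negated pattern.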

.to_regex(str) ⇒ Object

Internal: Converts a string or array of strings to a Regexp.

str - A String or an Array of Strings. If it is an Array of Strings, Regexp.union will be used.


# File 'lib/linguist/heuristics.rb', line 75

def self.to_regex(str)
  if str.kind_of?(Array)
    Regexp.union(str.map { |s| Regexp.new(s) })
  else
    Regexp.new(str)
  end
end
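
A short usage sketch of the conversion above, reproduced as a standalone method so it runs on its own. The name sketch_to_regex and the sample patterns are illustrative assumptions.

```ruby
# Standalone copy of the conversion logic for experimentation.
def sketch_to_regex(str)
  if str.is_a?(Array)
    # Each string is compiled separately, then the results are alternated.
    Regexp.union(str.map { |s| Regexp.new(s) })
  else
    Regexp.new(str)
  end
end

single = sketch_to_regex("^def ")
either = sketch_to_regex(["^\\s*class\\b", "^\\s*module\\b"])

single_hit = single.match?("def foo")
class_hit  = either.match?("  class Bar")
module_hit = either.match?("module Foo")
no_hit     = either.match?("x = 1")
```

Because each array element is compiled before the union is built, every element must be a valid regular expression on its own.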

Instance Method Details

#call(data) ⇒ Object

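Internal: Perform the heuristic.

data - The String of blob data to test against each rule's pattern, in order.

Returns the Language (or Array of Languages) of the first rule whose pattern matches, or nil if no rule matches.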



# File 'lib/linguist/heuristics.rb', line 101

def call(data)
  matched = @rules.find do |rule|
    rule['pattern'].match(data)
  end
  if !matched.nil?
    languages = matched['language']
    if languages.is_a?(Array)
      languages.map{ |l| Language[l] }
    else
      Language[languages]
    end
  end
end
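
The three possible return shapes of #call (a single language, an array, or nil) can be seen in a small sketch. Everything here is a hypothetical stand-in: LOOKUP replaces Language[], and the rules and language names are illustrative rather than taken from Linguist's data.

```ruby
# Hypothetical stand-in for Language[]; it just wraps the name in a string.
LOOKUP = ->(name) { "Language(#{name})" }

def sketch_call(rules, data, lookup)
  # find returns the first rule whose pattern matches, so rule order matters.
  matched = rules.find { |rule| rule["pattern"].match(data) }
  return nil if matched.nil?

  langs = matched["language"]
  langs.is_a?(Array) ? langs.map { |l| lookup.call(l) } : lookup.call(langs)
end

rules = [
  { "language" => "Prolog", "pattern" => /^[^#]*:-/ },
  { "language" => ["Perl", "Raku"], "pattern" => /^\s*use\s+strict/ },
]

one  = sketch_call(rules, "foo :- bar.\n", LOOKUP)
many = sketch_call(rules, "use strict;\n", LOOKUP)
none = sketch_call(rules, "hello\n", LOOKUP)

# .call wraps the result with Array(), which normalizes all three shapes.
normalized = [Array(one), Array(many), Array(none)]
```

The Array() wrapping in .call is what turns a nil result into the empty Array that callers see.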

#matches?(filename, candidates) ⇒ Boolean

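Internal: Check whether this heuristic applies to the given filename. In the listing below, only the lowercased filename's ending is compared against the heuristic's extensions; candidates is normalized to language names but not otherwise used.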

Returns:

  • (Boolean)


# File 'lib/linguist/heuristics.rb', line 94

def matches?(filename, candidates)
  filename = filename.downcase
  candidates = candidates.compact.map(&:name)
  @exts_and_langs.any? { |ext| filename.end_with?(ext) }
end
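
The extension check above can be tried on its own. In this sketch, sketch_matches? is a hypothetical standalone version that takes the extension list as an argument instead of reading @exts_and_langs.

```ruby
# Standalone version of the extension check.
def sketch_matches?(exts, filename)
  name = filename.downcase # filenames are compared case-insensitively
  exts.any? { |ext| name.end_with?(ext) }
end

upper  = sketch_matches?([".h"], "Foo.H")
longer = sketch_matches?([".h"], "foo.hh")
multi  = sketch_matches?([".pl", ".pro"], "rules.pro")
mixed  = sketch_matches?([".H"], "foo.h")
```

Only the filename is lowercased, so an extension entry written in upper case (as in `mixed`) never matches; `upper` and `multi` are true, `longer` and `mixed` are false.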