Class: Linguist::Heuristics

Inherits:
Object
Defined in:
lib/linguist/heuristics.rb

Overview

A collection of simple heuristics that can be used to better analyze languages.

Constant Summary

HEURISTICS_CONSIDER_BYTES =
50 * 1024
Common heuristics

CPlusPlusRegex =
Regexp.union(
/^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>/,
/^\s*template\s*</,
/^[ \t]*try/,
/^[ \t]*catch\s*\(/,
/^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+/,
/^[ \t]*(private|public|protected):$/,
/std::\w+/)
ObjectiveCRegex =
/^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])/
Perl5Regex =
/\buse\s+(?:strict\b|v?5\.)/
Perl6Regex =
/^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)/
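
To see what these patterns pick up, the sketch below copies three of the constants listed above into local names and runs them against small, invented source snippets. The helper `matching_patterns` and the sample strings are assumptions made for illustration; they are not part of Linguist.

```ruby
# Copied from the constant listing above so the snippet runs without Linguist.
OBJECTIVE_C_REGEX = /^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])/
PERL5_REGEX = /\buse\s+(?:strict\b|v?5\.)/
PERL6_REGEX = /^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)/

# Hypothetical helper (not part of Linguist): names of the patterns that match.
def matching_patterns(source)
  {
    objective_c: OBJECTIVE_C_REGEX,
    perl5: PERL5_REGEX,
    perl6: PERL6_REGEX
  }.select { |_name, regex| regex.match?(source) }.keys
end

# Invented sample inputs.
objc  = "#import \"Foo.h\"\n@interface Foo : NSObject\n@end\n"
perl5 = "use strict;\nprint 1;\n"
perl6 = "use v6;\nmy class Point {}\n"

p matching_patterns(objc)   # [:objective_c]
p matching_patterns(perl5)  # [:perl5]
p matching_patterns(perl6)  # [:perl6]
```

Each pattern looks only for a few telltale constructs, so a match is a hint rather than proof of the language.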


Constructor Details

#initialize(exts_and_langs, &heuristic) ⇒ Heuristics

Internal



# File 'lib/linguist/heuristics.rb', line 56

def initialize(exts_and_langs, &heuristic)
  @exts_and_langs, @candidates = exts_and_langs.partition {|e| e =~ /\A\./}
  @heuristic = heuristic
end
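
The constructor relies on `Array#partition` to separate file extensions (strings that start with a dot) from language names. A minimal sketch of that split, using an invented argument list:

```ruby
# Invented input mixing extensions and language names.
exts_and_langs = [".pm", "Perl", ".t", "Prolog"]

# partition returns two arrays: elements for which the block is truthy,
# then the remaining elements, each in original order.
extensions, languages = exts_and_langs.partition { |e| e =~ /\A\./ }

p extensions  # [".pm", ".t"]
p languages   # ["Perl", "Prolog"]
```

In the constructor the first array is stored in `@exts_and_langs` and the second in `@candidates`, so after initialization `@exts_and_langs` holds only the extensions.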

Class Method Details

.call(blob, candidates) ⇒ Object

Public: Use heuristics to detect language of the blob.

blob       - An object that quacks like a blob.
candidates - Array of Language objects.

Examples

Heuristics.call(FileBlob.new("path/to/file"), [
  Language["Ruby"], Language["Python"]
])

Returns an Array of Languages, or an empty Array if no heuristic matched or the result was inconclusive.



# File 'lib/linguist/heuristics.rb', line 18

def self.call(blob, candidates)
  return [] if blob.symlink?

  data = blob.data[0...HEURISTICS_CONSIDER_BYTES]

  @heuristics.each do |heuristic|
    if heuristic.matches?(blob.name, candidates)
      return Array(heuristic.call(data))
    end
  end

  [] # No heuristics matched
end
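
The control flow of `.call` can be tried out in isolation. The sketch below mirrors the method body shown above but takes the heuristic list as an argument and uses hypothetical stand-ins (`FakeBlob`, `FakeHeuristic`, plain strings instead of Language objects); none of these names exist in Linguist.

```ruby
CONSIDER_BYTES = 50 * 1024 # same value as HEURISTICS_CONSIDER_BYTES

# Hypothetical blob exposing only the three methods .call uses.
FakeBlob = Struct.new(:name, :data, :is_symlink) do
  def symlink?
    is_symlink
  end
end

# Hypothetical heuristic that matches on a single file extension.
FakeHeuristic = Struct.new(:extension, :block) do
  def matches?(filename, _candidates)
    filename.downcase.end_with?(extension)
  end

  def call(data)
    block.call(data)
  end
end

# Same steps as Heuristics.call, with the heuristic list passed in.
def detect(blob, candidates, heuristics)
  return [] if blob.symlink?

  data = blob.data[0...CONSIDER_BYTES] # only the start of the file is examined

  heuristics.each do |heuristic|
    # The first matching heuristic decides; Array(nil) turns "no answer" into [].
    return Array(heuristic.call(data)) if heuristic.matches?(blob.name, candidates)
  end

  [] # no heuristic matched
end

perl_check = FakeHeuristic.new(".pm", ->(data) { "Perl" if data.include?("use strict") })

p detect(FakeBlob.new("Foo.pm", "use strict;", false), [], [perl_check]) # ["Perl"]
p detect(FakeBlob.new("Foo.pm", "use strict;", true), [], [perl_check])  # []
```

Because only the first `CONSIDER_BYTES` characters are passed to the heuristic, a telltale construct that first appears later in the file is not seen.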

.disambiguate(*exts_and_langs, &heuristic) ⇒ Object

Internal: Define a new heuristic.

exts_and_langs - String names of file extensions and languages to disambiguate.
heuristic      - Block which takes data as an argument and returns a Language or nil.

Examples

disambiguate ".pm" do |data|
  if data.include?("use strict")
    Language["Perl"]
  elsif /^[^#]+:-/.match(data)
    Language["Prolog"]
  end
end


# File 'lib/linguist/heuristics.rb', line 48

def self.disambiguate(*exts_and_langs, &heuristic)
  @heuristics << new(exts_and_langs, &heuristic)
end
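
A self-contained sketch of this registration pattern follows. `MiniHeuristics` is a hypothetical, stripped-down class written for illustration, and it returns symbols where Linguist would return Language objects.

```ruby
# Hypothetical miniature of the registry; not Linguist's actual class.
class MiniHeuristics
  @heuristics = [] # class-level list that disambiguate appends to

  class << self
    attr_reader :heuristics

    # Wrap the block in a new instance and remember it.
    def disambiguate(*exts_and_langs, &heuristic)
      @heuristics << new(exts_and_langs, &heuristic)
    end
  end

  def initialize(exts_and_langs, &heuristic)
    @exts_and_langs = exts_and_langs
    @heuristic = heuristic
  end

  # Run the stored block against the file contents.
  def call(data)
    @heuristic.call(data)
  end
end

# Same shape as the example above, with symbols as stand-in results.
MiniHeuristics.disambiguate ".pm" do |data|
  if data.include?("use strict")
    :perl
  elsif /^[^#]+:-/.match(data)
    :prolog
  end
end

rule = MiniHeuristics.heuristics.first
p rule.call("use strict;\n")         # :perl
p rule.call("foo(X) :- bar(X).\n")   # :prolog
p rule.call("# just a comment\n")    # nil
```

Registration order matters in this design: heuristics are appended to a list and later tried from first to last.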

Instance Method Details

#call(data) ⇒ Object

Internal: Perform the heuristic by calling its block with the given data.



# File 'lib/linguist/heuristics.rb', line 73

def call(data)
  @heuristic.call(data)
end

#matches?(filename, candidates) ⇒ Boolean

Internal: Check if this heuristic matches the candidate filenames or languages.

Returns:

  • (Boolean)


# File 'lib/linguist/heuristics.rb', line 63

def matches?(filename, candidates)
  filename = filename.downcase
  candidates = candidates.compact.map(&:name)
  @exts_and_langs.any? { |ext| filename.end_with?(ext) } ||
    (candidates.any? &&
     (@candidates - candidates == [] &&
      candidates - @candidates == []))
end