Class: Linguist::Heuristics
- Inherits:
- Object (Linguist::Heuristics < Object)
- Defined in:
- lib/linguist/heuristics.rb
Overview
A collection of simple heuristics that can be used to better analyze languages.
Constant Summary
- HEURISTICS_CONSIDER_BYTES =
50 * 1024
- CPlusPlusRegex =
Common heuristics
Regexp.union(
  /^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>/,
  /^\s*template\s*</,
  /^[ \t]*try/,
  /^[ \t]*catch\s*\(/,
  /^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+/,
  /^[ \t]*(private|public|protected):$/,
  /std::\w+/)
- ObjectiveCRegex =
/^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])/
- Perl5Regex =
/\buse\s+(?:strict\b|v?5\.)/
- Perl6Regex =
/^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)/
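As a rough illustration of how the two Perl patterns separate their inputs, the sketch below copies the regular expressions listed above into local constants and runs them against two made-up samples. The constant names `PERL5` and `PERL6` and the sample strings are assumptions for the example, not part of Linguist.

```ruby
# Patterns copied from the constant summary above; the names are local stand-ins.
PERL5 = /\buse\s+(?:strict\b|v?5\.)/
PERL6 = /^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)/

# Hypothetical sample sources.
perl5_sample = "#!/usr/bin/perl\nuse strict;\nuse warnings;\n"
perl6_sample = "use v6;\nclass Point {}\n"

# Each pattern should fire only on its own sample.
perl5_on_perl5 = PERL5.match?(perl5_sample)
perl5_on_perl6 = PERL5.match?(perl6_sample)
perl6_on_perl6 = PERL6.match?(perl6_sample)
perl6_on_perl5 = PERL6.match?(perl5_sample)
```

With these inputs each pattern matches only its own sample, which is the property a disambiguation block relies on.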
Class Method Summary
-
.call(blob, candidates) ⇒ Object
Public: Use heuristics to detect language of the blob.
-
.disambiguate(*exts_and_langs, &heuristic) ⇒ Object
Internal: Define a new heuristic.
Instance Method Summary
-
#call(data) ⇒ Object
Internal: Perform the heuristic.
-
#initialize(exts_and_langs, &heuristic) ⇒ Heuristics
constructor
Internal.
-
#matches?(filename, candidates) ⇒ Boolean
Internal: Check if this heuristic matches the candidate filenames or languages.
Constructor Details
#initialize(exts_and_langs, &heuristic) ⇒ Heuristics
Internal
# File 'lib/linguist/heuristics.rb', line 56
def initialize(exts_and_langs, &heuristic)
  @exts_and_langs, @candidates = exts_and_langs.partition {|e| e =~ /\A\./}
  @heuristic = heuristic
end
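The constructor splits its arguments with `partition`: strings that start with a dot are treated as file extensions, and everything else as language names. A minimal sketch of that split, using a made-up argument list:

```ruby
# Hypothetical mix of extensions and language names.
exts_and_langs = [".pm", ".t", "Perl", "Prolog"]

# Same predicate as the constructor: a leading dot marks an extension.
exts, langs = exts_and_langs.partition { |e| e =~ /\A\./ }
```

Here `exts` holds `[".pm", ".t"]` and `langs` holds `["Perl", "Prolog"]`.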
Class Method Details
.call(blob, candidates) ⇒ Object
Public: Use heuristics to detect language of the blob.
blob - An object that quacks like a blob.
candidates - Array of Language objects.
Examples
Heuristics.call(FileBlob.new("path/to/file"), [
Language["Ruby"], Language["Python"]
])
Returns an Array of languages, or an empty Array if no heuristic matched or the matching heuristic was inconclusive.
# File 'lib/linguist/heuristics.rb', line 18
def self.call(blob, candidates)
  return [] if blob.symlink?
  data = blob.data[0...HEURISTICS_CONSIDER_BYTES]
  @heuristics.each do |heuristic|
    if heuristic.matches?(blob.name, candidates)
      return Array(heuristic.call(data))
    end
  end
  [] # No heuristics matched
end
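The dispatch loop above has two properties worth spelling out: only the first matching heuristic is consulted, and a `nil` result becomes an empty Array through `Array()`. The following standalone sketch mimics that behavior without Linguist. `Rule`, `RULES`, `CONSIDER_LIMIT`, and `detect` are hypothetical names, and plain strings stand in for Language objects.

```ruby
# Same limit as HEURISTICS_CONSIDER_BYTES; only the leading part of the data is examined.
CONSIDER_LIMIT = 50 * 1024

# Hypothetical stand-in for a heuristic: an extension plus a callable.
Rule = Struct.new(:ext, :block)

RULES = [
  Rule.new(".pl", ->(data) { data.include?("use strict") ? "Perl" : nil }),
  Rule.new(".pl", ->(_data) { "Prolog" }) # never reached for ".pl" files
].freeze

def detect(name, data)
  data = data[0...CONSIDER_LIMIT]
  RULES.each do |rule|
    # First matching rule wins; Array(nil) yields [].
    return Array(rule.block.call(data)) if name.downcase.end_with?(rule.ext)
  end
  [] # No rule matched
end

perl_result  = detect("script.pl", "use strict;\n")
inconclusive = detect("facts.pl", "parent(a, b).\n")
unmatched    = detect("notes.txt", "use strict;\n")
truncated    = detect("big.pl", ("#" * CONSIDER_LIMIT) + "use strict")
```

Note that `inconclusive` is empty even though a second `.pl` rule exists, because the loop returns as soon as one rule matches. `truncated` is empty because the marker lies beyond the examined prefix.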
.disambiguate(*exts_and_langs, &heuristic) ⇒ Object
Internal: Define a new heuristic.
exts_and_langs - String names of file extensions and languages to
disambiguate.
heuristic - Block which takes data as an argument and returns a Language or nil.
Examples
  disambiguate ".pm" do |data|
    if data.include?("use strict")
      Language["Perl"]
    elsif /^[^#]+:-/.match(data)
      Language["Prolog"]
    end
  end
# File 'lib/linguist/heuristics.rb', line 48
def self.disambiguate(*exts_and_langs, &heuristic)
  @heuristics << new(exts_and_langs, &heuristic)
end
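`disambiguate` is a small registration pattern: each call wraps the block in a new instance and appends it to a class-level list. The sketch below reproduces that pattern in a self-contained class. `MiniHeuristics` and its `heuristics` reader are hypothetical, and strings replace the `Language[...]` lookups so the example runs without Linguist.

```ruby
class MiniHeuristics
  @heuristics = [] # class-level registry, filled by disambiguate

  class << self
    attr_reader :heuristics # hypothetical reader, added for inspection
  end

  def self.disambiguate(*exts_and_langs, &heuristic)
    @heuristics << new(exts_and_langs, &heuristic)
  end

  def initialize(exts_and_langs, &heuristic)
    @exts_and_langs = exts_and_langs
    @heuristic = heuristic
  end

  def call(data)
    @heuristic.call(data)
  end

  # Same shape as the documented example, with strings instead of Language objects.
  disambiguate ".pm" do |data|
    if data.include?("use strict")
      "Perl"
    elsif /^[^#]+:-/.match(data)
      "Prolog"
    end
  end
end

rule    = MiniHeuristics.heuristics.first
perl    = rule.call("use strict;\n")
prolog  = rule.call("parent(X, Y) :- father(X, Y).\n")
neither = rule.call("# just a comment\n")
```

When neither branch applies, the block returns `nil`, which signals an inconclusive result.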
Instance Method Details
#call(data) ⇒ Object
Internal: Perform the heuristic.
# File 'lib/linguist/heuristics.rb', line 73
def call(data)
  @heuristic.call(data)
end
#matches?(filename, candidates) ⇒ Boolean
Internal: Check if this heuristic matches the candidate filenames or languages.
# File 'lib/linguist/heuristics.rb', line 63
def matches?(filename, candidates)
  filename = filename.downcase
  candidates = candidates.compact.map(&:name)
  @exts_and_langs.any? { |ext| filename.end_with?(ext) } ||
    (candidates.any? &&
     (@candidates - candidates == [] &&
      candidates - @candidates == []))
end
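The matching rule reads as: accept if the lowercased filename ends with one of the heuristic's extensions, or if the candidate language names are non-empty and equal, as a set, to the heuristic's language names. A minimal standalone sketch of that predicate follows. The function name `rule_matches?` and its explicit arguments are assumptions for the example, and candidates are passed as plain name strings.

```ruby
# rule_exts and rule_langs play the roles of @exts_and_langs and @candidates.
def rule_matches?(rule_exts, rule_langs, filename, candidate_names)
  filename = filename.downcase
  rule_exts.any? { |ext| filename.end_with?(ext) } ||
    (candidate_names.any? &&
     (rule_langs - candidate_names).empty? &&
     (candidate_names - rule_langs).empty?)
end

by_extension  = rule_matches?([".pm"], [], "Foo.PM", ["Perl", "Prolog"])
by_languages  = rule_matches?([], ["Perl", "Prolog"], "README", ["Prolog", "Perl"])
partial_set   = rule_matches?([], ["Perl", "Prolog"], "README", ["Perl"])
no_candidates = rule_matches?([], [], "README", [])
```

The first two calls match (by extension, case-insensitively, and by set equality regardless of order). The last two do not: a partial overlap is not enough, and an empty candidate list never matches by language.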