Class: Linguist::Generated

Inherits:
Object
Defined in:
lib/linguist/generated.rb

Constant Summary

PROTOBUF_EXTENSIONS =
['.py', '.java', '.h', '.cc', '.cpp']
APACHE_THRIFT_EXTENSIONS =
['.rb', '.py', '.go', '.js', '.m', '.java', '.h', '.cc', '.cpp']

Constructor Details

#initialize(name, data) ⇒ Generated

Internal: Initialize Generated instance

name - String filename
data - String blob data



# File 'lib/linguist/generated.rb', line 19

def initialize(name, data)
  @name = name
  @extname = File.extname(name)
  @_data = data
end

Instance Attribute Details

#extname ⇒ Object (readonly)

Returns the value of attribute extname.



# File 'lib/linguist/generated.rb', line 25

def extname
  @extname
end

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/linguist/generated.rb', line 25

def name
  @name
end

Class Method Details

.generated?(name, data) ⇒ Boolean

Public: Is the blob a generated file?

name - String filename
data - String blob data. A block may also be passed in for lazy loading, but this behavior is deprecated; always pass in a String.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 11

def self.generated?(name, data)
  new(name, data).generated?
end

Instance Method Details

#compiled_coffeescript? ⇒ Boolean

Internal: Is the blob of JS generated by CoffeeScript?

CoffeeScript is designed to emit JS that is hard to tell apart from hand-written code, so look for a number of patterns output by the CS compiler.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 145

def compiled_coffeescript?
  return false unless extname == '.js'

  # JS generated by CoffeeScript > 1.2 includes a comment on the first line
  if lines[0] =~ /^\/\/ Generated by /
    return true
  end

  if lines[0] == '(function() {' &&     # First line is module closure opening
      lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
      lines[-1] == ''                   # Last line is blank

    score = 0

    lines.each do |line|
      if line =~ /var /
        # Underscored temp vars are likely to be Coffee
        score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

        # bind and extend functions are very Coffee specific
        score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
      end
    end

    # Require a score of 3. This is fairly arbitrary. Consider
    # tweaking later.
    score >= 3
  else
    false
  end
end
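
The scoring step above can be tried in isolation. The following is a minimal sketch, not Linguist's own code: `coffee_score`, `TEMP_VARS`, and `HELPERS` are hypothetical names, and `String#scan` stands in for the block-less `gsub(...).count` idiom used in the method.

```ruby
# Hypothetical helper (not part of Linguist): scores lines the way
# #compiled_coffeescript? does, using String#scan to count matches.
TEMP_VARS = /_fn|_i|_len|_ref|_results/
HELPERS   = /__bind|__extends|__hasProp|__indexOf|__slice/

def coffee_score(lines)
  lines.sum do |line|
    next 0 unless line =~ /var /   # only declaration lines are scored

    line.scan(TEMP_VARS).size + 3 * line.scan(HELPERS).size
  end
end

sample = [
  '  var _i, _len, _ref;',   # three temp vars: 3 points
  '  var total = 0;'         # nothing Coffee-specific: 0 points
]
puts coffee_score(sample) >= 3   # prints "true"
```

One detail worth knowing: the temp-var pattern also matches the `_i` inside `__indexOf`, so a declaration of that helper scores 4 rather than 3.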

#compiled_cython_file? ⇒ Boolean

Internal: Is this a compiled C/C++ file from Cython?

Cython-compiled C/C++ files typically contain: /* Generated by Cython x.x.x on … */ on the first line.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 353

def compiled_cython_file?
  return false unless ['.c', '.cpp'].include? extname
  return false unless lines.count > 1
  return lines[0].include?("Generated by Cython")
end

#composer_lock? ⇒ Boolean

Internal: Is the blob a generated PHP Composer lock file?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 325

def composer_lock?
  !!name.match(/composer\.lock/)
end

#data ⇒ Object

Lazy load blob data if block was passed in.

Awful, awful stuff happening here.

Returns String data.



# File 'lib/linguist/generated.rb', line 32

def data
  @data ||= @_data.respond_to?(:call) ? @_data.call() : @_data
end
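
The lazy-loading behavior can be illustrated with a small stand-in class. This is a sketch only: `LazyBlob` is a hypothetical name, and it mirrors just the `respond_to?(:call)` check and the memoization, nothing else from Linguist.

```ruby
# Hypothetical stand-in (not part of Linguist) for the lazy-loading idea.
class LazyBlob
  def initialize(source)
    @source = source
  end

  # Resolve the source once: call it if it is callable, else use it as-is.
  def data
    @data ||= @source.respond_to?(:call) ? @source.call : @source
  end
end

loads = 0
lazy = LazyBlob.new(-> { loads += 1; "puts 1\n" })
lazy.data
lazy.data
puts loads   # prints 1: the lambda ran only on the first access
```

Because the result is memoized with `||=`, a callable that returns nil would be invoked again on every access, which is one more reason to pass a String.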

#generated? ⇒ Boolean

Internal: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

Please add additional test coverage to 'test/test_blob.rb#test_generated' if you make any changes.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 53

def generated?
  xcode_file? ||
  generated_net_designer_file? ||
  generated_net_specflow_feature_file? ||
  composer_lock? ||
  node_modules? ||
  npm_shrinkwrap? ||
  godeps? ||
  generated_by_zephir? ||
  minified_files? ||
  has_source_map? ||
  source_map? ||
  compiled_coffeescript? ||
  generated_parser? ||
  generated_net_docfile? ||
  generated_postscript? ||
  compiled_cython_file? ||
  generated_go? ||
  generated_protocol_buffer? ||
  generated_apache_thrift? ||
  generated_jni_header? ||
  vcr_cassette? ||
  generated_module? ||
  generated_unity3d_meta? ||
  generated_racc? ||
  generated_jflex? ||
  generated_grammarkit?
end

#generated_apache_thrift? ⇒ Boolean

Internal: Is the blob generated by Apache Thrift compiler?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 282

def generated_apache_thrift?
  return false unless APACHE_THRIFT_EXTENSIONS.include?(extname)
  return false unless lines.count > 1

  return lines[0].include?("Autogenerated by Thrift Compiler") || lines[1].include?("Autogenerated by Thrift Compiler")
end

#generated_by_zephir? ⇒ Boolean

Internal: Is the blob generated by Zephir?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 332

def generated_by_zephir?
  !!name.match(/.\.zep\.(?:c|h|php)$/)
end
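
To see which filenames the pattern accepts, it can be exercised directly. A minimal sketch, with `ZEPHIR_OUTPUT` and `zephir_output?` as hypothetical names and the paths as made-up samples:

```ruby
# Same pattern as above, wrapped in a hypothetical helper.
ZEPHIR_OUTPUT = /.\.zep\.(?:c|h|php)$/

def zephir_output?(name)
  name.match?(ZEPHIR_OUTPUT)
end

puts zephir_output?('ext/demo/router.zep.c')   # prints "true"
puts zephir_output?('ext/demo/router.zep')     # prints "false": Zephir source, not output
puts zephir_output?('.zep.c')                  # prints "false": the leading "." needs a character before ".zep"
```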

#generated_go? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 257

def generated_go?
  return false unless extname == '.go'
  return false unless lines.count > 1

  return lines[0].include?("Code generated by")
end

#generated_grammarkit? ⇒ Boolean

Internal: Is this a GrammarKit-generated file?

A GrammarKit-generated file typically contains: // This is a generated file. Not intended for manual editing. on the first line. This is not always the case, as it’s possible to customize the class header.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 424

def generated_grammarkit?
  return false unless extname == '.java'
  return false unless lines.count > 1
  return lines[0].start_with?("// This is a generated file. Not intended for manual editing.")
end

#generated_jflex? ⇒ Boolean

Internal: Is this a JFlex-generated file?

A JFlex-generated file contains: /* The following code was generated by JFlex x.y.z on d/at/e ti:me */ on the first line.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 410

def generated_jflex?
  return false unless extname == '.java'
  return false unless lines.count > 1
  return lines[0].start_with?("/* The following code was generated by JFlex ")
end

#generated_jni_header? ⇒ Boolean

Internal: Is the blob a C/C++ header generated by the Java JNI tool javah?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 292

def generated_jni_header?
  return false unless extname == '.h'
  return false unless lines.count > 2

  return lines[0].include?("/* DO NOT EDIT THIS FILE - it is machine generated */") &&
           lines[1].include?("#include <jni.h>")
end

#generated_module? ⇒ Boolean

Internal: Is it a KiCAD or GFortran module file?

KiCAD module files contain: PCBNEW-LibModule-V1 yyyy-mm-dd h:mm:ss XM on the first line.

GFortran module files contain: GFORTRAN module version ‘x’ created from on the first line.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 370

def generated_module?
  return false unless extname == '.mod'
  return false unless lines.count > 1
  return lines[0].include?("PCBNEW-LibModule-V") ||
          lines[0].include?("GFORTRAN module version '")
end

#generated_net_designer_file? ⇒ Boolean

Internal: Is this a codegen file for a .NET project?

Visual Studio often uses code generation to generate partial classes, and these files can be quite unwieldy. Let’s hide them.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 202

def generated_net_designer_file?
  name.downcase =~ /\.designer\.cs$/
end

#generated_net_docfile? ⇒ Boolean

Internal: Is this a generated documentation file for a .NET assembly?

.NET developers often check in the XML IntelliSense file along with an assembly. These files don’t have a special extension, so we have to inspect the contents to determine whether a file is a docfile. Luckily, these files are highly structured, so recognizing them is easy.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 185

def generated_net_docfile?
  return false unless extname.downcase == ".xml"
  return false unless lines.count > 3

  # .NET Docfiles always open with <doc> and their first tag is an
  # <assembly> tag
  return lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end
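
The positional checks depend on how the file is split into lines. As a rough sketch, with `net_docfile_lines?` and `SAMPLE_DOCFILE` as hypothetical names and the XML as a made-up sample:

```ruby
# Hypothetical helper (not part of Linguist) mirroring the line checks above.
def net_docfile_lines?(lines)
  return false unless lines.count > 3

  lines[1].include?('<doc>') &&
    lines[2].include?('<assembly>') &&
    lines[-2].include?('</doc>')
end

SAMPLE_DOCFILE = <<~XML
  <?xml version="1.0"?>
  <doc>
      <assembly>
          <name>Example</name>
      </assembly>
      <members/>
  </doc>
XML

# Split as #lines does, keeping the trailing empty string.
puts net_docfile_lines?(SAMPLE_DOCFILE.split("\n", -1))         # prints "true"
# Without the final newline, lines[-2] is no longer the closing tag.
puts net_docfile_lines?(SAMPLE_DOCFILE.chomp.split("\n", -1))   # prints "false"
```

Under this check, a docfile that does not end with a newline is not recognized, because the closing tag then sits on the last line rather than the second-to-last.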

#generated_net_specflow_feature_file? ⇒ Boolean

Internal: Is this a codegen file for a SpecFlow feature file?

Visual Studio’s SpecFlow extension generates *.feature.cs files from *.feature files. They are not meant to be consumed by humans, so let’s hide them.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 213

def generated_net_specflow_feature_file?
  name.downcase =~ /\.feature\.cs$/
end

#generated_parser? ⇒ Boolean

Internal: Is the blob of JS a parser generated by PEG.js?

PEG.js-generated parsers are not meant to be consumed by humans.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 222

def generated_parser?
  return false unless extname == '.js'

  # PEG.js-generated parsers include a comment near the top of the file
  # that marks them as such.
  if lines[0..4].join('') =~ /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/
    return true
  end

  false
end
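
The regular expression is dense, so a few inputs help show what it accepts: the marker must appear inside the first block comment of the first five lines. A minimal sketch, with `PEGJS_HEADER` and `pegjs_header?` as hypothetical names and the inputs as made-up samples:

```ruby
# Same pattern as above, wrapped in a hypothetical helper.
PEGJS_HEADER = /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/

def pegjs_header?(lines)
  lines.first(5).join.match?(PEGJS_HEADER)
end

generated = [
  'module.exports = (function() {',
  '  /*',
  '   * Generated by PEG.js 0.8.0.',
  '   */'
]
puts pegjs_header?(generated)                                        # prints "true"
puts pegjs_header?(['// Generated by PEG.js'])                       # prints "false": not a block comment
puts pegjs_header?(["/* license */ var s = 'Generated by PEG.js';"]) # prints "false": the comment closed first
```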

#generated_postscript? ⇒ Boolean

Internal: Is the blob of PostScript generated?

PostScript files are often generated by other programs. If they tell us so, we can detect them.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 240

def generated_postscript?
  return false unless ['.ps', '.eps'].include? extname

  # We analyze the "%%Creator:" comment, which contains the author/generator
  # of the file. If there is one, it should be in one of the first few lines.
  creator = lines[0..9].find {|line| line =~ /^%%Creator: /}
  return false if creator.nil?

  # Most generators write their version number, while human authors' or companies'
  # names don't contain numbers. So look if the line contains digits. Also
  # look for some special cases without version numbers.
  return creator =~ /[0-9]/ ||
    creator.include?("mpage") ||
    creator.include?("draw") ||
    creator.include?("ImageMagick")
end
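
The %%Creator heuristic can be sketched on its own. `generated_creator?` is a hypothetical name and the creator strings are made-up samples. Unlike the method above, which can return an Integer or nil from `=~`, this version always returns true or false.

```ruby
# Hypothetical helper (not part of Linguist) mirroring the Creator heuristic.
def generated_creator?(lines)
  creator = lines.first(10).find { |line| line.start_with?('%%Creator: ') }
  return false if creator.nil?

  # A digit suggests a tool version; a few tools are matched by name.
  creator.match?(/[0-9]/) ||
    %w[mpage draw ImageMagick].any? { |tool| creator.include?(tool) }
end

puts generated_creator?(['%!PS-Adobe-3.0', '%%Creator: sometool 2.1'])   # prints "true"
puts generated_creator?(['%!PS-Adobe-3.0', '%%Creator: Jane Doe'])       # prints "false"
puts generated_creator?(['%!PS-Adobe-3.0', '%%Title: chart'])            # prints "false"
```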

#generated_protocol_buffer? ⇒ Boolean

Internal: Is the blob a C++, Java or Python source file generated by the Protocol Buffer compiler?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 270

def generated_protocol_buffer?
  return false unless PROTOBUF_EXTENSIONS.include?(extname)
  return false unless lines.count > 1

  return lines[0].include?("Generated by the protocol buffer compiler.  DO NOT EDIT!")
end

#generated_racc? ⇒ Boolean

Internal: Is this a Racc-generated file?

A Racc-generated file contains: # This file is automatically generated by Racc x.y.z on the third line.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 397

def generated_racc?
  return false unless extname == '.rb'
  return false unless lines.count > 2
  return lines[2].start_with?("# This file is automatically generated by Racc")
end

#generated_unity3d_meta? ⇒ Boolean

Internal: Is this a metadata file from Unity3D?

Unity3D Meta files start with:

fileFormatVersion: X
guid: XXXXXXXXXXXXXXX

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 384

def generated_unity3d_meta?
  return false unless extname == '.meta'
  return false unless lines.count > 1
  return lines[0].include?("fileFormatVersion: ")
end

#godeps? ⇒ Boolean

Internal: Is the blob part of Godeps/, which is not meant for humans in pull requests?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 318

def godeps?
  !!name.match(/Godeps\//)
end

#has_source_map? ⇒ Boolean

Internal: Does the blob contain a source map reference?

We assume that if one of the last 2 lines starts with a source map reference, then the current file was generated from other files.

We use the last 2 lines because the last line might be empty.

We only handle JavaScript, no CSS support yet.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 119

def has_source_map?
  return false unless extname.downcase == '.js'
  lines.last(2).any? { |line| line.start_with?('//# sourceMappingURL') }
end

#lines ⇒ Object

Public: Get each line of data

Returns an Array of lines



# File 'lib/linguist/generated.rb', line 39

def lines
  # TODO: data should be required to be a String, no nils
  @lines ||= data ? data.split("\n", -1) : []
end
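
The -1 limit passed to split matters for the checks that index from the end of the file, such as lines[-1] and lines[-2]. A small sketch, with `split_lines` as a hypothetical stand-in for the method above:

```ruby
# Hypothetical helper (not part of Linguist) mirroring #lines.
def split_lines(data)
  data ? data.split("\n", -1) : []
end

p split_lines("a\nb\n")   # prints ["a", "b", ""]: trailing empty string kept
p "a\nb\n".split("\n")    # prints ["a", "b"]: default split drops it
p split_lines(nil)        # prints []
```

With the trailing empty string kept, a file that ends in a newline has its final line of text at index -2, which is why several detectors look there.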

#minified_files? ⇒ Boolean

Internal: Is the blob a minified file?

Consider a file minified if the average line length is greater than 110 characters.

Currently, only JS and CSS files are detected by this method.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 100

def minified_files?
  return unless ['.js', '.css'].include? extname
  if lines.any?
    (lines.inject(0) { |n, l| n += l.length } / lines.length) > 110
  else
    false
  end
end
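
The average-length rule is easy to check by hand. A minimal sketch, with `average_line_length` and `looks_minified?` as hypothetical names; like the method above, it uses integer division:

```ruby
# Hypothetical helpers (not part of Linguist) mirroring the average-length rule.
def average_line_length(lines)
  return 0 if lines.empty?

  lines.sum(&:length) / lines.length   # integer division, as in the original
end

def looks_minified?(lines, threshold = 110)
  average_line_length(lines) > threshold
end

puts looks_minified?(['a' * 500, ''])                # prints "true": average is 250
puts looks_minified?(['var x = 1;', 'var y = 2;'])   # prints "false": average is 10
puts looks_minified?(['a' * 110])                    # prints "false": must exceed 110
```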

#node_modules? ⇒ Boolean

Internal: Is the blob part of node_modules/, which is not meant for humans in pull requests?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 303

def node_modules?
  !!name.match(/node_modules\//)
end

#npm_shrinkwrap? ⇒ Boolean

Internal: Is the blob a generated npm shrinkwrap file?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 310

def npm_shrinkwrap?
  !!name.match(/npm-shrinkwrap\.json/)
end

#source_map? ⇒ Boolean

Internal: Is the blob a generated source map?

Source Maps usually have .css.map or .js.map extensions. If they do not follow the naming convention, detect them based on their content.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 130

def source_map?
  return false unless extname.downcase == '.map'

  name =~ /(\.css|\.js)\.map$/i ||                 # Name convention
  lines[0] =~ /^{"version":\d+,/ ||                # Revision 2 and later begin with the version number
  lines[0] =~ /^\/\*\* Begin line maps\. \*\*\/{/  # Revision 1 begins with a magic comment
end
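
The name check and the two content checks can be tried separately. A rough sketch, with `source_map_name?` and `source_map_content?` as hypothetical names and the inputs as made-up samples. The literal braces are escaped here, which matches the same text as the patterns above.

```ruby
# Hypothetical helpers (not part of Linguist) mirroring the three checks.
def source_map_name?(name)
  name.match?(/(\.css|\.js)\.map$/i)
end

def source_map_content?(first_line)
  first_line.match?(/^\{"version":\d+,/) ||                  # revision 2 and later
    first_line.match?(/^\/\*\* Begin line maps\. \*\*\/\{/)  # revision 1
end

puts source_map_name?('app.JS.map')                              # prints "true"
puts source_map_name?('world.map')                               # prints "false"
puts source_map_content?('{"version":3,"file":"out.js"}')        # prints "true"
puts source_map_content?('/** Begin line maps. **/{ "count":1')  # prints "true"
```

A file named world.map would therefore only count as a source map if its first line matched one of the two content patterns.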

#vcr_cassette? ⇒ Boolean

Is the blob a VCR Cassette file?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 339

def vcr_cassette?
  return false unless extname == '.yml'
  return false unless lines.count > 2
  # VCR Cassettes have "recorded_with: VCR" in the second-to-last line.
  return lines[-2].include?("recorded_with: VCR")
end

#xcode_file? ⇒ Boolean

Internal: Is the blob an Xcode file?

Generated if the file extension is an Xcode file extension.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/generated.rb', line 88

def xcode_file?
  ['.nib', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end