Module: Linguist::BlobHelper

Included in:
Blob, FileBlob, LazyBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

DEPRECATED: Avoid mixing into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`. Functions are much easier to cache and compose.

Avoid adding additional bloat to this module.

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.

Constant Summary

MEGABYTE = 1024 * 1024
VendoredRegexp = Regexp.new(vendored_paths.join('|'))
DocumentationRegexp = Regexp.new(documentation_paths.join('|'))
DETECTABLE_TYPES = [:programming, :markup].freeze

Instance Method Details

#_mime_type ⇒ Object

Internal: Look up the MIME type for the filename.

Returns a MiniMime::Info, or nil if no type is known for the filename.



# File 'lib/linguist/blob_helper.rb', line 32

def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    @_mime_type = MiniMime.lookup_by_filename(name.to_s)
  end
end
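
The `defined?` guard, rather than `||=`, means a nil lookup result is cached as well. A minimal sketch of the difference, using a hypothetical `MemoDemo` class and a fake lookup that always returns nil (neither is part of Linguist):

```ruby
# Hypothetical stand-in class for illustration; not part of Linguist.
class MemoDemo
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Pretend lookup that always fails, like an unknown file extension.
  def expensive_lookup
    @calls += 1
    nil
  end

  # `||=` re-runs the lookup on every call because nil is falsy.
  def with_or_equals
    @with_or_equals ||= expensive_lookup
  end

  # `defined?` remembers that the lookup already ran, even if it returned nil.
  def with_defined
    if defined? @with_defined
      @with_defined
    else
      @with_defined = expensive_lookup
    end
  end
end

demo = MemoDemo.new
2.times { demo.with_or_equals }
calls_with_or_equals = demo.calls # lookup ran on both calls

demo = MemoDemo.new
2.times { demo.with_defined }
calls_with_defined = demo.calls   # lookup ran only on the first call
```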

#binary? ⇒ Boolean

Public: Is the blob binary?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 125

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
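
The order of the checks matters: missing data is treated as binary, while empty data is treated as text. A minimal sketch of the same decision order as a standalone function, where `binary_like?` is a hypothetical helper and the `detection` Hash is a hand-written stand-in for the result of `detect_encoding`:

```ruby
# Hypothetical helper mirroring the order of checks in binary?.
# `detection` stands in for the Hash returned by detect_encoding (or nil).
def binary_like?(data, detection)
  return true if data.nil?   # large blobs are never loaded
  return false if data == "" # blank files count as text
  return true if detection.nil? || detection[:encoding].nil?

  detection[:type] == :binary
end

results = [
  binary_like?(nil, nil),                                  # no data loaded
  binary_like?("", nil),                                   # empty file
  binary_like?("abc", nil),                                # detection failed
  binary_like?("abc", { encoding: "UTF-8", type: :text })  # detected as text
]
```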

#binary_mime_type? ⇒ Boolean

Internal: Is the blob binary according to its MIME type?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 55

def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end

#content_type ⇒ Object

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.



# File 'lib/linguist/blob_helper.rb', line 78

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
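
The nested ternary can be read as three cases. A minimal sketch as a standalone function, where `content_type_for` and its parameters are hypothetical names introduced for illustration:

```ruby
# Hypothetical helper: `binary` stands for (binary_mime_type? || binary?),
# `mime_type` for the blob's MIME type String, `encoding` for the detected
# encoding name (or nil).
def content_type_for(binary, mime_type, encoding)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

text_type    = content_type_for(false, "text/plain", "UTF-8")
binary_type  = content_type_for(true, "application/octet-stream", nil)
unknown_type = content_type_for(false, "text/plain", nil)
```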

#csv? ⇒ Boolean

Public: Is this blob a CSV file?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 175

def csv?
  text? && extname.downcase == '.csv'
end

#detect_encoding ⇒ Object

Try to guess the encoding.

Returns a Hash with :encoding, :confidence and :type keys, or nil if the blob has no data, an error occurred during detection, or no valid encoding could be found.


# File 'lib/linguist/blob_helper.rb', line 118

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end

#disposition ⇒ Object

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

Examples

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.



# File 'lib/linguist/blob_helper.rb', line 91

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(name)}"
  end
end
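
A minimal sketch of the three outcomes as a standalone function. `disposition_for` is a hypothetical helper, and `URI.encode_www_form_component` from the standard library is swapped in for `EscapeUtils.escape_url`, so the exact escaping of unusual filenames may differ from Linguist's:

```ruby
require "uri"

# Hypothetical helper: `inline` stands for (text? || image?).
# URI.encode_www_form_component is a stdlib stand-in for EscapeUtils.escape_url.
def disposition_for(name, inline:)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{URI.encode_www_form_component(name)}"
  end
end

inline_value  = disposition_for("logo.png", inline: true)
unnamed_value = disposition_for(nil, inline: false)
named_value   = disposition_for("file.tar", inline: false)
```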

#documentation? ⇒ Boolean

Public: Is the blob in a documentation directory?

Documentation files are ignored by language statistics.

See “documentation.yml” for a list of documentation conventions that match this pattern.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 245

def documentation?
  path =~ DocumentationRegexp ? true : false
end
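
DocumentationRegexp is built by joining path patterns with `|`, so a path matches if any single pattern matches. A minimal sketch of that construction, using two made-up patterns (the real list lives in “documentation.yml” and is not reproduced here):

```ruby
# Hypothetical patterns for illustration only; not the contents of documentation.yml.
sample_paths = ["^[Dd]ocs?/", "(^|/)README(\\.|$)"]
sample_regexp = Regexp.new(sample_paths.join("|"))

# Same shape as documentation?: coerce the match result to a strict boolean.
def path_matches?(path, regexp)
  path =~ regexp ? true : false
end

docs_hit   = path_matches?("docs/index.md", sample_regexp)
readme_hit = path_matches?("README.md", sample_regexp)
source_hit = path_matches?("lib/readme_parser.rb", sample_regexp)
```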

#empty? ⇒ Boolean

Public: Is the blob empty?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 147

def empty?
  data.nil? || data == ""
end

#encoded_newlines_re ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 287

def encoded_newlines_re
  @encoded_newlines_re ||= Regexp.union(["\r\n", "\r", "\n"].
                                          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) })

end

#encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 101

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/linguist/blob_helper.rb', line 25

def extname
  File.extname(name.to_s)
end
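
Because the name is passed through `to_s`, a nil name yields an empty extension rather than an error. A short illustration of how `File.extname` behaves on a few sample names (the names themselves are made up):

```ruby
# Sample names are illustrative; nil.to_s is "" so File.extname never sees nil.
sample_names = ["foo.rb", "archive.tar.gz", "Makefile", ".gitignore", nil]
extensions = sample_names.map { |name| File.extname(name.to_s) }
```

Only the last extension is reported, and names without a dot, dotfiles, and missing names all give an empty String.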

#first_lines(n) ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 293

def first_lines(n)
  return lines[0...n] if defined? @lines
  return [] unless viewable? && data

  i, c = 0, 0
  while c < n && j = data.index(encoded_newlines_re, i)
    i = j + $&.length
    c += 1
  end
  data[0...i].split(encoded_newlines_re, -1)
end
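
The loop walks forward one newline at a time, so only the prefix of `data` that contains the first n newlines is ever split. A minimal sketch of the same scan on a plain String, assuming a fixed newline pattern in place of `encoded_newlines_re`; `first_lines_of` and `NEWLINE` are hypothetical names:

```ruby
# Fixed newline pattern; a simplification of encoded_newlines_re.
NEWLINE = /\r\n|\r|\n/

# Hypothetical helper following the same scan as first_lines.
def first_lines_of(data, n)
  i = 0 # offset just past the last newline consumed
  c = 0 # number of newlines consumed so far
  while c < n && (j = data.index(NEWLINE, i))
    i = j + Regexp.last_match(0).length
    c += 1
  end
  # Split only the scanned prefix; the -1 limit keeps trailing empty strings.
  data[0...i].split(NEWLINE, -1)
end

head = first_lines_of("a\nb\nc\n", 2)
```

In this sketch `head` is `["a", "b", ""]`: the scanned prefix ends with a newline, and the -1 limit preserves the empty string after it.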

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Blob#data

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 358

def generated?
  @_generated ||= Generated.generated?(path, lambda { data })
end

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines? True when size divided by loc (the average line length) exceeds 5000.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 205

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
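
Since `size / loc` is Integer division, the quotient is truncated before the comparison. A minimal sketch with a hypothetical `long_lines?` function and made-up sizes:

```ruby
# Hypothetical standalone version of the check; size and loc are Integers.
def long_lines?(size, loc)
  return false if loc == 0 # avoid dividing by zero for blobs with no lines

  size / loc > 5000
end

over_threshold = long_lines?(12_000, 2) # 6000 per line on average
at_threshold   = long_lines?(10_001, 2) # truncates to 5000, which is not > 5000
no_lines       = long_lines?(0, 0)
```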

#image? ⇒ Boolean

Public: Is the blob a supported image format?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 161

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end

#include_in_language_stats? ⇒ Boolean

Internal: Should this blob be included in repository language statistics?

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 379

def include_in_language_stats?
  !vendored? &&
  !documentation? &&
  !generated? &&
  language && ( defined?(detectable?) && !detectable?.nil? ?
    detectable? :
    DETECTABLE_TYPES.include?(language.type)
  )
end
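
The expression excludes vendored, documentation and generated blobs first, then lets a non-nil `detectable?` answer override the default check against DETECTABLE_TYPES. A minimal sketch of that precedence as a standalone function; `counted?` and its keyword arguments are hypothetical, and unlike the original it always returns a strict boolean:

```ruby
DETECTABLE = [:programming, :markup].freeze

# Hypothetical helper. `override` stands for the blob's detectable? answer,
# or nil when the blob defines no override.
def counted?(language_type, override: nil, vendored: false, documentation: false, generated: false)
  return false if vendored || documentation || generated
  return false if language_type.nil? # no language detected

  override.nil? ? DETECTABLE.include?(language_type) : override
end

programming_counted = counted?(:programming)
data_counted        = counted?(:data)
data_forced         = counted?(:data, override: true)
vendored_counted    = counted?(:programming, vendored: true)
```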

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/linguist/blob_helper.rb', line 367

def language
  @language ||= Linguist.detect(self)
end

#large? ⇒ Boolean

Public: Is the blob too big to load?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 191

def large?
  size.to_i > MEGABYTE
end

#last_lines(n) ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 305

def last_lines(n)
  if defined? @lines
    if n >= @lines.length
      @lines
    else
      lines[-n..-1]
    end
  end
  return [] unless viewable? && data

  no_eol = true
  i, c = data.length, 0
  k = i
  while c < n && j = data.rindex(encoded_newlines_re, i - 1)
    if c == 0 && j + $&.length == i
      no_eol = false
      n += 1
    end
    i = j
    k = j + $&.length
    c += 1
  end
  r = data[k..-1].split(encoded_newlines_re, -1)
  r.pop if !no_eol
  r
end

#likely_binary? ⇒ Boolean

Internal: Is the blob binary according to its MIME type, overriding it if we have better data from the languages.yml database?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 64

def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/linguist/blob_helper.rb', line 254

def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        # `data` is split after having its last `\n` removed by
        # chomp (if any). This prevents the creation of an empty
        # element after the final `\n` character on POSIX files.
        data.chomp.split(encoded_newlines_re, -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding.  Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
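
The comment above describes matching newlines byte-for-byte in the detected encoding while `data` keeps its binary tag. A minimal sketch of that idea with UTF-16LE content, assuming the detected encoding is known to be "UTF-16LE" and encoding the newline literals directly from Ruby's default source encoding (a simplification of what `encoded_newlines_re` does):

```ruby
# Raw bytes of a UTF-16LE file, tagged as binary the way blob data usually is.
data = "a\nb".encode("UTF-16LE").force_encoding(Encoding::BINARY)

# Encode each newline sequence in the detected encoding, then retag the
# resulting bytes with the encoding of `data` so the two are comparable.
newlines = ["\r\n", "\r", "\n"].map do |nl|
  nl.encode("UTF-16LE").force_encoding(data.encoding)
end
newline_re = Regexp.union(newlines)

# Each entry keeps the same (binary) encoding as `data`.
raw_lines = data.split(newline_re, -1)

# Decoding is only done here to make the result readable.
decoded = raw_lines.map { |line| line.dup.force_encoding("UTF-16LE").encode("UTF-8") }
```

A plain `"\n"` pattern would split a two-byte newline in half; matching the encoded two-byte sequence keeps each line's bytes intact.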

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns an Integer



# File 'lib/linguist/blob_helper.rb', line 337

def loc
  lines.size
end

#mime_type ⇒ Object

Public: Get the actual blob mime type

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.



# File 'lib/linguist/blob_helper.rb', line 48

def mime_type
  _mime_type ? _mime_type.content_type : 'text/plain'
end

#pdf? ⇒ Boolean

Public: Is the blob a PDF?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 182

def pdf?
  extname.downcase == '.pdf'
end

#ruby_encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 107

def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 198

def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end

#sloc ⇒ Object

Public: Get number of source lines of code

Requires Blob#data

Returns an Integer



# File 'lib/linguist/blob_helper.rb', line 346

def sloc
  lines.grep(/\S/).size
end
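
`sloc` differs from `loc` only in skipping lines with no non-whitespace character. A short illustration on a hand-written Array standing in for the result of `lines`:

```ruby
# Hand-written stand-in for the Array returned by lines.
sample_lines = ["def hello", "", "  puts 'hi'", "   ", "end"]

loc_count  = sample_lines.size             # every line counts
sloc_count = sample_lines.grep(/\S/).size  # blank and whitespace-only lines are skipped
```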

#solid? ⇒ Boolean

Public: Is the blob a supported 3D model format?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 168

def solid?
  extname.downcase == '.stl'
end

#text? ⇒ Boolean

Public: Is the blob text?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 154

def text?
  !binary?
end

#tm_scope ⇒ Object

Internal: Get the TextMate compatible scope for the blob



# File 'lib/linguist/blob_helper.rb', line 372

def tm_scope
  language && language.tm_scope
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for a list of vendored conventions that match this pattern.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 230

def vendored?
  path =~ VendoredRegexp ? true : false
end

#viewable? ⇒ Boolean

Public: Is the blob viewable?

Non-viewable blobs will just show a “View Raw” link

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 215

def viewable?
  !large? && text?
end