Module: Linguist::BlobHelper

Included in:
Blob, FileBlob, LazyBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

DEPRECATED: Avoid mixing into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`. Functions are much easier to cache and compose.

Avoid adding additional bloat to this module.

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size”, such as Grit::Blob.

Constant Summary

MEGABYTE = 1024 * 1024
VendoredRegexp = Regexp.new(vendored_paths.join('|'))
DocumentationRegexp = Regexp.new(documentation_paths.join('|'))
DETECTABLE_TYPES = [:programming, :markup].freeze

Instance Method Details

#_mime_type ⇒ Object

Internal: Lookup mime type for extension.

Returns a MIME::Type



# File 'lib/linguist/blob_helper.rb', line 32

def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
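
The `defined?` check in `_mime_type` caches a nil result as well, which `||=` would not. A minimal sketch of the difference, using a hypothetical `Lookup` class that is not part of Linguist:

```ruby
# Hypothetical class for illustration only; not part of Linguist.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Memoizes with `defined?`, so a nil result is cached too.
  def with_defined
    if defined? @with_defined
      @with_defined
    else
      @calls += 1
      @with_defined = nil # pretend the lookup found nothing
    end
  end

  # Memoizes with `||=`, which repeats the lookup while the value is nil.
  def with_or_equals
    @with_or_equals ||= begin
      @calls += 1
      nil
    end
  end
end

a = Lookup.new
2.times { a.with_defined }
puts a.calls # 1

b = Lookup.new
2.times { b.with_or_equals }
puts b.calls # 2
```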

#binary? ⇒ Boolean

Public: Is the blob binary?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 130

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
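
The order of the checks in `binary?` can be sketched as a standalone function. The name `binary_blob?` and the `detection` argument are assumptions for illustration; `detection` stands in for the Hash returned by `detect_encoding`, or nil when detection failed.

```ruby
# Hypothetical standalone version of the decision order in binary?.
def binary_blob?(data, detection)
  return true if data.nil?    # large blobs are never loaded into memory
  return false if data == ""  # blank files are treated as text
  # No detected encoding means the detector could not classify the data.
  return true if detection.nil? || detection[:encoding].nil?
  detection[:type] == :binary
end

puts binary_blob?(nil, nil)                                         # true
puts binary_blob?("", nil)                                          # false
puts binary_blob?("puts 1", { encoding: "UTF-8", type: :text })     # false
puts binary_blob?("\x00\x01", { type: :binary })                    # true
```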

#binary_mime_type? ⇒ Boolean

Internal: Is the blob binary according to its mime type

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 60

def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end

#content_type ⇒ Object

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.



# File 'lib/linguist/blob_helper.rb', line 83

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
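
The nested ternary in `content_type` reads more easily as three branches. A minimal sketch, assuming a hypothetical helper `content_type_for` whose keyword arguments stand in for `binary_mime_type? || binary?`, `mime_type` and `encoding`:

```ruby
# Hypothetical helper mirroring the branches of content_type.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type                                   # serve binary blobs as-is
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"  # text with a known charset
  else
    "text/plain"                                # text, charset unknown
  end
end

puts content_type_for(binary: false, mime_type: "text/x-ruby", encoding: "UTF-8")
# text/plain; charset=utf-8
puts content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
# application/octet-stream
```

Note that text blobs are always served as text/plain, whatever their own mime type is.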

#csv? ⇒ Boolean

Public: Is this blob a CSV file?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 180

def csv?
  text? && extname.downcase == '.csv'
end

#detect_encoding ⇒ Object

Try to guess the encoding.

Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found.


# File 'lib/linguist/blob_helper.rb', line 123

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end

#disposition ⇒ Object

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

Examples

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.



# File 'lib/linguist/blob_helper.rb', line 96

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(name)}"
  end
end
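
A minimal sketch of the three outcomes of `disposition`. The helper `disposition_for` and its `inline:` flag (standing in for `text? || image?`) are assumptions, and the standard library's `ERB::Util.url_encode` is used here only as a stand-in for `EscapeUtils.escape_url`; the two may differ in which characters they escape.

```ruby
require "erb"

# Hypothetical helper mirroring the branches of disposition.
def disposition_for(name, inline:)
  return "inline" if inline          # text and images are shown in place
  return "attachment" if name.nil?   # nothing to put in the filename
  "attachment; filename=#{ERB::Util.url_encode(name)}"
end

puts disposition_for("README.md", inline: true)     # inline
puts disposition_for(nil, inline: false)            # attachment
puts disposition_for("my file.tar", inline: false)  # attachment; filename=my%20file.tar
```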

#documentation? ⇒ Boolean

Public: Is the blob in a documentation directory?

Documentation files are ignored by language statistics.

See “documentation.yml” for a list of documentation conventions that match this pattern.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 250

def documentation?
  path =~ DocumentationRegexp ? true : false
end

#empty? ⇒ Boolean

Public: Is the blob empty?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 152

def empty?
  data.nil? || data == ""
end

#encoded_newlines_re ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 290

def encoded_newlines_re
  @encoded_newlines_re ||= Regexp.union(
    ["\r\n", "\r", "\n"].map do |nl|
      nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding)
    end
  )
end

#encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 106

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/linguist/blob_helper.rb', line 25

def extname
  File.extname(name.to_s)
end
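
Because `extname` delegates to `File.extname(name.to_s)`, a few edge cases follow directly from the standard library. The file names below are made up for illustration:

```ruby
# File.extname is what extname calls; name.to_s makes a nil name safe.
puts File.extname("foo.rb").inspect          # ".rb"
puts File.extname("archive.tar.gz").inspect  # ".gz" (only the last extension)
puts File.extname("Makefile").inspect        # ""
puts File.extname(".gitignore").inspect      # "" (a leading dot is not an extension)
puts File.extname(nil.to_s).inspect          # ""
```

Callers such as `csv?` and `image?` downcase the result before comparing, so `FOO.CSV` and `foo.csv` are treated alike.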

#first_lines(n) ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 296

def first_lines(n)
  return lines[0...n] if defined? @lines
  return [] unless viewable? && data

  i, c = 0, 0
  while c < n && j = data.index(encoded_newlines_re, i)
    i = j + $&.length
    c += 1
  end
  data[0...i].split(encoded_newlines_re, -1)
end

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Blob#data

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 361

def generated?
  @_generated ||= Generated.generated?(path, lambda { data })
end

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 210

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
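
The check is an average: total size divided by line count, compared against 5000. A minimal sketch as a standalone function, where `size` and `loc` are passed in explicitly and the sample numbers are made up:

```ruby
# Standalone version of the check; both arguments are Integers.
def high_ratio_of_long_lines?(size, loc)
  return false if loc == 0  # avoid dividing by zero for empty blobs
  size / loc > 5000         # integer division: average size per line
end

puts high_ratio_of_long_lines?(120_000, 1)    # true, e.g. a one-line minified file
puts high_ratio_of_long_lines?(500_000, 100)  # false, the average is exactly 5000
puts high_ratio_of_long_lines?(0, 0)          # false
```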

#image? ⇒ Boolean

Public: Is the blob a supported image format?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 166

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end

#include_in_language_stats? ⇒ Boolean

Internal: Should this blob be included in repository language statistics?

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 382

def include_in_language_stats?
  !vendored? &&
  !documentation? &&
  !generated? &&
  language && ( defined?(detectable?) && !detectable?.nil? ?
    detectable? :
    DETECTABLE_TYPES.include?(language.type)
  )
end
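
The condition above combines three exclusions with a type check that an explicit `detectable?` value can override. A rough sketch with a hypothetical function `counted?`; its keyword arguments stand in for the blob's own predicates, and unlike the original it returns false rather than nil when no language is detected:

```ruby
DETECTABLE = [:programming, :markup].freeze

# Hypothetical helper; language_type is nil when no language was detected.
def counted?(vendored:, documentation:, generated:, language_type:, detectable: nil)
  return false if vendored || documentation || generated
  return false if language_type.nil?
  # An explicit detectable value wins over the language type.
  detectable.nil? ? DETECTABLE.include?(language_type) : detectable
end

base = { vendored: false, documentation: false, generated: false }
puts counted?(**base, language_type: :programming)             # true
puts counted?(**base, language_type: :data)                    # false
puts counted?(**base, language_type: :data, detectable: true)  # true
puts counted?(**base.merge(vendored: true), language_type: :programming) # false
```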

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/linguist/blob_helper.rb', line 370

def language
  @language ||= Linguist.detect(self)
end

#large? ⇒ Boolean

Public: Is the blob too big to load?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 196

def large?
  size.to_i > MEGABYTE
end

#last_lines(n) ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 308

def last_lines(n)
  if defined? @lines
    if n >= @lines.length
      @lines
    else
      lines[-n..-1]
    end
  end
  return [] unless viewable? && data

  no_eol = true
  i, c = data.length, 0
  k = i
  while c < n && j = data.rindex(encoded_newlines_re, i - 1)
    if c == 0 && j + $&.length == i
      no_eol = false
      n += 1
    end
    i = j
    k = j + $&.length
    c += 1
  end
  r = data[k..-1].split(encoded_newlines_re, -1)
  r.pop if !no_eol
  r
end

#likely_binary? ⇒ Boolean

Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 69

def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/linguist/blob_helper.rb', line 259

def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        data.split(encoded_newlines_re, -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding.  Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
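
The approach described in the comment above can be tried outside Linguist. This is a sketch under stated assumptions: the input is UTF-16LE text held in a binary string, the detected encoding is hard-coded rather than coming from `detect_encoding`, and the newline strings are encoded from UTF-8 literals instead of from "ASCII-8BIT" as in `encoded_newlines_re`.

```ruby
# Hypothetical input: UTF-16LE text stored in a binary (ASCII-8BIT) string.
detected = "UTF-16LE"
data = "a\r\nb\nc".encode(detected).b

# Encode each newline sequence in the detected encoding, then relabel the
# resulting bytes with the encoding of `data` so the regexp can match it.
newlines = ["\r\n", "\r", "\n"].map do |nl|
  nl.encode(detected).force_encoding(data.encoding)
end
newline_re = Regexp.union(newlines)

# Split without re-encoding `data`; each line keeps the binary encoding.
lines = data.split(newline_re, -1)
p lines.map(&:encoding).uniq

# Decoding is only done here to make the result readable.
decoded = lines.map { |l| l.dup.force_encoding(detected).encode("UTF-8") }
p decoded
```

A plain `/\r\n|\r|\n/` would match the newline bytes here but leave the trailing zero byte of each two-byte newline attached to the start of the next line, which is why the newline sequences are encoded first.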

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 340

def loc
  lines.size
end

#mime_type ⇒ Object

Public: Get the actual blob mime type

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.



# File 'lib/linguist/blob_helper.rb', line 53

def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end

#pdf? ⇒ Boolean

Public: Is the blob a PDF?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 187

def pdf?
  extname.downcase == '.pdf'
end

#ruby_encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 112

def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 203

def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end

#sloc ⇒ Object

Public: Get number of source lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 349

def sloc
  lines.grep(/\S/).size
end
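
The difference between `loc` and `sloc` is easiest to see on a small input. In this sketch the data is made up and a plain newline regexp stands in for `encoded_newlines_re`:

```ruby
# Hypothetical file content with a blank line and a whitespace-only line.
data = "a = 1\n\n  \nb = 2\n"

# Same split as in lines: the -1 limit keeps trailing empty strings.
lines = data.split(/\r\n|\r|\n/, -1)
p lines # ["a = 1", "", "  ", "b = 2", ""]

loc  = lines.size             # every entry counts, including the empty tail
sloc = lines.grep(/\S/).size  # only entries with a non-whitespace character

puts loc   # 5
puts sloc  # 2
```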

#solid? ⇒ Boolean

Public: Is the blob a supported 3D model format?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 173

def solid?
  extname.downcase == '.stl'
end

#text? ⇒ Boolean

Public: Is the blob text?

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 159

def text?
  !binary?
end

#tm_scope ⇒ Object

Internal: Get the TextMate compatible scope for the blob



# File 'lib/linguist/blob_helper.rb', line 375

def tm_scope
  language && language.tm_scope
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for a list of vendored conventions that match this pattern.

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 235

def vendored?
  path =~ VendoredRegexp ? true : false
end
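
`VendoredRegexp` is built by joining a list of path patterns with `|`, so a path is vendored if any one pattern matches. A minimal sketch of that construction; the two patterns below are hypothetical examples and are not claimed to be entries from vendor.yml:

```ruby
# Hypothetical patterns, standing in for the list loaded from vendor.yml.
SAMPLE_VENDORED_PATHS = ['(^|/)vendor/', '(^|/)node_modules/'].freeze
SAMPLE_VENDORED_RE = Regexp.new(SAMPLE_VENDORED_PATHS.join('|'))

# Same shape as vendored?: turn the match position (or nil) into a Boolean.
def sample_vendored?(path)
  path =~ SAMPLE_VENDORED_RE ? true : false
end

puts sample_vendored?("vendor/rails/init.rb")       # true
puts sample_vendored?("app/node_modules/a/index.js") # true
puts sample_vendored?("lib/vendor.rb")               # false
```

`documentation?` follows the same pattern with `DocumentationRegexp`.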

#viewable? ⇒ Boolean

Public: Is the blob viewable?

Non-viewable blobs will just show a “View Raw” link

Return true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 220

def viewable?
  !large? && text?
end