Module: Linguist::BlobHelper

Included in:
FileBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

DEPRECATED: Avoid mixing into Blob classes. Prefer functional interfaces like `Language.detect` over `Blob#language`. Functions are much easier to cache and compose.

Avoid adding additional bloat to this module.

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size”, such as Grit::Blob.
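As a rough illustration of what "Blobish" means here, the sketch below uses a cut-down, hypothetical module (`MiniBlobHelper`) with two methods copied from the listings further down this page, and a hypothetical `MemoryBlob` host class. Neither name is part of Linguist; the point is only that the mixin builds everything on the host's `name`, `data` and `size`.

```ruby
# Hypothetical, cut-down stand-in for the mixin (not the real module).
module MiniBlobHelper
  MEGABYTE = 1024 * 1024

  # Relies on the host class providing `name`.
  def extname
    File.extname(name.to_s)
  end

  # Relies on the host class providing `size`.
  def large?
    size.to_i > MEGABYTE
  end
end

# Hypothetical host: anything that responds to name, data and size will do.
MemoryBlob = Struct.new(:name, :data) do
  include MiniBlobHelper

  def size
    data.bytesize
  end
end

blob = MemoryBlob.new("lib/foo.rb", "puts 1\n")
blob.extname # => ".rb"
blob.large?  # => false
```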

Constant Summary

MEGABYTE = 1024 * 1024
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

Instance Method Details

#_mime_type ⇒ Object

Internal: Look up the MIME type for the extension.

Returns a MIME::Type, or nil if no type matches the extension



# File 'lib/linguist/blob_helper.rb', line 35

def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
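The `defined?` check above matters because the lookup can legitimately come back empty. A minimal sketch of the difference, using a hypothetical `Lookup` class whose "expensive" call always returns nil (none of these names come from Linguist):

```ruby
# Hypothetical class comparing two memoization styles for a nil result.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # `||=` runs the lookup again whenever the cached value is nil or false.
  def with_or_equals
    @a ||= expensive
  end

  # `defined?` caches the result even when it is nil.
  def with_defined
    if defined? @b
      @b
    else
      @b = expensive
    end
  end

  private

  # Stand-in for a costly call that finds nothing.
  def expensive
    @calls += 1
    nil
  end
end

first = Lookup.new
2.times { first.with_or_equals }
first.calls # => 2

second = Lookup.new
2.times { second.with_defined }
second.calls # => 1
```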

#binary? ⇒ Boolean

Public: Is the blob binary?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 133

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
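The branch order above can be restated as a standalone function. This is only a sketch: `binary_data?` is a hypothetical name, `detection` stands in for whatever `#detect_encoding` returned (a Hash or nil), and the sample hashes are made-up values rather than real detector output.

```ruby
# Hypothetical helper mirroring the branch order of #binary?.
def binary_data?(data, detection)
  return true  if data.nil?   # blob was never loaded into memory
  return false if data == ""  # blank files are treated as text
  # No usable encoding guess: assume binary.
  return true  if detection.nil? || detection[:encoding].nil?
  detection[:type] == :binary
end

binary_data?(nil, nil)                                        # => true
binary_data?("", nil)                                         # => false
binary_data?("puts 1\n", { encoding: "UTF-8", type: :text })  # => false
```

Note that the empty-string check comes before the encoding check, so an empty blob counts as text even though no encoding can be detected for it.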

#binary_mime_type? ⇒ Boolean

Internal: Is the blob binary according to its mime type?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 63

def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end

#colorize(options = {}) ⇒ Object

Public: Highlight syntax of blob

options - A Hash of options (defaults to {})

Returns an HTML String, or nil if the blob is not safe to colorize



# File 'lib/linguist/blob_helper.rb', line 339

def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end

#content_type ⇒ Object

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.



# File 'lib/linguist/blob_helper.rb', line 86

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
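The nested ternary above is easier to follow when unrolled. A minimal sketch, assuming a hypothetical function `content_type_for` that takes the three inputs as plain arguments instead of reading them from the blob:

```ruby
# Hypothetical unrolled version of the #content_type decision.
# binary    - true when either binary check succeeded
# mime_type - the blob's mime type String
# encoding  - detected encoding name, or nil
def content_type_for(binary, mime_type, encoding)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

content_type_for(false, "application/x-ruby", "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(true, "application/octet-stream", nil)
# => "application/octet-stream"
```

Text blobs are always served as text/plain regardless of their own mime type; only binary blobs keep it.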

#csv? ⇒ Boolean

Public: Is this blob a CSV file?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 176

def csv?
  text? && extname.downcase == '.csv'
end

#detect_encoding ⇒ Object

Try to guess the encoding.

Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found


# File 'lib/linguist/blob_helper.rb', line 126

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end

#disposition ⇒ Object

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

Examples

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.



# File 'lib/linguist/blob_helper.rb', line 99

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end
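A standalone sketch of the same three-way decision follows. It is not the library's code: `disposition_for` is a hypothetical name, the `inline` flag stands in for `text? || image?`, and the standard library's `URI.encode_www_form_component` is swapped in for `EscapeUtils.escape_url` so the example runs without extra gems. The two escapers may differ on some characters.

```ruby
require "uri"

# Hypothetical helper mirroring the branches of #disposition.
def disposition_for(name, inline)
  return "inline" if inline        # text and images render in place
  return "attachment" if name.nil? # nothing to suggest as a filename
  # Only the basename is exposed, never the directory part.
  "attachment; filename=#{URI.encode_www_form_component(File.basename(name))}"
end

disposition_for("docs/readme.txt", true)  # => "inline"
disposition_for(nil, false)               # => "attachment"
disposition_for("build/file.tar", false)  # => "attachment; filename=file.tar"
```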

#encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 109

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/linguist/blob_helper.rb', line 28

def extname
  File.extname(name.to_s)
end

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Blob#data

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 306

def generated?
  @_generated ||= Generated.generated?(name, lambda { data })
end

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines?

These types of files are usually going to make Pygments.rb angry if we try to colorize them.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 213

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
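In effect the check compares the average number of bytes per line against 5000, using integer division. A small sketch with the two inputs passed explicitly (the function name is reused here only for illustration; it takes arguments, unlike the method above):

```ruby
# Standalone illustration of the threshold arithmetic.
# size - blob size in bytes
# loc  - number of lines
def high_ratio_of_long_lines?(size, loc)
  return false if loc == 0 # guard against dividing by zero
  size / loc > 5000        # integer division truncates the average
end

high_ratio_of_long_lines?(10_000, 1) # => true  (10000 bytes per line)
high_ratio_of_long_lines?(10_000, 2) # => false (exactly 5000 is not over)
high_ratio_of_long_lines?(0, 0)      # => false
```

A typical case that trips the check is a minified file, where tens of kilobytes sit on a single line.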

#image? ⇒ Boolean

Public: Is the blob a supported image format?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 162

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/linguist/blob_helper.rb', line 315

def language
  return @language if defined? @language

  if defined?(@data) && @data.is_a?(String)
    data = @data
  else
    data = lambda { (binary_mime_type? || binary?) ? "" : self.data }
  end

  @language = Language.detect(name.to_s, data, mode)
end
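Passing `data` as a lambda lets the detector skip loading the blob when the name alone is enough. The sketch below shows that idea with a hypothetical detector, `detect_by_name_then_content`; it does not reproduce how `Language.detect` actually works, and the `:ruby` and `:script` results are invented for the example.

```ruby
# Hypothetical detector: answers from the file name when it can, and only
# calls `data` (a String or a callable) when it has to look at the content.
def detect_by_name_then_content(name, data)
  return :ruby if File.extname(name) == ".rb"

  content = data.respond_to?(:call) ? data.call : data
  content.start_with?("#!") ? :script : nil
end

loads = 0
loader = lambda { loads += 1; "#!/bin/sh\necho hi\n" }

detect_by_name_then_content("lib/foo.rb", loader) # loader is never called
detect_by_name_then_content("bin/run", loader)    # loader is called once
```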

#large? ⇒ Boolean

Public: Is the blob too big to load?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 192

def large?
  size.to_i > MEGABYTE
end

#lexer ⇒ Object

Internal: Get the lexer of the blob.

Returns a Lexer.



# File 'lib/linguist/blob_helper.rb', line 330

def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end

#likely_binary? ⇒ Boolean

Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 72

def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/linguist/blob_helper.rb', line 247

def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        encoded_newlines = ["\r\n", "\r", "\n"].
          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

        data.split(Regexp.union(encoded_newlines), -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding.  Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
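The newline trick described in the comment above can be tried in isolation. This is a simplified sketch: `split_lines` is a hypothetical helper, the detected encoding is passed in as a plain name, and the newline literals are encoded from their own source encoding rather than from ASCII-8BIT as in the listing.

```ruby
# Hypothetical helper: split `data` on newlines as they would appear in
# `detected_encoding`, without changing the encoding of `data` itself.
def split_lines(data, detected_encoding)
  newlines = ["\r\n", "\r", "\n"].map do |nl|
    # Encode the newline as the file would, then relabel the bytes so they
    # are comparable with `data`.
    nl.encode(detected_encoding).force_encoding(data.encoding)
  end

  # "\r\n" is listed first so it is matched as one separator; the -1 limit
  # keeps trailing empty strings.
  data.split(Regexp.union(newlines), -1)
end

split_lines("a\r\nb\rc\n", "UTF-8") # => ["a", "b", "c", ""]

# UTF-16LE bytes held in a binary-tagged String, as `data` usually is.
utf16 = "one\r\ntwo\n".encode("UTF-16LE").force_encoding(Encoding::BINARY)
pieces = split_lines(utf16, "UTF-16LE")
pieces.first.encoding # => #<Encoding:ASCII-8BIT> (unchanged from `data`)
```

Each piece keeps the encoding of `data`, which is the guarantee the original comment is protecting.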

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 285

def loc
  lines.size
end

#mime_type ⇒ Object

Public: Get the actual blob mime type

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.



# File 'lib/linguist/blob_helper.rb', line 56

def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end

#pdf? ⇒ Boolean

Public: Is the blob a PDF?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 183

def pdf?
  extname.downcase == '.pdf'
end

#ruby_encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 115

def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

We use Pygments for syntax highlighting blobs. Pygments can be too slow for very large blobs or for certain corner-case blobs.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 203

def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end

#sloc ⇒ Object

Public: Get number of source lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 294

def sloc
  lines.grep(/\S/).size
end
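The difference between `loc` and `sloc` comes down to the `/\S/` filter: a line counts as source only if it contains at least one non-whitespace character. A small sketch with hypothetical helper names that take the line array as an argument:

```ruby
# Hypothetical helpers operating on an explicit Array of lines.
def count_loc(lines)
  lines.size
end

def count_sloc(lines)
  lines.grep(/\S/).size # drop empty and whitespace-only lines
end

sample = ["def foo", "", "  bar", "   ", "end", ""]
count_loc(sample)  # => 6
count_sloc(sample) # => 3
```

Comment lines still count toward `sloc`, since they contain non-whitespace characters.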

#solid? ⇒ Boolean

Public: Is the blob a supported 3D model format?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 169

def solid?
  extname.downcase == '.stl'
end

#text? ⇒ Boolean

Public: Is the blob text?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 155

def text?
  !binary?
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for a list of vendored conventions that match this pattern.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 238

def vendored?
  name =~ VendoredRegexp ? true : false
end
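`VendoredRegexp` is built by joining every pattern with `|`, so a path is vendored if any single pattern matches. The sketch below shows that construction with a short, made-up pattern list; the real entries come from vendor.yml and are not reproduced here, and the `SAMPLE_` names are hypothetical.

```ruby
# Hypothetical pattern list standing in for the contents of vendor.yml.
SAMPLE_VENDORED_PATHS = [
  '(^|/)vendor/',
  '(^|/)node_modules/',
  'jquery([^.]*)\.js$'
]

# One alternation, compiled once and reused for every path.
SAMPLE_VENDORED_REGEXP = Regexp.new(SAMPLE_VENDORED_PATHS.join('|'))

def sample_vendored?(name)
  name =~ SAMPLE_VENDORED_REGEXP ? true : false
end

sample_vendored?("vendor/rails/init.rb") # => true
sample_vendored?("public/js/jquery.js")  # => true
sample_vendored?("lib/app.rb")           # => false
```

The explicit `? true : false` turns the match index (or nil) returned by `=~` into a strict Boolean.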

#viewable? ⇒ Boolean

Public: Is the blob viewable?

Non-viewable blobs will just show a “View Raw” link.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 223

def viewable?
  !large? && text?
end