Module: Linguist::BlobHelper

Included in:
FileBlob, LazyBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

DEPRECATED: Avoid mixing this module into Blob classes. Prefer functional interfaces like `Language.detect` over `Blob#language`. Functions are much easier to cache and compose.

Avoid adding additional bloat to this module.

BlobHelper is a mixin for Blob-like classes, such as Grit::Blob, that respond to `name`, `data` and `size`.

Constant Summary

MEGABYTE = 1024 * 1024
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

Instance Method Details

#_mime_type ⇒ Object

Internal: Look up the MIME type for the blob's extension.

Returns a MIME::Type, or nil if no type matches the extension.

# File 'lib/linguist/blob_helper.rb', line 33

def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
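
The selection rule and the `defined?`-based caching above can be sketched without the mime-types gem. In this hypothetical sketch, `Guess` stands in for a `MIME::Type` (only `ascii?` is modeled) and `MimeLookup` is an illustrative class, not part of Linguist. Using `defined?` instead of `||=` means a nil result (no guesses at all) is cached too, rather than looked up again on every call.

```ruby
# Hypothetical stand-in for MIME::Type: only the #ascii? predicate is modeled.
Guess = Struct.new(:name, :ascii) do
  def ascii?
    ascii
  end
end

# Illustrative class (not part of Linguist) mirroring BlobHelper#_mime_type.
class MimeLookup
  attr_reader :lookups

  def initialize(guesses)
    @guesses = guesses
    @lookups = 0
  end

  def mime
    # defined? is truthy once @mime has been assigned, even if it was assigned nil,
    # so an empty result is cached instead of being recomputed on every call.
    return @mime if defined?(@mime)

    @lookups += 1
    # Prefer the first text (ASCII) guess, otherwise take the first guess (nil if none).
    @mime = @guesses.detect(&:ascii?) || @guesses.first
  end
end

lookup = MimeLookup.new([Guess.new("application/octet-stream", false),
                         Guess.new("text/plain", true)])
puts lookup.mime.name # prints text/plain
```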

#binary? ⇒ Boolean

Public: Is the blob binary?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 131

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
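
The order of checks matters: an unloaded (nil) blob counts as binary before anything else is considered, and an empty blob counts as text before the detector is consulted. A dependency-free sketch of that decision order follows; `binary_blob?` is a hypothetical helper, and `detection` stands in for the Hash that `detect_encoding` would return from CharlockHolmes.

```ruby
# Hypothetical helper mirroring the branch order of BlobHelper#binary?.
# `detection` stands in for detect_encoding's Hash (or nil if detection failed).
def binary_blob?(data, detection)
  if data.nil?
    true                                  # large blobs are never loaded: treat as binary
  elsif data == ""
    false                                 # blank files count as text
  elsif detection.nil? || detection[:encoding].nil?
    true                                  # no usable encoding: assume binary
  else
    detection[:type] == :binary           # otherwise trust the detector's verdict
  end
end

p binary_blob?(nil, nil)                                      # prints true
p binary_blob?("", nil)                                       # prints false
p binary_blob?("puts 1", { type: :text, encoding: "UTF-8" })  # prints false
```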

#binary_mime_type? ⇒ Boolean

Internal: Is the blob binary according to its MIME type?

Returns true or false (false if no MIME type is known)

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 61

def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end

#colorize(options = {}) ⇒ Object

Public: Highlight the syntax of the blob

options - A Hash of options passed to the lexer (defaults to {})

Returns an HTML String, or nil if the blob is not safe to colorize

# File 'lib/linguist/blob_helper.rb', line 329

def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end
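
`colorize` fills in a nested `:options` Hash for the lexer, keeps any encoding the caller already supplied, and mutates the caller's Hash in place. The hypothetical helper below isolates just that defaulting step so its behavior can be checked without Pygments.

```ruby
# Hypothetical helper isolating the option defaulting done in BlobHelper#colorize.
def prepare_colorize_options(options, encoding)
  options[:options] ||= {}                    # create the nested lexer options Hash
  options[:options][:encoding] ||= encoding   # a caller-supplied encoding wins
  options                                     # same object the caller passed in
end

opts = {}
prepare_colorize_options(opts, "UTF-8")
p opts
```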

#content_type ⇒ Object

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.

# File 'lib/linguist/blob_helper.rb', line 84

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
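
In other words, blobs that are binary by content or by MIME type are served with their MIME type, while all other blobs are served as `text/plain` (with a charset when one was detected), even if their MIME type is something like `text/html`. The hypothetical function below restates the expression with explicit inputs so each branch can be checked.

```ruby
# Hypothetical restatement of BlobHelper#content_type with explicit inputs.
def content_type_for(binary_mime:, binary:, mime_type:, encoding:)
  if binary_mime || binary
    mime_type                                # binary: serve the real MIME type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"                             # text with no detected encoding
  end
end

puts content_type_for(binary_mime: false, binary: false,
                      mime_type: "text/html", encoding: "UTF-8")
# prints text/plain; charset=utf-8
```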

#csv? ⇒ Boolean

Public: Is this blob a CSV file?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 174

def csv?
  text? && extname.downcase == '.csv'
end

#detect_encoding ⇒ Object

Try to guess the encoding.

Returns a Hash with keys such as :encoding, :ruby_encoding, :confidence and :type.

Returns nil if the blob has no data, if an error occurred during detection, or if no valid encoding could be found.

# File 'lib/linguist/blob_helper.rb', line 124

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
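
Two Ruby details govern this one-liner: the trailing `if data` guards the whole memoizing assignment, so a blob without data returns nil without calling the detector, and `||=` does not cache a nil result, so a failed detection is retried on the next call. The `EncodingProbe` class below is a hypothetical sketch that counts detector calls to show both effects; its `detect` method stands in for CharlockHolmes.

```ruby
# Hypothetical class reproducing the shape of BlobHelper#detect_encoding.
class EncodingProbe
  attr_reader :calls

  def initialize(data, result)
    @data = data
    @result = result   # what the stand-in detector returns (a Hash or nil)
    @calls = 0
  end

  # Stand-in for CharlockHolmes::EncodingDetector.new.detect(data).
  def detect(_data)
    @calls += 1
    @result
  end

  def detect_encoding
    # The modifier `if` applies to the whole assignment; `||=` skips the
    # detector only when the cached value is truthy.
    @detect_encoding ||= detect(@data) if @data
  end
end

probe = EncodingProbe.new("puts 1", { encoding: "UTF-8", type: :text })
2.times { probe.detect_encoding }
puts probe.calls # prints 1
```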

#disposition ⇒ Object

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.

# File 'lib/linguist/blob_helper.rb', line 97

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end
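
The sketch below restates `disposition` with explicit inputs. It swaps in the standard library's `ERB::Util.url_encode` for `EscapeUtils.escape_url`, so the escaping of some characters may differ from Linguist's output; `disposition_for` is a hypothetical name.

```ruby
require "erb"

# Hypothetical restatement of BlobHelper#disposition.
# ERB::Util.url_encode replaces EscapeUtils.escape_url (an assumption; results may differ).
def disposition_for(name, text:, image:)
  if text || image
    "inline"                                  # text and images are served inline
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{ERB::Util.url_encode(File.basename(name))}"
  end
end

puts disposition_for("pkg/file.tar", text: false, image: false)
# prints attachment; filename=file.tar
```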

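#encoding ⇒ Object

Returns the name of the detected encoding (the :encoding entry of #detect_encoding) as a String, or nil if detection failed.
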
# File 'lib/linguist/blob_helper.rb', line 107

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String

# File 'lib/linguist/blob_helper.rb', line 26

def extname
  File.extname(name.to_s)
end
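
`extname` delegates to Ruby's `File.extname`, so it returns only the last extension, returns an empty String for dotfiles and for a nil name (thanks to `to_s`), and preserves case. That last point is why predicates such as `image?`, `pdf?` and `csv?` call `downcase` first. A quick check with the standard library:

```ruby
# Same body as BlobHelper#extname, taking the name as an argument.
def extname(name)
  File.extname(name.to_s)
end

p extname("lib/foo.rb")      # prints ".rb"
p extname("archive.tar.gz")  # prints ".gz"
p extname(".bashrc")         # prints ""
p extname(nil)               # prints ""
p extname("PHOTO.JPG")       # prints ".JPG"
```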

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Blob#data

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 304

def generated?
  @_generated ||= Generated.generated?(name, lambda { data })
end
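
Passing `lambda { data }` instead of `data` lets the check defer loading the blob's content until a rule actually needs it. The sketch below shows that pattern with made-up rules; `generated_file?` and both of its rules are hypothetical and are not Linguist's `Generated` heuristics.

```ruby
# Hypothetical checker illustrating lazy data loading; the rules are made up.
def generated_file?(name, data_loader)
  return true if name.to_s.end_with?(".min.js")   # decided from the name alone
  data = data_loader.call                          # content loaded only when needed
  data.to_s.start_with?("// Code generated")
end

loads = 0
loader = lambda { loads += 1; "puts 1" }
generated_file?("app.min.js", loader)
puts loads # prints 0
```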

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines, i.e. an average of more than 5000 characters of size per line (size / loc)?

These types of files usually cause problems for Pygments.rb if we try to colorize them.

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 211

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
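
`size / loc` is Integer division, so the test compares the average size per line, rounded down, against 5000. A standalone version taking both numbers as arguments (a hypothetical signature) makes the boundary easy to see:

```ruby
# Hypothetical standalone form of BlobHelper#high_ratio_of_long_lines?.
def high_ratio_of_long_lines?(size, loc)
  return false if loc == 0   # avoid dividing by zero for blobs with no lines
  size / loc > 5000          # Integer division: average size per line, truncated
end

p high_ratio_of_long_lines?(12_000, 2) # prints true  (6000 > 5000)
p high_ratio_of_long_lines?(10_001, 2) # prints false (10_001 / 2 == 5000)
```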

#image? ⇒ Boolean

Public: Is the blob a supported image format?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 160

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected

# File 'lib/linguist/blob_helper.rb', line 313

def language
  @language ||= Language.detect(self)
end

#large? ⇒ Boolean

Public: Is the blob too big to load, i.e. larger than MEGABYTE (1 MiB)?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 190

def large?
  size.to_i > MEGABYTE
end
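
`MEGABYTE` is 1024 * 1024 = 1,048,576, and `to_i` makes a nil size count as 0, so a blob is large only when it is strictly bigger than 1 MiB. A standalone sketch (the argument-taking form is hypothetical):

```ruby
MEGABYTE = 1024 * 1024   # 1_048_576, as in BlobHelper

# Hypothetical standalone form of BlobHelper#large?.
def large?(size)
  size.to_i > MEGABYTE   # nil.to_i == 0, so an unknown size is not large
end

p large?(nil)            # prints false
p large?(MEGABYTE)       # prints false
p large?(MEGABYTE + 1)   # prints true
```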

#lexer ⇒ Object

Internal: Get the lexer of the blob.

Returns a Lexer: the detected language's lexer, or the Pygments "Text only" lexer if no language is detected.

# File 'lib/linguist/blob_helper.rb', line 320

def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end

#likely_binary? ⇒ Boolean

Internal: Is the blob binary according to its MIME type? The MIME type is overridden (the result is false) when the languages.yml database recognizes the filename, since that data is better than the MIME type.

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 70

def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines, or an empty Array if the blob is not viewable

# File 'lib/linguist/blob_helper.rb', line 245

def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        encoded_newlines = ["\r\n", "\r", "\n"].
          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

        data.split(Regexp.union(encoded_newlines), -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding.  Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
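
The newline trick described in the comments above can be reproduced on a UTF-16LE file held as ASCII-8BIT bytes. Here `ruby_encoding` is hard-coded to "UTF-16LE" as an assumption, standing in for what `detect_encoding` would report. Splitting on the single byte `\n` would cut UTF-16LE characters in half, whereas the re-encoded newlines match the two-byte sequences.

```ruby
# Bytes of a UTF-16LE file, held as ASCII-8BIT like Blob#data usually is.
data = "a\r\nb\nc".encode("UTF-16LE").force_encoding(Encoding::ASCII_8BIT)
ruby_encoding = "UTF-16LE" # assumption: what detection would report for this data

# Encode each newline sequence as UTF-16LE, then relabel it as binary so it
# can be matched against `data` without re-encoding `data` itself.
encoded_newlines = ["\r\n", "\r", "\n"].map do |nl|
  nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding)
end

# "\r\n" comes first in the union, so it wins over a lone "\r".
lines = data.split(Regexp.union(encoded_newlines), -1)

# Each line is still ASCII-8BIT; decode copies only for display.
p lines.map { |l| l.dup.force_encoding(ruby_encoding).encode("UTF-8") } # prints ["a", "b", "c"]
```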

#loc ⇒ Object

Public: Get the number of lines of code

Requires Blob#data

Returns Integer

# File 'lib/linguist/blob_helper.rb', line 283

def loc
  lines.size
end

#mime_type ⇒ Object

Public: Get the actual blob MIME type

Examples

# => 'text/plain'
# => 'text/html'

Returns a MIME type String ('text/plain' if none is known).

# File 'lib/linguist/blob_helper.rb', line 54

def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end

#pdf? ⇒ Boolean

Public: Is the blob a PDF?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 181

def pdf?
  extname.downcase == '.pdf'
end

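#ruby_encoding ⇒ Object

Returns the :ruby_encoding entry of #detect_encoding (the encoding name that #lines passes to String#encode), or nil if detection failed.
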
# File 'lib/linguist/blob_helper.rb', line 113

def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

We use Pygments to syntax-highlight blobs. Pygments can be too slow for very large blobs (see #large?) or for certain corner-case blobs, such as those with very long lines (see #high_ratio_of_long_lines?).

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 201

def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end

#sloc ⇒ Object

Public: Get the number of source lines of code (lines containing at least one non-whitespace character)

Requires Blob#data

Returns Integer

# File 'lib/linguist/blob_helper.rb', line 292

def sloc
  lines.grep(/\S/).size
end
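
`loc` counts every entry in `lines`, while `sloc` counts only entries containing a non-whitespace character. Because `lines` splits with a limit of -1, a trailing newline yields a final empty entry that `loc` includes and `sloc` skips. A plain-Ruby check, assuming UTF-8 text so a newline regexp can be used directly:

```ruby
data = "def foo\n\n  1\nend\n"
lines = data.split(/\r\n|\r|\n/, -1)  # -1 keeps trailing empty strings

loc = lines.size                # every line, blank or not
sloc = lines.grep(/\S/).size    # only lines with a non-whitespace character

p lines  # prints ["def foo", "", "  1", "end", ""]
p loc    # prints 5
p sloc   # prints 3
```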

#solid? ⇒ Boolean

Public: Is the blob a supported 3D model format?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 167

def solid?
  extname.downcase == '.stl'
end

#text? ⇒ Boolean

Public: Is the blob text?

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 153

def text?
  !binary?
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for the list of vendored path conventions that make up this pattern.

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 236

def vendored?
  name =~ VendoredRegexp ? true : false
end
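
`VendoredRegexp` joins every pattern from vendor.yml into one alternation, so a single match against the path decides the question. The patterns below are hypothetical placeholders, not the contents of vendor.yml.

```ruby
# Hypothetical patterns; the real list is loaded from vendor.yml.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/', '\.min\.js$']
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

# Same body as BlobHelper#vendored?, taking the path as an argument.
def vendored?(name)
  name =~ VendoredRegexp ? true : false  # nil =~ regexp is nil, so a nil name is false
end

p vendored?("vendor/jquery.js")   # prints true
p vendored?("lib/app.rb")         # prints false
```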

#viewable? ⇒ Boolean

Public: Is the blob viewable (not large and text)?

Non-viewable blobs will just show a “View Raw” link.

Returns true or false

Returns:

  • (Boolean)

# File 'lib/linguist/blob_helper.rb', line 221

def viewable?
  !large? && text?
end