Module: Linguist::BlobHelper

Included in:
FileBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
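
As a rough illustration of the contract described above, the sketch below defines a blob-like class that responds to name, data and size and mixes in a small helper module. `TinyBlobHelper` and `MemoryBlob` are hypothetical names invented for this example; the two helper methods only mirror the `extname` and `large?` bodies documented on this page and are not a replacement for including Linguist::BlobHelper itself.

```ruby
# Hypothetical stand-in for a helper mixin; it only relies on the host
# class responding to #name and #size.
module TinyBlobHelper
  MEGABYTE = 1024 * 1024

  def extname
    File.extname(name.to_s)
  end

  def large?
    size.to_i > MEGABYTE
  end
end

# Hypothetical in-memory blob that satisfies the name/data/size contract.
class MemoryBlob
  include TinyBlobHelper

  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data
  end

  def size
    data.bytesize
  end
end

blob = MemoryBlob.new("lib/foo.rb", "puts 'hi'\n")
puts blob.extname
puts blob.large?
```

Any class exposing the same three readers could be substituted for `MemoryBlob`.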

Constant Summary

MEGABYTE =
1024 * 1024
VendoredRegexp =
Regexp.new(vendored_paths.join('|'))

Instance Method Details

#_mime_type ⇒ Object

Internal: Lookup mime type for extension.

Returns a MIME::Type



# File 'lib/linguist/blob_helper.rb', line 29

def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
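
The selection rule above (prefer a text type, otherwise take the first guess) can be sketched without the mime-types gem. `Guess` is a hypothetical stand-in for MIME::Type that exposes only an `ascii?` predicate, and `pick_mime_type` is a name invented for this example.

```ruby
# Hypothetical stand-in for MIME::Type with just the fields the sketch needs.
Guess = Struct.new(:content_type, :text) do
  def ascii?
    text
  end
end

# Prefer the first text-like guess; otherwise fall back to the first guess.
# Returns nil when there are no guesses at all.
def pick_mime_type(guesses)
  guesses.detect { |type| type.ascii? } || guesses.first
end

guesses = [
  Guess.new("application/x-example", false),
  Guess.new("text/x-example", true)
]
puts pick_mime_type(guesses).content_type
```

Because the documented method memoizes with `defined? @_mime_type`, a nil result (no guesses) is cached as well and the lookup is not repeated.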

#binary? ⇒ Boolean

Public: Is the blob binary?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 121

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
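
The order of the checks matters, so here is a minimal sketch of the same decision chain as a pure function. `binary_blob?` is a hypothetical name, and the detection hashes passed in are made-up examples shaped like the Hash that `detect_encoding` is documented to return, not real CharlockHolmes output.

```ruby
# data      - String contents, or nil when the blob was too large to load.
# detection - Hash with :encoding and :type keys, or nil if detection failed.
def binary_blob?(data, detection)
  if data.nil?
    true                          # not loaded, so treated as binary
  elsif data == ""
    false                         # blank files count as text
  elsif detection.nil? || detection[:encoding].nil?
    true                          # the detector could not name an encoding
  else
    detection[:type] == :binary   # otherwise trust the detector's verdict
  end
end

puts binary_blob?(nil, nil)
puts binary_blob?("", nil)
puts binary_blob?("hello", { encoding: "UTF-8", type: :text })
```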

#binary_mime_type? ⇒ Boolean

Internal: Is the blob binary according to its mime type?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 57

def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end

#colorize(options = {}) ⇒ Object

Public: Highlight syntax of blob

options - A Hash of options (defaults to {})

Returns html String



# File 'lib/linguist/blob_helper.rb', line 339

def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end

#colorize_without_wrapper(options = {}) ⇒ Object

Public: Highlight syntax of blob without the outer highlight div wrapper.

options - A Hash of options (defaults to {})

Returns html String



# File 'lib/linguist/blob_helper.rb', line 352

def colorize_without_wrapper(options = {})
  if text = colorize(options)
    text[%r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m, 1]
  else
    ''
  end
end
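
To see what the wrapper-stripping pattern does, the sketch below applies the same regular expression to a hand-written HTML string. The sample markup and the `strip_wrapper` name are assumptions made for illustration; real highlighter output may differ.

```ruby
WRAPPER = %r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m

# Mirrors the documented branches: '' when there is no highlighted HTML at
# all, otherwise the first capture group (nil if the wrapper is absent).
def strip_wrapper(html)
  html ? html[WRAPPER, 1] : ''
end

sample = %(<div class="highlight"><pre><span class="k">def</span> foo\n</pre>\n</div>)
puts strip_wrapper(sample)
```

The `m` flag lets `.` match newlines, so multi-line highlighted code is captured as a single group.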

#content_type ⇒ Object

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.



# File 'lib/linguist/blob_helper.rb', line 80

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
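
The nested ternary above is easier to follow when unrolled. This sketch takes the three inputs as plain arguments; `content_type_for` and its parameter names are hypothetical.

```ruby
# binary    - true when the blob is binary by mime type or by content.
# mime_type - mime type String used for binary blobs.
# encoding  - detected encoding name, or nil when unknown.
def content_type_for(binary, mime_type, encoding)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

puts content_type_for(true, "application/octet-stream", nil)
puts content_type_for(false, "text/html", "UTF-8")
```

Note that a text blob is served as text/plain whatever its own mime type is; only binary blobs keep their specific type.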

#detect_encoding ⇒ Object

Try to guess the encoding

Returns a Hash with :encoding, :confidence and :type keys, or nil if the blob has no data, an error occurred during detection, or no valid encoding could be found.


# File 'lib/linguist/blob_helper.rb', line 114

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end

#disposition ⇒ Object

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.



# File 'lib/linguist/blob_helper.rb', line 93

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end
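
A minimal sketch of the same three-way choice follows. It assumes `ERB::Util.url_encode` from the standard library as a stand-in for `EscapeUtils.escape_url`, so the exact escaping of special characters may differ from the real method; `disposition_for` and the `inline` flag (standing in for `text? || image?`) are hypothetical.

```ruby
require "erb"

# name   - blob path String, or nil.
# inline - true when the blob is text or an image.
def disposition_for(name, inline)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    # Only the basename is exposed, never the directory part of the path.
    "attachment; filename=#{ERB::Util.url_encode(File.basename(name))}"
  end
end

puts disposition_for("README.md", true)
puts disposition_for("dist/file.tar", false)
```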

#encoding ⇒ Object



# File 'lib/linguist/blob_helper.rb', line 103

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/linguist/blob_helper.rb', line 22

def extname
  File.extname(name.to_s)
end
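
Since the method is a thin wrapper around `File.extname`, a few edge cases are worth spelling out. `extname_of` is a hypothetical free-standing version that takes the name as an argument.

```ruby
# Same expression as the documented method, with the name passed in.
def extname_of(name)
  File.extname(name.to_s)
end

["foo.rb", "archive.tar.gz", "Makefile", ".gitignore", nil].each do |name|
  puts "#{name.inspect} -> #{extname_of(name).inspect}"
end
```

Only the last extension is returned, and names without one (including dotfiles and a nil name) yield an empty String rather than nil.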

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Blob#data

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 276

def generated?
  @_generated ||= Generated.generated?(name, lambda { data })
end

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines?

A blob qualifies when its size divided by its line count exceeds 5000. These types of files are usually going to make Pygments.rb angry if we try to colorize them.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 188

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
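
The check is an integer division, which affects the boundary. The sketch below restates it as a free function (a hypothetical signature taking size and loc as arguments) so the arithmetic can be tried directly.

```ruby
# size - blob size (Integer).
# loc  - number of lines (Integer).
def high_ratio_of_long_lines?(size, loc)
  return false if loc == 0
  size / loc > 5000   # Integer#/ truncates, so 10_001 / 2 is 5000
end

puts high_ratio_of_long_lines?(12_000, 2)
puts high_ratio_of_long_lines?(10_001, 2)
```

A two-line blob therefore needs at least 10,002 bytes before it is flagged.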

#image? ⇒ Boolean

Public: Is the blob a supported image format?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 150

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname)
end
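
One consequence of the code as written is that the comparison is case-sensitive, because the extension is not downcased before the lookup. A small sketch, using a hypothetical free-standing `image?` that takes the name as an argument:

```ruby
IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif']

# Same membership test as the documented method, with the name passed in.
def image?(name)
  IMAGE_EXTENSIONS.include?(File.extname(name.to_s))
end

puts image?("logo.png")
puts image?("logo.PNG")
```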

#indexable? ⇒ Boolean

Public: Should the blob be indexed for searching?

Excluded:

  • Files over 100 KB (100 * 1024 bytes)

  • Non-text files

  • Files with no detected language, unless the extension is .txt

  • Languages marked as not searchable

  • Generated source files

Please add additional test coverage to `test/test_blob.rb#test_indexable` if you make any changes.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 292

def indexable?
  if size > 100 * 1024
    false
  elsif binary?
    false
  elsif extname == '.txt'
    true
  elsif language.nil?
    false
  elsif !language.searchable?
    false
  elsif generated?
    false
  else
    true
  end
end
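
The chain above is evaluated top to bottom and the first matching branch wins. The following sketch reproduces that order with plain keyword arguments; `indexable_sketch?` and its parameters are hypothetical, with `searchable: nil` standing for "no language detected".

```ruby
MAX_INDEXABLE_BYTES = 100 * 1024

def indexable_sketch?(size:, binary: false, extname: "", searchable: nil, generated: false)
  return false if size > MAX_INDEXABLE_BYTES
  return false if binary
  return true  if extname == ".txt"   # .txt wins before the language checks
  return false if searchable.nil?     # no language detected
  return false unless searchable
  !generated
end

puts indexable_sketch?(size: 2_000, extname: ".rb", searchable: true)
puts indexable_sketch?(size: 200 * 1024, extname: ".rb", searchable: true)
```

Because the .txt branch comes before the generated check, a small text-like .txt blob is indexable without the language or generated checks being consulted.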

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/linguist/blob_helper.rb', line 315

def language
  return @language if defined? @language

  if defined?(@data) && @data.is_a?(String)
    data = @data
  else
    data = lambda { (binary_mime_type? || binary?) ? "" : self.data }
  end

  @language = Language.detect(name.to_s, data, mode)
end

#large? ⇒ Boolean

Public: Is the blob too big to load?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 166

def large?
  size.to_i > MEGABYTE
end
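
Two details of this one-liner are easy to miss: the comparison is strict, and `to_i` turns a nil size into 0. A minimal sketch with the size passed as an argument (a hypothetical signature):

```ruby
MEGABYTE = 1024 * 1024

# Same comparison as the documented method, with the size passed in.
def large?(size)
  size.to_i > MEGABYTE
end

puts large?(nil)
puts large?(MEGABYTE)
puts large?(MEGABYTE + 1)
```

A blob of exactly 1,048,576 bytes is therefore still considered loadable.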

#lexer ⇒ Object

Internal: Get the lexer of the blob.

Returns a Lexer.



# File 'lib/linguist/blob_helper.rb', line 330

def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end

#likely_binary? ⇒ Boolean

Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 66

def likely_binary?
  binary_mime_type? and not Language.find_by_filename(name)
end

#line_split_character ⇒ Object

Character used to split lines. This is almost always "\n", except when Mac Format is detected, in which case it is "\r".

Returns a split pattern string.



# File 'lib/linguist/blob_helper.rb', line 235

def line_split_character
  @line_split_character ||= (mac_format? ? "\r" : "\n")
end

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/linguist/blob_helper.rb', line 222

def lines
  @lines ||=
    if viewable? && data
      data.split(line_split_character, -1)
    else
      []
    end
end
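
The `-1` limit passed to `String#split` keeps trailing empty strings, which changes the line count for data that ends with a line break. A short sketch, where `split_lines` is a hypothetical helper that omits the `viewable?` guard:

```ruby
# Split on the separator and keep trailing empty fields (limit of -1).
def split_lines(data, separator = "\n")
  data.split(separator, -1)
end

p split_lines("a\nb\n")   # the trailing empty string is kept
p "a\nb\n".split("\n")    # the default limit drops it
p split_lines("")
```

With this behaviour, a two-line file that ends in a newline produces three elements, so `loc` counts the final empty element too.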

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 255

def loc
  lines.size
end

#mac_format? ⇒ Boolean

Public: Is the data in Mac Format? This format uses \r (0x0d) characters for line endings and does not include \n (0x0a).

Returns true when mac format is detected.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 243

def mac_format?
  return if !viewable?
  if pos = data[0, 4096].index("\r")
    data[pos + 1] != ?\n
  end
end
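
The detection only looks at the first carriage return within the first 4096 characters and checks what follows it. The sketch below isolates that logic; it takes the data as an argument and leaves out the `viewable?` guard, so it is an approximation of the method rather than a copy.

```ruby
# Returns true for "\r"-only line endings, false for "\r\n",
# and nil when no "\r" occurs in the first 4096 characters.
def mac_format?(data)
  if pos = data[0, 4096].index("\r")
    data[pos + 1] != "\n"
  end
end

p mac_format?("one\rtwo\rthree")
p mac_format?("one\r\ntwo")
p mac_format?("one\ntwo")
```

The nil result for data without any carriage return is falsy, which is why the caller can use it directly in a ternary.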

#mime_type ⇒ Object

Public: Get the actual blob mime type

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.



# File 'lib/linguist/blob_helper.rb', line 50

def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

We use Pygments.rb for syntax highlighting blobs, which has some quirks and is essentially ‘un-killable’ via a normal timeout. To work around this, we try to avoid handing Pygments.rb anything it can’t handle.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 178

def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end

#sloc ⇒ Object

Public: Get number of source lines of code

Requires Blob#data

Returns Integer



# File 'lib/linguist/blob_helper.rb', line 264

def sloc
  lines.grep(/\S/).size
end
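
The difference between `loc` and `sloc` comes down to the `/\S/` filter, which keeps only lines containing at least one non-whitespace character. A small sketch on a hand-written array of lines; `count_loc` and `count_sloc` are hypothetical helper names.

```ruby
def count_loc(lines)
  lines.size
end

# Count lines that contain at least one non-whitespace character.
def count_sloc(lines)
  lines.grep(/\S/).size
end

lines = ["def foo", "", "  ", "  1", "end", ""]
puts count_loc(lines)
puts count_sloc(lines)
```

Comment-only lines still count as source lines under this definition, since they contain non-whitespace characters.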

#solid? ⇒ Boolean

Public: Is the blob a supported 3D model format?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 157

def solid?
  extname.downcase == '.stl'
end

#text? ⇒ Boolean

Public: Is the blob text?

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 143

def text?
  !binary?
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for a list of vendored conventions that match this pattern.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 213

def vendored?
  name =~ VendoredRegexp ? true : false
end
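
VendoredRegexp is built by joining path patterns with `|`, so a name is vendored if any single pattern matches. The sketch below rebuilds that idea with two made-up patterns; the real list comes from vendor.yml and is not reproduced here, and `VENDORED` and the free-standing `vendored?` are hypothetical.

```ruby
# Hypothetical patterns for illustration; the real ones live in vendor.yml.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/']
VENDORED = Regexp.new(vendored_paths.join('|'))

# =~ returns a match position or nil; the ternary normalises it to a Boolean.
def vendored?(name)
  name =~ VENDORED ? true : false
end

puts vendored?("vendor/jquery.js")
puts vendored?("lib/vendored.rb")
```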

#viewable? ⇒ Boolean

Public: Is the blob viewable?

Non-viewable blobs will just show a “View Raw” link

Returns true or false

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 198

def viewable?
  !large? && text?
end