Module: Linguist::BlobHelper
Overview
DEPRECATED: Avoid mixing this module into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`. Functions are much easier to cache and compose.
Avoid adding additional bloat to this module.
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
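As a rough illustration of the contract described above, the sketch below mixes a stand-in module into a blob-like class. `SketchBlobHelper` and `FakeBlob` are hypothetical names, not part of Linguist; only the three method bodies (`extname`, `large?`, `image?`) are copied from the source shown on this page.

```ruby
# Hypothetical stand-in for Linguist::BlobHelper, reduced to three methods
# whose bodies are copied from this page. They only rely on the host class
# responding to `name` and `size`.
module SketchBlobHelper
  MEGABYTE = 1024 * 1024

  def extname
    File.extname(name.to_s)
  end

  def large?
    size.to_i > MEGABYTE
  end

  def image?
    ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
  end
end

# Hypothetical blob-like class: it exposes name, data and size.
class FakeBlob
  include SketchBlobHelper

  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data
  end

  def size
    data.bytesize
  end
end

blob = FakeBlob.new("logo.PNG", "not really an image")
puts blob.extname
puts blob.image?
puts blob.large?
```

Because the helper methods only call `name`, `data` and `size`, any object that answers those messages can host the mixin.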
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
- DocumentationRegexp = Regexp.new(documentation_paths.join('|'))
- DETECTABLE_TYPES = [:programming, :markup].freeze
Instance Method Summary
- #_mime_type ⇒ Object
  Internal: Lookup mime type for extension.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Internal: Is the blob binary according to its mime type.
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #csv? ⇒ Boolean
  Public: Is this blob a CSV file?
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #documentation? ⇒ Boolean
  Public: Is the blob in a documentation directory?
- #empty? ⇒ Boolean
  Public: Is the blob empty?
- #encoded_newlines_re ⇒ Object
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #first_lines(n) ⇒ Object
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #include_in_language_stats? ⇒ Boolean
  Internal: Should this blob be included in repository language statistics?
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #last_lines(n) ⇒ Object
- #likely_binary? ⇒ Boolean
  Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get number of lines of code.
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #pdf? ⇒ Boolean
  Public: Is the blob a PDF?
- #ruby_encoding ⇒ Object
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #sloc ⇒ Object
  Public: Get number of source lines of code.
- #solid? ⇒ Boolean
  Public: Is the blob a supported 3D model format?
- #text? ⇒ Boolean
  Public: Is the blob text?
- #tm_scope ⇒ Object
  Internal: Get the TextMate compatible scope for the blob.
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
Instance Method Details
#_mime_type ⇒ Object
Internal: Lookup mime type for extension.
Returns a MIME::Type
    # File 'lib/linguist/blob_helper.rb', line 32
    def _mime_type
      if defined? @_mime_type
        @_mime_type
      else
        guesses = ::MIME::Types.type_for(extname.to_s)

        # Prefer text mime types over binary
        @_mime_type = guesses.detect { |type| type.ascii? } ||
          # Otherwise use the first guess
          guesses.first
      end
    end
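The `defined? @_mime_type` guard above caches the result even when no MIME type is found (a nil result), whereas `||=` would repeat the lookup on every call in that case. The sketch below illustrates the difference with a hypothetical `Lookup` class and a call counter; none of these names exist in Linguist.

```ruby
# Hypothetical class used only to compare two memoization styles.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for an expensive lookup that legitimately finds nothing.
  def expensive_lookup
    @calls += 1
    nil
  end

  # Same shape as _mime_type: the ivar is assigned once, even when nil.
  def cached
    if defined? @cached
      @cached
    else
      @cached = expensive_lookup
    end
  end

  # For comparison: ||= re-runs the lookup while the stored value is nil.
  def cached_with_or_equals
    @or_equals ||= expensive_lookup
  end
end

with_defined = Lookup.new
3.times { with_defined.cached }

with_or_equals = Lookup.new
3.times { with_or_equals.cached_with_or_equals }

puts with_defined.calls
puts with_or_equals.calls
```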
#binary? ⇒ Boolean
Public: Is the blob binary?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 130
    def binary?
      # Large blobs aren't even loaded into memory
      if data.nil?
        true

      # Treat blank files as text
      elsif data == ""
        false

      # Charlock doesn't know what to think
      elsif encoding.nil?
        true

      # If Charlock says its binary
      else
        detect_encoding[:type] == :binary
      end
    end
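The order of the checks above can be sketched as a standalone function. This is a minimal illustration, not the library's API: `binary_blob?` is a hypothetical name, and the `detection` argument is a plain Hash standing in for the detector result that `detect_encoding` would return.

```ruby
# data:      String or nil (nil models a blob that was never loaded).
# detection: Hash such as { type: :text, encoding: "UTF-8" }, or nil.
#            This is a stand-in for the detector output, not a real call.
def binary_blob?(data, detection)
  return true  if data.nil?                                   # not loaded
  return false if data == ""                                  # blank is text
  return true  if detection.nil? || detection[:encoding].nil? # no encoding
  detection[:type] == :binary
end

puts binary_blob?(nil, nil)
puts binary_blob?("", nil)
puts binary_blob?("abc", { type: :text, encoding: "UTF-8" })
```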
#binary_mime_type? ⇒ Boolean
Internal: Is the blob binary according to its mime type
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 60
    def binary_mime_type?
      _mime_type ? _mime_type.binary? : false
    end
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.
    # File 'lib/linguist/blob_helper.rb', line 83
    def content_type
      @content_type ||= (binary_mime_type? || binary?) ? mime_type :
        (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
    end
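The nested ternary above reduces to three cases. The sketch below spells them out as a standalone function; `content_type_for` and its keyword arguments are hypothetical and stand in for the blob's own `binary_mime_type? || binary?`, `mime_type` and `encoding` values.

```ruby
# binary:    true when the blob is binary by MIME type or by content.
# mime_type: MIME type String used for binary blobs.
# encoding:  detected encoding name, or nil when none was detected.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

puts content_type_for(binary: false, mime_type: "text/x-ruby", encoding: "UTF-8")
puts content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
```

Note that text blobs are always served as text/plain regardless of their MIME type; only the charset varies.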
#csv? ⇒ Boolean
Public: Is this blob a CSV file?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 180
    def csv?
      text? && extname.downcase == '.csv'
    end
#detect_encoding ⇒ Object
Try to guess the encoding
Returns a Hash with :encoding, :confidence and :type keys, or nil if the blob has no data, an error occurred during detection, or no valid encoding could be found.
    # File 'lib/linguist/blob_helper.rb', line 123
    def detect_encoding
      @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
    end
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
Examples
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.
    # File 'lib/linguist/blob_helper.rb', line 96
    def disposition
      if text? || image?
        'inline'
      elsif name.nil?
        "attachment"
      else
        "attachment; filename=#{EscapeUtils.escape_url(name)}"
      end
    end
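A standalone sketch of the same branching follows. It is an approximation under stated assumptions: `disposition_for` is a hypothetical name, and the standard library's `ERB::Util.url_encode` is swapped in for `EscapeUtils.escape_url`, so the exact escaping may differ from the real method for some characters.

```ruby
require "erb"

# name:   file name String, or nil when the blob has no name.
# inline: true when the blob is text or a supported image.
# ERB::Util.url_encode is a stdlib substitute for EscapeUtils.escape_url.
def disposition_for(name, inline:)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{ERB::Util.url_encode(name)}"
  end
end

puts disposition_for("README.md", inline: true)
puts disposition_for("my file.tar", inline: false)
```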
#documentation? ⇒ Boolean
Public: Is the blob in a documentation directory?
Documentation files are ignored by language statistics.
See “documentation.yml” for a list of documentation conventions that match this pattern.
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 250
    def documentation?
      path =~ DocumentationRegexp ? true : false
    end
#empty? ⇒ Boolean
Public: Is the blob empty?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 152
    def empty?
      data.nil? || data == ""
    end
#encoded_newlines_re ⇒ Object
    # File 'lib/linguist/blob_helper.rb', line 290
    def encoded_newlines_re
      @encoded_newlines_re ||= Regexp.union(["\r\n", "\r", "\n"].
        map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) })
    end
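To see why the newline sequences are re-encoded before matching, consider UTF-16LE content held in a binary-tagged string. The sketch below builds the pattern the same way as the method above, with the detected encoding hard-coded as an assumption instead of coming from `ruby_encoding`.

```ruby
# Bytes of "a\nb" in UTF-16LE, tagged as binary the way raw blob data often is.
data = "a\nb".encode("UTF-16LE").b

# Assumed detector output, hard-coded for this sketch.
detected = "UTF-16LE"

# Encode each newline sequence in the detected encoding, then relabel the
# resulting bytes with the encoding of `data` so the pattern can match it.
newlines_re = Regexp.union(["\r\n", "\r", "\n"].map { |nl|
  nl.encode(detected, "ASCII-8BIT").force_encoding(data.encoding)
})

aligned = data.split(newlines_re, -1) # splits on the two-byte newline
naive   = data.split("\n".b, -1)      # splits on the single 0x0A byte

p aligned
p naive
```

Splitting on the single newline byte leaves the trailing zero byte of the UTF-16LE newline at the start of the next line, which shifts every following character by one byte.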
#encoding ⇒ Object
    # File 'lib/linguist/blob_helper.rb', line 106
    def encoding
      if hash = detect_encoding
        hash[:encoding]
      end
    end
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
    # File 'lib/linguist/blob_helper.rb', line 25
    def extname
      File.extname(name.to_s)
    end
#first_lines(n) ⇒ Object
    # File 'lib/linguist/blob_helper.rb', line 296
    def first_lines(n)
      return lines[0...n] if defined? @lines
      return [] unless viewable? && data

      i, c = 0, 0
      while c < n && j = data.index(encoded_newlines_re, i)
        i = j + $&.length
        c += 1
      end
      data[0...i].split(encoded_newlines_re, -1)
    end
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Blob#data
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 361
    def generated?
      @_generated ||= Generated.generated?(path, lambda { data })
    end
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines? This is true when the average line length (size divided by loc) exceeds 5000.
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 210
    def high_ratio_of_long_lines?
      return false if loc == 0
      size / loc > 5000
    end
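The check above is integer arithmetic on two numbers, so it is easy to try in isolation. In this sketch, `long_lines_ratio_high?` is a hypothetical name and both arguments are assumed to be Integers, as `size` and `loc` would be.

```ruby
# size: blob size, loc: number of lines. Both are assumed to be Integers,
# so the division truncates toward zero.
def long_lines_ratio_high?(size, loc)
  return false if loc == 0
  size / loc > 5000
end

puts long_lines_ratio_high?(10_000, 1) # average of 10000 per line
puts long_lines_ratio_high?(10_000, 2) # average of exactly 5000 per line
```

The comparison is strict, so an average of exactly 5000 does not count as a high ratio.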
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 166
    def image?
      ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
    end
#include_in_language_stats? ⇒ Boolean
Internal: Should this blob be included in repository language statistics?
    # File 'lib/linguist/blob_helper.rb', line 382
    def include_in_language_stats?
      !vendored? &&
      !documentation? &&
      !generated? &&
      language && ( defined?(detectable?) && !detectable?.nil? ?
        detectable? :
        DETECTABLE_TYPES.include?(language.type)
      )
    end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
    # File 'lib/linguist/blob_helper.rb', line 370
    def language
      @language ||= Linguist.detect(self)
    end
#large? ⇒ Boolean
Public: Is the blob too big to load?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 196
    def large?
      size.to_i > MEGABYTE
    end
#last_lines(n) ⇒ Object
    # File 'lib/linguist/blob_helper.rb', line 308
    def last_lines(n)
      if defined? @lines
        if n >= @lines.length
          @lines
        else
          lines[-n..-1]
        end
      end
      return [] unless viewable? && data

      no_eol = true
      i, c = data.length, 0
      k = i
      while c < n && j = data.rindex(encoded_newlines_re, i - 1)
        if c == 0 && j + $&.length == i
          no_eol = false
          n += 1
        end
        i = j
        k = j + $&.length
        c += 1
      end
      r = data[k..-1].split(encoded_newlines_re, -1)
      r.pop if !no_eol
      r
    end
#likely_binary? ⇒ Boolean
Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 69
    def likely_binary?
      binary_mime_type? && !Language.find_by_filename(name)
    end
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines
    # File 'lib/linguist/blob_helper.rb', line 259
    def lines
      @lines ||=
        if viewable? && data
          # `data` is usually encoded as ASCII-8BIT even when the content has
          # been detected as a different encoding. However, we are not allowed
          # to change the encoding of `data` because we've made the implicit
          # guarantee that each entry in `lines` is encoded the same way as
          # `data`.
          #
          # Instead, we re-encode each possible newline sequence as the
          # detected encoding, then force them back to the encoding of `data`
          # (usually a binary encoding like ASCII-8BIT). This means that the
          # byte sequence will match how newlines are likely encoded in the
          # file, but we don't have to change the encoding of `data` as far as
          # Ruby is concerned. This allows us to correctly parse out each line
          # without changing the encoding of `data`, and
          # also--importantly--without having to duplicate many (potentially
          # large) strings.
          begin
            data.split(encoded_newlines_re, -1)
          rescue Encoding::ConverterNotFoundError
            # The data is not splittable in the detected encoding. Assume it's
            # one big line.
            [data]
          end
        else
          []
        end
    end
#loc ⇒ Object
Public: Get number of lines of code
Requires Blob#data
Returns Integer
    # File 'lib/linguist/blob_helper.rb', line 340
    def loc
      lines.size
    end
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.
    # File 'lib/linguist/blob_helper.rb', line 53
    def mime_type
      _mime_type ? _mime_type.to_s : 'text/plain'
    end
#pdf? ⇒ Boolean
Public: Is the blob a PDF?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 187
    def pdf?
      extname.downcase == '.pdf'
    end
#ruby_encoding ⇒ Object
    # File 'lib/linguist/blob_helper.rb', line 112
    def ruby_encoding
      if hash = detect_encoding
        hash[:ruby_encoding]
      end
    end
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 203
    def safe_to_colorize?
      !large? && text? && !high_ratio_of_long_lines?
    end
#sloc ⇒ Object
Public: Get number of source lines of code
Requires Blob#data
Returns Integer
    # File 'lib/linguist/blob_helper.rb', line 349
    def sloc
      lines.grep(/\S/).size
    end
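The difference between `loc` and `sloc` shows up on any text with blank lines. The sketch below uses a plain ASCII newline pattern as a simplification of `encoded_newlines_re` and applies the same `split` and `grep` calls to a sample string.

```ruby
data = "a = 1\n\n   \nputs a\n"

# Simplified newline pattern; the real method re-encodes it per blob.
lines = data.split(/\r\n|\r|\n/, -1)

loc  = lines.size             # every element, including empty ones
sloc = lines.grep(/\S/).size  # only lines with a non-whitespace character

p lines
puts loc
puts sloc
```

Because `split` is called with a limit of -1, a trailing newline produces an empty final element, which `loc` counts and `sloc` does not.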
#solid? ⇒ Boolean
Public: Is the blob a supported 3D model format?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 173
    def solid?
      extname.downcase == '.stl'
    end
#text? ⇒ Boolean
Public: Is the blob text?
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 159
    def text?
      !binary?
    end
#tm_scope ⇒ Object
Internal: Get the TextMate compatible scope for the blob
    # File 'lib/linguist/blob_helper.rb', line 375
    def tm_scope
      language && language.tm_scope
    end
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 235
    def vendored?
      path =~ VendoredRegexp ? true : false
    end
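The path check above relies on a single regular expression built by joining a list of patterns with `|`, as in the VendoredRegexp constant. The sketch below reproduces that construction with two made-up patterns; they are illustrative assumptions, not the actual contents of vendor.yml.

```ruby
# Hypothetical patterns for illustration; the real list lives in vendor.yml.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/']

# Same construction as the VendoredRegexp constant: one alternation.
vendored_re = Regexp.new(vendored_paths.join('|'))

# Same shape as vendored?: coerce the match position (or nil) to a Boolean.
vendored = ->(path) { path =~ vendored_re ? true : false }

puts vendored.call("vendor/jquery.js")
puts vendored.call("lib/node_modules/x/index.js")
puts vendored.call("src/vendored.rb")
```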
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link
Return true or false
    # File 'lib/linguist/blob_helper.rb', line 220
    def viewable?
      !large? && text?
    end