Module: Linguist::BlobHelper
Overview
DEPRECATED: Avoid mixing BlobHelper into Blob classes. Prefer functional interfaces like `Language.detect` over `Blob#language`; functions are much easier to cache and compose.
Avoid adding further bloat to this module.
BlobHelper is a mixin for Blob-like classes, such as Grit::Blob, that respond to “name”, “data”, and “size”.
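To illustrate the mixin contract, here is a minimal sketch built around a hypothetical `TinyBlobHelper` module that copies just two of the methods documented below (`extname` and `large?`); the real module also depends on gems such as MIME::Types and CharlockHolmes. Any class that responds to `name`, `data` and `size`, such as the hypothetical `FakeBlob` struct here, can include it:

```ruby
# Hypothetical, trimmed-down stand-in for Linguist::BlobHelper.
module TinyBlobHelper
  MEGABYTE = 1024 * 1024

  # Mirrors BlobHelper#extname: the extension of the blob's path.
  def extname
    File.extname(name.to_s)
  end

  # Mirrors BlobHelper#large?: is the blob bigger than one megabyte?
  def large?
    size.to_i > MEGABYTE
  end
end

# A Blob-like class: it only has to respond to name, data and size.
FakeBlob = Struct.new(:name, :data) do
  include TinyBlobHelper

  # Byte size of the contents (overrides Struct#size, the member count).
  def size
    data ? data.bytesize : 0
  end
end

blob = FakeBlob.new("lib/foo.rb", "puts 'hi'\n")
puts blob.extname # prints .rb
p blob.large?     # => false
```

Because the helper only ever calls `name`, `data` and `size`, the same module can serve Git blobs, files on disk, or test doubles.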
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
Instance Method Summary
- #_mime_type ⇒ Object
  Internal: Look up the mime type for the extension.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Internal: Is the blob binary according to its mime type?
- #colorize(options = {}) ⇒ Object
  Public: Highlight the syntax of the blob.
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #csv? ⇒ Boolean
  Public: Is this blob a CSV file?
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #language ⇒ Object
  Public: Detect the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #lexer ⇒ Object
  Internal: Get the lexer of the blob.
- #likely_binary? ⇒ Boolean
  Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database?
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get the number of lines of code.
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #pdf? ⇒ Boolean
  Public: Is the blob a PDF?
- #ruby_encoding ⇒ Object
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #sloc ⇒ Object
  Public: Get the number of source lines of code.
- #solid? ⇒ Boolean
  Public: Is the blob a supported 3D model format?
- #text? ⇒ Boolean
  Public: Is the blob text?
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
Instance Method Details
#_mime_type ⇒ Object
Internal: Look up the mime type for the extension.
Returns a MIME::Type, or nil if there is no match.
# File 'lib/linguist/blob_helper.rb', line 33
def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
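The selection rule ("prefer a text type, otherwise take the first guess") can be shown without the mime-types gem. In this sketch a hypothetical `Guess` struct, whose `ascii?` flag stands in for `MIME::Type#ascii?`, replaces real MIME types, and the guess lists are made up:

```ruby
# Hypothetical stand-in for MIME::Type: just a name and an ascii? flag.
Guess = Struct.new(:name, :ascii) do
  def ascii?
    ascii
  end
end

# Same rule as BlobHelper#_mime_type: first text-like guess, else the first guess.
def pick_mime_type(guesses)
  guesses.detect { |type| type.ascii? } || guesses.first
end

binary_first = [Guess.new("application/x-foo", false), Guess.new("text/x-foo", true)]
only_binary  = [Guess.new("application/octet-stream", false)]

puts pick_mime_type(binary_first).name # prints text/x-foo
puts pick_mime_type(only_binary).name  # prints application/octet-stream
p pick_mime_type([])                   # => nil
```

Note that the original memoizes with `defined? @_mime_type` rather than `||=`, so a nil result (no guesses at all) is cached as well.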
#binary? ⇒ Boolean
Public: Is the blob binary?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 131
def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
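The branch order in `binary?` matters: a nil `data` (a large blob that was never loaded) counts as binary, an empty string counts as text, and only then is the detector consulted. The sketch below keeps that order but swaps CharlockHolmes for a deliberately naive stand-in (any NUL byte means binary); that heuristic is an assumption for illustration, not Linguist's behavior. It also drops the `encoding.nil?` branch, since the stand-in always returns a result:

```ruby
# Hypothetical detector standing in for CharlockHolmes: a NUL byte means binary.
def naive_detect(data)
  { type: data.include?("\0") ? :binary : :text }
end

# Same branch order as BlobHelper#binary?, with data passed in.
def binary_data?(data)
  if data.nil?
    true  # large blobs are never loaded, so data is nil
  elsif data == ""
    false # blank files count as text
  else
    naive_detect(data)[:type] == :binary
  end
end

p binary_data?(nil)            # => true
p binary_data?("")             # => false
p binary_data?("hello\n")      # => false
p binary_data?("PK\x03\x04\0") # => true
```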
#binary_mime_type? ⇒ Boolean
Internal: Is the blob binary according to its mime type?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 61
def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end
#colorize(options = {}) ⇒ Object
Public: Highlight the syntax of the blob.
options - A Hash of options (defaults to {})
Returns an HTML String, or nil if the blob is not safe to colorize.
# File 'lib/linguist/blob_helper.rb', line 329
def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end
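`colorize` fills in defaults inside the caller's options Hash before handing it to the lexer. This sketch isolates that step in a hypothetical `prepare_colorize_options` helper (not part of Linguist) so it can run without Pygments:

```ruby
# Hypothetical helper isolating the option handling in BlobHelper#colorize.
# Like colorize, it mutates the Hash it is given.
def prepare_colorize_options(options, detected_encoding)
  options[:options] ||= {}
  options[:options][:encoding] ||= detected_encoding
  options
end

defaults = prepare_colorize_options({}, "UTF-8")
puts defaults[:options][:encoding] # prints UTF-8

explicit = prepare_colorize_options({ options: { encoding: "ISO-8859-1" } }, "UTF-8")
puts explicit[:options][:encoding] # prints ISO-8859-1: the caller's value wins
```

Because `||=` only assigns when the key is missing (or nil/false), an encoding supplied by the caller always takes precedence over the detected one.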
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.
# File 'lib/linguist/blob_helper.rb', line 84
def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
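The decision in `content_type` can be sketched as a plain function. Here `content_type_for` is hypothetical, and its three arguments stand in for `binary_mime_type? || binary?`, `mime_type` and `encoding`:

```ruby
# Hypothetical free-function version of BlobHelper#content_type.
def content_type_for(binary, mime_type, encoding)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

puts content_type_for(false, "text/x-ruby", "UTF-8")          # prints text/plain; charset=utf-8
puts content_type_for(true, "application/octet-stream", nil)  # prints application/octet-stream
puts content_type_for(false, "text/html", nil)                # prints text/plain
```

Only binary blobs keep their detected mime type; every text blob is served as `text/plain`, even when its mime type is something like `text/html`.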
#csv? ⇒ Boolean
Public: Is this blob a CSV file?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 174
def csv?
  text? && extname.downcase == '.csv'
end
#detect_encoding ⇒ Object
Try to guess the encoding.
Returns a Hash with :encoding, :confidence, and :type keys, or nil if there is no data, an error occurred during detection, or no valid encoding could be found.
# File 'lib/linguist/blob_helper.rb', line 124
def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
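Two details of `detect_encoding` are easy to miss: the trailing `if data` guards the whole memoized assignment, and `||=` caches a Hash result but re-runs detection whenever the previous result was nil. The hypothetical `MemoDemo` class below reproduces the pattern; its stand-in detector (based on `String#valid_encoding?`) is an assumption, not CharlockHolmes:

```ruby
# Hypothetical class showing the memoization pattern of
# BlobHelper#detect_encoding; `detections` counts detector runs.
class MemoDemo
  attr_reader :detections

  def initialize(data)
    @data = data
    @detections = 0
  end

  def detect_encoding
    # Parsed as: (@detect_encoding ||= run_detector) if @data
    @detect_encoding ||= run_detector if @data
  end

  private

  # Stand-in detector: returns nil when the data is not valid in its encoding.
  def run_detector
    @detections += 1
    return nil unless @data.valid_encoding?
    { encoding: @data.encoding.name, type: :text }
  end
end

ok = MemoDemo.new("hello")
2.times { ok.detect_encoding }
p ok.detections  # => 1: the Hash result is cached

bad = MemoDemo.new("\xFF".dup.force_encoding("UTF-8"))
2.times { bad.detect_encoding }
p bad.detections # => 2: a nil result is not cached by ||=
```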
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
Examples
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.
# File 'lib/linguist/blob_helper.rb', line 97
def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end
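A runnable sketch of the same decision, using a hypothetical `disposition_for(name, inline)` where `inline` stands in for `text? || image?`. It swaps in the standard library's `ERB::Util.url_encode` for `EscapeUtils.escape_url`, so the escaping of some characters may differ from Linguist's:

```ruby
require "erb"

# Hypothetical free-function version of BlobHelper#disposition.
# ERB::Util.url_encode is a stand-in for EscapeUtils.escape_url.
def disposition_for(name, inline)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{ERB::Util.url_encode(File.basename(name))}"
  end
end

puts disposition_for("pkg/file.tar", false) # prints attachment; filename=file.tar
puts disposition_for("logo.png", true)      # prints inline
puts disposition_for(nil, false)            # prints attachment
```

Only the basename is used, so directory components of the path never leak into the header.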
#encoding ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 107
def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
# File 'lib/linguist/blob_helper.rb', line 26
def extname
  File.extname(name.to_s)
end
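Because `extname` delegates to Ruby's `File.extname`, it inherits that method's edge cases. A quick sketch with made-up file names, using a hypothetical `extname_of` that takes the name as an argument:

```ruby
# Same body as BlobHelper#extname, with the name passed in.
def extname_of(name)
  File.extname(name.to_s)
end

p extname_of("foo.rb")         # => ".rb"
p extname_of("archive.tar.gz") # => ".gz" (only the last extension)
p extname_of(".bashrc")        # => "" (a leading dot is not an extension)
p extname_of(nil)              # => "" (nil.to_s is "")
```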
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Blob#data
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 304
def generated?
  @_generated ||= Generated.generated?(name, lambda { data })
end
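`generated?` passes `lambda { data }` rather than `data`, so the blob's contents are only loaded if a check actually needs them. A sketch of that lazy-loading pattern, where `looks_generated?` and its two rules are hypothetical (they are not Linguist's Generated rules):

```ruby
# Hypothetical stand-in for Generated.generated?(name, data): the cheap,
# name-based check runs first; the data lambda is called only if needed.
def looks_generated?(name, data)
  return true if name.end_with?(".min.js")
  data.call.start_with?("// Code generated")
end

loads = 0
loader = lambda { loads += 1; "// Code generated by a tool\n" }

p looks_generated?("app.min.js", loader) # => true, without loading data
p loads                                  # => 0
p looks_generated?("app.go", loader)     # => true, after loading data
p loads                                  # => 1
```

Note also that `@_generated ||= ...` only caches a true result: when the previous value is false, `||=` evaluates the right-hand side again on the next call.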
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines?
These types of files tend to cause trouble for Pygments.rb when we try to colorize them. The check is whether the average line length, size / loc (integer division), exceeds 5000.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 211
def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
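The threshold is easiest to see with a few numbers. This sketch uses a hypothetical `high_ratio_of_long_lines_for?(size, loc)` with the same arithmetic; note that integer division truncates the average before the comparison:

```ruby
# Same arithmetic as BlobHelper#high_ratio_of_long_lines?, with size and
# loc passed in. Integer division truncates the average line length.
def high_ratio_of_long_lines_for?(size, loc)
  return false if loc == 0
  size / loc > 5000
end

p high_ratio_of_long_lines_for?(12_000, 2) # => true (6000 per line)
p high_ratio_of_long_lines_for?(10_001, 2) # => false (10_001 / 2 == 5000)
p high_ratio_of_long_lines_for?(0, 0)      # => false (no lines at all)
```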
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 160
def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end
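The extension predicates (`image?`, `solid?`, `pdf?`) all lowercase the extension first, so matching is case-insensitive. A sketch with hypothetical free-function versions and made-up file names:

```ruby
# Hypothetical free-function versions of the extension predicates.
IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif'].freeze

def image_name?(name)
  IMAGE_EXTENSIONS.include?(File.extname(name.to_s).downcase)
end

def solid_name?(name)
  File.extname(name.to_s).downcase == '.stl'
end

def pdf_name?(name)
  File.extname(name.to_s).downcase == '.pdf'
end

p image_name?("Logo.PNG") # => true
p image_name?("icon.svg") # => false (SVG is not in the list)
p solid_name?("part.STL") # => true
p pdf_name?("paper.pdf")  # => true
```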
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
# File 'lib/linguist/blob_helper.rb', line 313
def language
  @language ||= Language.detect(self)
end
#large? ⇒ Boolean
Public: Is the blob too big to load?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 190
def large?
  size.to_i > MEGABYTE
end
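A sketch of the boundary, using a hypothetical `large_size?(size)` with the same comparison. The check is strictly greater than MEGABYTE, and `size.to_i` turns a nil size into 0:

```ruby
MEGABYTE = 1024 * 1024

# Same comparison as BlobHelper#large?, with size passed in.
def large_size?(size)
  size.to_i > MEGABYTE
end

p large_size?(MEGABYTE)     # => false (exactly 1 MiB is not "too big")
p large_size?(MEGABYTE + 1) # => true
p large_size?(nil)          # => false
```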
#lexer ⇒ Object
Internal: Get the lexer of the blob.
Returns a Lexer.
# File 'lib/linguist/blob_helper.rb', line 320
def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end
#likely_binary? ⇒ Boolean
Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 70
def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines, or an empty Array if the blob is not viewable or has no data.
# File 'lib/linguist/blob_helper.rb', line 245
def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        encoded_newlines = ["\r\n", "\r", "\n"].
          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

        data.split(Regexp.union(encoded_newlines), -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding. Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
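The newline trick described in the comment above can be tried in isolation. This sketch assumes UTF-16LE content delivered as raw ASCII-8BIT bytes and uses a hypothetical `split_lines(data, detected)` function in place of the method. Unlike the original, it encodes the newline literals from their UTF-8 source encoding instead of passing "ASCII-8BIT" as the source, which yields the same bytes for these ASCII characters:

```ruby
# Split binary-encoded data into lines without re-encoding it, following
# the approach of BlobHelper#lines. `detected` stands in for ruby_encoding.
def split_lines(data, detected)
  encoded_newlines = ["\r\n", "\r", "\n"].map do |nl|
    # Newline bytes as they appear in the detected encoding, relabeled with
    # data's encoding so they can be used to split it.
    nl.encode(detected).force_encoding(data.encoding)
  end
  data.split(Regexp.union(encoded_newlines), -1)
rescue Encoding::ConverterNotFoundError
  # No converter for the detected encoding: treat the data as one line.
  [data]
end

# UTF-16LE text stored as raw bytes, the way blob data usually arrives.
raw = "a\nb\r\nc".encode("UTF-16LE").b

lines = split_lines(raw, "UTF-16LE")
p lines.size                                          # => 3
p lines.all? { |l| l.encoding == Encoding::ASCII_8BIT } # => true
p lines.map { |l| l.dup.force_encoding("UTF-16LE").encode("UTF-8") } # => ["a", "b", "c"]
```

Splitting the same bytes on a plain "\n" would leave a stray NUL byte at the start of every line after the first, because in UTF-16LE a newline is the two bytes 0A 00.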
#loc ⇒ Object
Public: Get the number of lines of code, counting blank lines too.
Requires Blob#data
Returns an Integer.
# File 'lib/linguist/blob_helper.rb', line 283
def loc
  lines.size
end
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.
# File 'lib/linguist/blob_helper.rb', line 54
def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end
#pdf? ⇒ Boolean
Public: Is the blob a PDF?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 181
def pdf?
  extname.downcase == '.pdf'
end
#ruby_encoding ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 113
def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
We use Pygments to syntax-highlight blobs. Pygments can be too slow for very large blobs or for certain corner-case blobs.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 201
def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end
#sloc ⇒ Object
Public: Get the number of source lines of code, i.e. lines that contain at least one non-whitespace character.
Requires Blob#data
Returns an Integer.
# File 'lib/linguist/blob_helper.rb', line 292
def sloc
  lines.grep(/\S/).size
end
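The difference between `loc` and `sloc` on a made-up Array of lines, like the one `lines` returns:

```ruby
# Made-up lines; blank and whitespace-only entries count for loc only.
lines = ["def hi", "", "  puts 'hi'", "   ", "end"]

loc  = lines.size            # every line, blank or not
sloc = lines.grep(/\S/).size # only lines with a non-whitespace character

p loc  # => 5
p sloc # => 3
```

Because `lines` splits with a limit of -1, a file ending in a newline gets a trailing empty entry, which `loc` counts and `sloc` does not.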
#solid? ⇒ Boolean
Public: Is the blob a supported 3D model format?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 167
def solid?
  extname.downcase == '.stl'
end
#text? ⇒ Boolean
Public: Is the blob text?
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 153
def text?
  !binary?
end
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 236
def vendored?
  name =~ VendoredRegexp ? true : false
end
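`VendoredRegexp` joins every vendored pattern into one alternation, and `vendored?` tests the blob's path against it. In this sketch the three patterns are hypothetical stand-ins for the entries in vendor.yml (the real list is much longer), and `vendored_name?` is a hypothetical free-function version of the method:

```ruby
# Hypothetical patterns standing in for vendor.yml entries.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/', '\.min\.js$']
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

# Same body as BlobHelper#vendored?, with the name passed in.
def vendored_name?(name)
  name =~ VendoredRegexp ? true : false
end

p vendored_name?("vendor/rails/init.rb")     # => true
p vendored_name?("app/assets/jquery.min.js") # => true
p vendored_name?("lib/vendored_thing.rb")    # => false
```

Compiling one combined Regexp means each path is checked with a single match instead of one match per pattern.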
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs just show a “View Raw” link.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 221
def viewable?
  !large? && text?
end