Module: Linguist::BlobHelper
Included in: FileBlob
Defined in: lib/linguist/blob_helper.rb
Overview
DEPRECATED: Avoid mixing this module into Blob classes. Prefer functional interfaces like `Language.detect` over `Blob#language`; functions are much easier to cache and compose.
Avoid adding additional bloat to this module.
BlobHelper is a mixin for Blob-like classes that respond to `name`, `data`, and `size`, such as Grit::Blob.
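As a rough illustration of this duck-typed contract, here is a self-contained sketch. `MiniBlobHelper` and `StubBlob` are hypothetical names, not part of Linguist; the module re-implements only `extname` and `large?` as they appear on this page, so it depends on nothing but `name` and `size`.

```ruby
# Hypothetical stand-in for a BlobHelper-style mixin (not Linguist's code).
module MiniBlobHelper
  MEGABYTE = 1024 * 1024

  # Mirrors BlobHelper#extname: only needs #name.
  def extname
    File.extname(name.to_s)
  end

  # Mirrors BlobHelper#large?: only needs #size.
  def large?
    size.to_i > MEGABYTE
  end
end

# Any object responding to name, data and size can host the mixin.
StubBlob = Struct.new(:name, :data, :size) do
  include MiniBlobHelper
end

small = StubBlob.new("app/models/user.rb", "class User; end\n", 16)
huge  = StubBlob.new("dump.sql", nil, 5 * 1024 * 1024)

small.extname  # => ".rb"
small.large?   # => false
huge.large?    # => true
```

The `to_s` and `to_i` calls mean a blob with a nil name or size still answers sensibly (empty extension, not large).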
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
Instance Method Summary
- #_mime_type ⇒ Object
  Internal: Look up the mime type for the extension.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Internal: Is the blob binary according to its mime type?
- #colorize(options = {}) ⇒ Object
  Public: Highlight the syntax of the blob.
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #csv? ⇒ Boolean
  Public: Is this blob a CSV file?
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #lexer ⇒ Object
  Internal: Get the lexer of the blob.
- #likely_binary? ⇒ Boolean
  Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database?
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get the number of lines of code.
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #pdf? ⇒ Boolean
  Public: Is the blob a PDF?
- #ruby_encoding ⇒ Object
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #sloc ⇒ Object
  Public: Get the number of source lines of code.
- #solid? ⇒ Boolean
  Public: Is the blob a supported 3D model format?
- #text? ⇒ Boolean
  Public: Is the blob text?
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
Instance Method Details
#_mime_type ⇒ Object
Internal: Look up the mime type for the extension.
Returns a MIME::Type, or nil if no type matches the extension.

    # File 'lib/linguist/blob_helper.rb', line 35
    def _mime_type
      if defined? @_mime_type
        @_mime_type
      else
        guesses = ::MIME::Types.type_for(extname.to_s)

        # Prefer text mime types over binary
        @_mime_type = guesses.detect { |type| type.ascii? } ||
          # Otherwise use the first guess
          guesses.first
      end
    end
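The selection rule (take the first text/ASCII guess, otherwise the first guess of any kind) can be sketched without the mime-types gem. `Guess` is a hypothetical struct standing in for `MIME::Type`, with `ascii?` modelled as a plain boolean field.

```ruby
# Hypothetical stand-in for MIME::Type with just the fields we need.
Guess = Struct.new(:content_type, :ascii) do
  def ascii?
    ascii
  end
end

# Same choice as BlobHelper#_mime_type: the first text type wins,
# otherwise the first guess (nil when there are no guesses at all).
def pick_mime_type(guesses)
  guesses.detect { |type| type.ascii? } || guesses.first
end

guesses = [
  Guess.new("application/x-foo", false),
  Guess.new("text/x-foo", true)
]
pick_mime_type(guesses).content_type  # => "text/x-foo"
pick_mime_type([])                    # => nil
```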
#binary? ⇒ Boolean
Public: Is the blob binary?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 133
    def binary?
      # Large blobs aren't even loaded into memory
      if data.nil?
        true

      # Treat blank files as text
      elsif data == ""
        false

      # Charlock doesn't know what to think
      elsif encoding.nil?
        true

      # If Charlock says it's binary
      else
        detect_encoding[:type] == :binary
      end
    end
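The order of checks in `#binary?` matters: missing data and a failed detection both count as binary, while an empty string counts as text. A minimal sketch, assuming `detection` is a Hash shaped like a CharlockHolmes result (`:encoding`, `:type`) or nil; `binary_verdict` is a hypothetical helper, not part of Linguist.

```ruby
# Hypothetical re-statement of the BlobHelper#binary? decision chain.
# data      - String or nil (nil means the blob was too large to load)
# detection - Hash like { encoding: "UTF-8", type: :text }, or nil
def binary_verdict(data, detection)
  return true if data.nil?             # large blobs are never loaded
  return false if data == ""           # blank files are text
  encoding = detection && detection[:encoding]
  return true if encoding.nil?         # the detector gave up
  detection[:type] == :binary          # trust the detector's verdict
end

binary_verdict(nil, nil)                                      # => true
binary_verdict("", nil)                                       # => false
binary_verdict("puts 1", { encoding: "UTF-8", type: :text })  # => false
binary_verdict("\x00\x01", { encoding: nil, type: :binary })  # => true
```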
#binary_mime_type? ⇒ Boolean
Internal: Is the blob binary according to its mime type
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 63
    def binary_mime_type?
      _mime_type ? _mime_type.binary? : false
    end
#colorize(options = {}) ⇒ Object
Public: Highlight the syntax of the blob.
options - A Hash of options (defaults to {})
Returns an HTML String, or nil if the blob is not safe to colorize.

    # File 'lib/linguist/blob_helper.rb', line 339
    def colorize(options = {})
      return unless safe_to_colorize?
      options[:options] ||= {}
      options[:options][:encoding] ||= encoding
      lexer.highlight(data, options)
    end
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.

    # File 'lib/linguist/blob_helper.rb', line 86
    def content_type
      @content_type ||= (binary_mime_type? || binary?) ? mime_type :
        (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
    end
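The same branching can be written as a pure function to make the three outcomes explicit. The keyword arguments are hypothetical inputs standing in for `binary_mime_type?`, `binary?`, `mime_type` and `encoding`; `content_type_for` is not a Linguist method.

```ruby
# Hypothetical pure-function version of BlobHelper#content_type.
def content_type_for(binary_mime:, binary:, mime_type:, encoding:)
  if binary_mime || binary
    mime_type                                   # binaries keep their real type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"  # e.g. "UTF-8" -> "utf-8"
  else
    "text/plain"
  end
end

content_type_for(binary_mime: false, binary: false,
                 mime_type: "text/html", encoding: "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(binary_mime: true, binary: false,
                 mime_type: "application/octet-stream", encoding: nil)
# => "application/octet-stream"
```

Note the design choice visible in the first call: any text blob, even one whose mime type is `text/html`, is served as `text/plain`, so raw views never render markup.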
#csv? ⇒ Boolean
Public: Is this blob a CSV file?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 176
    def csv?
      text? && extname.downcase == '.csv'
    end
#detect_encoding ⇒ Object
Try to guess the encoding.
Returns a Hash with keys such as :encoding, :ruby_encoding, :confidence, and :type, or nil if data is nil, an error occurred during detection, or no valid encoding could be found.

    # File 'lib/linguist/blob_helper.rb', line 126
    def detect_encoding
      @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
    end
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
Examples
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.

    # File 'lib/linguist/blob_helper.rb', line 99
    def disposition
      if text? || image?
        'inline'
      elsif name.nil?
        "attachment"
      else
        "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
      end
    end
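EscapeUtils is a third-party gem, so this sketch swaps in Ruby's standard `ERB::Util.url_encode` for the filename escaping; its output may differ from `EscapeUtils.escape_url` for some characters. `disposition_for` and its keyword arguments are hypothetical.

```ruby
require "erb"

# Hypothetical version of BlobHelper#disposition.
# ERB::Util.url_encode stands in for EscapeUtils.escape_url here.
def disposition_for(name, text:, image:)
  if text || image
    "inline"                      # browsers can display these directly
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{ERB::Util.url_encode(File.basename(name))}"
  end
end

disposition_for("src/main.c", text: true, image: false)      # => "inline"
disposition_for("dist/file.tar", text: false, image: false)  # => "attachment; filename=file.tar"
disposition_for(nil, text: false, image: false)              # => "attachment"
```

Only the base name is sent, so directory components of the blob path never leak into the download filename.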
#encoding ⇒ Object

    # File 'lib/linguist/blob_helper.rb', line 109
    def encoding
      if hash = detect_encoding
        hash[:encoding]
      end
    end
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String

    # File 'lib/linguist/blob_helper.rb', line 28
    def extname
      File.extname(name.to_s)
    end
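`#extname` inherits the behaviour of Ruby's `File.extname`: only the last extension is returned, and names without one yield an empty string. The type predicates (`#image?`, `#pdf?`, `#csv?`, `#solid?`) then compare the downcased result. A few cases using only Ruby core:

```ruby
# Behaviour of File.extname that BlobHelper#extname inherits.
File.extname("foo.rb")          # => ".rb"
File.extname("archive.tar.gz")  # => ".gz"   (only the last extension)
File.extname("Makefile")        # => ""
File.extname(".bashrc")         # => ""      (a leading dot is not an extension)
File.extname(nil.to_s)          # => ""      (why #extname calls name.to_s)

# The type predicates compare the downcased extension:
File.extname("Photo.JPG").downcase == ".jpg"  # => true
```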
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Blob#data
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 306
    def generated?
      @_generated ||= Generated.generated?(name, lambda { data })
    end
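Passing `lambda { data }` instead of `data` lets the checker decide from the file name alone and only read the contents when it must, which is why this method only "may" load Blob#data. A minimal sketch of that pattern; `generated_guess` and its two rules are hypothetical and far simpler than Linguist's `Generated` class.

```ruby
# Hypothetical, much-simplified check illustrating lazy data loading.
def generated_guess(name, data_loader)
  # Cheap check first: decided by the path alone.
  return true if name.to_s.end_with?(".min.js")
  # Only now pay for reading the blob contents.
  data = data_loader.call
  data.to_s.start_with?("// Code generated")
end

loads = 0
loader = lambda { loads += 1; "// Code generated by tool. DO NOT EDIT.\n" }

generated_guess("app.min.js", loader)  # => true, loader never called
generated_guess("gen.go", loader)      # => true, loader called once
loads                                  # => 1
```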
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines?
Pygments.rb often chokes on these files when we try to colorize them.
Returns true if the blob averages more than 5000 bytes per line, otherwise false.

    # File 'lib/linguist/blob_helper.rb', line 213
    def high_ratio_of_long_lines?
      return false if loc == 0
      size / loc > 5000
    end
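Because `size` and `loc` are both Integers, `size / loc` is integer (floor) division, i.e. the average number of bytes per line rounded down. A worked sketch with hypothetical numbers; `long_lines?` is a hypothetical helper mirroring the method above.

```ruby
# Hypothetical helper mirroring BlobHelper#high_ratio_of_long_lines?.
def long_lines?(size, loc)
  return false if loc == 0
  size / loc > 5000  # Integer division: average bytes per line, floored
end

long_lines?(200_000, 1)  # => true  (e.g. a minified one-line file)
long_lines?(60_000, 10)  # => true  (6000 bytes per line)
long_lines?(50_009, 10)  # => false (5000 after flooring, not > 5000)
long_lines?(1_000, 0)    # => false (guard against dividing by zero)
```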
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 162
    def image?
      ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
    end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected

    # File 'lib/linguist/blob_helper.rb', line 315
    def language
      return @language if defined? @language

      if defined?(@data) && @data.is_a?(String)
        data = @data
      else
        data = lambda { (binary_mime_type? || binary?) ? "" : self.data }
      end

      @language = Language.detect(name.to_s, data, mode)
    end
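`#language` (like `#_mime_type`) memoizes with `defined?` rather than `||=`, so a nil result (no language detected) is cached as well. The `||=` style used by `#generated?` and `#content_type` re-runs the computation whenever the cached value is nil or false. A self-contained sketch with a hypothetical `Detector` class that counts how often detection runs:

```ruby
# Hypothetical class contrasting two memoization styles.
class Detector
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Style used by BlobHelper#language: caches nil as well.
  def with_defined
    return @with_defined if defined? @with_defined
    @with_defined = expensive_detect
  end

  # Style used by BlobHelper#generated?: nil/false is recomputed.
  def with_or_equals
    @with_or_equals ||= expensive_detect
  end

  private

  def expensive_detect
    @calls += 1
    nil  # pretend nothing was detected
  end
end

d = Detector.new
2.times { d.with_defined }
d.calls  # => 1
2.times { d.with_or_equals }
d.calls  # => 3
```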
#large? ⇒ Boolean
Public: Is the blob too big to load?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 192
    def large?
      size.to_i > MEGABYTE
    end
#lexer ⇒ Object
Internal: Get the lexer of the blob.
Returns a Lexer.

    # File 'lib/linguist/blob_helper.rb', line 330
    def lexer
      language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
    end
#likely_binary? ⇒ Boolean
Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 72
    def likely_binary?
      binary_mime_type? && !Language.find_by_filename(name)
    end
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines

    # File 'lib/linguist/blob_helper.rb', line 247
    def lines
      @lines ||=
        if viewable? && data
          # `data` is usually encoded as ASCII-8BIT even when the content has
          # been detected as a different encoding. However, we are not allowed
          # to change the encoding of `data` because we've made the implicit
          # guarantee that each entry in `lines` is encoded the same way as
          # `data`.
          #
          # Instead, we re-encode each possible newline sequence as the
          # detected encoding, then force them back to the encoding of `data`
          # (usually a binary encoding like ASCII-8BIT). This means that the
          # byte sequence will match how newlines are likely encoded in the
          # file, but we don't have to change the encoding of `data` as far as
          # Ruby is concerned. This allows us to correctly parse out each line
          # without changing the encoding of `data`, and
          # also--importantly--without having to duplicate many (potentially
          # large) strings.
          begin
            encoded_newlines = ["\r\n", "\r", "\n"].
              map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

            data.split(Regexp.union(encoded_newlines), -1)
          rescue Encoding::ConverterNotFoundError
            # The data is not splittable in the detected encoding. Assume it's
            # one big line.
            [data]
          end
        else
          []
        end
    end
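The splitting step can be reproduced in plain Ruby. This sketch skips the re-encoding into the detected encoding (it assumes an ASCII-compatible one such as UTF-8) and only forces the newline patterns to the data's encoding; `split_lines` is a hypothetical helper. Listing `"\r\n"` first in `Regexp.union` makes a Windows line ending match as one separator, and the `-1` limit keeps trailing empty strings.

```ruby
# Hypothetical helper: the core of BlobHelper#lines for ASCII-compatible text.
def split_lines(data)
  newlines = ["\r\n", "\r", "\n"].map { |nl| nl.dup.force_encoding(data.encoding) }
  data.split(Regexp.union(newlines), -1)
end

data = "a\r\nb\rc\n".b   # .b returns an ASCII-8BIT copy, like most blob data
lines = split_lines(data)
lines                                                  # => ["a", "b", "c", ""]
lines.all? { |l| l.encoding == Encoding::ASCII_8BIT }  # => true
```

Each piece keeps the encoding of `data`, which is the guarantee the comment in `#lines` describes.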
#loc ⇒ Object
Public: Get the number of lines of code.
Requires Blob#data
Returns Integer

    # File 'lib/linguist/blob_helper.rb', line 285
    def loc
      lines.size
    end
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.

    # File 'lib/linguist/blob_helper.rb', line 56
    def mime_type
      _mime_type ? _mime_type.to_s : 'text/plain'
    end
#pdf? ⇒ Boolean
Public: Is the blob a PDF?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 183
    def pdf?
      extname.downcase == '.pdf'
    end
#ruby_encoding ⇒ Object

    # File 'lib/linguist/blob_helper.rb', line 115
    def ruby_encoding
      if hash = detect_encoding
        hash[:ruby_encoding]
      end
    end
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
We use Pygments for syntax highlighting, and it can be too slow for very large blobs or for certain corner-case blobs.
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 203
    def safe_to_colorize?
      !large? && text? && !high_ratio_of_long_lines?
    end
#sloc ⇒ Object
Public: Get the number of source lines of code (lines containing at least one non-whitespace character).
Requires Blob#data
Returns Integer

    # File 'lib/linguist/blob_helper.rb', line 294
    def sloc
      lines.grep(/\S/).size
    end
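`#loc` counts every element of `#lines`, including blank lines and the trailing empty string produced by a final newline, while `#sloc` keeps only lines with at least one non-whitespace character. A worked example on a hypothetical `lines` array:

```ruby
# Hypothetical result of #lines for a small Ruby file ending in a newline.
lines = ["def greet", "", "  puts 'hi'", "   ", "end", ""]

loc  = lines.size             # mirrors BlobHelper#loc
sloc = lines.grep(/\S/).size  # mirrors BlobHelper#sloc

loc   # => 6
sloc  # => 3
```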
#solid? ⇒ Boolean
Public: Is the blob a supported 3D model format?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 169
    def solid?
      extname.downcase == '.stl'
    end
#text? ⇒ Boolean
Public: Is the blob text?
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 155
    def text?
      !binary?
    end
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 238
    def vendored?
      name =~ VendoredRegexp ? true : false
    end
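`VendoredRegexp` joins every pattern from vendor.yml into a single alternation, and `#vendored?` turns the match index (or nil) returned by `=~` into a strict boolean. A sketch with a short, hypothetical pattern list; the real list is much longer and lives in vendor.yml.

```ruby
# Hypothetical subset of vendored-path patterns, for illustration only.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/', '\.min\.js$']
vendored_regexp = Regexp.new(vendored_paths.join('|'))

def vendored_path?(name, regexp)
  name =~ regexp ? true : false  # =~ returns a match index or nil
end

vendored_path?("vendor/assets/jquery.js", vendored_regexp)  # => true
vendored_path?("app/assets/app.min.js", vendored_regexp)    # => true
vendored_path?("lib/linguist.rb", vendored_regexp)          # => false
```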
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link.
Returns true or false.

    # File 'lib/linguist/blob_helper.rb', line 223
    def viewable?
      !large? && text?
    end