Class: Gitlab::Git::Blob

Inherits:
Object
Extended by:
WrapsGitalyErrors
Includes:
BlobHelper, EncodingHelper
Defined in:
lib/gitlab/git/blob.rb

Constant Summary

MAX_DATA_DISPLAY_SIZE =

The maximum amount of blob data that we want to display to the user. Up to this much data is loaded for encoding detection and LFS pointer parsing. All other cases that need the full blob data should use load_all_data!.

10.megabytes
BATCH_SIZE =

The number of blobs loaded in a single Gitaly call. When a large number of blobs is requested, they are fetched in multiple Gitaly calls.

250
LFS_POINTER_MIN_SIZE =

These limits are used as a heuristic to ignore files that cannot be LFS pointers. The pointer format is described in github.com/git-lfs/git-lfs/blob/master/docs/spec.md#the-pointer

120.bytes
LFS_POINTER_MAX_SIZE =
200.bytes

Constants included from EncodingHelper

EncodingHelper::BOM_UTF8, EncodingHelper::ENCODING_CONFIDENCE_THRESHOLD, EncodingHelper::ESCAPED_CHARS, EncodingHelper::UNICODE_REPLACEMENT_CHARACTER

Constants included from BlobHelper

BlobHelper::MEGABYTE


Methods included from WrapsGitalyErrors

wrapped_gitaly_errors

Methods included from EncodingHelper

#binary_io, #detect_binary?, #detect_encoding, #detect_libgit2_binary?, #encode!, #encode_binary, #encode_utf8, #encode_utf8_no_detect, #encode_utf8_with_escaping!, #encode_utf8_with_replacement_character, #strip_bom, #unquote_path

Methods included from BlobHelper

#_mime_type, #binary_mime_type?, #content_type, #empty?, #encoded_newlines_re, #encoding, #extname, #image?, #known_extension?, #large?, #lines, #mime_type, #ruby_encoding, #text_in_repo?, #viewable?

Constructor Details

#initialize(options) ⇒ Blob

Returns a new instance of Blob.



# File 'lib/gitlab/git/blob.rb', line 122

def initialize(options)
  %w[id name path size data mode commit_id binary].each do |key|
    self.__send__("#{key}=", options[key.to_sym]) # rubocop:disable GitlabSecurity/PublicSend
  end

  # Retain the actual size before it is encoded
  @loaded_size = @data.bytesize if @data
  @loaded_all_data = @loaded_size == size

  # Recalculate binary status if we loaded all data
  @binary = nil if @loaded_all_data

  record_metric_blob_size
  record_metric_truncated(truncated?)
end
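
The size bookkeeping in the constructor can be looked at in isolation. The following is a minimal sketch in plain Ruby, not the actual class: `SizeBookkeepingSketch` is a hypothetical name, and attribute assignment via `__send__`, encoding, and metrics are left out. It illustrates that `loaded_size` counts bytes rather than characters, and that `loaded_all_data` follows from comparing it with `size`.

```ruby
# Hypothetical stand-in for the size bookkeeping in Blob#initialize.
class SizeBookkeepingSketch
  attr_reader :size, :loaded_size, :loaded_all_data

  def initialize(size:, data:)
    @size = size
    @data = data

    # Byte count of what was actually loaded, taken before any re-encoding.
    @loaded_size = @data.bytesize if @data
    @loaded_all_data = @loaded_size == size
  end
end

# "h\u00e9llo" has 5 characters but 6 bytes in UTF-8.
complete = SizeBookkeepingSketch.new(size: 6, data: "h\u00e9llo")
partial  = SizeBookkeepingSketch.new(size: 100, data: 'abc')

puts complete.loaded_all_data
puts partial.loaded_all_data
```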

Instance Attribute Details

#binary ⇒ Object

Returns the value of attribute binary.



# File 'lib/gitlab/git/blob.rb', line 27

def binary
  @binary
end

#commit_id ⇒ Object

Returns the value of attribute commit_id.



# File 'lib/gitlab/git/blob.rb', line 27

def commit_id
  @commit_id
end

#data ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 142

def data
  encode! @data
end

#id ⇒ Object

Returns the value of attribute id.



# File 'lib/gitlab/git/blob.rb', line 27

def id
  @id
end

#loaded_size ⇒ Object

Returns the value of attribute loaded_size.



# File 'lib/gitlab/git/blob.rb', line 27

def loaded_size
  @loaded_size
end

#mode ⇒ Object

Returns the value of attribute mode.



# File 'lib/gitlab/git/blob.rb', line 27

def mode
  @mode
end

#name ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 162

def name
  encode! @name
end

#path ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 166

def path
  encode! @path
end

#size ⇒ Object

Returns the value of attribute size.



# File 'lib/gitlab/git/blob.rb', line 27

def size
  @size
end

Class Method Details

.batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object

Returns an array of Blob instances for the blobs specified in blob_references as [[commit_sha, path], [commit_sha, path], …]. If blob_size_limit < 0, the full blob contents are returned. If blob_size_limit >= 0, each blob contains no more than blob_size_limit bytes in its data attribute.

Keep in mind that this method may allocate a lot of memory. It is up to the caller to limit the number of blobs and blob_size_limit.



# File 'lib/gitlab/git/blob.rb', line 92

def batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE)
  blob_references.each_slice(BATCH_SIZE).flat_map do |refs|
    repository.gitaly_blob_client.get_blobs(refs, blob_size_limit).to_a
  end
end
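
The slicing behaviour of `.batch` can be sketched without Gitaly. In the example below, `RecordingBlobClientSketch`, `batch_sketch`, and `BATCH_SIZE_SKETCH` are hypothetical stand-ins introduced for illustration only; the fake client just records how many references each call receives and returns plain hashes instead of Blob instances.

```ruby
BATCH_SIZE_SKETCH = 250

# Hypothetical stand-in for repository.gitaly_blob_client; it performs no I/O.
class RecordingBlobClientSketch
  attr_reader :call_sizes

  def initialize
    @call_sizes = []
  end

  # Records the size of each request and echoes the references back.
  def get_blobs(refs, limit)
    @call_sizes << refs.size
    refs.map { |sha, path| { commit_id: sha, path: path, limit: limit } }
  end
end

# Same shape as .batch: slice the references, call the client once per slice,
# and flatten the per-call results into a single array.
def batch_sketch(client, blob_references, blob_size_limit:)
  blob_references.each_slice(BATCH_SIZE_SKETCH).flat_map do |refs|
    client.get_blobs(refs, blob_size_limit).to_a
  end
end

client = RecordingBlobClientSketch.new
references = Array.new(600) { |i| ["sha#{i}", "file#{i}.txt"] }
blobs = batch_sketch(client, references, blob_size_limit: 1024)

puts client.call_sizes.inspect
puts blobs.size
```

With 600 references and a batch size of 250, the fake client is called three times, and the results keep the order of the input references.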

.batch_lfs_pointers(repository, blob_ids) ⇒ Object

Finds LFS blobs given an array of SHA ids. Returns an array of Gitlab::Git::Blob. Does not guarantee that blob data will be set.



# File 'lib/gitlab/git/blob.rb', line 107

def batch_lfs_pointers(repository, blob_ids)
  wrapped_gitaly_errors do
    repository.gitaly_blob_client.batch_lfs_pointers(blob_ids.to_a)
  end
end

.batch_metadata(repository, blob_references) ⇒ Object

Returns an array of Blob instances with metadata only; the data attribute has no content.



# File 'lib/gitlab/git/blob.rb', line 100

def batch_metadata(repository, blob_references)
  batch(repository, blob_references, blob_size_limit: 0)
end

.binary?(data, cache_key: nil) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/git/blob.rb', line 113

def binary?(data, cache_key: nil)
  EncodingHelper.detect_libgit2_binary?(data, cache_key: cache_key)
end

.find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 48

def find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE)
  tree_entry(repository, sha, path, limit)
end

.gitlab_blob_size ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 38

def self.gitlab_blob_size
  @gitlab_blob_size ||= ::Gitlab::Metrics.histogram(
    :gitlab_blob_size,
    'Gitlab::Git::Blob size',
    {},
    [1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000]
  )
end

.gitlab_blob_truncated_false ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 34

def self.gitlab_blob_truncated_false
  @gitlab_blob_truncated_false ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_false, 'blob.truncated? == false')
end

.gitlab_blob_truncated_true ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 30

def self.gitlab_blob_truncated_true
  @gitlab_blob_truncated_true ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_true, 'blob.truncated? == true')
end

.raw(repository, sha, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 80

def raw(repository, sha, limit: MAX_DATA_DISPLAY_SIZE)
  repository.gitaly_blob_client.get_blob(oid: sha, limit: limit)
end

.size_could_be_lfs?(size) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/git/blob.rb', line 117

def size_could_be_lfs?(size)
  size.between?(LFS_POINTER_MIN_SIZE, LFS_POINTER_MAX_SIZE)
end
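
To see why the 120 to 200 byte window is a reasonable filter, it helps to measure a typical pointer file. The sketch below uses plain integers in place of `120.bytes` and `200.bytes` so that it runs without ActiveSupport. The names `LFS_MIN_SKETCH`, `LFS_MAX_SKETCH`, `SAMPLE_POINTER_SKETCH`, and `size_could_be_lfs_sketch?` are hypothetical, and the oid is a made-up 64-character hex string.

```ruby
# Plain-integer equivalents of LFS_POINTER_MIN_SIZE and LFS_POINTER_MAX_SIZE.
LFS_MIN_SKETCH = 120
LFS_MAX_SKETCH = 200

# A pointer in the layout described by the Git LFS spec (the oid is invented).
SAMPLE_POINTER_SKETCH = <<~POINTER
  version https://git-lfs.github.com/spec/v1
  oid sha256:#{'0123456789abcdef' * 4}
  size 12345
POINTER

# Integer#between? is inclusive on both ends, so 120 and 200 both pass.
def size_could_be_lfs_sketch?(size)
  size.between?(LFS_MIN_SKETCH, LFS_MAX_SKETCH)
end

puts SAMPLE_POINTER_SKETCH.bytesize
puts size_could_be_lfs_sketch?(SAMPLE_POINTER_SKETCH.bytesize)
puts size_could_be_lfs_sketch?(5_000_000)
```

The check only rules files out cheaply by size; a file that passes still has to contain the expected pointer fields.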

.tree_entry(repository, sha, path, limit) ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 52

def tree_entry(repository, sha, path, limit)
  return unless path

  path = path.sub(%r{\A/*}, '')
  path = '/' if path.empty?
  name = File.basename(path)

  # Gitaly will think that setting the limit to 0 means unlimited, while
  # the client might only need the metadata and thus set the limit to 0.
  # In this method we'll then set the limit to 1, but clear the byte of data
  # that we got back so for the outside world it looks like the limit was
  # actually 0.
  req_limit = limit == 0 ? 1 : limit

  entry = Gitlab::GitalyClient::CommitService.new(repository).tree_entry(sha, path, req_limit)
  return unless entry

  entry.data = "" if limit == 0

  case entry.type
  when :COMMIT
    new(id: entry.oid, name: name, size: 0, data: '', path: path, commit_id: sha)
  when :BLOB
    new(id: entry.oid, name: name, size: entry.size, data: entry.data.dup, mode: entry.mode.to_s(8),
        path: path, commit_id: sha, binary: binary?(entry.data))
  end
end
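
The pure parts of `.tree_entry` (path normalization, the limit adjustment, and the octal mode string) can be exercised without a Gitaly connection. The helper names below, `normalize_path_sketch` and `request_limit_sketch`, are hypothetical and exist only for this illustration; the method bodies mirror the expressions in the listing above.

```ruby
# Strips any leading slashes; an empty result is treated as the root path.
def normalize_path_sketch(path)
  path = path.sub(%r{\A/*}, '')
  path = '/' if path.empty?
  path
end

# A limit of 0 would be read by Gitaly as "unlimited", so one byte is
# requested instead and the caller discards it afterwards.
def request_limit_sketch(limit)
  limit == 0 ? 1 : limit
end

path = normalize_path_sketch('//app/models/user.rb')
puts path
puts File.basename(path)
puts request_limit_sketch(0)
puts request_limit_sketch(-1)

# File modes arrive as integers and are stored as octal strings.
puts 0o100644.to_s(8)
```

Negative limits pass through unchanged, which matches how `load_all_data!` requests the whole blob with `limit: -1`.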

Instance Method Details

#binary_in_repo? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/git/blob.rb', line 138

def binary_in_repo?
  @binary.nil? ? super : @binary == true
end
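
The `@binary` flag is effectively tri-state: `true`, `false`, or `nil` meaning "not yet known", in which case `super` defers to the included helper. The sketch below illustrates only that dispatch. `FallbackDetectionSketch` and its NUL-byte check are assumptions made for the example and are not the detection that BlobHelper performs.

```ruby
# Hypothetical fallback, standing in for the method inherited from a helper module.
module FallbackDetectionSketch
  def binary_in_repo?
    # Assumption for illustration: data containing a NUL byte counts as binary.
    raw_data.include?("\0")
  end
end

class TriStateBlobSketch
  include FallbackDetectionSketch

  attr_reader :raw_data

  def initialize(raw_data:, binary: nil)
    @raw_data = raw_data
    @binary = binary
  end

  # An explicit flag wins; nil falls through to the module's implementation.
  def binary_in_repo?
    @binary.nil? ? super : @binary == true
  end
end

puts TriStateBlobSketch.new(raw_data: 'text', binary: true).binary_in_repo?
puts TriStateBlobSketch.new(raw_data: "a\0b").binary_in_repo?
```

This is why `initialize` and `load_all_data!` reset `@binary` to `nil`: doing so forces the status to be recomputed from the data that is now available.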

#external_storage ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 203

def external_storage
  return unless lfs_pointer?

  :lfs
end

#lfs_oid ⇒ Object



# File 'lib/gitlab/git/blob.rb', line 185

def lfs_oid
  if has_lfs_version_key?
    oid = data.match(/(?<=sha256:)([0-9a-f]{64})/)
    return oid[1] if oid
  end

  nil
end
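
The oid extraction relies on a lookbehind so that the `sha256:` prefix is required but not captured. A minimal sketch of that regular expression on its own follows; `lfs_oid_sketch` is a hypothetical helper, it skips the `has_lfs_version_key?` guard (whose implementation is not shown on this page), and the oid is an invented value.

```ruby
# Returns the 64-character hex digest following "sha256:", or nil if absent.
def lfs_oid_sketch(text)
  match = text.match(/(?<=sha256:)([0-9a-f]{64})/)
  match && match[1]
end

oid_line = "oid sha256:#{'0123456789abcdef' * 4}\n"
puts lfs_oid_sketch(oid_line)

# Too short to be a SHA-256 digest, so nothing is extracted.
puts lfs_oid_sketch("oid sha256:#{'a' * 63}\n").inspect
```

Uppercase hex digits do not match the character class, so a digest written in uppercase is not extracted by this pattern.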

#lfs_pointer? ⇒ Boolean

A valid LFS object pointer is a text file consisting of version, oid, and size lines. See github.com/github/git-lfs/blob/v1.1.0/docs/spec.md#the-pointer

Returns:

  • (Boolean)


# File 'lib/gitlab/git/blob.rb', line 181

def lfs_pointer?
  self.class.size_could_be_lfs?(size) && has_lfs_version_key? && lfs_oid.present? && lfs_size.present?
end

#lfs_size ⇒ Object Also known as: external_size



# File 'lib/gitlab/git/blob.rb', line 194

def lfs_size
  if has_lfs_version_key?
    size = data.match(/(?<=size )([0-9]+)/)
    return size[1].to_i if size
  end

  nil
end
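
Taken together, the size window, the version key, the oid, and the size field decide whether a blob is treated as an LFS pointer. The sketch below combines them in one hypothetical function, `parse_lfs_pointer_sketch`. Two things are assumptions here: the version check is modelled as a simple prefix test, because `has_lfs_version_key?` is not shown on this page, and the size window is applied to the text itself rather than to a separate `size` attribute.

```ruby
# Assumed prefix of the version line, per the Git LFS pointer format.
LFS_VERSION_PREFIX_SKETCH = 'version https://git-lfs.github.com/spec'

# Returns { oid:, size: } for text that looks like an LFS pointer, else nil.
def parse_lfs_pointer_sketch(text)
  return nil unless text.bytesize.between?(120, 200)
  return nil unless text.start_with?(LFS_VERSION_PREFIX_SKETCH)

  oid = text.match(/(?<=sha256:)([0-9a-f]{64})/)
  size = text.match(/(?<=size )([0-9]+)/)
  return nil unless oid && size

  # The size field is the byte size of the real object, not of the pointer.
  { oid: oid[1], size: size[1].to_i }
end

pointer = <<~POINTER
  version https://git-lfs.github.com/spec/v1
  oid sha256:#{'0123456789abcdef' * 4}
  size 12345
POINTER

puts parse_lfs_pointer_sketch(pointer).inspect
puts parse_lfs_pointer_sketch('just a small text file').inspect
```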

#load_all_data!(repository) ⇒ Object

Load all blob data (not just the first MAX_DATA_DISPLAY_SIZE bytes) into memory as a Ruby string.



# File 'lib/gitlab/git/blob.rb', line 148

def load_all_data!(repository)
  return if @data == '' # don't mess with submodule blobs

  # Even if we return early, recalculate whether this blob is binary in
  # case a blob was initialized as text but the full data isn't
  @binary = nil

  return if @loaded_all_data

  @data = repository.gitaly_blob_client.get_blob(oid: id, limit: -1).data
  @loaded_all_data = true
  @loaded_size = @data.bytesize
end
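
The guard order in `load_all_data!` matters: empty data (used for submodule entries) is never refetched, and an already complete blob does not trigger a second call. The sketch below reproduces that control flow against a fake client. `CountingBlobClientSketch`, `FetchedBlobSketch`, and `LazyBlobSketch` are hypothetical names, the client takes the place of the `repository` argument, and the `@binary` reset is omitted.

```ruby
FetchedBlobSketch = Struct.new(:data)

# Hypothetical stand-in for repository.gitaly_blob_client; counts fetches.
class CountingBlobClientSketch
  attr_reader :fetches

  def initialize(full_data)
    @full_data = full_data
    @fetches = 0
  end

  # A negative limit means "return everything", as in the listing above.
  def get_blob(oid:, limit:)
    @fetches += 1
    data = limit < 0 ? @full_data : @full_data.byteslice(0, limit)
    FetchedBlobSketch.new(data)
  end
end

class LazyBlobSketch
  attr_reader :loaded_size

  def initialize(id:, size:, data:)
    @id = id
    @data = data
    @loaded_size = data.bytesize
    @loaded_all_data = @loaded_size == size
  end

  def load_all_data!(client)
    return if @data == ''        # leave submodule-style entries alone
    return if @loaded_all_data   # nothing more to fetch

    @data = client.get_blob(oid: @id, limit: -1).data
    @loaded_all_data = true
    @loaded_size = @data.bytesize
  end
end

client = CountingBlobClientSketch.new('a' * 100)
blob = LazyBlobSketch.new(id: 'abc123', size: 100, data: 'a' * 10)
blob.load_all_data!(client)
blob.load_all_data!(client)
puts blob.loaded_size
puts client.fetches
```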

#truncated? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/git/blob.rb', line 170

def truncated?
  return false unless size && loaded_size

  size > loaded_size
end
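
The truncation check is a comparison with a nil guard, which is easy to verify in isolation. In this minimal sketch, `truncated_sketch?` is a hypothetical free function that takes the two values as arguments instead of reading them from the blob.

```ruby
# A blob counts as truncated only when both sizes are known and fewer
# bytes were loaded than the blob contains.
def truncated_sketch?(size, loaded_size)
  return false unless size && loaded_size

  size > loaded_size
end

puts truncated_sketch?(100, 10)
puts truncated_sketch?(100, 100)
puts truncated_sketch?(nil, 10)
puts truncated_sketch?(100, nil)
```

A blob built without data has no `loaded_size`, so it reports `false` here rather than being treated as truncated.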