Class: Gitlab::Git::Blob

Inherits:
Object
Extended by:
WrapsGitalyErrors
Includes:
BlobHelper, EncodingHelper
Defined in:
lib/gitlab/git/blob.rb

Constant Summary

MAX_DATA_DISPLAY_SIZE =

This number is the maximum amount of data that we want to display to the user. We load as much as we can for encoding detection and LFS pointer parsing. All other cases where we need full blob data should use load_all_data!.

10.megabytes
BATCH_SIZE =

The number of blobs loaded in a single Gitaly call. When a large number of blobs is requested, they are fetched in multiple Gitaly calls.

250
LFS_POINTER_MIN_SIZE =

These limits are used as a heuristic to ignore files which can't be LFS pointers. The format of these is described in github.com/git-lfs/git-lfs/blob/master/docs/spec.md#the-pointer

120.bytes
LFS_POINTER_MAX_SIZE =
200.bytes

Constants included from EncodingHelper

EncodingHelper::ENCODING_CONFIDENCE_THRESHOLD

Constants included from BlobHelper

BlobHelper::MEGABYTE

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from WrapsGitalyErrors

wrapped_gitaly_errors

Methods included from EncodingHelper

#binary_io, #detect_binary?, #detect_libgit2_binary?, #encode!, #encode_binary, #encode_utf8

Methods included from BlobHelper

#_mime_type, #binary_mime_type?, #content_type, #detect_encoding, #empty?, #encoded_newlines_re, #encoding, #extname, #image?, #known_extension?, #large?, #lines, #mime_type, #ruby_encoding, #text_in_repo?, #viewable?

Methods included from Utils::StrongMemoize

#clear_memoization, #strong_memoize, #strong_memoized?

Constructor Details

#initialize(options) ⇒ Blob

Returns a new instance of Blob.


# File 'lib/gitlab/git/blob.rb', line 122

def initialize(options)
  %w(id name path size data mode commit_id binary).each do |key|
    self.__send__("#{key}=", options[key.to_sym]) # rubocop:disable GitlabSecurity/PublicSend
  end

  # Retain the actual size before it is encoded
  @loaded_size = @data.bytesize if @data
  @loaded_all_data = @loaded_size == size

  record_metric_blob_size
  record_metric_truncated(truncated?)
end
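
The constructor pattern above (assign each option through its setter, then record how many bytes were actually loaded) can be tried in isolation. The following is a minimal sketch, not the GitLab class: `OptionsBlob` is a hypothetical name, plain accessors replace the encoding-aware readers, and the metric calls are omitted.

```ruby
# Hypothetical stand-in for illustration; not Gitlab::Git::Blob itself.
class OptionsBlob
  ATTRS = %w[id name path size data mode commit_id binary].freeze

  attr_accessor(*ATTRS.map(&:to_sym))
  attr_reader :loaded_size

  def initialize(options)
    # Assign each known option through its setter, as the original does.
    ATTRS.each do |key|
      public_send("#{key}=", options[key.to_sym])
    end

    # Remember how many bytes were loaded before any re-encoding.
    @loaded_size = @data.bytesize if @data
    @loaded_all_data = @loaded_size == size
  end

  def loaded_all_data?
    @loaded_all_data
  end
end

blob = OptionsBlob.new(id: 'abc123', name: 'a.txt', path: 'a.txt', size: 10, data: 'hello')
puts blob.loaded_size       # 5
puts blob.loaded_all_data?  # false, because only 5 of 10 bytes are present
```

Keys missing from the options hash simply leave the corresponding attribute as nil.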

Instance Attribute Details

#binary ⇒ Object

Returns the value of attribute binary


# File 'lib/gitlab/git/blob.rb', line 27

def binary
  @binary
end

#commit_id ⇒ Object

Returns the value of attribute commit_id


# File 'lib/gitlab/git/blob.rb', line 27

def commit_id
  @commit_id
end

#data ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 139

def data
  encode! @data
end

#id ⇒ Object

Returns the value of attribute id


# File 'lib/gitlab/git/blob.rb', line 27

def id
  @id
end

#loaded_size ⇒ Object

Returns the value of attribute loaded_size


# File 'lib/gitlab/git/blob.rb', line 27

def loaded_size
  @loaded_size
end

#mode ⇒ Object

Returns the value of attribute mode


# File 'lib/gitlab/git/blob.rb', line 27

def mode
  @mode
end

#name ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 159

def name
  encode! @name
end

#path ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 163

def path
  encode! @path
end

#size ⇒ Object

Returns the value of attribute size


# File 'lib/gitlab/git/blob.rb', line 27

def size
  @size
end

Class Method Details

.batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object

Returns an array of Blob instances for the blobs specified in blob_references as [[commit_sha, path], [commit_sha, path], …]. If blob_size_limit < 0, the full blob contents are returned. If blob_size_limit >= 0, each blob contains no more than blob_size_limit bytes in its data attribute.

Keep in mind that this method may allocate a lot of memory. It is up to the caller to limit the number of blobs and blob_size_limit.


# File 'lib/gitlab/git/blob.rb', line 92

def batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE)
  blob_references.each_slice(BATCH_SIZE).flat_map do |refs|
    repository.gitaly_blob_client.get_blobs(refs, blob_size_limit).to_a
  end
end
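
The slicing behaviour of batch can be illustrated without a Gitaly server. This is a sketch under stated assumptions: `FakeBatchClient` is a hypothetical in-memory stand-in for the Gitaly blob client, and it returns placeholder strings rather than Blob instances.

```ruby
# Hypothetical in-memory client standing in for the Gitaly blob client.
class FakeBatchClient
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Returns one placeholder string per reference and counts the call.
  def get_blobs(refs, _limit)
    @calls += 1
    refs.map { |sha, path| "#{sha}:#{path}" }
  end
end

BATCH_SIZE = 250 # same value as the constant documented above

def batch(client, blob_references, blob_size_limit)
  # One client call per slice of at most BATCH_SIZE references,
  # with the per-slice results flattened into a single array.
  blob_references.each_slice(BATCH_SIZE).flat_map do |refs|
    client.get_blobs(refs, blob_size_limit).to_a
  end
end

client = FakeBatchClient.new
refs = Array.new(600) { |i| ["sha#{i}", "file#{i}.txt"] }
blobs = batch(client, refs, 1024)

puts blobs.size    # 600
puts client.calls  # 3 (slices of 250, 250 and 100)
```

Because the results of all slices are concatenated in memory, the caller still has to bound both the number of references and the size limit.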

.batch_lfs_pointers(repository, blob_ids) ⇒ Object

Finds LFS blobs given an array of SHA ids. Returns an array of Gitlab::Git::Blob. Does not guarantee that blob data will be set.


# File 'lib/gitlab/git/blob.rb', line 107

def batch_lfs_pointers(repository, blob_ids)
  wrapped_gitaly_errors do
    repository.gitaly_blob_client.batch_lfs_pointers(blob_ids.to_a)
  end
end

.batch_metadata(repository, blob_references) ⇒ Object

Returns an array of Blob instances with metadata only; the data attribute has no content.


# File 'lib/gitlab/git/blob.rb', line 100

def batch_metadata(repository, blob_references)
  batch(repository, blob_references, blob_size_limit: 0)
end

.binary?(data) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/git/blob.rb', line 113

def binary?(data)
  EncodingHelper.detect_libgit2_binary?(data)
end

.find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 48

def find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE)
  tree_entry(repository, sha, path, limit)
end

.gitlab_blob_size ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 38

def self.gitlab_blob_size
  @gitlab_blob_size ||= ::Gitlab::Metrics.histogram(
    :gitlab_blob_size,
    'Gitlab::Git::Blob size',
    {},
    [1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000]
  )
end

.gitlab_blob_truncated_false ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 34

def self.gitlab_blob_truncated_false
  @gitlab_blob_truncated_false ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_false, 'blob.truncated? == false')
end

.gitlab_blob_truncated_true ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 30

def self.gitlab_blob_truncated_true
  @gitlab_blob_truncated_true ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_true, 'blob.truncated? == true')
end

.raw(repository, sha) ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 80

def raw(repository, sha)
  repository.gitaly_blob_client.get_blob(oid: sha, limit: MAX_DATA_DISPLAY_SIZE)
end

.size_could_be_lfs?(size) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/git/blob.rb', line 117

def size_could_be_lfs?(size)
  size.between?(LFS_POINTER_MIN_SIZE, LFS_POINTER_MAX_SIZE)
end
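
As a quick illustration of the size heuristic, the check can be reproduced with plain integers. This sketch assumes byte counts of 120 and 200 in place of 120.bytes and 200.bytes, which rely on ActiveSupport.

```ruby
# Plain integers stand in for 120.bytes and 200.bytes (ActiveSupport helpers).
LFS_POINTER_MIN_SIZE = 120
LFS_POINTER_MAX_SIZE = 200

def size_could_be_lfs?(size)
  size.between?(LFS_POINTER_MIN_SIZE, LFS_POINTER_MAX_SIZE)
end

puts size_could_be_lfs?(119)  # false
puts size_could_be_lfs?(120)  # true (bounds are inclusive)
puts size_could_be_lfs?(200)  # true
puts size_could_be_lfs?(201)  # false
```

The check is only a cheap pre-filter: a file in this size range still has to parse as a pointer before it is treated as one.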

.tree_entry(repository, sha, path, limit) ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 52

def tree_entry(repository, sha, path, limit)
  return unless path

  path = path.sub(%r{\A/*}, '')
  path = '/' if path.empty?
  name = File.basename(path)

  # Gitaly will think that setting the limit to 0 means unlimited, while
  # the client might only need the metadata and thus set the limit to 0.
  # In this method we'll then set the limit to 1, but clear the byte of data
  # that we got back so for the outside world it looks like the limit was
  # actually 0.
  req_limit = limit == 0 ? 1 : limit

  entry = Gitlab::GitalyClient::CommitService.new(repository).tree_entry(sha, path, req_limit)
  return unless entry

  entry.data = "" if limit == 0

  case entry.type
  when :COMMIT
    new(id: entry.oid, name: name, size: 0, data: '', path: path, commit_id: sha)
  when :BLOB
    new(id: entry.oid, name: name, size: entry.size, data: entry.data.dup, mode: entry.mode.to_s(8),
        path: path, commit_id: sha, binary: binary?(entry.data))
  end
end
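
Two details of tree_entry are easy to miss: the path normalization and the way a limit of 0 is translated for Gitaly. The sketch below isolates both as small pure functions; `normalize_path`, `request_limit` and `visible_data` are hypothetical helper names introduced for illustration and do not exist in the class.

```ruby
# Strip leading slashes; an empty result means the repository root.
def normalize_path(path)
  path = path.sub(%r{\A/*}, '')
  path.empty? ? '/' : path
end

# A requested limit of 0 is sent as 1, since Gitaly reads 0 as "unlimited".
def request_limit(limit)
  limit == 0 ? 1 : limit
end

# Discard the single byte that came back when the caller asked for metadata only.
def visible_data(data, limit)
  limit == 0 ? '' : data
end

puts normalize_path('//lib/gitlab/git/blob.rb')  # lib/gitlab/git/blob.rb
puts normalize_path('/')                          # /
puts request_limit(0)                             # 1
puts request_limit(-1)                            # -1
puts visible_data('x', 0).empty?                  # true
```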

Instance Method Details

#binary_in_repo? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/git/blob.rb', line 135

def binary_in_repo?
  @binary.nil? ? super : @binary == true
end

#external_storage ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 200

def external_storage
  return unless lfs_pointer?

  :lfs
end

#lfs_oid ⇒ Object


# File 'lib/gitlab/git/blob.rb', line 182

def lfs_oid
  if has_lfs_version_key?
    oid = data.match(/(?<=sha256:)([0-9a-f]{64})/)
    return oid[1] if oid
  end

  nil
end

#lfs_pointer? ⇒ Boolean

A valid LFS object pointer is a text file consisting of version, oid, and size lines. See github.com/github/git-lfs/blob/v1.1.0/docs/spec.md#the-pointer

Returns:

  • (Boolean)

# File 'lib/gitlab/git/blob.rb', line 178

def lfs_pointer?
  self.class.size_could_be_lfs?(size) && has_lfs_version_key? && lfs_oid.present? && lfs_size.present?
end

#lfs_size ⇒ Object (also known as: external_size)


# File 'lib/gitlab/git/blob.rb', line 191

def lfs_size
  if has_lfs_version_key?
    size = data.match(/(?<=size )([0-9]+)/)
    return size[1].to_i if size
  end

  nil
end
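
The two regular expressions used by lfs_oid and lfs_size can be exercised on a sample pointer. This is a sketch only: the pointer text and its oid are made up, `parse_lfs_oid` and `parse_lfs_size` are hypothetical names, and the has_lfs_version_key? guard is left out because its body is not shown on this page.

```ruby
# Sample pointer text; the oid is a made-up 64-character hex string.
pointer = <<~POINTER
  version https://git-lfs.github.com/spec/v1
  oid sha256:#{'a1' * 32}
  size 12345
POINTER

# Same pattern as lfs_oid: 64 lowercase hex characters after "sha256:".
def parse_lfs_oid(text)
  match = text.match(/(?<=sha256:)([0-9a-f]{64})/)
  match && match[1]
end

# Same pattern as lfs_size: the digits following "size ".
def parse_lfs_size(text)
  match = text.match(/(?<=size )([0-9]+)/)
  match && match[1].to_i
end

puts parse_lfs_oid(pointer) == 'a1' * 32   # true
puts parse_lfs_size(pointer)                # 12345
puts parse_lfs_oid('plain text').inspect    # nil
```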

#load_all_data!(repository) ⇒ Object

Load all blob data (not just the first MAX_DATA_DISPLAY_SIZE bytes) into memory as a Ruby string.


# File 'lib/gitlab/git/blob.rb', line 145

def load_all_data!(repository)
  return if @data == '' # don't mess with submodule blobs

  # Even if we return early, recalculate whether this blob is binary in
  # case a blob was initialized as text but the full data isn't
  @binary = nil

  return if @loaded_all_data

  @data = repository.gitaly_blob_client.get_blob(oid: id, limit: -1).data
  @loaded_all_data = true
  @loaded_size = @data.bytesize
end
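
The interaction between a size-limited initial load and load_all_data! can be sketched with an in-memory client. Everything below is a simplified assumption: `FakeBlobClient` and `LoadableBlob` are hypothetical names, the client is passed directly instead of going through a repository, and the binary-detection reset is omitted.

```ruby
# Hypothetical client: returns the full content when limit is negative,
# otherwise at most `limit` bytes.
class FakeBlobClient
  Response = Struct.new(:data)

  def initialize(content)
    @content = content
  end

  def get_blob(oid:, limit:)
    Response.new(limit.negative? ? @content : @content.byteslice(0, limit))
  end
end

# Hypothetical, reduced version of the blob's loading state.
class LoadableBlob
  attr_reader :data, :size, :loaded_size

  def initialize(data:, size:)
    @data = data
    @size = size
    @loaded_size = data.bytesize
    @loaded_all_data = @loaded_size == size
  end

  def load_all_data!(client)
    return if @data == ''        # submodule entries carry no data
    return if @loaded_all_data   # nothing more to fetch

    @data = client.get_blob(oid: 'unused', limit: -1).data
    @loaded_all_data = true
    @loaded_size = @data.bytesize
  end

  def truncated?
    size > loaded_size
  end
end

content = 'x' * 100
client = FakeBlobClient.new(content)
blob = LoadableBlob.new(data: client.get_blob(oid: 'unused', limit: 10).data, size: content.bytesize)

puts blob.truncated?   # true, 10 of 100 bytes loaded
blob.load_all_data!(client)
puts blob.truncated?   # false
puts blob.loaded_size  # 100
```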

#truncated? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/git/blob.rb', line 167

def truncated?
  return false unless size && loaded_size

  size > loaded_size
end