Class: Care::Cache

Inherits:
Object
Defined in:
lib/care.rb

Overview

Stores pages of data read from an IO as cached strings. Each page is `page_size` bytes long, except possibly the last one, which may be shorter.
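To locate the data for a read, the cache divides the byte offset by the page size. The sketch below uses a hypothetical `page_span` helper (not part of `Care::Cache`) that repeats the arithmetic from `#byteslice`. It computes the first and last page indices to hydrate, and the offset at which the read starts inside the first of those pages.

```ruby
# Hypothetical helper, not part of Care::Cache: mirrors the page arithmetic
# used by Care::Cache#byteslice.
def page_span(at, n_bytes, page_size)
  first_page = at / page_size             # page holding the first requested byte
  last_page = (at + n_bytes) / page_size  # same upper bound #byteslice uses
  offset_in_first_page = at % page_size   # where the read starts within first_page
  [first_page, last_page, offset_in_first_page]
end

page_span(10, 4, 8) # => [1, 1, 2]  bytes 10..13 all sit in page 1
page_span(6, 4, 8)  # => [0, 1, 6]  bytes 6..9 straddle pages 0 and 1
```

Because `at + n_bytes` is one past the last requested byte, a read that ends exactly on a page boundary also hydrates the following page. For example, `page_span(0, 8, 8)` returns `[0, 1, 0]`.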


Constructor Details

#initialize(page_size = DEFAULT_PAGE_SIZE) ⇒ Cache

Initializes a new, empty page cache whose pages hold `page_size` bytes each.

Raises:

  • (ArgumentError)

    if `page_size`, once converted with `#to_i`, is not positive


# File 'lib/care.rb', line 80

def initialize(page_size = DEFAULT_PAGE_SIZE)
  @page_size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless @page_size > 0
  @pages = {}
  @lowest_known_empty_page = nil
end
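`#initialize` coerces its argument with `#to_i` before checking it, so it accepts more than Integers. The snippet below uses a hypothetical `coerce_page_size` helper (not part of `Care`) that repeats those two steps in isolation to show which inputs pass.

```ruby
# Hypothetical helper mirroring the checks in Care::Cache#initialize:
# coerce with #to_i, then reject anything that is not positive.
def coerce_page_size(page_size)
  size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless size > 0
  size
end

coerce_page_size(4096)  # => 4096
coerce_page_size('512') # => 512, String#to_i parses the leading digits
coerce_page_size(2.9)   # => 2, Float#to_i truncates
begin
  coerce_page_size(nil) # nil.to_i is 0, so this raises
rescue ArgumentError => e
  puts e.message
end
```

One consequence of coercing first is that a non-numeric string such as `'abc'` becomes `0` and is rejected, while a fractional value is silently truncated rather than rejected.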

Instance Method Details

#byteslice(io, at, n_bytes) ⇒ String?

Returns up to `n_bytes` bytes read from `io`, starting at byte offset `at`. If the IO is exhausted at that offset, returns `nil` instead. Pages already in the cache are reused; missing pages are read from the IO and cached.

Parameters:

  • io (#seek, #read)

    the IO to read data from

  • at (Integer)

    the byte offset to start reading at

  • n_bytes (Integer)

    the number of bytes to read (and cache); must be positive

Returns:

  • (String, nil)

    the content read from the IO, or `nil` if no data was available

Raises:

  • (ArgumentError)

    if `n_bytes` is less than 1 or `at` is negative



# File 'lib/care.rb', line 98

def byteslice(io, at, n_bytes)
  raise ArgumentError, "The number of bytes to fetch must be a positive Integer, but was #{n_bytes}" if n_bytes < 1
  raise ArgumentError, "Negative offsets are not supported (got #{at})" if at < 0

  first_page = at / @page_size
  last_page = (at + n_bytes) / @page_size

  relevant_pages = (first_page..last_page).map { |i| hydrate_page(io, i) }

  # Create one string combining all the pages which are relevant for
  # us - it is much easier to address that string instead of piecing
  # the output together page by page, and joining arrays of strings
  # is supposed to be optimized.
  slab = if relevant_pages.length > 1
    # If our read overlaps multiple pages, we do have to join them, this is
    # the general case
    relevant_pages.join
  else # We only have one page
    # Optimize a little. If we only have one page to read from, which is
    # likely to be the case *often*, we can avoid allocating a new string
    # for the joined pages and just use that page directly as the slab.
    # Since the page might be `nil` and we do not call #join (which turns
    # nils into empty strings), we take care of that too
    relevant_pages.first || ''
  end

  offset_in_slab = at % @page_size
  slice = slab.byteslice(offset_in_slab, n_bytes)

  # Returning an empty string from read() is very confusing for the caller,
  # and no builtins do this - if we are at EOF we should return nil
  slice if slice && !slice.empty?
end
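The following is a minimal sketch of the same read path, assuming only Ruby's standard `StringIO`. `MiniPageCache` is a hypothetical, stripped-down stand-in for `Care::Cache`. It omits argument validation, the Measurometer instrumentation, the single-page shortcut and the lowest-known-empty-page tracking, but keeps the page arithmetic, the page joining and the `nil`-at-EOF behavior.

```ruby
require 'stringio'

# Hypothetical, stripped-down stand-in for Care::Cache (illustration only).
class MiniPageCache
  attr_reader :reads # number of reads issued to the underlying IO

  def initialize(page_size)
    @page_size = page_size
    @pages = {}
    @reads = 0
  end

  def byteslice(io, at, n_bytes)
    first_page = at / @page_size
    last_page = (at + n_bytes) / @page_size
    pages = (first_page..last_page).map { |i| hydrate_page(io, i) }

    # Array#join turns nil entries (pages past EOF) into empty strings
    slab = pages.join
    slice = slab.byteslice(at % @page_size, n_bytes)

    # Mirror Care::Cache: return nil rather than an empty string at EOF
    slice if slice && !slice.empty?
  end

  private

  # Returns the cached page, or reads and caches it (nil past EOF)
  def hydrate_page(io, page_i)
    return @pages[page_i] if @pages.key?(page_i)

    @reads += 1
    io.seek(page_i * @page_size)
    @pages[page_i] = io.read(@page_size)
  end
end

io = StringIO.new('abcdefghij') # 10 bytes, so pages "abcd", "efgh", "ij"
cache = MiniPageCache.new(4)
cache.byteslice(io, 2, 4)  # => "cdef" (reads pages 0 and 1)
cache.byteslice(io, 4, 2)  # => "ef"   (page 1 comes from the cache)
cache.reads                # => 2
cache.byteslice(io, 20, 4) # => nil    (starts past the end of the IO)
```

Because joining `nil` entries yields empty strings, a read that extends past the end of the IO returns only the bytes that exist, while a read that starts past the end produces an empty slab and therefore `nil`.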

#clear ⇒ Object

Clears the page cache, emptying each cached page string before dropping the pages.

Returns:

  • void



# File 'lib/care.rb', line 135

def clear
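  # @pages is a Hash, so iterate over its values (the page strings) rather
  # than over [index, page] pairs
  @pages.each_value { |maybe_page_str| maybe_page_str.clear if maybe_page_str.respond_to?(:clear) }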
  @pages.clear
end
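Because `@pages` is a Hash keyed by page index, iterating over it with a single block parameter receives `[page_index, page_str]` pairs, not the page strings. This standalone snippet in plain Ruby (the `pages` hash is a made-up example) compares clearing through `Hash#map` with clearing through `Hash#each_value`.

```ruby
# Hash#map (like Hash#each) yields [key, value] pairs. With a single block
# parameter, that parameter receives the whole pair as a two-element Array.
pages = { 0 => +'page zero', 1 => +'page one' }

pages.map { |maybe_page_str| maybe_page_str.class } # => [Array, Array]

# Clearing through #map therefore empties throwaway pair arrays, not the strings:
page_zero = pages[0]
pages.map { |pair| pair.clear if pair.respond_to?(:clear) }
page_zero # => "page zero" (still holds its data)

# Iterating over the values reaches the cached strings themselves:
pages.each_value { |page_str| page_str.clear if page_str.respond_to?(:clear) }
page_zero # => ""
```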

#hydrate_page(io, page_i) ⇒ Object

Returns the page at the given zero-based index, reading it from the IO and caching it if it is not cached yet. Returns `nil` without reading if the page is known to lie past the end of the IO.

Parameters:

  • io (IO)

    the IO to read from

  • page_i (Integer)

    which page (zero-based) to hydrate and return



# File 'lib/care.rb', line 145

def hydrate_page(io, page_i)
  # Avoid trying to read the page if we know there is no content to fill it
  # in the underlying IO
  return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

  @pages[page_i] ||= read_page(io, page_i)
end
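The hydrate-or-reuse pattern can be illustrated with a hypothetical `CountingReader` (not part of `Care`), which keeps the same guard and `||=` memoization as `#hydrate_page` and counts how often it touches the underlying `StringIO`. Its `read_page` is a simplified version of the bookkeeping in `Care::Cache#read_page`.

```ruby
require 'stringio'

# Hypothetical class (not part of Care) mirroring Care::Cache#hydrate_page.
class CountingReader
  attr_reader :reads # number of reads issued to the underlying IO

  def initialize(io, page_size)
    @io = io
    @page_size = page_size
    @pages = {}
    @lowest_known_empty_page = nil
    @reads = 0
  end

  def hydrate_page(page_i)
    # Skip pages that are known to lie past the end of the IO
    return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

    # Only read when the page is not cached yet (or cached as nil)
    @pages[page_i] ||= read_page(page_i)
  end

  private

  # Simplified version of the Care::Cache#read_page bookkeeping
  def read_page(page_i)
    @reads += 1
    @io.seek(page_i * @page_size)
    data = @io.read(@page_size)
    if data.nil?
      @lowest_known_empty_page = page_i
    elsif data.bytesize < @page_size
      @lowest_known_empty_page = page_i + 1
    end
    data
  end
end

reader = CountingReader.new(StringIO.new('abcdefghij'), 4)
reader.hydrate_page(0) # => "abcd" (read from the IO)
reader.hydrate_page(0) # => "abcd" (served from the cache)
reader.hydrate_page(2) # => "ij"   (short read, so page 3 is known to be empty)
reader.hydrate_page(3) # => nil    (skipped without touching the IO)
reader.reads           # => 2
```

Note that `||=` treats a cached `nil` as missing; it is the lowest-known-empty-page guard that prevents pages past the end from being read again.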

#inspect ⇒ Object

Overrides `#inspect` so that the contents of the cached pages are not printed. The output shows the number of hydrated pages (`num_hydrated_pages`) together with the remaining instance variables.



# File 'lib/care.rb', line 155

def inspect
  # Simulate the builtin object ID output https://stackoverflow.com/a/11765495/153886
  oid_str = (object_id << 1).to_s(16).rjust(16, '0')

  ivars = instance_variables
  ivars.delete(:@pages)
  ivars_str = ivars.map do |ivar|
    "#{ivar}=#{instance_variable_get(ivar).inspect}"
  end.join(' ')
  synthetic_vars = 'num_hydrated_pages=%d' % @pages.length
  '#<%s:%s %s %s>' % [self.class, oid_str, synthetic_vars, ivars_str]
end
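The same technique can be sketched on a hypothetical `Holder` class: leave the bulky instance variable out of the listing and print a summary of it instead. Unlike `Care::Cache#inspect`, this sketch omits the simulated object ID so that its output is deterministic.

```ruby
# Hypothetical class using the same technique as Care::Cache#inspect:
# drop the bulky ivar from the listing and report a summary of it instead.
class Holder
  def initialize
    @page_size = 4
    @pages = { 0 => 'x' * 10_000 }
  end

  def inspect
    ivars = instance_variables - [:@pages]
    ivars_str = ivars.map { |ivar| "#{ivar}=#{instance_variable_get(ivar).inspect}" }.join(' ')
    format('#<%s num_hydrated_pages=%d %s>', self.class, @pages.length, ivars_str)
  end
end

Holder.new.inspect # => "#<Holder num_hydrated_pages=1 @page_size=4>"
```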

#read_page(io, page_i) ⇒ Object

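Reads the requested page from the given IO, bypassing the cache. If the read returns `nil` or fewer than `page_size` bytes, records the first page index known to be empty so that later calls to `#hydrate_page` can skip it.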

Parameters:

  • io (IO)

    the IO to read from

  • page_i (Integer)

    which page (zero-based) to read



# File 'lib/care.rb', line 172

def read_page(io, page_i)
  Measurometer.increment_counter('format_parser.parser.care.page_reads_from_upsteam', 1)

  io.seek(page_i * @page_size)
  read_result = Measurometer.instrument('format_parser.care.read_page') { io.read(@page_size) }
  if read_result.nil?
    # If the read went past the end of the IO the read result will be nil,
    # so we know our IO is exhausted here
    @lowest_known_empty_page = page_i if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
  elsif read_result.bytesize < @page_size
    # If we read less than we initially wanted we know there are no pages
    # to read following this one, so we can also optimize
    @lowest_known_empty_page = page_i + 1
  end

  read_result
end
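`#read_page` relies on how `IO#read(length)` behaves near the end of the stream. It returns a full page, a shorter string for the last partial page, or `nil` once the position is at or beyond the end. The snippet shows this with `StringIO`, which follows the same convention. It also defines a hypothetical `lowest_empty_page_after` helper (not part of `Care`) that condenses the bookkeeping; unlike the real method, it ignores any previously recorded `@lowest_known_empty_page`.

```ruby
require 'stringio'

io = StringIO.new('abcdefghij') # 10 bytes, pages of 4 bytes

io.seek(4)
io.read(4) # => "efgh"  full page: more pages may follow
io.seek(8)
io.read(4) # => "ij"    short read: page 2 is the last one with data
io.seek(12)
io.read(4) # => nil     at EOF: page 3 and beyond are empty

# Hypothetical helper (not part of Care::Cache): given a page index and what
# IO#read returned for it, report the lowest page index known to be empty,
# or nil when a full page was read and nothing can be concluded.
def lowest_empty_page_after(page_i, read_result, page_size)
  if read_result.nil?
    page_i
  elsif read_result.bytesize < page_size
    page_i + 1
  end
end

lowest_empty_page_after(1, 'efgh', 4) # => nil
lowest_empty_page_after(2, 'ij', 4)   # => 3
lowest_empty_page_after(3, nil, 4)    # => 3
```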