Class: Care::Cache

Inherits:
Object
Defined in:
lib/care.rb

Overview

Stores cached pages of data from the given IO as strings. Each page is `page_size` bytes long, except the last page, which may be shorter.

Constructor Details

#initialize(page_size = DEFAULT_PAGE_SIZE) ⇒ Cache

Initializes a new page cache whose pages have the given size.

Raises:

  • (ArgumentError)


# File 'lib/care.rb', line 80

def initialize(page_size = DEFAULT_PAGE_SIZE)
  @page_size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless @page_size > 0
  @pages = {}
  @lowest_known_empty_page = nil
end
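
The constructor coerces its argument with `to_i` before validating it, so numeric strings are accepted while `0`, `nil` and non-numeric strings are rejected. The following is a minimal sketch of that check in isolation; `coerce_page_size` is a hypothetical helper written for illustration and is not part of `Care::Cache`.

```ruby
# Hypothetical helper (not part of the library) that reproduces the
# constructor's coercion and validation of the page size.
def coerce_page_size(page_size)
  size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless size > 0
  size
end

coerce_page_size(4096)  # an Integer passes through unchanged
coerce_page_size('512') # a numeric String is coerced by String#to_i

begin
  coerce_page_size(0)   # zero is not a positive Integer
rescue ArgumentError => e
  warn "rejected: #{e.message}"
end
```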

Instance Method Details

#byteslice(io, at, n_bytes) ⇒ String?

Returns the longest byte string, up to `n_bytes` long, that can be recovered from the given `io` starting at the given offset. If the IO is exhausted at that offset, `nil` is returned instead. Uses cached pages where available and fetches pages from the IO where necessary.

Parameters:

  • io (#seek, #read)

    the IO to read data from

  • at (Integer)

    at which offset we have to read

  • n_bytes (Integer)

    how many bytes we want to read/cache

Returns:

  • (String, nil)

    the content read from the IO, or `nil` if no data was available

Raises:

  • (ArgumentError)

    if `n_bytes` is less than 1 or `at` is negative



# File 'lib/care.rb', line 98

def byteslice(io, at, n_bytes)
  if n_bytes < 1
    raise ArgumentError, "The number of bytes to fetch must be a positive Integer, but was #{n_bytes}"
  end
  if at < 0
    raise ArgumentError, "Negative offsets are not supported (got #{at})"
  end

  first_page = at / @page_size
  last_page = (at + n_bytes) / @page_size

  relevant_pages = (first_page..last_page).map { |i| hydrate_page(io, i) }

  # Create one string combining all the pages which are relevant for
  # us - it is much easier to address that string instead of piecing
  # the output together page by page, and joining arrays of strings
  # is supposed to be optimized.
  slab = if relevant_pages.length > 1
    # If our read overlaps multiple pages, we do have to join them, this is
    # the general case
    relevant_pages.join
  else # We only have one page
    # Optimize a little. If we only have one page that we need to read from
    # - which is likely going to be the case *often* - we can avoid allocating
    # a new string for the joined pages and just use the only page
    # directly as the slab. Since it might contain a `nil` and we do
    # not join (which casts nils to strings) we take care of that too
    relevant_pages.first || ''
  end

  offset_in_slab = at % @page_size
  slice = slab.byteslice(offset_in_slab, n_bytes)

  # Returning an empty string from read() is very confusing for the caller,
  # and no builtins do this - if we are at EOF we should return nil
  slice if slice && !slice.empty?
end
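
To make the page arithmetic in `#byteslice` concrete, here is a small sketch that computes the same three values for a given read. `page_span` is a hypothetical helper introduced only for this illustration; it copies the formulas from the listing above and does no IO.

```ruby
# Hypothetical helper mirroring the index arithmetic used in #byteslice.
# Returns [first_page, last_page, offset_in_slab].
def page_span(at, n_bytes, page_size)
  first_page = at / page_size              # page containing the first byte
  last_page = (at + n_bytes) / page_size   # same formula as the listing
  offset_in_slab = at % page_size          # offset within the joined pages
  [first_page, last_page, offset_in_slab]
end

# Reading 5 bytes at offset 10 with 4-byte pages touches pages 2 and 3,
# and the wanted bytes start 2 bytes into the joined slab.
first_page, last_page, offset_in_slab = page_span(10, 5, 4)
puts "pages #{first_page}..#{last_page}, offset #{offset_in_slab}"
```

Because `last_page` is derived from `at + n_bytes` rather than from the index of the last requested byte, a read that ends exactly on a page boundary (for example `page_span(0, 4, 4)`) also includes the following page.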

#clear ⇒ Object

Clears the page cache of all strings with data

Returns:

  • void



# File 'lib/care.rb', line 139

def clear
  @pages.clear
end

#hydrate_page(io, page_i) ⇒ Object

Hydrates the page at the given index, or returns the contents of that page if it is already in the cache. Returns `nil` without reading if the page is known to lie past the end of the IO.

Parameters:

  • io (IO)

    the IO to read from

  • page_i (Integer)

    which page (zero-based) to hydrate and return



# File 'lib/care.rb', line 148

def hydrate_page(io, page_i)
  # Avoid trying to read the page if we know there is no content to fill it
  # in the underlying IO
  return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

  @pages[page_i] ||= read_page(io, page_i)
end
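
The `||=` assignment is what makes a second request for the same page free of IO. The sketch below illustrates this with a plain Hash and a read-counting wrapper around a `StringIO`; `CountingIO` and `fetch_page` are hypothetical names used only here, and the read is inlined instead of calling `#read_page`.

```ruby
require 'stringio'

# Hypothetical IO wrapper that counts how many times #read is called.
class CountingIO
  attr_reader :reads

  def initialize(io)
    @io = io
    @reads = 0
  end

  def seek(offset)
    @io.seek(offset)
  end

  def read(n_bytes)
    @reads += 1
    @io.read(n_bytes)
  end
end

pages = {}
page_size = 4
io = CountingIO.new(StringIO.new('abcdefgh'))

# Same ||= pattern as in #hydrate_page: read only if the page is not stored yet
fetch_page = lambda do |page_i|
  pages[page_i] ||= begin
    io.seek(page_i * page_size)
    io.read(page_size)
  end
end

first = fetch_page.call(1)   # goes to the IO
second = fetch_page.call(1)  # served from the Hash
puts "reads performed: #{io.reads}"
```

Note that `||=` does not store a `nil` result, which is why the guard on `@lowest_known_empty_page` is needed to avoid re-reading pages past the end of the IO.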

#inspect ⇒ Object

We provide an overridden implementation of #inspect to avoid printing the actual contents of the cached pages



# File 'lib/care.rb', line 158

def inspect
  # Simulate the builtin object ID output https://stackoverflow.com/a/11765495/153886
  oid_str = (object_id << 1).to_s(16).rjust(16, '0')

  ivars = instance_variables
  ivars.delete(:@pages)
  ivars_str = ivars.map do |ivar|
    "#{ivar}=#{instance_variable_get(ivar).inspect}"
  end.join(' ')
  synthetic_vars = 'num_hydrated_pages=%d' % @pages.length
  '#<%s:%s %s %s>' % [self.class, oid_str, synthetic_vars, ivars_str]
end

#read_page(io, page_i) ⇒ Object

Reads the requested page from the given IO

Parameters:

  • io (IO)

    the IO to read from

  • page_i (Integer)

    which page (zero-based) to read



# File 'lib/care.rb', line 175

def read_page(io, page_i)
  FormatParser::Measurometer.increment_counter('format_parser.parser.Care.page_reads_from_upsteam', 1)

  io.seek(page_i * @page_size)
  read_result = io.read(@page_size)
  if read_result.nil?
    # If the read went past the end of the IO the read result will be nil,
    # so we know our IO is exhausted here
    if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
      @lowest_known_empty_page = page_i
    end
  elsif read_result.bytesize < @page_size
    # If we read less than we initially wanted we know there are no pages
    # to read following this one, so we can also optimize
    @lowest_known_empty_page = page_i + 1
  end

  read_result
end
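
The end-of-data bookkeeping in `#read_page` can be observed with a `StringIO`. The sketch below is a condensed stand-in, assuming an IO whose `read` returns `nil` once positioned past its end (as `StringIO` does); `EofTracker` is a hypothetical class that copies only the bookkeeping and leaves out the `Measurometer` counter.

```ruby
require 'stringio'

# Hypothetical stand-in that copies the EOF bookkeeping from #read_page.
# It is not the library class and does not cache pages.
class EofTracker
  attr_reader :lowest_known_empty_page

  def initialize(page_size)
    @page_size = page_size
    @lowest_known_empty_page = nil
  end

  def read_page(io, page_i)
    io.seek(page_i * @page_size)
    read_result = io.read(@page_size)
    if read_result.nil?
      # Nothing at all could be read: this page is empty
      if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
        @lowest_known_empty_page = page_i
      end
    elsif read_result.bytesize < @page_size
      # A short read means the next page must be empty
      @lowest_known_empty_page = page_i + 1
    end
    read_result
  end
end

io = StringIO.new('abcdefghij') # 10 bytes: pages of 4, 4 and 2 bytes
tracker = EofTracker.new(4)

full_page = tracker.read_page(io, 0)
after_full = tracker.lowest_known_empty_page   # still unknown after a full page

short_page = tracker.read_page(io, 2)
after_short = tracker.lowest_known_empty_page  # the short read marks the next page

past_end = tracker.read_page(io, 5)
after_past_end = tracker.lowest_known_empty_page
```

A read far past the end does not raise the marker: once a lower empty page is known, the stored value is kept.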