Class: SidekiqUniqueJobs::Orphans::RubyReaper
- Includes:
- Timing
- Defined in:
- lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb
Overview
This is a much slower version of the Lua script, but it does not crash Redis.
Class RubyReaper provides deletion of orphaned digests.
Constant Summary
- SIDEKIQ_BEAT_PAUSE = 10
  Returns a best guess of Sidekiq::Launcher::BEAT_PAUSE.
- RUN_SUFFIX = ":RUN"
  Returns the suffix for :RUN locks.
- MAX_QUEUE_LENGTH = 1000
  Returns the maximum combined length of sidekiq queues for running the reaper.
Constants inherited from Reaper
SidekiqUniqueJobs::Orphans::Reaper::REAPERS
Instance Attribute Summary
-
#digests ⇒ Object
readonly
Returns the value of attribute digests.
-
#retried ⇒ Object
readonly
Returns the value of attribute retried.
-
#scheduled ⇒ Object
readonly
Returns the value of attribute scheduled.
-
#start_source ⇒ Object
readonly
Returns the value of attribute start_source.
-
#start_time ⇒ Integer
readonly
The clock stamp this execution started, represented as an integer (used for Redis compatibility, as it is more accurate than Time).
-
#timeout_ms ⇒ Object
readonly
Returns the value of attribute timeout_ms.
Attributes inherited from Reaper
Instance Method Summary
- #active?(digest) ⇒ Boolean
-
#belongs_to_job?(digest) ⇒ true, false
Checks if the digest has a matching job.
-
#call ⇒ Integer
Delete orphaned digests.
- #considered_active?(time_f) ⇒ Boolean
- #elapsed_ms ⇒ Object
-
#enqueued?(digest) ⇒ true
Checks if the digest exists in a Sidekiq::Queue.
- #entries(conn, queue, &block) ⇒ Object
- #expired_digests ⇒ Object
-
#in_sorted_set?(key, digest) ⇒ true, false
Checks a sorted set for the existence of this digest.
-
#initialize(conn) ⇒ RubyReaper
constructor
Initialize a new instance of RubyReaper.
- #match?(key_one, key_two) ⇒ Boolean
- #max_score ⇒ Object
- #orphaned_digests ⇒ Object
-
#orphans ⇒ Array<String>
Find orphaned digests.
-
#queues(conn) { ... } ⇒ void
Loops through all the redis queues and yields them one by one.
-
#queues_very_full? ⇒ Boolean
If sidekiq queues are very full, the reaper becomes highly inefficient because it must check every queued job to verify that a digest is safe to delete. The reaper checks queued jobs in batches of 50, adding two reads per batch; with a queue length of 1,000 jobs, that is over 20 extra reads per digest.
-
#retried?(digest) ⇒ true
Checks if the digest exists in the Sidekiq::RetrySet.
-
#scheduled?(digest) ⇒ true
Checks if the digest exists in the Sidekiq::ScheduledSet.
- #time_from_payload_timestamp(timestamp) ⇒ Object
- #timeout? ⇒ Boolean
Methods included from Timing
clock_stamp, now_f, time_source, timed
Methods inherited from Reaper
call, #config, #reaper, #reaper_count, #reaper_timeout
Methods included from JSON
dump_json, load_json, safe_load_json
Methods included from Logging
#build_message, included, #log_debug, #log_error, #log_fatal, #log_info, #log_warn, #logger, #logging_context, #with_configured_loggers_context, #with_logging_context
Methods included from Script::Caller
call_script, debug_lua, do_call, extract_args, max_history, normalize_argv, now_f, redis_version
Methods included from Connection
Constructor Details
#initialize(conn) ⇒ RubyReaper
Initialize a new instance of RubyReaper.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 58

def initialize(conn)
  super
  @digests      = SidekiqUniqueJobs::Digests.new
  @scheduled    = Redis::SortedSet.new(SCHEDULE)
  @retried      = Redis::SortedSet.new(RETRY)
  @start_time   = Time.now
  @start_source = time_source.call
  @timeout_ms   = SidekiqUniqueJobs.config.reaper_timeout * 1000
end
Instance Attribute Details
#digests ⇒ Object (readonly)
Returns the value of attribute digests.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 27

def digests
  @digests
end
#retried ⇒ Object (readonly)
Returns the value of attribute retried.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 35

def retried
  @retried
end
#scheduled ⇒ Object (readonly)
Returns the value of attribute scheduled.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 31

def scheduled
  @scheduled
end
#start_source ⇒ Object (readonly)
Returns the value of attribute start_source.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 46

def start_source
  @start_source
end
#start_time ⇒ Integer (readonly)
Returns the clock stamp this execution started, represented as an integer (used for Redis compatibility, as it is more accurate than Time).
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 40

def start_time
  @start_time
end
#timeout_ms ⇒ Object (readonly)
Returns the value of attribute timeout_ms.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 51

def timeout_ms
  @timeout_ms
end
Instance Method Details
#active?(digest) ⇒ Boolean
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 212

def active?(digest)
  Sidekiq.redis do |conn|
    procs = conn.sscan("processes").to_a
    return false if procs.empty?

    procs.sort.each do |key|
      valid, workers = conn.pipelined do |pipeline|
        # TODO: Remove the if statement in the future
        if pipeline.respond_to?(:exists?)
          pipeline.exists?(key)
        else
          pipeline.exists(key)
        end
        pipeline.hgetall("#{key}:work")
      end

      next unless valid
      next unless workers.any?

      workers.each_pair do |_tid, job|
        next unless (item = safe_load_json(job))
        next unless (raw_payload = item[PAYLOAD])

        payload = safe_load_json(raw_payload)

        return true if match?(digest, payload[LOCK_DIGEST])
        return true if considered_active?((payload[CREATED_AT]).to_f)
      end
    end

    false
  end
end
#belongs_to_job?(digest) ⇒ true, false
Checks if the digest has a matching job.
1. It checks the scheduled set
2. It checks the retry set
3. It goes through all queues
4. It checks active processes
Note: Uses early returns for short-circuit evaluation. We can’t pipeline ZSCAN operations as they’re iterative.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 157

def belongs_to_job?(digest)
  # Short-circuit: Return immediately if found in scheduled set
  return true if scheduled?(digest)

  # Short-circuit: Return immediately if found in retry set
  return true if retried?(digest)

  # Short-circuit: Return immediately if found in any queue
  return true if enqueued?(digest)

  # Last check: active processes
  active?(digest)
end
#call ⇒ Integer
Delete orphaned digests
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 74

def call
  return if queues_very_full?

  BatchDelete.call(expired_digests, conn)
  BatchDelete.call(orphans, conn)

  # orphans.each_slice(500) do |chunk|
  #   conn.pipelined do |pipeline|
  #     chunk.each do |digest|
  #       next if belongs_to_job?(digest)
  #
  #       pipeline.zadd(ORPHANED_DIGESTS, now_f, digest)
  #     end
  #   end
  # end
end
#considered_active?(time_f) ⇒ Boolean
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 253

def considered_active?(time_f)
  max_score < time_f
end
#elapsed_ms ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 138

def elapsed_ms
  time_source.call - start_source
end
#enqueued?(digest) ⇒ true
Checks if the digest exists in a Sidekiq::Queue
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 200

def enqueued?(digest)
  Sidekiq.redis do |conn|
    queues(conn) do |queue|
      entries(conn, queue) do |entry|
        return true if entry.include?(digest)
      end
    end

    false
  end
end
#entries(conn, queue, &block) ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 279

def entries(conn, queue, &block)
  queue_key    = "queue:#{queue}"
  initial_size = conn.llen(queue_key)
  deleted_size = 0
  page         = 0
  page_size    = 50

  loop do
    range_start = (page * page_size) - deleted_size
    range_end   = range_start + page_size - 1
    entries     = conn.lrange(queue_key, range_start, range_end)
    page += 1

    break if entries.empty?

    entries.each(&block)

    deleted_size = initial_size - conn.llen(queue_key)
    # The queue is growing, not shrinking, just keep looping
    deleted_size = 0 if deleted_size.negative?
  end
end
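The paging above shifts the range start back by the number of entries deleted since the scan began, so concurrent deletions do not cause jobs to be skipped. A standalone sketch of that compensation, using a plain Array in place of the Redis list (the `each_entry` helper is hypothetical, not part of the gem):

```ruby
# Simulates the LRANGE paging in #entries against a plain Array.
# If items are deleted mid-scan, range_start is pulled back by the
# deletion count so no entry is skipped.
def each_entry(list, page_size: 50)
  initial_size = list.size
  deleted_size = 0
  page = 0

  loop do
    range_start = (page * page_size) - deleted_size
    range_end   = range_start + page_size - 1
    entries     = list[range_start..range_end] || []
    page += 1

    break if entries.empty?

    entries.each { |entry| yield entry }

    # Recompute how many entries vanished since the scan began
    deleted_size = [initial_size - list.size, 0].max
  end
end

seen = []
each_entry((1..120).to_a, page_size: 50) { |e| seen << e }
seen.size # => 120
```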
#expired_digests ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 91

def expired_digests
  conn.zrange(EXPIRING_DIGESTS, 0, max_score, "byscore")
end
#in_sorted_set?(key, digest) ⇒ true, false
Checks a sorted set for the existence of this digest.
Note: Must use pattern matching because sorted sets contain job JSON strings, not just digests. The digest is embedded in the JSON as the “lock_digest” field. ZSCORE won’t work here as we need to search within the member content.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 333

def in_sorted_set?(key, digest)
  # Increased count from 1 to 50 for better throughput
  conn.zscan(key, match: "*#{digest}*", count: 50).to_a.any?
end
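The ZSCAN `MATCH` argument is a glob applied to each member string, which is why `"*#{digest}*"` can find a digest embedded inside a job's JSON payload. A standalone illustration using Ruby's `File.fnmatch` to evaluate the same glob locally (the digest and payload below are made up):

```ruby
# ZSCAN's MATCH glob finds the digest as a substring of the member,
# even though the member is a full job JSON payload, not the digest.
digest = "uniquejobs:4e49c9efb6c8eb13b2c60ec843234c7d"
member = %({"class":"MyJob","lock_digest":"#{digest}","args":[1]})

File.fnmatch("*#{digest}*", member) # => true
File.fnmatch("*#{digest}*", "{}")   # => false
```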
#match?(key_one, key_two) ⇒ Boolean
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 247

def match?(key_one, key_two)
  return false if key_one.nil? || key_two.nil?

  key_one.delete_suffix(RUN_SUFFIX) == key_two.delete_suffix(RUN_SUFFIX)
end
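Stripping the `:RUN` suffix from both sides lets a runtime lock key match its base digest. A standalone sketch of the comparison, with `RUN_SUFFIX` inlined and illustrative digests:

```ruby
RUN_SUFFIX = ":RUN"

# Suffix-insensitive comparison: both keys are compared with any
# trailing ":RUN" removed, so "digest:RUN" matches "digest".
def match?(key_one, key_two)
  return false if key_one.nil? || key_two.nil?

  key_one.delete_suffix(RUN_SUFFIX) == key_two.delete_suffix(RUN_SUFFIX)
end

match?("uniquejobs:abc123:RUN", "uniquejobs:abc123") # => true
match?("uniquejobs:abc123", nil)                     # => false
```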
#max_score ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 99

def max_score
  (start_time - reaper_timeout - SIDEKIQ_BEAT_PAUSE).to_f
end
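The cutoff is the reaper's start time pushed back by the reaper timeout plus the heartbeat pause, so only digests older than a full reaper cycle are candidates. A minimal sketch of that arithmetic (the `reaper_timeout` value is illustrative; the real one comes from `SidekiqUniqueJobs.config`):

```ruby
SIDEKIQ_BEAT_PAUSE = 10 # seconds, per the constant summary above

# Digests scored later than max_score are too recent to reap.
reaper_timeout = 20 # illustrative; normally read from config
start_time     = Time.now
max_score      = (start_time - reaper_timeout - SIDEKIQ_BEAT_PAUSE).to_f
```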
#orphaned_digests ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 95

def orphaned_digests
  conn.zrange(ORPHANED_DIGESTS, 0, max_score, "byscore")
end
#orphans ⇒ Array<String>
Find orphaned digests
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 109

def orphans
  orphans = []
  page = 0
  per = reaper_count * 2
  results = digests.byscore(0, max_score, offset: page * per, count: (page + 1) * per)

  while results.size.positive?
    results.each do |digest|
      break if timeout?
      next if belongs_to_job?(digest)

      orphans << digest
      break if orphans.size >= reaper_count
    end

    break if timeout?
    break if orphans.size >= reaper_count

    page += 1
    results = digests.byscore(0, max_score, offset: page * per, count: (page + 1) * per)
  end

  orphans
end
#queues(conn) { ... } ⇒ void
This method returns an undefined value.
Loops through all the redis queues and yields them one by one
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 275

def queues(conn, &block)
  conn.sscan("queues").each(&block)
end
#queues_very_full? ⇒ Boolean
If sidekiq queues are very full, the reaper becomes highly inefficient because it must check every queued job to verify that a digest is safe to delete. The reaper checks queued jobs in batches of 50, adding two reads per batch; with a queue length of 1,000 jobs, that is over 20 extra reads per digest.
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 308

def queues_very_full?
  total_queue_size = 0

  Sidekiq.redis do |conn|
    queues(conn) do |queue|
      total_queue_size += conn.llen("queue:#{queue}")

      return true if total_queue_size > MAX_QUEUE_LENGTH
    end
  end

  false
end
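The guard sums queue lengths and bails out the moment the running total crosses `MAX_QUEUE_LENGTH`, rather than summing everything first. A standalone sketch taking a hash of queue names to lengths in place of live `LLEN` calls (the hash-based signature is hypothetical):

```ruby
MAX_QUEUE_LENGTH = 1000

# Early-exit length check: returns true as soon as the combined
# queue length exceeds the cap, without reading remaining queues.
def queues_very_full?(queue_lengths)
  total = 0
  queue_lengths.each_value do |len|
    total += len
    return true if total > MAX_QUEUE_LENGTH
  end
  false
end

queues_very_full?("default" => 600, "mailers" => 300) # => false
queues_very_full?("default" => 600, "mailers" => 500) # => true
```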
#retried?(digest) ⇒ true
Checks if the digest exists in the Sidekiq::RetrySet
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 189

def retried?(digest)
  in_sorted_set?(RETRY, digest)
end
#scheduled?(digest) ⇒ true
Checks if the digest exists in the Sidekiq::ScheduledSet
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 178

def scheduled?(digest)
  in_sorted_set?(SCHEDULE, digest)
end
#time_from_payload_timestamp(timestamp) ⇒ Object
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 257

def time_from_payload_timestamp(timestamp)
  if timestamp.is_a?(Float)
    # < Sidekiq 8, timestamps were stored as fractional seconds since the epoch
    Time.at(timestamp).utc
  else
    Time.at(timestamp / 1000, timestamp % 1000, :millisecond)
  end
end
#timeout? ⇒ Boolean
# File 'lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb', line 134

def timeout?
  elapsed_ms >= timeout_ms
end
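The timeout check compares a monotonic millisecond clock against the budget captured in the constructor (`reaper_timeout * 1000`). A minimal sketch of that bookkeeping using Ruby's monotonic clock directly (the `timeout_ms` value is illustrative):

```ruby
# Monotonic millisecond time source, as used via the Timing module.
time_source = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) }

start_source = time_source.call
timeout_ms   = 10_000 # illustrative: a 10 second reaper_timeout

# Elapsed wall time since start, compared against the budget.
elapsed_ms = time_source.call - start_source
timed_out  = elapsed_ms >= timeout_ms
```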