Class: Fbe::Iterate
- Inherits: Object
- Defined in: lib/fbe/iterate.rb
Overview
Repository iterator with stateful query execution.
This class provides a DSL for iterating through repositories and executing queries while maintaining state between iterations. It tracks progress using “marker” facts in the factbase and supports features like:
- Stateful iteration with automatic restart capability
- GitHub API quota awareness to prevent rate limit issues
- Configurable repeat counts per repository
- Timeout controls for long-running operations
The iterator executes a query for each repository, passing the previous result as context. If the query returns nil, it restarts from the beginning for that repository. Progress is persisted in the factbase to support resuming after interruptions.
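The restart-and-resume behavior described above can be illustrated with a minimal sketch in plain Ruby. It does not use the Fbe API: the `markers` Hash stands in for the marker facts, and `issues` and `next_after` are hypothetical names invented for this example.

```ruby
# Hypothetical stand-in: a Hash plays the role of the marker facts in the factbase.
markers = {} # repository ID => latest processed value
since = 0    # initial value, like @since in the constructor

# Hypothetical data: issue numbers known for repository 42.
issues = { 42 => [3, 7, 9] }

# Stand-in for the configured query: the smallest number greater than `before`.
next_after = ->(repo, before) { issues.fetch(repo).select { |n| n > before }.min }

visited = []
4.times do
  before = markers.fetch(42, since)
  nxt = next_after.call(42, before)
  if nxt.nil?
    markers[42] = since # nothing left: restart from the beginning
  else
    visited << nxt
    markers[42] = nxt   # persist progress so a later run can resume here
  end
end
```

Because progress lives in `markers` rather than in a local loop variable, stopping after any iteration and starting again would continue from the stored value.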
- Author: Yegor Bugayenko ([email protected])
- Copyright: Copyright © 2024-2025 Zerocracy
- License: MIT
Instance Method Summary
- #as(label) ⇒ nil
  Sets the label for tracking iteration state.
- #by(query) ⇒ nil
  Sets the query to execute for each iteration.
- #initialize(fb:, loog:, options:, global:) ⇒ Iterate (constructor)
  Creates a new iterator instance.
- #over(timeout: 2 * 60) {|Integer, Object| ... } ⇒ nil
  Executes the iteration over all configured repositories.
- #quota_aware ⇒ nil
  Makes the iterator aware of GitHub API quota limits.
- #repeats(repeats) ⇒ nil
  Sets the maximum number of iterations per repository.
Constructor Details
#initialize(fb:, loog:, options:, global:) ⇒ Iterate
Creates a new iterator instance.
    # File 'lib/fbe/iterate.rb', line 85
    def initialize(fb:, loog:, options:, global:)
      @fb = fb
      @loog = loog
      @options = options
      @global = global
      @label = nil
      @since = 0
      @query = nil
      @repeats = 1
      @quota_aware = false
    end
Instance Method Details
#as(label) ⇒ nil
Sets the label for tracking iteration state.
The label is used to create marker facts in the factbase that track the last processed item for each repository. This enables resuming iteration after interruptions.
    # File 'lib/fbe/iterate.rb', line 155
    def as(label)
      raise 'Label is already set' unless @label.nil?
      raise 'Cannot set "label" to nil' if label.nil?
      @label = label
    end
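To show how a label separates one iteration's state from another's, here is a sketch that models marker facts as plain Hashes. The field names (`what`, `where`, `repository`, `latest`) follow the source of `#over`; the `facts` Array, both lambdas, and the label strings are assumptions made for illustration only.

```ruby
# Hypothetical in-memory stand-in for the factbase: an Array of Hash "facts".
facts = []

# Replace the marker for (label, repository) with a new "latest" value,
# mirroring the delete-then-insert sequence in #over.
save_marker = lambda do |label, rid, latest|
  facts.reject! { |f| f[:what] == label && f[:where] == 'github' && f[:repository] == rid }
  facts << { what: label, where: 'github', repository: rid, latest: latest }
end

# Read the marker back, or nil when none has been stored yet.
read_marker = lambda do |label, rid|
  facts.find { |f| f[:what] == label && f[:repository] == rid }&.fetch(:latest)
end

save_marker.call('issues-were-scanned', 42, 7)
save_marker.call('pulls-were-scanned', 42, 100)
save_marker.call('issues-were-scanned', 42, 9) # overwrites the earlier value 7
```

Two iterators with different labels can therefore track the same repository without overwriting each other's progress.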
#by(query) ⇒ nil
Sets the query to execute for each iteration.
The query can use two special variables:
- $before: The value from the previous iteration (or initial value)
- $repository: The current repository ID
    # File 'lib/fbe/iterate.rb', line 138
    def by(query)
      raise 'Query is already set' unless @query.nil?
      raise 'Cannot set query to nil' if query.nil?
      @query = query
    end
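As an illustration of how $before and $repository might be used, the sketch below pairs a Factbase-style query string with a plain-Ruby equivalent. The query text, the `issue-was-opened` fact type, and the sample facts are assumptions, not taken from the source; only the two variable names come from the documentation above.

```ruby
# Assumed Factbase-style query (illustrative only): the smallest issue number
# above $before within $repository.
query = "(agg (and (eq what 'issue-was-opened') (eq repository $repository) (gt issue $before)) (min issue))"

# Hypothetical facts, as plain Hashes instead of factbase entries.
facts = [
  { what: 'issue-was-opened', repository: 42, issue: 5 },
  { what: 'issue-was-opened', repository: 42, issue: 8 },
  { what: 'issue-was-opened', repository: 77, issue: 6 }
]

# Plain-Ruby equivalent of the query above, taking the same two parameters.
evaluate = lambda do |before:, repository:|
  facts
    .select { |f| f[:what] == 'issue-was-opened' && f[:repository] == repository && f[:issue] > before }
    .map { |f| f[:issue] }
    .min
end

first_issue = evaluate.call(before: 0, repository: 42)
second_issue = evaluate.call(before: first_issue, repository: 42)
exhausted = evaluate.call(before: second_issue, repository: 42)
```

A query of this shape returns nil once nothing larger than $before remains, which is the signal the iterator uses to restart.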
#over(timeout: 2 * 60) {|Integer, Object| ... } ⇒ nil
Executes the iteration over all configured repositories.
For each repository, retrieves the last processed value (or uses the initial value from since) and executes the configured query with it. The query receives two parameters: $before (the last processed value) and $repository (GitHub repository ID).
When the query returns a non-nil result, the block is called with the repository ID and query result. The block must return an Integer that will be stored as the new “latest” value for the next iteration.
When the query returns nil, the iteration for that repository restarts from the initial value (set by since), and the block is NOT called.
The method tracks progress using marker facts and supports:
- Automatic restart when query returns nil
- Timeout to prevent infinite loops
- GitHub API quota checking (if enabled)
- State persistence for resuming after interruptions
Processing flow for each repository:
- Read the “latest” value from factbase (or use since if not found)
- Execute the query with $before=latest and $repository=repo_id
- If query returns nil: restart from since value, skip to next repo
- If query returns a value: call the block with (repo_id, query_result)
- Store the block’s return value as the new “latest” for next iteration
    # File 'lib/fbe/iterate.rb', line 198
    def over(timeout: 2 * 60, &)
      raise 'Use "as" first' if @label.nil?
      raise 'Use "by" first' if @query.nil?
      seen = {}
      oct = Fbe.octo(loog: @loog, options: @options, global: @global)
      if oct.off_quota?
        @loog.debug('We are off GitHub quota, cannot even start, sorry')
        return
      end
      repos = Fbe.unmask_repos(loog: @loog, options: @options, global: @global)
      restarted = []
      start = Time.now
      loop do
        if oct.off_quota?
          @loog.info("We are off GitHub quota, time to stop after #{start.ago}")
          break
        end
        repos.each do |repo|
          if oct.off_quota?
            @loog.debug("We are off GitHub quota, we must skip #{repo}")
            break
          end
          if Time.now - start > timeout
            @loog.info("We are doing this for #{start.ago} already, won't check #{repo}")
            next
          end
          next if restarted.include?(repo)
          seen[repo] = 0 if seen[repo].nil?
          if seen[repo] >= @repeats
            @loog.debug("We've seen too many (#{seen[repo]}) in #{repo}, let's see next one")
            next
          end
          rid = oct.repo_id_by_name(repo)
          before = @fb.query(
            "(agg (and (eq what '#{@label}') (eq where 'github') (eq repository #{rid})) (first latest))"
          ).one
          @fb.query("(and (eq what '#{@label}') (eq where 'github') (eq repository #{rid}))").delete!
          before = before.nil? ? @since : before.first
          nxt = @fb.query(@query).one(@fb, before:, repository: rid)
          after =
            if nxt.nil?
              @loog.debug("Next element after ##{before} not suggested, re-starting from ##{@since}: #{@query}")
              restarted << repo
              @since
            else
              @loog.debug("Next is ##{nxt}, starting from it...")
              yield(rid, nxt)
            end
          raise "Iterator must return an Integer, while #{after.class} returned" unless after.is_a?(Integer)
          f = @fb.insert
          f.where = 'github'
          f.repository = rid
          f.latest =
            if after.nil?
              @loog.debug("After is nil at #{repo}, setting the 'latest' to ##{nxt}")
              nxt
            else
              @loog.debug("After is ##{after} at #{repo}, setting the 'latest' to it")
              after
            end
          f.what = @label
          seen[repo] += 1
        end
        unless seen.any? { |r, v| v < @repeats && !restarted.include?(r) }
          @loog.debug("No more repos to scan (out of #{repos.size}), quitting after #{start.ago}")
          break
        end
        if restarted.size == repos.size
          @loog.debug("All #{repos.size} repos restarted, quitting after #{start.ago}")
          break
        end
        if Time.now - start > timeout
          @loog.info("We are iterating for #{start.ago} already, time to give up")
          break
        end
      end
      @loog.debug("Finished scanning #{repos.size} repos in #{start.ago}: #{seen.map { |k, v| "#{k}:#{v}" }.joined}")
    end
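The processing flow documented for #over can be traced with a small simulation in plain Ruby. This is a sketch under stated assumptions, not the library's implementation: `pass`, `latest`, `items`, and `query` are hypothetical stand-ins for the factbase, the marker facts, and the configured query.

```ruby
# Hypothetical stand-ins: `latest` replaces marker facts, `items` replaces the factbase.
since = 0
latest = {}                        # repo_id => last stored Integer
items = { 1 => [10, 20], 2 => [] } # repo_id => values the query can find

# Stand-in for the query configured via `by`.
query = ->(before:, repository:) { items[repository].select { |v| v > before }.min }

# One pass over the repositories, following the five documented steps.
def pass(repos, latest, since, query)
  restarted = []
  repos.each do |rid|
    before = latest.fetch(rid, since)                 # 1. read "latest" or fall back to since
    nxt = query.call(before: before, repository: rid) # 2. run the query
    after =
      if nxt.nil?
        restarted << rid                              # 3. restart; the block is not called
        since
      else
        yield(rid, nxt)                               # 4. call the block
      end
    raise "Block must return an Integer, got #{after.class}" unless after.is_a?(Integer)
    latest[rid] = after                               # 5. store the new "latest"
  end
  restarted
end

calls = []
restarted = pass([1, 2], latest, since, query) do |rid, value|
  calls << [rid, value]
  value # the block hands back the Integer to store
end
```

Returning the received value from the block is the simplest choice; a block could also return a larger Integer to skip ahead, since whatever it returns becomes the next $before.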
#quota_aware ⇒ nil
Makes the iterator aware of GitHub API quota limits.
When enabled, the iterator will check quota status before processing each repository and gracefully stop when the quota is exhausted. This prevents API errors and allows for resuming later.
    # File 'lib/fbe/iterate.rb', line 107
    def quota_aware
      @quota_aware = true
    end
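The graceful stop on quota exhaustion might look like the following sketch. `FakeOcto` is a hypothetical test double invented here; only the `off_quota?` method name is taken from the source of #over, and the repository names are placeholders.

```ruby
# Hypothetical client: reports an exhausted quota once `budget` checks have passed.
class FakeOcto
  def initialize(budget)
    @budget = budget
  end

  def off_quota?
    @budget -= 1
    @budget.negative?
  end
end

oct = FakeOcto.new(2)
processed = []
%w[foo/a foo/b foo/c].each do |repo|
  break if oct.off_quota? # stop gracefully instead of hitting an API error
  processed << repo
end
```

Repositories skipped this way are simply picked up on a later run, because their markers were never advanced.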
#repeats(repeats) ⇒ nil
Sets the maximum number of iterations per repository.
Controls how many times the query will be executed for each repository before moving to the next one. Useful for limiting processing scope.
    # File 'lib/fbe/iterate.rb', line 121
    def repeats(repeats)
      raise 'Cannot set "repeats" to nil' if repeats.nil?
      raise 'The "repeats" must be a positive integer' unless repeats.positive?
      @repeats = repeats
    end
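The per-repository cap can be sketched with a counter Hash, loosely following the `seen` bookkeeping in #over. The repository names and the outer `5.times` bound are assumptions for the example.

```ruby
repeats = 2
seen = Hash.new(0) # repository name => number of iterations so far
log = []

5.times do
  %w[foo/a foo/b].each do |repo|
    next if seen[repo] >= repeats # this repository has reached its cap
    log << repo
    seen[repo] += 1
  end
  # Quit once no repository has iterations left.
  break unless seen.any? { |_repo, count| count < repeats }
end
```

Note that the repositories are visited round-robin: each pass handles every repository once, so the cap limits the number of passes in which a given repository takes part.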