Method: Sequel::Dataset#paged_each
- Defined in:
- lib/sequel/dataset/actions.rb
#paged_each(opts = OPTS) ⇒ Object
Yields each row in the dataset, but internally uses multiple queries as needed to process the entire result set without keeping all rows in the dataset in memory, even if the underlying driver buffers all query results in memory.
Because this uses multiple queries internally, in order to remain consistent, it also uses a transaction internally. Additionally, to work correctly, the dataset must have unambiguous order. Using an ambiguous order can result in an infinite loop, as well as subtler bugs such as yielding duplicate rows or rows being skipped.
Sequel checks that the datasets using this method have an order, but it cannot ensure that the order is unambiguous.
Note that this method is not safe to use on many adapters if you are running additional queries inside the provided block. If you are running queries inside the block, use a separate thread or shard inside paged_each
.
Options:
- :rows_per_fetch
-
The number of rows to fetch per query. Defaults to 1000.
- :strategy
-
The strategy to use for paging of results. By default this is :offset, for using an approach with a limit and offset for every page. This can be set to :filter, which uses a limit and a filter that excludes rows from previous pages. In order for this strategy to work, you must be selecting the columns you are ordering by, and none of the columns can contain NULLs. Note that some Sequel adapters have optimized implementations that will use cursors or streaming regardless of the :strategy option used.
- :filter_values
-
If the strategy: :filter option is used, this option should be a proc that accepts the last retrieved row for the previous page and an array of ORDER BY expressions, and returns an array of values relating to those expressions for the last retrieved row. You will need to use this option if your ORDER BY expressions are not simple columns, if they contain qualified identifiers that would be ambiguous unqualified, if they contain any identifiers that are aliased in SELECT, and potentially other cases.
- :skip_transaction
-
Do not use a transaction. This can be useful if you want to prevent a lock on the database table, at the expense of consistency.
Examples:
DB[:table].order(:id).paged_each{|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table ORDER BY id LIMIT 1000 OFFSET 1000
# ...
DB[:table].order(:id).paged_each(rows_per_fetch: 100){|row| }
# SELECT * FROM table ORDER BY id LIMIT 100
# SELECT * FROM table ORDER BY id LIMIT 100 OFFSET 100
# ...
DB[:table].order(:id).paged_each(strategy: :filter){|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table WHERE id > 1001 ORDER BY id LIMIT 1000
# ...
DB[:table].order(:id).paged_each(strategy: :filter,
filter_values: lambda{|row, exprs| [row[:id]]}){|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table WHERE id > 1001 ORDER BY id LIMIT 1000
# ...
618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 |
# File 'lib/sequel/dataset/actions.rb', line 618 def paged_each(opts=OPTS) unless @opts[:order] raise Sequel::Error, "Dataset#paged_each requires the dataset be ordered" end unless defined?(yield) return enum_for(:paged_each, opts) end total_limit = @opts[:limit] offset = @opts[:offset] if server = @opts[:server] opts = Hash[opts] opts[:server] = server end rows_per_fetch = opts[:rows_per_fetch] || 1000 strategy = if offset || total_limit :offset else opts[:strategy] || :offset end db.transaction(opts) do case strategy when :filter filter_values = opts[:filter_values] || proc{|row, exprs| exprs.map{|e| row[hash_key_symbol(e)]}} base_ds = ds = limit(rows_per_fetch) while ds last_row = nil ds.each do |row| last_row = row yield row end ds = (base_ds.where(ignore_values_preceding(last_row, &filter_values)) if last_row) end else offset ||= 0 num_rows_yielded = rows_per_fetch total_rows = 0 while num_rows_yielded == rows_per_fetch && (total_limit.nil? || total_rows < total_limit) if total_limit && total_rows + rows_per_fetch > total_limit rows_per_fetch = total_limit - total_rows end num_rows_yielded = 0 limit(rows_per_fetch, offset).each do |row| num_rows_yielded += 1 total_rows += 1 if total_limit yield row end offset += rows_per_fetch end end end self end |