Method: Sequel::Dataset#paged_each

Defined in:
lib/sequel/dataset/actions.rb

#paged_each(opts = OPTS) ⇒ Object

Yields each row in the dataset, but internally uses multiple queries as needed to process the entire result set without keeping all rows in the dataset in memory, even if the underlying driver buffers all query results in memory.

Because this uses multiple queries internally, in order to remain consistent, it also uses a transaction internally. Additionally, to work correctly, the dataset must have unambiguous order. Using an ambiguous order can result in an infinite loop, as well as subtler bugs such as yielding duplicate rows or rows being skipped.

Sequel checks that the datasets using this method have an order, but it cannot ensure that the order is unambiguous.

Note that this method is not safe to use on many adapters if you are running additional queries inside the provided block. If you are running queries inside the block, use a separate thread or shard inside paged_each.

Options:

:rows_per_fetch

The number of rows to fetch per query. Defaults to 1000.

:strategy

The strategy to use for paging of results. By default this is :offset, for using an approach with a limit and offset for every page. This can be set to :filter, which uses a limit and a filter that excludes rows from previous pages. In order for this strategy to work, you must be selecting the columns you are ordering by, and none of the columns can contain NULLs. Note that some Sequel adapters have optimized implementations that will use cursors or streaming regardless of the :strategy option used.

:filter_values

If the strategy: :filter option is used, this option should be a proc that accepts the last retrieved row for the previous page and an array of ORDER BY expressions, and returns an array of values relating to those expressions for the last retrieved row. You will need to use this option if your ORDER BY expressions are not simple columns, if they contain qualified identifiers that would be ambiguous unqualified, if they contain any identifiers that are aliased in SELECT, and potentially other cases.

:skip_transaction

Do not use a transaction. This can be useful if you want to prevent a lock on the database table, at the expense of consistency.

Examples:

DB[:table].order(:id).paged_each{|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table ORDER BY id LIMIT 1000 OFFSET 1000
# ...

DB[:table].order(:id).paged_each(rows_per_fetch: 100){|row| }
# SELECT * FROM table ORDER BY id LIMIT 100
# SELECT * FROM table ORDER BY id LIMIT 100 OFFSET 100
# ...

DB[:table].order(:id).paged_each(strategy: :filter){|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table WHERE id > 1001 ORDER BY id LIMIT 1000
# ...

DB[:table].order(:id).paged_each(strategy: :filter,
  filter_values: lambda{|row, exprs| [row[:id]]}){|row| }
# SELECT * FROM table ORDER BY id LIMIT 1000
# SELECT * FROM table WHERE id > 1001 ORDER BY id LIMIT 1000
# ...


618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
# File 'lib/sequel/dataset/actions.rb', line 618

def paged_each(opts=OPTS)
  unless @opts[:order]
    raise Sequel::Error, "Dataset#paged_each requires the dataset be ordered"
  end
  unless defined?(yield)
    return enum_for(:paged_each, opts)
  end

  total_limit = @opts[:limit]
  offset = @opts[:offset]
  if server = @opts[:server]
    opts = Hash[opts]
    opts[:server] = server
  end

  rows_per_fetch = opts[:rows_per_fetch] || 1000
  strategy = if offset || total_limit
    :offset
  else
    opts[:strategy] || :offset
  end

  db.transaction(opts) do
    case strategy
    when :filter
      filter_values = opts[:filter_values] || proc{|row, exprs| exprs.map{|e| row[hash_key_symbol(e)]}}
      base_ds = ds = limit(rows_per_fetch)
      while ds
        last_row = nil
        ds.each do |row|
          last_row = row
          yield row
        end
        ds = (base_ds.where(ignore_values_preceding(last_row, &filter_values)) if last_row)
      end
    else
      offset ||= 0
      num_rows_yielded = rows_per_fetch
      total_rows = 0

      while num_rows_yielded == rows_per_fetch && (total_limit.nil? || total_rows < total_limit)
        if total_limit && total_rows + rows_per_fetch > total_limit
          rows_per_fetch = total_limit - total_rows
        end

        num_rows_yielded = 0
        limit(rows_per_fetch, offset).each do |row|
          num_rows_yielded += 1
          total_rows += 1 if total_limit
          yield row
        end

        offset += rows_per_fetch
      end
    end
  end

  self
end