pgslice

Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. Archive older data on a rolling basis to keep your database size under control.

:tangerine: Battle-tested at Instacart

Install

pgslice is a command line tool. To install, run:

gem install pgslice

This will give you the pgslice command.

Steps

  1. Ensure the table you want to partition has been created. We’ll refer to this as <table>.

  2. Specify your database credentials

  export PGSLICE_URL=postgres://localhost/myapp_development
  1. Create an intermediate table
  pgslice prep <table> <column> <period>

Period can be day or month.

This creates a table named <table>_intermediate with the appropriate trigger for partitioning.

  1. Add partitions
  pgslice add_partitions <table> --intermediate --past 3 --future 3

This creates child tables that inherit from the intermediate table.

Use the --past and --future options to control the number of partitions.

  1. Optional, for tables with data - Fill the partitions in batches with data from the original table
  pgslice fill <table>

Use the --batch-size and --sleep options to control the speed.

To sync data across different databases, check out pgsync.

  1. Swap the intermediate table with the original table
  pgslice swap <table>

The original table is renamed <table>_retired and the intermediate table is renamed <table>.

  1. Fill the rest (rows inserted between the first fill and the swap)
  pgslice fill <table> --swapped
  1. Back up the retired table with a tool like pg_dump and drop it
  pg_dump -c -Fc -t <table>_retired $PGSLICE_URL > <table>_retired.dump
  psql -c "DROP <table>_retired" $PGSLICE_URL

Sample Output

pgslice prints the SQL commands that were executed on the server. To print without executing, use the --dry-run option.

pgslice prep visits created_at month
BEGIN;

CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL);

CREATE FUNCTION visits_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        RAISE EXCEPTION 'Create partitions first.';
    END;
    $$ LANGUAGE plpgsql;

CREATE TRIGGER visits_insert_trigger
    BEFORE INSERT ON visits_intermediate
    FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger();

COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month';

COMMIT;
pgslice add_partitions visits --intermediate --past 1 --future 1
BEGIN;

CREATE TABLE visits_201608
    (CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201608 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201608 USING btree (user_id);

CREATE TABLE visits_201609
    (CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201609 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201609 USING btree (user_id);

CREATE TABLE visits_201610
    (CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201610 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201610 USING btree (user_id);

CREATE OR REPLACE FUNCTION visits_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN
            INSERT INTO visits_201609 VALUES (NEW.*);
        ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN
            INSERT INTO visits_201610 VALUES (NEW.*);
        ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN
            INSERT INTO visits_201608 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'Date out of range. Ensure partitions are created.';
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

COMMIT;
pgslice fill visits
/* 1 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date

/* 2 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date

/* 3 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
pgslice swap visits
BEGIN;

SET LOCAL lock_timeout = '5s';

ALTER TABLE visits RENAME TO visits_retired;

ALTER TABLE visits_intermediate RENAME TO visits;

ALTER SEQUENCE visits_id_seq OWNED BY visits.id;

COMMIT;

Adding Partitions

To add partitions, use:

pgslice add_partitions <table> --future 3

Add this as a cron job to create a new partition each day or month.

# day
0 0 * * * pgslice add_partitions <table> --future 3 --url ...

# month
0 0 1 * * pgslice add_partitions <table> --future 3 --url ...

Add a monitor to ensure partitions are being created.

SELECT 1 FROM
    pg_catalog.pg_class c
INNER JOIN
    pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
    c.relkind = 'r' AND
    n.nspname = 'public' AND
    c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
    -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')

Archiving Partitions

Back up and drop older partitions each day or month.

pg_dump -c -Fc -t <table>_201609 $PGSLICE_URL > <table>_201609.dump
psql -c "DROP <table>_201609" $PGSLICE_URL

If you use Amazon S3 for backups, s3cmd is a nice tool.

s3cmd put <table>_201609.dump s3://<s3-bucket>/<table>_201609.dump

Additional Commands

To undo prep (which will delete partitions), use:

pgslice unprep <table>

To undo swap, use:

pgslice unswap <table>

App Considerations

This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.

Writes

If you use INSERT statements with a RETURNING clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record(s) back. If you need this, you can either:

  1. Insert directly into the partition
  2. Get the value after the insert with SELECT CURRVAL('sequence_name')

Reads

When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on created_at, try to include it in queries:

SELECT * FROM
    visits
WHERE
    user_id = 123 AND
    -- for performance
    created_at >= '2016-09-01' AND created_at < '2016-09-02'

For this to be effective, ensure constraint_exclusion is set to partition (default value) or on.

SHOW constraint_exclusion;

One Off Tasks

You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.

pgslice prep <table> --no-partition
pgslice fill <table> --where "id > 1000" # use any conditions
pgslice swap <table>

Upgrading

Run:

gem install pgslice

To use master, run:

gem install specific_install
gem specific_install ankane/pgslice

Docker

docker build -t pgslice .
alias pgslice="docker run --rm -e PGSLICE_URL pgslice"

This will give you the pgslice command.

Reference

TODO

  • Command to sync index changes with partitions
  • Disable indexing for faster fill
  • ETA for fill

Also check out:

  • PgHero - A performance dashboard for Postgres
  • pgsync - Sync Postgres data to your local machine

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help: