Dexter

The automatic indexer for Postgres

Read about how it works or watch the talk

Build Status

Installation

First, install HypoPG on your database server. This doesn’t require a restart.

cd /tmp
curl -L https://github.com/HypoPG/hypopg/archive/1.3.1.tar.gz | tar xz
cd hypopg-1.3.1
make
make install # may need sudo

Note: If you have issues, make sure postgresql-server-dev-* is installed.

Enable logging for slow queries in your Postgres config file.

log_min_duration_statement = 10 # ms

And install the command line tool with:

gem install pgdexter

The command line tool is also available with Docker, Homebrew, or as a Linux package.

How to Use

Dexter needs a connection to your database and a log file to process.

tail -F -n +1 <log-file> | dexter <connection-options>

This finds slow queries and generates output like:

Started
Processing 189 new query fingerprints
Index found: public.genres_movies (genre_id)
Index found: public.genres_movies (movie_id)
Index found: public.movies (title)
Index found: public.ratings (movie_id)
Index found: public.ratings (rating)
Index found: public.ratings (user_id)
Processing 12 new query fingerprints

To be safe, Dexter will not create indexes unless you pass the --create flag. In this case, you’ll see:

Index found: public.ratings (user_id)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."ratings" ("user_id")
Index created: 15243 ms

Connection Options

Dexter supports the same connection options as psql.

-h host -U user -p 5432 -d dbname

This includes URIs:

postgresql://user:pass@host:5432/dbname

and connection strings:

host=localhost port=5432 dbname=mydb

Collecting Queries

There are many ways to collect queries. For real-time indexing, pipe your logfile:

tail -F -n +1 <log-file> | dexter <connection-options>

Pass a single statement with:

dexter <connection-options> -s "SELECT * FROM ..."

or pass files:

dexter <connection-options> <file1> <file2>

or collect running queries with:

dexter <connection-options> --pg-stat-activity

Collection Options

To prevent one-off queries from being indexed, specify a minimum number of calls before a query is considered for indexing

dexter --min-calls 100

You can do the same for total time a query has run

dexter --min-time 10 # minutes

Specify the format

dexter --input-format csv

When streaming logs, specify the time to wait between processing queries

dexter --interval 60 # seconds

Examples

Ubuntu with PostgreSQL 12

tail -F -n +1 /var/log/postgresql/postgresql-12-main.log | sudo -u postgres dexter dbname

Homebrew on Mac

tail -F -n +1 /usr/local/var/postgres/server.log | dexter dbname

Analyze

For best results, make sure your tables have been recently analyzed so statistics are up-to-date. You can ask Dexter to analyze tables it comes across that haven’t been analyzed in the past hour with:

dexter --analyze

Tables

You can exclude large or write-heavy tables from indexing with:

dexter --exclude table1,table2

Alternatively, you can specify which tables to index with:

dexter --include table3,table4

Debugging

See how Dexter is processing queries with:

dexter --log-sql --log-level debug2

Hosted Postgres

The hypopg extension, which Dexter needs to run, is available on these providers.

For other providers, see this guide. To request a new extension:

  • Amazon RDS - follow the instructions on this page
  • Google Cloud SQL - vote or comment on this page
  • DigitalOcean Managed Databases - vote or comment on this page

Additional Installation Methods

Docker

Get the Docker image with:

docker pull ankane/dexter

And run it with:

docker run -ti ankane/dexter <connection-options>

For databases on the host machine, use host.docker.internal as the hostname (on Linux, this requires Docker 20.04+ and --add-host=host.docker.internal:host-gateway).

Homebrew

With Homebrew, you can use:

brew install ankane/brew/dexter

Future Work

Here are some ideas

Upgrading

Run:

gem install pgdexter

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/dexter.git

Thanks

This software wouldn’t be possible without HypoPG, which allows you to create hypothetical indexes, and pg_query, which allows you to parse and fingerprint queries. A big thanks to Dalibo and Lukas Fittl respectively.

Research

This is known as the Index Selection Problem (ISP).

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development, run:

git clone https://github.com/ankane/dexter.git
cd dexter
bundle install
bundle exec rake install

To run tests, use:

createdb dexter_test
bundle exec rake test