PostgreSQL protocol gateway for Presto
Prestogres is a gateway server that allows clients to use PostgreSQL protocol to run queries on Presto.
You can use any PostgreSQL clients (see also Limitation section):
psqlcommand- PostgreSQL ODBC driver
- PostgreSQL JDBC driver
- other PostgreSQL client libraries
Prestogres also offers password-based authentication and SSL.
Documents
How it works?
Prestogres uses modified pgpool-II to rewrite queries before sending them to PostgreSQL. pgpool-II is originally an open-source middleware to provide connection pool and other features to PostgreSQL. For regular SQL query, patched pgpool-II wraps it in run_presto_astemp_table(..., 'SELECT ... FROM ...') function. This function sends the query to Presto:

If the query selects records from system catalog (e.g. \\d command by psql), patched pgpool-II wraps the query in run_system_catalog_as_temp_table function. It gets table list from Presto, and actually runs CREATE TABLE to create records in system catalog on PostgreSQL. Then the original query runs as usual on PostgreSQL:

In fact there're some other tricks. See pgsql/prestogres.py for the real behavior.
Limitation
- Extended query is not supported (PostgreSQL Frontend/Backend Protocol)
- ODBC driver needs to set:
- UseServerSidePrepare=0 (Server side prepare: no) property
- UseDeclareFetch=0 (Use Declare/Fetch: no) property
- Unicode mode
- JDBC driver needs to set:
- protocolVersion=2 property
- ODBC driver needs to set:
- Cursor (DECLARE/FETCH) is not supported
Installation
1. Install PostgreSQL >= 9.3
You need to install PostgreSQL separately. Following commands install PostgreSQL 9.3 from postgresql.org:
Ubuntu/Debian:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install postgresql-9.3 postgresql-server-dev-9.3 postgresql-plpython-9.3
sudo apt-get install gcc make libssl-dev libpcre3-dev
sudo apt-get install ruby ruby-dev
RedHat/CentOS:
sudo yum install http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-redhat93-9.3-1.noarch.rpm
sudo yum install postgresql93-server postgresql93-contrib postgresql93-devel
sudo yum install gcc make openssl-devel pcre-devel
sudo yum install ruby ruby-devel
Mac OS X:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
brew install postgresql
2. Install Prestogres
$ sudo gem install prestogres --no-ri --no-rdoc
Running servers
You need to run 2 server programs: pgpool-II and PostgreSQL.
You can use prestogres command to setup & run them as following:
# 1. Create a data directory:
$ prestogres -D pgdata setup
# 2. Configure presto_server and presto_catalog parameters at least in pgpool.conf file:
$ vi ./pgdata/pgpool/pgpool.conf
# 3. Run patched pgpool-II:
$ prestogres -D pgdata pgpool
# 4. Run PostgreSQL:
$ prestogres -D pgdata pg_ctl start
# 5. Finally, you can connect to pgpool-II using `psql` command:
$ psql -h localhost -p 9900 -U pg postgres
> SELECT * FROM sys.node;
If configuration is correct, you can run SELECT * FROM sys.node; query. Otherwise, see log files in ./pgdata/log/ directory.
Setting shmem max parameter
Probably above command fails first time! Error message is:
FATAL: could not create shared memory segment: Cannot allocate memory
DETAIL: Failed system call was shmget(key=6432001, size=3809280, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded
available memory or swap space, or exceeded your kernel's SHMALL parameter. You can either
reduce the request size or reconfigure the kernel with larger SHMALL. To reduce the request
size (currently 3809280 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing
shared_buffers or max_connections.
You need to set 2 kernel parameters to run PostgreSQL.
Linux:
sudo sudo -c "echo kernel.shmmax = 17179869184 >> /etc/sysctl.conf"
sudo sudo -c "echo kernel.shmall = 4194304 >> /etc/sysctl.conf"
sudo sysctl -p /etc/sysctl.conf
Mac OS X:
$ sudo sysctl -w kern.sysv.shmmax=1073741824
$ sudo sysctl -w kern.sysv.shmall=1073741824
Configuration
pgpool.conf file
Please read pgpool-II documentation for most of parameters. Following parameters are unique to Prestogres:
- presto_server: Default address:port of Presto server.
- presto_catalog: Default catalog (connector) name of Presto such as
hive-cdh4,hive-hadoop1, etc. - presto_external_auth_prog: Default path to an external authentication program used by
prestogres_externalauthentication moethd. See following Authentication section for details.
You can overwrite these parameters for each connecting users. See also following pool_hba.conf section.
pool_hba.conf file
By default configuration, Prestogres accepts all connections from localhost without password and rejects any other connections. You can change this behavior by updating <data_dir>/pgpool2/pool_hba.conf file.
See sample pool_hba.conf file for details. Basic syntax is:
# TYPE DATABASE USER CIDR-ADDRESS METHOD OPTIONS
host postgres pg 127.0.0.1/32 trust
host postgres pg 127.0.0.1/32,192.168.0.0/16 prestogres_md5
host altdb pg 0.0.0.0/0 prestogres_md5 server:localhost:8190,pg_database:postgres
host all all 0.0.0.0/0 prestogres_external auth_prog:/opt/prestogres/auth.py,pg_database:postgres,pg_user:pg
See also Creating database section.
prestogres_md5 method
This authentication method uses a password file <data_dir>/pgpool2/pool_passwd to authenticate an user. You can use prestogres passwd command to add an user to this file:
$ prestogres -D pgdata passwd myuser
password: (enter password here)
In pool_hba.conf file, you can set following options to OPTIONS field:
- server: Address:port of Presto server, which overwrites
presto_serversparameter in pgpool.conf. - catalog: Catalog (connector) name of Presto, which overwrites
presto_catalogparameter in pgpool.conf. - schema: Default schema name of Presto. By default, Prestogres uses the same name with the database name used to login to pgpool-II. Following
pg_databaseparameter doesn't overwrite affect this parameter. - user: User name to run queries on Presto. By default, Prestogres uses the same user name used to login to pgpool-II. Following
pg_userparameter doesn't overwrite affect this parameter. - pg_database: Overwrite database to connect to PostgreSQL.
- pg_user: Overwrite user name to connect to PostgreSQL.
prestogres_external method
This authentication method uses an external file to authentication an user.
- Note: This method is still experimental (because performance is slow). Interface could be changed.
- Note: This method requires clients to send password in clear text. It's recommended to enable SSL in pgpool.conf.
You need to set presto_external_auth_prog parameter in pgpool.conf or auth_prog option in pool_hba.conf. Prestogres runs the program every time when an user connects. The program receives following data through STDIN:
user:USER_NAME
password:PASSWORD
database:DATABASE
address:IPADDR
If you want to allow this connection, the program optionally prints parameters as following to STDOUT, and exists with status code 0:
server:PRESTO_SERVER_ADDRESS
catalog:PRESTO_CATALOG_NAME
schema:PRESTO_SCHEMA_NAME
user:USER_NAME
pg_database:DATABASE
pg_user:USER_NAME
See pgool.conf file section for available parameters.
If you want to reject this connection, the program exists with non-0 status code.
Creating database on PostgreSQL
Prestogres setups a database named postgres on PostgreSQL by default. But you may want to create other databases to take advantage of above authentication mechanism.
To create new databases:
Create a new PostgreSQL database using
createdbcommand. Port number is 6432 because you login to PostgreSQL directly:$ createdb -h localhost -p 6432 -U pg newdbInitialize the database using statements shown by
prestogres show_init_sqlcommand:$ prestogres show_init_sql | psql -h localhost -p 6432 -U pg newdb
prestogres command
Usage of prestogres command:
usage: prestogres -D <data dir> <command>
commands:
setup setup <data dir>
pgpool start pgpool as a daemon process
pgpool stop stop pgpool daemon process
pgpool -n start pgpool as a foreground process
pg_ctl start start postgres server as a daemon process
pg_ctl stop stop postgres server daemon process
postgres start postgres server as a foreground process
passwd <USER NAME> add new md5 password entry for an user
show_init_sql display statements to initialize a new PostgreSQL database
Development
To install git HEAD, use following commands to build:
# 1. clone prestogres repository:
$ git clone https://github.com/treasure-data/prestogres.git
$ cd prestogres
# 2. install bundler gem and run it:
$ gem install bundler
$ bundle
# if you don't have gem command, you need to install Ruby first
# 3. create a gem package:
$ bundle exec rake
# 4. install the created package:
$ gem install --no-ri --no-rdoc pkg/prestogres-*.gem
# if this command failed, you may need to install toolchain (gcc, etc.) to build pgpool-II
Prestogres is licensed under Apache License, Version 2.0.
Copyright (C) 2014 Sadayuki Furuhashi
