Benchmarks
I ran some benchmarks to compare the fully asynchronous and the blocking em-pg drivers.
The test simply retrieves about 80000 rows from a table containing a lot of text data. It fetches them in chunks over parallel connections. For simplicity, the parallel method uses synchrony.
single: an EventMachine-less job that retrieves the whole data table in one simple query, "select * from resources".
parallel chunk_row_count/concurrency: uses em-pg-client to retrieve the result in chunks of chunk_row_count rows, over concurrency parallel connections.
blocking chunk_row_count/concurrency: similar to parallel, except that it uses a specially patched version of the library that calls the blocking PGConnection methods.
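The following minimal Ruby sketch makes the chunk_row_count/concurrency parameters concrete. It shows how roughly 80000 rows could be split into chunks and dealt out to the connections. It only builds a query plan and does not talk to PostgreSQL. The LIMIT/OFFSET query shape, the `id` ordering column and the `chunk_plan` helper are all hypothetical, so the benchmark's actual queries may differ.

```ruby
# Hypothetical helper: split total_rows rows into chunks of chunk_row_count
# rows and deal the chunk queries round-robin to `concurrency` connections.
# ASSUMPTION: chunks are fetched with LIMIT/OFFSET ordered by a hypothetical
# `id` column; this is not necessarily what the benchmark really ran.
def chunk_plan(total_rows, chunk_row_count, concurrency)
  offsets = (0...total_rows).step(chunk_row_count).to_a
  queries = offsets.map do |offset|
    "SELECT * FROM resources ORDER BY id LIMIT #{chunk_row_count} OFFSET #{offset}"
  end

  # Connection i receives queries i, i + concurrency, i + 2 * concurrency, ...
  plan = Array.new(concurrency) { [] }
  queries.each_with_index { |sql, i| plan[i % concurrency] << sql }
  plan
end

# Assuming exactly 80000 rows, as in the "parallel 5000/5" run.
plan = chunk_plan(80_000, 5000, 5)
plan.each_with_index do |queries, conn|
  puts "connection #{conn}: #{queries.size} queries"
end
```

For 5000/5 this yields 16 chunks spread over 5 connections (4, 3, 3, 3 and 3 queries). For 90000/1, a single chunk covers every row. That is consistent with parallel 90000/1 taking about as long as single.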
Environment
The test machine ran CentOS Linux (kernel 2.6.18-194.32.1.el5xen #1 SMP) on a quad-core Xeon X3360 @ 2.83GHz with 4GB of RAM. PostgreSQL version: 9.0.3.
The results are Ruby Benchmark output, with times in seconds; "real" is the wall-clock time:
>> benchmark 1000
                         user     system      total         real
single:             80.970000   0.350000  81.320000 (205.592592)
parallel 90000/1:   87.380000   0.710000  88.090000 (208.171564)
parallel 5000/5:    84.250000   3.760000  88.010000 (141.031289)
parallel 2000/10:   90.190000   4.970000  95.160000 (152.844950)
parallel 1000/20:   97.070000   5.390000 102.460000 (212.358631)
blocking 90000/1:   93.590000   0.610000  94.200000 (230.190776)
blocking 5000/5:    79.930000   1.810000  81.740000 (223.342432)
blocking 2000/10:   76.990000   2.820000  79.810000 (225.347169)
blocking 1000/20:   78.790000   3.230000  82.020000 (225.949107)
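The relative gains can be computed from the real (wall-clock) column. This small Ruby script is a hypothetical helper, not part of the benchmark suite. It prints the percentage of wall-clock time each run saves compared to single.

```ruby
# Wall-clock ("real") seconds copied from the benchmark output above.
REAL = {
  "single"           => 205.592592,
  "parallel 90000/1" => 208.171564,
  "parallel 5000/5"  => 141.031289,
  "parallel 2000/10" => 152.844950,
  "parallel 1000/20" => 212.358631,
  "blocking 90000/1" => 230.190776,
  "blocking 5000/5"  => 223.342432,
  "blocking 2000/10" => 225.347169,
  "blocking 1000/20" => 225.949107,
}.freeze

# Percentage of wall-clock time saved relative to "single";
# a negative value means the run was slower than "single".
def saving_vs_single(label)
  base = REAL.fetch("single")
  (base - REAL.fetch(label)) / base * 100.0
end

REAL.each_key do |label|
  next if label == "single"
  printf("%-18s %6.1f%%\n", label, saving_vs_single(label))
end
```

With these figures, parallel 5000/5 saves about 31% and parallel 2000/10 about 26%. Parallel 1000/20 and every blocking run come out slower than single.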
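As we can see, using the asynchronous pg client with parallel queries gives a noticeable gain. The best run, parallel 5000/5, took about 141 s of wall-clock time versus about 206 s for single, roughly 30% less. With 20 connections, however, the gain disappears (about 212 s).
The blocking client, on the other hand, gains nothing from parallel execution: every blocking run took between 223 and 230 s.
This was expected, because the blocking client freezes EventMachine until the client has consumed the whole dataset.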
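A toy Ruby model can illustrate why blocking calls cannot benefit from extra connections. It uses neither EventMachine nor em-pg-client. Each query is simulated by a sleep that stands in for time spent waiting on the server. The blocking version waits for the queries one at a time, while the overlapped version uses threads so all the waits happen at once. The model deliberately ignores the CPU time spent decoding rows, which is presumably why the measured gain is much smaller than the toy's.

```ruby
# Toy model only: NOT EventMachine and NOT em-pg-client.
# Each "query" is a sleep standing in for time spent waiting on the server.

# Blocking style: one thread waits on each query in turn, so nothing else
# can make progress until the current query has finished.
def run_blocking(queries, io_wait)
  queries.times { sleep(io_wait) }
end

# Overlapped style: all waits are in flight at once. Threads are used here
# only because sleep lets other Ruby threads run; an EventMachine-based
# client instead waits for socket readiness inside the reactor.
def run_overlapped(queries, io_wait)
  Array.new(queries) { Thread.new { sleep(io_wait) } }.each(&:join)
end

# Measure wall-clock time of the given block with a monotonic clock.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

blocking   = elapsed { run_blocking(5, 0.2) }
overlapped = elapsed { run_overlapped(5, 0.2) }
printf("blocking: %.2fs  overlapped: %.2fs\n", blocking, overlapped)
```

The blocking run takes at least five times the per-query wait, while the overlapped run takes roughly one.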