Benchmarks

I've done some benchmark tests to compare fully async and blocking em-pg drivers.

The goal of the test is simply to retrieve (~80000) rows from table with a lot of text data, in chunks, using parallel connections. The parallel method uses synchrony for simplicity.

  • single is (eventmachine-less) job for retrieving a whole data table in one simple query "select * from resources"
  • parallel chunk_row_count / concurrency] uses em-pg-client for retrieving result in chunks by chunk_row_count rows and using concurrency parallel connections
  • blocking chunk_row_count / concurrency is similiar to parallel except that it uses special patched version of library that uses blocking PGConnection methods

Environment

The machine used for test is Linux CentOS 2.6.18-194.32.1.el5xen #1 SMP with Quad Core Xeon X3360 @ 2.83GHz, 4GB RAM. Postgres version used: 9.0.3.

The results:

    >> benchmark 1000
                              user     system      total        real
    single:              80.970000   0.350000  81.320000 (205.592592)

    parallel 90000/1:    87.380000   0.710000  88.090000 (208.171564)
    parallel 5000/5:     84.250000   3.760000  88.010000 (141.031289)
    parallel 2000/10:    90.190000   4.970000  95.160000 (152.844950)
    parallel 1000/20:    97.070000   5.390000 102.460000 (212.358631)

    blocking 90000/1:    93.590000   0.610000  94.200000 (230.190776)
    blocking 5000/5:     79.930000   1.810000  81.740000 (223.342432)
    blocking 2000/10:    76.990000   2.820000  79.810000 (225.347169)
    blocking 1000/20:    78.790000   3.230000  82.020000 (225.949107)

As we can see the gain from using asynchronous pg client while using parallel queries is noticeable (up to ~30%).

The blocking client however doesn't gain much from parallel execution. This was expected because it freezes eventmachine until the whole dataset is consumed by the client.