nosqoop4u

A sqoop-like jruby/jdbc query application that does not run via map/reduce. It supports direct output to HDFS and unix filesystems as well as STDOUT.

Requires jruby 1.6+.

# why write this? what’s wrong with sqoop?

Nothing is wrong with sqoop. It just doesn’t do quite what I need, and it’s not straightforward to debug when something goes wrong. In addition, my Hadoop cluster does not route beyond an access layer, which means I would need to create individual db-specific firewall rules in order to use sqoop.

Also, I wanted a nice opportunity to play with jruby’s java interop, and here I get to use both the JDBC and Hadoop HDFS APIs. On a side note, I’m blown away by jruby 1.6.2: stellar performance and an elegant, seamless integration with java.

# installation

jgem install --no-wrapper nosqoop4u

Or simply copy nosqoop4u.rb and nosqoop4u into your path.

# usage

usage: nosqoop4u options

-o, --output      # output file (hdfs://, file://, - for stdout)
-c, --connect url # jdbc connection url (env NS4U_URL)
-u, --user        # db username         (env NS4U_USER)
-p, --pass        # db password         (env NS4U_PASS)
-d, --driver      # JDBC driver class   (env NS4U_DRIVER)
-e, --query       # sql query to run
-F, --delim       # delimiter (default: ^A)
-f, --fetch       # fetch size
-v, --version
-h, --help
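
As a rough illustration of how the options above fit together, here is a minimal Ruby sketch that builds a command line from the documented flags and splits the resulting rows on the default ^A (0x01) delimiter. The helper names `nosqoop4u_argv` and `split_row`, the connection settings, and the query are all hypothetical. The sketch also assumes one row per output line, and the commented-out `IO.popen` call assumes Ruby 1.9+ semantics.

```ruby
# Ctrl-A (0x01), the default delimiter listed under --delim.
DEFAULT_DELIM = "\x01"

# Build the argument list for a nosqoop4u run using only the documented
# -e (query) and -o (output) flags. "-" sends the output to stdout.
def nosqoop4u_argv(query, output = "-")
  ["nosqoop4u", "-e", query, "-o", output]
end

# Split one output line into its fields (assumes one row per line).
# The -1 limit keeps trailing empty fields instead of dropping them.
def split_row(line, delim = DEFAULT_DELIM)
  line.chomp.split(delim, -1)
end

# Hypothetical connection settings, supplied through the documented
# NS4U_* environment variables rather than on the command line.
settings = {
  "NS4U_URL"    => "jdbc:mysql://db.example.com/sales",
  "NS4U_USER"   => "report",
  "NS4U_PASS"   => "secret",
  "NS4U_DRIVER" => "com.mysql.jdbc.Driver"
}

# Uncomment to run against a real database:
# ENV.update(settings)
# IO.popen(nosqoop4u_argv("SELECT id, name FROM users")) do |io|
#   io.each_line { |line| p split_row(line) }
# end
```

Passing the credentials through the environment keeps the password out of the process list. To write to HDFS or a local file instead, pass an `hdfs://` or `file://` URL as the second argument.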

Please let me know if you find this software useful!

–frank