Synopsis

Download, merge and convert Amazon S3 bucket log files for a specified date or date range.

  • Download S3 bucket log files produced by Amazon S3. Log files are downloaded once and cached locally.

  • Merge those log files together into a single logfile per bucket (sorting on ascending timestamp)

  • Convert the log file from Amazon Server Access Log Format to Apache Common Log Format

Ralf is an acronym for Retrieve Amazon Log Files. Ralf does the following things:

Usage

Usage: ./bin/ralf [options]

Download and merge Amazon S3 bucket log files for a specified date range and
output a Common Log File. Ralf is an acronym for Retrieve Amazon Log Files.

Ralf downloads bucket log files to local cache directories, merges the Amazon Log
Files and converts them to Common Log Format.

Example: ./bin/ralf --range month --now yesterday --output-file '/var/log/amazon/:year/:month/:bucket.log'

AWS credentials (Access Key Id and Secret Access Key) are required to access
S3 buckets. For security reasons these credentials can only be specified in a
configuration file (see --config-file) or through the environment using the
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Log selection options:
    -l, --[no-]list                  List buckets that have logging enabled. Does not process log files.
    -b, --buckets x,y,z              Buckets for which to process log files. Defaults to all log-enabled buckets.
    -r, --range BEGIN[,END]          Date or date range to process. Defaults to 'today'.
    -t, --now TIME                   Date to use as base for range. Defaults to 'today'.

    You can use Chronic expressions for '--range' and '--now'. See http://chronic.rubyforge.org.

    Example: --range 'last week'
      All days of previous week.
    Example: --range 'this week'
      Beginning of this week (sunday) upto and including today.
    Example: --range '2010-01-01','2010-04-30'
      First four months of this year.
    Example: --range 'this month' --now yesterday
      This will select log files from the beginning of yesterday's month upto and including yesterday.

    The --buckets, --range and --now options are optional. If unspecified, (incomplete)
    logging for today will be processed for all buckets (that have logging enabled).
    This is equivalent to specifying "--range 'today'" and "--now 'today'".

Output options:
    -o, --output-file FORMAT         Output file, e.g. '/var/log/s3/:year/:month/:bucket.log'. Required.

    The --output-file format uses the last day of the range specified by (--range)
    to determine the filename. E.g. when the format contains ':year/:month/:day' and
    the range is 2010-01-15..2010-02-14, then the output file will be '2010/02/14'.

    -x, --cache-dir FORMAT           Directory name(s) in which to cache downloaded log files. Optional.

    The --cache-dir format expands to as many directory names as needed for the 
    range specified by --range. E.g. "/var/run/s3_cache/:year/:month/:day/:bucket"
    expands to 31 directories for range 2010-01-01..2010-01-31.

    Defaults to '~/.ralf/:bucket' or '/var/log/ralf/:bucket' (when running as root).

Config file options:
    -c, --config-file [FILE]         Path to file with configuration settings (in YAML format).

    Configuration settings are read from the (-c) specified configuration file
    or from ~/.ralf.conf or from /etc/ralf.conf (when running as root).
    Command-line options override settings read from the configuration file.

    The configuration file must be in YAML format. Each command-line options has an
    equivalent setting in a configuration file replacing dash (-) by underscore(_).

    The Amazon Access Key Id and Secret Access Key can only be specified in the 

    Example:
      output_file:           /var/log/amazon_s3/:year:month/:bucket.log
      aws_access_key_id:     my_access_key_id
      aws_secret_access_key: my_secret_access_key

    To only use command-line options simply specify -c or --config-file without
    an argument.

Debug options:
    -d, --[no-]debug [aws]           Show debug messages.

Common options:
    -h, --help                       Show this message.
    -v, --version                    Show version.

Library

You can also use Ralf from within your own ruby code. Each command-line option has a corresponding option in the options has passed to Ralf.new and Ralf.run. Replace a dash (-) by an underscore (_) in the names:

options = { :output_file => '/var/log/s3/:bucket.log' }

require 'rubygems'
require 'ralf'
r = Ralf.new({ :config_file => '/Users/me/ralf.yaml' }.merge(options))
r.run

Or run it in one go:

Ralf.run({ :config_file => '/Users/me/ralf.yaml' }.merge(options))

Requirements

  • Credentials for an Amazon S3 account

  • Enable logging on S3 You can use Cyberduck for example.

Gem dependencies

Ralf depends on the following gems which will automatically installed when you install the ralf gem.

  • chronic

  • right_aws

  • logmerge

Authors

Authors: Leon Berenschot and K.J. Wierenga

Contributers: Victor Castell

This program is used for kerkdienstgemist.nl Amazon S3 log file processing.