ServerLogParser

ServerLogParser is a high-level Ruby library for parsing server access log files in the formats used by Apache, Nginx and others: Common Log Format and Combined Log Format, each with or without virtual hosts.

It is a fork of ApacheLogRegex, which was in turn a port of the Apache::LogRegex 1.4 Perl module, where much of the regex logic comes from.

Installation

gem install server_log_parser

Usage

Initialization

require 'server_log_parser'

parser = ServerLogParser::Parser.new(ServerLogParser::COMBINED_VIRTUAL_HOST)
# or:
# parser = ServerLogParser::Parser.new('%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"')

Parsing

File.foreach('/var/log/apache/access.log') do |line|
  parsed = parser.parse(line)
  # {
  #   '%h'  => '212.74.15.68',
  #   '%l'  => '-',
  #   '%u'  => '-',
  #   '%t'  => '[23/Jan/2004:11:36:20 +0000]',
  #   '%r'  => 'GET /images/previous.png HTTP/1.1',
  #   '%>s' => '200',
  #   '%b'  => '2607',
  #   '%{Referer}i'     => 'http://peterhi.dyndns.org/bandwidth/index.html',
  #   '%{User-Agent}i'  => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021202'
  # }
end

ServerLogParser::Parser#parse silently ignores lines it cannot parse. If you would rather be notified, ServerLogParser::Parser#parse! raises a ParseError exception instead.
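The lenient/strict pair described above follows the usual Ruby "bang method" convention. The sketch below is a minimal, self-contained illustration of that convention and is not the library's code: SketchParser, SketchParseError and the hard-coded Common Log Format regex are hypothetical names, and returning nil from the lenient variant is an assumption about what "silently ignores" means.

```ruby
# Hypothetical stand-in for the real parser; only the parse / parse! contract is shown.
class SketchParseError < StandardError; end

class SketchParser
  # Hard-coded pattern for: %h %l %u %t "%r" %>s %b
  CLF  = /\A(\S+) (\S+) (\S+) (\[[^\]]+\]) "([^"]*)" (\d{3}) (\S+)\z/
  KEYS = ['%h', '%l', '%u', '%t', '%r', '%>s', '%b'].freeze

  # Strict variant: raises when the line does not match the pattern.
  def parse!(line)
    match = CLF.match(line.chomp)
    raise SketchParseError, "unparsable line: #{line.inspect}" unless match

    KEYS.zip(match.captures).to_h
  end

  # Lenient variant: swallows the error and returns nil (assumed behaviour).
  def parse(line)
    parse!(line)
  rescue SketchParseError
    nil
  end
end

sketch = SketchParser.new
good = '212.74.15.68 - - [23/Jan/2004:11:36:20 +0000] "GET /images/previous.png HTTP/1.1" 200 2607'
parsed_good = sketch.parse(good)             # Hash of raw String values
parsed_bad  = sketch.parse('not a log line') # nil in this sketch
```

Use the lenient form when scanning large files where a few malformed lines are expected, and the strict form when a malformed line should stop processing.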

Handling

File.foreach('/var/log/apache/access.log') do |line|
  parsed = parser.handle(line)
  # {
  #   '%h'  => '212.74.15.68',
  #   '%l'  => nil,
  #   '%u'  => nil,
  #   '%t'  => DateTime.new(2004, 1, 23, 11, 36, 20, '+0'),
  #   '%r'  => {"method" => "GET", "resource" => "/images/previous.png", "protocol" => "HTTP/1.1"},
  #   '%>s' => 200,
  #   '%b'  => 2607,
  #   '%{Referer}i'     => 'http://peterhi.dyndns.org/bandwidth/index.html',
  #   '%{User-Agent}i'  => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021202'
  # }
end

Apache log files use - to mean that no data is present. These values are replaced with nil, like %l and %u above. The request line (%r) is split into a nested hash with the keys "method", "resource" and "protocol".

The following fields are stored as Integer: %B, %b, %k, %p, %{format}p, %P, %{format}P, %s, %>s, %I, %O.

The following fields are stored as Float: %D, %T.

The following field is stored as DateTime: %t. Note: %{format}t is currently stored as String.

The field %r is special, see above.

All other fields are stored as String.
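The conversion rules above can be pictured with a small, self-contained sketch. It is an illustration under stated assumptions, not the library's implementation: sketch_convert and the two constant lists are hypothetical names, the %{format}p and %{format}P variants are left out for brevity, and the strptime pattern assumes the default Apache timestamp layout shown in the examples.

```ruby
require 'date'

# Field lists taken from the rules above (parameterised %{format}p / %{format}P omitted).
INTEGER_FIELDS = %w[%B %b %k %p %P %s %>s %I %O].freeze
FLOAT_FIELDS   = %w[%D %T].freeze

# Hypothetical helper: converts one raw String value according to its field name.
def sketch_convert(name, value)
  return nil if value == '-' # "-" means no data

  case name
  when *INTEGER_FIELDS then Integer(value, 10)
  when *FLOAT_FIELDS   then Float(value)
  when '%t'            then DateTime.strptime(value, '[%d/%b/%Y:%H:%M:%S %z]')
  when '%r'
    # Request line: "<method> <resource> <protocol>"
    verb, resource, protocol = value.split(' ', 3)
    { 'method' => verb, 'resource' => resource, 'protocol' => protocol }
  else
    value # everything else stays a String
  end
end

raw = {
  '%h'  => '212.74.15.68',
  '%l'  => '-',
  '%t'  => '[23/Jan/2004:11:36:20 +0000]',
  '%r'  => 'GET /images/previous.png HTTP/1.1',
  '%>s' => '200',
  '%b'  => '2607'
}
converted = raw.map { |name, value| [name, sketch_convert(name, value)] }.to_h
```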

ServerLogParser::Parser#handle silently ignores lines it cannot parse. If you would rather be notified, ServerLogParser::Parser#handle! raises a ParseError exception instead.

Log Formats

The log format is specified using one of the following (rather verbose) constants:

| Name | Constant | Apache Format |
| Common Log Format | ServerLogParser::COMMON_LOG_FORMAT | %h %l %u %t \"%r\" %>s %b |
| Common Log Format with virtual hosts | ServerLogParser::COMMON_LOG_FORMAT_VIRTUAL_HOST | %v %h %l %u %t \"%r\" %>s %b |
| Combined | ServerLogParser::COMBINED | %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" |
| Combined with virtual hosts | ServerLogParser::COMBINED_VIRTUAL_HOST | %v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" |
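To give a feel for how an Apache format string like the ones listed above can drive regex-based parsing, here is a deliberately simplified sketch. It is not the library's implementation: sketch_compile is a hypothetical helper, and it assumes that fields are separated by single spaces, that quoted fields are written as \"...\" in the format, and that %t is always bracketed.

```ruby
# Simplified sketch: turn an Apache format string into [regex, field_names].
def sketch_compile(format)
  names = []
  parts = format.split(' ').map do |token|
    # In a single-quoted Ruby string, '\"' is two characters: a backslash and a quote.
    quoted = token.start_with?('\"') && token.end_with?('\"')
    name = quoted ? token[2..-3] : token
    names << name

    if quoted
      '"([^"]*)"'    # quoted field: anything up to the closing quote
    elsif name == '%t'
      '(\[[^\]]+\])' # timestamp: bracketed and contains a space
    else
      '(\S*)'        # plain field: a run of non-space characters
    end
  end
  [Regexp.new('\A' + parts.join(' ') + '\z'), names]
end

format = '%h %l %u %t \"%r\" %>s %b'
regex, names = sketch_compile(format)

line   = '212.74.15.68 - - [23/Jan/2004:11:36:20 +0000] "GET /images/previous.png HTTP/1.1" 200 2607'
fields = names.zip(regex.match(line).captures).to_h
```

Keying the result by the format directives themselves is what lets the same code serve every row of the table, as well as custom formats.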

Author

Alexander Kurakin

Feedback and contributions

https://github.com/kuraga/server_log_parser/issues

License

MIT