Class: RequestLogAnalyzer::Source::LogParser

Inherits:
Base
  • Object
Includes:
Enumerable
Defined in:
lib/request_log_analyzer/source/log_parser.rb

Overview

The LogParser class reads log data from a given source and uses a file format definition to parse all relevant information about requests from the file. A FileFormat module should be provided that contains the definitions of the lines that occur in the log data.

The order in which lines occur is used to combine lines into a single request. If these lines are interleaved, requests cannot be combined properly. This can happen when different Mongrel processes write to the log file simultaneously. The parser detects this problem and emits warnings when it occurs. LogParser supports multiple parse strategies that deal with this problem in different ways.

Constant Summary

DEFAULT_PARSE_STRATEGY = 'assume-correct'

The default parse strategy that will be used to parse the input.

PARSE_STRATEGIES = ['cautious', 'assume-correct']

All available parse strategies.

Instance Attribute Summary

Attributes inherited from Base

#current_request, #file_format, #options, #parsed_lines, #parsed_requests, #skipped_lines, #skipped_requests

Instance Method Summary

Methods inherited from Base

#finalize, #prepare

Constructor Details

#initialize(format, options = {}) ⇒ LogParser

Initializes the log file parser instance. It applies the language-specific FileFormat module to this instance and uses the line definitions in this module to parse any input that it is given (see parse_io).

format

The current file format instance

options

A hash of options that are used by the parser



# File 'lib/request_log_analyzer/source/log_parser.rb', line 30

def initialize(format, options = {})
  super(format, options)
  @parsed_lines     = 0
  @parsed_requests  = 0
  @skipped_lines    = 0
  @skipped_requests = 0
  @current_request  = nil
  @current_source   = nil
  @current_file     = nil
  @current_lineno   = nil
  @source_files     = options[:source_files]
  @progress_handler = nil

  @options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
  raise "Unknown parse strategy" unless PARSE_STRATEGIES.include?(@options[:parse_strategy])
end
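
The option handling at the end of the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: resolve_parse_strategy is a hypothetical helper, and the two constants are copied from the constant summary of this class.

```ruby
# Constants copied from the class's constant summary.
DEFAULT_PARSE_STRATEGY = 'assume-correct'
PARSE_STRATEGIES = ['cautious', 'assume-correct']

# Hypothetical helper (not part of the library): fills in the default
# strategy and rejects unknown ones, mirroring the last two lines of
# #initialize.
def resolve_parse_strategy(options = {})
  options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
  unless PARSE_STRATEGIES.include?(options[:parse_strategy])
    raise "Unknown parse strategy"
  end
  options[:parse_strategy]
end

resolve_parse_strategy({})                             # falls back to the default
resolve_parse_strategy(:parse_strategy => 'cautious')  # accepted as given
```

Any other value, such as 'strict', would raise a RuntimeError with the message "Unknown parse strategy".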

Instance Attribute Details

#current_file ⇒ Object (readonly)

Returns the value of attribute current_file.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 22

def current_file
  @current_file
end

#current_lineno ⇒ Object (readonly)

Returns the value of attribute current_lineno.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 22

def current_lineno
  @current_lineno
end

#source_files ⇒ Object (readonly)

Returns the value of attribute source_files.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 22

def source_files
  @source_files
end

Instance Method Details

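#decompress_file?(filename) ⇒ String

Checks whether the filename has a recognized compressed-file extension (.tar.gz, .tgz, .gz, .bz2 or .zip). If it does, returns the command string used to decompress the file to stdout; otherwise returns an empty string. Despite the trailing question mark, the method returns a String rather than a Boolean, as the source below shows.

Returns:

  • (String)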


# File 'lib/request_log_analyzer/source/log_parser.rb', line 81

def decompress_file?(filename)
  nice_command = "nice -n 5"

  return "#{nice_command} gunzip -c -d #{filename}" if filename.match(/\.tar.gz$/) || filename.match(/\.tgz$/) || filename.match(/\.gz$/)
  return "#{nice_command} bunzip2 -c -d #{filename}" if filename.match(/\.bz2$/)
  return "#{nice_command} unzip -p #{filename}" if filename.match(/\.zip$/)

  return ""
end
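
The extension-to-command mapping can be sketched as a standalone function. This is an illustrative variant, not the library's method: decompress_command is a hypothetical name, it returns nil instead of an empty string when nothing matches, and it escapes the filename with Shellwords from Ruby's standard library, which the original method does not do.

```ruby
require 'shellwords'

# Hypothetical standalone variant of decompress_file? (not the library's code).
# Returns the shell command for a recognized extension, or nil otherwise.
# Assumption: escaping the filename is desirable; the original interpolates
# it into the command unescaped.
def decompress_command(filename, nice_command = 'nice -n 5')
  quoted = Shellwords.escape(filename)
  case filename
  when /\.gz\z/, /\.tgz\z/ then "#{nice_command} gunzip -c -d #{quoted}"
  when /\.bz2\z/           then "#{nice_command} bunzip2 -c -d #{quoted}"
  when /\.zip\z/           then "#{nice_command} unzip -p #{quoted}"
  end
end

decompress_command('production.log.gz')  # gunzip command
decompress_command('production.log')     # nil, the file is read directly
```

Returning nil lets a caller write `if (command = decompress_command(file))` instead of calling the method twice and testing for an empty string.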

#each_request(options = {}, &block) ⇒ Object Also known as: each

Reads the input, which can be a single file, a sequence of files or STDIN, and parses the lines specified in the FileFormat. These lines are combined into Request instances, which are yielded to the block. The actual parsing occurs in the parse_io method.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 51

def each_request(options = {}, &block) # :yields: :request, request

  case @source_files
  when IO
    if @source_files == $stdin
      puts "Parsing from the standard input. Press CTRL+C to finish." # FIXME: not here
    end
    parse_stream(@source_files, options, &block)
  when String
    parse_file(@source_files, options, &block)
  when Array
    parse_files(@source_files, options, &block)
  else
    raise "Unknown source provided"
  end
end
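
The dispatch on the type of the source can be sketched on its own. parse_method_for is a hypothetical helper that only reports which parse method the case statement above would select; it does not parse anything.

```ruby
require 'stringio'

# Hypothetical helper (not part of the library): names the method that
# each_request would call for a given source.
def parse_method_for(source)
  case source
  when IO     then :parse_stream
  when String then :parse_file
  when Array  then :parse_files
  else raise "Unknown source provided"
  end
end

parse_method_for('production.log')            # :parse_file
parse_method_for(['first.log', 'second.log']) # :parse_files

# StringIO is not a subclass of IO, so a StringIO source would end up in
# the else branch and raise.
StringIO.new("data").is_a?(IO)                # false
```

One consequence worth knowing when writing tests: an in-memory StringIO cannot be passed as the source itself, although it can be handed to parse_io directly, which only calls gets and pos on it.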

#parse_file(file, options = {}, &block) ⇒ Object

Parses a log file. Creates an IO stream for the provided file and sends it to parse_io for further handling. This method supports progress updates that can be used to display a progress bar.

If the log file is compressed, it is decompressed to stdout and read from there.

TODO: Check if IO.popen encounters problems with the given command line.

TODO: Fix progress bar that is broken for IO.popen, as it returns a single string.

file

The file that should be parsed.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 100

def parse_file(file, options = {}, &block)

  @current_source = File.expand_path(file)
  @source_changes_handler.call(:started, @current_source) if @source_changes_handler

  if decompress_file?(file).empty?

    @progress_handler = @dormant_progress_handler
    @progress_handler.call(:started, file) if @progress_handler

    File.open(file, 'r') { |f| parse_io(f, options, &block) }

    @progress_handler.call(:finished, file) if @progress_handler
    @progress_handler = nil
  else
    IO.popen(decompress_file?(file), 'r') { |f| parse_io(f, options, &block) }
  end

  @source_changes_handler.call(:finished, @current_source) if @source_changes_handler

  @current_source = nil

end
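
The order in which the handlers are called for an uncompressed file can be made visible with a small stand-in. parse_plain_file is a hypothetical function that mirrors only the handler calls of the uncompressed branch above; it counts lines instead of parsing them, and it takes the handlers as keyword arguments rather than reading instance variables.

```ruby
require 'tempfile'

# Hypothetical stand-in for the uncompressed branch of parse_file
# (not the library's code). Returns the number of lines read.
def parse_plain_file(file, source_changes: nil, progress: nil)
  source = File.expand_path(file)
  source_changes.call(:started, source) if source_changes
  progress.call(:started, file) if progress

  line_count = 0
  File.open(file, 'r') { |f| line_count += 1 while f.gets }

  progress.call(:finished, file) if progress
  source_changes.call(:finished, source) if source_changes
  line_count
end

events = []
Tempfile.create('sample_log') do |tmp|
  tmp.write("line one\nline two\nline three\n")
  tmp.flush
  parse_plain_file(
    tmp.path,
    source_changes: ->(event, _source) { events << [:source, event] },
    progress:       ->(event, _file)   { events << [:progress, event] }
  )
end
# events now holds the source and progress notifications in call order.
```

In the compressed branch of the real method, only the source change handler is called; no progress handler is activated for IO.popen.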

#parse_files(files, options = {}, &block) ⇒ Object

Parses a list of subsequent files of the same format, by calling parse_file for every file in the array.

files

The Array of files that should be parsed

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 75

def parse_files(files, options = {}, &block) # :yields: request
  files.each { |file| parse_file(file, options, &block) }
end

#parse_io(io, options = {}, &block) ⇒ Object

This method loops over each line of the input stream. It tries to parse each line as one of the lines defined by the current file format (see RequestLogAnalyzer::FileFormat). It then combines these parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.

  • RequestLogAnalyzer::LineDefinition#matches is called to test if a line matches a line definition of the file format.

  • update_current_request is used to combine parsed lines into requests using heuristics.

  • The method will yield progress updates if a progress handler is installed using progress=.

  • The method will yield parse warnings if a warning handler is installed using warning=.

io

The IO instance to use as source

options

A hash of options that can be used by the parser.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 144

def parse_io(io, options = {}, &block) # :yields: request
  @current_lineno = 1
  while line = io.gets
    @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0

    if request_data = file_format.parse_line(line) { |wt, message| warn(wt, message) }
      @parsed_lines += 1
      update_current_request(request_data.merge(:source => @current_source, :lineno => @current_lineno), &block)
    end

    @current_lineno += 1
  end

  warn(:unfinished_request_on_eof, "End of file reached, but last request was not completed!") unless @current_request.nil?
  @current_lineno = nil
end
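
The shape of this loop, including the progress update on every 255th line, can be reproduced with an in-memory stream. This is a sketch under stated assumptions: count_matching_lines is a hypothetical function, and a single regular expression stands in for the line definitions of a real file format.

```ruby
require 'stringio'

# Hypothetical stand-in for the parse_io loop (not the library's code).
# `pattern` plays the role of the file format's line definitions.
def count_matching_lines(io, pattern, progress_handler = nil)
  lineno = 1
  parsed = 0
  while (line = io.gets)
    # Report the current byte position on every 255th line, as parse_io does.
    progress_handler.call(:progress, io.pos) if progress_handler && lineno % 255 == 0
    parsed += 1 if line =~ pattern
    lineno += 1
  end
  parsed
end

updates = []
io = StringIO.new("Processing request\nnoise\n" * 300) # 600 lines in total
parsed = count_matching_lines(io, /\AProcessing/, ->(type, pos) { updates << [type, pos] })
# parsed counts the matching lines; updates has one entry for line 255
# and one for line 510.
```

Because the handler receives a byte position rather than a line count, a caller that knows the file size can turn each update into a percentage.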

#parse_stream(stream, options = {}, &block) ⇒ Object

Parses an IO stream. It will simply call parse_io. This function does not support progress updates because the length of a stream is not known.

stream

The IO stream that should be parsed.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 128

def parse_stream(stream, options = {}, &block)
  parse_io(stream, options, &block)
end

#progress=(proc) ⇒ Object

Assign a proc to this attribute to install a progress handler that is called while parsing.

proc

The proc that will be called to handle progress update messages



# File 'lib/request_log_analyzer/source/log_parser.rb', line 163

def progress=(proc)
  @dormant_progress_handler = proc
end

#source_changes=(proc) ⇒ Object

Assign a proc to this attribute to install a source change handler that is called while parsing.

proc

The proc that will be called to handle source changes



# File 'lib/request_log_analyzer/source/log_parser.rb', line 175

def source_changes=(proc)
  @source_changes_handler = proc
end

#warn(type, message) ⇒ Object

This method is called by the parser if it encounters any parsing problems. It will call the installed warning handler, if any.

By default, RequestLogAnalyzer::Controller will install a warning handler that will pass the warnings to each aggregator so they can do something useful with it.

type

The warning type (a Symbol)

message

A message explaining the warning



# File 'lib/request_log_analyzer/source/log_parser.rb', line 188

def warn(type, message)
  @warning_handler.call(type, message, @current_lineno) if @warning_handler
end
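
The interplay between warn and an installed handler can be tried with a stand-in class. WarningSource is hypothetical and copies only the two method bodies shown in this documentation; the real parser sets @current_lineno itself while reading input.

```ruby
# Hypothetical stand-in class mirroring warning= and warn
# (not the library's class).
class WarningSource
  # Writable here for demonstration only; the real attribute is read-only.
  attr_accessor :current_lineno

  def warning=(proc)
    @warning_handler = proc
  end

  def warn(type, message)
    @warning_handler.call(type, message, @current_lineno) if @warning_handler
  end
end

collected = []
source = WarningSource.new
source.warn(:ignored, "no handler installed yet") # silently dropped

source.warning = lambda { |type, message, lineno| collected << [type, message, lineno] }
source.current_lineno = 42
source.warn(:unfinished_request_on_eof, "End of file reached, but last request was not completed!")
# collected now holds one entry with the type, the message and line number 42.
```

The handler receives three arguments (type, message and the current line number), so a lambda installed with warning= must accept all three.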

#warning=(proc) ⇒ Object

Assign a proc to this attribute to install a warning handler that is called while parsing.

proc

The proc that will be called to handle parse warning messages



# File 'lib/request_log_analyzer/source/log_parser.rb', line 169

def warning=(proc)
  @warning_handler = proc
end