REC Examples

The best way to understand REC is to see how rules are written.

The early examples were inspired by Risto Vaarandi’s brilliant SEC (simple-evcorr.sourceforge.net/), so they employ similar names for easy comparison.

Single Threshold

We are monitoring events where a user has had 3 incorrect password attempts. If we see that happen 3 times (threshold) within a minute (lifespan), alert the administrator.

# single threshold rule
Rule.new(10034, {
  :pattern => /\w+ sudo\:  (\w+) \: 3 incorrect password attempts/,
  :details => ["userid"],
  :message => "Failed sudo password for user %userid$s",
  :lifespan => 60,
  :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
  :threshold => 3
}) { |state|
  if state.count == state.threshold
    Notify.urgent(state.generate_alert())
    state.release()
  end
}

When we see the first event, a state is created with the title “Failed sudo password for user richard”. The second event has no effect beyond incrementing the count, which happens automatically. When we see the third event, an alert message is generated and logged, and the generated message is then sent to the administrator via IM. The same action can be written in separate steps:

}) { |state|
  if state.count == state.threshold
    message = state.generate_alert()  # writes out a new log entry, and returns it
    Notify.urgent(message)            # sends the message to the administrator
    state.release()                   # forget all about this state
  end
}

Finally, the state is released (we forget all about it).
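To make the counting concrete, here is a minimal, self-contained Ruby sketch that models the single-threshold rule outside REC. The ThresholdTracker class, its observe method and the sample log line are hypothetical; the sketch assumes a state is keyed by its interpolated :message title and lives for :lifespan seconds from the first event, and it simulates time with plain numbers instead of a clock.

```ruby
# Hypothetical model of the single-threshold rule; not REC's implementation.
class ThresholdTracker
  Entry = Struct.new(:count, :created, :expires)

  def initialize(pattern:, threshold:, lifespan:)
    @pattern = pattern
    @threshold = threshold
    @lifespan = lifespan
    @states = {} # title => Entry
  end

  # Feed one log line seen at time `now` (in seconds).
  # Returns the alert text when the threshold is reached, otherwise nil.
  def observe(line, now)
    match = @pattern.match(line)
    return nil unless match

    userid = match[1]
    title = "Failed sudo password for user #{userid}"
    @states.delete_if { |_, entry| entry.expires <= now } # drop expired states
    entry = (@states[title] ||= Entry.new(0, now, now + @lifespan))
    entry.count += 1
    return nil unless entry.count == @threshold

    @states.delete(title) # release: a later event starts counting from 1 again
    format("'Too much sudo activity' userid=%s attempts=%d dur=%0.3fs",
           userid, entry.count, now - entry.created)
  end
end

# Made-up sample line in sudo's log format.
line = "Jan 10 12:00:01 earth sudo:  richard : 3 incorrect password attempts ; TTY=pts/0"
tracker = ThresholdTracker.new(
  pattern: /\w+ sudo\:  (\w+) \: 3 incorrect password attempts/,
  threshold: 3, lifespan: 60
)
p [0, 20, 45].map { |t| tracker.observe(line, t) }
```

If the third event instead arrives after the 60-second lifespan (say at times 0, 30 and 70), the old state has already been dropped and counting restarts, so no alert is produced.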

If there is a fourth event, it creates a new state of the same kind, which starts counting again. Suppose we want to avoid that, and instead keep ignoring further events in a sliding window until the user has given sudo a 3-minute rest. The action can be modified like this:

}) { |state|
  Notify.urgent(state.generate_alert()) if state.count == state.threshold
  # keep on pushing expiry out to 3 minutes after the last event
  state.extend_for(180) if state.count >= state.threshold
}

Suppose we want to check for 3 events within 60 seconds, and then ignore further events for a fixed 5 minutes.

}) { |state|
  if state.count == state.threshold
    Notify.urgent(state.generate_alert())
    state.extend_for(300)  # expire exactly 5 minutes after the 3rd event
  end
}
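The sliding and fixed variants differ only in when the expiry time is moved. The following Ruby sketch replays event timestamps against one state to show the effect. It assumes, as the comments in the rules describe, that extend_for(n) sets the expiry to n seconds after the current event; the replay function and its :sliding and :fixed policies are hypothetical names, not REC API.

```ruby
# Hypothetical replay of a single state's lifetime; times are in seconds.
# Returns [times at which an alert fired, time at which the state expires].
def replay(times, threshold:, lifespan:, policy:, rest:)
  count = 0
  expires = nil
  alerts = []
  times.each do |t|
    break if expires && t >= expires # state gone; a new one would start here
    expires ||= t + lifespan         # the first event creates the state
    count += 1
    alerts << t if count == threshold
    case policy
    when :sliding # extend on every event from the threshold onwards
      expires = t + rest if count >= threshold
    when :fixed   # extend once, on the threshold event only
      expires = t + rest if count == threshold
    end
  end
  [alerts, expires]
end

events = [0, 10, 20, 100, 250]
p replay(events, threshold: 3, lifespan: 60, policy: :sliding, rest: 180)
p replay(events, threshold: 3, lifespan: 60, policy: :fixed, rest: 300)
```

Both policies alert once, at time 20. With the sliding policy each later event pushes the expiry further out (to 430 here), whereas the fixed policy stops suppressing at 320 regardless of later events.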

Adding a final block

If we want one message when the user first has trouble, and another after they have stopped trying, the format is a little different. The block given in the previous examples is stored in the params hash as :action.

Instead of passing a block, the action may be specified directly as the :action member of the params hash. A :final block can only be specified this way.

Rule.new(10034, {
  :pattern => /^\s+\w+\s+sudo\[\d+\]\:\s+(\w+) \:/,
  :details => ["userid"],
  :message => "sudo activity for user %userid$s",
  :threshold => 3,
  :lifespan => 60,
  :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
  :expiry => "'Gave sudo a rest' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
  :action => Proc.new { |state|
    if state.count == state.threshold
      Notify.urgent(state.generate(:alert))
      state.release()
    end
  },
  :final => Proc.new { |state|
    Notify.normal(state.generate(:expiry))
  }
})

When the state is about to expire, its :final block will be called. In this case, it generates a log entry using the :expiry message template, and sends the message to the administrator via normal (email) delivery.
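Ruby passes at most one block to a method call, which is presumably why a rule that needs both an action and a final block has to put them in the params hash as Procs. The sketch below models that arrangement. The MiniState class, its event and expire methods and the log strings are hypothetical, and expire simply stands in for whatever REC does when a state's lifespan runs out.

```ruby
# Hypothetical minimal state that calls params[:action] on every event and
# params[:final] once, when it expires. Not REC's State class.
class MiniState
  attr_reader :count, :params

  def initialize(params)
    @params = params
    @count = 0
  end

  def event
    @count += 1
    @params[:action]&.call(self)
  end

  def expire
    @params[:final]&.call(self)
  end
end

log = []
params = {
  threshold: 3,
  action: Proc.new { |s| log << "alert at #{s.count}" if s.count == s.params[:threshold] },
  final: Proc.new { |s| log << "gave it a rest after #{s.count}" }
}
state = MiniState.new(params)
4.times { state.event }
state.expire
p log
```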

Event compression

Compression converts a stream of events into fewer events, preferably just one. In this example, we report when a Skype conversation starts and then suppress all further ‘noise’ (messages that add no real information) for about 8 minutes (479 seconds).

# suppression rule
Rule.new(10035, {
  :pattern => /^\s\w+\sFirewall\[\d+\]\:\sSkype is listening from 0\.0\.0\.0:(\d+)/,
  :details => ["port"],
  :message => "Skype conversation started on port %port$d",
  :alert => "Skype running on port %port$d",
  :lifespan => 479
}) { |state|
  state.generate_first_only(:alert)
}

The generate_first_only method creates a new event using the :alert template only if the state’s count is 1, so it reports the first event and ignores all subsequent events for as long as the state lives.

By default, generate() and generate_first_only() use the :alert template. If no :alert was provided, the :message will be used instead. In this example, we could have omitted the argument:

}) { |state|
  state.generate_first_only()
}
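The compression behaviour, including the fallback from :alert to :message, can be modelled in a few lines of plain Ruby. This is a hypothetical sketch rather than REC code: compress takes [time, port] pairs instead of raw log lines, and it writes the templates with Ruby’s own %<port>d named references in place of REC’s %port$d.

```ruby
# Hypothetical first-only compression: one output per state title while the
# state lives; :alert is used if present, otherwise :message.
def compress(events, params)
  template = params[:alert] || params[:message]
  expiry = {} # state title => time at which that state expires
  output = []
  events.each do |time, port|
    title = format(params[:message], port: port)
    expiry.delete(title) if expiry.key?(title) && time >= expiry[title]
    next if expiry.key?(title)                 # count > 1: suppress the noise
    expiry[title] = time + params[:lifespan]   # first event creates the state
    output << format(template, port: port)
  end
  output
end

params = {
  message: "Skype conversation started on port %<port>d",
  alert: "Skype running on port %<port>d",
  lifespan: 479
}
events = [[0, 5555], [30, 5555], [400, 5555], [500, 5555], [510, 6000]]
p compress(events, params)
```

The events at 30 and 400 seconds are suppressed; the one at 500 arrives after the first state has expired and so starts a new one, and port 6000 has a state of its own.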

Pairs of rules

We want to know when a server goes down, and when it comes back up again. In this example, rule 10036 creates a new log entry when we first see the server is not responding, and the state persists for 5 minutes.

# pair rule
Rule.new(10036, {
  :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
  :details => ["host"],
  :message => "Server %host$s is down",
  :lifespan => 300
}) { |state|
  state.generate_first_only()
}

Rule 10037 looks for a message saying the server is OK, and also requires a state with a title like “Server earth is down”. The :allstates parameter contains an array of title templates: the rule does not react to the event unless all of the named states exist.

When all the conditions are satisfied, the rule generates a new log entry saying the server is up, and then forgets both states.
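Checking :allstates amounts to interpolating each template with the current event’s values and looking the resulting titles up among the live states. The sketch below assumes REC’s %name$fmt fields behave like Ruby format fields with a named reference, and rewrites them into Ruby’s %<name>fmt syntax; interpolate, all_states_exist? and the states hash are hypothetical.

```ruby
# Hypothetical: rewrite REC-style "%host$s" fields as Ruby "%<host>s", then format.
def interpolate(template, values)
  format(template.gsub(/%(\w+)\$/, '%<\1>'), values)
end

# Hypothetical :allstates check: every template must name an existing state.
def all_states_exist?(templates, values, states)
  templates.all? { |t| states.key?(interpolate(t, values)) }
end

states = { "Server earth is down" => 1_000 } # title => creation time
values = { host: "earth" }
p interpolate("Server %host$s is down", values)
p all_states_exist?(["Server %host$s is down"], values, states)
p all_states_exist?(["Server %host$s is down"], { host: "mars" }, states)
```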

Rule.new(10037, {
  :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
  :details => ["host"],
  :message => "Server %host$s is up again",
  :allstates => ["Server %host$s is down"]
}) {|state|
  state.generate()
  state.release("Server %host$s is down")
  state.release()
}

Since no :alert is specified, it defaults to the :message, so generate() logs the message “Server earth is up again”.

Correlating events (and states)

Now suppose we want to know how long the server was down. We have two options:

  1. We could add a final block to rule 10036 to report its age, but that would just create an extra message, which is what we’re trying to get away from.

  2. We could report the duration in a single “Server earth is up again” message.

Since we’ve already seen how to add a final block, let’s take option 2.

Rule.new(10037, {
  :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
  :details => ["host"],
  :message => "Server %host$s is up again after %outage$d minutes",
  :allstates => ["Server %host$s is down"]
}) {|state|
  duration = State.find("Server %host$s is down", state).age()
  state.params[:outage] = (duration/60).to_i()
  state.generate()
  state.release("Server %host$s is down")
  state.release()
}

We obtain the duration of the outage with the State.find class method, which interpolates the current state’s values into the template and returns the matching state; its age() is the time in seconds since that state was created.

We then convert the duration to whole minutes and store it in the state’s params as :outage, as an integer to match the %d format of the %outage$d field.

Having calculated the duration, we generate the message, and forget both states.
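The arithmetic in this action is easy to check in isolation. The following plain-Ruby sketch keeps a hypothetical table of state titles and creation times in seconds, finds the “down” state for the host, converts its age to whole minutes and releases it; server_up and the states hash are illustrations, not REC’s State.find.

```ruby
# Hypothetical: states maps a title to its creation time in seconds.
def server_up(host, now, states)
  down_title = format("Server %<host>s is down", host: host)
  created = states[down_title]
  return nil unless created     # :allstates not satisfied: ignore the event

  age = now - created           # seconds the "down" state has existed
  outage = (age / 60).to_i      # whole minutes, for the %d field
  states.delete(down_title)     # release the "down" state
  format("Server %<host>s is up again after %<outage>d minutes",
         host: host, outage: outage)
end

states = { "Server earth is down" => 1_000.0 }
p server_up("earth", 1_250.0, states)
p states
```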

Shortcut actions

Several actions are so common that they have been provided as constants, making rules more succinct but still readable. One is to generate a message on the first event only:

Rule.new(10036, {
  :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
  :details => ["host"],
  :message => "Server %host$s is down",
  :lifespan => 300
}) { |state|
  state.generate_first_only()
}

can be abbreviated in this way:

Rule.new(10036, {
  :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
  :details => ["host"],
  :message => "Server %host$s is down",
  :lifespan => 300,
  :action => State::Generate_first_only
})

Another common action is to generate a message and release the state immediately:

Rule.new(10040, {
  :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
  :details => ["user", "ip"],
  :message => "User %user$s signed in via SSH from %ip$s",
}) { |state|
  state.generate()
  state.release()
}

can be abbreviated in this way:

Rule.new(10040, {
  :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
  :details => ["user", "ip"],
  :message => "User %user$s signed in via SSH from %ip$s",
  :action => State::Generate_and_release
})
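The definitions of State::Generate_first_only and State::Generate_and_release are not shown here, but since they replace a block they are presumably Procs that take the state. Under that assumption, the sketch below defines equivalent constants in a hypothetical Shortcuts module and exercises them against a FakeState stand-in that implements only the methods they call.

```ruby
# Hypothetical stand-in for a REC state, just enough to run the shortcuts.
class FakeState
  attr_reader :log

  def initialize(count)
    @count = count
    @log = []
    @released = false
  end

  def generate
    @log << :generated
  end

  def generate_first_only
    generate if @count == 1
  end

  def release
    @released = true
  end

  def released?
    @released
  end
end

# Shared actions as Proc constants, in the spirit of REC's shortcuts.
module Shortcuts
  Generate_first_only = Proc.new { |state| state.generate_first_only }
  Generate_and_release = Proc.new { |state| state.generate; state.release }
end

first = FakeState.new(1)
later = FakeState.new(2)
once = FakeState.new(1)
Shortcuts::Generate_first_only.call(first)
Shortcuts::Generate_first_only.call(later)
Shortcuts::Generate_and_release.call(once)
p [first.log, later.log, once.log, once.released?]
```

Defining each action once as a constant lets many rules share it instead of repeating the same block.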

To write rules across multiple event streams, you’ll need a way to tail several log files and merge the streams into a single input stream for REC. The mplex.rb tool does that for you; for more details, see the MULTIPLEX file.
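The general idea of merging several inputs into one stream can be sketched with Ruby threads and a Queue. This sketch is not taken from mplex.rb: merge_streams is hypothetical, it reads finite StringIO inputs rather than tailing live files, and the merged order across sources depends on thread scheduling, although lines from any one source stay in order.

```ruby
require "stringio"

# Hypothetical: one reader thread per input pushes lines onto a shared queue.
def merge_streams(inputs)
  queue = Queue.new
  readers = inputs.map do |io|
    Thread.new { io.each_line { |line| queue << line } }
  end
  readers.each(&:join) # finite inputs: wait until every reader has finished
  merged = []
  merged << queue.pop until queue.empty?
  merged
end

auth = StringIO.new("sudo line 1\nsudo line 2\n")
kern = StringIO.new("nfs line 1\n")
merged = merge_streams([auth, kern])
p merged.sort
```

A live version would keep the reader threads running and have the consumer call queue.pop in a loop, which blocks until the next line arrives.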