fluent-plugin-querycombiner

This Fluentd output plugin helps you combine multiple queries.

This plugin is based on fluent-plugin-onlineuser written by Yuyang Lan.

Requirement

  • A running Redis server

Installation

$ fluent-gem install fluent-plugin-querycombiner

Tutorial

Simple combination

Suppose you have a sequence of event messages like:

{
  "event_id":   "01234567",
  "status":     "event-start",
  "started_at": "2001-02-03T04:05:06Z"
}

and:

{
  "event_id":    "01234567",
  "status":      "event-finish",
  "finished_at": "2001-02-03T04:15:11Z"
}

Now you can combine these messages with this configuration:

<match event.**>
  type query_combiner
  tag combined.test

  # redis settings
  host            localhost
  port            6379
  db_index        0

  query_identify  event_id   # field to combine together
  query_ttl       3600       # messages time-to-live[sec]
  buffer_size     1000       # max queries to store in redis

  <catch>
    condition     status == 'event-start'
  </catch>

  <dump>
    condition     status == 'event-finish'
  </dump>

</match>

The combined result will be:

{
  "event_id":    "01234567",
  "status":      "event-finish",
  "started_at":  "2001-02-03T04:05:06Z",
  "finished_at": "2001-02-03T04:15:11Z"
}
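The combination step can be pictured with a small in-memory sketch. It assumes the plugin stores the <catch> record under its query_identify value and, when the <dump> message arrives, merges the two records so that fields from the later message win. A plain Ruby Hash stands in for Redis, and combine_events is a hypothetical helper, not part of the plugin.

```ruby
# Hypothetical in-memory sketch of <catch>/<dump> combination.
# A Hash stands in for Redis; this is NOT the plugin's actual implementation.
# Assumption: records are combined with Hash#merge, so fields from the later
# (dump) record overwrite fields with the same name.
def combine_events(events, id_field)
  pending  = {}   # query id => caught record
  combined = []

  events.each do |record|
    id = record[id_field]
    case record['status']
    when 'event-start'   # plays the role of the <catch> condition
      pending[id] = record
    when 'event-finish'  # plays the role of the <dump> condition
      caught = pending.delete(id)
      combined << caught.merge(record) if caught
    end
  end

  combined
end

events = [
  { 'event_id' => '01234567', 'status' => 'event-start',
    'started_at' => '2001-02-03T04:05:06Z' },
  { 'event_id' => '01234567', 'status' => 'event-finish',
    'finished_at' => '2001-02-03T04:15:11Z' }
]

p combine_events(events, 'event_id')
```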

Replace some field names

If the messages share field names, those fields are overwritten during the combination process. To keep both values, use the replace directive in the <catch> and <dump> blocks to rename such fields.

For example, suppose you have event messages like:

{
  "event_id": "01234567",
  "status":   "event-start",
  "time":     "2001-02-03T04:05:06Z"
}

and:

{
  "event_id": "01234567",
  "status":   "event-finish",
  "time":     "2001-02-03T04:15:11Z"
}

You can keep the time field from both the event-start and the event-finish messages by using replace:

<match event.**>
  (...type, tag and redis configuration...)

  query_identify  event_id   # field to combine together
  query_ttl       3600       # messages time-to-live[sec]
  buffer_size     1000       # max queries to store in redis

  <catch>
    condition     status == 'event-start'
    replace       time => time_start
  </catch>

  <dump>
    condition     status == 'event-finish'
    replace       time => time_finish
  </dump>

</match>

The combined result will be:

{
  "event_id":     "01234567",
  "status":       "event-finish",
  "time_start":   "2001-02-03T04:05:06Z",
  "time_finish":  "2001-02-03T04:15:11Z"
}
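A rough sketch of what replace does, assuming it simply renames keys in a record before the merge. apply_replace and the Hash form of the rename rules are hypothetical, for illustration only.

```ruby
# Hypothetical sketch of the `replace` directive: rename fields in a record
# before merging, so identically named fields are not overwritten.
# The rule format ({ 'time' => 'time_start' }) is an assumption.
def apply_replace(record, rules)
  record.each_with_object({}) do |(key, value), renamed|
    renamed[rules.fetch(key, key)] = value  # keep the key if no rule matches
  end
end

start  = { 'event_id' => '01234567', 'status' => 'event-start',
           'time' => '2001-02-03T04:05:06Z' }
finish = { 'event_id' => '01234567', 'status' => 'event-finish',
           'time' => '2001-02-03T04:15:11Z' }

# Rename in each block, then merge (the later record wins on shared keys).
combined = apply_replace(start, { 'time' => 'time_start' })
             .merge(apply_replace(finish, { 'time' => 'time_finish' }))

p combined
```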

<release> block

In the previous examples, a message with "status": "event-start" is watched by the plugin as soon as it arrives.

Suppose some error events occur and you don't want to watch or combine these messages.

In this case, the <release> block is useful.

For example, suppose your error messages look like this:

{
  "event_id":  "01234567",
  "status":    "event-error",
  "time":      "2001-02-03T04:05:06Z"
}

Add this <release> block to the configuration, and error events will no longer be watched or combined:

  <release>
    condition     status == 'event-error'
  </release>

You cannot use the replace directive in a <release> block.
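A minimal sketch of how <release> might interact with <catch> and <dump>, assuming that a release message discards the pending query for its id, so a later finish message produces no combined output. QueryStore is a hypothetical class, and its Hash replaces Redis.

```ruby
# Hypothetical sketch of <catch>/<release>/<dump> handling with an in-memory
# store. Assumption: <release> discards the pending query for that id.
class QueryStore
  def initialize
    @pending = {}  # query id => caught record (stand-in for Redis)
  end

  # Returns the combined record for a finish message, otherwise nil.
  def handle(record)
    id = record['event_id']
    case record['status']
    when 'event-start'   # <catch>: start watching this query
      @pending[id] = record
      nil
    when 'event-error'   # <release>: stop watching, emit nothing
      @pending.delete(id)
      nil
    when 'event-finish'  # <dump>: combine with the caught record, if any
      caught = @pending.delete(id)
      caught && caught.merge(record)
    end
  end

  def watching?(id)
    @pending.key?(id)
  end
end

store = QueryStore.new
store.handle({ 'event_id' => '01234567', 'status' => 'event-start' })
store.handle({ 'event_id' => '01234567', 'status' => 'event-error' })
still_watching = store.watching?('01234567')
finished = store.handle({ 'event_id' => '01234567', 'status' => 'event-finish' })
p still_watching   # false
p finished         # nil
```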

<prolong> block

Suppose your query_ttl is 600 (10 minutes) and almost all events finish within 10 minutes, but occasionally very long events occur that take about 1 hour to finish. These long-running events send status: 'event-continue' messages every 5 minutes as a keep-alive.

In this case, you can use a <prolong> block to reset the expiration time:

  <prolong>
    condition     status == 'event-continue'
  </prolong>

You cannot use the replace directive in a <prolong> block.

Also, the fields of messages matched by a <prolong> block are not combined into the result.
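The effect of <prolong> on expiration can be sketched as below, assuming each pending query keeps a last-activity timestamp that query_ttl is measured against, and that a <prolong> message only refreshes that timestamp. TtlTracker and its methods are hypothetical.

```ruby
# Hypothetical sketch of TTL bookkeeping for <prolong>.
# Assumption: query_ttl is measured from the last activity of each query.
class TtlTracker
  def initialize(ttl)
    @ttl = ttl
    @last_seen = {}  # query id => time of last activity, in seconds
  end

  # <catch>: start tracking a query.
  def track(id, now)
    @last_seen[id] = now
  end

  # <prolong>: refresh the timestamp of a query that is already tracked.
  def prolong(id, now)
    @last_seen[id] = now if @last_seen.key?(id)
  end

  def expired?(id, now)
    !@last_seen.key?(id) || now - @last_seen[id] > @ttl
  end
end

# query_ttl 600; the event starts at t=0 and sends a keep-alive every 300 s.
with_keepalive = TtlTracker.new(600)
with_keepalive.track('01234567', 0)
(300..3300).step(300) { |t| with_keepalive.prolong('01234567', t) }

without_keepalive = TtlTracker.new(600)
without_keepalive.track('01234567', 0)

p with_keepalive.expired?('01234567', 3600)     # false: last keep-alive at 3300
p without_keepalive.expired?('01234567', 3600)  # true: idle for 3600 s
```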

Configuration

tag

The tag prefix for emitted event messages. By default it's query_combiner.

host, port, db_index

Connection settings for Redis. By default the plugin connects to redis://127.0.0.1:6379/0.

redis_retry

How many times the plugin retries a Redis operation before raising an error. By default it's 3.
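A sketch of a retry policy in the spirit of redis_retry. with_retry is hypothetical, and treating redis_retry as the number of extra attempts after the first failure is an assumption; the plugin may count attempts differently.

```ruby
# Hypothetical retry helper: run the block, and on StandardError retry up to
# `retries` more times before re-raising the last error.
def with_retry(retries)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts <= retries
    raise
  end
end

calls = 0
value = with_retry(3) do |attempt|
  calls += 1
  raise 'transient failure' if attempt < 3  # fail on attempts 1 and 2
  'OK'
end
p [value, calls]
```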

query_ttl

The inactivity expiration time, in seconds. By default it's 1800 (30 minutes).

buffer_size

The maximum number of queries to store in Redis. By default it's 1000.

remove_interval

How often, in seconds, the plugin deletes queries that have expired (query_ttl) or overflowed the buffer (buffer_size). By default it's 10.
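One cleanup pass, as it might run every remove_interval seconds, can be sketched as below. It assumes expiry is measured from each query's last activity and that the oldest queries are evicted first on overflow; cleanup is a hypothetical helper working on a Hash of last-seen times.

```ruby
# Hypothetical cleanup pass: drop queries idle longer than query_ttl, then,
# if more than buffer_size remain, keep only the most recently active ones.
def cleanup(last_seen, now, query_ttl, buffer_size)
  alive = last_seen.reject { |_id, t| now - t > query_ttl }
  return alive if alive.size <= buffer_size

  # Overflow: keep the buffer_size most recently active queries.
  alive.sort_by { |_id, t| t }.last(buffer_size).to_h
end

last_seen = { 'a' => 0, 'b' => 1500, 'c' => 1700, 'd' => 1900 }
p cleanup(last_seen, 2000, 1800, 2)
```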

redis_key_prefix

The key prefix for data stored in Redis. By default it's query_combiner:.

query_identify

Specifies how to extract the query identity from an event record. It can be a single field name or multiple field names joined by commas (,).
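A sketch of turning a query_identify setting into a single key. Only the comma-separated list of field names comes from this README; query_id, joining the values with ':', and skipping records with a missing field are assumptions.

```ruby
# Hypothetical sketch: build a query id from a query_identify setting.
# Assumptions: values are joined with ':' and records missing a field get nil.
def query_id(record, query_identify)
  fields = query_identify.split(',').map(&:strip)
  values = fields.map { |f| record[f] }
  return nil if values.any?(&:nil?)  # incomplete identity: skip the record
  values.join(':')
end

record = { 'event_id' => '01234567', 'host' => 'web01', 'status' => 'event-start' }
p query_id(record, 'event_id')
p query_id(record, 'event_id, host')
p query_id(record, 'event_id,user')
```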

TODO

  • Multi-query combination
  • Support field names containing hyphens (-) and dollar signs ($)

Copyright:: Copyright (c) 2014- Takahiro Kamatani

License:: Apache License, Version 2.0