Logstash for SCAPI - output scacsv

scacsv

Synopsis

Receives a stream of events and outputs files that comply with the SCAPI requirements for headers and file naming. Essentially, it bridges Logstash's 'streaming' approach and SCAPI's file-based input requirements. This is what it might look like in your config file:
output {
  scacsv {
    fields => ... # array (required)
    header => ... # array (optional), default: {}
    path => ... # string (required)
    group => ... # string (required)
    max_size => ... # number (optional), default: 0 (not used)
    flush_interval => ... # number (optional), default: 60
    file_interval_width => ... # string (optional), default: ""
    time_field => ... # string (optional), default: 'timestamp'
    time_field_format => ... # string (required)
    timestamp_output_format => ... # string (optional), default: ""
    increment_time => ... # boolean (optional), default: false
  }
}

Details

Note: By default, this plugin expects the supplied timestamp to be in epoch time. You can override this expectation and supply non-epoch timestamps, which will be used as is, by using the keep_original_timestamps configuration option. However, such non-epoch timestamps will not be incremented automatically when determining the end time of the file.

fields

  • Value type is Array
  • There is no default for this setting

Specify which fields from the incoming event you wish to output, and in which order.

header

  • Value type is Array
  • Default value is {}

Used to specify the header (first) line in the file. Useful if you want to override the default header, which is determined from the fields setting.
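As a rough illustration of how fields and header fit together, the fragment below selects three fields and overrides the header line. The field names (timestamp, host, cpu_util), the header labels, and the path, group and time format values are hypothetical; one header label per field is assumed.

```logstash
output {
  scacsv {
    # Columns are written in this order (field names are assumed for illustration)
    fields => ["timestamp", "host", "cpu_util"]
    # Optional: overrides the default header derived from the fields setting
    header => ["Timestamp", "Hostname", "CPU_Utilization"]
    # Remaining required settings, with illustrative values
    path => "./cpu.csv"
    group => "cpu"
    time_field_format => "yyyy-MM-dd HH:mm:ss"
  }
}
```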

path

  • Value type is string
  • Default value is ""

Path of the temporary output file. Output is written to this file until it is time to close it, at which point it is renamed according to the SCAPI file naming convention. The temporary output file path is then reused for the next set of output. For example, if outputting data for a CPU group, we might define the following path:

path => "./cpu.csv"

group (required setting)

  • Value type is string
  • There is no default value for this setting.

SCAPI input filenames must have a group identifier as part of the name. The filename generally has the format <group>__<starttime>__<endtime>.csv. The group parameter specifies that group name, which is used as a prefix when the file is renamed from path. For example:

group => "cpu"
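To show how path and group relate, here is a minimal sketch that sets both. The values are illustrative, and the field names and time format are assumptions; the renamed file name in the comment simply follows the <group>__<starttime>__<endtime>.csv pattern described above.

```logstash
output {
  scacsv {
    fields => ["timestamp", "host", "cpu_util"]   # hypothetical field names
    # Events are first written to this temporary file ...
    path => "./cpu.csv"
    # ... which, on close, is renamed to cpu__<starttime>__<endtime>.csv
    group => "cpu"
    time_field_format => "yyyy-MM-dd HH:mm:ss"    # assumed input timestamp format
  }
}
```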

max_size

  • Value type is number
  • Default value is 0 (meaning it is not used)

Closes and renames a file once max_size events have been received. This limits the size of a file, and can be useful when 'chopping' a stream into chunks for use in SCAPI.

flush_interval

  • Value type is number
  • Default value is 60

Amount of time (seconds) to wait before flushing, closing and renaming a file, if there have been no events received. This is to ensure that after a period of idleness, we will output a SCAPI file.
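The two rollover controls, max_size and flush_interval, can be combined. The sketch below is one possible setup, not a recommendation: the thresholds (1000 events, 120 seconds), field names and other values are hypothetical.

```logstash
output {
  scacsv {
    fields => ["timestamp", "host", "cpu_util"]   # hypothetical field names
    path => "./cpu.csv"
    group => "cpu"
    time_field_format => "yyyy-MM-dd HH:mm:ss"    # assumed input timestamp format
    # Close and rename the file after 1000 events have been received
    max_size => 1000
    # Flush, close and rename the file after 120 seconds without events
    flush_interval => 120
  }
}
```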

file_interval_width

  • Value type is string
  • Default value is "" (meaning it is not used). Allowed values are "MINUTE", "HOUR", "DAY"

Setting this enables files to be closed on the specified boundaries. This is useful for breaking the incoming stream up on PI-preferred boundaries. For example, if HOUR is set, all incoming data for a particular hour is put in a file for that hour; when data for the next hour arrives, the file is closed and a new one is opened.
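A minimal sketch of hourly file boundaries follows. Only file_interval_width is the point here; the field names and the other values are assumptions chosen to make the fragment complete.

```logstash
output {
  scacsv {
    fields => ["timestamp", "host", "cpu_util"]   # hypothetical field names
    path => "./cpu.csv"
    group => "cpu"
    time_field_format => "yyyy-MM-dd HH:mm:ss"    # assumed input timestamp format
    # One file per hour of event data; allowed values are "MINUTE", "HOUR", "DAY"
    file_interval_width => "HOUR"
  }
}
```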

time_field

  • Value type is string
  • Default value is "timestamp"

Specify which field to use as the 'timestamp' when determining filename times. Values from this field are used for starttime (first value seen) and endtime (last value seen) in the file name <group>__<starttime>__<endtime>.csv.

time_field_format

  • Value type is string
  • There is no default value for this setting

A format string, in Java SimpleDateFormat format, that specifies how to interpret the time_field values, e.g. "yyyy-MM-dd HH:mm:ss".

timestamp_output_format

  • Value type is string
  • If not specified, it uses the format declared by time_field_format

A format string, in Java SimpleDateFormat format, that specifies how to output filename timestamps.
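The three time-related settings might be combined as in the sketch below. The field name event_time and both format strings are hypothetical; the patterns are ordinary SimpleDateFormat patterns, and the exact filename produced is not shown because it depends on the plugin.

```logstash
output {
  scacsv {
    fields => ["event_time", "host", "cpu_util"]  # hypothetical field names
    path => "./cpu.csv"
    group => "cpu"
    # Take starttime and endtime from this field instead of the default "timestamp"
    time_field => "event_time"
    # How incoming event_time values are interpreted
    time_field_format => "yyyy-MM-dd HH:mm:ss"
    # How starttime and endtime are written in the file name (assumed pattern)
    timestamp_output_format => "yyyyMMddHHmmss"
  }
}
```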

increment_time

  • Value type is boolean
  • Default value is false

By default, the supplied timestamp is left as is. If set to true, the timestamp is incremented by 1. This ensures that the end time is greater than the last event time in the file, per PI datafile requirements.
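Enabling the increment is a one-line change, sketched below with hypothetical field names and values. Note that the Details section above states that non-epoch timestamps kept as is are not incremented automatically, so how this interacts with a given time_field_format should be checked against your own data.

```logstash
output {
  scacsv {
    fields => ["timestamp", "host", "cpu_util"]   # hypothetical field names
    path => "./cpu.csv"
    group => "cpu"
    time_field_format => "yyyy-MM-dd HH:mm:ss"    # assumed input timestamp format
    # Increment the timestamp by 1 so the end time exceeds the last event time
    increment_time => true
  }
}
```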