Splunk input plugin for Embulk

A simple plugin to run a once-off Splunk query and emit the results.

This plugin uses Splunk's table command to return results efficiently and flexibly. If you need more flexibility, add _raw as a table field and then use filter plugins such as embulk-filter-expand_json or embulk-filter-add_time to convert the JSON in that column into typed columns. The rename filter is also useful for renaming the typed columns.
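As a rough illustration of how a list of table columns can become part of a search, the sketch below appends a Splunk table command built from the configured column names. The helper name append_table_command is hypothetical, and the plugin's actual query construction may differ.

```ruby
# Hypothetical helper: appends a Splunk `table` command listing the
# configured column names, so the search returns only those fields.
def append_table_command(query, columns)
  names = columns.map { |col| col.fetch("name") }
  # With no configured columns, leave the query untouched.
  return query if names.empty?

  # Assumes simple field names (such as _time, foo, _raw) that need no quoting.
  "#{query.strip} | table #{names.join(' ')}"
end

columns = [
  { "name" => "_time", "type" => "string" },
  { "name" => "foo", "type" => "string" },
  { "name" => "_raw", "type" => "string" }
]
query_with_table = append_table_command('search index="main"', columns)
puts query_with_table
# prints: search index="main" | table _time foo _raw
```

Adding _raw this way keeps the whole event available to downstream filter plugins while the other fields arrive as separate columns.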

_time and this plugin

This plugin expects and requires _time. If you do not include _time in your list of columns, this plugin adds it automatically. It is possible to rename or reformat _time in the query in such a way that this plugin fails or produces unexpected results, so we recommend that you do not alter _time in the query unless you know what you're doing. If you need to do something esoteric with _time, create another field to work with in your Splunk query.

In addition, this plugin treats the _time column as a string, only because we couldn't get the plugin to work with the timestamp type. We'd welcome a pull request to fix this.
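The automatic handling of _time described above can be pictured with the following sketch, which adds a string-typed _time column when the configured list lacks one. The helper name ensure_time_column is hypothetical, and placing _time first is an assumption; the plugin may add it elsewhere.

```ruby
# Hypothetical sketch: make sure the column list always contains _time.
def ensure_time_column(columns)
  # Leave the list alone if _time is already configured.
  return columns if columns.any? { |col| col["name"] == "_time" }

  # Prepend _time typed as a string (the plugin treats _time as a string,
  # not a timestamp). Putting it first is an assumption of this sketch.
  [{ "name" => "_time", "type" => "string" }] + columns
end

cols = ensure_time_column([{ "name" => "foo", "type" => "string" }])
p cols.map { |c| c["name"] }
# prints: ["_time", "foo"]
```

Calling the helper again on its own result changes nothing, so it is safe to apply on every run.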

Overview

  • Plugin type: input
  • Resume supported: yes
  • Cleanup supported: no
  • Guess supported: no

Configuration

  • type: splunk
  • scheme: HTTP scheme for using the Splunk API (string, default: https)
  • host: host of your Splunk server (string, required)
  • username: Splunk username (string, required)
  • password: Splunk password (string, required)
  • port: Splunk API port (integer, default: 8089)
  • query: the query you wish to run; it must be prefixed with "search" (string, required)
  • earliest_time: the earliest time for the Splunk search (string, default: nil, which is unbounded)
  • latest_time: the latest time for the Splunk search (string, default: nil, which is unbounded)
  • incremental: whether to start the next search from the latest _time of the previous run (boolean, default: false)
  • table: array of columns to include in the results (array, default: [])
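To make the defaults and required options listed above concrete, here is a minimal sketch that merges a user configuration over the documented defaults and rejects configurations missing a required key. The names DEFAULTS, REQUIRED and resolve_config are hypothetical; Embulk plugins normally declare options through Embulk's own task API, which this sketch does not reproduce.

```ruby
# Documented defaults for the optional settings.
DEFAULTS = {
  "scheme" => "https",
  "port" => 8089,
  "earliest_time" => nil, # nil means unbounded
  "latest_time" => nil,   # nil means unbounded
  "incremental" => false,
  "table" => []
}.freeze

# Settings documented as required.
REQUIRED = %w[host username password query].freeze

# Hypothetical helper: fail on missing required keys, then fill in defaults.
def resolve_config(user_config)
  missing = REQUIRED.reject { |key| user_config.key?(key) }
  raise ArgumentError, "missing required option(s): #{missing.join(', ')}" unless missing.empty?

  DEFAULTS.merge(user_config)
end

user_config = {
  "host" => "splunk.example.com",
  "username" => "splunk_user",
  "password" => "abc123",
  "query" => 'search index="main"'
}
config = resolve_config(user_config)
p config["port"]   # prints: 8089
p config["scheme"] # prints: "https"
```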

Earliest and latest times

Splunk requires absolute times in the format %Y-%m-%dT%H:%M:%S.%L%:z (for example, 2017-01-18T19:23:08.237+11:00), so use this format for earliest_time and latest_time. Splunk relative time modifiers, such as -1d@d, are also accepted. For more information, see the Splunk documentation.
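The format string above uses directives that Ruby's Time#strftime understands, so an absolute earliest_time or latest_time value can be generated from a Ruby Time. A small sketch, assuming you build the value in Ruby before writing it into the configuration:

```ruby
# Format required for absolute earliest_time and latest_time values.
SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# 18 January 2017, 19:23:08.237 at UTC+11:00; a Rational keeps the
# milliseconds exact.
t = Time.new(2017, 1, 18, 19, 23, Rational(8237, 1000), "+11:00")
formatted = t.strftime(SPLUNK_TIME_FORMAT)
puts formatted
# prints: 2017-01-18T19:23:08.237+11:00
```

%L emits milliseconds and %:z emits the offset with a colon, which together match the timestamps used in the examples below.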

Incremental loads

Incremental support is basic. The logic is:

  • always rely on the _time field in Splunk
  • determine the latest _time in the search results
  • use that latest _time as earliest_time in the next run
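The three steps above can be sketched as follows. This is an illustration, not the plugin's code: next_earliest_time is a hypothetical name, and it assumes each result row is a Hash whose "_time" value is an ISO 8601 string. Times are compared as instants, so rows with different UTC offsets are ordered correctly.

```ruby
require "time"

# Format required for absolute earliest_time values.
SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Hypothetical sketch: find the latest _time among the returned rows and
# format it for use as earliest_time in the next run.
def next_earliest_time(rows, previous_earliest)
  times = rows.map { |row| Time.iso8601(row.fetch("_time")) }
  # With no results, keep the previous bound so the next run skips nothing.
  return previous_earliest if times.empty?

  times.max.strftime(SPLUNK_TIME_FORMAT)
end

rows = [
  { "_time" => "2017-01-18T19:23:08.237+11:00" }, # 08:23:08.237 UTC
  { "_time" => "2017-01-18T08:25:00.000Z" },      # latest instant
  { "_time" => "2017-01-18T19:20:00.000+11:00" }
]
result = next_earliest_time(rows, "-1d@d")
puts result
```

If Splunk treats earliest_time as an inclusive bound, events stamped exactly at the reused time may be read again on the next run, so downstream deduplication might be worth considering.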

Number of returned results

By default, the Splunk API limits results to 100. This plugin does not set a limit, so it can generate very large result sets. To limit the number of results, use the head or tail command in your query.
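One way to cap results, sketched under the assumption that you assemble the query string in Ruby: append a head command to the search. The helper name limit_results is hypothetical; writing "| head 1000" directly in the configured query has the same effect.

```ruby
# Hypothetical helper: caps the number of results by appending a Splunk
# `head` command (use `tail` instead to keep the last N results).
def limit_results(query, max_results)
  raise ArgumentError, "max_results must be positive" unless max_results.positive?

  "#{query.strip} | head #{max_results}"
end

limited = limit_results('search index="main"', 1000)
puts limited
# prints: search index="main" | head 1000
```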

Examples

Remember that queries must be prefixed with the search command, or they are unlikely to work. See the examples below.
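A quick check for the search prefix might look like the sketch below; validate_query! is a hypothetical name, and the plugin itself may not perform this check. It ignores leading whitespace and compares the first word of the query.

```ruby
# Hypothetical validation: reject queries that do not start with the
# `search` command, ignoring leading whitespace.
def validate_query!(query)
  first_word = query.lstrip.split(/\s+/, 2).first
  unless first_word == "search"
    raise ArgumentError, "query must start with the search command: #{query.inspect}"
  end
  query
end

validate_query!('search index="main"') # passes and returns the query
begin
  validate_query!('index="main"')
rescue ArgumentError => e
  puts e.message
end
```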

Unbounded time range

in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  table:
    # We treat time as a string, only because we can't get timestamp + format to work
    - {name: "_time", type: "string"}

Relative time range

in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  earliest_time: -1m@m
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
    - {name: "bar", type: "long"}

Absolute time range

in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  earliest_time: 2017-01-18T19:23:08.237+11:00
  latest_time: 2018-01-18T19:23:08.237+11:00
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
    - {name: "bar", type: "long"}

Complex Searches

For those unfamiliar with YAML, a pipe (|) after a key indicates a multiline value. In Splunk, the pipe operator chains commands into multi-step processing.

For non-trivial Splunk queries, combine the YAML pipe with Splunk pipes to make queries easier to read.

in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: |
    search index="main" |
    eval foo=bar |
    where like(bar, "%baz%") |
    head 100
  earliest_time: 2017-01-18T19:23:08.237+11:00
  latest_time: 2018-01-18T19:23:08.237+11:00
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"} # Uses foo from the above query
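To see what the YAML pipe in the example above produces, the sketch below loads a trimmed copy of that configuration with Ruby's standard YAML library. The literal block keeps one Splunk command per line; Splunk should treat those line breaks like spaces, so the multiline value and the joined single-line search are expected to be equivalent.

```ruby
require "yaml"

# A trimmed copy of the complex-search configuration above.
config = YAML.safe_load(<<~YAML)
  in:
    type: splunk
    query: |
      search index="main" |
      eval foo=bar |
      where like(bar, "%baz%") |
      head 100
YAML

query = config.fetch("in").fetch("query")
p query # one command per line, ending with a newline

# Joining the lines with spaces gives the equivalent single-line search.
single_line = query.lines.map(&:strip).join(" ")
puts single_line
# prints: search index="main" | eval foo=bar | where like(bar, "%baz%") | head 100
```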

Build

$ rake