Logstash Output Plugin for Vespa
Plugin for Logstash to write to Vespa. Apache 2.0 license.
Installation
Download and unpack/install Logstash, then:
bin/logstash-plugin install logstash-output-vespa
Development
To build the plugin from source and try it in a local Logstash install:
# build the gem
./gradlew gem
# install it as a Logstash plugin
/opt/logstash/bin/logstash-plugin install /path/to/logstash-output-vespa/logstash-output-vespa-$VERSION.gem
# run Logstash with the locally built plugin
/opt/logstash/bin/logstash
Some more good info can be found here.
Usage
Logstash config example:
# read stuff
input {
# uncomment to read lines typed in the terminal into the "message" field
#stdin {}
file {
# let's assume we have some data in a CSV file here
path => "/path/to/data.csv"
# read the file from the beginning
start_position => "beginning"
# on Logstash restart, forget where we left off and start over again
sincedb_path => "/dev/null"
}
}
# parse and transform data here
filter {
csv {
# what does the CSV file look like?
separator => ","
quote_char => '"'
# if the first line is the header, we'll skip it
skip_header => true
# columns of the CSV file. Make sure you have these fields in the Vespa schema
columns => ["id", "description", ...]
}
# remove fields that we don't need. Here you can do a lot more processing
mutate {
remove_field => ["@timestamp", "@version", "event", "host", "log", "message"]
}
}
# publish to Vespa
output {
# for debugging. You can have multiple outputs (just as you can have multiple inputs/filters)
#stdout {}
vespa_feed { # including defaults here
# Vespa endpoint, namespace, doc type (from the schema)
vespa_url => "http://localhost:8080"
namespace => "no_default_provide_yours"
document_type => "no_default_provide_yours_from_schema"
# take the document ID from this field in each row
# if the field doesn't exist, we generate a UUID
id_field => "id"
# how many HTTP/2 connections to keep open
max_connections => 1
# number of streams per connection
max_streams => 128
# request timeout (seconds) for each write operation
operation_timeout => 180
# after this time (seconds), the circuit breaker goes half-open:
# it probes the endpoint to check whether it has recovered,
# and resumes sending requests once it has
grace_period => 10
# how many times to retry on transient failures
max_retries => 10
}
}
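The grace_period comments describe a circuit breaker that moves between closed, open, and half-open states. The Python sketch below only illustrates those states and is not the plugin's code. The class name, the injectable clock, and the choice to open on a single recorded failure are assumptions made to keep the example small.

```python
import time


class CircuitBreaker:
    """Toy breaker: OPEN after a failure, HALF_OPEN once grace_period has passed."""

    def __init__(self, grace_period=10.0, clock=time.monotonic):
        self.grace_period = grace_period  # seconds, mirrors grace_period in the config
        self.clock = clock                # injectable so the sketch can be tested
        self.opened_at = None             # None means the breaker is closed

    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.grace_period:
            return "HALF_OPEN"  # time to probe the endpoint
        return "OPEN"

    def record_failure(self):
        # (re)open the breaker and restart the grace period
        self.opened_at = self.clock()

    def record_success(self):
        # a successful probe closes the breaker, so normal traffic resumes
        self.opened_at = None


# Simulated clock, so the example runs instantly
now = [0.0]
breaker = CircuitBreaker(grace_period=10, clock=lambda: now[0])
breaker.record_failure()
print(breaker.state())  # OPEN
now[0] = 10.0
print(breaker.state())  # HALF_OPEN
breaker.record_success()
print(breaker.state())  # CLOSED
```

A longer grace_period means fewer probes against an endpoint that is down, at the cost of a slower recovery once it comes back.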
Then start Logstash, pointing it to the config file:
bin/logstash -f logstash.conf
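Once Logstash is running, one way to check that a row arrived is to look it up through Vespa's Document v1 API, which addresses documents as /document/v1/{namespace}/{document-type}/docid/{id}. The Python sketch below builds that URL from the same settings used in the config. The helper name and the my_namespace / my_doc_type values are placeholders, and the UUID fallback only mirrors what the config comments describe, not the plugin's actual code.

```python
import uuid
from urllib.parse import quote


def document_v1_url(vespa_url, namespace, document_type, event, id_field="id"):
    """Build the Document v1 URL for one event (illustrative helper)."""
    # use the configured ID field, or a random UUID if the event lacks it
    doc_id = str(event[id_field]) if id_field in event else str(uuid.uuid4())
    return (
        f"{vespa_url.rstrip('/')}/document/v1/"
        f"{quote(namespace, safe='')}/{quote(document_type, safe='')}"
        f"/docid/{quote(doc_id, safe='')}"
    )


# placeholder namespace and document type; use the values from your config
print(document_v1_url("http://localhost:8080", "my_namespace", "my_doc_type", {"id": "42"}))
# http://localhost:8080/document/v1/my_namespace/my_doc_type/docid/42
```

A GET request to the resulting URL (with curl or any HTTP client) should return the document if the feed succeeded.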