Class: LogStash::Outputs::GoogleBigQuery

Inherits: LogStash::Outputs::Base

Defined in: lib/logstash/outputs/google_bigquery.rb

Overview

Summary: a plugin to upload log events to Google BigQuery (BQ), rolling files based on the date pattern provided as a configuration setting. Events are written to files locally and, once a file is closed, this plugin uploads it to the configured BigQuery dataset.

VERY IMPORTANT:

  • To make good use of BigQuery, your log events should be parsed and structured. Consider using grok to parse your events into fields that can be uploaded to BQ.

  • You must configure the plugin so that it receives events with the same structure, so that the BigQuery schema suits them. If you want to upload log events with different structures, you can use multiple configuration blocks, separating the different log events with Logstash conditionals (see the sketch just after this list). More details on Logstash conditionals can be found here: logstash.net/docs/1.2.1/configuration#conditionals
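For example, here is a minimal sketch of such a setup. The "apache" and "app" type values, dataset names, and schemas are illustrative only, and the service account address is a placeholder, as in the usage example below:

output {
  if [type] == "apache" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "apache_logs"
      csv_schema => "path:STRING,status:INTEGER,bytes:INTEGER"
      key_path => "/path/to/privatekey.p12"
      service_account => "[email protected]"
    }
  } else if [type] == "app" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "app_logs"
      csv_schema => "message:STRING,level:STRING,score:FLOAT"
      key_path => "/path/to/privatekey.p12"
      service_account => "[email protected]"
    }
  }
}

Each block writes to its own dataset with a schema matching that event structure, so every table only ever receives events of one shape.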

For more info on Google BigQuery, please go to: developers.google.com/bigquery/

This plugin requires a Google service account. For more information, please refer to: developers.google.com/storage/docs/authentication#service_accounts

Recommendations:

  • Experiment with the settings depending on how much log data you generate, how quickly you need to see "fresh" data, and how much data you could afford to lose in the event of a crash. For instance, if you want to see recent data in BQ quickly, you could configure the plugin to upload data every minute or so (provided you have enough log events to justify that). Note also that if uploads are too frequent, there is no guarantee that they will be imported in the same order, so later data may be available before earlier data.

  • BigQuery charges for storage and for queries, depending on how much data it reads to perform a query. These are further aspects to consider when choosing the date pattern used to create new tables, and also when composing queries in BQ, as illustrated in the sketch after this list. For more info on BigQuery pricing, please see: developers.google.com/bigquery/pricing
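For instance, a deployment that is not latency-sensitive and prefers fewer, larger tables could use a daily date pattern and a longer uploader interval. This is only a sketch with illustrative values, reusing the hypothetical project and schema from the usage example below:

output {
  google_bigquery {
    project_id => "folkloric-guru-278"
    dataset => "logs"
    csv_schema => "path:STRING,status:INTEGER,score:FLOAT"
    key_path => "/path/to/privatekey.p12"
    service_account => "[email protected]"
    date_pattern => "%Y-%m-%d"        # one table per day instead of one per hour
    uploader_interval_secs => 300     # roll and upload files every 5 minutes
  }
}

Conversely, shortening uploader_interval_secs makes fresh data visible in BQ sooner, at the cost of more frequent load jobs and possible out-of-order imports.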

USAGE: This is an example of a Logstash config:


output {
  google_bigquery {
    project_id => "folkloric-guru-278"                        (required)
    dataset => "logs"                                         (required)
    csv_schema => "path:STRING,status:INTEGER,score:FLOAT"    (required) <1>
    key_path => "/path/to/privatekey.p12"                     (required)
    key_password => "notasecret"                              (optional)
    service_account => "[email protected]"                    (required)
    temp_directory => "/tmp/logstash-bq"                      (optional)
    temp_file_prefix => "logstash_bq"                         (optional)
    date_pattern => "%Y-%m-%dT%H:00"                          (optional)
    flush_interval_secs => 2                                  (optional)
    uploader_interval_secs => 60                              (optional)
    deleter_interval_secs => 60                               (optional)
  }
}


<1> Specify either a csv_schema or a json_schema.
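For reference, a json_schema equivalent to the csv_schema above could look like the sketch below. The nested fields structure mirrors what #register builds internally ({ "fields" => [{ "name" => ..., "type" => ... }] }); treat the exact configuration syntax as an assumption rather than a definitive reference:

json_schema => {
  fields => [{
    name => "path"
    type => "STRING"
  }, {
    name => "status"
    type => "INTEGER"
  }, {
    name => "score"
    type => "FLOAT"
  }]
}

Only one of csv_schema and json_schema should be set; #register raises an error if neither is provided.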

Improvements TODO list:

  • Refactor common code between Google BQ and GCS plugins.

  • Turn Google API code into a Plugin Mixin (like AwsConfig).

Instance Method Summary

  • #close ⇒ Object

  • #receive(event) ⇒ Object

  • #register ⇒ Object

Instance Method Details

#close ⇒ Object



# File 'lib/logstash/outputs/google_bigquery.rb', line 250

def close
  @logger.debug("BQ: close method called")

  # Flush and close the current temp file so any buffered events are
  # persisted before the plugin shuts down.
  @temp_file.flush()
  @temp_file.close()
end

#receive(event) ⇒ Object



# File 'lib/logstash/outputs/google_bigquery.rb', line 217

def receive(event)
  @logger.debug("BQ: receive method called", :event => event)

  # Message must be written as json
  message = LogStash::Json.dump(event.to_hash)
  # Remove "@" from property names
  message = message.gsub(/\"@(\w+)\"/, '"\1"')

  new_base_path = get_base_path()

  # Time to roll file based on the date pattern? Or are we due to upload it to BQ?
  if (@current_base_path != new_base_path || Time.now - @last_file_time >= @uploader_interval_secs)
    @logger.debug("BQ: log file will be closed and uploaded",
                  :filename => File.basename(@temp_file.to_path),
                  :size => @temp_file.size.to_s,
                  :uploader_interval_secs => @uploader_interval_secs.to_s)
    # Close alone does not guarantee that data is physically written to disk,
    # so fsync it before closing.
    @temp_file.fsync()
    @temp_file.close()
    initialize_next_log()
  end

  @temp_file.write(message)
  @temp_file.write("\n")

  sync_log_file()

  @logger.debug("BQ: event appended to log file",
                :filename => File.basename(@temp_file.to_path))
end

#register ⇒ Object



# File 'lib/logstash/outputs/google_bigquery.rb', line 171

def register
  require 'csv'
  require "fileutils"
  require "thread"

  @logger.debug("BQ: register plugin")

  if !@csv_schema.nil?
    @fields = Array.new

    CSV.parse(@csv_schema.gsub('\"', '""')).flatten.each do |field|
      temp = field.strip.split(":")

      # Check that the field in the schema follows the format (<name>:<value>)
      if temp.length != 2
        raise "BigQuery schema must follow the format <field-name>:<field-value>"
      end

      @fields << { "name" => temp[0], "type" => temp[1] }
    end

    # Check that we have at least one field in the schema
    if @fields.length == 0
      raise "BigQuery schema must contain at least one field"
    end

    @json_schema = { "fields" => @fields }
  end
  if @json_schema.nil?
    raise "Configuration must provide either json_schema or csv_schema."
  end

  @upload_queue = Queue.new
  @delete_queue = Queue.new
  @last_flush_cycle = Time.now
  initialize_temp_directory()
  recover_temp_directories()
  initialize_current_log()
  initialize_google_client()
  initialize_uploader()
  initialize_deleter()
end