Class: LogStash::Outputs::GoogleBigQuery
- Inherits: LogStash::Outputs::Base
  - Object
  - LogStash::Outputs::Base
  - LogStash::Outputs::GoogleBigQuery
- Defined in: lib/logstash/outputs/google_bigquery.rb
Overview
Summary: a plugin to upload log events to Google BigQuery (BQ), rolling files based on the date pattern provided as a configuration setting. Events are written to files locally and, once a file is closed, this plugin uploads it to the configured BigQuery dataset.
VERY IMPORTANT:
- To make good use of BigQuery, your log events should be parsed and structured. Consider using grok to parse your events into fields that can be uploaded to BQ.
- You must configure the plugin so that it only receives events with the same structure, so that they match the BigQuery schema. If you want to upload log events with different structures, use multiple configuration blocks and route the different log events to them with Logstash conditionals (as sketched below). More details on Logstash conditionals can be found here: logstash.net/docs/1.2.1/configuration#conditionals
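For example, a sketch of routing two differently structured event types to separate google_bigquery blocks with conditionals (the type values, schemas, and dataset names here are hypothetical, and the credentials reuse the placeholder values from the USAGE example below):

output {
  if [type] == "apache" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "apache_logs"
      csv_schema => "path:STRING,status:INTEGER,score:FLOAT"
      key_path => "/path/to/privatekey.p12"
      service_account => "[email protected]"
    }
  } else if [type] == "nginx" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "nginx_logs"
      csv_schema => "host:STRING,request:STRING,bytes:INTEGER"
      key_path => "/path/to/privatekey.p12"
      service_account => "[email protected]"
    }
  }
}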
For more info on Google BigQuery, please go to: developers.google.com/bigquery/
This plugin requires a Google service account. For more information, please refer to: developers.google.com/storage/docs/authentication#service_accounts
Recommendations:
- Experiment with the settings depending on how much log data you generate, how quickly you need to see "fresh" data, and how much data you could afford to lose in the event of a crash. For instance, if you want to see recent data in BQ quickly, you could configure the plugin to upload data every minute or so, provided you have enough log events to justify that (see the configuration sketch after this list). Note also that if uploads are too frequent, there is no guarantee that they will be imported in the same order, so later data may become available before earlier data.
- BigQuery charges for storage and for queries, based on how much data it reads to perform a query. Keep this in mind when choosing the date pattern used to create new tables and when composing your queries. For more info on BigQuery pricing, please see: developers.google.com/bigquery/pricing
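As an illustration only (the values below are assumptions for this sketch, not defaults or recommendations from the plugin), a setup that creates one table per day and attempts an upload roughly once a minute could look like:

google_bigquery {
  # ... required settings (project_id, dataset, schema, credentials) as in the USAGE example below ...
  date_pattern => "%Y-%m-%d"        # one table per day keeps per-query reads smaller
  flush_interval_secs => 2          # flush buffered events to the local file every 2 seconds
  uploader_interval_secs => 60      # close and upload the current file about once a minute
}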
USAGE: This is an example Logstash config:
output {
  google_bigquery {
    project_id => "folkloric-guru-278"                        (required)
    dataset => "logs"                                         (required)
    csv_schema => "path:STRING,status:INTEGER,score:FLOAT"    (required) <1>
    key_path => "/path/to/privatekey.p12"                     (required)
    key_password => "notasecret"                              (optional)
    service_account => "[email protected]"                    (required)
    temp_directory => "/tmp/logstash-bq"                      (optional)
    temp_file_prefix => "logstash_bq"                         (optional)
    date_pattern => "%Y-%m-%dT%H:00"                          (optional)
    flush_interval_secs => 2                                  (optional)
    uploader_interval_secs => 60                              (optional)
    deleter_interval_secs => 60                               (optional)
  }
}
<1> Specify either a csv_schema or a json_schema.
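Note on json_schema: internally, register() converts csv_schema into a hash with a "fields" array of name/type pairs, so the csv_schema above is equivalent to supplying that hash directly. A sketch of the equivalent json_schema setting, assuming Logstash's nested hash/array syntax for this option:

json_schema => {
  "fields" => [
    { "name" => "path"   "type" => "STRING"  },
    { "name" => "status" "type" => "INTEGER" },
    { "name" => "score"  "type" => "FLOAT"   }
  ]
}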
Improvements TODO list:
- Refactor common code between Google BQ and GCS plugins.
- Turn Google API code into a Plugin Mixin (like AwsConfig).
Instance Method Summary
- #close ⇒ Object
- #receive(event) ⇒ Object
- #register ⇒ Object
Instance Method Details
#close ⇒ Object
# File 'lib/logstash/outputs/google_bigquery.rb', line 250

def close
  @logger.debug("BQ: close method called")

  @temp_file.flush()
  @temp_file.close()
end
#receive(event) ⇒ Object
# File 'lib/logstash/outputs/google_bigquery.rb', line 217

def receive(event)
  @logger.debug("BQ: receive method called", :event => event)

  # Message must be written as json
  message = LogStash::Json.dump(event.to_hash)
  # Remove "@" from property names
  message = message.gsub(/\"@(\w+)\"/, '"\1"')

  new_base_path = get_base_path()

  # Time to roll file based on the date pattern? Or are we due to upload it to BQ?
  if (@current_base_path != new_base_path ||
      Time.now - @last_file_time >= @uploader_interval_secs)
    @logger.debug("BQ: log file will be closed and uploaded",
                  :filename => File.basename(@temp_file.to_path),
                  :size => @temp_file.size.to_s,
                  :uploader_interval_secs => @uploader_interval_secs.to_s)
    # Close alone does not guarantee that data is physically written to disk,
    # so flush it first.
    @temp_file.fsync()
    @temp_file.close()
    initialize_next_log()
  end

  @temp_file.write(message)
  @temp_file.write("\n")

  sync_log_file()

  @logger.debug("BQ: event appended to log file",
                :filename => File.basename(@temp_file.to_path))
end
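As a standalone illustration (not part of the plugin) of what the property-name rewrite in receive() does, using a made-up serialized event:

# Sample serialized event (hypothetical values).
event_json = '{"@timestamp":"2014-01-01T00:00:00.000Z","@version":"1","path":"/index.html"}'

# Strip the leading "@" from quoted property names, as receive() does above.
cleaned = event_json.gsub(/\"@(\w+)\"/, '"\1"')
puts cleaned
# => {"timestamp":"2014-01-01T00:00:00.000Z","version":"1","path":"/index.html"}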
#register ⇒ Object
# File 'lib/logstash/outputs/google_bigquery.rb', line 171

def register
  require 'csv'
  require "fileutils"
  require "thread"

  @logger.debug("BQ: register plugin")

  if !@csv_schema.nil?
    @fields = Array.new

    CSV.parse(@csv_schema.gsub('\"', '""')).flatten.each do |field|
      temp = field.strip.split(":")

      # Check that the field in the schema follows the format (<name>:<value>)
      if temp.length != 2
        raise "BigQuery schema must follow the format <field-name>:<field-value>"
      end

      @fields << { "name" => temp[0], "type" => temp[1] }
    end

    # Check that we have at least one field in the schema
    if @fields.length == 0
      raise "BigQuery schema must contain at least one field"
    end

    @json_schema = { "fields" => @fields }
  end

  if @json_schema.nil?
    raise "Configuration must provide either json_schema or csv_schema."
  end

  @upload_queue = Queue.new
  @delete_queue = Queue.new
  @last_flush_cycle = Time.now
  initialize_temp_directory()
  recover_temp_directories()
  initialize_current_log()
  initialize_google_client()
  initialize_uploader()
  initialize_deleter()
end
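As a standalone illustration of the schema conversion performed in register() (this snippet only mirrors the parsing logic above; the csv_schema value is the one from the USAGE example):

require 'csv'

# The same csv_schema value used in the USAGE example above.
csv_schema = "path:STRING,status:INTEGER,score:FLOAT"

# Split each "name:type" pair into a BigQuery field definition,
# mirroring what register() does.
fields = CSV.parse(csv_schema.gsub('\"', '""')).flatten.map do |field|
  name, type = field.strip.split(":")
  { "name" => name, "type" => type }
end

p({ "fields" => fields })
# => {"fields"=>[{"name"=>"path", "type"=>"STRING"},
#                {"name"=>"status", "type"=>"INTEGER"},
#                {"name"=>"score", "type"=>"FLOAT"}]}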