fluent-plugin-kubernetes_metadata_filter, a plugin for Fluentd


The Kubernetes metadata plugin filter enriches container log records with pod and namespace metadata.

This plugin derives basic metadata about the container that emitted a given log record from the record's source. Records from journald provide metadata about the container environment as named fields. Records from JSON files encode metadata about the container in the file name. When kubernetes_url is configured, the initial metadata derived from the source is used to look up additional metadata about the container’s associated pod and namespace (e.g. UUIDs, labels, annotations). If the plugin cannot authoritatively determine the namespace of the container emitting a log record, it uses an ‘orphan’ namespace ID in the metadata. This behavior supports multi-tenant systems that rely on the authenticity of the namespace for proper log isolation.
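The file-name path can be illustrated with a small Ruby sketch. It assumes a tag produced by in_tail from a /var/log/containers file name and uses a simplified, hypothetical regular expression with the named capture groups that tag_to_kubernetes_name_regexp expects (pod_name, namespace, container_name, docker_id); it is not the plugin's default expression. The namespace_lookup lambda is a hypothetical stand-in for the API server query, and the orphan fallback uses the documented defaults (.orphaned and orphaned).

```ruby
# Simplified, hypothetical tag pattern for files named
# /var/log/containers/{pod}_{namespace}_{container}-{docker_id}.log
# (in_tail turns the path separators into dots in the tag).
TAG_REGEXP = /var\.log\.containers\.(?<pod_name>[^_]+)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log\z/

# Documented defaults for orphaned_namespace_name / orphaned_namespace_id.
ORPHANED_NAMESPACE_NAME = '.orphaned'
ORPHANED_NAMESPACE_ID = 'orphaned'

# namespace_lookup is a hypothetical stand-in for the API server query:
# it returns the namespace UUID, or nil if the namespace cannot be confirmed.
def basic_metadata(tag, namespace_lookup)
  match = TAG_REGEXP.match(tag)
  return nil unless match

  meta = match.named_captures
  namespace_id = namespace_lookup.call(meta['namespace'])
  if namespace_id
    meta['namespace_id'] = namespace_id
  else
    # The namespace is not authoritative, so use the orphan values instead.
    meta['namespace'] = ORPHANED_NAMESPACE_NAME
    meta['namespace_id'] = ORPHANED_NAMESPACE_ID
  end
  meta
end
```

For example, for the tag kubernetes.var.log.containers.web-1_default_nginx-{64-character ID}.log the sketch returns pod_name web-1, namespace default and container_name nginx; if the lookup returns nil, the namespace fields are replaced by the orphan values.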

Requirements

fluent-plugin-kubernetes_metadata_filter | fluentd     | ruby
---------------------------------------- | ----------- | ------
>= 2.5.0                                 | >= v1.10.0  | >= 2.5
>= 2.0.0                                 | >= v0.14.20 | >= 2.1
< 2.0.0                                  | >= v0.12.0  | >= 1.9

NOTE: For Fluentd v0.12, use a 1.x.y version of this plugin. If you encounter a bug in the 1.x version, please send a patch to the v0.12 branch.

NOTE: This documentation is for fluent-plugin-kubernetes_metadata_filter 2.x or later. For 1.x documentation, please see the v0.12 branch.

Installation

gem install fluent-plugin-kubernetes_metadata_filter

Configuration

Configuration options for fluent.conf are:

  • kubernetes_url - URL to the API server. Set this to retrieve further Kubernetes metadata for logs from the Kubernetes API server. If not specified, the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are used if both are present, which is typically true when running fluentd in a pod.
  • apiVersion - API version to use (default: v1)
  • ca_file - path to CA file for Kubernetes server certificate validation
  • verify_ssl - validate SSL certificates (default: true)
  • client_cert - path to a client cert file to authenticate to the API server
  • client_key - path to a client key file to authenticate to the API server
  • bearer_token_file - path to a file containing the bearer token to use for authentication
  • tag_to_kubernetes_name_regexp - the regular expression used to extract kubernetes metadata (pod name, container name, namespace) from the current fluentd tag. This must use named capture groups for container_name, pod_name, namespace, and either pod_uuid (/var/log/pods) or docker_id (/var/log/containers)
  • cache_size - size of the cache of Kubernetes metadata to reduce requests to the API server (default: 1000)
  • cache_ttl - TTL in seconds of each cached element. Set to a negative value to disable TTL eviction (default: 3600, i.e. 1 hour)
  • watch - set up a watch on pods on the API server for updates to metadata (default: true)
  • DEPRECATED de_dot - replace dots in labels and annotations with the configured de_dot_separator, required for Datadog and Elasticsearch 2.x compatibility (default: true)
  • DEPRECATED de_dot_separator - separator to use if de_dot is enabled (default: _)
  • DEPRECATED de_slash - replace slashes in labels and annotations with the configured de_slash_separator, required for Datadog compatibility (default: false)
  • DEPRECATED de_slash_separator - separator to use if de_slash is enabled (default: __)
  • DEPRECATED use_journal - If false, messages are expected to be formatted and tagged as if read by the fluentd in_tail plugin with a wildcard filename. If true, messages are expected to be formatted as if read from the systemd journal: the MESSAGE field has the full message, the CONTAINER_NAME field has the encoded k8s metadata (see below), and the CONTAINER_ID_FULL field has the full container UUID. This requires Docker to use the journald log driver (--log-driver=journald). If unset (the default), the plugin uses the CONTAINER_NAME and CONTAINER_ID_FULL fields if available; otherwise, it uses the tag in the tag_to_kubernetes_name_regexp format.
  • container_name_to_kubernetes_regexp - The regular expression used to extract the k8s metadata encoded in the journal CONTAINER_NAME field (default: see code)
  • annotation_match - Array of regular expressions matching annotation field names. Matched annotations are added to the log record.
  • allow_orphans - When true, records whose namespace cannot be determined get their namespace and namespace ID set to the values of orphaned_namespace_name and orphaned_namespace_id (default: true)
  • orphaned_namespace_name - The namespace to associate with records whose namespace cannot be determined (default: .orphaned)
  • orphaned_namespace_id - The namespace ID to associate with records whose namespace cannot be determined (default: orphaned)
  • lookup_from_k8s_field - If the field kubernetes is present, look up the metadata from its subfields such as kubernetes.namespace_name, kubernetes.pod_name, etc. This avoids having to pass in the lookup metadata through an explicitly formatted tag name or an explicitly formatted CONTAINER_NAME value. For example, set kubernetes.namespace_name, kubernetes.pod_name, kubernetes.container_name, and docker.id in the record, and the filter will fill in the rest. (default: true)
  • ssl_partial_chain - Set this to true if ca_file is for an intermediate CA, or if you otherwise lack the root CA and want to trust the intermediate CA certificates you do have. This corresponds to the openssl s_client -partial_chain flag and X509_V_FLAG_PARTIAL_CHAIN (default: false)
  • skip_labels - Omit all label fields from the metadata.
  • skip_container_metadata - Omit some of the container data from the metadata: the container_image and container_image_id fields are not included.
  • skip_master_url - Omit the master_url field from the metadata.
  • skip_namespace_metadata - Omit the namespace_id field from the metadata. The fetch_namespace_metadata function is skipped, which makes the plugin faster and reduces CPU consumption.
  • watch_retry_interval - The time interval in seconds for retry backoffs when watch connections fail. (default: 10)
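The kubernetes_url fallback can be sketched in Ruby as follows. The helper name resolve_kubernetes_url is hypothetical, and the https scheme and /api path suffix are assumptions of this sketch rather than documented plugin behavior.

```ruby
# Hypothetical helper: prefer the configured URL, else build one from the
# in-cluster service environment variables (scheme and /api path assumed).
def resolve_kubernetes_url(configured_url, env = ENV)
  return configured_url unless configured_url.nil? || configured_url.empty?

  host = env['KUBERNETES_SERVICE_HOST']
  port = env['KUBERNETES_SERVICE_PORT']
  # Both variables must be present; otherwise return nil (no API server).
  return nil if host.nil? || port.nil?

  "https://#{host}:#{port}/api"
end
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.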
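To show how cache_size and cache_ttl interact, here is a minimal Ruby LRU cache with per-entry expiry. The TTLCache class is hypothetical and the plugin's own cache implementation may differ. It evicts the least recently used entry once more than max_size entries are stored, measures age from insertion, and treats a negative TTL as "never expire", matching the option descriptions. The injectable clock exists only to make the behavior easy to test.

```ruby
# Hypothetical LRU cache with a per-entry TTL (illustration only).
class TTLCache
  def initialize(max_size, ttl, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_size = max_size
    @ttl = ttl
    @clock = clock
    @store = {} # Ruby hashes keep insertion order; the first key is the LRU one.
  end

  def [](key)
    entry = @store.delete(key)
    return nil if entry.nil?

    value, stored_at = entry
    # Expired entries stay deleted; a negative TTL disables expiry.
    return nil if @ttl >= 0 && @clock.call - stored_at > @ttl

    @store[key] = entry # re-insert as most recently used
    value
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = [value, @clock.call]
    # Evict least recently used entries (the oldest hash keys) beyond max_size.
    @store.delete(@store.first[0]) while @store.size > @max_size
  end

  def size
    @store.size
  end
end
```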
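The de_dot and de_slash options rewrite label and annotation keys. A minimal Ruby sketch of that rewriting, using a hypothetical helper named de_dot_keys and the documented default separators, might look like this; the order in which dots and slashes are replaced is an assumption of the sketch.

```ruby
# Hypothetical helper: rewrite the keys of a labels/annotations hash the way
# the de_dot and de_slash options describe. Values are left untouched.
def de_dot_keys(hash, de_dot: true, de_dot_separator: '_',
                de_slash: false, de_slash_separator: '__')
  hash.each_with_object({}) do |(key, value), out|
    new_key = key.to_s
    # String patterns are matched literally, so '.' is not a regex wildcard here.
    new_key = new_key.gsub('.', de_dot_separator) if de_dot
    new_key = new_key.gsub('/', de_slash_separator) if de_slash
    out[new_key] = value
  end
end
```

For example, with de_slash enabled the label key app.kubernetes.io/name becomes app_kubernetes_io__name.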
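For journald records, the metadata is encoded in CONTAINER_NAME. The Ruby sketch below assumes the Docker container name layout used by the kubelet's Docker integration, k8s_{container}[.{hash}]_{pod}_{namespace}_{pod-uid}_{restart-count}, and parses it with a simplified, hypothetical expression. It is not the plugin's default container_name_to_kubernetes_regexp, and metadata_from_journal is a hypothetical helper name.

```ruby
# Simplified, hypothetical pattern for names shaped like
# k8s_{container}[.{hash}]_{pod}_{namespace}_{pod-uid}_{restart-count}.
CONTAINER_NAME_REGEXP = /\Ak8s_(?<container_name>[^._]+)(?:\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_(?<pod_uid>[^_]+)_(?<restart_count>\d+)\z/

# Hypothetical helper: build lookup metadata from a journald record,
# returning nil when CONTAINER_NAME is missing or does not match.
def metadata_from_journal(record)
  match = CONTAINER_NAME_REGEXP.match(record['CONTAINER_NAME'].to_s)
  return nil unless match

  {
    'container_name' => match[:container_name],
    'pod_name' => match[:pod_name],
    'namespace' => match[:namespace],
    'pod_uid' => match[:pod_uid],
    'docker_id' => record['CONTAINER_ID_FULL']
  }
end
```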
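annotation_match keeps only the annotations whose names match at least one configured expression. A one-method Ruby sketch of that selection could look like this; the helper name match_annotations is hypothetical, and the patterns are passed as Regexp objects (strings from a configuration file would first need converting with Regexp.new).

```ruby
# Hypothetical helper: keep annotations whose names match any pattern.
def match_annotations(annotations, patterns)
  annotations.select { |name, _value| patterns.any? { |re| re.match?(name) } }
end
```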
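With lookup_from_k8s_field enabled, the lookup keys come from fields already present in the record. The Ruby sketch below shows how those nested fields could be read; lookup_keys_from_record is a hypothetical helper, and requiring both namespace_name and pod_name before attempting a lookup is an assumption of the sketch.

```ruby
# Hypothetical helper: read lookup keys from kubernetes.* and docker.id.
def lookup_keys_from_record(record)
  k8s = record['kubernetes']
  return nil unless k8s.is_a?(Hash)

  namespace = k8s['namespace_name']
  pod = k8s['pod_name']
  # Without both of these there is nothing to look up on the API server.
  return nil if namespace.nil? || pod.nil?

  docker = record['docker']
  {
    'namespace_name' => namespace,
    'pod_name' => pod,
    'container_name' => k8s['container_name'],
    'docker_id' => docker.is_a?(Hash) ? docker['id'] : nil
  }
end
```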

To read JSON-formatted log files with in_tail and wildcard filenames while also handling the CRI-O log format with the same configuration, you need the fluent-plugin-multi-format-parser plugin:

fluent-gem install fluent-plugin-multi-format-parser
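The multi-format parser tries its configured formats in order and keeps the first one that parses. The same idea, reduced to two formats, is sketched below in Ruby: Docker's JSON log lines are tried first, then a CRI-O line. The CRI-O layout assumed here (timestamp, stream, a P/F partial-line tag, then the message) and the helper name parse_container_line are assumptions of this sketch, and the expression is not the one used in the configuration.

```ruby
require 'json'

# Assumed CRI-O line layout: "<time> <stdout|stderr> <P|F> <message>".
CRI_LINE = /\A(?<time>\S+) (?<stream>stdout|stderr) (?<logtag>[PF]) (?<log>.*)\z/

# Hypothetical helper: try JSON first, then the CRI-O pattern; return a Hash
# for the first format that parses, or nil if neither does.
def parse_container_line(line)
  begin
    parsed = JSON.parse(line)
    return parsed if parsed.is_a?(Hash)
  rescue JSON::ParserError
    # Not Docker json-file output; fall through to the CRI-O pattern.
  end

  match = CRI_LINE.match(line)
  match ? match.named_captures : nil
end
```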

The config block could look like this:

```

@type tail
path /var/log/containers/.log
pos_file fluentd-docker.pos
read_from_head true
tag kubernetes.

@type multi_format
format json
time_key time
time_type string
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
keep_time_key false
format regexp
expression /^(?