Annotations
- Annotations
- Overriding datastreams
- Replace patterns in events
- Hashing values in logs
- Escaping terminal sequences, including terminal colors
- Extracting fields from the container logs
- Defining Event pattern
- Application Logs
- Change output destination
- Logs sampling
- Thruput
- Time correction
- Handling multiple containers
- Cluster level annotations
- Troubleshooting
- Reference
- Links
You can define annotations for namespaces, workloads, and pods. Annotations allow you to change how Collectord forwards data to ElasticSearch. Annotations also tell Collectord where to discover application logs.
The complete list of all the available annotations is available at the bottom of this page.
The default configuration uses annotationSubdomain = elasticsearch, so all annotations should start with elasticsearch.collectord.io/.
Overriding datastreams
Using annotations, you can override the datastream to which data is forwarded from a specific namespace, workload, or pod. You can define one datastream for the whole object with elasticsearch.collectord.io/index, or specific datastreams for container logs with elasticsearch.collectord.io/logs-index and for events with elasticsearch.collectord.io/events-index (events annotations can be applied only to a whole namespace).
As an example, if you want to override the datastream for a specific namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team1
  annotations:
    elasticsearch.collectord.io/index: logs-team1
```
This annotation tells Collectord to forward all the data from this namespace to the datastream named logs-team1.
elasticsearch.collectord.io/logs-index overrides the datastream only for container logs. If you want to override the datastream for application logs, use elasticsearch.collectord.io/index or elasticsearch.collectord.io/volume.{N}-logs-index.
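For example, a namespace-level annotation that overrides the datastream only for container logs could look like the following minimal sketch (the namespace and datastream names are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team1
  annotations:
    # only container logs from this namespace go to logs-team1-containers;
    # events and application logs keep their default datastreams
    elasticsearch.collectord.io/logs-index: logs-team1-containers
```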
Overriding datastream for specific events
When your container runs multiple processes, you may want to override the datastream only for specific events in the container (or application) logs. You can do that with the override annotations.
For example, we will use the nginx image, which produces logs like:
```
172.17.0.1 - - [12/Oct/2018:22:38:05 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
2018/10/12 22:38:15 [error] 8#8: *2 open() "/usr/share/nginx/html/a.txt" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: "GET /a.txt HTTP/1.1", host: "localhost:32768"
172.17.0.1 - - [12/Oct/2018:22:38:15 +0000] "GET /a.txt HTTP/1.1" 404 153 "-" "curl/7.54.0" "-"
```
If we want to override the datastream for the web access logs and keep all other logs in the predefined datastream:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-override.1-match: ^(\d{1,3}\.){3}\d{1,3}
    elasticsearch.collectord.io/logs-override.1-index: logs-nginx-web
spec:
  containers:
  - name: nginx
    image: nginx
```
The collector will override the datastream for matched events, so with our example you will end up with events similar to:
```
datastream      | event
----------------|---------------------------------------------------------------------------------------------------
logs-nginx-web  | 172.17.0.1 - - [12/Oct/2018:22:38:05 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
logs-collectord | 2018/10/12 22:38:15 [error] 8#8: *2 open() "/usr/share/nginx/html/a.txt" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: "GET /a.txt HTTP/1.1", host: "localhost:32768"
logs-nginx-web  | 172.17.0.1 - - [12/Oct/2018:22:38:15 +0000] "GET /a.txt HTTP/1.1" 404 153 "-" "curl/7.54.0" "-"
```
Replace patterns in events
You can define replace patterns with the annotations. That allows you to hide sensitive information or drop unimportant information from the messages.
Replace patterns for container logs are configured with a pair of annotations grouped by the same number: elasticsearch.collectord.io/logs-replace.{N}-search and elasticsearch.collectord.io/logs-replace.{N}-val. The first specifies the search pattern as a regular expression, the second the replace pattern. In replace patterns you can use placeholders for matches, like $1, or $name for named groups.
Replace pipes are implemented with the Go regular expression library. You can find more information about the syntax in the Package regexp documentation and the re2 syntax reference. We recommend https://regex101.com for testing your patterns (set the Flavor to golang).
Using nginx as an example, our logs have a default pattern like:
```
172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
Example 1. Replacing IPv4 addresses with X.X.X.X
If we want to hide an IP address from the logs by replacing all IPv4 addresses with X.X.X.X
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-replace.1-search: (\d{1,3}\.){3}\d{1,3}
    elasticsearch.collectord.io/logs-replace.1-val: X.X.X.X
spec:
  containers:
  - name: nginx
    image: nginx
```
The result of this replace pattern in ElasticSearch will be:
```
X.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
You can also keep the first octet of the IPv4 address with:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-replace.1-search: (?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}
    elasticsearch.collectord.io/logs-replace.1-val: ${IPv4p1}.X.X.X
spec:
  containers:
  - name: nginx
    image: nginx
```
That results in
```
172.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
Example 2. Dropping messages
With the replace patterns, you can drop messages that you don't want to see in ElasticSearch. With the example below we drop all log messages resulting from GET requests with a 200 response:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-replace.1-search: '^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$'
    elasticsearch.collectord.io/logs-replace.1-val: ''
    elasticsearch.collectord.io/logs-replace.2-search: '(\d{1,3}\.){3}\d{1,3}'
    elasticsearch.collectord.io/logs-replace.2-val: 'X.X.X.X'
spec:
  containers:
  - name: nginx
    image: nginx
```
In this example, we have two replace pipes. They apply in alphabetical order (replace.1 comes first, before replace.2).
```
X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
Example 3. Whitelisting the messages
With the whitelist annotation you can configure a pattern for the log messages, and only messages that match this pattern will be forwarded to ElasticSearch:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-whitelist: '((DELETE)|(POST))$'
spec:
  containers:
  - name: nginx
    image: nginx
```
Hashing values in logs
To hide sensitive data, you can use replace patterns or hashing functions.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    elasticsearch.collectord.io/logs-hashing.1-function: 'fnv-1a-64'
spec:
  containers:
  - name: nginx
    image: nginx
```
This example will replace values that look like an IP address in the string
172.17.0.1 - - [16/Nov/2018:11:17:17 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
With the hashed value, in our example using the algorithm fnv-1a-64
gqsxydjtZL4 - - [16/Nov/2018:11:17:17 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
Collectord supports a variety of hashing functions, including cryptographic hashing functions. The supported functions and their performance are listed below (performance in nanoseconds per operation to hash two IP addresses in the string source: 127.0.0.1, destination: 10.10.1.99).
| Function          | ns / op |
|-------------------|---------|
| adler-32          | 1713    |
| crc-32-ieee       | 1807    |
| crc-32-castagnoli | 1758    |
| crc-32-koopman    | 1753    |
| crc-64-iso        | 1739    |
| crc-64-ecma       | 1740    |
| fnv-1-64          | 1711    |
| fnv-1a-64         | 1711    |
| fnv-1-32          | 1744    |
| fnv-1a-32         | 1738    |
| fnv-1-128         | 1852    |
| fnv-1a-128        | 1836    |
| md5               | 2032    |
| sha1              | 2037    |
| sha256            | 2220    |
| sha384            | 2432    |
| sha512            | 2516    |
Escaping terminal sequences, including terminal colors
Some applications don't turn off terminal colors automatically when they run inside a container.
For example, if you run a container with an attached tty and define that you want to see colors:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-shell
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    tty: true
    command: [/bin/sh, -c, 'while true; do ls --color=auto /; sleep 5; done;']
```
You can find messages similar to below in ElasticSearch
```
[01;34mboot[0m [01;34metc[0m [01;34mlib[0m [01;34mmedia[0m [01;34mopt[0m [01;34mroot[0m [01;34msbin[0m [01;34msys[0m [01;34musr[0m
[0m[01;34mbin[0m [01;34mdev[0m [01;34mhome[0m [01;34mlib64[0m [01;34mmnt[0m [01;34mproc[0m [01;34mrun[0m [01;34msrv[0m [30;42mtmp[0m [01;34mvar[0m
```
You can easily escape them with the annotation elasticsearch.collectord.io/logs-escapeterminalsequences='true'
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-shell
  annotations:
    elasticsearch.collectord.io/logs-escapeterminalsequences: 'true'
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    tty: true
    command: [/bin/sh, -c, 'while true; do ls --color=auto /; sleep 5; done;']
```
That way you will see logs in ElasticSearch as you would expect
```
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
```
In the collector configuration file you can find [input.files]/stripTerminalEscapeSequencesRegex and [input.files]/stripTerminalEscapeSequences, which define the default regexp used for removing terminal escape sequences and whether the collector should strip terminal escape sequences by default (defaults to false).
Extracting fields from the container logs
You can use fields extraction to extract timestamps from the messages and to extract fields that will be indexed in ElasticSearch to speed up the search.
Using the same nginx example, we can define extraction for some of the fields.
```
172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
An important note: the first unnamed group is used as the message for the event. If you want to override that, you can use the annotation elasticsearch.collectord.io/logs-extractionMessageField to use a specific named group as the message field.
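As a sketch (the extraction pattern and the msg group name are illustrative), you can name the group that should become the event message and point the annotation at it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    # extract ip_address and timestamp as fields, and use the named group "msg" as the event message
    elasticsearch.collectord.io/logs-extraction: '^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (?P<msg>.+)$'
    elasticsearch.collectord.io/logs-extractionMessageField: 'msg'
spec:
  containers:
  - name: nginx
    image: nginx
```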
Nested Objects
You might want to create nested JSON objects in the events sent to ElasticSearch. When you add a named group with a double underscore (__) in the name, the collector replaces it with a dot (.). For example, with the named groups in the extraction pattern (?P<obj__firstname>\w+) (?P<obj__lastname>\w+), the collector will create an object {"obj":{"firstname": "value1", "lastname": "value2"}}.
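For instance, a minimal sketch (the field names are illustrative) that produces a nested request object from the nginx access logs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    # request__ip and request__method become {"request": {"ip": ..., "method": ...}}
    elasticsearch.collectord.io/logs-extraction: '^(?P<request__ip>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] "(?P<request__method>\w+) (.+)$'
spec:
  containers:
  - name: nginx
    image: nginx
```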
Datastreams and index templates
When you extract new fields, you might want to create a new index template for the new fields. Be sure to add the elasticsearch.collectord.io/logs-datastream annotation to the pod, so the collector will send the data to a new datastream.
Example 1. Extracting the timestamp
Assume we want to keep the whole message as it is and extract just the timestamp. We can define the extraction pattern with a regexp, specify that the timestampfield is timestamp, and define the timestampformat.
We use the Go time parsing library, which defines the format with the reference date Mon Jan 2 15:04:05 MST 2006. See the Go documentation for details.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-extraction: '^(.*\[(?P<timestamp>[^\]]+)\].+)$'
    elasticsearch.collectord.io/logs-timestampfield: timestamp
    elasticsearch.collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
spec:
  containers:
  - name: nginx
    image: nginx
```
In that way, you will get messages in ElasticSearch with the exact timestamp as specified in your container logs.
Example 2. Extracting the fields
You may want to extract some fields and keep the message shorter; for example, once you've extracted the timestamp, there is no need to keep it in the raw message. In the example below we extract the ip_address and the timestamp as fields and keep the rest as the raw message.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/logs-extraction: '^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (.+)$'
    elasticsearch.collectord.io/logs-timestampfield: timestamp
    elasticsearch.collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
spec:
  containers:
  - name: nginx
    image: nginx
```
That results in messages
```
ip_address | _time               | _raw
-----------|---------------------|------------------------------------------------
172.17.0.1 | 2018-08-31 21:11:26 | "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:32 | "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:35 | "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
```
Defining Event pattern
With the annotation elasticsearch.collectord.io/logs-eventpattern you can define how the collector should identify new events in the pipe.
The default event pattern is defined by the Collectord configuration as ^[^\s] (anything that does not start with a space character). The default pattern works in most cases, but not in all of them; for example, with Java exceptions, the continuation of the error starts on the next line and does not always start with a space character.
In the example below we intentionally made a mistake in the ElasticSearch configuration (s-node should be single-node) to get the error message:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    env:
    - name: discovery.type
      value: s-node
```
Results in
```
[2018-08-31T22:44:56,433][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/92] [Main.cc@109] controller (64 bit): Version 6.4.0 (Build cf8246175efff5) Copyright (c) 2018 Elasticsearch BV
[2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
Caused by: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
    at org.elasticsearch.discovery.DiscoveryModule.<init>(DiscoveryModule.java:129) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:477) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
[2018-08-31T22:44:56,892][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
```
With the default pattern, the warning line [2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main] will not be joined with the whole callstack into a single event.
With a regular expression, we can define that every log event in this container should start with the [ character:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
  annotations:
    elasticsearch.collectord.io/logs-eventpattern: '^\['
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    env:
    - name: discovery.type
      value: s-node
```
Note: by default, Collectord joins multi-line log lines that are written within 100ms of each other, waits a maximum of 1s for the next line, and combines events up to a total of 100Kb. If not all multi-line log lines are joined into one event as you expect, you might need to change the Collectord configuration under [pipe.join].
Application Logs
Sometimes it is hard or just not practical to redirect all logs from the container to stdout and stderr of the container. In these cases, you keep the logs in the container. We call them application logs. With collectord you can easily pick up these logs and forward them to ElasticSearch. No additional sidecars or processes are required inside your container.
Let's take a look at the example below. We have a postgresql container that redirects most of its logs to the path /var/log/postgresql inside the container. We define for this container a volume (emptyDir driver) with the name logs and mount it at /var/log/postgresql/. With the annotation elasticsearch.collectord.io/volume.1-logs-name=logs we tell the collector to pick up all the logs matching the default glob pattern *.log* in this volume (the default glob pattern is set in the collector configuration, and you can override it with the annotation elasticsearch.collectord.io/volume.{N}-logs-glob) and forward them automatically to ElasticSearch.
When you need to forward logs from multiple volumes of the same container, group the settings for each volume by number, for example elasticsearch.collectord.io/volume.1-logs-name for the first volume and elasticsearch.collectord.io/volume.2-logs-name for the second.
Example 1. Forwarding application logs
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  annotations:
    elasticsearch.collectord.io/volume.1-logs-name: 'logs'
spec:
  containers:
  - name: postgres
    image: postgres
    command:
    - docker-entrypoint.sh
    args:
    - postgres
    - -c
    - logging_collector=on
    - -c
    - log_min_duration_statement=0
    - -c
    - log_directory=/var/log/postgresql
    - -c
    - log_min_messages=INFO
    - -c
    - log_rotation_age=1d
    - -c
    - log_rotation_size=10MB
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
    - name: logs
      mountPath: /var/log/postgresql/
  volumes:
  - name: data
    emptyDir: {}
  - name: logs
    emptyDir: {}
```
Example 2. Forwarding application logs with fields extraction and time parsing
With the annotations for application logs, you can define fields extraction and replace patterns, and override the datastreams. As an example, with an extraction pattern and timestamp parsing you can do:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  annotations:
    elasticsearch.collectord.io/volume.1-logs-name: 'logs'
    elasticsearch.collectord.io/volume.1-logs-extraction: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [^\s]+) (.+)$'
    elasticsearch.collectord.io/volume.1-logs-timestampfield: 'timestamp'
    elasticsearch.collectord.io/volume.1-logs-timestampformat: '2006-01-02 15:04:05.000 MST'
spec:
  containers:
  - name: postgres
    image: postgres
    command:
    - docker-entrypoint.sh
    args:
    - postgres
    - -c
    - logging_collector=on
    - -c
    - log_min_duration_statement=0
    - -c
    - log_directory=/var/log/postgresql
    - -c
    - log_min_messages=INFO
    - -c
    - log_rotation_age=1d
    - -c
    - log_rotation_size=10MB
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
    - name: logs
      mountPath: /var/log/postgresql/
  volumes:
  - name: data
    emptyDir: {}
  - name: logs
    emptyDir: {}
```
That way you will extract the timestamps and remove them from the message
```
_time               | _raw
--------------------|-----------------------------------------------------------------------------
2018-08-31 23:31:02 | [133] LOG:  duration: 0.908 ms  statement: SELECT n.nspname as "Schema",
                    |   c.relname as "Name",
                    |   CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
                    |   pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
                    | FROM pg_catalog.pg_class c
                    |   LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
                    | WHERE c.relkind IN ('r','p','')
                    |   AND n.nspname <> 'pg_catalog'
                    |   AND n.nspname <> 'information_schema'
                    |   AND n.nspname !~ '^pg_toast'
                    |   AND pg_catalog.pg_table_is_visible(c.oid)
                    | ORDER BY 1,2;
2018-08-31 23:30:53 | UTC [124] FATAL:  role "postgresql" does not exist
```
Placeholder templates in a glob pattern
If you're mounting the same volume to multiple Pods and you want to differentiate the logs, you can specify placeholders in the glob configuration. For example, if you have a volume mounted to a Pod named my-pod and to a Pod named my-pod-2, you can specify the glob configuration as {{kubernetes_pod_name}}.log, so Collectord can identify that the files my-pod.log and my-pod-2.log are coming from different Pods.
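A minimal sketch of this setup (the volume, claim, and file names are illustrative; the claim is assumed to be shared by several Pods):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    elasticsearch.collectord.io/volume.1-logs-name: 'shared-logs'
    # each Pod writes to a file named after itself, e.g. my-pod.log,
    # so Collectord can attribute the files to the right Pod
    elasticsearch.collectord.io/volume.1-logs-glob: '{{kubernetes_pod_name}}.log'
spec:
  containers:
  - name: app
    image: busybox
    # the hostname inside a Pod defaults to the Pod name
    args: [/bin/sh, -c, 'while true; do echo "$(date) message" >> /var/log/app/$(hostname).log; sleep 5; done']
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  volumes:
  - name: shared-logs
    persistentVolumeClaim:
      claimName: shared-logs-claim  # hypothetical claim shared by multiple Pods
```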
On Volume Database for acknowledgements
Collectord has a database that stores information about the files that were already processed. By default, the database is stored on the host where Collectord is running. If a volume is used on one host and then mounted to another host, Collectord will start to process the files from the beginning. To avoid this, you can specify the annotation elasticsearch.collectord.io/volume.{N}-logs-onvolumedatabase=true to enable the on-volume database. In this case, Collectord creates a database .collectord.db in the volume root, so the data about processed files is stored in the volume and is available on any host.

An important detail about this feature: you need to mount the /rootfs directory in the Collectord container with write access. By default, the /rootfs directory is mounted as read-only.
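A sketch of how that could look (the claim name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  annotations:
    elasticsearch.collectord.io/volume.1-logs-name: 'logs'
    # keep the acknowledgement database (.collectord.db) in the volume root,
    # so the processed-files state follows the volume between hosts
    elasticsearch.collectord.io/volume.1-logs-onvolumedatabase: 'true'
spec:
  containers:
  - name: postgres
    image: postgres
    volumeMounts:
    - name: logs
      mountPath: /var/log/postgresql/
  volumes:
  - name: logs
    persistentVolumeClaim:
      claimName: postgres-logs-claim  # placeholder PVC that can be re-mounted on another host
```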
Volume types
The collector supports three volume types for application logs: emptyDir, hostPath, and persistentVolumeClaim. The Collectord configuration has two settings that help the collector autodiscover application logs: [general.kubernetes]/volumesRootDir for discovering volumes created with emptyDir, and [input.app_logs]/root for discovering host mounts, considering that they are mounted under a different path inside the collector container.
Change output destination
By default, the collector forwards all the data to ElasticSearch. You can configure containers to redirect data to devnull instead with the annotation elasticsearch.collectord.io/logs-output=devnull.
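For example, a minimal sketch that silences the container logs of a noisy Pod (the Pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: noisy-pod
  annotations:
    # drop container logs from this Pod instead of forwarding them to ElasticSearch
    elasticsearch.collectord.io/logs-output: 'devnull'
spec:
  containers:
  - name: noisy
    image: busybox
    args: [/bin/sh, -c, 'while true; do echo "debug noise"; sleep 1; done']
```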
By changing the default output for specific data, you can change how you forward data to ElasticSearch. Instead of forwarding all the logs by default, you can change the collector configuration with --env "COLLECTOR__LOGS_OUTPUT=input.files__output=devnull" so that container logs are not forwarded by default, and then annotate the containers whose logs you want to see in ElasticSearch with elasticsearch.collectord.io/logs-output=elasticsearch.
For example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: MyApp
  annotations:
    elasticsearch.collectord.io/logs-output: 'elasticsearch'
spec:
  containers:
  - name: nginx
    image: nginx
```
Additionally, if you configure multiple ElasticSearch outputs in the configuration, you can forward the data to a specific ElasticSearch cluster:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: MyApp
  annotations:
    elasticsearch.collectord.io/output: 'elasticsearch::prod1'
spec:
  containers:
  - name: nginx
    image: nginx
```
Forwarding logs to multiple ElasticSearch clusters simultaneously
With the annotation elasticsearch.collectord.io/output you can target multiple ElasticSearch endpoints, for example elasticsearch::apps and elasticsearch::security, using a comma-separated list: elasticsearch.collectord.io/output=elasticsearch::apps,elasticsearch::security. This assumes you have them defined in the ConfigMap as [output.elasticsearch::apps] and [output.elasticsearch::security].
Additionally, you can configure datastreams for the endpoints in square brackets, for example elasticsearch.collectord.io/output=elasticsearch::apps[logs-team],elasticsearch::security[logs-security].
In that case, each event will be sent to both ElasticSearch Clusters.
For example, if you want to forward logs from a specific container to multiple ElasticSearch clusters, you can use the following
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: MyApp
  annotations:
    elasticsearch.collectord.io/logs-output: 'elasticsearch::apps[logs-team],elasticsearch::security[logs-security]'
spec:
  containers:
  - name: nginx
    image: nginx
```
Logs sampling
Example 1. Random based sampling
When the application produces a high volume of logs, in some cases it could be enough to look at just a sampled amount of the logs to understand how many failed requests the application has, or how it behaves. You can add an annotation for the logs to specify the percentage of the logs that should be forwarded to ElasticSearch.
In the following example, this application produces 300,000 log lines. Only about 60,000 log lines are going to be forwarded to ElasticSearch.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: logtest
  annotations:
    elasticsearch.collectord.io/logs-sampling-percent: '20'
spec:
  restartPolicy: Never
  containers:
  - name: logtest
    image: docker.io/mffiedler/ocp-logtest:latest
    args: [python, ocp_logtest.py, --line-length=1024, --num-lines=300000, --rate=60000, --fixed-line]
```
Example 2. Hash-based sampling
In situations where you want to look at the pattern for a specific user, you can sample logs based on a hash value, so that if the same key is present in two different log lines, both of them will be forwarded to ElasticSearch.
In the following example we define a key (should be a named submatch pattern) as an IP address.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    elasticsearch.collectord.io/logs-sampling-percent: '20'
    elasticsearch.collectord.io/logs-sampling-key: '^(?P<key>(\d+\.){3}\d+)'
spec:
  containers:
  - name: nginx-sampling
    image: nginx
```
Thruput
You can configure the thruput specifically for the container logs as
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    elasticsearch.collectord.io/logs-ThruputPerSecond: 128Kb
spec:
  containers:
  - name: nginx-sampling
    image: nginx
```
If this container produces more than 128Kb per second, Collectord will throttle the logs.
Time correction
If you're pre-loading a lot of logs, you might want to configure which events should be skipped because they are too old or too new:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    elasticsearch.collectord.io/logs-TooOldEvents: 168h
    elasticsearch.collectord.io/logs-TooNewEvents: 1h
spec:
  containers:
  - name: nginx-sampling
    image: nginx
```
Handling multiple containers
A Pod can have multiple containers. You can define annotations for a specific container by prefixing the annotation with the container name. The format of the annotations is elasticsearch.collectord.io/{container_name}--{annotation}: {annotation-value}. As an example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/web--logs-index: 'web'
    elasticsearch.collectord.io/web--logs-replace.2-search: '(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}'
    elasticsearch.collectord.io/web--logs-replace.2-val: '${IPv4p1}.X.X.X'
    elasticsearch.collectord.io/user--logs-disabled: 'true'
spec:
  containers:
  - name: web
    image: nginx
  - name: user
    image: busybox
    args: [/bin/sh, -c, 'while true; do wget -qO- localhost:80 &> /dev/null; sleep 5; done']
```
Cluster level annotations
You can apply annotations to Pods at the cluster level with a Configuration object in the API group collectord.io/v1. For example:
```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: apply-to-all-nginx
  annotations:
    elasticsearch.collectord.io/nginx--logs-replace.1-search: '^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$'
    elasticsearch.collectord.io/nginx--logs-replace.1-val: ''
    elasticsearch.collectord.io/nginx--logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    elasticsearch.collectord.io/nginx--logs-hashing.1-function: 'fnv-1a-64'
spec:
  kubernetes.container.image: "^nginx(:.*)?$"
```
This configuration will be applied to all containers that use the image with nginx in the name (examples are nginx:latest or nginx:1.0).
In the spec of the Configuration you include selectors based on the meta fields that we forward to ElasticSearch, which can include fields like container.image.name, kubernetes.container.name, kubernetes.daemonset.name, kubernetes.namespace, kubernetes.pod.name, etc. When you specify multiple fields in the spec, all regexes should match.
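For example, a sketch (the namespace, container name, and datastream are illustrative) that applies an annotation only to nginx containers in the team1 namespace; both regexes have to match:

```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: team1-nginx-web-logs
  annotations:
    elasticsearch.collectord.io/nginx--logs-index: 'logs-team1-web'
spec:
  # both selectors must match for the annotations to be applied
  kubernetes.namespace: "^team1$"
  kubernetes.container.name: "^nginx$"
```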
Forcing Cluster Level Annotations
If you already have an annotation, for example, elasticsearch.collectord.io/index=foo defined on a Namespace, Deployment, or Pod, and you're trying to apply this annotation from a Cluster Level Configuration as elasticsearch.collectord.io/index=bar, the one from the objects will take priority.

With the force modifier you can override those annotations, even if you have them defined on the objects.
```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: apply-to-all-nginx
  annotations:
    elasticsearch.collectord.io/index: bar
spec:
  kubernetes.container.image: "^nginx(:.*)?$"
  force: true
```
NOTE: if you have an annotation defined on the namespace as elasticsearch.collectord.io/logs-index=foo, it will still take priority over index=bar, as logs-index=foo is type specific.
Troubleshooting
Check the collector logs for warning messages about the annotations; you can tell if you made a typo in an annotation if you see warnings like
WARN 2018/08/31 21:05:33.122978 core/input/annotations.go:76: invalid annotation ...
Some pipes, like the fields extraction and time-parsing pipes, add an error to the field collectord_errors, so you can identify when some events failed to be processed by a pipe.
Describe command
Use the Collectord describe command to see how annotations are applied to a specific pod or container.
See Troubleshooting -> Describe.
Reference
- General annotations
  - elasticsearch.collectord.io/index - change the datastream for all the data forwarded for this Pod (metrics, container logs, application logs)
  - elasticsearch.collectord.io/host - change the host for all the data forwarded for this Pod (metrics, container logs, application logs)
  - elasticsearch.collectord.io/output - change the output to devnull or elasticsearch
  - elasticsearch.collectord.io/userfields.{fieldname} - attach custom fields to events
- Annotations for container logs
  - elasticsearch.collectord.io/logs-index - change the datastream for the container logs forwarded from this Pod
  - elasticsearch.collectord.io/logs-host - change the host for the container logs forwarded from this Pod
  - elasticsearch.collectord.io/logs-eventpattern - set the regex identifying the event start pattern for Pod logs
  - elasticsearch.collectord.io/logs-replace.{N}-search - define the search pattern for the replace pipe
  - elasticsearch.collectord.io/logs-replace.{N}-val - define the replace pattern for the replace pipe
  - elasticsearch.collectord.io/logs-hashing.{N}-match - the regexp for a matched value
  - elasticsearch.collectord.io/logs-hashing.{N}-function - hash function (default is sha256, available adler-32, crc-32-ieee, crc-32-castagnoli, crc-32-koopman, crc-64-iso, crc-64-ecma, fnv-1-64, fnv-1a-64, fnv-1-32, fnv-1a-32, fnv-1-128, fnv-1a-128, md5, sha1, sha256, sha384, sha512)
  - elasticsearch.collectord.io/logs-extraction - define the regexp for fields extraction
  - elasticsearch.collectord.io/logs-extractionMessageField - specify the field name for the message (by default the first unnamed group in the regexp)
  - elasticsearch.collectord.io/logs-timestampfield - define the field for the timestamp (after fields extraction)
  - elasticsearch.collectord.io/logs-timestampformat - define the timestamp format
  - elasticsearch.collectord.io/logs-timestampsetmonth - define if the month should be set to current for the timestamp
  - elasticsearch.collectord.io/logs-timestampsetday - define if the day should be set to current for the timestamp
  - elasticsearch.collectord.io/logs-timestamplocation - define the timestamp location if not set by the format
  - elasticsearch.collectord.io/logs-joinpartial - join partial events
  - elasticsearch.collectord.io/logs-joinmultiline - join multiline logs (default value depends on [pipe.join] disabled)
  - elasticsearch.collectord.io/logs-escapeterminalsequences - escape terminal sequences (including colors)
  - elasticsearch.collectord.io/logs-override.{N}-match - match for the override pattern
  - elasticsearch.collectord.io/logs-override.{N}-index - override the datastream for matched events
  - elasticsearch.collectord.io/logs-output - change the output to devnull or elasticsearch (this annotation can't be specified for stderr and stdout)
  - elasticsearch.collectord.io/logs-disabled - disable any log processing for this container (this annotation can't be specified for stderr and stdout)
  - elasticsearch.collectord.io/logs-sampling-percent - specify the % of logs that should be forwarded to ElasticSearch
  - elasticsearch.collectord.io/logs-sampling-key - regexp pattern to specify the key for the sampling based on hash values
  - elasticsearch.collectord.io/logs-ThruputPerSecond - set the thruput for this container, the maximum amount of log data per second, for example 128Kb, 1024b
  - elasticsearch.collectord.io/logs-TooOldEvents - duration from now to the past; events older than this are considered too old and are ignored, for example 168h, 24h
  - elasticsearch.collectord.io/logs-TooNewEvents - duration from now to the future; events newer than this are considered too new and are ignored, for example 1h, 30m
  - elasticsearch.collectord.io/logs-whitelist - configure a pattern for log messages, only log messages matching this pattern will be forwarded to ElasticSearch
  - elasticsearch.collectord.io/logs-userfields.{fieldname} - attach custom fields to events
- Specific for stdout, with the annotations below you can define configuration specific for stdout
  - elasticsearch.collectord.io/stdout-logs-index
  - elasticsearch.collectord.io/stdout-logs-host
  - elasticsearch.collectord.io/stdout-logs-eventpattern
  - elasticsearch.collectord.io/stdout-logs-replace.{N}-search
  - elasticsearch.collectord.io/stdout-logs-replace.{N}-val
  - elasticsearch.collectord.io/stdout-logs-hashing.{N}-match
  - elasticsearch.collectord.io/stdout-logs-hashing.{N}-function
  - elasticsearch.collectord.io/stdout-logs-extraction
  - elasticsearch.collectord.io/stdout-logs-extractionMessageField
  - elasticsearch.collectord.io/stdout-logs-timestampfield
  - elasticsearch.collectord.io/stdout-logs-timestampformat
  - elasticsearch.collectord.io/stdout-logs-timestampsetmonth
  - elasticsearch.collectord.io/stdout-logs-timestampsetday
  - elasticsearch.collectord.io/stdout-logs-timestamplocation
  - elasticsearch.collectord.io/stdout-logs-joinpartial
  - elasticsearch.collectord.io/stdout-logs-joinmultiline
  - elasticsearch.collectord.io/stdout-logs-escapeterminalsequences
  - elasticsearch.collectord.io/stdout-logs-override.{N}-match
  - elasticsearch.collectord.io/stdout-logs-override.{N}-index
  - elasticsearch.collectord.io/stdout-logs-sampling-percent
  - elasticsearch.collectord.io/stdout-logs-sampling-key
  - elasticsearch.collectord.io/stdout-logs-ThruputPerSecond
  - elasticsearch.collectord.io/stdout-logs-TooOldEvents
  - elasticsearch.collectord.io/stdout-logs-TooNewEvents
  - elasticsearch.collectord.io/stdout-logs-whitelist
- Specific for stderr, with the annotations below you can define configuration specific for stderr
  - elasticsearch.collectord.io/stderr-logs-index
  - elasticsearch.collectord.io/stderr-logs-host
  - elasticsearch.collectord.io/stderr-logs-eventpattern
  - elasticsearch.collectord.io/stderr-logs-replace.{N}-search
  - elasticsearch.collectord.io/stderr-logs-replace.{N}-val
  - elasticsearch.collectord.io/stderr-logs-hashing.{N}-match
  - elasticsearch.collectord.io/stderr-logs-hashing.{N}-function
  - elasticsearch.collectord.io/stderr-logs-extraction
  - elasticsearch.collectord.io/stderr-logs-extractionMessageField
  - elasticsearch.collectord.io/stderr-logs-timestampfield
  - elasticsearch.collectord.io/stderr-logs-timestampformat
  - elasticsearch.collectord.io/stderr-logs-timestampsetmonth
  - elasticsearch.collectord.io/stderr-logs-timestampsetday
  - elasticsearch.collectord.io/stderr-logs-timestamplocation
  - elasticsearch.collectord.io/stderr-logs-joinpartial
  - elasticsearch.collectord.io/stderr-logs-joinmultiline
  - elasticsearch.collectord.io/stderr-logs-escapeterminalsequences
  - elasticsearch.collectord.io/stderr-logs-override.{N}-match
  - elasticsearch.collectord.io/stderr-logs-override.{N}-index
  - elasticsearch.collectord.io/stderr-logs-sampling-percent
  - elasticsearch.collectord.io/stderr-logs-sampling-key
  - elasticsearch.collectord.io/stderr-logs-ThruputPerSecond
  - elasticsearch.collectord.io/stderr-logs-TooOldEvents
  - elasticsearch.collectord.io/stderr-logs-TooNewEvents
  - elasticsearch.collectord.io/stderr-logs-whitelist
- Annotations for events (can be applied only to namespaces)
  - elasticsearch.collectord.io/events-index - change the datastream for the events of a specific namespace
  - elasticsearch.collectord.io/events-host - change the host for the events of a specific namespace
  - elasticsearch.collectord.io/events-userfields.{fieldname} - attach custom fields to events
- Annotations for application logs
  - elasticsearch.collectord.io/volume.{N}-logs-name - name of the volume attached to the Pod
  - elasticsearch.collectord.io/volume.{N}-logs-index - target datastream for logs forwarded from the volume
  - elasticsearch.collectord.io/volume.{N}-logs-host - change the host for logs forwarded from the volume
  - elasticsearch.collectord.io/volume.{N}-logs-eventpattern - change the event pattern defining a new event for logs forwarded from the volume
  - elasticsearch.collectord.io/volume.{N}-logs-replace.{N}-search - specify the regex search for the replace pipe for the logs
  - elasticsearch.collectord.io/volume.{N}-logs-replace.{N}-val - specify the regex replace pattern for the replace pipe for the logs
  - elasticsearch.collectord.io/volume.{N}-logs-hashing.{N}-match - the regexp for a matched value
  - elasticsearch.collectord.io/volume.{N}-logs-hashing.{N}-function - hash function (default is sha256, available adler-32, crc-32-ieee, crc-32-castagnoli, crc-32-koopman, crc-64-iso, crc-64-ecma, fnv-1-64, fnv-1a-64, fnv-1-32, fnv-1a-32, fnv-1-128, fnv-1a-128, md5, sha1, sha256, sha384, sha512)
  - elasticsearch.collectord.io/volume.{N}-logs-extraction - specify the fields extraction regex for the logs
  - elasticsearch.collectord.io/volume.{N}-logs-extractionMessageField - specify the field name for the message (by default the first unnamed group in the regexp)
  - elasticsearch.collectord.io/volume.{N}-logs-timestampfield - specify the timestamp field
  - elasticsearch.collectord.io/volume.{N}-logs-timestampformat - specify the format for the timestamp field
  - elasticsearch.collectord.io/volume.{N}-logs-timestampsetmonth - define if the month should be set to current for the timestamp
  - elasticsearch.collectord.io/volume.{N}-logs-timestampsetday - define if the day should be set to current for the timestamp
  - elasticsearch.collectord.io/volume.{N}-logs-timestamplocation - define the timestamp location if not set by the format
  - elasticsearch.collectord.io/volume.{N}-logs-glob - set the glob pattern for matching logs
  - elasticsearch.collectord.io/volume.{N}-logs-match - set the regexp pattern for matching logs
  - elasticsearch.collectord.io/volume.{N}-logs-recursive - set if the walker should walk the directory recursively
  - elasticsearch.collectord.io/volume.{N}-logs-override.{N}-match - match for the override pattern
  - elasticsearch.collectord.io/volume.{N}-logs-override.{N}-index - override the datastream for matched events
  - elasticsearch.collectord.io/volume.{N}-logs-sampling-percent - specify the % of logs that should be forwarded to ElasticSearch
  - elasticsearch.collectord.io/volume.{N}-logs-sampling-key - regexp pattern to specify the key for the sampling based on hash values
  - elasticsearch.collectord.io/volume.{N}-logs-ThruputPerSecond - set the thruput for logs from this volume, the maximum amount of log data per second, for example 128Kb, 1024b
  - elasticsearch.collectord.io/volume.{N}-logs-TooOldEvents - duration from now to the past; events older than this are considered too old and are ignored, for example 168h, 24h
  - elasticsearch.collectord.io/volume.{N}-logs-TooNewEvents - duration from now to the future; events newer than this are considered too new and are ignored, for example 1h, 30m
  - elasticsearch.collectord.io/volume.{N}-logs-whitelist - configure a pattern for log messages, only log messages matching this pattern will be forwarded to ElasticSearch
  - elasticsearch.collectord.io/volume.{N}-logs-userfields.{fieldname} - attach custom fields to events
  - elasticsearch.collectord.io/volume.{N}-logs-maxholdafterclose - how long Collectord can hold file descriptors open for files in a PVC after the pod is terminated (a duration, for example 5s, 1800s)
  - elasticsearch.collectord.io/volume.{N}-logs-onvolumedatabase - boolean flag to enable the on-volume database for this volume, in case this volume might be used on more than one host
Links
- Installation
  - Forwarding container logs, application logs, host logs and audit logs
  - Test our solution with the embedded 30-day evaluation license.
- Collectord Configuration
  - Collectord configuration reference for Kubernetes and OpenShift clusters.
- Annotations
  - Changing the type and format of messages forwarded from namespaces, workloads, and pods.
  - Forwarding application logs.
  - Multi-line container logs.
  - Fields extraction for application and container logs (including timestamp extraction).
  - Hiding sensitive data, stripping terminal escape codes and colors.
- Troubleshooting
- FAQ and the common questions
- License agreement
- Pricing
- Contact