Monitoring Docker, OpenShift and Kubernetes - Version 5.3
November 19, 2018

We are happy to share a minor update of our solutions for Monitoring Docker, Kubernetes and OpenShift. This update brings improved capabilities for monitoring multiple clusters within one application, better observability of the state of data forwarding, and insights into Splunk usage.
New annotations
Hashing sensitive data
If you need to hide sensitive data (for example, to hide PII and stay compliant with GDPR), we previously suggested using replace patterns, so that you can replace IP addresses with static values like X.X.X.X. But that can complicate observability if you want to follow a trace or see all the requests from a specific IP address. Now, by using hashing functions, you get the same hashed value for the same IP address, which helps you identify related events.
With the annotation logs-hashing.1-match you can specify a match regexp.
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-pod
      annotations:
        collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    spec:
      containers:
      - name: nginx
        image: nginx
The default hashing function is sha256, so the resulting hash can be longer than the source value.
EsoXtJryKJQ28wPgFmAwoh5SXSZuIJJnQzgBqP1AcaA - - [18/Nov/2018:01:25:27 +0000] "GET /404 HTTP/1.1" 404 153 "-" "Wget" "-"
But you can specify the hashing function. For example, when we set collectord.io/logs-hashing.1-function: 'fnv-1a-64' to minimize the length of the hash, we get a shorter result:
qrr-cQTZFL4 - - [18/Nov/2018:01:27:17 +0000] "GET /404 HTTP/1.1" 404 153 "-" "Wget" "-"
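To illustrate why hashing keeps events correlatable, here is a minimal Python sketch of the idea. It is not Collectord's implementation. The helper names (`hash_matches`, `encode`) are hypothetical, and the unpadded URL-safe base64 encoding and big-endian byte order are assumptions inferred from the shape of the sample output above, so the hashes it produces are not guaranteed to match Collectord's.

```python
import base64
import hashlib
import re

# The same match regexp as in the annotation example.
IP_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> bytes:
    """Return the 8-byte FNV-1a hash of data (big-endian is an assumption)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h.to_bytes(8, "big")


def sha256(data: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def encode(digest: bytes) -> str:
    # Assumption: unpadded URL-safe base64, chosen only because it yields
    # 43 characters for 32 bytes and 11 characters for 8 bytes.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def hash_matches(line: str, hash_fn=sha256) -> str:
    """Replace every regexp match in line with the encoded hash of the match."""
    return IP_PATTERN.sub(
        lambda m: encode(hash_fn(m.group(0).encode("utf-8"))), line
    )


if __name__ == "__main__":
    log = '10.0.0.1 - - [18/Nov/2018:01:25:27 +0000] "GET /404 HTTP/1.1" 404 153'
    print(hash_matches(log))            # long token, same for every 10.0.0.1
    print(hash_matches(log, fnv1a_64))  # shorter token, higher collision risk
```

The trade-off is the usual one: a 64-bit hash keeps log lines short, while sha256 makes collisions between different source values far less likely.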
- Monitoring OpenShift v5 - Annotations
- Monitoring Kubernetes v5 - Annotations
- Monitoring Docker v5 - Annotations
Annotations for specific container
Pods can have more than one container, but previously you could not specify annotations at the container level. With version 5.3 you can define container-specific annotations in the format collectord.io/{container_name}--{annotation}: {annotation-value}.
As an example, if you have an nginx container running alongside containers from other images, and you want to define different annotations for each of them, you can do that as in the following example:
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-pod
      annotations:
        collectord.io/nginx--logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
        collectord.io/get-trigger--logs-output: devnull
    spec:
      containers:
      - name: nginx
        image: nginx
      - name: get-trigger
        image: busybox
        args: [/bin/sh, -c, 'while true; do wget -qO- localhost:80; sleep 5; done']
In this example the annotation logs-hashing.1-match applies only to the nginx container, and logs-output applies only to the get-trigger container.
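The naming convention can be sketched in a few lines of Python. This is an illustration of the collectord.io/{container_name}--{annotation} format only, not Collectord's own resolution logic. The function name `annotations_for_container` is hypothetical, and letting a container-specific value override a Pod-level value of the same name is an assumption.

```python
PREFIX = "collectord.io/"


def annotations_for_container(annotations, container_name, all_containers):
    """Return the annotations that apply to one container.

    Keys shaped like collectord.io/{container_name}--{annotation} apply only
    to that container; other collectord.io/ keys are treated as Pod-level.
    """
    scoped_prefixes = [f"{PREFIX}{name}--" for name in all_containers]
    own_prefix = f"{PREFIX}{container_name}--"
    resolved = {}

    # Pod-level annotations first: collectord.io/ keys not scoped to any container.
    for key, value in annotations.items():
        if not key.startswith(PREFIX):
            continue
        if any(key.startswith(p) for p in scoped_prefixes):
            continue
        resolved[key[len(PREFIX):]] = value

    # Assumption: container-specific values override Pod-level ones.
    for key, value in annotations.items():
        if key.startswith(own_prefix):
            resolved[key[len(own_prefix):]] = value

    return resolved


if __name__ == "__main__":
    pod_annotations = {
        "collectord.io/nginx--logs-hashing.1-match": r"(\d{1,3}\.){3}\d{1,3}",
        "collectord.io/get-trigger--logs-output": "devnull",
    }
    containers = ["nginx", "get-trigger"]
    for name in containers:
        print(name, annotations_for_container(pod_annotations, name, containers))
```

Matching against the known container names, rather than splitting on the first `--`, avoids ambiguity when a container name itself contains consecutive hyphens.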
Other annotations
- collectord.io/logs-joinmultiline - disable multi-line joining for the Pod.
- collectord.io/logs-disabled - completely disable log processing. The difference from logs-output=devnull is that with the devnull output Collectord still reads the logs, so if you change the output later, Collectord starts processing logs from the moment you changed the output. If you change disabled=true to false, Collectord starts forwarding logs from this container as if it were a new container, starting from the beginning of the log files.
Improved observability
We have added several alerts that can help you troubleshoot issues with Collectord. The first alert fires when Collectord reports errors in the processing pipeline, for example when it fails to extract fields. The second fires when Collectord reports warning messages, which can point to issues with access to the API Server, or to requests to Splunk HEC that could not be delivered on the first attempt. The third alert tracks the lag between event time and indexing time, which can reveal performance issues in Collectord or in the Splunk indexing pipeline.
Reducing Splunk Licensing cost for Network Socket Data and Events
We improved how Collectord identifies events that it has already sent to Splunk. That reduces the number of events Collectord forwards to Splunk. With a very high number of events, this can be a significant reduction.
In version 5.3 Collectord groups network socket connections that share the same remote and local IP addresses. For example, suppose a local container has two connections:
remote_addr | remote_port | local_addr | local_port | protocol | tcp_state | time
10.128.0.3 | 9090 | 10.128.0.1 | 55338 | tcp | TIME_WAIT | 2018-11-17 16:53:03.668
10.128.0.3 | 9090 | 10.128.0.1 | 55432 | tcp | TIME_WAIT | 2018-11-17 16:53:03.668
With version 5.3 Collectord groups them and adds an additional field, connections:
remote_addr | remote_port | local_addr | local_port | protocol | tcp_state | time | connections
10.128.0.3 | 9090 | 10.128.0.1 | 55338-55432 | tcp | TIME_WAIT | 2018-11-17 16:53:03.668 | 2
We have found that this grouping can reduce the licensing cost of network socket table data by a factor of 4.
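The grouping shown above can be approximated with a short Python sketch. It is a hedged illustration, not Collectord's actual algorithm: the function name `group_connections` is hypothetical, the grouping key (every field except local_port) is an assumption, and so is rendering the grouped local ports as a lowest-highest range.

```python
from collections import defaultdict


def group_connections(rows):
    """Collapse connections that differ only in local_port into one row."""
    groups = defaultdict(list)
    for row in rows:
        # Assumption: every field except local_port forms the grouping key.
        key = (
            row["remote_addr"],
            row["remote_port"],
            row["local_addr"],
            row["protocol"],
            row["tcp_state"],
            row["time"],
        )
        groups[key].append(row["local_port"])

    grouped = []
    for key, ports in groups.items():
        remote_addr, remote_port, local_addr, protocol, tcp_state, time = key
        low, high = min(ports), max(ports)
        grouped.append({
            "remote_addr": remote_addr,
            "remote_port": remote_port,
            "local_addr": local_addr,
            # Assumption: local ports are rendered as a lowest-highest range.
            "local_port": str(low) if low == high else f"{low}-{high}",
            "protocol": protocol,
            "tcp_state": tcp_state,
            "time": time,
            "connections": len(ports),
        })
    return grouped


if __name__ == "__main__":
    base = {
        "remote_addr": "10.128.0.3",
        "remote_port": 9090,
        "local_addr": "10.128.0.1",
        "protocol": "tcp",
        "tcp_state": "TIME_WAIT",
        "time": "2018-11-17 16:53:03.668",
    }
    rows = [dict(base, local_port=55338), dict(base, local_port=55432)]
    for row in group_connections(rows):
        print(row)
```

Because many short-lived connections to the same remote endpoint collapse into one event, the number of indexed events drops while the connections count preserves how many sockets were observed.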
You can also see how much of your Splunk license the application consumes with the Splunk Usage dashboard.
Performance improvements
With version 5.3 we significantly reduced memory usage and improved log processing performance. You can see the results in a separate blog post, Performance comparison between Collectord, Fluentd and Fluent-bit.
Links
You can find more information about other minor updates by following the links below.
Upgrade instructions
Release notes
- Monitoring OpenShift - Release notes
- Monitoring Kubernetes - Release notes
- Monitoring Docker - Release notes