Outcold Solutions LLC

Monitoring Docker - Version 5

Monitoring Docker Installation

With our solution for Monitoring Docker, you can start monitoring your clusters in under 10 minutes, including forwarding metadata-enriched container logs, host logs, and metrics. You can request an evaluation license that is valid for 30 days.

Splunk configuration

Install Monitoring Docker application

Install Monitoring Docker from Splunkbase. You need to install it on Search Heads only.

If you created a dedicated index that is not searchable by default, modify the macro macro_docker_base to include this index.

macro_docker_base = (index=docker)
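
For example, if your dedicated index is named docker_events (a hypothetical name), extend the macro to search both indexes:

macro_docker_base = (index=docker OR index=docker_events)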

Enable HTTP Event Collector in Splunk

Outcold Solutions' Collector sends data to Splunk using HTTP Event Collector. By default, Splunk does not have HTTP Event Collector enabled. Please read the HTTP Event Collector walkthrough to learn more about HTTP Event Collector.

The minimum requirement is Splunk Enterprise or Splunk Cloud 6.5. If you are managing Splunk clusters with a version below 6.5, please read our FAQ on how to set up a Heavy Forwarder in between.

After enabling HTTP Event Collector, you need to find the correct URL for HTTP Event Collector and generate an HTTP Event Collector token. For example, if your Splunk instance runs on hostname hec.example.com, listens on port 8088 with SSL enabled, and uses the token B5A79AAD-D822-46CC-80D1-819F80D7BFB0, you can test it with the curl command as in the example below.

$ curl -k https://hec.example.com:8088/services/collector/event/1.0 -H "Authorization: Splunk B5A79AAD-D822-46CC-80D1-819F80D7BFB0" -d '{"event": "hello world"}'
{"text": "Success", "code": 0}

The -k flag is necessary for self-signed certificates.

If you are using Splunk Cloud, the URL is not the same as the URL for Splunk Web; see Send data to HTTP Event Collector on Splunk Cloud instances for details.

If you use an index that is not searchable by default, please read our documentation on how to configure indexes in Splunk and inside the collector at Splunk Indexes.

Install Collector for Docker

Pre-requirements

The collector uses the JSON files generated by the json-file logging driver as the source for container logs.

Some Linux distributions, CentOS for example, enable the journald logging driver by default instead of the default json-file logging driver. You can verify which driver is used by default:

$ docker info | grep "Logging Driver"
Logging Driver: json-file

If the Docker configuration file is located at /etc/sysconfig/docker (common on CentOS/RHEL with Docker 1.13), you can change the logging driver and restart the Docker daemon with the following commands.

$ sed -i 's/--log-driver=journald/--log-driver=json-file --log-opt max-size=100M --log-opt max-file=3/' /etc/sysconfig/docker
$ systemctl restart docker

If you configure the Docker daemon with daemon.json in /etc/docker/daemon.json (common on Debian/Ubuntu), you can set the logging driver there and restart the Docker daemon.

{
  "log-driver": "json-file",
  "log-opts" : {
    "max-size" : "100m",
    "max-file" : "3"
  }
}
$ systemctl restart docker
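
Note that changing the daemon configuration affects only newly created containers; existing containers keep the logging driver they were created with. You can check which driver a specific container uses (my-container is a placeholder name) with the following command.

$ docker inspect --format '{{.HostConfig.LogConfig.Type}}' my-container
json-file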

Please follow the manual to learn how to configure the default logging driver for containers:

JSON logging driver configuration

With the default configuration, Docker does not rotate JSON log files; over time they can grow large and consume all available disk space. That is why we specify max-size and max-file in the configurations above. See Configure and troubleshoot the Docker daemon for more details.
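
If you cannot change the daemon-wide defaults, you can also set the logging driver and rotation options per container with the standard docker run flags, as in this sketch (nginx is just a placeholder image).

$ docker run -d --log-driver json-file --log-opt max-size=100M --log-opt max-file=3 nginx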

Installation

Pull the latest version of the collector.

docker pull outcoldsolutions/collectorfordocker:5.23.431

We recommend pinning to a specific version to make the upgrade process more straightforward. Follow our blog, Twitter, or subscribe to the newsletter to keep up to date with releases.

Run the collector image as in the example below (the command uses the same configuration as the curl command above). Modify the Splunk URL and token values, review and accept the license agreement, and include the license key (request an evaluation license key with this automated form).

With the COLLECTOR__CLUSTER environment variable you can also name the cluster; replace - with the cluster name.

If you plan to deploy Collectord on a cluster that has been running for a while and has a lot of logs stored on disk, Collectord will forward all of those logs, which can put significant load on your cluster. You can set the thruputPerSecond or tooOldEvents values under [general] to limit the amount of logs forwarded per second and to tell Collectord which events to skip.
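
For example, following the same COLLECTOR__NAME=section__key=value pattern as the other environment variables, you could add extra --env flags to the docker run command below (the values here are illustrative; check the Configuration page for the exact formats).

    --env "COLLECTOR__THRUPUT=general__thruputPerSecond=512Kb" \
    --env "COLLECTOR__TOOOLDEVENTS=general__tooOldEvents=168h" \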

docker run -d \
    --name collectorfordocker \
    --volume /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro \
    --volume /proc:/rootfs/proc:ro \
    --volume /var/log:/rootfs/var/log:ro \
    --volume /var/lib/docker/:/rootfs/var/lib/docker/:ro \
    --volume /var/run/docker.sock:/rootfs/var/run/docker.sock:ro \
    --volume collector_data:/data/ \
    --cpus=1 \
    --cpu-shares=204 \
    --memory=256M \
    --restart=always \
    --env "COLLECTOR__SPLUNK_URL=output.splunk__url=https://hec.example.com:8088/services/collector/event/1.0" \
    --env "COLLECTOR__SPLUNK_TOKEN=output.splunk__token=B5A79AAD-D822-46CC-80D1-819F80D7BFB0"  \
    --env "COLLECTOR__SPLUNK_INSECURE=output.splunk__insecure=true"  \
    --env "COLLECTOR__ACCEPTLICENSE=general__acceptLicense=true" \
    --env "COLLECTOR__LICENSE=general__license=..." \
    --env "COLLECTOR__CLUSTER=general__fields.docker_cluster=-" \
    --privileged \
    outcoldsolutions/collectorfordocker:5.23.431

If you are running this command in the Windows Command Prompt, replace the \ at the end of each line (except the last one) with ^ to make the multi-line input work; in PowerShell, use a backtick (`) instead.

In the case of AWS ECS, you need to change /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro to /cgroup:/rootfs/sys/fs/cgroup:ro, as ECS-optimized images mount the cgroup filesystem directly in the root folder. See Monitoring Amazon Elastic Container Service Clusters in Splunk.

You can find more information about the available configuration options for the collector, and how to build your own image with embedded configuration on top of the official image, on the Configuration page.
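
As a sketch, an image with embedded configuration could be built with a Dockerfile like the one below; the /config/ destination and the 004-custom.conf file name are assumptions for illustration, so verify the exact paths on the Configuration page.

FROM outcoldsolutions/collectorfordocker:5.23.431
COPY 004-custom.conf /config/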

If you use Docker Compose, use our docker-compose.yaml as a reference.

version: "3"
services:

  collectorfordocker:
    image: outcoldsolutions/collectorfordocker:5.23.431
    volumes:
      - /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro
      - /proc:/rootfs/proc:ro
      - /var/log:/rootfs/var/log:ro
      - /var/lib/docker/:/rootfs/var/lib/docker/:ro
      - /var/run/docker.sock:/rootfs/var/run/docker.sock:ro
      - collector_data:/data/
    environment:
      - COLLECTOR__SPLUNK_URL=output.splunk__url=https://hec.example.com:8088/services/collector/event/1.0
      - COLLECTOR__SPLUNK_TOKEN=output.splunk__token=B5A79AAD-D822-46CC-80D1-819F80D7BFB0
      - COLLECTOR__SPLUNK_INSECURE=output.splunk__insecure=true
      - COLLECTOR__ACCEPTLICENSE=general__acceptLicense=true
      - COLLECTOR__LICENSE=general__license=...
      - COLLECTOR__CLUSTER=general__fields.docker_cluster=-
    restart: always
    deploy:
      mode: global
      restart_policy:
        condition: any
      resources:
        limits:
          cpus: '1'
          memory: 256M
        reservations:
          cpus: '0.1'
          memory: 64M
    privileged: true

volumes:
  collector_data:

If you use version 2.x of docker-compose, use the following reference instead.

version: "2.2"
services:

  collectorfordocker:
    image: outcoldsolutions/collectorfordocker:5.23.431
    volumes:
      - /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro
      - /proc:/rootfs/proc:ro
      - /var/log:/rootfs/var/log:ro
      - /var/lib/docker/:/rootfs/var/lib/docker/:ro
      - /var/run/docker.sock:/rootfs/var/run/docker.sock:ro
      - collector_data:/data/
    environment:
      - COLLECTOR__SPLUNK_URL=output.splunk__url=https://hec.example.com:8088/services/collector/event/1.0
      - COLLECTOR__SPLUNK_TOKEN=output.splunk__token=B5A79AAD-D822-46CC-80D1-819F80D7BFB0
      - COLLECTOR__SPLUNK_INSECURE=output.splunk__insecure=true
      - COLLECTOR__ACCEPTLICENSE=general__acceptLicense=true
      - COLLECTOR__LICENSE=general__license=...
      - COLLECTOR__CLUSTER=general__fields.docker_cluster=-
    restart: always
    cpus: 1
    cpu_shares: 200
    mem_limit: 256M
    mem_reservation: 64M
    privileged: true

volumes:
  collector_data:
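
After saving the file, start the collector from the directory containing docker-compose.yaml.

$ docker-compose up -d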

If you use a Splunk-generated certificate, you probably want to add some SSL-specific configuration. The easiest way to get started is --env "COLLECTOR__SPLUNK_INSECURE=output.splunk__insecure=true" to skip SSL validation, as we specified in the examples above.

An important note: the collector does not require you to change the default logging driver. It forwards logs from the default json-file logging driver.

Give it a few moments to download the image and start the container. After the container is deployed, go to the Monitoring Docker application in Splunk and you should see data on dashboards.
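
You can also verify the data with a quick search in Splunk; this sketch uses the application's base macro (adjust the macro first if you send data to a dedicated index).

`macro_docker_base` | stats count by host, sourcetype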

By default, the collector forwards container logs, host logs (including syslog), and metrics for the host, containers, and processes.

Deploying on Docker Swarm

To deploy on Docker Swarm, you can use the docker-compose.yaml file from above and the stack deploy command.

docker stack deploy --compose-file ./docker-compose.yaml collectorfordocker
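
You can verify that the collector is running on every node of the swarm (collectorfordocker is the stack name from the command above).

$ docker stack ps collectorfordocker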

Deploying on Podman

Our solution supports Podman as a Docker replacement. One of the requirements is to use the journald logging driver (the default), as the k8s-file logging driver does not keep rotated files, which makes reliable log forwarding impossible.

At a minimum, you will need to configure the URL of the Podman socket (instead of the Docker socket) and change the Docker root folder.

podman run -d \
    --name collectorforpodman \
    --volume /:/rootfs:ro \
    --volume collector_data:/data/ \
    --cpus=2 \
    --cpu-shares=1024 \
    --memory=512M \
    --restart=always \
    --env "COLLECTOR__SPLUNK_URL=output.splunk__url=..." \
    --env "COLLECTOR__SPLUNK_TOKEN=output.splunk__token=..."  \
    --env "COLLECTOR__SPLUNK_INSECURE=output.splunk__insecure=true"  \
    --env "COLLECTOR__EULA=general__acceptLicense=true" \
    --env "COLLECTOR__LICENSE_KEY=general__license=..." \
    --env "COLLECTOR__GENERALPODMAN_URL=general.docker__url=unix:///rootfs/var/run/podman/podman.sock" \
    --env "COLLECTOR__GENERALPODMAN_STORAGE=general.docker__dockerRootFolder=/rootfs/var/lib/" \
    --ulimit nofile=1048576:1048576 \
    --privileged \
    outcoldsolutions/collectorfordocker:5.23.431

Next steps

  • Review predefined alerts.
  • Verify configuration by using our troubleshooting instructions.
  • If your Docker hosts use a timezone other than UTC, you need to let the container know about it. If your Linux host has the file /etc/localtime, you can mount it into the collectorfordocker container with --volume /etc/localtime:/etc/localtime:ro. As an alternative, you can change the configuration and specify a timezone for all [input.files::*] inputs with timestampLocation.
  • We send the data to the default HTTP Event Collector index. For better performance, we recommend at least splitting logs and metrics into separate indexes. You can find how to configure indexes in our guide Splunk Indices.
  • We provide a flexible scheme that allows you to define search-time extractions for logs in your containers. Follow the guide Splunk fields extraction for container logs to learn more.
  • The collector can forward application logs (logs inside the container) automatically with annotations.
  • You can define specific patterns for multi-line log events; override indexes, sources, and source types for logs and metrics; extract fields; redirect some log lines to /dev/null; and hide sensitive information in logs with annotations for containers, as sketched after this list.
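
For illustration only: annotations for Docker containers are applied as container labels. The label name below is an assumption modeled on the collectord.io annotation scheme, and my-app-image is a placeholder; check the annotations documentation for the exact names.

$ docker run -d --label collectord.io/logs-index=app_logs my-app-image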

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift, and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.