Collecting Prometheus metrics
Most of the components in the Kubernetes control plane export metrics in Prometheus format. The collector can read these metrics and forward them to Splunk Enterprise or Splunk Cloud. Our installation includes default configurations for collecting metrics from the Kubernetes API Server, Scheduler, Controller Manager, Kubelets and the etcd cluster. With most Kubernetes providers you don't need any additional configuration to see these metrics.
If your applications export metrics in Prometheus format, you can use our collector to forward these metrics to Splunk Enterprise or Splunk Cloud as well.
Forwarding metrics from Pods
Please read our documentation on annotations to learn how you can configure forwarding of metrics from Pods.
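As a quick illustration, forwarding can be enabled by annotating a workload with the collectord.io/prometheus annotations. In the example below the deployment name my-app, the default namespace and the port 8080 are placeholders for your own application:

kubectl annotate deployment/my-app --namespace default 'collectord.io/prometheus.1-path=/metrics' 'collectord.io/prometheus.1-port=8080' 'collectord.io/prometheus.1-source=my-app' --overwrite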
Defining prometheus input
We deploy the collector as 3 different workloads. Depending on where you want to collect your metrics, plan where to place your Prometheus input configuration:

- 002-daemonset.conf is installed on all nodes (masters and non-masters). Use this configuration if you need to collect metrics from all nodes, from local ports. An example of these metrics is Kubelet metrics.
- 003-daemonset-master.conf is installed only on master nodes. Use this configuration to collect metrics only from master nodes, from local ports. Examples of these metrics are the control plane processes and etcd running on masters.
- 004-addon.conf is installed as a deployment and runs only once in the whole cluster. Place your Prometheus configuration here if you want to collect metrics from endpoints or services (see the sketch after this list). Examples of these Prometheus configurations are the controller manager and the scheduler, which can be accessed only from the internal network and can be discovered with endpoints. Another example is an etcd cluster running outside of the Kubernetes cluster.
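As an illustration, a custom input for an application that exposes metrics behind a service could be placed in 004-addon.conf. This is only a sketch: the stanza name my-app, the endpoint URL and the source value are placeholders, not part of the default configuration.

[input.prometheus::my-app]

# override type
type = kubernetes_prometheus

# override source
source = my-app

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint of your application (placeholder URL)
endpoint = http://my-app.default.svc.cluster.local:8080/metrics

# include metrics help with the events
includeHelp = false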
Default configuration
Kubelet
On every node the collector reads and forwards kubelet metrics. We deploy this configuration in 002-daemonset.conf.
[input.prometheus::kubelet]

# disable prometheus kubelet metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubelet

# how often to collect prometheus metrics
interval = 60s

# Prometheus endpoint, multiple values can be specified, collector tries them in order till finding the first
# working endpoint.
# At first trying to get it through proxy
endpoint.1proxy = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${KUBERNETES_NODENAME}/proxy/metrics

# In case if cannot get it through proxy, trying localhost
endpoint.2http = http://127.0.0.1:10255/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Kubernetes API Server
On master nodes the collector reads and forwards metrics from the Kubernetes API Server. We deploy this configuration in 003-daemonset-master.conf.
[input.prometheus::kubernetes-api]

# disable prometheus kubernetes-api metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubernetes-api

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
# at first trying to get it from localhost (avoiding load balancer, if multiple api servers)
endpoint.1localhost = https://127.0.0.1:6443/metrics

# as fallback using proxy
endpoint.2kubeapi = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Scheduler
On master nodes the collector reads and forwards metrics from the scheduler. We deploy this configuration in 003-daemonset-master.conf.
[input.prometheus::scheduler]

# disable prometheus scheduler metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = scheduler

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10251/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Collecting metrics from scheduler using endpoint discovery
The collector can forward metrics from the scheduler only if the scheduler binds to localhost on the master nodes. If the scheduler binds only to the pod network, you need to use a different way of collecting metrics from the scheduler. In 004-addon.conf you can find a commented out section [input.prometheus::scheduler] that allows collecting metrics from the scheduler using endpoint discovery. You can comment out the section [input.prometheus::scheduler] in 003-daemonset-master.conf and uncomment it in 004-addon.conf.
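To check which case applies to your cluster, you can verify on a master node whether the scheduler listens on localhost port 10251. The commands below assume a kubeadm-style setup with static pod manifests and the ss utility available on the node:

ss -tlnp | grep 10251
grep bind-address /etc/kubernetes/manifests/kube-scheduler.yaml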
# Example on how to get scheduler metrics with endpoint discovery
[input.prometheus::scheduler]

# disable prometheus scheduler
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (using discovery from endpoint)
host =

# override source
source = scheduler

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = false

# include metrics help with the events
includeHelp = true
In this configuration the collector uses the endpoint endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics. This syntax defines endpoint auto-discovery: the collector lists all endpoints with port 10251 defined under the name kube-scheduler-collectorforkubernetes-discovery and uses all of them to collect the metrics. The endpoint kube-scheduler-collectorforkubernetes-discovery is created by the service defined in our configuration.
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    k8s-app: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
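To verify that the discovery service resolves to the scheduler pods, you can list its endpoints (the namespace and name match the service definition above):

kubectl get endpoints kube-scheduler-collectorforkubernetes-discovery --namespace kube-system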
Controller Manager
On master nodes the collector reads and forwards metrics from the controller manager. We deploy this configuration in 003-daemonset-master.conf.
# This configuration works if controller-manager binds to localhost:10252
[input.prometheus::controller-manager]

# disable prometheus controller-manager metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = controller-manager

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10252/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = false

# include metrics help with the events
includeHelp = false
Collecting metrics from controller manager using endpoint discovery
The collector can forward metrics from the controller manager only if the controller manager binds to localhost on the master nodes. If the controller manager binds only to the pod network, you need to use a different way of collecting metrics from the controller manager. In 004-addon.conf you can find a commented out section [input.prometheus::controller-manager] that allows collecting metrics from the controller manager using endpoint discovery. You can comment out the section [input.prometheus::controller-manager] in 003-daemonset-master.conf and uncomment it in 004-addon.conf.
# Example on how to get controller-manager metrics with endpoint discovery
[input.prometheus::controller-manager]

# disable prometheus controller-manager
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (using discovery from endpoint)
host =

# override source
source = controller-manager

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = false

# include metrics help with the events
includeHelp = true
In this configuration the collector uses the endpoint endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics. This syntax defines endpoint auto-discovery: the collector lists all endpoints with port 10252 defined under the name kube-controller-manager-collectorforkubernetes-discovery and uses all of them to collect the metrics. The endpoint kube-controller-manager-collectorforkubernetes-discovery is created by the service defined in our configuration.
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    k8s-app: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
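As with the scheduler, you can confirm that the discovery service has endpoints behind it:

kubectl get endpoints kube-controller-manager-collectorforkubernetes-discovery --namespace kube-system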
etcd
On master nodes the collector reads and forwards metrics from the etcd processes. We deploy this configuration in 003-daemonset-master.conf.
[input.prometheus::etcd]

# disable prometheus etcd metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = etcd

# how often to collect prometheus metrics
interval = 30s

# prometheus endpoint
endpoint.http = http://:2379/metrics
endpoint.https = https://:2379/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath = /rootfs/etc/kubernetes/pki/etcd/ca.pem

# client certificate for authentication
clientCertPath = /rootfs/etc/kubernetes/pki/etcd/client.pem
clientKeyPath = /rootfs/etc/kubernetes/pki/etcd/client-key.pem

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
This configuration works when you run the etcd cluster on the master nodes. With this configuration the collector first tries to collect metrics using the http scheme, and then https. For https the collector uses certPath, clientCertPath and clientKeyPath, which are mounted from the host.
...
volumeMounts:
...
- name: k8s-certs
  mountPath: /rootfs/etc/kubernetes/pki/
  readOnly: true
...
volumes:
- name: k8s-certs
  hostPath:
    path: /etc/kubernetes/pki/
Verify that these certificates are available; if not, make the appropriate changes. Check the certificates used by the Kubernetes API Server; they are defined with 3 command line arguments:

--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key

You can find these arguments by executing ps aux | grep apiserver on one of the master nodes, or look at the API Server definition under /etc/kubernetes/manifests/kube-apiserver.yaml.
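If your cluster uses the .crt and .key file names shown in these arguments instead of the .pem names from the default stanza, a possible adjustment of the etcd input (assuming the same /rootfs host mount shown above) is:

certPath = /rootfs/etc/kubernetes/pki/etcd/ca.crt
clientCertPath = /rootfs/etc/kubernetes/pki/apiserver-etcd-client.crt
clientKeyPath = /rootfs/etc/kubernetes/pki/apiserver-etcd-client.key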
If your etcd cluster runs on a dedicated set of nodes, you can define the Prometheus collection in 004-addon.conf.
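A sketch of such a stanza in 004-addon.conf follows. The host names etcd-1.example.com and etcd-2.example.com, the empty certificate paths and the endpoint.<name> key names are assumptions for illustration, following the syntax of the stanzas above:

[input.prometheus::etcd]

# override type
type = kubernetes_prometheus

# override source
source = etcd

# how often to collect prometheus metrics
interval = 30s

# explicit endpoints of the dedicated etcd cluster (placeholders)
endpoint.etcd1 = https://etcd-1.example.com:2379/metrics
endpoint.etcd2 = https://etcd-2.example.com:2379/metrics

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =
clientKeyPath =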
CoreDNS
If you are using CoreDNS in Kubernetes, you can collect the metrics it exports in Prometheus format, and we provide a Dashboard and Alerts for monitoring CoreDNS.
To start collecting metrics from CoreDNS, annotate the CoreDNS deployment to let Collectord know that you want to collect these metrics:
kubectl annotate deployment/coredns --namespace kube-system 'collectord.io/prometheus.1-path=/metrics' 'collectord.io/prometheus.1-port=9153' 'collectord.io/prometheus.1-source=coredns' --overwrite
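Once the annotation is applied, you can verify in Splunk that CoreDNS metrics are arriving, for example with a search like the following (assuming the default prometheus sourcetype described below):

sourcetype="prometheus" source="coredns" | stats count by metric_name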
Metrics format
Prometheus defines several types of metrics.
Each metric value in Splunk has fields:
- metric_type - one of the types from the Prometheus metric types.
- metric_name - the name of the metric.
- metric_help - only if includeHelp is set to true; you will see the definition of this metric.
- metric_label_XXX - if the metric has labels, you will see them attached to the metric values.
- seed - unique value from the host for a specific metric collection.
Based on the metric type, you can find various values for the metrics.

- counter
  - v - current counter value
  - d - the difference with a previous value
  - s - period for which this difference is calculated (in seconds)
  - p - (deprecated) period for which this difference is calculated (in nanoseconds)
- summary and histogram
  - v - value
  - c - counter specified for this summary or histogram metric
- All others
  - v - value
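For example, to chart the per-second rate of a counter metric you can divide the d field by the s field. The metric name apiserver_request_count below is only an illustration; use any counter available in your environment:

sourcetype="prometheus" metric_name="apiserver_request_count" | eval rate=d/s | timechart avg(rate) by host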
If you have specified to include help with the metrics, you can explore all available metrics with the following search:
sourcetype="prometheus" | stats latest(_raw) by source, metric_type, metric_name, metric_help
Links
- Installation
  - Start monitoring your Kubernetes environments in under 10 minutes.
  - Automatically forward host, container and application logs.
  - Test our solution with the embedded 30 days evaluation license.
- Collector Configuration
  - Collector configuration reference.
- Annotations
  - Changing index, source, sourcetype for namespaces, workloads and pods.
  - Forwarding application logs.
  - Multi-line container logs.
  - Fields extraction for application and container logs (including timestamp extractions).
  - Hiding sensitive data, stripping terminal escape codes and colors.
  - Forwarding Prometheus metrics from Pods.
- Audit Logs
  - Configure audit logs.
  - Forwarding audit logs.
- Prometheus metrics
  - Collect metrics from the control plane (etcd cluster, API server, kubelet, scheduler, controller).
  - Configure the collector to forward metrics from the services in Prometheus format.
- Configuring Splunk Indexes
  - Using a non-default HTTP Event Collector index.
  - Configure the Splunk application to use indexes that are not searchable by default.
- Splunk fields extraction for container logs
  - Configure search-time fields extractions for container logs.
  - Container logs source pattern.
- Configurations for Splunk HTTP Event Collector
  - Configure multiple HTTP Event Collector endpoints for Load Balancing and Fail-overs.
  - Secure HTTP Event Collector endpoint.
  - Configure the Proxy for HTTP Event Collector endpoint.
- Monitoring multiple clusters
  - Learn how you can monitor multiple clusters.
  - Learn how to set up ACL in Splunk.
- Streaming Kubernetes Objects from the API Server
  - Learn how you can stream all changes from the Kubernetes API Server.
  - Stream changes and objects from the Kubernetes API Server, including Pods, Deployments or ConfigMaps.
- License Server
  - Learn how you can configure a remote License URL for Collectord.
- Monitoring GPU
- Alerts
- Troubleshooting
- Release History
- Upgrade instructions
- Security
- FAQ and the common questions
- License agreement
- Pricing
- Contact