Alerts
Predefined alerts
Available since version 5.2
The Monitoring Kubernetes application includes predefined alerts that help you monitor the health of your clusters and the performance of your containers.
Monitoring Kubernetes: Collector License Expiration (less than 14 days)
One or more collectors use a license that expires in less than 14 days.
Monitoring Kubernetes: Collector Failed License Checks
One or more collectors are constantly failing to check the license.
Monitoring Kubernetes: Collector outdated
One or more collectors are outdated.
Monitoring Kubernetes: Collector license overuse
You are exceeding the number of running collectors allowed by your license. Contact sales@outcoldsolutions.com.
Monitoring Kubernetes: Cluster Critical: Kubernetes API is down
The collector has not published metrics for one of the Kubernetes API Servers. A Kubernetes API Server may be missing.
Monitoring Kubernetes: Cluster Critical: Controller Manager is down
The collector has not published metrics for one of the Controller Managers. A Controller Manager may be missing on the Master nodes.
Monitoring Kubernetes: Cluster Critical: Kubelet is down
The collector has not published metrics for one of the kubelets. A node may be missing from the cluster.
Monitoring Kubernetes: Cluster Critical: etcd member is down
The collector has not published metrics for one of the etcd members. An etcd member may be missing from the cluster.
Monitoring Kubernetes: Events: Constant Warning
Cluster reports the same warnings more than 3 times
Monitoring Kubernetes: Cluster Info: mismatched versions
Mismatched build versions for the server components.
Monitoring Kubernetes: Cluster Info: mismatched kubelet versions
Mismatched build versions for the kubelets.
Monitoring Kubernetes: Cluster Warning: high number of errors to Kubernetes API
Kubelet experiences a high number of errors (more than 1%) on requests to the API Server.
Monitoring Kubernetes: Cluster Warning: pods capacity on node
Node has too many pods (above 90% of capacity).
Monitoring Kubernetes: Cluster Warning: Kubernetes API Latency
The API Server has a 99th percentile latency above 1 second
Monitoring Kubernetes: Cluster Critical: Kubernetes API High Number of 5xx
The API Server returned more than 5% of errors (5xx)
Monitoring Kubernetes: Cluster Warning: Kubernetes API certificate expires
Kubernetes API certificate expires in less than 7 days.
Monitoring Kubernetes: Cluster Critical: etcd does not have a leader
etcd cluster does not have a leader.
Monitoring Kubernetes: Cluster Warning: etcd frequent leader change
etcd changed leader more than 3 times in the last hour.
Monitoring Kubernetes: Cluster Warning: high amount of GRPC errors
High number of gRPC errors in the etcd cluster.
Monitoring Kubernetes: Cluster Warning: etcd member communication is slow
etcd instance member communication is slow
Monitoring Kubernetes: Cluster Warning: etcd hight number of failed proposals
etcd has a high number of failed proposals.
Monitoring Kubernetes: Cluster Warning: etcd member fsync is slow
etcd member fsync is slow
Monitoring Kubernetes: Cluster Warning: etcd member commit durations are slow
etcd member commit durations are slow
Monitoring Kubernetes: Cluster Warning: etcd member fd usage is high
etcd member uses more than 80% of max fds
Monitoring Kubernetes: Cluster Warning: unhealthy nodes
Controller reports one or more unhealthy nodes.
Monitoring Kubernetes: Cluster Warning: kubelet runtime disk space is low
Node has less than 20% of available space for kubelet runtime
Monitoring Kubernetes: Cluster Warning: Persistent Volume Claim space is low
Persistent Volume Claim has less than 20% of available space
Monitoring Kubernetes: Cluster Warning: high host memory usage
Host memory usage is high (above 85%).
Monitoring Kubernetes: Cluster Warning: high host CPU usage
Kubernetes host uses more than 90% of CPU on average for the last 5 minutes
Monitoring Kubernetes: Cluster Warning: high container memory usage
Container uses more than 85% of memory limit
Monitoring Kubernetes: Cluster Warning: container cpu is throttled
Container is being throttled for more than 20% of CPU.
Monitoring Kubernetes: Warning: collectord reports errors in one or more pipelines
Collectord reports errors in one or more pipelines
Monitoring Kubernetes: Warning: collectord has WARN or ERROR logs
Collectord reports warnings or errors
Monitoring Kubernetes: Warning: Increasing lag between event time and indexing time in container logs
Increasing lag between event time and indexing time in container logs
Monitoring Kubernetes: Warning: Node reservation of memory is above 90 percent
Node reservation of memory is above 90 percent
Monitoring Kubernetes: Warning: Node reservation of cpu is above 90 percent
Node reservation of CPU is above 90 percent.
Monitoring Kubernetes: Collectord diagnostics
Monitors Collectord logs and triggers when one or more ALARMs are ON. ALARMs are triggered by the diagnostics:: enabled in the configuration.
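Several of the alerts above compare a usage percentage with a fixed limit. The following is a minimal sketch of that comparison, not the searches the application actually runs. The limits are copied from the descriptions above; the dictionary keys and the function names `percent` and `breaches` are hypothetical.

```python
# Limits copied from the alert descriptions; the keys are hypothetical names.
THRESHOLDS = {
    "pods_capacity_pct": ("above", 90.0),   # pods capacity on node
    "host_memory_pct": ("above", 85.0),     # high host memory usage
    "etcd_fd_usage_pct": ("above", 80.0),   # etcd member fd usage is high
    "pvc_available_pct": ("below", 20.0),   # Persistent Volume Claim space is low
}


def percent(used: float, total: float) -> float:
    """Return used/total as a percentage; 0.0 when total is not positive."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def breaches(metric: str, value_pct: float) -> bool:
    """Return True when the value crosses the limit in the configured direction."""
    direction, limit = THRESHOLDS[metric]
    if direction == "above":
        return value_pct > limit
    return value_pct < limit


# Example: a node that allows 110 pods and currently runs 100 of them.
print(breaches("pods_capacity_pct", percent(100, 110)))  # True (about 90.9%)
```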
Alert triggers
By default, triggered alerts are shown at the very top of the Overview page. We populate this table using the REST call /alerts/fired_alerts/.
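If you want to read the same data outside of the Overview page, you can call the endpoint yourself. The sketch below assumes the standard Splunk management API: the `/services` prefix, `output_mode=json`, bearer-token authentication, and the `triggered_alert_count` field are assumptions about your Splunk deployment, and the function names are hypothetical.

```python
import json
import urllib.request


def fired_alerts_url(base_url: str) -> str:
    """Build the fired alerts URL on the management port, requesting JSON."""
    return base_url.rstrip("/") + "/services/alerts/fired_alerts?output_mode=json"


def summarize_fired_alerts(payload: dict) -> dict:
    """Map alert name to triggered count from a decoded JSON response."""
    summary = {}
    for entry in payload.get("entry", []):
        count = entry.get("content", {}).get("triggered_alert_count", 0)
        summary[entry.get("name", "")] = int(count)
    return summary


def fetch_fired_alerts(base_url: str, token: str) -> dict:
    """Call the endpoint with a bearer token (assumed auth scheme) and summarize."""
    request = urllib.request.Request(
        fired_alerts_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return summarize_fired_alerts(json.load(response))


# Offline example with a hand-written payload in the assumed response shape.
sample = {
    "entry": [
        {
            "name": "Monitoring Kubernetes: Collector outdated",
            "content": {"triggered_alert_count": 2},
        }
    ]
}
print(summarize_fired_alerts(sample))
```

`fetch_fired_alerts` is the only function that touches the network; the other two can be tried without a Splunk instance.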
Other triggers
You can find various alert actions on Splunkbase that integrate Splunk with messaging applications and incident management services.
After installing a new alert action, you can modify the existing alerts to add more triggers.
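Alerts can be edited in the Splunk UI, but the same change can also be scripted. The sketch below only builds the request for Splunk's saved searches REST endpoint and does not send it. The app name `example_app`, the host, and the function name are hypothetical, and the built-in `email` action stands in for whichever alert action you installed. Because the `actions` field is a comma-separated list, the sketch keeps the actions already enabled on the alert.

```python
from urllib.parse import quote, urlencode


def build_add_action_request(base_url, app, alert_name, existing_actions,
                             new_action, action_params=None):
    """Return (url, form_body) for a POST that enables one more alert action."""
    # Preserve the actions already enabled and append the new one once.
    actions = list(existing_actions)
    if new_action not in actions:
        actions.append(new_action)

    # The alert name contains spaces and colons, so it must be URL-encoded.
    url = "{}/servicesNS/nobody/{}/saved/searches/{}".format(
        base_url.rstrip("/"), quote(app, safe=""), quote(alert_name, safe="")
    )

    form = {"actions": ",".join(actions)}
    form.update(action_params or {})
    return url, urlencode(form).encode("utf-8")


url, body = build_add_action_request(
    "https://splunk.example.com:8089",             # hypothetical host
    "example_app",                                 # hypothetical app name
    "Monitoring Kubernetes: Collector outdated",
    existing_actions=[],
    new_action="email",
    action_params={"action.email.to": "oncall@example.com"},
)
print(url)
print(body.decode("utf-8"))
```

Returning the URL and body instead of sending them lets you review the change before it is applied to a real alert.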
Links
- Installation
  - Start monitoring your Kubernetes environments in under 10 minutes.
  - Automatically forward host, container and application logs.
  - Test our solution with the embedded 30-day evaluation license.
- Collector Configuration
  - Collector configuration reference.
- Annotations
  - Changing index, source, sourcetype for namespaces, workloads and pods.
  - Forwarding application logs.
  - Multi-line container logs.
  - Fields extraction for application and container logs (including timestamp extractions).
  - Hiding sensitive data, stripping terminal escape codes and colors.
  - Forwarding Prometheus metrics from Pods.
- Audit Logs
  - Configure audit logs.
  - Forwarding audit logs.
- Prometheus metrics
  - Collect metrics from control plane (etcd cluster, API server, kubelet, scheduler, controller).
  - Configure collector to forward metrics from the services in Prometheus format.
- Configuring Splunk Indexes
  - Using a non-default HTTP Event Collector index.
  - Configure the Splunk application to use indexes that are not searchable by default.
- Splunk fields extraction for container logs
  - Configure search-time fields extractions for container logs.
  - Container logs source pattern.
- Configurations for Splunk HTTP Event Collector
  - Configure multiple HTTP Event Collector endpoints for Load Balancing and Fail-overs.
  - Secure HTTP Event Collector endpoint.
  - Configure the Proxy for HTTP Event Collector endpoint.
- Monitoring multiple clusters
  - Learn how you can monitor multiple clusters.
  - Learn how to set up ACL in Splunk.
- Streaming Kubernetes Objects from the API Server
  - Learn how you can stream all changes from the Kubernetes API Server.
  - Stream changes and objects from Kubernetes API Server, including Pods, Deployments or ConfigMaps.
- License Server
  - Learn how you can configure remote License URL for Collectord.
- Monitoring GPU
- Alerts
- Troubleshooting
- Release History
- Upgrade instructions
- Security
- FAQ and the common questions
- License agreement
- Pricing
- Contact