Alerts
Predefined alerts
Available since version 5.2
The Monitoring OpenShift application includes predefined alerts that help you monitor the health of your clusters and the performance of your containers.
Monitoring OpenShift: Collector License Expiration (less than 14 days)
One or more collectors use a license that expires in less than 14 days.
Monitoring OpenShift: Collector Failed License Checks
One or more collectors are constantly failing to check the license.
Monitoring OpenShift: Collector outdated
One or more collectors are outdated.
Monitoring OpenShift: Collector license overuse
You are exceeding the number of running collectors allowed by your license. Contact sales@outcoldsolutions.com.
Monitoring OpenShift: Cluster Critical: Kubernetes API is down
The collector has not published metrics for one of the Kubernetes API Servers. A Kubernetes API Server may be missing.
Monitoring OpenShift: Cluster Critical: Controller Manager is down
The collector has not published metrics for one of the Controller Managers. A Controller Manager may be missing on the Master nodes.
Monitoring OpenShift: Cluster Critical: Kubelet is down
The collector has not published metrics for one of the Kubelets. A Node may be missing from the cluster.
Monitoring OpenShift: Cluster Critical: etcd member is down
The collector has not published metrics for one of the etcd members. An etcd member may be missing from the cluster.
Monitoring OpenShift: Events: Constant Warning
The cluster reports the same warning more than 3 times.
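The condition can be illustrated outside of Splunk. The following is a minimal Python sketch, not the search that the application ships. The event keys (`type`, `object`, `reason`) and the function name `repeated_warnings` are assumptions made for this example.

```python
from collections import Counter


def repeated_warnings(events, threshold=3):
    """Return the warnings that were reported more than `threshold` times.

    `events` is an iterable of dicts with the hypothetical keys
    'type', 'object' and 'reason'.
    """
    counts = Counter(
        (event["object"], event["reason"])
        for event in events
        if event.get("type") == "Warning"
    )
    # Keep only the warnings that exceed the threshold.
    return {key: count for key, count in counts.items() if count > threshold}
```

A warning is identified here by the pair of object and reason. A real search may group events differently.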
Monitoring OpenShift: Cluster Info: mismatched versions
Mismatched build versions for the server components.
Monitoring OpenShift: Cluster Info: mismatched kubelet versions
Mismatched build versions for the kubelets.
Monitoring OpenShift: Cluster Warning: high number of errors to Kubernetes API
Kubelet experiences a high rate of errors (more than 1%) in requests to the API Server.
Monitoring OpenShift: Cluster Warning: pods capacity on node
The node has too many pods: above 90% of its capacity.
Monitoring OpenShift: Cluster Warning: Kubernetes API Latency
The API Server has a 99th percentile latency above 1 second
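To make the threshold concrete, here is a minimal Python sketch of a 99th percentile check over a list of request latencies. It uses the nearest-rank method, which is an assumption: the application computes the percentile in Splunk and may use a different estimator. The function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def api_latency_alert(latencies_seconds, limit_seconds=1.0):
    """True when the 99th percentile latency is above the limit."""
    return percentile(latencies_seconds, 99) > limit_seconds
```

With 100 samples, a single slow request does not trip the check, but two slow requests do.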
Monitoring OpenShift: Cluster Critical: Kubernetes API High Number of 5xx
More than 5% of the API Server responses are errors (5xx).
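As an illustration of the ratio behind this alert, here is a minimal Python sketch. It assumes the response counts are already grouped by HTTP status code; the function names and the input shape are hypothetical and do not come from the application.

```python
def error_ratio_5xx(status_counts):
    """Share of responses with a 5xx status code.

    `status_counts` maps an integer HTTP status code to the number of responses.
    """
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in status_counts.items() if 500 <= code <= 599)
    return errors / total


def api_5xx_alert(status_counts, limit=0.05):
    """True when more than `limit` of all responses are 5xx errors."""
    return error_ratio_5xx(status_counts) > limit
```

Client errors (4xx) count toward the total but not toward the error share.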
Monitoring OpenShift: Cluster Warning: Kubernetes API certificate expires
Kubernetes API certificate expires in less than 7 days.
Monitoring OpenShift: Cluster Critical: etcd does not have a leader
etcd cluster does not have a leader.
Monitoring OpenShift: Cluster Warning: etcd frequent leader change
etcd changed its leader more than 3 times in the last hour.
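The time window in this condition can be sketched in Python as follows. This is only an illustration under the assumption that the timestamps of leader changes are available as a list; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def frequent_leader_change(change_times, now, window=timedelta(hours=1), limit=3):
    """True when more than `limit` leader changes fall inside the window
    that ends at `now`."""
    recent = [t for t in change_times if now - window <= t <= now]
    return len(recent) > limit
```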
Monitoring OpenShift: Cluster Warning: high amount of GRPC errors
High amount of GRPC errors in etcd cluster
Monitoring OpenShift: Cluster Warning: etcd member communication is slow
etcd instance member communication is slow
Monitoring OpenShift: Cluster Warning: etcd hight number of failed proposals
etcd has a high number of failed proposals.
Monitoring OpenShift: Cluster Warning: etcd member fsync is slow
etcd member fsync is slow
Monitoring OpenShift: Cluster Warning: etcd member commit durations are slow
etcd member commit durations are slow
Monitoring OpenShift: Cluster Warning: etcd member fd usage is high
etcd member uses more than 80% of max fds
Monitoring OpenShift: Cluster Warning: unhealthy nodes
The controller reports one or more unhealthy nodes.
Monitoring OpenShift: Cluster Warning: kubelet runtime disk space is low
Node has less than 20% of available space for kubelet runtime
Monitoring OpenShift: Cluster Warning: Persistent Volume Claim space is low
Persistent Volume Claim has less than 20% of available space
Monitoring OpenShift: Cluster Warning: high host memory usage
Host memory usage is above 85%.
Monitoring OpenShift: Cluster Warning: high host CPU usage
OpenShift host uses more than 90% of CPU on average for the last 5 minutes
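The averaging in this condition can be illustrated with a minimal Python sketch. It assumes CPU samples arrive as pairs of Unix timestamp and usage in percent; this input shape and the function name are hypothetical and not taken from the application.

```python
def high_cpu(samples, now, window_seconds=300, limit_percent=90.0):
    """True when the average CPU usage over the last `window_seconds`
    is above `limit_percent`.

    `samples` is an iterable of (unix_timestamp, cpu_percent) pairs.
    """
    recent = [cpu for ts, cpu in samples if now - window_seconds <= ts <= now]
    if not recent:
        # No data in the window, so there is nothing to alert on.
        return False
    return sum(recent) / len(recent) > limit_percent
```

Averaging over the window means a short spike above 90% does not trigger the check by itself.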
Monitoring OpenShift: Cluster Warning: high container memory usage
Container uses more than 85% of memory limit
Monitoring OpenShift: Cluster Warning: container cpu is throttled
The container is being throttled for more than 20% of CPU.
Monitoring OpenShift: Warning: collectord reports errors in one or more pipelines
Collectord reports errors in one or more pipelines
Monitoring OpenShift: Warning: collectord has WARN or ERROR logs
Collectord reports warnings or errors
Monitoring OpenShift: Warning: Increasing lag between event time and indexing time in container logs
Increasing lag between event time and indexing time in container logs
Monitoring OpenShift: Warning: Node reservation of memory is above 90 percent
Node reservation of memory is above 90 percent
Monitoring OpenShift: Warning: Node reservation of cpu is above 90 percent
Node reservation of cpu is above 90 percent
Monitoring OpenShift: Collectord diagnostics
Monitors Collectord logs and triggers when one or more ALARMs are ON. These alarms are triggered by diagnostics:: enabled in the configuration.
Alert triggers
By default, triggered alerts are shown at the very top of the Overview page. We populate this table using the REST call /alerts/fired_alerts/.
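If you want to read the same data outside of the application, the following Python sketch shows one possible approach. It is an illustration, not how the application queries the data. It assumes the Splunk management API is reachable at a base URL such as https://splunk.example.com:8089 (hypothetical), that the endpoint is requested with output_mode=json, and that each entry carries a `triggered_alert_count` field in its `content`. Verify these against your Splunk version before relying on them.

```python
import json
from urllib.parse import urlencode


def fired_alerts_url(base_url):
    """Build the URL of the fired alerts endpoint with JSON output.

    `base_url` is the Splunk management URL (hypothetical value in the lead-in).
    """
    query = urlencode({"output_mode": "json"})
    return base_url.rstrip("/") + "/services/alerts/fired_alerts?" + query


def summarize_fired_alerts(payload):
    """Map each alert name to its trigger count.

    `payload` is the JSON text returned by the endpoint; the response
    shape used here is an assumption.
    """
    data = json.loads(payload)
    return {
        entry["name"]: int(entry.get("content", {}).get("triggered_alert_count", 0))
        for entry in data.get("entry", [])
    }
```

The request itself needs authentication and TLS settings that depend on your environment, so the sketch only builds the URL and parses the response.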
Other triggers
You can find various alert actions on Splunk Base that integrate Splunk with messaging applications and incident management services.
After installing a new alert action, you can modify the existing alerts to add more triggers.
Links
- Installation
  - Start monitoring your OpenShift environments in under 10 minutes.
  - Automatically forward host, container and application logs.
  - Test our solution with the embedded 30-day evaluation license.
- Collector Configuration
  - Collector configuration reference.
- Annotations
  - Changing index, source, sourcetype for namespaces, workloads and pods.
  - Forwarding application logs.
  - Multi-line container logs.
  - Fields extraction for application and container logs (including timestamp extractions).
  - Hiding sensitive data, stripping terminal escape codes and colors.
  - Forwarding Prometheus metrics from Pods.
- Audit Logs
  - Configure audit logs.
  - Forwarding audit logs.
- Prometheus metrics
  - Collect metrics from control plane (etcd cluster, API server, kubelet, scheduler, controller).
  - Configure collector to forward metrics from the services in Prometheus format.
- Configuring Splunk Indexes
  - Using a non-default HTTP Event Collector index.
  - Configure the Splunk application to use indexes that are not searchable by default.
- Splunk fields extraction for container logs
  - Configure search-time fields extractions for container logs.
  - Container logs source pattern.
- Configurations for Splunk HTTP Event Collector
  - Configure multiple HTTP Event Collector endpoints for Load Balancing and Fail-overs.
  - Secure HTTP Event Collector endpoint.
  - Configure the Proxy for HTTP Event Collector endpoint.
- Monitoring multiple clusters
  - Learn how you can monitor multiple clusters.
  - Learn how to set up ACL in Splunk.
- Streaming OpenShift Objects from the API Server
  - Learn how you can stream all changes from the OpenShift API Server.
  - Stream changes and objects from OpenShift API Server, including Pods, Deployments or ConfigMaps.
- License Server
  - Learn how you can configure remote License URL for Collectord.
- Monitoring GPU
- Alerts
- Troubleshooting
- Release History
- Upgrade instructions
- Security
- FAQ and the common questions
- License agreement
- Pricing
- Contact