Monitoring OpenShift - Release History
- 5.23.431 - 2024-11-18
- 5.23.430 - 2024-10-28
- 5.22.422 - 2024-06-17
- 5.22.421 - 2024-05-13
- 5.22.420 - 2024-04-22
- 5.21.412 - 2024-01-08
- 5.21.411 - 2023-11-28
- 5.21.410 - 2023-10-16
- 5.20.403 - 2023-07-31
- 5.20.402 - 2023-06-06
- 5.20.401 - 2023-05-22
- 5.20.400 - 2023-04-17
- 5.19.390 - 2022-10-17
- 5.18.381 - 2022-05-17
- 5.18.380 - 2022-04-19
- 5.17.370 - 2021-10-20
- 5.16.363 - 2021-05-26
- 5.16.361 - 2021-03-16
- 5.16.353 - 2021-02-11
- 5.16.351 - 2021-01-04
- 5.16.350 - 2020-12-14
- 5.15.305 - 2021-01-04
- 5.15.302 - 2020-08-12
- 5.15.301 - 2020-06-24
- 5.15.300 - 2020-06-01
- 5.14.285 - 2020-08-12
- 5.14.284 - 2020-03-23
- 5.14.280 - 2020-01-27
- 5.12.273 - 2019-11-18
- 5.12.272 - 2019-11-08
- 5.12.271 - 2019-11-07
- 5.12.270 - 2019-10-22
- 5.11.266 - 2020-10-15
- 5.11.265 - 2020-06-24
- 5.11.264 - 2019-11-08
- 5.11.263 - 2019-10-02
- 5.11.261 - 2019-09-13
- 5.11.260 - 2019-09-09
- 5.10.255 - 2019-11-20
- 5.10.253 - 2019-07-31
- 5.10.252 - 2019-07-24
- 5.10.251 - 2019-06-20
- 5.10.250 - 2019-06-18
- 5.9.244 - 2019-06-05
- 5.9.240 - 2019-05-14
- 5.8.231 - 2019-04-25
- 5.8.230 - 2019-04-22
- 5.7.220 - 2019-03-18
- 5.6.213 - 2019-03-03
- 5.6.212 - 2019-02-19
- 5.5.205 - 2019-01-25
- 5.5.203 - 2019-01-25
- 5.5.202 - 2019-01-24
- 5.4.201 - 2018-12-19
- 5.4 - 2018-12-17
- 5.3 - 2018-11-19
- 5.2 - 2018-10-15
- 5.1 - 2018-09-17
- 5.0 - 2018-09-03
- 4.0.24 - 2018-05-05
- 3.0.23 - 2018-02-17
- 3.0.22 - 2018-02-07
- 2.1.18 - 2017-12-09
- Links
5.23.431 - 2024-11-18
Supports collectorforopenshift version 5.23.x and below
- Update application for Splunk Cloud compatibility
Collectord updates:
- Upgrade SQLite to 3.47.0.
- Upgrade golang to 1.23.3.
5.23.430 - 2024-10-28
Supports collectorforopenshift version 5.23.x and below
- To better support installations with large number of nodes and containers, default behavior for most of the dashboards is to require pressing a Submit button after selecting filters.
- Overview Dashboard - new table with Not Ready Containers.
- Pod Dashboard - include container statuses table.
- Audit Dashboard - include user agent, and update compatibility with latest audit formats.
- Audit Dashboards - small performance improvement for the new installations.
- Host dashboard - show node conditions table.
- Host dashboard - show only external eth* interfaces in network stats.
Collectord updates:
- Implement new and improved watch mechanism for Kubernetes resources to handle large clusters.
- Change the default pipe join configuration to have max size of 1MB instead of 100KB.
- Allow to define outputs for prometheus metrics defined with annotations.
- When the HTTP server is enabled for Collectord, it writes every call to stdout; this is now configurable.
- Bug fix: Collectord did not respect proxyBasicAuth for the splunk output.
- Bug fix: Collectord verify command can incorrectly report the status of containerd runtime.
- Upgrade SQLite to 3.46.1.
- Upgrade golang to 1.23.2.
5.22.422 - 2024-06-17
- Bug fix: Fix issue with calculating values on Cluster Resource Quota and Resource Quota dashboards.
Collectord updates:
- Upgrade SQLite to 3.46.0.
- Upgrade golang to 1.22.4.
5.22.421 - 2024-05-13
Collectord updates:
- Allow spawning journald log reader in a separate process, to prevent corrupted logs from crashing the main process.
- Upgrade golang to 1.22.3.
5.22.420 - 2024-04-22
Supports collectorforopenshift version 5.22.x and below
- Workload dashboard - add Pod OwnerKind and OwnerName, PriorityClass, and Pod Requests/Limits
- Address too many data points in host and workload dashboard in network graphs
- Additional CPU Metrics: CPU IOWait, Steal and Idle in Top Hosts dashboards.
- Showing CPU IOWait in Host dashboard.
- Alert Container CPU Throttled - exclude container with low CPU usage.
- New dashboard Review->Disk Stats for the host.
- Exclude virtual ethernet interfaces from host dashboard.
- Support memory limits and requests expressed in milli-bytes.
Collectord updates:
- Allow disabling IP address Lookup in net_socket_table input.
- Better handling of zombie processes in proc_stats input.
- Allow configuring user Splunk outputs using CRD SplunkOutput.
- Allow blacklisting labels from forwarded metadata.
- When onVolumeDatabase is used Collectord verifies that volume supports locking.
- Add additional metrics CPU IOWait, Steal and Idle.
- Monitoring disk stats for the host.
- Add input disk_stats.
- New diagnostic - CPU Vulnerabilities.
- Improve check for the Kubernetes API endpoint in verify command.
- Deprecate diagnostic for entropy.
- Upgrade default API Version to 1.24 for Docker endpoints.
- License Client - allow configuring the proxy.
- Bug fix: ignore containers with completed status.
- Bug fix: don't include (init) containers with completed status in the Pod requests and limits.
- Bug fix: if container does not generate a lot of logs, some messages can get stuck in the queue while waiting for more messages.
- Bug fix: Collectord describe command can crash if user fields are defined with annotations on the pod.
- Upgrade golang to 1.22.2.
- Upgrade sqlite3 to 3.45.3.
5.21.412 - 2024-01-08
Collectord updates:
- Add libdl.so.2 library to the scratch image for compatibility with Aqua Security
- Upgrade SQLite to 3.44.2
- Upgrade Go language runtime to 1.21.5
- Upgrade RedHat image to RHEL9
5.21.411 - 2023-11-28
Collectord updates:
- Bug fix: Collectord might send events without timestamps
- Upgrade Go language runtime to 1.21.4
5.21.410 - 2023-10-16
Supports collectorforopenshift version 5.21.x and below (see https://www.outcoldsolutions.com for latest configuration)
- Compatibility updates for the version 5.21 of Collectord
- New Dashboard: Review -> CPU (Throttled, Limits, Requests)
- Alert update: High number of GRPC errors
- Alert update: Container CPU Throttled
- Network tables update: show UDP connections for Host, Workloads, Containers, and Pods
- Network Connection Dashboard: allows filtering by namespaces
- Show maximum and average number of Pods per cluster in Clusters (Allocations and usage) dashboard
- Update Cluster Resource Quota and Resource Quota dashboards to support comparing milli-cores and cores
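The milli-cores and cores comparison in the last entry relies on the Kubernetes quantity notation, where a trailing "m" means one thousandth (so 500m CPU equals 0.5 cores). A minimal Python sketch of that normalization follows. The function name cpu_to_cores is hypothetical, and this illustrates the unit conversion only, not the dashboard's actual search logic.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        # "m" is the milli suffix: 1000m == 1 core.
        return int(q[:-1]) / 1000.0
    return float(q)


# Values in mixed units become directly comparable once normalized.
used = cpu_to_cores("1500m")
hard = cpu_to_cores("2")
print(f"quota usage: {used / hard:.0%}")
```

Normalizing both sides to cores first means a quota defined as "2" and usage reported as "1500m" can be divided without special cases.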
Collectord updates:
- Support for global replace configurations for Collectord, allowing to sanitize data before forwarding to Splunk
- Support journald as logging driver for container logs
- When both volatile and persistent journald destination exist, Collectord will identify which has the most recent data
- Support for configuring modify values for specific namespaces when streaming objects
- Support for arrays in modify values for the streaming objects from Kubernetes/OpenShift API server
- Allow sending to Splunk more precise timestamps for the events
- Collectord can automatically refresh tokens when they are expired for API Server
- Compatibility updates for the latest versions of OpenShift
- Upgrade Go language runtime to 1.21.3
- Upgrade sqlite3 library to 3.43.1
- Upgrade libc and common base libraries to debian:bookworm
5.20.403 - 2023-07-31
Collectord updates:
- Improvements for working with NFS shares and closed file handlers.
- Improvements for streaming Pods from Kubernetes API server.
- Collectord reports when the Splunk HEC Collector does not reply with the correct response with 200 status code.
- Upgrade go runtime to version 1.20.6.
- Bug fix: Collectord might report invalid memory usage for the stopped containers.
- Bug fix: If Collectord fails to initialize the on-volume database, it might crash the whole Collectord instance.
5.20.402 - 2023-06-06
Collectord updates:
- Bug fix: onvolumedatabase annotation does not work when ignoreCSIMountFolderForDiscovery is enabled
- Bug fix: Splunk output might send event_id field when includeEventID is not enabled
- Allow configuring timeout-seconds for collecting diag
5.20.401 - 2023-05-22
Collectord updates:
- Upgrade go runtime to version 1.20.4
- Allow users to configure how many events Collectord can have in the output pipeline to lower memory footprint
- Include iNode and DevID in the info.txt in diag
- Bug fix: Collectord cannot collect performance metrics in diag
- Bug fix: Collectord can start forwarding logs from the older file position than in the acknowledgment database
5.20.400 - 2023-04-17
Supports collectorforopenshift version 5.20.x and below
- New dashboard: Review - Cluster Resource Quotas
- Show Pod conditions on the Pod dashboard
- Bug fix: Pods dashboard filters out pods not on the host network.
- Compatibility updates for the version 5.20 of Collectord
Collectord updates:
- Multi-architecture images for amd64 and arm64
- Allow sending logs to multiple Splunk HEC endpoints simultaneously
- New annotation collectord.io/volume.{N}-logs-onvolumedatabase to keep acknowledgement information about forwarded logs on the volume
- Allow including placeholder templates in the annotation collectord.io/volume.{N}-logs-glob
- Support for new outputs (ElasticSearch and OpenSearch)
- Collectord produces diag file without performance data, if flag --include-performance-profiles is not set
- Use IMDSv2 for AWS metadata
- Performance improvements for an acknowledgement database
- Improvements for the acknowledgement database on how long Collectord keeps the data by refreshing the state, if file still exists on the disk
- Upgrade Go language runtime to 1.20.3
- Collectord verifies that only one Collectord instance can access the data folder, where Collectord stores its state
- Remove automatic watching for Docker runtime on Kubernetes/OpenShift hosts
- Add a verify step for Containerd runtime for the verify command
- Add ability to send events with event_id, unique identifier for the messages generated from logs
- Bug fix: Collectord might assign processes running outside of the containers on the host to the Collectord container
- Bug fix: CPU-based license tries to connect to the license server, when running verify command
- Bug fix: Collectord might not set a source to the log files for non-default splunk output
5.19.390 - 2022-10-17
Supports collectorforopenshift version 5.19.x and below
- Create scheduler dashboard (and move those metrics from controller dashboard)
- Update dashboards for latest changes in the metric names for API Server, Controller and Scheduler
- Update Kubelet dashboard to support various container runtimes
- Audit (users and projects) dashboard: show access to non-projects resources
- Logs dashboard: show container and pod as separate filters
- New alert for Collectord alarms for node diagnostics (reboot required, and entropy)
- Bug fix: hosts dashboard does not filter events by host
Collectord updates:
- Splunk output supports maximumMessageLength to truncate messages exceeding this size
- Splunk output supports requireExplicitIndex to ignore all events that don't have explicit index defined
- Collectord monitors if node requires reboot
- Input Kubernetes watch allows now to hash or remove values from JSON before sending them to Splunk
- Collectord now reads its own clusterrole and implements a gate, that does not allow it to invoke requests to API server, that it does not have access to
- Instead of using automatic gate based on clusterrole, admin can define list of objects Collectord should use to load metadata for the Pods
- Improved support for CSI volumes, automatically discover additional sub directory "mount"
- Allow to force override annotations from cluster level configurations
- Upgrade go runtime to 1.19.2
- Beta: weighted splunk output algorithm when multiple threads used
- Bug fix: if docker runtime is not installed, Collectord can clog the output with warnings
- Bug fix: verify command can report an error with journald, when it properly works
- Bug fix: Collectord can clog the output if cgroupv2 is used, and blkio is not enabled
- Bug fix: Collectord can crash if default output.splunk is not configured, now it shows the error
- Bug fix: If output is not defined for Kubernetes Watch input, it should use default output
- Bug fix: if Kubernetes watch connection fails, Collectord can generate a lot of requests to API Server
5.18.381 - 2022-05-17
Collectord updates:
- Update go runtime to 1.17.11
- When Splunk HEC is slow and cannot process the events, Collectord might hold on to the files in a PVC volume, preventing kubelet from stopping the application pod. Collectord now has a configuration for how long it can keep the file descriptors after the pod is terminated.
- Bug fix: When Splunk HEC is unavailable, Collectord can start closing dedicated Splunk outputs for Indexes
- Bug fix: When Splunk HEC returns code 4xx, unrecognized by the format of Splunk HEC, Collectord might incorrectly skip the event
- Bug fix: Collectord builds incorrect path for the Kubernetes API service, when watching for some objects, like gateway
- Bug fix: Verify command does not respect cgroup v2
5.18.380 - 2022-04-19
Supports collectorforopenshift version 5.18.x and below
- Cluster filter on Events dashboard
- Rewrite CPU throttled alert to make it less verbose
- Memory usage now reports memory without caches and memory that can be freed.
- Support cgroupv2
Collectord updates:
- Support cgroupv2
- New ability to specify the message field name for the logs extraction with annotations extractionMessageField
- Collectord improves grace period for expired licenses allowing to bootstrap new nodes for 14 days
- Support of journald database written with systemd library 247+
- Upgrade go runtime to 1.17.9
- Add arm64 scratch image
- Bug fix: cleanup the diag, exclude the real license key
- Bug fix: collectord reports high CPU usage for just started containers or hosts
- Bug fix: update pods/container labels when user updates them (prior restart was required)
- Bug fix: set now as a date for container logs with corrupted log files instead of 0 timestamp
- Bug fix: include the values of whitelists and blacklists in diag
- Bug fix: verify command might incorrectly show that it cannot find container logs with CRIO runtime
5.17.370 - 2021-10-20
Supports collectorforopenshift version 5.17.x and below
- Show millicores/cores CPU usage instead of percents
- New dashboard: Review - Resource Quotas
- Review - Projects: filter by project name
- Review - Clusters: filter by node label
- Review - Clusters: include max and avg usage
- Bug fix: storage dashboard might not render in some Splunk versions
- Bug fix: Projects dashboard shows only one namespace label
Collectord updates:
- Upgrade to Go 1.17.2
- Support query in Prometheus URLs for metrics
- Collectord now reports source and source type for the events with incorrect index
- Support for licensing server
- Support for CPU-based licenses
- Allow to specify multiple values for blacklist and whitelist for host logs
- Bug fix: Collectord clogs the output with WARN messages for stopped containers running with Containerd
- Bug fix: Containers with not set requests might show 1core request by default
- Bug fix: Collectord clogs the output with WARN messages about closed Splunk outputs
- Bug fix: parse commas in the timestamps for logs
5.16.363 - 2021-05-26
- Bug fix: Fix misprint "searcg" in Alert Monitoring OpenShift: Cluster Warning: high host CPU usage
- Bug fix: Put in parentheses source selection in macro_openshift_prometheus_metrics
Collectord updates:
- Upgrade go runtime to 1.16.3
- Bug fix: fix verbose logging for docker watcher with messages "failed to get next event"
- Bug fix: NetworkPolicy cannot be watched, as Collectord does not convert it in plural form properly
- Bug fix: Verify command fails on Containerd runtime
- Bug fix: DefaultIdleConnTimeout is ignored for HTTP clients
5.16.361 - 2021-03-16
Supports collectorforopenshift version 5.16.x and below
- Overview dashboard filters respect filters (show only namespaces from selected cluster)
- Bug fix: use correct units for Memory and Storage (MiB, MB, Mb)
- Bug fix: compatibility with new format of Events from API server (FirstSeen, LastSeen, Source could be shown as null)
- Bug fix: compatibility fixes for OpenShift 4.x
- Bug fix: Collectord metrics request time shows the summary on the period, not the individual request times
Collectord updates:
- Allow removing managed fields from events (enabled with new configurations by default)
- Upgrade to Go 1.16.2
- Blacklist verbose hyperkube host logs from journald (enabled with new configurations by default)
- Bug fix: precise time to Splunk HEC, sending with milliseconds instead of nanoseconds (which are incorrectly rounded by HEC)
- Bug fix: first sample of the container can record above 100% of the CPU usage, as the values are pretty low
- Bug fix: verify command does not respect glob patterns for Prometheus inputs (certs, tokens)
- Bug fix: trim spaces in token value for Prometheus inputs
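The precise-time fix above concerns the precision of the event timestamp sent to Splunk HEC, which accepts epoch seconds with a fractional part. The sketch below shows one way to reduce a nanosecond timestamp to millisecond precision. It is an illustration only: the function name hec_time_ms is hypothetical, and the release notes do not say whether Collectord truncates or rounds (this sketch truncates).

```python
def hec_time_ms(epoch_ns: int) -> str:
    """Format nanoseconds since the epoch as seconds with 3 decimal places."""
    # Integer arithmetic avoids float rounding; sub-millisecond digits are dropped.
    millis = epoch_ns // 1_000_000
    return f"{millis // 1000}.{millis % 1000:03d}"


print(hec_time_ms(1615900000123456789))
```

Building the string from integers keeps the value exact, whereas converting a nanosecond count to a float can lose the trailing digits.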
5.16.353 - 2021-02-11
Collectord updates:
- Bug fix: collectord can report parse int errors on the stderr
- Upgrade go runtime to 1.15.8
5.16.351 - 2021-01-04
Collectord updates:
- Bug fix: host file inputs can raise a fatal error: concurrent map writes
5.16.350 - 2020-12-14
Supports collectorforopenshift version 5.16.x and below
- New dashboard: Collectord metrics
- Compatibility for Kubernetes 1.20
- Bug fix: broken link in Allocatable Resources dashboard
Collectord updates:
- Annotations for collecting prometheus metrics: authorization keys and CAName for SSL certificates
- Improvement for DNS resolutions of Splunk output FQDN
- Export internal collectord metrics in Prometheus format
- Forwarding internal collectord metrics to Splunk
- For the watch objects inputs being able to hide management fields
- In the diag include all open file descriptors
- Upgrade go runtime to 1.14.13
- Remove \0 symbol from the labels values in the prometheus metrics
- Allow to filter host logs with blacklist and whitelist
- Bug fix: less verbose warnings about not being able to load resources from API server
- Bug fix: performance improvements for Ack DB
- Bug fix: custom prometheus metrics forwarded by Collectord do not include cluster field or custom user fields
- Bug fix: addon pod terminates faster
- Bug fix: verify command trying to post to all outputs with all indexes specified in the configuration
- Bug fix: crash in AckDB
- Bug fix: input system stats does not recognize outputs specified for the host and cgroup
- Bug fix: verify command runs recursively all the time for host logs even when recursively is set to false
5.15.305 - 2021-01-04
Collectord updates:
- Upgrade go runtime to 1.14.13
- Bug fix: host file inputs can raise a fatal error: concurrent map writes
5.15.302 - 2020-08-12
Collectord updates:
- Upgrade golang to 1.14.7 to fix the hang in runtime
5.15.301 - 2020-06-24
Collectord updates:
- Bug fix: verify command is broken, when system input is disabled
5.15.300 - 2020-06-01
Supports collectorforopenshift version 5.15.x and below
- Events dashboard: filters depend on selection of cluster and node labels
- Improvements for supporting Kubernetes 1.14 and higher (OpenShift 4.2+)
- Improvement for alert "Cluster Warning: high number of errors to Kubernetes API" (only alert on 5xx errors)
- Bug fix: node events aren't visible in Events tab
Collectord updates:
- Support for annotations to add custom user fields to data
- Support for blacklisting and whitelisting Prometheus metrics (significantly reducing the indexing cost of data)
- Verify command improvements - verify proper configurations for cgroup (memory/memory.use_hierarchy is 1)
- Bug fix: fix bug in prometheus metrics parser, empty fields can be filled with previous fields
- Bug fix: occasionally addon can report warnings about trying to delete expired keys from ack db
- Bug fix: better handle of connections to metrics endpoints exported in Prometheus format
- Bug fix: http connections improvements for when Splunk is unresponsive
- Bug fix: broken diag
5.14.285 - 2020-08-12
Collectord updates:
- Upgrade golang to 1.14.7 to fix the hang in runtime
5.14.284 - 2020-03-23
Collectord updates:
- New annotation to configure whitelist pattern for log messages
- Allow to override Kubernetes service URL
- Bug fix: panic in output for addon
- Bug fix: performance and memory usage improvement for ack db
5.14.280 - 2020-01-27
Supports collectorforopenshift version 5.14.x and below
- Logs dashboard: filters depend on selection
- Overview dashboard: Project counter for list of projects
Collectord updates:
- Support templates in the index, source and sourcetype
- Allow to exclude indexed fields when forwarding to Splunk
- Support annotation for stats interval for containers
- Support containerd runtime
- Bug fix: verify command can show incorrect error about verifying journald input
- Bug fix: index on namespace should set index for application logs
- Bug fix: warning about not being able to retrieve node information
5.12.273 - 2019-11-18
Collectord updates:
- Bug fix: panic in application logs discovering for PVC volumes
5.12.272 - 2019-11-08
Collectord updates:
- Bug fix: in case when the rotated files are reusing FileID/DevID Collectord stops forwarding rotated files
5.12.271 - 2019-11-07
Supports collectorforopenshift version 5.12.x and below
- Improvements for the macros for backward compatibility
Collectord updates:
- Bug fix: when event pattern is used for joining multi-line events, an error raised by the input in the pipeline might not be shown.
- Bug fix: reduce warnings failed to get the new event in pipeline - submitted
- Stability improvements
5.12.270 - 2019-10-22
Supports collectorforopenshift version 5.12.27
- Compact metrics (pre-calculated on Collectord side)
- Switched stats for host and cgroup in different macros
- Use base macro for alerts
- Improved command extraction for exec in Audit Logs
- Add cluster name in the alert results
Collectord updates:
- Watch namespaces and workloads for changes
- Global configurations with Custom Resources and selectors
- Describe command to see applied annotations for pods
- Bug fix: panic when pipe join configuration is removed
- Bug fix: panic when proc stats is enabled and cgroup stats is disabled
- Bug fix: support ProxyBasicAuthorization for license server checks
- Bug fix: Fix for collecting first sample (can show high CPU usage for first sample)
- Bug fix: if list of URLs is used for Splunk output, the empty URL is still required
- Beta: dynamic index, source and sourcetype names based on the metafields
- Beta: cluster diagnostics with one rule: node entropy
5.11.266 - 2020-10-15
Collectord updates:
- Upgrade golang to 1.14.10 to fix the hang in runtime
5.11.265 - 2020-06-24
Collectord updates:
- Bug fix: memory improvement for large ackdb files
5.11.264 - 2019-11-08
Collectord updates:
- Bug fix: in case when the rotated files are reusing FileID/DevID Collectord stops forwarding rotated files
5.11.263 - 2019-10-02
Collectord update:
- Addressing vulnerabilities in the base Red Hat image
5.11.261 - 2019-09-13
Collectord update:
- Bug fix: improves discovery for the PVC volumes
- Bug fix: delay loading for the PVC volumes
- Bug fix: improves logging for the directory walker
5.11.260 - 2019-09-09
Supports collectorforopenshift version 5.11.x and below
- GPU Monitoring (NVIDIA)
Collectord updates:
- Support for PVC volumes for application logs
- Bug fix: small memory leak in addon
- Bug fix: duplicate events when pipeline is getting throttled
- Bug fix: don't use throttling for devnull output
- Bug fix: better recovery for ack db corruption
- Bug fix: crash on journald input initialization when ack db is corrupted
- Bug fix: annotations joinmultiline requires joinpartial
- Bug fix: configurations for stdout only with annotations can crash collectord
- Set events = 50 by default for Splunk output batches
5.10.255 - 2019-11-20
Collectord updates:
- Bug fix: better recovery for ack db corruption
- Bug fix: crash on journald input initialization when ack db is corrupted
5.10.253 - 2019-07-31
Collectord update:
- Bug fix: collectord can pick up compressed json logs (*.gz)
- Bug fix: too verbose warnings from the docker watcher about retries
5.10.252 - 2019-07-24
Collectord update:
- Support for configuring the thruput (general and with annotations for container logs)
- Support for configuring too old or too new events (general and with annotations for container logs)
5.10.251 - 2019-06-20
Collectord update:
- Ability to configure Acknowledgement database for collectord.
5.10.250 - 2019-06-18
Supports collectorforopenshift version 5.10.x and below
- Security dashboard: Access: access to host via ssh, sudo, exec commands, failed access
- Security dashboard: Audit (users and namespaces)
- Security dashboard: Network (traffic)
- Security dashboard: Network (connections)
- Security dashboard: Objects (pods) - review pods with host network, age of pods, image pull policy, attached host paths, security context and restart policies
- Review dashboard: Clusters (allocations and usage)
- Cluster field filters
- Base macro for overriding macros for other macros
Collectord updates:
- Support for volatile and persistent journald storage with default configuration
- Updated YAML configuration to include most common resources
- Better support for overriding sourcetype, that does not require to update the Splunk macros
- New image release based on RHEL8 (ubi8) for OpenShift 4.x
- Bug fix: rarely when collectord fails to post to HEC it can panic
- Bug fix: better support for OpenShift 4.x and CRI-O storage
- Bug fix: space characters in index annotations can break the pipeline
5.9.244 - 2019-06-05
Collectord update:
- Bug fix: support for CRI-O in Kubernetes 1.14
- Configuring path to certificates for the Prometheus client with glob patterns.
5.9.240 - 2019-05-14
Supports collectorforopenshift version 5.9.x and below
- Visual improvements on the graphs for the number of logs and events
- New alerts for the CPU and Memory reservation
Collectord updates:
- Support for multiple Splunk destinations (outputs)
- Support subdomains for annotations (to deploy multiple collectord instances)
- Support for streaming objects from Kubernetes API to Splunk
- Bug fix: journald input keeps fd open to the rotated files
- Bug fix: fix in the annotation parser for the interval annotations
- Bug fix: fix splunk url selection configuration for multiple splunk URLs
5.8.231 - 2019-04-25
- Bug fix: Collectord usage report shows trial licenses for all instances
5.8.230 - 2019-04-22
Supports collectorforopenshift version 5.8.x and below
- Use multiselect filters for most dashboards and filters with possibility to input custom filters.
- Reduce dedup usage to improve performance on dashboards.
- Add critical pod annotations for OpenShift ...3.10, and priority class for OpenShift 3.11...
- Fix: statefulset dashboard does not show data with filters.
- Add graph of number of pods per namespace on Overview dashboard.
Collectord updates:
- Bug fix: clogging collectord output with errors when incorrect index is used.
- Bug fix: short-lived containers can result in duplicated logs.
- Bug fix: clogging collectord output with warnings when kernel reports incorrect VmRss size.
- Bug fix: annotations cannot override timestamp location for fields extraction.
- Bug fix: verify command reports Journald input in incorrect place.
- Better support for cgroup symlinks, automatically discover correct location.
5.7.220 - 2019-03-18
Supports collectorforopenshift version 5.7.x and below
- Review savedsearches/alerts to support indexing delay (start searches from 2 minutes behind) and run them in more random time.
- Workload dashboard - change CPU (of host) in table to real CPU
- Fixed single value memory panel on host dashboard (missed span)
- Use SEGMENTATION=none for stats events to use less disk space (needs to be moved to indexers)
Collectord updates:
- Support hostname formatting with environment variables in configuration
- New rotated file logic uses less file descriptors and frees rotated files quicker
- Allow to specify a default sampling value for container logs
- Reimplemented shutdown sequence to stop collectord faster
- Allow to override sampling percent with annotations
- New Input: journald
5.6.213 - 2019-03-03
- Collectord: Fix panic, when collectord does not have access to docker socket, and information about this container does not exist on the disk.
5.6.212 - 2019-02-19
Supports collectorforopenshift version 5.6.x and below
- New: Alert: high CPU usage on the host.
- Fixed: Splunk usage dashboard - charts do not show the data, when the used indexes aren't searchable by default.
- New: Support Dark theme.
- New: Free text search in Logs dashboard.
- New: Add auto-refresh options to the dashboard.
- Fixed: Revisited CPU limits and requests for Pods and Containers.
- New: add CPU Max, Memory Max and Project/Namespace labels to the Review-Namespaces dashboard.
- Fixed: Show deleted events
Collectord updates:
- Fixed: auto-recovery from the corrupted write-ahead-log in acknowledgment database.
- New: support sampling (random and hash-based) for container/application and host logs.
- New: when running multiple collectord on one host (with different output) - count that as one licensed host, change InstanceID format.
- Fixed: when container is scheduled with remove flag lock the file till collectord processes it completely.
- Fixed: collectord reports rare warning about unparsable uint64 max value from proc filesystem.
- Fixed: collectord reports rare warning about unparsable line from proc/io files.
- New: allow to include annotations in the forwarding data.
- Fixed: if collectord cannot access the API - report the warning less often
- Fixed: do not report docker warnings for verify command, if there is no container scheduled outside of the Kubernetes.
- New: splunk output - allow to limit the output batch by the number of events in payload.
- Fixed: attach namespace labels to the forwarded logs.
- Fixed: attach openshift_namespace field to the events.
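The hash-based sampling listed among the Collectord updates above can be pictured as follows: a stable hash of some key extracted from each log line decides whether the line is kept, so all lines sharing a key are kept or dropped together. This is a sketch under assumptions, not Collectord's implementation. The function name keep_event, the choice of CRC32, and the 100-bucket scheme are all hypothetical.

```python
import zlib


def keep_event(key: str, percent: int) -> bool:
    """Return True if the event identified by `key` falls in the sampled share."""
    # CRC32 is stable across runs, so the same key always gets the same decision.
    bucket = zlib.crc32(key.encode("utf-8")) % 100
    return bucket < percent


# Keep roughly 20% of keys; every line with the same key shares one decision.
lines = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
sampled = [msg for key, msg in lines if keep_event(key, 20)]
print(sampled)
```

Random sampling, by contrast, would decide per line, so the two "user-1" lines could end up on different sides of the cut.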
5.5.205 - 2019-01-25
- Collectord fix: collectord could stop sending container file logs when the original file has been truncated (using the same Node ID as previously used log file).
5.5.203 - 2019-01-25
- Collectord fix: collectord could send an empty X-Splunk-Request-Channel header to Splunk.
5.5.202 - 2019-01-24
Supports collectorforopenshift version 5.5.x and below
- New: Dashboard Review -> Projects. Review allocations and requests for Projects and pods.
- Fixed: openshift_stats_cpu_request_percent - is divided by the number of CPU.
Collectord updates:
- Fixed: Interval 0 in prometheus input can crash the collectord.
- Fixed: When both glob and match are set for the application logs, the glob pattern can block the match pattern from finding the files in the volume.
5.4.201 - 2018-12-19
Supports collectorforopenshift version 5.4.x and below
- Fixed: Alerts for licenses issued with AWS Subscriptions
Collectord updates:
- Fixed: Better handling rotated files (less open fd)
- Fixed: Events input can hang in the err loop.
5.4 - 2018-12-17
Supports collectorforopenshift version 5.x and below
- Improved: etcd metrics representation for bucket values.
- Fixed: API latency alert - exclude imagestreamimports.
- Compatibility update for collectord 5.4.
Collectord updates:
- New: Attach EC2 metadata fields
- New: Basic Auth for Proxy (License Server and Splunk)
- Fixed: Collectord verify reports CRI-O as unsupported runtime.
- Fixed: Rare crash on Prometheus metrics definition.
- Fixed: Better handling of acknowledgment database corruption.
- Fixed: When handling incorrect indexes, collectord can send index with empty string, which Splunk recognizes as an incorrect index
5.3 - 2018-11-19
Supports collectorforopenshift version 5.x and below
- Fixed: Improved Workload dashboard. Allows to filter by namespace, see all Pods in a specific namespace, filter by workload label.
- New: Alert for showing when Collectord reports errors in Processing pipelines (as an example if it failed to extract fields).
- New: Alert for showing when Collectord reports warnings.
- Fixed: Add node labels filter to Storage Dashboard and Control Plane Dashboards.
- New: Alert when there is lag in the indexing of the data.
- New: Splunk Usage (License usage, number of events) report under Setup.
- Fixed: misprint in Builds dashboard.
- Fixed: adjusted high amount of errors to Kubernetes API dashboard to make it less verbose.
- Fixed: lookup with alerts caused frequent replication activity on SHC
- Fixed: changed the search time for a few alerts that caused false positives with indexing lag on large installations
Collectord updates:
- Fixed: high memory usage with Gzip compression enabled (reduced memory usage).
- New: Allow disabling pipe.join with annotations.
- Fixed: Under a high volume of logs (10,000 events per second), Collectord could read partial lines, which could break JSON logs.
- Fixed: When collectord writes a Warning that it failed to post to Splunk, it will write a Success message after retry.
- New: Allow hashing sensitive data with annotations.
- Fixed: Group network socket tables to reduce the amount of forwarded data (a 4x reduction).
- Fixed: Identify when glob and match pattern require recursive directory traversal.
- Fixed: Make it possible to add annotations for specific containers inside the same Pod.
- New: Annotation to completely disable handling and forwarding of logs for containers.
- Fixed: Performance improvements for CRI-O logs.
- Fixed: Collectord showed a few Debug messages on start.
- Fixed: Performance improvements for log forwarding (up to 35% under a high volume of logs).
- Fixed: reduce duplication of the Kubernetes events, forwarded to Splunk.
- Fixed: Do not generate a WARN when API Server results in 404. Usually this is caused by the owner object being deleted.
- Fixed: Failed to parse the process name from the stat file when it contains unpaired parentheses.
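The stat-file fix above concerns process names that themselves contain parentheses. The sketch below shows one common way to parse such lines, assuming the Linux /proc/<pid>/stat layout of "pid (comm) state ...". It is an illustration only, not collectord's actual code, and the function name parse_stat_comm is hypothetical.

```python
from typing import Tuple


def parse_stat_comm(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The process name is wrapped in parentheses but may itself contain
    parentheses or spaces, so the name is taken as everything between the
    FIRST "(" and the LAST ")" rather than relying on balanced pairs.
    """
    open_idx = stat_line.find("(")
    close_idx = stat_line.rfind(")")
    if open_idx == -1 or close_idx < open_idx:
        raise ValueError("stat line has no parenthesized process name")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    # The first field after the closing parenthesis is the process state.
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state
```

Anchoring on the last closing parenthesis works because the fields that follow the name are a single state letter and numbers, so they cannot contain one.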
5.2 - 2018-10-15
Supports collectorforopenshift version 5.x and below
- New: Review/Storage dashboard based on storage metrics and PVC metrics.
- New: predefined alerts to help you monitor the health of the clusters and performance of the applications.
- Fixed: Performance improvements
Collector updates:
- New: runtime storage metrics (usage, available, inodes)
- New: image is built on top of SCRATCH image.
- New: verify and diag commands for troubleshooting.
- New: support /dev/null output for logs
- New: override source/sourcetype and index based on regexp pattern for container logs.
- Fixed: do not send empty docker_labels
- New: support docker JSON tags and labels
- Fixed: allowing a new license to unblock collector with the expired license.
- Fixed: Prometheus parser fails to parse metrics with labels that end with a comma.
- Fixed: Performance improvements
- New: Prometheus parser supports basic authentication
- Fixed: Workaround for a bug in HTTP Event Collector, that can return an incorrect index of failed event
- New: Prometheus autodiscover support host network
- Fixed: remove node info and limit metadata from logs
- Fixed: documentation / default configuration update - mount /etc/localtime to allow collector to use host tz (when not UTC)
- Fixed: documentation / default configuration update - use dnsPolicy: ClusterFirstWithHostNet for pods mounted on host network
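One of the fixes above concerns Prometheus metrics whose label list ends with a comma, for example http_requests{method="get",code="200",}. The sketch below shows one way to tolerate that trailing comma when parsing the text between the braces. It is an illustration only, not collectord's parser. The function name parse_labels is hypothetical, and escape sequences inside label values are returned as written rather than decoded.

```python
import re
from typing import Dict

# One label pair: name="value", followed by either a comma or the end of input.
# The value pattern allows backslash escapes and commas inside the quotes.
_LABEL = re.compile(
    r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"\s*(?:,|$)'
)


def parse_labels(body: str) -> Dict[str, str]:
    """Parse the text between { and } of a metric line into a dict."""
    labels: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        if not body[pos:].strip():
            break  # only whitespace is left, e.g. after a trailing comma
        match = _LABEL.match(body, pos)
        if match is None:
            raise ValueError("malformed label set at offset %d" % pos)
        labels[match.group(1)] = match.group(2)
        pos = match.end()
    return labels
```

Because each pair consumes its own separator, a final comma simply leaves nothing to parse, while a missing comma between two pairs is still reported as an error.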
5.1 - 2018-09-17
Supports collectorforopenshift version 5.x and below
- New: Network metrics (MB, Packets, Drops and Errors) for host and containers.
- New: Network socket tables (list of ports that containers and hosts are listening on, connections to external resources).
- New: Network review dashboard to see the list of connections to public services and within the private network.
- Improvement: Replace python-based lookup with macro written with eval.
- Improvement: Visual improvement for showing when the object was Last Seen (highlighting and showing minutes ago).
- New: discovering Prometheus metrics in Pods with annotations.
- New: attaching pod metadata to metrics collected from prometheus metrics exposed from pods.
- Improvement: Changed source of proc stats to proc root filesystem, to keep minimum list of unique sources.
- New: Support for Splunk multi-threads outputs (for forwarding more than 3000 events per second).
- Improvement: Performance improvements for Prometheus parsing.
- Improvement: Reduce amount of metrics forwarded with proc_stats by excluding system threads.
- Improvement: Configuration for gzip compression.
- Improvement: Calculate checksums of the first bytes of files to better identify new files with a reused inode.
- Bug: Process metrics could be collected 2 times.
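The checksum improvement above can be illustrated with a small sketch: a file is identified by its device, its inode, and a hash of its first bytes, so a new file that reuses an old inode still gets a different identity. This is not collectord's implementation. The window size, the choice of SHA-256, and the names FINGERPRINT_BYTES and file_identity are assumptions made for the example.

```python
import hashlib
import os
from typing import Tuple

# Hypothetical window size; the release note does not say how many bytes are hashed.
FINGERPRINT_BYTES = 256


def file_identity(path: str, nbytes: int = FINGERPRINT_BYTES) -> Tuple[int, int, str]:
    """Return (device, inode, checksum of the first nbytes) for a file."""
    st = os.stat(path)
    with open(path, "rb") as handle:
        head = handle.read(nbytes)
    return st.st_dev, st.st_ino, hashlib.sha256(head).hexdigest()
```

Comparing a stored identity with a fresh one tells a tailer whether to resume from the saved offset or start from the beginning. A sketch this small has limits: two files with identical leading bytes look the same, and a file still shorter than the window changes its checksum as it grows.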
5.0 - 2018-09-03
Supports collectorforopenshift version 5.x and below
- New dashboard: Events
- Added events panel to the Workload and Pod dashboards.
- Labels on Workload and Hosts dashboards.
- Auto-discover and forward Application logs from host mounts or local volumes.
- Annotations for containers to change per container configurations (index, source, join rules, replaces and more).
- Escaping terminal sequences from container logs.
- Redirecting logs to /dev/null for specific patterns.
- Replace patterns in container and application logs (hiding sensitive or not important information).
- Support for extracting fields from the container logs, including timestamps.
- Include Memory and CPU limits for container lists.
- Visual updates for the panels, highlighting high CPU and Memory usages
- Filter cgroup stats, forward only container and host metrics.
- Support for multiple Splunk HTTP Event Collector endpoints (support fail-over and load-balancing).
- Handle HTTP Event Collector errors with the incorrect index. Multiple options to redirect to default index, drop or wait.
- Add retry logic to license client to reduce amount of false positive warnings.
- Add HTTP read timeouts (handle gateway timeouts, 504).
- Fixed: fail to parse the latest line in the JSON log.
- Better error handling for incorrect configurations.
- Deprecating Join rules in favour of annotations.
- Support for HTTP Event Collector client certificates.
- Support CRI-O runtime.
- Fixed: limit directory walkers for depth (fixing issues when directory has a mount to itself)
- Fixed: add a limit of the maximum line size that collector can read at once (defaults to 1Mb).
- Fixed: acknowledgement database now stores NodeID, DevID and a parent folder identifier. That way, if a NodeID is reused right away, the file is identified as a new one if it is in a different location.
- Change: docker_stream field has been renamed to stream for compatibility with other container runtimes.
- Change: prometheus metrics have default sourcetype=openshift_prometheus (macro supports backward compatibility)
4.0.24 - 2018-05-05
Supports collectorforopenshift version 4.x and below
- New dashboard: Cluster/Audit
- New dashboard: Cluster/API Server
- New dashboard: Cluster/Controller
- New dashboard: Cluster/Kubelet
- New dashboard: Cluster/etcd
- Include image name when listing containers.
- Added syslog component to the list of host logs.
- Fixed: Include Daemon Set on Overview dashboard, list of projects.
- Fixed: Broken navigation from the list of deployments.
Collector updates (4.0.171):
- Collecting metrics from Prometheus format.
- Add HTTP read timeouts (handle gateway timeouts, 504).
- Correctly parse HTTP Event Responses when one of a few events fails to be indexed (for example, a wrong index).
- Performance optimizations.
- Optimize payloads for higher write throughput.
- Fixed: reduce the number of calls to Kubernetes API Server.
- Fixed: fail to parse the latest line in the JSON log.
- Better error handling for incorrect configurations.
- Failed to parse memory limits (Failed to parse memory=000k for the container).
- Collecting Kubernetes events from the cluster once by using collector addon.
collectorforopenshift 4.0.172
- Fixed: Messages "WARN ... proc.go:441: Unparsable line from /rootfs/proc/X/status" caused by new Linux kernel that reports empty line in proc file system.
- Fixed: Incorrectly parsed Limits for the OpenShift pods. 5m and 500m both resulted in 0.500.
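The Limits fix above concerns CPU quantities written in millicores, where the m suffix means one thousandth of a CPU, so 5m and 500m must produce different values. The sketch below shows the intended arithmetic. It is an illustration only, not the collector's code. The function name parse_cpu_quantity is hypothetical, and only plain numbers and the m suffix are handled.

```python
from decimal import Decimal


def parse_cpu_quantity(value: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" into a number of CPUs."""
    value = value.strip()
    if value.endswith("m"):
        # Millicores: divide the numeric part by 1000.
        return Decimal(value[:-1]) / Decimal(1000)
    return Decimal(value)
```

Decimal is used instead of float so that the small and large values stay exact and clearly distinct when compared or formatted.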
collectorforopenshift 4.0.173
- Fixed: significant memory usage with events larger than 512Kb, caused by Splunk issue SPL-156315 (unable to parse events larger than 512Kb, regression in 7.x).
collectorforopenshift 4.0.174.180730
- Show the index name in the output, when Splunk reports incorrect index.
3.0.23 - 2018-02-17
Supports collectorforopenshift version 3.x and below
- Bug: Memory view on workflow dashboard had a max limit set to 100.
- Bug: Events view on overview dashboard had a max limit set to 100.
3.0.22 - 2018-02-07
Supports collectorforopenshift version 3.x and below
- Added support for containers deployed without OpenShift (container based OpenShift installations).
- Added CPU Quota, CPU Shares, Throttled and Memory Limit and Request Overlays on Container and Pod Dashboards.
- Indexing OpenShift events in sourcetype openshift_events
- Performance improvement on Dashboards by combining multiple charts using one common search.
- New "Review/Allocatable Resources" dashboard to track limits and requests for CPU and Memory.
- New "Review/Privileged containers and enabled capabilities" dashboard to list all privileged containers and enabled security capabilities for containers.
- New Overview dashboard to easily navigate within the application.
- New Aggregated metrics dashboard for specific Workload.
- Fixed bug on Process Dashboard, some charts did not filter by host.
- "Setup: Collectors" now supports collectorforopenshift images distributed via private registries.
- "Overview: Process" dashboard did not use Span token for timechart dashboards.
- "Top: Containers" fixed incorrect memory usage (showed double size)
- Added alerts in application for notification about outdated collector versions and expired licenses for collector.
- Hide Wait Read/Write IO panels, when this data is not available.
- In process Dashboard show VmRSS with RssAnon, RssFile, and RssShmem.
Collector updates:
- Support for Splunk indexing acknowledgment.
- Watching for Kubernetes/OpenShift events.
- HTTP Proxy support for License server and Splunk output.
- Allow configuring destination indices for different types of data in collector configuration (stats, logs, host logs, proc stats and events).
- Handling responses from HTTP Event Collector to skip invalid events (will be logged).
- If container is running, but Kubernetes does not provide metadata, allow to wait for metadata.
- Collect security capabilities and uid/gid.
- For Kubernetes/OpenShift environments recognize containers scheduled outside of Pods and load metadata directly from docker.
- Support for custom labels, specified with collector configuration.
- Support OpenShift/Kubernetes annotations "collectord.io/..." to configure destination indices, sourcetypes and sources for pods, workloads and namespaces.
- Support for partial logs without join rules.
- Bug. Use local timezone by default for local syslog files.
- Bug. Fix small memory leak on deleted containers.
- Bug. When collector was failing to send data to Splunk, it was impossible to stop collector with terminate.
2.1.18 - 2017-12-09
Supports collectorforopenshift version 2.1.59.x and below
- Initial release for Monitoring OpenShift