Collecting system performance metrics

Kaspersky Endpoint Security affects operating system performance. To help you analyze this impact, the application can collect metrics on the system resources that it uses.

To get metrics for the operating system resources used by the application, run the following command:

kesl-control [-J] --export-metrics [--period <interval in seconds between exports>|--interactive]


As a result, a list of metrics is displayed.
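The command syntax above can be assembled programmatically. The following is a minimal Python sketch, not an official tool: the helper names build_export_command and run_export are hypothetical, and the sketch assumes only what the syntax line shows (an optional -J flag, and --period and --interactive as alternatives). It makes no assumption about the format of the exported output.

```python
import subprocess


def build_export_command(period_seconds=None, interactive=False, j_flag=False):
    """Build the argument list for `kesl-control --export-metrics`.

    Hypothetical helper: it only mirrors the syntax line
    kesl-control [-J] --export-metrics [--period <seconds>|--interactive]
    """
    if period_seconds is not None and interactive:
        # The syntax shows --period and --interactive as alternatives.
        raise ValueError("use either a period or interactive mode, not both")
    argv = ["kesl-control"]
    if j_flag:
        argv.append("-J")
    argv.append("--export-metrics")
    if period_seconds is not None:
        argv += ["--period", str(int(period_seconds))]
    elif interactive:
        argv.append("--interactive")
    return argv


def run_export(argv):
    """Run the command and return its standard output as raw text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, build_export_command(period_seconds=60) returns the argument list for an export every 60 seconds; run_export requires the application to be installed on the device.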

List of metrics

Internal name and label of the metric

Metric type

Description

Recommended threshold

lfs_system_descriptor_opened_count

Sensor

Number of descriptors currently opened by the kesl process.

If the number of descriptors grows faster than 250 per hour, we recommend contacting Technical Support.

lfs_system_uptime_milliseconds

Counter

Number of milliseconds from the startup of the operating system to now.

 

lfs_system_memory_usage_bytes

'type=virtual' label

Sensor

Amount of virtual memory currently used by the kesl process.

We recommend setting a threshold based on system information, which you can collect manually or with a script. To get an example script, contact Technical Support.

lfs_system_memory_usage_bytes

'type=resident' label

Sensor

Amount of resident memory currently used by the kesl process.

We recommend setting a threshold based on system information, which you can collect manually or with a script. To get an example script, contact Technical Support.

lfs_system_memory_usage_bytes

'type=swap' label

Sensor

Amount of swap memory currently used by the kesl process.

We recommend setting a threshold based on system information, which you can collect manually or with a script. To get an example script, contact Technical Support.

lfs_system_cpu_usage_milliseconds

'type=user' label

Counter

Milliseconds of CPU time used by the kesl process in user space, from the start of the kesl process to now.

We recommend setting a threshold based on system information, which you can collect manually or with a script. To get an example script, contact Technical Support.

lfs_system_cpu_usage_milliseconds

'type=kernel' label

Counter

Milliseconds of CPU time used by the kesl process in kernel space, from the start of the kesl process to now.

We recommend setting a threshold based on system information, which you can collect manually or with a script. To get an example script, contact Technical Support.

lfs_system_cpu_usage_milliseconds

'type=total' label

Counter

Milliseconds of CPU time used by the kesl process in user space and kernel space, from the start of the kesl process to now.

If the CPU time used by the kesl process exceeds the rest of the workload, we recommend contacting Technical Support.

lfs_tcpSynInterceptor_connection_hanging_count

Counter

Total number of network connections that have hung at the TCP handshake stage between the server and the client, from the start of the kesl process to now. A network connection is considered hanging if it has been waiting for the remote server to allow the connection for 1 second to 2 minutes, depending on the system setting in /proc/sys/net/ipv4/tcp_syn_retries.

An increase in this metric may mean that the remote server is inaccessible or missing. If the number of hanging network connections grows by more than 10 per minute, we recommend contacting Technical Support.

lfs_tcpSynInterceptor_connection_count

Sensor

Number of network connections currently waiting to connect to the remote server.

 

lfs_tcpSynInterceptor_verdict_count

'verdict=allow' label

Counter

Total number of intercepted connections that have been established, from the start of the kesl process to now.

 

lfs_tcpSynInterceptor_verdict_count

'verdict=drop' label

Counter

Total number of intercepted connections that have not been established, from the start of the kesl process to now.

 

lfs_tcpSynInterceptor_verdict_count

'verdict=unknown' label

Counter

Total number of network connections, from the start of the kesl process to now, for which the remote server returned a verdict that cannot be processed.

 

lfs_tcpSynInterceptor_verdict_latency_milliseconds

Histogram

Histogram of the time the remote server takes to issue a verdict on a connection, measured from the interception of the SYN packet to the issuing of the verdict.

 

lfs_tproxy_connection_count

'direction=inbound' label

Sensor

Current number of intercepted inbound network connections.

 

lfs_tproxy_connection_count

'direction=outbound' label

Sensor

Current number of intercepted outbound network connections.

 

lfs_tproxy_connection_orphaned_count

Counter

Total number of connections, since the start of the Web Threat Protection, Network Threat Protection, and Web Control protection components, that Kaspersky Endpoint Security intercepted although they were not intended to be intercepted (for example, because of an incompatible iptables configuration or because the application did not have time to process a new interception configuration). Such connections are rejected.

If the total number of such connections exceeds 10, the application or the system may be operating incorrectly. In this case, we recommend contacting Technical Support.

lfs_tproxy_threadPool_task_count

Counter

Total number of tasks (new connection, new batch of data, data sent, and so on) that have been added to the thread pool since the start of the Web Threat Protection, Network Threat Protection, and Web Control protection components.

If the total number of created but uncompleted tasks keeps growing, you need to add resources, configure exclusions, or contact Technical Support.

lfs_tproxy_threadPool_task_duration_milliseconds

Histogram

Histogram of the execution time of tasks that process intercepted connections.

If the execution time of 95% of tasks exceeds 1 second, you need to add resources, configure exclusions, or contact Technical Support.

lfs_tproxy_socket_count

Sensor

Current number of sockets (listening sockets and connection sockets). An intercepted connection has 2 connection sockets.

 

lfs_trafficScanning_object_hanging_count

Sensor

Current number of hanging object scans in intercepted traffic (scans with a duration over 1 minute are considered hanging).

If the number of hanging object scans exceeds 35, this indicates a problem with the application databases. In this case, we recommend contacting Technical Support.

lfs_trafficScanning_object_duration_milliseconds

Histogram

Histogram of durations of intercepted traffic object scans.

If the scan time for 95% of objects exceeds 1 second, you need to add resources, update application databases, or contact Technical Support.

If the growth rate of packet scans in the intercepted traffic is 0 and the value of the lfs_tproxy_threadPool_task_count metric is not 0, you need to update the application databases or contact Technical Support.

lfs_fileMonitor_cache_size

Sensor

Current size of the File Monitor cache (file revision cache), which monitors file modification.

If the current File Monitor cache size is greater than 50,000, we recommend contacting Technical Support.

lfs_fileMonitor_cache_hit_count

Counter

Hit count of the File Monitor cache (file revision cache), which monitors file modification.

 

lfs_fileMonitor_cache_miss_count

Counter

Miss count of the File Monitor cache (file revision cache), which monitors file modification.

 

lfs_faCache_file_count

Sensor

Current size of the cache of fanotify interceptor files that do not need to be scanned.

If the current file cache size is greater than 100,000, we recommend contacting Technical Support.

lfs_faCache_file_hit_count

Counter

Hit count of the cache of fanotify interceptor files that do not need to be scanned.

 

lfs_faCache_file_miss_count

Counter

Miss count of the cache of fanotify interceptor files that do not need to be scanned.

 

lfs_faCache_volume_count

Sensor

Current size of the fanotify interceptor mount point cache.

 

lfs_oas_cache_file_count

Sensor

Current size of the cache of File Threat Protection files that do not need to be scanned.

If the current file cache size is greater than 100,000, we recommend contacting Technical Support.

lfs_oas_cache_file_hit_count

Counter

Hit count of the cache of File Threat Protection files that do not need to be scanned.

 

lfs_oas_cache_file_miss_count

Counter

Miss count of the cache of File Threat Protection files that do not need to be scanned.

 

lfs_oas_cache_volume_count

Sensor

Current size of the File Threat Protection mount point cache.

 

lfs_processManager_cache_size

Sensor

Current number of active processes in the system that are cached.

 

lfs_processManager_cache_hit_count

Counter

Hit count for the system process cache.

 

lfs_processManager_cache_miss_count

Counter

Miss count for the system process cache.

 

lfs_blockingProcessInterceptor_cache_size

Sensor

Current size of the cache of interpreters that can be used to start processes.

This number depends on how intensively scripts are run on a specific device. If the current interpreter cache size is greater than 10,000, we recommend contacting Technical Support.

lfs_blockingProcessInterceptor_cache_hit_count

Counter

Hit count of the interpreter cache.

 

lfs_blockingProcessInterceptor_cache_miss_count

Counter

Miss count of the interpreter cache.

 

lfs_prevention_cache_file_count

Sensor

Current size of the "Execution prevention for objects" task's file cache for objects that do not need scanning.

 

lfs_prevention_cache_file_hit_count

Counter

Number of hits in the "Execution prevention for objects" task's file cache for objects that do not need scanning.

 

lfs_prevention_cache_file_miss_count

Counter

Number of misses in the "Execution prevention for objects" task's file cache for objects that do not need scanning.

 

lfs_prevention_cache_volume_count

Sensor

Current size of the "Execution prevention for objects" task's mount point cache.

 

eka_telemetry_metrics_registry_callbacks_exec_duration_milliseconds

Sensor

How long the current metrics update took for all clients registered in the registry (in milliseconds).

 

eka_telemetry_metrics_registry_registered_callbacks

Sensor

Current number of metric update callbacks that the MetricsRegistry has performed for all registered clients.

 

eka_telemetry_metrics_registry_scrape_duration_milliseconds

Sensor

How long it took to collect all registry metrics (in milliseconds).

If 99% of the values of this metric exceed 1 second for a 10-minute period, we recommend contacting Technical Support.
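Several thresholds in the list are stated as growth rates, for example more than 250 descriptors per hour for lfs_system_descriptor_opened_count or more than 10 hanging connections per minute for lfs_tcpSynInterceptor_connection_hanging_count. The following is a minimal Python sketch of such a check. The function names are hypothetical, and the sketch assumes that you have already extracted two numeric samples of a metric together with the times (in seconds) at which they were taken.

```python
def growth_per_interval(first_value, first_time_s, last_value, last_time_s, interval_s):
    """Return the average growth of a metric, scaled to `interval_s` seconds.

    Assumption: both samples come from the same run of the kesl process,
    so a counter has not been reset between them.
    """
    elapsed_s = last_time_s - first_time_s
    if elapsed_s <= 0:
        raise ValueError("the second sample must be taken after the first one")
    return (last_value - first_value) * interval_s / elapsed_s


def exceeds_growth_threshold(first_value, first_time_s, last_value, last_time_s,
                             interval_s, max_growth):
    """Return True if the metric grows faster than `max_growth` per interval."""
    growth = growth_per_interval(first_value, first_time_s,
                                 last_value, last_time_s, interval_s)
    return growth > max_growth


# Descriptors: 1200 at t=0 s and 1500 at t=1800 s is 600 per hour (limit 250).
descriptors_alert = exceeds_growth_threshold(1200, 0, 1500, 1800, 3600, 250)

# Hanging connections: 40 at t=0 s and 45 at t=60 s is 5 per minute (limit 10).
hanging_alert = exceeds_growth_threshold(40, 0, 45, 60, 60, 10)
```

With these illustrative sample values, descriptors_alert is True and hanging_alert is False.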

We recommend analyzing metrics in data visualization systems, for example, Grafana.
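When analyzing the histogram metrics from the list, the thresholds of the form "95% of tasks exceed 1 second" can be checked by computing the share of observations above a limit. The following Python sketch is an illustration only. It assumes that the histogram is available as cumulative buckets, that is, pairs of an upper bound in milliseconds and the number of observations at or below that bound, plus the total number of observations. This layout is an assumption and is not confirmed for the application's export format.

```python
def fraction_above(buckets, total_count, threshold_ms):
    """Estimate the share of observations that took longer than `threshold_ms`.

    buckets: iterable of (upper_bound_ms, cumulative_count) pairs
             (assumed layout, not confirmed for the exported data).
    If the threshold is not a bucket bound, the nearest lower bound is used,
    so the result may overestimate the share above the threshold.
    """
    if total_count <= 0:
        return 0.0
    at_or_below = 0
    for upper_bound_ms, cumulative_count in sorted(buckets):
        if upper_bound_ms <= threshold_ms:
            at_or_below = cumulative_count
        else:
            break
    return 1.0 - at_or_below / total_count


# Illustrative data: 90 of 100 tasks finished within 1000 ms.
sample_buckets = [(100, 50), (500, 80), (1000, 90), (5000, 100)]
share_slow = fraction_above(sample_buckets, 100, 1000)
```

Here share_slow is about 0.10. Which share counts as a problem depends on how you read the recommended threshold, so compare the result with the limit that you agree on with Technical Support.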

You can use a script to get information about your operating system and device. To get an example script and information about integration with Grafana, contact Technical Support.
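As an illustration of what such a script might collect, the following Python sketch gathers basic information about a Linux device using only the standard library and derives a memory threshold from it. It is not the script provided by Technical Support. The function names are hypothetical, and the 25% share used in the example is an arbitrary value, not a recommendation.

```python
import os
import platform


def collect_system_info():
    """Collect basic facts about the device (Linux, standard library only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical memory pages
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "total_memory_bytes": page_size * phys_pages,
    }


def memory_threshold_bytes(total_memory_bytes, fraction):
    """Return a memory threshold as a share of total physical memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(total_memory_bytes * fraction)


info = collect_system_info()
# Hypothetical limit: alert when resident memory exceeds 25% of physical memory.
resident_limit = memory_threshold_bytes(info["total_memory_bytes"], 0.25)
```

A value such as resident_limit could then be compared with lfs_system_memory_usage_bytes for the 'type=resident' label.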

You can publish exported metrics to monitoring systems such as Prometheus and Zabbix. To integrate with a monitoring system, you can use a script that gets information from the application and publishes it to the monitoring system. To get the script, contact Technical Support.
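For Prometheus, the publishing step usually means producing lines in the Prometheus text exposition format (metric name, optional labels in braces, and a value). The following Python sketch shows only that formatting step and is not the script provided by Technical Support. The function names are hypothetical, the sample values are invented for illustration, and the sketch assumes that metric names, labels, and values have already been extracted from the exported data.

```python
def format_sample(name, value, labels=None):
    """Format one sample as a line of the Prometheus text exposition format."""
    if not labels:
        return name + " " + str(value)
    parts = []
    for key in sorted(labels):
        # Escape backslashes, double quotes, and line feeds in label values.
        escaped = (str(labels[key])
                   .replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
        parts.append(key + '="' + escaped + '"')
    return name + "{" + ",".join(parts) + "} " + str(value)


def render_samples(samples):
    """Render (name, labels, value) tuples as exposition text."""
    lines = [format_sample(name, value, labels) for name, labels, value in samples]
    return "\n".join(lines) + "\n"


# Invented sample values, for illustration only.
text = render_samples([
    ("lfs_system_memory_usage_bytes", {"type": "resident"}, 1048576),
    ("lfs_tproxy_socket_count", None, 12),
])
```

The resulting text can be served over HTTP for scraping or written to a file that an existing collector reads; which option fits depends on your monitoring setup.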
