-
Incident report
-
Resolution: Unresolved
-
Trivial
-
None
-
6.0.25
-
None
We are using the official template to monitor Nginx. https://www.zabbix.com/integrations/nginx#nginx_plus_http
It collects data using the Nginx API (https://nginx.org/en/docs/http/ngx_http_api_module.html) every minute.
The received values are then preprocessed with "Change per second": Zabbix takes the difference between two consecutive readings of the cumulative total and divides it by the time elapsed between them (about one minute), which turns the running total into a rate.
Here is the received value we get from the JSON that Nginx returns:
nginx.stream.server_zones.received.rate{#NAME}
Preprocessing
- JSON Path: $.received
- Change per second
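To make the two preprocessing steps concrete, here is a minimal Python sketch of what they compute. It is not Zabbix code: the function names and the sample payload are made up for illustration, and the formula follows the documented behaviour of "Change per second", (value - prev_value) / (time - prev_time), with the sample discarded when the counter goes backwards.

```python
import json
from typing import Optional


def extract_received(payload: str) -> int:
    """Rough equivalent of the JSONPath step `$.received`."""
    return json.loads(payload)["received"]


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> Optional[float]:
    """(value - prev_value) / (ts - prev_ts), or None when no rate can be stored."""
    elapsed = ts - prev_ts
    if elapsed <= 0:
        return None  # no time has passed between the two samples
    if value < prev_value:
        return None  # counter went backwards (e.g. reset): nothing is stored
    return (value - prev_value) / elapsed


# Hypothetical example: two polls 60 seconds apart.
first = extract_received('{"received": 1000}')
second = extract_received('{"received": 1600}')
rate = change_per_second(first, 0, second, 60)  # 600 bytes over 60 s
```

The rate is only meaningful if the counter grows while the traffic is actually flowing, which is the assumption that fails in this report.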
We kept seeing values, calculated by Zabbix from the data returned by the Nginx API, that are impossible.
For example, these are some values that we collected back in December when we opened a case with the Nginx support:
2023-12-03 16:03:04 | 0 |
2023-12-03 16:02:04 | 0 |
2023-12-03 16:01:04 | 0 |
2023-12-03 16:00:04 | 973733642604 (almost 1 terabit/s) |
2023-12-03 15:59:04 | 373300445226 |
2023-12-03 15:58:04 | 8216678288 |
2023-12-03 15:57:04 | 0 |
2023-12-03 15:56:04 | 0 |
2023-12-03 15:55:04 | 6217956416 |
2023-12-03 15:54:04 | 0 |
2023-12-03 15:53:04 | 0 |
2023-12-03 15:52:04 | 0 |
2023-12-03 15:51:04 | 0 |
2023-12-03 15:50:04 | 0 |
2023-12-03 15:49:04 | 5481749343 |
2023-12-03 15:48:04 | 263 |
Basically, the reported values are 0 for many minutes, and then there are huge spikes. We therefore suspected that the Nginx API does not update the "bytes_received" or "bytes_sent" value in real time, and instead only updates it when a connection is closed, or something similar.
Nginx support confirmed our suspicion. Here is their reply:
Based on description we have now, our guess is that the Zabbix metric refers to the api "received" metric in the stream server zone, and the answer is that the metric is only updated when closing a connection, it is by design.
The "change per second" note from Zabbix documentation and the fact that "received" metric in the stream server zone is only updated when closing a connection, when putting together, follow us to think that the spikes are possible, especially, when closing a long lived connection that existed for a long time (many seconds) but reported it's received metric only upon closure => in that second you may observe a spike though there were no corresponding amount of traffic in that second.
For example: Here is the example of access log record:
——
167.238.31.19 [27/Nov/2023:00:15:31 -0500] TCP 200 17091 1784907579 2502.670 "10.66.116.120:80" "1784907579" "17091" "0.001"
——
Where as per access log format for the record $bytes_received = 1784907579 and $session_time = 2502.670
So for the whole 2502 seconds period there will be only 1 metric point at the end with the value 1784907579.
Nginx confirmed here that "the metric is only updated when closing a connection".
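The effect can be reproduced with a small simulation. The sketch below uses the figures from the access log record quoted by Nginx support and makes several simplifying assumptions that are mine, not theirs: the connection opens at t = 0, it is the only connection in the zone, and Zabbix polls exactly every 60 seconds.

```python
def polled_rates(close_time: float, total_bytes: int,
                 interval: int = 60, polls: int = 45) -> list:
    """Rates a 'change per second' step would store when the counter
    only jumps to total_bytes once the connection closes."""
    rates = []
    prev = 0
    for i in range(1, polls + 1):
        t = i * interval
        # The counter stays at 0 until the connection has closed.
        counter = total_bytes if t >= close_time else 0
        rates.append((counter - prev) / interval)
        prev = counter
    return rates


SESSION_TIME = 2502.670      # $session_time from the quoted log record
BYTES_RECEIVED = 1784907579  # $bytes_received from the quoted log record

rates = polled_rates(SESSION_TIME, BYTES_RECEIVED)
spike = max(rates)                            # whole transfer booked in one interval
true_average = BYTES_RECEIVED / SESSION_TIME  # average rate over the session
print(f"non-zero samples: {sum(r > 0 for r in rates)}")
print(f"spike is {spike / true_average:.1f}x the session average")
```

Under these assumptions every sample is 0 except the first poll after the connection closes, which carries the entire transfer divided by 60 seconds. That matches the pattern in the table above: long runs of 0 followed by an isolated, implausibly large value.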
Calculating a transfer rate by averaging this counter over the polling interval is therefore never going to be really accurate, unfortunately.
The only simple option I can see, since we are relying on the Nginx API, is to increase the update interval (the default seems to be 1 minute) to a much longer period, such as one hour, so that the averaged rate is less distorted by individual connection closures. The alternative would be to somehow obtain the "session_time" and use it to divide the value more accurately.
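As a rough illustration of the "session_time" idea, the sketch below spreads the bytes of one closed session evenly over the time the session was open and reports an average rate per 60-second bucket. This is hypothetical: it assumes the end timestamp, $bytes_received and $session_time are available (for example from the access log rather than the API), and it assumes traffic was uniform over the session, which may not be true. The function name and bucket size are made up for the example.

```python
def spread_session(end_ts: float, bytes_received: int,
                   session_time: float, bucket: int = 60) -> dict:
    """Map bucket start time -> average bytes/s contributed by one session,
    assuming the bytes arrived uniformly over the session's lifetime."""
    if session_time <= 0:
        # Degenerate session: book everything in the bucket where it ended.
        start = int(end_ts // bucket) * bucket
        return {start: bytes_received / bucket}
    start_ts = end_ts - session_time
    rate = bytes_received / session_time  # bytes/s while the session was open
    result = {}
    b = int(start_ts // bucket) * bucket
    while b < end_ts:
        # Seconds of this bucket during which the session was open.
        overlap = min(end_ts, b + bucket) - max(start_ts, b)
        result[b] = rate * overlap / bucket
        b += bucket
    return result


# Hypothetical session: 1200 bytes over 120 s, ending at t = 180 s.
example = spread_session(180.0, 1200, 120.0)
```

Summing the results of every session per bucket would give a smoother rate than the API counter does, at the cost of parsing logs and of only knowing about a session after it has closed.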