ZABBIX BUGS AND ISSUES / ZBX-23969

Nginx template received/sent "change per second" spikes to impossibly high values


    • Type: Incident report
    • Resolution: Unresolved
    • Priority: Trivial
    • None
    • 6.0.25
    • Templates (T)
    • None

      We are using the official template to monitor Nginx. https://www.zabbix.com/integrations/nginx#nginx_plus_http

      It collects data using the Nginx API (https://nginx.org/en/docs/http/ngx_http_api_module.html) every minute.

      In this case, the values for received are processed to calculate the per-second rate of change. That is how the value is converted from a cumulative total into a rate: Zabbix divides the difference between two consecutive samples by the time elapsed between them (one minute here).

      Here is the item that extracts the received value from the JSON that Nginx returns:

      nginx.stream.server_zones.received.rate{#NAME}

      Preprocessing

      • JSON Path: $.received
      • Change per second
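
      The "Change per second" step can be sketched as follows. This is a minimal Python illustration, not Zabbix code: the function name `change_per_second` and the sample numbers are assumptions made for the example. It follows the formula given in the Zabbix documentation, (value - prev_value) / (time - prev_time).

```python
def change_per_second(prev_value, prev_time, value, time):
    """Rate of change between two samples of a cumulative counter."""
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (value - prev_value) / elapsed


# Hypothetical counter samples taken 60 seconds apart.
rate = change_per_second(prev_value=1_000_000, prev_time=0,
                         value=1_600_000, time=60)
print(rate)  # 10000.0
```

      The result is only a meaningful speed if the counter grows while the traffic is actually flowing.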

      We kept seeing values, calculated by Zabbix from the data returned by the Nginx API, that are impossibly high.

      For example, these are some values that we collected back in December, when we opened a case with Nginx support:

      2023-12-03 16:03:04 0
      2023-12-03 16:02:04 0
      2023-12-03 16:01:04 0
      2023-12-03 16:00:04 973733642604 (almost 1 terabyte/s)
      2023-12-03 15:59:04 373300445226
      2023-12-03 15:58:04 8216678288
      2023-12-03 15:57:04 0
      2023-12-03 15:56:04 0
      2023-12-03 15:55:04 6217956416
      2023-12-03 15:54:04 0
      2023-12-03 15:53:04 0
      2023-12-03 15:52:04 0
      2023-12-03 15:51:04 0
      2023-12-03 15:50:04 0
      2023-12-03 15:49:04 5481749343
      2023-12-03 15:48:04 263

      The reported values are 0 for many minutes, and then there are huge spikes. We therefore suspected that the Nginx API does not update the "bytes_received" or "bytes_sent" value in real time, but only when a connection is closed.

      Nginx support confirmed our suspicion. Here is their reply:

      Based on description we have now, our guess is that the Zabbix metric refers to the api "received" metric in the stream server zone, and the answer is that the metric is only updated when closing a connection, it is by design.
      The "change per second" note from Zabbix documentation and the fact that "received" metric in the stream server zone is only updated when closing a connection, when putting together, follow us to think that the spikes are possible, especially, when closing a long lived connection that existed for a long time (many seconds) but reported it's received metric only upon closure => in that second you may observe a spike though there were no corresponding amount of traffic in that second.

      For example: Here is the example of access log record:
      ——

      167.238.31.19 [27/Nov/2023:00:15:31 -0500] TCP 200 17091 1784907579 2502.670"10.66.116.120:80" "1784907579" "17091" "0.001"
      ——
      Where as per access log format for the record $bytes_received = 1784907579 and $session_time = 2502.670

      So for the whole 2502 seconds period there will be only 1 metric point at the end with the value 1784907579.

      Nginx confirmed here that "the metric is only updated when closing a connection". 
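
      To put numbers on the access log record above, here is a rough calculation in Python. It assumes the 1-minute update interval mentioned earlier and that nothing else changed the counter in the minute when the connection closed; the variable names are made up for the illustration.

```python
BYTES_RECEIVED = 1_784_907_579  # $bytes_received from the log record
SESSION_TIME = 2502.670         # $session_time from the log record, seconds
POLL_INTERVAL = 60              # assumed Zabbix update interval, seconds

# What "Change per second" reports for the single interval in which
# the connection closed and the whole byte count landed at once.
reported_spike = BYTES_RECEIVED / POLL_INTERVAL

# Average rate over the actual lifetime of the connection.
actual_average = BYTES_RECEIVED / SESSION_TIME

print(f"reported spike: {reported_spike:,.0f} B/s")
print(f"actual average: {actual_average:,.0f} B/s")
print(f"overstatement:  {reported_spike / actual_average:.1f}x")
```

      Under these assumptions the spike overstates the rate by the ratio of session time to update interval, and every other minute of the session is reported as 0.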

      Unfortunately, deriving a transfer rate from a counter that only changes when a connection closes is never going to be accurate.

      The only simple option I can see, since we are relying on the Nginx API, is to increase the update interval (the default seems to be 1 minute) to a much longer period, such as one hour, so that the spikes are averaged over more time. Unless we can somehow obtain the "session_time" and use that to divide the value more accurately.
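
      The "session_time" idea could look roughly like the sketch below. It is not something the template or the Nginx API provides: it assumes $bytes_received and $session_time have already been parsed from the stream access log, that the transfer rate was constant for the whole session, and the function name `spread_over_session` and the end timestamp are hypothetical.

```python
from collections import defaultdict


def spread_over_session(end_time, session_time, total_bytes, bucket=60):
    """Spread total_bytes evenly over [end_time - session_time, end_time].

    Returns {bucket_start_second: average bytes per second in that bucket}.
    Assumes a constant rate during the session, which is an approximation.
    """
    rate = total_bytes / session_time
    per_bucket = defaultdict(float)
    t = end_time - session_time
    while t < end_time:
        bucket_start = (t // bucket) * bucket
        bucket_end = min(bucket_start + bucket, end_time)
        # Weight the rate by the share of the bucket the session covers.
        per_bucket[int(bucket_start)] += rate * (bucket_end - t) / bucket
        t = bucket_end
    return dict(per_bucket)


# Values from the access log record; end_time is a hypothetical
# timestamp in seconds chosen only for the example.
rates = spread_over_session(end_time=3000.0, session_time=2502.670,
                            total_bytes=1_784_907_579)
print(len(rates), "buckets, peak", round(max(rates.values())), "B/s")
```

      Each one-minute bucket then carries its share of the session instead of a single spike at the end, at the cost of having to process the access log outside of the API-based template.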

            fvilarnovo Facundo Vilarnovo
            mcbrineellis Connor McBrine-Ellis