[ZBX-15790] Incorrect value of a delta (speed per second) iteration, when previous polling iteration has been missed. Created: 2019 Mar 07  Updated: 2019 Mar 22  Resolved: 2019 Mar 22

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 4.0.4, 4.0.5
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Matthew Leach Assignee: Arturs Lontons
Resolution: Won't fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Redhat Enterprise Linux 7.6


Attachments: PNG File attachment1.png, PNG File attachment2.png, PNG File attachment3.PNG, PNG File defaultIF.png, PNG File dependent.png

 Description   

I have searched the Zabbix bugs/issues, and this issue has reared its head quite a few times in past versions; despite SNMP validation being put in place, the issue persists.

It's a similar case: if a polling iteration has been missed, the next iteration is completely skewed, going from ~1 Kbps to ~333.72 Tbps.

I am not seeing any unsupported notifications in the logs when this happens, but I am still investigating why a poll is occasionally missed and whether this is network or Zabbix related.



 Comments   
Comment by Matthew Leach [ 2019 Mar 07 ]

Previous bug reports are:

https://support.zabbix.com/browse/ZBX-4884

https://support.zabbix.com/browse/ZBX-5310

https://support.zabbix.com/browse/ZBX-8766

 

Comment by Glebs Ivanovskis [ 2019 Mar 08 ]

Could you do the following?

  1. Clone your item.
  2. Remove Delta (speed per second) preprocessing step of the new item.
  3. Create a Dependent item with Delta (speed per second) preprocessing step.

These two new items will provide raw data from your device and the result of preprocessing the same data by Zabbix.

Then go to Latest data, select these two items and show us their Values.

Comment by Matthew Leach [ 2019 Mar 08 ]

I'll get this set up now and return the values once I have them.

Comment by Matthew Leach [ 2019 Mar 08 ]

Hi Glebs,

I have added two attachments.

dependent.png shows the dependent item of the item shown in defaultIF.png.

The value might be large in defaultIF.png as I left the custom x8 multiplier in place; however, for the purposes of this exercise, hopefully that shouldn't matter.

Comment by Glebs Ivanovskis [ 2019 Mar 08 ]

Thank you! Any ideas where this 0 in raw data comes from? Is it a device bug or a bug in Zabbix SNMP code?

Comment by Matthew Leach [ 2019 Mar 08 ]

Good question. I am not sure if it is a device bug. The devices in question are Cisco Nexus 3K/9K 10GbE switches, and it doesn’t happen across all interfaces nor all switches. It happens on busy ports and idle ports. Firmware wise, they are all on the same version, and I have briefly checked with Cisco support whether it’s a known issue and I can’t see anything.

I should also mention that it affects all items on the affected interface that are counters at that given moment, so whether it’s bits in or out, packets in or out, it makes no difference, they all skew together.

I will cron up an snmpwalk directly against a few specific OIDs twice a minute, and then diff the results. This will assist in narrowing down whether it’s a device or Zabbix issue.

In the meantime, from a Zabbix perspective, given that the zero return seems to be the culprit, how does it cause the following iteration to be skewed when calculating the delta?

Comment by Glebs Ivanovskis [ 2019 Mar 08 ]

Zabbix has some built-in logic to protect against artifacts when calculating deltas in case a counter overflows. When the raw value decreases (like from X to 0) between the checks, Zabbix concludes that the counter must have overflowed somewhere in between and does not store the calculated delta (which is negative in this case) in history. But it stores the current raw value (0 in your case) in a special place for the calculation of the next delta, so the next delta is calculated against 0 and comes out huge. As far as I know, in 4.2 there will be a possibility to discard such outliers.
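
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the logic as explained in this comment, not Zabbix's actual implementation. The function names `delta_per_second` and `run`, the 60-second polling interval and the sample timestamps are assumptions; the counter values are borrowed from the snmpwalk log further down in this thread.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, int]  # (timestamp in seconds, raw counter value)


def delta_per_second(prev: Optional[Sample], curr: Sample) -> Optional[float]:
    """Return the per-second rate, or None when no delta should be stored."""
    if prev is None:
        return None  # first sample: nothing to compare against
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0 or v1 < v0:
        # A decreasing counter is treated as an overflow: the delta is dropped.
        return None
    return (v1 - v0) / (t1 - t0)


def run(samples: List[Sample]) -> List[Optional[float]]:
    """Feed samples in order, always remembering the latest raw value."""
    prev: Optional[Sample] = None
    rates: List[Optional[float]] = []
    for sample in samples:
        rates.append(delta_per_second(prev, sample))
        # The raw value is kept even when the delta was dropped, so a bogus 0
        # becomes the baseline for the next calculation.
        prev = sample
    return rates


# Assumed 60-second polling interval; the middle sample is the bogus 0.
samples = [(0, 2508730437273526), (60, 0), (120, 2508730437281127)]
rates = run(samples)
print(rates)               # first two entries are None, the third is the spike
print(rates[2] * 8 / 1e12, "Tbps with an x8 multiplier")
```

With these assumed inputs the last rate is roughly 4.18e13 bytes per second, or about 334 Tbps after an x8 multiplier, which is the same order of magnitude as the ~333.72 Tbps spike reported in the description.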

I think you are heading in the right direction. Shadowing Zabbix with snmpwalk will definitely help to narrow down the issue.

Comment by Matthew Leach [ 2019 Mar 09 ]

Hi Glebs,

I scripted up an snmpwalk every 10 seconds and logged the Counter64 response. It does appear that the issue is Cisco related, so we can happily rule out Zabbix here.

03-08-2019 23:26:02 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437273526
03-08-2019 23:26:12 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:22 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:32 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:43 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437281127
03-08-2019 23:26:53 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437281127
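
For a longer capture, drops like the one above can be picked out automatically. The following is a minimal sketch, assuming the log lines keep exactly the format shown here; the helper name `find_counter_drops` and the regular expression are hypothetical and not part of the script used in this report.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as:
# 03-08-2019 23:26:12 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<oid>\S+) = Counter64: (?P<value>\d+)$"
)


def find_counter_drops(lines: List[str]) -> List[Tuple[str, str, int, int]]:
    """Return (timestamp, oid, previous, current) for every decreasing reading."""
    drops: List[Tuple[str, str, int, int]] = []
    last: Dict[str, int] = {}  # last seen value per OID
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a counter line
        oid, value = match["oid"], int(match["value"])
        if oid in last and value < last[oid]:
            drops.append((match["ts"], oid, last[oid], value))
        last[oid] = value
    return drops


LOG = """\
03-08-2019 23:26:02 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437273526
03-08-2019 23:26:12 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:22 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:32 - IF-MIB::ifHCInOctets.436207616 = Counter64: 0
03-08-2019 23:26:43 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437281127
03-08-2019 23:26:53 - IF-MIB::ifHCInOctets.436207616 = Counter64: 2508730437281127
"""

drops = find_counter_drops(LOG.splitlines())
for ts, oid, previous, current in drops:
    print(f"{ts}: {oid} dropped from {previous} to {current}")
```

On the six lines above this reports a single drop, at 23:26:12, when the counter falls from 2508730437273526 to 0. The repeated zeros that follow are not flagged because the value does not decrease again.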

I'll have a TAC case raised with Cisco and get them involved to remedy it. I will also keep an eye on Zabbix 4.2, as it would be nice to discard these zero returns.

Thanks for the assist in getting to the bottom of it.

Generated at Fri Apr 26 05:23:21 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.