[ZBX-4164] SNMPv3 stops working sometimes Created: 2011 Sep 22  Updated: 2017 May 30  Resolved: 2013 Sep 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.6
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Michael Schwartzkopff Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: snmpv3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: SLES11.1



 Description   

From time to time SNMPv3 requests (items) just stop delivering data. snmpget from the command line for the same OID works.

I looked a little deeper into the packets exchanged between the Zabbix server and the network node. It seems that the Zabbix server gets confused about the device's time since last boot and therefore only gets the SNMPv3 error "usmStatsNotInTimeWindows.0".

Packets on the line:
1) x.x.x.x:60674 > y.y.y.y:161: { SNMPv3 { F=r } { USM B=0 T=0 U= } { ScopedPDU E= C= { GetRequest(14) R=1978855259 } } }
No boots and time set (B=0, T=0). This packet is exchanged to get the EngineID from the node.

2) y.y.y.y:161 > x.x.x.x:60674: { SNMPv3 { F= } { USM B=0 T=0 U= } { ScopedPDU E= 0x800x000x000x090x030x000x050x730xB70x9D0x40 C= { Report(33) R=1978855259 .1.3.6.1.6.3.15.1.1.4.0=914879 } } }
The node reports its EngineID. Please note that the node does NOT report boots and time since last boot (B=0, T=0).

3) x.x.x.x:60674 > y.y.y.y:161: { SNMPv3 { F=ar } { USM B=19 T=3528723 U=xxxx } { ScopedPDU E= 0x800x000x000x090x030x000x050x730xB70x9D0x40 C= { GetRequest(34) R=1978855258 .1.3.6.1.2.1.2.2.1.2.436207616 } } }
The Zabbix server asks the node for the OID in question. Please note that Zabbix suddenly "knows" the number of boots and the time since last boot of the node (B=19, T=3528723).

4) y.y.y.y:161 > x.x.x.x:60674: { SNMPv3 { F=a } { USM B=19 T=153362 U=xxxx } { ScopedPDU E= 0x800x000x000x090x030x000x050x730xB70x9D0x40 C= { Report(33) R=1978855258 .1.3.6.1.6.3.15.1.1.2.0=144804 } } }
The node reports the usmStatsNotInTimeWindows.0 error (OID 1.3.6.1.6.3.15.1.1.2.0) and reports its REAL time since last boot: T=153362

5) x.x.x.x:60674 > y.y.y.y:161: { SNMPv3 { F=ar } { USM B=19 T=3528724 U=xxxx } { ScopedPDU E= 0x800x000x000x090x030x000x050x730xB70x9D0x40 C= { GetRequest(34) R=1978855258 .1.3.6.1.2.1.2.2.1.2.436207616 } } }
The Zabbix server is NOT impressed by the error and the correct time, but sends out its SNMPv3 packet with the wrong time again. Please note the T=3528724, one second more than before.

6) y.y.y.y:161 > x.x.x.x:60674: { SNMPv3 { F=a } { USM B=19 T=153363 U=oper } { ScopedPDU E= 0x800x000x000x090x030x000x050x730xB70x9D0x40 C= { Report(33) R=1978855258 .1.3.6.1.6.3.15.1.1.2.0=144805 } } }
Of course the node sends the same error again.
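
Editor's sketch: the rejection in packets 4 and 6 matches the timeliness check that RFC 3414 (USM) requires of an authoritative engine. A message is outside the time window if its boot counter differs from the engine's own, or if its engine time differs by more than 150 seconds. The Python below is a minimal illustration of that rule using the values from the trace. The names EngineClock and in_time_window are made up for this example and are not taken from Zabbix or net-snmp.

```python
from dataclasses import dataclass

TIME_WINDOW_SECONDS = 150        # window size defined by RFC 3414
MAX_ENGINE_BOOTS = 2147483647    # latched value: engine must be reconfigured


@dataclass(frozen=True)
class EngineClock:
    """Hypothetical container for snmpEngineBoots / snmpEngineTime."""
    boots: int
    time: int


def in_time_window(local: EngineClock, msg: EngineClock) -> bool:
    """Authoritative-side check: True if the message clock is acceptable."""
    if local.boots == MAX_ENGINE_BOOTS:
        return False
    if msg.boots != local.boots:
        return False
    return abs(msg.time - local.time) <= TIME_WINDOW_SECONDS


# Values from the trace: the node's own clock (packet 4) and the clock
# that the Zabbix server put into its request (packet 3).
node = EngineClock(boots=19, time=153362)
request = EngineClock(boots=19, time=3528723)

print(in_time_window(node, request))   # False
print(request.time - node.time)        # 3375361 seconds ahead of the node
```

With these numbers the request is about 39 days ahead of the node's clock, so a usmStatsNotInTimeWindows report is the expected answer.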



 Comments   
Comment by Aleksandrs Saveljevs [ 2011 Sep 23 ]

A quick thought: this might or might not be similar to ZBX-2152. Please check that no two devices on the network have the same msgAuthoritativeEngineID (http://www.zabbix.com/documentation/1.8/manual/config/items#snmp_agent).
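
Editor's sketch: when comparing engine IDs between devices it can help to decode them. The snippet below is a minimal decoder for the variable-length SnmpEngineID layout from RFC 3411 (first bit set, 31-bit enterprise number, one format octet, then the payload), applied to the engine ID shown in the trace above. The function names are made up for this example, and the parser assumes the "0x80 0x00 ..." notation exactly as it appears in the trace.

```python
# Format octet values defined in the RFC 3411 SnmpEngineID convention.
FORMATS = {1: "ipv4", 2: "ipv6", 3: "mac", 4: "text", 5: "octets"}


def parse_trace_engine_id(text: str) -> bytes:
    """Turn '0x800x000x00...' as printed in the trace into raw bytes."""
    return bytes(int(part, 16) for part in text.split("0x") if part)


def decode_engine_id(raw: bytes) -> dict:
    """Decode the variable-length engine ID layout (first bit set)."""
    if not 5 <= len(raw) <= 32:
        raise ValueError("an SnmpEngineID is 5 to 32 octets long")
    head = int.from_bytes(raw[:4], "big")
    if not head & 0x80000000:
        raise ValueError("first bit is 0: older fixed layout, not handled here")
    fmt = raw[4]
    payload = raw[5:]
    if fmt == 3 and len(payload) == 6:
        value = ":".join(f"{b:02x}" for b in payload)   # MAC address
    else:
        value = payload.hex()
    return {
        "enterprise": head & 0x7FFFFFFF,
        "format": FORMATS.get(fmt, f"other({fmt})"),
        "value": value,
    }


raw = parse_trace_engine_id("0x800x000x000x090x030x000x050x730xB70x9D0x40")
print(decode_engine_id(raw))
# {'enterprise': 9, 'format': 'mac', 'value': '00:05:73:b7:9d:40'}
```

Two devices whose decoded values are identical would present the same msgAuthoritativeEngineID to the management station.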

Comment by Michael Schwartzkopff [ 2011 Sep 23 ]

Hi,

it really seems to be the same problem as described in ZBX-2152.

I looked, but we do NOT have any duplicated snmpEngineIDs in our net. Two other indicators show me that the real cause cannot be duplicated EngineIDs:

1) After a reboot of the device Zabbix got data again.

2) If you take a close look at the 3rd packet in the trace, Zabbix / net-snmp sends the *wrong* snmpEngineTime to the agent. Zabbix/net-snmp does not even bother to correct its snmpEngineTime after the device has reported the correct value.

Michael

Comment by richlv [ 2011 Sep 23 ]

"After a reboot of the device Zabbix got data again."

maybe some other device stops delivering the data, though ?

Comment by Michael Schwartzkopff [ 2011 Sep 23 ]

Hi,

As written in my first post, snmpget to the device always worked. This fact excludes such a trivial explanation.

Comment by richlv [ 2011 Sep 23 ]

as far as i'm aware, snmpget is not a valid test for such a problem, as it only tests a single device at a time. snmpget also worked flawlessly in ZBX-2152...

Comment by Michael Schwartzkopff [ 2011 Sep 23 ]

Yes. That is why it cannot be a problem of the SNMP agent. It must be a problem on the master side being located in Zabbix or net-snmp.

Comment by richlv [ 2011 Sep 23 ]

to clarify, if engine ids match, that is a problem on the agent side, but only if such agents are queried by the same client (or management station). snmpget just does not expose the problem.
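
Editor's sketch: one way to picture why matching engine IDs only hurt when a single client polls both agents. RFC 3414 lets a non-authoritative engine (the manager) move its stored notion of an agent's clock forward only: it updates when the received boots value is higher, or when boots are equal and the received time is higher than the latest one seen. If the stored clock is keyed by engine ID alone, a second device with the same ID and a longer uptime would pin the stored time at the larger value. The class below is a simplified model of that rule and not net-snmp's or Zabbix's actual code. The second device and its uptime are assumptions; the trace only shows which values the server sent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedClock:
    boots: int
    latest_time: int


class EngineClockCache:
    """Hypothetical manager-side cache keyed by engine ID only."""

    def __init__(self) -> None:
        self._by_engine_id: Dict[bytes, CachedClock] = {}

    def observe(self, engine_id: bytes, boots: int, time: int) -> None:
        """Apply the forward-only update rule to one received message."""
        entry = self._by_engine_id.get(engine_id)
        if entry is None:
            self._by_engine_id[engine_id] = CachedClock(boots, time)
        elif boots > entry.boots or (
            boots == entry.boots and time > entry.latest_time
        ):
            entry.boots, entry.latest_time = boots, time
        # Otherwise the received clock is older and is ignored.

    def lookup(self, engine_id: bytes) -> Optional[CachedClock]:
        return self._by_engine_id.get(engine_id)


shared_id = bytes.fromhex("8000000903000573B79D40")
cache = EngineClockCache()

# Assumed: another device with the same engine ID and a longer uptime.
cache.observe(shared_id, boots=19, time=3528723)
# Report from the node in the trace, carrying its real (smaller) time.
cache.observe(shared_id, boots=19, time=153362)

print(cache.lookup(shared_id).latest_time)   # 3528723, the report is ignored
```

In this model a reboot of the node raises its boots value to 20, which passes the update rule and replaces the stored clock. A one-shot snmpget starts with an empty cache, so it would not see the conflict.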

Comment by Michael Schwartzkopff [ 2011 Sep 23 ]
  • snmpget from the Zabbix machine does NOT show the problem.
  • snmpget generates 6 SNMP packets:
      • question and answer to get the snmpEngineID of the node
      • question and answer to get the snmpEngineBoots and snmpEngineTime
      • question and answer to get the OID in question.
  • Zabbix generates only 4 packets on the line. Please see these packets in my first post.

Questions:

  • How does Zabbix know how many boots the agent has?
  • Why does Zabbix use the wrong snmpEngineTime for its question?
  • Why does Zabbix continue to use the wrong snmpEngineTime even after the agent reported the correct value?

Perhaps this is not a Zabbix problem, but related to net-snmp. Please see also my posts on the mailing list there.
BUT: Is net-snmp able to cache EngineID, Boots and Time? Can the caching behaviour be triggered from Zabbix?

Comment by richlv [ 2011 Sep 23 ]

not sure about the difference - it could be lib-net-snmp doing something.

i was just pointing out that snmpget working does not exclude engineid being the source of the problem

Comment by Michael Schwartzkopff [ 2012 Mar 08 ]

Please see my discussion on the net-snmp mailing list:
http://sourceforge.net/mailarchive/forum.php?thread_name=201109221523.27823.misch%40schwartzkopff.org&forum_name=net-snmp-users

Comment by Eric Gearhart [ 2012 Apr 16 ]

I think I am being bitten by an issue that is closely related to what Michael is reporting... I'm running Zabbix 2.0.0rc2, and all my hosts are SNMPv3 hosts. Zabbix periodically just "quits working" when doing its SNMP item polls, but snmpgets/snmpwalks work perfectly.

At the company I work at, it's getting to the point where this is pushing us to abandon Zabbix as a possible monitoring solution, and use Cacti+thold instead.

Comment by Michael Schwartzkopff [ 2012 Apr 17 ]

Please see my discussion on the mailing list. It seems to be a Zabbix issue. Sorry that the company is not able to debug this issue further. We killed SNMPv3 because of this problem. Cacti is not a good alternative to Zabbix. Perhaps have a look at opennms.org.

Greetings,

Michael.

Comment by Michael Schwartzkopff [ 2013 Sep 23 ]

Root cause of the problem: Duplicate engineID of the host. See also:

ZBX-2152

Solution:

Add the line

engineIDType 3

to the snmpd.conf of net-snmp and restart the agent. The agent will calculate a new, RFC-conformant engineID and Zabbix will resume working.
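
Editor's sketch: before changing agent configuration it may be worth confirming which hosts actually share an engine ID. The helper below groups a host-to-engineID inventory and returns only the IDs used by more than one host. How the inventory is collected is left open (for example by reading snmpEngineID.0, OID 1.3.6.1.6.3.10.2.1.1.0, from each agent). The host names and engine ID values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def normalise(engine_id: str) -> str:
    """Lower-case the hex string and drop spaces and colons."""
    return engine_id.lower().replace(":", "").replace(" ", "")


def find_duplicate_engine_ids(inventory: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {engine_id: [hosts]} for every ID used by more than one host."""
    hosts_by_id = defaultdict(list)
    for host, engine_id in inventory.items():
        hosts_by_id[normalise(engine_id)].append(host)
    return {
        engine_id: sorted(hosts)
        for engine_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


# Hypothetical inventory: two hosts report the same engine ID.
inventory = {
    "host-a.example": "80 00 1F 88 80 AA BB CC DD",
    "host-b.example": "80:00:1f:88:80:aa:bb:cc:dd",
    "host-c.example": "80 00 1F 88 03 00 11 22 33 44 55",
}

print(find_duplicate_engine_ids(inventory))
# {'80001f8880aabbccdd': ['host-a.example', 'host-b.example']}
```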

Comment by Michael Schwartzkopff [ 2013 Sep 23 ]

It is not a Zabbix issue. It can be solved by reconfiguring the SNMP agent on the monitored host. See the solution in the comment above.

Comment by Michael Schwartzkopff [ 2013 Sep 23 ]

Case closed.

Comment by Michael Schwartzkopff [ 2014 Dec 06 ]

Update in ZBX-2152:

Hi,

the problem is documented in ZBX-4164. Since you write that Zabbix caches the credentials, it is the fault of Zabbix that SNMPv3 is not usable. The problem still exists in Zabbix 2.2 and probably in 2.4. Zabbix uses the wrong snmpEngineTime.

When I restart the Zabbix server, the items get collected again. This is proof that the fault is located within the Zabbix server.

Michael.
