[ZBX-7690] Caching of Dynamic SNMPv2 indexes
Created: 2014 Jan 22
Updated: 2017 May 30
Resolved: 2014 Oct 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.1
Fix Version/s: 2.2.2rc1, 2.3.0

Type: Incident report
Priority: Critical
Reporter: Mikhail Alipa
Assignee: Unassigned
Resolution: Fixed
Votes: 6
Labels: dynamicindexes, performance, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS release 6.5 (Final) (2.6.32-431.3.1.el6.x86_64)


Attachments: cpu.png (PNG), queue.png (PNG), zabbix_server.log (text)

 Description   

After upgrading Zabbix server from 2.0.10 to 2.2.1, I noticed a huge SNMPv2 queue in Administration->Queue.
Poller processes are about 40% busy.
I ran tcpdump to capture SNMPv2 packets and saw a lot of GetNextRequest packets being sent to hosts that have items created with a dynamic index (https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/snmp/dynamicindex).
Next, I opened the CPU graph for one of these hosts and saw high CPU utilization caused by the SNMPv2 packets. It looks like caching of these indexes works incorrectly and Zabbix server is continuously trying to rebuild the cache.



 Comments   
Comment by Michael [ 2014 Jan 31 ]

We are experiencing the same issue. SNMPv3 checks are also affected. On certain boxes CPU load increases dramatically. tcpdump shows tons of SNMP requests, which indicates that the SNMP index cache is not being used.

Comment by Aleksandrs Saveljevs [ 2014 Feb 03 ]

Regression, introduced in 2.2.1 when refactoring SNMP code under ZBXNEXT-98.

Comment by Aleksandrs Saveljevs [ 2014 Feb 03 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-7690 .

Comment by Aleksandrs Saveljevs [ 2014 Feb 03 ]

Fixed in pre-2.2.2 r42127 and pre-2.3.0 (trunk) r42128.

Comment by Jean Chiappini [ 2014 Feb 13 ]

I just installed Zabbix 2.2.2 and for me the problem is still present... For example, for the item .1.3.6.1.2.1.2.2.1.7[index,.1.3.6.1.2.1.2.2.1.2,GigabitEthernet0/1], the tree is walked on every check...
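
For readers unfamiliar with the key syntax in this example, the following is a minimal sketch of how such a dynamic index key breaks down into a data OID, a base OID to walk, and a search string. The function names are hypothetical and the parser is deliberately naive (it splits parameters on commas, so it would mishandle a search string containing a comma); it is not Zabbix's own key parser.

```python
import re

# Hypothetical helper, not Zabbix code: splits "OID[index,base OID,search string]".
KEY_RE = re.compile(r'^(?P<oid>[^\[]+)\[(?P<params>.*)\]$')


def parse_dynamic_index(key):
    """Return the data OID, base OID and search string of a dynamic index key."""
    m = KEY_RE.match(key.strip())
    if not m:
        raise ValueError("not a dynamic index key: %r" % key)
    # Naive split: assumes no commas inside the parameters; strips optional quotes.
    params = [p.strip().strip('"') for p in m.group("params").split(",")]
    if len(params) != 3 or params[0] != "index":
        raise ValueError("expected [index,<base OID>,<string>]: %r" % key)
    return {"data_oid": m.group("oid"), "base_oid": params[1], "search": params[2]}


def resolved_oid(parsed, index):
    """Once the walk of base_oid finds the index, it is appended to the data OID."""
    return "%s.%s" % (parsed["data_oid"], index)


parsed = parse_dynamic_index(
    ".1.3.6.1.2.1.2.2.1.7[index,.1.3.6.1.2.1.2.2.1.2,GigabitEthernet0/1]")
print(parsed["base_oid"], parsed["search"])
print(resolved_oid(parsed, 2))  # index 2 is an assumed example value
```

The walk of the base OID is the expensive step; the index cache discussed in this issue exists to avoid repeating it on every check.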

Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ]

Cannot reproduce. Is this item in the supported state? Could you please attach DebugLevel=4 with only this item enabled?

Comment by Jean Chiappini [ 2014 Feb 13 ]

The item is in the supported state. The cache works now, but it did not before. I had 100 pollers, so maybe the cache had not yet been filled for all pollers. I will upgrade our production server to 2.2.2 to see if the behaviour is the same.

Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ]

Yes, the cache is per poller. So if you have 100 pollers, it will take 100 walks in total until each poller builds a cache. After that it should always use the cache.

Also note that if the item is unsupported (because "GigabitEthernet0/1" is not found in ".1.3.6.1.2.1.2.2.1.2" tree), the walk will be done each time the item is processed because of the cache miss.
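
The two points above can be illustrated with a small model. This is only a sketch under stated assumptions, not the server's implementation: the `Poller` class, the round-robin distribution of checks across pollers, and the sample interface table are all hypothetical. It shows that a found index costs one walk per poller, while an index that is never found costs one walk per check.

```python
# Hypothetical model of a per-poller dynamic index cache; not Zabbix source code.
class Poller:
    def __init__(self, table):
        self.table = table   # index -> description, stands in for the walked tree
        self.cache = {}      # (base OID, search string) -> index, private to this poller
        self.walks = 0       # number of full walks this poller has performed

    def resolve(self, base_oid, search):
        key = (base_oid, search)
        if key in self.cache:
            return self.cache[key]          # cache hit: no walk needed
        self.walks += 1                     # cache miss: walk the whole tree
        for index, descr in self.table.items():
            if descr == search:
                self.cache[key] = index
                return index
        return None                         # not found: nothing cached, next check walks again


def total_walks(n_pollers, n_checks, search, table):
    """Spread checks over pollers round-robin (an assumption) and count all walks."""
    pollers = [Poller(table) for _ in range(n_pollers)]
    for i in range(n_checks):
        pollers[i % n_pollers].resolve(".1.3.6.1.2.1.2.2.1.2", search)
    return sum(p.walks for p in pollers)


IF_DESCR = {1: "GigabitEthernet0/0", 2: "GigabitEthernet0/1"}  # made-up sample table
print(total_walks(100, 1000, "GigabitEthernet0/1", IF_DESCR))  # 100: one walk per poller
print(total_walks(100, 1000, "GigabitEthernet0/9", IF_DESCR))  # 1000: one walk per check
```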

Comment by Jean Chiappini [ 2014 Feb 17 ]

Ok, I confirm that this works like you have explained, thank you.

Comment by wagner rocha [ 2014 Mar 10 ]

The problem is happening on version 2.2.2 as well!!

Comment by Aleksandrs Saveljevs [ 2014 Mar 11 ]

The previous investigation by Jean and me shows that 2.2.2 works correctly.

Could you please specify which OIDs fail to cache? Are all of your SNMP items with dynamic indexes supported? Does the problem appear a day after the server has been started?

Comment by Luiz Meier [ 2014 Mar 19 ]

Log of zabbix server with just one host with problem.

Comment by Luiz Meier [ 2014 Mar 19 ]

Hello!

I'm having this problem with ifInOctets and ifOutOctets. I attached a log file at DebugLevel=4 with just one host.

Comment by Aleksandrs Saveljevs [ 2014 Mar 20 ]

Luiz, according to the attached log, there is no problem:

$ grep In.zbx.snmp.walk zabbix_server.log
  2883:20140319:155453.642 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2886:20140319:155454.654 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2884:20140319:155553.701 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2885:20140319:155554.705 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'

While it is true that the cache is built four times, it is built for four different pollers (PIDs 2883, 2884, 2885, and 2886). Since currently the cache is per poller, this is expected.
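
The same check can be scripted for larger logs. The sketch below groups `zbx_snmp_walk()` entries by poller PID, OID and search string, and reports any combination that was walked more than once, which would suggest a cache that is not being used. The regular expression is an assumption based only on the four log lines quoted above.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this comment (an assumption).
WALK_RE = re.compile(
    r"^\s*(\d+):\d{8}:\d{6}\.\d{3} In zbx_snmp_walk\(\) oid:'([^']*)' search:'([^']*)'")


def walks_per_poller(lines):
    """Count walks for each (PID, OID, search string) combination."""
    counts = Counter()
    for line in lines:
        m = WALK_RE.match(line)
        if m:
            pid, oid, search = m.groups()
            counts[(int(pid), oid, search)] += 1
    return counts


LOG = """\
  2883:20140319:155453.642 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2886:20140319:155454.654 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2884:20140319:155553.701 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2885:20140319:155554.705 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
""".splitlines()

counts = walks_per_poller(LOG)
repeated = {key: n for key, n in counts.items() if n > 1}
print(len(counts), repeated)  # 4 {}: four pollers, none walked the same tree twice
```

To run it against a real log, replace `LOG` with the lines of the file, for example `open("zabbix_server.log")`.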

Comment by Luiz Meier [ 2014 Mar 21 ]

So I don't know what is happening. Just a few hosts keep giving me the error: Cannot find index "ifDescr" of the OID "IF-MIB::ifInOctets["index","ifDescr","interface-name"]": Timeout while connecting to "10.25.8.

There is no network problem and just a few hosts give me this information.

Comment by Paulo Raponi [ 2014 Apr 08 ]

What happens when this is logged:

In zbx_snmp_walk() oid:'IF-MIB::ifOperStatus' search:'(null)'

I see this search:'(null)' for a lot of items. I am using 2.2.2.

Comment by Aleksandrs Saveljevs [ 2014 Apr 09 ]

Paulo, this is expected if you are using SNMP low-level discovery.

Comment by richlv [ 2014 Oct 16 ]

(1) looks like this changed the default discovery rule example :

-ROW   |2      |NULL        |Local network|192.168.1.1-255|3600 |1     |
+ROW   |2      |NULL        |Local network|192.168.0.1-254|3600 |1     |

we should decide whether we want to document it in the upgrade notes or whatsnew

asaveljevs We probably don't, because only users with new installations will see the changes.

<richlv> thanks, CLOSED
