Details

      Description

      After upgrading Zabbix server from 2.0.10 to 2.2.1, I noticed a huge SNMPv2 queue in Administration->Queue.
      Poller processes are about 40% busy.
      I started tcpdump and listened for SNMPv2 packets. I saw a lot of GetNextRequest packets being sent to hosts that have items created with a dynamic index (https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/snmp/dynamicindex).
      Next, I opened the CPU graph for one of these hosts and saw high CPU utilization caused by SNMPv2 packets. It looks like caching of these indexes works incorrectly and the Zabbix server is continuously trying to rebuild the cache.

      1. zabbix_server.log (396 kB, Luiz Meier)
      2. cpu.png (50 kB)
      3. queue.png (33 kB)

        Activity

        Michael added a comment - - edited

        We are experiencing the same issue. SNMPv3 checks are also affected. On certain boxes CPU load increases dramatically. tcpdump shows tons of SNMP requests, which indicates that the SNMP index cache is not being used.

        Aleksandrs Saveljevs added a comment -

        Regression, introduced in 2.2.1 when refactoring SNMP code under ZBXNEXT-98.

        Aleksandrs Saveljevs added a comment -

        Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-7690 .

        Aleksandrs Saveljevs added a comment -

        Fixed in pre-2.2.2 r42127 and pre-2.3.0 (trunk) r42128.

        Jean Chiappini added a comment -

        I just installed Zabbix 2.2.2 and for me the problem is still present... For example, with the item .1.3.6.1.2.1.2.2.1.7[index,.1.3.6.1.2.1.2.2.1.2,GigabitEthernet0/1], the tree is walked on every check...
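
        For readers unfamiliar with the syntax, the item key above combines a base OID, the literal "index", an index OID to walk, and a string to search for. The sketch below shows one way such a key could be split and resolved against an already-walked table. The function names and the sample table are hypothetical, and this is not Zabbix's actual code; it also assumes the search string contains no commas.

```python
def parse_dynamic_index_key(key):
    """Split 'base[index,index_oid,search]' into (base, index_oid, search)."""
    base, _, rest = key.partition("[")
    if not rest.endswith("]"):
        raise ValueError("not a dynamic index key: %r" % key)
    parts = [p.strip().strip('"') for p in rest[:-1].split(",")]
    if len(parts) != 3 or parts[0] != "index":
        raise ValueError("not a dynamic index key: %r" % key)
    return base.strip(), parts[1], parts[2]


def resolve(key, walked):
    """Return the final OID to poll, or None if the search string is absent.

    'walked' is a dict of full OID -> value, standing in for the result of
    walking the index OID subtree (hypothetical sample data, no real SNMP).
    """
    base, index_oid, search = parse_dynamic_index_key(key)
    prefix = index_oid + "."
    for oid, value in walked.items():
        if oid.startswith(prefix) and value == search:
            # Append the matching row's index suffix to the base OID.
            return base + "." + oid[len(prefix):]
    return None


# Hypothetical ifDescr table as a walk might return it.
walked = {
    ".1.3.6.1.2.1.2.2.1.2.1": "GigabitEthernet0/0",
    ".1.3.6.1.2.1.2.2.1.2.2": "GigabitEthernet0/1",
}
key = ".1.3.6.1.2.1.2.2.1.7[index,.1.3.6.1.2.1.2.2.1.2,GigabitEthernet0/1]"
print(resolve(key, walked))  # prints .1.3.6.1.2.1.2.2.1.7.2
```

        The walk is the expensive part, which is why the result is worth caching between checks.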

        Aleksandrs Saveljevs added a comment -

        Cannot reproduce. Is this item in the supported state? Could you please attach DebugLevel=4 with only this item enabled?

        Jean Chiappini added a comment -

        The item is in the supported state. The cache works now, but it did not before. I had 100 pollers, so maybe the cache had not yet been filled for all pollers. I will upgrade our production server to 2.2.2 to see if the behaviour is the same.

        Aleksandrs Saveljevs added a comment - - edited

        Yes, the cache is per poller. So if you have 100 pollers, it will take 100 walks in total until each poller builds a cache. After that it should always use the cache.

        Also note that if the item is unsupported (because "GigabitEthernet0/1" is not found in ".1.3.6.1.2.1.2.2.1.2" tree), the walk will be done each time the item is processed because of the cache miss.
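
        The two points above can be illustrated with a small simulation. It is only a model of the behaviour described in this comment, not Zabbix code: the Poller class, the random assignment of checks to pollers, and the rule that only successful lookups are cached are all assumptions made for the sketch.

```python
import random


class Poller:
    """Hypothetical poller with its own private index cache."""

    def __init__(self):
        self.cache = {}  # (host, index_oid, search) -> index suffix
        self.walks = 0   # number of simulated SNMP walks performed

    def resolve(self, host, index_oid, search, table):
        cache_key = (host, index_oid, search)
        if cache_key in self.cache:
            return self.cache[cache_key]
        self.walks += 1  # cache miss: a full walk is needed
        index = table.get(search)
        if index is not None:
            # Assumption: only a successful lookup populates the cache.
            self.cache[cache_key] = index
        return index


def total_walks(n_pollers, n_checks, search, table, seed=0):
    """Run n_checks checks, each handled by a randomly chosen poller."""
    rng = random.Random(seed)
    pollers = [Poller() for _ in range(n_pollers)]
    for _ in range(n_checks):
        rng.choice(pollers).resolve("host1", ".1.3.6.1.2.1.2.2.1.2", search, table)
    return sum(p.walks for p in pollers)


# Hypothetical walked table: interface description -> index suffix.
table = {"GigabitEthernet0/1": "2"}

# Supported item: at most one walk per poller, however many checks run.
print(total_walks(100, 10000, "GigabitEthernet0/1", table))

# Unsupported item: every check misses the cache and triggers a walk.
print(total_walks(100, 10000, "GigabitEthernet9/9", table))
```

        In this model the first number is bounded by the poller count, while the second equals the number of checks, which matches the two cases described above.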

        Jean Chiappini added a comment -

        Ok, I confirm that this works like you have explained, thank you.

        wagner rocha added a comment -

        The problem is happening on version 2.2.2 as well!!

        Aleksandrs Saveljevs added a comment -

        The earlier investigation by Jean and me shows that 2.2.2 works correctly.

        Could you please specify which OIDs fail to cache? Are all of your SNMP items with dynamic indexes supported? Does the problem appear a day after the server has been started?

        Luiz Meier added a comment -

        Zabbix server log with just one affected host enabled.

        Luiz Meier added a comment -

        Hello!

        I'm having this problem with ifInOctets and ifOutOctets. I attached a log file recorded at DebugLevel=4 with just one host.

        Aleksandrs Saveljevs added a comment -

        Luiz, according to the attached log, there is no problem:

        $ grep In.zbx.snmp.walk zabbix_server.log
          2883:20140319:155453.642 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
          2886:20140319:155454.654 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
          2884:20140319:155553.701 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
          2885:20140319:155554.705 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
        

        While it is true that the cache is built four times, it is built for four different pollers (PIDs 2883, 2884, 2885, and 2886). Since currently the cache is per poller, this is expected.
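
        The same check can be automated for larger logs. The sketch below groups zbx_snmp_walk() entries by poller PID, OID and search string, and reports any combination walked more than once, which would point to a cache that is not being reused. The regular expression is written against the four log lines quoted above and is an assumption about the format; other Zabbix versions may log differently. Note also that LLD walks (search:'(null)') repeat by design and would be flagged too.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   2883:20140319:155453.642 In zbx_snmp_walk() oid:'...' search:'...'
WALK_RE = re.compile(
    r"^\s*(\d+):\d{8}:\d{6}\.\d{3} In zbx_snmp_walk\(\) "
    r"oid:'([^']*)' search:'([^']*)'"
)


def walks_per_poller(lines):
    """Count walks per (pid, oid, search) combination."""
    counts = defaultdict(int)
    for line in lines:
        m = WALK_RE.match(line)
        if m:
            pid, oid, search = m.groups()
            counts[(int(pid), oid, search)] += 1
    return dict(counts)


def repeated_walks(counts):
    """Keep only combinations walked more than once by the same poller."""
    return {k: n for k, n in counts.items() if n > 1}


sample = """\
  2883:20140319:155453.642 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2886:20140319:155454.654 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2884:20140319:155553.701 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
  2885:20140319:155554.705 In zbx_snmp_walk() oid:'.1.3.6.1.2.1.2.2.1.2' search:'GigabitEthernet0/1'
""".splitlines()

counts = walks_per_poller(sample)
print(sorted(pid for pid, _, _ in counts))  # PIDs that performed a walk
print(repeated_walks(counts))               # empty dict: no poller walked twice
```

        To run it against a real file, pass the open file object (for example open("zabbix_server.log")) to walks_per_poller() in place of the sample lines.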

        Luiz Meier added a comment -

        So I don't know what is happening. Just a few hosts keep giving me this message: Cannot find index "ifDescr" of the OID "IF-MIB::ifInOctets["index","ifDescr","interface-name"]": Timeout while connecting to "10.25.8.

        There is no network problem, and only a few hosts report this.

        Paulo Raponi added a comment -

        What does it mean when the log shows this:

        In zbx_snmp_walk() oid:'IF-MIB::ifOperStatus' search:'(null)'

        I see this search:'(null)' for a lot of items. I am using 2.2.2.

        Aleksandrs Saveljevs added a comment -

        Paulo, this is expected if you are using SNMP low-level discovery.

        richlv added a comment - - edited

        (1) It looks like this also changed the default discovery rule example:

        -ROW   |2      |NULL        |Local network|192.168.1.1-255|3600 |1     |
        +ROW   |2      |NULL        |Local network|192.168.0.1-254|3600 |1     |
        

        We should decide whether we want to document it in the upgrade notes or whatsnew.

        <Aleksandrs Saveljevs> We probably don't, because only users with new installations will see the changes.

        <richlv> thanks, CLOSED


          People

          • Assignee:
            Unassigned
            Reporter:
            Mikhail Alipa
          • Votes:
            6
            Watchers:
            8