[ZBXNEXT-4428] internal logic of SNMP bulk processing Created: 2018 Mar 16  Updated: 2024 Apr 18  Resolved: 2023 Jan 23

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: 3.4.7
Fix Version/s: 6.4.0beta5, 6.4 (plan)

Type: Change Request Priority: Major
Reporter: Tibor Pittich Assignee: Dmitrijs Goloscapovs
Resolution: Fixed Votes: 20
Labels: lld, pollers, snmp
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File image-2022-12-05-10-59-09-754.png     PNG File image-2022-12-05-11-18-04-592.png     PNG File image-2022-12-16-10-53-31-391.png     File snmp_info.diff     XML File template_internal_snmp_info.xml     XML File template_module_interfaces_snmp.xml     XML File template_snmp_network.xml     Text File zabbix_server_07-12-2022.log     Text File zabbix_server_D5-14-2022.log     Text File zabbix_server_D5-3_14-12-22.log     File zbx_export_hosts_07-12-2022.yaml    
Issue Links:
Causes
causes ZBXNEXT-8009 JSONpath optimizations Closed
Duplicate
Sub-task
depends on ZBX-19775 [bulk processing] reccuring tooBig er... Confirmed
part of ZBXNEXT-7786 Add possibility to set *context* Engi... Open
Sub-Tasks:
Key | Summary | Type | Status | Assignee
ZBXNEXT-8263 | Update template windows_snmp | Change Request (Sub-task) | Closed | Kristaps Naglis
ZBXNEXT-8286 | Update template linux_snmp | Specification change (Sub-task) | Closed | Kristaps Naglis
ZBXNEXT-8287 | Update template cisco_snmp | Specification change (Sub-task) | Closed | Kristaps Naglis
ZBXNEXT-8288 | Update template mikrotik_snmp | Specification change (Sub-task) | Closed | Kristaps Naglis
ZBXNEXT-8289 | Update template tplink_snmp | Specification change (Sub-task) | Closed | Kristaps Naglis
Team: Team A
Sprint: Sprint 92 (Sep 2022), Sprint 93 (Oct 2022), Sprint 94 (Nov 2022), Sprint 95 (Dec 2022), Sprint 96 (Jan 2023)
Story Points: 4

 Description   

I would like to ask for improvements to bulk processing of SNMP packets. I know how it works internally (https://www.zabbix.com/documentation/3.4/manual/config/items/itemtypes/snmp#internal_workings_of_bulk_processing), but it isn't working well in my environment.

I'm trying to run LLD against a big Cisco ASR9k router which contains thousands of interfaces.

From the command line:
$ snmpbulkwalk -mall -v2c -c community router.net ifName
I'm able to get all 2k+ interfaces within a few seconds. Net-SNMP uses max-repeaters 10 by default. Playing with the -Cr<NUM> switch, I get better results as <NUM> gets bigger; with 100, for example, it's still possible to get the response in one packet.
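
A rough way to see why a larger max-repeaters value helps: in a single-column walk, each GETBULK request returns at most that many rows, so the number of round trips is roughly the row count divided by it. The sketch below is only that arithmetic. The function name is hypothetical, and it ignores the extra request that detects the end of the subtree as well as any truncation the agent applies to fit its response size.

```python
import math


def bulk_round_trips(total_rows: int, max_repetitions: int) -> int:
    """Approximate number of GETBULK requests needed to walk one column."""
    if total_rows < 0 or max_repetitions < 1:
        raise ValueError("total_rows must be >= 0 and max_repetitions >= 1")
    return math.ceil(total_rows / max_repetitions)


# 2000 rows stands in for the "2k+ interfaces" mentioned above.
print(bulk_round_trips(2000, 10))   # Net-SNMP default of 10
print(bulk_round_trips(2000, 100))  # -Cr100
```

With 2000 rows this gives 200 round trips at the default and 20 with -Cr100.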

But Zabbix uses its own logic. It starts slowly with a low max_vars in function zbx_snmp_walk (zabbix_server/poller/checks_snmp.c) and then, after some successfully retrieved responses, increases max_vars. According to tcpdump, in my case it reached a maximum of 15; then the SNMP daemon started responding slowly (maybe because of the earlier small requests), and Zabbix decreased max_vars, which is even worse, and the whole LLD failed.
The global timeout in the poller configuration is capped at 30s, which does not help either.
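
To make the cost of the slow ramp-up concrete, here is a toy model, not Zabbix's actual code. It assumes the batch size grows by a fixed step after every successful response up to a cap; the function name, the step of 1, and the 0.25 s per request are all assumptions chosen for illustration, and only the cap of 15 and the 30 s timeout come from the report above. It also does not model the decrease of max_vars after slow responses.

```python
def requests_needed(total_rows: int, start: int, step: int, cap: int) -> int:
    """Count requests when the batch size grows by `step` per success, up to `cap`."""
    if total_rows < 0 or start < 1 or step < 0 or cap < start:
        raise ValueError("invalid parameters")
    fetched = 0
    batch = start
    requests = 0
    while fetched < total_rows:
        fetched += batch          # one request returns `batch` rows
        requests += 1
        batch = min(batch + step, cap)
    return requests


ASSUMED_SECONDS_PER_REQUEST = 0.25  # assumption, not a measurement
TIMEOUT_SECONDS = 30                # maximum poller timeout mentioned above

ramped = requests_needed(2000, start=1, step=1, cap=15)
static = requests_needed(2000, start=100, step=0, cap=100)

for label, count in (("ramped to 15", ramped), ("static 100", static)):
    seconds = count * ASSUMED_SECONDS_PER_REQUEST
    print(label, count, "requests,", seconds, "s, fits:", seconds <= TIMEOUT_SECONDS)
```

Under these assumptions the ramped walk needs 141 requests against 20 for a static value of 100, which is the kind of gap that a configurable static max-repeaters would close.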

Disabling bulk processing for this device is a no-go, because an snmpwalk over the ifName tree took ~2m30s (thousands of interfaces).

So what I would like is some way to fine-tune SNMP bulk processing.

Suggestions:

  • ability to disable the dynamic change of max-repeaters
  • possibility to define a static value for max-repeaters (fine-tuning of the response)
  • implement it in the same user interface where bulk requests can be disabled
  • global config parameter for zabbix_server

Please take a look at it. Thanks.



 Comments   
Comment by Oleksii Zagorskyi [ 2018 Mar 16 ]

I have to say that the term "bulk" in Zabbix is not related in any way to "walk" in SNMP terms.
So I think some of your conclusions might be wrong as far as Zabbix is concerned.

This can be related to ZBXNEXT-4103

Comment by Oleksii Zagorskyi [ 2018 Mar 16 ]

I looked at tcpdump for an LLD rule without discovered items and with them.

It looks like "max_vars" in Zabbix is used for two different cases in SNMP: as "max_repetitions" for getBulkRequest and as a limit for a get-request with multiple OIDs.
When I had only the rule enabled (with 3 network interfaces on the host), I saw at most 4 for "max_repetitions". But when I enabled the regular discovered items, "max_repetitions" increased to 27. That's not a problem in itself, but for each OID the LLD rule needs to discover, Zabbix server received redundant data (more OIDs than required), i.e. the response packets were bigger than actually needed.
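
The redundancy described above can be put into numbers with a small sketch. It assumes the agent always has further OIDs after the walked column and fills every response with max_repetitions varbinds; the function name is hypothetical and this is not how Zabbix itself accounts for it.

```python
def walk_overshoot(rows_in_column: int, max_repetitions: int) -> tuple[int, int, int]:
    """Return (requests, varbinds_received, varbinds_outside_column) for one column walk."""
    if rows_in_column < 0 or max_repetitions < 1:
        raise ValueError("invalid parameters")
    requests = 0
    received = 0
    # The walk only stops once a varbind outside the column has been seen.
    while received <= rows_in_column:
        requests += 1
        received += max_repetitions
    return requests, received, received - rows_in_column


print(walk_overshoot(3, 4))   # 3 interfaces, max_repetitions 4
print(walk_overshoot(3, 27))  # 3 interfaces, max_repetitions 27
```

For the 3-interface host, max_repetitions 4 yields one surplus varbind per column, while 27 yields 24 surplus varbinds in a single response.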

Comment by Smirnov Alexey [ 2019 Sep 06 ]

Another problem with LLD is that Zabbix searches for a single max-repetitions value for all items, but some SNMP devices can successfully answer some SNMP requests with max-repetitions=40 (for example) and other SNMP requests only with max-repetitions=16.

Comment by Vladislavs Sokurenko [ 2020 Jul 20 ]

If anyone is experiencing the issue, could you please let us know whether the problem occurs when no SNMP items have been discovered yet and only discovery is performed, or when discovery is done simultaneously with data collection? Does snmpbulkwalk with smaller values of max-repeaters, for example 1, also have problems, and is using snmpbulkwalk while Zabbix server is running also problematic? This would greatly help to pinpoint the problem. Also, please let us know if you are able to test a patch where max-repeaters is configurable, to see if it helps.

Comment by Giorgio [ 2021 Mar 22 ]

We have a similar problem with Cisco ASR9k devices.

The problem occurs during the polling phase, when snmp-get packets are built with a high number of OIDs.

Cisco Support stated that the standard policy Zabbix uses to choose the SNMP packet size is too heavy for these devices, since their SNMP agent gets stuck even before Zabbix finds the correct packet size.

In dbcache.h there are a couple of constants that probably determine the max packet size.

#define MAX_SNMP_ITEMS 128
#define MAX_POLLER_ITEMS 128

It would probably be enough to set these values to something smaller (e.g. 32). A server parameter would be a very good solution.
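
If these constants do cap how many items go into one request, as suspected above, the effect of lowering them can be sketched as simple chunking. This is an illustration only, not Zabbix's implementation; the function name and the list of ifInOctets OIDs are made up for the example.

```python
def split_into_requests(oids: list[str], max_items: int) -> list[list[str]]:
    """Split a list of OIDs into requests carrying at most `max_items` OIDs each."""
    if max_items < 1:
        raise ValueError("max_items must be >= 1")
    return [oids[i:i + max_items] for i in range(0, len(oids), max_items)]


# Hypothetical workload: ifInOctets for interface indexes 1..128.
oids = [f"1.3.6.1.2.1.2.2.1.10.{index}" for index in range(1, 129)]

print(len(split_into_requests(oids, 128)))  # limit of 128
print(len(split_into_requests(oids, 32)))   # limit of 32
```

A limit of 32 turns the single 128-OID request into four smaller ones, trading more round trips for lighter packets on the device.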

Comment by Oleksii Zagorskyi [ 2021 Dec 16 ]

Giorgio, the possibility to adjust the 128 limit definition is requested in ZBXNEXT-5080.

Comment by Craig Hopkins [ 2022 Sep 02 ]

Is there any news on this? ZBX-21069 is getting frustrating. 

Comment by Dmitrijs Goloscapovs [ 2022 Dec 16 ]

Available in versions:
  - pre-6.4.0rc1 - a818d516d9b

Comment by Martins Valkovskis [ 2022 Dec 29 ]

Updated documentation:

Generated at Wed Apr 24 17:06:07 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.