[ZBX-12956] queue calculation may give false positives for not fast *bulk* snmp operations Created: 2017 Oct 27 Updated: 2025 Jun 11 Resolved: 2017 Nov 13 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | None |
Affects Version/s: | 3.4.3 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Major |
Reporter: | Oleksii Zagorskyi | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 1 |
Labels: | bulk, queue, scheduling, snmpbulk |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Sprint: | Sprint 20, Sprint 21 |
Description |
Not a bug, but... Just FYI.

I'm using a patch from ZBXNEXT-4103 to give my SNMPv3 device a chance to work in bulk mode, but that patch is not related to the current case. When I use it (bulk really works, with some notes), I see unexpected queue behavior: the queue keeps jumping.

When I captured SNMP traffic, I could see that the maximum number of OIDs my device can handle in one request is close to ~60 and, as far as I was able to figure out, it depends on the size of the data the device has to reply with. Zabbix nicely splits the request into 2 parts and repeats the requests in the same "snmp session" when it gets "error-status: tooBig".

Internal monitoring: if the update interval of the queue measurement item is set to 3 seconds, the spikes can be clearly seen. See graph.

Yes, I know that polling of such items depends on the interfaceid, so the scheduled time is the same for all 1000 items. With 4 hosts and many pollers it gets no better.

The request is to reconsider this behavior and maybe improve some part of it (scheduling or queue calculation).

I have a bunch of SNMP traffic captures with different numbers of pollers and with 1 or 4 hosts, captured from server start and covering the following 3-4 polling batches. They can be provided on request.
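The splitting described above can be pictured with a small model. This is a sketch under simplifying assumptions, not Zabbix's actual code: `poll` and `max_per_reply` are hypothetical names, the device limit is modelled as a fixed OID count (on the real device it depends on the reply size), and any limit Zabbix may remember between polls (such as the max_snmp_succeed value mentioned in the comments) is ignored. It shows how one batch of 1000 items sharing a single scheduled time turns into many sequential round trips.

```python
def poll(oids: list, max_per_reply: int, log: list) -> int:
    """Hypothetical model: send all `oids` in one request; on tooBig, halve.

    Returns the number of request/response round trips. The size of every
    request is appended to `log`. The device limit is simplified to a fixed
    OID count (assumption).
    """
    log.append(len(oids))
    if len(oids) <= max_per_reply:
        return 1  # the device answered the whole request
    mid = len(oids) // 2
    # tooBig: one wasted round trip, then both halves are requested in turn
    return (1 + poll(oids[:mid], max_per_reply, log)
              + poll(oids[mid:], max_per_reply, log))


sizes: list = []
trips = poll(list(range(1000)), 60, sizes)
answered = sum(1 for n in sizes if n <= 60)
print(trips, answered)  # 63 32
```

Under these assumptions the first poll of 1000 items costs 63 round trips, 31 of which end in tooBig, all executed one after another for items that were due at the same moment.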
Comments |
Comment by Andris Zeila [ 2017 Oct 30 ] |
Ideally we should take dc_interface->max_snmp_succeed into account when calculating the seed for nextchecks in the get_item_nextcheck_seed() function. Instead of returning interfaceid when bulk is enabled, we should return some hash based on interfaceid, the number of items using this interface and the max_snmp_succeed value. We do keep lists of SNMP items by interface, so it should be possible. Something like interfaceid * itemid % (number of items / max_snmp_succeed) (when number of items > max_snmp_succeed).
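A minimal sketch of the seeding idea in the comment above, written in Python rather than the server's C and using hypothetical names: `nextcheck_seed` is not the real get_item_nextcheck_seed(). It assumes the seed only shifts an item's nextcheck within its update interval, so items with different seeds are polled at different moments. It deviates from the suggested formula in two places: the group count is rounded up (1000 items with max_snmp_succeed = 60 need 17 groups, because 16 groups would average 62.5 items each), and the group index is taken from itemid alone and added to interfaceid, because interfaceid * itemid % groups maps every item to the same seed whenever interfaceid is a multiple of the group count.

```python
from collections import Counter


def nextcheck_seed(interfaceid: int, itemid: int, items_on_interface: int,
                   max_snmp_succeed: int) -> int:
    """Hypothetical seed for the bulk SNMP items of one interface.

    While all items fit into one bulk request they share the seed
    `interfaceid` (one request per interface). Otherwise items are spread
    round-robin over enough groups that, for consecutive itemids, no group
    exceeds max_snmp_succeed.
    """
    if max_snmp_succeed <= 0 or items_on_interface <= max_snmp_succeed:
        return interfaceid
    groups = -(-items_on_interface // max_snmp_succeed)  # ceiling division
    return interfaceid + itemid % groups


# 1000 hypothetical consecutive itemids on interface 42, device limit 60
seeds = Counter(nextcheck_seed(42, i, 1000, 60) for i in range(5001, 6001))
print(len(seeds), max(seeds.values()))  # 17 59
```

With consecutive itemids each group gets 58 or 59 items, so every group should fit into one bulk request of at most 60 OIDs.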
Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 03 ] |
Can someone tell me if this is somehow related to ZBXNEXT-3988?
zalex_ua: I'd say ZBXNEXT-3988 has a "top level" influence on the current case as well, but the issue described here is an independent, specific one.
glebs.ivanovskis: Thank you!
Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 08 ] |
As a workaround, there is an optional <from> parameter in zabbix[queue,<from>,<to>]. It can be increased (the default value is 6 seconds) to make queue readings more stable. This introduces some latency in detecting delayed checks, but <from> does not need to be very high to account for slow SNMP polling, so the latency should be tolerable.
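The effect of raising <from> can be sketched from the documented meaning of the item: zabbix[queue,<from>,<to>] counts items delayed by at least <from> seconds but less than <to> seconds. `queue_count` and the delay figures below are hypothetical, chosen only to mimic a bulk batch whose tail is still waiting when the queue is measured.

```python
import math


def queue_count(delays, from_s=6.0, to_s=math.inf):
    """Hypothetical model of zabbix[queue,<from>,<to>]: count items whose
    delay (seconds past their scheduled check) lies in [from_s, to_s)."""
    return sum(1 for d in delays if from_s <= d < to_s)


# 900 items polled promptly, 100 at the tail of a slow bulk batch
delays = [2.0] * 900 + [7.0] * 100
print(queue_count(delays))        # 100: the batch tail shows up as a spike
print(queue_count(delays, 10.0))  # 0: a higher <from> hides it
stuck = delays + [30.0]           # one genuinely stuck item
print(queue_count(stuck, 10.0))   # 1: still detected, just later
```

A genuinely delayed item is still counted once its delay passes <from>; the extra seconds it waits before being counted are the detection latency the workaround trades for stable readings.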
Comment by Rostislav Palivoda (Inactive) [ 2017 Nov 13 ] |
Continues under ZBXNEXT-4103 |
Comment by Oleksii Zagorskyi [ 2018 Jul 12 ] |
Just want to leave a note here regarding my statement about the "data size the device should reply by". Here is an example output of the "show snmp" management command on a Cisco switch:

Router# show snmp
Chassis: 01234567
37 SNMP packets input
    0 Bad SNMP version errors
    4 Unknown community name
    0 Illegal operation for community name supplied
    0 Encoding errors
    24 Number of requested variables
    0 Number of altered variables
    0 Get-request PDUs
    28 Get-next PDUs
    0 Set-request PDUs
78 SNMP packets output
    0 Too big errors (Maximum packet size 1500)
    0 No such name errors
    0 Bad values errors
    0 General errors
    24 Response PDUs
    13 Trap PDUs
SNMP logging: enabled
    Logging to 192.168.1.1.162, 0/10, 13 sent, 0 dropped.
SNMP Manager-role output packets
    4 Get-request PDUs
    4 Get-next PDUs
    6 Get-bulk PDUs
    4 Set-request PDUs
    23 Inform-request PDUs
    30 Timeouts
    0 Drops
SNMP Manager-role input packets
    0 Inform response PDUs
    2 Trap PDUs
    7 Response PDUs
    1 Responses with errors
SNMP informs: enabled
    Informs in flight 0/25 (current/max)
    Logging to 192.168.1.1.162
        4 sent, 0 in-flight, 1 retries, 0 failed, 0 dropped
    Logging to 192.168.1.1.162
        0 sent, 0 in-flight, 0 retries, 0 failed, 0 dropped

Please note this part:

78 SNMP packets output
    0 Too big errors (Maximum packet size 1500)

It looks like this confirms what I initially supposed: if the prepared reply is larger than 1500 bytes (which depends on the values, whose length is variable and unpredictable), the SNMP device replies with a tooBig error.
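Why the number of OIDs per reply depends on the values can be illustrated with a rough size estimate. All parameters are assumptions, not measured: the 1500-byte limit is taken to apply to the whole reply message, the fixed message and PDU overhead is guessed at 60 bytes (SNMPv3 headers are likely larger), every OID is assumed to encode to 12 bytes, and each varbind adds about 6 bytes of BER tag/length headers (short-form lengths for the varbind sequence, the OID and the value). `varbinds_that_fit` is an illustrative name, not part of any SNMP library.

```python
def varbinds_that_fit(value_lengths, oid_len=12, max_packet=1500,
                      pdu_overhead=60, per_varbind_headers=6):
    """Hypothetical estimate of how many varbinds fit into one reply.

    All size parameters are assumptions; real BER encoding sizes vary
    with the OIDs, value types and SNMP version.
    """
    used = pdu_overhead
    count = 0
    for vlen in value_lengths:
        size = oid_len + vlen + per_varbind_headers
        if used + size > max_packet:
            break  # the next varbind would push the reply over the limit
        used += size
        count += 1
    return count


print(varbinds_that_fit([4] * 100))   # 65: e.g. small integer counters
print(varbinds_that_fit([30] * 100))  # 30: e.g. 30-byte strings
```

Under these assumptions the same 1500-byte limit admits 65 small counters but only 30 short strings, so an OID count that worked on one poll need not work on the next if the values grow.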