[#ZBX-19775] [bulk processing] recurring tooBig error-status

[ZBX-19775] [bulk processing] recurring tooBig error-status Created: 2021 Aug 04 Updated: 2022 Oct 08
Status:	Confirmed
Project:	ZABBIX BUGS AND ISSUES
Component/s:	Server (S)
Affects Version/s:	5.4.2
Fix Version/s:	None

Type:	Problem report
Priority:	Major
Reporter:	thomas
Assignee:	Zabbix Development Team
Resolution:	Unresolved
Votes:
Labels:	SNMP, bulk
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified

Environment:

Zabbix installation "All In One" VM
==============================
CentOS Linux release 8.1.1911 (Core)
==============================
mariadb.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-backup.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-common.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-connector-c.x86_64 3.0.7-1.el8
mariadb-connector-c-config.noarch 3.0.7-1.el8
mariadb-errmsg.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-gssapi-server.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-server.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
mariadb-server-utils.x86_64 3:10.3.17-1.module_el8.1.0+217+4d875839
net-snmp.x86_64 1:5.8-20.el8
zabbix-agent.x86_64 5.4.2-1.el8
zabbix-nginx-conf.noarch 5.4.2-1.el8
zabbix-server-mysql.x86_64 5.4.2-1.el8
zabbix-sql-scripts.noarch 5.4.2-1.el8
zabbix-web.noarch 5.4.2-1.el8
zabbix-web-deps.noarch 5.4.2-1.el8
zabbix-web-mysql.noarch 5.4.2-1.el8

Attachments:

Wireshark IO Graphe - snmp.variable_bindings.png

Issue Links:

Sub-task
part of	~~ZBXNEXT-4428~~	internal logic of SNMP bulk processing	Closed

Description

I'm new to Zabbix, but my first impression is that you do a very good job. Bulk processing is a great feature!

Steps to reproduce:

Difficult to reproduce: I'm polling a Cisco device with "Use bulk requests" enabled, and every hour I get a log message on the Cisco device like this one:

2021 Aug 3 01:00:16 switch %SNMPD-3-ERROR: SNMP log error : SNMP Operation (GET) failed. Reason:1 reqId (904207711) errno (2) error index (0)

It seems to be related to the bulk processing mechanism, which tries to determine the maximum number of SNMP items Zabbix can retrieve from a device in one SNMP request.

For my device, max_succeed=59 seems to work most of the time, but every hour one specific get-request, related to a network discovery rule, fails.

Result:
The Cisco device's log file is spammed every hour with a log message like:

2021 Aug 3 01:00:16 switch %SNMPD-3-ERROR: SNMP log error : SNMP Operation (GET) failed. Reason:1 reqId (904207711) errno (2) error index (0)

Expected:
I don't understand when "max_succeed" is calculated, but it should not oscillate between two values. Some margin or historical data should prevent such oscillation.

Comments

Comment by thomas [ 2021 Aug 10 ]

Hello,

I did another test (see attached picture "Wireshark IO Graphe - snmp.variable_bindings") and found the following:

(1) The Cisco host is added to monitoring with the usual template at around 16:12:43:
regular items (9 OIDs)
   sysUptime every 30s (1 OID)
   sysObjectID every 15m (1 OID)
   System every 1h (sysDescr...) (5 OIDs)
   entPhysicalSerialNum + entPhysicalModelName every 1h (2 OIDs)
discovery items (91 OIDs discovered)
   dot3StatsDuplexStatus + ifOperStatus every 1m (16 OIDs)
   interfaces in/out (ifHCInOctets...) every 3m (48 OIDs)
   ifHighSpeed every 5m (8 OIDs)
   ifType every 1h (8 OIDs)
   Entity every 1h (11 OIDs)
Item discovery occurs every 1h, so until 17:00 only regular items are polled: max_items = max_succeed = 2 (sysUptime + sysObjectID every 15m) and min_fail = MAX_SNMP_ITEMS + 1 = 129.

(2) At 17:00, all regular items are polled and discovery processing occurs. From then on, discovered and regular items are polled at their configured intervals, and max_items is multiplied by 3/2 after each successful poll.

(3) The first failure occurs at 17:09:43 (SNMP error-status = tooBig following execution of zbx_snmp_process_standard, polling 63 items). At this point, max_succeed = 42 and min_fail = 63.

(4) After that, max_items is again increased after each successful poll, but only by one at a time.

(5) At 18:00:43, a new failure occurs (SNMP error-status = tooBig following execution of zbx_snmp_process_standard, polling 59 items). At this point, max_succeed = 58 and min_fail = 59.

(6) max_items is no longer increased, because DCconfig_get_suggested_snmp_vars_nolock returns MAX(dc_snmp->max_succeed + 1 - 2, dc_snmp->min_fail - 1), which is 58 (58+1-2=57 < 59-1=58).

(7) At 19:00:43, a new failure occurs (SNMP error-status = tooBig following execution of zbx_snmp_process_standard, polling 58 items). At this point, max_succeed = 58 and min_fail = 58.

(8) max_items is still not increased, because DCconfig_get_suggested_snmp_vars_nolock returns MAX(dc_snmp->max_succeed + 1 - 2, dc_snmp->min_fail - 1), which is 57 (58+1-2=57 == 58-1=57).
At 20:00:43, a new failure occurs (SNMP error-status = tooBig following execution of zbx_snmp_process_standard, polling 57 items). At this point, max_succeed = 58 and min_fail = 57.
From then on, max_items = 57 for every poll, even though some of them fail (almost every hour).
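
The numbers in steps (1) to (8) can be replayed with a small Python sketch. This is not the Zabbix source code: the helper names grow, growth_sequence and fallback are hypothetical, the truncating integer division is an assumption about how "increased by 3/2" is computed, and the fallback expression is simply the one quoted in steps (6) and (8).

```python
MAX_SNMP_ITEMS = 128  # step (1): min_fail starts at MAX_SNMP_ITEMS + 1 = 129


def grow(max_succeed):
    """Growth phase from step (2): multiply by 3/2 (assumed to truncate)."""
    return max_succeed * 3 // 2


def growth_sequence(start, count):
    """Return `count` successive request sizes, beginning with `start`."""
    sizes = [start]
    while len(sizes) < count:
        sizes.append(grow(sizes[-1]))
    return sizes


def fallback(max_succeed, min_fail):
    """Expression quoted in steps (6) and (8)."""
    return max(max_succeed + 1 - 2, min_fail - 1)


print(growth_sequence(2, 10))  # [2, 3, 4, 6, 9, 13, 19, 28, 42, 63]
print(fallback(58, 59))        # 58, as in step (6)
print(fallback(58, 58))        # 57, as in step (8)
print(fallback(58, 57))        # 57, the value max_items then stays at
```

Starting from 2, the assumed 3/2 growth passes through 42 and then 63, which matches the values reported at the first failure in step (3). The fallback expression then yields 58, 57 and 57 for the three later states, which is consistent with max_items remaining at 57.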

I think that if max_succeed is greater than min_fail, it should be lowered to min_fail. Alternatively, max_items should be configurable in the GUI.
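
One possible reading of the first suggestion, as a hypothetical sketch only: SnmpBulkState, record_failure and next_size are names invented here, taking the minimum when updating min_fail is an assumption, and the fallback expression is again the one quoted in step (6). The clamp takes the proposal literally and lowers max_succeed to min_fail whenever a failure leaves max_succeed above it.

```python
from dataclasses import dataclass


@dataclass
class SnmpBulkState:
    """Hypothetical per-interface state; field names follow the comment above."""
    max_succeed: int
    min_fail: int


def record_failure(state: SnmpBulkState, num_vars: int) -> None:
    """Record a failed request of num_vars items, then apply the proposed clamp."""
    # Assumption: min_fail only ever moves down to the smallest failing size.
    state.min_fail = min(state.min_fail, num_vars)
    # Proposed change: never keep max_succeed above min_fail.
    if state.max_succeed > state.min_fail:
        state.max_succeed = state.min_fail


def next_size(state: SnmpBulkState) -> int:
    """Fallback expression as quoted in step (6)."""
    return max(state.max_succeed + 1 - 2, state.min_fail - 1)


state = SnmpBulkState(max_succeed=58, min_fail=58)  # state after step (7)
record_failure(state, 57)                           # the 20:00:43 failure
print(next_size(state))                             # 56
```

With the clamp, the failure at 57 items moves the next suggested request size to 56 instead of leaving it at 57.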

Comment by thomas [ 2021 Aug 11 ]

Side note: the Cisco device's log file is spammed only when the severity level for the snmpd process is greater than or equal to 3.

Generated at Wed Jul 16 09:46:44 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.
