[ZBXNEXT-747] More efficient SNMP trapping Created: 2011 Apr 13 Updated: 2025 Mar 24 Resolved: 2011 Sep 09 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 1.9.3 (alpha) |
Fix Version/s: | 1.9.6 (beta) |
Type: | New Feature Request | Priority: | Major |
Reporter: | Alexander Vladishev | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 6 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
Issue Links: |
|
Description |
Zabbix will support new SNMP trap handling through native (script-less) integration with the NET-SNMP trap daemon. The integration will not require shell scripts or any other heavy processing: the NET-SNMP trap daemon will report directly to Zabbix Server using a standard high-performance IPC method (one of: file, FIFO, pipe, etc.). Optional configurable trap pre-processors (such as SNMPTT) will be supported. Zabbix will automatically sort traps and assign them to the corresponding hosts based on IP address. |
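The last step of the description, matching a trap to a host by its source address, could look roughly like the sketch below. It is only an illustration: the `trap_host` struct, the `find_host_by_ip` function and the sample addresses are hypothetical and are not Zabbix's actual data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping of an interface IP address to a monitored host. */
struct trap_host {
    const char *ip;   /* interface address as text, e.g. "192.0.2.10" */
    const char *name; /* host name the trap should be assigned to */
};

/* Return the name of the host whose interface address equals the trap's
 * source address, or NULL if no configured host matches. */
const char *find_host_by_ip(const struct trap_host *hosts, size_t n,
                            const char *src_ip)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hosts[i].ip, src_ip) == 0)
            return hosts[i].name;
    }
    return NULL;
}
```

A NULL result means the trap came from an unknown address; the description above does not say what should happen to such traps, so the sketch leaves that decision to the caller.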
Comments |
Comment by richlv [ 2011 May 17 ] |
Any updates on where the development is currently heading?
<rudolfs> Haven't got around to the final tests yet (working on other issues...). snmptrapd has various issues when configured for direct output (to a file/FIFO). It seems that the handler does not involve forking or any other heavy processing. |
Comment by Oleksii Zagorskyi [ 2011 May 18 ] |
I decided to publish some of my thoughts here (part of a discussion with the devs) for public discussion.
Q: Passing traps directly to Zabbix server without intermediate layers would be much faster.
A: The "classic" shell handler (the current official implementation) runs a shell script, and then zabbix_sender, every time an SNMP trap is received. As a result the system is loaded and snmptrapd works slowly because of the constant process forking. My in-depth experiments showed that snmptrapd cannot receive and log more than ~16-20 traps/second in real time. When traps arrive faster than that, snmptrapd starts buffering the received traps and logs them with a delay.
The "perl" handler is loaded when snmptrapd starts and stays resident in memory while snmptrapd is running. When a trap is received, it makes a TCP connection to zabbix_server and sends the message (trap) the way zabbix_sender does. So there are no forks at all, and as a result it is very fast. You could send the raw message to zabbix_server in some optimal way and parse it on the server side (C code). I do not know of any other "more direct" way without intermediate layers.
Additional details of my experiment (in Russian) are available to developers on the internal wiki page. |
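For illustration, "sending the message like zabbix_sender" comes down to framing a payload and writing it to a TCP socket. The sketch below shows only the framing step, assuming the Zabbix sender header layout: the bytes "ZBXD", a 0x01 flag byte, then the payload length as an 8-byte little-endian integer. The function name `zbx_frame` and the buffer handling are made up for this example; it is not the handler discussed in this comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZBX_HEADER_LEN 13 /* 4 ("ZBXD") + 1 (flag) + 8 (payload length) */

/* Write header + payload into buf. Returns the total number of bytes
 * written, or 0 if the buffer is too small. */
size_t zbx_frame(const char *payload, unsigned char *buf, size_t cap)
{
    size_t len = strlen(payload);
    uint64_t n = (uint64_t)len;

    if (cap < ZBX_HEADER_LEN || len > cap - ZBX_HEADER_LEN)
        return 0;

    memcpy(buf, "ZBXD", 4);
    buf[4] = 0x01;
    /* payload length, least significant byte first */
    for (int i = 0; i < 8; i++)
        buf[5 + i] = (unsigned char)((n >> (8 * i)) & 0xff);
    memcpy(buf + ZBX_HEADER_LEN, payload, len);

    return ZBX_HEADER_LEN + len;
}
```

A resident handler would build such a frame once per trap and write it to an already running server, which is what avoids the per-trap fork of the shell handler.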
Comment by Oleksii Zagorskyi [ 2011 May 29 ] |
After implementing |
Comment by Rudolfs Kreicbergs [ 2011 May 30 ] |
This is still a development branch, thus a lot of weird things can happen |
Comment by Oleksii Zagorskyi [ 2011 Jul 14 ] |
A note, just in case. First of all, I must say that I was confused because of my old experiments with different syslog daemons or with syslogd stopped, so system messages went to different places (the physical console or log files). Now the description of the problem should be clearer (I hope).
When snmptrapd receives a trap, this message is added to /var/log/messages: "perl callback function 0x8240694 returned a scalar of type 6 instead of an integer, assuming 1 (NETSNMPTRAPD_HANDLER_OK)"
When I add a line (return 1;) to the perl script (according to Rudolfs' suggestion), a different message starts appearing in /var/log/debug.log: "perl callback function 0x82644fc returns 1"
If syslogd is stopped, all messages go to the physical console.
Maybe we can somehow suppress all these messages entirely? I recall that this problem did not exist in net-snmp 5.3. |
Comment by Rudolfs Kreicbergs [ 2011 Jul 14 ] |
Thank you for the info, Oleksiy. Added return statements for both the OK and FAIL cases. The second message looks like a debug message, so it should be OK; those can probably be switched off in the daemon configuration file.

zalex_ua added 10-02-2015:

    /* actually call the callback function */
    if (SvTYPE(pcallback) == SVt_PVCV) {
        noValuesReturned = perl_call_sv(pcallback, G_SCALAR);
        /* XXX: it discards the results, which isn't right */
    } else if (SvROK(pcallback) && SvTYPE(SvRV(pcallback)) == SVt_PVCV) {
        /* reference to code */
        noValuesReturned = perl_call_sv(SvRV(pcallback), G_SCALAR);
    } else {
        snmp_log(LOG_ERR, " tried to call a perl function but failed to understand its type: (ref = %p, svrok: %lu, SVTYPE: %lu)\n", pcallback, (unsigned long)SvROK(pcallback), (unsigned long)SvTYPE(pcallback));
        callingCFfailed = 1;
    }
    if (!callingCFfailed) {
        SPAGAIN;
        if ( noValuesReturned == 0 ) {
            snmp_log(LOG_WARNING, " perl callback function %p did not return a scalar, assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, NETSNMPTRAPD_HANDLER_OK);
        } else {
            SV *rv = POPs;
            if (SvTYPE(rv) != SVt_IV) {
                snmp_log(LOG_WARNING, " perl callback function %p returned a scalar of type %lu instead of an integer, assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, (unsigned long)SvTYPE(rv), NETSNMPTRAPD_HANDLER_OK);
            } else {
                int rvi = (IV)SvIVx(rv);
                if ((NETSNMPTRAPD_HANDLER_OK <= rvi) && (rvi <= NETSNMPTRAPD_HANDLER_FINISH)) {
                    snmp_log(LOG_DEBUG, " perl callback function %p returns %d\n", pcallback, rvi);
                    result = rvi;
                } else {
                    snmp_log(LOG_WARNING, " perl callback function %p returned an invalid scalar integer value (%d), assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, rvi, NETSNMPTRAPD_HANDLER_OK);
                }
            }
        }
        PUTBACK;
    }

Difference may be noticed between these pages:
It looks like the page for version 5.0404 was "published" in the CPAN tree on 17 Oct 2013, i.e. after I wrote my comment above. But below are some interesting details I investigated today.
If you try to check it using cpan:

    # cpan -D NetSNMP::TrapReceiver
    CPAN: Storable loaded ok (v2.49)
    Reading '/root/.cpan/Metadata'
    Database was generated on Sat, 14 Feb 2015 05:29:02 GMT
    NetSNMP::TrapReceiver
    -------------------------------------------------------------------------
    CPAN: Module::CoreList loaded ok (v5.020001)
    (no description)
    H/HA/HARDAKER/NetSNMP-TrapReceiver-5.0404.tar.gz
    /usr/lib/x86_64-linux-gnu/perl5/5.20/NetSNMP/TrapReceiver.pm
    Installed: 5.07021
    CPAN: 5.0404 up to date
    Wes Hardaker (HARDAKER)
    [email protected]

Note that the Installed version shows the correct value. To install the module on different distros: CentOS 7 - "net-snmp-perl", Debian - "libsnmp-perl".
So be careful when reading the NetSNMP::TrapReceiver documentation: don't use CPAN or the Internet, use only the net-snmp sources/packages.
If the module is not installed, the snmptrapd daemon will generate this error:

    # snmptrapd -f -Lsd -p /run/snmptrapd.pid -n
    Can't locate NetSNMP/TrapReceiver.pm in @INC (you may need to install the NetSNMP::TrapReceiver module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.20.1 /usr/local/share/perl/5.20.1 /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl .) at /usr/share/snmp/snmp_perl_trapd.pl line 13.
    BEGIN failed--compilation aborted at /usr/share/snmp/snmp_perl_trapd.pl line 13.

I found that the message "perl callback function 874d1e0 returns 1" misled Zabbix users here:
Those LOG_DEBUG level messages can indeed be suppressed by the corresponding snmptrapd configuration.
Just a small unrelated note: the snmpd daemon (agent) logs a line like "Connection from UDP: [127.0.0.1]:32942->[127.0.0.1]:161" in syslog for every incoming SNMP request. These are LOG_INFO level messages. |
Comment by Rudolfs Kreicbergs [ 2011 Sep 09 ] |
Available in pre-1.9.6 r21580. |
Comment by Oleksii Zagorskyi [ 2015 Feb 19 ] |
What is interesting: if you specify several handlers in snmptrapd.conf, like these:

    perl do "/opt/my1.pl";
    perl do "/opt/my2.pl";
    perl do "/opt/my3.pl";

they will all be executed for every received trap AND in the order my3, my2, my1, i.e. in reverse order.
If there is a huge flow of traps and the embedded handler takes some time to process a single trap, incoming traps are received and buffered internally by snmptrapd and processed later, one by one, with delays.
Just in case: these lines in snmptrapd.conf are identical:

    authCommunity log,execute,net public
    authCommunity execute,log,net public

i.e. native logging (log) will be performed first. The same applies when "disableAuthorization yes" is specified in the conf file, which makes those two lines unnecessary.
The same information is mentioned on the snmptt page, but only with regard to external handlers, http://www.snmptt.org/docs/snmptt.shtml:
It's possible to measure how much time the main (production) trap handler takes for every trap with this example code, debug.pl:

    #!/usr/bin/perl
    sub my_receiver {
        use Time::HiRes qw ( time );
        open (DEBUG, ">>$debuglogfile") or die "Cannot open $debuglogfile\n";
        print DEBUG time." $$ \n"; # Print current time and snmptrapd's PID
        close (DEBUG);
        # sleep 5;
        return NETSNMPTRAPD_HANDLER_OK;
    }
    NetSNMP::TrapReceiver::register("all", \&my_receiver) || warn "failed to register our perl trap handler\n";
    $debuglogfile = "/tmp/zabbix_snmptrap_debug_handler.log"; # FULL Path to this debug log
    print STDERR "Loaded zabbix debug handler\n";

and a modification of snmptrapd.conf like this:

    perl do "/opt/debug.pl";
    perl do "/opt/zabbix_trap_receiver.pl";
    # perl do "/usr/lib/snmptt/snmptthandler-embedded";
    perl do "/opt/debug.pl";
|
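Because debug.pl is loaded both before and after the production handler, every trap should leave two timestamped lines in the debug log, and the difference between them approximates the time spent in the production handler. Here is a minimal sketch of that arithmetic, assuming the log lines come strictly in pairs and have the form written by debug.pl (`<time> <PID>`). The function names are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Parse one debug-log line of the form "<epoch seconds> <pid>".
 * Returns 1 on success, 0 if the line does not match. */
int parse_debug_line(const char *line, double *ts, long *pid)
{
    return sscanf(line, "%lf %ld", ts, pid) == 2;
}

/* Treat lines[0]/lines[1] as the first trap, lines[2]/lines[3] as the
 * second, and so on. Stores one duration (in seconds) per well-formed
 * pair in out and returns how many durations were stored. */
size_t trap_durations(const char *const *lines, size_t n,
                      double *out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i + 1 < n && count < out_cap; i += 2) {
        double t0, t1;
        long p0, p1;

        if (!parse_debug_line(lines[i], &t0, &p0) ||
            !parse_debug_line(lines[i + 1], &t1, &p1))
            continue; /* skip pairs that cannot be parsed */

        out[count++] = t1 - t0;
    }
    return count;
}
```

If snmptrapd is restarted or a handler fails between the two writes, the pairing shifts by one line, so durations that look implausible (negative or very large) are worth checking against the PID column.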
Comment by richlv [ 2015 Aug 21 ] |
looks like we missed updates to the internal monitoring page, reported at |