[ZBXNEXT-747] More efficient SNMP trapping Created: 2011 Apr 13  Updated: 2015 Aug 21  Resolved: 2011 Sep 09

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: 1.9.3 (alpha)
Fix Version/s: 1.9.6 (beta)

Type: New Feature Request Priority: Major
Reporter: Alexander Vladishev Assignee: Unassigned
Resolution: Fixed Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File discovery item prototype.jpg     JPEG File discovery rule.jpg    
Issue Links:
Duplicate
is duplicated by ZBXNEXT-772 Complete handling of SNMP traps Closed

 Description   

Zabbix will support new SNMP trap handling through native (script-less) integration with the NET-SNMP trap daemon. The integration will not require shell scripts or any other heavy processing; the NET-SNMP trap daemon will report directly to Zabbix Server using a standard high-performance IPC method (one of: file, FIFO, pipes, etc.).

Optional configurable trap pre-processors (such as SNMPTT) will be supported.

It will be implemented so that Zabbix automatically sorts traps and assigns them to the corresponding hosts based on IP address.
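
A minimal sketch of the IP-based sorting described above, not the actual Zabbix implementation. The HOSTS_BY_IP table, the function names and the sample addresses are all hypothetical; in Zabbix the mapping would come from the host data stored in the database.

```python
import ipaddress

# Hypothetical in-memory stand-in for the host table; in Zabbix this
# information lives in the database.
HOSTS_BY_IP = {
    "192.0.2.10": "core-switch-01",
    "192.0.2.20": "edge-router-01",
}

def host_for_trap(source_ip, hosts_by_ip=HOSTS_BY_IP):
    """Return the host name registered for source_ip, or None if unknown."""
    # Normalise the textual address so that stray whitespace does not matter.
    normalised = str(ipaddress.ip_address(source_ip.strip()))
    return hosts_by_ip.get(normalised)

def sort_traps(traps, hosts_by_ip=HOSTS_BY_IP):
    """Group (source_ip, text) pairs by host; unmatched traps go under None."""
    grouped = {}
    for source_ip, text in traps:
        grouped.setdefault(host_for_trap(source_ip, hosts_by_ip), []).append(text)
    return grouped
```

Traps from unknown addresses are collected under the None key here, so a caller can decide whether to drop them or route them to some fallback.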



 Comments   
Comment by richlv [ 2011 May 17 ]

Any updates on where development is currently heading?

<rudolfs> Haven't got around to the final tests (working on other issues...). snmptrapd has various issues when configured for direct output (to a file/FIFO). It seems that the handler does not invoke forking or any other heavy processing.
So the most effective way might be a simple C application, configured as the handler, that writes the data to a file. The file will then be read by special trappers on the server/proxy, so this will be configurable on the main server or on any of the proxies. The final processing might consist of finding the correct host in the DB and running the trap against regexes configured in Zabbix.
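
A rough sketch of that flow, written in Python rather than C purely for brevity. It assumes the handler receives the trap in snmptrapd's traphandle format on its input (host name line, transport address line, then one varbind per line). The function names, the tab-separated record layout and the sample trap are hypothetical, and in-memory streams stand in for stdin and the trap file.

```python
import io
import re
import time

def append_trap(stream, out, now=None):
    """Read one trap in traphandle format from stream; append one line to out."""
    lines = [line.strip() for line in stream if line.strip()]
    if len(lines) < 2:
        return False  # incomplete input: host name and address lines are required
    hostname, address, varbinds = lines[0], lines[1], lines[2:]
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    out.write("\t".join([stamp, hostname, address, "; ".join(varbinds)]) + "\n")
    return True

def read_new_traps(src, offset=0):
    """Return (records, new_offset) for complete lines found after offset in src."""
    src.seek(offset)
    data = src.read()
    end = data.rfind(b"\n") + 1  # leave a partially written last record for later
    return data[:end].decode("utf-8").splitlines(), offset + end

def matching(records, pattern):
    """Keep the records that match a configured regular expression."""
    return [record for record in records if re.search(pattern, record)]

# In-memory streams stand in for stdin and the trap file in this demo.
trap = io.StringIO(
    "switch01.example.com\n"
    "UDP: [192.0.2.10]:161->[192.0.2.1]:162\n"
    "SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkDown\n"
)
trap_file = io.StringIO()
append_trap(trap, trap_file, now=0)
records, offset = read_new_traps(io.BytesIO(trap_file.getvalue().encode("utf-8")))
link_down = matching(records, r"linkDown")
```

Writing one line per trap keeps the reader simple: it only needs to remember a byte offset between polls and can skip a record that has not been fully written yet.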

Comment by Oleksii Zagorskyi [ 2011 May 18 ]

I decided to publish some of my thoughts here (part of a discussion with the devs) for public discussion.

Q: passing traps directly to zabbix server w/o intermediate layers would be much faster

A: The "classic" shell handler runs a shell script and then zabbix_sender (the current official implementation) every time an SNMP trap is received. As a result the system is loaded and snmptrapd works slowly because of constant process forking. My in-depth experiments showed that snmptrapd cannot receive and log more than ~16-20 traps/second in real time. When traps arrive faster than that, snmptrapd starts buffering received traps and logs them with a delay.

The "perl" handler starts when snmptrapd starts and stays resident in memory while snmptrapd is running. When a trap is received, it makes a TCP connection to zabbix_server and sends the message (trap) the way zabbix_sender does. So there are no forks and, as a result, it is very fast. You can send the raw message to zabbix_server in some optimal way and parse it on the server side (C code). I do not know of any "more direct" way without intermediate layers.
An additional benefit of the "perl" handler: the output format for v1 and v2 traps is identical and fixed.
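
To illustrate what "sends the message like a zabbix_sender" amounts to, here is a small sketch that only builds the request bytes. It is written in Python rather than Perl and is based on my understanding of the Zabbix sender protocol (a "ZBXD" header, a flag byte, the payload length, then JSON). The host name, the item key "trap.raw" and the value are hypothetical.

```python
import json
import struct

def build_sender_packet(host, key, value):
    """Build one 'sender data' request framed with the ZBXD header."""
    payload = json.dumps(
        {"request": "sender data",
         "data": [{"host": host, "key": key, "value": value}]}
    ).encode("utf-8")
    # "ZBXD", protocol flag 0x01, then the payload length as 8 bytes little-endian.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload

# Hypothetical host, key and value, for illustration only.
packet = build_sender_packet("switch01", "trap.raw", "IF-MIB::linkDown ifIndex=2")
```

A resident handler would write such a packet to a TCP socket connected to the server's trapper port for each trap, which is why no process has to be forked per trap.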

Additional details of my experiment (in Russian) are available to developers on the internal wiki page.

Comment by Oleksii Zagorskyi [ 2011 May 29 ]

After implementing ZBX-3105 we have:
zabbix_server [24855]: unknown parameter [StartSNMPTrappers] in config file [/etc/zabbix/zabbix_server.conf], line 2
/usr/local/etc/rc.d/zabbix_server_fast: WARNING: failed to start zabbix_server
Rudolfs, do not forget to provide the new configuration parameters (StartSNMPTrappers and SNMPTrapperFile), so that we do not see such an error.

Comment by Rudolfs Kreicbergs [ 2011 May 30 ]

This is still a development branch, so a lot of weird things can happen. You probably would not want to use this branch. Currently I'm waiting for more specs to continue work on it.

Comment by Oleksii Zagorskyi [ 2011 Jul 14 ]

A note, just in case.
Some time ago I registered a bug in the net-snmp bug tracker: http://sourceforge.net/tracker/?func=detail&aid=3053954&group_id=12694&atid=112694

First of all, I must say that I was confused because of my old experiments with different syslog daemons or with syslogd stopped, so system messages went to different places (the physical console or log files).

Now the description of the problem should be clearer (I hope).

When snmptrapd receives a trap, this buggy message is added to /var/log/messages: "perl callback function 0x8240694 returned a scalar of type 6 instead of an integer, assuming 1 (NETSNMPTRAPD_HANDLER_OK)"

When I add a line (return 1;) to the Perl script (following Rudolfs' suggestion), a different message starts going to /var/log/debug.log: "perl callback function 0x82644fc returns 1"

If syslogd is stopped, all messages go to the physical console.

Maybe we can somehow suppress all these messages entirely? I recall that this problem did not exist in net-snmp 5.3.

Comment by Rudolfs Kreicbergs [ 2011 Jul 14 ]

Thank you for the info, Oleksiy. Added return statements for both OK and FAIL cases. The second message seems like a debug message so it should be ok, those can probably be switched off in the daemon configuration file.

zalex_ua added 10-02-2015
The latest net-snmp contains this piece of code in the source file TrapReceiver.xs:

    /* actually call the callback function */
    if (SvTYPE(pcallback) == SVt_PVCV) {
        noValuesReturned = perl_call_sv(pcallback, G_SCALAR);
        /* XXX: it discards the results, which isn't right */
    } else if (SvROK(pcallback) && SvTYPE(SvRV(pcallback)) == SVt_PVCV) {
        /* reference to code */
        noValuesReturned = perl_call_sv(SvRV(pcallback), G_SCALAR);
    } else {
        snmp_log(LOG_ERR, " tried to call a perl function but failed to understand its type: (ref = %p, svrok: %lu, SVTYPE: %lu)\n", pcallback, (unsigned long)SvROK(pcallback), (unsigned long)SvTYPE(pcallback));
	callingCFfailed = 1;
    }

    if (!callingCFfailed) {
      SPAGAIN;

      if ( noValuesReturned == 0 ) {
        snmp_log(LOG_WARNING, " perl callback function %p did not return a scalar, assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, NETSNMPTRAPD_HANDLER_OK);
      }
      else {
	SV *rv = POPs;

	if (SvTYPE(rv) != SVt_IV) {
	  snmp_log(LOG_WARNING, " perl callback function %p returned a scalar of type %lu instead of an integer, assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, (unsigned long)SvTYPE(rv), NETSNMPTRAPD_HANDLER_OK);
	}
	else {
	  int rvi = (IV)SvIVx(rv);

	  if ((NETSNMPTRAPD_HANDLER_OK <= rvi) && (rvi <= NETSNMPTRAPD_HANDLER_FINISH)) {
	    snmp_log(LOG_DEBUG, " perl callback function %p returns %d\n", pcallback, rvi);
	    result = rvi;
	  }
	  else {
	    snmp_log(LOG_WARNING, " perl callback function %p returned an invalid scalar integer value (%d), assuming %d (NETSNMPTRAPD_HANDLER_OK)\n", pcallback, rvi, NETSNMPTRAPD_HANDLER_OK);
	  }
	}
      }

      PUTBACK;
    }

The difference can be seen between these pages:
http://search.cpan.org/~hardaker/NetSNMP-TrapReceiver-5.0401/TrapReceiver.pm
http://search.cpan.org/~hardaker/NetSNMP-TrapReceiver-5.0404/TrapReceiver.pm

It looks like the page for version 5.0404 was "published" in the CPAN tree on 17 Oct 2013, i.e. after I wrote my comment above.
Also, I would not have missed the requirements about the returned value (like NETSNMPTRAPD_HANDLER_OK) when reading the page, so I must indeed have been reading the CPAN page for version 5.0401 (supposedly published 18 Sep 2007).

Below are some interesting details I investigated today.
It looks like NetSNMP::TrapReceiver was originally, and still is, just a part of the Net-SNMP package.
It was first included in Net-SNMP 5.2, released 2004-11-25.
It was added to the Net-SNMP sources on 2004-02-11:

Initial pass at an embedded perl module for snmptrapd.
Whats odd for typical initial code from me is that it's fully
functional and documented even!
--Wes Hardaker

If you try to check it using cpan:

# cpan -D NetSNMP::TrapReceiver
CPAN: Storable loaded ok (v2.49)
Reading '/root/.cpan/Metadata'
  Database was generated on Sat, 14 Feb 2015 05:29:02 GMT
NetSNMP::TrapReceiver
-------------------------------------------------------------------------
        CPAN: Module::CoreList loaded ok (v5.020001)
(no description)
        H/HA/HARDAKER/NetSNMP-TrapReceiver-5.0404.tar.gz
        /usr/lib/x86_64-linux-gnu/perl5/5.20/NetSNMP/TrapReceiver.pm
        Installed: 5.07021
        CPAN:      5.0404  up to date
        Wes Hardaker (HARDAKER)
        [email protected]

Note that the Installed field shows the correct value.
If the module is not installed, the value is empty.

To install the module on different distros: CentOS 7 - "net-snmp-perl", Debian - "libsnmp-perl".
The up-to-date documentation will then be available via "man NetSNMP::TrapReceiver".
Note: the man page for Debian is (was?) in the separate "libsnmp-dev" package.
Note that all the NetSNMP::TrapReceiver man pages I found by googling on the Internet are outdated.

So be careful when reading NetSNMP::TrapReceiver documentation: don't use CPAN or the Internet. Use only the net-snmp sources/packages.

If the module is not installed, the snmptrapd daemon will generate this error:

# snmptrapd -f -Lsd -p /run/snmptrapd.pid -n
Can't locate NetSNMP/TrapReceiver.pm in @INC (you may need to install the NetSNMP::TrapReceiver module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.20.1 /usr/local/share/perl/5.20.1 /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl .) at /usr/share/snmp/snmp_perl_trapd.pl line 13.
BEGIN failed--compilation aborted at /usr/share/snmp/snmp_perl_trapd.pl line 13.

I found that the message "perl callback function 874d1e0 returns 1" misled Zabbix users here:
https://www.zabbix.com/forum/showthread.php?t=26511&page=2

Those LOG_DEBUG level messages can indeed be suppressed by configuring snmptrapd accordingly.
For example, on my Debian/testing host the daemon runs with the option -Lsd.
Replacing that option with -LS6-0d suppresses those LOG_DEBUG messages, so they no longer appear in the various syslog files.
I'd recommend that all users do this.

Just a small unrelated note: the snmpd daemon (agent) logs a line like "Connection from UDP: [127.0.0.1]:32942->[127.0.0.1]:161" to syslog for every incoming SNMP request. These are LOG_INFO level messages.
They can also be suppressed by running snmpd with the option -LS5-0d instead of -Lsd.

Comment by Rudolfs Kreicbergs [ 2011 Sep 09 ]

Available in pre-1.9.6 r21580.

Comment by Oleksii Zagorskyi [ 2015 Feb 19 ]

An interesting detail: if you specify several handlers in snmptrapd.conf, like these:

perl do "/opt/my1.pl";
perl do "/opt/my2.pl";
perl do "/opt/my3.pl";

they will all be executed for every received trap, and in the order my3, my2, my1, i.e. in reverse order.
Even if each script file registers a subroutine with an identical name with the "NetSNMP::TrapReceiver" module, the routines will be effectively independent (isolated) from each other.
Each script is executed only after the previous one has finished, i.e. sequentially.

If there is a huge flow of traps and the embedded handler takes some time to process a single trap, incoming traps are buffered internally by snmptrapd and processed later, one by one, with delays.
This is also true for snmptrapd's native logging! But with one detail: native logging is always done before handler processing.

Just in case: these two snmptrapd.conf lines are equivalent:

authCommunity log,execute,net public
authCommunity execute,log,net public

i.e. native logging (log) will be performed first either way. The same applies when "disableAuthorization yes" is specified in the conf file, which removes the need for those two lines.

The same information is mentioned on the snmptt page, but only with regard to external handlers, http://www.snmptt.org/docs/snmptt.shtml:

The SNMPTRAPD program blocks when executing traphandle commands. This means that if the program called never quits, SNMPTRAPD will wait forever. If a trap is received while the traphandler is running, it is buffered and will be processed when the traphandler finishes. I do not know how large this buffer is.

It's possible to measure how much time the main (production) traphandler takes for every trap with this example code, debug.pl:

#!/usr/bin/perl
 
sub my_receiver {
    use Time::HiRes qw ( time );
    open (DEBUG, ">>$debuglogfile") or die "Cannot open $debuglogfile\n";
    print DEBUG time."  $$  \n";  # Print current time and snmptrapd's PID
    close (DEBUG);
    # sleep 5;
    return NETSNMPTRAPD_HANDLER_OK;
}
 
NetSNMP::TrapReceiver::register("all", \&my_receiver) || warn "failed to register our perl trap handler\n";

$debuglogfile = "/tmp/zabbix_snmptrap_debug_handler.log";   # FULL Path to this debug log

print STDERR "Loaded zabbix debug handler\n";

and a modified snmptrapd.conf like this:

perl do "/opt/debug.pl";
perl do "/opt/zabbix_trap_receiver.pl";
# perl do "/usr/lib/snmptt/snmptthandler-embedded";
perl do "/opt/debug.pl";
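
With debug.pl loaded both before and after the main handler, each trap should leave two consecutive lines in the debug log, and the difference between their timestamps approximates the time spent in the handler(s) in between. A small sketch for computing those differences, written in Python for brevity. It assumes lines of the form "time  pid" as printed by debug.pl and strictly sequential processing; the function name and the sample log values are made up.

```python
def handler_durations(lines):
    """Pair consecutive (time, pid) log lines and return elapsed seconds per trap."""
    stamps = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:          # skip blank or malformed lines
            stamps.append(float(fields[0]))
    # Entries come in pairs: one written before and one after the main handler.
    return [end - start for start, end in zip(stamps[0::2], stamps[1::2])]

# Made-up sample in the format written by debug.pl (timestamp, then PID).
sample_log = [
    "1424350000.100000  1234  ",
    "1424350000.350000  1234  ",
    "1424350002.000000  1234  ",
    "1424350002.125000  1234  ",
]
durations = handler_durations(sample_log)
```

If the log was started or truncated in the middle of a trap, the pairing will be shifted by one line, so it is safer to start with an empty log file.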

Comment by richlv [ 2015 Aug 21 ]

looks like we missed updates to the internal monitoring page, reported at ZBX-9802
