[ZBX-14559] Zabbix agent crash on AIX when processing net.dns[] items Created: 2018 Jul 03  Updated: 2024 Apr 10  Resolved: 2018 Sep 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.4.11
Fix Version/s: 3.0.22rc1, 3.4.14rc1, 4.0.0beta2, 4.0 (plan)

Type: Problem report Priority: Major
Reporter: Alexey Pustovalov Assignee: Andris Mednis
Resolution: Fixed Votes: 0
Labels: AIX, crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

AIX 6.1, pcre, openssl


Attachments: ZBX-14559_patch_with_padding_for_30.txt, ZBX-14559_patch_without_res_ninit_on_AIX_for_30.txt
Team: Team A
Sprint: Sprint 37, Sprint 38, Sprint 41
Story Points: 2

 Description   

The Zabbix agent crashed when run with the -p option (print all metrics):

# /usr/local/sbin/zabbix_agentd -p
agent.hostname                                [s|Zabbix server]
agent.ping                                    [u|1]
agent.version                                 [s|3.4.8]
system.localtime[utc]                         [u|1530603016]
system.run[echo test]                         [m|ZBX_NOTSUPPORTED] [Remote commands are not enabled.]
web.page.get[localhost,,80]                   [t|]
web.page.perf[localhost,,80]                  [d|0.000000]
web.page.regexp[localhost,,80,OK]             [s|]
vfs.file.size[/etc/passwd]                    [u|1162]
vfs.file.time[/etc/passwd,modify]             [u|1529397119]
vfs.file.exists[/etc/passwd]                  [u|1]
vfs.file.contents[/etc/passwd]                [t|root:!:0:0::/home/root:/usr/bin/ksh
daemon:!:1:1::/etc:
.... truncated...
zabbix:x:21602:11261:zabbix:/opt/zabbix:/bin/false]
vfs.file.regexp[/etc/passwd,root]             [s|root:!:0:0::/home/root:/usr/bin/ksh]
vfs.file.regmatch[/etc/passwd,root]           [u|1]
vfs.file.md5sum[/etc/passwd]                  [s|some_cksum]
vfs.file.cksum[/etc/passwd]                   [u|some_cksum]
vfs.dir.size[/var/log]                        [u|44015584]
zabbix_agentd [14745676]: ERROR: Got signal [signal:4(SIGILL),reason:30,refaddr:0]. Crashing ...
zabbix_agentd [14745676]: ERROR: ====== Fatal information: ======
zabbix_agentd [14745676]: ERROR: program counter not available for this architecture
zabbix_agentd [14745676]: ERROR: === Registers: ===
zabbix_agentd [14745676]: ERROR: register dump not available for this architecture
zabbix_agentd [14745676]: ERROR: === Backtrace: ===
zabbix_agentd [14745676]: ERROR: backtrace not available for this platform
zabbix_agentd [14745676]: ERROR: === Memory map: ===
zabbix_agentd [14745676]: ERROR: memory map not available for this platform
zabbix_agentd [14745676]: ERROR: ================================
net.dns[,zabbix.com]                         #
#


 Comments   
Comment by Vladislavs Sokurenko [ 2018 Jul 03 ]

Is it vfs.dir.size that crashes? Could you please execute it separately? It would also be nice to see the output at debug log level.
dotneft: It is "-p", so there is no debug info. But I will try both keys tomorrow.

Comment by Glebs Ivanovskis [ 2018 Jul 04 ]

Dear dotneft, I believe -C works with -p, so you can raise logging there.

Comment by Andris Mednis [ 2018 Jul 06 ]

So, vfs.dir.size[/var/log] was the last successful metric before the crash.
The next one is probably net.dns[,zabbix.com].
You can try it separately:

# /usr/local/sbin/zabbix_agentd -t net.dns[,zabbix.com]

Comment by Andris Mednis [ 2018 Aug 23 ]

Confirmed with 3.0.15, 3.0.21rc1 on AIX 7.1 TL0.

Comment by Andris Mednis [ 2018 Aug 23 ]

The crash happens when returning from dns_query() with SYSINFO_RET_OK:

static int      dns_query(AGENT_REQUEST *request, AGENT_RESULT *result, int short_answer)
{
...
        hp = (HEADER *)answer.buffer;

        if (1 == short_answer)
        {
                SET_UI64_RESULT(result, NOERROR != hp->rcode || 0 == ntohs(hp->ancount) || -1 == res ? 0 : 1);
                /* Execution reaches this point successfully, */
                /* then crashes in the process of returning. */
                return SYSINFO_RET_OK;
        }
...

 

Comment by Andris Mednis [ 2018 Aug 28 ]

In src/libs/zbxsysinfo/common/net.c there is a function dns_query() which calls res_ninit():

...
#ifdef HAVE_RES_NINIT
        struct __res_state      res_state_local;
#else   /* thread-unsafe resolver API */
...
#ifdef HAVE_RES_NINIT
        memset(&res_state_local, 0, sizeof(res_state_local));
        if (-1 == res_ninit(&res_state_local))  /* initialize always, settings might have changed */
#else

It seems that on some AIX systems with no updates installed, res_ninit() can corrupt the stack, causing a crash when returning from dns_query().

  • Solution 1: adding some padding bytes after res_state_local variable on stack (see attached patch file ZBX-14559_patch_with_padding_for_30.txt).
  • Solution 2: replacing res_ninit() with a deprecated function res_init() on AIX systems (see attached patch file ZBX-14559_patch_without_res_ninit_on_AIX_for_30.txt).
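
The idea behind Solution 1 can be illustrated with a minimal, self-contained sketch. It does not reproduce the attached patch and does not call the real resolver: fake_state, padded_state, PAD_BYTES, PAD_PATTERN, overrunning_init(), count_clobbered_pad_bytes() and simulate_overrun() are hypothetical names chosen only for this illustration. A stand-in initializer writes some bytes past the end of the structure it was given, and padding placed directly after that structure absorbs the overrun so that it does not reach neighbouring stack data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAD_BYTES       64      /* hypothetical amount of padding */
#define PAD_PATTERN     0xA5    /* marker used to detect overwritten bytes */

/* hypothetical stand-in for struct __res_state */
struct fake_state
{
        char    data[32];
};

/* the state followed immediately by padding, kept together in one object */
struct padded_state
{
        struct fake_state       state;
        unsigned char           pad[PAD_BYTES];
};

/* simulates an initializer that zeroes 'extra' bytes beyond the state structure */
static void     overrunning_init(struct padded_state *ps, size_t extra)
{
        assert(extra <= PAD_BYTES);     /* keep the simulated overrun inside the padding */
        memset(ps, 0, offsetof(struct padded_state, pad) + extra);
}

/* counts padding bytes that no longer hold the marker pattern */
static size_t   count_clobbered_pad_bytes(const struct padded_state *ps)
{
        size_t  i, n = 0;

        for (i = 0; i < PAD_BYTES; i++)
        {
                if (PAD_PATTERN != ps->pad[i])
                        n++;
        }

        return n;
}

/* returns how many padding bytes the simulated overrun consumed */
size_t  simulate_overrun(size_t extra)
{
        struct padded_state     ps;

        memset(ps.pad, PAD_PATTERN, sizeof(ps.pad));
        overrunning_init(&ps, extra);

        return count_clobbered_pad_bytes(&ps);
}
```

Calling simulate_overrun(16) reports 16 overwritten padding bytes, with nothing outside padded_state touched. In real code the size of the overrun is not known in advance, so padding of this kind works around the defect without removing it.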

A similar problem was described in 2005 on the Kerberos mailing list [krbdev.mit.edu #3172].
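
Solution 2 amounts to a compile-time choice of resolver initializer. The sketch below shows only the shape of such a guard and is not taken from the attached patch: USE_REENTRANT_RESOLVER and resolver_init_name() are hypothetical names, HAVE_RES_NINIT is defined by hand here on the assumption that the build system would normally set it, and _AIX is the macro predefined by compilers on AIX. No resolver function is actually called.

```c
/* assumption: normally defined by the build system when res_ninit() is available */
#define HAVE_RES_NINIT  1

/* hypothetical guard: use the reentrant API everywhere except on AIX */
#if defined(HAVE_RES_NINIT) && !defined(_AIX)
#       define USE_REENTRANT_RESOLVER   1
#else
#       define USE_REENTRANT_RESOLVER   0
#endif

/* reports which initializer code guarded this way would call */
const char      *resolver_init_name(void)
{
#if USE_REENTRANT_RESOLVER
        return "res_ninit";
#else
        return "res_init";
#endif
}
```

The trade-off is that res_init() works on shared global resolver state, which is why the excerpt above labels the non-res_ninit() branch as the thread-unsafe resolver API.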

Comment by Andris Mednis [ 2018 Aug 29 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6565-13645-14559-30 (based on 3.0.22rc1) which contains proposed fixes for:

Tested on AIX 6.1 TL0, 7.1 TL0, 7.1 TL4.

Comment by Andris Mednis [ 2018 Aug 31 ]

Fixed in versions:

  • pre-3.0.22rc1 r84412
  • pre-3.4.14rc1 r84415
  • pre-4.0.0beta2 (trunk) r84416

Comment by Andris Mednis [ 2018 Aug 31 ]

No documentation update required.
