[ZBX-2153] zabbix_server die after start <defunct> SIGBUS
Created: 2010 Mar 12  Updated: 2017 May 30  Resolved: 2010 May 24

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.1
Fix Version/s: 1.8.3, 1.9.0 (alpha)

Type: Incident report
Priority: Critical
Reporter: Sergey Ivlenkov
Assignee: Alexander Vladishev
Resolution: Fixed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris 10, Generic_118833-33 sun4u sparc SUNW,UltraAX-i2
Zabbix 1.8.1 (revision 9702)



 Description   

zabbix_server dies right after startup, and all zabbix_server processes are marked as <defunct>.
This happens when the DB contains at least one host with monitoring enabled.
If no hosts are monitored, zabbix_server starts and runs normally until monitoring is enabled for any host.

As far as I can tell, one zabbix process (DB Cache) receives a SIGBUS signal...

I found this bug with a MySQL database, but recompiling with PostgreSQL does not help; the bug is the same.
Below are the logs with the PostgreSQL DB.

LOG info:
---------------------------------------
$ grep 14474 /var/log/zabbix/zabbix_server.log
14474:20100312:112825.254 server #1 started [DB Cache]
14474:20100312:112825.256 In main_dbconfig_loop()
14474:20100312:112825.258 Connect to the database
14474:20100312:112825.284 Query [txnlev:0] [select oid from pg_type where typname = 'bytea']
14474:20100312:112825.296 PostgreSQL Server version: 80402
14474:20100312:112825.297 Query [txnlev:0] [set escape_string_warning to off]
14474:20100312:112825.299 Syncing ...
14474:20100312:112825.301 In DCsync_confguration()
14474:20100312:112825.302 In DCsync_hosts()
14474:20100312:112825.303 Query [txnlev:0] [select hostid,proxy_hostid,host,useip,ip,dns,port,status,useipmi,ipmi_ip,ipmi_port,ipmi_authtype,ipmi_privilege,ipmi_username,ipmi_password,maintenance_status,maintenance_type,maintenance_from,errors_from,available,disable_until,snmp_errors_from,snmp_available,snmp_disable_until,ipmi_errors_from,ipmi_available,ipmi_disable_until from hosts where status in (0) and hostid between 000000000000000 and 099999999999999 order by hostid]
14471:20100312:112829.311 One child process died (PID:14474). Exiting ...
-----------------------------------

truss info:
------------
14474/1: open("/var/log/zabbix/zabbix_server.log", O_RDWR|O_APPEND|O_CREAT, 0666) = 7
14474/1: getpid() = 14474 [14471]
14474/1: fstat64(7, 0xFFBED6F8) = 0
14474/1: fstat64(7, 0xFFBED5A0) = 0
14474/1: ioctl(7, TCGETA, 0xFFBED684) Err#25 ENOTTY
14474/1: write(7, " 1 4 4 7 4 : 2 0 1 0 0".., 479) = 479
14474/1: close(7) = 0
14474/1: stat("/var/log/zabbix/zabbix_server.log", 0xFFBEECF8) = 0
14474/1: sigaction(SIGPIPE, 0xFFBFE9E0, 0xFFBFEA80) = 0
14474/1: send(6, " Q\0\001B5 s e l e c t ".., 438, 0) = 438
14474/1: sigaction(SIGPIPE, 0xFFBFE9E0, 0xFFBFEA80) = 0
14474/1: pollsys(0xFFBFEC20, 1, 0x00000000, 0x00000000) = 1
14474/1: recv(6, " T\0\003 0\01B h o s t i".., 16384, 0) = 1003
14474/1: semop(16777225, 0xFFBFEE58, 1) = 0
14474/1: Incurred fault #5, FLTACCESS %pc = 0x0003F9E4
14474/1: siginfo: SIGBUS BUS_ADRALN addr=0xF74000C4
14474/1: Received signal #10, SIGBUS [default]
14474/1: siginfo: SIGBUS BUS_ADRALN addr=0xF74000C4
14471/1: siginfo: SIGCLD CLD_KILLED pid=14474 status=0x000A
14471/1: kill(14474, SIGTERM) = 0
------------------------------------------------------------------



 Comments   
Comment by Marco Walther [ 2010 Mar 18 ]

In 1.8.1, the problem is in src/libs/zbxdbcache/dbconfig.c: the first struct in the shared memory segment (ZBX_DC_CONFIG)
needs only 4-byte alignment, while all the structs that follow it contain zbx_uint64_t elements and need 8-byte alignment.
There are some int members in there, but they are always allocated in multiples of four, so they do not affect the alignment.
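To make the arithmetic concrete, here is a minimal C sketch. The structs are hypothetical stand-ins (the real ZBX_DC_CONFIG and the structs after it in dbconfig.c have different members), zbx_uint64_t is assumed to be a plain uint64_t typedef, and the 8-byte requirement is the SPARC rule described in this comment. The helper computes how far an object placed right after the header would miss an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: zbx_uint64_t is a typedef for uint64_t. */
typedef uint64_t zbx_uint64_t;

/* Hypothetical stand-in for ZBX_DC_CONFIG: only int members, so the
 * struct needs 4-byte alignment and its size is 12 on common ABIs. */
typedef struct {
    int items_num;
    int hosts_num;
    int triggers_num;
} fake_dc_config_t;

/* Hypothetical stand-in for the next struct in the segment; on SPARC
 * its zbx_uint64_t member must sit on an 8-byte boundary. */
typedef struct {
    zbx_uint64_t hostid;
    int status;
} fake_dc_host_t;

/* Number of bytes by which the address base + offset misses an
 * align-byte boundary; 0 means an aligned access there is safe. */
static size_t misalignment(uintptr_t base, size_t offset, size_t align)
{
    assert(align > 0);
    return (size_t)((base + offset) % align);
}
```

With an 8-byte-aligned segment base, misalignment(base, sizeof(fake_dc_config_t), sizeof(zbx_uint64_t)) returns 4: a fake_dc_host_t packed directly after the header would start 4 bytes off an 8-byte boundary. The fault address in the truss output, 0xF74000C4, is likewise 4 modulo 8.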

That works well enough on x86, which tolerates misaligned accesses. SPARC, however, normally enforces alignment and
raises SIGBUS (BUS_ADRALN) on a misaligned access. There are compiler switches for this, but they carry a big runtime penalty.

The fix is to make sure that the size of ZBX_DC_CONFIG is rounded up to a multiple of the biggest alignment.

[pre]
--- zabbix-1.8.1.orig/src/libs/zbxdbcache/dbconfig.c 2010-01-27 13:22:44.000000000 -0800
+++ zabbix-1.8.1/src/libs/zbxdbcache/dbconfig.c 2010-03-17 18:06:47.377350002 -0700
@@ -2165,6 +2165,12 @@
 
 	sz = sizeof(ZBX_DC_CONFIG);
 
+#ifndef ORIGINAL
+	if (sz % sizeof(zbx_uint64_t) != 0) {
+		sz = (sz / sizeof(zbx_uint64_t) + 1) * sizeof(zbx_uint64_t);
+	}
+#endif
+
 	if (CONFIG_DBCONFIG_SIZE < sz)
 	{
 		zbx_error("Configuration buffer is too small. Please increase CacheSize parameter.");
[/pre]
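The patched lines round sz up to the next multiple of sizeof(zbx_uint64_t). A minimal sketch of the same rounding as a standalone helper, under the hypothetical name round_up_to(), assuming zbx_uint64_t is a uint64_t typedef. The second helper is the common bit-mask formulation, shown only as an alternative; it is not what the patch uses and is valid only when the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t zbx_uint64_t;   /* assumption: plain 64-bit typedef */

/* Same arithmetic as the patch: if sz is not already a multiple of
 * align, bump it to the next multiple; otherwise leave it alone. */
static size_t round_up_to(size_t sz, size_t align)
{
    assert(align > 0);
    if (sz % align != 0)
        sz = (sz / align + 1) * align;
    return sz;
}

/* Bit-mask variant; only correct when align is a power of two. */
static size_t round_up_pow2(size_t sz, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (sz + align - 1) & ~(align - 1);
}
```

For example, round_up_to(12, sizeof(zbx_uint64_t)) yields 16, so the next struct starts on an 8-byte boundary, while a size that is already a multiple of 8 is returned unchanged.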

Comment by Sergey Ivlenkov [ 2010 Mar 19 ]

Fix from Marco Walther (18/Mar/10 02:37 AM) resolves the problem.

After patching dbconfig.c, zabbix_server starts and works (built with gcc-3.4.6).
THANKS!

Another workaround I found is to compile Zabbix with SunStudio12u1 (BTW: it produces a number of warnings).

Comment by Marco Walther [ 2010 Mar 20 ]

I'm not so sure about SunStudio12u1. I ran into this problem with a somewhat older version of SunStudio. I'm
currently not at my Solaris box, so I can't check the exact version.

Unless you specify the options to handle unaligned data and accept the runtime penalty, it should not
work.

When I checked the code, sizeof(ZBX_DC_CONFIG) % sizeof(zbx_uint64_t) was 4, so the
next struct would always end up misaligned.

So there are two working solutions to the problem:

  • Add a dummy zbx_uint64_t element to ZBX_DC_CONFIG or modify an existing element to require that
    alignment.
  • Make sure the next struct starts at the correct alignment.
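A minimal C sketch of the first option, with hypothetical struct and member names (the real ZBX_DC_CONFIG fields differ) and assuming zbx_uint64_t is a uint64_t typedef. Adding a 64-bit member raises the struct's alignment, and the compiler then pads its size to a multiple of that alignment, so whatever is placed right after it starts on an 8-byte boundary. This relies on the ABI giving a 64-bit integer 8-byte alignment inside structs, as SPARC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t zbx_uint64_t;   /* assumption: plain 64-bit typedef */

/* Hypothetical header with only int members: size 12, alignment 4. */
typedef struct {
    int items_num;
    int hosts_num;
    int triggers_num;
} hdr_plain_t;

/* Same header plus a dummy 64-bit member (hypothetical name). The
 * struct now takes the alignment of zbx_uint64_t, so the compiler
 * pads its size to a multiple of that alignment. */
typedef struct {
    int items_num;
    int hosts_num;
    int triggers_num;
    zbx_uint64_t align_dummy;    /* never read; only forces alignment */
} hdr_padded_t;

/* Nonzero if a struct placed right after a header of hdr_size bytes,
 * in a segment whose base is align-aligned, starts on an align boundary. */
static int next_is_aligned(size_t hdr_size, size_t align)
{
    assert(align > 0);
    return hdr_size % align == 0;
}
```

On such an ABI sizeof(hdr_padded_t) is 24 (4 bytes of padding before the dummy member plus the 8-byte member itself), so next_is_aligned(sizeof(hdr_padded_t), sizeof(zbx_uint64_t)) is true, while the 12-byte hdr_plain_t fails the same check. The cost is those extra bytes in shared memory.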
Comment by Guilherme França [ 2010 Mar 29 ]

I had the same issue, but with Linux running on a Sun Fire V250 UltraSPARC IIIi machine. The patch suggested by Marco Walther fixed the issue.

Comment by Guilherme França [ 2010 Mar 29 ]

The problem still exists in 1.8.2.

Comment by Fyodor [ 2010 Mar 30 ]

I have the same bug on HP-UX (https://support.zabbix.com/browse/ZBX-1998).
I fixed it by simply adding an allow_unaligned_memory_access() call and linking with -lunalign.

Comment by Aleksandrs Saveljevs [ 2010 May 24 ]

Should be fixed as of r12212 in pre-1.8.3 as a side effect of ZBXNEXT-326. Please reopen if it still crashes.

Generated at Tue Apr 16 10:10:03 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.