ZABBIX BUGS AND ISSUES

2.0.1 agent on Solaris 10 throws "Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ..."

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.0.1
  • Fix Version/s: 2.0.4rc1, 2.1.0
  • Component/s: Agent (G)
  • Labels:
  • Environment:
    SunOS nodename 5.10 Generic_127111-06 sun4v sparc SUNW,Sun-Fire-T1000
  • Zabbix ID:
    RTF

Description

This bug appears to be similar to ZBX-2634.

$ CC=gcc CFLAGS=-O2 ./configure --prefix="/tmp/zabbix/agent" --enable-agent --enable-ipv6
$ make
$ make install

 23454:20120707:154526.434 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.1 (revision 28455).
 23455:20120707:154526.439 agent #0 started [collector]
 23456:20120707:154526.441 agent #1 started [listener]
 23457:20120707:154526.442 agent #2 started [listener]
 23458:20120707:154526.444 agent #3 started [listener]
 23455:20120707:154526.458 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
 23455:20120707:154526.458 ====== Fatal information: ======
 23455:20120707:154526.459 program counter not available for this architecture
 23455:20120707:154526.459 === Registers: ===
 23455:20120707:154526.459 register dump not available for this architecture
 23455:20120707:154526.460 === Backtrace: ===
 23455:20120707:154526.460 backtrace not available for this platform
 23455:20120707:154526.460 === Memory map: ===
 23455:20120707:154526.460 memory map not available for this platform
 23455:20120707:154526.461 ================================
 23454:20120707:154526.464 One child process died (PID:23455,exitcode/signal:-1). Exiting ...
 23454:20120707:154528.471 Zabbix Agent stopped. Zabbix 2.0.1 (revision 28455).
  1. zabbix_agentd_truss_output.txt
    2012 Aug 01 06:55
    17 kB
    Romeo Theriault
  2. zabbix_agentd_truss-f.log
    2012 Aug 13 14:54
    52 kB
    Pieter Vandevoorde
  3. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structforcealign1.patch
    2012 Sep 24 05:33
    0.4 kB
    Jairo Eduardo Lopez Fuentes Nacarino
  4. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structforcealign2.patch
    2012 Sep 24 05:33
    0.4 kB
    Jairo Eduardo Lopez Fuentes Nacarino
  5. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structpad.patch
    2012 Sep 24 05:33
    0.6 kB
    Jairo Eduardo Lopez Fuentes Nacarino

Activity

Bruce Misc added a comment -

I should have included debug level log data.

23106:20120708:080050.609 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.1 (revision 28455).
23106:20120708:080050.612 In init_collector_data()
23106:20120708:080050.613 End of init_collector_data()
23107:20120708:080050.615 agent #0 started [collector]
23107:20120708:080050.616 In init_cpu_collector()
23108:20120708:080050.617 agent #1 started [listener]
23109:20120708:080050.618 agent #2 started [listener]
23110:20120708:080050.620 agent #3 started [listener]
23107:20120708:080050.630 End of init_cpu_collector():SUCCEED
23107:20120708:080050.630 In update_cpustats()
23107:20120708:080050.635 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
23107:20120708:080050.635 ====== Fatal information: ======
23107:20120708:080050.635 program counter not available for this architecture
23107:20120708:080050.636 === Registers: ===
23107:20120708:080050.636 register dump not available for this architecture
23107:20120708:080050.636 === Backtrace: ===
23107:20120708:080050.636 backtrace not available for this platform
23107:20120708:080050.637 === Memory map: ===
23107:20120708:080050.637 memory map not available for this platform
23107:20120708:080050.637 ================================
23106:20120708:080050.640 One child process died (PID:23107,exitcode/signal:-1). Exiting ...
23106:20120708:080050.641 zbx_on_exit() called
23108:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
23109:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
23110:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
23106:20120708:080052.648 Zabbix Agent stopped. Zabbix 2.0.1 (revision 28455).

Romeo Theriault added a comment - - edited

I am also seeing the exact same issue on Solaris 9 with v2.0.1. I have not tried Solaris 10 yet, but I'm guessing from the above that I'll see the same thing.

Romeo Theriault added a comment -

This is the output of truss on the zabbix_agentd daemon (v2.0.1) when trying to start it on Solaris 9.

Tomasz Zielinski added a comment -

The same happens on 2.0.2. Please do something.

Alexei Vladishev added a comment -

Please try to test the latest nightly build and report back.

Romeo Theriault added a comment -

On Solaris 9 (SPARC) I am still seeing the issue:

bash-2.05# uname -a
SunOS epf01 5.9 Generic_118558-13 sun4u sparc SUNW,Sun-Fire-V240
2624:20120907:113054.538 Starting Zabbix Agent [epf01]. Zabbix 2.0.3rc1 (revision 30147).
  2625:20120907:113054.539 agent #0 started [collector]
  2626:20120907:113054.540 agent #1 started [listener]
  2627:20120907:113054.542 agent #2 started [listener]
  2625:20120907:113054.543 Got signal [signal:10(SIGBUS),reason:1,refaddr:feebe4e4]. Crashing ...
  2625:20120907:113054.543 ====== Fatal information: ======
  2628:20120907:113054.543 agent #3 started [listener]
  2625:20120907:113054.544 program counter not available for this architecture
  2625:20120907:113054.544 === Registers: ===
  2625:20120907:113054.544 register dump not available for this architecture
  2625:20120907:113054.544 === Backtrace: ===
  2625:20120907:113054.544 backtrace not available for this platform
  2625:20120907:113054.544 === Memory map: ===
  2625:20120907:113054.544 memory map not available for this platform
  2625:20120907:113054.544 ================================
  2629:20120907:113054.545 agent #4 started [active checks]
  2624:20120907:113054.545 One child process died (PID:2625,exitcode/signal:-1). Exiting ...
  2624:20120907:113056.541 Zabbix Agent stopped. Zabbix 2.0.3rc1 (revision 30147).

I can test on Solaris 10 (SPARC) if you want.

Thanks.

Alexei Vladishev added a comment -

Please test on Solaris 10. Thanks for your help.

Romeo Theriault added a comment - - edited

NP, glad I can help. The problem seems to be the same on Solaris 10 (SPARC); see the output below. I'll try to test this on Solaris 10 (x64) later today and report back on whether this is just a SPARC issue.

$ uname -a
SunOS t2k10 5.10 Generic_127111-03 sun4v sparc SUNW,Sun-Fire-T200
29562:20120910:101002.902 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.3rc1 (revision 30147).
29563:20120910:101002.906 agent #0 started [collector]
29564:20120910:101002.907 agent #1 started [listener]
29565:20120910:101002.909 agent #2 started [listener]
29566:20120910:101002.911 agent #3 started [listener]
29567:20120910:101002.913 agent #4 started [active checks]
29563:20120910:101002.927 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
29563:20120910:101002.927 ====== Fatal information: ======
29563:20120910:101002.927 program counter not available for this architecture
29563:20120910:101002.927 === Registers: ===
29563:20120910:101002.927 register dump not available for this architecture
29563:20120910:101002.927 === Backtrace: ===
29563:20120910:101002.928 backtrace not available for this platform
29563:20120910:101002.928 === Memory map: ===
29563:20120910:101002.928 memory map not available for this platform
29563:20120910:101002.928 ================================
29562:20120910:101003.270 One child process died (PID:29563,exitcode/signal:-1). Exiting ...
29562:20120910:101005.275 Zabbix Agent stopped. Zabbix 2.0.3rc1 (revision 30147).

Romeo Theriault added a comment -

I tested this version on Solaris 10 x86 (64-bit) and it works fine. It starts up and runs without problems. This is the first time I have tested on Solaris x86, though, so it may have worked fine with earlier versions as well. It seems this is an issue with the SPARC architecture only (for Solaris anyway).

Romeo Theriault added a comment -

If there is anything else I can do to help move this ticket along please let me know. We'd love to be able to upgrade our zabbix agents on solaris to 2.x.

Thanks!

Romeo Theriault added a comment -

I was playing around with this a bit more and found a way to get it to run without crashing. On my Solaris SPARC boxes the compiler flags picked up by default (I'm using gcc 3.4.2) are "-g -O2" (debugging and optimization). I found that if I override these with:

export CFLAGS=""; ./configure --enable-agent

the resulting binary builds and runs fine. I've not yet narrowed down whether it is the debugging flag or the optimization flag that causes the crash. I'll play with it more later today and report back.

Romeo Theriault added a comment -

This appears to be related to the compiler optimizations. When I build with just the '-O2' compiler flag I still get the crash. When I build with '-O1' (fewer optimizations) I still get the crash. When I remove the compiler optimization flags altogether, the resulting binary seems to work fine.

Is building without compiler optimizations a reasonable workaround at this point? How much is the lack of these optimizations likely to affect the speed of the agent?

Thanks

Romeo Theriault added a comment -

I also just tested this with Sun's 'cc' compiler which used the following compiler flags:

CFLAGS="-xO3 -m32 -xarch=v8"

and the resulting binary works fine. So it looks like this is specific to gcc's optimizations. I'm not sure whether there are any other options to pass to gcc that might get it to work, but for my own purposes I'm going to go ahead and use Sun's C compiler to build my agent binaries.

Jairo Eduardo Lopez Fuentes Nacarino added a comment - - edited

Hello all,

I've been working on this bug as I have parties interested in the Zabbix agent working on Solaris 10.

I have been able to replicate all the issues posted here: the agent crashes when built with gcc at any optimization level, it works when built with gcc using only the -g flag, and it works when built with Oracle/Sun's cc compiler at any optimization level. All of this is exclusive to the SPARC architecture, using the Zabbix agent source code from version 2.0.3rc1.

I have been working on Solaris 10 10/08 s10s_u6wos_07b for SPARC, using gcc 3.4.3 (csl-sol210-3_4-branch+sol_rpath) on a Sun Fire V120 with an UltraSPARC-IIe 648MHz processor.

The error seems to occur when the SPARC processor executes the std instruction, a doubleword (8-byte) store, while updating structs, specifically in the update_cpu_counter function of src/zabbix_agent/cpustat.c. The offending structure seems to be the ZBX_COLLECTOR_DATA struct defined in src/zabbix_agent/stats.h, which is not suitably aligned in memory for the SPARC architecture.

When the agent is compiled without modifications, the size of the ZBX_COLLECTOR_DATA struct is 12 bytes, which is what leads to the SIGBUS when the std instruction is used.

We have apparently been able to fix the issue using two methods, neither of which we consider particularly pretty. We can pad the ZBX_COLLECTOR_DATA struct to a size of 16 bytes, either with a char between the ZBX_CPUS_STAT_DATA struct and the diskstat_shmid int or with any other 4-byte variable of choice. We can also force gcc to align the ZBX_COLLECTOR_DATA struct to 8 bytes using __attribute__((aligned(8))). We found that forcing the alignment on the ZBX_SINGLE_CPU_STAT_DATA and ZBX_CPUS_STAT_DATA structs also forces the alignment of the ZBX_COLLECTOR_DATA struct.
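
To make the padding workaround concrete, here is a minimal sketch in C. The type names (header_unpadded_t, header_padded_t, record_t) and their members are hypothetical stand-ins, not the real Zabbix definitions, and the sketch assumes a layout in which records with 64-bit members are stored directly after the header in one 8-byte-aligned memory block. It only illustrates why a 12-byte header leaves whatever follows it misaligned, while a 16-byte header does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real Zabbix structs. */

/* Header whose size (12 bytes) is not a multiple of 8. */
typedef struct
{
    uint32_t count;
    uint32_t flags;
    uint32_t shmid;
}
header_unpadded_t;

/* Same header with one explicit 4-byte pad, bringing the size to 16. */
typedef struct
{
    uint32_t count;
    uint32_t flags;
    uint32_t padding;   /* unused, only rounds the size up */
    uint32_t shmid;
}
header_padded_t;

/* Records with 64-bit members, which need 8-byte alignment on SPARC. */
typedef struct
{
    uint64_t counter[4];
}
record_t;

/* Returns 1 if a record placed 'offset' bytes after an 8-byte-aligned
 * base address would itself be 8-byte aligned, 0 otherwise. */
static int record_offset_is_aligned(size_t offset)
{
    return 0 == offset % sizeof(uint64_t);
}

/* Sanity checks for the layout assumptions made in this sketch. */
static void check_layout(void)
{
    assert(12 == sizeof(header_unpadded_t));
    assert(16 == sizeof(header_padded_t));
}
```

With these definitions, record_offset_is_aligned(sizeof(header_unpadded_t)) is 0 and record_offset_is_aligned(sizeof(header_padded_t)) is 1. The gcc-only alternative, __attribute__((aligned(8))) on the header type, would also round its size up to 16, at the cost of compiler-specific syntax.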

The issue might be resolved if we provided a simple memory alignment check before getting the shared memory for the agent, specifically in the function zbx_shmget defined in src/libs/zbxnix/ipc.c.
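
A check of the kind suggested here might look like the following sketch. ptr_is_aligned and align_up are hypothetical helper names and are not part of zbx_shmget or any Zabbix API; the sketch only assumes that alignments are non-zero powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if 'ptr' is aligned to 'alignment' bytes, 0 otherwise.
 * 'alignment' must be a non-zero power of two. */
static int ptr_is_aligned(const void *ptr, size_t alignment)
{
    assert(0 != alignment && 0 == (alignment & (alignment - 1)));

    return 0 == ((uintptr_t)ptr & (alignment - 1));
}

/* Rounds 'size' up to the next multiple of 'alignment'.
 * 'alignment' must be a non-zero power of two. */
static size_t align_up(size_t size, size_t alignment)
{
    assert(0 != alignment && 0 == (alignment & (alignment - 1)));

    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, the fault address from the first log, 0xfec0e4e4, is a multiple of 4 but not of 8, so ptr_is_aligned reports it as misaligned for an 8-byte store, and align_up(12, 8) yields 16.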

Since changing any memory alignment has implications depending on the architecture used, I have no real idea as to which way would be best. I am submitting my current workaround patches to help find a much nicer solution.

I thank everyone for their time and hope to get feedback.

richlv added a comment -

just a non-dev thinking out loud - shouldn't gcc avoid optimisations that result in crashes ?

Takanori Suzuki added a comment -

> shouldn't gcc avoid optimisations that result in crashes ?
No.
It's definitely a memory alignment problem.
Avoiding the crash by changing the optimization level is just luck,
because a misaligned access is undefined behavior in C.
Changing the optimization level is not a solution.

On SPARC, C developers have to take care of memory alignment,
because unaligned memory accesses cause a crash on SPARC.
The original ZBX_COLLECTOR_DATA structure does not take memory alignment into account.

On SPARC, if there is a SIGBUS crash, we should suspect a memory alignment problem.
x86 CPUs don't crash, because the CPU specification allows unaligned memory access.
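
As an illustration of the undefined behavior described here, the sketch below shows a portable way to read and write a 64-bit value at an arbitrary address by going through memcpy(). This is a different technique from the padding discussed in this issue and is not taken from the Zabbix source; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value from 'src', which may have any alignment.
 * memcpy() has no alignment requirement, so this is portable. */
static uint64_t load_u64_any_alignment(const unsigned char *src)
{
    uint64_t value;

    assert(NULL != src);
    memcpy(&value, src, sizeof(value));

    return value;
}

/* Writes a 64-bit value to 'dst', which may have any alignment. */
static void store_u64_any_alignment(unsigned char *dst, uint64_t value)
{
    assert(NULL != dst);
    memcpy(dst, &value, sizeof(value));
}

/* By contrast, '*(uint64_t *)(buf + 4) = value;' is undefined behavior
 * when buf + 4 is not suitably aligned: it may happen to work on x86
 * and raise SIGBUS on SPARC. */
```

A value stored with store_u64_any_alignment at, say, buf + 4 or buf + 1 reads back unchanged with load_u64_any_alignment on any platform, regardless of how strict the CPU is about alignment.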

richlv added a comment -

ah, cool, thanks for the info

Romeo Theriault added a comment -

Out of interest Takanori, does it work with Sun's 'cc' compiler because cc automatically detects these memory alignment issues and pads them?

Takanori Suzuki added a comment -

> Out of interest Takanori, does it work with Sun's 'cc' compiler because cc automatically detects these memory alignment issues and pad them?
That is also just luck.
The original ZBX_COLLECTOR_DATA structure can end up being 12 bytes with some compilers.
So a binary built with one compiler, such as Sun's 'cc', doesn't crash, while one built with another compiler, such as gcc, does.
We have to add padding to the structure to rule out this possibility with every compiler and so avoid the crash.

I think programs should not depend on the behavior of a particular compiler.
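
One way to stop a build from silently depending on a particular compiler's layout is a compile-time size check. The sketch below uses the classic negative-array-size trick, which also works with older compilers that lack C11 _Static_assert. COMPILE_TIME_CHECK, padded_header_t and size_is_multiple_of are hypothetical names for illustration, not Zabbix code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fails to compile (negative array size) when 'cond' is false. */
#define COMPILE_TIME_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/* Hypothetical header, padded by hand to a multiple of 8 bytes. */
typedef struct
{
    uint32_t count;
    uint32_t flags;
    uint32_t shmid;
    uint32_t padding;   /* unused, only rounds the size up */
}
padded_header_t;

/* The build breaks if any compiler lays the struct out differently. */
COMPILE_TIME_CHECK(padded_header_size_is_multiple_of_8,
        0 == sizeof(padded_header_t) % 8);

/* Run-time counterpart for sizes that are only known at run time. */
static int size_is_multiple_of(size_t size, size_t n)
{
    assert(0 != n);

    return 0 == size % n;
}
```

If the padding member were removed, the struct would shrink to 12 bytes and the COMPILE_TIME_CHECK line would be rejected by the compiler instead of producing a binary that crashes at run time.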

Jairo Eduardo Lopez Fuentes Nacarino added a comment -

The interesting thing is that the cc compiler doesn't use the SPARC std instruction for the offending function. That is just how the compiler has been designed.

By default Sun's cc compiler assumes at most an 8 byte alignment and raises a SIGBUS signal if the program tries to access misaligned data.

You can force the cc compiler to interpret the access to misaligned data while assuming at most an 8 byte alignment using the -xmemalign=8i flag but that is forcing the compiler to use information provided by the user.

This is actually equivalent to using __attribute__((aligned(8))) when defining the structs, since the macros involved are specifically for the gcc compiler.

I agree with Takanori that the error not being produced by Sun's cc compiler is mostly luck and think it would be nice to have a solution that is not compiler specific.

Arli added a comment - - edited

I encountered the same thing when trying to start 2.0.3 agent on HP-UX B.11.23, B.11.23.0812.076, compiled with cc.

 1394:20121004:134153.446 Starting Zabbix Agent [myserver.mydomain]. Zabbix 2.0.3 (revision 30485).
  1395:20121004:134153.450 agent #0 started [collector]
  1395:20121004:134153.450 Got signal [signal:10(SIGBUS),reason:1,refaddr:c2ec000c]. Crashing ...
  1395:20121004:134153.450 ====== Fatal information: ======
  1395:20121004:134153.450 program counter not available for this architecture
  1395:20121004:134153.450 === Registers: ===
  1395:20121004:134153.450 register dump not available for this architecture
  1395:20121004:134153.450 === Backtrace: ===
  1395:20121004:134153.450 backtrace not available for this platform
  1395:20121004:134153.450 === Memory map: ===
  1395:20121004:134153.450 memory map not available for this platform
  1395:20121004:134153.450 ================================
  1394:20121004:134153.451 One child process died (PID:1395,exitcode/signal:-1). Exiting ...
  1394:20121004:134155.459 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).
Oleksiy Zagorskyi added a comment - - edited

ZBX-5382 looks closely related; linked so that it is easy to notice.

Jeff Shingara added a comment -

Still encountering this issue on Solaris 10 with the 2.0.3 agent.

$
19965:20121022:092855.664 Starting Zabbix Agent [xxxxxxxx]. Zabbix 2.0.3 (revision 30485).
19966:20121022:092855.666 agent #0 started [collector]
19968:20121022:092855.666 agent #2 started [listener]
19967:20121022:092855.666 agent #1 started [listener]
19969:20121022:092855.667 agent #3 started [listener]
19970:20121022:092855.668 agent #4 started [listener]
19972:20121022:092855.669 agent #6 started [active checks]
19971:20121022:092855.668 agent #5 started [listener]
19966:20121022:092855.673 Got signal [signal:10(SIGBUS),reason:1,refaddr:fed0e4e4]. Crashing ...
19966:20121022:092855.673 ====== Fatal information: ======
19966:20121022:092855.673 program counter not available for this architecture
19966:20121022:092855.673 === Registers: ===
19966:20121022:092855.673 register dump not available for this architecture
19966:20121022:092855.673 === Backtrace: ===
19966:20121022:092855.673 backtrace not available for this platform
19966:20121022:092855.673 === Memory map: ===
19966:20121022:092855.674 memory map not available for this platform
19966:20121022:092855.674 ================================
19965:20121022:092855.675 One child process died (PID:19966,exitcode/signal:-1). Exiting ...
19965:20121022:092857.675 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).

$ uname -a
SunOS 5.10 Generic_147440-19 sun4u sparc SUNW,SPARC-Enterprise

Alexander Vladishev added a comment -

Similar issue: ZBX-5741

Andris Mednis added a comment -

Thanks for the valuable comments, and special thanks to Jairo and Takanori for explaining the root cause and proposing a solution!
At http://bytes.com/topic/c/answers/587942-isnt-time-there-standard-align-statement the commenter <artifact one at googlemail com> shows that memory alignment directives differ between the GCC, Sun C, Intel C, HP C and IBM XL compilers. It seems better to avoid compiler vendor-specific syntax in the Zabbix codebase.
I'm working on a solution where the padding required for 8-byte alignment is included in the Zabbix agent data structures.
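
For readers wondering what a vendor-neutral approach can look like in plain C, one option besides hand-written padding is a union with a 64-bit member. The sketch below is an assumption for illustration only and is not the fix that was committed for this issue; payload_t, aligned_payload_t and the probe types are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload that by itself needs only 4-byte alignment. */
typedef struct
{
    uint32_t count;
    uint32_t flags;
    uint32_t shmid;
}
payload_t;

/* A union is at least as strictly aligned as each of its members, and
 * its size is a multiple of its alignment. No compiler-specific
 * attribute or pragma is involved. */
typedef union
{
    payload_t payload;
    uint64_t  force_alignment;  /* never read, only influences layout */
}
aligned_payload_t;

/* Helper structs used to measure alignment requirements: the offset of
 * 'member' after a single char equals the alignment of its type. */
typedef struct { char c; uint64_t member; } probe_u64_t;
typedef struct { char c; aligned_payload_t member; } probe_payload_t;

#define ALIGN_OF_U64     offsetof(probe_u64_t, member)
#define ALIGN_OF_PAYLOAD offsetof(probe_payload_t, member)
```

Note that this only guarantees whatever alignment the platform ABI requires for uint64_t. Where that is 8 bytes, sizeof(aligned_payload_t) becomes 16 rather than 12; an ABI that aligns 64-bit integers to 4 bytes would leave the size at 12.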

Andris Mednis added a comment -

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-5289

Alexander Vladishev added a comment -

Great work! Successfully tested.

Andris Mednis added a comment -

Fixed in versions pre-2.0.4 rev. 31309 and pre-2.1.0 rev. 31312.

Paul Surgeon added a comment -

I can confirm that the fix also works for HP-UX 11.31 on Itanium2 using GCC.
I'm using the zabbix-2.0.4rc1 pre-release.

Andris Mednis added a comment -

Thanks, Paul!
Yesterday zabbix-2.0.4rc1 was released.

Gene Liverman added a comment -

Has anyone by chance already made some pre-compiled agents for Solaris 9 / 10 SPARC with the 2.0.4rc1 code?

Andris Mednis added a comment -

Version 2.0.4 was released on Dec 8. Pre-compiled agents for Solaris are expected in a few days at http://www.zabbix.com/download.php

Andris Mednis added a comment -

Version 2.0.4 pre-compiled agents are at http://www.zabbix.com/download.php

