ZABBIX BUGS AND ISSUES
ZBX-5289

2.0.1 agent on Solaris 10 throws "Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ..."

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.4rc1, 2.1.0
    • Component/s: Agent (G)
    • Labels:
    • Environment:
      SunOS nodename 5.10 Generic_127111-06 sun4v sparc SUNW,Sun-Fire-T1000

      Description

      This bug appears to be similar to ZBX-2634.

      $ CC=gcc CFLAGS=-O2 ./configure --prefix="/tmp/zabbix/agent" --enable-agent --enable-ipv6
      $ make
      $ make install

      23454:20120707:154526.434 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.1 (revision 28455).
      23455:20120707:154526.439 agent #0 started [collector]
      23456:20120707:154526.441 agent #1 started [listener]
      23457:20120707:154526.442 agent #2 started [listener]
      23458:20120707:154526.444 agent #3 started [listener]
      23455:20120707:154526.458 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
      23455:20120707:154526.458 ====== Fatal information: ======
      23455:20120707:154526.459 program counter not available for this architecture
      23455:20120707:154526.459 === Registers: ===
      23455:20120707:154526.459 register dump not available for this architecture
      23455:20120707:154526.460 === Backtrace: ===
      23455:20120707:154526.460 backtrace not available for this platform
      23455:20120707:154526.460 === Memory map: ===
      23455:20120707:154526.460 memory map not available for this platform
      23455:20120707:154526.461 ================================
      23454:20120707:154526.464 One child process died (PID:23455,exitcode/signal:-1). Exiting ...
      23454:20120707:154528.471 Zabbix Agent stopped. Zabbix 2.0.1 (revision 28455).

      1. zabbix_agentd_truss_output.txt
        17 kB
        Romeo Theriault
      2. zabbix_agentd_truss-f.log
        52 kB
        Pieter Vandevoorde
      3. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structforcealign1.patch
        0.4 kB
        Jairo Eduardo Lopez Fuentes Nacarino
      4. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structforcealign2.patch
        0.4 kB
        Jairo Eduardo Lopez Fuentes Nacarino
      5. zabbix-2.0.x-solaris10-SIGBUS-crash-ZBX-5289-structpad.patch
        0.6 kB
        Jairo Eduardo Lopez Fuentes Nacarino

          Activity

          Bruce Misc added a comment -

          I should have included debug level log data.

          23106:20120708:080050.609 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.1 (revision 28455).
          23106:20120708:080050.612 In init_collector_data()
          23106:20120708:080050.613 End of init_collector_data()
          23107:20120708:080050.615 agent #0 started [collector]
          23107:20120708:080050.616 In init_cpu_collector()
          23108:20120708:080050.617 agent #1 started [listener]
          23109:20120708:080050.618 agent #2 started [listener]
          23110:20120708:080050.620 agent #3 started [listener]
          23107:20120708:080050.630 End of init_cpu_collector():SUCCEED
          23107:20120708:080050.630 In update_cpustats()
          23107:20120708:080050.635 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
          23107:20120708:080050.635 ====== Fatal information: ======
          23107:20120708:080050.635 program counter not available for this architecture
          23107:20120708:080050.636 === Registers: ===
          23107:20120708:080050.636 register dump not available for this architecture
          23107:20120708:080050.636 === Backtrace: ===
          23107:20120708:080050.636 backtrace not available for this platform
          23107:20120708:080050.637 === Memory map: ===
          23107:20120708:080050.637 memory map not available for this platform
          23107:20120708:080050.637 ================================
          23106:20120708:080050.640 One child process died (PID:23107,exitcode/signal:-1). Exiting ...
          23106:20120708:080050.641 zbx_on_exit() called
          23108:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
          23109:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
          23110:20120708:080050.641 Got signal [signal:15(SIGTERM),sender_pid:23106,sender_uid:10098,reason:0]. Exiting ...
          23106:20120708:080052.648 Zabbix Agent stopped. Zabbix 2.0.1 (revision 28455).

          Romeo Theriault added a comment - edited

          I am also seeing the exact same issue on Solaris 9 with v2.0.1. I haven't tried Solaris 10 yet, but from the above I'm guessing I'll see the same thing.

          Romeo Theriault added a comment -

          This is the output of truss on the zabbix_agentd daemon (v2.0.1) when trying to start it on Solaris 9 (attached as zabbix_agentd_truss_output.txt).

          Tomasz Zielinski added a comment -

          Same problem on 2.0.2. Please do something.

          Alexei Vladishev added a comment -

          Please test the latest nightly build and report back.

          Romeo Theriault added a comment -

          On Solaris 9 (SPARC) I am still seeing the issue:

          bash-2.05# uname -a
          SunOS epf01 5.9 Generic_118558-13 sun4u sparc SUNW,Sun-Fire-V240
          
            2624:20120907:113054.538 Starting Zabbix Agent [epf01]. Zabbix 2.0.3rc1 (revision 30147).
            2625:20120907:113054.539 agent #0 started [collector]
            2626:20120907:113054.540 agent #1 started [listener]
            2627:20120907:113054.542 agent #2 started [listener]
            2625:20120907:113054.543 Got signal [signal:10(SIGBUS),reason:1,refaddr:feebe4e4]. Crashing ...
            2625:20120907:113054.543 ====== Fatal information: ======
            2628:20120907:113054.543 agent #3 started [listener]
            2625:20120907:113054.544 program counter not available for this architecture
            2625:20120907:113054.544 === Registers: ===
            2625:20120907:113054.544 register dump not available for this architecture
            2625:20120907:113054.544 === Backtrace: ===
            2625:20120907:113054.544 backtrace not available for this platform
            2625:20120907:113054.544 === Memory map: ===
            2625:20120907:113054.544 memory map not available for this platform
            2625:20120907:113054.544 ================================
            2629:20120907:113054.545 agent #4 started [active checks]
            2624:20120907:113054.545 One child process died (PID:2625,exitcode/signal:-1). Exiting ...
            2624:20120907:113056.541 Zabbix Agent stopped. Zabbix 2.0.3rc1 (revision 30147).
          

          I can test on Solaris 10 (SPARC) if you want.

          Thanks.

          Alexei Vladishev added a comment -

          Please test on Solaris 10. Thanks for your help.

          Romeo Theriault added a comment - edited

          No problem, glad I can help. The problem seems to be the same on Solaris 10 (SPARC); see the output below. I'll try to test this on Solaris 10 (x64) later today and report back on whether this is a SPARC-only issue.

          $ uname -a
          SunOS t2k10 5.10 Generic_127111-03 sun4v sparc SUNW,Sun-Fire-T200
          
          29562:20120910:101002.902 Starting Zabbix Agent [Zabbix server]. Zabbix 2.0.3rc1 (revision 30147).
          29563:20120910:101002.906 agent #0 started [collector]
          29564:20120910:101002.907 agent #1 started [listener]
          29565:20120910:101002.909 agent #2 started [listener]
          29566:20120910:101002.911 agent #3 started [listener]
          29567:20120910:101002.913 agent #4 started [active checks]
          29563:20120910:101002.927 Got signal [signal:10(SIGBUS),reason:1,refaddr:fec0e4e4]. Crashing ...
          29563:20120910:101002.927 ====== Fatal information: ======
          29563:20120910:101002.927 program counter not available for this architecture
          29563:20120910:101002.927 === Registers: ===
          29563:20120910:101002.927 register dump not available for this architecture
          29563:20120910:101002.927 === Backtrace: ===
          29563:20120910:101002.928 backtrace not available for this platform
          29563:20120910:101002.928 === Memory map: ===
          29563:20120910:101002.928 memory map not available for this platform
          29563:20120910:101002.928 ================================
          29562:20120910:101003.270 One child process died (PID:29563,exitcode/signal:-1). Exiting ...
          29562:20120910:101005.275 Zabbix Agent stopped. Zabbix 2.0.3rc1 (revision 30147).
          
          Romeo Theriault added a comment -

          I tested this version on Solaris 10 x86 (64-bit) and it works fine: it starts up and runs without problems. This is the first time I have tested on Solaris x86, though, so earlier versions may have worked as well. It seems this is a SPARC-only issue (on Solaris, at least).

          Romeo Theriault added a comment -

          If there is anything else I can do to help move this ticket along, please let me know. We'd love to be able to upgrade our Zabbix agents on Solaris to 2.x.

          Thanks!

          Romeo Theriault added a comment -

          I was playing around with this a bit more and found how to get it to run without crashing. On my Solaris SPARC boxes the default compiler flags picked up (I'm using gcc 3.4.2) are "-g -O2" (debugging and optimization). I found that if I override these with:

          export CFLAGS=""; ./configure --enable-agent

          the resulting binary builds and runs fine. I haven't yet narrowed down whether the debugging flag or the optimization flag is causing the crash. I'll play with it more later today and report back.

          Romeo Theriault added a comment -

          This appears to be related to the compiler optimizations. When I build with just the '-O2' flag, I still get the crash. Building with '-O1' (fewer optimizations) also crashes. When I remove the optimization flags entirely, the resulting binary seems to work fine.

          Is building without the compiler optimizations a reasonable workaround at this point? How much is the lack of these optimizations likely to affect the speed of the agent?

          Thanks

          Romeo Theriault added a comment -

          I also just tested this with Sun's 'cc' compiler, which used the following flags:

          CFLAGS="-xO3 -m32 -xarch=v8"


          The resulting binary works fine, so this looks specific to gcc's optimizations. I'm not sure whether there are other gcc options that would make it work, but for my own purposes I'm going to use Sun's C compiler to build my agent binaries.

          Jairo Eduardo Lopez Fuentes Nacarino added a comment - edited

          Hello all,

          I've been working on this bug because I have parties interested in getting the Zabbix agent working on Solaris 10.

          I have been able to reproduce every behavior reported in this issue: the agent crashes when built with gcc at any optimization level, works when built with gcc and only the -g flag, and builds and runs correctly with Oracle/Sun's cc compiler at any optimization level. All of this occurs exclusively on the SPARC architecture, using the Zabbix agent source code from version 2.0.3rc1.

          I have been working on Solaris 10 10/08 s10s_u6wos_07b for SPARC, using gcc 3.4.3 (csl-sol210-3_4-branch+sol_rpath) on a Sun Fire V120 with an UltraSPARC-IIe 648MHz processor.

          The error seems to occur when the SPARC processor executes the std (store doubleword) instruction while updating structs, specifically in the update_cpu_counter function of src/zabbix_agent/cpustat.c; std requires an 8-byte-aligned address. The offending structure seems to be ZBX_COLLECTOR_DATA, defined in src/zabbix_agent/stats.h, which is not aligned as the SPARC architecture requires.

          When the agent is compiled without modifications, ZBX_COLLECTOR_DATA is 12 bytes in size, and this is what leads to the SIGBUS when the std instruction is used.
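A minimal C sketch of the size arithmetic above. It models ZBX_COLLECTOR_DATA on 32-bit SPARC as three 4-byte fields (a pointer and an int for the CPU-stats part, then diskstat_shmid), with uint32_t standing in for the pointer. The type and helper names are hypothetical. The assumption that 64-bit data sits directly after the 12-byte header in the same segment is ours and is not confirmed in this thread. Consistent with that picture, every SPARC refaddr reported in this issue (fec0e4e4, feebe4e4, fed0e4e4) is 4 modulo 8. The logged reason:1 also appears to be the SIGBUS si_code BUS_ADRALN (invalid address alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of ZBX_COLLECTOR_DATA on 32-bit SPARC. uint32_t stands
 * in for the 4-byte pointer so the model is 12 bytes on any common host.
 * Field names are illustrative, not copied from Zabbix. */
typedef struct
{
	uint32_t	cpu_ptr;	/* stand-in for a 32-bit pointer */
	int32_t		count;
	int32_t		diskstat_shmid;
}
collector_model_t;

/* Offset of data placed directly after the header, assuming (hypothetically)
 * that the shared memory segment itself starts on an 8-byte boundary. */
size_t	offset_after_header(void)
{
	return sizeof(collector_model_t);
}

/* SPARC std (store doubleword) needs an 8-byte-aligned address; any other
 * address traps, and the process receives SIGBUS. */
int	std_would_trap(uintptr_t addr)
{
	return 0 != addr % 8;
}

/* The 12-byte header leaves anything after it 4 bytes past an 8-byte boundary. */
void	check_model(void)
{
	assert(12 == sizeof(collector_model_t));
	assert(4 == offset_after_header() % 8);
	assert(std_would_trap(offset_after_header()));
}
```

For example, check_model() passes on any host, and std_would_trap(0xfec0e4e4) returns 1 for the address from the first crash log.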

          We have apparently been able to fix the issue in two ways, neither of which we consider particularly pretty. We can pad the ZBX_COLLECTOR_DATA struct to a size of 16, either with a char between the ZBX_CPUS_STAT_DATA struct and the diskstat_shmid int or with any other 4-byte variable of choice. Alternatively, we can force gcc to align the ZBX_COLLECTOR_DATA struct to 8 bytes using __attribute__((aligned(8))). We found that forcing this alignment on the ZBX_SINGLE_CPU_STAT_DATA and ZBX_CPUS_STAT_DATA structs also forces the alignment of ZBX_COLLECTOR_DATA.
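The two workarounds can be sketched on a hypothetical 32-bit model of the struct, with uint32_t in place of the pointer. The names are illustrative, and this is not the content of the attached patches. The padded variant is plain C. The attribute variants are gcc extensions and are guarded with __GNUC__. The last type shows why aligning an inner struct is enough: a struct's alignment is the largest alignment among its members, and its size is rounded up to a multiple of that alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Workaround 1: a padding char between the CPU-stats part and
 * diskstat_shmid moves diskstat_shmid to offset 12; total size 16. */
typedef struct
{
	uint32_t	cpu_ptr;	/* stand-in for a 32-bit pointer */
	int32_t		count;
	char		pad;		/* hypothetical padding member */
	int32_t		diskstat_shmid;
}
collector_padded_t;

#if defined(__GNUC__)
/* Workaround 2 (gcc only): force 8-byte alignment on the outer struct;
 * its size is rounded up from 12 to 16. */
typedef struct
{
	uint32_t	cpu_ptr;
	int32_t		count;
	int32_t		diskstat_shmid;
}
__attribute__((aligned(8)))
collector_aligned_t;

/* Variant: align only the inner CPU-stats struct; the outer struct
 * inherits 8-byte alignment and is again 16 bytes. */
typedef struct
{
	uint32_t	cpu_ptr;
	int32_t		count;
}
__attribute__((aligned(8)))
cpus_aligned_t;

typedef struct
{
	cpus_aligned_t	cpus;
	int32_t		diskstat_shmid;
}
collector_inherits_t;
#endif

/* Checks the sizes claimed in the comments above. */
void	check_workarounds(void)
{
	assert(16 == sizeof(collector_padded_t));
	assert(12 == offsetof(collector_padded_t, diskstat_shmid));
#if defined(__GNUC__)
	assert(16 == sizeof(collector_aligned_t));
	assert(16 == sizeof(collector_inherits_t));
#endif
}
```

Note that the padded struct keeps 4-byte alignment. It only helps because its size of 16 keeps whatever follows it on an 8-byte boundary.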

          The issue might also be resolved by adding a simple memory alignment check before the agent's shared memory is allocated, specifically in the zbx_shmget function defined in src/libs/zbxnix/ipc.c.
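One way such a check could look is sketched below. The approach rounds the header size up to the alignment the following data needs before computing the segment size, then verifies the address after attaching. align_up, segment_size and is_aligned are hypothetical helpers, not Zabbix functions. We have not checked how they would best fit into zbx_shmget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to a multiple of align (align must be a power of two). */
size_t	align_up(size_t n, size_t align)
{
	assert(0 != align && 0 == (align & (align - 1)));
	return (n + align - 1) & ~(align - 1);
}

/* Size of a segment holding a header followed by `records` records that
 * contain 64-bit fields. The array offset is rounded up to 8, so the array
 * stays 8-byte aligned whatever size the compiler gives the header. */
size_t	segment_size(size_t header_size, size_t record_size, size_t records,
		size_t *array_offset)
{
	*array_offset = align_up(header_size, 8);
	return *array_offset + record_size * records;
}

/* Run-time check that could follow attaching the segment. */
int	is_aligned(const void *p, size_t align)
{
	return 0 == (uintptr_t)p % align;
}
```

With a 12-byte header, the array offset becomes 16 instead of 12. Four bytes per segment is a small price for not depending on the compiler's struct layout.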

          Since changing memory alignment has implications that depend on the architecture, I am not sure which approach would be best. I am submitting my current workaround patches to help find a nicer solution.

          I thank everyone for their time and hope to get feedback.

          richlv added a comment -

          Just a non-dev thinking out loud: shouldn't gcc avoid optimisations that result in crashes?

          Takanori Suzuki added a comment -

          > shouldn't gcc avoid optimisations that result in crashes?
          No.
          It's definitely a memory alignment problem.
          Avoiding the crash by changing the optimization level is just luck,
          because unaligned access is undefined behavior in C.
          Changing optimization is not a solution.

          On SPARC, C developers have to take care of memory alignment,
          because unaligned memory access causes a crash on SPARC.
          The original ZBX_COLLECTOR_DATA structure does not take memory alignment into account.

          On SPARC, a SIGBUS crash should make us think of a memory alignment problem.
          x86 CPUs don't crash, because the x86 architecture allows unaligned memory access.
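To illustrate the "undefined behavior" point: dereferencing a uint64_t pointer that is not suitably aligned is undefined in C. On SPARC it can end in exactly this kind of SIGBUS, while x86 usually tolerates it. When data genuinely may be misaligned, the usual portable C technique is to copy through memcpy(), which is valid for any address. This is a general sketch with hypothetical helper names, not the fix adopted for this issue. Fixing the struct layout remains the direct remedy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an address of any alignment. memcpy() is
 * defined for any address; the compiler emits access instructions that
 * are safe for the target (possibly several smaller loads). */
uint64_t	load_u64(const void *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Write a 64-bit value to an address of any alignment. */
void	store_u64(void *p, uint64_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Round trip through an address 4 bytes into the buffer: the same
 * "4 past an 8-byte boundary" situation whenever the union is 8-aligned. */
void	demo_misaligned_roundtrip(void)
{
	union
	{
		uint64_t	force_alignment;
		unsigned char	bytes[16];
	}
	buf;

	store_u64(buf.bytes + 4, UINT64_C(0x0123456789abcdef));
	assert(UINT64_C(0x0123456789abcdef) == load_u64(buf.bytes + 4));
}
```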

          richlv added a comment -

          ah, cool, thanks for the info

          Romeo Theriault added a comment -

          Out of interest, Takanori: does it work with Sun's 'cc' compiler because cc automatically detects these memory alignment issues and pads them?

          Takanori Suzuki added a comment -

          > Out of interest, Takanori: does it work with Sun's 'cc' compiler because cc automatically detects these memory alignment issues and pads them?
          That is also just luck.
          The original ZBX_COLLECTOR_DATA structure can end up 12 bytes in size with some compilers.
          So some compilers, like Sun's 'cc', don't crash, and others, like gcc, do.
          We have to add padding to the structure to remove that possibility with every compiler and avoid the crash.

          I think programs should not depend on the behavior of a particular compiler.
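A compiler-independent version of that idea can be sketched as follows. Pad the struct explicitly and add a compile-time check, so that any compiler producing a size that is not a multiple of 8 fails the build instead of crashing at run time. The model fields and names are hypothetical. The negative-array-size check is a common pre-C11 idiom; C11 code could use _Static_assert instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the collector header, padded by hand. */
typedef struct
{
	uint32_t	cpu_ptr;	/* stand-in for a 32-bit pointer */
	int32_t		count;
	int32_t		diskstat_shmid;
	int32_t		reserved;	/* explicit pad: 12 -> 16 bytes */
}
collector_portable_t;

/* Pre-C11 compile-time assertion: the array size is -1, and compilation
 * fails, unless the header size is a multiple of 8. */
typedef char	collector_size_check_t[0 == sizeof(collector_portable_t) % 8 ? 1 : -1];

/* Run-time view of the same layout, for tests or debug logging. */
void	check_portable_layout(void)
{
	assert(16 == sizeof(collector_portable_t));
	assert(12 == offsetof(collector_portable_t, reserved));
}
```

The compile-time check matters more than the pad itself. If a later change to the struct breaks the multiple-of-8 property on some compiler, the build stops there rather than on a SPARC host at startup.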

          Jairo Eduardo Lopez Fuentes Nacarino added a comment -

          The interesting thing is that the cc compiler doesn't use the SPARC std instruction in the offending function. That is simply how the compiler was designed.

          By default, Sun's cc compiler assumes at most 8-byte alignment and raises a SIGBUS signal if the program tries to access misaligned data.

          You can make cc interpret (handle) accesses to misaligned data, while still assuming at most 8-byte alignment, with the -xmemalign=8i flag, but that makes the compiler rely on information provided by the user.

          This is comparable to using __attribute__((aligned(8))) when defining the structs: both are compiler-specific, since that attribute is a gcc extension.

          I agree with Takanori that Sun's cc not producing the error is mostly luck, and I think it would be better to have a solution that is not compiler-specific.

          Arli added a comment - edited

          I encountered the same thing when trying to start the 2.0.3 agent on HP-UX B.11.23 (B.11.23.0812.076), compiled with cc.

           1394:20121004:134153.446 Starting Zabbix Agent [myserver.mydomain]. Zabbix 2.0.3 (revision 30485).
            1395:20121004:134153.450 agent #0 started [collector]
            1395:20121004:134153.450 Got signal [signal:10(SIGBUS),reason:1,refaddr:c2ec000c]. Crashing ...
            1395:20121004:134153.450 ====== Fatal information: ======
            1395:20121004:134153.450 program counter not available for this architecture
            1395:20121004:134153.450 === Registers: ===
            1395:20121004:134153.450 register dump not available for this architecture
            1395:20121004:134153.450 === Backtrace: ===
            1395:20121004:134153.450 backtrace not available for this platform
            1395:20121004:134153.450 === Memory map: ===
            1395:20121004:134153.450 memory map not available for this platform
            1395:20121004:134153.450 ================================
            1394:20121004:134153.451 One child process died (PID:1395,exitcode/signal:-1). Exiting ...
            1394:20121004:134155.459 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).
          
          Oleksiy Zagorskyi added a comment - edited

          ZBX-5382 looks closely related; linked it so it is easy to notice.

          Jeff Shingara added a comment -

          Still encountering this issue on Solaris 10 with the 2.0.3 agent:

          $
          19965:20121022:092855.664 Starting Zabbix Agent [xxxxxxxx]. Zabbix 2.0.3 (revision 30485).
          19966:20121022:092855.666 agent #0 started [collector]
          19968:20121022:092855.666 agent #2 started [listener]
          19967:20121022:092855.666 agent #1 started [listener]
          19969:20121022:092855.667 agent #3 started [listener]
          19970:20121022:092855.668 agent #4 started [listener]
          19972:20121022:092855.669 agent #6 started [active checks]
          19971:20121022:092855.668 agent #5 started [listener]
          19966:20121022:092855.673 Got signal [signal:10(SIGBUS),reason:1,refaddr:fed0e4e4]. Crashing ...
          19966:20121022:092855.673 ====== Fatal information: ======
          19966:20121022:092855.673 program counter not available for this architecture
          19966:20121022:092855.673 === Registers: ===
          19966:20121022:092855.673 register dump not available for this architecture
          19966:20121022:092855.673 === Backtrace: ===
          19966:20121022:092855.673 backtrace not available for this platform
          19966:20121022:092855.673 === Memory map: ===
          19966:20121022:092855.674 memory map not available for this platform
          19966:20121022:092855.674 ================================
          19965:20121022:092855.675 One child process died (PID:19966,exitcode/signal:-1). Exiting ...
          19965:20121022:092857.675 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).

          $ uname -a
          SunOS 5.10 Generic_147440-19 sun4u sparc SUNW,SPARC-Enterprise

          Alexander Vladishev added a comment -

          Similar issue: ZBX-5741

          Andris Mednis added a comment -

          Thanks for the valuable comments, and special thanks to Jairo and Takanori for explaining the root cause and proposing a solution!
          At http://bytes.com/topic/c/answers/587942-isnt-time-there-standard-align-statement the commenter <artifact one at googlemail com> shows that memory alignment directives differ between the GCC, Sun C, Intel C, HP C and IBM XL compilers. It seems better to avoid compiler vendor-specific syntax in the Zabbix codebase.
          I'm working on a solution that builds the padding required for 8-byte alignment into the Zabbix agent data structures themselves.

          Andris Mednis added a comment -

          Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-5289

          Alexander Vladishev added a comment -

          Great work! Successfully tested.

          Andris Mednis added a comment -

          Fixed in versions pre-2.0.4 rev. 31309 and pre-2.1.0 rev. 31312.

          Paul Surgeon added a comment -

          I can confirm that the fix also works for HP-UX 11.31 on Itanium2 using GCC.
          I'm using the zabbix-2.0.4rc1 pre-release.

          Andris Mednis added a comment -

          Thanks, Paul!
          Yesterday zabbix-2.0.4rc1 was released.

          Gene Liverman added a comment -

          Has anyone by chance already made some pre-compiled agents for Solaris 9 / 10 SPARC with the 2.0.4rc1 code?

          Andris Mednis added a comment -

          Version 2.0.4 was released on Dec 8. Pre-compiled agents for Solaris are expected in a few days at http://www.zabbix.com/download.php

          Andris Mednis added a comment -

          Version 2.0.4 pre-compiled agents are at http://www.zabbix.com/download.php


            People

            • Assignee:
              Unassigned
              Reporter:
              Bruce Misc
            • Votes:
              5
              Watchers:
              11

              Dates

              • Created:
                Updated:
                Resolved: