ZABBIX BUGS AND ISSUES / ZBX-4823

Unreachable pollers may "hang" when doing IPMI checks


    • Type: Incident report
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects version: 1.8.11
    • Components: Proxy (P), Server (S)
    • Environment: CentOS 6.2 64bit

      I've got a Zabbix server hang, with the following backtrace (from gcore) and strace output.

      Backtrace

      Core was generated by `zabbix_server_pgsql'.
      #0  0x00007fed9cd15893 in __select_nocancel () from /lib64/libc.so.6
      Missing separate debuginfos, use: debuginfo-install zabbix-server-pgsql-1.8.11-0.x86_64
      (gdb) bt full
      #0  0x00007fed9cd15893 in __select_nocancel () from /lib64/libc.so.6
      No symbol table info available.
      #1  0x00007fed9d8c267d in ?? () from /usr/lib64/libOpenIPMIposix.so.0
      No symbol table info available.
      #2  0x00007fed9d8c2c5e in sel_select () from /usr/lib64/libOpenIPMIposix.so.0
      No symbol table info available.
      #3  0x00007fed9d8c057c in ?? () from /usr/lib64/libOpenIPMIposix.so.0
      No symbol table info available.
      #4  0x00000000004173e3 in init_ipmi_host ()
      No symbol table info available.
      #5  0x0000000000418412 in get_value_ipmi ()
      No symbol table info available.
      #6  0x000000000041a350 in get_values ()
      No symbol table info available.
      #7  0x000000000041ab4f in main_poller_loop ()
      No symbol table info available.
      #8  0x000000000041199d in MAIN_ZABBIX_ENTRY ()
      No symbol table info available.
      #9  0x0000000000440297 in daemon_start ()
      No symbol table info available.
      #10 0x00007fed9cc55cdd in __libc_start_main () from /lib64/libc.so.6
      No symbol table info available.
      #11 0x000000000040dd19 in _start ()
      No symbol table info available.
      (gdb) quit
      

      strace

           0.000000 select(1, [0], [], [], {7, 858908}) = -1 EBADF (Bad file descriptor)
           0.000077 select(1, [0], [], [], {7, 858825}) = -1 EBADF (Bad file descriptor)
           0.000042 select(1, [0], [], [], {7, 858782}) = -1 EBADF (Bad file descriptor)
           0.000041 select(1, [0], [], [], {7, 858741}) = -1 EBADF (Bad file descriptor)
           0.000074 select(1, [0], [], [], {7, 858674}) = -1 EBADF (Bad file descriptor)
           0.000050 select(1, [0], [], [], {7, 858617}) = -1 EBADF (Bad file descriptor)
           0.000041 select(1, [0], [], [], {7, 858575}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858536}) = -1 EBADF (Bad file descriptor)
           0.000038 select(1, [0], [], [], {7, 858498}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858459}) = -1 EBADF (Bad file descriptor)
           0.000064 select(1, [0], [], [], {7, 858395}) = -1 EBADF (Bad file descriptor)
           0.000040 select(1, [0], [], [], {7, 858355}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858316}) = -1 EBADF (Bad file descriptor)
           0.000038 select(1, [0], [], [], {7, 858278}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858239}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858199}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858161}) = -1 EBADF (Bad file descriptor)
           0.000039 select(1, [0], [], [], {7, 858121}) = -1 EBADF (Bad file descriptor)
           0.000040 select(1, [0], [], [], {7, 858082}) = -1 EBADF (Bad file descriptor)
      

      lsof tail

      zabbix_se 18214 zabbix  DEL    REG                0,4            753667 /SYSV7801030c
      zabbix_se 18214 zabbix  DEL    REG                0,4            884743 /SYSV5301030c
      zabbix_se 18214 zabbix    1w   REG                9,2 93959114   262695 /var/zabbix/zabbix_server.log
      zabbix_se 18214 zabbix    2w   REG                9,2 93959114   262695 /var/zabbix/zabbix_server.log
      zabbix_se 18214 zabbix    3w   REG                9,2        5   524521 /var/run/zabbix/zabbix.pid
      zabbix_se 18214 zabbix    4u  IPv4           57254714      0t0      TCP *:zabbix-trapper (LISTEN)
      zabbix_se 18214 zabbix    5u  unix 0xffff88010e921c80      0t0 57254932 socket
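
      The strace output shows select() on descriptor 0 failing with EBADF every few tens of microseconds, even though the timeout is about 7.86 seconds, and the lsof tail lists descriptors 1 to 5 but no descriptor 0. The following is a minimal Python sketch of that symptom only, not Zabbix or OpenIPMI code. It assumes a POSIX system; the function names are hypothetical, and the closed read end of a pipe stands in for descriptor 0 so the example does not touch the real stdin.

```python
import errno
import os
import select
import time


def select_errno_on_closed_fd(timeout=1.0):
    """Return the errno raised by select() when it watches a closed descriptor."""
    # Hypothetical stand-in for fd 0: open a pipe, then close its read end.
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        select.select([read_fd], [], [], timeout)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


def count_failed_selects(duration=0.05, timeout=7.0):
    """Retry select() on a closed descriptor for `duration` seconds.

    Each call fails at once instead of waiting for `timeout`, so a loop
    that simply retries on error spins and burns CPU.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    failures = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            try:
                select.select([read_fd], [], [], timeout)
            except OSError as exc:
                if exc.errno != errno.EBADF:
                    raise
                failures += 1
    finally:
        os.close(write_fd)
    return failures


if __name__ == "__main__":
    print("errno is EBADF:", select_errno_on_closed_fd() == errno.EBADF)
    print("failed select() calls in 50 ms:", count_failed_selects())
```

      If the real poller behaves the same way, the process would look hung while actually looping on a call that can never succeed; this is an interpretation of the traces above, not a confirmed root cause.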
      

      Here is the log tail for this PID

       18214:20120327:075815.715 server #33 started [unreachable poller #1]
      
       18214:20120402:145629.524 temporarily disabling Zabbix agent checks on host [xxx]: host unavailable
       18214:20120402:145936.829 IPMI item [Analog_Fan_RPM[FAN 1]] on host [xxx] failed: another network error, wait for 15 seconds
       18214:20120402:145939.831 IPMI item [Analog_Fan_RPM[Fan1]] on host [xxx] failed: another network error, wait for 15 seconds
       18214:20120402:145951.955 temporarily disabling IPMI checks on host [xxx]: host unavailable
       18214:20120402:145955.697 resuming IPMI checks on host [xxx]: connection restored
       18214:20120402:162901.379 Got signal [signal:15(SIGTERM),sender_pid:21889,sender_uid:0,reason:0]. Exiting ...
      

      Thanks,
      Alex

        1. full-backtrace.txt
          32 kB
          Alexander Vladishev
        2. strace-filtered.txt
          47 kB
          Oleksii Zagorskyi
        3. zabbix_server-filtered.log
          34 kB
          Oleksii Zagorskyi

            Assignee: Unassigned
            Reporter: Alex Vorona
            Votes: 2
            Watchers: 8
