[ZBX-10656] One child process died Created: 2016 Apr 12 Updated: 2018 Jan 02 Resolved: 2017 Nov 02 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 3.0.1, 3.0.2 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Critical |
Reporter: | Biryukov Timofey | Assignee: | Unassigned |
Resolution: | Cannot Reproduce | Votes: | 1 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
FreeBSD 10.1 |
Description |
872:20160412:171816.128 Starting Zabbix Server. Zabbix 3.0.1 (revision 58734).
872:20160412:171816.128 ****** Enabled features ******
872:20160412:171816.128 SNMP monitoring: YES
872:20160412:171816.128 IPMI monitoring: YES
872:20160412:171816.128 Web monitoring: YES
872:20160412:171816.128 VMware monitoring: NO
872:20160412:171816.128 SMTP authentication: YES
872:20160412:171816.128 Jabber notifications: YES
872:20160412:171816.128 Ez Texting notifications: YES
872:20160412:171816.128 ODBC: YES
872:20160412:171816.128 SSH2 support: YES
872:20160412:171816.128 IPv6 support: YES
872:20160412:171816.128 TLS support: YES
872:20160412:171816.128 ******************************
872:20160412:171816.128 using configuration file: /usr/local/etc/zabbix3/zabbix_server.conf
872:20160412:171816.143 current database version (mandatory/optional): 03000000/03000000
872:20160412:171816.143 required mandatory version: 03000000
872:20160412:171816.157 server #0 started [main process]
873:20160412:171816.164 server #1 started [configuration syncer #1]
874:20160412:171816.164 server #2 started [db watchdog #1]
881:20160412:171816.165 server #9 started [trapper #1]
882:20160412:171816.168 server #10 started [trapper #2]
883:20160412:171816.169 server #11 started [trapper #3]
884:20160412:171816.170 server #12 started [trapper #4]
885:20160412:171816.172 server #13 started [trapper #5]
887:20160412:171816.174 server #15 started [alerter #1]
888:20160412:171816.178 server #16 started [housekeeper #1]
889:20160412:171816.178 server #17 started [timer #1]
890:20160412:171816.178 server #18 started [http poller #1]
892:20160412:171816.180 server #20 started [history syncer #1]
893:20160412:171816.180 server #21 started [history syncer #2]
894:20160412:171816.180 server #22 started [history syncer #3]
895:20160412:171816.181 server #23 started [history syncer #4]
896:20160412:171816.181 server #24 started [escalator #1]
897:20160412:171816.181 server #25 started [proxy poller #1]
872:20160412:171816.188 One child process died (PID:875,exitcode/signal:4). Exiting ...
898:20160412:171816.189 server #26 started [self-monitoring #1]
872:20160412:171818.217 syncing history data...
872:20160412:171818.217 syncing history data done
872:20160412:171818.217 syncing trends data...
872:20160412:171818.218 syncing trends data done
872:20160412:171818.218 Zabbix Server stopped. Zabbix 3.0.1 (revision 58734). |
Comments |
Comment by Andris Mednis [ 2016 Apr 12 ] |
Looks like a poller process exits. Can you set DebugLevel=4 in the server config file and try again? Are you using encryption (e.g. is a certificate configured for the server)? |
Comment by Biryukov Timofey [ 2016 Apr 13 ] |
Encryption is not used or configured. |
Comment by Andris Mednis [ 2016 Apr 13 ] |
Thanks for the log file! What are the "StartPollers" and "StartPollersUnreachable" parameter values in the server config file? |
Comment by Biryukov Timofey [ 2016 Apr 13 ] |
Everything related to pollers and trappers is commented out:
############ ADVANCED PARAMETERS ################

### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5

### Option: StartIPMIPollers
# Number of pre-forked instances of IPMI pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartIPMIPollers=0

### Option: StartPollersUnreachable
# Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
# At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
# are started.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollersUnreachable=1

### Option: StartTrappers
# Number of pre-forked instances of trappers.
# Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
# At least one trapper process must be running to display server availability and view queue
# in the frontend.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartTrappers=5

### Option: StartPingers
# Number of pre-forked instances of ICMP pingers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPingers=1 |
Comment by Aleksandrs Saveljevs [ 2016 Apr 13 ] |
The suspicious part is "4" in the following line:
872:20160412:171816.188 One child process died (PID:875,exitcode/signal:4). Exiting ...
If that is meant to indicate a signal, then that signal is SIGILL (illegal instruction). What architecture are you running on? How did you install Zabbix - from packages or from sources? Did you compile Zabbix on the same machine? What ./configure command was used to build Zabbix? |
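The reading of "4" as SIGILL can be checked with a few lines of Python. This is only a minimal sketch: it assumes the number in the log line is a signal and not an exit code (the field is labelled "exitcode/signal" and does not say which), and `describe_child_death` is a hypothetical helper name, not part of Zabbix.

```python
import re
import signal

# Log line copied from the description above.
LINE = ("872:20160412:171816.188 One child process died "
        "(PID:875,exitcode/signal:4). Exiting ...")


def describe_child_death(line):
    """Return (pid, number, signal_name) for a 'One child process died' line.

    Assumption: the number is treated as a signal number. The log field is
    labelled "exitcode/signal", so it may equally be a plain exit code.
    """
    m = re.search(r"PID:(\d+),exitcode/signal:(\d+)", line)
    if m is None:
        return None
    pid, num = int(m.group(1)), int(m.group(2))
    try:
        name = signal.Signals(num).name
    except ValueError:
        name = None  # not a known signal number on this platform
    return pid, num, name


print(describe_child_death(LINE))
```

On FreeBSD and Linux signal number 4 is SIGILL, so the helper reports PID 875 together with that name.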
Comment by Biryukov Timofey [ 2016 Apr 13 ] |
FreeBSD 10.1 i386. The installation was done as follows: Next, I corrected the configuration files and installed zabbix3-frontend. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 19 ] |
Observation. On both occasions the following processes did not report that they had started: #3-7 (5 pollers), #8 (unreachable poller), #14 (pinger), #19 (discoverer). The processes that did not report "Got signal..." are the 5 pollers, the unreachable poller, the pinger and the main process. Note that the discoverer did report getting the signal! Pollers, unreachable pollers and discoverers are the processes that perform init_snmp(), and in the discoverer code init_snmp() comes before logging that the process has started.
Hypothesis: the pollers are busy logging at first and the pinger is queueing behind them. They all blocked user signals before acquiring log access, which is why there are no "Got signal..." messages from them. The discoverer gets to init_snmp() first, and while it is there the first poller that finishes logging performs init_snmp() too, and here something bad happens. We don't get to see its message, possibly because it dies immediately. Unfortunately, we can't dig deeper in this direction without recompiling Zabbix server because we have:
# define SNMP_NO_DEBUGGING /* disabling debugging messages from Net-SNMP library */
Dear tbiryukov, please check your NetSNMP library. |
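The list of processes that never reported their start can be reproduced from the log in the description. The following is a rough sketch, not a Zabbix tool: `missing_process_numbers` is a hypothetical helper, and it can only detect gaps below the highest process number that did log a "started" line.

```python
import re

# "started" lines copied from the server log in the description.
LOG = """\
872:20160412:171816.157 server #0 started [main process]
873:20160412:171816.164 server #1 started [configuration syncer #1]
874:20160412:171816.164 server #2 started [db watchdog #1]
881:20160412:171816.165 server #9 started [trapper #1]
882:20160412:171816.168 server #10 started [trapper #2]
883:20160412:171816.169 server #11 started [trapper #3]
884:20160412:171816.170 server #12 started [trapper #4]
885:20160412:171816.172 server #13 started [trapper #5]
887:20160412:171816.174 server #15 started [alerter #1]
888:20160412:171816.178 server #16 started [housekeeper #1]
889:20160412:171816.178 server #17 started [timer #1]
890:20160412:171816.178 server #18 started [http poller #1]
892:20160412:171816.180 server #20 started [history syncer #1]
893:20160412:171816.180 server #21 started [history syncer #2]
894:20160412:171816.180 server #22 started [history syncer #3]
895:20160412:171816.181 server #23 started [history syncer #4]
896:20160412:171816.181 server #24 started [escalator #1]
897:20160412:171816.181 server #25 started [proxy poller #1]
898:20160412:171816.189 server #26 started [self-monitoring #1]
"""

# Matches both "server #N started [" and "proxy #N started [" lines.
STARTED = re.compile(r"(?:server|proxy) #(\d+) started \[")


def missing_process_numbers(log_text):
    """Return process numbers below the highest seen that never logged a start."""
    seen = {int(n) for n in STARTED.findall(log_text)}
    if not seen:
        return []
    return [n for n in range(max(seen) + 1) if n not in seen]


print(missing_process_numbers(LOG))  # [3, 4, 5, 6, 7, 8, 14, 19]
```

The gaps match the process numbers listed in the observation. In this particular log the PID of every started process equals 872 plus its process number, so PID 875 would correspond to #3 if that pattern also holds for the processes that never logged; this is an inference from the log, not something the log states.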
Comment by Merlin [ 2016 Apr 26 ] |
I have exactly the same issue: The only way to start the server is to set: |
Comment by Merlin [ 2016 May 01 ] |
Zabbix version 2.4.8 works fine on the same system. Also I compiled Zabbix 3 without net-snmp - still no success. |
Comment by Serg [ 2016 May 14 ] |
zabbix_proxy (Zabbix) 3.0.2
FreeBSD 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 22:51:51 UTC 2014 i386
4282:20160514:160942.259 End of DCsync_configuration()
4282:20160514:160942.260 proxy #0 started [main process]
4285:20160514:160942.263 proxy #3 started [trapper #1]
4286:20160514:160942.264 proxy #4 started [trapper #2]
4287:20160514:160942.264 proxy #5 started [trapper #3]
4288:20160514:160942.264 proxy #6 started [trapper #4]
4289:20160514:160942.264 proxy #7 started [trapper #5]
4282:20160514:160942.265 One child process died (PID:4283,exitcode/signal:4). Exiting ...
4285:20160514:160942.265 In zbx_tls_init_child()
4286:20160514:160942.265 In zbx_tls_init_child()
4287:20160514:160942.268 In zbx_tls_init_child()
4288:20160514:160942.270 In zbx_tls_init_child()
4289:20160514:160942.271 In zbx_tls_init_child()
4282:20160514:160942.274 zbx_on_exit() called
...
...
...
4285:20160514:160942.279 Got signal [signal:15(SIGTERM),sender_pid:4282,sender_uid:122,reason:65537]. Exiting ...
4286:20160514:160942.279 Got signal [signal:15(SIGTERM),sender_pid:4282,sender_uid:122,reason:65537]. Exiting ...
4288:20160514:160942.279 Got signal [signal:15(SIGTERM),sender_pid:4282,sender_uid:122,reason:65537]. Exiting ...
4287:20160514:160942.280 Got signal [signal:15(SIGTERM),sender_pid:4282,sender_uid:122,reason:65537]. Exiting ...
...
...
...
4282:20160514:160944.306 Zabbix Proxy stopped. Zabbix 3.0.2 (revision 59540).
zabbix_proxy [4289]: [file:'log.c',line:294] lock failed: [22] Invalid argument
UPD: |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 15 ] |
Dear sapzxc, for completeness of the picture, could you also try version 3.0.3rc1 built from sources with net-snmp and version 3.0.2 built from sources without net-snmp? Could you also attach a more detailed log file so that we can identify process "PID:4283" and see which processes reported their start and which reported getting SIGTERM from the main process? |
Comment by Serg [ 2016 May 15 ] |
Unfortunately, the FreeBSD port has no "snmp" option ( http://www.screencast.com/t/OpoR7MuXu ). Also, when I try to build 3.0.3rc1 with snmp, I get this error:
./../poller/checks_snmp.h:30:11: fatal error: 'net-snmp/net-snmp-config.h' file not found
# include <net-snmp/net-snmp-config.h>
However, the port "net-snmp-5.7.3_7" is actually installed on the system. I am not using SNMP (as far as I know), so at the moment the rc1 build works fine for me. |
Comment by Merlin [ 2016 Jun 04 ] |
Updating to FreeBSD 10.3-p4 resolved the problem. Even Zabbix 3.0.1_2 from pkg repository works. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jul 22 ] |
The root cause may be that NetSNMP and Zabbix itself use different versions of OpenSSL. Related issue is Can anyone provide objdump or ldd output of the crashing Zabbix binary? |
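One way to look for the suspected OpenSSL mismatch in ldd output is sketched below. It is an illustration under assumptions, not a confirmed diagnosis: the helper names `openssl_libs`, `has_conflict` and `ldd_output` are hypothetical, the regular expression assumes the usual "libname.so.N => /path (address)" line layout, and finding two versions would only be a hint.

```python
import re
import subprocess
from collections import defaultdict

# Assumed ldd line layout: "libssl.so.8 => /usr/local/lib/libssl.so.8 (0x...)".
LDD_LINE = re.compile(r"^\s*(lib(?:ssl|crypto))\.so\.(\S+)\s+=>\s+(\S+)")


def openssl_libs(ldd_text):
    """Group the libssl/libcrypto entries of an ldd listing by library name."""
    found = defaultdict(set)
    for line in ldd_text.splitlines():
        m = LDD_LINE.match(line)
        if m:
            name, version, path = m.groups()
            found[name].add((version, path))
    return dict(found)


def has_conflict(ldd_text):
    """True if any OpenSSL library is resolved in more than one version or path."""
    return any(len(entries) > 1 for entries in openssl_libs(ldd_text).values())


def ldd_output(binary):
    """Run ldd on a binary and return its standard output as text."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return result.stdout
```

For example, `has_conflict(ldd_output(path_to_binary))` could be run against the crashing zabbix_server or zabbix_proxy binary, with the path supplied by whoever runs it.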
Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 02 ] |
No updates for more than a year. Closing as Cannot Reproduce because there is currently not enough information to continue investigation. |