[ZBX-11758] Crash in http poller on DebugLevel=4 on Solaris with regex: variable in scenario
Created: 2017 Jan 31  Updated: 2024 Apr 10  Resolved: 2017 Feb 28

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Proxy (P), Server (S)
Affects Version/s: 2.2.17rc1, 3.0.8rc1, 3.2.2, 3.2.4rc1, 3.4.0alpha1
Fix Version/s: 2.2.18rc1, 3.0.9rc1, 3.2.5rc1, 3.4.0alpha1

Type: Incident report Priority: Major
Reporter: Little Martian Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash, logging, webmonitoring
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Solaris 10 SPARC T4-2
DB: MySQL 5.7.16 (compiled with GCC 5.2.0 Solaris SPARC)
Zabbix 3.2.1 (revision 62890)


Attachments: by_pid.log, server.log, server_busy.png, server_gather_busy.png, server_mysql_connections.png, server_perf.png
Team: Team A
Story Points: 0.5

 Description   

Hi,

While investigating a web scenario, I changed the log level from 3 to 4 and restarted the server. After the restart the server appeared to hang. I changed the log level back to 3, restarted the server again, and everything was OK.



 Comments   
Comment by Oleksii Zagorskyi [ 2017 Jan 31 ]

More correct description of the issue - server hangs with DebugLevel=4, right after start.

Related and last lines from attached log:

  1632:20170131:135225.455 server #73 started [java poller #1]
..
  1632:20170131:135225.504 zbx_tls_init_child() certificate and PSK ciphersuites: ECDHE-RSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-SHA256 ECDHE-RSA-AES128-SHA AES128-GCM-SHA256 AES128-SHA256 AES128-SHA PSK-AES128-CBC-SHA
  1622:20170131:135225.504 In httpmacro_append_pair() pkey:'{msisdn}' pvalue:'732116687'
  1631:20170131:135225.505 End of DBconnect():0
  1596:20170131:135225.506 End of process_mass_data()
  1633:20170131:135225.506 End of DBconnect():0
  1630:20170131:135225.506 zbx_tls_init_child() certificate and PSK ciphersuites: ECDHE-RSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-SHA256 ECDHE-RSA-AES128-SHA AES128-GCM-SHA256 AES128-SHA256 AES128-SHA PSK-AES128-CBC-SHA
  1635:20170131:135225.506 In DBconnect() flag:0
  1630:20170131:135225.506 End of zbx_tls_init_child()
  1633:20170131:135225.506 __zbx_zbx_setproctitle() title:'java poller #2 [got 0 values in 0.000000 sec, getting values]'
  1631:20170131:135225.506 __zbx_zbx_setproctitle() title:'ipmi poller #1 [got 0 values in 0.000000 sec, getting values]'
  1630:20170131:135225.507 __zbx_zbx_setproctitle() title:'escalator #1 [connecting to the database]'
  1596:20170131:135225.507 End of process_hist_data():SUCCEED
  1630:20170131:135225.507 In DBconnect() flag:0
  1632:20170131:135225.507 End of zbx_tls_init_child()
  1623:20170131:135225.507 __zbx_zbx_setproctitle() title:'http poller #10 [got 0 values in 0.023249 sec, idle 5 sec]'
  1634:20170131:135225.507 zbx_tls_init_child() certificate and PSK ciphersuites: ECDHE-RSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-SHA256 ECDHE-RSA-AES128-SHA AES128-GCM-SHA256 AES128-SHA256 AES128-SHA PSK-AES128-CBC-SHA
  1636:20170131:135225.507 zbx_tls_init_child() certificate ciphersuites: ECDHE-RSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-SHA256 ECDHE-RSA-AES128-SHA AES128-GCM-SHA256 AES128-SHA256 AES128-SHA
  1599:20170131:135225.507 In substitute_simple_macros() data:EMPTY
  1632:20170131:135225.508 __zbx_zbx_setproctitle() title:'java poller #1 [connecting to the database]'
  1636:20170131:135225.508 zbx_tls_init_child() PSK ciphersuites: PSK-AES128-CBC-SHA
  1599:20170131:135225.508 In substitute_simple_macros() data:EMPTY
  1632:20170131:135225.508 In DBconnect() flag:0
Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 01 ]

Attaching log file split by pid (DebugLevel=4 part). Pollers, unreachable pollers are almost silent, no sign of discoverer whatsoever, #65 is completely missing. I would blame NetSNMP library, particularly init_snmp() call. Or this can be a weird reincarnation of ZBX-11101.

What is your NetSNMP, libcurl and OpenSSL configuration?

Comment by Little Martian [ 2017 Feb 01 ]

I am using OpenCSW compiled packages, including GCC 5.2.0, NetSNMP 5.7.3 and Curl 7.37.0. I compiled OpenSSL 1.0.2j myself.

It is a 64bit compilation on Sun Solaris 10 (I also have compiled it with 32bit but I'm using the 64bit version).

I used a self-signed certificate for the TLS configuration, but none of my agents has TLS enabled: the monitored servers run the agents from zabbix.com, and the one agent I compiled myself has TLS disabled (TLSConnect=unencrypted).

Comment by Little Martian [ 2017 Feb 01 ]

I do have some inconsistencies with OpenSSL. Curl seems to be compiled with OpenSSL 1.0.0, but I cannot use the one from OpenCSW (https://www.opencsw.org/) because I do not have root access to install the packages properly, so I "installed" them locally with some configuration tricks. At some point I compiled OpenSSL 1.0.1 and now I have OpenSSL 1.0.2. I do not remember whether I compiled Zabbix against OpenSSL 1.0.1 or OpenSSL 1.0.2.

Do you recommend recompiling Zabbix without SSL or NetSNMP and trying again? But the question remains: why does the hang happen only with DebugLevel=4?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 01 ]

Actually, I have no clue whatsoever. This is so strange... But if you're ready for experiments (perhaps, you can try to reproduce the issue with the same Zabbix binary on a smaller setup, two servers run fine on one machine provided they have separate databases, config files, ports, etc.) I would try without NetSNMP first. It would be interesting to see ldd output for NetSNMP library, libcurl and Zabbix server binary. All OpenSSL 1.0.x versions should be compatible, but who knows...

Comment by Vladislavs Sokurenko [ 2017 Feb 01 ]

It's possible that a null pointer is passed to some debug function.
Could you please try the patch from here and check whether the hang goes away?
https://support.zabbix.com/browse/ZBX-11635

glebs.ivanovskis We would see a crash in this case, wouldn't we?

vso Yes, if the mutex were reentrant we would see a crash. In short: process 1 locks the log mutex, tries to printf to the file, and crashes. The signal handler is launched for process 1 and tries to lock the mutex again, but it is already locked, so it waits for someone to unlock it while being the one who locked it. No unlock will ever happen, because the holder was killed by the signal and is itself trying to lock again. Now everyone who wants to log something is waiting for an unlock by process 1 that will never occur. This deadlock is easy to spot: no matter how you try to kill Zabbix, you can't get any log output from it anymore.
That's why I suggested fixing this by making the mutex reentrant, as in the attached patch; this would avoid the hang and potential loss of data.

glebs.ivanovskis Very good reasoning! I was thinking about log file lock myself, but you added the missing link. I agree with you, this is the most probable scenario!
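
To illustrate the reasoning above, here is a minimal sketch in C of why a non-reentrant lock blocks its own holder when a signal handler tries to log. It is a simulation only: demo_lock_t, demo_lock_try() and demo_lock_release() are hypothetical names, nothing is taken from the Zabbix sources or from the attached patch, and no real semaphore is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock record, used for illustration only. */
typedef struct
{
	int	owner;	/* pid of the current holder, 0 when the lock is free */
	int	depth;	/* how many times the holder has acquired the lock */
}
demo_lock_t;

/* Returns true if 'pid' may proceed, false if it would have to wait. */
bool	demo_lock_try(demo_lock_t *lock, int pid, bool reentrant)
{
	if (0 == lock->owner)
	{
		lock->owner = pid;
		lock->depth = 1;
		return true;
	}

	if (lock->owner == pid && reentrant)
	{
		/* same process again, e.g. its signal handler wants to log */
		lock->depth++;
		return true;
	}

	/* held: a plain lock makes even its own holder wait, forever */
	return false;
}

/* Drops one level of ownership; the lock becomes free when depth reaches 0. */
void	demo_lock_release(demo_lock_t *lock, int pid)
{
	assert(lock->owner == pid && 0 < lock->depth);

	if (0 == --lock->depth)
		lock->owner = 0;
}
```

With reentrant set to false, a second demo_lock_try() by the same pid returns false, which models the hang described above. With reentrant set to true the second call succeeds and depth becomes 2, so the crashing process could still write its crash report.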

Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 02 ]

Actually, no need for a patch with a reentrant mutex. If Zabbix processes are crashing we will see it in the output of ps aux | grep zabbix. Dear little_martian, could you provide that output?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 02 ]

Good candidate:

static int	httpmacro_append_pair(zbx_httptest_t *httptest, const char *pkey, size_t nkey,
			const char *pvalue, size_t nvalue, const char *data, char **err_str)
{
	...
	zbx_ptr_pair_t	pair = {NULL, NULL};
	...

	zabbix_log(LOG_LEVEL_DEBUG, "In %s() pkey:'%.*s' pvalue:'%.*s'",
			__function_name, (int)nkey, pkey, (int)nvalue, pvalue);

	if (NULL == data)
	{
		/* Ignore regex variables when no input data is specified. For example,   */
		/* scenario level regex variables don't have input data before the first  */
		/* web scenario step is processed.                                        */
		ret = SUCCEED;
		goto out;
	}

	...
out:
	zabbix_log(LOG_LEVEL_DEBUG, "End of %s():%s macro:'%s'='%s'",
			__function_name, zbx_result_string(ret), (char*)pair.first, (char*)pair.second);

	return ret;
}

And here we pass NULL as data explicitly:

int	process_httptests(int httppoller_num, int now)
{
	...
	while (NULL != (row = DBfetch(result)))
	{
		...
		/* add httptest variables to the current test macro cache */
		http_process_variables(&httptest, httptest.httptest.variables, NULL, NULL);
		...
	}
	...
}

Crashing process log:

  1622:20170131:135225.453 server #63 started [http poller #9]
  1622:20170131:135225.454 __zbx_zbx_setproctitle() title:'http poller #9 [connecting to the database]'
  1622:20170131:135225.460 In DBconnect() flag:0
  1622:20170131:135225.465 End of DBconnect():0
  1622:20170131:135225.479 __zbx_zbx_setproctitle() title:'http poller #9 [got 0 values in 0.000000 sec, getting values]'
  1622:20170131:135225.479 In process_httptests()
  1622:20170131:135225.479 query [txnlev:0] [select h.hostid,h.host,h.name,t.httptestid,t.name,t.variables,t.headers,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1485863545 and mod(t.httptestid,10)=8 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)]
  1622:20170131:135225.481 In substitute_simple_macros() data:'{msisdn}=732116687'
  1622:20170131:135225.497 In substitute_simple_macros() data:EMPTY
  1622:20170131:135225.497 In substitute_simple_macros() data:'Zabbix'
  1622:20170131:135225.497 In substitute_simple_macros() data:EMPTY
  1622:20170131:135225.503 In substitute_simple_macros() data:EMPTY
  1622:20170131:135225.503 In substitute_simple_macros() data:EMPTY
  1622:20170131:135225.504 In http_process_variables() variables:'{msisdn}=732116687'
  1622:20170131:135225.504 In httpmacro_append_pair() pkey:'{msisdn}' pvalue:'732116687'
Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 02 ]

Reproduced in trunk, although on Linux it does not crash. But it prints:

 16293:20170202:094728.283 End of httpmacro_append_pair():FAIL macro:'(null)'='(null)'

To achieve that I added {var}=regex:value into http test (not step!) variables. Before ZBX-11326 but after ZBXNEXT-1638 it could have been reproduced without regex:.
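
Passing NULL for a "%s" conversion is undefined behaviour in C; glibc happens to print (null), while other libc implementations may dereference the pointer. Below is a minimal sketch of a defensive pattern, assuming hypothetical helpers demo_str_or_null() and demo_format_end(). It is not necessarily how this ticket was fixed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: substitute a placeholder so that "%s" never receives NULL. */
const char	*demo_str_or_null(const char *s)
{
	return NULL != s ? s : "(null)";
}

/* Formats a line shaped like the "End of ..." debug message into 'buf'. */
/* Returns the value of snprintf(), i.e. the length the full line needs. */
int	demo_format_end(char *buf, size_t size, const char *result, const char *first, const char *second)
{
	assert(NULL != buf && 0 < size);

	return snprintf(buf, size, "End of httpmacro_append_pair():%s macro:'%s'='%s'",
			demo_str_or_null(result), demo_str_or_null(first), demo_str_or_null(second));
}
```

Called with "FAIL" and two NULL pointers, demo_format_end() produces the same text as the Linux log line above on any platform, because the placeholder is supplied by the caller instead of by libc.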

Comment by Little Martian [ 2017 Feb 02 ]

Hi,

Thank you for investigating the issue.

From what I understand (please correct me if I'm wrong) there are three issues:
1. (minor) - Problems in HTTP poller (passing null values to logging function)
2. (minor) - Problems in logging function (does not correctly handle null values for input parameters)
3. (major) - Possible deadlock on mutex if one process crashes before releasing the mutex. This can lead to application hanging, and when killing it, some resources might not be released (shared memory, semaphores). Killing the application in this stage can also lead to loss of data.

On Solaris "ps aux" does not work. Also, in Solaris, the command line of the process is limited to 80 chars and changing the command line after process started is not reflected in the ps command.

I tried again, and this time I ran truss (I believe the Linux equivalent is strace) to see what each process does. Here is the output (I don't know if it helps):

truss -aeldf -p $(ps -fu monitor|grep zabbix_server|grep -v grep|awk '{print $2}')
Base time stamp:  1486022407.8081  [ Thu Feb  2 10:00:07 EET 2017 ]
5803/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5831/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5884/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5865/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5849/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5885/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5873/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5889/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5861/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5851/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5832/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5874/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5866/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5848/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5870/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5880/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5859/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5877/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5855/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5836/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5867/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5882/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5829/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5864/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5878/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5869/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5843/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5841/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5868/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5819/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5876/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5842/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5846/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5872/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5815/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5811/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5823/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5871/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5818/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5820/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5853/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5827/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5816/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5821/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5828/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5879/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5857/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5826/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5860/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5824/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5813/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5810/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5852/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5856/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5822/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5844/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5840/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5845/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5858/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5801/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5862/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5806/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5809/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5838/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5837/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5805/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5839/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5887/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5825/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5804/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5830/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5807/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5808/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5835/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5817/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5850/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5833/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5812/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5886/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5834/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5890/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5814/1:         psargs: /opt/monitor/home/local64/sbin/zabbix_server -c /opt/monitor/hom
5852/1:          0.4273 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5803/1:         semop(1308622929, 0xFFFFFFFF7FFEC2EA, 1) (sleeping...)
5831/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5884/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5865/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5849/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5885/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5873/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5889/1:         semop(1308622929, 0xFFFFFFFF7FFFCBCA, 1) (sleeping...)
5861/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5851/1:         semop(1308622929, 0xFFFFFFFF7FFDA66A, 1) (sleeping...)
5832/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5874/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5866/1:         semop(1308622929, 0xFFFFFFFF7FFD68CA, 1) (sleeping...)
5848/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5870/1:         semop(1308622929, 0xFFFFFFFF7FFECB7A, 1) (sleeping...)
5880/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5859/1:         semop(1308622929, 0xFFFFFFFF7FFE824A, 1) (sleeping...)
5877/1:         semop(1308622929, 0xFFFFFFFF7FFEC96A, 1) (sleeping...)
5855/1:         semop(1308622929, 0xFFFFFFFF7FFFCBAA, 1) (sleeping...)
5836/1:         semop(1308622929, 0xFFFFFFFF7FFFBF5A, 1) (sleeping...)
5867/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5882/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5829/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5864/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5878/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5869/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5843/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5841/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5868/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5819/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5876/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5842/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5846/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5872/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5815/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5811/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5823/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5871/1:         semop(1308622929, 0xFFFFFFFF7FFEC95A, 1) (sleeping...)
5818/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5820/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5853/1:         semop(1308622929, 0xFFFFFFFF7FFFCBAA, 1) (sleeping...)
5827/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5816/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5821/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5828/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5879/1:         semop(1308622929, 0xFFFFFFFF7FFEC2AA, 1) (sleeping...)
5857/1:         semop(1308622929, 0xFFFFFFFF7FFFCBAA, 1) (sleeping...)
5826/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5860/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5824/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5813/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5810/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5856/1:         semop(1308622929, 0xFFFFFFFF7FFFCBAA, 1) (sleeping...)
5822/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5844/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5840/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5845/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5858/1:         semop(1308622929, 0xFFFFFFFF7FFFCBAA, 1) (sleeping...)
5801/1:         waitid(P_ALL, 0, 0xFFFFFFFF7FFFCD60, WEXITED|WTRAPPED) (sleeping...)
5862/1:         semop(1308622929, 0xFFFFFFFF7FFFCBEA, 1) (sleeping...)
5806/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5809/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5838/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5837/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5805/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5839/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5887/1:         semop(1308622929, 0xFFFFFFFF7FFFCC1A, 1) (sleeping...)
5825/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5804/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5830/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5807/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5808/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5835/1:         semop(1308622929, 0xFFFFFFFF7FFEB62A, 1) (sleeping...)
5817/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5850/1:         semop(1308622929, 0xFFFFFFFF7FFFCC0A, 1) (sleeping...)
5812/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5833/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5886/1:         semop(1308622929, 0xFFFFFFFF7FFEC96A, 1) (sleeping...)
5834/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5890/1:         semop(1308622929, 0xFFFFFFFF7FFFCBFA, 1) (sleeping...)
5814/1:         semop(1308622929, 0xFFFFFFFF7FFECA2A, 1) (sleeping...)
5852/1:          1.4278 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          2.4280 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          3.4282 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          4.4284 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          5.4286 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          6.4288 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          7.4289 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          8.4291 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:          9.4293 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
5852/1:         10.4295 nanosleep(0xFFFFFFFF7FFFCC50, 0xFFFFFFFF7FFFCC40) = 0
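
In the trace above, nearly every process is sleeping in semop() on the same semaphore set ID, which is at least consistent with the lock-wait scenario discussed earlier in this ticket. A minimal sketch of how such lines could be picked out of a saved truss log, assuming a hypothetical helper demo_parse_semop() and the line layout shown above:

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if 'line' is a truss record of a semop() call, storing the pid and */
/* the semaphore set ID; returns 0 for any other record. On a non-matching line */
/* '*pid' may already have been overwritten.                                     */
int	demo_parse_semop(const char *line, int *pid, long *semid)
{
	int	lwp;	/* thread (LWP) number after the slash, not reported to the caller */

	assert(NULL != line && NULL != pid && NULL != semid);

	/* whitespace in the format matches any run of blanks in the record */
	return 3 == sscanf(line, "%d/%d: semop(%ld", pid, &lwp, semid);
}
```

Feeding each line of the trace to demo_parse_semop() and counting the matches per semaphore set ID shows how many processes are blocked on the same semaphore set.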

glebs.ivanovskis Thank you! I guess we have a good understanding of the problem now.

  1. Yes. Really minor, because it manifests itself only with DebugLevel=4-5 and in a very specific scenario. On 3.2.1 there are more scenarios leading to it, so I suggest upgrading to 3.2.2. Maybe your initial problem with failing web scenarios is ZBX-11326.
  2. That's not something we can fix; it's libc territory. As I've shown above, on Linux it just prints (null) in such a case.
  3. Data loss will definitely take place on such "hard landings". But the remaining semaphores and shared memory will be cleaned up if you start Zabbix again (with the same config file path).
Comment by Little Martian [ 2017 Feb 02 ]

I compiled and installed version 3.2.3 and tested the log level change again. The server no longer hangs.

Thank you.

Comment by Sergejs Paskevics [ 2017 Feb 06 ]

Crash is fixed in branch svn://svn.zabbix.com/branches/dev/ZBX-11758

Comment by Sergejs Paskevics [ 2017 Feb 06 ]

(1) [F] I think we need to add additional checks on the frontend side:

  • check for empty variables
  • do not allow the regexp function to be used on the general configuration page (first tab of the web scenario), because it is pointless there: the function always returns null

sasha Moved to ZBXNEXT-2074 (3)

CLOSED

Comment by Sergejs Paskevics [ 2017 Feb 17 ]

(2) Please check my latest changes in the development branch svn://svn.zabbix.com/branches/dev/ZBX-11758 (r65775)

vso CLOSED

Comment by Vladislavs Sokurenko [ 2017 Feb 20 ]

Successfully tested

Comment by Sergejs Paskevics [ 2017 Feb 28 ]

Fixed in:

  • pre-2.2.18rc1 r66010,
  • pre-3.0.9rc1 r66011,
  • pre-3.2.5rc1 r66012,
  • pre-3.3.0 (trunk) r66016.
Generated at Sat Apr 20 13:19:46 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.