[ZBX-10626] agent crashes in collector after proc.cpu.util[] is requested on Solaris 8 Created: 2016 Apr 05  Updated: 2017 May 30  Resolved: 2016 Apr 11

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2rc1, 3.2.0alpha1

Type: Incident report Priority: Blocker
Reporter: Aleksandrs Saveljevs Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash, proc.cpu.util, solaris
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Suppose we start the agent and request proc.cpu.util[]:

$ bin/zabbix_get -s 127.0.0.1 -p 23050 -k proc.cpu.util[zabbix_agentd]
zabbix_get [29276]: Check access restrictions in Zabbix agent configuration

Investigation shows that the agent crashes with the following debug log:

 29078:20160405:225645.401 Requested [proc.cpu.util[zabbix_agentd]]
 29078:20160405:225645.403 In procstat_add()
 29078:20160405:225645.403 In zbx_dshm_realloc() shmid:-1 size:21704
 29078:20160405:225645.403 In procstat_copy_data()
 29078:20160405:225645.403 End of procstat_copy_data()
 29078:20160405:225645.403 End of zbx_dshm_realloc():SUCCEED shmid:605
 29078:20160405:225645.404 End of procstat_add()
 29077:20160405:225645.750 __zbx_zbx_setproctitle() title:'collector [processing data]'
 29077:20160405:225645.750 In update_cpustats()
 29077:20160405:225645.750 End of update_cpustats()
 ...
 29077:20160405:225645.766 DEBUG: util_local = ffbee948
 29077:20160405:225645.766 DEBUG: procstat_snapshot = 0
 29077:20160405:225645.767 DEBUG: procstat_snapshot_num = 0
 29077:20160405:225645.767 DEBUG: sizeof(zbx_procstat_util_t) = 32
 29077:20160405:225645.767 DEBUG: inside procstat_util_compare, u1 = ffbee948, u2 = 0
 29077:20160405:225645.767 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0]. Crashing ...
 29077:20160405:225645.767 ====== Fatal information: ======
 29077:20160405:225645.767 program counter not available for this architecture
 29077:20160405:225645.768 === Registers: ===
 29077:20160405:225645.768 register dump not available for this architecture
 29077:20160405:225645.768 === Backtrace: ===
 29077:20160405:225645.768 backtrace not available for this platform
 29077:20160405:225645.768 === Memory map: ===
 29077:20160405:225645.768 memory map not available for this platform
 29077:20160405:225645.769 ================================
 29076:20160405:225645.773 One child process died (PID:29077,exitcode/signal:1). Exiting ...

The crash happens in procstat_calculate_cpu_util_for_queries() in the second call to bsearch:

/* find the process utilization data in last snapshot */
putil = (zbx_procstat_util_t *)bsearch(&util_local, procstat_snapshot, procstat_snapshot_num,
		sizeof(zbx_procstat_util_t), procstat_util_compare);

As can be seen in the debug log above, the "procstat_snapshot" variable is NULL. The following small program shows that, when the second argument to bsearch() is NULL on Solaris 8, then the comparison function is called with one of the arguments being NULL (which our procstat_util_compare() function does not handle and crashes):

#include <stdio.h>
#include <stdlib.h>

static int	compare(const void *p1, const void *p2)
{
	printf("p1 = %p, p2 = %p\n", p1, p2);
	return 0;
}

int	main()
{
	int	a;
	void	*p;

	p = bsearch(&a, NULL, 0, sizeof(a), compare);

	printf("p = %p\n", p);

	return 0;
}
$ gcc bsearch.c -o bsearch -Wall -Wextra
$ ./bsearch
p1 = ffbefb4c, p2 = 0
p = 0

Compiling the same program on Linux gives the following warning (note that the comparison function is not called at runtime):

$ gcc bsearch.c -o bsearch -Wall -Wextra
bsearch.c: In function 'main':
bsearch.c:15:2: warning: null argument where non-null required (argument 2) [-Wnonnull]
  p = bsearch(&a, NULL, 0, sizeof(a), compare);
  ^
$ ./bsearch 
p = (nil)


 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Apr 05 ]

We should also check calls to bsearch() in other places.

On Solaris 9, the comparison function does not seem to be called when the second argument to bsearch() is NULL.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 06 ]

POSIX says:

The bsearch() function shall search an array of nel objects, the initial element of which is pointed to by base, for an element that matches the object pointed to by key. The size of each element in the array is specified by width. If the nel argument has the value zero, the comparison function pointed to by compar shall not be called and no match shall be found.

However, I wasn't able to check this part in pre-2001 editions. Solaris 8 came out in 2000 and may not comply to POSIX-2001 standard. Nevertheless, it should obey C99 and C89. Here is what they say about conparison function:

The implementation shall ensure that the second argument of the comparison function (when called from bsearch), or both arguments (when called from qsort), are pointers to elements of the array.1) The first argument when called from bsearch shall equal key.

1) That is, if the value passed is p, then the following expressions are always nonzero:

((char *)p - (char *)base) % size == 0
(char *)p >= (char *)base
(char *)p < (char *)base + nmemb * size
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 07 ]

Fix for 3.0 is available in development branch svn://svn.zabbix.com/branches/dev/ZBX-10626 revision 59342. Simply defined zbx_bsearch() wrapper which will return NULL for arrays of zero elements as modern POSIX requests.

Comment by Andris Zeila [ 2016 Apr 08 ]

Successfully tested

Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 08 ]

Fixed in pre-3.0.2rc1 r59358, pre-3.1.0 (trunk) r59359.

Generated at Fri Apr 26 02:57:55 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.