[ZBX-10626] agent crashes in collector after proc.cpu.util[] is requested on Solaris 8
Created: 2016 Apr 05 | Updated: 2017 May 30 | Resolved: 2016 Apr 11 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 3.0.1 |
Fix Version/s: | 3.0.2rc1, 3.2.0alpha1 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Aleksandrs Saveljevs | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | crash, proc.cpu.util, solaris | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Description |
Suppose we start the agent and request proc.cpu.util[]:

    $ bin/zabbix_get -s 127.0.0.1 -p 23050 -k proc.cpu.util[zabbix_agentd]
    zabbix_get [29276]: Check access restrictions in Zabbix agent configuration

Investigation shows that the agent crashes with the following debug log:

    29078:20160405:225645.401 Requested [proc.cpu.util[zabbix_agentd]]
    29078:20160405:225645.403 In procstat_add()
    29078:20160405:225645.403 In zbx_dshm_realloc() shmid:-1 size:21704
    29078:20160405:225645.403 In procstat_copy_data()
    29078:20160405:225645.403 End of procstat_copy_data()
    29078:20160405:225645.403 End of zbx_dshm_realloc():SUCCEED shmid:605
    29078:20160405:225645.404 End of procstat_add()
    29077:20160405:225645.750 __zbx_zbx_setproctitle() title:'collector [processing data]'
    29077:20160405:225645.750 In update_cpustats()
    29077:20160405:225645.750 End of update_cpustats()
    ...
    29077:20160405:225645.766 DEBUG: util_local = ffbee948
    29077:20160405:225645.766 DEBUG: procstat_snapshot = 0
    29077:20160405:225645.767 DEBUG: procstat_snapshot_num = 0
    29077:20160405:225645.767 DEBUG: sizeof(zbx_procstat_util_t) = 32
    29077:20160405:225645.767 DEBUG: inside procstat_util_compare, u1 = ffbee948, u2 = 0
    29077:20160405:225645.767 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0]. Crashing ...
    29077:20160405:225645.767 ====== Fatal information: ======
    29077:20160405:225645.767 program counter not available for this architecture
    29077:20160405:225645.768 === Registers: ===
    29077:20160405:225645.768 register dump not available for this architecture
    29077:20160405:225645.768 === Backtrace: ===
    29077:20160405:225645.768 backtrace not available for this platform
    29077:20160405:225645.768 === Memory map: ===
    29077:20160405:225645.768 memory map not available for this platform
    29077:20160405:225645.769 ================================
    29076:20160405:225645.773 One child process died (PID:29077,exitcode/signal:1). Exiting ...

The crash happens in procstat_calculate_cpu_util_for_queries() in the second call to bsearch():

    /* find the process utilization data in last snapshot */
    putil = (zbx_procstat_util_t *)bsearch(&util_local, procstat_snapshot, procstat_snapshot_num,
            sizeof(zbx_procstat_util_t), procstat_util_compare);

As can be seen in the debug log above, the "procstat_snapshot" variable is NULL. The following small program shows that, when the second argument to bsearch() is NULL on Solaris 8, the comparison function is called with one of its arguments being NULL (which our procstat_util_compare() function does not handle, so it crashes):

    #include <stdio.h>
    #include <stdlib.h>

    static int compare(const void *p1, const void *p2)
    {
    	printf("p1 = %p, p2 = %p\n", p1, p2);
    	return 0;
    }

    int main()
    {
    	int a;
    	void *p;

    	p = bsearch(&a, NULL, 0, sizeof(a), compare);
    	printf("p = %p\n", p);

    	return 0;
    }

    $ gcc bsearch.c -o bsearch -Wall -Wextra
    $ ./bsearch
    p1 = ffbefb4c, p2 = 0
    p = 0

Compiling the same program on Linux gives the following warning (note that the comparison function is not called at runtime):

    $ gcc bsearch.c -o bsearch -Wall -Wextra
    bsearch.c: In function 'main':
    bsearch.c:15:2: warning: null argument where non-null required (argument 2) [-Wnonnull]
      p = bsearch(&a, NULL, 0, sizeof(a), compare);
      ^
    $ ./bsearch
    p = (nil)
|
Comments |
Comment by Aleksandrs Saveljevs [ 2016 Apr 05 ] |
We should also check calls to bsearch() in other places. On Solaris 9, the comparison function does not seem to be called when the second argument to bsearch() is NULL. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 06 ] |
POSIX says:
However, I wasn't able to check this part in pre-2001 editions. Solaris 8 came out in 2000 and may not comply with the POSIX-2001 standard. Nevertheless, it should obey C99 and C89. Here is what they say about the comparison function:
|
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 07 ] |
Fix for 3.0 is available in development branch svn://svn.zabbix.com/branches/dev/ZBX-10626 revision 59342. The fix simply defines a zbx_bsearch() wrapper that returns NULL for arrays of zero elements, as modern POSIX requires. |
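For illustration, a wrapper of this kind could look roughly like the sketch below. It is not the code from revision 59342: the name bsearch_guarded() and the compare_int() helper are hypothetical, and the extra check for a NULL base pointer is an assumption added on top of the zero-element check described in the comment above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the zbx_bsearch() wrapper mentioned above;
 * the real implementation in the Zabbix source tree may differ. */
static void	*bsearch_guarded(const void *key, const void *base, size_t nmemb,
		size_t size, int (*compar)(const void *, const void *))
{
	/* Never hand an empty or NULL array to the platform bsearch(), so the
	 * comparison function cannot be called with a NULL element pointer. */
	if (NULL == base || 0 == nmemb)
		return NULL;

	return bsearch(key, base, nmemb, size, compar);
}

/* Example comparison function for int keys; it dereferences both
 * arguments, so it would crash if it were given a NULL pointer. */
static int	compare_int(const void *p1, const void *p2)
{
	int	a = *(const int *)p1, b = *(const int *)p2;

	return (a > b) - (a < b);
}
```

With such a guard in place, a call like bsearch_guarded(&key, NULL, 0, sizeof(int), compare_int) returns NULL without invoking compare_int() at all, regardless of how the platform's bsearch() treats empty arrays.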
Comment by Andris Zeila [ 2016 Apr 08 ] |
Successfully tested |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 08 ] |
Fixed in pre-3.0.2rc1 r59358, pre-3.1.0 (trunk) r59359. |