ZABBIX BUGS AND ISSUES / ZBX-10626

agent crashes in collector after proc.cpu.util[] is requested on Solaris 8


      Suppose we start the agent and request proc.cpu.util[]:

      $ bin/zabbix_get -s 127.0.0.1 -p 23050 -k proc.cpu.util[zabbix_agentd]
      zabbix_get [29276]: Check access restrictions in Zabbix agent configuration
      

      Although zabbix_get reports an access restriction problem, investigation shows that the agent has actually crashed, as its debug log shows:

       29078:20160405:225645.401 Requested [proc.cpu.util[zabbix_agentd]]
       29078:20160405:225645.403 In procstat_add()
       29078:20160405:225645.403 In zbx_dshm_realloc() shmid:-1 size:21704
       29078:20160405:225645.403 In procstat_copy_data()
       29078:20160405:225645.403 End of procstat_copy_data()
       29078:20160405:225645.403 End of zbx_dshm_realloc():SUCCEED shmid:605
       29078:20160405:225645.404 End of procstat_add()
       29077:20160405:225645.750 __zbx_zbx_setproctitle() title:'collector [processing data]'
       29077:20160405:225645.750 In update_cpustats()
       29077:20160405:225645.750 End of update_cpustats()
       ...
       29077:20160405:225645.766 DEBUG: util_local = ffbee948
       29077:20160405:225645.766 DEBUG: procstat_snapshot = 0
       29077:20160405:225645.767 DEBUG: procstat_snapshot_num = 0
       29077:20160405:225645.767 DEBUG: sizeof(zbx_procstat_util_t) = 32
       29077:20160405:225645.767 DEBUG: inside procstat_util_compare, u1 = ffbee948, u2 = 0
       29077:20160405:225645.767 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0]. Crashing ...
       29077:20160405:225645.767 ====== Fatal information: ======
       29077:20160405:225645.767 program counter not available for this architecture
       29077:20160405:225645.768 === Registers: ===
       29077:20160405:225645.768 register dump not available for this architecture
       29077:20160405:225645.768 === Backtrace: ===
       29077:20160405:225645.768 backtrace not available for this platform
       29077:20160405:225645.768 === Memory map: ===
       29077:20160405:225645.768 memory map not available for this platform
       29077:20160405:225645.769 ================================
       29076:20160405:225645.773 One child process died (PID:29077,exitcode/signal:1). Exiting ...
      

      The crash happens in procstat_calculate_cpu_util_for_queries(), in the second call to bsearch():

      /* find the process utilization data in last snapshot */
      putil = (zbx_procstat_util_t *)bsearch(&util_local, procstat_snapshot, procstat_snapshot_num,
      		sizeof(zbx_procstat_util_t), procstat_util_compare);
      

      As the debug log above shows, procstat_snapshot is NULL (and procstat_snapshot_num is 0). The following small program shows that on Solaris 8, when the second argument to bsearch() is NULL, the comparison function is still called, with NULL as its second argument. Our procstat_util_compare() does not handle a NULL argument, so it crashes:

      #include <stdio.h>
      #include <stdlib.h>
      
      static int	compare(const void *p1, const void *p2)
      {
      	printf("p1 = %p, p2 = %p\n", p1, p2);
      	return 0;
      }
      
      int	main()
      {
      	int	a;
      	void	*p;
      
      	p = bsearch(&a, NULL, 0, sizeof(a), compare);
      
      	printf("p = %p\n", p);
      
      	return 0;
      }
      
      $ gcc bsearch.c -o bsearch -Wall -Wextra
      $ ./bsearch
      p1 = ffbefb4c, p2 = 0
      p = 0
      

      Compiling the same program on Linux produces the following warning. At runtime, the comparison function is not called at all:

      $ gcc bsearch.c -o bsearch -Wall -Wextra
      bsearch.c: In function 'main':
      bsearch.c:15:2: warning: null argument where non-null required (argument 2) [-Wnonnull]
        p = bsearch(&a, NULL, 0, sizeof(a), compare);
        ^
      $ ./bsearch 
      p = (nil)
      
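      One way to avoid the crash on every platform is to skip the bsearch() call entirely when the array is empty, instead of relying on the library never to invoke the comparator. The C standard (C99 7.20.5) states that even when nmemb is zero, the pointer arguments must still have valid values, so bsearch(key, NULL, 0, ...) is undefined behavior and the Solaris 8 behavior is permitted; the -Wnonnull warning GCC gives on Linux points at the same problem. The sketch below is a minimal illustration under that assumption: safe_bsearch() and compare_int() are hypothetical names, not part of the Zabbix source, and the actual fix in the agent may look different.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Zabbix): behaves like bsearch(), but never
 * calls the comparison function for an empty array. The C standard requires
 * valid pointer arguments even when nmemb is 0, so passing NULL as base is
 * undefined behavior, and Solaris 8 is allowed to call the comparator with it. */
void	*safe_bsearch(const void *key, const void *base, size_t nmemb, size_t size,
		int (*compar)(const void *, const void *))
{
	if (NULL == base || 0 == nmemb)
		return NULL;	/* nothing to search: report "not found" without comparing */

	return bsearch(key, base, nmemb, size, compar);
}

/* Illustrative comparator for int arrays; it counts its calls so that a caller
 * can check whether it was invoked at all. */
int	compare_calls = 0;

int	compare_int(const void *p1, const void *p2)
{
	int	i1 = *(const int *)p1, i2 = *(const int *)p2;

	compare_calls++;

	return (i1 > i2) - (i1 < i2);
}
```

      For example, safe_bsearch(&key, NULL, 0, sizeof(int), compare_int) returns NULL and leaves compare_calls at 0 regardless of the platform's bsearch() implementation. In procstat_calculate_cpu_util_for_queries(), the equivalent change would presumably be a check of procstat_snapshot (or procstat_snapshot_num) before the second bsearch() call.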

            Assignee: Unassigned
            Reporter: Aleksandrs Saveljevs (asaveljevs)