Uploaded image for project: 'ZABBIX BUGS AND ISSUES'
  1. ZABBIX BUGS AND ISSUES
  2. ZBX-15998

Heavy SQL if use network discovery a lot

    XMLWordPrintable

    Details

    • Team:
      Team A
    • Sprint:
      Sprint 51 (Apr 2019)
    • Story Points:
      1

      Description

      We have such number of rows in corresponding tables:
      hosts - 423722
      interface - 620132
      dservices - 360533
      dchecks - 18123
      drules - 18123

      Yes, 18K network discovery rules, each assigned to own proxy (yes, 18K proxies).

      Each rule scan this ranges each 3 hours: 192.168.140.1-254,192.168.150.1-39 (~300 IPs).

      The problem that we always have many heavy queries from this function:

      select h.hostid,h.status from hosts h,interface i,dservices ds,dchecks dc,drules dr static zbx_uint64_t	select_discovered_host(const DB_EVENT *event)
      {
      	const char	*__function_name = "select_discovered_host";
      	DB_RESULT	result;
      	DB_ROW		row;
      	zbx_uint64_t	hostid = 0;
      	char		*sql = NULL;
      
      	zabbix_log(LOG_LEVEL_DEBUG, "In %s() eventid:" ZBX_FS_UI64,
      			__function_name, event->eventid);
      
      	switch (event->object)
      	{
      		case EVENT_OBJECT_DHOST:
      			sql = zbx_dsprintf(sql,
      				"select h.hostid,h.status"
      				" from hosts h,interface i,dservices ds,dchecks dc,drules dr"
      				" where h.hostid=i.hostid"
      					" and i.ip=ds.ip"
      					" and ds.dcheckid=dc.dcheckid"
      					" and dc.druleid=dr.druleid"
      					" and h.status in (%d,%d)"
      					" and " ZBX_SQL_NULLCMP("dr.proxy_hostid", "h.proxy_hostid")
      					" and i.useip=1"
      					" and ds.dhostid=" ZBX_FS_UI64
      				" order by i.hostid",
      				HOST_STATUS_MONITORED, HOST_STATUS_NOT_MONITORED,
      				event->objectid);
      			break;
      		case EVENT_OBJECT_DSERVICE:
      			sql = zbx_dsprintf(sql,
      				"select h.hostid,h.status"
      				" from hosts h,interface i,dservices ds,dchecks dc,drules dr"
      				" where h.hostid=i.hostid"
      					" and i.ip=ds.ip"
      					" and ds.dcheckid=dc.dcheckid"
      					" and dc.druleid=dr.druleid"
      					" and h.status in (%d,%d)"
      					" and " ZBX_SQL_NULLCMP("dr.proxy_hostid", "h.proxy_hostid")
      					" and i.useip=1"
      					" and ds.dserviceid=" ZBX_FS_UI64
      				" order by i.hostid",
      				HOST_STATUS_MONITORED, HOST_STATUS_NOT_MONITORED,
      				event->objectid);
      			break;
      		default:
      			goto exit;
      
      

      namely 2nd query - with filtering by ds.dserviceid we could spot in waiting transaction.
      On average these queries take 15 seconds and cause high CPU usage on DB server.

      As I see in my debug test each query from the function is repeated once (per check and per rule). I.e. two very similar heavy queries.

      I've figure out that this SQL comes from operation to delete a host, when zabbix server needs to check host existence.
      The operation (and the function call) is performed for IPs when host is deleted already and IP is still not responsive (check is failed).

      Maybe the SQL could be improved to be faster and avoid high CPU usage?

        Attachments

        1. actions.png
          actions.png
          26 kB
        2. E&Q.png
          E&Q.png
          31 kB
        3. ZBX-15998-3.0.diff
          3 kB

          Activity

            People

            Assignee:
            vso Vladislavs Sokurenko
            Reporter:
            zalex_ua Oleksii Zagorskyi
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: