Loading...

XML

Word

Printable

Type: Problem report
Resolution: Fixed
Priority: Major
Fix Version/s: 4.0.8rc1, 4.2.2rc1, 4.4.0alpha1, 4.4 (plan)
Affects Version/s: None
Component/s: Server (S)
Labels:

Sprint:
Sprint 51 (Apr 2019)
Story Points:
1

We have such number of rows in corresponding tables:
hosts - 423722
interface - 620132
dservices - 360533
dchecks - 18123
drules - 18123

Yes, 18K network discovery rules, each assigned to own proxy (yes, 18K proxies).

Each rule scan this ranges each 3 hours: 192.168.140.1-254,192.168.150.1-39 (~300 IPs).

The problem that we always have many heavy queries from this function:

select h.hostid,h.status from hosts h,interface i,dservices ds,dchecks dc,drules dr static zbx_uint64_t	select_discovered_host(const DB_EVENT *event)
{
	const char	*__function_name = "select_discovered_host";
	DB_RESULT	result;
	DB_ROW		row;
	zbx_uint64_t	hostid = 0;
	char		*sql = NULL;

	zabbix_log(LOG_LEVEL_DEBUG, "In %s() eventid:" ZBX_FS_UI64,
			__function_name, event->eventid);

	switch (event->object)
	{
		case EVENT_OBJECT_DHOST:
			sql = zbx_dsprintf(sql,
				"select h.hostid,h.status"
				" from hosts h,interface i,dservices ds,dchecks dc,drules dr"
				" where h.hostid=i.hostid"
					" and i.ip=ds.ip"
					" and ds.dcheckid=dc.dcheckid"
					" and dc.druleid=dr.druleid"
					" and h.status in (%d,%d)"
					" and " ZBX_SQL_NULLCMP("dr.proxy_hostid", "h.proxy_hostid")
					" and i.useip=1"
					" and ds.dhostid=" ZBX_FS_UI64
				" order by i.hostid",
				HOST_STATUS_MONITORED, HOST_STATUS_NOT_MONITORED,
				event->objectid);
			break;
		case EVENT_OBJECT_DSERVICE:
			sql = zbx_dsprintf(sql,
				"select h.hostid,h.status"
				" from hosts h,interface i,dservices ds,dchecks dc,drules dr"
				" where h.hostid=i.hostid"
					" and i.ip=ds.ip"
					" and ds.dcheckid=dc.dcheckid"
					" and dc.druleid=dr.druleid"
					" and h.status in (%d,%d)"
					" and " ZBX_SQL_NULLCMP("dr.proxy_hostid", "h.proxy_hostid")
					" and i.useip=1"
					" and ds.dserviceid=" ZBX_FS_UI64
				" order by i.hostid",
				HOST_STATUS_MONITORED, HOST_STATUS_NOT_MONITORED,
				event->objectid);
			break;
		default:
			goto exit;

namely 2nd query - with filtering by ds.dserviceid we could spot in waiting transaction.
On average these queries take 15 seconds and cause high CPU usage on DB server.

As I see in my debug test each query from the function is repeated once (per check and per rule). I.e. two very similar heavy queries.

I've figure out that this SQL comes from operation to delete a host, when zabbix server needs to check host existence.
The operation (and the function call) is performed for IPs when host is deleted already and IP is still not responsive (check is failed).

Maybe the SQL could be improved to be faster and avoid high CPU usage?

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

actions.png
26 kB
2019 Apr 16 17:28
E&Q.png
31 kB
2019 Apr 15 18:21
ZBX-15998-3.0.diff
3 kB
2019 May 13 10:51

Assignee:: Vladislavs Sokurenko

Reporter:: Oleksii Zagorskyi

Team:: Team A

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Created:: 2019 Apr 15 16:57

Updated:: 2024 Apr 10 16:57

Resolved:: 2019 May 01 14:14

Details

Description

Attachments

Attachments

Activity

People

Dates