[ZBXNEXT-5263] Database query latest history data very slow Created: 2019 Mar 28  Updated: 2019 Jun 07

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Aaron Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 10
Labels: database, history, latestdata, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Grafana-AP-Status.PNG     Text File Triggers.functions.lastvalue.txt    

 Description   

I'm querying the database directly from a Grafana dashboard to gather some information on the latest history data. It works fine, but the queries take a very long time to finish.

A simple query to fetch the most recent history row takes about 6 seconds on a 2740 MB table (history_uint).

Here is the query (a very simple one; the ones I'm actually running require joins, and that's when it gets messy):

 

zabbix=> select * from history_uint order by history_uint.clock DESC LIMIT 1; 
 itemid |   clock    | value |    ns
--------+------------+-------+----------
 188644 | 1553786344 |     1 | 94740072
(1 row)
Time: 6810,381 ms (00:06,810)
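
A quick way to see where that time is going is to look at the plan (this re-runs the query, so expect a similar duration; assumes a reasonably recent PostgreSQL):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM history_uint ORDER BY history_uint.clock DESC LIMIT 1;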

 

Here is the table size:

 

zabbix=> SELECT pg_size_pretty (pg_relation_size('history_uint')); 
 pg_size_pretty 
----------------
 2740 MB
(1 row)

 

One of the queries I'm doing to get the latest data on several items:

 

zabbix=> SELECT DISTINCT COUNT(items.name)
zabbix-> FROM history_uint
zabbix-> JOIN (SELECT 
zabbix(> history_uint.itemid, 
zabbix(> max(history_uint.clock) as max_clock
zabbix(> FROM history_uint WHERE history_uint.clock > extract(epoch from now()- INTERVAL '6 MINUTES')
zabbix(> GROUP BY history_uint.itemid) subselect_history_max_clock
zabbix-> ON (history_uint.itemid = subselect_history_max_clock.itemid) AND (history_uint.clock = subselect_history_max_clock.max_clock)
zabbix-> 
zabbix-> JOIN items ON items.itemid = subselect_history_max_clock.itemid 
zabbix-> JOIN hosts ON items.hostid = hosts.hostid 
zabbix-> JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
zabbix-> JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid 
zabbix-> 
zabbix-> WHERE (hstgrp.name = 'Amadores-Clusters') 
zabbix-> AND (items.name LIKE '%AP Status%')
zabbix-> AND (hosts.host LIKE '%VC%')
zabbix-> AND history_uint.clock > extract(epoch from now()- INTERVAL '6 MINUTES');
 count 
-------
 469
(1 row)
Time: 33019,439 ms (00:33,019)

 

This is on a server idling at the moment. If multiple users start using Grafana to check the dashboards it can get very messy very fast.

 

What I found:

 - There is nothing in the database that points to the latest history data other than the highest clock value, which isn't very efficient, as it's not even pre-sorted

 - Other users have found this same issue

 - One of those users (dimir) came up with a solution and I think the fix is clean and simple

 

The fix:

It's stated here -> https://support.zabbix.com/browse/ZBX-13193?focusedCommentId=304714&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-304714

 

and here -> https://support.zabbix.com/browse/ZBX-13193?focusedCommentId=332045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-332045

It would create 5 tables (one per history* table) holding the latest data gathered for each item. Those tables wouldn't grow over time, since they hold at most one row per item with history, so it's a very tidy and fast solution.
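
Reading the latest value for a single item from such a table would then be a simple primary-key lookup, roughly like this (taking lastvalue_uint as the name of the unsigned-integer variant):

SELECT itemid, clock, value FROM lastvalue_uint WHERE itemid = 188644;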

 

Could this be implemented in Zabbix?

 

Best regards,

Aarón

 

PS: IOPS spike when I run those queries, so I'm even maxing out the IOPS of the hard drives.



 Comments   
Comment by Glebs Ivanovskis [ 2019 Mar 29 ]

It is strange to hear that you are writing SQL queries while using Grafana. Why? Have you heard about the Zabbix plugin?

Comment by Aaron [ 2019 Mar 29 ]

Hey Glebs,

I need the flexibility of a database query to be able to retrieve the info I'm using.

Here is an example. I have several wireless controllers. I run a discovery on each one to see which APs are in the controller's database and check whether the APs are up (1) or down (2).

With the Zabbix plugin I haven't found a way to get a count of APs that are UP and APs that are DOWN. I could preprocess the status values to 1 = up and 0 = down, but I would still have to be very careful with the time range: if it's too short I will miss values, and if it's too large I will get too many.

AFAIK there is no way to get the latest value of a metric in Grafana; it's always time dependent, and it's very hard to know where to set the limit, since it really depends on when the data was written into the DB (clock).

 

I have attached a screenshot to explain it better. The AP status items are queried every 5 minutes, so I configured the time range on the Zabbix datasource to 7 minutes and got 64 APs UP, but there are really 111 APs UP (see the SQL output right beside it). No matter how I fiddle with the time range, I can't get it to report the correct number of APs being UP, so I figured it would be easier and more accurate to query the DB directly.

 

 

I hope I have explained myself properly!

 

Best regards,

Aarón

Comment by Glebs Ivanovskis [ 2019 Mar 30 ]

If I read your SQL query correctly, it selects the latest value for all items in history_uint; no wonder it takes ages to complete. It would be more efficient to get the list of itemids you are interested in first and only then find out their latest values (where itemid in (...)).
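
Something along these lines, reusing the filters from the description (just a sketch to illustrate the idea, untested):

SELECT h.itemid, max(h.clock) AS max_clock
FROM history_uint h
WHERE h.itemid IN (
        SELECT i.itemid
        FROM items i
        JOIN hosts ho ON ho.hostid = i.hostid
        JOIN hosts_groups hg ON hg.hostid = ho.hostid
        JOIN hstgrp g ON g.groupid = hg.groupid
        WHERE g.name = 'Amadores-Clusters'
          AND i.name LIKE '%AP Status%'
          AND ho.host LIKE '%VC%')
  AND h.clock > extract(epoch from now() - INTERVAL '6 MINUTES')
GROUP BY h.itemid;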

Comment by Aaron [ 2019 Apr 01 ]

Hi Glebs,

I got the list of items I want to query and managed to query history_uint for only those items, but I'm failing to keep only the most recent values and to discern when an AP is UP or DOWN.

Let me explain: if I want to see whether an AP is down, I filter the query on value == 2. But if an AP goes down (value == 2) and then comes back up (value == 1), I will still get a match on value == 2 within the time window even though the AP is already up...

It's very complicated to query the last value of multiple items; that's why I was asking for the fix above. I checked the database documentation (https://zabbix.org/wiki/Docs/DB_schema/2.0/items): the items table used to have a field called 'lastvalue' that would help me a lot, but it was removed in version 2.2.

 

I don't want to modify Zabbix code, so I will try to create a trigger that, on every INSERT into the history_uint table, stores the itemid/value/clock into a new table.

 

Cheers,

Aarón

Comment by dimir [ 2019 Apr 01 ]

The trigger sounds like a good idea. As to why it was removed from the items table: it was causing deadlocks and performance issues every time new values were added to the database, and that decision was right. But I think we forgot to consider other ways of keeping the latest data always available.

Comment by Aaron [ 2019 Apr 03 ]

Hello,

I did it just for history_uint, but this is replicable for all history tables. I created two functions: one for inserting/updating and another for deleting.

The new table 'lastvalue_uint' stores itemid, clock and value:

CREATE TABLE lastvalue_uint (
    itemid bigint PRIMARY KEY,
    clock integer NOT NULL,
    value numeric(20,0) NOT NULL);

 

The insert/update is triggered whenever an item is inserted/updated in history_uint.

CREATE OR REPLACE FUNCTION lastvalue_uint_func() RETURNS TRIGGER AS $BODY$
BEGIN
    IF (TG_OP = 'INSERT' OR TG_OP = 'UPDATE') THEN
        -- Upsert: keep exactly one row per itemid with its newest clock/value
        INSERT INTO lastvalue_uint (itemid, clock, value)
        VALUES (NEW.itemid, NEW.clock, NEW.value)
        ON CONFLICT (itemid) DO
            UPDATE SET
                clock = NEW.clock,
                value = NEW.value;
    END IF;
    RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_insert_lastvalue_uint
BEFORE INSERT OR UPDATE ON history_uint
FOR EACH ROW EXECUTE
PROCEDURE lastvalue_uint_func();

 

The delete is triggered whenever an item is deleted in the items table.

CREATE OR REPLACE FUNCTION lastvalue_uint_func_delete() RETURNS TRIGGER AS $BODY$
BEGIN
    IF (TG_OP = 'DELETE') THEN
        -- Remove the cached latest value when the item itself is deleted
        DELETE FROM lastvalue_uint WHERE itemid = OLD.itemid;
    END IF;
    RETURN OLD;
END;
$BODY$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_delete_lastvalue_uint
BEFORE DELETE ON items
FOR EACH ROW EXECUTE
PROCEDURE lastvalue_uint_func_delete();
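
To check that both triggers are in place (standard PostgreSQL catalogs, nothing Zabbix-specific):

SELECT tgrelid::regclass AS table_name, tgname
FROM pg_trigger
WHERE NOT tgisinternal
  AND tgrelid IN ('history_uint'::regclass, 'items'::regclass);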

 

So far the table is slowly growing, but I will have to wait a day or so to see whether its row count settles at the number of unique items in history_uint.

 

Hope this helps someone!

 

Cheers,

Aarón

Comment by Aaron [ 2019 Apr 10 ]

After a week the solution seems stable; none of the lastvalue_* tables have grown beyond the number of unique itemids in their history_* counterparts, so that's good.

I ended up creating an insert/update function on lastvalue_* for each history_* table and a delete function for the items table. There is probably a better way to do this, but having separate functions keeps it pretty simple and easy to understand.

I have attached the functions/triggers I'm using; now the queries take almost no time and are a lot simpler.
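
For illustration, against lastvalue_uint the AP status count from the description can be written roughly like this (a sketch only; not taken from the attachment):

SELECT count(items.name)
FROM lastvalue_uint
JOIN items ON items.itemid = lastvalue_uint.itemid
JOIN hosts ON items.hostid = hosts.hostid
JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid
WHERE hstgrp.name = 'Amadores-Clusters'
  AND items.name LIKE '%AP Status%'
  AND hosts.host LIKE '%VC%';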

 

Comparison on number of itemids on history_uint vs lastvalue_uint:

 

zabbix=> select count(DISTINCT itemid) from history_uint ;
 count 
-------
 36736
(1 row)
zabbix=> select count(DISTINCT itemid) from lastvalue_uint ;
 count 
-------
 29667
(1 row)

 

 

Comparison on size of history_uint and lastvalue_uint:

 

zabbix=> SELECT pg_size_pretty( pg_total_relation_size('history_uint'));
 pg_size_pretty 
----------------
 7612 MB
(1 row)
 
zabbix=> SELECT pg_size_pretty( pg_total_relation_size('lastvalue_uint'));
 pg_size_pretty 
----------------
 4240 kB
(1 row)

Triggers.functions.lastvalue.txt

 

Even though it's working, it would be great if this could be added in a future Zabbix release.

 

Cheers,

Aarón

 
