[ZBX-10652] Zabbix server generates slow "select clock,ns,value from history_uint..." queries in case of missing data for the items Created: 2016 Apr 12  Updated: 2024 Apr 19  Resolved: 2016 Jul 18

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.4.7
Fix Version/s: None

Type: Incident report Priority: Blocker
Reporter: Oleg Ivanivskyi Assignee: Unassigned
Resolution: Fixed Votes: 5
Labels: partitioning, performance, valuecache
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix 2.4
MySQL 5.6/5.7


Issue Links:
Duplicate

 Description   

It takes a lot of time (hours!) to get values with select clock,ns,value from history_uint where itemid=<itemid> and clock<=<timestamp> and clock><timestamp> order by clock desc limit 1 query from a big partitioned DB (e.g. ~40-50GB per each partition). These queries lead to 100% disk IO on the DB server.

A possible solution is to replace the slow SQL query:

select clock,ns,value from history_uint where itemid=123 and clock<=1460364250 and clock>1459759450 order by clock desc limit 1

with these two:

select max(clock) from history_uint where itemid=123 and clock<=1460364250 and clock>1459759450;
select clock,ns,value from history_uint where itemid=123 and clock=1460364197 order by ns desc limit 1;


 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Apr 12 ]

How about we close this as a duplicate of ZBX-10059 or ZBX-10616, or some other related issue?

oleg.ivanivskyi asaveljevs, it is up to you.

Comment by Sandis Neilands (Inactive) [ 2016 Apr 13 ]

The proposed solution relies on the optimizer:

  • for the first query: column clock being indexed (with balanced tree index);
  • for the second query: partition pruning.
Comment by Kelly Shutt [ 2016 Apr 22 ]

We are having similar issues on 2.4.7 after implementing partitioning a few weeks ago. I just had to remove the partitions as a temporary stopgap. After removing the partitions, everything returned to normal.

I did also test those two queries as a proposed fix, and they worked perfect on MySql command line test. Initial query would take several seconds to return. However, when I split the same query into the suggested two queries, data was returned instantly.

I also noticed that this is happens a LOT when adding new hosts for monitoring. Basically it's trying to cache values for the new host, even though there's obviously not any values collected yet. This causes it to go through a full scan of the history for every new item. Maybe a second part to this fix needs to be a stop measure that will just not look for existing values on "new" hosts?

Comment by Sandis Neilands (Inactive) [ 2016 Apr 22 ]

Thanks for your input on this issue! For completeness could you provide:

  • the output of the following MySQL commands for all the affected tables;
  • lines from the Zabbix server log showing which queries are slow.
show create table history_uint\G;
show table status from <your_zabbix_db> where name like 'history_uint'\G;
Comment by Kelly Shutt [ 2016 Apr 25 ]

As I said in my previous comment, I actually removed the partitioning last week because our server was not catching up. I added some new hosts, through a proxy instance, and it started taking forever to run these queries; which in turn caused the whole server to get behind. After an hour of waiting it was still not catching up; and was spamming notifications because everything was behind.

I can tell you that I followed the procedures outlined here: https://www.zabbix.org/wiki/Docs/howto/mysql_partition
The only difference is that I added an EVENT trigger to run the partition_maintenance_all procedure once a week; so I wouldn't have to run manually all the time.
Things were actually functioning relatively well until we tried to add additional hosts; and then it went crazy trying to cache the values that were not there, because they had not been received from the new hosts yet.

show create table zabbix.history_uint\G;

       Table: history_uint
Create Table: CREATE TABLE `history_uint` (
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `value` bigint(20) unsigned NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  KEY `history_uint_1` (`itemid`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

show table status from zabbix where name like 'history_uint'\G;

           Name: history_uint
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 203926818
 Avg_row_length: 51
    Data_length: 10518265856
Max_data_length: 0
   Index_length: 5671731200
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2016-04-22 01:49:15
    Update_time: 2016-04-25 08:42:18
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)
Comment by Aleksandrs Saveljevs [ 2016 May 16 ]

ZBX-10787 seems to be about the same query, but on the frontend side.

Comment by Kononenko Sergey [ 2016 Jun 23 ]

Hi.
I have absolutely similar situation.Like Kelly Shutt.
ZAbbix 3.0.3
My SQL 5.7.13

mysql> show create table history_uint\G
*************************** 1. row ***************************
       Table: history_uint
Create Table: CREATE TABLE `history_uint` (
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `value` bigint(20) unsigned NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  KEY `history_uint_1` (`itemid`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
/*!50100 PARTITION BY RANGE ( clock)
(PARTITION p2016_06_22 VALUES LESS THAN (1466629200) ENGINE = InnoDB,
 PARTITION p2016_06_23 VALUES LESS THAN (1466715600) ENGINE = InnoDB,
 PARTITION p2016_06_24 VALUES LESS THAN (1466802000) ENGINE = InnoDB) */
1 row in set (0.00 sec)

Very slow queries in Latest data, after add new host

mysql> SELECT * FROM history_uint h WHERE h.itemid='185615' AND h.clock>1466607574 ORDER BY h.clock DESC LIMIT 2 OFFSET 0;
+--------+------------+-------+----------+
| itemid | clock      | value | ns       |
+--------+------------+-------+----------+
| 185615 | 1466697810 |     1 | 57512250 |
| 185615 | 1466697510 |     1 | 26020090 |
+--------+------------+-------+----------+
2 rows in set (16.47 sec)

UPD
This patch resolve my issue.

https://support.zabbix.com/browse/ZBX-10229

Comment by Alexey Pustovalov [ 2016 Jun 23 ]

Kononenko Sergey, Try to disable "pushed_index_condition" optimizer without any custom patches.

Comment by Kononenko Sergey [ 2016 Jun 24 ]

Alexey Pustovalov,
thank you very much, it works

mysql> SET optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=off,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
1 row in set (0.00 sec)

mysql> SELECT * FROM history_uint h WHERE h.itemid='185615' AND h.clock>1466607574 ORDER BY h.clock DESC LIMIT 2 OFFSET 0;
+--------+------------+-------+-----------+
| itemid | clock      | value | ns        |
+--------+------------+-------+-----------+
| 185615 | 1466749110 |     1 |  36607436 |
| 185615 | 1466748809 |     1 | 929473480 |
+--------+------------+-------+-----------+
2 rows in set (0.01 sec)
Comment by Alexey Pustovalov [ 2016 Jun 24 ]

oxygen121, It is known bug of MySQL 5.7. So I think now you do not need to use custom patches

Comment by Kelly Shutt [ 2016 Jun 24 ]

This appears to have helped in our case also. I just made the same change and I'm not seeing the queries get stuck anymore. I even grabbed one that I know would return no results, and queried across about 20 partitions, and it finished pretty quickly.

We are still running 2.4.x release at this point; so it looks like the same fix works for both 2.4.x and 3.x releases. This is good news because we plan on upgrading soon.

Thanks for this insight, I tried looking for possible solutions on the MySQL side but was having a hard time coming up with anything.

Comment by Kelly Shutt [ 2016 Jun 24 ]

And, there's an added side benefit. After applying this MySQL change; our steady state IOPS are down 50% from what they were previously. This will make our SAN happy.

Comment by Kenneth Palmertree [ 2016 Jul 07 ]

I had similar issues like above but resolved them with the following patch -> https://support.zabbix.com/browse/ZBX-10229

Comment by Alexey Pustovalov [ 2016 Jul 07 ]

palmertree It is not quite good solution. Try to disable index_condition_pushdown.

Comment by Sandis Neilands (Inactive) [ 2016 Jul 08 ]

The issue and the workaround will be documented in known issues.

Nothing to do in Zabbix since this is MySQL issue.

martins-v Added to known issues for 2.4. Please review before copying to 3.0.

sandis.neilands Thanks! CLOSED.

Comment by Dmitry Verkhoturov [ 2017 Mar 21 ]

If you see huge performance drop after upgrading to mysql 5.7.x, see this: http://stackoverflow.com/questions/39023651/mysql-5-7-strange-perfomance-reduction-with-order-by-asc-desc-on-partitioned-tab

If that's your case, adding following line to mysqld section of my.cnf should help:

optimizer_switch = 'index_condition_pushdown=off'

From my experience, from-the-box installations of Zabbix are not affected, only ones with partitioning enabled.

Generated at Fri Apr 19 16:17:01 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.