[ZBX-6804] [Patch] Initial loading of graphs is slow -- at least with PostgreSQL and huge tables Created: 2013 Jul 18 Updated: 2017 May 30 Resolved: 2013 Oct 09
Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 2.0.7rc1, 2.1.1
Fix Version/s: 2.0.9rc1, 2.1.5
Type: Incident report
Priority: Critical
Reporter: Volker Fröhlich
Assignee: Unassigned
Resolution: Fixed
Votes: 3
Labels: performance, trivial, usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Attachments: sadpuppy.jpg, zabbix-2.0.6-speed_up_graphs.patch
Description
I noticed this issue on PostgreSQL, but it might exist with other DB backends as well: loading a graph can take very long, like 30 seconds, unless the data is already in the page cache. So if you visit a graph you've never been to, or haven't been to for a long time, you're likely to experience this. Needless to say, it greatly hinders the use of screens too.

This is mostly due to include/graphs.inc.php:get_min_itemclock_by_itemid():

    SELECT MIN(ht.clock) AS min_clock
      FROM trends_uint ht
     WHERE ht.itemid IN ('54956','54959','55026','55035','60728')

particularly for graphs with short periods selected. It gets worse the more itemids are inside IN(). The resulting query plan on PostgreSQL is quite poor:

Query plan:

    zabbix20=# SELECT count(*) FROM trends;
      count
    ----------
     29349921

    explain analyze SELECT MIN(ht.clock) AS min_clock FROM trends_uint ht
        WHERE ht.itemid IN ('54957','54960','55027','55036','60728');

    Aggregate  (cost=136689.71..136689.72 rows=1 width=4) (actual time=28713.804..28713.804 rows=1 loops=1)
      ->  Bitmap Heap Scan on trends_uint ht  (cost=973.83..136587.39 rows=40930 width=4) (actual time=14.356..28701.949 rows=12976 loops=1)
            Recheck Cond: (itemid = ANY ('{54957,54960,55027,55036,60728}'::bigint[]))
            ->  Bitmap Index Scan on trends_uint_pkey  (cost=0.00..963.60 rows=40930 width=0) (actual time=4.013..4.013 rows=12976 loops=1)
                  Index Cond: (itemid = ANY ('{54957,54960,55027,55036,60728}'::bigint[]))
    Total runtime: 28713.849 ms
    (6 rows)

A different query yields results vastly quicker:

Query plan for modified query:

    Aggregate  (cost=33.23..33.33 rows=1 width=4) (actual time=0.140..0.140 rows=1 loops=1)
      ->  Append  (cost=3.95..33.21 rows=8 width=4) (actual time=0.030..0.137 rows=8 loops=1)
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.029..0.029 rows=1 loops=1)
                  InitPlan 17 (returns $16)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.026..0.026 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.026..0.026 rows=1 loops=1)
                                Index Cond: ((itemid = 34291) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.028..0.028 rows=1 loops=1)
                  InitPlan 18 (returns $17)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.027..0.027 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.026..0.026 rows=1 loops=1)
                                Index Cond: ((itemid = 34295) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.016..0.017 rows=1 loops=1)
                  InitPlan 19 (returns $18)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.014..0.014 rows=1 loops=1)
                                Index Cond: ((itemid = 34296) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.007..0.007 rows=1 loops=1)
                  InitPlan 20 (returns $19)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.006..0.006 rows=0 loops=1)
                                Index Cond: ((itemid = 34297) AND (clock IS NOT NULL))
                                Heap Fetches: 0
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
                  InitPlan 21 (returns $20)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                                Index Cond: ((itemid = 34299) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.012..0.013 rows=1 loops=1)
                  InitPlan 22 (returns $21)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.011..0.011 rows=1 loops=1)
                                Index Cond: ((itemid = 34300) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
                  InitPlan 23 (returns $22)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.015..0.015 rows=1 loops=1)
                                Index Cond: ((itemid = 34303) AND (clock IS NOT NULL))
                                Heap Fetches: 1
            ->  Result  (cost=3.95..4.05 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1)
                  InitPlan 24 (returns $23)
                    ->  Limit  (cost=0.00..3.95 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
                          ->  Index Only Scan using trends_pkey on trends ht  (cost=0.00..24305.42 rows=6152 width=4) (actual time=0.013..0.013 rows=1 loops=1)
                                Index Cond: ((itemid = 34304) AND (clock IS NOT NULL))
                                Heap Fetches: 1
    Total runtime: 0.392 ms
    (51 rows)

To be fair, those rows were already in the page cache, but it hardly matters: the improvement is still more than a factor of 1000 for any given itemids. The attached patch is a proof of concept. There are probably many more places that could profit from this kind of change.
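The idea behind the patch is to replace the single MIN() over IN() with one MIN() per itemid, combined via UNION ALL, so each branch becomes a single index probe on the (itemid, clock) primary key instead of a scan over every matching row. A minimal sketch of the two query shapes, using SQLite only as a convenient stand-in backend (the table layout follows the Zabbix trends_uint schema; the data is made up for illustration):

```python
# Sketch of the query rewrite: aggregate over IN() vs. per-itemid MIN()
# branches combined with UNION ALL. SQLite stands in for PostgreSQL here;
# both forms must return the same result.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trends_uint (itemid INTEGER, clock INTEGER, "
             "PRIMARY KEY (itemid, clock))")
conn.executemany("INSERT INTO trends_uint VALUES (?, ?)",
                 [(54957, 1000), (54957, 2000),
                  (54960, 1500), (55027, 800), (55027, 900)])

itemids = [54957, 54960, 55027]

# Original form: one aggregate over IN() -- on PostgreSQL this produced
# the slow bitmap heap scan shown in the plan above.
slow_sql = ("SELECT MIN(clock) FROM trends_uint WHERE itemid IN (%s)"
            % ",".join("?" * len(itemids)))
slow = conn.execute(slow_sql, itemids).fetchone()[0]

# Patched form: one MIN() per itemid, each answerable by a single probe
# of the (itemid, clock) index, combined with UNION ALL.
branch = "SELECT MIN(clock) AS c FROM trends_uint WHERE itemid = ?"
fast_sql = ("SELECT MIN(c) FROM (%s) ht"
            % " UNION ALL ".join([branch] * len(itemids)))
fast = conn.execute(fast_sql, itemids).fetchone()[0]

assert slow == fast  # both queries agree on the minimum clock
print(slow, fast)    # prints: 800 800
```

The derived-table alias is written `) ht` rather than `) AS ht`, which is the portable spelling accepted across backends.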
Comments
Comment by Marc [ 2013 Jul 18 ]
Confirm improvement from 15 sec to <1 sec.
Comment by Florian Koch [ 2013 Aug 05 ]
+1, confirm a 5-10x improvement.
Comment by Alexei Vladishev [ 2013 Aug 20 ]
We need to fix it for both 2.0.x and 2.2.x.
Comment by Alexei Vladishev [ 2013 Aug 20 ]
According to PostgreSQL developers (#postgresql on freenode IRC), the inefficiency was fixed in 9.2.4; however, I failed to find any evidence in the PostgreSQL release notes.
Comment by Florian Koch [ 2013 Aug 24 ]
I updated to PostgreSQL 9.2.4 and Zabbix 2.0.8 today; without the patch, I can't see a speedup.
Comment by Igor Shipenkov [ 2013 Aug 28 ]
We use Zabbix 1.8.16 with PostgreSQL 9.2.4 and have this issue, so you can extend the list of affected versions. I applied the patch and it seems that everything works fine now.
Comment by Eduards Samersovs (Inactive) [ 2013 Sep 11 ]
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6804
Comment by Volker Fröhlich [ 2013 Sep 11 ]
You're doing something different there. I found the ORDER BY/LIMIT approach did not bring any benefit.
Comment by Alexander Vladishev [ 2013 Sep 13 ]
Hi Volker, I tested the SQL requests. Here are the results of the testing.

Configuration:
Number of records:
Tests:
Test results (average values in seconds):

Current (before changes) SQL statement:

    select min(ht.clock) as min_clock
      from history ht
     where ht.itemid between ? and ?;

Our SQL statement (r38414):

    select h.clock as min_clock
      from history h
     where h.itemid between ? and ?
     order by h.clock
     limit 1 offset 0;

Your SQL statement:

    select min(ht.item_min_clock) as min_clock
      from (select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
            union all
            select min(ht.clock) as item_min_clock from history ht where ht.itemid = ?
           ) as ht;

Summary: I confirm the improvement with Volker's patch. It will be integrated into Zabbix. Many thanks!
Comment by Alexander Vladishev [ 2013 Sep 13 ]
(1) Please apply Volker's patch. It works quicker!
Eduards: RESOLVED r.38465
sasha: CLOSED
Comment by Alexander Vladishev [ 2013 Sep 13 ]
(2) The patch doesn't work on Oracle. It needs a small change:

    SELECT min (...) AS min_clock FROM (...) AS ht;
    =>
    SELECT min (...) AS min_clock FROM (...) ht;

This form works on MySQL, PostgreSQL and IBM DB2 too.
Eduards: RESOLVED r.38472, 38473
sasha: CLOSED
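The portability issue above is that Oracle does not accept the AS keyword before a derived-table (subquery) alias, while the bare alias is accepted everywhere. A small sketch of the corrected form, using SQLite only to show the bare-alias statement parses and runs (table and data are made up for illustration):

```python
# Derived-table alias without AS -- the form that, per the comment above,
# works on Oracle as well as MySQL, PostgreSQL and IBM DB2. SQLite is
# used here only as a convenient stand-in to execute the statement.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER)")
conn.executemany("INSERT INTO history VALUES (?, ?)",
                 [(1, 300), (1, 100), (2, 200)])

# Note "FROM ( ... ) ht" instead of "FROM ( ... ) AS ht".
sql = ("SELECT MIN(item_min_clock) AS min_clock FROM ("
       " SELECT MIN(clock) AS item_min_clock FROM history WHERE itemid = 1"
       " UNION ALL"
       " SELECT MIN(clock) AS item_min_clock FROM history WHERE itemid = 2"
       ") ht")
min_clock = conn.execute(sql).fetchone()[0]
print(min_clock)  # prints: 100
```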
Comment by Alexander Vladishev [ 2013 Sep 13 ]
Successfully tested!
Comment by Eduards Samersovs (Inactive) [ 2013 Sep 13 ]
Fixed in versions pre-2.1.5 (trunk) r.38477 and 2.0.9rc1 r.38478.
Comment by Volker Fröhlich [ 2013 Sep 25 ]
Will be in EPEL 2.0.8-3 packages.
Comment by richlv [ 2013 Oct 01 ]
(3) This issue was closed without adding the improvement to the whatsnew page. It makes puppies sad.
martins-v: For review: https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew209#faster_loading_graphs_with_postgresql
sasha: It's not PostgreSQL only. It will be faster on all databases!
martins-v: New version: https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew209#faster_loading_graphs
sasha: Great! CLOSED. The puppy should be happy!