[ZBX-13343] Performance issue with history syncer since at least 3.4.5 Created: 2018 Jan 16 Updated: 2024 Apr 10 Resolved: 2018 Jan 28 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.4.5, 3.4.6 |
Fix Version/s: | 3.4.7rc1, 4.0 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Daniel Poßmann | Assignee: | Andris Zeila |
Resolution: | Fixed | Votes: | 1 |
Labels: | performance |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Environment: |
Debian Jessie |
Sprint: | Sprint 26 |
Story Points: | 1 |
Description |
Comments |
Comment by Kaspars Mednis [ 2018 Jan 16 ] |
Hi Daniel, your history syncer spikes can be caused by housekeeper problems. select count(*),source from events group by source; Regards, |
Comment by Daniel Poßmann [ 2018 Jan 16 ] |
Hi Kaspars, thanks for the fast reply. Here's the output:
+----------+--------+
| count(*) | source |
+----------+--------+
|   183884 |      0 |
| 11835827 |      1 |
|   368808 |      2 |
|  7279489 |      3 |
+----------+--------+
Due to the performance issues we decreased the retention times of events yesterday, and since that hasn't fixed it, we deleted 17 million events with object=4 and source=3 that were older than 60 days (didn't know there was a fix for the events query Regards, |
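The counts in this output are keyed by the numeric `source` column, which is easier to read with names attached. The following is a minimal sketch, not part of the original thread: it assumes the usual Zabbix event source codes (0 trigger, 1 discovery, 2 auto-registration, 3 internal), uses an in-memory SQLite table as a stand-in for the real `events` table, and the helper name `count_events_by_source` is hypothetical.

```python
import sqlite3

# Assumed mapping of Zabbix event source codes to names (not stated in the thread).
SOURCE_NAMES = {0: "trigger", 1: "discovery", 2: "auto-registration", 3: "internal"}


def count_events_by_source(conn):
    """Run the query from the thread and label each source code with its name."""
    rows = conn.execute(
        "select count(*), source from events group by source order by source"
    ).fetchall()
    return {SOURCE_NAMES.get(source, "unknown"): count for count, source in rows}


# Stand-in table with a few sample rows; the real Zabbix table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events ("
    "eventid integer primary key, source integer, object integer, clock integer)"
)
conn.executemany(
    "insert into events (source, object, clock) values (?, ?, ?)",
    [(0, 0, 1516000000), (1, 1, 1516000100), (1, 1, 1516000200), (3, 4, 1516000300)],
)

print(count_events_by_source(conn))
```

Under that assumed mapping, the output reported here would mean that discovery (source 1) and internal (source 3) events make up almost all of the table.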
Comment by Kaspars Mednis [ 2018 Jan 16 ] |
You can set the Internal data storage period to 1d for now (it's only Zabbix internal data). Which exact Zabbix server version are you running, 3.4.5 or 3.4.6? zabbix_server -V Regards, |
Comment by Daniel Poßmann [ 2018 Jan 16 ] |
# zabbix_server -V
zabbix_server (Zabbix) 3.4.6
Revision 76823 15 January 2018, compilation time: Jan 15 2018 09:34:47
From the official repository: deb http://repo.zabbix.com/zabbix/3.4/debian jessie main |
Comment by Daniel Poßmann [ 2018 Jan 16 ] |
I can see many queries of this form right now: select distinct itemid from trends where clock>=1516114800 and (itemid [...] I don't think I ever saw those before. We have 700 million rows in trends_uint and 300 million rows in trends, both with 32 "HASH(`itemid` )" partitions. The number of rows hasn't changed much in the last months. Are those queries "new"? |
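The `clock` value in the quoted query can be decoded to see which period it targets. A minimal sketch, with the hypothetical helper name `describe_clock`: it converts the Unix timestamp to UTC and checks whether it falls exactly on an hour boundary.

```python
from datetime import datetime, timezone


def describe_clock(clock):
    """Return the UTC time of a Unix timestamp and whether it is hour-aligned."""
    moment = datetime.fromtimestamp(clock, tz=timezone.utc)
    return moment.isoformat(), clock % 3600 == 0


# Timestamp taken from the query quoted in the comment.
print(describe_clock(1516114800))  # ('2018-01-16T15:00:00+00:00', True)
```

The timestamp is hour-aligned, which would be consistent with the hourly trend processing discussed later in the thread, although the comment itself does not say where the query comes from.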
Comment by Oleksii Zagorskyi [ 2018 Jan 16 ] |
You can try to downgrade to previous zabbix version. |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 16 ] |
Probably caused by |
Comment by Oleksii Zagorskyi [ 2018 Jan 16 ] |
On 1st graph: |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 16 ] |
As you said, the first trend calculation after a restart is harder than subsequent ones, and we don't know what it looked like before the upgrade to 3.4.5 (unless the reporter provides a picture of a restart without an upgrade). So I am ignoring the first syncer spike. The first anomaly after the upgrade is the 100% load of the housekeeper. |
Comment by Oleksii Zagorskyi [ 2018 Jan 17 ] |
Ok, point accepted! Also, we see that the syncers were sensitive at the start of each hour (spikes) before the upgrade. That's quite a rare picture. Usually it's not noticeable at all. |
Comment by Daniel Poßmann [ 2018 Jan 17 ] |
Thanks for your help. I did a downgrade and will check whether the second and later housekeeper and syncer runs show the same behavior. If so, something is wrong with the database, and either it was bad luck that it happened at the same time as the upgrade or a small schema change did something. Btw., the first graph shows the behavior before the upgrade. All the weekdays are statistics from 3.4.3; the weekend is already 3.4.5 |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 17 ] |
But there are no restarts before the upgrade visible in those graphs. We are talking about the first hour after a restart, before and after the upgrade. We see the "after" picture; the "before" one is missing. Perhaps you still have such data in history; it would be nice if you could attach a screenshot. By the way, disabling the housekeeper in 3.4.5 and checking how it affects (or doesn't affect) the history syncer would be a nice insight as well. |
Comment by Giuseppe Calignano [ 2018 Jan 17 ] |
Hi, attaching the Zabbix chart. After the upgrade, the log is full of this: vso The slow query is the one that deletes history for the item, so I don't think it's related to anything; it's just a slow operation to delete 500K history entries in one go. |
Comment by Vladislavs Sokurenko [ 2018 Jan 17 ] |
Could you please be so kind as to mention the exact housekeeper configuration you had before and after the upgrade? Does reverting to the old configuration solve the issue? |
Comment by Giuseppe Calignano [ 2018 Jan 17 ] |
@Vladislav, |
Comment by Daniel Poßmann [ 2018 Jan 17 ] |
zabbix_processes_3.4.3_restart.png I'll upgrade again now, disable the housekeeper and send you the graph with 3 hours after the change. |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 17 ] |
Thank you, thetuxkeeper! So the first hourly spike of the history syncer is as bad in 3.4.5 as it was in 3.4.3 (100% for ~15 minutes) and my assumption was correct. The housekeeper is still our main suspect. Looking forward to getting some updates from you! |
Comment by Daniel Poßmann [ 2018 Jan 17 ] |
zabbix_processes_3.4.6_restart.png |
Comment by Giuseppe Calignano [ 2018 Jan 18 ] |
Update: Thanks for your help, appreciated |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 20 ] |
Difference between zabbix_processes_3.4.3_restart.png Have we done something in 3.4.5 that could have increased MySQL/MariaDB memory consumption? Trend synchronization is very sensitive to it. vso I know it's a long shot but glebs.ivanovskis I couldn't find anything more related in ChangeLog. Seems plausible. |
Comment by Daniel Poßmann [ 2018 Jan 20 ] |
We increased the innodb_buffer_pool_size from 8G to 12G (on 16.1.) and the RAM from 13G to 18G of the dedicated database vm (on 14.1.). But since that hasn't changed anything at all, I forgot to mention it. So both zabbix_processes_3.4.6_restart.png |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 21 ] |
This is an important comment! Thank you, thetuxkeeper! Looks more and more like some important piece of trend sync logic was lost with |
Comment by Andris Zeila [ 2018 Jan 22 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-13343 |
Comment by Vladislavs Sokurenko [ 2018 Jan 22 ] |
Successfully tested |
Comment by Andris Zeila [ 2018 Jan 22 ] |
Released in:
Note that only 3.4 branch was affected by this bug. |
Comment by richlv [ 2018 Jan 23 ] |
some communication seems to be lost - what was the issue in the end ? wiper Regression. |
Comment by Shane Arnold [ 2018 Jan 25 ] |
Is there anything we can do to mitigate this effect without moving to an RC/alpha? We are currently scheduling our upgrade from 3.2 to 3.4.5. In 3.2 we already have delayed values, and one of the reasons for our upgrade to 3.4.5 is improved config syncer/housekeeper behaviour. Thanks |
Comment by Andris Zeila [ 2018 Jan 25 ] |
I'm afraid the only option would be backporting the patch (it was quite a simple fix). |
Comment by Dmitry Verkhoturov [ 2018 Jan 29 ] |
It's impossible to use 3.4.5 or 3.4.6 because of this - in my case, all processing gets stuck and nothing works (3k nvps). In my opinion you should publish a 3.4.7 bugfix release even if it contains only this patch and nothing else - there should not be such a bug in the latest published version of the product. |
Comment by Alexander Vladishev [ 2018 Jan 29 ] |
Sorry, this is another problem. This workaround will not help. We will release 3.4.7rc1 today. |
Comment by Rostislav Palivoda (Inactive) [ 2018 Jan 30 ] |
Released in https://sourceforge.net/projects/zabbix/files/ZABBIX%20Release%20Candidates/3.4.7rc1/ |
Comment by brendon [ 2018 Feb 12 ] |
+1. Google sent me here. I don't suppose someone has a 3.4.7rc1 mysql binary somewhere? Edit: I made some. |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Feb 20 ] |
3.4.7 must be available. |