[ZBX-16130] Zabbix 4.0.7 - History syncers slow when many events arrive Created: 2019 May 15 Updated: 2019 Nov 20 Resolved: 2019 Nov 20 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 4.0.7 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Minor |
Reporter: | James Cook | Assignee: | Aleksandrs Petrovs-Gavrilovs |
Resolution: | Duplicate | Votes: | 2 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix - 4.0.7 |
Issue Links: |
|
Description |
I have noticed the history syncers get busy and delay syncing when many events arrive. The last time this occurred for us, 2700 events arrived in 10 minutes due to a large environment change being performed (including internal events, i.e. item not supported). There are many Zabbix server log entries like this one: slow query: 11.164569 sec, "update ids set nextid=nextid+1 where table_name='event_suppress' and field_name='event_suppressid'". The behaviour is that the history syncers are slow and the cache builds up until the above log entries disappear (i.e. processing has caught up) OR many hosts come out of their maintenance period (and we have a lot of hosts in maintenance after hours). The issue is that if history syncing gets far enough behind, nodata() triggers start to fire, causing additional load and compounding the problem. This never occurred in Zabbix 3.4; we only updated to Zabbix 4.0 in the last couple of weeks and this has happened several times. |
Comments |
Comment by Aleksandrs Petrovs-Gavrilovs [ 2019 May 17 ] |
Hello [email protected] Please be advised that this section of the tracker is for bug reports only. The case you have submitted cannot be qualified as one, so please reach out to [email protected] for commercial support or consultancy services. Alternatively, you can also use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding. |
Comment by James Cook [ 2019 Aug 13 ] |
Hi, I am currently in the process of putting a business case together to obtain support within our company. I believe the issue may be a result of shifting some of the trigger evaluation from the timers to the history syncers, which now have to take an exclusive lock on the ids table while processing incoming events. This effectively removes the parallelism of the history syncers, as they all have to wait for the exclusive lock; hence the slow query being reported, because each history syncer can't update the field until the lock is released. This never used to happen in previous versions, since the history syncers did not need the ids table: the history* and trends* tables have no IDs. I would say this is not a bug as such; the design change just did not factor in the performance cost of the exclusive locks. Cheers James |
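[Editor's note] The serialization described above can be illustrated with a small sketch. This is not Zabbix code: the counter models the `nextid` row for `event_suppress` from the logged query, and the two worker styles (one lock acquisition per row vs. one per batch) are a hypothetical model of why per-event ID allocation removes the history syncers' parallelism.

```python
import threading

class IdTable:
    """Models the 'ids' table: one exclusively-locked counter per (table, field)."""
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the row-level exclusive lock
        self.nextid = 0

    def allocate(self, n=1):
        # "update ids set nextid=nextid+n ..." - every caller serializes here
        with self._lock:
            first = self.nextid + 1
            self.nextid += n
            return first

def syncer_per_row(ids, out, rows):
    # One lock acquisition per event_suppress row: rows * workers lock waits,
    # so the "parallel" syncers effectively run one at a time.
    for _ in range(rows):
        out.append(ids.allocate())

def syncer_batched(ids, out, rows):
    # One lock acquisition for the whole batch: contention drops by a
    # factor of `rows`, while the allocated IDs stay unique.
    first = ids.allocate(rows)
    out.extend(range(first, first + rows))

def run(worker, workers=4, rows=1000):
    ids = IdTable()
    results = [[] for _ in range(workers)]
    threads = [threading.Thread(target=worker, args=(ids, results[i], rows))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    allocated = [i for chunk in results for i in chunk]
    assert len(set(allocated)) == workers * rows  # IDs are unique either way
    return ids.nextid

print(run(syncer_per_row))   # 4000 IDs via 4000 lock acquisitions
print(run(syncer_batched))   # 4000 IDs via only 4 lock acquisitions
```

Both variants hand out the same 4000 unique IDs; the difference is purely in how often the shared lock is contended, which is the bottleneck the slow `update ids` query points at.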
Comment by Aaron Whiteman [ 2019 Sep 30 ] |
For what it is worth, I see similar events on Zabbix 4.2.6 with 5351 hosts, 1018695 items, 339437 triggers (2649 required vps). I have to create a maintenance entry that disables data gathering to get the server to process the queue and start fresh. If I don't, I just sit at 9-11 minutes "in the past" for hours. |
Comment by Konstantin Zaytsev [ 2019 Oct 11 ] |
I have the same problem. When I create a maintenance period with data collection and many problems are triggered, new data from the other Zabbix proxies lags and the history syncer processes are 100% busy: 2019-10-11 06:22:58 UTC 172.25.41.102(48491) [9453]: [528-1] LOG: duration: 27119.832 ms statement: update ids set nextid=nextid+1 where table_name='event_suppress' and field_name='event_suppressid' Zabbix version 4.2.4 |
Comment by Aleksandrs Petrovs-Gavrilovs [ 2019 Nov 12 ] |
Hello, Could you please also describe the approximate NVPS? |
Comment by Vladislavs Sokurenko [ 2019 Nov 20 ] |
Thank you for your report, this will be fixed under |