[ZBX-16130] Zabbix 4.0.7 - History syncers slow when many events arrive Created: 2019 May 15  Updated: 2019 Nov 20  Resolved: 2019 Nov 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.0.7
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: James Cook Assignee: Aleksandrs Petrovs-Gavrilovs
Resolution: Duplicate Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix - 4.0.7
Hosts - 13035
Items - 3431108
Triggers - 387768
NVPS - 7610


Issue Links:
Duplicate
duplicates ZBX-16955 History syncers lock each other when ... Closed

 Description   

I have noticed that the history syncers get busy and delay syncing when lots of events arrive. The last time this occurred for us, around 2700 events arrived in 10 minutes due to a large environment change being performed (including internal events, i.e. item not supported).

There are many Zabbix server log entries like this one - slow query: 11.164569 sec, "update ids set nextid=nextid+1 where table_name='event_suppress' and field_name='event_suppressid'".
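For reference, the serialization behind this message can be reproduced directly against the backend database (a minimal sketch, assuming a stock Zabbix schema on MySQL/InnoDB or PostgreSQL; the two sessions are illustrative): two sessions updating the same ids row block each other until the first transaction ends.

-- session A: take the row lock the same way the logged statement does, keep the transaction open
BEGIN;
UPDATE ids SET nextid=nextid+1
 WHERE table_name='event_suppress' AND field_name='event_suppressid';

-- session B: the identical statement now waits on session A's row lock;
-- this wait is what surfaces as the "slow query" in the Zabbix server log
BEGIN;
UPDATE ids SET nextid=nextid+1
 WHERE table_name='event_suppress' AND field_name='event_suppressid';
-- session B is released only once session A issues COMMIT or ROLLBACK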

The behavior is that the history syncers are slow and the cache starts to build up until the above log entries disappear (i.e. processing has caught up) OR lots of hosts come out of their maintenance period (and we have a lot of hosts in maintenance after hours).

The issue is that if history syncing gets far enough behind, other nodata() triggers start to fire, causing additional load and compounding the problem.

This never occurred in Zabbix 3.4; we only updated to Zabbix 4.0 in the last couple of weeks and this has already happened several times.



 Comments   
Comment by Aleksandrs Petrovs-Gavrilovs [ 2019 May 17 ]

Hello [email protected]

Please be advised that this section of the tracker is for bug reports only. The case you have submitted cannot be qualified as one, so please reach out to [email protected] for commercial support or consultancy services. Alternatively, you can use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding.

Comment by James Cook [ 2019 Aug 13 ]

Hi,

I am currently in the process of putting a business case together to obtain support within our company.

I believe the issue may be a result of shifting some of the trigger evaluation from the timers to the history syncers, which now have to take an exclusive lock on the ids table while processing incoming events. This effectively removes the parallelism of the history syncers, as they all have to wait for the exclusive lock.

Hence the slow query being reported, as the history syncers can't update the field because of the lock.

This never used to happen in previous versions, as the history syncers did not need to use the ids table: the history* and trends* tables have no id column.
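For illustration only (a sketch inferred from the logged statement, not the actual server code), the id-allocation pattern that would produce this serialization looks roughly like the following, if each syncer reserves an event_suppressid inside its own event-processing transaction:

BEGIN;
-- reserve one id by bumping the shared counter row (this takes a row lock)
UPDATE ids SET nextid=nextid+1
 WHERE table_name='event_suppress' AND field_name='event_suppressid';
-- read back the reserved value
SELECT nextid FROM ids
 WHERE table_name='event_suppress' AND field_name='event_suppressid';
-- ... insert the event_suppress row(s) using the reserved id ...
COMMIT;
-- the row lock is held until COMMIT, so every other syncer that needs an
-- event_suppressid queues up behind this transaction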

I would say this is not a bug; rather, the change in design did not factor in the performance impact of using exclusive locks.

Cheers

James

Comment by Aaron Whiteman [ 2019 Sep 30 ]

For what it is worth, I see similar events on Zabbix 4.2.6

5351 hosts, 1018695 items, 339437 triggers (2649 required vps). I have to create a maintenance period that disables data gathering to get the server to process the queue and start fresh. If I don't, it just sits at 9-11 minutes "in the past" for hours.

Comment by Konstantin Zaytsev [ 2019 Oct 11 ]

I have the same problem. When I create a maintenance period with data collection and many problems are triggered, we get a lag in new data from the other Zabbix proxies and 100% busy history syncer processes.

2019-10-11 06:22:58 UTC 172.25.41.102(48491) [9453]: [528-1] LOG:  duration: 27119.832 ms  statement: update ids set nextid=nextid+1 where table_name='event_suppress' and field_name='event_suppressid'

Zabbix version 4.2.4
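
On a PostgreSQL backend (9.6 or later) with direct database access, one way to confirm that these statements are queuing behind each other is to list the sessions waiting on a lock together with the pids blocking them; an illustrative query (not from the ticket):

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start   AS waiting_for,
       query
  FROM pg_stat_activity
 WHERE wait_event_type = 'Lock'
   AND query LIKE 'update ids set nextid%';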

Comment by Aleksandrs Petrovs-Gavrilovs [ 2019 Nov 12 ]

Hello,

Could you please also describe the approximate NVPS?
And as I understand it, the issue happens only with a high number of hosts in maintenance with data collection? Are there any additional details that can be provided?

Comment by Vladislavs Sokurenko [ 2019 Nov 20 ]

Thank you for your report; this will be fixed under ZBX-16955.
