Incident report
Resolution: Unresolved
Trivial
None
6.0.26
None
Pre-Production
We upgraded Zabbix server from 3.4.9 (on RHEL 7.9) to 6.0.23 on Amazon Linux 2023, using AWS RDS for PostgreSQL 14.6 as the database, and then upgraded RDS to 14.9.
Since then we have been facing the following issues:
01. Every couple of hours we get the message "Zabbix agent is unreachable for more than 5 minutes", and this happens for all agents.
After we restart the zabbix-server service, we keep getting duplicate value errors in the zabbix-server log. Following one of the solutions mentioned in a Zabbix ticket, we truncated the ids table and then rebooted the database. After that, zabbix-server appears to work normally for a few hours, then it hangs again and the same agent unreachable error starts appearing.
Looking at the duplicate value errors, we noticed that the key value in the table "event_tag" reached a maximum of about 34 million. Zabbix keeps deleting rows from that table, so it now holds only about 57k records, but the key value is not reset, so we get the duplicate value error. Please check the screenshot for details.
02. When we restart the zabbix-server service, the restart takes a long time, and the log shows the message "syncing history data".
03. The history sync process is using 95-100% CPU on the server.
04. After the upgrade, we applied the post-upgrade primary key changes to the history tables, but the frontend GUI still shows the message "Database history tables upgraded : NO".
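For item 04, one way to cross-check the database side is to list which history tables actually carry a primary key. The sketch below is only an illustration, not the check Zabbix itself performs: `PK_QUERY` is a standard `information_schema` query for PostgreSQL, and `tables_missing_primary_key` is a hypothetical helper that compares the query result against the five history tables. It assumes the Zabbix tables live in a single schema; add a `table_schema` filter otherwise.

```python
# The five history tables covered by the Zabbix 6.0 primary-key upgrade.
HISTORY_TABLES = (
    "history",
    "history_uint",
    "history_str",
    "history_log",
    "history_text",
)

# PostgreSQL query listing which of those tables currently have a primary key.
# Run it with psql or any other client against the Zabbix database.
PK_QUERY = """
SELECT table_name
FROM information_schema.table_constraints
WHERE constraint_type = 'PRIMARY KEY'
  AND table_name IN ({tables});
""".format(tables=", ".join(f"'{t}'" for t in HISTORY_TABLES))


def tables_missing_primary_key(tables_with_pk):
    """Return the history tables that do not appear in the query result.

    Hypothetical helper: tables_with_pk is the list of table names
    returned by PK_QUERY.
    """
    found = set(tables_with_pk)
    return [t for t in HISTORY_TABLES if t not in found]


if __name__ == "__main__":
    print(PK_QUERY)
    # Illustrative input only, not taken from the reported system.
    print(tables_missing_primary_key(["history", "history_uint"]))
```

If the helper returns an empty list while the frontend still reports "NO", the database schema is probably not where the flag comes from, which would be worth stating in the ticket.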
Details about the number of hosts, templates, items, and triggers are in the attached screenshot.
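The duplicate value errors described in item 01 can be narrowed down by comparing each stored ID counter with the highest key actually present in the corresponding table. The following is a minimal sketch under assumptions: `stale_counters` is a hypothetical helper, the two dictionaries would have to be filled from the database by hand (for example from the ids table and from `SELECT MAX(...)` on each table), and the numbers are illustrative rather than measured. Whether a lagging counter is the actual cause here is not confirmed.

```python
def stale_counters(next_ids, max_ids):
    """Return tables whose stored counter is below the highest key in use.

    next_ids: {table: counter value recorded for that table}
    max_ids:  {table: MAX(primary key) currently present in that table}
    A table missing from next_ids is treated as having counter 0.
    """
    return {
        table: (next_ids.get(table, 0), max_id)
        for table, max_id in max_ids.items()
        if next_ids.get(table, 0) < max_id
    }


if __name__ == "__main__":
    # Illustrative numbers only: a counter far behind the maximum key.
    counters = {"event_tag": 57_000, "events": 900}
    maxima = {"event_tag": 34_000_000, "events": 800}
    for table, (counter, max_id) in stale_counters(counters, maxima).items():
        print(f"{table}: counter {counter} is behind max key {max_id}")
```

Any table reported by such a check is a candidate for the duplicate value errors, because newly allocated keys could collide with rows that still exist.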
Steps to reproduce:
- Changes in configuration...
- Upgraded Zabbix server from 3.4.9 (RHEL7.9) to 6.0.23 (Amazon Linux 2023)
- Backend database is AWS RDS PostgreSQL 14.6, later upgraded to 14.9
- Navigate to screen title...
- Monitoring > PROBLEMS
Result:
See screenshot...