[ZBX-24223] After upgrading Zabbix server from 3.4.9 to 6.0.23, Agents unreachable
Created: 2024 Mar 14 Updated: 2024 Mar 15 |
|
Status: | Open |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 6.0.26 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Ahsanullah Khan | Assignee: | Aigars Kadikis |
Resolution: | Unresolved | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Pre-Production |
Attachments: |
Description |
We have upgraded Zabbix server from 3.4.9 (RHEL 7.9) to 6.0.23 on Amazon Linux 2023, using AWS PostgreSQL RDS version 14.6, and then upgraded RDS to 14.9. We are facing the following issues:
01. Every couple of hours we get the message "Zabbix agent is unreachable for more than 5 minutes", and this happens for all agents. After we restart the zabbix-server service, we keep getting duplicate value errors in the zabbix-server log. Following one of the solutions mentioned in a Zabbix ticket, we truncated the IDS table and then rebooted the database. Zabbix server then appears to work normally for a few hours, after which it hangs again and the same agent unreachable error starts appearing. Looking at the duplicate value errors, we noticed that the table "event_tag" had reached a max value of 34 million and then kept being cleared, as Zabbix keeps deleting from that table. It now has only 57k records, but the key value is not reset, so we get the duplicate value error. Please check the screenshot for details.
02. When we restart the zabbix-server service, it takes a long time, and in the log we see the "syncing history data" message.
03. The history sync process is using 95-100% CPU on the server.
04. After the upgrade, we applied the post-upgrade primary key updates to the history tables, but the frontend still shows the message "Database history tables upgraded: NO".
Details about the number of hosts, templates, items and triggers are in the attached screenshot.
Steps to reproduce:
Result: |
Comments |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
Unable to attach screenshots, it keeps failing with internal error. |
Comment by Vladislavs Sokurenko [ 2024 Mar 14 ] |
It looks like the problem is with an older Zabbix agent; upgrading the agent should help. What kind of configuration does it have? It seems to have some very frequent checks. Could you please share the configuration of the item that has lots of data? |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
About half of our agents are old agents on RHEL 7.9 machines and half are the new zabbix-agent-2 on new AL2023 machines, and both work fine and send data for some hours. Could you please suggest how I can extract the configuration of the item that has lots of data? |
Comment by Vladislavs Sokurenko [ 2024 Mar 14 ] |
Duplicate errors should mention the problematic itemid, and item information can be found with the following query: select * from items where itemid=<problem id>; |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
Thanks vso for the quick response. Yes, we can do that, but there are lots of errors like the one below, so doing it for each row individually doesn't seem sensible. We just want to understand: when Zabbix deletes the data, how does this primary key value get reset, so that data is inserted again using old IDs?
2616:20240313:212616.071 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "event_tag_pkey"
DETAIL: Key (eventtagid)=(57696) already exists.
Also, please advise on zabbix-server hanging and agents being unreachable.
|
Comment by Vladislavs Sokurenko [ 2024 Mar 14 ] |
It is necessary to identify the hanging agents and find out their configuration in the frontend. |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
In the Zabbix frontend, the same error is shown for all of the agents, and that is what is worrying: suddenly, no agent pings happen for any of the agents. The duplicate key value errors are mostly for event_tag_pkey and problem_tag_pkey, not for items.
2630:20240313:110017.272 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "event_tag_pkey"
2629:20240313:110017.995 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "event_tag_pkey"
2630:20240313:110018.378 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "event_tag_pkey"
2629:20240313:110701.493 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "problem_tag_pkey"
2629:20240313:110702.613 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "problem_tag_pkey"
2629:20240313:110703.460 [Z3008] query failed due to primary key constraint: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "problem_tag_pkey"
|
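Since there are many of these errors, one way to gauge their scope without inspecting each row is to summarize the server log by constraint and by conflicting key value. The following is a minimal sketch, assuming the log line format shown above (including several entries fused onto one line); the helper names summarize_duplicate_keys and print_summary are hypothetical, and the log file path varies by installation, so it is left as a parameter.

```python
import re
from collections import Counter, defaultdict

# Constraint name in lines such as:
#   ... duplicate key value violates unique constraint "event_tag_pkey"
CONSTRAINT_RE = re.compile(r'violates unique constraint "([^"]+)"')
# Conflicting key in DETAIL lines such as:
#   DETAIL: Key (eventtagid)=(57696) already exists.
DETAIL_RE = re.compile(r"Key \((\w+)\)=\((\d+)\) already exists")


def summarize_duplicate_keys(lines):
    """Count duplicate-key failures per constraint and collect the key
    values reported in DETAIL lines (hypothetical helper)."""
    per_constraint = Counter()
    keys = defaultdict(list)
    for line in lines:
        # findall also catches several log entries fused onto one line
        for name in CONSTRAINT_RE.findall(line):
            per_constraint[name] += 1
        for column, value in DETAIL_RE.findall(line):
            keys[column].append(int(value))
    return per_constraint, dict(keys)


def print_summary(path):
    """Print the summary for a server log file (path is installation-specific)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        counts, keys = summarize_duplicate_keys(log)
    for name, n in counts.most_common():
        print(f"{name}: {n} failures")
    for column, values in keys.items():
        print(f"{column}: {len(values)} conflicts, ids {min(values)}..{max(values)}")
```

Calling print_summary with the path of the zabbix-server log would show which constraints fail most often and the range of conflicting IDs, which is the information needed to decide which ID counters to check.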
Comment by Vladislavs Sokurenko [ 2024 Mar 14 ] |
Yes, this can happen if the database is edited manually; the ID should be restored to the max ID from that table. |
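A minimal sketch of restoring the counter to the max ID of a table, assuming the Zabbix ids table has the columns table_name, field_name and nextid (verify this against your own schema before touching a production database). It runs against an in-memory SQLite database with toy values only, so that the statements can be executed and checked; the same SELECT/UPDATE pattern should carry over to PostgreSQL, but that is an assumption, and stopping zabbix-server while editing ids is probably the safest option. The helper name reset_nextid is hypothetical.

```python
import sqlite3


def reset_nextid(conn, table, column):
    """Set ids.nextid for (table, column) to the current max value of column.

    Assumes a Zabbix-style ids table (table_name, field_name, nextid).
    Identifiers cannot be bound as query parameters, so table and column
    must come from a trusted, fixed list.
    """
    (max_id,) = conn.execute(
        f"SELECT COALESCE(MAX({column}), 0) FROM {table}"
    ).fetchone()
    conn.execute(
        "UPDATE ids SET nextid = ? WHERE table_name = ? AND field_name = ?",
        (max_id, table, column),
    )
    return max_id


# Toy schema and data (hypothetical values, not taken from the ticket)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ids (table_name TEXT, field_name TEXT, nextid INTEGER,
                      PRIMARY KEY (table_name, field_name));
    CREATE TABLE event_tag (eventtagid INTEGER PRIMARY KEY, tag TEXT);
    INSERT INTO event_tag (eventtagid, tag) VALUES (57695, 'a'), (57696, 'b');
    -- stale counter below IDs already in use, which leads to duplicate keys
    INSERT INTO ids VALUES ('event_tag', 'eventtagid', 100);
    """
)
reset_nextid(conn, "event_tag", "eventtagid")
```

The same step would be repeated for problem_tag with its own primary key column, taken from the schema rather than guessed.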
Comment by Vladislavs Sokurenko [ 2024 Mar 14 ] |
Restarting Zabbix server should help with the issue. |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
No, we truncated the IDS table after we hit this issue, and then we restarted zabbix-server. Sometimes this resolves the issue and sometimes it doesn't, so we can't really rely on restarting. Also, when zabbix-server is hanging and showing agents as unreachable, we see these two issues if we try to restart zabbix-server.
|
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
Any suggestions for the agents unreachable issue? |
Comment by Ahsanullah Khan [ 2024 Mar 14 ] |
Please find the Zabbix server and agent configurations attached. Please have a look and suggest what we can do to get Zabbix server up and running smoothly. |
Comment by Ahsanullah Khan [ 2024 Mar 15 ] |
aigars.kadikis, could you please have a look at the parameters and suggest whether we need to change anything there? Also, it would be great if you could point us in the right direction on the issue of all agents being unreachable. Thanks.
|