[ZBX-21222] Problem screen is very slow, HTTP 502 sometimes Created: 2022 Jun 17  Updated: 2024 Apr 10  Resolved: 2023 Jul 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 5.0.23
Fix Version/s: None

Type: Problem report Priority: Blocker
Reporter: Damian Cancela Assignee: Zabbix Development Team
Resolution: Duplicate Votes: 11
Labels: permissions
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2022-06-21-14-31-15-716.png     PNG File image-2022-06-21-14-31-56-796.png     PNG File image-2022-06-21-14-47-47-235.png     PNG File image_22158.png    
Issue Links:
Causes
caused by ZBXNEXT-5878 Enhance permission checking/handling Closed
Duplicate
duplicates ZBXNEXT-5878 Enhance permission checking/handling Closed
Team: Team A
Sprint: Support backlog

 Description   

Hello, i have Zabbix 5.0.23 version using Server and Proxies. Also im using two isolated hosts for the Frontend (httpd 2.4 and php-fpm 7.2).

MySQL version 8.0.15

Since a couple of weeks im getting some strange behavior, PHP pool is dynamic and is starting to grow the amount of php processes. For what I saw, happens when frontend try to check for Problems in problem page or from Problem widget. What happens from Problems page is that sometimes users get 502 http error or from widget Problem the loading weal keeps spinning

If i check the DB current processes im seeing queries that could be running for more than 4000 sec, same query for what i can see.

Like:

SELECT COUNT(DISTINCT a.alertid) AS rowscount,a.eventid FROM alerts a WHERE EXISTS (SELECT NULL FROM events e,functions f,items i,hosts_groups hgg JOIN rights r ON r.id=hgg.groupid AND r.groupid=22 WHERE a.eventid=e.eventid AND e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid GROUP BY f.triggerid HAVING MIN(r.permission)>0 AND MAX(r.permission)>='2') AND EXISTS (SELECT NULL FROM actions aa WHERE a.actionid=aa.actionid AND aa.eventsource='0') AND (a.userid IS NULL OR EXISTS (SELECT NULL FROM users_groups ug WHERE ug.userid=a.userid AND ug.usrgrpid=22)) AND a.eventid IN (171549,194437,194454,194817,208536,208725,209682, 209705,209724,210026,213358,214582,214596,214872,2 17713,217740,217887,217904,218302,219519,234691,23 5012,235013,235064,235077,235089,235102,235114,571 682,571904,574575,574576,574865,574882,574908,5749 20,769019,1225248,1226534,1226554,1226578,1226595, 1227052,1227053,1227420,1392400,1392412,1392433,13 93034,1395466,1395484,1395504,1395522,1395596,1395 597,1744261,1744265,1745075,1745088,1745105,174512 3,1745158,1745160,1766651,1766673,1766691,1766708, 1766748,1766749,5185522,5185992,5188826,5188973,51 89014,5189118,5189317,5189318,5335130,5335132,5335 146,5335149,5335153,5335155,6332586,6332852,633296 7,6418691,6962830,6963514,6963754,6965597,6965666, 6965705,6965751,6966198,6966200,6971792,7795968,77 96210,7796698,7828035,7828036,7828743,7828828,7828 854,7828919,7932716,10228697,10228699,10228701,112 56798,11256805,11263503,11263505,11264675,11264711 ,11264747,11264775,11264801,11264873,12445280,1276 3819,14473153,14569966,14578805,14579527,16130617, 16130622,16132529,16132575,16132610,16132643,16132 685,16132716,16132759,16132787,16133016,16133058,1 6133092,16133130,16133165,16133196,16133233,161332 71,16136920,16142526,16142565,16142599,16142638,16 142672,16142706,17587540,17589846,17590024,1759020 4,17590363,18506404,18507662,18975552,19595605,200 93620,20200813,21114017,21807745,21943848,21971016 ,21976022,21976024,21976028,21976291,21977800,2205 5301,22136008,22238711,22238712,22338318,22352482, 22360609,22381539,22422079,22436726,22439318,22441 113,22449664,22452164,22453390,22456240,22457243,2 2461914,22461947,22461994,22462008,22462177,224630 04,22463568,22463813,22463861,22463919,22464139,22 464190) GROUP BY a.eventid

These queries come from the Frontend hosts, the workaround im doing is restarting httpd and php services, that will clear the queries but eventually the same situation will happen again. No sure what these queries are trying to look...

Nothing has been changed, maybe more users are using the tool but not more than that. Strange that only happens with users with User or Admin privileges, SuperAdmin looks good.

Actually this problem started to happen some weeks ago with version 5.0.7, investigating i found a thread (similar to my issue) that was fixed in v 5.0.15 https://support.zabbix.com/browse/ZBX-16822

For that reason ive upgraded into latest 5.0.23 but issue is still there.

Do you know what could be happening? attaching a screenshot from MySQL workbench, you can see there that is same query running from different PHP connections.



 Comments   
Comment by Vladimir Stepanov (Inactive) [ 2022 Jun 21 ]

Hi!
Thank you for contacting us.
Please, provide us the result of the tool EXPLAIN using. Just take one of the slow queries and add EXPLAIN like:

EXPLAIN your_slow_query

Also please describe the steps that you did which brought the Frontend into the query execution.

Comment by Damian Cancela [ 2022 Jun 21 ]

 Thank you Stepanov.
Normally the query that has big times is the one ive mentioned, I did what you asked for:

SELECT COUNT(DISTINCT a.alertid) AS rowscount,a.eventid FROM alerts a WHERE EXISTS (SELECT NULL FROM events e,functions f,items i,hosts_groups hgg JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (22,23,24) WHERE a.eventid=e.eventid AND e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid GROUP BY f.triggerid HAVING MIN(r.permission)>0 AND MAX(r.permission)>='2') AND EXISTS (SELECT NULL FROM actions aa WHERE a.actionid=aa.actionid AND aa.eventsource='0') AND (a.userid IS NULL OR EXISTS (SELECT NULL FROM users_groups ug WHERE ug.userid=a.userid AND ug.usrgrpid IN (22,23,24))) AND a.eventid IN (171549,194437,194454,194817,208536,208725,209682,209705,209724,210026,213358,214582,214596,214872,217713,217740,217887,217904,218302,219519,234691,235012,235013,235064,235077,235089,235102,235114,571682,571904,574575,574576,574865,574882,574908,574920,769019,1225248,1226534,1226554,1226578,1226595,1227052,1227053,1227420,1392400,1392412,1392433,1393034,1395466,1395484,1395504,1395522,1395596,1395597,1744261,1744265,1745075,1745088,1745105,1745123,1745158,1745160,1766651,1766673,1766691,1766708,1766748,1766749,5185522,5185992,5188826,5188973,5189014,5189118,5189317,5189318,5335130,5335132,5335146,5335149,5335153,5335155,6332586,6332852,6332967,6418691,6962830,6963514,6963754,6965597,6965666,6965705,6965751,6966198,6966200,6971792,7795968,7796210,7796698,7828035,7828036,7828743,7828828,7828854,7828919,7932716,10228697,10228699,10228701,11256798,11256805,11263503,11263505,11264675,11264711,11264747,11264775,11264801,11264873,12445280,12763819,14473153,14569966,14570385,14578805,14579527,16130617,16130622,16132529,16132575,16132610,16132643,16132685,16132716,16132759,16132787,16133016,16133058,16133092,16133130,16133165,16133196,16133233,16133271,16136920,16142130,16142526,16142565,16142599,16142638,16142672,16142706,17587540,17589108,17589357,17589670,17589846,17590024,17590204,17590363,18506404,18507662,18975552,19595605,20093620,21114017,21339165,21807745,21920278,21943848,21971016,22055301,22136008,22381539,22621210,22621223,22621232,22623401,22824383,22826257,22886822,22913002,22940513,22942273,23018013,23019071,23042490,23087939,23105867,23106236,23110617,23114907,23120716,23125088,23128690,23129795,23130084,23130140,23130141) GROUP BY a.eventid;

Result

Also sometimes I saw two other queries that could have big times, here you have the results.

SELECT DISTINCT e.eventid,e.objectid,e.clock,e.ns,e.name,e.severity FROM events e,functions f,items i,hosts_groups hg WHERE e.source='0' AND e.object='0' AND NOT EXISTS (SELECT NULL FROM functions f,items i,hosts_groups hgg LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (14,15,16) WHERE e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid GROUP BY i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0) AND e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hg.hostid AND hg.groupid=16 AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=e.eventid) AND e.value=1 ORDER BY e.eventid DESC LIMIT 1001;

SELECT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severity FROM problem p WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM functions f,items i,hosts_groups hgg LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (14,15,16) WHERE p.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid GROUP BY i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0) AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND p.r_eventid IS NULL ORDER BY p.eventid DESC LIMIT 1001;

Explain result for that one 

There is nothing special in frontend that generate these queries, for what I saw it happens when users want to see Problems from Problem page or even widget Problems i think. But only happens with users that have Admin or User role, not SuperAdmin (maybe with less privilege roles query is more specific and has more conditions¿?).

Let me know if you want to check something else, 

Thank you.

Comment by Damian Cancela [ 2022 Jun 23 ]

Hello, i have saw this issue was linked to some other, do you have any recommendation or some other information that I could share with you?

 

Thanks

Comment by Damian Cancela [ 2022 Jun 28 ]

@vstepanov 

 

hi, do you have some recommendation at least to mitigate the impact?

 

Thanks!

Comment by Damian Cancela [ 2022 Aug 15 ]

Hello, someone from Zabbix team knows if this issue it was addressed in any of the released versions or if it will be addressed soon?

 

Thanks

Comment by Damian Cancela [ 2022 Sep 08 ]

@Reinis Detlavs hello,

 

if you need something else from me that could help you just let me know.

 

Thank you

Comment by Rob Beglinger [ 2022 Sep 21 ]

I am also seeing this issue on Zabbix 6.2.2

Front End Servers:

Rocky Linux 8

nginx 1.14.1-9

php7.4,19-4

DB server:

CentOS 7

mysql-community-server-8.0.30-1

 

This causes the php-fpm active processes to fill up and then causes the interface to become unresponsive.

Comment by Damian Cancela [ 2022 Oct 10 ]

Hi @rdetlavs  hope you are doing well,

 

is there some update about this issue? Thanks in advance

Comment by Everaldo Santos Cabral [ 2022 Oct 26 ]

Any solution for the BUG?

Comment by Jose Eduardo Peres [ 2022 Oct 26 ]

Any solution?

Comment by Damian Cancela [ 2022 Oct 31 ]

Hi team, im aware there is no support agreement but will be possible to know if at least you are working on this? Just i want to give an update in my company.

 

Thank you!

Comment by Anton [ 2022 Nov 17 ]

I have the same problem:

-Ubuntu 22.04
-Zabbix 6.0.9
-MariaDB 10.6.7
-PHP-FPM 8.1.2
-Nginx 1.18.0

Zabbix works fine, but when you try to open "Monitoring -> Hosts", it freezes for 5-7 seconds. Fresh installation (less than a week running), the number of hosts is still negligible (5 hosts)

Debug mode showed a very slow SQL query:

SELECT DISTINCT t.triggerid FROM triggers t,functions f,items i WHERE i.hostid IN (10084,10529,10531,10532,10533,10534) AND f.triggerid=t.triggerid AND f.itemid=i.itemid AND NOT EXISTS (SELECT NULL FROM functions f,items i,hosts h WHERE t.triggerid=f.triggerid AND f.itemid=i.itemid AND i.hostid=h.hostid AND (i.status<>0 OR h.status<>0)) AND t.status=0 AND t.flags IN (0,4); 

Running for more than 6 seconds

P.S. In my case, there is no difference what role / group the user has, the problem is observed at all.

Comment by Damian Cancela [ 2022 Dec 03 ]

Hi Support Team, can someone please at least give an update?. Im seeing frequently updates and links to internal ZBX issues (something that i really appreciate) but would be great if we could have a date or a hello or something

 

Thanks of course!

Comment by Damian Cancela [ 2023 Feb 23 ]

Hi team, please can you share some update? Some workaround? Removing problem/events tables old data manually might help? Housekeeping i think is not doing anything with those tables.

 

Thanks

Comment by Samuel Suzano [ 2023 May 13 ]

Hi Damian, I resolved this problem by scheduling a daily 'optimize' at tables with performance issues. It's a workaround, but solved my problem.

OPTIMIZE TABLE events;
OPTIMIZE TABLE functions;
OPTIMIZE TABLE items;
OPTIMIZE TABLE hosts_groups;

Comment by Alexander Vladishev [ 2023 Jul 14 ]

Will be fixed in scope of ZBXNEXT-5878.

Comment by Dhinesh Kumar [ 2024 Jan 18 ]

Any solution till now? Am using 5.0.38 and am facing similar issue. For normal user and admin dashboard, problem tabs taking long time and could see spike in DB load avg. When checked in DB some queries are in execution for long time. Sharing one query below.

SELECT DISTINCT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severity FROM problem p,functions f,items i,hosts_groups hg WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM functions f,items i,hosts_groups hgg LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (35,36,41) WHERE p.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid GROUP BY i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0) AND p.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hg.hostid AND hg.groupid IN (1245,1325,1548) AND p.severity IN (4,5) AND p.acknowledged=0 AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND (p.r_eventid IS NULL OR p.r_clock>1705458955) ORDER BY p.eventid DESC LIMIT 501

Comment by aws maythem [ 2024 Jan 18 ]

Any solution please, am using 6.4.2 and facing a similar issue For normal users and admins on the search and view problems in monitoring it takes 3 min to finish the query for every user, except for Super Admin users, and Setting all users as Super Admin does not work for us, as we need to filter the hosts they can see
I have :
42548 hosts
1724901 items
1434 users
41 groups
MariaDB version 10.6

Generated at Thu Dec 26 15:13:17 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.