[ZBX-16058] Zabbix server queue (IPMI agent) has dramatically increased after upgrade to 4.0.6 Created: 2019 Apr 30  Updated: 2024 Apr 10  Resolved: 2019 Jun 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.0.6, 4.0.7
Fix Version/s: 4.0.8rc1, 4.2.2rc1, 4.4.0alpha1, 4.4 (plan)

Type: Problem report Priority: Major
Reporter: Alexander Ivanes Assignee: Andris Mednis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 4.15.0-42-generic #45~16.04.1-Ubuntu SMP Mon Nov 19 13:02:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Zabbix server 4.0.7


Attachments: File frk_strace_1.txt.7z     File temp.log.txt.7z     File temp_frk2.log.txt.7z     PNG File zabbix-queue.png    
Issue Links:
Causes
caused by ZBX-15578 IPMI times out and fails to read valu... Closed
Duplicate
Team: Team A
Sprint: Sprint 52 (May 2019), Sprint 53 (Jun 2019)
Story Points: 0.25

 Description   

After upgrading to 4.0.6 (from 4.0.5), the Zabbix queue has increased dramatically. It seems to be related to the IPMI agent items.

Increasing the number of IPMI pollers helps slightly. This can be seen on the graph between 29 and 30 April, when the number of pollers was increased from 4 to 8.



 Comments   
Comment by Andrejs Sitals (Inactive) [ 2019 May 16 ]

There is a loop in the IPMI poller that (1) allows OpenIPMI to process all internal operations and (2) checks whether there is a message from the manager. When a message arrives from the manager, the loop stops and the message is processed. After the message has been processed, the poller enters the same loop again.

As I understand it, in most cases receiving a message from the manager also results in OpenIPMI operations being performed, so it should be safe to call zbx_perform_all_openipmi_ops() only when there have been no messages from the manager for a longer period.

Proposed modification of the loop:

  1. check for new messages from the manager before calling zbx_perform_all_openipmi_ops()
  2. call zbx_perform_all_openipmi_ops() only if there are no messages from the manager
  3. slightly increase the IPC timeout (ipc_timeout from 1 to 2)
Index: src/zabbix_server/ipmi/ipmi_poller.c
===================================================================
--- src/zabbix_server/ipmi/ipmi_poller.c        (revision 92910)
+++ src/zabbix_server/ipmi/ipmi_poller.c        (working copy)
@@ -219,18 +219,23 @@
 
                update_selfmon_counter(ZBX_PROCESS_STATE_IDLE);
 
-               while (NULL == message)
+               for (;;)
                {
+                       const int ipc_timeout = 2;
                        const int ipmi_timeout = 1;
-                       const int ipc_timeout = 1;
 
-                       zbx_perform_all_openipmi_ops(ipmi_timeout);
-
                        if (SUCCEED != zbx_ipc_async_socket_recv(&ipmi_socket, ipc_timeout, &message))
                        {
                                zabbix_log(LOG_LEVEL_CRIT, "cannot read IPMI service request");
                                exit(EXIT_FAILURE);
                        }
+
+                       if (NULL != message)
+                       {
+                               break;
+                       }
+
+                       zbx_perform_all_openipmi_ops(ipmi_timeout);
                }
 
                update_selfmon_counter(ZBX_PROCESS_STATE_BUSY);

Comment by Andris Mednis [ 2019 May 20 ]

Available in:

  • 4.0.8rc1 031a511ac49
  • 4.2.2rc1 233e9c2fcd2
  • 4.4.0alpha1(trunk) 2e8bccd44aa
Generated at Sat Apr 27 01:07:52 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.