[ZBX-15610] ALL MY EMAILS STUCK IN 'INPROGRESS' STATE. Created: 2019 Feb 07  Updated: 2019 Feb 14  Resolved: 2019 Feb 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Problem report Priority: Critical
Reporter: Nitin Verma Assignee: Aigars Kadikis
Resolution: Commercial support required Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

RHEL 7.1.1503


Attachments: JPEG File IN PROGRESS.JPG     JPEG File INTERNAL PROCESS BUSY-last 1 day.JPG     JPEG File INTERNAL PROCESS BUSY.JPG     JPEG File capture1.JPG    

 Description   

All my emails are stuck in the 'in progress' state. I receive emails for a few alerts, some arrive delayed, and some simply fail. I have checked the mail queue (mailq) on the mail server and found no issues there. Is this a known problem?



 Comments   
Comment by Nitin Verma [ 2019 Feb 07 ]

Do you need anything from me?

Comment by Aigars Kadikis [ 2019 Feb 07 ]

Hello Nitin,

Thank you for registering this issue. Here are some additional questions:

Which Zabbix version is in use?

What detailed description is shown when you click on [i]?

How busy is the Zabbix alerter process? This can be observed by clicking on Monitoring -> Graphs -> select group: Zabbix servers -> select host: Zabbix server -> "Zabbix Internal process busy". Please attach the graph covering the last 1 day.

 

 

Comment by Nitin Verma [ 2019 Feb 07 ]

Version: 4.0.1

Comment by Nitin Verma [ 2019 Feb 07 ]

Last 1 day graph

Comment by Aigars Kadikis [ 2019 Feb 07 ]

Thank you for the screenshots.

Also please attach the graph "Zabbix data gathering process".

What values of 'StartPollers=' and 'Timeout=' have you configured in 'zabbix_server.conf'?

Comment by Nitin Verma [ 2019 Feb 07 ]

Timeout=4

StartPollers is commented out, so it is probably picking up the default.

StartPollersUnreachable=100 

Comment by richlv [ 2019 Feb 07 ]

100 unreachable pollers still having such a high utilisation indicates serious problems in the environment. Most likely, this is a support request and would be a better fit for IRC or the other support channels listed at https://zabbix.org/wiki/Getting_help .

Still, it would be interesting to see how many alerts you have in each status. What does the following return?

select status,count(*) from alerts group by status;

Comment by Nitin Verma [ 2019 Feb 07 ]

+--------+--------+
| status | count  |
+--------+--------+
|      0 | 378645 |
|      1 | 438886 |
|      2 |  27205 |
+--------+--------+

 

Let me know if anything else is needed

Comment by Ingus Vilnis [ 2019 Feb 07 ]

Assuming the default of 3 alerters is running, one of them is constantly stuck at 100% busy (see the constant 33.33% alerter load).
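As a rough check of the 33.33% figure, the arithmetic can be sketched as below. This assumes the graph shows the busy percentage averaged over all alerter processes, with one process pinned at 100% and the others idle; the function name is hypothetical and only for illustration.

```python
def alerter_group_busy_percent(total_alerters: int, fully_busy: int) -> float:
    """Average busy % across the group when `fully_busy` processes sit at
    100% and the remaining ones are idle (an assumption, not a measurement)."""
    return 100.0 * fully_busy / total_alerters


# 3 alerters, one of them permanently busy.
print(round(alerter_group_busy_percent(3, 1), 2))  # 33.33
```

Under the same assumption, a flat line at that level points to a single blocked process rather than evenly spread load.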

Unusual Email media type settings could have caused that. Go to Administration -> Media types -> Email and check both tabs of the configuration there.

Comment by Nitin Verma [ 2019 Feb 07 ]

Nope! The only weird media types were Jabber and SMS, and they were not used anywhere!

Disabled them anyway!

Comment by Nitin Verma [ 2019 Feb 11 ]

Any update?

Comment by Nitin Verma [ 2019 Feb 11 ]

Increased the alerters to 50 and the escalators to 50 as well. Mails are getting delayed by 1-2 days.

Comment by Ingus Vilnis [ 2019 Feb 11 ]

You were not asked about SMS and Jabber; the screenshots make it clear that Email is the media type in use. What about its settings?

Having 50 alerters and escalators will not help if your mail server is the bottleneck in the first place.

Comment by Nitin Verma [ 2019 Feb 11 ]

No mail queue (mailq) backlog was found on the mail server either!

Comment by Aigars Kadikis [ 2019 Feb 14 ]

select status,count(*) from alerts group by status;

This reported:

0 378645
1 438886
2 27205

which means:

0 ALERT_STATUS_NOT_SENT - Alert is not yet sent but is cached (being processed) by alert manager
1 ALERT_STATUS_SENT - Sent
2 ALERT_STATUS_FAILED - Sending failed
3 ALERT_STATUS_NEW - New alert, not yet processed by alert manager

There are a lot of messages still waiting to be delivered.
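To put the reported counts next to the status names, a minimal sketch follows. The status names and counts are copied from this ticket; the `summarize` helper and the percentage breakdown are illustrative additions, not part of Zabbix.

```python
# Status names as listed in the comment above; counts as reported in this ticket.
ALERT_STATUS = {
    0: "ALERT_STATUS_NOT_SENT",
    1: "ALERT_STATUS_SENT",
    2: "ALERT_STATUS_FAILED",
    3: "ALERT_STATUS_NEW",
}
REPORTED_COUNTS = {0: 378645, 1: 438886, 2: 27205}


def summarize(counts):
    """Map each status name to (row count, share of all rows in percent)."""
    total = sum(counts.values())
    return {
        ALERT_STATUS[status]: (count, round(100.0 * count / total, 1))
        for status, count in sorted(counts.items())
    }


for name, (count, share) in summarize(REPORTED_COUNTS).items():
    print(f"{name}: {count} alerts ({share}% of all rows)")
```

The point of the breakdown is only that a large share of the rows in the alerts table is still in the not-yet-sent state.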

I will mark and close this ticket as Commercial support required.

Comment by Nitin Verma [ 2019 Feb 14 ]

Resolved it!

It seems the problem was that only 1 connection to the SMTP server was being used. I increased the Concurrent sessions setting to fix it.
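Why a single SMTP connection can produce delays of this size can be sketched with a back-of-the-envelope model. Everything here is an assumption for illustration: the per-email send time of 0.5 seconds is not measured, the model ignores newly arriving alerts, and `drain_hours` is a hypothetical helper. Only the pending count comes from this ticket.

```python
def drain_hours(pending_alerts: int, seconds_per_email: float,
                concurrent_sessions: int) -> float:
    """Hours needed to send a backlog when each session sends one email
    at a time and no new alerts arrive (a simplification)."""
    return pending_alerts * seconds_per_email / concurrent_sessions / 3600


PENDING = 378645          # status 0 count reported in this ticket
SECONDS_PER_EMAIL = 0.5   # assumed value, purely illustrative

for sessions in (1, 5, 10):
    hours = drain_hours(PENDING, SECONDS_PER_EMAIL, sessions)
    print(f"{sessions} session(s): about {hours:.1f} hours")
```

In this model the drain time shrinks in proportion to the number of sessions, which is consistent with the fix reported above. It also suggests why adding alerters alone would not help if all of them share one connection.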

Generated at Fri Apr 19 13:40:04 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.