[ZBX-15844] Zabbix server inability to start on certain conditions - manager process is late after a child Created: 2019 Mar 19  Updated: 2024 Apr 10  Resolved: 2019 Apr 10

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 4.0.7rc1, 4.2.1rc1, 4.4.0alpha1, 4.4 (plan)

Type: Patch request Priority: Minor
Reporter: Mikhail Makurov Assignee: Andrejs Kozlovs
Resolution: Fixed Votes: 1
Labels: process, startup
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu server 16, Zabbix server 4.0.1, perhaps, PostgreSQL 9.6, not important to the issue


Team: Team A
Sprint: Sprint 50 (Mar 2019), Sprint 51 (Apr 2019)
Story Points: 0.5

 Description   

The story:

Due to problems unrelated to this issue (probably system overload, slow database connection establishment, or something similar), Zabbix server starts its threads on startup significantly slower than it should.

When there are hundreds of threads to start, the process might take 20-30 seconds, which isn't a problem in itself.

The problem: if more than 10 seconds pass between the start of the first "alerter" thread and the start of the alert manager thread, which happens considerably later, the alerter thread times out with error 111 and exits, causing Zabbix server to stop.
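
To make the timing concrete, here is a minimal sketch of the arithmetic, not Zabbix code. It assumes processes are started strictly one after another at a fixed cost per start; the function name, the positions and the per-start cost are all hypothetical, and only the 10-second limit comes from the behaviour described above.

```c
#include <stdbool.h>

/* Assumption: the 10-second wait reported in this issue, in milliseconds. */
#define ALERTER_WAIT_MS 10000

/*
 * Hypothetical model: processes are started one after another, each taking
 * ms_per_start milliseconds. Positions are 1-based start indexes.
 * Returns true if the alerter would give up before the manager is started.
 */
static bool alerter_times_out(int alerter_pos, int manager_pos, int ms_per_start)
{
    long gap_ms = (long)(manager_pos - alerter_pos) * ms_per_start;

    /* A manager started before the alerter yields a negative gap: no timeout. */
    return gap_ms > ALERTER_WAIT_MS;
}
```

With the alerter at position 1, the manager at position 204 and 100 ms per start, the gap is 20.3 seconds, so the alerter gives up. With the manager started first, the gap is negative and the timeout cannot be hit, however slow the startup is.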

I propose a solution: start the alert manager threads first. I assume this can be accomplished by changing the server.c code, in the
int get_process_info_by_thread(
function, which is responsible for the order in which threads are started.

Namely, move the lines

else if (local_server_num <= (server_count += CONFIG_ALERTMANAGER_FORKS))
{
    *local_process_type = ZBX_PROCESS_TYPE_ALERTMANAGER;
    *local_process_num = local_server_num - server_count + CONFIG_ALERTMANAGER_FORKS;
}
 

ahead of

else if (local_server_num <= (server_count += CONFIG_ALERTER_FORKS))
{
    *local_process_type = ZBX_PROCESS_TYPE_ALERTER;
    *local_process_num = local_server_num - server_count + CONFIG_ALERTER_FORKS;
}

I did so in my test environment; the problem with the alert processes went away, and I have found no other issues with the change.
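
For illustration, here is a rough sketch of why swapping the two branches changes the start order. It is a simplified stand-in for the else-if chain, not the real server.c code: the enum, the struct, both function names and the fork counts used in the example are hypothetical. The only part taken from the snippets above is the cumulative-count pattern (`server_count += ...`) and the formula for the per-type process number.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the ZBX_PROCESS_TYPE_* constants. */
enum proc_type { PROC_OTHER, PROC_ALERTER, PROC_ALERTMANAGER, PROC_UNKNOWN };

/* One branch of the chain: a process type and its configured fork count. */
struct slot {
    enum proc_type type;
    int forks;
};

/*
 * Map a 1-based start index to a process type by walking cumulative fork
 * counts, in the same way each else-if branch does with server_count.
 * *num receives the 1-based number of the process within its type.
 */
static enum proc_type type_for_index(const struct slot *order, size_t n,
                                     int index, int *num)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        count += order[i].forks;
        if (index <= count) {
            *num = index - count + order[i].forks;
            return order[i].type;
        }
    }
    return PROC_UNKNOWN;
}

/* First start index handed to the given type, or 0 if it has no forks. */
static int first_index_of(const struct slot *order, size_t n, enum proc_type type)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (order[i].type == type && order[i].forks > 0)
            return count + 1;
        count += order[i].forks;
    }
    return 0;
}
```

Given a made-up order of 3 alerters, 200 other processes and 1 alert manager, `first_index_of` puts the first alerter at index 1 and the manager at index 204. Listing the manager first puts it at index 1 and the first alerter at index 2, so the manager is already being started before any alerter tries to reach it.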

Thanks



 Comments   
Comment by Andrejs Kozlovs [ 2019 Apr 10 ]

Fixed in:

  • pre-4.0.7rc1 r92436
  • pre-4.2.1rc1 r92437
  • pre-4.4.0alpha1 (trunk) r92438