[ZBX-13306] Zabbix Server freezing Created: 2017 Dec 26  Updated: 2019 Mar 21  Resolved: 2018 Mar 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.4
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: elias abou hamad Assignee: Unassigned
Resolution: Commercial support required Votes: 0
Labels: trigger, zabbix_server
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 7 / 100 GB RAM / 1 TB SSD disk


Attachments: File graphs.RAR    

 Description   

Dear All,

We are facing a freezing issue in Zabbix 3.4.4. Server specs: CentOS 7 / 100 GB RAM / 1 TB SSD disk / MySQL InnoDB with one partition per day.

Number of hosts (enabled/disabled/templates): 2094 (1752 / 204 / 138)
Number of items (enabled/disabled/not supported): 372260 (215143 / 141074 / 16043)
Number of triggers (enabled/disabled [problem/ok]): 80292 (63887 / 16405 [452 / 63435])
Number of users (online): 46 (14)
Required server performance, new values per second: 3407.71

We also have 2 passive proxies that handle 500 hosts.

The Zabbix server keeps freezing even though all services are running. Judging by zabbix_server.log, polling seems to be working fine, for example:

24427:20171226:101241.672 sending configuration data to proxy "Zabbix System proxy" at "194.126.19.42", datalen 1652812
25346:20171226:101242.922 SNMP agent item "ifHCOutUcastPkts[GigabitEthernet0/17]" on host "GDS - Ersel1-SW" failed: first network error, wait for 60 seconds
25301:20171226:101247.779 SNMP agent item "MiniLinkBridgeUcastPktsIn[1/1/9-Jar-Tyr-TYR-JAR-CN]" on host "GDS MiniLink Jarjour3-CN-TYR" failed: first network error, wait for 60 seconds

The problem is that no triggers or notifications are displayed. Once I restart the Zabbix service it works fine and displays triggers, but after some time it freezes again and all triggers are missing.

What is your opinion on this issue, and how can we solve the problem?

Let me know if you need more information.

Best Regards,
Elias



 Comments   
Comment by Alexey Pustovalov [ 2017 Dec 26 ]

Please check graphs from "Template App Zabbix Server".

Comment by elias abou hamad [ 2017 Dec 26 ]

All graphs are included in the attached folder. StartPollers is set to 999 (if you see the pollers 100% busy, we can optimize by changing the polling interval).

Best Regards,
Elias

Comment by Alexey Pustovalov [ 2017 Dec 26 ]

3407.71 NVPS does not require 999 pollers. Decrease StartPollers to 200-300.
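
As an illustration of the advice above, here is a minimal Python sketch that reads StartPollers from the text of a server configuration file and checks it against the 200-300 range suggested in this comment. The helper names (read_start_pollers, within_suggested_range) and the sample text are hypothetical, and the parsing assumes the plain Key=Value layout with # comments; it does not handle duplicate or included settings.

```python
def read_start_pollers(conf_text):
    """Return the first uncommented StartPollers value in conf_text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out defaults such as "# StartPollers=5".
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "StartPollers":
            return int(val.strip())
    return None


def within_suggested_range(value, low=200, high=300):
    """True if value lies in the 200-300 range suggested in this thread."""
    return value is not None and low <= value <= high


# Hypothetical sample; in practice the text would be read from the
# server's own configuration file.
sample = """\
# StartPollers=5
StartPollers=999
"""

current = read_start_pollers(sample)
print(current, within_suggested_range(current))
```

The range is only the figure given in this thread for this installation, not a general rule.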

Comment by elias abou hamad [ 2017 Dec 26 ]

I have decreased StartPollers to 280.

Comment by elias abou hamad [ 2018 Jan 09 ]

Since I decreased StartPollers the problem has not occurred, but the graphs show the pollers 100% busy. How can I fix that without freezing the server? Could you also please advise on the relation between NVPS and StartPollers?

Regards,
Elias

Comment by elias abou hamad [ 2018 Jan 27 ]

The problem is still recurring:

26092:20180127:055115.865 server #1538 started icmp pinger #296
26108:20180127:055115.866 server #1543 started alert manager #1
22195:20180127:055115.866 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
22191:20180127:055115.866 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
22192:20180127:055115.866 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
15911:20180127:055115.881 One child process died (PID:22191,exitcode/signal:1). Exiting ...
15911:20180127:055117.903 syncing history data...
15911:20180127:055117.904 syncing history data done
15911:20180127:055117.904 syncing trend data...
15911:20180127:055117.904 syncing trend data done
15911:20180127:055117.905 Zabbix Server stopped. Zabbix 3.4.3 (revision 73588).
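
To trace which child process brought the server down in an excerpt like the one above, the log lines can be parsed mechanically. The following is a rough Python sketch, not an official tool: it assumes the PID:YYYYMMDD:HHMMSS.mmm prefix seen in these excerpts, and the function names are hypothetical. It looks for "One child process died" entries and collects what the dead PID logged beforehand.

```python
import re
from datetime import datetime

# Prefix as seen in the excerpts: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
DIED_RE = re.compile(r"One child process died \(PID:(\d+),")


def parse_line(line):
    """Split one log line into (pid, timestamp, message); None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid, date, time, millis, message = m.groups()
    stamp = datetime.strptime(f"{date} {time}.{millis}", "%Y%m%d %H%M%S.%f")
    return int(pid), stamp, message


def messages_from_dead_children(lines):
    """Map each PID reported as died to the messages it logged before that report."""
    records = [r for r in map(parse_line, lines) if r is not None]
    report = {}
    for index, (_, _, message) in enumerate(records):
        died = DIED_RE.search(message)
        if died:
            dead_pid = int(died.group(1))
            report[dead_pid] = [msg for pid, _, msg in records[:index] if pid == dead_pid]
    return report


# Lines copied from the excerpt above.
SAMPLE = [
    '22195:20180127:055115.866 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.',
    '22191:20180127:055115.866 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.',
    '15911:20180127:055115.881 One child process died (PID:22191,exitcode/signal:1). Exiting ...',
]

for pid, messages in messages_from_dead_children(SAMPLE).items():
    print(pid, messages)
```

For this excerpt, the sketch links the exit of the main process to PID 22191, whose last message was the failed connection to the "alerter" service.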

Comment by elias abou hamad [ 2018 Mar 08 ]

Any update?

Regards,
Elias

Comment by Glebs Ivanovskis (Inactive) [ 2018 Mar 19 ]

  1. Consider upgrading to a more recent 3.4 version.
  2. Revise your configuration file. Read documentation thoroughly, especially footnotes.
  3. If no success, have a look at available ways of getting help.

Comment by Zhang Bowen [ 2019 Mar 21 ]

Have you solved the problem yet? It also happened to me when I tried to set StartPollers high enough for host discovery.

The following is my error log:

5033:20190321:061103.707 server #625 started trapper #2
5035:20190321:061103.714 server #626 started trapper #3
5036:20190321:061103.742 server #627 started trapper #4
5039:20190321:061103.751 server #628 started trapper #5
5042:20190321:061103.782 server #629 started icmp pinger #1
5045:20190321:061103.791 server #630 started alert manager #1
5048:20190321:061103.798 server #631 started preprocessing manager #1
4313:20190321:061104.060 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
4315:20190321:061104.062 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
4314:20190321:061104.071 cannot connect to alert manager service: Cannot connect to service "alerter": [111] Connection refused.
4311:20190321:061104.073 One child process died (PID:4313,exitcode/signal:1). Exiting ...

Generated at Fri Apr 26 07:55:14 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.