[ZBX-323] Zabbix 1.4.4 - server suddenly stops collecting data Created: 2008 Mar 06 Updated: 2017 May 30 Resolved: 2008 Apr 02 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 1.4 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Blocker |
Reporter: | brendon | Assignee: | Alexei Vladishev |
Resolution: | Fixed | Votes: | 3 |
Labels: | None |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Environment: |
Linux 2.6.22-3-amd64 #1 SMP Mon Nov 12 17:53:18 UTC 2007 x86_64 GNU/Linux |
Attachments: |
Description |
All triggers that use nodata functions change to ON after the server has been running for anywhere from 1 to 7 days. It used to be a weekly problem, but it is now a daily issue with our server. On closer inspection, this happens because the Zabbix server stops recording ALL items. I checked 3 servers, and they all stop collecting information at the same time, which causes the part of Zabbix that is still working to trigger alerts. One thing to note is that simple checks like icmpping still collect data when this issue occurs. After this happens the Zabbix server continues to run, except that the nodata triggers are ON and no agent-related data is being collected by the server. |
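To make the symptom easier to pin down, here is a minimal sketch of the check described above: given the timestamp of the last stored value per item, it reports whether every agent item has gone stale while simple checks keep updating. The `LastValue` record, the `stale_items` and `agents_stalled` helpers, and the choice of `icmpping` as the only simple-check key are assumptions for illustration; this is not how the Zabbix server evaluates nodata internally.

```python
from dataclasses import dataclass


@dataclass
class LastValue:
    """Hypothetical record: most recent stored value for one item."""
    host: str
    key: str         # item key, e.g. "system.cpu.load" or "icmpping"
    last_clock: int  # Unix timestamp of the most recent stored value


def stale_items(values, now, window):
    """Return the items that received no value in the last `window` seconds."""
    return [v for v in values if now - v.last_clock > window]


def agents_stalled(values, now, window, simple_check_keys=("icmpping",)):
    """True if every agent item is stale.

    Items whose key is listed in `simple_check_keys` are ignored, because
    the report says simple checks keep collecting data during the outage.
    """
    agent = [v for v in values if v.key not in simple_check_keys]
    return bool(agent) and len(stale_items(agent, now, window)) == len(agent)
```

If `agents_stalled` returns True for all monitored hosts at the same moment, that matches the pattern reported here: agent data stops everywhere at once while icmpping continues.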
Comments |
Comment by brendon [ 2008 Mar 06 ] |
To resolve this, I run a simple /etc/init.d/zabbix-server restart. I have also looked through the logs and have not found anything. I'll attach logs next time I catch them before they are overwritten. |
Comment by Torsten Sorger [ 2008 Mar 19 ] |
This problem is somehow caused by the active agent handling in the server code. After I changed my agent configuration from passive to active, the server randomly (after 2-48h) stops collecting data. Some log file excerpts: zabbix_agentd.log zabbix_server.log I'll attach some server logs with debuglevel=5 in my next post. What might be interesting is that I run Zabbix on a virtual server (Virtuozzo environment). I don't know if this is important. |
Comment by Alexei Vladishev [ 2008 Mar 19 ] |
This is already fixed in pre-1.4.5 code. Alexei |
Comment by Torsten Sorger [ 2008 Mar 19 ] |
Log file (debuglevel=4) of the 1.4.4 zabbix_server with active agents causing the server to stop collecting data (the error still exists in the pre-1.4.5 build from 17.3.2008) |
Comment by brendon [ 2008 Mar 19 ] |
I opened this ticket a while ago, when the server stopped accepting (or possibly recording) data from agents. I sent Alexei my logs, and on closer inspection the only thing I can relate this to is a busy MySQL server. It happens at about 4 AM almost every night. Almost every day Zabbix needs to be restarted, and I can't enable actions because all of the high-severity nodata actions are triggered when data is no longer collected. |
Comment by Torsten Sorger [ 2008 Mar 20 ] |
The server stopped again last night. Actually, I doubt that the MySQL server is the problem. Zabbix is the only process that uses the database. I tried the nightly build from 19.3.2008 for the 1.4 branch. I will attach new server logs, if someone is willing to look into them... |
Comment by Torsten Sorger [ 2008 Mar 20 ] |
zabbix_server.log from pre-1.4 (19.3.2008) with debuglevel=4 |
Comment by brendon [ 2008 Mar 20 ] |
Torsten, can you correlate your problem with high server load or high disk IO? I'm 99% sure that when my server's disk IO is very busy at night, it causes Zabbix to malfunction. I installed SAR, and the time the disk IO gets very busy is about the time Zabbix stops working properly. From the data below, Zabbix broke at about 4 AM. SAR busy disk from March 18: SAR normal disk from March 18: |
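A rough sketch of the comparison described above, for anyone who wants to repeat it: it takes disk utilisation samples and the time of the last collected value, and checks whether the stop falls inside a busy period. The sample format (minute of day, utilisation percent), the threshold, and the helper names `busy_windows` and `stopped_during_busy` are assumptions for illustration; the code does not parse real SAR output.

```python
def busy_windows(samples, threshold):
    """Group consecutive samples at or above `threshold` into (start, end) spans.

    `samples` is a time-sorted list of (minute_of_day, util_percent) pairs.
    """
    windows = []
    start = prev = None
    for t, util in samples:
        if util >= threshold:
            if start is None:
                start = t          # a busy span begins here
            prev = t               # remember the last busy sample seen
        elif start is not None:
            windows.append((start, prev))
            start = None
    if start is not None:          # busy span runs to the end of the data
        windows.append((start, prev))
    return windows


def stopped_during_busy(windows, stop_time, slack=0):
    """True if `stop_time` lies within `slack` minutes of any busy span."""
    return any(s - slack <= stop_time <= e + slack for s, e in windows)
```

For example, with samples taken every 10 minutes and a stop at about 4 AM (minute 240), `stopped_during_busy(busy_windows(samples, 80), 240, slack=10)` says whether the stop lines up with a period of at least 80% utilisation.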
Comment by Torsten Sorger [ 2008 Mar 22 ] |
Actually, I have problems getting these IO statistics because the Zabbix server is running on a vserver (Virtuozzo). I'll give it another try next week. The strange thing is that this problem only arises when I use active agents. Is this the same situation for you? |
Comment by Sylvain Coutant [ 2008 Mar 27 ] |
I encounter this behavior almost once per day with 1.5 from early March. It happens when our backup process starts and puts pressure on the disk. Something breaks at this point. I have to restart the Zabbix server to get rid of it. |
Comment by Torsten Sorger [ 2008 Apr 02 ] |
Seems fixed in 1.4.5 - thanks! |
Comment by Alexei Vladishev [ 2008 Apr 02 ] |
I am closing it. |
Comment by brendon [ 2008 Apr 02 ] |
Solved? I'm still experiencing this issue... |
Comment by Alexei Vladishev [ 2008 Apr 03 ] |
I am pretty sure the problem no longer exists in 1.4.5. I need evidence if you do not agree. Alexei |