[ZBX-4156] Zabbix agent service crash/hang Created: 2011 Sep 20 Updated: 2017 May 30 Resolved: 2011 Sep 22 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 1.8.6, 1.8.7 |
Fix Version/s: | 1.8.9, 1.9.7 (beta) |
Type: | Incident report | Priority: | Critical |
Reporter: | Alexandru Nica | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | agent | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Windows |
Description |
After updating the agent to 1.8.7rc1 (revision 21392) I get the following errors in the log:
5308:20110920:130647.210 PerfCounter 'Jýþ' FAILED: invalid format
These may occur several times, and at random times the agent may hang for a few minutes (long enough to trigger a "system down" PROBLEM in Zabbix). It then resumes work as if nothing had happened (and triggers a "system down" OK in Zabbix).
I have a set of general items for monitoring CPUs, like "perf_counter[\Processor(X)\% Processor Time, 300]" with 0<=X<=7. Of course not all systems have 8 CPUs; they may have just 4, as is the case with the server in question, and perf_counter instances for CPUs with X>=4 would be invalid.
I understand that part of the perfcounter code was rewritten in 1.8.6. |
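To illustrate the item setup described above, here is a minimal sketch that builds the eight generic item keys and splits them by whether a matching Processor instance would exist on a 4-CPU host. The helper names are hypothetical, and the sketch assumes that instances are numbered 0..N-1; neither is taken from the agent source.

```python
def cpu_counter_keys(max_instances=8, interval=300):
    """Build perf_counter item keys for Processor instances 0..max_instances-1."""
    return [
        "perf_counter[\\Processor(%d)\\%% Processor Time, %d]" % (x, interval)
        for x in range(max_instances)
    ]


def split_by_cpu_count(keys, cpu_count):
    """Assumption: instances are numbered 0..cpu_count-1, so later keys have no counter."""
    return keys[:cpu_count], keys[cpu_count:]


keys = cpu_counter_keys()
valid, invalid = split_by_cpu_count(keys, cpu_count=4)
for key in invalid:
    # On the affected host these are the items expected to end up "not supported".
    print("no such instance on this host:", key)
```

On such a host the last four keys are the ones that should simply be reported as not supported, which is the behaviour expected from the agent instead of a garbled error or a hang.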
Comments |
Comment by Alexandru Nica [ 2011 Sep 20 ] |
I can confirm that the issue only affects Windows 2008 R2. I have looked at several Windows 2003 servers and there are no error messages, even with invalid perf_counter instances. |
Comment by richlv [ 2011 Sep 21 ] |
timeouts could be a different issue - |
Comment by Rudolfs Kreicbergs [ 2011 Sep 21 ] |
There is indeed a problem with the message formatting, but that on its own should not hang the agent. The unknown error message is: <rudolfs> REPRODUCED - it seems that I have reproduced the problem where the agent hangs; will investigate that. |
Comment by Alexandru Nica [ 2011 Sep 21 ] |
With debug level 4 I get somewhat cleaner output: 4452:20110921:123026.589 In PERF_COUNTER() I don't get the following errors anymore: No hang until now; I will restart the agent a few more times and wait another hour. |
Comment by Rudolfs Kreicbergs [ 2011 Sep 21 ] |
That is in fact a memory violation on read. Both error messages are fixed in the dev branch: svn://svn.zabbix.com/branches/dev/ZBX-4156 Could you please try to reproduce the "hanging" problem with that branch (it is based on 1.8.8rc2)? It seems that I was wrong: I did NOT REPRODUCE the problem. |
Comment by Alexandru Nica [ 2011 Sep 21 ] |
Did not manage to hang it with DebugLevel=4.
5404:20110921:152246.685 Starting Zabbix Agent [BITVMH1]. Zabbix 1.8.7rc1 (revision 21392).
Will try with the dev branch you mentioned. Is there any Windows SVN client you recommend? Tortoise keeps crashing on me. Another thing I just saw is that with 1.8.7 I get "Active check [perf_counter[\Memory\Pages/sec, 300]] is not supported. Disabled." for a counter which is actually valid and should not be disabled. Will report on this after trying the dev branch. |
Comment by Alexandru Nica [ 2011 Sep 21 ] |
Running Zabbix 1.8.8rc2 (revision 21676). I still get the error message with debuglevel=default:
5092:20110921:164042.938 PerfCounter 'qýþ' FAILED: invalid format
Valid counters DO get disabled, but it seems that this happens only after the error message.
4152:20110921:164358.393 agent #0 started [collector] |
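For anyone scanning agent logs for this symptom, the following is a rough sketch that splits log lines of the shape quoted in this issue (PID:YYYYMMDD:HHMMSS.mmm message) and flags "PerfCounter ... FAILED" entries whose counter name contains non-ASCII characters. The function names are hypothetical, and the non-ASCII check is only a heuristic that assumes English counter names: a localized counter name could legitimately trip it.

```python
import re
from datetime import datetime

# Line layout as seen in the log excerpts in this issue: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
PERF_FAIL = re.compile(r"PerfCounter '(.*)' FAILED: (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    pid, date, time, msec, message = m.groups()
    stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return int(pid), stamp.replace(microsecond=int(msec) * 1000), message


def is_garbled_counter_failure(message):
    """Heuristic: a failed counter whose name contains non-ASCII characters."""
    m = PERF_FAIL.search(message)
    return m is not None and any(ord(ch) > 127 for ch in m.group(1))


sample = [
    "5092:20110921:164042.938 PerfCounter 'qýþ' FAILED: invalid format",
    "4152:20110921:164358.393 agent #0 started [collector]",
]
for raw in sample:
    parsed = parse_line(raw)
    if parsed and is_garbled_counter_failure(parsed[2]):
        print("suspicious entry from PID", parsed[0], "at", parsed[1])
```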
Comment by Rudolfs Kreicbergs [ 2011 Sep 21 ] |
Sorry, I did not compile the agent in the dev branch, will update the branch in a couple of minutes <rudolfs> DONE in r21799 at svn://svn.zabbix.com/branches/dev/ZBX-4156 |
Comment by Rudolfs Kreicbergs [ 2011 Sep 21 ] |
Are you using 32-bit Windows? I can compile and attach the .exe to the issue. |
Comment by Alexandru Nica [ 2011 Sep 21 ] |
Would you please attach the x64 version also? |
Comment by Rudolfs Kreicbergs [ 2011 Sep 21 ] |
Fair enough, it was a 50-50 chance |
Comment by Alexandru Nica [ 2011 Sep 21 ] |
Thank you for the binary, now running Zabbix 1.8.8 (revision {ZABBIX_REVISION}). |
Comment by Alexandru Nica [ 2011 Sep 22 ] |
So far so good, no nasty error messages, just a clean "not supported, disabled". No hangs, no script timeouts, I'm really happy with this. |
Comment by Rudolfs Kreicbergs [ 2011 Sep 22 ] |
I'll move forward with reviewing and testing the fix, since it is likely a separate issue from the hangs. The issue will not be closed until tomorrow anyhow, and please feel free to reopen it if the problem occurs again after it is closed. |
Comment by Alexandru Nica [ 2011 Sep 22 ] |
Sorry, still not fixed.
I will install this version on a win2003 box and let you know if they behave the same. I have a feeling this is 2008-specific, some sort of memory corruption that went unnoticed in win2003. Extract from the log with debuglevel=default:
4256:20110922:130637.267 Zabbix Agent stopped. Zabbix 1.8.8 (revision {ZABBIX_REVISION}).
5048:20110922:130645.785 Starting Zabbix Agent [BITVMH1]. Zabbix 1.8.8 (revision {ZABBIX_REVISION}). |
Comment by Rudolfs Kreicbergs [ 2011 Sep 28 ] |
Crash fixed in pre-1.8.9 r21973 and pre-1.9.7 r21976. Nica, please file the hanging problem as a separate ZBX. |