[ZBX-9915] agent on windows may crash on eventlog when perform FormatMessageW Created: 2015 Sep 28 Updated: 2019 Dec 10 |
|
Status: | Open |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | None |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Oleksii Zagorskyi | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 0 |
Labels: | crash, eventlog, windows | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
First of all this is an attempt to leave some report which may help if the crash will happen once again for someone. |
Comments |
Comment by Oleksii Zagorskyi [ 2015 Sep 28 ] |
Zabbix agent on Windows Server 2003 x64 (Version: 5.2.3790 Service Pack 2 Build 3790) started to crash after 5 minutes of runtime.
eventlog[Application,,"Error|Critical",VERY_UNIQUE_THING,,,skip]
Update interval 300 sec, "lastlogsize" in database 190414. As currently we know, in case if "lastlogsize" is not 0 the "skip" parameter is actually does not have effect.
36316:20150923:083025.951 In process_eventlog() source:'Application' lastlogsize:319486
36316:20150923:083025.951 In zbx_open_eventlog()
36316:20150923:083025.951 zbx_open_eventlog() pNumRecords:91805 pLatestRecord:236773
36316:20150923:083025.951 End of zbx_open_eventlog():SUCCEED
36316:20150923:083025.982 In zbx_get_eventlog_message() lastlogsize:319487
65624:20150923:083026.591 In collect_perfstat()
65624:20150923:083026.591 End of collect_perfstat()
65624:20150923:083027.762 In collect_perfstat()
65624:20150923:083027.762 End of collect_perfstat()
65624:20150923:083029.075 In collect_perfstat()
65624:20150923:083029.075 End of collect_perfstat()
Note that last 3 seconds we don't see activity for "active checks" agent process (pid=36316), though all previous time it logged a lot of lines for eventlog parsing. 35700:20150923:143910.962 End of zbx_get_eventlog_message():SUCCEED 35700:20150923:143910.962 End of process_eventlog():SUCCEED 35700:20150923:143910.962 In process_eventlog() source:'Application' lastlogsize:319485 35700:20150923:143910.962 In zbx_open_eventlog() 35700:20150923:143910.962 End of zbx_open_eventlog():FAIL FirstID:239161 LastID:329786 numIDs:90626 35700:20150923:143910.962 End of zbx_open_eventlog():SUCCEED 35700:20150923:143910.962 In zbx_get_eventlog_message() lastlogsize:319486 35700:20150923:143910.962 End of zbx_get_eventlog_message():SUCCEED 35700:20150923:143910.962 End of process_eventlog():SUCCEED 35700:20150923:143910.962 In process_eventlog() source:'Application' lastlogsize:319486 35700:20150923:143910.962 In zbx_open_eventlog() 35700:20150923:143910.962 End of zbx_open_eventlog():FAIL FirstID:239161 LastID:329786 numIDs:90626 35700:20150923:143910.962 End of zbx_open_eventlog():SUCCEED 35700:20150923:143910.962 In zbx_get_eventlog_message() lastlogsize:319487 35700:20150923:143910.977 Unhandled exception c0000005 detected at 0x0000000077D67BBD. Crashing ... 35700:20150923:143910.977 ====== Fatal information: ====== 35700:20150923:143910.977 Program counter: 0x77d67bbd 35700:20150923:143910.977 === Registers: === 35700:20150923:143910.977 r8 = 1f1e9ec = 32631276 = 32631276 35700:20150923:143910.977 r9 = 1e40220 = 31719968 = 31719968 35700:20150923:143910.977 r10 = 1 = 1 = 1 35700:20150923:143910.977 r11 = 3 = 3 = 3 35700:20150923:143910.977 r12 = 400 = 1024 = 1024 35700:20150923:143910.977 r13 = 0 = 0 = 0 35700:20150923:143910.977 r14 = 31bf6c8 = 52164296 = 52164296 35700:20150923:143910.977 r15 = 0 = 0 = 0 35700:20150923:143910.977 rdi = 0 = 0 = 0 35700:20150923:143910.977 rsi = 0 = 0 = 0 35700:20150923:143910.977 rbp = 29ff = 10751 = 10751 35700:20150923:143910.977 rbx = c07eea46 = 3229542982 = 3229542982 35700:20150923:143910.977 rdx = c07eea46 = 3229542982 = 3229542982 35700:20150923:143910.977 rax = 0 = 0 = 0 35700:20150923:143910.977 rcx = a0354c9a = 2687847578 = 2687847578 35700:20150923:143910.977 rsp = 31bf510 = 52163856 = 52163856 35700:20150923:143910.977 efl = 10246 = 66118 = 66118 35700:20150923:143910.977 csgsfs = 332b5300 = 858477312 = 858477312 35700:20150923:143910.977 === Backtrace: === 35700:20150923:143911.071 11: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(DeviceIoControl+0x29d) [0x77d67bbd] 35700:20150923:143911.071 10: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(FormatMessageW+0x37) [0x77d67d27] 35700:20150923:143911.071 9: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(zbx_format_message+0x35 s:\trunk\src\zabbix_agent\eventlog.c:216) [0x4001e215] 35700:20150923:143911.071 8: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(zbx_get_eventlog_message+0x246 s:\trunk\src\zabbix_agent\eventlog.c:342) [0x4001f776] 35700:20150923:143911.071 7: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(process_eventlog+0x185 s:\trunk\src\zabbix_agent\eventlog.c:465) [0x4001fab5] 35700:20150923:143911.071 6: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(process_eventlog_check+0xbfb s:\trunk\src\zabbix_agent\active.c:1322) [0x4001d63b] 35700:20150923:143911.071 5: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(process_active_checks+0x145 s:\trunk\src\zabbix_agent\active.c:1475) [0x4001d9a5] 35700:20150923:143911.071 4: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(active_checks_thread+0x196 s:\trunk\src\zabbix_agent\active.c:1580) [0x4001dd26] 35700:20150923:143911.071 3: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(zbx_win_thread_entry+0xa s:\trunk\src\libs\zbxsys\threads.c:78) [0x4000cbca] 35700:20150923:143911.071 2: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(_callthreadstartex+0x17 f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c:348) [0x4003794f] 35700:20150923:143911.071 1: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(_threadstartex+0x9f f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c:326) [0x40037a03] 35700:20150923:143911.071 0: C:\Program Files (x86)\Zabbix Agent\zabbix_agentd.exe(BaseThreadStart+0x3a) [0x77d6b71a] 35700:20150923:143911.071 ================================ 38436:20150923:143911.149 In collect_perfstat() 38436:20150923:143911.149 End of collect_perfstat() 38436:20150923:143912.524 In collect_perfstat() 38436:20150923:143912.524 End of collect_perfstat() note the same recordID=319487. In Application log every time for agent crash I can see the following:
It was not possible to read what contains recordID=319487 on the same windows host. Tried MMC event viewer, wmi request - they all crashed too. I was able to save the Application log as .evt file and open it on another windows computer.
Detailed information (if open .evt) on Win7+:
"TreeSize Free" source from the failed windows server: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application\TreeSize Free] "EventMessageFile"="C:\\TreeSizeFree.exe" The TreeSizeFree application (free and available in the Internet) being used on the windows server time to time. There were not other way than erase all events in the Application log to avoid crashes as this server in production and have to be monitored. There was a hope that saved .evt file may help to investigate the crash. I have no more idea how to try to reproduce it. |