[ZBX-6028] Zabbix agent and server log to deleted logfile Created: 2012 Dec 27 Updated: 2018 Jul 19 Resolved: 2015 Nov 04 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G), Proxy (P), Server (S) |
Affects Version/s: | 1.8.15, 2.0.2 |
Fix Version/s: | 3.0.0alpha4 |
Type: | Incident report | Priority: | Major |
Reporter: | Luitzen van Gorkum | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 1 |
Labels: | logging | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Redhat version 5 and 6 |
Issue Links: |
|
Description |
With lsof I find a deleted /var/log/zabbix/zabbix_agentd.log.1 still in use by the zabbix agent process. The logrotate spec delivered by the rpm has the create option for a new logfile, the zabbix agent can't work with that. It is better to use the copytruncate option... |
Comments |
Comment by Luitzen van Gorkum [ 2012 Dec 27 ] |
Test on 2.0.4: mv /var/log/zabbix/zabbix_agentd.log /var/log/zabbix/zabbix_agentd.log.1 result: new logging goes to /var/log/zabbix/zabbix_agentd.log lsof still has old logfile open: zabbix_ag 14785 zabbix 1w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1 After delete of .1 logfile lsof has a open deleted file: zabbix_ag 14786 zabbix 1w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1 (deleted) Conclusion: |
Comment by richlv [ 2012 Dec 27 ] |
seems to be confirmable in current trunk |
Comment by Andris Zeila [ 2013 Jan 29 ] |
The problem is because zabbix and it's child processes redirects output to the log file. So when the log file is moved the output of multiple zabbix processes is still being redirected to the old file. If a process attempts to add a log record, its output redirection will be switched to the new file, but the rest of zabbix processes will be still holding the old (moved) file. One way to fix it would be redirecting output to a pipe, reading it either form main thread of for special logger thread and storing the data into log file. However there are plans for logging system improvements, so it's better to wait for that first. |
Comment by Andris Zeila [ 2013 Jan 29 ] |
Reopen this issue if there are no progress with the planned logging system improvements. |
Comment by Alexey Pustovalov [ 2013 Mar 14 ] |
The same behavior with zabbix_server log file! So users can not use logrotate for Zabbix log files. |
Comment by Oleksii Zagorskyi [ 2013 Mar 31 ] |
Some details related to STDERR after my experiments. I've turned on full debug output for libnetsnmp and I have 2 snmpv3 checks with 5 seconds interval. The enabled debug generates a lot of STDERR output. As we know all STDERR from zabbix processes being written to zabbix log. What I observed. Then I manually increase current log file (2M garbage >> zabbix_server.log) and again I manually start config synchronization (causes again log rotation). Another test: identical configuration, but the items are with update interval=3 secs. I sometimes was confused why I don't see STERR in zabbix log while it's expected to be there. |
Comment by Pat Suwalski [ 2013 May 09 ] |
I noticed this issue as well. Why not change the logrotate files to do the sort of thing apache does?: /etc/logrotate.d/zabbix-server-mysql: |
Comment by richlv [ 2013 May 09 ] |
restarting server can be a loooong and troublesome process on a large system |
Comment by Onno Steenbergen [ 2014 Jan 09 ] |
Problem still exists in 2.0.4 for the zabbix_server. It used 80% of the diskspace, needed to restart the process to solve it. Not the ideal solution. |
Comment by Patrick Hooper [ 2014 May 24 ] |
I confirm that this issue still exists in version 2.2.3. I can delete the zabbix_server.log files and see them still locked using lsof. These files are not deleted until I stop and then start the zabbix-server (/etc/init.d/zabbix-server stop and then /etc/init.d/zabbix-server start). A restart (/etc/init.d/zabbix-server restart) does not release the files. I am running my zabbix server on debian 7, but as an openvz container on Proxmox, so the kernel is based on Redhat. |
Comment by richlv [ 2014 May 24 ] |
are you sure init script does a proper restart ? it sounds like the restart action is failing - it should do stop followed by a start... |
Comment by MATSUDA Daiki [ 2014 May 30 ] |
I have already fixed this problem and attached the patch on |
Comment by dimir [ 2015 Oct 22 ] |
Thank you for the patch, [email protected]! Your approach to call redirect_std() is one of the possible solutions we consider. |
Comment by dimir [ 2015 Oct 27 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6028. |
Comment by Andris Zeila [ 2015 Oct 28 ] |
(1) I'd suggest creating static helper functions for log locking/unlocking which would take care about signal disabling/renabling on *nix systems. Currently before each lock we disable some signals and after unlock restore them. Taking in account the 'not windows' defines - it makes the code hard to read. <dimir> Agreed. RESOLVED in r56423. wiper CLOSED |
Comment by Andris Zeila [ 2015 Oct 29 ] |
(2) In zbx_execute_nowait() we need simply to redirect output to /dev/null and calling function that also performs log rotation seems to be awkward. We could directly call redirect_stdio() function (renaming it to zbx_redirect_stdio() or similar). If we do above, then zbx_handle_log() is always called passing CONFIG_LOG_FILE as parameter. So we can simplify it by removing parameter from zbx_handle_log() and calling rotate_log(CONFIG_LOG_FILE) in zbx_handle_log(). This way we don't have to define the extern CONFIG_LOG_FILE variable in every file where zbx_handle_log() is called. <dimir> RESOLVED in r56453, r56454. wiper CLOSED |
Comment by Andris Zeila [ 2015 Oct 29 ] |
(3) [D] This also fixes bug when zabbix would not rotate log file if only stdout/stderr output was generated without zabbix log usage (for example with DebugLevel 1). However in this case there could be a situation when log file could be empty for some time (that could not happen before because log file was rotated during new message logging). Maybe it's worth mentioning in upgrade notes. <dimir> Documented here. RESOLVED wiper CLOSED |
Comment by Andris Zeila [ 2015 Nov 02 ] |
Successfully tested, please review minor fix in r56464 |
Comment by dimir [ 2015 Nov 03 ] |
Fixed in pre-3.0.0alpha4 (r56517). |
Comment by James Lodge [ 2015 Nov 19 ] |
FreeBSD 10.1 (Jails) - Zabbix 3.0 alpha4 - source package from zabbix.com Agent crashes zabbix_agentd [16034]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16030]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16029]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16032]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16031]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16033]: [file:'log.c',line:249] lock failed: [22] Invalid argument Server crashes zabbix_server [27018]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27015]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27034]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27031]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27013]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27012]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27014]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27011]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27032]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27028]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27033]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27029]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27030]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27022]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27026]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27016]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27020]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27021]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27025]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27009]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27023]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27017]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27019]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27027]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument zabbix_server [27010]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument and zabbix_server [27008]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_server [27024]: [file:'log.c',line:249] lock failed: [22] Invalid argument |
Comment by dimir [ 2015 Nov 19 ] |
Could you please attach the whole zabbix_agentd log? |
Comment by James Lodge [ 2015 Nov 19 ] |
No the entire log, but what looks to be important. Default logging level was on at the time. 16029:20151119:111601.178 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592). 16029:20151119:111601.179 **** Enabled features **** 16029:20151119:111601.179 TLS support: YES 16029:20151119:111601.179 ************************** 16029:20151119:111601.179 using configuration file: /usr/local/etc/zabbix_agentd.conf 16029:20151119:111601.179 agent #0 started [main process] 16030:20151119:111601.182 agent #1 started [collector] 16031:20151119:111601.183 agent #2 started [listener #1] 16032:20151119:111601.183 agent #3 started [listener #2] 16033:20151119:111601.184 agent #4 started [listener #3] 16034:20151119:111601.184 agent #5 started [active checks #1] zabbix_agentd [16034]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16030]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16029]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16032]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16031]: [file:'log.c',line:249] lock failed: [22] Invalid argument zabbix_agentd [16033]: [file:'log.c',line:249] lock failed: [22] Invalid argument 27338:20151119:151815.128 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592). 27338:20151119:151815.129 **** Enabled features **** 27338:20151119:151815.129 TLS support: YES 27338:20151119:151815.129 ************************** 27338:20151119:151815.129 using configuration file: /usr/local/etc/zabbix_agentd.conf 27338:20151119:151815.130 agent #0 started [main process] 27339:20151119:151815.132 agent #1 started [collector] 27340:20151119:151815.134 agent #2 started [listener #1] 27341:20151119:151815.135 agent #3 started [listener #2] 27342:20151119:151815.135 agent #4 started [listener #3] 27343:20151119:151815.136 agent #5 started [active checks #1] 27338:20151119:153118.439 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ... 27339:20151119:153118.442 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ... 27340:20151119:153118.443 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ... 27338:20151119:153118.447 Zabbix Agent stopped. Zabbix 3.0.0alpha4 (revision 56592). 27683:20151119:153118.472 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592). 27683:20151119:153118.473 **** Enabled features **** 27683:20151119:153118.473 TLS support: YES 27683:20151119:153118.473 ************************** 27683:20151119:153118.473 using configuration file: /usr/local/etc/zabbix_agentd.conf 27683:20151119:153118.473 In load_modules() 27683:20151119:153118.473 End of load_modules():SUCCEED 27683:20151119:153118.474 In init_collector_data() 27683:20151119:153118.474 End of init_collector_data() 27683:20151119:153118.474 agent #0 started [main process] 27684:20151119:153118.477 agent #1 started [collector] 27685:20151119:153118.478 agent #2 started [listener #1] 27685:20151119:153118.478 In zbx_tls_init_child() 27685:20151119:153118.485 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized 27685:20151119:153118.486 End of zbx_tls_init_child() 27685:20151119:153118.486 listener #1 [waiting for connection] 27686:20151119:153118.486 agent #3 started [listener #2] 27686:20151119:153118.487 In zbx_tls_init_child() 27686:20151119:153118.492 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized 27686:20151119:153118.492 End of zbx_tls_init_child() 27686:20151119:153118.492 listener #2 [waiting for connection] 27687:20151119:153118.492 agent #4 started [listener #3] 27687:20151119:153118.493 In zbx_tls_init_child() 27688:20151119:153118.494 agent #5 started [active checks #1] 27688:20151119:153118.494 In zbx_tls_init_child() 27688:20151119:153118.498 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized 27688:20151119:153118.499 End of zbx_tls_init_child() 27688:20151119:153118.499 In init_active_metrics() 27688:20151119:153118.499 buffer: first allocation for 100 elements 27688:20151119:153118.499 End of init_active_metrics() 27688:20151119:153118.499 In send_buffer() host:'172.16.1.5' port:10051 entries:0/100 27688:20151119:153118.499 End of send_buffer():SUCCEED 27688:20151119:153118.499 active checks #1 [getting list of active checks] 27688:20151119:153118.499 In refresh_active_checks() host:'172.16.1.5' port:10051 27684:20151119:153118.500 In init_cpu_collector() 27684:20151119:153118.500 End of init_cpu_collector():SUCCEED 27684:20151119:153118.501 collector [processing data] 27684:20151119:153118.501 In update_cpustats() 27684:20151119:153118.501 End of update_cpustats() 27684:20151119:153118.501 collector [idle 1 sec] 27688:20151119:153118.502 sending [{"request":"active checks","host":"ZABBIX-FE-01"}] 27687:20151119:153118.508 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized 27687:20151119:153118.508 End of zbx_tls_init_child() 27687:20151119:153118.508 listener #3 [waiting for connection] 27688:20151119:153118.509 before read 27688:20151119:153118.509 got [{"response":"success","data":[]}] 27688:20151119:153118.509 In parse_list_of_checks() 27688:20151119:153118.509 End of parse_list_of_checks():SUCCEED 27688:20151119:153118.509 End of refresh_active_checks():SUCCEED |
Comment by dimir [ 2015 Nov 20 ] |
It seems something kills the semaphore or it fails to create which results in this behavior.This is reproducible in the following scenario:
I see you are using jail. Perhaps you could try the same outside of jail and see if that works. Anyway, this looks like a mutex (semaphore) issue, not related to this task. |
Comment by dimir [ 2015 Nov 20 ] |
Seems you have the same issue as described in |
Comment by James Lodge [ 2015 Nov 20 ] |
Many thanks dimir. |
Comment by Sandis Neilands (Inactive) [ 2016 Jan 11 ] |
This correction introduced a problem that was detected by Valgrind. See (1) in |
Comment by Gerard H. Pille [ 2016 Dec 15 ] |
Sometimes a signal can be sent to a process (eg. kill -USER1 $PID) to tell it to reopen its logs, or stdout and stderr. Could zabbix use something similar? This could then be used in logrotate's postrotate. |
Comment by Rubén Burgué [ 2018 Jun 04 ] |
I keep having this problem with zabbix 3.4. [...] zabbix_se 11752 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11752 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11754 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11754 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11755 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11755 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11756 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11756 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11757 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11757 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11758 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11758 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11759 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11759 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11760 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11760 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11761 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11761 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) zabbix_se 11762 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted) [...] zabbix_server --version zabbix_server (Zabbix) 3.4.5 |