[ZBX-6028] Zabbix agent and server log to deleted logfile Created: 2012 Dec 27  Updated: 2018 Jul 19  Resolved: 2015 Nov 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Proxy (P), Server (S)
Affects Version/s: 1.8.15, 2.0.2
Fix Version/s: 3.0.0alpha4

Type: Incident report Priority: Major
Reporter: Luitzen van Gorkum Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: logging
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Redhat version 5 and 6


Issue Links:
Duplicate
is duplicated by ZBX-14616 vmware collector remains after zabbix... Closed
is duplicated by ZBX-9565 stderr output of system.run write err... Closed

 Description   

With lsof I find a deleted /var/log/zabbix/zabbix_agentd.log.1 still in use by the zabbix agent process. The logrotate spec delivered by the rpm has the create option for a new logfile, the zabbix agent can't work with that. It is better to use the copytruncate option...



 Comments   
Comment by Luitzen van Gorkum [ 2012 Dec 27 ]

Test on 2.0.4:

mv /var/log/zabbix/zabbix_agentd.log /var/log/zabbix/zabbix_agentd.log.1

result:

new logging goes to /var/log/zabbix/zabbix_agentd.log

lsof still has old logfile open:

zabbix_ag 14785 zabbix 1w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1
zabbix_ag 14785 zabbix 2w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1
zabbix_ag 14785 zabbix 3w REG 253,4 5 2055 /var/run/zabbix/zabbix_agentd.pid

After delete of .1 logfile lsof has a open deleted file:

zabbix_ag 14786 zabbix 1w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1 (deleted)
zabbix_ag 14786 zabbix 2w REG 253,4 1122898 2054 /var/log/zabbix/zabbix_agentd.log.1 (deleted)
zabbix_ag 14786 zabbix 3w REG 253,4 5 2055 /var/run/zabbix/zabbix_agentd.pid

Conclusion:
The agent switches to a new logfile but keeps the old one open. This seems to be a bug in the agent, the logrotate specification can be a workaround...

Comment by richlv [ 2012 Dec 27 ]

seems to be confirmable in current trunk

Comment by Andris Zeila [ 2013 Jan 29 ]

The problem is because zabbix and it's child processes redirects output to the log file. So when the log file is moved the output of multiple zabbix processes is still being redirected to the old file. If a process attempts to add a log record, its output redirection will be switched to the new file, but the rest of zabbix processes will be still holding the old (moved) file.

One way to fix it would be redirecting output to a pipe, reading it either form main thread of for special logger thread and storing the data into log file.

However there are plans for logging system improvements, so it's better to wait for that first.

Comment by Andris Zeila [ 2013 Jan 29 ]

Reopen this issue if there are no progress with the planned logging system improvements.

Comment by Alexey Pustovalov [ 2013 Mar 14 ]

The same behavior with zabbix_server log file! So users can not use logrotate for Zabbix log files.

Comment by Oleksii Zagorskyi [ 2013 Mar 31 ]

Some details related to STDERR after my experiments.

I've turned on full debug output for libnetsnmp and I have 2 snmpv3 checks with 5 seconds interval. The enabled debug generates a lot of STDERR output.
Server's LogFileSize=1

As we know all STDERR from zabbix processes being written to zabbix log.
My log file increases to 1M after ~1 minute.

What I observed.
Even when log file much more that 1M (filled by STDERR output only when overflows 1M bound) then build-in rotation doesn't work. The file continues to grow.
But if any log line will be generated by zabbix server itself then the log will be rotated.
For this I manually start config synchronization which causes log rotation and one line written to new log.
But all STDERR continues to be written to zabbix_server.log.old.

Then I manually increase current log file (2M garbage >> zabbix_server.log) and again I manually start config synchronization (causes again log rotation).
After that STDERR will not be written to any file.

Another test: identical configuration, but the items are with update interval=3 secs.
During server start some pollers start to work while not all pollers has been started. In zabbix_server.log.old created (rotated) right during server start-up I see several lines like "server #6 started poller #4"
In such condition STDERR from some pollers being written to zabbix_server.log and from some pollers to zabbix_server.log.old in the same time !

I sometimes was confused why I don't see STERR in zabbix log while it's expected to be there.
Now I know it for sure - after log file rotation (doesn't matter which one) STDERR will not be visible at all.

Comment by Pat Suwalski [ 2013 May 09 ]

I noticed this issue as well. Why not change the logrotate files to do the sort of thing apache does?:

/etc/logrotate.d/zabbix-server-mysql:
(...)
postrotate
[ -f /var/run/zabbix/zabbix_server.pid ] && /etc/init.d/zabbix-server restart
endscript

Comment by richlv [ 2013 May 09 ]

restarting server can be a loooong and troublesome process on a large system

Comment by Onno Steenbergen [ 2014 Jan 09 ]

Problem still exists in 2.0.4 for the zabbix_server. It used 80% of the diskspace, needed to restart the process to solve it. Not the ideal solution.

Comment by Patrick Hooper [ 2014 May 24 ]

I confirm that this issue still exists in version 2.2.3. I can delete the zabbix_server.log files and see them still locked using lsof.

These files are not deleted until I stop and then start the zabbix-server (/etc/init.d/zabbix-server stop and then /etc/init.d/zabbix-server start).

A restart (/etc/init.d/zabbix-server restart) does not release the files.

I am running my zabbix server on debian 7, but as an openvz container on Proxmox, so the kernel is based on Redhat.

Comment by richlv [ 2014 May 24 ]

are you sure init script does a proper restart ? it sounds like the restart action is failing - it should do stop followed by a start...

Comment by MATSUDA Daiki [ 2014 May 30 ]

I have already fixed this problem and attached the patch on ZBXNEXT-2319.

Comment by dimir [ 2015 Oct 22 ]

Thank you for the patch, [email protected]! Your approach to call redirect_std() is one of the possible solutions we consider.

Comment by dimir [ 2015 Oct 27 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6028.

Comment by Andris Zeila [ 2015 Oct 28 ]

(1) I'd suggest creating static helper functions for log locking/unlocking which would take care about signal disabling/renabling on *nix systems. Currently before each lock we disable some signals and after unlock restore them. Taking in account the 'not windows' defines - it makes the code hard to read.

<dimir> Agreed. RESOLVED in r56423.

wiper CLOSED

Comment by Andris Zeila [ 2015 Oct 29 ]

(2) In zbx_execute_nowait() we need simply to redirect output to /dev/null and calling function that also performs log rotation seems to be awkward. We could directly call redirect_stdio() function (renaming it to zbx_redirect_stdio() or similar).

If we do above, then zbx_handle_log() is always called passing CONFIG_LOG_FILE as parameter. So we can simplify it by removing parameter from zbx_handle_log() and calling rotate_log(CONFIG_LOG_FILE) in zbx_handle_log(). This way we don't have to define the extern CONFIG_LOG_FILE variable in every file where zbx_handle_log() is called.

<dimir> RESOLVED in r56453, r56454.

wiper CLOSED

Comment by Andris Zeila [ 2015 Oct 29 ]

(3) [D] This also fixes bug when zabbix would not rotate log file if only stdout/stderr output was generated without zabbix log usage (for example with DebugLevel 1).

However in this case there could be a situation when log file could be empty for some time (that could not happen before because log file was rotated during new message logging). Maybe it's worth mentioning in upgrade notes.

<dimir> Documented here. RESOLVED

wiper CLOSED

Comment by Andris Zeila [ 2015 Nov 02 ]

Successfully tested, please review minor fix in r56464

Comment by dimir [ 2015 Nov 03 ]

Fixed in pre-3.0.0alpha4 (r56517).

Comment by James Lodge [ 2015 Nov 19 ]

FreeBSD 10.1 (Jails) - Zabbix 3.0 alpha4 - source package from zabbix.com

Agent crashes

zabbix_agentd [16034]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16030]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16029]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16032]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16031]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16033]: [file:'log.c',line:249] lock failed: [22] Invalid argument

Server crashes

zabbix_server [27018]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27015]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27034]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27031]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27013]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27012]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27014]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27011]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27032]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27028]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27033]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27029]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27030]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27022]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27026]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27016]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27020]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27021]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27025]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27009]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27023]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27017]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27019]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27027]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
zabbix_server [27010]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
 

and

zabbix_server [27008]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_server [27024]: [file:'log.c',line:249] lock failed: [22] Invalid argument
Comment by dimir [ 2015 Nov 19 ]

Could you please attach the whole zabbix_agentd log?

Comment by James Lodge [ 2015 Nov 19 ]

No the entire log, but what looks to be important. Default logging level was on at the time.

 16029:20151119:111601.178 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592).
 16029:20151119:111601.179 **** Enabled features ****
 16029:20151119:111601.179 TLS support:           YES
 16029:20151119:111601.179 **************************
 16029:20151119:111601.179 using configuration file: /usr/local/etc/zabbix_agentd.conf
 16029:20151119:111601.179 agent #0 started [main process]
 16030:20151119:111601.182 agent #1 started [collector]
 16031:20151119:111601.183 agent #2 started [listener #1]
 16032:20151119:111601.183 agent #3 started [listener #2]
 16033:20151119:111601.184 agent #4 started [listener #3]
 16034:20151119:111601.184 agent #5 started [active checks #1]
zabbix_agentd [16034]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16030]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16029]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16032]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16031]: [file:'log.c',line:249] lock failed: [22] Invalid argument
zabbix_agentd [16033]: [file:'log.c',line:249] lock failed: [22] Invalid argument
 27338:20151119:151815.128 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592).
 27338:20151119:151815.129 **** Enabled features ****
 27338:20151119:151815.129 TLS support:           YES
 27338:20151119:151815.129 **************************
 27338:20151119:151815.129 using configuration file: /usr/local/etc/zabbix_agentd.conf
 27338:20151119:151815.130 agent #0 started [main process]
 27339:20151119:151815.132 agent #1 started [collector]
 27340:20151119:151815.134 agent #2 started [listener #1]
 27341:20151119:151815.135 agent #3 started [listener #2]
 27342:20151119:151815.135 agent #4 started [listener #3]
 27343:20151119:151815.136 agent #5 started [active checks #1]
 27338:20151119:153118.439 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ...
 27339:20151119:153118.442 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ...
 27340:20151119:153118.443 Got signal [signal:15(SIGTERM),sender_pid:27672,sender_uid:0,reason:65537]. Exiting ...
 27338:20151119:153118.447 Zabbix Agent stopped. Zabbix 3.0.0alpha4 (revision 56592).
 27683:20151119:153118.472 Starting Zabbix Agent [ZABBIX-FE-01]. Zabbix 3.0.0alpha4 (revision 56592).
 27683:20151119:153118.473 **** Enabled features ****
 27683:20151119:153118.473 TLS support:           YES
 27683:20151119:153118.473 **************************
 27683:20151119:153118.473 using configuration file: /usr/local/etc/zabbix_agentd.conf
 27683:20151119:153118.473 In load_modules()
 27683:20151119:153118.473 End of load_modules():SUCCEED
 27683:20151119:153118.474 In init_collector_data()
 27683:20151119:153118.474 End of init_collector_data()
 27683:20151119:153118.474 agent #0 started [main process]
 27684:20151119:153118.477 agent #1 started [collector]
 27685:20151119:153118.478 agent #2 started [listener #1]
 27685:20151119:153118.478 In zbx_tls_init_child()
 27685:20151119:153118.485 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized
 27685:20151119:153118.486 End of zbx_tls_init_child()
 27685:20151119:153118.486 listener #1 [waiting for connection]
 27686:20151119:153118.486 agent #3 started [listener #2]
 27686:20151119:153118.487 In zbx_tls_init_child()
 27686:20151119:153118.492 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized
 27686:20151119:153118.492 End of zbx_tls_init_child()
 27686:20151119:153118.492 listener #2 [waiting for connection]
 27687:20151119:153118.492 agent #4 started [listener #3]
 27687:20151119:153118.493 In zbx_tls_init_child()
 27688:20151119:153118.494 agent #5 started [active checks #1]
 27688:20151119:153118.494 In zbx_tls_init_child()
 27688:20151119:153118.498 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized
 27688:20151119:153118.499 End of zbx_tls_init_child()
 27688:20151119:153118.499 In init_active_metrics()
 27688:20151119:153118.499 buffer: first allocation for 100 elements
 27688:20151119:153118.499 End of init_active_metrics()
 27688:20151119:153118.499 In send_buffer() host:'172.16.1.5' port:10051 entries:0/100
 27688:20151119:153118.499 End of send_buffer():SUCCEED
 27688:20151119:153118.499 active checks #1 [getting list of active checks]
 27688:20151119:153118.499 In refresh_active_checks() host:'172.16.1.5' port:10051
 27684:20151119:153118.500 In init_cpu_collector()
 27684:20151119:153118.500 End of init_cpu_collector():SUCCEED
 27684:20151119:153118.501 collector [processing data]
 27684:20151119:153118.501 In update_cpustats()
 27684:20151119:153118.501 End of update_cpustats()
 27684:20151119:153118.501 collector [idle 1 sec]
 27688:20151119:153118.502 sending [{"request":"active checks","host":"ZABBIX-FE-01"}]
 27687:20151119:153118.508 OpenSSL library (version OpenSSL 1.0.2d 9 Jul 2015) initialized
 27687:20151119:153118.508 End of zbx_tls_init_child()
 27687:20151119:153118.508 listener #3 [waiting for connection]
 27688:20151119:153118.509 before read
 27688:20151119:153118.509 got [{"response":"success","data":[]}]
 27688:20151119:153118.509 In parse_list_of_checks()
 27688:20151119:153118.509 End of parse_list_of_checks():SUCCEED
 27688:20151119:153118.509 End of refresh_active_checks():SUCCEED
Comment by dimir [ 2015 Nov 20 ]

It seems something kills the semaphore or it fails to create which results in this behavior.This is reproducible in the following scenario:

  • start agent
  • list it's semaphores using command ipcs -s
  • kill the semaphore that was created by agent using command ipcrm -s <semid>
  • watch agent log file, you'll get:
    [...]
    zabbix_server [9283]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
    zabbix_server [9253]: [file:'log.c',line:249] lock failed: [22] Invalid argument
    zabbix_server [9274]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
    zabbix_server [9277]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
    [...]
    

I see you are using jail. Perhaps you could try the same outside of jail and see if that works.

Anyway, this looks like a mutex (semaphore) issue, not related to this task.

Comment by dimir [ 2015 Nov 20 ]

Seems you have the same issue as described in ZBX-3974, you can see possible reasons and continue the discussion there.

Comment by James Lodge [ 2015 Nov 20 ]

Many thanks dimir.

Comment by Sandis Neilands (Inactive) [ 2016 Jan 11 ]

This correction introduced a problem that was detected by Valgrind. See (1) in ZBX-10239. In summary: get_time() is returning pointer to it's stack. This is potentially unsafe as the stack can be overwritten before the memory is read or written to.

Comment by Gerard H. Pille [ 2016 Dec 15 ]

Sometimes a signal can be sent to a process (eg. kill -USER1 $PID) to tell it to reopen its logs, or stdout and stderr. Could zabbix use something similar? This could then be used in logrotate's postrotate.

Comment by Rubén Burgué [ 2018 Jun 04 ]

I keep having this problem with zabbix 3.4.

[...]
zabbix_se 11752 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11752 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11754 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11754 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11755 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11755 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11756 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11756 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11757 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11757 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11758 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11758 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11759 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11759 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11760 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11760 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11761 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11761 zabbix 2w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
zabbix_se 11762 zabbix 1w REG 202,1 396170 2059278 /var/log/zabbix/zabbix_server.log.1 (deleted)
[...]
zabbix_server --version
zabbix_server (Zabbix) 3.4.5
Generated at Wed May 08 04:17:47 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.