[ZBX-8555] "write:Broken pipe" error while executing remote command Created: 2014 Jul 30 Updated: 2017 May 30 Resolved: 2014 Aug 07 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 2.2.2, 2.3.3 |
Fix Version/s: | 2.2.9rc1, 2.4.4rc1, 2.5.0 |
Type: | Incident report | Priority: | Minor |
Reporter: | Arturs Galapovs (Inactive) | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | logging, remotecommands | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3+deb7u1 x86_64 GNU/Linux |
Attachments: |
Description |
Executing a custom command as an action operation produces a "write: Broken pipe" error in the server logs. Reproduction steps: |
Comments |
Comment by Arturs Galapovs (Inactive) [ 2014 Aug 01 ] |
Fix located in svn://svn.zabbix.com/branches/dev/ZBX-8555 |
Comment by Aleksandrs Saveljevs [ 2014 Aug 04 ] |
(1) In https://groups.google.com/d/msg/comp.unix.programmer/NXKriaq6G3g/UMBQg88EWEcJ it says the following:
In the proposed solution waitpid() is called before close() and, indeed, this whole process structure hangs when the executed script produces a lot of output. For instance, I replaced "date" in your example with "for i in `seq 1 100000`; do echo $i; done | tee /tmp/remote-command.txt" and the escalator hangs in zbx_waitpid() for 5 minutes, which is the value of CONFIG_TRAPPER_TIMEOUT. If we do "tail -n 1 /tmp/remote-command.txt" at that moment, we will see that the last number written is 12762.

One solution might be to redirect the child process output streams to /dev/null instead, or to use zbx_execute_nowait() if we do not need the output, similarly to what we do in the "nowait" case in SYSTEM_RUN(). REOPENED.

arturs.galapovs New implementation in r47912:47915. RESOLVED

asaveljevs It seems to me that the implementation can be improved, so I would like to ask sasha to take a look at it. The general direction seems correct, except that it currently requires zbx_exec() to be followed by exit(), which is not elegant. The simplest solution with the least intrusive changes is to leave the old code as is, but read from the pipe and discard the output if "buffer" is NULL. That might not be the most efficient, though. Meanwhile, I have suggested some stylistic fixes in r48068. Note that the branch in its current state will not compile on Windows.

arturs.galapovs Windows agent compilation fixed in r48084. Please review these changes as well.

asaveljevs Original solution abandoned in favor of a simpler solution. WON'T FIX. |
Comment by Andris Zeila [ 2015 Jan 08 ] |
After a short discussion it was decided to move forward with the simplest solution - always read from the pipe, even if the buffer is NULL (just discard the results in this case). |
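The "always read, discard if the buffer is NULL" approach can be sketched as follows. This is a minimal POSIX illustration, not the code from the ZBX-8555 branch: run_and_drain() and its parameters are hypothetical names chosen for this example. It reads the child's output until EOF before calling waitpid(), so a child that writes more than the pipe can hold never blocks on a full pipe, and never gets "Broken pipe" from a read end that was closed early.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a Zabbix function): run cmd through /bin/sh,   */
/* read its stdout/stderr until EOF, then reap the child. If buf is NULL   */
/* the output is read and discarded. Returns the exit code or -1 on error. */
int	run_and_drain(const char *cmd, char *buf, size_t buf_size)
{
	int	fds[2], status;
	pid_t	pid;
	char	chunk[4096];
	size_t	used = 0;
	ssize_t	n;

	if (-1 == pipe(fds))
		return -1;

	if (-1 == (pid = fork()))
	{
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (0 == pid)	/* child: send stdout and stderr into the pipe */
	{
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		dup2(fds[1], STDERR_FILENO);
		close(fds[1]);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);	/* only reached if execl() failed */
	}

	close(fds[1]);	/* parent must drop the write end, or read() never sees EOF */

	while (0 != (n = read(fds[0], chunk, sizeof(chunk))))
	{
		if (-1 == n)
		{
			if (EINTR == errno)
				continue;
			break;
		}

		/* keep what fits when a buffer was supplied, otherwise just discard */
		if (NULL != buf && used + 1 < buf_size)
		{
			size_t	room = buf_size - 1 - used;
			size_t	copy = ((size_t)n < room ? (size_t)n : room);

			memcpy(buf + used, chunk, copy);
			used += copy;
		}
	}

	if (NULL != buf && 0 < buf_size)
		buf[used] = '\0';

	close(fds[0]);

	/* the pipe is drained, so waiting for the child cannot deadlock on it */
	while (-1 == waitpid(pid, &status, 0))
	{
		if (EINTR != errno)
			return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_and_drain("some command", NULL, 0) corresponds to the "buffer is NULL" case discussed above; passing a buffer captures as much output as fits and still drains the rest. The sketch has no timeout, which is the gap the later comments in this issue address.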
Comment by Andris Zeila [ 2015 Jan 08 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-8555 |
Comment by Aleksandrs Saveljevs [ 2015 Jan 09 ] |
(2) Please review suggestions in r51464 and r51470.

wiper Looks good, CLOSED |
Comment by Aleksandrs Saveljevs [ 2015 Jan 09 ] |
(3) The problem with the current code is that, in the zbx_read_from_pipe() function on Windows, the "offset" variable is never increased when "buf" is NULL. Therefore, the PeekNamedPipe() loop can run indefinitely as long as there is output to read, and this process is bound neither by size nor by time. The situation is similar on Unix, where SIGALRM can arrive outside of a call to read().

wiper Windows: I decided to move the timeout check to the start of the reading loop. While this introduces a slight overhead while reading, it ensures that the timeout properly occurs even if there is still data to read. RESOLVED in r51466

wiper Regarding Unix-like systems - we decided to postpone this until the current timeout system based on alarms is reviewed and reworked.

asaveljevs CLOSED |
Comment by Andris Zeila [ 2015 Jan 10 ] |
Released in:
|