[ZBX-11479] incomplete data not detected by zbx_execute_threaded_metric() interface Created: 2016 Nov 14 Updated: 2017 May 30 Resolved: 2017 Jan 02 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.0.19 |
Fix Version/s: | 2.0.21rc1, 2.2.17rc1, 3.0.7rc1, 3.2.3rc1, 3.4.0alpha1 |
Type: | Incident report | Priority: | Major |
Reporter: | Sandis Neilands (Inactive) | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | ZBX-11479-reproduce.patch main.c main_fixed.c |
Description |
zbx_execute_threaded_metric() doesn't serialize length of the data. This can lead to sending incomplete agent result to server when the child process received a signal (SIGUSR1 for example) during write(2). The feature was implemented in Coverity: CID 154194. |
Comments |
Comment by Vladislavs Sokurenko [ 2016 Nov 29 ] |
Fixed in development branch: svn://svn.zabbix.com/branches/dev/ZBX-11479 |
Comment by Vladislavs Sokurenko [ 2016 Nov 29 ] |
To reproduce the error, compile attached source file main.c and pkill a.out -SIGUSR1 main_fixed.c reuses function committed to bug to handle interrupt. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 02 ] |
(1) Neither return nor out parameter of write_all() is checked in zbx_execute_threaded_metric(). If we don't care about result of write_all() it should be void. If we do it should return SUCCEED/FAIL or SYSINFO_RET_OK/SYSINFO_RET_FAIL. vso If return of write is to be checked, I think it's best then that we return EXIT_FAILURE if write_all fails and check WEXITSTATUS later on do you agree ? glebs.ivanovskis Do whatever you think is best. vso RESOLVED in r64171 glebs.ivanovskis A bit of optimization in r64191. Have a look. vso CLOSED |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 02 ] |
(2) 0 in return of write() shouldn't cause immediate failure. glebs.ivanovskis CLOSED |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 02 ] |
(3) Copying a whole serial_header_t structure ruins the very idea of serialization. We could just dump AGENT_RESULT into pipe, but we don't do that for a reason. - memcpy(*data + *data_offset, &agent_ret, sizeof(int)); - *data_offset += sizeof(int); + serial_header_pos = *data + *data_offset; + *data_offset += sizeof(serial_header); vso WONTFIX |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 05 ] |
(4) In serialize_agent_result() we know exactly how much memory we need for data, no need for a loop. Since the total length of data is deterministic it would be easier to understand the code if we copied length, return code and value in the same order as they lay down in the serialized form. vso WONTFIX |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 05 ] |
(5) Even though we have a length in message header we still read() until there is nothing to read() in a pipe and check the length only after that. Theoretically, with such approach it should be possible to get away without message length and simply read() until child closes the pipe. Or maybe it would be better for the future (when child process will be reusable) to make a more sophisticated read function which will process the header first and will not read from the pipe more than necessary. glebs.ivanovskis I like it this way - simple and clear. One more finishing touch needed. Child process needs to check whether write_all() has written all the data without errors and then exit() with appropriate code. Parent process should check WEXITSTATUS(status). REOPENED vso RESOLVED in r64314 glebs.ivanovskis Should be fine now. Slight improvements in r64327. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 09 ] |
Successfully tested. |
Comment by Vladislavs Sokurenko [ 2016 Dec 09 ] |
Fixed in:
|
Comment by Andrey Melnikov [ 2016 Dec 11 ] |
Same fix need for fgets() calls in icmppinger and zbx_popen(). |
Comment by Viktors Tjarve [ 2016 Dec 13 ] |
(6) Compilation warning: sysinfo.c: In function ‘write_all’: sysinfo.c:1278:8: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith] buf += ret; ^ void* pointer arithmetic is not allowed in C standard. GCC has an extension that allows to do this. glebs.ivanovskis Good enough, warning disappeared. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 19 ] |
Dear lynxchaus, where in zbx_popen() do you think the check is needed? Would you create a dedicated issue(s)? I tend to agree about fgets() in icmpping, but this needs some further investigation, because according to man 7 signal file operations cannot be interrupted:
|
Comment by Andrey Melnikov [ 2016 Dec 19 ] |
pipe not disk device. Sending USR1/USR2 signals to child's process break read() with EINTR, which propagated to fgets() and it return NULL. This break any of running scripts: 2228:20161219:152115.127 log level has been increased to 3 (warning) 2175:20161219:152115.127 the signal was redirected to process pid:2229 2229:20161219:152115.127 log level has been increased to 3 (warning) 2175:20161219:152115.127 the signal was redirected to process pid:2230 2230:20161219:152115.127 log level has been increased to 3 (warning) 2215:20161219:152115.151 fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.18:icmpping" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.18:icmppingloss" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.18:icmppingsec" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.58:icmpping" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.58:icmppingloss" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2225:20161219:152115.173 item "xx.xx.xx.58:icmppingsec" became not supported: fping failed: xx.xx.xx.19 : [4], 84 bytes, 44.5 ms (44.5 avg, 0% loss) 2210:20161219:152117.264 fping failed: xx.xx.xx.1 : [1], 84 bytes, 27.8 ms (27.8 avg, 0% loss) 2223:20161219:152117.276 item "xx.xx.xx.9:icmpping" became not supported: fping failed: xx.xx.xx.1 : [1], 84 bytes, 27.8 ms (27.8 avg, 0% loss) 2223:20161219:152117.276 item "xx.xx.xx.9:icmppingloss" became not supported: fping failed: xx.xx.xx.1 : [1], 84 bytes, 27.8 ms (27.8 avg, 0% loss) 2223:20161219:152117.276 item "xx.xx.xx.9:icmppingsec" became not supported: fping failed: xx.xx.xx.1 : [1], 84 bytes, 27.8 ms (27.8 avg, 0% loss) 2175:20161219:152131.347 Got signal [signal:10(SIGUSR1),sender_pid:32312,sender_uid:0,value_int:33025(0x00008101)]. 2175:20161219:152131.347 the signal was redirected to process pid:2178 Real problem not in fgets() and read(). problem in your signal handler. Simple one line patch cure problem for me. --- a/src/libs/zbxnix/daemon.c 2016-12-08 15:07:42.000000000 +0300 +++ b/src/libs/zbxnix/daemon.c 2016-12-19 15:30:28.053938010 +0300 @@ -268,7 +268,7 @@ static void set_daemon_signal_handlers(v sig_parent_pid = (int)getpid(); sigemptyset(&phan.sa_mask); - phan.sa_flags = SA_SIGINFO; + phan.sa_flags = SA_SIGINFO | SA_RESTART; phan.sa_sigaction = user1_signal_handler; sigaction(SIGUSR1, &phan, NULL); |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Dec 19 ] |
Of course pipe is not a disk device. My apologies, I didn't investigate icmpping.c thoroughly enough. My error illustrates that this issue should be handled without a hurry and in a separate ticket. Please create or find existing one. ZBX-10042 and ZBXNEXT-3233 seem to be distantly related, so you can share your thoughts there. Unfortunately, your one line fix will break all timeouts in Zabbix because they are based on SIGALRM and EINTR. |
Comment by Andrey Melnikov [ 2016 Dec 19 ] |
Please, split this ticket. It can't break timeouts - this patch affects only SIGUSR1/SIGPIPE handlers. |
Comment by Vladislavs Sokurenko [ 2016 Dec 22 ] |
Fixed warning in:
|