ZABBIX BUGS AND ISSUES

using closed handle in libs/zbxexec/execute.c zbx_execute()

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.8.7
  • Fix Version/s: 1.8.9, 1.9.7 (beta)
  • Component/s: Agent (G)
  • Labels:
    None
  • Zabbix ID:
    Reviewed 2.0

Description

The code in libs/zbxexec/execute.c zbx_execute() that is supposed to wait for the launched process to finish accesses a previously closed handle in the WaitForSingleObject() call. This results in sporadic false execution timeouts.

Also, MSDN recommends calling SetHandleInformation() on the read pipe handle for reasons unknown, and I trust them on that.

A patch for 1.8.7 is attached.
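
For illustration, a minimal sketch of the corrected ordering, with hypothetical names - not the actual execute.c code: the process handle has to stay valid until WaitForSingleObject() has returned, and may be closed only afterwards.

    #include <windows.h>

    /* sketch only: wait on the child before closing its handles */
    static int wait_for_child(PROCESS_INFORMATION *pi, DWORD timeout_ms)
    {
        DWORD rc = WaitForSingleObject(pi->hProcess, timeout_ms);

        /* close the handles only after the wait has completed */
        CloseHandle(pi->hThread);
        CloseHandle(pi->hProcess);

        return (WAIT_OBJECT_0 == rc) ? 0 : -1;    /* -1: timeout or error */
    }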

Activity

Rudolfs Kreicbergs added a comment -

Fixed in pre-1.8.9 r21989 and pre-1.9.7 r21990.

richlv added a comment -

Timeouts for scripts on Windows are also mentioned in ZBX-4156.

Rudolfs Kreicbergs added a comment -

Fixed in dev branch: svn://svn.zabbix.com/branches/dev/ZBX-4104

Rudolfs Kreicbergs added a comment - - edited

Regarding handle inheritance, there is an article in MSDN explaining it: http://msdn.microsoft.com/en-us/library/aa365782%28v=vs.85%29.aspx

It seems that we do NOT need to remove handle inheritance. Removing handle inheritance is important in a different situation:
1) We create a pipe in the parent process
2) Pass the read end to the child process
3) Run the child process and call ReadFile() in the child until it returns 0 and sets the error to ERROR_HANDLE_EOF (end of file)
4) Write the data to the pipe and use CloseHandle() to indicate that all data has been sent
5) The child process does not get ERROR_HANDLE_EOF, since the child has inherited the write end of the pipe as well and that handle has not been closed - only when ALL write handles are closed will the child get ERROR_HANDLE_EOF.
If inheritance for the write end of the pipe had been removed, the child process would have correctly received ERROR_HANDLE_EOF.

In our case we pass the write end of the pipe, and this in fact ensures that we will read all the output of the child and its child processes if the handle gets inherited by the whole tree - even if the first child exits, the parent will wait till all the handles are closed. So it is even less likely that waiting for the initial child process after reading from the pipe will do any harm, compared to waiting for the whole process tree if that were possible.

Note that the parent process does not pass the read handle and keeps it open till either a timeout occurs or the other end is closed by all child processes (and thus no process should care whether we are still keeping it open). So in our case there is no need to remove inheritance for the read end.
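
A minimal sketch of the pipe setup discussed above, assuming the usual MSDN redirection pattern; function and variable names are illustrative, not the actual Zabbix code. The write end is created inheritable and handed to the child as its stdout, while the read end stays with the parent.

    #include <windows.h>

    static BOOL create_output_pipe(HANDLE *rd, HANDLE *wr)
    {
        SECURITY_ATTRIBUTES sa;

        sa.nLength = sizeof(SECURITY_ATTRIBUTES);
        sa.bInheritHandle = TRUE;          /* both ends start out inheritable */
        sa.lpSecurityDescriptor = NULL;

        if (0 == CreatePipe(rd, wr, &sa, 0))
            return FALSE;

        /* the optional step discussed above: clear the inherit flag on the
           read end, since only the write end is passed to the child */
        SetHandleInformation(*rd, HANDLE_FLAG_INHERIT, 0);

        return TRUE;
    }

The caller would then place *wr into STARTUPINFO.hStdOutput (with STARTF_USESTDHANDLES), pass bInheritHandles = TRUE to CreateProcess(), and close its own copy of the write end once the child has been created.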

Rudolfs Kreicbergs added a comment -

Moreover, my tests show that all the waiting is done in zbx_read_from_pipe() even when we have "detached" child processes - the output handle is kept open till the last process in the child tree exits (though I'm pretty sure there are other cases as well).
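
A sketch of the read loop being described, with illustrative names: ReadFile() on the parent's read end keeps returning data until every inherited copy of the write end has been closed, i.e. until the last process in the child tree has gone away.

    #include <windows.h>

    /* drain the pipe; returns only when all write handles are closed
       (the caller is assumed to enforce its own timeout separately) */
    static void drain_pipe(HANDLE rd)
    {
        char    buf[4096];
        DWORD   n;

        while (0 != ReadFile(rd, buf, sizeof(buf), &n, NULL) && 0 != n)
        {
            /* append buf[0..n-1] to the collected output here */
        }
        /* ReadFile() fails with ERROR_BROKEN_PIPE once every writer is gone */
    }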

alix added a comment -

Yesterday I experimented with jobs too and couldn't get WaitForSingleObject() (I've also tried WaitForMultipleObjects()) to signal when the spawned process is finished. We just sit there until the timeout hits, despite the fact that the spawned process and all its children have already finished.

So basically we're stuck with what I suggested in the first place; it still is a pretty reasonable solution.
I'll keep this task in mind anyway and post if something interesting pops up. There's got to be a way to force jobs to do what we want.

Rudolfs Kreicbergs added a comment - - edited

MSDN suggests 2 solutions:
1) http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx
Use WaitForSingleObject() on the job handle, which should return either on timeout or when the job gets signaled after all associated processes have exited - that simply does not work at all. I found no additional info on the requirements for this setup, but did find a warning that the job might get signaled in other cases as well.
2) http://msdn.microsoft.com/en-us/library/aa365198%28v=vs.85%29.aspx
Use I/O completion ports. However, that requires a separate thread to monitor the completion port (otherwise we might get a race condition where the job completes before we start monitoring the port). Furthermore, MSDN states that "delivery of messages to the completion port is not guaranteed", which makes this solution unusable as well.

Thus we come to two "bad" options:
A) Wait for the main process - if the first child exits, we assume that all scripts should have finished and force-kill all other processes still associated with the job. This means that users should always keep the main process running until everything is finished.
B) Query the job's process count every N milliseconds till it is zero or a timeout occurs.

After talking to Sasha, we decided to implement option A since it is rather reasonable and option B is really dirty. If there is a better solution, please let us know.
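
A hedged sketch of option A, with hypothetical names - not the committed code: wait only for the initial child, then force-terminate whatever is still associated with the job. Option B would instead poll QueryInformationJobObject() for the ActiveProcesses count in a loop.

    #include <windows.h>

    /* option A: wait for the first child, then kill the rest of the tree */
    static int wait_main_then_kill_job(HANDLE process, HANDLE job, DWORD timeout_ms)
    {
        DWORD rc = WaitForSingleObject(process, timeout_ms);

        /* anything still running inside the job is treated as leftover */
        TerminateJobObject(job, 0);

        return (WAIT_OBJECT_0 == rc) ? 0 : -1;    /* -1 on timeout */
    }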

alix added a comment -

It's pretty interesting, and right in theory, but I'd like to test what exactly happens when the children of a parent (and thus the parent) hit the timeout value and we have to kill the parent process. I'll report what I find.

Rudolfs Kreicbergs added a comment -

In general, there is no guarantee that a child will not exit before its children do - a "detached" child process is a normal case. Moreover, even if we call a process that will not exit before its children do, a timeout may occur and we would still need other means of killing the process tree - that's why jobs were introduced in ZBX-3519.

alix added a comment -

I'm still not convinced about jobs. We're launching a single process and it won't exit until all the subprocesses it starts are finished. At least my tests show that this approach works flawlessly, and I'm starting quite a number of things piped together in my userparameters.
Of course, a process can launch something 'detached' from itself, but we don't care about the fate of these detached children and thus don't need to wait for them.

I heard there are plans to make userparameter checks simultaneous rather than consecutive, and maybe then jobs will be useful.

Rudolfs Kreicbergs added a comment -

Jobs are used to handle process trees - the initial child may exit before the whole tree has finished. We can add the whole process tree to a job and then monitor the job instance.

It looks like waiting for the job to finish will not be as easy as it seemed at the beginning - WaitForSingleObject() does not work (i.e. times out) in my tests as well, as opposed to what is written in MSDN: http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx#managing_job_objects . Let's see how to solve this...
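
For reference, a sketch of the pattern that MSDN article describes, with illustrative names; in the tests mentioned here the wait simply ran into WAIT_TIMEOUT instead of returning when the job emptied.

    #include <windows.h>

    static DWORD run_child_in_job(PROCESS_INFORMATION *pi, DWORD timeout_ms)
    {
        DWORD   rc;
        HANDLE  job = CreateJobObject(NULL, NULL);

        /* put the child (and thus its future descendants) into the job,
           then wait on the job handle itself */
        AssignProcessToJobObject(job, pi->hProcess);
        ResumeThread(pi->hThread);    /* assumes the child was created suspended */

        /* documented to return once all processes in the job have exited;
           observed here to time out instead */
        rc = WaitForSingleObject(job, timeout_ms);

        CloseHandle(job);
        return rc;
    }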

alix added a comment -

Disregard the 50% number in the previous comment - after a while all my userparameter items turn into 'not supported'.

alix added a comment -

I've tried the suggested change (replacing the process handle with the job handle in the WaitForSingleObject() call) and I'm getting random WAIT_TIMEOUT results in approximately 50% of the calls.
I don't really understand why jobs are involved; isn't it enough to spawn child processes directly?

alix added a comment -

According to my week-long test, it doesn't do any harm either, so I'd like to be on the safe side here.

Rudolfs Kreicbergs added a comment -

Thanks, I found the same one - I'll look into it more. It seems a bit weird that we need to call SetHandleInformation() on a handle that we do not pass to the child process.

alix added a comment -

Ah right, there's also a job... I suppose I can try to test WaitForSingleObject() on the job on my hosts, alright.

Here's the MSDN article I mentioned: http://msdn.microsoft.com/en-us/library/ms682499(v=vs.85).aspx

Rudolfs Kreicbergs added a comment -

Thanks, alix, for spotting this. The correct solution is actually to call WaitForSingleObject() on the job - it is signaled when all the processes are done.

Could you please send a link to the article about the SetHandleInformation() call? I would like to get to the bottom of the problem, thanks!

alix added a comment -

diff -u

