[ZBX-4104] using closed handle in libs/zbxexec/execute.c zbx_execute() Created: 2011 Sep 03  Updated: 2017 May 30  Resolved: 2011 Sep 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 1.8.7
Fix Version/s: 1.8.9, 1.9.7 (beta)

Type: Incident report Priority: Major
Reporter: alix Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File execute.c.patch     File execute.c.patch.unified    

 Description   

libs/zbxexec/execute.c zbx_execute() code, which is supposed to wait for launched process finish, is accessing previously closed handle in WaitForSingleObject() call.
it results in sporadical false execute timeouts.

also MSDN recommends calling SetHandleInformation() for read pipe handle for reasons unknown, and I trust them in that.

patch for 1.8.7 is attached



 Comments   
Comment by alix [ 2011 Sep 08 ]

diff -u

Comment by Rudolfs Kreicbergs [ 2011 Sep 09 ]

Thanks, alix, for spotting this. The correct solution is actually to call WaitForSingleObject() for the job - it is signaled when all the processes are done.

Could you please send a link to the article about the SetHandleInformation() call? I would like to get to the bottom of the problem, thanks!

Comment by alix [ 2011 Sep 09 ]

ah right, there's also a job... I suppose I can try to test WaitForSingleObject() for job on my hosts, alright.

here's a MSDN article I mentioned: http://msdn.microsoft.com/en-us/library/ms682499(v=vs.85).aspx

Comment by Rudolfs Kreicbergs [ 2011 Sep 09 ]

Thanks, I found the same one - I'll look more into it. It seems a bit weird that we need to call SetHandleInformation() on a handle, that we do not pass to the child process.

Comment by alix [ 2011 Sep 09 ]

according to my week-length test, it doesn't harm either, so I'd like to be on a safe side here.

Comment by alix [ 2011 Sep 09 ]

I've tried the suggested change (replace process handle with job handle in WaitForSingleObject() call) and I'm getting random WAIT_TIMEOUT results in approximately 50% calls.
I don't really understand why the jobs are involved, isn't enough to spawn child processes directly?

Comment by alix [ 2011 Sep 09 ]

disregard the 50% number in previous comment - after a while all my userparameter items are turned into not supported

Comment by Rudolfs Kreicbergs [ 2011 Sep 12 ]

Jobs are used to handle process trees - the initial child may exit before the whole tree has finished. We can add the whole process tree to a job and then monitor the job instance.

It looks like waiting for the job to finish will be not as easy as it seemed in the beginning - WaitForSingleObject() does not work (i.e. times out) in my tests as well opposed to what is written in MSDN: http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx#managing_job_objects . Let's see how to solve this....

Comment by alix [ 2011 Sep 12 ]

I'm still not convinced about jobs. we're launching a single process and it won't exit until all subprocesses it starts are finished. at least my tests show that this approach works flawless, and I'm starting quite a number things piped together in my userparameters.
of course process can launch something 'detached' from itself, and we don't care about the fate of these detached childs and thus don't need to wait for them.

I heard there are plans to make simultaneous, not consecutive userparameter checks and maybe then jobs will be useful.

Comment by Rudolfs Kreicbergs [ 2011 Sep 12 ]

In general, there is no guaranty that a child will not exit before it's children do - "detached" child process is a normal case. Moreover, even if we call a process that will not exit before it's children do, timeout may occur and we would still need other means of killing the process tree - that's why jobs were introduced in ZBX-3519.

Comment by alix [ 2011 Sep 12 ]

it's pretty interesting, and is right, in theory. but I'd like to test what exactly happens when children of a parent (and thus parent) hits timeout value and we have to kill its parent process. I'll report what I've found.

Comment by Rudolfs Kreicbergs [ 2011 Sep 13 ]

MSDN suggests 2 solutions:
1) http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx
Use WaitForSingleObject() on job handle which should return on either timeout or when the job gets signaled after all associated processes have exited - that simply does not work at all. Found no additional info on the requirements on this setup, however, found a warning that the job might get signaled in other cases as well.
2) http://msdn.microsoft.com/en-us/library/aa365198%28v=vs.85%29.aspx
Use I/O completion ports. However, that requires a separate thread to be monitoring the completion port (otherwise, we might get a race condition when the job is completed before we start monitoring the port). Furthermore, MSDN states that "delivery of messages to the completion port is not guaranteed" which make this solution also unusable.

Thus we come to two "bad" options:
A) Wait for the main process - if the first child exits, we assume that all scripts should have finished and force-kill all other processes still associated with the job. This means that users should always keep the main process running until everything is finished.
B) Query jobs process count every N miliseconds till it is zero or a timeout occurs.

After talking to Sasha, decided to implement option A since it is rather reasonable and option B is really dirty. If there is a better solution, please let us know.

Comment by alix [ 2011 Sep 13 ]

yesterday I experimented with jobs too and couldn't make WaitForSingleObject (I've also tried WaitForMultipleObjects) to signal when spawned process is finished. we just sit there until timeout hits despite the fact that spawned process and all its children has already been finished.

so basically we're stuck with what I've suggested in the first place it still is a pretty reasonable solution.
I'll keep this task in my head anyway and post if something interesting pops up. there's got to be a way to force jobs do what we want.

Comment by Rudolfs Kreicbergs [ 2011 Sep 13 ]

So it seems Moreover, my tests show that all the waiting is done in zbx_read_from_pipe() even when we have "detached" child processes - the output handle is kept open till the last process in the child tree exits (though I'm pretty sure there are other cases as well).

Comment by Rudolfs Kreicbergs [ 2011 Sep 14 ]

Regarding handle inheritance, there is an article in MSDN explaining it: http://msdn.microsoft.com/en-us/library/aa365782%28v=vs.85%29.aspx

It seems that we do NOT need to remove handle inheritance. Removing handle inheritance is important in a different situation:
1) We create a pipe in the parent process
2) Pass the read end to the child process
3) Run the child process and call ReadFile() in the child process() till it returns 0 and sets the error to ERROR_HANDLE_EOF (end of file)
4) Write the data to the pipe and use CloseHandle() to indicate that all data has been sent
5) Child process does not get ERROR_HANDLE_EOF since the child has inherited the write end of the pipe as well and it has not been closed that handle - only when ALL handles are closed the child will get ERROR_HANDLE_EOF.
If inheritance for the write end of the pipe had been removed, the child process would have correctly received ERROR_HANDLE_EOF.

In our case we pass the write end of the handle and in fact this ensures that we will read all the input of the child and it's child processes if the handle gets inherited to the whole tree - even if the first child exits, the parent will wait till all the handles are closed. So it is even more unlikely that waiting for the initial child process after reading from the pipe will do any harm as compared to waiting for the whole process tree if it was possible.

Note that the parent process does not pass the read handle and keeps it open till either a timeout occurs or the other end is closed by all child processes (and thus no process should be interested whether we are still keeping it open). So in our case there is no need to remove the inheritance for the read end.

Comment by Rudolfs Kreicbergs [ 2011 Sep 14 ]

Fixed in dev branch: svn://svn.zabbix.com/branches/dev/ZBX-4104

Comment by richlv [ 2011 Sep 21 ]

timeouts for scripts on windows also mentioned at ZBX-4156

Comment by Rudolfs Kreicbergs [ 2011 Sep 28 ]

Fixed in pre-1.8.9 r21989 and pre-1.9.7 r21990.

Generated at Thu May 02 01:06:41 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.