[ZBX-8839] Java gateway keeps connections without using any timeout Created: 2014 Sep 30 Updated: 2017 May 30 Resolved: 2015 Mar 23 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Java gateway (J) |
Affects Version/s: | 2.2.5 |
Fix Version/s: | 2.0.15rc1, 2.2.10rc1, 2.4.5rc1, 2.5.0 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Andrei Gushchin (Inactive) | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 2 |
Labels: | jmx, timeout | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
Probably case
So I think in such cases java_gateway should use a timeout (we should probably add a new parameter to the settings).
Comments |
Comment by Aleksandrs Saveljevs [ 2014 Oct 01 ]
This might be an explanation for
Comment by Aleksandrs Saveljevs [ 2015 Jan 10 ]
Research shows that there seem to be four ways to approach the problem:

(a) Try using "sun.rmi" properties documented at http://docs.oracle.com/javase/7/docs/technotes/guides/rmi/sunrmiproperties.html . For instance:
$ java -Dsun.rmi.transport.tcp.responseTimeout=1000 Test
However, testing shows that none of the timeout properties on the page have any effect on the JMX connection timeout. Still, they might be used for specifying timeouts on an already established connection.

(b) Try putting properties into the environment for the call to JMXConnectorFactory.connect(url, env):
env.put("jmx.remote.x.request.waiting.timeout", new Long(1000));
However, according to https://community.oracle.com/thread/1176791 , this "property only applies to already-established connections, and only with the JMXMP connector (not the RMI connector)". Therefore, this is not a universal solution.

(c) Try replacing the RMI socket factory with a custom one, see http://stackoverflow.com/questions/1822695/java-rmi-client-timeout . Considerations regarding this approach are described in http://dev.clojure.org/jira/browse/JMX-5 . This solution seems to work in the current versions of Java gateway, because we only support URLs of the form "service:jmx:rmi:///jndi/rmi://<conn>:<port>/jmxrmi". However, if later in

(d) Try using a separate thread for making connections:
While this solution does not seem to be trivial, it seems to be used in practice.
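For illustration, approach (d) can be sketched roughly as follows. This is a minimal sketch, not the code from the development branch: the class name TimedJmxConnect, the helper callWithTimeout and the shared executor are hypothetical, and Java 8 lambdas are assumed for brevity. The connect call runs in a worker thread while the caller waits on a Future for a bounded time.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, for illustration only.
final class TimedJmxConnect {
    // Daemon threads, so that a connect attempt that is still blocked
    // does not prevent the JVM from exiting.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Runs the task in a worker thread and waits at most timeoutMillis for its result.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws IOException {
        Future<T> future = EXECUTOR.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests an interrupt; a thread blocked in a socket connect may
            // still stay alive until the OS-level connect timeout expires.
            future.cancel(true);
            throw new IOException("operation timed out after " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IOException("interrupted while waiting for the operation", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("operation failed", cause);
        }
    }

    // Connects to a JMX endpoint, giving up after timeoutMillis.
    static JMXConnector connect(JMXServiceURL url, Map<String, ?> env, long timeoutMillis)
            throws IOException {
        return callWithTimeout(() -> JMXConnectorFactory.connect(url, env), timeoutMillis);
    }
}
```

Note that the caller stops waiting after the timeout, but the worker thread may keep running in the background. A connector that is established after the caller has given up would also need to be closed, which this sketch does not do.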
Comment by Aleksandrs Saveljevs [ 2015 Jan 20 ]
Development branch svn://svn.zabbix.com/branches/dev/ZBX-8839 contains a prototype solution. This comment describes its current state, considerations and ways it can be improved.

Let us start with the fact that there were two ways to approach the timeout problem. One would be to set a timeout for the whole JMX-value-getting operation, which includes connecting, querying the objects, etc. This approach has not really been considered. Instead, another approach that would set a timeout for each network operation was chosen. This is similar to our SNMP implementation, where we set "session.timeout" for each attempt, but there can be multiple attempts due to bulk retries, checking cached indices, etc.

The implemented solution introduces two kinds of timeouts: (a) a connect operation timeout based on https://weblogs.java.net/blog/emcmanus/archive/2007/05/making_a_jmx_co.html , where each connect operation is done in a separate thread, and (b) a read operation timeout based on "sun.rmi.transport.tcp.responseTimeout". This might solve the problem in
However, there is one consideration mentioned on the Java blog:
The Zabbix case is the first case from the quote above. Indeed, suppose we set UnavailableDelay to 15 seconds in the server configuration file. The connect operation timeout by default is over 2 minutes, so about 8 connection threads in the gateway will be alive at any given time for any unavailable JMX host. If, say, we are monitoring 100 unavailable JMX hosts, then that will be 800 connection threads, which is not very inspiring.

Continuing the above, another consideration mentioned at http://dev.bizo.com/2014/06/cached-thread-pool-considered-harmlful.html is that Executors.newCachedThreadPool(), which is used in the current implementation, is unbounded. Therefore, a malicious attacker can create quite a number of connection threads in the gateway.

Yet another minor consideration is that currently threads created by Executors.newCachedThreadPool() with our DaemonThreadFactory have names "pool-2-thread-1", "pool-3-thread-1", "pool-4-thread-1", "pool-5-thread-1", etc., as opposed to our main threads "pool-1-thread-1", "pool-1-thread-2", "pool-1-thread-3", "pool-1-thread-4", etc. Once the executor implementation is finalized, this should be improved.
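As a point of comparison for the unbounded pool concern, here is a minimal sketch of what a bounded variant could look like. It is not what the branch implements: the class name BoundedConnectPool and the maxThreads parameter are hypothetical. Executors.newCachedThreadPool() is a ThreadPoolExecutor with a maximum size of Integer.MAX_VALUE and a SynchronousQueue; lowering the maximum makes the pool reject new tasks once all threads are busy.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for a capped connection thread pool, for illustration only.
final class BoundedConnectPool {
    // Same shape as a cached thread pool, but with at most maxThreads threads.
    // When every thread is busy, execute() throws RejectedExecutionException.
    static ThreadPoolExecutor create(int maxThreads) {
        return new ThreadPoolExecutor(
                0, maxThreads,                 // no core threads, hard upper limit
                60L, TimeUnit.SECONDS,         // idle threads are reclaimed after 60 s
                new SynchronousQueue<Runnable>(),
                r -> {
                    Thread t = new Thread(r);
                    t.setDaemon(true);         // do not keep the JVM alive
                    return t;
                },
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

With such a pool, a caller would have to treat a rejected task as a failed connection attempt, which is a design trade-off rather than a pure improvement.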
Comment by dimir [ 2015 Feb 02 ]
Successfully tested!
Comment by Aleksandrs Saveljevs [ 2015 Mar 04 ]
It is important to address issues described in my comment above. Therefore, reopening to improve the implementation.
Comment by Aleksandrs Saveljevs [ 2015 Mar 05 ]
Commits 52510 and 52514 fix the names of threads created by our DaemonThreadFactory. Looking at the Java source code in the JDK installation, Executors.defaultThreadFactory() returns a new factory each time it is called. That is why threads were previously named "pool-2-thread-1", "pool-3-thread-1", "pool-4-thread-1", etc. They should now be named "pool-2-thread-1", "pool-2-thread-2", "pool-2-thread-3", and so on. This is useful, because we now have two thread pools and we should be able to easily distinguish between their threads: one pool for pollers, instantiated in JavaGateway.java:70, with thread names beginning with "pool-1", and another for connection threads, created in ZabbixJMXConnectorFactory.java:44, with thread names beginning with "pool-2".
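The naming behaviour described above can be illustrated with a minimal sketch. This is not the code from commits 52510 and 52514; the class name DaemonThreadFactorySketch is hypothetical. The point is that the default factory is obtained once and reused, so all threads share one "pool-N" prefix and only the thread counter grows.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical daemon thread factory, for illustration only.
final class DaemonThreadFactorySketch implements ThreadFactory {
    // Executors.defaultThreadFactory() returns a new factory with a new
    // "pool-N" prefix on every call, so it is called once and kept here.
    private final ThreadFactory delegate = Executors.defaultThreadFactory();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        t.setDaemon(true);
        return t;
    }
}
```

Calling Executors.defaultThreadFactory() inside newThread() instead would reproduce the earlier behaviour, where every thread gets a different pool number and the suffix "thread-1".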
Comment by Aleksandrs Saveljevs [ 2015 Mar 05 ]
It should now be tested how heavyweight connection threads in the second pool are. They just wait for a connection to be established and do not consume any CPU. Note that a malicious attacker cannot create an arbitrary number of threads. They can only create a maximum of around START_POLLERS * ("2 minutes 7 seconds" / TIMEOUT), where "2 minutes 7 seconds" is the hardcoded JMX connection timeout. With a default setting of TIMEOUT="3 seconds", this makes it "2 minutes 7 seconds" / "3 seconds" = 42.33 connection threads per poller. It should be tested whether this is acceptable, or whether we should impose a different, lower limit.
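The estimate above can be worked through in a few lines. This is only a sketch of the arithmetic: the class and method names are hypothetical, and START_POLLERS=25 is an example value, not a default.

```java
import java.util.Locale;

// Hypothetical helper that evaluates START_POLLERS * (connect timeout / TIMEOUT).
final class ConnectionThreadBound {
    // Connection threads that one poller can leave behind while a connect attempt is pending.
    static double threadsPerPoller(double connectTimeoutSeconds, double pollerTimeoutSeconds) {
        return connectTimeoutSeconds / pollerTimeoutSeconds;
    }

    // Approximate upper bound on connection threads for the whole gateway, rounded up.
    static long maxThreads(int startPollers, double connectTimeoutSeconds, double pollerTimeoutSeconds) {
        return (long) Math.ceil(startPollers * threadsPerPoller(connectTimeoutSeconds, pollerTimeoutSeconds));
    }

    public static void main(String[] args) {
        double connectTimeout = 2 * 60 + 7; // "2 minutes 7 seconds" = 127 seconds
        double timeout = 3;                 // TIMEOUT="3 seconds"
        // prints 42.33
        System.out.println(String.format(Locale.ROOT, "%.2f", threadsPerPoller(connectTimeout, timeout)));
        // prints 1059 for the example value START_POLLERS=25
        System.out.println(maxThreads(25, connectTimeout, timeout));
    }
}
```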
Comment by dimir [ 2015 Mar 11 ]
I've done some tests of Zabbix server working with an unreachable JMX interface on a usual workstation (Pentium Dual-Core E5400 2.70 GHz, 4 GB RAM).
Zabbix server settings:
UnavailableDelay=3
UnreachablePeriod=3
UnreachableDelay=3
StartPollersUnreachable=25
Java gateway settings:
START_POLLERS=25
The TCP connection timeout on my OS is 1 minute 9 seconds. Here are the results I got:
So I conclude that this solution is acceptable.
Comment by Aleksandrs Saveljevs [ 2015 Mar 11 ]
(1) Also in this fix we shall replace the legacy, synchronized Vector with a non-synchronized ArrayList.
<dimir> Looks great. Please review my small change in r52720.
asaveljevs CLOSED
Comment by Aleksandrs Saveljevs [ 2015 Mar 13 ]
New timeout configuration option for Java gateway is available in pre-2.0.15 r52723, pre-2.2.10 r52724, pre-2.4.5 r52725, and pre-2.5.0 (trunk) r52726.
Comment by Aleksandrs Saveljevs [ 2015 Mar 13 ]
Documented at the following locations:
sasha CLOSED
Comment by Oleksii Zagorskyi [ 2015 Dec 03 ]
There was a mistake in the official Zabbix packages, and this new parameter was not actually applied.