[ZBX-17303] timeout in "net.tcp.service" blocks agent Created: 2020 Feb 11 Updated: 2020 Mar 23 Resolved: 2020 Mar 23 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | None |
Affects Version/s: | 4.0.16 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Onlyjob | Assignee: | Aigars Kadikis |
Resolution: | Cannot Reproduce | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Debian |
Description |
I'm investigating regular gaps (5 to 10 minutes long) in all graphs for a particular VM (passive agent, standard "Template OS Linux" template plus a few custom checks). Here is what I've found in the Zabbix server log, repeatedly logged:

The interval for the check is 300s, and the problem appears to be caused by the firewall dropping connections:

```
real 2m11.103s
```

The problem is that a timeout in "net.tcp.service" makes the Zabbix agent unresponsive, which affects all other checks. The problem is exacerbated when more than one "net.tcp.service" check is timing out. I've isolated the problem by disabling the problematic "net.tcp.service" item, which instantly restored the graphs to normal. |
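For illustration only, the behaviour described above can be approximated outside Zabbix with a plain TCP connect that has an explicit timeout. This is a minimal sketch, not the agent's implementation: the function name `tcp_service_check` is hypothetical, and it only assumes that a check returns 1 when the connection succeeds and 0 otherwise. It also reports how long the caller was blocked, which is the quantity that matters when a firewall silently drops packets.

```python
import socket
import time
from typing import Tuple


def tcp_service_check(host: str, port: int, timeout: float = 3.0) -> Tuple[int, float]:
    """Hypothetical stand-in for a TCP service check.

    Returns (status, elapsed_seconds): status is 1 if the TCP connection
    was established within `timeout`, otherwise 0.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            status = 1
    except OSError:
        status = 0
    return status, time.monotonic() - start
```

When the target drops packets instead of refusing the connection, the caller is blocked for roughly the full `timeout`, so the elapsed time is bounded by that value rather than by the operating system's much longer default connect timeout.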
Comments |
Comment by Aigars Kadikis [ 2020 Feb 13 ] |
If the endpoint is not reachable, there is not much that can be done. But to improve agent performance you can try:
|
Comment by Onlyjob [ 2020 Feb 14 ] |
Thanks for the suggestions. |
Comment by Aigars Kadikis [ 2020 Feb 27 ] |
If you are not running long shell commands, I really suggest using a small timeout; the default is 'Timeout=3'.
If you have 3+ hosts which are not reachable, it will totally block the agent for 22 seconds in every 5 minutes. Doing the math for 3 pre-forked agents, Timeout=22 and host checks every 300s, I think 14+ unreachable hosts will be the threshold which blocks the agent's functionality. This is by design. Use a smaller 'Timeout='. |
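The arithmetic in the comment above can be sketched as follows. This is a simplified model and an assumption, not a description of how the agent schedules work: it supposes that every timing-out check occupies one agent process for the full timeout, and that checks are spread evenly across processes. The function names are hypothetical.

```python
import math


def blocked_seconds(unreachable_checks: int, timeout_s: float) -> float:
    """Total process-seconds spent waiting on timeouts in one check interval."""
    return unreachable_checks * timeout_s


def saturation_threshold(agents: int, timeout_s: float, interval_s: float) -> int:
    """Smallest number of timing-out checks whose combined wait time
    reaches the capacity of `agents` processes over one interval."""
    return math.ceil(agents * interval_s / timeout_s)


# Figures from the comment: Timeout=22, checks every 300s.
print(saturation_threshold(1, 22, 300))  # 14: one process is busy for the whole interval
print(saturation_threshold(3, 22, 300))  # 41: three processes, under the even-spread assumption
print(blocked_seconds(3, 22))            # 66: process-seconds lost to three unreachable hosts
```

Under this model, the figure of 14 corresponds to a single process (300 / 22 is about 13.6). Whether the same threshold applies to three processes depends on how the agent distributes incoming checks, which the sketch does not attempt to reproduce.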
Comment by Onlyjob [ 2020 Feb 28 ] |
It is not desirable to use a smaller timeout, as we might introduce a command (UserParameter). For now, just one unreachable host blocks enough to disrupt the regular data flow with 8 pre-forked agents. Does it not look like a bug to you? |
Comment by Aigars Kadikis [ 2020 Mar 02 ] |
It looks like a bug; I will try to reproduce it. Please attach a screenshot of the full item list on the host doing the checking, plus zabbix_agentd.conf. |
Comment by Aigars Kadikis [ 2020 Mar 23 ] |
Closing due to missing details on how to reproduce the issue. |
Comment by Onlyjob [ 2020 Mar 23 ] |
This is quite unfair and disappointing. I have provided enough details to reproduce the problem. Yes, I could not follow up in time to add a rather irrelevant comment about the needless additional information requested, but you could at least have tried to reproduce it, couldn't you? The agent is configured with a mostly default config. The host is monitored using the standard Linux_OS template plus 4 custom 'net.tcp.service' checks. That's pretty much it. There is no need to provide the full list of checks, as I have already reliably isolated the problem to a single 'net.tcp.service' item. |
Comment by dimir [ 2020 Mar 23 ] |
You could provide what was asked for and re-open the ticket. |
Comment by Onlyjob [ 2020 Mar 23 ] |
All information to reproduce the problem is already provided, as per original report and my previous comments. |