[ZBX-23941] Issue with TLS PSK connection from server to passive agents Created: 2024 Jan 14 Updated: 2025 May 15 Resolved: 2024 Feb 23 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G), Server (S) |
Affects Version/s: | 7.0.0alpha9 |
Fix Version/s: | 7.0.0beta2, 7.0 (plan) |
Type: | Problem report | Priority: | Trivial |
Reporter: | Andrii Malyi | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix 7.0.0.a9 on RHEL8.9, DB: PG15 with TSDB 2.13 |
Team: | |||||
Sprint: | Sprint candidates, S24-W6/7 | ||||
Story Points: | 0.25 |
Description |
We have a strange intermittent issue with TLS PSK connectivity between the server and agents. Agent configuration:

TLSConnect=psk
TLSAccept=psk
TLSPSKIdentity=******-PSK-IDENTITY-*****
TLSPSKFile=/etc/zabbix/zabbix_agentd.psk

From time to time, we find the following errors in the Zabbix server log:

2443:20240114:050602.443 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.443 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init
2443:20240114:050602.445 SSL_shutdown() with 10.0.9.29 set result code to 1: file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init

(10.0.9.29 is one of the Zabbix agents.)

Here are the corresponding records on the agent side:

7528:20240114:051025.852 failed to accept an incoming connection: from 10.xx.xx.xx: unspecified certificate verification error: TLS handshake set result code to 5:
6800:20240114:063734.937 failed to accept an incoming connection: from 10.xx.xx.xx: unspecified certificate verification error: TLS handshake set result code to 5:
152:20240114:063734.943 failed to accept an incoming connection: from 10.xx.xx.xx: unspecified certificate verification error: TLS handshake set result code to 5:

(10.xx.xx.xx is the IP of the Zabbix server.)

This issue sometimes appears on Linux hosts, but it occurs most often on Windows servers. The problem is intermittent: errors can be logged for anywhere from several minutes to tens of minutes. The data still seems to be collected, but since I have seen several crashes in the server log, I cannot say for sure whether this is related to the server crashes or not. |
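To judge how bursty these warnings are, the server log can be summarised per agent and per minute. The following is only a minimal sketch: the regular expression is derived from the log lines quoted in the description, and the function name `count_shutdown_errors` is hypothetical, not part of Zabbix.

```python
import re
from collections import Counter

# Matches the server-side lines quoted in the description:
# "<pid>:<YYYYMMDD>:<HHMMSS.mmm> SSL_shutdown() with <peer> set result code to ..."
SHUTDOWN_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"SSL_shutdown\(\) with (?P<peer>\S+) set result code to"
)


def count_shutdown_errors(lines):
    """Count SSL_shutdown() warnings per (peer address, minute)."""
    counts = Counter()
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            hhmm = match["time"][:2] + ":" + match["time"][2:4]
            counts[(match["peer"], match["date"] + " " + hhmm)] += 1
    return counts


# Two server lines and one agent line copied from the description.
sample = [
    "2443:20240114:050602.443 SSL_shutdown() with 10.0.9.29 set result code to 1: "
    "file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init",
    "2443:20240114:050602.444 SSL_shutdown() with 10.0.9.29 set result code to 1: "
    "file ssl/ssl_lib.c line 2094: error:140E0197:SSL routines:SSL_shutdown:shut down while in init",
    "7528:20240114:051025.852 failed to accept an incoming connection: from 10.xx.xx.xx: "
    "unspecified certificate verification error: TLS handshake set result code to 5:",
]

for (peer, minute), n in count_shutdown_errors(sample).items():
    print(peer, minute, n)
```

Feeding the whole server log file to the function instead of `sample` would show whether the warnings cluster around particular agents or times of day.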
Comments |
Comment by Vladislavs Sokurenko [ 2024 Jan 15 ] |
Thank you for your report. A debug log covering the time when the problem occurred would give us more information. It is also possible that the problem is fixed in this commit: |
Comment by Andrii Malyi [ 2024 Jan 17 ] |
I greatly appreciate your support and the quick response to my ticket. Thank you! |
Comment by Vladislavs Sokurenko [ 2024 Jan 17 ] |
It will be included in alpha10. Please upgrade when it is available and let us know if the issue persists. |
Comment by Andrii Malyi [ 2024 Jan 17 ] |
Thank you! |
Comment by Andrii Malyi [ 2024 Feb 01 ] |
Hello team! Unfortunately, I have upgraded to beta1, but the issue is still there. Could you please provide me with an email address to which I can send the debug logs that I have collected? |
Comment by Vladislavs Sokurenko [ 2024 Feb 02 ] |
It appears that the issue occurs when there is a timeout during the TLS handshake, and we should suppress the warning in that case. Are you using Zabbix agent or Zabbix agent2? Is it possible that the agents don't have enough listeners or capacity to handle all requests?

20240116:171718.284 In agent_task_process() step 'connect' event:4 itemid:50439
573098:20240116:171722.209 In agent_task_process() step 'tls' event:1 itemid:50439

Please check how many pollers there were before the upgrade and set MaxConcurrentChecksPerPoller to the same value, or increase the StartAgents count so that the agents can process all checks without timing out. |
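The gap between the 'connect' and 'tls' steps in the two debug lines can be worked out from their timestamps. This is a small sketch rather than anything taken from the Zabbix sources; the helper `parse_stamp` is hypothetical, and the 3-second timeout is an assumption used only for comparison, so check the Timeout value actually configured.

```python
from datetime import datetime

# Assumption for illustration only: a 3-second timeout.
ASSUMED_TIMEOUT_S = 3.0


def parse_stamp(stamp):
    """Parse a 'YYYYMMDD:HHMMSS.mmm' timestamp as it appears in the log lines."""
    return datetime.strptime(stamp, "%Y%m%d:%H%M%S.%f")


# Timestamps copied from the two agent_task_process() lines for itemid 50439.
connect = parse_stamp("20240116:171718.284")
tls = parse_stamp("20240116:171722.209")

delay = (tls - connect).total_seconds()
print(f"{delay:.3f} s between 'connect' and 'tls'")
print("exceeds assumed timeout:", delay > ASSUMED_TIMEOUT_S)
```

Under that assumption, the handshake for this item started only after the timeout had already been exceeded, which is consistent with a busy agent accepting the connection late.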
Comment by Andrii Malyi [ 2024 Feb 02 ] |
I'd hazard a guess that it is not related to the poller parameters, unless their default values are not enough to start with.

zabbix_server.conf:
#StartPollers=5
#StartAgentPollers=1
#StartHTTPAgentPollers=1
#StartSNMPPollers=1
#MaxConcurrentChecksPerPoller=1000
#StartIPMIPollers=0
#StartPollersUnreachable=1
#StartHistoryPollers=5
#StartHTTPPollers=1
#StartJavaPollers=0
#StartProxyPollers=1
#StartODBCPollers=1

On the other hand, the agents have default settings as well, so they start with 3 listeners.

Starting Zabbix Agent [******01]. Zabbix 6.4.10 (revision 4da16fb82f5).

From one of the zabbix_agent.conf files:
#StartAgents=3

As I mentioned, it is a fresh installation with up to 20 agents. Currently, we have 2 Win2022, 4 Win2016, ~5 AIX, and ~5 RHEL8 servers. Nothing unusual... This issue appears most often on Windows, but I also find TLS errors on a Linux host:

3227939:20240119:193024.585 Zabbix Agent stopped. Zabbix 6.4.10 (revision 4da16fb82f5). |
Comment by Vladislavs Sokurenko [ 2024 Feb 02 ] |
It appears on Windows because some checks don't return data for long periods of time and thus prevent other checks from being processed. |
Comment by Vladislavs Sokurenko [ 2024 Feb 08 ] |
(1) It should be documented that increasing StartAgents is recommended if there is either a high queue or network errors during passive checks, as this may indicate that checks are slow and the Zabbix agent is too busy. |
Comment by Andrii Malyi [ 2024 Feb 08 ] |
Hi team, just for information: I'm unable to increase the MaxConcurrentChecksPerPoller parameter, because its default value of 1000 is already the maximum. |
Comment by Vladislavs Sokurenko [ 2024 Feb 09 ] |
Please increase StartAgents on Zabbix agent |
Comment by Andrii Malyi [ 2024 Feb 09 ] |
On one of the Windows hosts, the StartAgents parameter is set to 8, but it did not help; the errors continue to occur. Should the StartAgents parameter be increased further? |
Comment by Vladislavs Sokurenko [ 2024 Feb 09 ] |
Yes, 8 might not be enough to process 300 metrics in time |
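A rough way to reason about how many listeners 300 metrics might need is to estimate the average number of checks in flight. The sketch below is a back-of-the-envelope estimate, not Zabbix's own sizing rule: the function `listeners_needed`, the 60-second interval, the per-check durations, and the 50% target utilization are all assumptions; only the figure of 300 metrics comes from this thread.

```python
import math


def listeners_needed(items, interval_s, avg_check_s, target_utilization=0.5):
    """Estimate the number of agent listeners for passive checks.

    items * avg_check_s / interval_s is the average number of checks in
    progress at once; dividing by the target utilization leaves headroom
    for bursts and slow checks.
    """
    checks_per_second = items / interval_s
    busy_listeners = checks_per_second * avg_check_s
    return math.ceil(busy_listeners / target_utilization)


# 300 metrics (from the thread), polled every 60 s (assumed).
print(listeners_needed(300, 60, 0.1))  # fast checks (assumed 0.1 s each)
print(listeners_needed(300, 60, 2.0))  # slow checks (assumed 2 s each)
```

The point of the estimate is that the required listener count scales with how long each check takes, so a handful of slow checks can exhaust a small StartAgents value even when the total number of metrics looks modest.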
Comment by Andrii Malyi [ 2024 Feb 09 ] |
Wow. Increased to 15. |
Comment by Andrii Malyi [ 2024 Feb 09 ] |
Agent log:
18104:20240209:102253.193 Zabbix Agent received stop request.
// 10.xx.xx.xx - IP of the Zabbix server

Server log:
1283925:20240209:102258.236 Zabbix agent item "perf_counter_en["\PhysicalDisk(3)\Avg. Disk Write Queue Length",60]" on host "ts-rdho-06"
// 10.x.x.x - IP of ts-rdho-06

We also tried StartAgents=25, but the result is the same and the issue persists. Reverted to StartAgents=15. |
Comment by Vladislavs Sokurenko [ 2024 Feb 19 ] |
Fixed in:
|