[ZBX-18777] Occasional unspecified certificate verification error with PSK on Windows Server 2019 Created: 2020 Dec 16  Updated: 2025 Mar 20  Resolved: 2020 Dec 18

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 5.0.6
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Markku Leiniö Assignee: Aleksandrs Pahomovs
Resolution: Won't fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Server: Zabbix server 5.0.6 on Debian Linux 10 (Buster)
Active agents: Agent 5.0.6 on Windows Server 2019


Attachments: Text File failing-anon.txt     PNG File image-2020-12-17-17-16-17-404.png     PNG File image-2020-12-17-17-16-19-904.png     PNG File image-2020-12-17-17-16-24-011.png    
Issue Links:
Duplicate

 Description   

Steps to reproduce:

  1. Upgraded both server and agents from 4.4.x to 5.0.6
  2. Using active agent checks and TLS with PSK, on Windows servers

Result:

Server logs occasionally:

10377:20201216:125835.926 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:125835.928 failed to accept an incoming connection: from 10.22.33.10: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:125907.068 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:

At the same time client (10.11.22.33 agent above) logs:

12628:20201216:125833.259 active check data upload to [zabbix-server-ip:10051] started to fail ([connect] TCP successful, cannot establish TLS to [[zabbix-server-ip]:10051]: SSL_connect() timed out)
12628:20201216:125835.938 active check data upload to [zabbix-server-ip:10051] is working again

Expected:
No messages in the logs when agents connect and send data to server.

Other information:

Initially we had server 4.4.10 on Debian Linux 9 (Stretch) and agents 4.4.x, and we didn't have those errors.

Then we first changed the server to a new one with Debian Linux 10 (Buster) with server 4.4.10 (new installation, copied the configurations), and that's when the error messages started.

We then upgraded both server and agents to 5.0.6, but the occasional errors continued. There are fewer errors with 5.0.6, though.

A notable detail is that agents on Linux, Windows 10, or Windows Server 2016 do not cause these errors (agents are 4.0.x, 4.4.x or 5.0.6).

Debian 9 server (old server with no problems) openssl version: OpenSSL 1.1.0l 10 Sep 2019

Debian 10 server (current) openssl version: OpenSSL 1.1.1d 10 Sep 2019

Agent TLS configuration:

  • TLSConnect=psk
  • TLSAccept=psk
  • TLSPSKIdentity=XXX
  • TLSPSKFile=C:\Program Files\Zabbix Agent\psk.key
  • Agents installed with MSI packages from zabbix.com


 Comments   
Comment by Markku Leiniö [ 2020 Dec 16 ]

Additional information: The servers, as well as the Linux agents, have been installed from the official Zabbix repo using the supplied dpkg files and instructions.

Comment by Markku Leiniö [ 2020 Dec 16 ]

Update: I only now realized that since upgrading from 4.4.10 to 5.0.6 we only get these errors from Windows Server 2019 agents that are behind firewalls. The only Win2019 agent we have in the same subnet as the Zabbix server stopped showing the errors when the agent was upgraded from 4.0.27 to 5.0.6. (Later update: This is not so simple after all: just received an error from that agent as well.)

Let me know if you have specific hints on how to troubleshoot this further.

Comment by Markku Leiniö [ 2020 Dec 17 ]

To get an idea of the recurrence pattern at the moment (timestamps are EET):

10375:20201216:182815.626 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:182815.631 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:182815.634 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10376:20201216:183017.028 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.684 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.684 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.685 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.688 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.689 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:183022.689 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184001.667 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10376:20201216:184516.675 failed to accept an incoming connection: from 10.22.33.10: unspecified certificate verification error: TLS handshake set result code to 5:
10376:20201216:184516.677 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184526.211 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184526.215 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184526.216 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184526.217 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10374:20201216:184526.217 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error: TLS handshake set result code to 5:
10373:20201216:190913.360 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:
10375:20201216:192325.347 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error: TLS handshake set result code to 5:
10377:20201217:094452.126 failed to accept an incoming connection: from 10.11.22.33: unspecified certificate verification error: TLS handshake set result code to 5:

while all hosts have several items with 1 minute interval (the usual Windows metrics like CPU, disk and network-related).
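The recurrence pattern in the log lines above can be summarized with a small script. This is only a sketch: it assumes the server log format is exactly as quoted here (PID, date, time with milliseconds, then the message), and the function name `summarize_failures` and the `SAMPLE_LINES` list are illustrative names, not part of Zabbix.

```python
import re
from collections import Counter

# Matches the server log lines quoted in this ticket:
# PID:YYYYMMDD:HHMMSS.mmm failed to accept an incoming connection: from IP: ...
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.(?P<ms>\d{3}) "
    r"failed to accept an incoming connection: from (?P<ip>[\d.]+):"
)


def summarize_failures(lines):
    """Count failed-accept messages per agent IP and per (date, HHMM) minute."""
    per_ip = Counter()
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a failed-accept line, skip it
        per_ip[m["ip"]] += 1
        per_minute[(m["date"], m["time"][:4])] += 1
    return per_ip, per_minute


# A few lines copied from the log excerpt above (message tail shortened).
SAMPLE_LINES = [
    "10375:20201216:182815.626 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error",
    "10375:20201216:182815.631 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error",
    "10375:20201216:182815.634 failed to accept an incoming connection: from 10.33.33.8: unspecified certificate verification error",
    "10376:20201216:183017.028 failed to accept an incoming connection: from 10.44.55.7: unspecified certificate verification error",
]

if __name__ == "__main__":
    per_ip, per_minute = summarize_failures(SAMPLE_LINES)
    for ip, count in per_ip.most_common():
        print(ip, count)
    for (date, minute), count in sorted(per_minute.items()):
        print(date, minute, count)
```

Grouping by minute makes the bursts visible (several agents failing within the same few milliseconds), which hints at a server-side stall rather than a per-agent problem.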

Comment by Aleksandrs Pahomovs [ 2020 Dec 17 ]

Hello,

Could you please try the same setup without encryption? Encryption needs to be either ruled out or confirmed as the cause.

Comment by Markku Leiniö [ 2020 Dec 17 ]

Hi, ok, I will. But first, here is one error case:

10376:20201217:120429.472 failed to accept an incoming connection: from 192.168.0.1: unspecified certificate verification error: TLS handshake set result code to 5:

Attached is a pcap export captured on the Zabbix server side that shows what this looks like. 192.168.0.1 = agent on Windows 2019, 10.10.10.1 = Zabbix server on Debian 10

  • packets 1-5 look normal (agent sends Client Hello and server ACKs it)
  • after that Zabbix server should respond to the TLS handshake but it does not
  • after waiting for 3 seconds the agent decides to FIN the connection at #6 (and it logs: "7892:20201217:120427.836 active check configuration update from [10.10.10.1:10051] started to fail (TCP successful, cannot establish TLS to [[10.10.10.1]:10051]: SSL_connect() timed out)")
  • server ACKs the FIN at #8 but still sends Server Hello at #9 (this does not make any sense)

We have some 20-30 other Zabbix agents as well (most on Linux), all active, and only the connections from Windows Server 2019 agents show this occasional behaviour.

Edited the text above: actually we only use TLS with selected agents, and those happen to be Windows-only. So we don't currently have data about the TLS PSK behaviour from Linux agents in this case.

Comment by Markku Leiniö [ 2020 Dec 17 ]

So to conclude, there are no IP connectivity errors (as shown by the client logs as well, "TCP successful"), just TLS problems. Disabling TLS would not actually bring us closer to a solution.

Comment by Aleksandrs Pahomovs [ 2020 Dec 17 ]

Do you use zabbix agent version 1 or 2?

Comment by Markku Leiniö [ 2020 Dec 17 ]

These are v1 agents only.

Comment by Aleksandrs Pahomovs [ 2020 Dec 17 ]

Could you please check your PSK key size? Is it 512 bits or more?
https://www.zabbix.com/documentation/5.0/manual/encryption/using_pre_shared_keys

Comment by Markku Leiniö [ 2020 Dec 17 ]

PSK size was originally 64 hex characters = 32 bytes = 256 bits, but as part of troubleshooting I reduced it to 62 characters = 31 bytes = 248 bits (this had no effect as far as I noticed).
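The size arithmetic above (two hex characters per byte, eight bits per byte) can be checked with a few lines of Python. This is a minimal sketch; `psk_bits` is a hypothetical helper name, and the keys used here are placeholders, not real PSKs.

```python
def psk_bits(hex_key: str) -> int:
    """Return the key length in bits for a PSK written as hexadecimal digits."""
    # bytes.fromhex raises ValueError for an odd number of digits
    # or for characters that are not hexadecimal.
    return len(bytes.fromhex(hex_key.strip())) * 8


if __name__ == "__main__":
    print(psk_bits("ab" * 32))  # 64 hex characters -> 256
    print(psk_bits("ab" * 31))  # 62 hex characters -> 248
```

To check a real key, the contents of the file named in TLSPSKFile could be passed to the same function.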

Comment by Aleksandrs Pahomovs [ 2020 Dec 17 ]

Unfortunately, I can't reproduce your issue.
Could you please provide server and agent config files?
Since you mentioned that the problem happens from time to time, I will leave my test environment under load for a while. I'll come back with the results tomorrow.
My test environment:

root@debian:/home/zabbix# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@debian:/home/zabbix# zabbix_server -V
zabbix_server (Zabbix) 5.0.6
Revision 93895db26b 30 November 2020, compilation time: Nov 30 2020 08:11:40

Copyright (C) 2020 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1d  10 Sep 2019
Running with OpenSSL 1.1.1d  10 Sep 2019
C:\Users\Administrator>systeminfo

Host Name:                 WIN-D2P4GAJ25MJ
OS Name:                   Microsoft Windows Server 2019 Essentials
OS Version:                10.0.17763 N/A Build 17763
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Server
OS Build Type:             Multiprocessor Free
Registered Owner:          Windows User
C:\Users\Administrator>zabbix_agentd -V
zabbix_agentd Win64 (service) (Zabbix) 5.0.6
Revision 93895db26b 30 November 2020, compilation time: Nov 30 2020 16:06:48

Copyright (C) 2020 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1g  21 Apr 2020
Running with OpenSSL 1.1.1g  21 Apr 2020

Comment by Markku Leiniö [ 2020 Dec 17 ]

I rebooted the server again about three hours ago, and I haven't got errors since. Let's see...

Server:

LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=x.x.x.x
DBName=zabbix
DBUser=zabbix
DBPassword=xxx
StartPollersUnreachable=20
StartPingers=20
StartVMwareCollectors=3
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
CacheSize=64M
HistoryCacheSize=64M
TrendCacheSize=32M
ValueCacheSize=128M
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1

Agent (config created by the MSI installer, no files in conf.d):

LogFile=C:\Program Files\Zabbix Agent\zabbix_agentd.log
Server=x.x.x.x
ServerActive=x.x.x.x
Hostname=xxx
Include=C:\Program Files\Zabbix Agent\zabbix_agentd.conf.d\
TLSConnect=psk
TLSAccept=psk
TLSPSKIdentity=xxx
TLSPSKFile=C:\Program Files\Zabbix Agent\psk.key

I appreciate your attention on this.

Comment by Markku Leiniö [ 2020 Dec 18 ]

To let you know: There haven't been any TLS errors yet since I rebooted the Zabbix server.

Also, I only now found out that these had been happening as well, with active Linux agents (with no TLS):

 593:20201216:190911.181 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)
 593:20201216:190913.525 active check data upload to [zabbix-server-ip:10051] is working again
 593:20201217:094451.199 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)
 593:20201217:094452.199 active check data upload to [zabbix-server-ip:10051] is working again
 593:20201217:132737.596 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)
 593:20201217:132742.776 active check data upload to [zabbix-server-ip:10051] is working again

Those started exactly when we changed the server (still with 4.4.10 server), and continued after the server was upgraded to 5.0.6.
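To see how long each interruption lasted, the "started to fail" and "is working again" lines in the agent log can be paired up. The following is a rough sketch that assumes the agent log format shown above; `outage_durations` and `AGENT_LINES` are illustrative names only.

```python
import re
from datetime import datetime

# PID:YYYYMMDD:HHMMSS.mmm message
STAMP_RE = re.compile(r"^\s*\d+:(\d{8}:\d{6}\.\d{3}) (.*)$")


def outage_durations(lines):
    """Pair each 'started to fail' line with the next 'is working again'
    line and return the outage lengths in seconds."""
    durations = []
    failed_at = None
    for line in lines:
        m = STAMP_RE.match(line)
        if m is None:
            continue
        when = datetime.strptime(m.group(1), "%Y%m%d:%H%M%S.%f")
        message = m.group(2)
        if "started to fail" in message:
            failed_at = when
        elif "is working again" in message and failed_at is not None:
            durations.append(round((when - failed_at).total_seconds(), 3))
            failed_at = None
    return durations


# Lines copied from the Linux agent log quoted above.
AGENT_LINES = [
    " 593:20201216:190911.181 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)",
    " 593:20201216:190913.525 active check data upload to [zabbix-server-ip:10051] is working again",
    " 593:20201217:094451.199 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)",
    " 593:20201217:094452.199 active check data upload to [zabbix-server-ip:10051] is working again",
    " 593:20201217:132737.596 active check data upload to [zabbix-server-ip:10051] started to fail ([recv] ZBX_TCP_READ() timed out)",
    " 593:20201217:132742.776 active check data upload to [zabbix-server-ip:10051] is working again",
]

if __name__ == "__main__":
    print(outage_durations(AGENT_LINES))
```

For the six lines above this gives outages of 2.344, 1.0 and 5.18 seconds. The same function also works on the Windows agent log lines from the description, since they use the same timestamp layout.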

But these are now gone since I rebooted the Zabbix server yesterday.

It now looks very much like there was something strange with the new server, and the latest reboot fixed whatever it was. All system upgrades have been up to date all the time, so I cannot attribute this to any specific event or detail.

I'll keep checking this and report again after a few days at the latest.

Comment by Aleksandrs Pahomovs [ 2020 Dec 18 ]

I see that this is not a problem with Zabbix. In the end, the problem is probably in the routing table.

Comment by Aleksandrs Pahomovs [ 2020 Dec 18 ]

Please be advised that this section of the tracker is for bug reports only. The case you have submitted can not be qualified as one, so please reach out to [email protected] for commercial support or consultancy services. Alternatively, you can also use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding.

Comment by Markku Leiniö [ 2020 Dec 22 ]

FYI, no similar errors have occurred after the last reboot.
