[ZBX-17559] Agents getting "host not found" on version 5 Created: 2020 Apr 07  Updated: 2020 Apr 29  Resolved: 2020 Apr 29

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Server (S)
Affects Version/s: 5.0.0alpha3, 5.0.0alpha4
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: jchegedus Assignee: Edgar Akhmetshin
Resolution: Won't fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Archlinux - 5.5.13-arch2-1 #1 SMP PREEMPT Mon, 30 Mar 2020 20:42:41
Zabbix 5.0.0 alpha 3 and 4
Several Agent versions and OSs


Attachments: PNG File Screenshot_20200413_224337.png     PNG File Screenshot_20200413_224407.png     PNG File Screenshot_20200413_225248.png     PNG File Screenshot_20200413_225722.png     PNG File Screenshot_20200413_225732.png     PNG File image (6).png     File zabbix-agentd.tgz     File zabbix_agentd.conf     File zabbix_server.tgz     Text File zbx-encod-charset.txt    

 Description   

Steps to reproduce:

Basically, after upgrading from 4.4.x to 5.0-alpha3/4, the agent and server are complaining about "host [hostname] not found".

The odd part is that a few hosts are working and some are not.

One host with agent 5.0 (compiled from source) works.

Some hosts with 4.4.6 work (with or without PSK) and others don't (with or without PSK).

I first noticed that the server started auto-registering hosts with the same name and same IP. I disabled auto-registration, but nothing changed.

Maybe the DB upgrade from 4.4 to 5 did not go as expected?

Looking at the agents and configurations I cannot explain the behavior.

Result:

2590578:20200407:143901.238 item "messier67:master_system" became supported

2590654:20200407:143935.623 autoregistration from "192.168.1.148" denied (host:"arch-epsilon.science.net" ip:"192.168.1.148" port:10050): connection type "TLS with PSK" is not allowed for autoregistration

2590654:20200407:143935.624 cannot send list of active checks to "192.168.1.148": host [arch-epsilon.science.net] not found

2590654:20200407:143935.836 cannot send list of active checks to "192.168.1.250": host [mercury.science.net] not found

2590657:20200407:143937.826 autoregistration from "192.168.1.142" denied (host:"arch-eta.science.net" ip:"192.168.1.142" port:10050): connection type "TLS with PSK" is not allowed for autoregistration

2590657:20200407:143937.826 cannot send list of active checks to "192.168.1.142": host [arch-eta.science.net] not found

2590656:20200407:143941.015 cannot send list of active checks to "192.168.1.210": host [beehive.science.net] not found

2590658:20200407:143944.788 cannot send list of active checks to "192.168.1.22": host [lagrange1.science.net] not found

2590658:20200407:144015.723 autoregistration from "192.168.1.10" denied (host:"messier67" ip:"192.168.1.10" port:10050): connection type "TLS with PSK" is not allowed for autoregistration

2590658:20200407:144015.724 cannot send list of active checks to "192.168.1.10": host [messier67] not found

2590658:20200407:144021.867 cannot send list of active checks to "192.168.1.2": host [orion.science.net] not found

Above you have agents from FreeBSD, OpenBSD, CentOS and Archlinux; all of them were working fine before the update.



 Comments   
Comment by Edgar Akhmetshin [ 2020 Apr 13 ]

Hello

Usually this error means that the Hostname parameter set on the agent side does not match the host name configured for the host in the frontend.

2590654:20200407:143935.624 cannot send list of active checks to "192.168.1.148": host [arch-epsilon.science.net] not found

Could you please show that the host exists with the same name in the Zabbix frontend, and also attach the Zabbix agent log file at debug level 5?
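The name check described above can be sketched in Python. This is only an illustration, not part of Zabbix: `read_hostname` and `matches_exactly` are hypothetical helpers, the parser simply takes the first uncommented `Hostname=` line (a simplification), and the list of frontend names is assumed to be copied by hand or exported separately.

```python
def read_hostname(conf_text):
    """Return the value of the first uncommented Hostname= line, or None."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        if key.strip() == "Hostname":
            return value.strip()
    return None


def matches_exactly(agent_name, frontend_names):
    """True only if a frontend name is identical, character for character."""
    return agent_name in frontend_names


# Example input: a fragment in the style of zabbix_agentd.conf.
conf = """\
Server=192.168.1.144
ServerActive=192.168.1.144
# Hostname=commented-out
Hostname=messier67
"""
agent_name = read_hostname(conf)
# repr() makes stray whitespace or case differences visible.
print(repr(agent_name), matches_exactly(agent_name, ["Messier67", "messier67 "]))
```

The comparison is deliberately strict: a different letter case or a trailing space counts as a mismatch, which is the situation the error message usually points to.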

Regards,
Edgar

Comment by jchegedus [ 2020 Apr 14 ]

Hi,

The server was working perfectly before; the "only" change was the update. Everything was configured properly before. But for the sake of investigation, let's go.

Let's follow these 3 examples:

 

2590657:20200413:223451.922 cannot send list of active checks to "192.168.1.10": host [messier67] not found
2590654:20200413:223456.611 cannot send list of active checks to "192.168.1.137": host [arch-alpha-o] not found
2590656:20200413:223544.394 cannot send list of active checks to "192.168.1.250": host [mercury.science.net] not found

See images attached for the frontend view:

 

CentOS – Messier67 CFG:

Server=192.168.1.144
ServerActive=192.168.1.144
Hostname=messier67
TLSConnect=psk
### Option: TLSAccept
TLSAccept=psk
TLSPSKIdentity=messier67
TLSPSKFile=/etc/zabbix/messier67.enc

$ cat messier67.enc 
07b124a6857ee11288c07dfb088bc1c0d1d2cb5b2f19ce4396082c193153742f

ArchLinux – Arch-Alpha-O CFG:

Server=192.168.1.144
ServerActive=192.168.1.144
Hostname=arch-alpha-o
# TLSConnect=unencrypted
# TLSAccept=unencrypted

FreeBSD (PFSense) – Mercury CFG:

Server=192.168.1.144
ServerActive=192.168.1.144
Hostname=mercury.science.net
TLSConnect=unencrypted
TLSAccept=unencrypted

At the same time, there are other machines similar to Arch-Alpha that are working normally, such as Arch-Delta with no encryption or Arch-Zeta with encryption (PSK).

They are all properly configured and, again, were functional before upgrading to v5.

 

 

Comment by Edgar Akhmetshin [ 2020 Apr 14 ]

Hello

Please provide a compressed log file for the trapper process on the server side and the agent log file, both at debug level 4 and covering the same time period.

Regards,
Edgar

Comment by jchegedus [ 2020 Apr 15 ]

Hi, see files attached.

Going towards this line, I got really confused...

 

MariaDB [zabbix]> select hostid from hosts where host='orion.science.net'; 
Empty set (0.001 sec) 
MariaDB [zabbix]> select hostid from hosts where host like '%orion.science.net%';
 +--------+
 | hostid |
 +--------+
 |  10107 |
 +--------+
 1 row in set (0.001 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='orion.science.net' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 Empty set (0.001 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host like 'orion.science.net' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 Empty set (0.001 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host like 'orion.science.net%' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 Empty set (0.001 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host like '%orion.science.net' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 | hostid | status | tls_accept | tls_issuer | tls_subject | tls_psk_identity | host_metadata | listen_ip   | listen_dns | listen_port | flags |
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 |  10107 |      0 |          1 |            |             |                  |               | 192.168.1.2 |            |       10050 |     0 |
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 1 row in set (0.002 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host like '%orion.science.net%' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 | hostid | status | tls_accept | tls_issuer | tls_subject | tls_psk_identity | host_metadata | listen_ip   | listen_dns | listen_port | flags |
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 |  10107 |      0 |          1 |            |             |                  |               | 192.168.1.2 |            |       10050 |     0 |
 +--------+--------+------------+------------+-------------+------------------+---------------+-------------+------------+-------------+-------+
 1 row in set (0.001 sec)
 MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host like 'arch-delta-o' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
 Empty set (0.001 sec)

 

So Orion is giving errors: searching the DB for the exact name returns nothing, but using "like" works. Strangely, arch-delta-o doesn't have errors, yet a similar query also gives an empty result for it (??). I am befuddled.

So maybe there is something (still) wrong with the DB ? But what?

When I first migrated, there was an error about charset (see img attached), I had that fixed, but maybe there was some remaining unseen problem?
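One way to look into the "=" versus "like" difference is to list stored names that contain the expected name without being identical to it, and print their bytes. The sketch below uses Python with an in-memory SQLite table as a stand-in for the real MariaDB `hosts` table; the zero-width space in the sample row is a made-up cause chosen purely for illustration, since the ticket does not establish what the stored value actually contains. `find_near_matches` and hostid 10108 are hypothetical.

```python
import sqlite3


def find_near_matches(conn, name):
    """Return (hostid, stored_name, hex_bytes) for hosts whose name contains
    `name` but is not equal to it."""
    rows = conn.execute(
        "SELECT hostid, host FROM hosts WHERE host LIKE ? AND host <> ?",
        (f"%{name}%", name),
    ).fetchall()
    return [(hostid, host, host.encode("utf-8").hex(" ")) for hostid, host in rows]


# Stand-in data: one name with an invisible leading character (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO hosts VALUES (10107, ?)", ("\u200borion.science.net",))
conn.execute("INSERT INTO hosts VALUES (10108, ?)", ("arch-zeta",))

suspects = find_near_matches(conn, "orion.science.net")
for hostid, host, hex_bytes in suspects:
    # The hex dump exposes bytes that a terminal would not display.
    print(hostid, repr(host), hex_bytes)
```

Against the real database the same idea would need a MariaDB driver and the actual schema; names containing `_` or `%` would also need escaping in the LIKE pattern.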

 

Comment by Edgar Akhmetshin [ 2020 Apr 20 ]

Hello,

Which encoding and collation is used for Zabbix database, tables and columns? It's not enough to change only database or table encoding/collation, columns should be converted also.

You can follow this article from Atlassian to change this information correctly.

Required encoding: utf8 and collation: utf8_bin.
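A per-column check along these lines can be sketched in Python. The query reads `information_schema.columns`, which MariaDB and MySQL provide; the schema name `zabbix` is taken from the prompt shown earlier in this ticket, and `nonconforming_columns` plus the sample rows are hypothetical. How the query is executed (client or driver) is left open.

```python
# Query text only; run it with any MariaDB/MySQL client or driver.
COLUMN_CHARSET_QUERY = """
SELECT table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'zabbix'
  AND character_set_name IS NOT NULL
"""


def nonconforming_columns(rows, charset="utf8", collation="utf8_bin"):
    """Return the rows whose charset or collation differs from the required pair."""
    return [row for row in rows if row[2] != charset or row[3] != collation]


# Made-up result rows in the column order of the query above.
sample_rows = [
    ("hosts", "host", "utf8", "utf8_bin"),
    ("hosts", "name", "latin1", "latin1_swedish_ci"),
]
offenders = nonconforming_columns(sample_rows)
for table, column, cs, coll in offenders:
    print(f"{table}.{column}: {cs} / {coll}")
```

Some server versions may report utf8 under an alias such as utf8mb3, in which case the expected values passed to the function would need adjusting.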

Regards,
Edgar

Comment by jchegedus [ 2020 Apr 20 ]

Hi Edgar, 

 

I followed a procedure similar to the one proposed.

Now everything is UTF8 with collation UTF8_BIN; see the attached list / report.

Comment by jchegedus [ 2020 Apr 23 ]

I still cannot explain this.

But today I updated to 5.0 Beta 1 and tried to add a template to one of the hosts that were "broken". Upon clicking update, the frontend complained that the name of the host was "unrecognized".

What I did next was change the host name to "something-else" and save, then open the host again and save it back with the original name that was not working before. It saved normally and the problem was fixed without errors.

I checked the logs, and every host I treated this way was fixed.

In my case I have some 10 machines (out of 21) with problems, and I could easily fix all of them like that, but in an environment with thousands of hosts this will not be so easy.

Maybe a script that renames the hosts back and forth would also fix the problem?

Not sure... but if someone else sees something similar, that's one possible solution.
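A rough Python sketch of such a back-and-forth rename through the Zabbix JSON-RPC API follows. It has not been run against a real server and is an assumption-laden illustration: it relies on the `host.update` method with `hostid` and `host` parameters and on the token being passed in the `auth` field, as in the 5.0 API; `rename_payloads`, `send`, the temporary suffix and the `TOKEN` placeholder are all hypothetical.

```python
import json
import urllib.request


def rename_payloads(hostid, name, auth_token, tmp_suffix="-tmp-rename"):
    """Build two host.update requests: rename away, then rename back."""
    def request(new_name, req_id):
        return {
            "jsonrpc": "2.0",
            "method": "host.update",
            "params": {"hostid": str(hostid), "host": new_name},
            "auth": auth_token,
            "id": req_id,
        }
    return [request(name + tmp_suffix, 1), request(name, 2)]


def send(api_url, payload):
    """POST one JSON-RPC request and return the decoded response."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Only builds the requests; nothing is sent here.
payloads = rename_payloads(10107, "orion.science.net", "TOKEN")
print(json.dumps(payloads, indent=2))
```

Sending would mean calling `send()` for each payload in order against the frontend's `api_jsonrpc.php` URL, checking each response for an `error` key before continuing, and trying it on a single host first.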

Comment by Edgar Akhmetshin [ 2020 Apr 29 ]

Hello

There is no such problem on a test installation using alpha4 or beta 2 with correct database settings from the beginning.

Please be advised that this section of the tracker is for bug reports only. The case you have submitted can not be qualified as one, so please reach out to [email protected] for commercial support or consultancy services.

Alternatively, you can also use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding.

Regards,
Edgar

Generated at Wed May 21 07:04:58 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.