[ZBX-19712] icmpping checks stop working if host interface gets unavailable Created: 2021 Jul 21  Updated: 2024 Apr 10  Resolved: 2021 Sep 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 5.4.2, 5.4.4rc1
Fix Version/s: 5.4.5rc1, 6.0.0alpha3, 6.0 (plan)

Type: Problem report Priority: Blocker
Reporter: wins Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

archlinux, postgresql 13.3, timescaledb 2.3.1


Attachments: PNG File Screenshot from 2021-08-02 20-18-33.png     PNG File Screenshot from 2021-08-02 20-38-14.png     PNG File image-2021-08-03-14-42-32-093.png     PNG File image_19561.png    
Issue Links:
Causes
caused by ZBXNEXT-6311 Move host availability to host interf... Closed
Duplicate
Team: Team A
Sprint: Sprint 80 (Sep 2021)
Story Points: 0.25

 Description   

Fresh install at archlinux, postgresql 13.3, timescaledb 2.3.1

I am using the default templates: Template Module ICMP Ping (Template tooling version used: 0.38),

Template Module Interfaces (Template tooling version used: 0.38)

 

Steps to reproduce:

  1. Add Template Module Interfaces to the host (Template Module ICMP Ping will be attached automatically) and wait for a complete discovery cycle.
  2. Simulate a "host down" situation (power off the host, or add a blocking firewall rule).

Result:
After 3 minutes, the trigger "Unavailable by ICMP ping" does not fire, while the trigger "No SNMP data collection" does.

Perhaps the Zabbix server stops sending ICMP requests once the host interface becomes unavailable.

I checked the same case in version 5.2.6 and there was no problem.
If the host uses only the "Template Module ICMP Ping" template, or has another interface (agent, for example), there is no problem.

 



 Comments   
Comment by wins [ 2021 Jul 21 ]

The problem is easier to reproduce when testing with 15-20 hosts.

Comment by Andrey Tocko (Inactive) [ 2021 Jul 29 ]

Hello!
This problem was fixed in 5.4.3.
Upgrade, try again and report your findings.

Good luck.
Andrey

Comment by wins [ 2021 Jul 29 ]

same problem in 5.4.3

Comment by Andrey Tocko (Inactive) [ 2021 Aug 02 ]

Hello again!
I still cannot reproduce it.
I added only an SNMP interface to the host, then added SNMP and ICMP checks. I brought the host down for 10 minutes. After the host came back up, all metrics worked again.
What am I missing?

Comment by wins [ 2021 Aug 02 ]

The problem occurs when the host goes down (not up!). Here are 2 screenshots; pay attention to the item timestamps.

No problem (version 5.2.6): a "0" value every 1 min.

Problem (v.5.4.3): the first "0" value is received after 5 min, although the interval is 1 min! Thus, the "host unavailable by icmp ping" message arrives much later than it should.

 

 

Comment by Andrey Tocko (Inactive) [ 2021 Aug 03 ]

Here are my test results. The table header contains item keys, as items can differ from template to template. The Module templates mentioned above are not available in 5.4 (only if carried over after upgrading from a previous version).
I changed the interval to scheduled (once every 5 seconds) to get a clear picture.
There is an icmpping item, an internal item that checks interface availability, and 2 SNMP items with some OIDs.

Timestamp            icmpping  zabbix[host,snmp,available]  OID1  OID2
2021-08-03 12:47:10  0         1
2021-08-03 12:47:05  0         1
2021-08-03 12:47:00            1
2021-08-03 12:46:55            1
2021-08-03 12:46:50            1
2021-08-03 12:46:45  0         1
2021-08-03 12:46:40  0         1
2021-08-03 12:46:35  0         1                            495   21807104
2021-08-03 12:46:30  1         1                            490   21807104
2021-08-03 12:46:25  1         1                            485   21807104
2021-08-03 12:46:20  1         1                            480   21807104
2021-08-03 12:46:15  1         1                            475   21807104
2021-08-03 12:46:10  1         1                            470   21807104

Immediately after the router was shut down, icmpping returns 0 and no SNMP data is available. It does not matter how many more items are added to that host or how many interfaces it has. The result is always the same.

Comment by Andrey Tocko (Inactive) [ 2021 Aug 03 ]

Same with multiple hosts:

Comment by wins [ 2021 Aug 04 ]

Do I understand correctly that in your screenshot the zabbix[host,snmp,available] item continues to return "1" after the router has been turned off?

This means that the host interface does not go to the unreachable state, so the icmpping check works correctly.

In my case the SNMP availability check changes its value to 0, and the host interface goes to the unreachable state.

Comment by Andrey Tocko (Inactive) [ 2021 Aug 09 ]

In my case the SNMP interface goes to the unreachable state within a minute, but this does not affect the ICMP checks.

Comment by Oleksii Zagorskyi [ 2021 Aug 25 ]

I confirm this on my test installation with the current GIT revision, 5.4.4rc1.
It is also confirmed on 2 production installations running 5.4.3.

It's very simple to reproduce: take a host with an agent interface and 2 items, one agent item and one simple check, each with a 1 minute update interval.
Make sure both items are collecting values properly.
Stop the agent (the interface becomes unavailable) and observe how this stops the simple check collection.

A few examples:

Here, after a long wait, the simple check randomly starts to poll again, but later it may have gaps again.
The host was turned off:

1740899:20210825:225840.472 Starting Zabbix Server. Zabbix 5.4.4rc1 (revision {ZABBIX_REVISION}).
...
1740924:20210825:230348.669 Zabbix agent item "agent.version" on host "W7" failed: first network error, wait for 15 seconds
1740929:20210825:230406.690 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:230424.692 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:230442.695 temporarily disabling Zabbix agent checks on host "W7": interface unavailable
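
The timing in the log excerpt above can be checked with a short script. This is only a sketch: it assumes every line has the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen here, and the function names `parse_line` and `seconds_until_unavailable` are made up for illustration, not part of Zabbix.

```python
from datetime import datetime

# Log excerpt copied from this comment (host "W7").
LOG = """\
1740924:20210825:230348.669 Zabbix agent item "agent.version" on host "W7" failed: first network error, wait for 15 seconds
1740929:20210825:230406.690 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:230424.692 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:230442.695 temporarily disabling Zabbix agent checks on host "W7": interface unavailable
"""


def parse_line(line):
    """Split 'PID:YYYYMMDD:HHMMSS.mmm message' into (pid, timestamp, message)."""
    prefix, message = line.split(" ", 1)
    pid, date, time = prefix.split(":")
    stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")
    return int(pid), stamp, message


def seconds_until_unavailable(log_text):
    """Seconds from the first network error to 'interface unavailable', or None."""
    first_error = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _, stamp, message = parse_line(line)
        if "first network error" in message:
            first_error = stamp
        elif "interface unavailable" in message and first_error is not None:
            return (stamp - first_error).total_seconds()
    return None


print(seconds_until_unavailable(LOG))
```

For this excerpt the interface is marked unavailable about 54 seconds after the first network error, that is, at 23:04:42.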

Timestamp	Value
2021-08-25 23:19:30	0
2021-08-25 23:18:21	0
2021-08-25 23:17:15	0
2021-08-25 23:03:29	0
2021-08-25 23:02:29	1
2021-08-25 23:01:29	1
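
The gap in the history above can be found mechanically. A minimal sketch, assuming the 1 minute update interval mentioned earlier; the 30 second tolerance and the name `find_gaps` are assumptions chosen for illustration:

```python
from datetime import datetime, timedelta

# Simple check history copied from the table above (timestamp, value).
HISTORY = [
    ("2021-08-25 23:19:30", 0),
    ("2021-08-25 23:18:21", 0),
    ("2021-08-25 23:17:15", 0),
    ("2021-08-25 23:03:29", 0),
    ("2021-08-25 23:02:29", 1),
    ("2021-08-25 23:01:29", 1),
]


def find_gaps(history, interval=timedelta(minutes=1), tolerance=timedelta(seconds=30)):
    """Return (earlier, later, delta) for consecutive values further apart than interval + tolerance."""
    stamps = sorted(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts, _ in history)
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        delta = later - earlier
        if delta > interval + tolerance:
            gaps.append((earlier, later, delta))
    return gaps


for earlier, later, delta in find_gaps(HISTORY):
    print(f"{earlier:%H:%M:%S} -> {later:%H:%M:%S} ({delta})")
```

With the values above this reports a single gap, from 23:03:29 to 23:17:15 (0:13:46). It begins with the last value collected before the interface was marked unavailable at 23:04:42.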

Here the host was up, but the agent was stopped. I waited for 15 minutes, but the simple check did not collect any values this time. Not sure why:

1740926:20210825:232545.850 Zabbix agent item "agent.version" on host "W7" failed: first network error, wait for 15 seconds
1740929:20210825:232600.221 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:232615.223 Zabbix agent item "agent.version" on host "W7" failed: another network error, wait for 15 seconds
1740929:20210825:232630.226 temporarily disabling Zabbix agent checks on host "W7": interface unavailable

2021-08-25 23:25:29	1
2021-08-25 23:24:29	1
2021-08-25 23:23:29	1

The same, with 127.0.0.1 IP:

1740926:20210825:234028.962 Zabbix agent item "agent.version" on host "it0" failed: first network error, wait for 15 seconds
1740929:20210825:234043.348 Zabbix agent item "agent.version" on host "it0" failed: another network error, wait for 15 seconds
1740929:20210825:234058.350 Zabbix agent item "agent.version" on host "it0" failed: another network error, wait for 15 seconds
1740929:20210825:234113.352 temporarily disabling Zabbix agent checks on host "it0": interface unavailable

2021-08-25 23:40:06	1
2021-08-25 23:39:06	1
2021-08-25 23:38:06	1
2021-08-25 23:37:06	1

Comment by Semiadmin [ 2021 Sep 02 ]

Maybe it would be better to simply remove the interface binding from simple check items? Database monitor and script item types have no such binding.

Comment by Vladislavs Sokurenko [ 2021 Sep 02 ]

Fixed in pull request feature/ZBX-19712-5.4

Comment by Vladislavs Sokurenko [ 2021 Sep 08 ]

Fixed in

Updated documentation:

Generated at Sun May 25 09:07:12 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.