[ZBX-4284] Possible wrong host (agent!,snmp?) disabling and bad handling of Unreachable items (hosts) Created: 2011 Oct 27  Updated: 2018 Jun 14  Resolved: 2016 Feb 24

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 1.8.9
Fix Version/s: 2.2.11rc1, 2.4.7rc1, 2.5.0

Type: Incident report Priority: Blocker
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 9
Labels: pollers, timeout, unavailable
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

1.8.9 (svn head)


Attachments: PNG File 60_items_update_interval_10sec.png     PNG File 60_items_update_interval_60sec.png     Text File demo.log     PNG File items_interval.png    
Issue Links:
Duplicate
is duplicated by ZBXNEXT-1022 heartbeat communication for between Z... Reopened
is duplicated by ZBXNEXT-840 if some snmp values timeout, the whol... Closed
is duplicated by ZBX-4371 Low level discovery bind to snmp_errors Closed
is duplicated by ZBX-5943 Zabbix treats SNMP noSuchName as "hos... Closed
is duplicated by ZBX-9609 The host is treated as unavailable be... Closed

 Description   

Deep debugging -> this issue report

Configuration:
Single enabled host with three agent items. The keys are:
agent.version (itemid=22535)
agent.version[] (itemid=22536) <----DISABLED for the experiment stage 1 (similar key selected for easier searching across log file)
sleep5 (itemid=22537)
Update interval for they all is 30 seconds.

Zabbix agent config (added UserParameter):
UserParameter=sleep5,sleep 5

Zabbix server config is default.

Stage 1:
In the zabbix_server.log I see:
30817:20111027:133610.349 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
30820:20111027:133628.418 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:133646.421 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:133704.426 temporarily disabling Zabbix agent checks on host [it0]: host unavailable

And the item "agent.version" will never be checked !!! because the key "sleep5" gives timeout.

And it's very bad.

Stage 2:
In this moment I enabled the item with the key "agent.version[]"
And server behavior changed to this:

30820:20111027:134834.519 enabling Zabbix agent checks on host [it0]: host became available
30815:20111027:134840.435 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
30820:20111027:134858.525 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:134916.528 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:134931.530 resuming Zabbix agent checks on host [it0]: connection restored
30819:20111027:134940.443 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
30820:20111027:134958.538 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:135016.541 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:135031.544 resuming Zabbix agent checks on host [it0]: connection restored
30815:20111027:135040.450 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
30820:20111027:135058.550 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:135116.553 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:135131.556 resuming Zabbix agent checks on host [it0]: connection restored
.... etc, etc, etc

The key "agent.version[]" does not give to become host unavailable after three network errors (UnreachablePeriod=45 seconds).
Currently, the keys "agent.version" and "agent.version[]" are checked but with the not proper & unstable intervals:
-> "agent.version"
2011.Oct.27 13:51:36 1.8.9rc1
2011.Oct.27 13:50:36 1.8.9rc1
2011.Oct.27 13:49:36 1.8.9rc1
2011.Oct.27 13:48:36 1.8.9rc1

-> "agent.version[]"
2011.Oct.27 13:51:36 1.8.9rc1
2011.Oct.27 13:51:31 1.8.9rc1
2011.Oct.27 13:50:36 1.8.9rc1
2011.Oct.27 13:50:31 1.8.9rc1
2011.Oct.27 13:49:36 1.8.9rc1
2011.Oct.27 13:49:31 1.8.9rc1
2011.Oct.27 13:48:36 1.8.9rc1
2011.Oct.27 13:48:34 1.8.9rc1

See "items_interval.png" screenshot additionally.

But the key "sleep5" are not marked anyhow with the error state in the GUI.

Single place where we can see the reason is zabbix_sever.log and a queue in the GUI.

That's too not very good.

Server was restarted and the filtered log (debuglevel=4) is attached.
Filter is: grep -E " started|agent.version|network error|connection restored|became available|sleep5|agent result" zabbix_server.log > demo.log

I would recommend to redesign this behavior.

Specification: https://www.zabbix.org/wiki/Docs/specs/ZBX-4284



 Comments   
Comment by Oleksii Zagorskyi [ 2012 Apr 21 ]

ZBX-4901 could be similar

<zalex> no, the ZBX-4901 is another issue.
Note: Eric's comment below related to this my note and to that ZBX-4901 but not to this issue!

Comment by Eric Gearhart [ 2012 Apr 22 ]

On the SNMP specific side of this issue, if I change my items and LLD discovery rules over to SNMPv2, the "first network error" problems completely disappear. As soon as I change back over to SNMPv3 the "first network error" problem reappears, and my SNMP hosts (Cisco routers and switches) get disabled by Zabbix.

As the title of this bug alludes to, I think items that are low-evel discovered should be reporting back as "OID not supported" and automatically marked as disabled. Instead, it appears that SNMPv3 items that should be reporting back "not supported" are reporting back as timeouts.

Comment by Oleksii Zagorskyi [ 2012 May 25 ]

Because of single slow Userparameter (takes more than 20 seconds) and customized UnreachableDelay=27 and Timeout=20 for the zabbix server, one user had such issue.

Here is part of filtered log (real server with 1500 hosts, 186 NVPS):

 30304:20120522:202105.398 enabling Zabbix agent checks on host [host.name]: host became available
 29776:20120522:202206.198 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:202253.431 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30308:20120522:202353.670 enabling Zabbix agent checks on host [host.name]: host became available
 30024:20120522:202508.563 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30307:20120522:202555.081 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30307:20120522:202655.102 enabling Zabbix agent checks on host [host.name]: host became available
 29965:20120522:202804.243 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:202831.493 resuming Zabbix agent checks on host [host.name]: connection restored
 30000:20120522:202907.101 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:202934.508 resuming Zabbix agent checks on host [host.name]: connection restored
 30090:20120522:203007.613 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:203054.173 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30304:20120522:203154.560 enabling Zabbix agent checks on host [host.name]: host became available
 30122:20120522:203306.661 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30308:20120522:203353.720 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30308:20120522:203453.113 enabling Zabbix agent checks on host [host.name]: host became available
 29579:20120522:203601.715 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30307:20120522:203648.344 resuming Zabbix agent checks on host [host.name]: connection restored
 30308:20120522:203648.521 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30307:20120522:203735.352 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30304:20120522:203835.011 enabling Zabbix agent checks on host [host.name]: host became available
 29836:20120522:203905.405 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30307:20120522:203932.370 resuming Zabbix agent checks on host [host.name]: connection restored
 29681:20120522:204007.878 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30307:20120522:204034.385 resuming Zabbix agent checks on host [host.name]: connection restored
 29687:20120522:204106.037 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:204153.092 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30304:20120522:204253.108 enabling Zabbix agent checks on host [host.name]: host became available
 29624:20120522:204403.412 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:204450.154 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30304:20120522:204550.177 enabling Zabbix agent checks on host [host.name]: host became available
 29952:20120522:204707.395 Zabbix agent item [run_this["sudo /home/script"]] on host [host.name] failed: first network error, wait for 27 seconds
 30304:20120522:204754.204 temporarily disabling Zabbix agent checks on host [host.name]: host unavailable
 30304:20120522:204854.225 enabling Zabbix agent checks on host [host.name]: host became available
Comment by Oleksii Zagorskyi [ 2013 Mar 17 ]

Another example, see attached 60_items_update_interval_10sec.png
It's server v 2.0.5

I have 60 agent items: keys are "echo[1]-echo[59]" (ids=23370-23428) and single item key "sleep[5]" (id=23429) - they all created sequentially.
Update interval is 10 seconds.
Userparameters are:
UserParameter=sleep[*], sleep $1
UserParameter=echo[*], echo "$1"

On the graph the sleep[5] item was enabled for some period and for that period data on the graph look worse.

Some part of server log for that period (DL=3):

 31958:20130317:143149.622 resuming Zabbix agent checks on host [IT0]: connection restored
 31955:20130317:143202.669 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143217.649 resuming Zabbix agent checks on host [IT0]: connection restored
 31953:20130317:143222.706 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143237.686 resuming Zabbix agent checks on host [IT0]: connection restored
 31954:20130317:143242.724 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143257.787 resuming Zabbix agent checks on host [IT0]: connection restored
 31953:20130317:143302.751 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143317.825 resuming Zabbix agent checks on host [IT0]: connection restored
 31957:20130317:143322.799 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143337.863 resuming Zabbix agent checks on host [IT0]: connection restored
 31956:20130317:143342.827 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143401.939 resuming Zabbix agent checks on host [IT0]: connection restored
 31953:20130317:143412.871 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143427.951 resuming Zabbix agent checks on host [IT0]: connection restored
 31956:20130317:143432.898 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143447.987 resuming Zabbix agent checks on host [IT0]: connection restored
 31954:20130317:143452.923 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143507.103 resuming Zabbix agent checks on host [IT0]: connection restored
 31957:20130317:143512.940 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143527.117 resuming Zabbix agent checks on host [IT0]: connection restored
 31957:20130317:143532.985 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143547.147 resuming Zabbix agent checks on host [IT0]: connection restored
 31956:20130317:143552.987 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143613.248 resuming Zabbix agent checks on host [IT0]: connection restored
 31955:20130317:143622.145 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143640.295 Zabbix agent item [sleep[5]] on host [IT0] failed: another network error, wait for 15 seconds
 31958:20130317:143655.300 resuming Zabbix agent checks on host [IT0]: connection restored
 31956:20130317:143702.048 Zabbix agent item [sleep[5]] on host [IT0] failed: first network error, wait for 15 seconds
 31958:20130317:143719.394 resuming Zabbix agent checks on host [IT0]: connection restored

Also additional graph - update interval for all items have been changed to 60 seconds - attached 60_items_update_interval_60sec.png

Conclusion - single timeouted item can "kill" agents monitoring.

Maybe we should consider separately cases when we cannot establish a TCP connection to a host and when it timeouted by response ?
For example in case of response timeout switch the item just to notsupported state and write meaningful item error text ?

Comment by Oleksii Zagorskyi [ 2013 Mar 17 ]

Related case but for SNMP only is there - ZBX-4026

Comment by richlv [ 2013 Mar 17 ]

this reminded me about an issue i reported earlier... or so i thought. i searched for a LONG time. eventually i wound draft text in some file =)

i'm not creating a new issue on this, but my text went like this :

be more resilient when some items fail on a host

currently, if a host has a failing item (for example, a script that timeouts), zabbix disables the host for 15 seconds (here and later default values will be used). during this time other item checks may be missed. if the next item to be checked works ok, host is re-enabled. then, when the script timeouts again, host is disabled again... this results in missing data and it's not easy to debug.

some more advanced algorithm could try to figure out that not all items fail and only make the failed items unsupported.

extending on this idea, a rough logic draft :

  • if an item fails, we note that down but keep on polling other items as before
  • if another item fails, we note that down and keep on polling other items up to "n" times
  • when n failures are reached, we declare the host to be unreachable
  • if other item succeeds, "n" is cleared, but all the items that failed so far get into unsupported state with whatever was the error message for them and are treated as unsupported items normally are

i can think of so many scenarios that would benefit from different values of "n" - not sure whether there's and easy way out to work properly in all of those

Comment by Aleksandrs Saveljevs [ 2015 Jun 17 ]

There was the following idea during initial investigation. Currently, when we have any kind of network timeout (both connect and read), we move all items of the host to the unreachable poller queue. As zalex_ua proposed, we can distinguish between connect timeout and read timeout. So if we could connect to a host, but there was a read timeout, we can move only this item to the unreachable poller queue.

However, there are two major considerations:

  • One consideration is whether we wish the item to become not supported. It seems dubious to make the item not supported straight away, because the read timeout could have been due to a legitimate network error (e.g., host was rebooted). So an idea of richlv's counters might be an option.
  • Second consideration is whether this approach will work for non-TCP items like SNMP and IPMI. Also, for JMX items, our connection from the server is to the gateway, not the monitored host.

Note that if the above solution will only solve the problem for Zabbix agent items, then implementing it might not be worth it, because there is a much simpler solution for Zabbix agent items - just make Zabbix server timeout larger than Zabbix agent timeout (which is already so by default since ZBXNEXT-2637).

Comment by Aleksandrs Saveljevs [ 2015 Jun 17 ]

(1) During investigation described above, it was considered to use TIMEOUT_ERROR constant for read timeouts. In the process, the following bug with TIMEOUT_ERROR was discovered (description below is based on 2.0 branch).

Currently, we only use TIMEOUT_ERROR in src/libs/zbxexec/execute.c and it is returned by zbx_execute(). That function is called in zbx_execute_script() (note just in case that "result" will not be populated in case of TIMEOUT_ERROR), which itself is called by execute_script(), which is called in node_process_command(). So TIMEOUT_ERROR is propagated from zbx_execute() to node_process_command(), where it is used as follows:

response = (FAIL == ret) ? ZBX_PROTO_VALUE_FAILED : ZBX_PROTO_VALUE_SUCCESS;

So TIMEOUT_ERROR will be considered a success when we execute a command from the frontend or as requested by a master node.

wiper This bug was fixed in 2.2, so I created a new bugreport for 2.0 branch. See ZBX-9657
CLOSED

Comment by Andris Zeila [ 2015 Jun 19 ]

Specification available at https://www.zabbix.org/wiki/Docs/specs/ZBX-4284

Comment by Andris Zeila [ 2015 Jun 25 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4284

Comment by dimir [ 2015 Aug 13 ]

Please review my small changes in r54968.

wiper Thanks!

Comment by Andris Zeila [ 2015 Aug 14 ]

Released in:

  • pre-2.2.11rc1 r54971
  • pre-2.4.7rc1 r54972
  • pre-2.5.0 r54973
Comment by richlv [ 2015 Aug 14 ]

(2) regarding the documentation, this improvement is really great and should go to go in whatsnew, and it would also be great to describe the new behaviour in the main manual.
additionally, the change in the behaviour should be listed in upgrade notes

wiper documentation:

I added to 2.2, after it's reviewed - I will copy to 2.4/3.0 documentation

<richlv> thanks, did some minor style changes, otherwise looks good to me - but would be great to have martins-v review it

martins-v Reading it the first time, I had some difficulty being sure what "this item" refers to in the sentence second part. So changed it to "problematic item" to clear ambiguity. But please check if I understood it right. Other than that, all seemed good.

wiper Copied to 2.4 and 3.0. Not entirely sure if I found right places to insert it.

<richlv> revisited the paragraph. it now says "...without affecting host availability", but that's a bit confusing - when will a host be set to be unreachable then ? after a single timeout ? if so, that's affecting host availability. after two items fail on a host ? that wouldn't work well on hosts with a single item

wiper When item fails with timeout for a first time - the host is internally marked as unreachable. If item fails for a second time and there was a succesfull check between the failures - the item is marked as not supported without affecting host availability. If host has only one item - there won't be succesfull checks between timeouts and host will be processed as unreachable (and marked as unavailable after UnreachablePeriod) as before.

<richlv> thanks. "internally marked" - only in the server cache ? when is that written out to the database ?

wiper after it stays unreachable for UnreachablePeriod seconds. Then its flagged as unavailable and database is updated.

asaveljevs So the documentation basically says the following:

Zabbix now tries to differentiate item timeouts from host timeouts. If another item check was successful between two failed checks of a problematic item, then the problematic item is marked as not supported after the second failed check without affecting host availability.

I must admit that I find it hard to infer this behavior from the specification at https://www.zabbix.org/wiki/Docs/specs/ZBX-4284#Specification , which says:

If an unreachable item failed with network/gateway error (basically it failed 2+ times), but host is not marked as unreachable, then this must be treated as item rather than network failure.
...
For passive agent items TIMEOUT_ERROR must be returned if the item failed with timeout when waiting for response from agent. NETWORK_ERROR must still be returned if the connection or data sending failed. Items returning TIMEOUT_ERROR must be marked as unreachable items and processed as such.

It does not describe how different items affect one another: TIMEOUT_ERROR marks an item as unreachable and, according to the specification, it is stored as not supported with the unreachable poller.

What if we have item checks A, B, C and both A and B time out? Will the host be marked as unreachable after B, but before C?

Does bug ZBX-10215 affect documentation in any way? REOPENED.

wiper The documentation was updated when fixing ZBX-10215.
RESOLVED

asaveljevs CLOSED

sasha Great work. Thanks a lot!

Comment by Aleksandrs Saveljevs [ 2015 Dec 29 ]

This seems to have introduced a bug: ZBX-10215.

Comment by Andris Zeila [ 2016 Jan 21 ]

Continuing from (2)

If A & B time out with TIMEOUT_ERROR, then nothing will be done to the host.

If A & B time out with NETWORK_ERROR, then host will be marked as unreachable (disable_until != 0) after the A timeout. If the B timeout will happen after UnreachablePeriod, then host will be flagged unavailable. Otherwise B timeout will simply increase disable_until by the UnreachableDelay.

Successfull C check will remove unreachable flag (reset the disable_until field to 0).

Comment by Andris Zeila [ 2016 Feb 09 ]

A serious problem was found in one real use case. The network was configured so that the server request to agent went through, but the agent response didn't. So server decided that those items are timeouting and instead of marking host as unreachable (which would be the correct action) the items were made unsupported and left for unreachable poller to process them at their defined intervals. Of course the unreachable poller could not handle it.

Based on all the problems that surfaced after this attempt of timeout item detection we dropped this detection in ZBX-10215. So a timeout/network error will delay all checks on the host by UnreachablePeriod, however it's guaranteed that the failing item(s) will be checked after other items, avoiding the situation described in this issue.

Apparently some kind of per item throttling (similar to all item throttling on unreachable hosts) should be implemented before trying to identify single timeouting items without affecting host unreachability.

Comment by Oleksii Zagorskyi [ 2016 Feb 12 ]

Andris, accidentally got a case when stopped zabbix agent on my workstation caused this:

 21696:20160212:174407.279 Starting Zabbix Server. Zabbix 2.4.7rc1 (revision 54982).
 21701:20160212:174407.327 server #4 started [poller #2]
 21704:20160212:174407.334 server #7 started [poller #5]
 21713:20160212:174407.344 server #16 started [history syncer #1]
...
 21701:20160212:174936.967 Item [Zabbix server:net.if.out[eth0]] error: Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused
 21704:20160212:174936.967 End of get_value_agent():NETWORK_ERROR
 21701:20160212:174936.967 End of get_value():NETWORK_ERROR
 21704:20160212:174936.967 Item [Zabbix server:vm.memory.size[available]] error: Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused
 21701:20160212:174936.967 In deactivate_host() hostid:10084 itemid:31536 type:0
 21704:20160212:174936.967 End of get_value():NETWORK_ERROR
 21704:20160212:174936.967 In deactivate_host() hostid:10084 itemid:23316 type:0
 21701:20160212:174936.967 query [txnlev:1] [begin;]
 21704:20160212:174936.967 End of deactivate_host()
 21704:20160212:174936.967 End of get_values():1
 21704:20160212:174936.967 In DCconfig_get_poller_nextcheck() poller_type:0
 21704:20160212:174936.967 End of DCconfig_get_poller_nextcheck():1455292176
 21701:20160212:174936.967 query [txnlev:1] [update hosts set errors_from=1455292176,disable_until=1455292191,error='Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused' where hostid=10084]
...
 21713:20160212:174937.138 item "Zabbix server:vm.memory.size[available]" became not supported: Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused
 21713:20160212:174937.138 In DCadd_nextcheck()
 21713:20160212:174937.138 End of DCadd_nextcheck()
 21713:20160212:174937.138 multiply_item_value_uint64() 19981,8 159848
 21713:20160212:174937.138 query [txnlev:1] [update items set state=1,error='Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused' where itemid=23316;

note how an item became not supported while I'd not expect that.

I have a full server debug log for that period of time.
Let me know if it's related and you would need full debuglog.
Looking at revisions, I suppose that my server binary includes current fix.

Comment by Andris Zeila [ 2016 Feb 15 ]

wiper Are you really sure the sever had at least this revision? With the fix only items failing with NOTSUPPORTED, AGENT_ERROR or CONFIG_ERROR errors will be made not supported.

zalex_ua You have referenced another issue report - ZBX-10215.
As you can see in first line of zabbix server log server is "Zabbix 2.4.7rc1 (revision 54982)."- it definitely does not include the ZBX-10215 fix, but is does include current issue ZBX-4248 changes.
Feel free to ignore my comment if it's actually outdated.

Comment by Oleksii Zagorskyi [ 2018 May 30 ]

Not sure it was really fixed or we lost those improvement. Further discussion in ZBX-14417.

Now it's clear that changes here were rolled back. Details in the ZBX.

Generated at Fri Apr 19 00:23:57 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.