[ZBX-7847] zabbix server continues polling disabled ipmi hosts Created: 2014 Feb 20 Updated: 2017 May 30 Resolved: 2015 Sep 08 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 2.2.2 |
Fix Version/s: | 3.0.0alpha2 |
Type: | Incident report | Priority: | Blocker |
Reporter: | richlv | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | ipmi | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
openipmi 2.0.16 and 2.0.21 |
Issue Links: |
|
Description |
test scenario : the same probably happens if the host is disabled, is in nodata maintenance, or in some other scenario (for example, if it is deleted?). It does not seem to be reproducible with only one IPMI host; there must be another one, even though the two are not related. |
Comments |
Comment by Aleksandrs Saveljevs [ 2014 May 08 ] |
|
Comment by Igors Homjakovs (Inactive) [ 2015 Jun 30 ] |
Fixed in svn://svn.zabbix.com/branches/dev/ZBX-7847 |
Comment by dimir [ 2015 Jul 29 ] |
Yes, Zabbix still connects to the IPMI host even after the host is deleted. As a result, if you delete an IPMI host and then add it again, you end up with 2 entities, each connecting to the IPMI host. If you do that again you get 3, and so on. |
Comment by dimir [ 2015 Jul 29 ] |
(1) Question: should we delete the IPMI connection when the host is deleted? With the fix it will be deleted after one day (we check for inactive IPMI connections every hour and delete a connection if it is one day old).
<dimir> After discussion we decided to keep it that way, but this has to be documented (see sub-issue 3 below). CLOSED |
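The rule described in (1), an hourly check that drops connections unused for one day, can be sketched roughly as follows. This is an illustration only, not the Zabbix implementation (which is written in C): `IpmiConnectionPool`, `touch` and `purge_inactive` are hypothetical names, and using `>=` for the age comparison is an assumption.

```python
import time

CHECK_INTERVAL = 60 * 60      # the inactivity check runs once an hour (per the comment)
MAX_INACTIVE = 24 * 60 * 60   # connections unused for one day are dropped


class IpmiConnectionPool:
    """Hypothetical stand-in for the table of open IPMI connections."""

    def __init__(self, max_inactive=MAX_INACTIVE):
        self.max_inactive = max_inactive
        self.last_used = {}   # host address -> timestamp of the last IPMI check

    def touch(self, host, now=None):
        """Record that an IPMI check was just performed for this host."""
        self.last_used[host] = time.time() if now is None else now

    def purge_inactive(self, now=None):
        """Drop connections unused for max_inactive seconds; return the dropped hosts."""
        now = time.time() if now is None else now
        stale = [h for h, ts in self.last_used.items()
                 if now - ts >= self.max_inactive]
        for host in stale:
            del self.last_used[host]
        return stale
```

In this sketch a caller would invoke `purge_inactive()` every `CHECK_INTERVAL` seconds. A host that is deleted or disabled simply stops being touched and ages out at a later check.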
Comment by dimir [ 2015 Jul 29 ] |
(2) In the current implementation we use an integer to store the auto-incremented ID of an IPMI host. Should we worry about overflow if we add lots of IPMI hosts and the server runs constantly without a restart?
<dimir> After discussion we decided that 4 billion should be enough for creating all IPMI connections during the working time of Zabbix server or proxy. CLOSED |
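A back-of-the-envelope check of the "4 billion" figure, assuming the ID is a 32-bit unsigned counter (the comment only says "integer", so the width is an assumption) and that a new ID is consumed each time an IPMI connection is created. The function name and the rates are illustrative.

```python
ID_SPACE = 2 ** 32                    # assumed 32-bit unsigned counter: 4,294,967,296 values
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_overflow(new_connections_per_second):
    """Years of uninterrupted uptime before the counter wraps at the given rate."""
    return ID_SPACE / new_connections_per_second / SECONDS_PER_YEAR


# One new IPMI connection every second lasts roughly 136 years;
# even 100 new connections per second lasts more than a year.
print(years_until_overflow(1))
print(years_until_overflow(100))
```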
Comment by dimir [ 2015 Jul 30 ] |
(3) [D] If IPMI checks are not performed (for any reason: all host IPMI items disabled/notsupported, host disabled/deleted, host in maintenance etc.), the IPMI connection will be terminated from Zabbix server or proxy in one day (24 hours + 0..60 minutes). This needs to be documented.
<igorsh> Documented in https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/ipmi RESOLVED.
<richlv> 0..60 notation is not used often. also, where does the 60 second interval come from ? if that's config cache update, that could be changed, and we should actually say so here. what about older versions, do they keep on making connections for deleted/disabled hosts, too ? if so, we should document that.
<dimir> The 60 minutes comes from the fact that we check for inactive hosts once an hour. So 0..60 depends on the time when Zabbix server was started. Documentation updated. RESOLVED
<richlv> oh, i was sure it's in seconds
<dimir> https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/ipmi RESOLVED
<richlv> yay, that should help users a lot - thank you |
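The "24 hours + 0..60 minutes" window in (3) follows from the hourly check: a connection is dropped at the first check that finds it at least one day old. A small sketch of that arithmetic, assuming the checks fire at whole-hour offsets from server start (as the reply from dimir suggests) and that the last IPMI check happened after the server started; `removal_time` and its parameters are hypothetical names.

```python
import math


def removal_time(last_check, server_start, max_inactive=24 * 3600, interval=3600):
    """Time (in seconds) of the first periodic check at which the connection is dropped.

    Assumes checks run at server_start + k * interval and last_check >= server_start.
    """
    earliest = last_check + max_inactive              # moment the connection becomes old enough
    ticks = math.ceil((earliest - server_start) / interval)
    return server_start + ticks * interval


# Last IPMI check 30 minutes after server start: dropped at the 25-hour tick,
# i.e. 24 hours 30 minutes after the last check.
print(removal_time(last_check=1800, server_start=0) - 1800)   # 88200 seconds
```

If the last check falls exactly on a tick, the extra delay is zero. If it falls just after a tick, the delay approaches the full 60 minutes.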
Comment by dimir [ 2015 Jul 30 ] |
Successfully tested. Please review my changes in r54613 and r54620. |
Comment by richlv [ 2015 Aug 03 ] |
(4) after some discussion on irc, i'd like to raise the fact that one day seems like a very long period. on one hand, we could imagine a user who sets up some initial ipmi monitoring that's too aggressive, disables or deletes that host when it starts killing the endpoint... but zabbix would keep on hammering that interface for 23 more hours. another case could be a support call where zabbix is making ipmi connections, somebody could look at items - and no ipmi items would exist. value cache was brought up as an example as that is checked for old items every 24 hours, but that's conceptually different as it is an internal thing and would not potentially impact other systems. what are the potential drawbacks of lowering this time period ? what about ipmi items with an interval longer than one day, how would they be impacted ?
<dimir> I personally agree, but we already discussed it with sasha and he decided to keep it that way. I'm not sure it's worth making this "24 hours" value configurable; I think just lowering it to 3 hours would be much better. 3 as in the often-used 3 attempts before deciding something is not responding.
<sasha> I agree to lowering this period to 3 hours.
<dimir> RESOLVED in r55456
<sasha> Looks good! CLOSED |
Comment by dimir [ 2015 Sep 08 ] |
Fixed in pre-3.0.0alpha2 (r55473). |