ZABBIX BUGS AND ISSUES

Proxy doesn't include unsupported items in a list of active checks after 'refresh_unsupported' interval expires

Details

  • Zabbix ID:
    RTF

Description

In src/zabbix_server/trapper/active.c, the functions 'send_list_of_active_checks' and 'send_list_of_active_checks_json' are supposed to include unsupported items in the list of active checks for a host after the CONFIG_REFRESH_UNSUPPORTED interval expires. Relevant part of the SQL query: 'select ... from items where ... or (i.status=%d and i.lastclock+%d<=%d) ...'.

The problem is that the .lastclock field of the 'items' table is always NULL, i.e. the proxy doesn't update .lastclock when it receives item values from agents.

To check this, I deleted the proxy database and let the proxy create a new empty one. After some time I ran the query 'select count(*) from items where lastclock is not NULL' and got '0'.
The work-around for this issue is to manually set the .lastclock field to some value in the past, e.g. 0. After that, re-enabling of unsupported items starts working again.

Activity

Łukasz Jernaś added a comment -

This hit us lately as a serious issue, due to MySQL monitoring getting disabled when the database server is down, for doing backups, etc. Almost all of our hosts are monitored via proxies...

michael chan added a comment -

Same problem here. Proxied checks often become unsupported due to timeouts, etc. and never get re-enabled. I have to go in and fix this manually every day, which is unacceptable for a monitoring system. And all my checks are done via proxies, since this is best practice.

Sergei Turchanov added a comment -

Michael, please try my work-around to see if it works for you. Go to the database configured for your proxy and execute:

update items set lastclock = 0 where lastclock is NULL;

After that, re-enabling of unsupported items should work again, although you need to execute this statement every time new hosts or items are added for monitoring through that proxy.

michael chan added a comment - - edited

This does fix the issue - will it cause any issues if the above is done periodically, e.g. done daily via a cronjob?

Sergei Turchanov added a comment - - edited

It shouldn't cause any issues. Agents periodically ask the proxy for a list of active checks. The above-mentioned functions send_list_of_active_checks{,_json} do not include metrics that became unsupported in that list unless sufficient time has passed ("Refresh unsupported items" configuration option, 600 seconds by default). To do so, the lastclock field in the items table holds a timestamp of when the proxy successfully(?) retrieved a metric from an agent. Given that timestamp and the current time, you can measure the interval after which an unsupported metric must be re-included in the list of active checks.
Something has apparently gone wrong in the Zabbix 2.x series, as lastclock is not updated anymore, so any new items that appear in the proxy database now have NULL in that field. SQL arithmetic treats NULL specially: 'i.lastclock+%d<=%d' evaluates to NULL, which the WHERE clause treats as false. So if you set lastclock to some value in the past (0 is OK), you short-circuit the logic that postpones sending unsupported items, and those items will always be included in the list.
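
The NULL behaviour described above can be reproduced outside Zabbix. The following is a minimal sketch using Python's built-in sqlite3 module, not the proxy's actual schema or query: the three-column 'items' table, the sample rows, the fixed 'now' value and the helper name due_items are all made up for illustration, and the status value 3 is assumed to stand for "not supported".

```python
import sqlite3

ITEM_STATUS_NOTSUPPORTED = 3   # assumed status value, for illustration only
REFRESH_UNSUPPORTED = 600      # "Refresh unsupported items" default, in seconds
NOW = 1_000_000                # hypothetical current time

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the proxy's 'items' table (not the real schema).
conn.execute(
    "CREATE TABLE items (itemid INTEGER PRIMARY KEY, status INTEGER, lastclock INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, ITEM_STATUS_NOTSUPPORTED, None),      # lastclock never updated
        (2, ITEM_STATUS_NOTSUPPORTED, 0),         # lastclock far in the past
        (3, ITEM_STATUS_NOTSUPPORTED, NOW - 10),  # refreshed 10 seconds ago
    ],
)

def due_items(conn, now):
    """Return ids of unsupported items whose refresh interval has expired."""
    rows = conn.execute(
        "SELECT itemid FROM items "
        "WHERE status = ? AND lastclock + ? <= ? ORDER BY itemid",
        (ITEM_STATUS_NOTSUPPORTED, REFRESH_UNSUPPORTED, now),
    )
    return [row[0] for row in rows]

# NULL + 600 <= now yields NULL, so item 1 is silently filtered out.
before = due_items(conn, NOW)

# The work-around from this thread.
conn.execute("UPDATE items SET lastclock = 0 WHERE lastclock IS NULL")
after = due_items(conn, NOW)

print(before, after)
```

Item 1 is missing from the first result and present in the second; item 3 is excluded both times because its interval has not expired yet.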

P.S. If you use SQLite as your proxy DB backend, you may need to implement retry logic in the cron job, because SQLite sometimes reports 'Database is locked' when zabbix-proxy updates the database at the same time.
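
A rough sketch of such retry logic, assuming the cron job is a small Python script using the standard sqlite3 module. The function name reset_lastclock, the number of attempts and the delays are arbitrary choices for illustration; only the UPDATE statement comes from this thread.

```python
import sqlite3
import time

WORKAROUND_SQL = "UPDATE items SET lastclock = 0 WHERE lastclock IS NULL"

def reset_lastclock(db_path, attempts=5, delay=1.0):
    """Apply the work-around, retrying while SQLite reports a locked database.

    Returns the number of rows that were updated.
    """
    for attempt in range(1, attempts + 1):
        # 'timeout' makes SQLite itself wait for a lock before giving up.
        conn = sqlite3.connect(db_path, timeout=5)
        try:
            with conn:  # commits on success, rolls back on an exception
                return conn.execute(WORKAROUND_SQL).rowcount
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if "locked" not in str(exc).lower() or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # back off a little more each time
        finally:
            conn.close()
```

The script would call reset_lastclock() with the path to the proxy's SQLite file, which depends on the local proxy configuration.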

Alexander Vladishev added a comment -

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-5149

dimir added a comment - - edited

Successfully tested.

We decided to fix it by adding an item's "lastclock" value to the cache. So the proxy still won't update "lastclock" in the database, but will keep it in the cache instead. The cached "lastclock" will also be available on the server side.

This is a very important fix. Before it, if you had a proxy (any, active or passive) working with an active agent and any of your items went NOT_SUPPORTED, they would never come back from that status (unless you, e.g., set "lastclock" to 0 in the proxy DB manually or changed the item status manually, etc.).
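
To illustrate the idea only (this is not the actual Zabbix code, which is written in C and not shown in this issue), here is a minimal Python sketch of keeping "lastclock" in memory instead of in the database. The class name, method names and the choice to treat a never-seen item as lastclock 0 are assumptions made for the example.

```python
REFRESH_UNSUPPORTED = 600  # "Refresh unsupported items" default, in seconds

class LastclockCache:
    """In-memory map of itemid -> timestamp of the last received value."""

    def __init__(self):
        self._lastclock = {}

    def record_value(self, itemid, clock):
        # Called whenever a value for the item arrives from an agent.
        self._lastclock[itemid] = clock

    def is_due(self, itemid, now):
        # Assumption: an item with no recorded value counts as lastclock 0,
        # so it is never stuck the way a NULL database field was.
        lastclock = self._lastclock.get(itemid, 0)
        return lastclock + REFRESH_UNSUPPORTED <= now

cache = LastclockCache()
cache.record_value(42, 1000)
print(cache.is_due(42, 1500), cache.is_due(42, 1600), cache.is_due(7, 1500))
```

With the check done against the cache, a missing database value no longer prevents an unsupported item from being re-included after the refresh interval.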

Alexander Vladishev added a comment -

Fixed in versions pre-2.0.4 r31543 and pre-2.1.0 (trunk) r31547.

