[ZBX-10759] function "update_triggers_status_to_unknown" updates trigger prototypes, which is wrong Created: 2016 May 06  Updated: 2017 May 30  Resolved: 2016 Aug 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.17, 2.2.12, 2.4.8, 3.0.2
Fix Version/s: 2.2.15rc1, 3.0.5rc1

Type: Incident report Priority: Minor
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: triggerprototypes, unavailable
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ZBX-10759.diff    

 Description   

When host become unavailable, zabbix server marks some triggers (not time based) as unknown.
The problem is that a select query does not limit triggers by normal and lld-created ones, so trigger prototypes being updated too, which is bad itself and may cause other problems.

For example it caused a crash which is reported in ZBX-10692.

Another problem that internal events and escalation (if internal action for triggers does exist) being generated and alerts being send, although for users such alert looks like ones for already deleted trigger, which may mislead too.

Later, such trigger prototype values will never be cleaned out, which is reported as ZBX-10761 to be somehow fixed.

See debug sections below.

Attached patch fixes the issue.



 Comments   
Comment by Oleksii Zagorskyi [ 2016 May 06 ]

Before fix, triggerid=13646 is a trigger prototype, other 2 are ll-generated ones:
you can see filtered activity for "unreachable poller" and then for "escalator":

 23107:20160506:135845.206 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 0 values in 0.000055 sec, getting values]'
 23107:20160506:135845.206 In get_values()
 23107:20160506:135845.206 In DCconfig_get_poller_items() poller_type:1
 23107:20160506:135845.206 End of DCconfig_get_poller_items():1
 23107:20160506:135845.206 In substitute_key_macros() data:'vfs.fs.size[/,pfree]'
 23107:20160506:135845.206 End of substitute_key_macros():SUCCEED data:'vfs.fs.size[/,pfree]'
 23107:20160506:135845.206 In substitute_simple_macros() data:'10050'
 23107:20160506:135845.206 In get_value() key:'vfs.fs.size[/,pfree]'
 23107:20160506:135845.206 In get_value_agent() host:'test-host2' addr:'127.0.0.1' key:'vfs.fs.size[/,pfree]' conn:'unencrypted'
 23107:20160506:135845.206 End of get_value_agent():NETWORK_ERROR
 23107:20160506:135845.206 Item [test-host2:vfs.fs.size[/,pfree]] error: Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused
 23107:20160506:135845.206 End of get_value():NETWORK_ERROR
 23107:20160506:135845.206 In deactivate_host() hostid:10113 itemid:23843 type:0
 23107:20160506:135845.206 query [txnlev:1] [begin;]
 23107:20160506:135845.206 query [txnlev:1] [update hosts set available=2,error='Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused',disable_until=1462532385 where hostid=10113]
 23107:20160506:135845.207 query [txnlev:1] [commit;]
 23107:20160506:135845.207 temporarily disabling Zabbix agent checks on host "test-host2": host unavailable
 23107:20160506:135845.207 In update_triggers_status_to_unknown() hostid:10113
 23107:20160506:135845.207 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.priority,t.type,t.value,t.state,t.error,t.lastchange from items i,functions f,triggers t,hosts h where i.itemid=f.itemid and f.triggerid=t.triggerid and i.hostid=h.hostid and i.status=0 and i.state=0 and i.type in (0) and f.function not in ('nodata','date','dayofmonth','dayofweek','time','now') and t.status=0 and h.hostid=10113 and h.status=0 and not exists (select 1 from functions f2,items i2,hosts h2 where f2.triggerid=f.triggerid and f2.itemid=i2.itemid and i2.hostid=h2.hostid and (f2.function in ('nodata','date','dayofmonth','dayofweek','time','now') or (i2.type not in (0) and (i2.type not in (0,1,4,6,12,16) or (i2.type in (0) and h2.available=1) or (i2.type in (1,4,6) and h2.snmp_available=1) or (i2.type in (12) and h2.ipmi_available=1) or (i2.type in (16) and h2.jmx_available=1)))) and i2.status=0 and i2.state=0 and h2.status=0) order by t.triggerid]
 23107:20160506:135845.207 In process_trigger() triggerid:13646 value:0(0) new_value:2
 23107:20160506:135845.207 End of process_trigger():SUCCEED
 23107:20160506:135845.207 query [txnlev:1] [begin;]
 23107:20160506:135845.207 query [txnlev:1] [update triggers set state=1,error='Agent is unavailable.' where triggerid=13646]
 23107:20160506:135845.207 query [txnlev:1] [commit;]
 23107:20160506:135845.208 In process_trigger() triggerid:13647 value:0(0) new_value:2
 23107:20160506:135845.208 End of process_trigger():SUCCEED
 23107:20160506:135845.208 query [txnlev:1] [begin;]
 23107:20160506:135845.208 query [txnlev:1] [update triggers set state=1,error='Agent is unavailable.' where triggerid=13647]
 23107:20160506:135845.208 query [txnlev:1] [commit;]
 23107:20160506:135845.208 In process_trigger() triggerid:13648 value:0(0) new_value:2
 23107:20160506:135845.208 End of process_trigger():SUCCEED
 23107:20160506:135845.208 query [txnlev:1] [begin;]
 23107:20160506:135845.208 query [txnlev:1] [update triggers set state=1,error='Agent is unavailable.' where triggerid=13648]
 23107:20160506:135845.208 query [txnlev:1] [commit;]
 23107:20160506:135845.208 query [txnlev:1] [begin;]
 23107:20160506:135845.208 In process_events() events_num:3
 23107:20160506:135845.208 In DCget_nextid() table:'events' num:3
 23107:20160506:135845.208 query [txnlev:1] [select max(eventid) from events where eventid between 0 and 9223372036854775807]
 23107:20160506:135845.208 End of DCget_nextid() table:'events' [71:73]
 23107:20160506:135845.208 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (71,3,0,13646,1462532325,206661718,1),(72,3,0,13647,1462532325,206661718,1),(73,3,0,13648,1462532325,206661718,1);
]
 23107:20160506:135845.208 In process_actions() events_num:3
 23107:20160506:135845.208 In zbx_dc_get_actions_eval()
 23107:20160506:135845.208 End of zbx_dc_get_actions_eval() actions:1
 23107:20160506:135845.208 In check_action_conditions() actionid:8
 23107:20160506:135845.208 In check_action_condition() actionid:8 conditionid:12 cond.value:'4'
 23107:20160506:135845.208 In check_internal_condition()
 23107:20160506:135845.208 End of check_internal_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_conditions():SUCCEED
 23107:20160506:135845.208 In DCget_nextid() table:'escalations' num:1
 23107:20160506:135845.208 query [txnlev:1] [select max(escalationid) from escalations where escalationid between 0 and 9223372036854775807]
 23107:20160506:135845.208 End of DCget_nextid() table:'escalations' [1:1]
 23107:20160506:135845.208 In check_action_conditions() actionid:8
 23107:20160506:135845.208 In check_action_condition() actionid:8 conditionid:12 cond.value:'4'
 23107:20160506:135845.208 In check_internal_condition()
 23107:20160506:135845.208 End of check_internal_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_conditions():SUCCEED
 23107:20160506:135845.208 In DCget_nextid() table:'escalations' num:1
 23107:20160506:135845.208 End of DCget_nextid() table:'escalations' [2:2]
 23107:20160506:135845.208 In check_action_conditions() actionid:8
 23107:20160506:135845.208 In check_action_condition() actionid:8 conditionid:12 cond.value:'4'
 23107:20160506:135845.208 In check_internal_condition()
 23107:20160506:135845.208 End of check_internal_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_condition():SUCCEED
 23107:20160506:135845.208 End of check_action_conditions():SUCCEED
 23107:20160506:135845.208 In DCget_nextid() table:'escalations' num:1
 23107:20160506:135845.208 End of DCget_nextid() table:'escalations' [3:3]
 23107:20160506:135845.208 query [txnlev:1] [insert into escalations (escalationid,actionid,status,triggerid,itemid,eventid,r_eventid) values (1,8,0,13646,null,71,null),(2,8,0,13647,null,72,null),(3,8,0,13648,null,73,null);
]
 23107:20160506:135845.208 End of process_actions()
 23107:20160506:135845.208 In DBupdate_itservices()
 23107:20160506:135845.208 End of DBupdate_itservices():SUCCEED
 23107:20160506:135845.208 End of process_events()
 23107:20160506:135845.208 query [txnlev:1] [commit;]
 23107:20160506:135845.208 End of update_triggers_status_to_unknown()
 23107:20160506:135845.208 deactivate_host() errors_from:1462532280 available:2
 23107:20160506:135845.209 End of deactivate_host()
 23107:20160506:135845.209 End of get_values():1
 23107:20160506:135845.209 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 1 values in 0.002552 sec, getting values]'
 23107:20160506:135845.209 In get_values()
 23107:20160506:135845.209 In DCconfig_get_poller_items() poller_type:1
 23107:20160506:135845.209 End of DCconfig_get_poller_items():0
 23107:20160506:135845.209 In DCconfig_get_poller_nextcheck() poller_type:1
 23107:20160506:135845.209 End of DCconfig_get_poller_nextcheck():1462532385
 23107:20160506:135845.209 End of get_values():0
 23107:20160506:135845.209 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 0 values in 0.000055 sec, idle 5 sec]'

 
 23127:20160506:135845.866 __zbx_zbx_setproctitle() title:'escalator #1 [processed 0 escalations in 0.000629 sec, processing escalations]'
 23127:20160506:135845.866 In process_escalations()
 23127:20160506:135845.866 query [txnlev:0] [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations where triggerid is not null order by actionid,triggerid,itemid,escalationid]
 23127:20160506:135845.866 query [txnlev:1] [begin;]
 23127:20160506:135845.866 In check_escalation() escalationid:1 status:active
 23127:20160506:135845.867 query [txnlev:1] [select source,object from events where eventid=71]
 23127:20160506:135845.867 query [txnlev:1] [select actionid,name,status,eventsource,esc_period,def_shortdata,def_longdata,r_shortdata,r_longdata,recovery_msg from actions where actionid=8]
 23127:20160506:135845.867 End of check_escalation():SUCCEED error: 'trigger id:13646 deleted.'
 23127:20160506:135845.867 In execute_escalation() escalationid:1 status:active
 23127:20160506:135845.867 query [txnlev:1] [select eventid,source,object,objectid,clock,value,acknowledged,ns from events where eventid=71]
 23127:20160506:135845.867 query [txnlev:1] [select description,expression,priority,comments,url from triggers where triggerid=13646]
 23127:20160506:135845.867 In execute_operations()
 23127:20160506:135845.867 query [txnlev:1] [select o.operationid,o.operationtype,o.esc_period,o.evaltype,m.operationid,m.default_msg,subject,message,mediatypeid from operations o left join opmessage m on m.operationid=o.operationid where o.actionid=8 and o.operationtype in (0,1) and o.esc_step_from<=1 and (o.esc_step_to=0 or o.esc_step_to>=1)]
 23127:20160506:135845.868 In check_operation_conditions() operationid:8
 23127:20160506:135845.868 query [txnlev:1] [select conditiontype,operator,value from opconditions where operationid=8 order by conditiontype]
 23127:20160506:135845.868 End of check_operation_conditions():SUCCEED
 23127:20160506:135845.868 Conditions match our event. Execute operation.
 23127:20160506:135845.868 In add_object_msg()
 23127:20160506:135845.868 query [txnlev:1] [select userid from opmessage_usr where operationid=8 union select g.userid from opmessage_grp m,users_groups g where m.usrgrpid=g.usrgrpid and m.operationid=8]
 23127:20160506:135845.868 query [txnlev:1] [select count(*) from usrgrp g,users_groups ug where ug.userid=1 and g.usrgrpid=ug.usrgrpid and g.users_status=1]
 23127:20160506:135845.869 In get_trigger_permission()
 23127:20160506:135845.869 query [txnlev:1] [select distinct i.hostid from items i,functions f where i.itemid=f.itemid and f.triggerid=13646]
 23127:20160506:135845.869 In get_host_permission()
 23127:20160506:135845.869 query [txnlev:1] [select type from users where userid=1]
 23127:20160506:135845.869 End of get_host_permission():rw
 23127:20160506:135845.869 End of get_trigger_permission():rw
 23127:20160506:135845.869 In substitute_simple_macros() data:'NAME'
 23127:20160506:135845.869 In substitute_simple_macros() data:'NOTE: Escalation cancelled: trigger id:13646 deleted.
'
 23127:20160506:135845.869 In add_user_msg()
 23127:20160506:135845.869 End of add_user_msg()
 23127:20160506:135845.869 End of add_object_msg()
 23127:20160506:135845.869 In add_message_alert()
 23127:20160506:135845.869 query [txnlev:1] [select m.mediatypeid,m.sendto,m.severity,m.period,mt.status from media m,media_type mt where m.mediatypeid=mt.mediatypeid and m.active=0 and m.userid=1]
 23127:20160506:135845.869 query [txnlev:1] [select name,surname,alias from users where userid=1]
 23127:20160506:135845.870 In DCget_nextid() table:'alerts' num:1
 23127:20160506:135845.870 query [txnlev:1] [select max(alertid) from alerts where alertid between 0 and 9223372036854775807]
 23127:20160506:135845.870 End of DCget_nextid() table:'alerts' [27:27]
 23127:20160506:135845.870 query [txnlev:1] [insert into alerts (alertid,actionid,eventid,userid,clock,subject,message,status,retries,error,esc_step,alerttype) values (27,8,71,1,1462532325,'NAME','NOTE: Escalation cancelled: trigger id:13646 deleted.
',2,3,'No media defined for user "Zabbix Administrator (Admin)"',1,0);
]

With suggested fix, on "clean" host, so trigger prototype is not included in the select, lld-created triggers do not exist yet:

 21406:20160506:125205.198 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 0 values in 0.000053 sec, getting values]'
 21406:20160506:125205.198 In get_values()
 21406:20160506:125205.198 In DCconfig_get_poller_items() poller_type:1
 21406:20160506:125205.198 End of DCconfig_get_poller_items():1
 21406:20160506:125205.198 In substitute_key_macros() data:'vfs.fs.discovery'
 21406:20160506:125205.198 End of substitute_key_macros():SUCCEED data:'vfs.fs.discovery'
 21406:20160506:125205.198 In substitute_simple_macros() data:'10050'
 21406:20160506:125205.198 In get_value() key:'vfs.fs.discovery'
 21406:20160506:125205.198 In get_value_agent() host:'test-host2' addr:'127.0.0.1' key:'vfs.fs.discovery' conn:'unencrypted'
 21406:20160506:125205.199 End of get_value_agent():NETWORK_ERROR
 21406:20160506:125205.199 Item [test-host2:vfs.fs.discovery] error: Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused
 21406:20160506:125205.199 End of get_value():NETWORK_ERROR
 21406:20160506:125205.199 In deactivate_host() hostid:10113 itemid:23841 type:0
 21406:20160506:125205.199 query [txnlev:1] [begin;]
 21406:20160506:125205.199 query [txnlev:1] [update hosts set available=2,error='Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused',disable_until=1462528385 where hostid=10113]
 21406:20160506:125205.199 query [txnlev:1] [commit;]
 21406:20160506:125205.199 temporarily disabling Zabbix agent checks on host "test-host2": host unavailable
 21406:20160506:125205.199 In update_triggers_status_to_unknown() hostid:10113
 21406:20160506:125205.199 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.priority,t.type,t.value,t.state,t.error,t.lastchange from items i,functions f,triggers t,hosts h where i.itemid=f.itemid and f.triggerid=t.triggerid and i.hostid=h.hostid and i.status=0 and i.state=0 and i.type in (0) and f.function not in ('nodata','date','dayofmonth','dayofweek','time','now') and t.status=0 and t.flags in (0,4) and h.hostid=10113 and h.status=0 and not exists (select 1 from functions f2,items i2,hosts h2 where f2.triggerid=f.triggerid and f2.itemid=i2.itemid and i2.hostid=h2.hostid and (f2.function in ('nodata','date','dayofmonth','dayofweek','time','now') or (i2.type not in (0) and (i2.type not in (0,1,4,6,12,16) or (i2.type in (0) and h2.available=1) or (i2.type in (1,4,6) and h2.snmp_available=1) or (i2.type in (12) and h2.ipmi_available=1) or (i2.type in (16) and h2.jmx_available=1)))) and i2.status=0 and i2.state=0 and h2.status=0) order by t.triggerid]
 21406:20160506:125205.200 query [txnlev:1] [begin;]
 21406:20160506:125205.200 In process_events() events_num:0
 21406:20160506:125205.200 End of process_events()
 21406:20160506:125205.200 query [txnlev:1] [commit;]
 21406:20160506:125205.200 End of update_triggers_status_to_unknown()
 21406:20160506:125205.200 deactivate_host() errors_from:1462528280 available:2
 21406:20160506:125205.200 End of deactivate_host()
 21406:20160506:125205.200 End of get_values():1
 21406:20160506:125205.200 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 1 values in 0.002030 sec, getting values]'
 21406:20160506:125205.200 In get_values()
 21406:20160506:125205.200 In DCconfig_get_poller_items() poller_type:1
 21406:20160506:125205.200 End of DCconfig_get_poller_items():0
 21406:20160506:125205.200 In DCconfig_get_poller_nextcheck() poller_type:1
 21406:20160506:125205.201 End of DCconfig_get_poller_nextcheck():1462528385
 21406:20160506:125205.201 End of get_values():0
 21406:20160506:125205.201 __zbx_zbx_setproctitle() title:'unreachable poller #1 [got 0 values in 0.000131 sec, idle 5 sec]'
Comment by Sergejs Paskevics [ 2016 Aug 03 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10759

Comment by Andris Zeila [ 2016 Aug 04 ]

Successfully tested

Comment by Sergejs Paskevics [ 2016 Aug 04 ]

Fixed in pre2.2.15rc1 r61416, pre3.0.5rc1 r61418

Generated at Fri Apr 26 19:08:17 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.