[ZBX-8149] In DM master server is sending mails with *UNKNOWN* values Created: 2014 Apr 24  Updated: 2017 May 30  Resolved: 2015 Feb 02

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Karol Pucynski Assignee: Unassigned
Resolution: Won't fix Votes: 5
Labels: dm, notifications
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File zabbix_server_node_info.patch     File zabbix_server_node_info_v2.patch    
Issue Links:
Duplicate
is duplicated by ZBX-8553 Trigger: Zabbix agent on *UNKNOWN* Closed

 Description   

We have configured DM. The slave server sends e-mail alerts independently.
When the master server sends an e-mail about the same alert, it does not properly resolve the hostname, IP address and data values.

Data and configuration are synchronized between nodes.

Mail from Master server:


Event: Priv login failed: PROBLEM
Host: UNKNOWN
Service: Priv login failed
State: PROBLEM-2002000000015442
Date: 2014.04.24 12:25:15
Severity: Disaster
ServiceName: Priv login failed, 2002000000013934
EventId: 2002000000015442
New value: UNKNOWN
Last value: UNKNOWN
MAX for 15 minutes: UNKNOWN
MIN for 15 minutes: UNKNOWN


Mail from Slave server:


Status: PROBLEM
Host: Zabbix server (172.16.200.46)
Severity: Disaster

Event ID: 2002000000015442
Event Time: 2014.04.24 - 12:25:15
Event Duration: 0m

Alert Details:
Priv login failed

Last Item Value:
1


Where could the problem be?



 Comments   
Comment by richlv [ 2014 Apr 24 ]

with nodes being removed in zabbix 2.4, this is unlikely to be looked into, sorry

Comment by Karol Pucynski [ 2014 Apr 25 ]

As far as I know Zabbix 2.2 is LTS release.
2.4 should have no impact on this.

Comment by Anton Samets [ 2014 Apr 25 ]

Probably ZBX-8092?

Comment by Karol Pucynski [ 2014 Apr 25 ]

ZBX-8092 is similar, but my hosts are not disabled.
They have status "Monitored" and Hostname, Agent interface and JMX interface fields filled in the configuration.

Comment by Aleksandrs Saveljevs [ 2014 Apr 25 ]

This might be the same issue as ZBX-8092, because hosts from non-local nodes are not kept in configuration cache.

Comment by Karol Pucynski [ 2014 Apr 28 ]

The Zabbix frontend shows the data properly - it seems to be a bug only in the Zabbix server e-mail handling...

Comment by Karol Pucynski [ 2014 Apr 28 ]

It also affects version 2.2.3

Comment by Giovanni Lovato [ 2014 Jun 03 ]

Will this be looked into? We just upgraded a complex distributed architecture with several levels of hierarchy and we will stick with 2.2 for a while since 2.4 and 2.6 won't support multi-level DM.

Comment by Karol Pucynski [ 2014 Jul 03 ]

Is there any info about this issue? I think many people are now stuck on 2.2.X since the next LTS release is far away...

Comment by Christian Wolff [ 2014 Nov 26 ]

After an update from 2.0.13 to 2.2.7 we seem to have the same issue with our node setup. Please have a look at it! Thanks!

Please also add this bug to the known issues section: https://www.zabbix.com/documentation/2.2/manual/installation/known_issues

Comment by Christian Wolff [ 2014 Nov 27 ]

We did some further debugging and here is an example from one of our alarms. Maybe this helps, or someone has an idea about this one.

 14937:20141127:112216.388 In substitute_simple_macros() data:'XXX : {HOST.HOST}: {TRIGGER.NAME}: {TRIGGER.STATUS} ({TRIGGER.SEVERITY}) = {ITEM.LASTVALUE}'
 14937:20141127:112216.388 In DBget_trigger_value()
 14937:20141127:112216.388 In get_N_itemid() expression:'{200200000007468}=0' N_functionid:1
 14937:20141127:112216.388 In get_N_functionid() expression:'{200200000007468}=0' N_functionid:1
 14937:20141127:112216.388 get_N_functionid() functionid:200200000007468
 14937:20141127:112216.388 End of get_N_functionid():SUCCEED
 14937:20141127:112216.388 End of get_N_itemid():FAIL
 14937:20141127:112216.388 End of DBget_trigger_value():FAIL
 14937:20141127:112216.388 cannot resolve macro '{HOST.HOST}'
 14937:20141127:112216.388 In substitute_simple_macros() data:'Puppet Agent not running'
 14937:20141127:112216.388 In DBitem_lastvalue()
 14937:20141127:112216.388 In get_N_itemid() expression:'{200200000007468}=0' N_functionid:1
 14937:20141127:112216.388 In get_N_functionid() expression:'{200200000007468}=0' N_functionid:1
 14937:20141127:112216.388 get_N_functionid() functionid:200200000007468
 14937:20141127:112216.388 End of get_N_functionid():SUCCEED
 14937:20141127:112216.388 End of get_N_itemid():FAIL
 14937:20141127:112216.388 End of DBitem_lastvalue():FAIL
 14937:20141127:112216.388 cannot resolve macro '{ITEM.LASTVALUE}'
 14937:20141127:112216.388 End substitute_simple_macros() data:'XXX: *UNKNOWN*: Puppet Agent not running: OK (Information) = *UNKNOWN*'
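A trace like the one above can be scanned mechanically for unresolved macros. The following is a minimal Python sketch, assuming the debug log layout seen in this excerpt (`PID:YYYYMMDD:HHMMSS.mmm message`); the function name `unresolved_macros` is hypothetical and not part of Zabbix.

```python
import re
from collections import Counter

# Matches lines such as:
#  14937:20141127:112216.388 cannot resolve macro '{HOST.HOST}'
# The layout (pid:date:time message) is assumed from the excerpt above.
UNRESOLVED = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"cannot resolve macro '(?P<macro>\{[^}]+\})'"
)


def unresolved_macros(lines):
    """Count how often each macro could not be resolved."""
    counts = Counter()
    for line in lines:
        match = UNRESOLVED.match(line)
        if match:
            counts[match.group("macro")] += 1
    return counts


# Sample lines copied from the trace above.
sample = [
    " 14937:20141127:112216.388 End of DBget_trigger_value():FAIL",
    " 14937:20141127:112216.388 cannot resolve macro '{HOST.HOST}'",
    " 14937:20141127:112216.388 End of DBitem_lastvalue():FAIL",
    " 14937:20141127:112216.388 cannot resolve macro '{ITEM.LASTVALUE}'",
]

for macro, count in unresolved_macros(sample).items():
    print(macro, count)
```

Running this over a full server log should show which macros fail most often, which may help confirm that only events from remote nodes are affected.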
Comment by Leo Antunes [ 2014 Dec 01 ]

This seems to be caused by the configuration caching mechanism. We came up with the attached patch and it seems to solve the problem.

A bit of background: during the generation of the alarm messages, the "config" structure is used to look for the macros in DBget_trigger_value(). This structure is however only populated with information concerning the current node. That means: if the node is generating alarms for remote nodes, it can't find information about host/item in the "config" structure, leading to the observed symptoms.
The (admittedly dirty) patch attempts to fix this by filling the "config" structure with information about all nodes. After running the patched version of 2.2.7 in production for a few hours we still haven't run into any issues, but YMMV. Since I'm not familiar with the whole code, I can't be sure there aren't any side-effects we still haven't seen.
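The failure mode described above can be illustrated with a toy model. This is a Python sketch, not Zabbix code: the `ConfigCache` class, its methods, the host names and the ID `100100000000001` are hypothetical. The only behaviour taken from this report is that the cache holds data for the local node only and that unresolved macros are rendered as `*UNKNOWN*`.

```python
UNKNOWN = "*UNKNOWN*"


class ConfigCache:
    """Toy stand-in for the server's "config" structure (hypothetical API)."""

    def __init__(self, local_node, all_nodes=False):
        self.local_node = local_node
        self.all_nodes = all_nodes  # True mimics the idea behind the patch
        self.hosts_by_function = {}

    def load(self, rows):
        # rows: (node_id, functionid, host) tuples, as if read from the database
        for node_id, functionid, host in rows:
            if self.all_nodes or node_id == self.local_node:
                self.hosts_by_function[functionid] = host

    def resolve_host(self, functionid):
        # A lookup miss is reported as *UNKNOWN*, as in the mails above
        return self.hosts_by_function.get(functionid, UNKNOWN)


# Illustrative data only: one function on the local node, one on a remote node.
rows = [
    (1, 100100000000001, "master-host"),
    (2, 200200000007468, "remote-host"),
]

master = ConfigCache(local_node=1)
master.load(rows)
print(master.resolve_host(200200000007468))   # *UNKNOWN*

patched = ConfigCache(local_node=1, all_nodes=True)
patched.load(rows)
print(patched.resolve_host(200200000007468))  # remote-host
```

Loading every node's rows, as in the second case, mirrors the approach of the patch in spirit only; the sketch says nothing about how the real cache is used elsewhere in the server.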

Comment by Christian Wolff [ 2014 Dec 01 ]

Thanks again for the good work Leo!

Comment by Leo Antunes [ 2014 Dec 02 ]

Guess I spoke too soon. Just updated the (still dirty) patch with a second try, but it seems the patch creates a new problem: the .nodata() trigger function started triggering on our patched server, even though data is coming in for the affected items.

Unfortunately, we're considering whether to completely abandon the DM setup and just use proxies (or even independent servers), so I'm not sure I'll be able to dedicate any more time trying to ferret out this side-effect.

Comment by richlv [ 2014 Dec 02 ]

note that upgrade to 2.4 will automatically convert child nodes into proxies (as dm is removed in 2.4)

Comment by Christian Wolff [ 2014 Dec 02 ]

Is the historical data from the child nodes still accessible after an upgrade?

Comment by richlv [ 2014 Dec 02 ]

such discussion is out of scope for this issue. any documentation related to the dm removal task should be handled in ZBXNEXT-1343

Comment by richlv [ 2015 Feb 02 ]

with nodes being removed since 2.4, this issue is unlikely to be looked into - closing
