[ZBX-13765] Network Discovery Action - Unexpected message contents are being sent Created: 2018 Apr 18  Updated: 2024 Apr 10  Resolved: 2019 May 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.0.0alpha5
Fix Version/s: 3.0.19rc1, 3.4.11rc1, 4.0.0alpha8, 4.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Aleksejs Petrovs Assignee: Andris Mednis
Resolution: Fixed Votes: 0
Labels: action, notifications
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File email-sending-upon-device-discovered.png    
Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 32, Sprint 33, Sprint 34, Sprint 35, Sprint 41, Sprint 42, Sprint 46, Nov 2018, Sprint 47, Dec 2018, Sprint 48, Jan 2019, Sprint 49 (Feb 2019), Sprint 50 (Mar 2019), Sprint 51 (Apr 2019), Sprint 52 (May 2019)
Story Points: 7

 Description   

Steps to reproduce:
1. Configure the Network Discovery to discover the service, for example - HTTP
2. Create an action to send a notification with the condition "Discovery state = Discovered"

Result:
The message contains:

Discovery rule: Test

Device IP:192.168.3.10
Device DNS:
Device status: DOWN
Device uptime: 29m

Device service name: HTTP
Device service port: 80
Device service status: UP
Device service uptime: 0m

Action: Create host
Action ID: 8

Expected:
The message should contain the information that the device is UP, not DOWN



 Comments   
Comment by Alexander Vladishev [ 2018 Apr 20 ]

It works as documented.

Comment by Alexander Vladishev [ 2018 Apr 20 ]

I close this issue as "Won't fix".

Comment by richlv [ 2018 May 08 ]

Alexander, could you please expand on how this matches the documentation? There seem to be two discrepancies:

Problem 1. Why are up/down alerts sent when the condition limits to "discovered" only?
Problem 2. How does the device get marked as down in the first place if there is at least one service up on that device?

Comment by Alexander Vladishev [ 2018 May 08 ]

richlv, I misunderstood the problem. You are absolutely right. If the service is active, the device must not be in the DOWN state. It will be fixed soon.

Comment by richlv [ 2018 May 08 ]

Alexander, thank you for the quick reply. What about the alerts being sent about up/down events when the action has a condition "Discovery state = Discovered"?

Comment by Andris Mednis [ 2018 May 15 ]

Cannot yet reproduce on 4.0.0alpha5.
Has anyone reproduced it, or has it only been observed occasionally?

Comment by Andris Mednis [ 2018 May 15 ]

richlv, what is wrong with "alerts being sent about up/down events when the action has a condition 'Discovery state = Discovered'"?
A user wants to get notified upon "Discovery state = Discovered" events.
By default, the {DISCOVERY.DEVICE.STATUS} macro is inserted in the e-mail subject and body, and its value is resolved from the database when the e-mail is sent.
What is wrong with that? What is expected?

Comment by richlv [ 2018 May 15 ]

andris, the "discovered" type of event is supposed to be only generated when a device or service is first discovered. All further occurrences are expected to be "up" type events.
In this case, even though the discovery action had a condition limiting notifications only to "discovered" events, alerts about "up" and "down" events are sent every discovery cycle.

Comment by Andris Mednis [ 2018 May 16 ]

Thanks, richlv! Now I understand the Problem 1.

It can be reproduced:

  • create a discovery rule, e.g. to discover LDAP service on some host.
  • create action with condition "Discovery status = Discovered" and operation "Send message".

When a working LDAP service is discovered, a user gets 2 e-mails. While the service is running, there are no new e-mails. Now, switch the LDAP service on and off periodically.

Whenever the discovery rule runs and detects that the service was DOWN and is now UP, the user gets e-mails again. So, from the user's perspective, the discovery notification has effectively degraded into a status notification.

Comment by Andris Mednis [ 2018 May 16 ]

Unfortunately, it seems that Problem 1 cannot be fixed easily. It is not a mere bugfix but rather new development, affecting the discoverer, actions, events (a new event FIRST_DISCOVERED?), and possibly the housekeeper.

Comment by Andris Mednis [ 2018 May 16 ]

Problem 2 seems to be a race condition: in normal operation it happens only sometimes. It can be reproduced reliably as follows:

  • In 4.0.0alpha5, insert an artificial 10-second sleep in process_rule() between process_checks() and discovery_update_host():
    Index: src/zabbix_server/discoverer/discoverer.c
    ===================================================================
    --- src/zabbix_server/discoverer/discoverer.c	(revision 80861)
    +++ src/zabbix_server/discoverer/discoverer.c	(working copy)
    @@ -557,7 +557,7 @@
     
     				goto out;
     			}
    -
    +zbx_sleep(10);
     			if (0 != (program_type & ZBX_PROGRAM_TYPE_SERVER))
     				discovery_update_host(&dhost, host_status, now);
     			else if (0 != (program_type & ZBX_PROGRAM_TYPE_PROXY))
    

    and recompile.

  • StartDiscoverers=1 (one discoverer process is enough).
  • create 2 service discovery rules, e.g.
     	Discover LDAP	127.0.0.1	10	LDAP	Enabled
    	Discover SSH	127.0.0.1	10	SSH	Enabled
    
  • create action with condition "Discovery status = Discovered", and operation "Send message to users" with default subject/message.
  • Start Zabbix server and endlessly switch the services UP/DOWN in opposite directions:
    # while [ 1 ]; do /etc/init.d/slapd start; /etc/init.d/ssh stop; sleep 2; /etc/init.d/slapd stop; /etc/init.d/ssh start; sleep 2; done
    

Soon e-mails like this start arriving:

Discovery rule: Discover SSH

Device IP:127.0.0.1
Device DNS: localhost
Device status: DOWN   <---- DOWN !?
Device uptime: 0m

Device service name: SSH
Device service port: 22
Device service status: UP     <---- ok, good
Device service uptime: 0m
Comment by richlv [ 2018 May 16 ]

Thank you for looking into this. Could you please expand on why problem 1 is complicated?

Comment by Andris Mednis [ 2018 May 17 ]

Problem 2 was also reproduced on version 3.0.17 (with the artificial sleep inserted as described above), using discovery rules on a proxy. (On 4.0.0alpha5 it was observed with discovery rules on the server.)

It goes as follows:

  1. The discoverer's process_checks() calls process_check(), which goes through the port range to discover the specified service. When a service is discovered on some port, events are created and written into the DB. This can produce records in the "escalations" table. However, after a known service is discovered, its host status is not yet updated in the DB.
  2. The escalator process may start processing escalations at this point (it is not coordinated with the discoverer processes). The escalator prepares e-mails to users with "Service status: UP", but when it resolves the macro DISCOVERY.DEVICE.STATUS, the value in the DB may still be DOWN, and the user gets a confusing e-mail: "Device: DOWN, service: UP".
  3. Finally, the discoverer updates the host status to UP, but the escalator has already picked up the confusing data.
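
The ordering described in the three steps above can be illustrated with a small toy model. This is only a sketch in Python, not Zabbix code: the dictionary standing in for the DB and all function names are hypothetical, and it assumes the service status is already visible as UP when the escalation record is written.

```python
# Toy model of the interleaving described above; not Zabbix code.
# All names here (make_db, discoverer_write_events, ...) are hypothetical.

def make_db():
    # Assumed state before the discovery cycle: host and service both DOWN.
    return {"dhost_status": "DOWN", "dservice_status": "DOWN", "escalations": []}

def discoverer_write_events(db):
    # Step 1: the service check succeeds; the event and an escalation are
    # written, but the host status is not touched yet.
    db["dservice_status"] = "UP"
    db["escalations"].append("service discovered")

def discoverer_update_host(db):
    # Step 3: the host status is finally updated.
    db["dhost_status"] = "UP"

def escalator_build_message(db):
    # Step 2: macros are resolved from the DB at the moment of sending.
    return "Device status: {}\nDevice service status: {}".format(
        db["dhost_status"], db["dservice_status"])

def run(escalator_runs_between_steps):
    db = make_db()
    discoverer_write_events(db)
    if escalator_runs_between_steps:
        # The escalator wakes up in the window between steps 1 and 3.
        message = escalator_build_message(db)
        discoverer_update_host(db)
    else:
        discoverer_update_host(db)
        message = escalator_build_message(db)
    return message

if __name__ == "__main__":
    print(run(escalator_runs_between_steps=True))
    print(run(escalator_runs_between_steps=False))
```

With the escalator running inside the window, the message combines a DOWN device with an UP service, which matches the e-mail shown earlier. The artificial sleep in the reproduction steps simply widens that window.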
Comment by Andris Mednis [ 2018 May 24 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-13765-30 (3.0).

Comment by richlv [ 2018 May 30 ]

andris, thank you for looking into this. Does the fix solve both problems? Could you please expand on why problem 1 was complicated?

Comment by Andris Mednis [ 2018 May 30 ]

The fix is an attempt to solve only Problem 2.

Comment by Andris Mednis [ 2018 May 30 ]

Problem 1 does not seem to be a bugfix but rather a new feature from a development point of view. As far as I understand the discoverer code, there is no separation of "service discovered for the first time" from "service discovered after a period of inaccessibility": when a network discovery rule runs and detects a change in status, it is either a DISCOVERED or a LOST event, with the e-mails you complain about. I also do not see a separation between "a previously discovered service has not been seen for a long time (configurable), let's forget it as if it was never there" and "a previously discovered service has not been seen for a short time, keep its data, maybe it will become available again", as long as its discovery rule is not deleted.
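
To make the point concrete, here is a minimal sketch of the behaviour as described in this comment. It is a toy model in Python, not the discoverer code: the function names are hypothetical, and returning None for an unchanged status is a simplification of this sketch.

```python
# Toy model, not Zabbix code. Names and the None-for-no-change convention
# are hypothetical and only illustrate the comment above.
DISCOVERED, LOST = "DISCOVERED", "LOST"

def status_change_event(previous, current):
    """Return the event for one discovery cycle, or None if nothing changed.

    previous is None for a service that has never been seen before.
    """
    if current == "UP" and previous in (None, "DOWN"):
        # First sighting and "back after an outage" are not told apart.
        return DISCOVERED
    if current == "DOWN" and previous == "UP":
        return LOST
    return None

def replay(statuses):
    # Feed a sequence of per-cycle statuses through the model.
    events = []
    previous = None
    for current in statuses:
        events.append(status_change_event(previous, current))
        previous = current
    return events

if __name__ == "__main__":
    print(replay(["UP", "UP", "DOWN", "UP"]))
```

In this model a service that goes DOWN and comes back UP produces a second DISCOVERED event, so an action limited to "Discovered" still fires on every recovery. Separating the two cases would need extra state about whether the service had been seen before, which is the new development mentioned above.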

Comment by Andris Mednis [ 2018 Jun 06 ]

Available in versions:

  • pre-3.0.19rc1 r81558
  • pre-3.4.11rc1 r81597
  • pre-4.0.0alpha8 (trunk) r81601
Comment by Andris Mednis [ 2018 Jun 06 ]

No documentation changes required.

Comment by richlv [ 2018 Jun 18 ]

Thank you for the fix and the detail, Andri.

Your description of problem 1 matches the documentation, too. The behaviour makes this feature much less useful, though.

A question on the documentation - it states "At least one service of a host is 'up' after all services of that host were 'down'.".
What about the initial discovery?

Comment by richlv [ 2018 Sep 05 ]

"Discovered" event operation also discussed in ZBX-14813.

Comment by Aigars Kadikis [ 2019 Jan 30 ]

I implemented an example like the one in the title by discovering the HTTP service.

When setting an action with only one condition, Discovery state = Discovered, it generates two emails. The first one is:

Discovery rule: behind pi

Device IP: z.w.x.y
Device DNS: 
Device status: UP
Device uptime: 0m

Device service name: *UNKNOWN*
Device service port: *UNKNOWN*
Device service status: *UNKNOWN*
Device service uptime: *UNKNOWN*

And the second email is:

Discovery rule: behind pi

Device IP: z.w.x.y
Device DNS: 
Device status: UP
Device uptime: 0m

Device service name: HTTP
Device service port: 80
Device service status: UP
Device service uptime: 0m

I found that we can work around this issue by using two conditions, Discovery status = Discovered and Service type:

This will generate only one email.

Comment by Andris Mednis [ 2019 May 07 ]

richlv wrote:

A question on the documentation - it states "At least one service of a host is 'up' after all services of that host were 'down'.".
What about the initial discovery?

I propose to add to description of "Host Discovered" event in the table at https://www.zabbix.com/documentation/4.2/manual/discovery/network_discovery:

At least one service of a host is 'up' after all services of that host were 'down' or a service is discovered which belongs to a not registered host.

Comment by Andris Mednis [ 2019 May 10 ]

Modified https://www.zabbix.com/documentation/4.2/manual/discovery/network_discovery#discovery: added "or a service is discovered which belongs to a not registered host." to description of "Host Discovered" event.

martins-v Thanks, andris and richlv, added to other documentation versions as well.

Generated at Tue Apr 16 21:02:41 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.