LLD worker hangs on 90% CPU usage when linking template to existing hosts

    • Type: Problem report
    • Resolution: Unresolved
    • Priority: Trivial
    • Affects Version/s: 8.0.0alpha2 (master)
    • Component/s: Server (S)

      When 100k hosts have been generated by LLD, a template is then added to the host prototype that produced them, and the same LLD rule is re-run, the LLD worker process hangs at around 90% CPU usage. No templates get linked and the server produces no logs.

      Precondition: 

      Have 100k hosts generated via LLD

      Steps to reproduce: 

      1. Add a template to the host prototype you used 
      2. Re-run the same LLD rule so that it rediscovers the 100k hosts it already created
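
For anyone scripting step 2 instead of clicking "Execute now" in the frontend, here is a minimal sketch of the JSON-RPC body. It assumes the `task.create` API method with task type 6 ("Execute now") as documented for recent Zabbix versions. The function name and the item ID `12345` are hypothetical placeholders, and sending the request (URL, API token) is deliberately left out.

```python
import json


def build_execute_now_request(lld_rule_itemid: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC body asking the server to run one LLD rule immediately.

    Assumption: task type 6 means "Execute now" and the rule is addressed
    through request.itemid, as in the documented task.create API.
    """
    return {
        "jsonrpc": "2.0",
        "method": "task.create",
        "params": [{"type": 6, "request": {"itemid": lld_rule_itemid}}],
        "id": request_id,
    }


if __name__ == "__main__":
    # "12345" is a placeholder; use the item ID of the LLD rule under test.
    print(json.dumps(build_execute_now_request("12345"), indent=2))
```

The body would be POSTed to the frontend's `api_jsonrpc.php` endpoint with a valid API token. The exact parameter shape may differ between versions, so check it against the API documentation for the build being tested.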

      Expected result: 

      1. Template is added to the already generated hosts
      2. Some type of logs are produced if there are any kind of errors or slow queries

      Actual result: 

      1. LLD worker process occupies near 90% CPU for over 10 minutes until the server is manually stopped
      2. During that time the server produces no logs
      3. No templates get linked

      Excerpt from server logs (16:16 is when I ran the LLD rule the second time, and 16:26 is when I manually stopped the server)

      97484:20251203:161633.670 server #57 started [configuration syncer worker #1]
      97463:20251203:161633.670 server #37 started [poller #5]
      97476:20251203:161633.671 server #49 started [history poller #4]
      97483:20251203:161633.672 thread started
      97482:20251203:161633.673 thread started
      97367:20251203:162614.442 Got signal [signal:15(SIGTERM),sender_pid:98225,sender_uid:1000,reason:0]. Exiting ...
      97432:20251203:162614.443 thread stopped [discovery worker #2]
      97432:20251203:162614.444 thread stopped [discovery worker #5]
      97436:20251203:162614.444 syncing history data in progress...
      97436:20251203:162614.444 syncing history data done
      97481:20251203:162614.444 thread stopped
      97483:20251203:162614.445 thread stopped
      97432:20251203:162614.447 thread stopped [discovery worker #4]
      97406:20251203:162614.447 cannot read alert manager service request
      97402:20251203:162614.447 cannot read alert manager service request
      97403:20251203:162614.448 cannot read alert manager service request
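
To put a number on the silent period in the excerpt above, here is a small sketch that parses the `PID:YYYYMMDD:HHMMSS.mmm` prefix of each log line and reports the longest gap between consecutive entries. The function names are made up for illustration, and the line layout is assumed from the excerpt only.

```python
from datetime import datetime


def parse_log_time(line: str) -> datetime:
    """Extract the timestamp from a 'PID:YYYYMMDD:HHMMSS.mmm message' line."""
    _pid, date_part, rest = line.strip().split(":", 2)
    time_part = rest.split(" ", 1)[0]
    return datetime.strptime(f"{date_part.strip()} {time_part}", "%Y%m%d %H%M%S.%f")


def longest_silence(lines: list[str]) -> float:
    """Return the largest gap, in seconds, between consecutive log lines."""
    times = [parse_log_time(line) for line in lines if line.strip()]
    return max((b - a).total_seconds() for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    excerpt = [
        "97482:20251203:161633.673 thread started",
        "97367:20251203:162614.442 Got signal [signal:15(SIGTERM)]. Exiting ...",
    ]
    print(f"longest silence: {longest_silence(excerpt):.3f} s")
```

Fed the last line before the gap and the SIGTERM line, this gives 580.769 s, so the server wrote nothing for roughly 9 minutes 41 seconds before it was stopped.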

      Regarding the server config, the only things that were changed were the LLD worker count (down to 1) and cache size (up to 512M)

            Assignee:
            Zabbix Support Team
            Reporter:
            Anna Grinhofa