  1. ZABBIX BUGS AND ISSUES
  2. ZBX-24549

Zabbix server 6.4.12: possible bug or inefficient value cache



      Hi. I run Zabbix server 6.4.12 as an HA cluster (3 nodes) with PostgreSQL 13.7 and Elasticsearch 7.10.2 as the history storage.

      My server config (performance options):

      StartDBSyncers=100
      StartPollers=200
      StartPreprocessors=200
      StartPollersUnreachable=100
      StartHistoryPollers=100
      StartTrappers=500
      StartPingers=100
      StartDiscoverers=20
      StartHTTPPollers=5
      StartTimers=20
      StartEscalators=70
      StartAlerters=30
      SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
      StartSNMPTrapper=1
      HousekeepingFrequency=1
      MaxHousekeeperDelete=10000
      CacheSize=4G
      HistoryCacheSize=128M
      HistoryIndexCacheSize=128M
      TrendCacheSize=96M
      TrendFunctionCacheSize=64M
      ValueCacheSize=3G
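
      To put the cache settings above in perspective against the 32G of RAM on each server instance, here is a minimal Python sketch that sums the configured *CacheSize parameters. It assumes the K/M/G suffixes are powers of 1024; the helper names (parse_size, cache_sizes) are made up for this illustration and are not part of Zabbix.

```python
import re

# Assumption: size suffixes are multiples of 1024.
SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# Cache-related lines copied from the server config in this report.
CONFIG = """\
CacheSize=4G
HistoryCacheSize=128M
HistoryIndexCacheSize=128M
TrendCacheSize=96M
TrendFunctionCacheSize=64M
ValueCacheSize=3G
"""


def parse_size(text):
    """Convert a value such as '128M' or '3G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return int(number) * SUFFIX[suffix]


def cache_sizes(config):
    """Return {parameter: bytes} for every parameter ending in 'CacheSize'."""
    sizes = {}
    for line in config.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().endswith("CacheSize"):
            sizes[name.strip()] = parse_size(value)
    return sizes


sizes = cache_sizes(CONFIG)
total_mib = sum(sizes.values()) // 1024**2
print(f"configured caches: {total_mib} MiB in total")
```

      Under that assumption the caches add up to roughly 7.4G, of which the value cache is 3G.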

       

      The server gathers most of its data from active proxies.

      Server instances HW config:

      58 CPU 32G RAM

      PostgreSQL: 48 CPU 32G RAM 1.5T NVMe

      Elasticsearch: 48 CPU 64G RAM 1T SSD per node, in a 20-node cluster

       

      This setup works fine until the active HA node restarts. The active node changes and Zabbix server starts on the new node. After that, it seems the Zabbix server starts filling the value cache from the history storage. I check the cache with the runtime command diaginfo=valuecache and see that it is being filled with far too much data per item. While this process is running, the lag in last execution time keeps growing for all items. This switches many triggers to the problem state because of nodata in items. Once history loading finishes, all delays clear quickly and the triggers recover too. The whole process takes about 40-60 minutes, which is critically long.

       

      Regarding HW resource usage: only Elasticsearch shows high resource usage during this process.

      For example, the value cache diagnostics show something like this:

      
      == value cache diagnostic information ==
      Items:1656883 values:29262184 mode:0 time:0.616933
      Memory:
        size: free:2132556224 used:941124992
        chunks: free:337247 used:8884246 min:24 max:2021103920
          buckets:
            24:1085
            32:67598
            40:2665
            48:32936
            56:329
            64:54257
            72:236
            80:64970
            88:232
            96:6
            112:254
            120:8
            128:22
            144:241
            152:1
            160:75
            168:4
            176:4
            192:3312
            200:9
            208:931
            216:10
            224:904
            232:20
            240:802
            248:4
            256+:106332
      Top.values:
        itemid:2484077 values:14082 request.values:1
        itemid:2484074 values:14075 request.values:3
        itemid:2484067 values:14074 request.values:3
        itemid:1855764 values:14065 request.values:3
        itemid:1855767 values:14065 request.values:1
        itemid:1855757 values:14057 request.values:3
        itemid:1224231 values:11038 request.values:2
        itemid:1709840 values:11026 request.values:2
        itemid:3774916 values:10006 request.values:3
        itemid:1249227 values:9404 request.values:3
        itemid:1249220 values:9400 request.values:3
        itemid:1249230 values:9400 request.values:1
        itemid:914472 values:7388 request.values:2
        itemid:3774763 values:7358 request.values:2
        itemid:2690617 values:6737 request.values:5
        itemid:323702 values:4880 request.values:3
        itemid:3774762 values:4416 request.values:6
        itemid:1249154 values:4415 request.values:3
        itemid:1249155 values:4415 request.values:3
        itemid:2374951 values:4414 request.values:3
        itemid:3629937 values:4414 request.values:3
        itemid:2104685 values:4413 request.values:3
        itemid:2975960 values:4409 request.values:2
        itemid:1249211 values:4383 request.values:1
        itemid:2376807 values:4109 request.values:2
      Top.request.values:
        itemid:2142673 values:267 request.values:241
        itemid:2142639 values:159 request.values:145
        itemid:2517005 values:103 request.values:73
        itemid:2516394 values:103 request.values:73
        itemid:2343903 values:103 request.values:73
        itemid:2517004 values:102 request.values:73
        itemid:2516393 values:102 request.values:73
        itemid:2343902 values:102 request.values:73
        itemid:318651 values:102 request.values:73
        itemid:2343262 values:100 request.values:73
        itemid:2333068 values:100 request.values:73
        itemid:318736 values:99 request.values:73
        itemid:318650 values:99 request.values:73
        itemid:354746 values:99 request.values:73
        itemid:3629366 values:98 request.values:73
        itemid:318735 values:98 request.values:73
        itemid:354745 values:98 request.values:73
        itemid:354918 values:98 request.values:73
        itemid:3629365 values:97 request.values:73
        itemid:354831 values:97 request.values:73
        itemid:2343261 values:97 request.values:73
        itemid:354917 values:97 request.values:73
        itemid:318821 values:97 request.values:73
        itemid:379123 values:97 request.values:73
        itemid:354832 values:97 request.values:73
      ==
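
      As a rough way to read the headline figures in the output above, the following Python sketch derives the average number of cached values per item and the share of value cache memory in use. The field layout is taken from this output only; the function name headline_stats is hypothetical.

```python
import re

# First lines of the diaginfo=valuecache output quoted in this report.
SAMPLE = """\
Items:1656883 values:29262184 mode:0 time:0.616933
Memory:
  size: free:2132556224 used:941124992
"""


def headline_stats(text):
    """Derive average values per item and memory usage share from the header."""
    items, values = map(int, re.search(r"Items:(\d+) values:(\d+)", text).groups())
    free, used = map(int, re.search(r"size: free:(\d+) used:(\d+)", text).groups())
    return {
        "avg_values_per_item": values / items,
        "used_share": used / (free + used),
    }


stats = headline_stats(SAMPLE)
print(f"average values per item: {stats['avg_values_per_item']:.2f}")
print(f"value cache memory used: {stats['used_share']:.1%}")
```

      The average is under 20 values per item, so the entries in Top.values with thousands of cached values are far above the typical item.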
      
      

      Let's look at these items from Top.values (see screenshot 1)

      and at the item and trigger config (screenshots 2-4).

      The trigger requires only the last 3 values, so I expected the cache to keep 3 values for each such item, but it actually tries to keep 14082.

      All items in Top.values have the same issue.

      For example, here is one more:

      itemid:3774916 values:10006 request.values:3

      See screenshots 5-8.

      The situation is the same: we need only 3 values, but the cache keeps 10006 and tries to load them on server start.
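
      The mismatch described above can be checked for every entry at once. Below is a minimal Python sketch that parses lines in the Top.values format and flags items whose cached value count is much larger than request.values. It assumes, as this report does, that request.values is the number of values the item's trigger functions ask for; the function name overcached and the factor threshold are made up for illustration.

```python
import re

# A few entries copied from the diagnostic output in this report.
TOP_VALUES = """\
itemid:2484077 values:14082 request.values:1
itemid:2484074 values:14075 request.values:3
itemid:3774916 values:10006 request.values:3
itemid:2142673 values:267 request.values:241
"""

ENTRY = re.compile(r"itemid:(\d+) values:(\d+) request\.values:(\d+)")


def overcached(text, factor=10):
    """Return (itemid, cached, requested, ratio) where cached > factor * requested."""
    rows = []
    for match in ENTRY.finditer(text):
        itemid, cached, requested = map(int, match.groups())
        if requested and cached > factor * requested:
            rows.append((itemid, cached, requested, cached / requested))
    return rows


for itemid, cached, requested, ratio in overcached(TOP_VALUES):
    print(f"item {itemid}: {cached} cached vs {requested} requested ({ratio:.0f}x)")
```

      With these sample lines, the three items from Top.values are flagged, while the entry from Top.request.values (267 cached against 241 requested) is not.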

       

        1. ZBX-24549-7.0-test-4.diff
          16 kB
        2. ZBX-24549-7.0-alternative.diff
          15 kB
        3. image-2024-06-18-15-50-10-485.png
          108 kB
        4. image-2024-06-06-11-01-29-977.png
          120 kB
        5. 8.png
          5 kB
        6. 7.png
          65 kB
        7. 6.png
          55 kB
        8. 5.png
          55 kB
        9. 4.png
          24 kB
        10. 3.png
          43 kB
        11. 2.png
          51 kB
        12. 1.png
          44 kB

            wiper Andris Zeila
            artem.kh Artem
            Team A