ZABBIX BUGS AND ISSUES / ZBX-21214

Zabbix Agent 2 crashes on 5.15 kernel

    • Type: Problem report
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • 6.0.5
    • Agent (G)
    • None
    • Proxmox 7 with 5.15.5-1-pve kernel
    • Team A
    • Sprint 91 (Aug 2022)
    • 1

       

      Zabbix Agent 2 crashed on Proxmox 7 with the 5.15.5-1-pve kernel.

      Kernel 5.15 has large scheduler changes to support Intel Thread Director on Alder Lake CPUs and big.LITTLE-style architectures with asymmetric cores, so there may be a conflict between Agent 2 and the new scheduler logic.

      Crash:

      Jun 15 18:59:56 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log
      Jun 15 19:00:01 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log
      Jun 15 19:00:37 HOSTNAME [auth.info] sshd[115650]: Connection closed by IP port 42902 [preauth]
      Jun 15 19:01:37 HOSTNAME [auth.info] sshd[116049]: Connection closed by IP port 44006 [preauth]
      Jun 15 19:01:50 HOSTNAME [kern.err] kernel: [16582.345979] BUG: scheduling while atomic: zabbix_agent2/1667/0x00000100
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346089] Modules linked in: rbd libceph binfmt_misc veth rpcsec_gss_krb5 nfsv4 nfs lockd grace fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter 8021q garp mrp bonding tls nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd mgag200 dell_smbios rapl intel_cstate drm_kms_helper cec rc_core dcdbas dell_wmi_descriptor wmi_bmof pcspkr input_leds joydev fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me intel_pch_thermal mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp auth_rpcgss libiscsi
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346275]  scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c lpfc nvmet_fc crc32_pclmul nvmet qede nvme_fc nvme_fabrics qed nvme_core crc8 scsi_transport_fc xhci_pci igb xhci_pci_renesas i2c_algo_bit megaraid_sas dca i2c_i801 ahci i2c_smbus lpc_ich xhci_hcd libahci wmi
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346366] CPU: 15 PID: 1667 Comm: zabbix_agent2 Kdump: loaded Tainted: P          IO      5.15.5-1-pve #1
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346375] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.14.2 03/21/2022
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346379] Call Trace:
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346383]  <IRQ>
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346387]  dump_stack_lvl+0x4a/0x5f
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346400]  dump_stack+0x10/0x12
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346405]  __schedule_bug.cold+0x4c/0x5d
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346419]  __schedule+0x1120/0x1500
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346429]  ? timerqueue_add+0x6e/0xc0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346438]  ? enqueue_hrtimer+0x36/0x70
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346445]  ? hrtimer_start_range_ns+0x125/0x370
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346452]  schedule+0x4e/0xb0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346459]  schedule_hrtimeout_range_clock+0x9a/0x120
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346471]  ? __hrtimer_init+0xd0/0xd0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346484]  schedule_hrtimeout_range+0x13/0x20
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346493]  usleep_range+0x65/0x90
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346504]  qed_ptt_acquire+0x30/0xd0 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346582]  _qed_get_vport_stats+0x145/0x240 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346655]  qed_get_vport_stats+0x1c/0x80 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346725]  qed_get_protocol_stats+0x95/0xd0 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346805]  qed_mcp_handle_events+0x351/0x6e0 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346876]  ? kmem_cache_alloc+0x1ab/0x2e0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346887]  ? enqueue_task_rt+0x21b/0x2e0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346900]  qed_int_sp_dpc+0x6ce/0xb80 [qed]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346968]  ? ttwu_do_wakeup+0x4c/0x160
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346979]  ? ttwu_do_activate+0x6d/0xd0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346989]  ? try_to_wake_up+0x1f7/0x510
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346999]  ? __hrtimer_init+0xd0/0xd0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347009]  ? wake_up_process+0x15/0x20
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347020]  tasklet_action_common.constprop.0+0xfa/0x120
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347033]  tasklet_action+0x22/0x30
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347042]  __do_softirq+0xce/0x274
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347052]  irq_exit_rcu+0x8c/0xb0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347062]  common_interrupt+0x8a/0xa0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347072]  </IRQ>
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347075]  <TASK>
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347078]  asm_common_interrupt+0x1e/0x40
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347084] RIP: 0010:native_queued_spin_lock_slowpath+0x1a3/0x1e0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347096] Code: c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 1d 03 00 48 03 04 f5 c0 fa 69 b4 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 32 48 85 f6 0f 84 67 ff ff ff 0f 0d 0e
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347103] RSP: 0018:ffffb04f4513baf0 EFLAGS: 00000246
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347110] RAX: 0000000000000000 RBX: ffff8c0d89a58000 RCX: 0000000000400000
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347116] RDX: ffff8c3cbfdf1d00 RSI: 0000000000000006 RDI: ffff8c0d89a589c4
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347121] RBP: ffffb04f4513baf0 R08: 0000000000400000 R09: 0000000000000000
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347125] R10: 000000000000043e R11: 0000000003800001 R12: ffff8c0d89a58d80
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347130] R13: ffff8c0d89a58000 R14: ffffb04f4513bd08 R15: ffffb04f4513bc40
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347138]  _raw_spin_lock+0x1e/0x30
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347150]  bond_get_stats+0x52/0x1e0 [bonding]
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347176]  ? seq_printf+0x8a/0xb0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347189]  ? dev_seq_printf_stats+0xb3/0xe0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347200]  dev_get_stats+0x60/0xc0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347207]  dev_seq_printf_stats+0x38/0xe0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347219]  dev_seq_show+0x14/0x30
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347227]  seq_read_iter+0x2d1/0x4c0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347239]  seq_read+0xfd/0x140
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347251]  proc_reg_read+0x5a/0x90
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347262]  vfs_read+0xa0/0x1a0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347271]  ksys_read+0x67/0xe0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347280]  __x64_sys_read+0x1a/0x20
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347288]  do_syscall_64+0x5c/0xc0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347296]  ? do_syscall_64+0x69/0xc0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347302]  ? syscall_exit_to_user_mode+0x27/0x50
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347312]  ? do_syscall_64+0x69/0xc0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347319]  ? exc_page_fault+0x89/0x160
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347327]  ? asm_exc_page_fault+0x8/0x30
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347333]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347345] RIP: 0033:0x4d02fb
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347351] Code: e8 0a 2e fb ff eb 88 cc cc cc cc cc cc cc cc e8 9b 73 fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347357] RSP: 002b:000000c000660a08 EFLAGS: 00000216 ORIG_RAX: 0000000000000000
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347364] RAX: ffffffffffffffda RBX: 000000c00004a000 RCX: 00000000004d02fb
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347369] RDX: 0000000000001000 RSI: 000000c000664000 RDI: 000000000000000f
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347373] RBP: 000000c000660a58 R08: 0000000000000001 R09: 000000c0001a53e0
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347377] R10: 0000000000001000 R11: 0000000000000216 R12: 000000c000664000
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347381] R13: 0000000000000000 R14: 000000c000455ba0 R15: 0000000000000040
      Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347389]  </TASK>
      Jun 15 19:01:50 HOSTNAME [kern.err] kernel: [16582.348209] softirq: huh, entered softirq 6 TASKLET 0000000053d6a6ee with preempt_count 00000100, exited with 00000000?
      Jun 15 19:02:37 HOSTNAME [auth.info] sshd[116437]: Connection closed by IP port 45096 [preauth]
      Jun 15 19:03:37 HOSTNAME [auth.info] sshd[116825]: Connection closed by IP port 46160 [preauth]
      Jun 15 19:04:37 HOSTNAME [auth.info] sshd[117215]: Connection closed by IP port 47192 [preauth]
      Jun 15 19:05:01 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log  
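
As a triage aid for the trace above, here is a minimal Go sketch (not part of the original report) that lists the frames between the `<IRQ>` and `</IRQ>` markers together with their module tags. The `Frame` type, the `IRQFrames` function, and the regular expression are illustrative assumptions about this particular syslog layout, not a Zabbix or kernel tool. The embedded sample is a short excerpt copied from the log above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Frame is one call-trace entry (hypothetical helper type).
type Frame struct {
	Func   string // symbol name, e.g. "usleep_range"
	Module string // module tag such as "qed"; empty for core-kernel symbols
	Guess  bool   // true when the kernel marked the frame with "?"
}

// frameRe matches "] [? ]symbol+0xOFF/0xLEN [module]" at the end of a line.
// It assumes the syslog layout shown in this report.
var frameRe = regexp.MustCompile(
	`\]\s+(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$`)

// IRQFrames returns the frames found between <IRQ> and </IRQ> markers.
func IRQFrames(log string) []Frame {
	var frames []Frame
	inIRQ := false
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "</IRQ>"):
			inIRQ = false
		case strings.Contains(line, "<IRQ>"):
			inIRQ = true
		case inIRQ:
			if m := frameRe.FindStringSubmatch(line); m != nil {
				frames = append(frames, Frame{Func: m[2], Module: m[3], Guess: m[1] != ""})
			}
		}
	}
	return frames
}

// sample is an excerpt of the log in this report.
const sample = `
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346383]  <IRQ>
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346493]  usleep_range+0x65/0x90
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346504]  qed_ptt_acquire+0x30/0xd0 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346876]  ? kmem_cache_alloc+0x1ab/0x2e0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347033]  tasklet_action+0x22/0x30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347072]  </IRQ>
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347138]  _raw_spin_lock+0x1e/0x30
`

func main() {
	for _, f := range IRQFrames(sample) {
		fmt.Printf("func=%s module=%q unreliable=%v\n", f.Func, f.Module, f.Guess)
	}
}
```

Run against the full log, this kind of listing makes it easier to see that the frames leading to `usleep_range` inside the `<IRQ>` section carry the `[qed]` tag, while the interrupted task was reading interface statistics through `bond_get_stats`. It only summarizes the trace and does not establish a root cause.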

       

       

       

        1. log1
          11 kB
        2. log2
          11 kB

            vso Vladislavs Sokurenko
            avolodin Aleksey Volodin
            Team A
            Votes: 3
            Watchers: 8
