Problem report
Resolution: Cannot Reproduce
Critical
None
6.0.5
None
Proxmox 7 with 5.15.5-1-pve kernel
Sprint 91 (Aug 2022)
1
Zabbix Agent 2 crashed on Proxmox 7 with the 5.15.5-1-pve kernel.
Kernel 5.15 includes major scheduler changes to support Intel Thread Director on Alder Lake CPUs and big.LITTLE-style architectures with asymmetric cores, so the crash may come from an interaction between Agent 2 and the new scheduler logic.
Crash:
Jun 15 18:59:56 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log
Jun 15 19:00:01 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log
Jun 15 19:00:37 HOSTNAME [auth.info] sshd[115650]: Connection closed by IP port 42902 [preauth]
Jun 15 19:01:37 HOSTNAME [auth.info] sshd[116049]: Connection closed by IP port 44006 [preauth]
Jun 15 19:01:50 HOSTNAME [kern.err] kernel: [16582.345979] BUG: scheduling while atomic: zabbix_agent2/1667/0x00000100
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346089] Modules linked in: rbd libceph binfmt_misc veth rpcsec_gss_krb5 nfsv4 nfs lockd grace fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter 8021q garp mrp bonding tls nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd mgag200 dell_smbios rapl intel_cstate drm_kms_helper cec rc_core dcdbas dell_wmi_descriptor wmi_bmof pcspkr input_leds joydev fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me intel_pch_thermal mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp auth_rpcgss libiscsi
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346275] scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c lpfc nvmet_fc crc32_pclmul nvmet qede nvme_fc nvme_fabrics qed nvme_core crc8 scsi_transport_fc xhci_pci igb xhci_pci_renesas i2c_algo_bit megaraid_sas dca i2c_i801 ahci i2c_smbus lpc_ich xhci_hcd libahci wmi
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346366] CPU: 15 PID: 1667 Comm: zabbix_agent2 Kdump: loaded Tainted: P IO 5.15.5-1-pve #1
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346375] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.14.2 03/21/2022
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346379] Call Trace:
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346383] <IRQ>
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346387] dump_stack_lvl+0x4a/0x5f
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346400] dump_stack+0x10/0x12
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346405] __schedule_bug.cold+0x4c/0x5d
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346419] __schedule+0x1120/0x1500
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346429] ? timerqueue_add+0x6e/0xc0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346438] ? enqueue_hrtimer+0x36/0x70
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346445] ? hrtimer_start_range_ns+0x125/0x370
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346452] schedule+0x4e/0xb0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346459] schedule_hrtimeout_range_clock+0x9a/0x120
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346471] ? __hrtimer_init+0xd0/0xd0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346484] schedule_hrtimeout_range+0x13/0x20
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346493] usleep_range+0x65/0x90
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346504] qed_ptt_acquire+0x30/0xd0 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346582] _qed_get_vport_stats+0x145/0x240 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346655] qed_get_vport_stats+0x1c/0x80 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346725] qed_get_protocol_stats+0x95/0xd0 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346805] qed_mcp_handle_events+0x351/0x6e0 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346876] ? kmem_cache_alloc+0x1ab/0x2e0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346887] ? enqueue_task_rt+0x21b/0x2e0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346900] qed_int_sp_dpc+0x6ce/0xb80 [qed]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346968] ? ttwu_do_wakeup+0x4c/0x160
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346979] ? ttwu_do_activate+0x6d/0xd0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346989] ? try_to_wake_up+0x1f7/0x510
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.346999] ? __hrtimer_init+0xd0/0xd0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347009] ? wake_up_process+0x15/0x20
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347020] tasklet_action_common.constprop.0+0xfa/0x120
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347033] tasklet_action+0x22/0x30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347042] __do_softirq+0xce/0x274
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347052] irq_exit_rcu+0x8c/0xb0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347062] common_interrupt+0x8a/0xa0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347072] </IRQ>
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347075] <TASK>
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347078] asm_common_interrupt+0x1e/0x40
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347084] RIP: 0010:native_queued_spin_lock_slowpath+0x1a3/0x1e0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347096] Code: c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 1d 03 00 48 03 04 f5 c0 fa 69 b4 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 32 48 85 f6 0f 84 67 ff ff ff 0f 0d 0e
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347103] RSP: 0018:ffffb04f4513baf0 EFLAGS: 00000246
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347110] RAX: 0000000000000000 RBX: ffff8c0d89a58000 RCX: 0000000000400000
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347116] RDX: ffff8c3cbfdf1d00 RSI: 0000000000000006 RDI: ffff8c0d89a589c4
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347121] RBP: ffffb04f4513baf0 R08: 0000000000400000 R09: 0000000000000000
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347125] R10: 000000000000043e R11: 0000000003800001 R12: ffff8c0d89a58d80
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347130] R13: ffff8c0d89a58000 R14: ffffb04f4513bd08 R15: ffffb04f4513bc40
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347138] _raw_spin_lock+0x1e/0x30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347150] bond_get_stats+0x52/0x1e0 [bonding]
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347176] ? seq_printf+0x8a/0xb0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347189] ? dev_seq_printf_stats+0xb3/0xe0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347200] dev_get_stats+0x60/0xc0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347207] dev_seq_printf_stats+0x38/0xe0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347219] dev_seq_show+0x14/0x30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347227] seq_read_iter+0x2d1/0x4c0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347239] seq_read+0xfd/0x140
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347251] proc_reg_read+0x5a/0x90
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347262] vfs_read+0xa0/0x1a0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347271] ksys_read+0x67/0xe0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347280] __x64_sys_read+0x1a/0x20
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347288] do_syscall_64+0x5c/0xc0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347296] ? do_syscall_64+0x69/0xc0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347302] ? syscall_exit_to_user_mode+0x27/0x50
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347312] ? do_syscall_64+0x69/0xc0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347319] ? exc_page_fault+0x89/0x160
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347327] ? asm_exc_page_fault+0x8/0x30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347333] entry_SYSCALL_64_after_hwframe+0x44/0xae
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347345] RIP: 0033:0x4d02fb
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347351] Code: e8 0a 2e fb ff eb 88 cc cc cc cc cc cc cc cc e8 9b 73 fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347357] RSP: 002b:000000c000660a08 EFLAGS: 00000216 ORIG_RAX: 0000000000000000
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347364] RAX: ffffffffffffffda RBX: 000000c00004a000 RCX: 00000000004d02fb
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347369] RDX: 0000000000001000 RSI: 000000c000664000 RDI: 000000000000000f
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347373] RBP: 000000c000660a58 R08: 0000000000000001 R09: 000000c0001a53e0
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347377] R10: 0000000000001000 R11: 0000000000000216 R12: 000000c000664000
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347381] R13: 0000000000000000 R14: 000000c000455ba0 R15: 0000000000000040
Jun 15 19:01:50 HOSTNAME [kern.warning] kernel: [16582.347389] </TASK>
Jun 15 19:01:50 HOSTNAME [kern.err] kernel: [16582.348209] softirq: huh, entered softirq 6 TASKLET 0000000053d6a6ee with preempt_count 00000100, exited with 00000000?
Jun 15 19:02:37 HOSTNAME [auth.info] sshd[116437]: Connection closed by IP port 45096 [preauth]
Jun 15 19:03:37 HOSTNAME [auth.info] sshd[116825]: Connection closed by IP port 46160 [preauth]
Jun 15 19:04:37 HOSTNAME [auth.info] sshd[117215]: Connection closed by IP port 47192 [preauth]
Jun 15 19:05:01 HOSTNAME [daemon.notice] pmxcfs[1758]: [status] notice: received log
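Both kernel messages above ("scheduling while atomic" and "softirq: huh") report the same preempt_count value, 0x00000100. The Go sketch below decodes such a value. It assumes the 5.15-era bit layout from include/linux/preempt.h (verify against the exact kernel source in use). `decodePreemptCount` and `preemptState` are hypothetical helper names, not part of Zabbix or the kernel. For 0x00000100 only SOFTIRQ_OFFSET is set, meaning the CPU was executing a softirq handler when schedule() was called. This is consistent with the tasklet frames (`tasklet_action`, `qed_int_sp_dpc`) inside the `<IRQ>` section of the trace.

```go
package main

import "fmt"

// Bit layout of preempt_count as in include/linux/preempt.h for 5.15-era
// kernels (assumption: check the header of the exact kernel in use).
const (
	preemptMask   = 0x000000ff // bits 0-7: preempt_disable() nesting
	softirqMask   = 0x0000ff00 // bits 8-15: softirq state
	hardirqMask   = 0x000f0000 // bits 16-19: hardirq nesting
	nmiMask       = 0x00f00000 // bits 20-23: NMI nesting
	softirqOffset = 0x00000100 // SOFTIRQ_OFFSET: a softirq handler is running
)

// preemptState is a decoded view of one raw preempt_count value.
type preemptState struct {
	PreemptDepth   uint32 // preempt_disable() nesting depth
	ServingSoftirq bool   // true while a softirq (including a tasklet) runs
	BHDisableDepth uint32 // local_bh_disable() nesting, in units of 2*SOFTIRQ_OFFSET
	HardirqDepth   uint32 // hardirq nesting depth
	NMIDepth       uint32 // NMI nesting depth
}

// decodePreemptCount splits a raw preempt_count into its bit fields.
func decodePreemptCount(pc uint32) preemptState {
	return preemptState{
		PreemptDepth:   pc & preemptMask,
		ServingSoftirq: pc&softirqOffset != 0,
		BHDisableDepth: (pc & softirqMask) >> 9,
		HardirqDepth:   (pc & hardirqMask) >> 16,
		NMIDepth:       (pc & nmiMask) >> 20,
	}
}

func main() {
	// Value printed in both kernel messages of this report.
	pc := uint32(0x00000100)
	s := decodePreemptCount(pc)
	// in_atomic() is true whenever preempt_count is non-zero.
	fmt.Printf("preempt_count=%#x atomic=%v %+v\n", pc, pc != 0, s)
}
```

Any non-zero field makes the context atomic, and calling a sleeping function such as `usleep_range` from that context produces the "scheduling while atomic" BUG.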
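The task-side frames of the trace (inside `<TASK>`) show zabbix_agent2 in a read() syscall on a procfs file served by `dev_seq_show`, that is /proc/net/dev, and waiting on a spinlock in `bond_get_stats`. One rough way to try to reproduce the reported conditions is to read that file in a loop on the same kind of host (the module list shows bonding and qede/qed loaded). The Go sketch below is a hypothetical load generator, not part of Zabbix. The iteration count and interval are arbitrary assumptions, and `parseNetDev`/`ifaceCounters` are illustrative names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// ifaceCounters holds the two counters this sketch extracts per interface.
type ifaceCounters struct {
	RxBytes, TxBytes uint64
}

// parseNetDev parses the text of /proc/net/dev. The first two lines are
// column headers; every later line is "<iface>: <counters>", where the
// 1st counter is received bytes and the 9th is transmitted bytes.
func parseNetDev(text string) (map[string]ifaceCounters, error) {
	out := make(map[string]ifaceCounters)
	for i, line := range strings.Split(strings.TrimSpace(text), "\n") {
		if i < 2 {
			continue // skip the two header lines
		}
		name, rest, ok := strings.Cut(line, ":")
		if !ok {
			return nil, fmt.Errorf("malformed line %d: %q", i+1, line)
		}
		fields := strings.Fields(rest)
		if len(fields) < 9 {
			return nil, fmt.Errorf("too few counters for %q", name)
		}
		rx, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			return nil, err
		}
		tx, err := strconv.ParseUint(fields[8], 10, 64)
		if err != nil {
			return nil, err
		}
		out[strings.TrimSpace(name)] = ifaceCounters{rx, tx}
	}
	return out, nil
}

func main() {
	// Hypothetical load parameters; adjust to the polling rate under test.
	const iterations = 1000
	const interval = 10 * time.Millisecond

	var last map[string]ifaceCounters
	for i := 0; i < iterations; i++ {
		// Each read of /proc/net/dev makes the kernel call dev_get_stats()
		// for every device; for a bonding master this reaches bond_get_stats,
		// the function seen in the reported trace.
		data, err := os.ReadFile("/proc/net/dev")
		if err != nil {
			fmt.Fprintln(os.Stderr, "read /proc/net/dev:", err)
			os.Exit(1)
		}
		if last, err = parseNetDev(string(data)); err != nil {
			fmt.Fprintln(os.Stderr, "parse:", err)
			os.Exit(1)
		}
		time.Sleep(interval)
	}
	for name, c := range last {
		fmt.Printf("%-12s rx=%d tx=%d\n", name, c.RxBytes, c.TxBytes)
	}
}
```

The sketch needs Go 1.18 or later (for `strings.Cut`). It could be run with `go run` on the affected host while watching `dmesg` for a repeat of the BUG message.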