[ZBX-25979] futex(0x7c1580cdc000, FUTEX_WAIT, 2, NULL) zabbix server hang Created: 2025 Feb 03  Updated: 2025 Mar 03  Resolved: 2025 Mar 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Lim-IT Assignee: Vladislavs Sokurenko
Resolution: Won't fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

zabbix 7.0.9


Attachments: Text File zabbix_log_20250217.txt     Text File zabbix_server_20250213.log    
Team: Team A
Sprint: S25-W8/9
Story Points: 0.25

 Description   

After upgrading from 6.4 to 7.0.9, the system crashed multiple times with the error:
futex(0x7c1580cdc000, FUTEX_WAIT, 2, NULL) zabbix server hang

 

 



 Comments   
Comment by Alexey Pustovalov [ 2025 Feb 03 ]

Please share the full crash log.

Comment by Alexander Vladishev [ 2025 Feb 05 ]

Which Zabbix server process is hanging? Could you attach the log file with a debug level of at least 4 (parameter DebugLevel) for this process?

Comment by Lim-IT [ 2025 Feb 10 ]

Due to time constraints, we have fully reinstalled the server.
We used Ubuntu Server 24.04 LTS instead of Ubuntu minimal 24.04 LTS.

The root cause might be that the minimal install didn't have all the necessary dependencies.

 

Comment by Lim-IT [ 2025 Feb 10 ]

Due to time constraints, we had to reinstall the production server.

We changed from Ubuntu minimal to the regular Ubuntu 24.04 LTS.

Comment by Lim-IT [ 2025 Feb 12 ]

The new server has the same problem. We will enable debug level 4 on the server to get debug logs.

Comment by Lim-IT [ 2025 Feb 12 ]

sudo ipcs -m 

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 32770      zabbix     600        848        55         dest         
0x00000000 32771      zabbix     600        268435456  55         dest         
0x00000000 32772      zabbix     600        4194304    55         dest         
0x00000000 32773      zabbix     600        16777216   55         dest         
0x00000000 32774      zabbix     600        57256      55         dest         
0x00000000 32775      zabbix     600        536870912  54         dest         
0x00000000 32776      zabbix     600        268435456  54         dest         
0x00000000 32777      zabbix     600        16777216   54         dest         
0x00000000 32778      zabbix     600        261711     54         dest         
0x00d6041b 32779      postgres   600        56         6                       

Comment by Lim-IT [ 2025 Feb 12 ]

Zabbix is now running at debug level 4.

 

Comment by Edgar Akhmetshin [ 2025 Feb 13 ]

Could you please provide the log as requested?

Comment by Lim-IT [ 2025 Feb 14 ]

The server logs are 15 GB; the download is in progress.

They contain sensitive data. How is this handled? Can anyone read them?

 

 

Comment by Alexander Vladishev [ 2025 Feb 17 ]

We need the last 100–200 lines of the log file for the process that is hanging. You can mask any sensitive information.
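One way to pull such an excerpt is sketched below. It assumes that every Zabbix log line starts with the writing process's PID followed by a colon (possibly padded with spaces); the function name last_lines_for_pid, the PID 1074 and the log path in the usage note are only illustrative.

```shell
# Print the last N log lines written by a single Zabbix process.
# Usage: last_lines_for_pid <pid> <logfile> [count]   (count defaults to 200)
last_lines_for_pid() {
    # Log lines are assumed to look like "PID:YYYYMMDD:HHMMSS.mmm message",
    # with the PID optionally padded by leading spaces.
    grep -E "^ *$1:" "$2" | tail -n "${3:-200}"
}
```

For example, `last_lines_for_pid 1074 /var/log/zabbix/zabbix_server.log > excerpt.txt` would write the excerpt to a file that can be reviewed and masked before it is attached.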

Comment by Lim-IT [ 2025 Feb 18 ]

zabbix_server_20250213.log

Comment by Lim-IT [ 2025 Feb 18 ]

zabbix_log_20250217.txt

Comment by Alexander Vladishev [ 2025 Feb 18 ]

Thank you! I will pass this on to the development team for analysis of your logs. Please do not delete the original full log file yet, as additional information may be needed.

Comment by Vladislavs Sokurenko [ 2025 Feb 18 ]

The last log entries of the preprocessing manager indicate that it is trying to receive data and waiting. What generates the error about futex?
1074:20250213:044401.687 End of preproc_flush_value_server()
1074:20250213:044401.687 In zbx_ipc_service_recv() timeout:0.500

Comment by Lim-IT [ 2025 Feb 18 ]

Everything seems to point towards fping (the ICMP pingers).

Comment by Vladislavs Sokurenko [ 2025 Feb 18 ]

I am sorry, could you please provide more details? There is no indication of a problem in the Zabbix server log. Maybe there is some other log that contains more information?

Comment by Lim-IT [ 2025 Feb 18 ]

[   92.163591] audit: type=1400 audit(1739545298.467:135): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=1642 comm="snap-confine" capability=12  capname="net_admin"
[   92.163593] audit: type=1400 audit(1739545298.467:136): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=1642 comm="snap-confine" capability=38  capname="perfmon"
[  356.993564] audit: type=1400 audit(1739545563.306:137): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=2416 comm="snap-confine" capability=12  capname="net_admin"
[  356.993577] audit: type=1400 audit(1739545563.307:138): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=2416 comm="snap-confine" capability=38  capname="perfmon"
[  903.844178] audit: type=1400 audit(1739546110.166:139): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=3565 comm="snap-confine" capability=12  capname="net_admin"
[  903.844189] audit: type=1400 audit(1739546110.166:140): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=3565 comm="snap-confine" capability=38  capname="perfmon"
[26104.611561] audit: type=1400 audit(1739571310.963:141): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=48992 comm="snap-confine" capability=12  capname="net_admin"
[26104.611580] audit: type=1400 audit(1739571310.963:142): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=48992 comm="snap-confine" capability=38  capname="perfmon"
[29953.570372] audit: type=1400 audit(1739575159.931:143): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=55444 comm="snap-confine" capability=12  capname="net_admin"
[29953.570383] audit: type=1400 audit(1739575159.931:144): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=55444 comm="snap-confine" capability=38  capname="perfmon"
[48538.718508] sh[86914]: segfault at 7848464abc99 ip 00007848c64abc99 sp 00007ffefb5f4180 error 14 in libc.so.6[7848c6428000+188000] likely on CPU 4 (core 0, socket 1)
[48538.718516] Code: 00 83 45 b0 01 8b 45 b0 3d 0f 27 00 00 0f 8f 72 11 00 00 49 8b 54 24 78 49 39 d0 0f 85 1a fb ff ff 45 85 f6 0f 85 5b 11 00 00 <43> 8d 04 3f 48 81 fb ff 03 00 00 76 21 43 8d 04 3f 8d 50 0c 49 8d
[50839.059279] audit: type=1400 audit(1739596045.464:145): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=90442 comm="snap-confine" capability=12  capname="net_admin"
[50839.059286] audit: type=1400 audit(1739596045.464:146): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=90442 comm="snap-confine" capability=38  capname="perfmon"
[73704.730144] audit: type=1400 audit(1739618911.182:147): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=108094 comm="snap-confine" capability=12  capname="net_admin"
[73704.730157] audit: type=1400 audit(1739618911.182:148): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/23545/usr/lib/snapd/snap-confine" pid=108094 comm="snap-confine" capability=38  capname="perfmon"
[103137.298539] find[111411]: segfault at 7456254c9fa2 ip 00007456a54c9fa2 sp 00007ffdf2a3ef90 error 14 in libc.so.6[7456a5428000+188000] likely on CPU 3 (core 3, socket 0)
[103137.298546] Code: ff ff 01 00 00 00 48 85 c0 0f 84 d9 02 00 00 4c 8b 28 49 83 7d 00 00 4d 8b 65 28 74 0d 49 c1 cc 11 64 4c 33 24 25 30 00 00 00 <48> 8b 85 50 fe ff ff 48 83 bd 58 fe ff ff 00 4c 8b 30 4c 89 b5 80

Comment by Vladislavs Sokurenko [ 2025 Feb 18 ]

Thanks. There was no information in the Zabbix server log about it crashing. Does it simply stop writing logs?
The kernel log shows segfaults for PIDs 111411 and 86914. Is it known which processes these are?
Please search the log for these PIDs and their last entries.
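The kernel segfault lines quoted above already carry the command name and PID in the form comm[pid]. A small filter, sketched here with the hypothetical name segfault_procs, extracts both so the PIDs can then be searched for in other logs.

```shell
# Read kernel log lines on stdin and print "command pid" for each segfault entry.
# Lines without a "comm[pid]: segfault at" part (for example audit lines) are skipped.
segfault_procs() {
    sed -n -E 's/^\[ *[0-9.]+\] ([^[]+)\[([0-9]+)\]: segfault at .*/\1 \2/p'
}
```

Typical use would be `sudo dmesg | segfault_procs`. For the two segfault lines in this ticket it prints "sh 86914" and "find 111411".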

Comment by Lim-IT [ 2025 Feb 19 ]

Hello, the Zabbix server ends up in a futex wait state.

The segfaults are from fping processes, which in turn use shared memory that is no longer available.

The same applies to the server, which stops working because of the wait state, possibly caused by a stale shared memory segment that no longer exists.

Can we check other things to resolve this problem?

Comment by Vladislavs Sokurenko [ 2025 Feb 19 ]

Yes, please check the icmp pinger logs. Their log level can be increased like this:

zabbix_server -R log_level_increase="icmp pinger"

Also, when a pinger hangs, you can attach to its PID using "gdb -p <pid>", then type backtrace and press Enter until the full backtrace is shown.
Maybe fping stops working for some reason, and Zabbix server will need to handle that.
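The same backtrace can be collected without an interactive session. This is only a sketch: it assumes the pinger workers show a process title containing "zabbix_server: icmp pinger" in ps output, and the helper names pinger_pids and dump_backtrace are made up for illustration.

```shell
# Read `ps -eo pid=,args=` style lines on stdin and print the PIDs of icmp pinger workers.
# The "[ ]" in the pattern keeps this awk process from matching its own command line.
pinger_pids() {
    awk '/zabbix_server: icmp[ ]pinger/ { print $1 }'
}

# Attach gdb to one PID, print a full backtrace of every thread, then detach.
dump_backtrace() {
    sudo gdb -p "$1" -batch -ex "thread apply all bt full"
}

# Example, to be run while the hang is occurring:
#   ps -eo pid=,args= | pinger_pids | while read -r pid; do
#       dump_backtrace "$pid" > "bt_$pid.txt" 2>&1
#   done
```

Batch mode avoids having to page through the output by hand, and one file per PID makes the result easy to attach.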

Comment by Lim-IT [ 2025 Feb 19 ]

Is that the PID of zabbix-server, or of the fping processes that crash due to memory permissions?

 

Comment by Vladislavs Sokurenko [ 2025 Feb 19 ]

The PID of one of the zabbix_server icmp pinger processes.

Comment by Lim-IT [ 2025 Feb 25 ]

Hello,

 

The ticket can be closed. After further troubleshooting, the cloud support team replaced the SSD adapter and cables in the data center.
All errors have disappeared, and the server has now been running stably for approximately 36 hours.

 

Generated at Fri Apr 04 15:51:35 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.