[ZBX-15970] Deadlock in semaphores (history cache and configuration cache) Created: 2019 Apr 08  Updated: 2019 Jun 10  Resolved: 2019 Apr 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 4.0.6, 4.2.0, 4.4.0alpha1, 4.4 (plan)
Fix Version/s: 4.0.7rc1, 4.2.1rc1, 4.4.0alpha1, 4.4 (plan)

Type: Problem report Priority: Critical
Reporter: Alexey Pustovalov Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: semaphores
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

3.4.10


Issue Links:
Causes
caused by ZBXNEXT-1675 Add macros support for update intervals Closed
Duplicate
Team: Team A
Sprint: Sprint 51 (Apr 2019)
Story Points: 1

 Description   

Sempahores:

# ipcs -s -i 1179651
semnum     value      ncount     zcount     pid       
0          1          0          0          8463      
1          0          2          0          8459      
2          1          0          0          8459      
3          1          0          0          8456      
4          0          65         0          8493      
5          1          0          0          8463      
6          1          0          0          8308      
7          1          0          0          8308      
8          1          0          0          8460      
9          1          0          0          8460      
10         1          0          0          8308      
11         1          0          0          8308      
12         1          0          0          8308      
13         1          0          0          8308    

Zabbix processes:

zabbix    8308  1.0 18.2 5227496 1081312 ?     S    21:43   0:30 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix    8436  0.8 17.1 5227496 1016192 ?     S    21:44   0:26  \_ /usr/sbin/zabbix_server: configuration syncer [synced configuration in 64.600360 sec, syncing configuration]
zabbix    8437  0.0  0.2 5227496 17116 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: alerter #1 [sent 0, failed 0 alerts, idle 5.190856 sec during 6.124104 sec]
zabbix    8438  0.0  0.2 5227496 17116 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: alerter #2 [sent 0, failed 0 alerts, idle 5.337377 sec during 6.102741 sec]
zabbix    8439  0.0  0.2 5227496 17116 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: alerter #3 [sent 0, failed 0 alerts, idle 5.315995 sec during 6.202974 sec]
zabbix    8440  0.0  0.2 5227496 17116 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: alerter #4 [sent 0, failed 0 alerts, idle 5.358273 sec during 6.042164 sec]
zabbix    8441  0.0  0.2 5227496 17112 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: housekeeper [connecting to the database]
zabbix    8442  0.0  0.3 5227580 18824 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #1 [processed 6 triggers, 0 events in 0.000648 sec, 0 maintenances in 0.000000 sec, processing time functions]
zabbix    8443  0.0  0.3 5227580 19096 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #2 [processed 8 triggers, 0 events in 0.002692 sec, processing time functions]
zabbix    8444  0.0  0.3 5227580 18832 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #3 [processed 7 triggers, 0 events in 0.022708 sec, processing time functions]
zabbix    8445  0.0  0.3 5227580 18392 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #4 [processed 4 triggers, 0 events in 0.002274 sec, processing time functions]
zabbix    8446  0.0  0.3 5227580 18856 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #5 [processed 8 triggers, 0 events in 0.022439 sec, processing time functions]
zabbix    8447  0.0  0.3 5227580 18832 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #6 [processed 7 triggers, 0 events in 0.018860 sec, processing time functions]
zabbix    8448  0.0  0.3 5227580 18600 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #7 [processed 5 triggers, 0 events in 0.002910 sec, processing time functions]
zabbix    8450  0.0  0.3 5227580 18616 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #8 [processed 6 triggers, 0 events in 0.018541 sec, processing time functions]
zabbix    8451  0.0  0.3 5227580 18604 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #9 [processed 5 triggers, 0 events in 0.017937 sec, processing time functions]
zabbix    8452  0.0  0.3 5227580 18392 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: timer #10 [processed 4 triggers, 0 events in 0.012610 sec, processing time functions]
zabbix    8453  0.0  0.2 5227496 17172 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: http poller #1 [got 0 values in 0.000872 sec, idle 5 sec]
zabbix    8454  0.0  0.3 5331088 18396 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.000485 sec, idle 60 sec]
zabbix    8455  0.8 17.4 5269076 1035928 ?     S    21:44   0:25  \_ /usr/sbin/zabbix_server: history syncer #1 [synced 2939 items in 0.714125 sec, syncing history]
zabbix    8456  0.8 17.4 5268952 1036108 ?     S    21:44   0:24  \_ /usr/sbin/zabbix_server: history syncer #2 [synced 3945 items in 0.739598 sec, syncing history]
zabbix    8457  0.8 17.4 5270884 1036748 ?     S    21:44   0:23  \_ /usr/sbin/zabbix_server: history syncer #3 [synced 4466 items in 0.931370 sec, syncing history]
zabbix    8458  0.9 17.6 5269076 1047144 ?     S    21:44   0:26  \_ /usr/sbin/zabbix_server: history syncer #4 [synced 4837 items in 0.928945 sec, syncing history]
zabbix    8459  0.7 17.3 5270324 1031956 ?     S    21:44   0:23  \_ /usr/sbin/zabbix_server: history syncer #5 [synced 4620 items in 0.918583 sec, syncing history]
zabbix    8460  0.8 17.5 5270260 1042956 ?     S    21:44   0:24  \_ /usr/sbin/zabbix_server: history syncer #6 [synced 0 items in 0.001656 sec, syncing history]
zabbix    8461  0.0  1.0 5227496 64052 ?       S    21:44   0:02  \_ /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.015006 sec, processing escalations]
zabbix    8462  0.0  0.3 5227496 17920 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.004546 sec, exchanging data]
zabbix    8463  0.0  0.2 5227496 16720 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: self-monitoring [processed data in 0.000019 sec, idle 1 sec]
zabbix    8464  0.0  0.2 5227496 17176 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000384 sec, idle 5 sec]
zabbix    8465  0.4 14.3 5333648 853408 ?      S    21:44   0:12  \_ /usr/sbin/zabbix_server: poller #1 [got 9 values in 0.046461 sec, getting values]
zabbix    8466  0.3 11.3 5333648 676792 ?      S    21:44   0:11  \_ /usr/sbin/zabbix_server: poller #2 [got 9 values in 0.046904 sec, getting values]
zabbix    8467  0.4 14.3 5333648 853936 ?      S    21:44   0:12  \_ /usr/sbin/zabbix_server: poller #3 [got 10 values in 0.050010 sec, getting values]
zabbix    8468  0.4 14.3 5333648 853936 ?      S    21:44   0:11  \_ /usr/sbin/zabbix_server: poller #4 [got 9 values in 0.046098 sec, getting values]
zabbix    8469  0.4 14.3 5333648 853936 ?      S    21:44   0:13  \_ /usr/sbin/zabbix_server: poller #5 [got 10 values in 0.046595 sec, getting values]
zabbix    8470  0.3  8.8 5333648 524200 ?      S    21:44   0:09  \_ /usr/sbin/zabbix_server: poller #6 [got 10 values in 0.050137 sec, getting values]
zabbix    8472  0.3 11.4 5333648 677584 ?      S    21:44   0:11  \_ /usr/sbin/zabbix_server: poller #7 [got 9 values in 0.043882 sec, getting values]
zabbix    8473  0.3 11.4 5333648 677320 ?      S    21:44   0:10  \_ /usr/sbin/zabbix_server: poller #8 [got 9 values in 0.043658 sec, getting values]
zabbix    8474  0.0  0.3 5333640 18648 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.003790 sec, getting values]
zabbix    8475  0.0  0.3 5333640 18648 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: unreachable poller #2 [got 0 values in 0.010126 sec, getting values]
zabbix    8476  0.0  0.3 5333640 18648 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: unreachable poller #3 [got 0 values in 0.003362 sec, getting values]
zabbix    8477  0.0  0.3 5333640 18648 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: unreachable poller #4 [got 0 values in 0.003802 sec, getting values]
zabbix    8478  0.1  5.4 5227520 325840 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #1 [processing data]
zabbix    8479  0.2  7.3 5227520 436552 ?      S    21:44   0:06  \_ /usr/sbin/zabbix_server: trapper #2 [processing data]
zabbix    8480  0.1  5.7 5227552 342424 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #3 [processing data]
zabbix    8481  0.0  5.3 5227520 319216 ?      S    21:44   0:02  \_ /usr/sbin/zabbix_server: trapper #4 [processing data]
zabbix    8483  0.1  5.6 5227556 333576 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #5 [processing data]
zabbix    8484  0.1  5.8 5227520 347396 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #6 [processing data]
zabbix    8485  0.0  3.2 5227520 192320 ?      S    21:44   0:01  \_ /usr/sbin/zabbix_server: trapper #7 [processing data]
zabbix    8486  0.1  6.7 5227552 401512 ?      S    21:44   0:04  \_ /usr/sbin/zabbix_server: trapper #8 [processing data]
zabbix    8487  0.0  4.5 5227520 270796 ?      S    21:44   0:02  \_ /usr/sbin/zabbix_server: trapper #9 [processing data]
zabbix    8488  0.2 12.3 5227520 736152 ?      S    21:44   0:06  \_ /usr/sbin/zabbix_server: trapper #10 [processing data]
zabbix    8489  0.1  7.1 5227520 427336 ?      S    21:44   0:05  \_ /usr/sbin/zabbix_server: trapper #11 [processing data]
zabbix    8490  0.2  7.2 5227520 429984 ?      S    21:44   0:06  \_ /usr/sbin/zabbix_server: trapper #12 [processing data]
zabbix    8491  0.2  7.5 5227520 445816 ?      S    21:44   0:07  \_ /usr/sbin/zabbix_server: trapper #13 [processing data]
zabbix    8492  0.0  5.3 5227520 317256 ?      S    21:44   0:02  \_ /usr/sbin/zabbix_server: trapper #14 [processing data]
zabbix    8493  0.2  7.0 5227552 419096 ?      S    21:44   0:07  \_ /usr/sbin/zabbix_server: trapper #15 [processing data]
zabbix    8494  0.1  7.2 5232860 429040 ?      S    21:44   0:05  \_ /usr/sbin/zabbix_server: trapper #16 [processing data]
zabbix    8495  0.1  6.6 5227520 393712 ?      S    21:44   0:04  \_ /usr/sbin/zabbix_server: trapper #17 [processing data]
zabbix    8496  0.0  5.0 5227520 301568 ?      S    21:44   0:02  \_ /usr/sbin/zabbix_server: trapper #18 [processing data]
zabbix    8497  0.1  6.2 5227520 371084 ?      S    21:44   0:04  \_ /usr/sbin/zabbix_server: trapper #19 [processing data]
zabbix    8498  0.0  4.1 5227520 248740 ?      S    21:44   0:01  \_ /usr/sbin/zabbix_server: trapper #20 [processing data]
zabbix    8499  0.1  6.6 5227520 395252 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #21 [processing data]
zabbix    8500  0.0  4.0 5227520 241960 ?      S    21:44   0:01  \_ /usr/sbin/zabbix_server: trapper #22 [processing data]
zabbix    8501  0.1  6.8 5227520 409524 ?      S    21:44   0:05  \_ /usr/sbin/zabbix_server: trapper #23 [processing data]
zabbix    8502  0.1  6.3 5227520 374824 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #24 [processing data]
zabbix    8503  0.0  3.9 5227520 234568 ?      S    21:44   0:01  \_ /usr/sbin/zabbix_server: trapper #25 [processing data]
zabbix    8504  0.0  2.4 5227520 147196 ?      S    21:44   0:00  \_ /usr/sbin/zabbix_server: trapper #26 [processing data]
zabbix    8505  0.1  6.0 5227520 362356 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #27 [processing data]
zabbix    8506  0.1  5.3 5227520 319632 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #28 [processing data]
zabbix    8507  0.1  6.0 5227520 359752 ?      S    21:44   0:03  \_ /usr/sbin/zabbix_server: trapper #29 [processing data]
zabbix    8508  0.1  7.1 5227520 423472 ?      S    21:44   0:04  \_ /usr/sbin/zabbix_server: trapper #30 [processing data]
zabbix    8509  0.0  0.2 5230040 17312 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: icmp pinger #1 [getting values]
zabbix    8510  0.0  0.2 5230040 17408 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: icmp pinger #2 [getting values]
zabbix    8511  0.0  0.2 5230040 17408 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: icmp pinger #3 [getting values]
zabbix    8512  0.0  0.2 5230040 17400 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: icmp pinger #4 [getting values]
zabbix    8513  0.0  0.2 5227496 17496 ?       S    21:44   0:00  \_ /usr/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.009052 sec during 5.009055 sec]
zabbix    8514  2.8 15.4 5323956 916616 ?      S    21:44   1:22  \_ /usr/sbin/zabbix_server: preprocessing manager #1 [queued 10930, processed 104140 values, idle 1.747914 sec during 5.016581 sec]
zabbix    8515  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #1 started
zabbix    8516  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #2 started
zabbix    8517  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #3 started
zabbix    8518  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #4 started
zabbix    8519  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #5 started
zabbix    8520  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #6 started
zabbix    8521  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #7 started
zabbix    8522  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #8 started
zabbix    8523  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #9 started
zabbix    8524  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #10 started
zabbix    8525  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #11 started
zabbix    8526  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #12 started
zabbix    8527  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #13 started
zabbix    8528  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #14 started
zabbix    8529  0.1  0.2 5227496 16720 ?       S    21:44   0:03  \_ /usr/sbin/zabbix_server: preprocessing worker #15 started

Stuck processes last operation:

[root@host ]# strace -s 256 -ttt -p 8459
strace: Process 8459 attached
1554669883.541322 semop(1179651, [{4, -1, SEM_UNDO}], 1
^Cstrace: Process 8459 detached
 <detached ...>
[root@host ]# strace -s 256 -ttt -p 8456
strace: Process 8456 attached
1554669906.699857 semop(1179651, [{4, -1, SEM_UNDO}], 1
^Cstrace: Process 8456 detached
 <detached ...>
[root@host ]# strace -s 256 -ttt -p 8460
strace: Process 8460 attached
1554669926.525719 semop(1179651, [{4, -1, SEM_UNDO}], 1
^Cstrace: Process 8460 detached
 <detached ...>
[root@host ]# strace -s 256 -ttt -p 8493
strace: Process 8493 attached
1554669939.263531 semop(1179651, [{1, -1, SEM_UNDO}], 1

c^Cstrace: Process 8493 detached
 <detached ...>

So 1 semnum (history cache) and 4 semnum (config cache) are locked by 8459 (history syncer #5) and 8493 (trapper #15) respectively. Then 8459 (history syncer #5) waits 4 semnum unlock and 8493 (trapper #15) waits 1 semnum unlock.



 Comments   
Comment by Alexey Pustovalov [ 2019 Apr 08 ]

I know we do not support 3.4 version anymore. But it would be great to check the problem for all currently supported versions, probably we still have such issue.

Comment by Vladislavs Sokurenko [ 2019 Apr 08 ]

Suspicious code:

static dc_item_value_t	*dc_local_get_history_slot(void)
{
	if (ZBX_MAX_VALUES_LOCAL == item_values_num)
		dc_flush_history();
....

Called from:

void	zbx_dc_items_update_nextcheck(DC_ITEM *items, zbx_agent_value_t *values, int *errcodes, size_t values_num)
{
	size_t		i;
	ZBX_DC_ITEM	*dc_item;
	ZBX_DC_HOST	*dc_host;

	LOCK_CACHE;

	for (i = 0; i < values_num; i++)
	{
...

		/* update nextcheck for items that are counted in queue for monitoring purposes */
		if (SUCCEED == is_counted_in_item_queue(dc_item->type, dc_item->key))
			DCitem_nextcheck_update(dc_item, dc_host, items[i].state, ZBX_ITEM_COLLECTED, values[i].ts.sec);
	}

	UNLOCK_CACHE;
}

Comment by Vladislavs Sokurenko [ 2019 Apr 08 ]

Can you please check if there are unsupported items due to invalid update interval ?

Comment by Vladislavs Sokurenko [ 2019 Apr 09 ]

Looks like 4.0 and up are not affected due to some improvements.

Comment by Alexey Pustovalov [ 2019 Apr 09 ]

I'm pretty sure there are a lot of unsupported items, the installation has 12k nvps

So you confirm that 3.0, 4.0 and newer do not affect?

Comment by Vladislavs Sokurenko [ 2019 Apr 09 ]

There is no such thing as nested lock in history syncer since 4.0 so I confirm

Comment by Alexey Pustovalov [ 2019 Apr 09 ]

What about 3.0?

Comment by Vladislavs Sokurenko [ 2019 Apr 09 ]

No, it's caused by ZBXNEXT-1675.

Comment by Vladislavs Sokurenko [ 2019 Apr 10 ]

Fixed in development branch:
svn://svn.zabbix.com/branches/dev/DEV-1063

Comment by Vladislavs Sokurenko [ 2019 Apr 11 ]

Fixed in:

  • pre-4.0.7rc1 r92492
  • pre-4.2.1rc1 r92493
  • pre-4.4.0alpha1 (trunk) r92494
Comment by dimir [ 2019 Jun 10 ]

Summary

When receiving proxy history and updating item nextcheck for statistics, if there's item unsupported data with faulty update interval as a reason, trapper will add unsupported values to history cache but not flush them to the database.

Generated at Fri Mar 29 14:00:00 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.