[ZBX-10194] zabbix server on ARM platform tries to perform duplicated insert SQL Created: 2015 Dec 23  Updated: 2019 Dec 12  Resolved: 2016 Jan 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.0alpha5
Fix Version/s: 3.0.0beta1

Type: Incident report Priority: Critical
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: arm, sql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ZBX-10194.patch     File config-prev.log     File config.log     File memcmp.c     File zabbix_server.log    

 Description   

I could not reproduce it on a regular desktop with an Intel CPU.
I'm not sure the platform is related, but who knows...

This is a small Zabbix server running on a Raspberry Pi Model B (512 MB RAM).

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 2.00
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 000e
Serial          : 00000000dc8125bb
# cat /etc/debian_version
7.8
# cat /etc/issue
Raspbian GNU/Linux 7 \n \l
# uname -a
Linux kot2 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux
# mysql -V
mysql  Ver 14.14 Distrib 5.5.46, for debian-linux-gnu (armv7l) using readline 6.2

Zabbix server is compiled from source on this host and runs with a MySQL server (innodb_buffer_pool_size = 128M).
A regular 32GB SD card is used as the hard drive.
Raspbian Linux is up to date.
Zabbix trunk versions 3.0.0+ always run on this host and are recompiled from time to time.

I've noticed some SQL errors in zabbix_server.log and could not understand why they appear.
I'm 99% sure that these SQL errors were absent, for example, 5 months ago, when this small Zabbix installation was rolled out based on some trunk revision (3.0.0+).

To make a clean test, I created a fresh MySQL database (based on trunk r57353), unlinked and cleared the "app zabbix server" template, and set the update interval of 2 discovery rules (from "Template OS Linux") to the schedule m/1 (just to be sure when the LLD rules are executed), but the SQL errors still appear.

  3016:20151223:183401.036 [Z3005] query failed: [1062] Duplicate entry '350-23663' for key 'items_applications_1' [insert into items_applications (itemappid,applicationid,itemid) values (5893,350,23663),(5894,350,23664),(5895,350,23662),(5896,350,23665);]
  3016:20151223:183401.159 [Z3005] query failed: [1062] Duplicate entry '347-23667' for key 'items_applications_1' [insert into items_applications (itemappid,applicationid,itemid) values (5893,347,23667),(5894,347,23669),(5895,347,23666),(5896,347,23668),(5897,347,23670);]

Since I cannot reproduce it on my Intel CPU workstation, I suspect that something is wrong with the Zabbix server daemon on the ARM platform.

The attached server log file shows these SQL errors at debug level 3, and also after the level was increased to 4 for a single running poller.

P.S. When this Zabbix server runs with the "production" database, it logs an additional SQL error related to creating LLD graph items for missing items, but I guess it has the same cause.
These errors are probably related to a new network interface that was recently added to this host and became discoverable.

9193:20151222:182758.445 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`trunk`.`graphs_items`, CONSTRAINT `c_graphs_items_2` FOREIGN KEY (`itemid`) REFERENCES `items` (`itemid`) ON DELETE CASCADE) [insert into graphs_items (gitemid,graphid,itemid,drawtype,sortorder,color,yaxisside,calc_fnc,type) values (1849,554,23748,5,0,'00AA00',0,2,0),(1850,554,23749,5,1,'3333FF',0,2,0);]


 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 04 ]

In src/libs/zbxdbhigh/lld_item.c, in lld_item_application_compare_func(), we use memcmp() to compare two structures of this form:

struct
{
	zbx_uint64_t	x;
	void	*y;
};

This is not good, because on a 32-bit platform where 64-bit integers must be 8-byte aligned, this structure contains 4 bytes of padding after the pointer. (Raspberry Pi 2 uses a 32-bit ARM Cortex-A7 CPU. Similar results can be achieved on the x86 platform with the -m32 -malign-double compiler options.)

Since padding usually contains garbage, memcmp() is very unlikely to return zero when it compares two structures with equal fields. As a result, two identical item-application mappings are treated as different, and Zabbix server tries to insert both of them into the database, which causes the failed query.

Issue introduced with ZBXNEXT-1219.

Comment by Oleksii Zagorskyi [ 2016 Jan 11 ]

The Raspberry Pi I'm using is actually not a "2", but just a "Model B" (issue summary fixed).
This one: https://www.raspberrypi.org/products/model-b/
And it's ARMv6-compatible, not ARMv7.

Some possibly useful links are:
https://www.raspbian.org/RaspbianFAQ#What_is_Raspbian.3F
https://www.raspbian.org/RaspbianFAQ#What_compilation_options_should_be_set_Raspbian_code.3F

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 11 ]

It's still a 32-bit processor with 8-byte alignment rules for 64-bit integers in structures. We have a fix in mind; we are just waiting for you to return from vacation to test it.

Comment by Oleksii Zagorskyi [ 2016 Jan 11 ]

Doh, I'd have tested it already if it had been posted here.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 11 ]

Fix available in development branch svn://svn.zabbix.com/branches/dev/ZBX-10194 revision 57531.

Patch attached as well.

Comment by Sandis Neilands (Inactive) [ 2016 Jan 12 ]

Reviewed the code. zalex_ua, please test on the actual hardware.

Comment by Oleksii Zagorskyi [ 2016 Jan 14 ]

Before the patch was provided, the Raspberry host was upgraded to Debian 8 Jessie (from 7 Wheezy).

After upgrade:

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 2.00
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 000e
Serial          : 00000000dc8125bb
# cat /etc/issue
Raspbian GNU/Linux 8 \n \l

root@kot2:/var/log/zabbix# cat /etc/debian_version
8.0

# uname -a
Linux kot2 4.1.13+ #826 PREEMPT Fri Nov 13 20:13:22 GMT 2015 armv6l GNU/Linux

# mysql -V
mysql  Ver 14.14 Distrib 5.5.46, for debian-linux-gnu (armv7l) using readline 6.3

Because libnetsnmp was (as I recall) updated too, the existing zabbix_server binary could not start, complaining about a missing version of a .so file.
So I had to recompile zabbix_server. I used the same Zabbix code revision, 57353.
I DID NOT APPLY THE PATCH!
With the recompiled binary, the issue is no longer reproducible. I tested the new binary on both the "production" and "testing" databases.

Before recompiling, I copied config.log to another folder, so now I have the file for both the current binary and the previous one, in case you need it.

Let me know if I still need to test the patch on the current upgraded OS.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 15 ]

I would like to see both config.log files.

If the issue comes from where I suppose it does, even a simple server restart could have helped.

Have you tried to reproduce the issue with another freshly created database?

Comment by Oleksii Zagorskyi [ 2016 Jan 15 ]

config-prev.log (Raspbian 7 build) and config.log (Raspbian 8 build) are attached.

As for restarts: as I recall, on Raspbian 7 I restarted the Zabbix server 10+ times, and I think it logged those SQL errors every time.

On the current Raspbian 8, I restarted the server ~90 times in a loop (after every restart, the discovery rules were executed at least twice), and no SQL errors were logged.

The last test included deleting the discovered items (the filter was changed temporarily), restarting the server, and rediscovering those items, i.e. the same as trying a freshly created database.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 18 ]

I see that gcc on your system was upgraded from 4.6 to 4.9, and this could affect how Zabbix is compiled.

Quoting gcc 4.7 changes:

On ARM, when compiling for ARMv6 (but not ARMv6-M), ARMv7-A, ARMv7-R, or ARMv7-M, the new option -munaligned-access is active by default, which for some sources generates code that accesses memory on unaligned addresses. This requires the kernel of those systems to enable such accesses (controlled by CP15 register c1, refer to ARM documentation). Alternatively, or for compatibility with kernels where unaligned accesses are not supported, all code has to be compiled with -mno-unaligned-access. Upstream Linux kernel releases have automatically and unconditionally supported unaligned accesses as emitted by GCC due to this option being active since version 2.6.28.

I've attached a small program that emulates one particular aspect of LLD item-application mapping processing in Zabbix. It would be very helpful if you could compile and run it on the Raspberry Pi. Please try gcc 4.9 both as is and with the -mno-unaligned-access option.

zalex_ua
Results are identical:

# gcc --version
gcc (Raspbian 4.9.2-10) 4.9.2

# ./a.out
size of uint64_t: 8
size of pointer: 4
size of structure: 16
memcmp result: -1

# ./a.out-mno-unaligned-access
size of uint64_t: 8
size of pointer: 4
size of structure: 16
memcmp result: -1

glebs.ivanovskis This means that the gcc changes I quoted above are not related to the issue.

But the issue still exists, because the structure is bigger than its two fields combined, and memcmp() of two structures with identical fields can return a nonzero result.

The duplicate-insert issue can be hard to catch "in the wild" because it depends on many factors, one of which is random junk in memory.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 18 ]

Fixed in pre-3.0.0beta1 (trunk) r57735.

Comment by Marc F [ 2019 Apr 03 ]

Hello

I'm still seeing this with the official 4.0.6 packages from the Zabbix repository for Raspbian...

Comment by Oleksii Zagorskyi [ 2019 Dec 12 ]

A similar issue came back; see ZBX-17073...

Generated at Wed Apr 24 16:32:30 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.