[ZBX-6630] Host status is not actual on proxy side because of configuration syncer process Created: 2013 May 27  Updated: 2022 Oct 08  Resolved: 2013 Jul 05

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P)
Affects Version/s: 2.0.7rc1, 2.1.0
Fix Version/s: 2.0.7rc1, 2.1.0

Type: Incident report Priority: Major
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: performance, proxy
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MySQL with the default lock wait timeout (50 seconds)


Attachments: Text File ZBX-6630.log     PNG File blocked connections.png     File proxy.c.gz    
Issue Links:
Duplicate

 Description   

On a heavily loaded proxy (about 2000 NVPS), the configuration syncer process locks the hosts table for a long time. During this time the unreachable poller and the pollers cannot update host status. So we should update only the changed configuration rows instead of all rows.



 Comments   
Comment by Andris Mednis [ 2013 Jun 28 ]

Available in the development branch svn://svn.zabbix.com/branches/dev/ZBX-6630.

Comment by Alexey Pustovalov [ 2013 Jun 28 ]

(1) The proxy died because of a NULL value in the port column for a Trapper item:

 26463:20130628:201123.574 Number of cell 26 []
 26463:20130628:201123.574 Number of cell 27 [(null)]
 26463:20130628:201123.574 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
 26463:20130628:201123.574 ====== Fatal information: ======
 26463:20130628:201123.574 Program counter: 0x7fb720446d5f
 26463:20130628:201123.574 === Registers: ===
 26463:20130628:201123.574 r8      =                0 =                    0 =                    0
 26463:20130628:201123.574 r9      =     7fb7206a2ed0 =      140424499572432 =      140424499572432
 26463:20130628:201123.574 r10     =     7fb7206a2ed0 =      140424499572432 =      140424499572432
 26463:20130628:201123.574 r11     =              206 =                  518 =                  518
 26463:20130628:201123.574 r12     =     7fff879eac88 =      140735468711048 =      140735468711048
 26463:20130628:201123.574 r13     =                0 =                    0 =                    0
 26463:20130628:201123.574 r14     =     7fff879eac88 =      140735468711048 =      140735468711048
 26463:20130628:201123.574 r15     =     7fff879eac90 =      140735468711056 =      140735468711056
 26463:20130628:201123.574 rdi     =                0 =                    0 =                    0
 26463:20130628:201123.574 rsi     =     7fff879eac90 =      140735468711056 =      140735468711056
 26463:20130628:201123.574 rbp     =     7fff879eac90 =      140735468711056 =      140735468711056
 26463:20130628:201123.574 rbx     =     7fff879eacb8 =      140735468711096 =      140735468711096
 26463:20130628:201123.574 rdx     =     7fff879eac88 =      140735468711048 =      140735468711048
 26463:20130628:201123.574 rax     =                0 =                    0 =                    0
 26463:20130628:201123.574 rcx     =                0 =                    0 =                    0
sqlite> select itemid,type,snmp_community,snmp_oid,hostid,key_,delay,status,value_type,trapper_hosts,snmpv3_securityname,snmpv3_securitylevel,snmpv3_authpassphrase,snmpv3_privpassphrase,formula,logtimefmt,delay_flex,params,ipmi_sensor,data_type,authtype,username,password,publickey,privatekey,flags,filter,interfaceid,port from items where itemid in (7048852,6746840);
6746840|3|||45319|icmppingloss[{HOSTNAME},10,,32,600]|120|0|3|||0|||1|||||0|0|||||0||68134|
7048852|2|||45319|trap.maintenance.callerid|0|0|4|||0|||1|||||0|0|||||0|||

Proxy dies while processing 7048852 item.

[6746840,3,"","",45319,"icmppingloss[{HOSTNAME},10,,32,600]",120,0,3,"","",0,"","","1","","","","",0,0,"","","","",0,"",68134,""],
[7048852,2,"","",45319,"trap.maintenance.callerid",0,0,4,"","",0,"","","1","","","","",0,0,"","","","",0,"",null,""],

andris RESOLVED in r36669
dotneft TESTED.
andris More bugs in the handling of NULL values were discovered and fixed in r36699. Could you test again?
dotneft CLOSED.
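
The crash above comes from a row whose JSON form carries a null (item 7048852, cell 27 in the log). As an illustration only, here is a minimal Python sketch of a null-safe field comparison. The helper name field_differs and the compare-as-strings rule are assumptions made for this example; this is not the code from r36669 or r36699.

```python
import json

# Row for item 7048852, copied from the JSON quoted in this comment.
ROW_JSON = ('[7048852,2,"","",45319,"trap.maintenance.callerid",0,0,4,"","",0,'
            '"","","1","","","","",0,0,"","","","",0,"",null,""]')


def field_differs(new_value, db_value):
    """Hypothetical helper: compare one field without touching a missing value."""
    if new_value is None or db_value is None:
        # JSON null and SQL NULL both arrive as None; equal only if both are None.
        return not (new_value is None and db_value is None)
    # JSON may carry numbers where the DB layer returns text, so compare as strings.
    return str(new_value) != str(db_value)


row = json.loads(ROW_JSON)
# Cell 27 is the null value reported in the log excerpt.
print(field_differs(row[27], None))      # NULL on both sides: no update needed
print(field_differs(row[27], "68134"))   # NULL against a real value: update needed
```

The point of the sketch is that the NULL case is decided before any string operation is attempted on the value.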

Comment by Alexey Pustovalov [ 2013 Jun 28 ]

tests:

new:
13921:20130628:205148.923 In process_configuration_sync()
13921:20130628:205311.069 End of process_configuration_sync()
13921:20130628:205454.238 In process_configuration_sync()
13921:20130628:205616.793 End of process_configuration_sync()

old:
23986:20130628:210007.139 In process_configuration_sync()
23986:20130628:210017.850 Received configuration data from server. Datalen 54271872
23986:20130628:210106.888 End of process_configuration_sync()
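
To compare the two runs, the duration of each sync can be computed from the timestamps in these lines. The following Python sketch assumes the log prefix has the form PID:YYYYMMDD:HHMMSS.mmm, as in the excerpts above; the function names parse_stamp and sync_durations are made up for this example.

```python
from datetime import datetime

NEW_LOG = [
    "13921:20130628:205148.923 In process_configuration_sync()",
    "13921:20130628:205311.069 End of process_configuration_sync()",
    "13921:20130628:205454.238 In process_configuration_sync()",
    "13921:20130628:205616.793 End of process_configuration_sync()",
]
OLD_LOG = [
    "23986:20130628:210007.139 In process_configuration_sync()",
    "23986:20130628:210017.850 Received configuration data from server. Datalen 54271872",
    "23986:20130628:210106.888 End of process_configuration_sync()",
]


def parse_stamp(line):
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a log line into a datetime."""
    _pid, date, rest = line.strip().split(":", 2)
    clock = rest.split(" ", 1)[0]
    return datetime.strptime(date + clock, "%Y%m%d%H%M%S.%f")


def sync_durations(lines):
    """Return the length in seconds of each In/End pair of process_configuration_sync()."""
    durations, started = [], None
    for line in lines:
        text = line.rstrip()
        if text.endswith(" In process_configuration_sync()"):
            started = parse_stamp(text)
        elif text.endswith(" End of process_configuration_sync()") and started is not None:
            durations.append((parse_stamp(text) - started).total_seconds())
            started = None
    return durations


print("new:", [round(d, 3) for d in sync_durations(NEW_LOG)])
print("old:", [round(d, 3) for d in sync_durations(OLD_LOG)])
```

For the lines quoted here this gives roughly 82.1 s and 82.6 s per sync with the new code, against roughly 59.7 s with the old code.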

Comment by Andris Mednis [ 2013 Jun 28 ]

Thanks for a good test, Alexey! I will prepare a "proxy.c" file with more time logging to find out which part of the fix is the slowest and needs improvement.

Comment by Andris Mednis [ 2013 Jun 28 ]

Attached is a modified "src/libs/zbxdbhigh/proxy.c" with more time logging added (it does not fix the crash).
dotneft Tested. Please find attached log file.
andris The log file shows that the slowest part is comparing the new "items" table data (from JSON) with the current data (from the DB) to find out which records to add, delete, or modify and which fields require an update. Both data sets are in memory at that point, so this step is CPU- and memory-intensive and involves no I/O. Is it important to improve performance here? If so, by how much? Or is it acceptable to be slower, as long as no unnecessary updates are issued to the DB?
andris Well, I've modified the slowest part and hope it will help. Let's test it on Monday.
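
For readers unfamiliar with the comparison step discussed in this comment, here is a rough Python sketch of the idea: both data sets are held in memory, keyed by row id, and only the differences are turned into DB operations. The function name diff_rows, the dict-based layout, and the reduced field list are assumptions for illustration; the actual implementation is the C code in src/libs/zbxdbhigh/proxy.c and may work differently.

```python
def diff_rows(new_rows, db_rows, fields):
    """Compare two {id: tuple_of_values} mappings.

    Returns (ids to insert, ids to delete, {id: {field: new_value}} to update).
    Dict lookups keep the comparison linear in the number of rows.
    """
    to_add = [rid for rid in new_rows if rid not in db_rows]
    to_delete = [rid for rid in db_rows if rid not in new_rows]
    to_update = {}
    for rid, new in new_rows.items():
        if rid not in db_rows:
            continue
        old = db_rows[rid]
        # Keep only the fields whose value actually changed.
        changed = {f: n for f, n, o in zip(fields, new, old) if n != o}
        if changed:
            to_update[rid] = changed
    return to_add, to_delete, to_update


# Reduced field list and made-up "current DB" values, for illustration only.
FIELDS = ("key_", "delay", "status")
from_server = {
    6746840: ("icmppingloss[{HOSTNAME},10,,32,600]", 120, 0),
    7048852: ("trap.maintenance.callerid", 0, 0),
}
in_proxy_db = {
    6746840: ("icmppingloss[{HOSTNAME},10,,32,600]", 60, 0),  # delay differs
    1: ("obsolete.item", 30, 0),                              # no longer sent by server
}

added, deleted, updated = diff_rows(from_server, in_proxy_db, FIELDS)
print(added, deleted, updated)
```

Rows that are identical on both sides produce no operation at all, which is what keeps the hosts and items tables from being rewritten (and locked) on every sync.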

Comment by Alexey Pustovalov [ 2013 Jul 02 ]
 23614:20130702:195926.736 proxy #1 started [configuration syncer #1]
 23614:20130702:195926.762 In process_configuration_sync()
 23614:20130702:195940.736 Received configuration data from server. Datalen 66192063
 23614:20130702:195943.627 slow query: 2.266737 sec, "select itemid,type,snmp_community,snmp_oid,hostid,key_,delay,status,value_type,trapper_hosts,snmpv3_securityname,snmpv3_securitylevel,snmpv3_authpassphrase,snmpv3_privpassphrase,formula,logtimefmt,delay_flex,params,ipmi_sensor,data_type,authtype,username,password,publickey,privatekey,flags,filter,interfaceid,port from items"
 23614:20130702:195949.220 slow query: 2.828733 sec, "select i.itemid,i.hostid,h.proxy_hostid,i.type,i.data_type,i.value_type,i.key_,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.ipmi_sensor,i.delay,i.delay_flex,i.trapper_hosts,i.logtimefmt,i.params,i.status,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,i.lastclock from items i,hosts h where i.hostid=h.hostid and h.status in (0) and i.status in (0,3)"
 23614:20130702:195951.633 End of process_configuration_sync()
 23614:20130702:200022.043 forced reloading of the configuration cache
 23614:20130702:200022.043 In process_configuration_sync()
 23614:20130702:200036.543 Received configuration data from server. Datalen 66192063
 23614:20130702:200039.537 slow query: 2.314157 sec, "select itemid,type,snmp_community,snmp_oid,hostid,key_,delay,status,value_type,trapper_hosts,snmpv3_securityname,snmpv3_securitylevel,snmpv3_authpassphrase,snmpv3_privpassphrase,formula,logtimefmt,delay_flex,params,ipmi_sensor,data_type,authtype,username,password,publickey,privatekey,flags,filter,interfaceid,port from items"
 23614:20130702:200045.189 slow query: 2.868602 sec, "select i.itemid,i.hostid,h.proxy_hostid,i.type,i.data_type,i.value_type,i.key_,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.ipmi_sensor,i.delay,i.delay_flex,i.trapper_hosts,i.logtimefmt,i.params,i.status,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,i.lastclock from items i,hosts h where i.hostid=h.hostid and h.status in (0) and i.status in (0,3)"
 23614:20130702:200047.578 End of process_configuration_sync()
Comment by Andris Mednis [ 2013 Jul 03 ]

The performance issue, crash, and NULL value handling are fixed in r36699.

Comment by Alexander Vladishev [ 2013 Jul 04 ]

Successfully tested! Please review my changes in r36729.

Comment by richlv [ 2013 Jul 05 ]

(2) added to whatsnew at https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew207#improved_proxy_performance , please review

andris Reviewed. Minor changes proposed.

<richlv> CLOSED

Comment by Andris Mednis [ 2013 Jul 05 ]

Fixed in versions pre-2.0.7 rev. 36754 and pre-2.1.0 rev. 36769.
