[ZBX-7793] Zabbix doesn't alter mySQL wait_timeout for it's sessions Created: 2014 Feb 12  Updated: 2017 May 30  Resolved: 2014 Feb 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Timur Bakeyev Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: database, timeout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Debian
Description: Debian GNU/Linux 7.4 (wheezy)
Release: 7.4
Codename: wheezy


Issue Links:
Duplicate
duplicates ZBX-6163 [Z3005] query failed: [2006] MySQL se... Closed

 Description   

I'm not sure, is this really a case, as I've experience it on a production server and had to revert everything back very soon.

So, I've pointed zabbix from the local mySQL to out production mySQL cluster, which has better performance, memory, etc. But, as it is used by a lot of scripts and programs, which sometimes don't behave nice, it has in mySQL configuration following options:

wait_timeout = 30
interactive_timeout = 900

After restart of the Zabbix server I've started to see messages in the log like:

27033:20140212:050417.071 server #23 started discoverer #5
27032:20140212:050417.072 server #22 started discoverer #4
27030:20140212:050417.072 server #20 started discoverer #2
27034:20140212:050417.072 server #24 started discoverer #6
27014:20140212:050417.072 server #4 started poller #2
27026:20140212:050431.435 slow query: 14.065646 sec, "select itemid,min(clock) from history_uint group by itemid"
27026:20140212:050440.305 housekeeper [deleted 91455 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 24.347664 sec, idle 1 hour(s)]
27033:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=4]
27029:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=0]
27030:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=1]
27034:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=5]
27018:20140212:050528.067 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
27015:20140212:050546.673 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=25546]
27036:20140212:050557.267 [Z3005] query failed: [2006] MySQL server has gone away [begin;]

While mySQL server was up and running and Zabbix itself was working kind of OK.

So, based on 'MySQL server has gone away' I assumed that this is connected with 'wait_timeout' and raised it to 900.

After restart number of such messages decreased dramatically:

27017:20140212:082321.399 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26601]
27026:20140212:090542.627 executing housekeeper
27026:20140212:090552.037 housekeeper [deleted 269757 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.409129 sec, idle 1 hour(s)]
27017:20140212:092540.623 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26740]
27015:20140212:094659.217 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=24419]
27026:20140212:100552.038 executing housekeeper
27026:20140212:100602.768 housekeeper [deleted 269648 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 10.730107 sec, idle 1 hour(s)]
27013:20140212:102315.571 SNMP agent item "ifOperStatus[ge-0/0/39.0]" on host "sw0" failed: first network error, wait for 15 seconds
27018:20140212:102330.690 resuming SNMP agent checks on host "sw0.ilo.ad-lab.nl": connection restored
27014:20140212:102446.059 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26686]
27015:20140212:103928.589 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=27568]
27026:20140212:110602.769 executing housekeeper
27026:20140212:110612.635 housekeeper [deleted 268852 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.865175 sec, idle 1 hour(s)]

But still occur from time to time. Now I raised wait_timeout to 28800(default mySQL value), but that of course breaks usage patter of the server. Have to observe logs to gather statistics though.

Meanwhile If I'm right, and this behavior caused by low global wait_timeout, the fix for it is quite simple - at the beginning of the mySQL session, when zabbix connects to it the first SQL statement should be:

SET SESSION wait_timeout=84600

or whatever looks appropriate. That would make zabbix independent from global server setting and will help it to keep idle connections for log periods open.

With regards,
Timur Bakeyev.



 Comments   
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ]

Timur, thank you for the detailed description. However, the issue is a duplicate of ZBX-6163, let's keep only that one open.

Generated at Wed Apr 24 11:10:33 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.