Loading...

Type: Incident report
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 2.2.1
Component/s: Server (S)
Labels:
- database
- timeout
Environment:

Hide
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Debian
Description: Debian GNU/Linux 7.4 (wheezy)
Release: 7.4
Codename: wheezy

Show
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch Distributor ID: Debian Description: Debian GNU/Linux 7.4 (wheezy) Release: 7.4 Codename: wheezy

I'm not sure, is this really a case, as I've experience it on a production server and had to revert everything back very soon.

So, I've pointed zabbix from the local mySQL to out production mySQL cluster, which has better performance, memory, etc. But, as it is used by a lot of scripts and programs, which sometimes don't behave nice, it has in mySQL configuration following options:

wait_timeout = 30
interactive_timeout = 900

After restart of the Zabbix server I've started to see messages in the log like:

27033:20140212:050417.071 server #23 started discoverer #5
27032:20140212:050417.072 server #22 started discoverer #4
27030:20140212:050417.072 server #20 started discoverer #2
27034:20140212:050417.072 server #24 started discoverer #6
27014:20140212:050417.072 server #4 started poller #2
27026:20140212:050431.435 slow query: 14.065646 sec, "select itemid,min(clock) from history_uint group by itemid"
27026:20140212:050440.305 housekeeper [deleted 91455 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 24.347664 sec, idle 1 hour(s)]
27033:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=4]
27029:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=0]
27030:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=1]
27034:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=5]
27018:20140212:050528.067 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
27015:20140212:050546.673 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=25546]
27036:20140212:050557.267 [Z3005] query failed: [2006] MySQL server has gone away [begin;]

While mySQL server was up and running and Zabbix itself was working kind of OK.

So, based on 'MySQL server has gone away' I assumed that this is connected with 'wait_timeout' and raised it to 900.

After restart number of such messages decreased dramatically:

27017:20140212:082321.399 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26601]
27026:20140212:090542.627 executing housekeeper
27026:20140212:090552.037 housekeeper [deleted 269757 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.409129 sec, idle 1 hour(s)]
27017:20140212:092540.623 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26740]
27015:20140212:094659.217 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=24419]
27026:20140212:100552.038 executing housekeeper
27026:20140212:100602.768 housekeeper [deleted 269648 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 10.730107 sec, idle 1 hour(s)]
27013:20140212:102315.571 SNMP agent item "ifOperStatus[ge-0/0/39.0]" on host "sw0" failed: first network error, wait for 15 seconds
27018:20140212:102330.690 resuming SNMP agent checks on host "sw0.ilo.ad-lab.nl": connection restored
27014:20140212:102446.059 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26686]
27015:20140212:103928.589 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=27568]
27026:20140212:110602.769 executing housekeeper
27026:20140212:110612.635 housekeeper [deleted 268852 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.865175 sec, idle 1 hour(s)]

But still occur from time to time. Now I raised wait_timeout to 28800(default mySQL value), but that of course breaks usage patter of the server. Have to observe logs to gather statistics though.

Meanwhile If I'm right, and this behavior caused by low global wait_timeout, the fix for it is quite simple - at the beginning of the mySQL session, when zabbix connects to it the first SQL statement should be:

SET SESSION wait_timeout=84600

or whatever looks appropriate. That would make zabbix independent from global server setting and will help it to keep idle connections for log periods open.

With regards,
Timur Bakeyev.

duplicates

ZBX-6163 [Z3005] query failed: [2006] MySQL server has gone away

Closed

Details

Description

Attachments

Issue Links

Activity

People

Dates