Uploaded image for project: 'ZABBIX BUGS AND ISSUES'
  1. ZABBIX BUGS AND ISSUES
  2. ZBX-7793

Zabbix doesn't alter mySQL wait_timeout for it's sessions

XMLWordPrintable

    • Icon: Incident report Incident report
    • Resolution: Duplicate
    • Icon: Major Major
    • None
    • 2.2.1
    • Server (S)

      I'm not sure, is this really a case, as I've experience it on a production server and had to revert everything back very soon.

      So, I've pointed zabbix from the local mySQL to out production mySQL cluster, which has better performance, memory, etc. But, as it is used by a lot of scripts and programs, which sometimes don't behave nice, it has in mySQL configuration following options:

      wait_timeout = 30
      interactive_timeout = 900

      After restart of the Zabbix server I've started to see messages in the log like:

      27033:20140212:050417.071 server #23 started discoverer #5
      27032:20140212:050417.072 server #22 started discoverer #4
      27030:20140212:050417.072 server #20 started discoverer #2
      27034:20140212:050417.072 server #24 started discoverer #6
      27014:20140212:050417.072 server #4 started poller #2
      27026:20140212:050431.435 slow query: 14.065646 sec, "select itemid,min(clock) from history_uint group by itemid"
      27026:20140212:050440.305 housekeeper [deleted 91455 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 24.347664 sec, idle 1 hour(s)]
      27033:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=4]
      27029:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=0]
      27030:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=1]
      27034:20140212:050517.078 [Z3005] query failed: [2006] MySQL server has gone away [select distinct r.druleid,r.iprange,r.name,c.dcheckid from drules r left join dchecks c on c.druleid=r.druleid and uniq=1 where r.proxy_hostid is null and r.status=0 and (r.nextcheck<=1392177917 or r.nextcheck>1392177917+r.delay) and mod(r.druleid,6)=5]
      27018:20140212:050528.067 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
      27015:20140212:050546.673 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=25546]
      27036:20140212:050557.267 [Z3005] query failed: [2006] MySQL server has gone away [begin;]

      While mySQL server was up and running and Zabbix itself was working kind of OK.

      So, based on 'MySQL server has gone away' I assumed that this is connected with 'wait_timeout' and raised it to 900.

      After restart number of such messages decreased dramatically:

      27017:20140212:082321.399 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26601]
      27026:20140212:090542.627 executing housekeeper
      27026:20140212:090552.037 housekeeper [deleted 269757 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.409129 sec, idle 1 hour(s)]
      27017:20140212:092540.623 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26740]
      27015:20140212:094659.217 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=24419]
      27026:20140212:100552.038 executing housekeeper
      27026:20140212:100602.768 housekeeper [deleted 269648 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 10.730107 sec, idle 1 hour(s)]
      27013:20140212:102315.571 SNMP agent item "ifOperStatus[ge-0/0/39.0]" on host "sw0" failed: first network error, wait for 15 seconds
      27018:20140212:102330.690 resuming SNMP agent checks on host "sw0.ilo.ad-lab.nl": connection restored
      27014:20140212:102446.059 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=26686]
      27015:20140212:103928.589 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,state,filter,error,lifetime from items where itemid=27568]
      27026:20140212:110602.769 executing housekeeper
      27026:20140212:110612.635 housekeeper [deleted 268852 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 9.865175 sec, idle 1 hour(s)]

      But still occur from time to time. Now I raised wait_timeout to 28800(default mySQL value), but that of course breaks usage patter of the server. Have to observe logs to gather statistics though.

      Meanwhile If I'm right, and this behavior caused by low global wait_timeout, the fix for it is quite simple - at the beginning of the mySQL session, when zabbix connects to it the first SQL statement should be:

      SET SESSION wait_timeout=84600

      or whatever looks appropriate. That would make zabbix independent from global server setting and will help it to keep idle connections for log periods open.

      With regards,
      Timur Bakeyev.

            Unassigned Unassigned
            timur Timur Bakeyev
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved: