- Incident report
- Resolution: Incomplete
- Blocker
Master Node: CentOS 6, Zabbix 1.8.9, mysql-server-5.1.52-1.el6_0.1.i686, php-5.3.3-3.el6_1.3.i686, httpd-2.2.15-9.el6.centos.3.i686
Installation parameters: --enable-server --with-mysql --with-net-snmp --with-libcurl --enable-agent
Slave Node: CentOS 5.5, Zabbix 1.8.5, mysql-server-5.0.77-4.el5_5.4, php-5.2.17-7.el5, httpd-2.2.21-3.el5
Installation parameters: --enable-server --with-mysql --with-net-snmp --with-libcurl --enable-agent
Everything was working until we tried to update the Master Node from 1.8.8 to 1.8.9.
Here is some information I've already collected about this:
http://www.zabbix.com/forum/showthread.php?t=24367
http://www.zabbix.com/forum/showthread.php?t=24433
I had updated Zabbix before with no problems, but this time the server ran out of disk space because of InnoDB and I lost all the Master Node data.
The SQL backup was corrupted, so I lost everything in the Master Node database.
That by itself is not a big problem, because it didn't affect the slaves and the master isn't that important, but now nothing syncs to it.
Please help, I'm in a hurry: my employers gave me one week to fix it.
I can give any information you need, just ask.
Thank you.
Here are some logs from the slave (filtered):
1358:20111216:165534.090 Query [txnlev:0] [select masterid from nodes where nodeid=2]
1358:20111216:165534.090 Query [txnlev:0] [select masterid from nodes where nodeid=1]
1358:20111216:165534.090 Query [txnlev:0] [select ip,port from nodes where nodeid=1]
1358:20111216:165534.139 NODE 2: Sending [ZBX_GET_HISTORY_LAST_ID22
alertsalertid] to Node [1]
1358:20111216:165534.175 NODE 2: Receiving [200000000012160] from Node [1]
1358:20111216:165534.175 Query [txnlev:1] [begin;]
1358:20111216:165534.175 Query [txnlev:1] [commit;]
1358:20111216:165534.175 Query [txnlev:1] [begin;]
1358:20111216:165534.175 Query [txnlev:1] [select id,itemid,clock,value from history_sync where nodeid=2 order by id limit 10000]
1358:20111216:165534.176 NODE 2: Sending history_sync of node 2 to node 1 datalen 2396
1358:20111216:165534.176 Query [txnlev:1] [select ip,port from nodes where nodeid=1]
1358:20111216:165534.196 NODE 2: Sending [History22history_sync
1358:20111216:165534.298 NODE 2: Receiving [OK] from Node [1]
1358:20111216:165534.298 OK
1358:20111216:165534.298 Query [txnlev:1] [delete from history_sync where nodeid=2 and id<=44509441]
1358:20111216:165534.299 Query [txnlev:1] [commit;]
1358:20111216:165534.306 Query [txnlev:1] [begin;]
1358:20111216:165534.306 Query [txnlev:1] [select id,itemid,clock,value from history_uint_sync where nodeid=2 order by id limit 10000]
1358:20111216:165534.307 NODE 2: Sending history_uint_sync of node 2 to node 1 datalen 480
1358:20111216:165534.307 Query [txnlev:1] [select ip,port from nodes where nodeid=1]
1358:20111216:165534.324 NODE 2: Sending [History22history_uint_sync
1330:20111216:165534.335 Get value from agent result: '659947.522052'
1334:20111216:165534.337 Sending [vfs.fs.size[/,free]
]
1334:20111216:165534.338 Get value from agent result: '11515592704'
This time I do have some "OK"s in the log, but still nothing on the Master Node.
The attached image shows what I mean: even on a fresh install the data doesn't sync.
And the master only logs this:
32322:20111216:175930.853 NODE 1: Received events from node 2 for node 2 datalen 510018
32323:20111216:175932.235 NODE 1: Received history from node 2 for node 2 datalen 2037
32321:20111216:175932.339 NODE 1: Received history_uint from node 2 for node 2 datalen 657
32322:20111216:175940.974 NODE 1: Received events from node 2 for node 2 datalen 510018
32322:20111216:175942.376 NODE 1: Received history from node 2 for node 2 datalen 1890
32321:20111216:175942.463 NODE 1: Received history_uint from node 2 for node 2 datalen 213
32322:20111216:175951.091 NODE 1: Received events from node 2 for node 2 datalen 510018
32321:20111216:175952.649 NODE 1: Received history from node 2 for node 2 datalen 1699
32323:20111216:175952.736 NODE 1: Received history_uint from node 2 for node 2 datalen 449
32322:20111216:180001.297 NODE 1: Received events from node 2 for node 2 datalen 510018
32321:20111216:180002.680 NODE 1: Received history from node 2 for node 2 datalen 2152
32323:20111216:180002.788 NODE 1: Received history_uint from node 2 for node 2 datalen 700
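For anyone triaging the master log above, here is a minimal Python sketch that groups the "Received ... datalen" lines by data type and flags any type whose payload size never changes. In the excerpt, the events payload is 510018 bytes every time while history and history_uint vary; whether that means the same batch is being resent is only a guess, not something the log confirms. The function names and the regular expression are my own and are not part of Zabbix; the pattern assumes the exact line layout shown above.

```python
import re
from collections import defaultdict

# Assumed layout, copied from the master log excerpt:
# 32322:20111216:175930.853 NODE 1: Received events from node 2 for node 2 datalen 510018
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"NODE (?P<node>\d+): Received (?P<kind>\w+) "
    r"from node (?P<sender>\d+) for node (?P<target>\d+) "
    r"datalen (?P<datalen>\d+)\s*$"
)


def summarize_received(lines):
    """Group payload sizes by data type ("events", "history", ...)."""
    sizes = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines in any other format are skipped
            sizes[match.group("kind")].append(int(match.group("datalen")))
    return dict(sizes)


def constant_size_kinds(sizes, min_count=2):
    """Return the data types whose payloads all had exactly the same size."""
    return sorted(
        kind
        for kind, values in sizes.items()
        if len(values) >= min_count and len(set(values)) == 1
    )


if __name__ == "__main__":
    sample = [
        "32322:20111216:175930.853 NODE 1: Received events from node 2 for node 2 datalen 510018",
        "32323:20111216:175932.235 NODE 1: Received history from node 2 for node 2 datalen 2037",
        "32321:20111216:175932.339 NODE 1: Received history_uint from node 2 for node 2 datalen 657",
        "32322:20111216:175940.974 NODE 1: Received events from node 2 for node 2 datalen 510018",
        "32322:20111216:175942.376 NODE 1: Received history from node 2 for node 2 datalen 1890",
        "32321:20111216:175942.463 NODE 1: Received history_uint from node 2 for node 2 datalen 213",
    ]
    sizes = summarize_received(sample)
    print(sizes)
    print(constant_size_kinds(sizes))
```

To run it against a full log, pass the open file object to `summarize_received` instead of the sample list.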
In one of the reinstallation attempts we did get 7 hosts added, but we only got NULL in MySQL or "Agent droped connection..... ZBX_TCP_READ..." ...