  1. ZABBIX BUGS AND ISSUES
  2. ZBX-4462

Node sync problems when master reinstalled (server formatted)


    • Type: Incident report
    • Resolution: Incomplete
    • Priority: Blocker

      Everything was working until we tried to update the Master Node from 1.8.8 to 1.8.9.

      Here is the information I have collected so far:
      http://www.zabbix.com/forum/showthread.php?t=24367
      http://www.zabbix.com/forum/showthread.php?t=24433

      I had updated Zabbix before with no problems, but this time the server ran out of disk space because of InnoDB and I lost all Master Node data.
      The SQL backup was corrupted, so everything in the Master Node database is gone.

      The data loss itself is not a problem, because it did not affect the slaves and the master is not that important, but now nothing syncs.

      Please, I'm in a hurry, my employers gave me one week to fix it.

      I can give any information you need, just ask.

      Thank you.

      Here are some logs from the slave (filtered):

      1358:20111216:165534.090 Query [txnlev:0] [select masterid from nodes where nodeid=2]
      1358:20111216:165534.090 Query [txnlev:0] [select masterid from nodes where nodeid=1]
      1358:20111216:165534.090 Query [txnlev:0] [select ip,port from nodes where nodeid=1]
      1358:20111216:165534.139 NODE 2: Sending [ZBX_GET_HISTORY_LAST_ID­2­2
      alerts­alertid] to Node [1]
      1358:20111216:165534.175 NODE 2: Receiving [200000000012160] from Node [1]
      1358:20111216:165534.175 Query [txnlev:1] [begin;]
      1358:20111216:165534.175 Query [txnlev:1] [commit;]
      1358:20111216:165534.175 Query [txnlev:1] [begin;]
      1358:20111216:165534.175 Query [txnlev:1] [select id,itemid,clock,value from history_sync where nodeid=2 order by id limit 10000]
      1358:20111216:165534.176 NODE 2: Sending history_sync of node 2 to node 1 datalen 2396
      1358:20111216:165534.176 Query [txnlev:1] [select ip,port from nodes where nodeid=1]
      1358:20111216:165534.196 NODE 2: Sending [History­2­2­history_sync
      1358:20111216:165534.298 NODE 2: Receiving [OK] from Node [1]
      1358:20111216:165534.298 OK
      1358:20111216:165534.298 Query [txnlev:1] [delete from history_sync where nodeid=2 and id<=44509441]
      1358:20111216:165534.299 Query [txnlev:1] [commit;]
      1358:20111216:165534.306 Query [txnlev:1] [begin;]
      1358:20111216:165534.306 Query [txnlev:1] [select id,itemid,clock,value from history_uint_sync where nodeid=2 order by id limit 10000]
      1358:20111216:165534.307 NODE 2: Sending history_uint_sync of node 2 to node 1 datalen 480
      1358:20111216:165534.307 Query [txnlev:1] [select ip,port from nodes where nodeid=1]
      1358:20111216:165534.324 NODE 2: Sending [History­2­2­history_uint_sync
      1330:20111216:165534.335 Get value from agent result: '659947.522052'
      1334:20111216:165534.337 Sending [vfs.fs.size[/,free]
      ]
      1334:20111216:165534.338 Get value from agent result: '11515592704'
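For anyone reading a longer copy of the slave log, a small helper can list which sync batches were announced but never followed by a `Receiving [OK]` line. This is only a sketch: the regular expressions are inferred from the lines quoted above, the function name `unanswered_sends` is made up for illustration, and the excerpt is filtered and cut off, so a missing `OK` here may simply mean the reply was not pasted.

```python
import re

# Patterns inferred from the log lines quoted in this report (an assumption,
# not an official description of the Zabbix log format).
SEND_RE = re.compile(r"NODE (\d+): Sending (\w+) of node \d+ to node (\d+) datalen (\d+)")
RECV_RE = re.compile(r"NODE \d+: Receiving \[(.*)\] from Node \[(\d+)\]")

def unanswered_sends(lines):
    """Return (table, datalen) for each announced send with no later 'Receiving [OK]'."""
    pending = None
    unanswered = []
    for line in lines:
        send = SEND_RE.search(line)
        if send:
            # A new send started before the previous one was acknowledged.
            if pending is not None:
                unanswered.append(pending)
            pending = (send.group(2), int(send.group(4)))
            continue
        recv = RECV_RE.search(line)
        if recv and recv.group(1) == "OK" and pending is not None:
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered

sample = [
    "1358:20111216:165534.175 NODE 2: Receiving [200000000012160] from Node [1]",
    "1358:20111216:165534.176 NODE 2: Sending history_sync of node 2 to node 1 datalen 2396",
    "1358:20111216:165534.298 NODE 2: Receiving [OK] from Node [1]",
    "1358:20111216:165534.307 NODE 2: Sending history_uint_sync of node 2 to node 1 datalen 480",
]
result = unanswered_sends(sample)
print(result)  # [('history_uint_sync', 480)]
```

The helper does not filter by process ID, so on a busy log it is worth feeding it only the lines of the node-sync process (PID 1358 in the excerpt).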

      This time I do have some "OK"s in the log, but still nothing on the Master Node.

      The attached image shows what I mean: even on a fresh install the data doesn't sync.

      And the master only logs this:

      32322:20111216:175930.853 NODE 1: Received events from node 2 for node 2 datalen 510018
      32323:20111216:175932.235 NODE 1: Received history from node 2 for node 2 datalen 2037
      32321:20111216:175932.339 NODE 1: Received history_uint from node 2 for node 2 datalen 657
      32322:20111216:175940.974 NODE 1: Received events from node 2 for node 2 datalen 510018
      32322:20111216:175942.376 NODE 1: Received history from node 2 for node 2 datalen 1890
      32321:20111216:175942.463 NODE 1: Received history_uint from node 2 for node 2 datalen 213
      32322:20111216:175951.091 NODE 1: Received events from node 2 for node 2 datalen 510018
      32321:20111216:175952.649 NODE 1: Received history from node 2 for node 2 datalen 1699
      32323:20111216:175952.736 NODE 1: Received history_uint from node 2 for node 2 datalen 449
      32322:20111216:180001.297 NODE 1: Received events from node 2 for node 2 datalen 510018
      32321:20111216:180002.680 NODE 1: Received history from node 2 for node 2 datalen 2152
      32323:20111216:180002.788 NODE 1: Received history_uint from node 2 for node 2 datalen 700
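One detail in the master log above is that every `events` message has the same `datalen` (510018), while the `history` and `history_uint` sizes change on each cycle. The sketch below groups the sizes per data type so that this kind of repetition is easy to spot in a longer log. The regular expression is inferred from the quoted lines and the function names are hypothetical. Whether a constant size really means the same batch is being resent every cycle is an assumption that the log alone does not confirm.

```python
import re
from collections import defaultdict

# Pattern inferred from the master log lines quoted in this report.
RECEIVED_RE = re.compile(
    r"NODE (\d+): Received (\w+) from node (\d+) for node (\d+) datalen (\d+)"
)

def summarize_received(lines):
    """Collect datalen values per (data type, origin node), in log order."""
    summary = defaultdict(list)
    for line in lines:
        m = RECEIVED_RE.search(line)
        if m:
            summary[(m.group(2), int(m.group(4)))].append(int(m.group(5)))
    return dict(summary)

def constant_payloads(summary, min_repeats=3):
    """Return keys whose datalen is identical across at least min_repeats messages."""
    return sorted(
        key for key, sizes in summary.items()
        if len(sizes) >= min_repeats and len(set(sizes)) == 1
    )

sample = [
    "32322:20111216:175930.853 NODE 1: Received events from node 2 for node 2 datalen 510018",
    "32323:20111216:175932.235 NODE 1: Received history from node 2 for node 2 datalen 2037",
    "32321:20111216:175932.339 NODE 1: Received history_uint from node 2 for node 2 datalen 657",
    "32322:20111216:175940.974 NODE 1: Received events from node 2 for node 2 datalen 510018",
    "32322:20111216:175942.376 NODE 1: Received history from node 2 for node 2 datalen 1890",
    "32321:20111216:175942.463 NODE 1: Received history_uint from node 2 for node 2 datalen 213",
    "32322:20111216:175951.091 NODE 1: Received events from node 2 for node 2 datalen 510018",
    "32321:20111216:175952.649 NODE 1: Received history from node 2 for node 2 datalen 1699",
    "32323:20111216:175952.736 NODE 1: Received history_uint from node 2 for node 2 datalen 449",
    "32322:20111216:180001.297 NODE 1: Received events from node 2 for node 2 datalen 510018",
    "32321:20111216:180002.680 NODE 1: Received history from node 2 for node 2 datalen 2152",
    "32323:20111216:180002.788 NODE 1: Received history_uint from node 2 for node 2 datalen 700",
]
summary = summarize_received(sample)
print(constant_payloads(summary))  # [('events', 2)]
```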

      In one of the reinstallation attempts 7 hosts did get added, but we only got NULL in MySQL or "Agent droped connection..... ZBX_TCP_READ..." ...

            Assignee: Unassigned
            Reporter: adriano