-
Incident report
-
Resolution: Duplicate
-
Critical
-
None
-
1.6.2
-
None
-
Version installed on both nodes is 1.6.2
System is RHEL 5.1 x86_64
Backend is Postgresql (8.1.11)
Every other aspect of zabbix is working fine with our installation except when I activated this second node. I followed the instruction and prepared the 2 nodes with ID 3 and 4.
ID 4 is the master at the moment, From the logs (in debugging) most of the replication and inter-comms are working (like get last ID....), but when the node 3 tries to send its auditlog to node 4, node4 fails to insert the elements in the DB.
Logs on node 3:
24636:20090303:053805 NODE 3: Receiving [0] from Node [4] 24636:20090303:053805 Query [select acknowledgeid,userid,eventid,clock,message from acknowledges where acknowledgeid>0 and acknowledgeid between 300000000000000 and 399999999999999 order by acknowledgeid limit 10000] 24636:20090303:053805 Query [commit;] 24636:20090303:053805 In process_history_table_data() 24636:20090303:053805 Query [begin;] 24636:20090303:053805 In get_history_lastid() 24636:20090303:053805 In connect_to_node(nodeid:4) 24636:20090303:053805 Query [select ip,port from nodes where nodeid=4] 24636:20090303:053805 NODE 3: Sending [ZBX_GET_HISTORY_LAST_ID<AD>3<AD>3 auditlog<AD>auditid] to Node [4] 24636:20090303:053805 NODE 3: Receiving [0] from Node [4] 24636:20090303:053805 Query [select auditid,userid,clock,action,resourcetype,details from auditlog where auditid>0 and auditid between 300000000000000 and 399999999999999 order by auditid limit 10000] 24636:20090303:053805 NODE 3: Sending auditlog of node 3 to node 4 datalen 488 24636:20090303:053805 In connect_to_node(nodeid:4) 24636:20090303:053805 Query [select ip,port from nodes where nodeid=4] 24636:20090303:053805 NODE 3: Sending [History<AD>3<AD>3<AD>auditlog 300300000000064<AD>300300000000003<AD>1236066911<AD>0<AD>4<AD>486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d20537461747573205b305d 300300000000065<AD>300300000000002<AD>1236068025<AD>3<AD>0<AD>436f7272656374206c6f67696e205b6a666973636865725d 300300000000066<AD>300300000000003<AD>1236068112<AD>0<AD>4<AD>486f7374205b74656d6167616d695d204950205b3139322e3136382e3136392e32355d20537461747573205b305d 300300000000067<AD>300300000000002<AD>1236073326<AD>3<AD>0<AD>436f7272656374206c6f67696e205b6a666973636865725d] to Node [4] 24636:20090303:053806 NODE 3: Receiving [] from Node [4] 24636:20090303:053806 NOT OK
The events on node4:
2610:20090303:113835 Get value from agent result: '1356491' 2610:20090303:113835 End get_value() 2610:20090303:113835 Query [begin;] 2610:20090303:113835 In process_new_value(system.uptime) 2610:20090303:113835 In add_history(key:system.uptime,value_type:3,type:1) 2610:20090303:113835 In add_history(itemid:400400000023115,UINT64:1356491) 2610:20090303:113835 In add_history_uint() 2610:20090303:113835 Query [insert into history_uint (clock,itemid,value) values (1236076715,400400000023115,1356491)] 2610:20090303:113835 In add_trend_uint() 2610:20090303:113835 Query [select num,value_min,value_avg,value_max from trends_uint where itemid=400400000023115 and clock=1236074400] 2610:20090303:113835 Query [update trends_uint set num=3,value_min=1354691,value_avg=1355590,value_max=1356491 where itemid=400400000023115 and clock=1236074400] 2610:20090303:113835 End of add_history():SUCCEED 2610:20090303:113835 In update_item() 2610:20090303:113835 In calculate_item_nextcheck (400400000023115,900,"",1236076715) 2610:20090303:113835 End calculate_item_nextcheck (result:1236077615) 2610:20090303:113835 Query [update items set nextcheck=1236077615,prevvalue=lastvalue,prevorgvalue=NULL,lastvalue='1356491',lastclock=1236076715 where itemid=400400000023115] 2610:20090303:113835 End of update_item() 2610:20090303:113835 In update_functions(400400000023115) 2610:20090303:113835 Query [select distinct function,parameter,itemid,lastvalue from functions where itemid=400400000023115] 2619:20090303:113835 Trapper got [History<AD>3<AD>3<AD>auditlog 300300000000064<AD>300300000000003<AD>1236066911<AD>0<AD>4<AD>486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d20537461747573205b305d 300300000000065<AD>300300000000002<AD>1236068025<AD>3<AD>0<AD>436f7272656374206c6f67696e205b6a666973636865725d 300300000000066<AD>300300000000003<AD>1236068112<AD>0<AD>4<AD>486f7374205b74656d6167616d695d204950205b3139322e3136382e3136392e32355d20537461747573205b305d 300300000000067<AD>300300000000002<AD>1236073326<AD>3<AD>0<AD>436f7272656374206c6f67696e205b6a666973636865725d] len 488 2619:20090303:113835 In node_history() 2619:20090303:113835 Query [begin;] 2619:20090303:113835 NODE 4: Received auditlog from node 3 for node 3 datalen 488 2610:20090303:113835 ItemId:400400000023115 Evaluating last(0) 2619:20090303:113835 In process_record () 2610:20090303:113835 In evaluate_function('abaddon:system.uptime.last(0)') 2619:20090303:113835 Query [insert into auditlog (auditid,userid,clock,action,resourcetype,details) values (300300000000064,300300000000003,1236066911,0,4,0x486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d205374 61747573205b305d)] 2610:20090303:113835 In evaluate_LAST() 2610:20090303:113835 End of evaluate_LAST() 2610:20090303:113835 End of evaluate_function('abaddon:system.uptime.last(0)',value:'1356491'):SUCCEED 2610:20090303:113835 Query [update functions set lastvalue='1356491' where itemid=400400000023115 and function='last' and parameter='0'] NOTICE: identifier "x486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d20537461747573205b305d" will be truncated to "x486f7374205b626162626167655d204950205b3139322e3136382e3137322e" 2619:20090303:113835 [Z3005] Query failed: [0] PGRES_FATAL_ERROR:ERROR: syntax error at or near "x486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d20537461747573205b305d" at character 129 [insert into auditlog (auditid,userid,clock,action,resourcetype,details) values (300300000000064,300300000000003,1236066911,0,4,0x486f7374205b626162626167655d204950205b3139322e3136382e3137322e395d20537461747573205b305d)] 2619:20090303:113835 Query [rollback;] 2610:20090303:113835 End update_functions() {nofomrat} I had a long look at the source code and I believe that the receiver (node 4) forgot to run the zbx_hex2binary() function on the received buffer (value) which is a TEXT type. I tried to apply a small patch to {{zabbix-1.6.2/src/zabbix_server/trapper/nodehistory.c}} in the {{process_record()}} function, but it failed badly. There is a similar process_record function in src/zabbix_server/trapper/nodesync.c which seems to call that function when postgres is in use:
#if defined(HAVE_POSTGRESQL)
len = zbx_hex2binary(buffer);
zbx_pg_escape_bytea((u_char *)buffer, len, &tmp, &tmp_allocated);
zbx_snprintf_alloc(&fields_update, &fields_update_allocated, &fields_update_offset, strlen(tmp)+256, "%s='%s',",
fieldname,
tmp);
zbx_snprintf_alloc(&values, &values_allocated, &values_offset, strlen(tmp)+256, "'%s',",
tmp);
#else
zbx_snprintf_alloc(&fields_update, &fields_update_allocated, &fields_update_offset, strlen(buffer)+256, "%s=0x%s,",
fieldname,
buffer);
zbx_snprintf_alloc(&values, &values_allocated, &values_offset, strlen(buffer)+256, "0x%s,",
buffer);
#endif
so I'm thinking the same method could be used as well.
Could you help ?