-
Incident report
-
Resolution: Won't fix
-
Major
-
None
-
2.0.9
-
Package: zabbix-server-pgsql 1:2.0.9+dfsg-1~bpo70+2 as packaged for Debian (the package does not modify any of the server code).
Database: PostgreSQL 9.1.9.
Item volume: 1350 ICMP pingers, 2700 SNMP items, plus some Zabbix agent items, web scenarios and trappers. Most items are updated every 60 seconds.
Proxy usage: 340 hosts are monitored through a Zabbix proxy.
Last week our Zabbix monitoring silently stopped processing events, and it stayed that way until we finally noticed. We had these errors in the logs:
38737:20131227:233018.004 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: duplicate key value violates unique constraint "events_pkey"
DETAIL: Key (eventid)=(200379) already exists.
[insert into events (eventid,source,object,objectid,clock,ns,value,value_changed) values (200379,0,0,14849,1388183416,743459150,0,1)]
We restored the service by deleting the 'events' row from the 'ids' table. However, the problem may happen again and have a significant impact on our business, so I investigated the issue.
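The duplicate key error is consistent with the stored counter for 'events' having fallen behind the highest eventid already present in the table. Below is a minimal Python sketch of how such drift could be checked; it is not Zabbix code. It assumes that the 'ids' table has table_name, field_name and nextid columns and that nextid holds the last allocated id. The helper name stale_counter and the in-memory SQLite database are hypothetical stand-ins, used only so the example runs on its own.

```python
import sqlite3


def stale_counter(conn, table="events", field="eventid"):
    """Return (stored_counter, max_id) if the counter lags behind the table, else None.

    Assumption: ids.nextid stores the last id handed out, so the next
    allocation would be nextid + 1.
    """
    row = conn.execute(
        "SELECT nextid FROM ids WHERE table_name = ? AND field_name = ?",
        (table, field),
    ).fetchone()
    if row is None:
        return None  # no counter row at all, nothing to compare
    # Identifiers cannot be bound as parameters; table and field are
    # fixed, trusted values in this sketch.
    max_id = conn.execute(
        f"SELECT COALESCE(MAX({field}), 0) FROM {table}"
    ).fetchone()[0]
    return (row[0], max_id) if row[0] < max_id else None


# Stand-in database reproducing the state suggested by the log above:
# event 200379 already exists, but the counter still says 200378.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (eventid INTEGER PRIMARY KEY);
    CREATE TABLE ids (table_name TEXT, field_name TEXT, nextid INTEGER);
    INSERT INTO events VALUES (200378), (200379);
    INSERT INTO ids VALUES ('events', 'eventid', 200378);
""")
before = stale_counter(conn)

# The workaround from this report: drop the counter row for 'events'.
conn.execute("DELETE FROM ids WHERE table_name = 'events'")
after = stale_counter(conn)
print(before, after)
```

In this stand-in state the next allocated id would be 200379, which already exists, matching the failed insert in the log. Once the row is deleted there is no stale counter left to compare against.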
It smells like concurrent event creation outside of a transaction. I have found the following calls to process_event() outside of a DBbegin()/DBcommit() block:
- zabbix_server/timer/timer.c – generate_events()
- zabbix_server/poller/poller.c
This list may not be exhaustive, as I am not familiar with the Zabbix source code.
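To make the suspected failure mode concrete, here is a minimal Python sketch of two workers allocating an id without any transaction or lock around the read and the insert. It only illustrates the hypothesis; it does not reproduce how Zabbix actually allocates event ids. The table layout, the function names read_next_id and insert_event, and the SQLite stand-in are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (eventid INTEGER PRIMARY KEY);
    CREATE TABLE ids (table_name TEXT PRIMARY KEY, nextid INTEGER);
    INSERT INTO ids VALUES ('events', 200378);
""")


def read_next_id(conn):
    # Step 1 of a non-atomic allocation: read the stored counter.
    (last,) = conn.execute(
        "SELECT nextid FROM ids WHERE table_name = 'events'"
    ).fetchone()
    return last + 1


def insert_event(conn, eventid):
    # Step 2: use the id, then write the counter back.
    conn.execute("INSERT INTO events (eventid) VALUES (?)", (eventid,))
    conn.execute(
        "UPDATE ids SET nextid = ? WHERE table_name = 'events'", (eventid,)
    )


# Simulated interleaving: both workers read before either one writes,
# so both end up holding the same candidate id.
id_a = read_next_id(conn)
id_b = read_next_id(conn)

insert_event(conn, id_a)  # worker A succeeds
try:
    insert_event(conn, id_b)  # worker B reuses the same id
    collided = False
except sqlite3.IntegrityError as exc:
    collided = True
    print("worker B failed:", exc)
```

If the read and the insert were performed as one atomic step (for example inside a transaction that locks the counter row), worker B would see the updated counter and the collision would not occur.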
Could you confirm that it's a likely cause of the bug and include a fix in the next 2.0 release?
Thanks!
See also: ZBX-5881.