[ZBXNEXT-1006] in PostgreSQL use SERIAL instead of IDs Created: 2011 Oct 20 Updated: 2021 Oct 15 Resolved: 2021 Oct 15
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Server (S) |
Affects Version/s: | None |
Fix Version/s: | None |
Type: | Change Request | Priority: | Minor |
Reporter: | Stefan | Assignee: | Unassigned |
Resolution: | Declined | Votes: | 9 |
Labels: | database, performance, postgresql |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Description |
Hello, why don't you use SERIAL instead of ID tables, i.e. let the database itself calculate the next ID? http://neilconway.org/docs/sequences
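A minimal sketch of what the request means in PostgreSQL terms (the table and column names here are illustrative, not Zabbix's actual schema): a SERIAL column lets the database hand out the next ID itself, instead of the application consulting a separate bookkeeping table.

```sql
-- SERIAL is shorthand for an integer column whose DEFAULT draws from a sequence:
CREATE TABLE hosts_example (
    hostid SERIAL PRIMARY KEY,  -- expands to: integer DEFAULT nextval('hosts_example_hostid_seq')
    host   varchar(64) NOT NULL
);

-- The database assigns the next ID on its own; no bookkeeping table is consulted:
INSERT INTO hosts_example (host) VALUES ('server01');
```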
Comments |
Comment by Aleksandrs Saveljevs [ 2011 Oct 21 ] |
Zabbix used autoincremental fields prior to the introduction of distributed monitoring. With distributed monitoring and the structure of its IDs, that is no longer possible (see question #8 at http://www.zabbix.com/documentation/2.0/manual/appendix/faq).
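For context, the linked FAQ entry describes composite record IDs of the form NNNSSSDDDDDDDDDDD (destination node, source node, per-node counter), as spelled out later in this thread. A rough arithmetic illustration, assuming the digit layout maps to power-of-ten multipliers (the multipliers are an assumption, not taken from the Zabbix source):

```sql
-- Composing a distributed ID from its three parts (illustrative only):
-- N = destination node (3 digits), S = source node (3 digits), D = local counter (11 digits)
SELECT (2::bigint * 100000000000000)  -- N: node 2
     + (2::bigint * 100000000000)     -- S: source node 2
     + 12345                          -- D: local counter value
       AS composed_id;                -- => 200200000012345, still fits in a bigint
```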
Comment by Stefan [ 2011 Oct 21 ] |
Yes, but you could use SERIAL for the DDDDDDDDDDDDD part. And why don't you use several fields for each number?!
Comment by Stefan [ 2011 Oct 24 ] |
Reopening for an answer.
Comment by Aleksandrs Saveljevs [ 2011 Oct 24 ] |
Discussing the existing structure first: suppose we have a database record with ID "N2N2DDDDDDDDDDD" on node N2. How can we insert a record with that ID into a table that has an autoincremental ID field in the database on node N1?

Now, your proposal. Suppose we use several fields for every ID: a triple (destination node number, source node number, ID). Suppose also that there is a record (N1, N1, 12345) on node N1. When transferring a record with ID (N2, N2, 12345) from node N2 to node N1, how would it be possible to use the autoincrement property of the ID column, and wouldn't the "12345" from both nodes conflict with each other?
Comment by Markus Wanner [ 2011 Oct 24 ] |
Hello Aleksandrs, I'm a Postgres developer and have already discussed this a bit with Stefan on the German Postgres channel.

I don't quite understand your first question. At least in Postgres, it's easily possible to write custom data into an AUTOINC column (there, it's just a DEFAULT pointing to a SERIAL). I'd expect MySQL to work the same way.

One common way to solve the unique-identifier problem for multiple databases is a pair (origin node id, SERIAL), where each node increments its own serial. The UNIQUE constraint is then on that pair. Assuming your node ids don't collide, that results in unique identifiers. Of course, it doesn't guarantee monotonically increasing numbers, nor gap-less increments. I don't know your specific needs, though.
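A minimal sketch of the pair Markus describes, assuming PostgreSQL and illustrative table and column names:

```sql
-- Per-node serial plus an origin node id; uniqueness is enforced on the pair:
CREATE TABLE items_example (
    origin_node integer      NOT NULL,  -- which node created the row
    itemid      bigserial    NOT NULL,  -- each node increments its own sequence
    name        varchar(255) NOT NULL,
    PRIMARY KEY (origin_node, itemid)
);

-- Locally created rows let the sequence assign itemid:
INSERT INTO items_example (origin_node, name) VALUES (1, 'cpu load');

-- Rows replicated from node 2 arrive with an explicit itemid; since SERIAL is
-- only a DEFAULT, writing a custom value is allowed, and it cannot collide
-- with node 1's rows thanks to the composite key:
INSERT INTO items_example (origin_node, itemid, name) VALUES (2, 12345, 'cpu load');
```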
Comment by Aleksandrs Saveljevs [ 2011 Oct 25 ] |
There are two ID structures being discussed in this issue. The first is the existing one (described at http://www.zabbix.com/documentation/2.0/manual/appendix/faq); the second is splitting the ID into several columns.

For the first structure, we need to generate IDs of the form NNNSSSDDDDDDDDDDD, which I am not sure can be done automatically with autoincremental fields. Another problem is that, as an optimization, we use multirow inserts and then reference the inserted IDs. We know how to get the inserted ID for a single insert statement, but how do we get the inserted IDs when multiple rows were inserted? That is why we use the "ids" table to manage the generated IDs ourselves.

The second structure would mean splitting a single-column ID into three columns, which would also affect foreign keys, which is not very convenient. And the problem with multirow inserts from the first structure remains.

In any case, the structure of the database is not going to change for 1.8 and is pretty much fixed for 2.0 as well. We might return to autoincremental fields when we redesign distributed monitoring, but that is probably not going to happen anytime soon.
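For illustration, a hedged sketch of what such an "ids" bookkeeping table might look like and how a batch of IDs can be reserved in a single statement; the table and column names are assumptions, not copied from the actual Zabbix schema:

```sql
CREATE TABLE ids_example (
    table_name varchar(64) NOT NULL,
    field_name varchar(64) NOT NULL,
    nextid     bigint      NOT NULL,  -- assumed to hold the last ID handed out
    PRIMARY KEY (table_name, field_name)
);

-- Reserve 5 consecutive IDs for a multirow insert and learn the new ceiling;
-- the application then knows the reserved range is nextid-4 .. nextid:
UPDATE ids_example
   SET nextid = nextid + 5
 WHERE table_name = 'items' AND field_name = 'itemid'
RETURNING nextid;
```

Managed this way, the application knows every row's ID before issuing the multirow insert, on every supported database; the cost is that the bookkeeping row becomes a point of contention, as noted further down in this thread.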
Comment by richlv [ 2011 Oct 25 ] |
Please also note that the solution would have to work on the 5 different databases currently supported by Zabbix: MySQL, PostgreSQL, Oracle, SQLite and DB2.
Comment by Markus Wanner [ 2011 Oct 25 ] |
I agree that only the DDDDDDDDDDD part can usually be filled by a SERIAL or AUTOINC column. However, I see no reason for stuffing all of that information into a single column. I personally consider the separation good practice, whereas the combined variant seems inconvenient to me.

I'm not sure what the issue with multirow inserts is. Postgres offers INSERT .. RETURNING, which can perfectly well return multiple rows' IDs (possibly consisting of multiple columns). At least Oracle, DB2 and SQL Server offer that or a similar feature as well. SQLite is quite a different kind of database anyway, which I think should be coded for differently (it runs in the same process as the client, doesn't support concurrent writers, etc.: "embedded").

I understand that you don't want to change the database structure; that usually is a huge task. And I understand the wish to support multiple databases. However, I'd argue that the current scheme is sub-optimal for enterprise-class databases like Postgres or Oracle.
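A short example of the multirow INSERT .. RETURNING Markus refers to, reusing the illustrative items_example schema from the sketch above:

```sql
-- PostgreSQL returns the generated key of every inserted row in one round trip:
INSERT INTO items_example (origin_node, name)
VALUES (1, 'cpu load'),
       (1, 'free memory'),
       (1, 'disk io')
RETURNING origin_node, itemid;  -- three (origin_node, itemid) pairs come back
```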
Comment by Stefan [ 2011 Nov 03 ] |
@richlv: OK, but why must I compile Zabbix for a specific database (--with-pgsql) when I don't use any database-specific features?!
Comment by richlv [ 2011 Nov 03 ] |
To use the appropriate DB library?
Comment by Stefan [ 2012 Nov 09 ] |
PostgreSQL has SERIAL. "I understand that you don't want to change the database structure. This usually is a huge task. And I understand the wish to support multiple databases. However, I'd argue that the current scheme is sub-optimal for enterprise class databases like Postgres or Oracle." Maybe talk more in the PostgreSQL/MySQL IRC channels about better performance solutions.
Comment by Randall Svancara [ 2013 Mar 01 ] |
The way that Zabbix performs multirow or grouped inserts/updates in one large transaction is a performance killer in terms of row locking. This approach simply does not scale.

First of all, I think there needs to be some supervisor process that can intelligently broker grouped updates so that no two transactions contain an update for the same record. This would reduce deadlocks. Deadlocks are bad because the database must detect them, and detection usually involves a long delay.

Additionally, the "ids" table, responsible for storing the next ID, is another point of contention in the database. I am seeing error messages in the logs: "Something Impossible Just Happened". Again, that is a deadlock situation, where a process attempts to grab the "next id" for some table and fails.

Why not move to a random string with a very large key space? Zabbix is rarely CPU bound, so generating large keys should not be a problem, and disk space is cheap, so storing them should not be a problem either. This opens the way for other possibilities as well, like a sharded database, which would provide a greater ability to scale, especially in the cloud, where running six server instances with 2 GB of RAM each is cheaper than running one instance with 96 GB of RAM.
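One concrete form of the "random string with a very large key space" suggestion is a UUID primary key. A hedged PostgreSQL sketch (gen_random_uuid() is provided by the pgcrypto extension on PostgreSQL releases of that era; the schema is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Random 128-bit keys: no central "ids" row to fight over, and every writer
-- generates keys independently, which also eases sharding:
CREATE TABLE events_example (
    eventid uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    source  integer NOT NULL,
    clock   integer NOT NULL
);

INSERT INTO events_example (source, clock)
VALUES (0, 1362096000)
RETURNING eventid;  -- the key is known immediately, no sequence involved
```

The trade-off is index locality: random keys scatter writes across the index, whereas sequential IDs append to one end of it.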
Comment by Marc [ 2015 Sep 04 ] |
Since Edit: |
Comment by Alexei Vladishev [ 2021 Oct 15 ] |
There is no need to keep this ticket open any longer, so I am closing it. We won't use the serial data type.