[ZBX-4881] Slow select queries on large databases with partitions (at least on postgresql) Created: 2012 Apr 16 Updated: 2017 May 30 Resolved: 2012 Apr 19 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 1.8.11 |
Fix Version/s: | 1.8.13rc1, 2.0.0rc4 |
Type: | Incident report | Priority: | Major |
Reporter: | Sébastien Riccio | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | performance, sql | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Linux 2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011 x86_64 GNU/Linux |
Attachments: | ZBX-4881.patch | ||||
Issue Links: |
|
Description |
Hi, I dunno where to write about this so I'll try here. I noticed that often our zabbix server loads being stuck on queries like this one: select value from history where itemid=23719 and clock<=1334581521 order by itemid,clock desc limit 1 It seems, if i'm not wrong, that this query is used by the item.last trigger (to get the last value of an item) Could also be used by item.delta (with limit 2 instead of 1), etc... The thing is that our history table is partitionned with constraints so we can disable housekeeper which is a resource eater on large dbs and like this we can keep a nice amount of history without too much perfs drop. But sometimes such queries takes more than 5 minutes to complete, because clock<=xxxxxxxxx makes postgresql browse all the table partitions and render the partitionning quite useless. if you replace that kind of query with (specifying a time range) it will be lightning fast at it will only read the patition table concerned by this time range. Wouldn't be an idea to modify those queries so they use a time range to drastically improve the perfs for the people using partitons or Help please Here you'll find EXPLAINs for both queries : select value from history where itemid=23719 and clock<=1334581521 and clock >=1334580521 (takes ages to complete) QUERY PLAN select value from history where itemid=23719 and clock<=1334581521 and clock >=1334580521 order by itemid,clock desc limit 1 (ultra fast) QUERY PLAN Kind regards, |
Comments |
Comment by Sébastien Riccio [ 2012 Apr 17 ] |
Again if I'm not wrong the function that issue that query is: I've looked a bit into the source code to try to understand when such a query (with only clock_from arg passed, and no clock_to) root@eye:~/install/zabbix-1.8.11/src/libs/zbxserver# grep DBget_history -R * Here there is nothing "suspicious", all are using a clock_to parameter. But this one: zabbix_server/poller/checks_aggregate.c: h_value = DBget_history(itemid, value_type, item_func, clock_from, 0, NULL, 0); Could be the one generating this query as it seems to pass clock_from but 0 for clock_to. What is this function used for ? Thanks |
Comment by Sébastien Riccio [ 2012 Apr 18 ] |
Sorry, Looking at it again it seems I've inverted clock_from and clock_to. So that means a lot of DBget_history calls can be the cause. Still trying to find why some are so slow and other aren't. |
Comment by Alexander Vladishev [ 2012 Apr 18 ] |
Hi, I confirm a given problem. It will be fixed before release 2.0.0. Thank you! |
Comment by Sébastien Riccio [ 2012 Apr 18 ] |
Hi Alexander, Thanks a lot for the patch, I've applied it to the 1.8.11 source and recompiled. Thanks again, |
Comment by Alexander Vladishev [ 2012 Apr 19 ] |
Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-4881 |
Comment by Sébastien Riccio [ 2012 Apr 19 ] |
Hi Alexander, Your patch is working like a charm, the average load of the box is now very low and we can't see any more queries slowing down all the processing. But, it seems seems it brought a bug with the notifications. The values for the item in the mail notifications are now UNKNOWN. This started to happen after I applied your patch. Here is an example: Alert: Host: celsius-B1 More informations: Sensor 1 is now UNKNOWN Recovery: Host: celsius-B1 More informations: Sensor 1 is now UNKNOWN (edit: here is the macros I'm using in the mail body) SwissCenter Monitoring has detected the following problem: Host: {HOSTNAME}What: {TRIGGER.NAME}: {STATUS}When: {EVENT.DATE}at {EVENT.TIME}( {EVENT.AGE} ago) More informations: {ITEM.NAME1}is now {ITEM.VALUE1}Maybe a side effect of changing the queries that are used to get this value ? Thanks a lot for your work, |
Comment by Sébastien Riccio [ 2012 Apr 19 ] |
I changed my actions messages and replaced ITEM.VALUE1 with ITEM.LASTVALUE1 and it fixes the problem. But still it was working with ITEM.VALUE1 before applying the patch. Could it be a side effect of changes made to : in file /src/libs/zbxserver/expression.c the function int substitute_simple_macros(DB_EVENT *event, DB_ITEM *item, DC_HOST *dc_host, make uses of static int DBget_item_value(DB_TRIGGER *trigger, char **value, int N_functionid, int clock) { [...] h_value = DBget_history(itemid, value_type, ZBX_DB_GET_HIST_VALUE, 0, clock, delay, NULL, 1); [...] }It seems to return no value (maybe clock range too short?) Thanks, |
Comment by Alexey Pustovalov [ 2012 Apr 21 ] |
Hi Sébastien, Please try development branch svn://svn.zabbix.com/branches/dev/ZBX-4881 Regards, Alexey. |
Comment by Oleksii Zagorskyi [ 2012 Apr 21 ] |
Forum threads where this fix can be published as a message: |
Comment by Sébastien Riccio [ 2012 Apr 21 ] |
Hi Alexey, I tried to build the branch but seems there is something wrong, or maybe I am doing something wrong:
Cheers, |
Comment by richlv [ 2012 Apr 21 ] |
see https://zabbix.org/wiki/Compilation_instructions - you must run "make dbschema" before make |
Comment by Sébastien Riccio [ 2012 Apr 21 ] |
Awww sorry I skipped this step. Thanks a lot richlv! |
Comment by Sébastien Riccio [ 2012 Apr 21 ] |
Compiled and installed, I can confirm it now indeed works with ITEM.VALUE1 macro in actions. Cheers, |
Comment by dimir [ 2012 Apr 27 ] |
Successfully tested. |
Comment by Alexander Vladishev [ 2012 Apr 28 ] |
Available in versions pre-1.8.13rc1 r27169 and pre-2.0.0rc4 r27178. |