[ZBX-7373] Very slow SQL query for getting last values from history and sometimes can be finished with error Created: 2013 Nov 15  Updated: 2018 Nov 22  Resolved: 2014 Jun 10

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 2.2.1rc1
Fix Version/s: 2.2.1rc1, 2.3.1

Type: Incident report Priority: Critical
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 36
Labels: latestdata, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File history2.1.patch     File history2.2.patch     File history2.patch     Text File test.txt     PNG File zabbix-2.0.4.PNG     PNG File zabbix-2.2.5.PNG     PNG File zabbix_2.2.4_latestdata.png    
Issue Links:
Duplicate
is duplicated by ZBXNEXT-2013 Problems with "Latest data" after 2.2... Closed
is duplicated by ZBX-7810 Slow latest data section due to an in... Closed
is duplicated by ZBX-8345 PHP fatal error on latest data page. Closed
is duplicated by ZBX-8016 Latest data is very slow when "Dropdo... Closed

 Description   

On Monitoring->Overview and Monitoring->Latest data, the function CHistoryManager->getLast() fetches the latest values from history using a UNION-style query. In some cases:
1. if the page contains a lot of items,
2. if some items received their latest values ages ago,
it can be very slow and can fail with a "memory is exhausted" error (increasing memory_limit to 512M or 1G does not help).
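For reference, the UNION-based query built by getLast() has roughly the following shape (a sketch only, reconstructed from the error fragments quoted in the comments; the table name and item IDs are illustrative):

-- Sketch of the query shape; the real statement is assembled by the frontend (DBunion()):
SELECT * FROM (SELECT h.* FROM history h WHERE h.itemid=10001 ORDER BY h.clock DESC LIMIT 1) AS sel1
UNION ALL
SELECT * FROM (SELECT h.* FROM history h WHERE h.itemid=10002 ORDER BY h.clock DESC LIMIT 1) AS sel2
-- ... one subselect per item shown on the page, so thousands of items produce thousands of subselects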



 Comments   
Comment by Alexey Pustovalov [ 2013 Nov 15 ]

I ran a few tests to investigate the performance problem:
1. Vanilla Latest data with about 3000 items:

******************** Script profiler ********************
Total time: 38.640018
Total SQL time: 26.696099
SQL count: 782 (selects: 392 | executes: 390)
Peak memory usage: 111.5M
Memory limit: 2048M

2. Separated queries by 10 items per query:

******************** Script profiler ********************
Total time: 14.392597
Total SQL time: 2.572941
SQL count: 1252 (selects: 863 | executes: 389)
Peak memory usage: 113.25M
Memory limit: 2048M

3. Added limit for 3 refresh interval to big query:

******************** Script profiler ********************
Total time: 38.431007
Total SQL time: 25.909821
SQL count: 782 (selects: 393 | executes: 389)
Peak memory usage: 111.5M
Memory limit: 2048M

The last test does not help much for MySQL, but for Oracle DB it helps considerably more.
The second test crashes Oracle DB because the maximum number of open cursors is reached.

Comment by Ткаченко Максим Александрович [ 2013 Nov 27 ]

New trace for changed sql in test.txt

Comment by Kenneth Palmertree [ 2013 Dec 02 ]

Applied the patch from ZBX-7373 and we noticed improved performance for Latest data: it went from about 90 seconds to about 1 second to bring up Latest data for a host with about 1300 items.

Comment by Kenneth Palmertree [ 2013 Dec 02 ]

Getting "Undefined variable: rs [dashboard.php:158 → make_webmon_overview() → CHttpTestManager->getLastData() → CHistoryManager->getLast() in /var/www/html/WEBSITE/api/classes/managers/CHistoryManager.php:50]" for the web widget if nothing is being monitored. Fixed the issue by adding

$rs = array();

class CHistoryManager {

	/**
	 * Returns the last $limit history objects for the given items.
	 *
	 * @param array $items  an array of items with the 'itemid' and 'value_type' properties
	 * @param int $limit
	 *
	 * @return array    an array with items IDs as keys and arrays of history objects as values
	 */
	public function getLast(array $items, $limit = 1) {
		$rs = array(); // initialise the result array so an empty $items list does not leave $rs undefined
		foreach ($items as $item) {
			$table = self::getTableName($item['value_type']);
			$query = DBselect(
				'SELECT *'.
				' FROM '.$table.' h'.
				' WHERE h.itemid='.zbx_dbstr($item['itemid']).
				' ORDER BY h.clock DESC',
				$limit
			);
			while ($history = DBfetch($query)) {
				$rs[$history['itemid']][] = $history;
			}
		}

		return $rs;
	}
Comment by Pavels Jelisejevs (Inactive) [ 2013 Dec 03 ]

Kenneth, thanks for the feedback and the patch. We've decided to include the fix in 2.2.1 and, possibly, improve it later.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Dec 03 ]

RESOLVED in svn://svn.zabbix.com/branches/dev/ZBX-7373.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Dec 03 ]

Fixed in 2.2.1rc1 r40684 and 2.3.0 r40685.

CLOSED.

Comment by Stephen Dayman [ 2013 Dec 03 ]

I have added Kenneth's update function to my app and the page is now loading in less than a second (used to take several minutes). Thank you!

Comment by Oleksii Zagorskyi [ 2013 Dec 05 ]

Reopen to specify correctly Fix Version/s:

Comment by Oleksii Zagorskyi [ 2013 Dec 05 ]

Closed again.

Comment by Pascal Schmiel [ 2013 Dec 11 ]

There is a chance to improve performance by reducing database overhead: instead of switching to individual per-item queries, simply split the UNIONs into chunks:

getLast.patch
--- /opt/app/zabbix/pschmiel/zabbix-2.2.0/frontends/php/api/classes/managers/CHistoryManager.php 2013-11-12 07:07:38.000000000 +0100
+++ /opt/app/zabbix/frontend/api/classes/managers/CHistoryManager.php   2013-12-05 09:40:26.101923782 +0100
@@ -31,7 +31,7 @@
         * @param int $limit
         *
         * @return array    an array with items IDs as keys and arrays of history objects as values
-        */
+        */
        public function getLast(array $items, $limit = 1) {
                $queries = array();
                foreach ($items as $item) {
@@ -47,9 +47,12 @@
 
                $rs = array();
                foreach ($queries as $tableQueries) {
-                       $query = DBunion($tableQueries);
-                       while ($history = DBfetch($query)) {
-                               $rs[$history['itemid']][] = $history;
+                       $tableQueriesChunk = array_chunk($tableQueries, 200); //value may be optimized
+                       foreach ($tableQueriesChunk as $tableQueriesSmall) {
+                               $query = DBunion($tableQueriesSmall);
+                               while ($history = DBfetch($query)) {
+                                       $rs[$history['itemid']][] = $history;
+                               }
                        }
                }
Comment by Dimitri Bellini [ 2013 Dec 12 ]

I have tested the latest 2.2 svn branch (05/12/2013) against the old Zabbix 2.0.9 with the same host configuration. Running the "Latest data" page shows a very large difference; details below:

HOST Configuration

HOSTNAME01 Applications (7) Items (4422) Triggers (2639) Graphs (198) Discovery (2)

Zabbix 2.0.9

******************** Script profiler ********************
Total time: 1.073098
Total SQL time: 0.563685
SQL count: 44 (selects: 21 | executes: 23)
Peak memory usage: 11.450989M
Memory limit: 512M

Zabbix 2.2 svn (05/12/2013)

******************** Script profiler ********************
Total time: 11.397692
Total SQL time: 4.470916
SQL count: 4556 (selects: 4517 | executes: 39)
Peak memory usage: 103M
Memory limit: 1G

On the new 2.2 you can see a very large difference in general, but mainly the "Total time" is very high compared to the "Total SQL time".
If you need other info please ask me.
Thanks

Comment by Pascal Schmiel [ 2013 Dec 12 ]

Dimitri, try my patch on the latest SVN (you have to change the getLast method by hand because the patch file is for Zabbix 2.2.0).
If it makes things better, I will prepare a patch file for the current 2.2.1.

(Maybe you want to try different values for the array_chunk())

Best
Pascal

Comment by Dimitri Bellini [ 2013 Dec 12 ]

Hi Pascal
thanks for your reply. I have tested your patch on Zabbix 2.2.0 because the trunk is very different.
Below is the same test:

******************** Script profiler ********************
Total time: 18.430855
Total SQL time: 11.496547
SQL count: 157 (selects: 120 | executes: 37)
Peak memory usage: 84.5M
Memory limit: 1G

It is much better than the original code; using the Zabbix trunk code seems better still, but it is not as fast as 2.0.9.
Many thanks

Comment by Pascal Schmiel [ 2013 Dec 12 ]

Try to raise the chunk size from 200 to 1000.
I'm not familiar with the 2.3.0 trunk, but maybe I'll have a look at it later.

Comment by Dimitri Bellini [ 2013 Dec 12 ]

Hi Pascal,
you are right! I changed the chunk size to 10 or 5 and got these values:

******************** Script profiler ********************
Total time: 7.750821
Total SQL time: 1.960531
SQL count: 1018 (selects: 981 | executes: 37)
Peak memory usage: 88M
Memory limit: 1G

Now it seems much better! But I don't understand why the SQL time is near 2 seconds while the total time is 8 seconds...
Do you know why?

Comment by Oleksii Zagorskyi [ 2013 Dec 12 ]

Dimitri, please use comment formatting to paste SQL output, i.e. {code} or {noformat} tags.

Then it will look like this:

Total time: 7.750821
Comment by Dimitri Bellini [ 2013 Dec 13 ]

Thanks Oleksiy,
next time I will do that.
But please consider this issue not solved: I still have slow queries on the "Latest Data" page.
As I understand it, I'm not the only one with this problem.
If you have suggestions or need details, please ask me.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Dec 16 ]

Reopening since this issue still requires work.

Comment by Stas Shaklein [ 2013 Dec 16 ]

Hello. I made a patch for this issue.
(already works for my installation: 13000 hosts, 150000 items, 90000 triggers, incoming data ~5Gb per day [Debian Wheezy + pgpool2 + PostgreSQL 9.3 + Zabbix 2.1.6] partitioning from https://www.zabbix.com/wiki/non-english/ru/partitioning_in_postgresql ; Current DB size: 380 Gb)

Patch: http://pastebin.com/kzbXxGy9

Variables

// partitions.history_uint_p2013_01_02
// |          |           | \ PARTITION_DATEFORMAT
// |          |           \ PARTITION_DATE_PREFIX
// |          \ normal table name
// \ PARTITION_SCHEMA
define('PARTITION_SCHEMA', 'partitions');
define('PARTITION_TABLES_DAILY', '"history" "history_log" "history_str" "history_text" "history_uint"');
define('PARTITION_DATE_PREFIX', '_p');
define('PARTITION_DATEFORMAT', 'Y_m_d');
define('LATEST_DATA_MULTIPLIER', 3);
  • PARTITION_SCHEMA [string] - name of the DB schema holding the partition tables
  • PARTITION_TABLES_DAILY [string] - which tables exactly the patch should handle
  • PARTITION_DATE_PREFIX [string] - characters between the table name and the date
  • PARTITION_DATEFORMAT [string] - date format used in your partition table names ( syntax - http://php.net/manual/en/function.date.php )
  • LATEST_DATA_MULTIPLIER [unsigned integer] - "clock" limit for the SQL query (this value is multiplied by the delay of each item; see the PHP sketch after this list).
    ... WHERE clock >= <NOW> - ( <ITEM.DELAY> * LATEST_DATA_MULTIPLIER )
        AND clock < <NOW>
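Putting the defines together, the partition table name and the clock window are built roughly like this (a sketch of the idea only, not the patch itself; the helper name is illustrative):

// Sketch only: mirrors the naming scheme described above, e.g.
// partitions.history_uint_p2013_01_02 for history_uint on 2013-01-02.
function partitionTableName($table, $clock) {
	return PARTITION_SCHEMA.'.'.$table.PARTITION_DATE_PREFIX.date(PARTITION_DATEFORMAT, $clock);
}

// Clock window used when selecting the latest value of one item:
$now = time();
$from = $now - ($item['delay'] * LATEST_DATA_MULTIPLIER);
$condition = ' WHERE h.itemid='.zbx_dbstr($item['itemid']).
	' AND h.clock>='.$from.' AND h.clock<'.$now;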
    

Tests with Latest data for 50000 items:

Env                                                Time to generate the "Latest data" page
Zabbix 2.1.6                                       163 s
Zabbix 2.2                                         118 s
Zabbix 2.[1|2] + patch without partition variables ~10-12 s
Zabbix 2.[1|2] + patch with partition variables    ~3-4 s
Comment by richlv [ 2014 Jan 08 ]

ZBX-7205 might have the same cause.
dotneft Not quite right. It can also happen when there are a lot of queries to the database, or when PHP selects a lot of data from the database.

Comment by Eduardo Ruiz [ 2014 Jan 09 ]

Stas Shaklein,

On version 2.2.1, CHistoryManager.php is almost the same, and I think there will be no problem.

However, latest.php is slightly different. What would be the best approach?

Thanks.

Comment by Stas Shaklein [ 2014 Jan 23 ]

Hello Eduardo,

For 2.2 I have tested "r41010".

 ~$ svn co svn://svn.zabbix.com/trunk zabbix-trunk_r41010 -r r41010 

Compared to the previous version, r41010 performs a separate SQL query for each item. But that does not really resolve the issue under high load.

Anyway, the best approach is different for each installation.
So this patch is not the universal answer. (But I hope the patch contains a helpful idea for the main Zabbix release.)

Comment by Eduardo Ruiz [ 2014 Jan 27 ]

Stas (and folks),

I've tested the solution on 2.2.1 in a testing environment. It seems to be working without any modifications to latest.php. Tables are being created:

Table Lines
history_p2014_01_23 1258860
history_p2014_01_24 3948204
history_p2014_01_25 3954510
history_p2014_01_26 3934371
history_p2014_01_27 1834014
history_str_p2014_01_23 0
history_str_p2014_01_24 235
history_str_p2014_01_25 235
history_str_p2014_01_26 235
history_str_p2014_01_27 51
history_text_p2014_01_23 2644
history_text_p2014_01_24 11538
history_text_p2014_01_25 11565
history_text_p2014_01_26 11818
history_text_p2014_01_27 6457
history_uint_p2014_01_23 632804
history_uint_p2014_01_24 1944413
history_uint_p2014_01_25 1887237
history_uint_p2014_01_26 2023270
history_uint_p2014_01_27 963284
trends_p2014_01 1094036
trends_uint_p2014_01 262199

And they can be queried normally with CHistoryManager.php and defines.inc.php.

Would you say that with these two modifications the Zabbix environment would be ready for partitioning?

Will Zabbix internal functions (like last(), the most used/common one) handle the data correctly?

Regards,

Eduardo

Comment by Dimitri Bellini [ 2014 Feb 06 ]

Hi guys,
I have tested the new Zabbix 2.2.2rc2 but still see very poor performance with a host that has a lot of items; the first run sometimes times out or has a very high load time.
The strange thing is that when I enable the debug option, the time is not as huge as the first time I open Latest data for that host.

******************** Script profiler ********************
Total time: 8.693752
Total SQL time: 2.208125
SQL count: 4536 (selects: 4512 | executes: 24)
Peak memory usage: 104.25M
Memory limit: 1G

After the first run (MySQL cache?) the performance is better:

******************** Script profiler ********************
Total time: 1.133612
Total SQL time: 0.349234
SQL count: 699 (selects: 670 | executes: 29)
Peak memory usage: 27.5M
Memory limit: 1G
Comment by Ben Lambrey [ 2014 Feb 17 ]

Still a problem using 2.2.2 final.

Comment by Synthead [ 2014 Feb 26 ]

I can also verify that this is still an issue with 2.2.2. I have a Zabbix server with 570 hosts.

In my case, in "Latest Data", if I select host "all", Apache and MySQL chews on a CPU for approximately 30 seconds, then returns to the previously-selected host (instead of "all"). Due to this, I am only able to view the latest data for specific hosts via the UI. Setting the row limit in the user's profile doesn't seem to yield any change of behavior. There are no error messages in Apache's error log, either.

Comment by Aleksandrs Saveljevs [ 2014 Feb 28 ]

One of "Latest data" bugs we might wish to fix under this issue is as follows. Suppose we have just configured a numeric item and it received a value of 5. Then the change column will show a change of "+5", implying that the previous value was 0. It should not.

Comment by Rob Roovers [ 2014 Mar 10 ]

Still having issues with Zabbix 2.2.2: the Latest data page is blank.

Comment by Oleg Ivanivskyi [ 2014 Mar 10 ]

Parameter "Dropdown first entry = none" may be used as a temporary solution (https://www.zabbix.com/documentation/2.2/manual/web_interface/frontend_sections/administration/general#gui).

Comment by Rob Roovers [ 2014 Mar 10 ]

Thanks for that. I do see the dropdowns now, although if I select a single entity it doesn't show any data and the page goes blank again.

Comment by Janis Jansons [ 2014 Mar 13 ]

My page doesn't go blank, but it's soooo slow... Latest data is almost unusable after upgrade from 2.0.x

Comment by Adam K [ 2014 Mar 13 ]

I noticed that the problem was getting worse after the upgrade because the housekeeper is by default disabled in 2.2.0. The size of the history tables was just growing endlessly, and the latest data page getting slower and slower.

After re-enabling the housekeeper in administration/general settings, manually cleaning up 3 months of built up history, and rebuilding the table, the latest data page was usable again. Still not quite as good as 2.0.x though.

Comment by Ben Lambrey [ 2014 Mar 13 ]

Janis, Oleg's temp. solution worked for us.

Comment by Shane McEwan [ 2014 Mar 20 ]

I just tried Oleg's "history2" patch and it has improved things tremendously! I have a slow database so even with only 200 items on a host I couldn't use the Latest Data page. After installing Oleg's patch the page loads in a couple of seconds.

Thanks for everyone's work on this issue. We're nearly there!

Comment by Janis Jansons [ 2014 Mar 21 ]

Oleg's "history2" patch seems to make a difference. The performance is still not perfect, but way better then before the patching.

FYI the patch adds extra ":" on api/classes/managers/CHistoryManager.php line 52 which causes the php to fail executing. Just remove it manually and everything should be working fine.

Comment by larry j prikockis [ 2014 Mar 27 ]

Any chance this fix will make it into the 2.2.3 release, or is it further down the road? Latest data has been painfully slow since the last upgrade. Guess I'll try the patch.

Comment by Pavels Jelisejevs (Inactive) [ 2014 Mar 28 ]

Larry, the 2.2.3 release candidate is already out and we'll need more time to figure out a better way to resolve this issue, so this problem will be fixed only in one of the future releases.

Comment by Alexey Pustovalov [ 2014 Mar 30 ]

Possible MySQL solution...

1.

CREATE TABLE item_values(itemid bigint(20), prevvalue varchar(512), prevclock int(11), lastvalue varchar(512), lastclock int(11));
ALTER TABLE item_values ADD PRIMARY KEY(itemid);

delimiter |
CREATE TRIGGER [history,history_uint,history_str,history_text,history_log]_trg BEFORE INSERT ON [history,history_uint,history_str,history_text,history_log]
  FOR EACH ROW
BEGIN
    INSERT INTO item_values (itemid,lastvalue,lastclock) VALUES (NEW.itemid,NEW.value,NEW.clock)
        ON DUPLICATE KEY UPDATE prevvalue = lastvalue, prevclock = lastclock, lastvalue = NEW.value, lastclock = NEW.clock;
END;
|

delimiter ;

[history,history_uint,history_str,history_text,history_log] - replace with the table name and create the trigger for each history table; a concrete example follows.
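For example (a sketch based on the template above), the trigger for the history table would be:

delimiter |
CREATE TRIGGER history_trg BEFORE INSERT ON history
  FOR EACH ROW
BEGIN
    INSERT INTO item_values (itemid,lastvalue,lastclock) VALUES (NEW.itemid,NEW.value,NEW.clock)
        ON DUPLICATE KEY UPDATE prevvalue = lastvalue, prevclock = lastclock, lastvalue = NEW.value, lastclock = NEW.clock;
END;
|

delimiter ;

Repeat with history_uint_trg / history_uint, history_str_trg / history_str, history_text_trg / history_text and history_log_trg / history_log.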

2. Apply the following patch to the frontend:

Index: api/classes/managers/CHistoryManager.php
===================================================================
--- api/classes/managers/CHistoryManager.php	(revision 43862)
+++ api/classes/managers/CHistoryManager.php	(working copy)
@@ -35,17 +35,37 @@
 	public function getLast(array $items, $limit = 1) {
 		$rs = array();
 		foreach ($items as $item) {
-			$table = self::getTableName($item['value_type']);
-			$query = DBselect(
-				'SELECT *'.
-				' FROM '.$table.' h'.
-				' WHERE h.itemid='.zbx_dbstr($item['itemid']).
-				' ORDER BY h.clock DESC',
-				$limit
-			);
-			while ($history = DBfetch($query)) {
-				$rs[$history['itemid']][] = $history;
+			if ($limit > 2) {
+				$table = self::getTableName($item['value_type']);
+				$query = DBselect(
+					'SELECT *'.
+					' FROM '.$table.' h'.
+					' WHERE h.itemid='.zbx_dbstr($item['itemid']).
+					' ORDER BY h.clock DESC',
+					$limit
+				);
+	    			while ($history = DBfetch($query)) {
+					$rs[$history['itemid']][] = $history;
+				}
+
 			}
+			else {
+				$query = DBselect(
+                                        'SELECT *'.
+                                        ' FROM item_values iv'.
+                                        ' WHERE iv.itemid='.zbx_dbstr($item['itemid'])
+                                );
+                                while ($history = DBfetch($query)) {
+                                        $rs[$history['itemid']][] = ['itemid' => $history['itemid'],
+								     'clock' => $history['lastclock'],
+								     'value' => $history['lastvalue']];
+
+					$rs[$history['itemid']][] = ['itemid' => $history['itemid'],
+                                                                     'clock' => $history['prevclock'],
+                                                                     'value' => $history['prevvalue']];
+
+                                }
+			}
 		}
 
 		return $rs;

Possible bugs:
1. Deadlocks - tested on a system with 2.5M items; no deadlocks observed.
2. Truncated data for long values, e.g. in the history_text and history_log tables.

Known bugs:
1. Removed items remain in the table. To remove deleted items:

DELETE FROM item_values WHERE itemid NOT IN (SELECT itemid FROM items);

Can anybody test that solution?

Comment by Marc [ 2014 Mar 30 ]

Doesn't work in connection with PostgreSQL:

pg_query(): Query failed: ERROR: syntax error at or near "WHERE" LINE 1: ...id='577847' ORDER BY h.clock DESC LIMIT 2) as sel WHERE h2.i... ^ [latest.php:213 → CHistoryManager->getLast() → DBselect() → pg_query() in /usr/share/zabbix/include/db.inc.php:411]

Just in case this was not intended to work for MySQL only.

Comment by Alexey Pustovalov [ 2014 Mar 30 ]

Marc, please try corrected patch.

Comment by Marc [ 2014 Mar 30 ]

The corrected patch works (syntactically).

Unfortunately it doesn't have a positive effect in terms of performance (at least in my environment).
I've repeatedly refreshed Latest data for a host with 3576 items having data (3596 in total):

  • without patch ~8 sec
  • patch applied ~11 sec

Edit:

Without patch:

Total time: 8.064284
Total SQL time: 5.008266
SQL count: 3660 (selects: 3632 | executes: 28)
Peak memory usage: 79.75M
Memory limit: 512M

Patch applied:

Total time: 11.609592
Total SQL time: 8.816184
SQL count: 3657 (selects: 3633 | executes: 24)
Peak memory usage: 80.25M
Memory limit: 512M
Comment by Dimitri Bellini [ 2014 Mar 31 ]

I have tried to test Alexey's patch. As a first step I created the MySQL table, but...

mysql> drop table item_values;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE item_values(itemid bigint(20), prevvalue varchar(512), prevclock int(11), lastvalue varchar(512), lastclock int(11));
Query OK, 0 rows affected (0.03 sec)

mysql> ALTER TABLE item_values ADD PRIMARY KEY(itemid);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> delimiter |
mysql> CREATE TRIGGER [history,history_uint,history_str,history_text,history_log]_trg BEFORE INSERT ON [history,history_uint,history_str,history_text,history_log]
    -> FOR EACH ROW
    -> BEGIN
    -> INSERT INTO item_values (itemid,lastvalue,lastclock) VALUES (NEW.itemid,NEW.value,NEW.clock)
    -> ON DUPLICATE KEY UPDATE prevvalue = lastvalue, prevclock = lastclock, lastvalue = NEW.value, lastclock = NEW.clock;
    -> END;
    -> |
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[history,history_uint,history_str,history_text,history_log]_trg BEFORE INSERT ON' at line 1

My Zabbix DB version is:

mysql> select * from dbversion;
+-----------+----------+
| mandatory | optional |
+-----------+----------+
|   2020000 |  2020001 |
+-----------+----------+
1 row in set (0.00 sec)

Suggestion?

Comment by Alexey Pustovalov [ 2014 Mar 31 ]

Dimitri, you need to create the trigger for each table. For example,

CREATE TRIGGER [history,history_uint,history_str,history_text,history_log]_trg BEFORE INSERT ON [history,history_uint,history_str,history_text,history_log]

must be replaced for each history related table, like

CREATE TRIGGER history_trg BEFORE INSERT ON history

or

CREATE TRIGGER history_uint_trg BEFORE INSERT ON history_uint
Comment by Dimitri Bellini [ 2014 Mar 31 ]

HI Alexey,
After some work... I have now tested your solution (MySQL DB patch and PHP file patch) and it seems better than every other test; details below:

******************** Script profiler ********************
Total time: 9.671476
Total SQL time: 1.992437
SQL count: 5718 (selects: 5690 | executes: 28)
Peak memory usage: 119.75M
Memory limit: 1G

What do you think? Is this the expected result?
Thanks so much

Comment by Alexey Pustovalov [ 2014 Apr 01 ]

Dimitri, do you see deadlocks in zabbix_server.log file?

Comment by epingnet [ 2014 Apr 28 ]

I upgraded Zabbix from 2.0.11 to 2.2.3; now opening Monitoring -> Latest data is very slow. Which version will fix this bug?
Thanks

Comment by Justin McNutt [ 2014 May 03 ]

We have observed this same problem on Zabbix 2.2.x, as compared to Zabbix 2.0.x. We rebuilt our Zabbix server from scratch, so it definitely has nothing to do with the upgrade process. Our database server is MySQL 5.5.23 (RHEL 6, 64-bit) and our Zabbix servers and proxies are Zabbix 2.2.1.

The problem definitely only occurs on the Latest Data page when one or more Hosts has a large number of Items.

Does this need more testing or is there a patch that will be published in an upcoming version?

Thanks!

Comment by Bruce Reed [ 2014 May 06 ]

I have a problem under 2.2.3 with the Latest data page query timing out and leaving me on the last page. Tried Alexey's patch above and we get an empty page when hitting Latest data. Had to revert.

Comment by Gertjan Awater [ 2014 May 06 ]

We have reverted to 2.0.x because of this.
I can set up remote access to our test server, which is still at 2.2.3, if that's any help.

Comment by Gerd Oberlechner [ 2014 May 12 ]

From a database aesthetics point of view it was a good idea to remove the lastvalue and lastclock columns from the items table, since they were nothing more than a duplication of data already saved in the history* tables. But from a performance point of view the change was problematic, especially in large environments, where the N+1 query problem rockets off to new dimensions. I tried to optimize the query on my own but did not find any solution with an acceptable complexity class (http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql).
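For context, the classic "latest row per group" formulation referenced in that article looks roughly like this (a sketch with illustrative item IDs, not a recommendation - as noted above, none of these variants had acceptable complexity in those tests):

-- Correlated-subquery variant of "latest row per group"; on Zabbix-sized
-- history tables it still means one MAX(clock) lookup per item.
SELECT h.itemid, h.clock, h.value
  FROM history h
 WHERE h.itemid IN (10001, 10002, 10003)
   AND h.clock = (SELECT MAX(h2.clock) FROM history h2 WHERE h2.itemid = h.itemid);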

Pavels Jelisejevs, you mentioned you will solve this in future releases. Can you tell us if this will be a 2.2.x release or 2.4?
Can you give us any insight about possible solutions?

I'm thinking about Alexey Pustovalov trigger based solution for the time being (without any knowhow about MySQLs trigger performance implications for the whole database).

Comment by Slash [ 2014 May 13 ]

We are currently using postgresql 9.3, is the trigger solution for mysql usable with postgres?

By the way, this issue is very problematic, is there any chance that a fix could be included for the next minor 2.2 release?

Comment by Paweł Sasin [ 2014 May 29 ]

A colleague of mine has some thoughts on this topic, documented here:
http://remigium.blogspot.com/2014/05/zabbix-223-and-zbx-7373.html

Comment by Pavels Jelisejevs (Inactive) [ 2014 Jun 05 ]

We're currently considering the idea of limiting the history values displayed on certain pages, like "Latest data" and "Overview", to 24 hours.

It would be a lot of help if someone could test these two patches: history2.1.patch and history2.2.patch. This is a simplified version of the fix: it applies a global 24 hour limit to history data everywhere. Our lab tests show good results, but we would also like to try them out on real environments. The patch can be applied to any version newer than 2.2.3. Please post any testing results here.
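Conceptually, the patches bound every history lookup by a cut-off timestamp. A minimal sketch of the idea, assuming the ZBX_HISTORY_PERIOD constant mentioned in the documentation notes below (the surrounding code is the per-item query from getLast() shown earlier; this is not the patch itself):

// Sketch only: limit the per-item lookup to the last ZBX_HISTORY_PERIOD seconds (24 hours by default).
$timeFrom = time() - ZBX_HISTORY_PERIOD;
$query = DBselect(
	'SELECT *'.
	' FROM '.$table.' h'.
	' WHERE h.itemid='.zbx_dbstr($item['itemid']).
		' AND h.clock>'.$timeFrom.
	' ORDER BY h.clock DESC',
	$limit
);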

Comment by Aleksandrs Saveljevs [ 2014 Jun 05 ]

Will 24 hours be configurable?

Comment by Marc [ 2014 Jun 05 ]

jelisejev, I just made some quick'n'dirty tests on our live system.
With warm buffer caches, i.e. when Latest data was already accessed previously, there is no difference (which is what I expected).
For the initial access there appears to be a huge difference in a positive way.

Cache  PHP         Total time   SQL time     SQL count
warm   unpatched   8.009978     5.056313     3659
warm   patch2.1    8.397428     5.319804     4400
warm   patch2.2    7.52639      4.408044     3659
cold   unpatched   252.204879   248.803407   3678
cold   patch2.1    9.018636     5.882898     4437
cold   patch2.2    8.246099     5.260707     3676

Tests with 'warm' caches are done on the same host
Tests with 'cold' caches are done on separate hosts, of course.

All hosts have approximately 3500 items and 15 applications.
I'll keep one of the patches applied and an eye on it.

Addendum:
GNU/Linux
Zabbix 2.2.2
PostgreSQL 9.3.4
Partition range: weekly
Partition count: 11
Partition volume (float/uint): ~450/~250 million
Partition size (float/uint): ~25/~45 GB

Comment by Pavels Jelisejevs (Inactive) [ 2014 Jun 05 ]

Aleksandrs, yes: in 2.4 it will be configurable from the frontend, and in 2.2 it will be configurable from defines.inc.php.

Comment by Dimitri Bellini [ 2014 Jun 06 ]

Hi Pavels,
I have made some tests on my Zabbix test server (Zabbix 2.2.3); the results are below, but there don't seem to be many differences.

Cache  PHP         Total time   SQL time     SQL count
warm   unpatched   2.0929       0.248218     2134
warm   patch2.2    2.2705       0.696319     2078
Comment by Krists Krigers (Inactive) [ 2014 Jun 06 ]

As discussed with sasha, merged jelisejev's patch to 2.2.4rc1 in r46266 and to 2.3.2 (trunk) in r46268.

A new development branch svn://svn.zabbix.com/branches/dev/ZBX-7373-patch was made for this patch.

Comment by Pavels Jelisejevs (Inactive) [ 2014 Jun 10 ]

(1) This needs to be documented: on the "Latest data" and "Overview" pages and in "Data overview" screen items we only display values for the last 24 hours.

<richlv> also whatsnew and https://www.zabbix.com/documentation/2.2/manual/web_interface/definitions

martins-v Updated documentation:

RESOLVED.

<richlv> in all the places where we say "It is also possible to change this limitation by changing the value of ZBX_HISTORY_PERIOD constant in include/defines.inc.php.", let's link to the definitions page

martins-v Done so. RESOLVED.

<richlv> thanks, that looks good - but i got confused about "item overview pages". let's explicitly mention overview page & screen element. do we need links to corresponding frontend section pages from here ?

martins-v Right, that was just copy-paste from code comments. Made more explicit now, perhaps can do without links here. RESOLVED.

<richlv> it says "for large pages of latest data" in these two pages, which seems wrong :

REOPENED

martins-v Removed the 'latest data' mention there. Hopefully that resolves it.

sasha CLOSED

Comment by Tatapoum [ 2014 Jun 11 ]

In my setup it doesn't help much. I have about 400 hosts and 100000 items at the moment. This is a distributed setup with an external MySQL DB server with 12G of memory.
Opening the Latest data page is VERY slow if the last selection was "all" (up to 3-4 minutes). The PHP memory gets exhausted if it is under 2G. Here are my PHP memory parameters:

    php_value max_execution_time 300
    php_value memory_limit 2048M
    php_value post_max_size 128M
    php_value upload_max_filesize 2M
    php_value max_input_time 300

If I enable debug, the queries use too much PHP memory and the page doesn't appear. There are no MySQL slow queries.

For example, in my setup, does it make sense to display 100000 items on one page? Wouldn't it be better to implement some kind of paging (1000 items per page is enough) using the LIMIT clause? Sorry if my question is stupid!

Comment by Tatapoum [ 2014 Jun 11 ]

OK, as I understand it, the CHistoryManager class isn't really an issue any more, as the getLastForItem() function can now be limited to a one-day scope, which should be appropriate for most setups.
Correct me if I'm wrong, but as I said above, the biggest issue is that the Latest data page tries to fetch all the items, which is pointless if you have a high number of items and no selection filter. Moreover, it's not only the calls to getLastForItem() that are long: the first call to the item->get() API method is VERY slow if you get 100000+ items. Pagination seems to be the only possible solution. As latest.php uses an API call to get the item list, we need to add pagination to item->get().

As a test, I hacked CItem.php to add support for simple pagination of the item->get() query, plus support for sorting by 'hostid', which is more relevant here than 'name' or 'itemid':

@@ -28,7 +28,7 @@

        protected $tableName = 'items';
        protected $tableAlias = 'i';
-       protected $sortColumns = array('itemid', 'name', 'key_', 'delay', 'history', 'trends', 'type', 'status');
+       protected $sortColumns = array('itemid', 'name', 'key_', 'delay', 'history', 'trends', 'type', 'status', 'hostid');

        public function __construct() {
                parent::__construct();
@@ -55,6 +55,7 @@
         * @param bool   $options['count']
         * @param string $options['pattern']
         * @param int    $options['limit']
+         * @param int    $options['offset']
         * @param string $options['order']
         *
         * @return array|int item data as array or false if error
@@ -116,6 +117,7 @@
                        'sortfield'                                     => '',
                        'sortorder'                                     => '',
                        'limit'                                         => null,
+                        'offset'                                        => 0,
                        'limitSelects'                          => null
                );
                $options = zbx_array_merge($defOptions, $options);
@@ -370,10 +372,11 @@
                }

                $itemids = array();
+                $offset = $options['offset'];
                $sqlParts = $this->applyQueryOutputOptions($this->tableName(), $this->tableAlias(), $options, $sqlParts);
                $sqlParts = $this->applyQuerySortOptions($this->tableName(), $this->tableAlias(), $options, $sqlParts);
                $sqlParts = $this->applyQueryNodeOptions($this->tableName(), $this->tableAlias(), $options, $sqlParts);
-               $res = DBselect($this->createSelectQueryFromParts($sqlParts), $sqlParts['limit']);
+               $res = DBselect($this->createSelectQueryFromParts($sqlParts), $sqlParts['limit'], $offset);
                while ($item = DBfetch($res)) {
                        if (!is_null($options['countOutput'])) {
                                if (!is_null($options['groupCount'])) {

As DBselect() already implements offsets, it is pretty trivial.

Then I modified latest.php with static pagination:

@@ -193,7 +193,10 @@
                'filter' => array(
                        'status' => array(ITEM_STATUS_ACTIVE)
                ),
-               'preservekeys' => true
+               'preservekeys' => true,
+                'limit' => 1000,
+                'offset' => 2000,
+                'sortfield' => 'hostid'
        ));
 }

Of course, the Latest data page now loads in a few seconds, even if the selection fields are set to "all". It's not perfect due to the sorting options, but it's better than nothing.

******************** Script profiler ********************
Total time: 6.296404
Total SQL time: 5.470514
SQL count: 1079 (selects: 1042 | executes: 37)
Peak memory usage: 39.75M
Memory limit: 2048M

I know this would introduce an API change, but wouldn't it be possible to add support for paging in get() methods (maybe first in just a few methods like item->get())?

Comment by Marc [ 2014 Jul 08 ]

tatapoum, just for information:
You might be interested in ZBXNEXT-370 as well, if you are not already aware of it.
Btw, in my case (200k items, 1.5k hosts), since 2.2.4 I can even omit the patch from ZBXNEXT-370 and just enforce 'None' resp. 'Not selected' for the combo boxes in Latest data resp. Overview.

Comment by Tatapoum [ 2014 Jul 09 ]

Thank you Marc, I wasn't aware of this patch. I slightly modified it to make it work with 2.2.4 and it helps. People now have to choose All/All explicitly, as the Latest Data page takes minutes to display with the All/All selections (300 hosts, almost 100k items checked quite often).
However it doesn't really solve the issue, it's just a hack... The API really needs support for pagination: you can't download the full list of items/events/etc. when you display a page. It works for small environments, but are small environments the target for a complex product like Zabbix?
AFAIK it shouldn't be very difficult to implement, even though the result wouldn't be perfect. I'd like to help, but I'm not sure if the Zabbix developers are willing to include patches to the API.

Comment by Thomas Daugherty [ 2014 Jul 25 ]

I see this issue has been marked resolved, stemming from the change to limit the Latest data page to 24 hours. That may solve the issue for smaller setups, but for larger setups we still can't use the Latest data page to display data for large groups of hosts. With the filter set to all/all it takes nearly 10 minutes to load, with the httpd processes reaching a memory high-water mark of nearly 2.5GB... The Latest data page worked fine prior to 2.2.

Are there plans to revisit this issue?

Comment by Slash [ 2014 Jul 25 ]

I have the same issue as Thomas: for me the Latest data page is still unusable. I even changed the limit from 24 hours to 1 hour, but it still doesn't work for some groups with many hosts.

Could you please reopen this ticket?

Comment by Rik Bruggink [ 2014 Jul 29 ]

Latest data page is still unusable with all settings in 2.2.5. Page load time was 27 sec in 2.2.1 and 28 sec in 2.2.5, vs. 5 sec in 2.0.4.
Could you please reopen this issue.

Comment by Rik Bruggink [ 2014 Jul 29 ]

loadtime on 2.2.5

Comment by Slash [ 2014 Jul 29 ]

Zabbix 2.2.4, trying to display latest data for a group with many hosts.

Comment by David Parker [ 2014 Aug 07 ]

Quick patch for 2.2.4 UI to disable All in "Latest Data" - this is just a hack. With this you can retain the ability to select all in e.g. Events without the liability of having users be able to select All/All in latest data.

--- latest.php.stock 2014-08-07 10:02:53.708501638 -0500
+++ latest.php 2014-08-07 10:02:13.756489939 -0500
@@ -139,7 +139,8 @@
 		'with_monitored_items' => true
 	),
 	'hostid' => getRequest('hostid', null),
-	'groupid' => getRequest('groupid', null)
+	'groupid' => getRequest('groupid', null),
+	'config' => array('deny_all' => 1)
 ));
 
 $_REQUEST['groupid'] = $pageFilter->groupid;

and for overview:

--- overview.php.stock 2014-08-07 10:03:11.996506959 -0500
+++ overview.php 2014-08-07 09:54:40.324349339 -0500
@@ -76,6 +76,7 @@
 	'applications' => array('templated' => false),
 	'hostid' => get_request('hostid', null),
 	'groupid' => get_request('groupid', null),
+	'config' => array('deny_all' => 1),
 	'application' => get_request('application', null)
 ));

Comment by Marc [ 2014 Aug 23 ]

The previous patch snippet is actually not complete.
See zbxNext370-light.patch found in ZBXNEXT-370

Comment by iain wright [ 2014 Oct 01 ]

This is impacting us as well: Latest data is significantly slower on 2.2.3. Will 2.4.0 help?

Comment by richlv [ 2016 Feb 23 ]

looks like this issue was handled in a very confusing way. there was something changed for 2.2.1 - but then, as per the comments, limiting to 24 hours was introduced in "2.2.4rc1 in r46266 and to 2.3.2 (trunk) in r46268"

i'd suggest adding those two versions to the "fix version" field.

Comment by dimir [ 2018 Nov 22 ]

Alternative solution proposed here.
