[ZBX-14535] configuration.import consumes massive memory and takes too long Created: 2018 Jun 27 Updated: 2018 Dec 21 Resolved: 2018 Dec 21 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | API (A) |
Affects Version/s: | 3.4.10 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Critical |
Reporter: | Larry Dorman | Assignee: | Edgars Melveris |
Resolution: | Duplicate | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
centos 7, php memory_limit 8G, php and apache timeouts set to 10 minutes. |
Attachments: |
Issue Links: |
|
Description |
I'm using the API methods configuration.export and configuration.import so that I can keep all of my template changes under change control and automate pushing them to multiple Zabbix instances to keep them in sync. On my alpha and beta instances this works great, as there are very few hosts defined. I'm now implementing my production instance, which has ~700 Linux hosts defined and will probably have twice that in a few days. With a PHP memory limit of 8G the import fails because it runs out of memory. When I bump the memory limit to 16G and rerun, the process fails at about the same place because it runs out of time. I could still bump the memory and timeouts a little higher, but the reality is that I'm about to double the number of hosts using this template, and I will be right back to this same problem. This cannot be the intended design/behavior of this API call. Doing the import through the front-end GUI works fine, but it also doesn't use the API and instead does direct DB access. This design flaw / bug pretty much makes it impossible to automate template imports. Please let me know if you need additional information. |
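For readers unfamiliar with the call being discussed, here is a minimal bash sketch of how a configuration.import request might be assembled and posted. It is not the reporter's script. The function names (json_escape, build_import_request, post_request), the chosen import rules, and the example URL are illustrative assumptions; the 600-second curl timeout simply mirrors the 10-minute timeouts mentioned in the Environment field.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the rule set below are assumptions,
# not taken from the reporter's attachment.

# Escape a string so it can be embedded in a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# build_import_request AUTH_TOKEN XML_SOURCE
# Prints a JSON-RPC 2.0 body for configuration.import.
build_import_request() {
  local auth=$1 source
  source=$(json_escape "$2")
  printf '{"jsonrpc":"2.0","method":"configuration.import","params":{'
  printf '"format":"xml",'
  printf '"rules":{"templates":{"createMissing":true,"updateExisting":true},'
  printf '"items":{"createMissing":true,"updateExisting":true}},'
  printf '"source":"%s"},"auth":"%s","id":1}\n' "$source" "$auth"
}

# post_request URL BODY
# Sends the body to the API endpoint; 600 s matches a 10-minute server timeout.
post_request() {
  local url=$1 body=$2
  printf '%s' "$body" |
    curl -sS --max-time 600 -H 'Content-Type: application/json-rpc' \
      --data-binary @- "$url"
}

# Hypothetical usage (URL and token are placeholders):
#   body=$(build_import_request "$token" "$(cat Template_OS_Linux.xml)")
#   post_request "http://zabbix.example.com/api_jsonrpc.php" "$body"
```

A real import would usually list more rule sections (triggers, graphs, discovery rules and so on); only two are shown here to keep the request readable.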
Comments |
Comment by Aigars Kadikis [ 2018 Jun 28 ] |
Hi, Larry! As a tiny workaround, you can use a config backup via MySQL:

    #!/bin/bash
    time=$(date +%Y%m%d%H%M)
    echo creating backup of config..
    sudo mysqldump -uroot -ppassword \
    --flush-logs \
    --single-transaction \
    --create-options \
    --ignore-table=zabbix.acknowledges \
    --ignore-table=zabbix.alerts \
    --ignore-table=zabbix.auditlog \
    --ignore-table=zabbix.auditlog_details \
    --ignore-table=zabbix.escalations \
    --ignore-table=zabbix.events \
    --ignore-table=zabbix.history \
    --ignore-table=zabbix.history_log \
    --ignore-table=zabbix.history_str \
    --ignore-table=zabbix.history_str_sync \
    --ignore-table=zabbix.history_sync \
    --ignore-table=zabbix.history_text \
    --ignore-table=zabbix.history_uint \
    --ignore-table=zabbix.history_uint_sync \
    --ignore-table=zabbix.profiles \
    --ignore-table=zabbix.service_alarms \
    --ignore-table=zabbix.sessions \
    --ignore-table=zabbix.trends \
    --ignore-table=zabbix.trends_uint \
    --ignore-table=zabbix.user_history \
    --ignore-table=zabbix.node_cksum zabbix | bzip2 -9 > /root/$time.configuration.only.bz2

Not sure if it suits your need exactly. One thing comes to mind regarding the API solution. Please set LogLevel debug in /etc/httpd/conf.d/zabbix.conf:

    <VirtualHost *:80>
    DocumentRoot /usr/share/zabbix
    LogLevel debug
    ...

Restart httpd, then execute your API calls. Please archive and attach /var/log/httpd/error_log. Best regards, |
Comment by Larry Dorman [ 2018 Jul 02 ] |
I'm not sure that what I was trying to do, or the actual problem, was fully understood. I'll see if I can put together a simple process that will let you reproduce the problem and see it first-hand. I'll also get you the requested log... Please be patient; I found this problem in the middle of migrating one of my 2.4.x production Zabbix instances to a new 3.4.x instance, so I'm quite busy and have to put first priority on the migration. Thanks. |
Comment by Larry Dorman [ 2018 Jul 26 ] |
Finally got a chance to put together some code to make this easy to reproduce and visualize... I created a clean system with only two hosts on it initially. (The default Zabbix server host, plus a Zabbix Agent installed on the server and set up as a second host so that I could monitor memory on the server.) The attached code creates a specified number of hosts (800 hosts as supplied) that use the stock Linux OS template. You will want to edit doit.sh and update the first variables to be appropriate for your setup. I wrote this code as a simple bash script so that it's easy to reproduce. My production environment uses Ruby to make the same calls, so it's clearly not a client-side issue. With my PHP memory_limit set to 8G (yes, gigabytes) and the timeout set to 10 minutes, almost all memory is exhausted and the timeout is reached. I have over 1500 hosts with the Linux template on one of my instances, and giving it more time and memory simply isn't an option. It's generally considered a code defect if a memory_limit of even 500 MB is reached. zabbix_config_import_problem.zip Host 1 created... Host 800 created...
|
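The attached doit.sh is not reproduced in this ticket. As a rough illustration of the setup step described above (many hosts linked to one template), the following bash sketch generates host.create requests. The function names, the `testhost-` prefix, the 127.0.0.1 agent interface, and the group and template IDs passed in are all assumptions for illustration, not values from the attachment.

```bash
#!/usr/bin/env bash
# Sketch only: names, IDs and the interface definition are assumptions.

# build_host_create_request AUTH HOSTNAME GROUPID TEMPLATEID
# Prints a JSON-RPC 2.0 body for host.create with one agent interface.
build_host_create_request() {
  local auth=$1 host=$2 groupid=$3 templateid=$4
  printf '{"jsonrpc":"2.0","method":"host.create","params":{'
  printf '"host":"%s",' "$host"
  printf '"interfaces":[{"type":1,"main":1,"useip":1,'
  printf '"ip":"127.0.0.1","dns":"","port":"10050"}],'
  printf '"groups":[{"groupid":"%s"}],' "$groupid"
  printf '"templates":[{"templateid":"%s"}]},' "$templateid"
  printf '"auth":"%s","id":1}\n' "$auth"
}

# generate_host_requests AUTH COUNT GROUPID TEMPLATEID
# Prints one request per line for hosts testhost-1 .. testhost-COUNT.
generate_host_requests() {
  local auth=$1 count=$2 groupid=$3 templateid=$4 i
  for ((i = 1; i <= count; i++)); do
    build_host_create_request "$auth" "testhost-$i" "$groupid" "$templateid"
  done
}
```

Each generated line would then be posted to api_jsonrpc.php, and the response checked, before the template import is attempted against the populated instance.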
Comment by Larry Dorman [ 2018 Jul 26 ] |
Side note... setting LogLevel debug does not result in any activity in /var/log/httpd/error_log. |
Comment by Edgars Melveris [ 2018 Sep 27 ] |
Hello Larry! I was not able to reproduce the problem with your code. I took a clean appliance with version 3.4.12. The only modification I had to make to run your script was to increase CacheSize. The result was:

    Host 799 created...
    Host 800 created...
    =============================================================================
    Starting template import...
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 51085    0     0  100 51085      0  41236  0:00:01  0:00:01 --:--:-- 41264
    Template import completed...
    =============================================================================
    Elapsed time: 2 seconds

I also did not see any change in available memory: |
Comment by Aigars Kadikis [ 2018 Oct 08 ] |
Closing this as Cannot Reproduce. |
Comment by Larry Dorman [ 2018 Nov 28 ] |
Edgars - did you actually check that the hosts got created? There is exactly zero chance that the template updated for 800 hosts in 2 seconds. I'd be happy to work with you to help you reproduce this problem. Others on the Zabbix IRC channel have also reproduced this. |
Comment by Larry Dorman [ 2018 Nov 28 ] |
I will help you reproduce this problem... I checked back frequently after submitting and there was no activity. Now I check back again and find that you've closed it. This is a serious bug that deserves a real effort. The example code I submitted doesn't do any error checking, so my guess is it didn't actually create the hosts on your system and therefore wasn't a valid test. Send me an email - zabbix <at> bestfx.net and I'll work with you on this... |
Comment by Edgars Melveris [ 2018 Dec 03 ] |
Hello Larry, yes, the hosts were created. Creating the hosts actually took a bit longer, but that output was for the template import. As it turns out, I must have already created (or imported) the template, which is why it worked so fast. So I deleted the template and created a new, empty one. The import still worked OK. I had previously set the PHP limit to 512MB and this was enough.

    Host 800 created...
    =============================================================================
    Starting template import...
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 51123    0    38  100 51085      0    591  0:01:26  0:01:26 --:--:--     0
    {"jsonrpc":"2.0","result":true,"id":1}
    Template import completed...
    =============================================================================
    Elapsed time: 86 seconds

    real 1m49.050s
    user 0m1.964s
    sys 0m1.060s

And only a very small increase in memory usage. Another problem is that my lab instance wasn't configured to handle that many hosts and items, so pollers started to become busy, but that is a different problem. Another note: we should keep the discussion here because, if this is confirmed as a bug, the devs need to see all the info. P.S. Is the template that is included in the archive the one that actually gives you problems in production? |
Comment by Larry Dorman [ 2018 Dec 06 ] |
I'm currently on an extended vacation, so I'll be slow to respond. When I can find a little time I'll update the code to halt on any errors and provide more diagnostic info. There were a couple minor changes one of the testers had to make to get it to properly create all 800 hosts and attempt the import. The template should already exist and already be linked to all 800 hosts when the configuration.import is executed. Definitely make sure you have your php memory limit and your timeout bumped up or it may fail long before you get to the point of consuming mass memory. I never did hear why, but one person who tested this for me had to remove the beginning and end quote on the Template_OS_Linux.xml file for it to import it. You should be able to test that the .xml works for you by doing a manual import to one host. |
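As a sketch of what "halt on any errors" could look like in a bash reproduction script, the function below inspects a JSON-RPC response and fails unless it carries a result. The name check_response is hypothetical, and the substring matching is deliberately crude: it assumes the compact JSON formatting seen in the output quoted earlier in this ticket and would misfire if a result payload itself contained the text `"error":`.

```bash
#!/usr/bin/env bash
# Sketch only: check_response is a hypothetical helper, and the string
# matching assumes compact JSON such as {"jsonrpc":"2.0","result":true,"id":1}.

# check_response RESPONSE_JSON
# Returns 0 when the response contains a result and no error member.
check_response() {
  local response=$1
  if [[ -z $response ]]; then
    echo "empty response (timeout or PHP fatal error?)" >&2
    return 1
  fi
  if [[ $response == *'"error":'* ]]; then
    echo "API error: $response" >&2
    return 1
  fi
  if [[ $response != *'"result":'* ]]; then
    echo "unexpected response: $response" >&2
    return 1
  fi
}

# Hypothetical usage inside a loop that creates hosts:
#   response=$(curl -sS ... )
#   check_response "$response" || exit 1
```

Where jq is available, parsing the response and testing for a top-level error member would be more robust than substring matching.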
Comment by Edgars Melveris [ 2018 Dec 07 ] |
OK, we will wait for your input to decide whether this is a bug in Zabbix. P.S. If you're still using version 3.4, I would suggest upgrading to 4.0.x. Version 3.4 is no longer supported, so if this is a bug, it won't be fixed in that version. |
Comment by Jan Verhaert [ 2018 Dec 07 ] |
Looks somehow related to
|
Comment by dimir [ 2018 Dec 21 ] |
Will be fixed in ZBX-7700 |