[ZBX-12919] Preprocessing Manager - extreme memory usage Created: 2017 Oct 22 Updated: 2017 Nov 06 Resolved: 2017 Nov 06
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.4.2, 3.4.3 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Critical |
Reporter: | Andreas Biesenbach | Assignee: | Unassigned |
Resolution: | Workaround proposed | Votes: | 0 |
Labels: | memory, preprocessing, zabbix_server |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Environment: |
Operating system: Red Hat Enterprise Linux Server release 7.4 (Maipo). Latest software and firmware updates applied. |
Attachments: | Dataloss.png Status_of_zabbix.png Zabbix_environment.PNG atop_preprocessingcpu.PNG atop_preprocessingmem.PNG oom_report.png solved_internal_processes.PNG solved_preprocessing_queue.PNG zabbix_server.conf.png |
Description |
Hi all, since we updated our Zabbix server to version 3.4 we are having a lot of trouble with the memory usage of the preprocessing manager. The memory usage of this process sporadically rises to 90% of total server memory and results in OOM process kills on OS level; in most cases the server itself or the database is killed, with data loss as a result. Strange fact: this mostly happens on Saturdays (I already checked cron-/anacrontab but couldn't find anything).
It seems I am not the only one with this issue: https://www.zabbix.com/forum/showthread.php?p=203975#post203975 Tomorrow I will raise the debug level for the preprocessing process as mentioned there. Thanks in advance! |
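A quick way to confirm that the OOM killer is behind the process deaths, sketched for RHEL 7 (the grep patterns are assumptions, since the exact kernel log wording varies between versions):

# Check the kernel ring buffer for OOM killer activity
dmesg -T | grep -iE 'out of memory|killed process'
# Same via the systemd journal, if kernel messages are kept there
journalctl -k | grep -i oom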
Comments |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 22 ] |
You may need to increase the number of preprocessing workers. Maybe you have a lot of items with "heavy" preprocessing options scheduled on Saturdays, and the default StartPreprocessors=3 can't cope with that. But maybe the root cause is a slow DB; I can't say for sure, because your graph of process busyness is from an old template which does not feature the preprocessing manager and workers. Could you import the latest version of Template App Zabbix Server from here and upload more graphs afterwards? |
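For reference, the suggested change is a one-line edit in zabbix_server.conf; a minimal sketch (the value 10 is only an illustrative starting point, not a recommendation from this ticket):

# /etc/zabbix/zabbix_server.conf
# Number of pre-forked preprocessing worker instances (default: 3)
StartPreprocessors=10

# apply the change
systemctl restart zabbix-server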
Comment by Andreas Biesenbach [ 2017 Oct 23 ] |
First of all: thanks for your fast reply!
It looks like you are right. The MySQL backup runs locally, since we do not have a replication slave. As soon as the backup starts, the preprocessing manager queue grows higher and higher. I didn't expect this to happen when running mysqldump with the "--single-transaction" option:

/bin/mysqldump -uroot --single-transaction --routines --triggers --events --log-error=$backup_dir/${cur_date}_mysql_backup.log zabbix > /mysql/data_zabbix/backup/backup_zabbix_db.sql
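A side note on the command above: --single-transaction avoids long table locks, but the dump still produces heavy disk I/O on the shared host, which can starve the Zabbix DB sessions. One common mitigation (not suggested in this ticket, just a sketch) is to run the dump at idle I/O and lowest CPU priority:

# Run the dump at idle I/O priority (ionice class 3) and lowest CPU priority
nice -n 19 ionice -c3 /bin/mysqldump -uroot --single-transaction --routines --triggers --events \
    --log-error=$backup_dir/${cur_date}_mysql_backup.log zabbix \
    > /mysql/data_zabbix/backup/backup_zabbix_db.sql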
And again it looks like you are right. Even after cancelling the MySQL backup the preprocessing manager is not able to work off its queue. It is currently at ~104.000.000 items and still rising:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

I will now stop the zabbix_server process (even at the cost of losing some data) and change the mentioned parameter StartPreprocessors=3 to a higher value. I will comment with results as soon as I have them.

Just as an info: I changed the log level for the preprocessing manager to debug. All I can see is that it is working as expected:

3217:20171023:073252.477 In preprocessor_enqueue() itemid: 4614996

Best regards |
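As a side note, since Zabbix 3.4 the debug level can also be changed at runtime, without restarting the server; roughly like this (a sketch; see zabbix_server -h for the exact runtime control syntax of your version):

# Raise the log level of the preprocessing manager by one step (repeatable up to debug)
zabbix_server -R log_level_increase="preprocessing manager"
# Lower it again once done
zabbix_server -R log_level_decrease="preprocessing manager"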
Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 23 ] |
I hope the situation will recover eventually. Do not forget to update the template; there is a new item for preprocessing queue monitoring (without a trigger so far, unfortunately). |
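The queue item referred to here is the internal item zabbix[preprocessing_queue], added in 3.4. A hand-rolled trigger on it could look roughly like this (host name and threshold are placeholders, not from this ticket):

Item key: zabbix[preprocessing_queue]
Trigger:  {Zabbix server:zabbix[preprocessing_queue].min(10m)}>100000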
Comment by Andris Zeila [ 2017 Oct 23 ] |
Depending on the queued values |
Comment by Andreas Biesenbach [ 2017 Oct 23 ] |
Hi all, we set StartPreprocessors to a higher value and imported the new templates as mentioned:

StartPreprocessors=20

I can't believe it. The Zabbix server is now running stably. Even with the MySQL backup running, the queue permanently stays at 0 items. The only thing that might still cause errors are the history syncers, because their pre-select might take much longer during the MySQL backup, which results in higher history cache usage - but this can be fixed by increasing the history cache. Thanks a lot!! Even though I will keep an eye on it for the next few days, I am optimistic that the OOM situations are a thing of the past now.
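Taken together, the tuning boils down to two zabbix_server.conf parameters; the StartPreprocessors value is the one from this ticket, while the HistoryCacheSize figure is only an illustrative bump (the default is 16M):

# /etc/zabbix/zabbix_server.conf
# Preprocessing workers (default: 3) - sized for this installation's load
StartPreprocessors=20
# History write cache - enlarge it so history syncers can buffer values
# while their pre-selects run slowly during the MySQL backup
HistoryCacheSize=256M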
Comment by Ingus Vilnis [ 2017 Nov 06 ] |
Hi Andreas, from the last screenshots it looks like your issue has been resolved just by some tuning. I will therefore close this ticket as "Workaround proposed". Please reopen it or create a new ticket if the problems with preprocessing still exist. |