[ZBX-16020] Uncontrolled memory allocation in Zabbix preprocessing Created: 2019 Mar 22  Updated: 2024 Apr 10  Resolved: 2019 Apr 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 4.0.7rc1, 4.2.1rc1, 4.4.0alpha1, 4.4 (plan)

Type: Problem report Priority: Major
Reporter: Vjaceslavs Bogdanovs Assignee: Vjaceslavs Bogdanovs
Resolution: Fixed Votes: 0
Labels: crash, error, memory, preprocessing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: Team C
Sprint: Sprint 50 (Mar 2019), Sprint 51 (Apr 2019)
Story Points: 2.5

 Description   

A Zabbix Admin can execute a preprocessing test request that terminates Zabbix Server.

The input buffer is size-limited, and data written to the DB is limited according to its type, but preprocessing itself has no such limit: a step can request allocation of arbitrarily large memory chunks, which causes the Server to run out of memory:

6621:20190322:085652.148 server #42 started [preprocessing worker #2]
6568:20190322:085933.479 slow query: 8.262455 sec, "insert into history (itemid,clock,ns,value) values (23664,1553237964,474102852,1.655466);"
6570:20190322:085948.111 slow query: 4.414272 sec, "begin;"
6566:20190322:085956.863 slow query: 6.664379 sec, "select h.hostid,h.host,h.name,t.httptestid,t.name,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host,t.delay from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1553237990 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)"
6565:20190322:085956.959 slow query: 5.726945 sec, "begin;"
6617:20190322:085956.971 slow query: 4.794837 sec, "select m.mediaid,m.mediatypeid,m.sendto from media m,users_groups u,config c,media_type mt where m.userid=u.userid and u.usrgrpid=c.alert_usrgrpid and m.mediatypeid=mt.mediatypeid and m.active=0 and mt.status=0"
6570:20190322:085957.017 slow query: 7.376471 sec, "insert into history (itemid,clock,ns,value) values (28536,1553237976,635758835,7.182663),(23256,1553237976,635790080,2.416357),(23257,1553237979,528970595,20.681284),(23258,1553237979,529017490,0.000000),(28538,1553237979,529066401,0.000000),(23259,1553237979,529344495,5.184401);"
6573:20190322:085957.056 slow query: 21.368017 sec, "insert into history (itemid,clock,ns,value) values (23252,1553237973,391629034,0.000000);"
6567:20190322:085957.101 slow query: 5.095014 sec, "select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1553237992 and mod(r.druleid,1)=0"
6560:20190322:085959.381 slow query: 7.169354 sec, "select refresh_unsupported,discovery_groupid,snmptrap_logging,severity_name_0,severity_name_1,severity_name_2,severity_name_3,severity_name_4,severity_name_5,hk_events_mode,hk_events_trigger,hk_events_internal,hk_events_discovery,hk_events_autoreg,hk_services_mode,hk_services,hk_audit_mode,hk_audit,hk_sessions_mode,hk_sessions,hk_history_mode,hk_history_global,hk_history,hk_trends_mode,hk_trends_global,hk_trends,default_inventory_mode,db_extension from config order by configid"
6560:20190322:090005.151 slow query: 3.140640 sec, "select hostmacroid,hostid,macro,value from hostmacro"
6609:20190322:090007.260 slow query: 5.150021 sec, "begin;"
6560:20190322:090010.666 slow query: 4.219534 sec, "select hostid,proxy_hostid,host,ipmi_authtype,ipmi_privilege,ipmi_username,ipmi_password,maintenance_status,maintenance_type,maintenance_from,errors_from,available,disable_until,snmp_errors_from,snmp_available,snmp_disable_until,ipmi_errors_from,ipmi_available,ipmi_disable_until,jmx_errors_from,jmx_available,jmx_disable_until,status,name,lastaccess,error,snmp_error,ipmi_error,jmx_error,tls_connect,tls_accept,proxy_address,auto_compress,maintenanceid from hosts where status in (0,1,5,6) and flags<>2"
6609:20190322:090014.796 slow query: 7.526507 sec, "update hosts set disable_until=1553238062 where hostid=10084"
6622:20190322:090038.740 [file:variant.c,line:105] zbx_strdup: out of memory. Requested 12289039818 bytes.
6556:20190322:090039.363 One child process died (PID:6622,exitcode/signal:1). Exiting ...
zabbix_server [6556]: Error waiting for process with PID 6622: [10] No child processes
6556:20190322:090039.395 syncing history data...
6556:20190322:090039.395 syncing history data done
6556:20190322:090039.395 syncing trend data...
6556:20190322:090039.404 syncing trend data done
6556:20190322:090039.404 Zabbix Server stopped. 

The simplest use case to reproduce this error:

  1. Create an item with preprocessing.
  2. Add 10 identical preprocessing steps (Regular expression, with Pattern set to "(.*)" (without quotes) and Output set to "\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1").
  3. Execute the test with some data.
  4. Observe that the connection times out and the Server dies within a few seconds.
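
To show why these steps exhaust memory, the sketch below estimates the value size after each step. It assumes that each step captures the whole input with "(.*)" and emits 16 copies of it, so the value grows 16-fold per step; the function name and the 4-byte test input are hypothetical and are not taken from the Zabbix sources or from the log above.

```python
def projected_sizes(input_len, steps=10, copies=16):
    """Return the projected value length in bytes after each step.

    Assumption: every step replaces the value with `copies` repetitions
    of the whole previous value, as the "(.*)" pattern with sixteen
    "\\1" references in the output would do for single-line input.
    """
    sizes = []
    size = input_len
    for _ in range(steps):
        size *= copies
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical 4-byte test value; only the lengths are computed,
    # no large buffers are allocated here.
    for step, size in enumerate(projected_sizes(4), start=1):
        print(f"step {step:2d}: {size:>20,d} bytes")
```

Under this assumption a single input byte grows to 16^10 = 2^40 bytes (1 TiB) after ten steps, so any realistic host runs out of memory well before the last step completes.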

A buffer size limit (or some other check) should be introduced.
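
One possible shape for such a check is sketched below. This is only an illustration of the idea, not the fix committed for this issue: the ticket does not show the actual change, and the limit value, the exception, and the function names here are all hypothetical. The point is that the projected result size is compared against a cap before any memory is allocated, so the step fails with an error instead of killing the process.

```python
MAX_RESULT_BYTES = 1024  # hypothetical cap, chosen small for the demo


class ResultTooLarge(Exception):
    """Raised when a step would produce a value above the cap."""


def repeat_capture(value, copies, limit=MAX_RESULT_BYTES):
    """Emit `copies` repetitions of `value`, refusing oversized results."""
    # Check the projected size first, before building the new string.
    projected = len(value.encode("utf-8")) * copies
    if projected > limit:
        raise ResultTooLarge(
            f"result would be {projected} bytes, limit is {limit}"
        )
    return value * copies


def run_steps(value, steps, copies=16, limit=MAX_RESULT_BYTES):
    """Apply the same step repeatedly; return (value, error)."""
    for step in range(1, steps + 1):
        try:
            value = repeat_capture(value, copies, limit)
        except ResultTooLarge as exc:
            # Report the failing step instead of exhausting memory.
            return None, f"step {step}: {exc}"
    return value, None


if __name__ == "__main__":
    result, error = run_steps("abcd", steps=10)
    print(error)
```

With the 4-byte input and the 1024-byte cap used here, the first two steps succeed (64 and 1024 bytes) and the third is rejected, so the test request returns an error while the worker keeps running.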



 Comments   
Comment by Vjaceslavs Bogdanovs [ 2019 Apr 05 ]

Available in:

  • 4.0.7rc1 r92140
  • 4.2.1rc1 r92141
  • 4.4.0alpha1 (trunk) r92142