[ZBX-18971] improve error messages in case of unrecoverable errors from mysql Created: 2021 Feb 04 Updated: 2024 Apr 10 Resolved: 2021 Nov 30 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | None |
Fix Version/s: | 6.0 (plan) |
Type: | Documentation task | Priority: | Minor |
Reporter: | Oleksii Zagorskyi | Assignee: | Artjoms Rimdjonoks |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
||||
Team: | |||||
Sprint: | Sprint 76 (May 2021), Sprint 77 (Jun 2021), Sprint 78 (Jul 2021), Sprint 79 (Aug 2021), Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021) | ||||
Story Points: | 1 |
Description |
Take a look at this log:
27680:20210114:171831.958 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27669:20210114:172217.750 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27673:20210114:173031.830 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27489:20210114:175119.574 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27443:20210114:180321.494 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27722:20210114:182528.661 [Z3005] query failed: [2013] Lost connection to MySQL server during query [commit;]
27722:20210114:182528.661 [Z3005] query failed: [2006] MySQL server has gone away [rollback;]
27407:20210114:182528.662 [Z3005] query failed: [1053] Server shutdown in progress [select h.hostid,h.host,h.name,t.httptestid,t.name,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host,t.delay from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1610648728 and mod(t.httptestid,50)=33 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)]
27386:20210114:182528.667 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select type,itemid from httptestitem where httptestid=962]
27395:20210114:182528.672 [Z3005] query failed: [1053] Server shutdown in progress [select type,itemid from httpstepitem where httpstepid=873]
27669:20210114:182528.672 [Z3005] query failed: [2013] Lost connection to MySQL server during query [update hosts set errors_from=0,disable_until=0 where hostid=10647]
27737:20210114:182528.674 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select distinct t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover from triggers t,functions f,items i,item_discovery id where t.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=54872]
27429:20210114:182528.674 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
27722:20210114:182528.697 [Z3001] connection to database 'zabbix' failed: [9002] Some errors occured.
27722:20210114:182528.697 Cannot connect to the database. Exiting...
27367:20210114:182528.703 One child process died (PID:27722,exitcode/signal:1). Exiting ...
27367:20210114:182528.865 [Z3001] connection to database 'zabbix' failed: [9002] Some errors occured.
27367:20210114:182528.865 Cannot connect to the database. Exiting...
8144:20210114:182533.144 Starting Zabbix Server. Zabbix 5.0.3 (revision 146855bff3).
8144:20210114:182533.144 ****** Enabled features ******
8144:20210114:182533.144 SNMP monitoring: YES
8144:20210114:182533.144 IPMI monitoring: YES
8144:20210114:182533.144 Web monitoring: YES
8144:20210114:182533.144 VMware monitoring: YES
8144:20210114:182533.144 SMTP authentication: YES
8144:20210114:182533.144 ODBC: YES
8144:20210114:182533.144 SSH support: YES
8144:20210114:182533.144 IPv6 support: YES
8144:20210114:182533.144 TLS support: YES
8144:20210114:182533.144 ******************************
8144:20210114:182533.144 using configuration file: /etc/zabbix/zabbix_server.conf
8144:20210114:182533.189 [Z3001] connection to database 'zabbix' failed: [9002] Some errors occured.
8144:20210114:182533.189 Cannot connect to the database. Exiting...
Looks like it is caused by this specific error:
I looked at the related code and it looks a bit strange:
/******************************************************************************
 *                                                                            *
 * Function: DBconnect                                                        *
 *                                                                            *
 * Purpose: connect to the database                                           *
 *                                                                            *
 * Parameters: flag - ZBX_DB_CONNECT_ONCE (try once and return the result),   *
 *                    ZBX_DB_CONNECT_EXIT (exit on failure) or                *
 *                    ZBX_DB_CONNECT_NORMAL (retry until connected)           *
 *                                                                            *
 * Return value: same as zbx_db_connect()                                     *
 *                                                                            *
 ******************************************************************************/
int	DBconnect(int flag)
{
	int	err;

	zabbix_log(LOG_LEVEL_DEBUG, "In %s() flag:%d", __func__, flag);

	while (ZBX_DB_OK != (err = zbx_db_connect(CONFIG_DBHOST, CONFIG_DBUSER, CONFIG_DBPASSWORD,
			CONFIG_DBNAME, CONFIG_DBSCHEMA, CONFIG_DBSOCKET, CONFIG_DBPORT, CONFIG_DB_TLS_CONNECT,
			CONFIG_DB_TLS_CERT_FILE, CONFIG_DB_TLS_KEY_FILE, CONFIG_DB_TLS_CA_FILE, CONFIG_DB_TLS_CIPHER,
			CONFIG_DB_TLS_CIPHER_13)))
	{
		if (ZBX_DB_CONNECT_ONCE == flag)
			break;

		if (ZBX_DB_FAIL == err || ZBX_DB_CONNECT_EXIT == flag)
		{
			zabbix_log(LOG_LEVEL_CRIT, "Cannot connect to the database. Exiting...");
			exit(EXIT_FAILURE);
		}
Sorry, I could be wrong as I'm not a programmer, but it looks strange to me.
It's not very clear why Zabbix server terminated itself, when we know that it usually tries to reconnect after 10 seconds. |
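The branch order in the quoted DBconnect() excerpt can be restated as a small pure function, which may make it easier to see why the server exited instead of retrying. This is only a sketch: every `sketch_`/`SKETCH_` name is hypothetical, the numeric values are assumptions, and the "database is down" result (SKETCH_DB_DOWN) is assumed to be what leads to the usual 10-second retry, since that part of the loop is not shown in the excerpt.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Zabbix constants; the values are assumptions. */
enum sketch_db_result { SKETCH_DB_OK = 0, SKETCH_DB_FAIL = -1, SKETCH_DB_DOWN = -2 };
enum sketch_connect_flag { SKETCH_CONNECT_NORMAL = 0, SKETCH_CONNECT_EXIT = 1, SKETCH_CONNECT_ONCE = 2 };
enum sketch_action { SKETCH_RETURN_RESULT, SKETCH_EXIT_PROCESS, SKETCH_SLEEP_AND_RETRY };

/* Mirrors the order of the checks in the quoted loop body. */
enum sketch_action sketch_on_connect_result(int err, int flag)
{
	assert(SKETCH_CONNECT_NORMAL <= flag && flag <= SKETCH_CONNECT_ONCE);

	if (SKETCH_DB_OK == err)
		return SKETCH_RETURN_RESULT;	/* connected, the loop ends */

	if (SKETCH_CONNECT_ONCE == flag)
		return SKETCH_RETURN_RESULT;	/* single attempt, the caller handles the failure */

	if (SKETCH_DB_FAIL == err || SKETCH_CONNECT_EXIT == flag)
		return SKETCH_EXIT_PROCESS;	/* "Cannot connect to the database. Exiting..." */

	return SKETCH_SLEEP_AND_RETRY;		/* assumed: wait, then try again */
}
```

Under these assumptions, a hard failure exits even with the "retry until connected" flag, which would match the behaviour seen in the log.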
Comments |
Comment by Artjoms Rimdjonoks [ 2021 May 17 ] |
Investigation
The error message observed in the Zabbix logs, "[9002] Some errors occured.", is what the DB sends to the MySQL connector library (and then to Zabbix) when the DB refuses to set up a connection:
./src/libs/zbxdb/db.c:
	if (ZBX_DB_OK == ret && NULL == mysql_real_connect(conn, host, user, password, dbname, port, dbsocket,
			CLIENT_MULTI_STATEMENTS))
	{
		zbx_db_errlog(ERR_Z3001, mysql_errno(conn), mysql_error(conn), dbname);
		ret = ZBX_DB_FAIL;
	}
The 9002 error code is not defined in the MySQL source code or in Zabbix; it is Azure-specific.
Neither Zabbix nor the MySQL connector has any other data available that could hint at why the connection was not successful. |
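One way to picture why an unknown code such as 9002 ends in termination is a classifier that treats only a known set of error numbers as transient. The sketch below is an illustration, not the Zabbix implementation: the function and enum names are hypothetical, and the "transient" list contains only the three codes that appear in the log in the description (1053, 2006, 2013), which is an assumption about how such a list might look.

```c
#include <stddef.h>

/* Hypothetical outcome of classifying a MySQL error number. */
enum sketch_conn_state { SKETCH_STATE_DOWN, SKETCH_STATE_FAIL };

/* Codes not on the transient list fall through to a hard failure. */
enum sketch_conn_state sketch_classify_mysql_errno(unsigned int err_no)
{
	/* Assumption: codes taken from the log in the description, for illustration only. */
	static const unsigned int	transient[] = {
		1053,	/* Server shutdown in progress */
		2006,	/* MySQL server has gone away */
		2013	/* Lost connection to MySQL server during query */
	};
	size_t	i;

	for (i = 0; i < sizeof(transient) / sizeof(transient[0]); i++)
	{
		if (transient[i] == err_no)
			return SKETCH_STATE_DOWN;	/* worth retrying */
	}

	return SKETCH_STATE_FAIL;			/* unknown code, e.g. 9002 */
}
```

With a list like this, a vendor-specific code can never be classified as transient unless it is added explicitly.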
Comment by dimir [ 2021 May 17 ] |
Can we detect 9002 and add a hint to the log message to get more details from Azure logs?
arimdjonoks: We can detect 9002, but I would rather avoid writing code around this because:
It is easy to add this code, but testing and maintaining it would be quite expensive.
If the issue repeats, I would prefer to mention this error in the "Known Issues". (It is really not Zabbix's fault that Azure has such vague errors.)
andris: The current message "[9002] Some errors occured" may make the user feel helpless. It seems that "[9002]" can come with other text messages, not only with "Some errors occured". I propose to detect this code, log what we got from Azure and append our hint, like "See Zabbix documentation "Known issues"". There, in the Zabbix online documentation, we can describe and maintain everything we know about error 9002 without changing Zabbix source code.
Another thing: is terminating Zabbix server (instead of retrying endlessly) the best action if error 9002 shows up? zalex_ua?
zalex_ua: I do not know more about 9002 than you do, sorry. Google is our only help here. I do not know whether it is required to terminate on this error or whether we could retry. |
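The proposal to detect the code, log what Azure returned and append a hint could look roughly like the following. This is a minimal sketch, not code from Zabbix: the function name and the SKETCH_AZURE_ERRNO macro are hypothetical, and the message layout is copied from the "[Z3001]" lines in the log rather than from the real zbx_db_errlog().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name for the Azure-specific error number discussed above. */
#define SKETCH_AZURE_ERRNO	9002u

/* Formats a connection error and appends a documentation hint for 9002 only. */
/* Returns what snprintf() returns: the length of the full message. */
int	sketch_format_connect_error(char *buf, size_t size, const char *dbname, unsigned int err_no,
		const char *err_text)
{
	const char	*hint = "";

	assert(NULL != buf && 0 < size);

	if (SKETCH_AZURE_ERRNO == err_no)
		hint = " See Zabbix documentation \"Known issues\".";

	/* The text received from the database is logged unchanged. */
	return snprintf(buf, size, "[Z3001] connection to database '%s' failed: [%u] %s%s",
			dbname, err_no, err_text, hint);
}
```

Because the server text is passed through as-is, the hint still applies if 9002 arrives with a different message, and the details stay in the documentation rather than in the source.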
Comment by Alexander Vladishev [ 2021 Nov 30 ] |
Documentation updated: |