[ZBX-18905] mess of "Got timeout reading communication packets" messages in mysql errors log Created: 2021 Jan 22 Updated: 2024 Apr 10 Resolved: 2023 Nov 10 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | None |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Oleksii Zagorskyi | Assignee: | Konstantins Prutkovs (Inactive) |
Resolution: | Duplicate | Votes: | 7 |
Labels: | timeout | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
![]() ![]() ![]() ![]() ![]() |
||||
Issue Links: |
|
||||
Team: | |||||
Sprint: | Technical backlog |
Description |
First of all this is not a bug report, but just a report, useful to share to everyone. There are 4 small installations, which were recently upgraded to zabbix v5.0 and as a requirement, MariaDB instances were also upgraded from 5.5 to 10.5.8 or 10.3.27 We started to spot errors in mysql's error.log file time to time: Aborted connection 48 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got timeout reading communication packets) (note - here host: should be IP of remote side host. In my case 'localhost' is because of local connection) Long troubleshooting showed something what makes sense to share here. There was a change in mysql's core for logging level, defined by "log_warnings" parameter (log_error_verbosity in MySQL 8.0, with nuances). The same thing happened for MariaDB, for versions 10.2 As for MySQL v8.0, it dropped support of "log_warnings", and replaced it by "log_error_verbosity" setting default to 2:
So, that's a mess with conditions, when the errors may appear in mysql error log. But it's strange thing why zabbix does not log any errors in own log, right? To play with the parameters you can use these commands: set global log_warnings=2; set global wait_timeout=30; show global variables like 'log_warnings'; show global variables like 'wait_timeout'; (for mysql v8.0 log_error_verbosity instead*=3* of log_warnings=2 and other values) MariaDB: 2021-01-21 18:40:39 16 [Warning] Aborted connection 16 to db: '5.2' user: 'root' host: 'localhost' (Got timeout reading communication packets) zabbix log is silent and zabbix daemon, when needed, recreates a new connection and then sends query: while MySQL sends FIN packet and with increased log_error_verbosity=3 logs these messages: 2021-01-21T14:14:30.966485Z 140 [Note] [MY-013417] [Server] The wait_timeout period was exceeded, the idle time since last command was too long. 2021-01-21T14:14:30.966521Z 140 [Note] [MY-010914] [Server] Aborted connection 140 to db: '5.2' user: 'root' host: 'localhost' (Got timeout reading communication packets). and in this case zabbix tries to send query to the same TCP connection, which was closed already, it receives RST and only then recreates a new connection and resend the query: 3746025:20210121:192540.063 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select clock,ns,value from history where itemid=32634 and clock>1611249940] Zabbix_server is compiled with MariaDB client library and used it during operations, so it may cause difference how client is threading such network communication. But as zabbix packages are compiled with MariaDB and zabbix users may used different DB servers, that's an interesting result to describe here. TCP dumps (manually filtered for 2 session of one zabbix process) attached as well, just in case. I did test for MySQL v8.0 only, I do not exclude that MySQL v5.x may have different behavior. |
Comments |
Comment by jrdnyquist [ 2021 Jan 27 ] |
We've run into this issue as well. We thought we had resolved it because the AWS storage (used by mariadb) was out of burst iops. We increased the burst iops @ aws by making the volume larger and the errors seemed to mostly go away. We still see those errors in the mariadb error log although not as often now. |
Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 07 ] |
Hey, zalex_ua after some testing I can confirm this issue. However, I do have a question: how were you able to produce the following error: 3746025:20210121:192540.063 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select clock,ns,value from history where itemid=32634 and clock>1611249940] I was not able to make Zabbix loose connection during a query by changing wait_timeout. [zalex_ua] I've just got the error, so reported it in the description. |
Comment by Alex Kalimulin [ 2022 Jun 27 ] |
kprutkovs , zalex_ua so this error message does not affect Zabbix and is just a result of what timings are configured in the DB? If so then one just needs to use a sensible wait_timeout. |
Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 27 ] |
Kalimulin it does seem that is the case. Since zabbix automatically reestablishes the connection to the DB anyways. |
Comment by Oleksii Zagorskyi [ 2022 Jun 27 ] |
Kalimulin, on very first sentence of issue description i said - "this is not a bug report, but just a report". Ideally would be that each zabbix process, which communicates with DB, reads what is the "wait_timeout" (DB engine specific thing) and use it for NOP/heartbeat packets/queries. This way keeping DB connection persistent and valid. |
Comment by Oleksii Zagorskyi [ 2022 Jun 27 ] |
Another real case just has happened for me: When I pressed Test, I'm getting an unexpected error - "Permission denied": I could not get where it comes from ... I've retried the Test and it was successful. Ok, I ignored that, thinking it possibly was an issue in locally running postfix or so. An hour later I needed to look to zabbix_server.log and spot these 2 SQL errors: 7839:20220627:214241.520 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid] 7840:20220627:214324.747 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid] well, I decided to re-Test the media types, and I've got 2 more fresh errors: 7842:20220627:222528.113 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid] 7838:20220627:224058.878 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid] all these PIDs are 4 different trappers (of 5 totally started). So, my idea I shared earlier (NOP/heartbeat packets/queries) makes sense, IMO. <kprutkovs> I was unable to reproduce your issue zalex_ua. Even with low settings of 10 seconds set to wait_timeout I cannot make zabbix fail Media Type test. Can you perhaps give additional information on steps how to reproduce this particular issue? <zalex_ua> I've moved your sub-comment here, where it should bu I think. <kprutkovs> Yes I am sure that timeout was set to 10 seconds. No, no errors were seen in the zabbix_server.log unfortunately. I have set both timeout settings manually in the config and made sure to restart the mysql server. Perhaps there is some setup specifics I am missing? <zalex_ua> Not sure what is wrong on your side. Share please details about the installation: uname -a dpkg -l | egrep -i "maria|mysql" rpm -qa | egrep -i "maria|mysql" ldd /usr/sbin/zabbix_server | grep mysql <kprutkovs>here is the output from my testing environment: ❯ uname -a Linux ___machine_name___ 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Tue Aug 2 13:42:59 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux ❯ ldd ~/git/zabbix-18905/sbin/zabbix_server | egrep -i "maria|mysql" libmariadb.so.3 => /lib64/libmariadb.so.3 (0x00007ffae30d6000) ❯ dnf list installed | egrep -i "maria|mysql" mariadb.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-backup.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-common.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-connector-c.x86_64 3.1.11-2.el8_3 @appstream mariadb-connector-c-config.noarch 3.1.11-2.el8_3 @appstream mariadb-connector-c-devel.x86_64 3.1.11-2.el8_3 @appstream mariadb-devel.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-errmsg.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-gssapi-server.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-server.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream mariadb-server-utils.x86_64 3:10.3.35-1.module_el8.6.0+3265+230ed96b @appstream perl-DBD-MySQL.x86_64 4.046-3.module_el8.6.0+2827+49d66dc3 @appstream php-mysqlnd.x86_64 7.4.19-4.module_el8.6.0+3238+624bf8b8 @appstream I compile and run zabbix from source if you are wondering. The OS is Alma Linux. |
Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 28 ] |
zalex_ua concerning your latest comment, what do you mean exactly by "My local MySQL uses default wait_timeout=28800". Does it mean that you get this value when you execute the following commands? SHOW SESSION VARIABLES LIKE "wait_timeout"; -- 28800 SHOW GLOBAL VARIABLES LIKE "wait_timeout"; -- 28800 You can also explicitly set both wait_timeout and interactive_timeout in mysql configuration to 28800 to avoid getting this error unless zabbix has such large gaps between database manipulation. If you keep experiencing this issue then, please send my mysql configuration files if possible. <zalex_ua> The sentence "My local MySQL uses default wait_timeout=28800" means:
Right, the 28800 is returned when reading the global variable wait_timeout. And believe me - it's default for 98% of all zabbix installations I've seen for many years. There are ~1% of cases when people decreases it (which we don't recommend) and probably ~0.1% of cases when people increase it to 86400. I know I can increase the parameter. But in that particular case increasing it even to 259200+ (3 days+) would not guarantee that I'd not had the issue. <kprutkovs> Is there any firewall or network device that would timeout connection between Mysql database and Zabbix server, producing said error in the log file? <zalex_ua> I don't think we can answer here. Customer's issue has been closed already. |
Comment by Konstantins Prutkovs (Inactive) [ 2023 Nov 10 ] |
Tested, this issue is resolved as a part of implementation done in |
Comment by Konstantins Prutkovs (Inactive) [ 2023 Nov 10 ] |
This issue is resolved as a part of fix done in |