[ZBX-18905] mess of "Got timeout reading communication packets" messages in mysql errors log Created: 2021 Jan 22  Updated: 2024 Apr 10  Resolved: 2023 Nov 10

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Oleksii Zagorskyi Assignee: Konstantins Prutkovs (Inactive)
Resolution: Duplicate Votes: 7
Labels: timeout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mariadb-wait_timeout-reached.png     PNG File mysql-wait_timeout-reached.png     PNG File unexpected-error-zalex.png     File zabbix-mariadb-zab_is_silent-ses35_38-pid_3744104.pcapng     File zabbix-mysql-zab_with_errors-2streams-35-21.pcapng    
Issue Links:
Duplicate
Team: Team C
Sprint: Technical backlog

 Description   

First of all this is not a bug report, but just a report, useful to share to everyone.

There are 4 small installations, which were recently upgraded to zabbix v5.0 and as a requirement, MariaDB instances were also upgraded from 5.5 to 10.5.8 or 10.3.27
Some details:
All zabbix processes are almost flat near 0% busy level. I assume that a little busy level of processes, where many of them are sleeping most of the time, increases probability that some particular processes did not need to contact DB server for time, longer than mysql's wait_timeout.

We started to spot errors in mysql's error.log file time to time:

Aborted connection 48 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got timeout reading communication packets)

(note - here host: should be IP of remote side host. In my case 'localhost' is because of local connection)
In the same time we don't see any errors in zabbix_server.log.

Long troubleshooting showed something what makes sense to share here.

There was a change in mysql's core for logging level, defined by "log_warnings" parameter (log_error_verbosity in MySQL 8.0, with nuances).
Default was 1 before MySQL 5.7.2, 2 as of 5.7.2, which became 2 by default which caused the errors logging.
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_log_warnings

The same thing happened for MariaDB, for versions 10.2
https://mariadb.com/kb/en/upgrading-from-mariadb-101-to-mariadb-102/

As for MySQL v8.0, it dropped support of "log_warnings", and replaced it by "log_error_verbosity" setting default to 2:
https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_log_error_verbosity
while in version 5.7.2 it was already supported with another default value - 3 which with following logic:

In MySQL 5.7.2 and higher, use of log_warnings is still permitted but maps onto use of log_error_verbosity as follows:
Setting log_warnings=0 is equivalent to log_error_verbosity=1 (errors only).
Setting log_warnings=1 is equivalent to log_error_verbosity=2 (errors, warnings).
Setting log_warnings=2 (or higher) is equivalent to log_error_verbosity=3 (errors, warnings, notes)

So, that's a mess with conditions, when the errors may appear in mysql error log.

But it's strange thing why zabbix does not log any errors in own log, right?
And here an interesting difference observed between MariaDB v10.3.25 and MySQL v8.0:

To play with the parameters you can use these commands:

set global log_warnings=2;
set global wait_timeout=30;
show global variables like 'log_warnings';
show global variables like 'wait_timeout';

(for mysql v8.0 log_error_verbosity instead*=3* of log_warnings=2 and other values)

MariaDB:
After wait_timeout it sends single RST packet (use tcpdump to see it) and logs this in errors log:

2021-01-21 18:40:39 16 [Warning] Aborted connection 16 to db: '5.2' user: 'root' host: 'localhost' (Got timeout reading communication packets)

zabbix log is silent and zabbix daemon, when needed, recreates a new connection and then sends query:

while MySQL sends FIN packet and with increased log_error_verbosity=3 logs these messages:

2021-01-21T14:14:30.966485Z 140 [Note] [MY-013417] [Server] The wait_timeout period was exceeded, the idle time since last command was too long.
2021-01-21T14:14:30.966521Z 140 [Note] [MY-010914] [Server] Aborted connection 140 to db: '5.2' user: 'root' host: 'localhost' (Got timeout reading communication packets).

and in this case zabbix tries to send query to the same TCP connection, which was closed already, it receives RST and only then recreates a new connection and resend the query:

and in this case zabbix logs an error as well:

3746025:20210121:192540.063 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select clock,ns,value from history where itemid=32634 and clock>1611249940]

Zabbix_server is compiled with MariaDB client library and used it during operations, so it may cause difference how client is threading such network communication.

But as zabbix packages are compiled with MariaDB and zabbix users may used different DB servers, that's an interesting result to describe here.

TCP dumps (manually filtered for 2 session of one zabbix process) attached as well, just in case.

I did test for MySQL v8.0 only, I do not exclude that MySQL v5.x may have different behavior.



 Comments   
Comment by jrdnyquist [ 2021 Jan 27 ]

We've run into this issue as well. We thought we had resolved it because the AWS storage (used by mariadb) was out of burst iops. We increased the burst iops @ aws by making the volume larger and the errors seemed to mostly go away. We still see those errors in the mariadb error log although not as often now.

Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 07 ]

Hey, zalex_ua after some testing I can confirm this issue. However, I do have a question: how were you able to produce the following error:

3746025:20210121:192540.063 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select clock,ns,value from history where itemid=32634 and clock>1611249940] 

I was not able to make Zabbix loose connection during a query by changing wait_timeout.

[zalex_ua]  I've just got the error, so reported it in the description.
I think it's quite possible to get such error for a processes which are not too active and may sleep for longer time (longer than wait_timeout on db server).

Comment by Alex Kalimulin [ 2022 Jun 27 ]

kprutkovs , zalex_ua so this error message does not affect Zabbix and is just a result of what timings are configured in the DB?

If so then one just needs to use a sensible wait_timeout.

Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 27 ]

Kalimulin it does seem that is the case. Since zabbix automatically reestablishes the connection to the DB anyways.

Comment by Oleksii Zagorskyi [ 2022 Jun 27 ]

Kalimulin, on very first sentence of issue description i said - "this is not a bug report, but just a report".
Note - even with sensible wait_timeout (default is 28800 - 8 hours) is still may happen on tiny installations, where processes may live longish time (longer that the wait_timeout) without real communications with DB.

Ideally would be that each zabbix process, which communicates with DB, reads what is the "wait_timeout" (DB engine specific thing) and use it for NOP/heartbeat packets/queries. This way keeping DB connection persistent and valid.

Comment by Oleksii Zagorskyi [ 2022 Jun 27 ]

Another real case just has happened for me:
A testing zabbix server 6.0 (which has many days uptime) and I needed to Test a media type from frontend.

When I pressed Test, I'm getting an unexpected error - "Permission denied":

I could not get where it comes from ... I've retried the Test and it was successful. Ok, I ignored that, thinking it possibly was an issue in locally running postfix or so.
Then I  needed to test one more time, and I again got the error, retried - successful. Hmmm ...

An hour later I needed to look to zabbix_server.log and spot these 2 SQL errors:

  7839:20220627:214241.520 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid]
  7840:20220627:214324.747 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid]

well, I decided to re-Test the media types, and I've got 2 more fresh errors:

  7842:20220627:222528.113 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid]
  7838:20220627:224058.878 [Z3005] query failed: [4031] The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. [select u.userid,u.roleid,u.username,r.type from sessions s,users u,role r where s.userid=u.userid and s.sessionid='e231883c6543aaed4b0cd9aeb550ba6c' and s.status=0 and u.roleid=r.roleid]

all these PIDs are 4 different trappers (of 5 totally started).
I feel like 50% of my Test attempts were failed with the "Permission denied" message - it should indicate how this may be annoying.
My local MySQL uses default wait_timeout=28800.

So, my idea I shared earlier (NOP/heartbeat packets/queries) makes sense, IMO.

<kprutkovs> I was unable to reproduce your issue zalex_ua. Even with low settings of 10 seconds set to wait_timeout I cannot make zabbix fail Media Type test. Can you perhaps give additional information on steps how to reproduce this particular issue?

<zalex_ua> I've moved your sub-comment here, where it should bu I think.
Hmm, have you make sure that even from CLI your connection get expired in 10 seconds of inactivity?
With 10 seconds I think you also should see quite a lot of such errors in zabbix_server.log, from many different process types.

<kprutkovs> Yes I am sure that timeout was set to 10 seconds. No, no errors were seen in the zabbix_server.log unfortunately. I have set both timeout settings manually in the config and made sure to restart the mysql server. Perhaps there is some setup specifics I am missing?

<zalex_ua> Not sure what is wrong on your side. Share please details about the installation:

uname -a
dpkg -l | egrep -i "maria|mysql"
rpm -qa | egrep -i "maria|mysql"
ldd /usr/sbin/zabbix_server | grep mysql

<kprutkovs>here is the output from my testing environment:

❯ uname -a
Linux ___machine_name___ 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Tue Aug 2 13:42:59 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

❯ ldd ~/git/zabbix-18905/sbin/zabbix_server | egrep -i "maria|mysql"
        libmariadb.so.3 => /lib64/libmariadb.so.3 (0x00007ffae30d6000)

❯ dnf list installed | egrep -i "maria|mysql"
mariadb.x86_64                              3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-backup.x86_64                       3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-common.x86_64                       3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-connector-c.x86_64                  3.1.11-2.el8_3                               @appstream
mariadb-connector-c-config.noarch           3.1.11-2.el8_3                               @appstream
mariadb-connector-c-devel.x86_64            3.1.11-2.el8_3                               @appstream
mariadb-devel.x86_64                        3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-errmsg.x86_64                       3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-gssapi-server.x86_64                3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-server.x86_64                       3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
mariadb-server-utils.x86_64                 3:10.3.35-1.module_el8.6.0+3265+230ed96b     @appstream
perl-DBD-MySQL.x86_64                       4.046-3.module_el8.6.0+2827+49d66dc3         @appstream
php-mysqlnd.x86_64                          7.4.19-4.module_el8.6.0+3238+624bf8b8        @appstream

I compile and run zabbix from source if you are wondering. The OS is Alma Linux.

Comment by Konstantins Prutkovs (Inactive) [ 2022 Jun 28 ]

zalex_ua concerning your latest comment, what do you mean exactly by "My local MySQL uses default wait_timeout=28800". Does it mean that you get this value when you execute the following commands?

SHOW SESSION VARIABLES LIKE "wait_timeout"; -- 28800
SHOW GLOBAL VARIABLES LIKE "wait_timeout"; -- 28800

You can also explicitly set both wait_timeout and interactive_timeout in mysql configuration to 28800 to avoid getting this error unless zabbix has such large gaps between database manipulation. If you keep experiencing this issue then, please send my mysql configuration files if possible.

<zalex_ua> The sentence "My local MySQL uses default wait_timeout=28800" means:

  • DB server is MySQL and it's running locally on my laptop, where zabbix 6.0 is running too, unixsocket is used for communication. It means that network could not be a reason of DB connection interruption is any way. So the zabbix_server was affected by wait_timeout only.

Right, the 28800 is returned when reading the global variable wait_timeout. And believe me - it's default for 98% of all zabbix installations I've seen for many years. There are ~1% of cases when people decreases it (which we don't recommend) and probably ~0.1% of cases when people increase it to 86400.

I know I can increase the parameter. But in that particular case increasing it even to 259200+ (3 days+) would not guarantee that I'd not had the issue.

<kprutkovs> Is there any firewall or network device that would timeout connection between Mysql database and Zabbix server, producing said error in the log file?

<zalex_ua> I don't think we can answer here. Customer's issue has been closed already.

Comment by Konstantins Prutkovs (Inactive) [ 2023 Nov 10 ]

Tested, this issue is resolved as a part of implementation done in ZBX-23145

Comment by Konstantins Prutkovs (Inactive) [ 2023 Nov 10 ]

This issue is resolved as a part of fix done in ZBX-23145

Generated at Fri Apr 25 07:33:27 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.