[ZBX-8545] Low performance for key net.tcp.service[] which reads /proc/net/tcp file Created: 2014 Jul 30 Updated: 2017 May 30 Resolved: 2014 Nov 26 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.0.12, 2.2.4, 2.2.5, 2.3.2 |
Fix Version/s: | 2.5.0 |
Type: | Incident report | Priority: | Critical |
Reporter: | jaseywang | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | performance, tcp | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
server: $ zabbix_agentd --version |
Description |
After running Zabbix 2.0.5 in our production environment for more than a year, we found some critical issues. Here is one: As a workaround, we now use the ss command to get correct data quickly; ss doesn't need to read the /proc/net/tcp file and returns results very quickly. The impact depends on the services running on your server. For those without many connections it is not a concern, but for those with a very large number of connections, like us, it is a critical issue because it often triggers false alerts. At the moment, the latest stable version, 2.2.5, hasn't fixed this, and we hope you can fix it ASAP. ref: |
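To illustrate why the cost grows with the number of connections, here is a minimal sketch of a linear scan over /proc/net/tcp. It is not the agent's actual code: the function names line_is_listening_on and port_is_listening are hypothetical, and the sketch assumes the usual IPv4 line layout (slot, local address:port, remote address:port, state, all in hex).

```c
#include <stdio.h>

#define TCP_STATE_LISTEN 0x0A	/* state value printed for listening sockets */

/* Hypothetical helper: returns 1 if one /proc/net/tcp line describes a socket
 * listening on `port`, 0 otherwise (the header line also yields 0). */
int line_is_listening_on(const char *line, unsigned short port)
{
	unsigned int local_port, state;

	/* skip the slot number and both addresses; keep local port and state */
	if (2 != sscanf(line, " %*d: %*x:%x %*x:%*x %x", &local_port, &state))
		return 0;

	return TCP_STATE_LISTEN == state && port == local_port;
}

/* Linear scan: every socket on the host costs one line of text to read and
 * parse. Returns 1 if found, 0 if not, -1 if the file cannot be opened. */
int port_is_listening(const char *path, unsigned short port)
{
	char line[512];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (NULL == f)
		return -1;

	while (0 == found && NULL != fgets(line, sizeof(line), f))
		found = line_is_listening_on(line, port);

	fclose(f);

	return found;
}
```

A call such as port_is_listening("/proc/net/tcp", 10050) has to walk the whole file in the worst case, so the time taken grows with the number of open connections.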
Comments |
Comment by Juris Miščenko (Inactive) [ 2014 Aug 21 ] |
The use of the netlink interface has been implemented at svn://svn.zabbix.com/branches/dev/ZBX-8545. This change applies only to 1) systems using Linux kernel 2.6.14 or later and 2) the net.tcp.listen item. The netlink subprotocol required for this type of diagnostic (NETLINK_INET_DIAG) was added only in the 2.6.14 kernel and, unfortunately, there's no clear way of retrieving UDP socket information through it. Even the previously mentioned ss(1) utility from the iproute2 package resorts to reading /proc/net/udp when it comes to UDP sockets. The main difficulty in implementing anything netlink-related is the severe lack of documentation on requests and responses, although documentation on the delivery mechanism and its operation is abundant. We also currently have trouble measuring the effectiveness of this change, because execution time gains may only become apparent on systems with many connections. If anyone has a system where the previous implementation of the net.tcp.listen item was obviously too slow, it would be nice to hear feedback on performance after applying this patch. |
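For readers unfamiliar with the interface, the following is a rough sketch of how a listening port can be looked up through NETLINK_INET_DIAG. It is not the code from the development branch: the names scan_diag_reply and netlink_port_is_listening and the DIAG_* result codes are hypothetical, and it assumes the legacy TCPDIAG_GETSOCK dump request with struct inet_diag_req from the kernel headers, with the kernel asked to return only sockets in the listening state.

```c
#include <arpa/inet.h>
#include <linux/inet_diag.h>
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define STATE_LISTEN 10	/* numeric value of TCP_LISTEN */

enum { DIAG_MORE, DIAG_FOUND, DIAG_DONE, DIAG_ERROR };	/* hypothetical result codes */

/* Walk one batch of netlink replies looking for a listening socket on `port`. */
int scan_diag_reply(const void *buf, int len, unsigned short port)
{
	const struct nlmsghdr *h;

	for (h = buf; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len))
	{
		const struct inet_diag_msg *m;

		if (NLMSG_DONE == h->nlmsg_type)
			return DIAG_DONE;	/* end of dump, nothing matched */

		if (NLMSG_ERROR == h->nlmsg_type || h->nlmsg_len < NLMSG_LENGTH(sizeof(*m)))
			return DIAG_ERROR;

		m = NLMSG_DATA(h);

		if (STATE_LISTEN == m->idiag_state && htons(port) == m->id.idiag_sport)
			return DIAG_FOUND;
	}

	return DIAG_MORE;	/* batch exhausted, the dump continues */
}

/* Returns 1 if a listening socket is found, 0 if not, -1 on netlink failure. */
int netlink_port_is_listening(unsigned short port)
{
	struct { struct nlmsghdr nlh; struct inet_diag_req req; } request;
	struct sockaddr_nl kernel;
	union { struct nlmsghdr align; char bytes[8192]; } reply;
	int fd, result = DIAG_MORE;

	if (-1 == (fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_INET_DIAG)))
		return -1;

	memset(&kernel, 0, sizeof(kernel));
	kernel.nl_family = AF_NETLINK;

	memset(&request, 0, sizeof(request));
	request.nlh.nlmsg_len = sizeof(request);
	request.nlh.nlmsg_type = TCPDIAG_GETSOCK;
	request.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	request.req.idiag_family = AF_INET;
	request.req.idiag_states = 1 << STATE_LISTEN;	/* only listening sockets */

	if (-1 == sendto(fd, &request, sizeof(request), 0, (struct sockaddr *)&kernel, sizeof(kernel)))
		result = DIAG_ERROR;

	while (DIAG_MORE == result)
	{
		ssize_t n = recv(fd, reply.bytes, sizeof(reply.bytes), 0);

		result = (0 < n ? scan_diag_reply(reply.bytes, (int)n, port) : DIAG_ERROR);
	}

	close(fd);

	return DIAG_ERROR == result ? -1 : DIAG_FOUND == result;
}
```

Because the state filter is applied in the kernel, only listening sockets are sent back, rather than one text line per connection. A caller would presumably fall back to reading /proc/net/tcp when the function returns -1.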
Comment by Andris Zeila [ 2014 Sep 08 ] |
Successfully tested, please review my code changes in r48800, r48854 |
Comment by Juris Miščenko (Inactive) [ 2014 Sep 15 ] |
Changes merged in 2.5.0 (trunk) at r48983. |
Comment by Aleksandrs Saveljevs [ 2014 Sep 15 ] |
(1) Compiler gives a warning regarding the new code:
$ make
...
net.c: In function ‘NET_TCP_LISTEN’:
net.c:570:44: warning: unused variable ‘found’ [-Wunused-variable]
  int ret = SYSINFO_RET_FAIL, n, buffer_alloc = 64 * ZBX_KIBIBYTE, found = 0;
                                                                   ^
...
jurism RESOLVED. asaveljevs CLOSED |
Comment by Aleksandrs Saveljevs [ 2014 Sep 15 ] |
(2) The following change is suggested:
$ svn di
Index: src/libs/zbxsysinfo/linux/net.c
===================================================================
--- src/libs/zbxsysinfo/linux/net.c (revision 48989)
+++ src/libs/zbxsysinfo/linux/net.c (working copy)
@@ -71,7 +71,7 @@
 NLERR_UNKNOWNMSGTYPE
 };
 
-int nlerr;
+static int nlerr;
 
 static int find_tcp_port_by_state_nl(unsigned short port, int state, int *found)
 {
@@ -593,7 +593,7 @@
 {
 char *error = NULL;
 
-switch(nlerr)
+switch (nlerr)
 {
 case NLERR_UNKNOWN:
 error = zbx_strdup(error, "unrecognized netlink error occurred");
jurism RESOLVED. asaveljevs CLOSED |
Comment by richlv [ 2014 Sep 15 ] |
(3) docs :
asaveljevs ChangeLog says that "Old method of information retrieval also improved". However, it nowhere says how exactly. jurism What's new has been updated, Upgrade notes contain a terse description with a backlink to the what's new page containing the details of the change. RESOLVED. asaveljevs Pages in question are:
The first one looks good, but the following change is proposed for the second:
to
Also, richlv's third suggestion was not addressed and it might be useful to add it to net.tcp.listen[] item at https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/zabbix_agent . It may be done in a way similar to the sensor[] item. REOPENED. jurism Also, seeing as the code interfacing with the kernel isn't based on a standard or official documentation, we cannot guarantee the correctness and conformance of the code. This warrants internal documentation at best. wiper I think it would be better not to oversaturate the https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/zabbix_agent with information, but to have separate pages with detailed descriptions. The same goes for the sensor item. However, that would take a lot of work. asaveljevs It might be that a couple of sentences like "On Linux 2.6.14 and above information is obtained using the kernel's NETLINK interface, if possible. If not, information is read from /proc/net/tcp." would not be that much of an oversaturation. jurism Added comment about NETLINK to the net.tcp.listen item in 3.0 documentation. asaveljevs Your changes above also touched upgrade notes. I have fixed a typo at https://www.zabbix.com/documentation/3.0/manual/installation/upgrade_notes_300?&#item_changes . Please review. RESOLVED. jurism Everything looks fine. CLOSED. |
Comment by Juris Miščenko (Inactive) [ 2014 Sep 15 ] |
Fixes in code have been committed to trunk at r49015. jurism Documentation has been updated. |
Comment by Aleksandrs Saveljevs [ 2014 Sep 16 ] |
Please revert trunk changes and create a proper development branch. jurism Changes have been reverted. RESOLVED. asaveljevs The reverting commit was r49052. CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Sep 16 ] |
(4) As sasha mentioned in (2) in jurism RESOLVED. asaveljevs It seems variable "found" could have been left uninitialized in case we get NLMSG_DONE. I have fixed that and removed unnecessary initializations and string allocations in r49081. Please take a look. RESOLVED. jurism Changes look fine. CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Sep 16 ] |
(5) I have a Linux kernel above 2.6.14, but by default trunk compiles without Netlink support on my machine. It should be documented what needs to be done to compile with Netlink. asaveljevs Seems to be an error on my part - did not rerun ./bootstrap.sh. WON'T FIX. |
Comment by Juris Miščenko (Inactive) [ 2014 Sep 16 ] |
The necessary changes have been committed to the development branch at its original location. |
Comment by Juris Miščenko (Inactive) [ 2014 Sep 22 ] |
Change merged into 2.5.0 (trunk) at r49187. |
Comment by richlv [ 2014 Oct 25 ] |
subissues still open : |