[ZBX-10622] Windows binaries say "Connection refused" even on connection timeout Created: 2016 Apr 05  Updated: 2017 May 30  Resolved: 2016 Aug 05

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.0.1
Fix Version/s: 3.0.4rc1, 3.2.0alpha1

Type: Incident report Priority: Minor
Reporter: Yuri Volkov Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: errorreporting, overflow, unicode, zabbix_sender
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Microsoft Windows Server 2003 R2 Enterprise SP2 Russian



 Description   

When the zabbix_sender.dll library fails to connect to the Zabbix server because of an invalid server IP address, the char **result parameter of zabbix_sender_send_values() contains an error message truncated to 254 bytes: "cannot connect to [[10.0.255.1]:10051]: [0x0000274C] ??????? ?????????? ?????????? ???? ???????????, ?.?. ?? ??????? ?????????? ?? ????????? ????? ?? ??????? ????". The message (at least the visible part of it) is correct, but it apparently did not fit into the 255-byte buffer, because Russian Unicode characters take twice as many bytes as English ones.



 Comments   
Comment by Yuri Volkov [ 2016 Apr 05 ]

$ echo -n "cannot connect to [[10.0.255.1]:10051]: [0x0000274C] Попытка установить соединение была безуспешной, т.к. от другого компьютера за требуемое время не получен нужн" | wc --bytes
254

Comment by Aleksandrs Saveljevs [ 2016 Apr 05 ]

According to src/libs/zbxcomms/comms.c in the latest 2.4, the size of "zbx_tcp_strerror_message" buffer is 255 bytes. In 3.0, the size of this buffer is 512 bytes.

Even though "Affects Version/s" is "3.0.1", I would guess that the DLL was compiled with 2.4 source code. Would 512 bytes in 3.0 be enough to accommodate your error message?

Comment by Yuri Volkov [ 2016 Apr 05 ]

Aleksandrs, you are right. It turned out that the WinAPI function LoadLibrary("zabbix_sender.dll") picked up the wrong library (v.2.4.5) from the wrong directory. But the newer library (v.3.0.0.58455) for some reason returns a different error message (now in English): "cannot connect to [[10.1.255.1]:10051]: Connection refused." Strictly speaking, the connection is not refused; it times out after three unanswered TCP SYN packets. A more accurate error message, "A connection timeout occurred", already exists in zabbix-3.0.1/src/libs/zbxcomms/comms.c:

if (0 == FD_ISSET(s->socket, &fdw))
{
    if (0 != FD_ISSET(s->socket, &fde))
        *error = zbx_strdup(*error, "Connection refused.");
    else
        *error = zbx_strdup(*error, "A connection timeout occurred.");

    return FAIL;
}

>> Would 512 bytes in 3.0 be enough to accommodate your error message?

Russian is not as concise as English, but I guess 512 bytes (256 Russian Unicode characters) would be fine.
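
The byte arithmetic above can be checked with a small sketch. It assumes the message is UTF-8 encoded (which the `wc --bytes` count earlier in the thread suggests), so each Cyrillic letter takes two bytes and each ASCII character takes one. The helper names `utf8_codepoints` and `utf8_truncate_len` are hypothetical and are not part of Zabbix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by
 * skipping continuation bytes (bit pattern 10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(NULL != s);

    for (; '\0' != *s; s++)
    {
        if (0x80 != ((unsigned char)*s & 0xC0))
            n++;
    }

    return n;
}

/* Return the longest prefix length of s, in bytes, that fits into
 * max_bytes without cutting a multi-byte UTF-8 sequence in half. */
size_t utf8_truncate_len(const char *s, size_t max_bytes)
{
    size_t len;

    assert(NULL != s);

    len = strlen(s);

    if (len <= max_bytes)
        return len;

    len = max_bytes;

    /* if the first dropped byte is a continuation byte, back up to its lead byte */
    while (0 < len && 0x80 == ((unsigned char)s[len] & 0xC0))
        len--;

    return len;
}
```

Under the same assumption, a 512-byte NUL-terminated buffer holds at most 255 two-byte characters, and fewer once the ASCII prefix ("cannot connect to ...") is counted.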

Comment by Aleksandrs Saveljevs [ 2016 Apr 05 ]

Let's consider the buffer size issue as solved then and try to investigate the error message problem (which was initially implemented in ZBX-3437).

Comment by Aleksandrs Saveljevs [ 2016 Apr 05 ]

Even though this issue was closed by the reporter, we should double-check whether the error message for timeout is correct.

Comment by Yuri Volkov [ 2016 Apr 05 ]

I tried zabbix_sender.exe v3.0.0 on Windows 7 x64 SP1. The result is the same: "Connection refused" instead of "Timeout".

zabbix_sender.exe [3328]: DEBUG: send value error: cannot connect to [[10.1.255.1]:10051]: Connection refused.
Sending failed.

Comment by Aleksandrs Saveljevs [ 2016 Apr 11 ]

Indeed, I have just tried and it seems to always say "Connection refused", even when the actual error is timeout. Setting status to "Confirmed".

Comment by Aleksandrs Saveljevs [ 2016 Apr 11 ]

I have also updated the issue summary accordingly.

Comment by Yuri Volkov [ 2016 Apr 13 ]

Can anybody tell me which tools are used to compile official zabbix_sender binaries for Windows? I have some experience in C programming and I would like to take part in debugging.

Comment by Aleksandrs Saveljevs [ 2016 Apr 13 ]

Some pointers regarding Windows agent compilation are available in ZBXNEXT-3168.

Comment by Andrey Melnikov [ 2016 Apr 13 ]

Install Visual Studio 14 and create a simple bat file in build/win32/project/:

call "C:\Program Files\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86
nmake CPU=i386 clean all
call "C:\Program Files\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
nmake CPU=AMD64 clean all

and compile.
If you want to build a Windows XP compatible binary, replace '/SUBSYSTEM:CONSOLE' with '/SUBSYSTEM:"console,5.01"' in Makefile_common.inc.

Comment by Yuri Volkov [ 2016 Apr 13 ]

Compilation succeeded. Now I have to figure out how to create a Visual Studio project from the nmake project to debug the compiled binaries.

Comment by Aleksandrs Saveljevs [ 2016 Apr 13 ]

Yuri, thank you for investigating this issue!

Comment by Andrey Melnikov [ 2016 Apr 14 ]

Yuri Volkov - in Makefile_common.inc, add /ZI to COMMON_FLAGS and /DEBUG to LFLAGS, then rebuild.

Comment by Yuri Volkov [ 2016 Apr 16 ]

The problem resides in the following block of code:

zabbix-3.0.1/src/libs/zbxcomms/comms.c
if (ZBX_PROTO_ERROR == connect(s->socket, addr, addrlen) && WSAEWOULDBLOCK != zbx_socket_last_error())
{
    *error = zbx_strdup(*error, strerror_from_system(zbx_socket_last_error()));
    return FAIL;
}

if (-1 == (res = select(0, NULL, &fdw, &fde, ptv)))
{
    *error = zbx_strdup(*error, strerror_from_system(zbx_socket_last_error()));
    return FAIL;
}

if (0 == FD_ISSET(s->socket, &fdw))
{
    if (0 != FD_ISSET(s->socket, &fde))
        *error = zbx_strdup(*error, "Connection refused.");
    else
        *error = zbx_strdup(*error, "A connection timeout occurred.");

    return FAIL;
}

On Windows, 0 != FD_ISSET(s->socket, &fde) evaluates to TRUE in both cases: when the connection is refused and when it times out.

Here is what MSDN says about the select function:

If a socket is processing a connect call (nonblocking), failure of the connect attempt is indicated in exceptfds (application must then call getsockopt SO_ERROR to determine the error value to describe why the failure occurred).
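
The getsockopt SO_ERROR step described in the quote can be sketched as follows. This is a minimal sketch written against POSIX sockets so that it compiles outside Windows; the function name `pending_socket_error` is hypothetical. On Winsock the socket type is SOCKET, the option buffer is passed as char * and the length as int *, as in the code posted later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Fetch (and clear) the pending error of a socket after a failed
 * non-blocking connect().
 * Returns 0 and stores the error code in *err on success,
 * or -1 if getsockopt() itself fails. */
int pending_socket_error(int fd, int *err)
{
    socklen_t len = sizeof(*err);

    assert(NULL != err);

    *err = 0;

    return getsockopt(fd, SOL_SOCKET, SO_ERROR, err, &len);
}
```

A caller would invoke this only after select() has reported the socket as failed, and then compare the stored code against values such as ECONNREFUSED or ETIMEDOUT (WSAECONNREFUSED and WSAETIMEDOUT on Windows).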

And here is the solution which I have tested on Windows Server 2003 R2:

if (0 == FD_ISSET(s->socket, &fdw))
{
    if (0 != FD_ISSET(s->socket, &fde))
    {
        int socket_error = 0;
        int socket_error_len = sizeof(int);

        if (ZBX_PROTO_ERROR != getsockopt(s->socket, SOL_SOCKET,
            SO_ERROR, (char *)&socket_error, &socket_error_len))
        {
            switch (socket_error)
            {
            case WSAECONNREFUSED:
                *error = zbx_strdup(*error, "Connection refused.");
                break;
            case WSAETIMEDOUT:
                *error = zbx_strdup(*error, "A connection timeout occurred.");
                break;
            default:
                *error = zbx_strdup(*error, "Unknown error.");
                break;
            }
        }
        else
            *error = zbx_strdup(*error, "Unknown error.");
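    }
    else	/* select() timed out: the socket is in neither fdw nor fde */
        *error = zbx_strdup(*error, "A connection timeout occurred.");

    return FAIL;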
}

Comment by Andrey Melnikov [ 2016 Apr 17 ]

In the `default` case it would be better to report the hex value of the WSA* error code instead of 'Unknown error.'.

Comment by Yuri Volkov [ 2016 Apr 17 ]

Instead of a hexadecimal error code, we could just as well show the error message from Windows:

if (0 == FD_ISSET(s->socket, &fdw))
{
    if (0 != FD_ISSET(s->socket, &fde))
    {
        int socket_error = 0;
        int socket_error_len = sizeof(int);

        if (ZBX_PROTO_ERROR != getsockopt(s->socket, SOL_SOCKET,
            SO_ERROR, (char *)&socket_error, &socket_error_len))
        {
            wchar_t error_msg[512] = { 0 };

            if (0 != FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM, NULL,
                socket_error, 0, error_msg, _countof(error_msg), NULL))
            {
                char error_msg_oem[512] = { 0 };

                if (0 != WideCharToMultiByte(CP_OEMCP, 0, error_msg, -1,
                    error_msg_oem, _countof(error_msg_oem), NULL, NULL))
                {
                    *error = zbx_strdup(*error, error_msg_oem);
                }
                else
                {
                    *error = zbx_strdup(*error, "Unknown error.");
                }
            }
            else
            {
                *error = zbx_strdup(*error, "Unknown error.");
            }
        }
        else
        {
            *error = zbx_strdup(*error, "Unknown error.");
        }
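    }
    else
    {
        /* select() timed out: the socket is in neither fdw nor fde */
        *error = zbx_strdup(*error, "A connection timeout occurred.");
    }

    return FAIL;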
}

Works correctly on Russian Windows Server 2003 R2:

C:\zabbix-3.0.1\bin\win32>zabbix_sender.exe -vv -z 10.0.0.1 -s host -k key -o 1
zabbix_sender.exe [3392]: DEBUG: send value error: cannot connect to [[10.0.0.1]:10051]: Попытка установить соединение была безуспешной, т.к. от другого компьютера за требуемое время не получен нужный отклик, или было разорвано уже установленное соединение из-за неверного отклика уже подключенного компьютера.

Sending failed.

C:\zabbix-3.0.1\bin\win32>

But... the previous version of zabbix_sender (2.4.5) showed the same localized error message from Windows (apart from the truncation).

I would keep the clearer and more concise "connection refused/timed out" messages and show the system error message for the remaining error codes:

if (socket_error == WSAECONNREFUSED)
{
    *error = zbx_strdup(*error, "Connection refused.");
}
else if (socket_error == WSAETIMEDOUT)
{
    *error = zbx_strdup(*error, "A connection timeout occurred.");
}
else
{
    wchar_t error_msg[512] = { 0 };

    if (0 != FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM, NULL,
        socket_error, 0, error_msg, _countof(error_msg), NULL))
    {
        char error_msg_oem[512] = { 0 };

        if (0 != WideCharToMultiByte(CP_OEMCP, 0, error_msg, -1,
            error_msg_oem, _countof(error_msg_oem), NULL, NULL))
        {
            *error = zbx_strdup(*error, error_msg_oem);
        }
        else
        {
            *error = zbx_strdup(*error, "Unknown error.");
        }
    }
    else
    {
        *error = zbx_strdup(*error, "Unknown error.");
    }
}
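
The same policy (fixed short messages for the two common cases, the system message for everything else) can be sketched in portable C. This is only an illustration under stated assumptions: it uses the POSIX errno values ECONNREFUSED and ETIMEDOUT in place of the WSA* codes and strerror() in place of FormatMessage, and `describe_connect_error` is a hypothetical name. The hex prefix follows the earlier suggestion in this thread to include the numeric code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write a human-readable description of a connect() error code into buf. */
void describe_connect_error(int err, char *buf, size_t size)
{
    assert(NULL != buf && 0 < size);

    switch (err)
    {
        case ECONNREFUSED:
            snprintf(buf, size, "Connection refused.");
            break;
        case ETIMEDOUT:
            snprintf(buf, size, "A connection timeout occurred.");
            break;
        default:
            /* fall back to the system message, prefixed with the code in hex */
            snprintf(buf, size, "[0x%08X] %s", (unsigned int)err, strerror(err));
            break;
    }
}
```

Note that strerror() is not guaranteed to be thread-safe, so a multi-threaded caller would need a different lookup.
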
Comment by Yuri Volkov [ 2016 Apr 24 ]

Is there anybody there? The fix consists of just a few lines of code. Maybe I should create some sort of pull request?

Comment by Aleksandrs Saveljevs [ 2016 Apr 25 ]

Yuri, you are welcome to attach the patch. Unfortunately, pull requests are not supported (see ZBXNEXT-3236).

Comment by Viktors Tjarve [ 2016 Jun 15 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10622

Comment by Andris Zeila [ 2016 Jun 16 ]

(1) We have the strerror_from_system() function, which should be used to get the system error message rather than formatting it manually.

viktors.tjarve RESOLVED in r60665.

wiper CLOSED

Comment by Andris Zeila [ 2016 Jun 16 ]

(2) While at it, we should also create an error message if the getsockopt() call fails (however unlikely that is). Something like:

				*error = zbx_dsprintf(*error, "Cannot obtain error code: %s.",
						strerror_from_system(WSAGetLastError()));
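
For readers unfamiliar with this allocate-and-format pattern, here is a minimal stand-alone sketch. It assumes that the helper frees the old string and returns a newly allocated one, which is how the snippets in this thread use the return value; `dyn_sprintf` is a hypothetical name and not the Zabbix implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string, then free the old one.
 * dest may be NULL. Returns NULL if formatting or allocation fails. */
char *dyn_sprintf(char *dest, const char *fmt, ...)
{
    va_list args;
    int len;
    char *out = NULL;

    assert(NULL != fmt);

    /* first pass: measure the required length */
    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);

    if (0 <= len && NULL != (out = malloc((size_t)len + 1)))
    {
        /* second pass: write the formatted text */
        va_start(args, fmt);
        vsnprintf(out, (size_t)len + 1, fmt, args);
        va_end(args);
    }

    /* free last, so that dest may safely appear among the arguments */
    free(dest);

    return out;
}
```

With such a helper, a call like `error = dyn_sprintf(error, "Cannot obtain error code: %s.", msg)` replaces the previous message without leaking it.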

viktors.tjarve RESOLVED in r60669.

sasha CLOSED

Comment by Andris Zeila [ 2016 Jun 17 ]

Successfully tested

Comment by Viktors Tjarve [ 2016 Jun 17 ]

Released in:

  • 3.0.4rc1 r60672
  • 3.1.0 r60673