[ZBX-8470] On CentOS/RHEL 5 64 bit unload module gets SIGSEGV Created: 2014 Jul 14  Updated: 2017 May 30  Resolved: 2014 Aug 01

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Proxy (P), Server (S)
Affects Version/s: 2.2.3
Fix Version/s: 2.5.0

Type: Incident report Priority: Minor
Reporter: Bruno Alberto Crespo Garcia Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: loadablemodule, patch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS/RHEL 5 64bit


Attachments: File zabbix-module.patch     File zabbix-module.patch     File zabbix_agentd.log     File zabbix_agentd.log    
Issue Links:
Duplicate
is duplicated by ZBX-9374 zabbix agent crash when using modules... Closed

 Description   

The line

*(void **)(&func_uninit) = dlsym(*module, ZBX_MODULE_FUNC_UNINIT);

seems to hit a bug in the C compiler that ships with EL 5 and the process received a SIGSEGV. On the other hand this C construction is not very good practice.

I created an small patch that fixes the problem and must be OK in any UNIX dialect.



 Comments   
Comment by Juris Miščenko (Inactive) [ 2014 Jul 15 ]

Bruno, I couldn't reproduce the behaviour on our RHEL5 machine using the stock example 'dummy' module that ships with our distributions (src/modules/dummy/). If possible, could you provide the source of your module (at least the init and uninit portions of it). Thank you.

Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 15 ]

Agent log on crash

Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 15 ]

I reproduce the issue with the following steps:

  • Pick your zabbix-2.2.4 src rpm from your repository.
  • Build the rpm in a CentOS 5.10 clean installation (important info, rpm -q gcc returns: gcc-4.1.2-54.el5):
    rpm -i zabbix-2.2.4-1.el5.src.rpm
    cd /usr/src/redhat/SPECS
    rpmbuild -bb zabbix.spec
  • Install the packages:
    cd /usr/src/redhat/RPMS/x86_64
    rpm -i rpm -i zabbix-agent-2.2.4-1.x86_64.rpm zabbix-2.2.4-1.x86_64.rpm
  • Compile your dummy module:
    cd /usr/src/redhat/BUILD/zabbix-2.2.4/src/modules/dummy/
    Edit Makefile and add -I /usr/include/libxml2 to the CC flags
    make
    cp dummy.so /usr/local/lib
  • Edit zabbix_agentd.conf and change
    LoadModulePath=/usr/local/lib
    LoadModule=dummy.so
  • Start and stop the service:
    service zabbix-agent start
    ... wait 5 minutes
    service zabbix-agent stop

And the agent crashes. I uploaded the log file for you.

If you are interested I can provide the binary rpm files that my machine generates.

I'm trying to do the same thing with an original rhel 5 system.

Please let me know if you need some other thing.

Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 15 ]

Just to be sure, I created a new virtual machine, installed RHEL 5.9 from DVD, no updates, repeat the process (including rebuilding the rpms in the rhel vm) and the agent crash with your dummy module. I attached the log file.

Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 15 ]

BTW: As I said in the title, the issue appear in 64bit version, but not in 32bit version.

Comment by Aleksandrs Saveljevs [ 2014 Jul 30 ]

According to "man dlsym", the solution proposed in the patch may not be optimal:

cosine = (double (*)(double)) dlsym(handle, "cos");

/* According to the ISO C standard, casting between function
   pointers and 'void *', as done above, produces undefined results.
   POSIX.1-2003 and POSIX.1-2008 accepted this state of affairs and
   proposed the following workaround:

   *(void **) (&cosine) = dlsym(handle, "cos");

   This (clumsy) cast conforms with the ISO C standard and will
   avoid any compiler warnings.

   The 2013 Technical Corrigendum to POSIX.1-2008 (a.k.a.
   POSIX.1-2013) improved matters by requiring that conforming
   implementations support casting 'void *' to a function pointer.
   Nevertheless, some compilers (e.g., gcc with the '-pedantic'
   option) may complain about the cast used in this program. */

I remember doing approximately the following elsewhere and it seemed to work without problems:

typedef union
{
	void	*ptr;
	void	(*func)();
}
ZBX_FUNCTION_POINTER;

...

ZBX_FUNCTION_POINTER	fp;

fp.ptr = dlsym(handle, "cos");

Then, when a function pointer was needed, I used "fp.func". We might or might not wish to try the same approach here.

sasha

HP-UX man page:

void *handle;
int i, *iptr;
int (*fptr)(int);

/* open the needed object */
handle = dlopen("/usr/mydir/mylib.sl", RTLD_LAZY);

/* find address of function and data objects */
fptr = (int (*)(int))dlsym(handle, "some_function");

iptr = (int *)dlsym(handle, "int_object");

...

Solaris man page:

int       *iptr, (*fptr)(int);

/* open the needed object */
handle = dlopen("/usr/home/me/libfoo.so.1", RTLD_LAZY);

/* find the address of function and data objects */
fptr = (int (*)(int))dlsym(handle, "my_function");
iptr = (int *)dlsym(handle, "my_object");

...
Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 30 ]

Sorry, I didn't know the POSIX workaround, anyway the compiler in RHEL/CentOS has a problem, another simple fix that I checked and seems to work is:

typedef int             (*int_func_ptr)();

void    unload_modules()
{
        const char      *__function_name = "unload_modules";

        volatile int_func_ptr   func_uninit;
        void                    **module;

        zabbix_log(LOG_LEVEL_DEBUG, "In %s()", __function_name);

        /* there is no registered modules */
        if (NULL == modules)
                return;

        for (module = modules; NULL != *module; module++)
        {
                *(void **)(&func_uninit) = dlsym(*module, ZBX_MODULE_FUNC_UNINIT);
                if (NULL == func_uninit)
                {
....

I attach a complete patch....

Comment by Bruno Alberto Crespo Garcia [ 2014 Jul 30 ]

Alexander,

I used to think the same that the HP-UX and Solaris manuals, but it seems that Aleksandrs Saveljevs is right, the manual page from the OpenGroup:

void    *handle;
int     *iptr, (*fptr)(int);


/* open the needed object */
handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);


/* find the address of function and data objects */
*(void **)(&fptr) = dlsym(handle, "my_function");
iptr = (int *)dlsym(handle, "my_object");


/* invoke function, passing value of integer as a parameter */
(*fptr)(*iptr);


Comment by Aleksandrs Saveljevs [ 2014 Jul 31 ]

There are two descriptions of dlsym() on OpenGroup's website. The one referred to by Bruno above is the first one:

  1. http://pubs.opengroup.org/onlinepubs/009695399/functions/dlsym.html
  2. http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html

The first (older, issue 6) suggests the *(void **)(&fptr) way and the second (newer, issue 7) suggests the (int (*)(int))dlsym(handle, "my_function") way.

My workaround with union is described at http://en.wikipedia.org/wiki/Dynamic_loading#Converting_Extracted_Library_Contents . However, http://www.trilithium.com/johan/2004/12/problem-with-dlsym/ suggests that according to the C standard this is still undefined behavior.

The latter link also suggests two other workarounds: casting through an integral type (like (int (*)(int))(intptr_t)dlsym()) and memcpy() of void pointer to function pointer, which seems to work in practice, but still there are no guarantees.

So there are two standards here that contradict each other: C standard and POSIX. The C standard does not define a conversion of a function pointer to "void *", because function pointers and object pointer are not required to have the same size. The latest version of POSIX, on the other hand, says that function pointers have to be convertible to "void *".

Consequently, there is no way to achieve what we need according to the C standard, so we have to aim at POSIX. In other words, there is no theoretically correct solution, so we have to choose a practical solution - the one that works in practice.

It seems suggested by all the pages above to use the (int (*)(int))dlsym(handle, "my_function") style and this will be POSIX-compliant. I also took a look at other software (mostly Firefox, a bit in BIND and Net-SNMP) and all use that style, too. So it might be that we should change to this style as well.

Comment by Juris Miščenko (Inactive) [ 2014 Aug 01 ]

Fix implemented in svn://svn.zabbix.com/branches/dev/ZBX-8470

Comment by Andris Zeila [ 2014 Nov 03 ]

Successfully tested

Comment by Juris Miščenko (Inactive) [ 2014 Nov 03 ]

Fix merged in 2.5.0 (trunk) r50361.

Generated at Tue Apr 23 18:37:04 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.