[ZBX-5243] AIX procfs parsing is reading truncated information about cmdline using external process Created: 2012 Jun 26  Updated: 2020 Jan 02  Resolved: 2014 May 08

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 1.8.11, 1.8.13, 2.0.0
Fix Version/s: 2.3.0

Type: Incident report Priority: Minor
Reporter: Boris Manojlovic Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: aix, items, patch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

AIX 6.1.0.0


Attachments: File zabbix-2.2.3-removed-procfs-new.patch     File zabbix-2.2.3-removed-procfs.patch     File zabbix-2.x-TRUNK-r44251-removed-procfs-new.patch     File zabbix-2.x-TRUNK-r44251-removed-procfs.patch     File zabbix-removed-procfs-1.8.13.patch     File zabbix-removed-procfs-2.0.0.patch     File zabbix-removed-procfs-2.0.2.patch    
Issue Links:
Duplicate
is duplicated by ZBX-6670 Item proc.num can not be used on AIX ... Closed
is duplicated by ZBX-8581 proc.num cmdline option doesn't check... Closed
Sub-task
depends on ZBX-17134 AIX procfs parsing is reading truncat... Closed

 Description   

procfs on AIX system truncates information about process command line to 80 characters, this patch removes this limitation with use of
correct system API giving full command line arguments length (up to zabbix own adjustable limit MAX_STRING_LEN)
Patches remove all support for profcs as using API is arguably faster and contained (no opening of procfs files or running ps program to get information as it
is currently done)



 Comments   
Comment by Boris Manojlovic [ 2012 Sep 11 ]

By mistake patch is not complete, as i had issues with creating patch that would be possible to apply on AIX, i had to copy package and manually amend source i have forgot to add definition of proccommargs in PROC_MEM function and compile would fail. corrected patch for 2.0.2 attached

Comment by Boris Manojlovic [ 2012 Sep 11 ]

amended patch

Comment by Boris Manojlovic [ 2014 Apr 10 ]

updated patch to apply cleanly

Comment by Boris Manojlovic [ 2014 Apr 10 ]

updated patch for latest trunk revision
r44251

Comment by Boris Manojlovic [ 2014 Apr 10 ]

here is example run (2 processes are because it sees itself (cmdline of proc.num part) and real process)

root@ttabih09:/ # ps auxwwwwwww|grep -i com.ibm.lwi.LaunchLWI
root 6750424 0.0 6.0 76504 60796 - A Mar 31 4:56 /var/opt/tivoli/ep/_jvm/jre/bin/java -Xmx384m -Xminf0.01 -Xmaxf0.4 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Xbootclasspath/a:/var/opt/tivoli/ep/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.3.20110824-0615/rcpbootcp.jar:/var/opt/tivoli/ep/lib/com.ibm.logging.icl_1.1.1.jar:/var/opt/tivoli/ep/lib/jaas2zos.jar:/var/opt/tivoli/ep/lib/jaasmodule.jar:/var/opt/tivoli/ep/lib/lwidiag.jar:/var/opt/tivoli/ep/lib/lwinative.jar:/var/opt/tivoli/ep/lib/lwinl.jar:/var/opt/tivoli/ep/lib/lwirolemap.jar:/var/opt/tivoli/ep/lib/lwisecurity.jar:/var/opt/tivoli/ep/lib/lwitools.jar:/var/opt/tivoli/ep/lib/passutils.jar:../../runtime/agent/lib/cas-bootcp.jar -Xverify:none -cp eclipse/launch.jar:eclipse/startup.jar:/var/opt/tivoli/ep/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.3.20110824-0615/launcher.jar com.ibm.lwi.LaunchLWI
root@ttabih09:/ # /usr/local/sbin/zabbix_agent -t "proc.num[,,,LaunchLWI]"
proc.num[,,,LaunchLWI] [u|2]
root@ttabih09:/ #

Comment by Aleksandrs Saveljevs [ 2014 Apr 11 ]

As mentioned in the linked ZBX-6670, "psinfo.pr_fname" (process name) is also limited to 15 characters.

For more details on "struct psinfo", see http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.files/doc/aixfiles/proc.htm :

char     pr_fname[PRFNSZ];          /* last component of exec()ed pathname*/
char     pr_psargs[PRARGSZ];        /* initial characters of arg list */

Here, PRFNSZ is 16 and PRARGSZ is 80.

In "struct procsinfo", the definition for the program name field is as follows:

char     pi_comm[MAXCOMLEN+1];      /* (truncated) program name */

Here, MAXCOMLEN is 32, which is better.

Comment by Aleksandrs Saveljevs [ 2014 Apr 11 ]

While investigating this issue, I noticed one thing that might or might not be a problem.

In /usr/include/procinfo.h, there is a definition of several macros and "struct procsinfo":

#define SSLEEP          1
#define SRUN            3

...

struct procsinfo
{
        ...
        uint            pi_state;       /* process state -- from proc.h */
        ...
}

In /usr/include/sys/proc.h, there are the following macros:

#define SNONE           0               /* slot is available    */
#define SIDL            4               /* process is created   */
#define SZOMB           5               /* process is dying     */
#define SSTOP           6               /* process is stopped   */
#define SACTIVE         7               /* process is active    */
#define SSWAP           8               /* process is swapped   */

So "pi_state" seems to refer to macros in proc.h and it is unclear whether "pi_state" can ever be SRUN or SSLEEP, which we use in our code.

At least the item always returns zero:

$ ./zabbix_agentd -t proc.num[]    
proc.num[]                                    [u|60]
$ ./zabbix_agentd -t proc.num[,,run]
proc.num[,,run]                               [u|0]
$ ./zabbix_agentd -t proc.num[,,sleep]
proc.num[,,sleep]                             [u|0]

To be investigated.

asaveljevs Comment starting at line 32 at http://fossies.org/linux/misc/old/monitor-2.1.9.tar.gz:a/monitor-2.1.9/print_top.c might suggest that SSLEEP and SRUN are no longer used since AIX 4.

Comment by Boris Manojlovic [ 2014 Apr 13 ]

patch against 2.2.3
(trunk patch does not apply cleanly because of change of struct)

Comment by Boris Manojlovic [ 2014 Apr 14 ]

i have amended my patch for getprocs handling and now it uses 64bit procsinfo64 struct which inside has pi_cpu member which when is bigger than zero can show if process has been working / running or it is idle with "0" cpu ticks
run:

return (procsinfo->pi_state == SACTIVE && procsinfo->pi_cpu > 0) ? SUCCEED : FAIL

sleep

return (procsinfo->pi_state == SACTIVE && procsinfo->pi_cpu == 0) ? SUCCEED : FAIL;

and as you have stated pi_comm is limited to 32 chars but that i think should not be limiting factor in any case

asaveljevs Are you sure procsinfo->pi_cpu gets reset when a process is put to sleep? Isn't that field an always increasing counter?

steki Yes it is always reset (checked with debug printfs in patch)

Comment by Boris Manojlovic [ 2014 Apr 14 ]

corrected handling of num.proc[,,(run|sleep)]

Comment by Aleksandrs Saveljevs [ 2014 Apr 14 ]

Boris, your patch is to be implemented in 2.4 (trunk), so unless you have a reason to develop for 2.2 for your own purposes, you might wish to prepare patches for trunk.

In any case, development is already going on in development branch svn://svn.zabbix.com/branches/dev/ZBX-5243 . So instead of preparing patches, you might just wish to follow that branch and add your suggestions as comments here.

Comment by Boris Manojlovic [ 2014 Apr 14 ]

trunk version of patch with corrected handling of proc.num[,,(run|sleep)]

Comment by Aleksandrs Saveljevs [ 2014 Apr 14 ]

As (hopefully) an improvement to Boris' suggestion, I have replaced getprocs() with getprocs64() and "struct procsinfo" with "struct procentry64".

According to http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/getprocs.htm , it should be safe:

In AIX 5.1 and later, 64-bit applications are required to use getprocs64() and procentry64. Note that struct procentry64 contains the same information as struct procsinfo64, with the addition of support for the 64-bit time_t and dev_t, and the 256-bit sigset_t. The procentry64 structure also contains a new version of struct ucred (struct ucred_ext) and a new, expanded struct rusage (struct trusage64) as described in <sys/cred.h> and <sys/resource.h> respectively. Application developers are also encouraged to use getprocs64() in 32-bit applications to obtain 64-bit process information as this interface provides the new, larger types. The getprocs() interface will still be supported for 32-bit applications using struct procsinfo or struct procsinfo64 but will not be available to 64-bit applications.

Comment by Aleksandrs Saveljevs [ 2014 Apr 14 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-5243 . Work is based on Boris' patch. Thank you!

Comment by Boris Manojlovic [ 2014 Apr 14 ]
diff -u  proc.c~ proc.c
--- proc.c~     2014-04-14 12:23:21.000000000 +0200
+++ proc.c      2014-04-14 14:52:53.000000000 +0200
@@ -34,9 +34,9 @@
        switch (zbx_proc_stat)
        {
                case ZBX_PROC_STAT_RUN:
-                       return SRUN == procentry->pi_state ? SUCCEED : FAIL;
+                       return ( SACTIVE == procentry->pi_state && procentry->pi_cpu > 0) ? SUCCEED : FAIL;
                case ZBX_PROC_STAT_SLEEP:
-                       return SSLEEP == procentry->pi_state ? SUCCEED : FAIL;
+                       return ( SACTIVE == procentry->pi_state && procentry->pi_cpu == 0) ? SUCCEED : FAIL;
                case ZBX_PROC_STAT_ZOMB:
                        return SZOMB == procentry->pi_state ? SUCCEED : FAIL;
        }

example run:

root@aixtest:/WORK/zabbix/ZBX-5243 # /opt/z/sbin/zabbix_agent -t 'proc.num[,,all]' && /opt/z/sbin/zabbix_agent -t 'proc.num[,,sleep]' && /opt/z/sbin/zabbix_agent -t 'proc.num[,,run]'
proc.num[,,all]                               [u|96]
proc.num[,,sleep]                             [u|92]
proc.num[,,run]                               [u|4]
root@aixtest:/WORK/zabbix/ZBX-5243 # 
Comment by dimir [ 2014 Apr 25 ]

Successfully tested. Please review my changes in r44770, r44784.

Comment by Aleksandrs Saveljevs [ 2014 Apr 25 ]

Changes in r44770 and r44784 look good, thank you!

Comment by Aleksandrs Saveljevs [ 2014 Apr 25 ]

Fixed in pre-2.3.0 (trunk) r44790.

Comment by Aleksandrs Saveljevs [ 2014 May 08 ]

(1) This task broke compilation on Solaris, because <sys/procfs.h> is no longer included. Because the check for HAVE_SYS_PROCFS_H was only present in AIX code, we thought it was only needed there. Apparently, it is needed on Solaris, too.

asaveljevs Previously, we did a check for <sys/procfs.h> in configure.ac and included <sys/procfs.h> as follows:

#ifdef HAVE_SYS_PROCFS_H
/* This is needed to access the correct procfs.h definitions */
#	define _STRUCTURED_PROC 1
#	include <sys/procfs.h>
#endif

In <sys/procfs.h> it says:

/*
 * This definition is temporary.  Structured proc is the preferred API,
 * and the older ioctl-based interface will be removed in a future version
 * of Solaris.  Until then, by default, including <sys/procfs.h> will
 * provide the older ioctl-based /proc definitions.  To get the structured
 * /proc definitions, either include <procfs.h> or define _STRUCTURED_PROC
 * to be 1 before including <sys/procfs.h>.
 */
#ifndef _STRUCTURED_PROC
#define _STRUCTURED_PROC        0
#endif

Based on the above and the fact that we only need the header in Solaris now, we now simply include <procfs.h> in Solaris' proc.c.

asaveljevs Tested on Solaris 8 and Solaris 10. RESOLVED.

Comment by dimir [ 2014 May 09 ]

Successfully tested.

Comment by Aleksandrs Saveljevs [ 2014 May 09 ]

Fixed in pre-2.3.0 (trunk) r45231.

Comment by Aleksandrs Saveljevs [ 2015 Apr 09 ]

Boris' approach to distinguishing running and sleeping processes is considered in ZBX-9385, which is a regression from this task.

Generated at Fri Apr 26 12:50:02 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.