ZABBIX BUGS AND ISSUES
  ZBX-6047

system.boottime and system.uptime broken on Solaris 10/11 zones

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.4
    • Fix Version/s: 2.2.9rc1, 2.4.4rc1, 2.5.0
    • Component/s: Agent (G)
    • Labels:
    • Environment:
      Solaris 11 non-global zone on Sun Fire X4270

      Description

      system.boottime on a Solaris 11 zone returns a 1970 date/time instead of the correct boot time. In the global zone it returns the correct time.

      1. var-adm-utmpx.patch
        3 kB
        Aleksandrs Saveljevs
      2. uptime.png
        3 kB

        Activity

        richlv added a comment - - edited

        confirming with 2.2.1rc1 agent on SunOS 5.11 11.1 sun4v

        richlv added a comment - - edited

        probably closely related - system.uptime also shows quite... impossible numbers

        Alexey Pustovalov added a comment -

        I have reproduced a slightly different issue, but similar to the one described:

        root@appserv:~/zabbix-2.2.2# uptime 
          1:01am  up 622 day(s), 2 hr(s),  1 user,  load average: 0.01, 0.08, 0.10
        root@appserv:~/zabbix-2.2.2# ./src/zabbix_get/zabbix_get -s localhost -k system.boottime
        26975
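
To make the symptom concrete: read as a Unix timestamp, the value 26975 returned above falls on the first day of 1970, which matches the "1970 date/time" in the report. A minimal C sketch of that conversion follows; the helper name format_utc is hypothetical and not part of the Zabbix source.

```c
#include <stdio.h>
#include <time.h>

/* Format a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
 * Returns 0 on success, -1 on failure.
 * NOTE: format_utc is an illustrative helper, not a Zabbix function. */
int format_utc(time_t ts, char *buf, size_t len)
{
    struct tm *tm = gmtime(&ts);

    if (NULL == tm)
        return -1;

    /* strftime() returns 0 when the buffer is too small */
    return 0 == strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) ? -1 : 0;
}
```

Calling format_utc(26975, buf, sizeof(buf)) yields "1970-01-01 07:29:35", so the agent was reporting a boot time roughly seven and a half hours after the epoch rather than one about 622 days in the past.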
        
        Alexey Pustovalov added a comment -

        Is it possible to check the latest Zabbix agent version from the 2.2 branch? The issue has been solved there.

        Andrew Howell added a comment -

        I've checked again on 2.2.2 and it still has the same issue with system.boottime and system.uptime

        Aleksandrs Saveljevs added a comment -

        Currently, in our test environment, I have not observed "system.boottime" showing a small value or "system.uptime" showing a big value, but I have observed that "system.boottime" and "system.uptime" show the same numbers in the non-global zone as they do in the global zone.

        Here is a bit of research into the possible ways to fix the problem. The present implementation uses the "boot_time" kstat counter:

        $ kstat -p -s '*boot*'                                                                                    
        unix:0:system_misc:boot_time    1417777382
        

        If that counter is obtained in the non-global zone, it still returns the value for the global zone.

        One idea that appeared in the process was to look at system calls that system utilities perform. For instance, here is the output of "psrinfo":

        $ /usr/sbin/psrinfo
        0       on-line   since 12/05/2014 13:03:02
        

        Running "truss" on it shows that it reads the "cpu_info0" kstat counter:

        ...
        time()                                          = 1418141200
        ioctl(3, KSTAT_IOC_READ, "cpu_info0")           = 8068
        ...
        

        Its value is:

        $ kstat -p -n cpu_info0 | grep 141
        cpu_info:0:cpu_info0:state_begin        1417777382
        

        That would probably be another way to get system boot time, but that still returns the same boot time as the global zone.

        Code in http://fossies.org/linux/monit/src/process/sysdep_SOLARIS.c also gave a hint that kstat might not be the way to go for non-global zones.

        A more likely candidate for deeper inspection is the "uptime" utility:

        $ uptime
          6:21pm  up 36 min(s),  1 user,  load average: 0,00, 0,00, 0,02
        

        Running "truss" on it shows that it reads "/var/adm/utmpx" using getutxent() and related library functions; see http://docs.oracle.com/cd/E19109-01/tsolaris8/817-0882/6mglcr99g/index.html for their documentation. This idea is further confirmed in http://compgroups.net/comp.unix.solaris/source-of-boottime-for-uptime-other-than-ut/41559 . (That discussion also suggests calling stat() on /proc/0 and looking at its atime/mtime/ctime to get the system boot time.)

        An implementation based on "/var/adm/utmpx" is currently available in the development branch svn://svn.zabbix.com/branches/dev/ZBX-6047 . The patch that implements the change is attached as "var-adm-utmpx.patch". It would be nice if you could test whether that patch also solves the reported problem.

        On our test system with the "/var/adm/utmpx" approach the agent reports good values:

        $ ./zabbix_agentd -t system.boottime
        system.boottime                               [u|1418139894]
        $ ./zabbix_agentd -t system.uptime                                                                        
        system.uptime                                 [u|2171]
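
The "/var/adm/utmpx" approach described in this comment can be sketched roughly as follows. This is not the attached patch, only a minimal illustration under stated assumptions: it uses the standard getutxent() family from <utmpx.h>, takes the first record of type BOOT_TIME, and derives uptime as "now minus boot time". The function names boottime_from_utmpx and uptime_from_boottime are hypothetical.

```c
#include <stddef.h>
#include <time.h>
#include <utmpx.h>

/* Scan the utmpx database for the first BOOT_TIME record.
 * Returns 0 and stores the boot timestamp in *boottime,
 * or -1 if no such record exists.
 * NOTE: illustrative helper, not the code from var-adm-utmpx.patch. */
int boottime_from_utmpx(time_t *boottime)
{
    struct utmpx *ut;
    int ret = -1;

    setutxent();    /* rewind to the start of the database */

    while (NULL != (ut = getutxent()))
    {
        if (BOOT_TIME == ut->ut_type)
        {
            *boottime = (time_t)ut->ut_tv.tv_sec;
            ret = 0;
            break;
        }
    }

    endutxent();    /* close the database */

    return ret;
}

/* Assumption: uptime in seconds is the distance between "now" and boot time. */
long uptime_from_boottime(time_t now, time_t boottime)
{
    return (long)(now - boottime);
}
```

If uptime is indeed computed this way, the values reported above (boot time 1418139894, uptime 2171) imply the agent was queried at about 1418142065, shortly after the time() value of 1418141200 seen in the earlier truss output.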
        
        Oleg Ivanivskyi added a comment -

        Looks like it's corrected:

        IN ZONE
        -----------
        uptime
        10:24am up 21 day(s), 10:17, 1 user, load average: 5.22, 4.71, 4.02
        who -r
        . run-level 3 Nov 21 00:07 3 0 S
        Zabbix information :
        Host uptime (in sec)	2014-12-12 10:21:48	21 days, 10:14:35	-15903 days, 12:03:45
        Host boot time	2014-12-12 10:21:48	2014-11-21 00:07:13	+15903 days, 12:04:28
        IN GLOBAL
        ---------------
        uptime
        10:26am up 21 day(s), 11:58, 4 users, load average: 5.29, 4.83, 4.13
        who -r
        . run-level 3 Nov 20 22:35 3 0 S
        Zabbix information :
        Host uptime (in sec)	2014-12-12 10:18:57	21 days, 11:51:42
        Host boot time	2014-12-12 10:18:57	2014-11-20 22:27:15	-
        Aleksandrs Saveljevs added a comment -

        Implementation details remain to be discussed, but otherwise it is "Resolved".

        Andris Zeila added a comment -

        Successfully tested

        Aleksandrs Saveljevs added a comment - - edited

        (1) The agent does not compile on Solaris 8:

        boottime.c:23:18: zone.h: No such file or directory
        boottime.c: In function `SYSTEM_BOOTTIME':
        boottime.c:30: error: `GLOBAL_ZONEID' undeclared (first use in this function)
        boottime.c:30: error: (Each undeclared identifier is reported only once
        boottime.c:30: error: for each function it appears in.)
        

        Aleksandrs Saveljevs: According to http://en.wikipedia.org/wiki/Solaris_Containers , zones are available since Solaris 10. Therefore, a check for zone.h during configuration was added in r51408. RESOLVED.

        Andris Zeila: CLOSED
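
A configure-time check for zone.h usually pairs with a preprocessor guard in the source, so that systems without zones (such as Solaris 8) still compile. The sketch below shows one way this could look; it is not the code from r51408. The macro name HAVE_ZONE_H is an assumption based on the usual autoconf naming convention, and in_global_zone is a hypothetical helper.

```c
/* HAVE_ZONE_H is assumed to be defined by configure when <zone.h> exists
 * (conventional autoconf naming; the actual macro in Zabbix may differ). */
#ifdef HAVE_ZONE_H
#include <zone.h>
#endif

/* Returns 1 when running in the global zone (or on a system without
 * zone support, where there is effectively only one zone), 0 otherwise.
 * NOTE: in_global_zone is an illustrative helper, not a Zabbix function. */
int in_global_zone(void)
{
#ifdef HAVE_ZONE_H
    return GLOBAL_ZONEID == getzoneid();
#else
    return 1;
#endif
}
```

With a guard like this, the kstat "boot_time" counter could presumably still be used in the global zone, while non-global zones fall back to "/var/adm/utmpx".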

        Aleksandrs Saveljevs added a comment -

        A couple of topics were discussed with Andris Zeila:

        • Currently, the first entry of type BOOT_TIME is obtained from /var/adm/utmpx. There was a question whether there can be multiple such entries. It was decided that we can assume that /var/adm/utmpx only contains one such entry and the historical information is kept in /var/adm/wtmpx.
        • The suggestion "to stat() /proc/0 and look at its atime/mtime/ctime to get system boot time" mentioned above was deemed too hackish to be implemented as a fallback solution.
        Aleksandrs Saveljevs added a comment - - edited

        (2) Code in src/libs/zbxsysinfo/solaris/boottime.c in 2.4 was very different from 2.2 due to ZBXNEXT-2203. Resolved conflicts in svn://svn.zabbix.com/branches/dev/ZBX-6047-2.4. Please take a look.

        Andris Zeila: CLOSED

        Andris Zeila added a comment -

        Successfully tested

        Aleksandrs Saveljevs added a comment -

        Fixed in pre-2.2.9 r51419, pre-2.4.4 r51420, and pre-2.5.0 (trunk) r51421.


          People

          • Assignee: Unassigned
          • Reporter: Andrew Howell
          • Votes: 1
          • Watchers: 6