ZABBIX BUGS AND ISSUES / ZBX-19311

zabbix-agent2 smart monitoring fails with megaraid


    • Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022)

      Steps to reproduce:

      1. Deploy and configure zabbix-agent2 on RHEL8
      2. Import the latest template for smart monitoring from git
      3. Create a sudo rule that lets the zabbix user run smartctl
      4. Observe that disk discovery fails

      Result:

      When trying to run the discovery manually with the agent:

      zabbix_agent2 -v -t smart.disk.discovery

       

      (...)
      2021/04/29 12:47:40.125137 [Smart] stopped looking for RAID devices of megaraid type, err:%!(EXTRA *errors.errorString=failed to get disk data from smartctl: Smartctl open device: /dev/bus/0 [megaraid_disk_00] failed: INQUIRY failed)
      (...)
      

       

      Expected (discovery should return the devices that smartctl itself finds):

      /sbin/smartctl --scan
       /dev/sda -d scsi # /dev/sda, SCSI device
       /dev/sdb -d scsi # /dev/sdb, SCSI device
       /dev/sdc -d scsi # /dev/sdc, SCSI device
       /dev/sdd -d scsi # /dev/sdd, SCSI device
       /dev/sde -d scsi # /dev/sde, SCSI device
       /dev/sdf -d scsi # /dev/sdf, SCSI device
       /dev/sdg -d scsi # /dev/sdg, SCSI device
       /dev/sdh -d scsi # /dev/sdh, SCSI device
       /dev/sdi -d scsi # /dev/sdi, SCSI device
       /dev/sdj -d scsi # /dev/sdj, SCSI device
       /dev/sdk -d scsi # /dev/sdk, SCSI device
       /dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
       /dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
       /dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
       /dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
       /dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
       /dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
       /dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
       /dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
       /dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
       /dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
       /dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
       /dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
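
The scan output above follows a regular "<path> -d <type> # <comment>" shape, and the megaraid entries it lists run from 1 to 12, whereas the error in the agent log names megaraid_disk_00. As a rough illustration of how that output could be turned into (path, type) pairs and a list of megaraid IDs, here is a minimal Python sketch. It is not how zabbix-agent2 does it; the names ScanEntry, parse_scan and megaraid_ids are hypothetical, and the sample is a shortened copy of the output above.

```python
import re
from typing import List, NamedTuple


class ScanEntry(NamedTuple):
    path: str      # device path as printed by smartctl, e.g. /dev/bus/0
    dev_type: str  # value that follows -d, e.g. "scsi" or "megaraid,1"


# One line of `smartctl --scan` output: "<path> -d <type> # <comment>"
_SCAN_LINE = re.compile(r"^\s*(\S+)\s+-d\s+(\S+)\s+#")


def parse_scan(output: str) -> List[ScanEntry]:
    """Return every (path, type) pair found in `smartctl --scan` output."""
    entries = []
    for line in output.splitlines():
        match = _SCAN_LINE.match(line)
        if match:  # silently skip lines that do not fit the expected shape
            entries.append(ScanEntry(match.group(1), match.group(2)))
    return entries


def megaraid_ids(entries: List[ScanEntry]) -> List[int]:
    """Extract the numeric disk IDs of all megaraid entries, in scan order."""
    ids = []
    for entry in entries:
        kind, _, number = entry.dev_type.partition(",")
        if kind == "megaraid" and number.isdigit():
            ids.append(int(number))
    return ids


# Shortened sample taken from the scan output in this report.
SAMPLE = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
"""

if __name__ == "__main__":
    found = parse_scan(SAMPLE)
    for entry in found:
        print(entry.path, entry.dev_type)
    print("megaraid IDs:", megaraid_ids(found))
```

Taking the IDs from the scan, rather than assuming they start at 0, would avoid querying a disk that the controller does not report.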

      NOTE: smartctl reports the virtual bus device /dev/bus/0, which does not actually exist in the filesystem. Passing it together with -d megaraid,N nevertheless returns the SMART status:

       

      smartctl -a /dev/bus/0 -d megaraid,1
       smartctl 7.1 2020-04-05 r5049 [x86_64-linux-4.18.0-240.10.1.el8_3.x86_64] (local build)
       Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
      === START OF INFORMATION SECTION ===
       Model Family: Intel S4510/S4610/S4500/S4600 Series SSDs
       Device Model: INTEL SSDSC2KG038T8
       Serial Number: PHYG025201RH3P8EGN
       LU WWN Device Id: 5 5cd2e4 152613993
       Firmware Version: XCV10120
       User Capacity: 3,840,755,982,336 bytes [3.84 TB]
       Sector Sizes: 512 bytes logical, 4096 bytes physical
       Rotation Rate: Solid State Device
       Form Factor: 2.5 inches
       Device is: In smartctl database [for details use: -P show]
       ATA Version is: ACS-3 T13/2161-D revision 5
       SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
       Local Time is: Thu Apr 29 12:49:44 2021 UTC
       SMART support is: Available - device has SMART capability.
       SMART support is: Enabled
      === START OF READ SMART DATA SECTION ===
       SMART Status not supported: ATA return descriptor not supported by controller firmware
       SMART overall-health self-assessment test result: PASSED
       Warning: This result is based on an Attribute check.
      General SMART Values:
       Offline data collection status: (0x00) Offline data collection activity
       was never started.
       Auto Offline Data Collection: Disabled.
       Self-test execution status: ( 0) The previous self-test routine completed
       without error or no self-test has ever 
       been run.
       Total time to complete Offline 
       data collection: ( 0) seconds.
       Offline data collection
       capabilities: (0x79) SMART execute Offline immediate.
       No Auto Offline data collection support.
       Suspend Offline collection upon new
       command.
       Offline surface scan supported.
       Self-test supported.
       Conveyance Self-test supported.
       Selective Self-test supported.
       SMART capabilities: (0x0003) Saves SMART data before entering
       power-saving mode.
       Supports SMART auto save timer.
       Error logging capability: (0x01) Error logging supported.
       General Purpose Logging supported.
       Short self-test routine 
       recommended polling time: ( 1) minutes.
       Extended self-test routine
       recommended polling time: ( 2) minutes.
       Conveyance self-test routine
       recommended polling time: ( 2) minutes.
       SCT capabilities: (0x003d) SCT Status supported.
       SCT Error Recovery Control supported.
       SCT Feature Control supported.
       SCT Data Table supported.
      SMART Attributes Data Structure revision number: 1
       Vendor Specific SMART Attributes with Thresholds:
       ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
         5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       8
         9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2575
        12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
       170 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
       171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       2
       172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
       174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       14
       175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2390 (14 65535)
       183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
       184 End-to-End_Error_Count  0x0033   100   100   090    Pre-fail  Always       -       0
       187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
       190 Drive_Temperature       0x0022   081   075   000    Old_age   Always       -       19 (Min/Max 16/27)
       192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       14
       194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       19
       197 Pending_Sector_Count    0x0012   100   100   000    Old_age   Always       -       0
       199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
       225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3576929
       226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       522
       227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       25
       228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       154396
       232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
       233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
       234 Thermal_Throttle_Status 0x0032   100   100   000    Old_age   Always       -       0/0
       235 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2390 (14 65535)
       241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3576929
       242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       1226992
       243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       7374461
      SMART Error Log Version: 1
       No Errors Logged
      SMART Self-test log structure revision number 1
       No self-tests have been logged. [To run self-tests, use: smartctl -t]
      SMART Selective self-test log data structure revision number 1
       SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
       1 0 0 Not_testing
       2 0 0 Not_testing
       3 0 0 Not_testing
       4 0 0 Not_testing
       5 0 0 Not_testing
       Selective self-test flags (0x0):
       After scanning selected spans, do NOT read-scan remainder of disk.
       If Selective self-test is pending on power-up, resume after 0 minute delay.
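
Each row of the attribute table in the output above has ten whitespace-separated columns, and the last one (RAW_VALUE) may itself contain spaces, as in "19 (Min/Max 16/27)". The following Python sketch shows one possible way to read such a row. It is only an illustration under that assumption; SmartAttribute and parse_attribute_row are hypothetical names, and this is not the parser the agent uses.

```python
from typing import NamedTuple, Optional


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # WORST column
    thresh: int  # THRESH column
    raw: str     # RAW_VALUE kept as text, since it may carry extra detail


def parse_attribute_row(line: str) -> Optional[SmartAttribute]:
    """Parse one row of the 'Vendor Specific SMART Attributes' table.

    Returns None for headers and any other line that is not an attribute row.
    """
    # Nine splits leave the tenth field (RAW_VALUE) intact, spaces included.
    fields = line.split(None, 9)
    if len(fields) != 10 or not fields[0].isdigit():
        return None
    try:
        return SmartAttribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=fields[9].strip(),
        )
    except ValueError:
        # VALUE, WORST or THRESH was not numeric, so this is not an attribute row.
        return None


if __name__ == "__main__":
    sample = "190 Drive_Temperature 0x0022 081 075 000 Old_age Always - 19 (Min/Max 16/27)"
    print(parse_attribute_row(sample))
```

Because the split is on any run of whitespace, the same function should handle both column-aligned and single-space-separated rows.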

       

       

            esneiders Eriks Sneiders
            che666 Rudolf Kastl
            Team INT