[ZBX-19311] zabbix-agent2 smart monitoring fails with megaraid Created: 2021 Apr 29  Updated: 2024 Apr 10  Resolved: 2022 Mar 25

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent2 plugin (G)
Affects Version/s: 5.2.6
Fix Version/s: 5.0.22rc1, 6.0.3rc1, 6.2.0alpha1, 6.2 (plan)

Type: Problem report Priority: Trivial
Reporter: Rudolf Kastl Assignee: Eriks Sneiders
Resolution: Fixed Votes: 0
Labels: Agent2, Megaraid, SMART
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

RHEL 8


Issue Links:
Duplicate
Team: Team INT
Sprint: Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022)
Story Points: 1

 Description   

Steps to reproduce:

  1. Deploy and configure zabbix-agent2 on RHEL 8.
  2. Import the latest SMART monitoring template from git.
  3. Create a sudo rule that allows the zabbix user to run smartctl.
  4. Observe that disk discovery fails.

Result:

When trying to run the discovery manually with the agent:

zabbix_agent2 -v -t smart.disk.discovery

(...)
2021/04/29 12:47:40.125137 [Smart] stopped looking for RAID devices of megaraid type, err:%!(EXTRA *errors.errorString=failed to get disk data from smartctl: Smartctl open device: /dev/bus/0 [megaraid_disk_00] failed: INQUIRY failed)
(...)

Expected:

Discovery returns every device that smartctl itself reports:

/sbin/smartctl --scan
 /dev/sda -d scsi # /dev/sda, SCSI device
 /dev/sdb -d scsi # /dev/sdb, SCSI device
 /dev/sdc -d scsi # /dev/sdc, SCSI device
 /dev/sdd -d scsi # /dev/sdd, SCSI device
 /dev/sde -d scsi # /dev/sde, SCSI device
 /dev/sdf -d scsi # /dev/sdf, SCSI device
 /dev/sdg -d scsi # /dev/sdg, SCSI device
 /dev/sdh -d scsi # /dev/sdh, SCSI device
 /dev/sdi -d scsi # /dev/sdi, SCSI device
 /dev/sdj -d scsi # /dev/sdj, SCSI device
 /dev/sdk -d scsi # /dev/sdk, SCSI device
 /dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
 /dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
 /dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
 /dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
 /dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
 /dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
 /dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
 /dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
 /dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
 /dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
 /dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
 /dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
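
Each line of the scan output above pairs a device path with a -d type, and both are needed to address a disk behind the controller. As a rough illustration only (this is not the agent's actual implementation), the sketch below parses such lines and builds the matching smartctl command line. The helper names parse_scan and smartctl_argv are hypothetical.

```python
import re

# Matches one line of `smartctl --scan` output, e.g.
#   /dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
SCAN_LINE = re.compile(r"^\s*(?P<name>\S+)\s+-d\s+(?P<type>\S+)")


def parse_scan(output):
    """Return a list of (device path, device type) pairs (hypothetical helper)."""
    devices = []
    for line in output.splitlines():
        match = SCAN_LINE.match(line)
        if match:
            devices.append((match.group("name"), match.group("type")))
    return devices


def smartctl_argv(name, dev_type):
    """Build the argument list for querying one disk (hypothetical helper)."""
    return ["smartctl", "-a", name, "-d", dev_type]


# Three lines taken from the scan output in this report.
sample = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
"""

for name, dev_type in parse_scan(sample):
    print(" ".join(smartctl_argv(name, dev_type)))
```

Because /dev/bus/0 appears once per physical disk, the device path alone is not a unique key; the (path, type) pair is.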

NOTE: smartctl reports the virtual bus device /dev/bus/0, which does not exist in the filesystem. Combined with the -d megaraid,N option, it can still be used to query the SMART status of each disk:

smartctl -a /dev/bus/0 -d megaraid,1
 smartctl 7.1 2020-04-05 r5049 [x86_64-linux-4.18.0-240.10.1.el8_3.x86_64] (local build)
 Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
 Model Family: Intel S4510/S4610/S4500/S4600 Series SSDs
 Device Model: INTEL SSDSC2KG038T8
 Serial Number: PHYG025201RH3P8EGN
 LU WWN Device Id: 5 5cd2e4 152613993
 Firmware Version: XCV10120
 User Capacity: 3,840,755,982,336 bytes [3.84 TB]
 Sector Sizes: 512 bytes logical, 4096 bytes physical
 Rotation Rate: Solid State Device
 Form Factor: 2.5 inches
 Device is: In smartctl database [for details use: -P show]
 ATA Version is: ACS-3 T13/2161-D revision 5
 SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
 Local Time is: Thu Apr 29 12:49:44 2021 UTC
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
 SMART Status not supported: ATA return descriptor not supported by controller firmware
 SMART overall-health self-assessment test result: PASSED
 Warning: This result is based on an Attribute check.
General SMART Values:
 Offline data collection status: (0x00) Offline data collection activity
 was never started.
 Auto Offline Data Collection: Disabled.
 Self-test execution status: ( 0) The previous self-test routine completed
 without error or no self-test has ever 
 been run.
 Total time to complete Offline 
 data collection: ( 0) seconds.
 Offline data collection
 capabilities: (0x79) SMART execute Offline immediate.
 No Auto Offline data collection support.
 Suspend Offline collection upon new
 command.
 Offline surface scan supported.
 Self-test supported.
 Conveyance Self-test supported.
 Selective Self-test supported.
 SMART capabilities: (0x0003) Saves SMART data before entering
 power-saving mode.
 Supports SMART auto save timer.
 Error logging capability: (0x01) Error logging supported.
 General Purpose Logging supported.
 Short self-test routine 
 recommended polling time: ( 1) minutes.
 Extended self-test routine
 recommended polling time: ( 2) minutes.
 Conveyance self-test routine
 recommended polling time: ( 2) minutes.
 SCT capabilities: (0x003d) SCT Status supported.
 SCT Error Recovery Control supported.
 SCT Feature Control supported.
 SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
 5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 8
 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 2575
 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 14
 170 Available_Reservd_Space 0x0033 099 099 010 Pre-fail Always - 0
 171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 2
 172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
 174 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 14
 175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 2390 (14 65535)
 183 SATA_Downshift_Count 0x0032 100 100 000 Old_age Always - 0
 184 End-to-End_Error_Count 0x0033 100 100 090 Pre-fail Always - 0
 187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
 190 Drive_Temperature 0x0022 081 075 000 Old_age Always - 19 (Min/Max 16/27)
 192 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 14
 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 19
 197 Pending_Sector_Count 0x0012 100 100 000 Old_age Always - 0
 199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
 225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 3576929
 226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 522
 227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 25
 228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 154396
 232 Available_Reservd_Space 0x0033 099 099 010 Pre-fail Always - 0
 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
 234 Thermal_Throttle_Status 0x0032 100 100 000 Old_age Always - 0/0
 235 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 2390 (14 65535)
 241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 3576929
 242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 1226992
 243 NAND_Writes_32MiB 0x0032 100 100 000 Old_age Always - 7374461
SMART Error Log Version: 1
 No Errors Logged
SMART Self-test log structure revision number 1
 No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
 1 0 0 Not_testing
 2 0 0 Not_testing
 3 0 0 Not_testing
 4 0 0 Not_testing
 5 0 0 Not_testing
 Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
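
For checking by hand that the attribute table above is readable, a minimal sketch along the following lines could turn its rows into a dictionary. It assumes the whitespace-separated, ten-column layout shown in this report, parse_attributes is a hypothetical helper, and it is not meant to reflect how the plugin itself reads the data.

```python
def parse_attributes(output):
    """Collect SMART attribute rows keyed by attribute ID (hypothetical helper)."""
    attributes = {}
    for line in output.splitlines():
        # Ten columns; the raw value comes last and may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attributes[int(fields[0])] = {
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9].strip(),
            }
    return attributes


# Header plus three rows copied from the smartctl output in this report.
sample = """\
 ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
 5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 8
 175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 2390 (14 65535)
 190 Drive_Temperature 0x0022 081 075 000 Old_age Always - 19 (Min/Max 16/27)
"""

for attr_id, attr in sorted(parse_attributes(sample).items()):
    print(attr_id, attr["name"], attr["raw"])
```

Limiting the split to nine separators keeps multi-word raw values such as "2390 (14 65535)" in one field.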


 Comments   
Comment by Rudolf Kastl [ 2022 Jan 20 ]

Update: Still happens on 5.4.9

Comment by Eriks Sneiders [ 2022 Mar 17 ]

Fixed in:
