[ZBX-19311] zabbix-agent2 smart monitoring fails with megaraid Created: 2021 Apr 29 Updated: 2024 Apr 10 Resolved: 2022 Mar 25 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent2 plugin (G) |
Affects Version/s: | 5.2.6 |
Fix Version/s: | 5.0.22rc1, 6.0.3rc1, 6.2.0alpha1, 6.2 (plan) |
Type: | Problem report | Priority: | Trivial |
Reporter: | Rudolf Kastl | Assignee: | Eriks Sneiders |
Resolution: | Fixed | Votes: | 0 |
Labels: | Agent2, Megaraid, SMART | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
RHEL 8 |
Issue Links: |
Team: | |||||
Sprint: | Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022) | ||||
Story Points: | 1 |
Description |
Steps to reproduce:
Run the discovery manually with the agent: zabbix_agent2 -v -t smart.disk.discovery
Result:
(...) 2021/04/29 12:47:40.125137 [Smart] stopped looking for RAID devices of megaraid type, err:%!(EXTRA *errors.errorString=failed to get disk data from smartctl: Smartctl open device: /dev/bus/0 [megaraid_disk_00] failed: INQUIRY failed) (...)
Expected:
/sbin/smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/sdh -d scsi # /dev/sdh, SCSI device
/dev/sdi -d scsi # /dev/sdi, SCSI device
/dev/sdj -d scsi # /dev/sdj, SCSI device
/dev/sdk -d scsi # /dev/sdk, SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
NOTE: smartctl uses and outputs that virtual bus device that does not really exist in the filesystem, but this way you are able to return the smart status:
smartctl -a /dev/bus/0 -d megaraid,1
smartctl 7.1 2020-04-05 r5049 [x86_64-linux-4.18.0-240.10.1.el8_3.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel S4510/S4610/S4500/S4600 Series SSDs
Device Model:     INTEL SSDSC2KG038T8
Serial Number:    PHYG025201RH3P8EGN
LU WWN Device Id: 5 5cd2e4 152613993
Firmware Version: XCV10120
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 29 12:49:44 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status: (0x00)
    Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x79)
    SMART execute Offline immediate.
    No Auto Offline data collection support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 2) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d)
    SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032 100   100   000    Old_age  Always  -           8
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           2575
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           14
170 Available_Reservd_Space 0x0033 099   099   010    Pre-fail Always  -           0
171 Program_Fail_Count      0x0032 100   100   000    Old_age  Always  -           2
172 Erase_Fail_Count        0x0032 100   100   000    Old_age  Always  -           0
174 Unsafe_Shutdown_Count   0x0032 100   100   000    Old_age  Always  -           14
175 Power_Loss_Cap_Test     0x0033 100   100   010    Pre-fail Always  -           2390 (14 65535)
183 SATA_Downshift_Count    0x0032 100   100   000    Old_age  Always  -           0
184 End-to-End_Error_Count  0x0033 100   100   090    Pre-fail Always  -           0
187 Uncorrectable_Error_Cnt 0x0032 100   100   000    Old_age  Always  -           0
190 Drive_Temperature       0x0022 081   075   000    Old_age  Always  -           19 (Min/Max 16/27)
192 Unsafe_Shutdown_Count   0x0032 100   100   000    Old_age  Always  -           14
194 Temperature_Celsius     0x0022 100   100   000    Old_age  Always  -           19
197 Pending_Sector_Count    0x0012 100   100   000    Old_age  Always  -           0
199 CRC_Error_Count         0x003e 100   100   000    Old_age  Always  -           0
225 Host_Writes_32MiB       0x0032 100   100   000    Old_age  Always  -           3576929
226 Workld_Media_Wear_Indic 0x0032 100   100   000    Old_age  Always  -           522
227 Workld_Host_Reads_Perc  0x0032 100   100   000    Old_age  Always  -           25
228 Workload_Minutes        0x0032 100   100   000    Old_age  Always  -           154396
232 Available_Reservd_Space 0x0033 099   099   010    Pre-fail Always  -           0
233 Media_Wearout_Indicator 0x0032 100   100   000    Old_age  Always  -           0
234 Thermal_Throttle_Status 0x0032 100   100   000    Old_age  Always  -           0/0
235 Power_Loss_Cap_Test     0x0033 100   100   010    Pre-fail Always  -           2390 (14 65535)
241 Host_Writes_32MiB       0x0032 100   100   000    Old_age  Always  -           3576929
242 Host_Reads_32MiB        0x0032 100   100   000    Old_age  Always  -           1226992
243 NAND_Writes_32MiB       0x0032 100   100   000    Old_age  Always  -           7374461

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1       0       0 Not_testing
    2       0       0 Not_testing
    3       0       0 Not_testing
    4       0       0 Not_testing
    5       0       0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
|
Comments |
Comment by Rudolf Kastl [ 2022 Jan 20 ] |
Update: Still happens on 5.4.9 |
Comment by Eriks Sneiders [ 2022 Mar 17 ] |
Fixed in:
|