-
Problem report
-
Resolution: Unresolved
-
Major
-
None
-
6.0.40, 7.0.12, 7.2.6, 7.4.0beta1
-
None
-
x86_64-linux-6.8.12-9-pve, Zabbix agent2 using smartctl version 7.4
-
Support backlog
Steps to reproduce:
- Run the following command using the Zabbix Agent 2 SMART plugin:
zabbix_agent2 -v -t smart.disk.discovery
Result:
- Commands are rejected with errors such as:
  - Timeout occurred while gathering data.
  - Cannot fetch data.: failed to execute smartctl... exit status 128, etc.
- Even though smartctl returns valid and parsable SMART JSON output, the plugin rejects it due to non-zero exit codes.
Expected:
- Plugin should accept and parse smartctl output when useful data is returned, even if exit status includes bits like 3, 4, or 5.
Root cause:
smartctl returns bitmask-style exit codes (e.g., 128 = bit 7 set, "the device self-test log contains records of errors"), even when the output is still valid JSON. However, the plugin's diskGetSingle logic fails on any non-zero exit code and discards the useful data.
Relevant source:
Plugin source includes:
    device, err := p.ctl.Execute(args...)
    if err != nil {
        return nil, errs.Wrap(err, errFailedToExecute)
    }
No post-check or conditional logic exists to tolerate known informative exit codes (e.g., bit 3, bit 4, etc.)
Exit status decoding (smartctl docs):
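Bit  Meaning
0    Command line did not parse
1    Device open failed, no IDENTIFY DEVICE structure returned, or device is in a low-power mode
2    Some SMART or other ATA command to the disk failed, or a SMART data structure had a checksum error
3    SMART status check returned "DISK FAILING"
4    Prefail attributes found at or below threshold
5    SMART status check returned "DISK OK", but some attributes were at or below threshold at some time in the past
6    Device error log contains records of errors
7    Device self-test log contains records of errors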
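To make the bitmask concrete, here is a small sketch that decodes an exit status into the conditions it encodes. The descriptions are shortened from the EXIT STATUS section of smartctl(8); the names smartctlExitBits and describeExitStatus are hypothetical and are not part of the plugin.

```go
package main

import "strings"

// smartctlExitBits maps each bit position of the smartctl exit status to a
// short description, abbreviated from smartctl(8).
var smartctlExitBits = [8]string{
	"command line did not parse",
	"device open failed",
	"SMART or other ATA command failed",
	"SMART status check returned DISK FAILING",
	"prefail attributes at or below threshold",
	"attributes were at or below threshold in the past",
	"device error log contains errors",
	"device self-test log contains errors",
}

// describeExitStatus returns the descriptions of all bits set in status,
// joined by "; ". A status of 0 yields an empty string.
func describeExitStatus(status int) string {
	var set []string
	for bit, text := range smartctlExitBits {
		if status&(1<<bit) != 0 {
			set = append(set, text)
		}
	}
	return strings.Join(set, "; ")
}
```

For the exit status 128 seen in the log below, describeExitStatus(128) returns "device self-test log contains errors", which matches the failed self-test entries in the JSON output.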
Proposed resolution:
Introduce configurable tolerance in Zabbix Agent 2 SMART plugin for selected smartctl exit codes that still return parsable and valuable data.
Make sure that smart.disk.discovery does not depend on this setting; with the current behavior it does not discover any disks at all.
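One possible shape for the configurable tolerance, as a sketch only: the option format (a comma-separated list of bit numbers such as "3,4,5,6,7") and the names parseToleratedBits and exitStatusTolerated are assumptions made for illustration, not existing Zabbix parameters or functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseToleratedBits turns a comma-separated list of bit numbers (0-7)
// into a bitmask of smartctl exit bits that should not be treated as fatal.
// The option format is hypothetical.
func parseToleratedBits(value string) (int, error) {
	mask := 0
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // allow empty values and trailing commas
		}
		bit, err := strconv.Atoi(field)
		if err != nil || bit < 0 || bit > 7 {
			return 0, fmt.Errorf("invalid smartctl exit bit %q", field)
		}
		mask |= 1 << bit
	}
	return mask, nil
}

// exitStatusTolerated reports whether every bit set in status is covered
// by the tolerated mask.
func exitStatusTolerated(status, mask int) bool {
	return status&^mask == 0
}
```

With a value of "3,4,5,6,7" the mask is 248, so exit status 128 is tolerated while 130 (bits 7 and 1) is not. An empty value yields mask 0, which preserves the current strict behavior.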
Log example
2025/04/25 09:37:27.284208 [Smart] failed to get device "/dev/sdl" info by type "scsi": Failed to execute smartctl: "{\n \"json_format_version\": [\n 1,\n 0\n ],\n \"smartctl\": {\n \"version\": [\n 7,\n 4\n ],\n \"pre_release\": false,\n \"svn_revision\": \"5530\",\n \"platform_info\": \"x86_64-linux-6.8.12-9-pve\",\n \"build_info\": \"(local build)\",\n \"argv\": [\n \"smartctl\",\n \"-a\",\n \"-d\",\n \"scsi\",\n \"/dev/sdl\",\n \"-j\"\n ],\n \"exit_status\": 128\n },\n \"local_time\": {\n \"time_t\": 1745566646,\n \"asctime\": \"Fri Apr 25 09:37:26 2025 CEST\"\n },\n \"device\": {\n \"name\": \"/dev/sdl\",\n \"info_name\": \"/dev/sdl\",\n \"type\": \"scsi\",\n \"protocol\": \"SCSI\"\n },\n \"scsi_vendor\": \"HGST\",\n \"scsi_product\": \"HUH721010AL4200\",\n \"scsi_model_name\": \"HGST HUH721010AL4200\",\n \"scsi_revision\": \"A2JB\",\n \"scsi_version\": \"SPC-4\",\n \"user_capacity\": {\n \"blocks\": 2441609216,\n \"bytes\": 10000831348736\n },\n \"logical_block_size\": 4096,\n \"scsi_lb_provisioning\": {\n \"name\": \"fully provisioned\",\n \"value\": 0,\n \"management_enabled\": {\n \"name\": \"LBPME\",\n \"value\": 0\n },\n \"read_zeros\": {\n \"name\": \"LBPRZ\",\n \"value\": 0\n }\n },\n \"rotation_rate\": 7200,\n \"form_factor\": {\n \"scsi_value\": 2,\n \"name\": \"3.5 inches\"\n },\n \"logical_unit_id\": \"0x5000cca26c2fdadc\",\n \"serial_number\": \"1DGV9W3Z\",\n \"device_type\": {\n \"scsi_terminology\": \"Peripheral Device Type [PDT]\",\n \"scsi_value\": 0,\n \"name\": \"disk\"\n },\n \"scsi_transport_protocol\": {\n \"name\": \"SAS (SPL-4)\",\n \"value\": 6\n },\n \"smart_support\": {\n \"available\": true,\n \"enabled\": true\n },\n \"temperature_warning\": {\n \"enabled\": true\n },\n \"smart_status\": {\n \"passed\": true\n },\n \"temperature\": {\n \"current\": 33,\n \"drive_trip\": 85\n },\n \"power_on_time\": {\n \"hours\": 48954,\n \"minutes\": 51\n },\n \"scsi_start_stop_cycle_counter\": {\n \"year_of_manufacture\": \"2018\",\n 
\"week_of_manufacture\": \"14\",\n \"specified_cycle_count_over_device_lifetime\": 50000,\n \"accumulated_start_stop_cycles\": 8,\n \"specified_load_unload_count_over_device_lifetime\": 600000,\n \"accumulated_load_unload_cycles\": 44\n },\n \"scsi_grown_defect_list\": 0,\n \"seagate_farm_log\": {\n \"supported\": false\n },\n \"scsi_error_counter_log\": {\n \"read\": {\n \"errors_corrected_by_eccfast\": 0,\n \"errors_corrected_by_eccdelayed\": 0,\n \"errors_corrected_by_rereads_rewrites\": 0,\n \"total_errors_corrected\": 0,\n \"correction_algorithm_invocations\": 121,\n \"gigabytes_processed\": \"1.176\",\n \"total_uncorrected_errors\": 0\n },\n \"write\": {\n \"errors_corrected_by_eccfast\": 0,\n \"errors_corrected_by_eccdelayed\": 8,\n \"errors_corrected_by_rereads_rewrites\": 0,\n \"total_errors_corrected\": 8,\n \"correction_algorithm_invocations\": 3737,\n \"gigabytes_processed\": \"0.066\",\n \"total_uncorrected_errors\": 0\n },\n \"verify\": {\n \"errors_corrected_by_eccfast\": 0,\n \"errors_corrected_by_eccdelayed\": 0,\n \"errors_corrected_by_rereads_rewrites\": 0,\n \"total_errors_corrected\": 0,\n \"correction_algorithm_invocations\": 18092,\n \"gigabytes_processed\": \"0.000\",\n \"total_uncorrected_errors\": 0\n }\n },\n \"scsi_pending_defects\": {\n \"count\": 0\n },\n \"scsi_self_test_0\": {\n \"code\": {\n \"value\": 1,\n \"string\": \"Background short\"\n },\n \"result\": {\n \"value\": 7,\n \"string\": \"Failed in segment\"\n },\n \"failed_segment\": {\n \"value\": 3,\n \"aka\": \"self_test_number\"\n },\n \"power_on_time\": {\n \"hours\": 48938,\n \"aka\": \"accumulated_power_on_hours\"\n }\n },\n \"scsi_self_test_1\": {\n \"code\": {\n \"value\": 1,\n \"string\": \"Background short\"\n },\n \"result\": {\n \"value\": 7,\n \"string\": \"Failed in segment\"\n },\n \"failed_segment\": {\n \"value\": 3,\n \"aka\": \"self_test_number\"\n },\n \"power_on_time\": {\n \"hours\": 48938,\n \"aka\": \"accumulated_power_on_hours\"\n }\n },\n 
\"scsi_self_test_2\": {\n \"code\": {\n \"value\": 2,\n \"string\": \"Background long\"\n },\n \"result\": {\n \"value\": 7,\n \"string\": \"Failed in segment\"\n },\n \"failed_segment\": {\n \"value\": 7,\n \"aka\": \"self_test_number\"\n },\n \"power_on_time\": {\n \"hours\": 48704,\n \"aka\": \"accumulated_power_on_hours\"\n },\n \"lba_first_failure\": {\n \"value\": 820141189,\n \"aka\": \"address_of_first_failure\"\n },\n \"sense_key\": {\n \"value\": 3,\n \"string\": \"Medium Error\"\n },\n \"asc\": 93,\n \"ascq\": 1,\n \"vendor_specific\": 0\n },\n \"scsi_self_test_3\": {\n \"code\": {\n \"value\": 1,\n \"string\": \"Background short\"\n },\n \"result\": {\n \"value\": 0,\n \"string\": \"Completed\"\n },\n \"power_on_time\": {\n \"hours\": 48698,\n \"aka\": \"accumulated_power_on_hours\"\n }\n },\n \"scsi_self_test_4\": {\n \"code\": {\n \"value\": 1,\n \"string\": \"Background short\"\n },\n \"result\": {\n \"value\": 0,\n \"string\": \"Completed\"\n },\n \"power_on_time\": {\n \"hours\": 48622,\n \"aka\": \"accumulated_power_on_hours\"\n }\n },\n \"scsi_extended_self_test_seconds\": 66360\n}": exit status 128.