[ZBXNEXT-9028]  Consistent device naming on linux Created: 2024 Feb 20  Updated: 2025 Jun 13

Status: Spec. sign-off
Project: ZABBIX FEATURE REQUESTS
Component/s: Agent2 plugin (G)
Affects Version/s: 6.4.11
Fix Version/s: None

Type: Change Request Priority: Minor
Reporter: Roy Sigurd Karlsbakk Assignee: Eriks Sneiders
Resolution: Unresolved Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: 16h
Original Estimate: Not Specified
Environment:

Linux


Attachments: File findsamefiles.go    
Issue Links:
Sub-task
depends on ZBX-26446 Zabbix agent 2 SMART plugin does not ... Merging
Team: Team INT

 Description   

Hi all

Today, Zabbix finds devices like sda, sdb, sdc etc., and that is generally fine on smaller systems. On larger ones, however, with 20 or 80 drives, those won't be given the same device name on each boot. Your sdd and sdc can swap places because one was detected before the other. From Linux's point of view, this isn't a bug at all, since device names shouldn't be trusted anyway.

If you keep trusting device names, Zabbix will show all sorts of interesting errors that aren't there: for instance, a drive in a RAID where the SMART counters have started ticking, and then, after a reboot, another drive has the same count, but the first has zero.

So, with the old smartctl check (before zabbix_agent2), I rewrote the discovery to return a consistent path to the device, namely /dev/disk/by-id/ata-something-blabla (or scsi- or usb-), which is a symlink to the real device. I know this works, and it removes a lot of false positives.

The problem is: how can I make Zabbix's LLD for block devices do the same? Rewrite the whole thing in scripts? It'd be very nice if there were a way to have zabbix_agent2 do this on its own. My version of the LLD script for disks is here in case someone wants to have a peek: https://github.com/rkarlsba/ymse/blob/master/zabbix/zbx-smartctl/discovery-scripts/nix/smartctl-disks-discovery.pl

roy 



 Comments   
Comment by Roy Sigurd Karlsbakk [ 2024 Mar 08 ]

Is there any progress on this?

Comment by Roy Sigurd Karlsbakk [ 2024 Mar 31 ]

This isn't really a minor issue: on any machine that has a lot of drives, the drives will be renamed after a reboot, and if a drive has issues, that causes a problem. Now, after a reboot, I get this: "SMART [sdb sat]: Disk has been replaced (new serial number received)", which is to be expected, since sdb doesn't refer to a fixed drive and the drive most definitely hasn't got a new serial number, just another name. Could you look into this, please? I've shown how I did it - it's not hard.

Comment by Roy Sigurd Karlsbakk [ 2024 Apr 12 ]

Are you working on this? It's not like it's hard work to fix it!

Comment by Roy Sigurd Karlsbakk [ 2024 Jun 03 ]

It's taken more than a year now, and the fix should be rather easy. What's stopping you?

Comment by Eriks Sneiders [ 2024 Jun 10 ]

Good day! We are currently fixing other related SMART issues and the stability of the SMART plugin. We will try to get to this issue after those have been fixed.

Would you have any suggestions on how to reproduce this issue consistently with a smaller number of devices? Finding DEV devices with 20-80 drives is quite troublesome.

Comment by Stefan [ 2024 Jun 10 ]

esneiders that sounds great!
Suggestion: you need to know the root cause. It happens because hard disks take different amounts of time to say "yes, I'm here", so the easiest way would be to just plug in the other hard disks one second later.

RoyK if you think it is that easy, Zabbix is open source, so why don't you create a pull request?

Comment by Roy Sigurd Karlsbakk [ 2024 Jun 10 ]

esneiders The root cause is described here https://wiki.archlinux.org/title/Persistent_block_device_naming

shad0w I have described above how I did this in Perl, just to show one way of doing it. However, I'm not fluent in Go, so I don't really want to start hacking and end up with something so ugly I'll get banned for it anyway.

Comment by Roy Sigurd Karlsbakk [ 2024 Jun 10 ]

Here's a simpler way, in bash

$ find -L /dev/disk/ -samefile /dev/sda -exec ls -l {} \;
lrwxrwxrwx 1 root root 9 Jun 10 13:09 /dev/disk/by-diskseq/1 -> ../../sda
lrwxrwxrwx 1 root root 9 Jun 10 13:09 /dev/disk/by-path/pci-0000:06:00.0-sas-phy3-lun-0 -> ../../sda
lrwxrwxrwx 1 root root 9 Jun 10 13:09 /dev/disk/by-id/wwn-0x5000c500e53d3ede -> ../../sda
lrwxrwxrwx 1 root root 9 Jun 10 13:09 /dev/disk/by-id/ata-ST16000NM001G-2KK103_ZL2PS47V -> ../../sda

Comment by alois [ 2024 Jun 10 ]

esneiders the problem shouldn't be hard to reproduce. It happens all the time on my desktop, for example: after a restart, the device names have changed. I also stumbled into this bug because we use an Intel NUC as a VPN gateway. It has an eMMC, and for reasons I have /var/log on an SD card. There, after nearly every reboot, /dev/mmcblk0 and /dev/mmcblk1 are swapped.

And this doesn't only affect SMART monitoring. I noticed it because, for the eMMC and the SD card, I had to adjust {$VFS.DEV.READ.AWAIT.WARN} and {$VFS.DEV.WRITE.AWAIT.WARN}, which obviously fails when the drives swap their names.

Comment by Roy Sigurd Karlsbakk [ 2024 Jun 11 ]

shad0w findsamefiles.go I believe this should show a way to do it.

$ go run findsamefiles.go sda
/dev/sda -> [/dev/disk/by-id/ata-ST16000NM001G-2KK103_ZL2PS47V /dev/disk/by-id/wwn-0x5000c500e53d3ede]

Apologies for not posting a pull request. I don't speak Go that well, so I guess other people in here can integrate this in a better way than I can.

Creds to ikke @irc for the code - I just modified it slightly to return an array of hits instead of a single one

Comment by Roy Sigurd Karlsbakk [ 2024 Sep 12 ]

This bug was originally flagged as trivial by Zabbix, but AFAICS it hasn't been touched yet. Can someone at Zabbix please update me?

Comment by Stefan [ 2025 Mar 05 ]

esneiders any news about this?

Maybe a solution could be the device type that smartctl --scan accepts:

smartctl --scan -d by-id

this is not documented: https://github.com/smartmontools/smartmontools/issues/295

It would be great if this got a higher priority, because it's very annoying to get spammed every time a storage server is rebooted.
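
If the scan route were taken, the agent would need to read the device paths out of the scan output. The sketch below parses the regular text form of `smartctl --scan` lines (`<path> -d <type> # <comment>`). Whether `-d by-id` prints lines of the same shape is an assumption that has not been verified here, and the names scanned and parseScan are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanned is one device reported by a scan: its path and the -d type.
type scanned struct {
	Path string
	Type string
}

// parseScan extracts path and device type from lines shaped like
// "/dev/sda -d scsi # /dev/sda, SCSI device". Other lines are ignored.
func parseScan(out string) []scanned {
	var devs []scanned
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop the trailing comment
		}
		f := strings.Fields(line)
		if len(f) >= 3 && f[1] == "-d" {
			devs = append(devs, scanned{Path: f[0], Type: f[2]})
		}
	}
	return devs
}

func main() {
	// Sample input in the regular scan format, not captured from a real host.
	sample := "/dev/sda -d scsi # /dev/sda, SCSI device\n" +
		"/dev/sdb -d sat # /dev/sdb [SAT], ATA device\n"
	for _, d := range parseScan(sample) {
		fmt.Printf("%s (%s)\n", d.Path, d.Type)
	}
}
```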

Comment by David Angelovich [ 2025 May 23 ]

It might be easier to just modify the discovery script and template to use the disk serial number as the unique identifier (since that's more or less what the disk ID is anyway). That should only require smart.disk.get[] to be updated to support retrieving data by disk serial; the rest is in the template.
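
To illustrate the serial-number idea: the JSON that `smartctl -i -j <device>` prints contains a `serial_number` field and a `device.name` field, so an index keyed by serial stays stable even when kernel names swap. The sketch below is not the plugin's code. The names smartInfo and indexBySerial are hypothetical, and the JSON in main is a trimmed, made-up sample with placeholder serials.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// smartInfo holds the two fields of smartctl's JSON output used here.
type smartInfo struct {
	Device struct {
		Name string `json:"name"`
	} `json:"device"`
	Serial string `json:"serial_number"`
}

// indexBySerial maps each disk's serial number to its current device name.
// Reports without a serial number are skipped.
func indexBySerial(reports [][]byte) (map[string]string, error) {
	idx := make(map[string]string)
	for _, raw := range reports {
		var info smartInfo
		if err := json.Unmarshal(raw, &info); err != nil {
			return nil, err
		}
		if info.Serial == "" {
			continue
		}
		idx[info.Serial] = info.Device.Name
	}
	return idx, nil
}

func main() {
	// Two boots of the same machine; the kernel names have swapped.
	boot1 := [][]byte{
		[]byte(`{"device":{"name":"/dev/sda"},"serial_number":"SERIAL-A"}`),
		[]byte(`{"device":{"name":"/dev/sdb"},"serial_number":"SERIAL-B"}`),
	}
	boot2 := [][]byte{
		[]byte(`{"device":{"name":"/dev/sdb"},"serial_number":"SERIAL-A"}`),
		[]byte(`{"device":{"name":"/dev/sda"},"serial_number":"SERIAL-B"}`),
	}
	for i, boot := range [][][]byte{boot1, boot2} {
		idx, err := indexBySerial(boot)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("boot %d: SERIAL-A is at %s\n", i+1, idx["SERIAL-A"])
	}
}
```

With items keyed by serial, the "Disk has been replaced" trigger quoted earlier would only fire when the serial really changes, not when sda and sdb trade places.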

Generated at Sun Jun 15 14:33:51 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.