(TL;DR: my IPMI queries need a longer timeout; has anybody found a workaround?)
I have a new install of Zabbix 6 (Z6) running on AlmaLinux/RH8, which I set up to replace our older Z4 box. (That box had been migrated from SUSE to CentOS 7, and it just wasn't working right. I've been a happy Zabbix user for 7+ years.)
As part of this, I'm moving things slowly over from one box to another, and making sure all the data is coming in fine, the alerts are relevant, creating the new dashboards to replace screens, etc.
(Aside: I LOVE the new dashboards and visualization. They are beautiful!)
It's mostly been fine. I did get a weird alert because a temperature was above 10 when all the triggers are set at 65, and I'm still investigating that. But the only major problem I have so far is retrieving data via IPMI from some Supermicro/Intel boards (S2600WFT and S2600WT2R): values only come in every 10-15 minutes, if I'm lucky.
To be honest, when we acquired these new servers on Z4, I hadn't really worried about them, because I had other boxes at each of their locations giving me good environmental data via Dell iDRACs, and I could get the VMs' data using the agent, so the hardware stats weren't critical. Now they are, since the old hardware is gone.
I tried to figure out a way to get the data via SNMP, but apparently the boards can't be polled that way; they can only send traps. Boo!
I've got plenty of HD space, speed, etc. There are no queues waiting or delayed. I am getting good, consistent data everywhere else: from the few Dells I've moved over, from some Lenovos, and from all the agents on the boxes. There are no errors showing up. ipmitool queries run by hand work just fine, and I can see all the data. I have NOT done a full debug run yet; that's next. But I poked around a little more and saw this in the docs for the recent board OOBM BIOS updates:
"ipmitool is not working well when running in high load network. We recommend to add extra timeout by using “-N 5”. Default is 1 second for RMCP+, which is not enough. -N 5 will set 5 second as timeout. So the command will look like: ipmitool -I lanplus -H ip -U user -P password -C 17 -N 5 command"
I dug into the code for ipmipoller.c, and it appears that the timeout is a constant with a value of 1. It doesn't look like I can change that easily without editing and recompiling, or submitting a WHOLE bunch of code to make it a variable, and y'all do NOT want my code. I am a sysadmin, and it's been 30+ years since I did any C. These are the lines I mean:
const int ipc_timeout = 2;
const int ipmi_timeout = 1;
So I really need a way to change that, preferably on the IPMI tab of hosts.
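In the meantime, the stopgap I'm considering is to skip the built-in IPMI poller for these boards and call ipmitool myself with the longer timeout from the vendor note, for example from a Zabbix external check. Below is a rough, untested-on-real-hardware sketch of that idea. The function names, the default values, and the 60-second outer limit are my own placeholders, not anything from Zabbix or the board docs; only the ipmitool flags come from the quoted recommendation.

```python
#!/usr/bin/env python3
"""Stopgap sketch: run ipmitool over lanplus with a longer -N timeout."""
import subprocess


def build_ipmitool_cmd(host, user, password, command, timeout_s=5, cipher=17):
    """Build the argument list from the vendor note, with -N set to timeout_s.

    `command` is a list such as ["sensor", "list"]. All parameter names here
    are hypothetical; adjust the defaults for your own boards.
    """
    return [
        "ipmitool",
        "-I", "lanplus",        # RMCP+ interface, as in the vendor example
        "-H", host,
        "-U", user,
        "-P", password,         # note: visible in the process list
        "-C", str(cipher),      # cipher suite 17, per the vendor example
        "-N", str(timeout_s),   # the longer timeout the vendor recommends
        *command,
    ]


def query(host, user, password, command, timeout_s=5):
    """Run ipmitool and return its stdout as text.

    Raises subprocess.CalledProcessError if ipmitool exits non-zero, and
    subprocess.TimeoutExpired if it runs past the outer 60-second limit
    (an arbitrary safety margin I picked).
    """
    cmd = build_ipmitool_cmd(host, user, password, command, timeout_s)
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=60
    )
    return result.stdout


# Example call (needs ipmitool installed and a reachable BMC):
# print(query("192.0.2.10", "user", "password", ["sensor", "list"]))
```

This is obviously clunkier than a timeout field on the IPMI tab, since every item would have to go through the script instead of the native IPMI checks, which is why I'd still prefer a proper setting.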