
When a server fan fails, how can the BMC shut down the fan?

Gloria Chen

New Member
Joined
Aug 30, 2023
Messages
3 (0.00/day)
What is the risk of supporting server fan replacement without shutting down the system?
When a server fan fails, how can the BMC shut down the fan?
 
Depends on the implementation of IPMI and what the BIOS allows anyway, but on some you can easily switch them to off or "low rpm" modes.
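To see what a given BMC actually exposes before trying to change anything, one option is to read the fan sensors with `ipmitool sdr type Fan` and pick out the fans that are not reporting "ok". Below is a minimal Python sketch of that. The five-column, pipe-separated layout is an assumption about the output format and may differ on your hardware; the function names are made up for illustration. Switching a fan off or to a low-RPM mode is vendor-specific and is not shown here.

```python
import subprocess


def parse_fan_sdr(text):
    """Parse pipe-separated sensor lines into dicts.

    Assumes five columns per line: name | sensor id | status | entity | reading.
    Lines that do not match that shape are skipped.
    """
    fans = []
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 5:
            continue
        name, _sensor_id, status, _entity, reading = cols
        fans.append({"name": name, "status": status, "reading": reading})
    return fans


def read_fans():
    """Run ipmitool locally and return the parsed fan sensors."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Fan"],
        capture_output=True, text=True, check=True,
    )
    return parse_fan_sdr(result.stdout)


def unhealthy(fans):
    """Names of fans whose status column is anything other than 'ok'."""
    return [f["name"] for f in fans if f["status"] != "ok"]
```

On a machine with ipmitool installed and BMC access, `unhealthy(read_fans())` would list the fans that need attention.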
 
I am not exactly sure what you mean when you ask, "What is the risk" and "replacement"?

Are you saying you have a fan that has failed and you want to replace it without shutting down the server?

If that is what you are asking, there are HUGE risks - depending on which fan and where it is located. You could drop a screw, or a screwdriver, or the fan. If the fan is attached to a heatsink, you could break the TIM (thermal interface material) bond. You could accidentally knock loose a power or data cable. You might zap something with ESD.

IMO, as long as you are not overheating right now, the smart move is to schedule 1 hour of downtime - perhaps over this coming weekend (don't put it off too long or it will keep getting put off). Have everything ready before the downtime starts (new fan, tools, battleplan, etc.).

With proper planning, it will probably take you 5 - 10 minutes to swap out the fan. Then use the remaining time to do a proper cleaning out of any heat-trapping dust (and critters) that have settled inside. Thoroughly inspect the innards and make sure the remaining fans spin freely and that all cables, RAM, and expansion cards are securely installed and connected. Check your cable management for tidiness. Good cable management minimizes the impact on desired air flow and the risk of a wayward cable causing other issues.

If no additional problems are found, this should take 20 minutes or less.

Then put the server back in service and shine in the eyes of your bosses for completing the task in half the scheduled time!
 
Are you saying you have a fan that has failed and you want to replace it without shutting down the server?
→ I'm afraid so.

The approach below is what we are implementing to meet the requirement; it is under test in our lab right now:
  1. When a fan is removed: the BMC has a related sensor, such as the Fan Present Sensor. As a result, the related event (i.e., "fan removed") can be observed in the BMC (either in the GUI or in the IPMI SEL).
  2. Behavior of the other fans: when a fan fails (including when a fan is removed), the BMC will increase the speed of the other fans. However, not all of the other fans will reach 100%. The rate of increase is defined by the thermal table.
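The two behaviors above can be sketched roughly as follows. This is only an illustration in Python, not our BMC firmware: the `THERMAL_TABLE` values, the `min_rpm` threshold, and all function names are hypothetical, and a real thermal table comes from the platform's thermal design.

```python
from dataclasses import dataclass

# Hypothetical thermal table: number of failed or missing fans -> PWM duty (%)
# applied to the remaining fans. Values are made up for illustration.
THERMAL_TABLE = {0: 40, 1: 70, 2: 100}


@dataclass
class Fan:
    name: str
    present: bool  # state of the fan-present sensor
    rpm: int       # last tachometer reading


def failed_fans(fans, min_rpm=500):
    """A fan counts as failed if it is absent or spinning below min_rpm."""
    return [f for f in fans if not f.present or f.rpm < min_rpm]


def target_duty(fans):
    """Look up the duty cycle for the remaining fans in the thermal table.

    Failure counts beyond the largest table entry are clamped to that entry.
    """
    count = min(len(failed_fans(fans)), max(THERMAL_TABLE))
    return THERMAL_TABLE[count]


def presence_events(before, after):
    """Compare two presence snapshots and return SEL-style event strings."""
    events = []
    for name, was_present in before.items():
        now_present = after.get(name, False)
        if was_present and not now_present:
            events.append(f"{name}: fan removed")
        elif not was_present and now_present:
            events.append(f"{name}: fan inserted")
    return events
```

With one of three fans pulled, `target_duty` returns the one-failure entry of the table rather than 100%, which matches the point that the remaining fans speed up but do not necessarily go to full speed.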
Thank you for the reply, and I'll keep you updated. This is a very special project; the customer's requirements always shock us and my boss.
 
Just follow basic ESD safety procedures and it'll go without a hitch, no downtime necessary. Servers are usually designed to be easily repaired and with concern for high uptime scenarios.

If it's the CPU heatsink fan, you should be able to replace it as well; most servers are designed with easy-to-replace fans to maximize uptime. If yours isn't, then you should schedule a brief downtime to replace the part (heatsink assembly included), as Bill suggested.

Removing the heatsink with the system live will not immediately damage the processor but it will certainly cause a stability problem, so the downtime is highly advised in this case.

Good luck and be careful around live fan blades. Nasty cuts, believe me
 
Many Supermicro servers have documented hotswap fans.

It shouldn't be a big deal even on a normal fan, depending on whether it needs loose metal to mount (screws, clamps). Obviously you don't want to drop that onto the mainboard.
 
Yup, this right here is a requirement. Hotswap technically should have better shielding to prevent a stray power arc from occurring, basically like SCSI or eSATA.
 
The best part is there is like no risk of that anyway. In most Supermicros it's literally a bracket bolted to the back of the backplane. The fans are Delta high-CFM units, but they are just 4-pin PWM. The fans clip into their chassis, then you slide them onto the mount rails and they "click" into place. The 4-pin fan connector snaps into a holder. You don't need to get your hands any deeper in the chassis. Swapping takes seconds.
 

Oh, a rack mount with a backplane connector. I used to deal with avionics racks in the AF on the F-16 in my 20s; even one computer on that plane was like what you are describing. Just slide it in, making sure it's aligned correctly, and secure it using spring-loaded knurled knobs or a lever lock (think P4 heatsink).
 
Yup, exactly. Only on the 1U servers do you need to mess with screws nowadays. And even then, the newer ones just sit in rubber grommets, so you connect 2 or 3 fans together and then lay them across the vibration damping like a slinky.
 
Thanks to both of you, and for Bill's suggestion; we have done all we can do. (Good luck and be careful around live fan blades. Nasty cuts, believe me → yes, I have bad memories \0/ )

Many Supermicro servers have documented hotswap fans. → We are not Supermicro, but we do have hotswap fans as well \0/
 