
Tesla T4 Problems

sozkan

New Member
Joined
Mar 9, 2020
Messages
19 (0.66/day)
Hello. We have a Threadripper TRX40 workstation with a Tesla T4 that is overheating.
I had some trouble during the installation and thought it had been resolved,
but the problem still remains. The GPU temperature is 57 °C at minimum; on simple benchmarks (not even a stress test) it heats up to 91 °C and the GPU shuts itself down.
GPU memory is always in use, even at idle.
Teslat4-1.gif
Teslat4-0.gif

Configuration video:

Any idea what is going on?
Best
Serkan
 
Joined
Jun 29, 2009
Messages
1,146 (0.29/day)
Location
austria
System Name ibuytheusedstuff
Processor 5960x
Motherboard x99 sabertooth
Cooling water
Memory 32 dual ranked
Video Card(s) 1080ti
Display(s) 120hz
Power Supply antec 1200 oc
Mouse mx 518
Keyboard roccat arvo
i cannot find a 441.08 driver that officially supports tesla cards.
this is the latest with cuda 10.2 for tesla t series


maybe it's worth a try

sorry, i misread that; you are already using the latest drivers!
 

sozkan

i cannot find a 441.08 driver that officially supports tesla cards.
this is the latest with cuda 10.2 for tesla t series


maybe its worth a try
Thank you for the quick response.
I actually installed that same driver, but the installation raised an error and suggested installing the DCH version; later it suggested the standard version. I have no clue which one is active. It works, but with faults: the memory shows full load although there is no workload on it. I guess the heat comes from the memory.
I have made a new video about the problem.
 
Joined
Jun 29, 2009
Messages
1,146 (0.29/day)
Location
austria
it was my fault, i misread the info in gpu-z, sorry
never saw this memory usage myself, but the card seems to downclock okay

is this a new card?
newest bios on your motherboard?

maybe ya could post all your specs for easier helping? thx

did ya try to swap the cards to another slot?
are all pci-e slots occupied?
can ya switch the tesla card to pci-e 3.0 in bios?
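
To see which PCIe generation and width each card actually negotiated (before and after changing the slot or the BIOS setting), something like the sketch below might help. It assumes `nvidia-smi` is on the PATH and that the driver is loaded; the helper names and the sample row in the docstring are made up for illustration. Note that the reported generation can drop at idle because of link power saving, so it is worth reading it under load as well.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY = "name,pcie.link.gen.current,pcie.link.width.current"


def parse_link_line(line: str) -> dict:
    """Parse one CSV row such as 'Tesla T4, 3, 16' (illustrative values)."""
    # Split from the right so a comma inside the GPU name would not break parsing.
    name, gen, width = (part.strip() for part in line.rsplit(",", 2))
    return {"name": name, "gen": int(gen), "width": int(width)}


def query_links(exe: str = "nvidia-smi") -> list:
    """Return one dict per GPU with its current PCIe generation and lane width."""
    out = subprocess.run(
        [exe, f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_link_line(row) for row in out.splitlines() if row.strip()]
```

Calling `query_links()` on the workstation returns one entry per card, so the Tesla and the display GPU can be compared side by side.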


for others who want to help: looks like everything was bought new:
asus lc360 aio
msi trx40 creator (changed to Gigabyte Aorus TRX40 Extreme with newest bios)
g.skill neo F4-3600c16-19-19-39 \ 32gtznc \ x2
corsair hx1200i
corsair mp510 nvme x2
nvidia tesla T4m low profile

video gets interesting from 23:00 min with the msi mainboard

and with the new gigabyte mainboard the problems with the tesla start from 36:00 min, with error D4 = pci resource allocation error / out of resources.
 

sozkan

The "MSI TRX40 Creator" is gone, because with it there was not even a display signal and the "Nvidia Tesla T4" overheated until it went offline.
I have replaced it with the Gigabyte Aorus TRX40 Extreme.
Yes, everything is new and freshly installed.
I have just installed a new BIOS that came from Gigabyte support, but the card now runs even warmer. On my other Intel PC the Tesla T4 does not even heat above room temperature!
These motherboards have four x16 PCIe slots, but on both of them two run at x8 and two at x16, so I do not have much choice. GPUs need to be on a full-speed slot, but I will try the Tesla at x8; it might work. Nvidia claims the Tesla T4 won't lose performance at x8, but I doubt it.
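
For a rough feel of what x8 costs on paper, here is a back-of-the-envelope sketch. It assumes the card links at PCIe 3.0 (8 GT/s per lane with 128b/130b encoding); the function name is made up for illustration. It gives only the theoretical one-direction ceiling, not what a CFD workload would actually notice.

```python
# PCIe 3.0 signalling: 8 gigatransfers per second per lane, 128b/130b line encoding.
GT_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 3.0 link in GB/s."""
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return GT_PER_S * ENCODING_EFFICIENCY * lanes / 8


for lanes in (8, 16):
    print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
# x8: 7.88 GB/s
# x16: 15.75 GB/s
```

So x8 halves the ceiling on host-to-card transfers; whether that matters depends on how often the simulation moves data across the bus.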
 


Joined
Jun 29, 2009
Messages
1,146 (0.29/day)
Location
austria
and just for testing i would place a fan on the tesla. you are not the only one with an overheating tesla card

maybe its just dead on arrival

Tesla T4 actually not even heating above room temperature at my other intel PC!
so you are saying the tesla card works normally in another pc without heat + memory problems?
 

sozkan

and just for testing i would place a fan to the tesla. you are not the only one with overheating tesla card

maybe its just dead on arrival


so you are saying the tesla card works normal in another pc whithout heat + memory problems?
From Gigabyte support: BIOS updated, and the result is more overheating:

I removed the Tesla T4 from the AMD motherboard and installed it in an Intel i9-based motherboard: there seems to be no overheating issue, but the full memory usage remains.
 

sozkan

Double-check VRAM usage in a CLI with this command:

Code:
"%ProgramFiles%\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
Nvidia.png


The test setup is an Intel-based system. Both readings were captured at the same time. According to "nvidia-smi.exe" the memory usage (86/15205) is low, but the TechPowerUp app shows 15359 MB (100%).
Our main computer, however, is an AMD Threadripper 3970X; the main problems there are the heat and the memory issue. I am really curious to see a different Tesla T4 on a similar system, to find out whether this is a conflict between the new-generation AMD platform and the Nvidia Tesla GPU!
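
To put the nvidia-smi reading next to the GPU-Z figure as a percentage, a minimal sketch follows. It assumes `nvidia-smi` is on the PATH (on Windows the full path from the command above can be passed instead) and that both fields come back as plain numbers; the function names are made up for illustration.

```python
import subprocess


def parse_memory_csv(line: str) -> tuple:
    """Parse one 'used, total' CSV row from nvidia-smi into two integers."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def percent_used(used: int, total: int) -> float:
    """Share of VRAM in use, in percent."""
    return 100.0 * used / total


def query_memory(exe: str = "nvidia-smi") -> list:
    """Return (used, total) per GPU, as reported by the driver."""
    out = subprocess.run(
        [exe, "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_memory_csv(row) for row in out.splitlines() if row.strip()]


# The reading quoted in this thread: 86 used out of 15205.
used, total = parse_memory_csv("86, 15205")
print(f"{percent_used(used, total):.2f}% used")  # 0.57% used
```

By that reading the card is well under 1% occupied, which points at the 100% in GPU-Z being a reporting problem rather than real usage.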
 
Joined
Jul 18, 2016
Messages
292 (0.21/day)
System Name Gaming PC / I7 XEON
Processor I7 4790K @stock / XEON W3680 @ stock
Motherboard Asus Z97 MAXIMUS VII FORMULA / GIGABYTE X58 UD7
Cooling X61 Kraken / X61 Kraken
Memory 32gb Vengeance 2133 Mhz / 24b Corsair XMS3 1600 Mhz
Video Card(s) Gainward GLH 1080 / MSI Gaming X Radeon RX480 8 GB
Storage Samsung EVO 850 500gb ,3 tb seagate, 2 samsung 1tb in raid 0 / Kingdian 240 gb, megaraid SAS 9341-8
Display(s) 2 BENQ 27" GL2706PQ / Dell UP2716D LCD Monitor 27 "
Case Corsair Graphite Series 780T / Corsair Obsidian 750 D
Audio Device(s) ON BOARD / ON BOARD
Power Supply Sapphire Pure 950w / Corsair RMI 750w
Mouse Steelseries Sesnsei / Steelseries Sensei raw
Keyboard Razer BlackWidow Chroma / Razer BlackWidow Chroma
Software Windows 1064bit PRO / Windows 1064bit PRO
i would not trust gpu-z much
 
Joined
Jan 8, 2017
Messages
4,841 (4.09/day)
System Name Good enough
Processor AMD Ryzen R7 1700X - 4.0 Ghz / 1.350V
Motherboard ASRock B450M Pro4
Cooling Scythe Katana 4 - 3x 120mm case fans
Memory 16GB - Corsair Vengeance LPX
Video Card(s) OEM Dell GTX 1080
Storage 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) 4K Samsung TV
Case Zalman R1
Power Supply 500W
TESLAS DO NOT HAVE ACTIVE COOLING.

They are passively cooled, and without modifying the cooling it's always going to overheat in a normal system; these things are designed for server cases with forced air intakes.

I can't believe no one pointed this out.

Take the shroud off and try and find a way to mount a fan on it, unless it's under warranty and you don't want to do that. Otherwise you have to come up with something else to try and force the air through the heatsink somehow. I've seen people try and make a "funnel" out of tape and put a fan at the end of it.
 
Joined
Sep 28, 2005
Messages
1,090 (0.21/day)
Location
Calgary Alberta, Canada
System Name PussySlayer
Processor Intel Core i7 4770
Motherboard Asrock Z87E-ITX
Cooling Some crappy Silverstone ITX Cooler
Memory 2x8GB Gskill RipjawsX 1600
Video Card(s) GTX 1070
Storage 1x500gb Crucial SSD
Display(s) BenQ 24" 1080P
Case Couger QBX
Audio Device(s) Onboard
Power Supply 650W EVGA BR - Coil Whine Issue
Software Windows 10 64bit Pro
The Level 20 has the front glass panel, right? That thing doesn't have good airflow and unfortunately, as the gentleman above me stated, the T4 is a fanless GPU. Also noticeable in the video.

So the poor thing is cooking as the airflow isn't the greatest.
 
Joined
Jun 2, 2017
Messages
2,220 (2.13/day)
System Name Best AMD Computer
Processor AMD TR4 1920X
Motherboard MSI X399 SLI Plus
Cooling Alphacool Eisbaer 420 x2 Noctua XPX Pro TR4 block
Memory Gskill RIpjaws 4 3000MHZ 48GB
Video Card(s) Sapphire Vega 64 Nitro, Gigabyte Vega 64 Gaming OC
Storage 6 x NVME 480 GB, 2 x SSD 2TB, 5TB HDD, 2 TB HDD, 2x 2TB SSHD
Display(s) Acer 49BQ0k 4K monitor
Case Thermaltake Core X9
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Corsair HX1200!
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 10 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 24955 Time Spy: 13500
The level 20 has the front glass panel, right? That thing is doesn't have good airflow and unfortunately as the gentleman above me stated, the T4 is a fanless gpu. Also noticeable in the video.

So the poor thing is cooking as the airflow isn't the greatest.
If the OP has a Level 20 he may want to change that to something like the CM 500 Mesh so that the components can get proper airflow. It would have been better (if they were still available) to use the Core X series.
 
Joined
Jan 8, 2017
Messages
4,841 (4.09/day)
That card won't be able to be cooled properly no matter how much airflow you throw at it; the air never goes through the heatsink like it should due to low pressure.
 
Joined
Sep 28, 2005
Messages
1,090 (0.21/day)
Location
Calgary Alberta, Canada
That card wont be able to be cooled properly no matter how much airflow you throw at it, the air never goes through the heatsink like it should due to low pressure.
Well, I guess the user could try to somehow attach a fan to blow directly through the fins from the back end blowing out towards the back plate. If that makes sense.

like this:



This here is a thread on the P4 which had the overheating issue:


If OP has a 3D printer, the link provides the G-code file needed to print it. If you can find someone who has one, that could also work. The P4 and T4 look to be the same size, so it should work, no?
 
Joined
Jul 18, 2016
Messages
292 (0.21/day)
nice solution
 

bug

Joined
May 22, 2015
Messages
7,293 (4.09/day)
Processor Intel i5-6600k (AMD Ryzen5 3600 in a box, waiting for a mobo)
Motherboard ASRock Z170 Extreme7+
Cooling Arctic Cooling Freezer i11
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V (@3200)
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 3TB Seagate
Display(s) HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
That card wont be able to be cooled properly no matter how much airflow you throw at it, the air never goes through the heatsink like it should due to low pressure.
Submerging it in water should solve any cooling issues :D
 
Joined
Aug 22, 2010
Messages
313 (0.09/day)
Location
Germany
System Name https://goo.gl/FDgehs
...According to "nvidia-smi.exe" Memory usage (86/15205) not much. But TechPowerUP app shows 15359MB (%100)...
I guess it's sth. like a buffer overflow in GPU-Z.
@W1zzard would have to take a look at that issue.

btw
Tesla driver has been updated today to version 442.50.
 

sozkan

Thank you very much for the support.
I partially agree about the passive cooling being a poor design (the cooling problem) and about the solutions.
But I have tested the card in one Intel and two AMD motherboards. The towers have similar cooling capabilities, and there is no workload!
- The Intel system does not overheat at idle and is still working after several hours, at no more than 45 °C.
- Both AMD motherboards overheat a lot. Within several seconds the card reaches 90 °C and the GPU turns off. The first one, the MSI board, did not even give a display signal!
I have contacted the GPU manufacturer; they have seen the material I shared and agreed to replace the faulty card. I will try to get a different model instead, in case it is an incompatibility issue!

I am coming to a conclusion, with these possibilities:
- It might be a faulty card that needs replacement. After the replacement, a cooling upgrade might be a good idea.
- The VRAM issue: "GPU-Z" shows the memory as fully occupied, whereas "nvidia-smi.exe" shows it is not in use. Whatever causes the full memory reading in "GPU-Z", if it is not a memory fault, it might be something else!
- AMD TRX40 Threadripper CPU versus Nvidia GPU: both are high-tech products from competing companies. There may be unmentioned conflicts or hidden, unknown incompatibilities!
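
To compare the Intel and AMD systems with numbers rather than impressions, a small temperature logger along these lines could be run on each machine. This is only a sketch: it assumes `nvidia-smi` is on the PATH, it reads the first GPU the tool lists, and it only works once Windows and the driver are up, so it cannot capture the overheating seen before the OS was installed. The function names and the 90 °C limit are illustrative.

```python
import subprocess
import time


def read_temp_c(exe: str = "nvidia-smi") -> int:
    """Read the temperature (°C) of the first GPU listed by nvidia-smi."""
    out = subprocess.run(
        [exe, "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0].strip())


def seconds_to_reach(samples, limit_c):
    """Return the first timestamp at which the temperature hit limit_c, else None.

    samples is a list of (seconds_since_start, temperature_c) pairs.
    """
    for seconds, temp_c in samples:
        if temp_c >= limit_c:
            return seconds
    return None


def log_temps(duration_s: float = 60, interval_s: float = 2) -> list:
    """Poll the GPU temperature for duration_s seconds and return the samples."""
    samples = []
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        samples.append((round(elapsed, 1), read_temp_c()))
        time.sleep(interval_s)
    return samples
```

Running `seconds_to_reach(log_temps(), 90)` at idle on both systems would show how quickly, if at all, each one reaches the shutdown range.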

I guess it's sth. like a buffer overflow in GPU-Z.
@W1zzard would have to take a look at that issue.

btw
Tesla driver has been updated today to version 442.50.
Thank you. Does it (a buffer overflow) mean a defect? But the card was overheating on the AMD motherboard even when no Windows was installed.
 
Joined
Sep 28, 2005
Messages
1,090 (0.21/day)
Location
Calgary Alberta, Canada
Well, give that a shot! If it works afterwards, then good! If not, then it is something else. Out of curiosity, when you remove drivers, you are running DDU right? Or try the NVidia driver program that is on here.

This one is more extreme, but did you try on a fully clean drive? Like a fresh install of Windows?

Other than that, if you do end up using it, you may end up with heat issues anyway later on.
 
 

sozkan

By the way, I am not sure by how much, but the AMD TRX40 Threadripper motherboard took noticeably longer than usual to boot into Windows with the "Nvidia Tesla T4" installed. When I removed it, startup was faster. I will try it again and share the result.

Well, give that a shot! If it works afterwards, then good! If not, then it is something else. Out of curiosity, when you remove drivers, you are running DDU right? Or try the NVidia driver program that is on here.

This one is more extreme, but you try on a fully clean drive? Like a fresh install of windows?

Other than that, if you do end up using it, you may end up with heat issues anyway later on.
This Windows installation is just 3-4 days old. For two weeks before that I was trying to find the problem. The heating issue was there before Windows was installed, because there was no display signal until the new gaming GPU (GTX 1660 Super) arrived!
 
Joined
Sep 28, 2005
Messages
1,090 (0.21/day)
Location
Calgary Alberta, Canada
I am not sure how two separate GPUs operate at the same time on this system or how the drivers were installed (sorry, I did not watch the whole video), so I don't know what you did there. There clearly is a conflict going on if the GPU is at full use at idle, thus making it overheat.

I am trying to do research on this but can't seem to find other examples of the same issue.

By the way I am not very sure, how long But, AMD TRX40 Threadripper MB was taking noticeable longer time than usual startup time to windows with Nvidia Tesla T4". when I removed It was faster. I will try it and share again.


It is just 3-4 days old Windows. Before that, since 2 weeks I was trying to find out problem. Heating issue was before the windows Because Display signal was not coming until New Gaming GPU (GTX 1660 Super) comes!
Well, give the GPU RMA a try. If the system works fine without the Tesla installed and with the other GPU left in, then who knows. If the RMA works then great! If not, then there is a conflict going on. As you said, PNY is offering an RMA. But I truly think there is a conflict going on between the TRX40 motherboard and the two GPUs together. I could be entirely wrong but this is what I think.
 

sozkan

I am not sure how two separate GPU's operate at same time on this system and how the drivers were installed (sorry, I did not watch the whole video) so I dont know what you did there. There clearly is a conflict going on that if the GPU is used at full use at idle thus making it overheat.

I am trying to do research on this but cant seem to find other examples of same issue.



Well, give the GPU RMA a try. If the system works fine without the GPU installed and leaving the other GPU in, then who knows. If RMA works then great! If not, then there is a conflict going on. As you said, PNY is offering RMA. But I truely think it is a conflict going on with the TR4 motherboard and the two GPU's together. I could be entirely wrong but this is what I think.
I will try the RMA. I am familiar with high-end gaming GPUs; I have even run AMD and Nvidia GPUs at the same time on the same motherboard. But these new parts are a first for me too.
At first we intended to use the Tesla T4 alone. I thought the Tesla T4 would drive a display through the Thunderbolt port (so the reseller told us), but we finally understood that it does not. So we got a cheaper second GPU for display purposes, and we use the Tesla as a processor for our CFD simulation.
However, the Tesla T4 heated up on the AMD motherboard even when it was the only card, on the first start!
 