
Nvidia Tesla T4 cards incorrectly report 100% memory utilization

bsee-ino

New Member
Joined
Dec 15, 2020
Messages
3 (0.00/day)
GPU-Z reports 100% memory utilization for Tesla T4 cards. Monitoring the same card with nvidia-smi reports the correct usage. Confirmed in GPU-Z v2.36.0 (latest). This is not a new issue.
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
26,956 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Interesting, any chance I could use Remote Desktop or Teamviewer to check out the problem and try a few debug builds?
 
Joined
Aug 22, 2010
Messages
749 (0.15/day)
Location
Germany
System Name Acer Nitro 5 (AN515-45-R715)
Processor AMD Ryzen 9 5900HX
Motherboard AMD Promontory / Bixby FCH
Cooling Acer Nitro Sense
Memory 32 GB
Video Card(s) AMD Radeon Graphics (Cezanne) / NVIDIA RTX 3080 Laptop GPU
Storage WDC PC SN530 SDBPNPZ
Display(s) BOE CQ NE156QHM-NY3
Software Windows 11 beta channel
Same issue as mentioned here:
 

bsee-ino

New Member
Joined
Dec 15, 2020
Messages
3 (0.00/day)
I can't give a remote session, sorry. This is being used for a business. However, the thread StefanM linked describes exactly the same problem. This server is a Gigabyte, just like the motherboard in that thread. I'm not sure how GPU-Z queries the memory usage, so I don't know whether the motherboard is relevant at all. The issue occurred on multiple servers using both AMD and Intel CPUs.
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
26,956 (3.71/day)
Oh, this is VRAM usage. Sorry, I assumed you meant "Memory Controller Load".

Looks like an overflow indeed, let me test on other cards with around 16 GB memory or more.

Edit: tested on RTX 3090 (24 GB) and RX 6800 XT (16 GB), and it works for me.

Which Windows version do you use? If Windows 10, which build?
 
Joined
Aug 22, 2010
Messages
749 (0.15/day)
Location
Germany
You can also double-check with Task Manager -> Performance -> GPU.

 

bsee-ino

New Member
Joined
Dec 15, 2020
Messages
3 (0.00/day)
I'm running Windows Server 2019, build 1809. This was also an issue in whatever version came before 1809.
gpuz memory issue.PNG
 

theguero

New Member
Joined
Jan 13, 2021
Messages
1 (0.00/day)
I can confirm this issue. I installed a T4 in my Supermicro X10DRLI-I motherboard today, and the memory usage constantly shows 15360 MB.
 
Joined
Apr 7, 2021
Messages
3 (0.00/day)
Location
Greater Seattle Area
System Name Deep Learning Rig
Processor Xeon Platinum 8124M
Motherboard Asrock EPC621D8A
Cooling Air
Memory 128 GB
Video Card(s) RTX 3090, Tesla M40 12GB (manually controlled rear mounted blower)
Storage 8TB NVME, 16TB HDD, 500GB SSD Boot
Case CM HAF X
I second this issue, although I am running a Tesla M40 12 GB with an RX 480 for display output. nvidia-smi reports the correct memory usage, but GPU-Z 2.38 reports 11519 MB of VRAM usage from startup. The M40 is not recognized by Task Manager, CPUID HWMonitor, or CPU-Z, and Afterburner displays 11520 MB usage from startup.
W1zzard said:
Interesting, any chance I could use Remote Desktop or Teamviewer to check out the problem and try a few debug builds?
I am fine with having a Teamviewer session to try debug builds.
 
Joined
Apr 7, 2021
Messages
3 (0.00/day)
Location
Greater Seattle Area
After some experimentation, I believe I have pinpointed the cause of the issue. By default, the NVIDIA driver runs the card in TCC mode instead of WDDM mode. If I switch it to WDDM using nvidia-smi.exe -i 0 -dm 0, GPU-Z displays the correct memory usage as expected.
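For anyone who wants to check which driver model a card is currently in before comparing readings, here is a minimal Python sketch that wraps nvidia-smi's CSV query mode. The helper names (`build_query_command`, `parse_query_output`, `query_gpus`) are made up for illustration, and the sample line in the usage note is invented data, not output captured from any machine in this thread. As far as I know, switching the driver model with -dm needs an elevated prompt and a reboot before it takes effect.

```python
import csv
import io
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`; driver_model.current is only
# meaningful on Windows (it reports WDDM or TCC there).
QUERY_FIELDS = ["index", "name", "driver_model.current", "memory.used", "memory.total"]


def build_query_command(executable="nvidia-smi"):
    """Build the nvidia-smi command line for a headerless, unitless CSV query."""
    return [
        executable,
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]


def parse_query_output(text):
    """Turn the CSV text printed by the query into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, name, driver_model, used, total = (field.strip() for field in record)
        rows.append({
            "index": int(index),
            "name": name,
            "driver_model": driver_model,
            "used_mib": int(used),    # MiB, because of the nounits format option
            "total_mib": int(total),
        })
    return rows


def query_gpus(executable="nvidia-smi"):
    """Run nvidia-smi and return the parsed rows (needs the NVIDIA driver installed)."""
    result = subprocess.run(
        build_query_command(executable),
        capture_output=True, text=True, check=True,
    )
    return parse_query_output(result.stdout)
```

Usage: `query_gpus()` on a machine with the driver installed, or `parse_query_output("0, Tesla T4, TCC, 0, 15360\n")` to try the parser on a hand-written line.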

With WDDM:


With TCC:

 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
26,956 (3.71/day)
Thanks to @fffffgggg54 I now understand the issue.

The NVIDIA driver function that I'm using to get the available VRAM size does not work in TCC mode. nvidia-smi, which uses NVML, obviously works, so now I'll try to figure out how NVML gets the VRAM usage and use that mechanism in GPU-Z.
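For reference, this is roughly what reading VRAM usage through NVML looks like from Python. It is only a sketch: it assumes the third-party pynvml bindings (the nvidia-ml-py package) are installed, and it says nothing about how GPU-Z itself will do it. `VramReading` and `reading_from_bytes` are illustrative names, not part of NVML.

```python
from dataclasses import dataclass


@dataclass
class VramReading:
    """VRAM totals in MiB, converted from the byte counts NVML returns."""
    total_mib: int
    used_mib: int

    @property
    def used_percent(self) -> float:
        return 100.0 * self.used_mib / self.total_mib


def reading_from_bytes(total_bytes: int, used_bytes: int) -> VramReading:
    """Convert byte counts to whole MiB."""
    mib = 1024 * 1024
    return VramReading(total_mib=total_bytes // mib, used_mib=used_bytes // mib)


def read_vram_via_nvml(index: int = 0) -> VramReading:
    """Query one GPU through NVML; requires the NVIDIA driver and pynvml."""
    import pynvml  # third-party bindings, imported lazily on purpose

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # fields: total, free, used (bytes)
        return reading_from_bytes(info.total, info.used)
    finally:
        pynvml.nvmlShutdown()
```

The conversion helper is kept separate from the NVML call so it can be checked on a machine without a GPU.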

For the next GPU-Z release I'll disable the VRAM usage sensor on all cards in TCC mode until a solution is found.
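One way to picture both the symptom and the interim fix, as a sketch only: assume the sensor computes usage as total minus available (the thread does not say how GPU-Z actually derives it). If the "available" query fails in TCC mode and comes back as 0, usage equals the full total, which would match the constant 15360 MB and the 100% reading reported above. The function names below are hypothetical.

```python
from typing import Optional


def used_from_available(total_mib: int, available_mib: int) -> int:
    """Assumed derivation: used = total - available."""
    return total_mib - available_mib


def vram_usage_sensor(total_mib: int, available_mib: int, driver_model: str) -> Optional[int]:
    """Return used VRAM in MiB, or None when the reading cannot be trusted.

    Mirrors the interim plan of hiding the sensor for cards in TCC mode.
    """
    if driver_model.upper() == "TCC":
        return None  # sensor disabled rather than showing a misleading value
    return used_from_available(total_mib, available_mib)


# If the failed query yields 0 MiB available, the naive formula reports the card as full.
print(used_from_available(15360, 0))
# With the guard in place, a TCC card reports no value instead.
print(vram_usage_sensor(15360, 0, "TCC"))
```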
 