
Bug affecting all Nvidia GPUs - Nvidia won't respond - we need your help!

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
So, I've found a bug, I've reported it to nvidia, and it landed with a helpdesk noob who didn't understand it, and is now stuck in his queue as he's gotten butthurt and refuses to look at it. Yes, seriously.

We're going to need the community, to force nvidia to pay attention to this intentionally 'lost' case. Sadly, the community seems not to actually care..... How about techpowerup? Little help?

Here's a link to an illustration of the bug in effect and steps to (visibly) reproduce it yourself:
*EDIT: One very important detail is missing from this link: the card used for the attached tests is a 3090. This matters in combination with the suggested resolution, because the intention is to produce an extreme framerate (>250FPS); if you run this test on a different card, a lower resolution will be required.

You can create your own case with nvidia, or if you like you can tell them to look at mine. Same name as here, they can find it.

Thanks in advance for your help.
 

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
13,909 (2.42/day)
Location
Louisiana -Laissez les bons temps rouler!
System Name Bayou Phantom
Processor Core i7-8700k 4.4Ghz @ 1.18v
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax T40F Black CPU cooler
Memory 2x 16GB Mushkin Redline DDR-4 3200
Video Card(s) EVGA RTX 2080 Ti Xc
Storage 1x 500 MX500 SSD; 2x 6TB WD Black; 1x 4TB WD Black; 1x400GB VelRptr; 1x 4TB WD Blue storage (eSATA)
Display(s) HP 27q 27" IPS @ 2560 x 1440
Case Fractal Design Define R4 Black w/Titanium front -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic X-850
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
Sorry man, I’m not having any trouble and I definitely don’t see legions of people here complaining about mysterious Nvidia problems you don’t truly identify.

If you are too lazy to identify the problem in writing then I’m too lazy to decipher your image.
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
You get a better response if you post this in the Reddit/Nvidia forums.
Went to the nvidia forums after two weeks of no response from tech support, no response there, went to reddit a week later, posts deleted, a week later I'm here.

Sorry man, I’m not having any trouble and I definitely don’t see legions of people here complaining about mysterious Nvidia problems you don’t truly identify.

If you are too lazy to identify the problem in writing then I’m too lazy to decipher your image.
Follow the link by clicking the image. The one I labelled "Here's a link to an illustration of the bug in effect and steps to (visibly) reproduce it yourself: "
 
Joined
Nov 8, 2020
Messages
474 (0.38/day)
System Name Dusty
Processor 5900x
Motherboard MSI B550 Tomahawk
Cooling Noctua NH-D15
Memory Corsair Vengence LPX 32GB
Video Card(s) MSI RTX 3070 Gaming X
Storage yes
Case Fractal Design Define R6
Power Supply EVGA SuperNOVA 750w
VR HMD Oculus CV1
Went to the nvidia forums after two weeks of no response from tech support, no response there, went to reddit a week later, posts deleted, a week later I'm here.


Follow the link by clicking the image. The one I labelled "Here's a link to an illustration of the bug in effect and steps to (visibly) reproduce it yourself: "

I did, and I found no problems at all. On two cards, the 3070 and the 1050ti in my laptop.
Though, as it mentions, it only occurs with some monitoring software, so the question is: is the issue down to their implementation being less than optimal, rather than a horrifying bug?
Either way, no problems here.

The picture itself is pretty much worthless either way without better resolution on the scale, there are always variances in framerate and frametimes. But I found no variances that occur regularly, as they would in that case.
 

newtekie1

Semi-Retired Folder
Joined
Nov 22, 2005
Messages
28,472 (4.24/day)
Location
Indiana, USA
Processor Intel Core i7 10850K@5.2GHz
Motherboard AsRock Z470 Taichi
Cooling Corsair H115i Pro w/ Noctua NF-A14 Fans
Memory 32GB DDR4-3600
Video Card(s) RTX 2070 Super
Storage 500GB SX8200 Pro + 8TB with 1TB SSD Cache
Display(s) Acer Nitro VG280K 4K 28"
Case Fractal Design Define S
Audio Device(s) Onboard is good enough for me
Power Supply eVGA SuperNOVA 1000w G3
Software Windows 10 Pro x64
The picture itself is pretty much worthless either way without better resolution on the scale, there are always variances in framerate and frametimes. But I found no variances that occur regularly, as they would in that case.
And polling the GPU will cause data to be sent over the PCI-E bus, which can cause a very minor frametime spike. Sometimes this can't be avoided and it's usually so small it won't be noticeable(I know I've never noticed it).
 

johnspack

Here For Good!
Joined
Oct 6, 2007
Messages
5,980 (0.99/day)
Location
Nelson B.C. Canada
System Name System2 Blacknet , System1 Blacknet2
Processor System2 Threadripper 1920x, System1 2699 v3
Motherboard System2 Asrock Fatality x399 Professional Gaming, System1 Asus X99-A
Cooling System2 Noctua NH-U14 TR4-SP3 Dual 140mm fans, System1 AIO
Memory System2 64GBS DDR4 3000, System1 32gbs DDR4 2400
Video Card(s) System2 GTX 980Ti System1 GTX 970
Storage System2 4x SSDs + NVme= 2.250TB 2xStorage Drives=8TB System1 3x SSDs=2TB
Display(s) 2x 24" 1080 displays
Case System2 Some Nzxt case with soundproofing...
Audio Device(s) Asus Xonar U7 MKII
Power Supply System2 EVGA 750 Watt, System1 XFX XTR 750 Watt
Mouse Logitech G900 Chaos Spectrum
Keyboard Ducky
Software Manjaro, Windows 10, Kubuntu 23.10
Benchmark Scores It's linux baby!
I'll call this close to flamebait, but maybe he really believes it. I wouldn't put much credence in this.
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
25,866 (3.79/day)
Location
Alabama
System Name Rocinante
Processor I9 14900KS
Motherboard EVGA z690 Dark KINGPIN (modded BIOS)
Cooling EK-AIO Elite 360 D-RGB
Memory 64GB Gskill Trident Z5 DDR5 6000 @6400
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 500GB 980 Pro | 1x 1TB 980 Pro | 1x 8TB Corsair MP400
Display(s) Odyssey OLED G9 G95SC
Case Lian Li o11 Evo Dynamic White
Audio Device(s) Moondrop S8's on Schiit Hel 2e
Power Supply Bequiet! Power Pro 12 1500w
Mouse Lamzu Atlantis mini (White)
Keyboard Monsgeek M3 Lavender, Akko Crystal Blues
VR HMD Quest 3
Software Windows 11
Benchmark Scores I dont have time for that.
And polling the GPU will cause data to be sent over the PCI-E bus

further polling most things increases load in some way. try spamming the shit out of like a thermistor on an I2C bus.

measure 0 or the temperature of the sun.
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
Sorry man, I’m not having any trouble and I definitely don’t see legions of people here complaining about mysterious Nvidia problems you don’t truly identify.

If you are too lazy to identify the problem in writing then I’m too lazy to decipher your image.
Regarding this: It's not something you'll notice unless you're seriously digging deep to tune your performance, or you're running really strange loads in really unusual ways (see the process to reproduce the bug: who plays games at 900p on a 3090? These kinds of strange conditions are what's required to make this visible to the naked eye). Default settings such as pre-rendering queue depths will ensure that this bug is hidden from view, but it is still impacting your performance. It's just that instead of stuttering in a visual way you can see on-screen or in a graph, it stutters elsewhere in your system, like keyboard inputs or network traffic or something fun.

This is why, even though it's affecting literally every card that's been tested, I'm the only one (as far as I know) who's noticed it. It's not obvious, to put it lightly. At least, not under normal conditions. I personally noticed it because I was messing with some frame synchronisation that required millisecond-accurate, extremely low frametimes with a single frame pre-render. I've given instructions that will reproduce it reliably in a way that's easy to see on a frametime plot.


I did, and I found no problems at all. On two cards, the 3070 and the 1050ti in my laptop.
Though, as it mentions, it only occurs with some monitoring software, so the question is: is the issue down to their implementation being less than optimal, rather than a horrifying bug?
Either way, no problems here.

The picture itself is pretty much worthless either way without better resolution on the scale, there are always variances in framerate and frametimes. But I found no variances that occur regularly, as they would in that case.
You're the first of 20 PCs not to see any issue, but the rest of your post makes me wonder if your test platform is valid. You say "there are always variances in frametimes", but take a look at my graph on the right. As explained in the text there, I used a frametime limiter to accentuate this. Maybe you will want to as well, but it isn't needed to observe this fault (you'll need a sharper eye though). Of course, this demonstration assumes you can maintain stable frametimes in the first place; obviously we can't test a frametime-related issue otherwise.

The graph I've shown is more than enough to illustrate the issue even with that resolution - because the issue is so blatantly apparent. I can grab you higher res images if you like though.

The question regarding the monitoring apps is valid. I can see in traces that a specific Nvidia API call is the one taking an exceedingly long time, and because this is not unique to a specific app, I'm going upstream to the first common point. If there's a faulty API implementation then nvidia will want to issue an advisory to developers as such.
And polling the GPU will cause data to be sent over the PCI-E bus, which can cause a very minor frametime spike. Sometimes this can't be avoided and it's usually so small it won't be noticeable(I know I've never noticed it).
Which this isn't, as traces will show. The Nvidia techs will get all that, just as soon as they actually look at this.

I'll call this close to flamebait, but maybe he really believes it. I wouldn't put much credence in this.
Test it as described and you will believe it too. You really think I've spent the past month having people call me a liar because they wouldn't even look, for my benefit? The only person getting flamed over this, is me.

Edit: Your signature applies here.
further polling most things increases load in some way. try spamming the shit out of like a thermistor on an I2C bus.
I can slow polling to every 10 seconds and it will still spike. I can copy every frame down the PCI bus and back up again every time and not generate enough load to even reach 1/10th of this spike. This isn't excessive bus traffic or normal behaviour when polling.
 
Joined
Nov 8, 2020
Messages
474 (0.38/day)
frametime.PNG


Hwinfo running and monitoring everything, horrible frametimes for sure.

Spike was a printscreen which I later realized was only for heaven so I had to snip it.
Smaller variances are of no concern considering all the stuff I got running in the background but would you look at that. No regular issues at all.

Point is, the problem might exist but I get the feeling you're overstating its severity.
 
Joined
Nov 11, 2016
Messages
3,065 (1.13/day)
System Name The de-ploughminator Mk-II
Processor i7 13700KF
Motherboard MSI Z790 Carbon
Cooling ID-Cooling SE-226-XT + Phanteks T30
Memory 2x16GB G.Skill DDR5 7200Cas34
Video Card(s) Asus RTX4090 TUF
Storage Kingston KC3000 2TB NVME
Display(s) LG OLED CX48"
Case Corsair 5000D Air
Power Supply Corsair HX850
Mouse Razor Viper Ultimate
Keyboard Corsair K75
Software win11
Easy solution to easy problem, just set max FPS, solves 99% of all frametime issues.
When the GPU pipeline is getting 100% hammered, any polling will cause slight stutter, even when moving your mouse. Nvidia already knew about this; that's why they created the Reflex API, which basically limits the GPU pipeline to 98% load, leaving the last 2% for mouse input latency reduction or hardware polling.
Another solution is using "Prefer Maximum performance" in the Power Management Mode in NVCP, which keeps GPU clocks high so that the GPU pipeline is free.
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
View attachment 204532

Hwinfo running and monitoring everything, horrible frametimes for sure.

Spike was a printscreen which I later realized was only for heaven so I had to snip it.
Smaller variances are of no concern considering all the stuff I got running in the background but would you look at that. No regular issues at all.

Point is, the problem might exist but I get the feeling you're overstating its severity.
If you want to reproduce the fault in a manner you can easily view in a frametime graph, please follow my instructions. If you follow some other process, as you have, I can't guarantee that it will work.

Easy solution to easy problem, just set max FPS, solves 99% of all frametime issues.
When the GPU pipeline is getting 100% hammered, any polling will cause slight stutter, even when moving your mouse. Nvidia already knew about this; that's why they created the Reflex API, which basically limits the GPU pipeline to 98% load, leaving the last 2% for mouse input latency reduction or hardware polling.
Another solution is using "Prefer Maximum performance" in the Power Management Mode in NVCP, which keeps GPU clocks high so that the GPU pipeline is free.
The frametimes are just a symptom of the issue. The aim here is not to achieve stable frametimes, it is to fix the driver. I don't have any desire to sweep this under the rug.

The GPU pipeline is not getting 100% hammered. In my process you will find it at 17%. Polling the GPU does not cause stutter in other scenarios. This isn't a utilisation issue.
 
Joined
Nov 8, 2020
Messages
474 (0.38/day)
If you want to reproduce the fault in a manner you can easily view in a frametime graph, please follow my instructions. If you follow some other process, as you have, I can't guarantee that it will work.
I did follow your process, but at this point I'm starting to think it won't matter what anyone does because you will refuse to believe any of it.
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
Guys, please. I've been trying to fix YOUR GPU for the past month. Do me a favour: If you don't want to perform my test, don't. If you don't want to contact nvidia, don't. But please, pretty please, I am SO tired of explaining the same things over and over. I've been through all of this on several forums now and it's the same every time... If you're not going to do my test as instructed, and if you're not a developer who would understand it anyway, and if you're not willing to call nvidia regardless.... please, just step away. That's all I ask of you. Thanks.

Edit: Don't get me wrong, I'm down to spend all day explaining it to people who want to understand, I just have zero inclination toward arguments.
I did follow your process, but at this point I'm starting to think it won't matter what anyone does because you will refuse to believe any of it.
No, you didn't. You prove it in your screenshot.
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,032 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Gpuz not affected?
 
Joined
Nov 8, 2020
Messages
474 (0.38/day)
No, you didn't. You prove it in your screenshot.
Yes, I did.
First you complain I didn't limit it, even though your post says that would not be needed but makes it easier. Even then I had no issues.
So then I did it again, this time with a limiter as you suggested, and that's what I got, and that's still apparently not correct.
A nice smooth flat frametime graph, even when running hwinfo, which as you suggested would show frametime spikes.

As you say in the post yourself
Here, Heaven is configured to run lowest settings and 1600x900 resolution, to ensure a very high frame rate/low frame times, which will ensure that the fault is easily visible in the graph. The fault continues at any resolution or framerate. In this case, I have applied a frametime limiter, in order to get a flat line which will accentuate the frametime spike from the glitch. You do not need to use a frametime limiter, but it will make the bug more apparent in the graph.

Which is exactly what I did, in this case with Hwinfo64 running on all sensors, which according to you should cause these regular spikes in frametimes.

Otherwise it's you who has not explained it properly, because I see no issues at all running Hwinfo64 while I run Heaven.

Else, please confirm that hwinfo64 + Heaven + Limiter should not produce spikes? Because your post says otherwise, claiming that no matter what framerate or what resolution the spikes will occur.
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
No, you didn't. You prove it in your screenshot.
F***, my apologies. There is a minor omission in that doc (I've pasted the wrong draft like an idiot) that has a major impact. I mentioned the resolution but not that these tests were done on a 3090 (I mention it earlier in the thread but not on that page). If your card is weaker than that, then since the process doesn't specify the card like I thought it did, it's possible you followed it in every other respect. Mea culpa. I blew it.

I did however mention you're going to need extremely high frame rates and 120 ain't that. Try to double or triple it by reducing the resolution.
Otherwise it's you who has not explained it properly

It's exactly that, I am sorry. I've had SO many people fail to follow the process (usually followed by abusing me which is great fun) I tarred you with their brush. Jerk move. My bad.

Else, please confirm that hwinfo64 + Heaven + Limiter should not produce spikes? Because your post says otherwise, claiming that no matter what framerate or what resolution the spikes will occur.
The delays will occur but you may not see them. Since I've wasted a ton of your time I owe you at least a proper explanation. I'll try and keep it plain-english-y.

What happens here is that the monitoring call, which should be done in microseconds, takes several milliseconds. This is CPU time, not GPU time. FWIW, an earlier experiment showed me that this is extremely memory-speed critical. Taking my memory down from the usual 3800 flat 16s to 2133 with stock timings made this issue extremely drastic and noticeable. Delays in the memory pipeline appear as delays in the CPU pipeline at a higher level of monitoring (because it's the CPU that's waiting on the data from RAM). So, given that at this point the frame is being rendered by the CPU in order to take its place at the end of a queue behind two other frames, which have to be processed and displayed before the one that was delayed, that delay is eaten up by the buffer and you don't see it - BUT IT STILL ATE YOUR CPU IN THERE. IT JUST HID THE EVIDENCE. < Caps because this is super important, otherwise it would be a non-issue, right?

So, we have the need to force a CPU-limited scenario, in order to see the CPU's behaviour. And we want to avoid a pre-rendering queue hiding the mess, right? So how? Frames, all of the frames. By reaching a massive framerate we have ensured the GPU is lightly loaded (or it wouldn't be able to get those frames - we're not trying to induce a GPU-limited scenario here, so not too high!) and will attempt to render the frames at full tilt, thus loading up the CPU by emptying the prerender queue quickly, the loaded CPU exacerbating the issue, and the short queue exposing it.

SOOOO you need like 250+ FPS to see it. 300+ is recommended. The more, the better - so long as the system can realistically handle that load. This is why I went to the trouble of specifying 1600x900 in heaven, because at that res, with the 3090, you'll just be able to see it. So, aside from the fact I'd typed that stuff like a dozen times and didn't realise that this time I forgot to mention what card it was like an idiot..... Now you understand why nobody sees it. It's buried and hidden by mechanisms that are supposed to do exactly that, to give us smooth frametimes and high framerates. And this is why I'm being a stickler about following the process (which I screwed up and I apologise again) because if one does not (which sadly appears to be almost everyone almost all of the time) then you very easily end up in a scenario where you are not within the parameters where the bug is visible to you, for example by running 120FPS where your frametimes are too high to see it and too high to get the CPU mad, and you could probably easily run too low res and choke your system entirely and not see anything.
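
To make the queue argument above concrete, here is a toy model in Python. It is only a sketch under assumptions that are not taken from the driver or from any trace: a fixed per-frame CPU cost, a fixed per-frame GPU cost, a single CPU stall standing in for the slow monitoring call, and a render-ahead queue of `depth` frames. The function name `simulate` and all the numbers are made up for illustration.

```python
def simulate(cpu_ms, gpu_ms, stall_ms, stall_frame, depth, frames=20):
    """Return presented frame times (ms) for a toy CPU -> queue -> GPU pipeline.

    Hypothetical model: the CPU may run at most `depth` frames ahead of the GPU,
    and one frame (`stall_frame`) costs an extra `stall_ms` of CPU time.
    """
    cpu_done = []  # time at which the CPU finished preparing frame i
    gpu_done = []  # time at which the GPU finished (presented) frame i
    for i in range(frames):
        cost = cpu_ms + (stall_ms if i == stall_frame else 0.0)
        prev_cpu = cpu_done[i - 1] if i > 0 else 0.0
        # The CPU has to wait for a free queue slot before starting frame i.
        slot_free = gpu_done[i - depth] if i >= depth else 0.0
        cpu_done.append(max(prev_cpu, slot_free) + cost)
        prev_gpu = gpu_done[i - 1] if i > 0 else 0.0
        # The GPU starts frame i once it is idle and the frame is ready.
        gpu_done.append(max(prev_gpu, cpu_done[i]) + gpu_ms)
    return [gpu_done[i] - gpu_done[i - 1] for i in range(1, frames)]


# GPU-limited: 2 ms CPU, 5 ms GPU, 3-deep queue, one 6 ms CPU stall.
gpu_limited = simulate(2.0, 5.0, 6.0, stall_frame=10, depth=3)
# CPU-limited: 2 ms CPU, 1 ms GPU, same queue and same stall.
cpu_limited = simulate(2.0, 1.0, 6.0, stall_frame=10, depth=3)

print("GPU-limited worst frame:", max(gpu_limited), "ms")
print("CPU-limited worst frame:", max(cpu_limited), "ms")
```

In this toy setup the GPU-limited run stays flat at 5 ms per frame because the queue absorbs the stall, while the CPU-limited run shows one 8 ms frame among 2 ms frames. The CPU time is spent in both cases; only the second configuration lets you see it on a frametime plot.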

BTW, nvidia's got the right info including the card type and much, much more than I've shared here, so that's not why they can't see it.
 
Joined
Nov 8, 2020
Messages
474 (0.38/day)
That makes a lot more sense and you should have started with that information!

my 8700k is however unable to push that framerate in Heaven so I can't look in to it at those framerates. What CPU did you use when you tested this? Or other hardware in general for the system?
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
Adding to the above because this has come up before: Not EVERY load makes this happen. I don't know why; I'd like to ask nvidia. I can tell you a real-world load that does: Battlefield 1. That's the game that made me notice this. But if some other 300FPS load doesn't show it, that doesn't mean "your bug doesn't exist": I chose heaven because I know it's repeatable there. Some other load may not be. Even using BF1 as an example, it causes the problem, but the frametimes are too unstable (other than the menus, but there's a reason I didn't suggest that) so it's really not useful. A handful of friends and I have tested it across a bunch of systems and it always works. That's why I'm specifying heaven, because it's repeatable. I did try other loads (mostly benchmarks, because this is for reproduction at the lab and they might not have <insert game here>) and heaven was the best one.
That makes a lot more sense and you should have started with that information!

my 8700k is however unable to push that framerate in Heaven so I can't look in to it at those framerates. What CPU did you use when you tested this? Or other hardware in general for the system?
I should have! I thought I did! I'm honestly so sorry man. I typed it so so many times having my posts deleted and trying different places, I just thought it was in there like usual, and clearly, it isn't. I blew it.

Yeh again it's a hard bug to reproduce for many reasons and yeh one is because the hardware requirements are rough. I did manage to get a 1070+5820k to do it but that thing is tuned to the nines (it's my old gaming rig). A mate did it on a 2070 (sorry I don't know what CPU it was I wanna say 9900k)... So other GPUs can do it. CPU is probably the tricky one, because it's a matter of getting it to be loaded but not too loaded (as described above). Your 8700 probably can hit some lower-than-250 framerate that will successfully expose the spikes but there's a fair amount of work in finding that (not-so-)sweet spot. This one is a 5900x with chart-topping benchmarks pushing 3800 16-16-16-34 ram and the elusive 3090 and even still it took me months to pin it down.

It really is hard to spot. Honestly that's part of the reason it's a concerning bug, because it's the kind of thing that gets missed, and stays in the drivers forever making a tiny but entirely unnecessary dent in performance. You need such high end hardware to be able to see it, that it's super easy to hide it under the performance of the thing; or you don't have that hardware and you don't ever see it.... This bug is trying hard to last forever.

Awaiting approval before being displayed publicly.
Uhh what?

1624084553571.png


Since it's been suggested a higher resolution might be useful, here it is: 100ms sample rate, vertical scale set to 16.7ms aka 60FPS. This means every horizontal grey line is 1.6ms. Heaven is set to free roam mode so the frametimes should be more stable than usual heaven, however I'm doing it with my browser, discord, etc open so there are a few spikes. The first section is just standing in heaven. Then I alt-tab out and start hwinfo64. Look at those spikes. That's dipping from 450+ FPS to 120. Then I exit hwinfo. Nice and flat again (except for those two spikes, that's discord doing something and beeping at me. I don't think this requires I re-do it, after all it's pretty obvious it's not the same as that middle section.)
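
For reference, the FPS figures quoted for that graph convert to frame times as follows. This is plain arithmetic (frame time in ms = 1000 / FPS); the helper name `frametime_ms` is just made up for the example.

```python
def frametime_ms(fps: float) -> float:
    """Convert a frame rate in FPS to a per-frame time in milliseconds."""
    return 1000.0 / fps


baseline = frametime_ms(450)  # flat sections of the graph
spike = frametime_ms(120)     # bottom of the dips while hwinfo64 is running

print(f"baseline {baseline:.2f} ms, spike {spike:.2f} ms, "
      f"extra {spike - baseline:.2f} ms")
print(f"250 FPS = {frametime_ms(250):.2f} ms, 60 FPS = {frametime_ms(60):.2f} ms")
```

Assuming the plotted value reflects a single frame, a dip from 450 FPS to 120 FPS is roughly 6 ms of extra time on that frame, which is in line with the "several milliseconds" described earlier for the monitoring call.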

Gpuz not affected?
Awaiting approval before being displayed publicly.
Normally I'd assume it's because I'm new and it was automated and I would have to wait for an admin to see it, but since you've been by already and this has been public already, I'm wondering if the thread was hidden manually?
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,032 (3.71/day)
What happens here is that the monitoring call which should be done in microseconds, takes several milliseconds. This is CPU time, not GPU time.
It's probably waiting for some kind of lock, not uncommon, especially if the I2C bus is involved. Only NVIDIA can fix it, maybe they already have the fix ready and are just waiting for verification, or the right driver release window. Or they are too busy with higher priority issues

I'm wondering if the thread was hidden manually?
You made some changes to your post which triggered the spam detection for new users, so the thread went to an "approval queue"
 
Joined
Feb 20, 2020
Messages
9,340 (6.14/day)
Location
Louisiana
System Name Ghetto Rigs z490|x99|Acer 17 Nitro 7840hs/ 5600c40-2x16/ 4060/ 1tb acer stock m.2/ 4tb sn850x
Processor 10900k w/Optimus Foundation | 5930k w/Black Noctua D15
Motherboard z490 Maximus XII Apex | x99 Sabertooth
Cooling oCool D5 res-combo/280 GTX/ Optimus Foundation/ gpu water block | Blk D15
Memory Trident-Z Royal 4000c16 2x16gb | Trident-Z 3200c14 4x8gb
Video Card(s) Titan Xp-water | evga 980ti gaming-w/ air
Storage 970evo+500gb & sn850x 4tb | 860 pro 256gb | Acer m.2 1tb/ sn850x 4tb| Many2.5" sata's ssd 3.5hdd's
Display(s) 1-AOC G2460PG 24"G-Sync 144Hz/ 2nd 1-ASUS VG248QE 24"/ 3rd LG 43" series
Case D450 | Cherry Entertainment center on Test bench
Audio Device(s) Built in Realtek x2 with 2-Insignia 2.0 sound bars & 1-LG sound bar
Power Supply EVGA 1000P2 with APC AX1500 | 850P2 with CyberPower-GX1325U
Mouse Redragon 901 Perdition x3
Keyboard G710+x3
Software Win-7 pro x3 and win-10 & 11pro x3
Benchmark Scores Are in the benchmark section
Hi,
Is this thread's title accurate, are all GPUs affected, or is it just the 30 series?
OP didn't list all the GPUs tested, only mentioned the 3090.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
From the description and details - is this a GPU problem or an API/driver problem? Seems to be the latter, maybe. This is admitted to be a CPU-limited scenario, the CPU is causing the bump and RAM speed greatly affects things. Also, regular checks that cause CPU load for monitoring purposes are by themselves unavoidable, and a 100ms frequency is rather intensive as well.

Configuration of monitoring software surely plays into this as well if this is CPU load. Did you disable any other meters from monitoring and it still happens? Are you graphing on screen?
You mention that this does not happen with all monitoring software, you name Libre Hardware Monitor as one that does not. Is that really the case?
By the way, is the same thing replicable in some other OS, Linux for example?
 

xcasxcursex

New Member
Joined
Jun 19, 2021
Messages
23 (0.02/day)
Hi,
Is this thread's title accurate, are all GPUs affected, or is it just the 30 series?
OP didn't list all the GPUs tested, only mentioned the 3090.
It does it on all cards, but because it's difficult to observe, it's a lot easier to see on the faster cards. As above, CPU actually ends up being important, too, since the GPU could just have the resolution lowered to reach high framerates, but the CPU may not be able to keep up.

From the description and details - is this a GPU problem or an API/driver problem? Seems to be the latter, maybe. This is admitted to be a CPU-limited scenario, the CPU is causing the bump and RAM speed greatly affects things. Also, regular checks that cause CPU load for monitoring purposes are by themselves unavoidable, and a 100ms frequency is rather intensive as well.

Configuration of monitoring software surely plays into this as well if this is CPU load. Did you disable any other meters from monitoring and it still happens? Are you graphing on screen?
You mention that this does not happen with all monitoring software, you name Libre Hardware Monitor as one that does not. Is that really the case?
By the way, is the same thing replicable in some other OS, Linux for example?
Well analysed, yes, this is an NVAPI issue as best I can tell. It's tough because the apps which are affected are closed source, so there's a limit to what I can see. There's inevitably a point in this where my only answers are "I don't know, and I'd like to ask nvidia". You're right, the 100ms is extreme, I only do that to record the images for the purpose of proving this is a thing. Normally it's at the default 1000. Note that this is MSI afterburner's poll rate, but the app which is causing the spikes is hwinfo, and the poll rate on that is 1 second (as visible by the giant spike every 1 second in the graphs). I actually tested it at 2, 5 and 10 seconds to see if the spikes disappeared. They didn't change a bit, other than being every 2, 5, or 10 seconds instead of every 1. I still have plenty of CPU, GPU and memory bandwidth available.... And even with other apps polling at far higher rates, there are no issues. It really doesn't suggest any kind of excessive load is to blame here. It does seem like there's some kind of scheduling/handling issue; as W1zzard said, it's probably waiting for a lock.... Honestly if I dig into the traces far enough I might even be able to get that specific, but that kind of work is way into the "that's nvidia's job" territory ;)
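
The "spikes follow the poll interval" check can also be done on a logged frametime trace instead of by eye. Below is a minimal sketch, assuming you have a plain list of per-frame times in milliseconds (for example exported from a frametime logger). The function name `spike_period_s` and the 3x-median threshold are arbitrary choices for illustration, not anything from hwinfo, Afterburner or NVAPI.

```python
from statistics import median


def spike_period_s(frametimes_ms, threshold=3.0):
    """Estimate the spacing (in seconds) between frametime spikes.

    A frame counts as a spike if it is longer than `threshold` times the
    median frametime. Returns None if fewer than two spikes are found.
    """
    base = median(frametimes_ms)
    elapsed = 0.0
    spike_times = []
    for ft in frametimes_ms:
        elapsed += ft  # running timestamp at the end of this frame
        if ft > threshold * base:
            spike_times.append(elapsed)
    if len(spike_times) < 2:
        return None
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return median(gaps) / 1000.0


# Synthetic trace: 2 ms frames with one 8 ms frame per 1000 ms of runtime.
cycle = [2.0] * 496 + [8.0]
print(spike_period_s(cycle * 5))
```

If the estimated period moves to about 2, 5 or 10 seconds when the monitoring app's poll interval is changed to those values, while the spike height stays the same, that points at the individual call rather than at the polling load.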

It's probably waiting for some kind of lock, not uncommon, especially if the I2C bus is involved. Only NVIDIA can fix it, maybe they already have the fix ready and are just waiting for verification, or the right driver release window. Or they are too busy with higher priority issues


You made some changes to your post which triggered the spam detection for new users, so the thread went to an "approval queue"
Thanks man I thought I got shadowbanned right from the drop, appreciate your explaining what not to do next time haha :)

Sadly, the response from nvidia after some weeks of explaining all that has been said above and much, much more, was the following:


So I tested an in-house PC that has a Win10 X64 +RTX 3080. Ran multiple games at 1080P @1440P without any issues.

I was getting good FPS as well.

I found no reason to test any 3rd party benchmark tests since all games were running perfectly. We use benchmark tools if the PC or GPU has performance issues during the normal usage or while gaming. So it indicates that your's is a singular case and there is a possibility that it's a hardware issue.

If you've read the above, you already understand why his test methodology was entirely inadequate and his conclusions entirely illogical. But the consequence of his inability to cope with this is that we all are trapped in helpdesk limbo. He would reply but never actually do anything related to the issue, just treating it like a normal stuttering complaint. Now they just don't even respond for weeks.
 
Joined
Feb 6, 2021
Messages
2,630 (2.25/day)
Location
Germany
System Name Sunk Cost Fallacy
Processor AMD Ryzen 7 7800X3D
Motherboard ASRock B650E Steel Legend Wifi
Cooling Arctic Liquid Freezer II 360 Rev. 7
Memory 2x16GB G.Skill Trident Z5 NEO 6000 CL30
Video Card(s) Sapphire Nitro+ RX 7900 XTX Vapor-X
Storage WD Black SN850X 1TB + 2x 2TB, 2x 4TB Crucial MX500, 4TB Samsung 870 Evo.
Display(s) Alienware AW2723DF, LG 27GR93U, LG 27GN950-B
Case Lian Li O11 Air Mini
Audio Device(s) Bose Companion Series 2 III, Sennheiser GSP600 and HD599 SE - Creative Soundblaster X4
Power Supply bequiet! Dark Power Pro 12 1500w Titanium
Mouse Logitech GPRO X Superlight & G502 X
Keyboard Corsair K65 RGB Mini, Razer Black Widow V3 TKL
VR HMD Oculus Rift S
I own the whole Ampere lineup except for the new Ti cards and I have zero problems.
 