
RTX 5070 discussion

Just got a Palit GamerPro RTX 5070 12GB so sharing some findings.

It's the non-OC model, so 250W TGP max on both VBIOSes.

Can run Steel Nomad at around 3GHz @ 0.975V and 2125MHz mem for a score of 56 FPS. The X99 mainboard and CPU are 10+ years old now, so a bit long in the tooth: PCIe 3.0 plus DDR4-2133 RAM. I see there are 60+ FPS scores out there; don't know if those are with a higher TGP (300W) or not.

Memory clock set to around 1653MHz results in artifacts / a lockup requiring power-button intervention, ugh. Occasionally TDR came to the rescue, and once there was even a bug check with the NVIDIA driver. Thought it might be a driver issue, but nobody else is reporting this! Any thoughts @StViolenceDay ?

Voltage doesn't report dropping below 0.8V (well, a drop to 0.795V sometimes), even in P-State P8?

Seems very leaky; I wouldn't like to try a HW volt mod on this one.

VRAM temps get a little high under load before the fan kicks in; IIRC up to 96C, if that reading is correct.

The memory offset limits seem a bit restrictive (-2000MT/s to +6000MT/s); I wonder what the reason for that is.

So anyone else running one of these, what are your thoughts?
 
Memory clock set to around 1653MHz results in artifacts / a lockup requiring power-button intervention, ugh. Occasionally TDR came to the rescue, and once there was even a bug check with the NVIDIA driver. Thought it might be a driver issue, but nobody else is reporting this! Any thoughts @StViolenceDay ?
1653MHz - do you mean underclocking the VRAM? Or is this value an offset? (I don't have a 5070, but I thought its default memory clock is 1750MHz.)

Actually, since GDDR7 has a lot of error-detection mechanisms, including reporting write errors to the GPU (so it can retry the write operation), this is the first report I've seen of actually getting artifacts (instead of hangs/slowdowns) caused by GDDR7 memory tuning.
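As a loose illustration of that point, here's a toy model (my own sketch, not the real GDDR7 EDR scheme): writes carry a CRC, the receiver re-checks it, and the controller replays a failed burst. Marginal signalling then shows up as retries (lost bandwidth, or a hang when replay gives up) long before it shows up as corrupted data.

```python
import random
import zlib

# Toy model of CRC-protected writes with replay (NOT the real GDDR7 link
# protocol): each burst carries a CRC computed by the sender; the receiver
# re-checks it and requests a retry on mismatch.
def send_burst(data: bytes, flip_bit: bool) -> tuple[bytes, int]:
    """Return (payload_as_received, sender_crc); optionally corrupt one bit."""
    crc = zlib.crc32(data)
    if flip_bit:
        i = random.randrange(len(data) * 8)
        b = bytearray(data)
        b[i // 8] ^= 1 << (i % 8)
        data = bytes(b)
    return data, crc

def write_with_replay(data: bytes, error_rate: float, max_retries: int = 8):
    """Retry the burst until the CRC checks out; report retries used."""
    for attempt in range(max_retries + 1):
        payload, crc = send_burst(data, random.random() < error_rate)
        if zlib.crc32(payload) == crc:   # receiver-side check passed
            return payload, attempt
    return payload, max_retries          # gave up: this is where you'd hang

random.seed(0)
burst = bytes(range(32))
delivered, retries = write_with_replay(burst, error_rate=0.3)
print(delivered == burst, retries)
```

So as long as corruption is caught by the check, tuning errors cost time rather than pixels; visible artifacts suggest errors slipping past (or happening outside) that protection.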

Please run the memtest_vulkan tool for 5 minutes (standard test):
  • at that 1653MHz setting
  • at default frequencies
And if it reports any errors, attach its log file here; it would be interesting to see the error details.
 
Don't go by other people's scores in 3DMark. There will always be lots of high scores from people overclocking. Find a benchmark that shows a game you own, with the settings used, and check your card is within a few percent of that.

Your CPU will be a bottleneck at low resolution / high refresh, and your board's lack of ReBAR and its PCIe 3.0 slot might be an issue if you are running AAA games at 4K with all the eye candy turned on, because the 5070's 12GB isn't always enough and it starts shuffling data over PCIe instead. At 1440p or below it shouldn't matter; just about every game still fits in 12GB at that resolution.
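For a sense of scale on the spill-over point, some rough ceiling arithmetic (my numbers, not from the thread): once assets don't fit in the 5070's 12GB, they stream over the PCIe link, whose peak is a small fraction of local VRAM bandwidth.

```python
# Peak one-way PCIe bandwidth vs. local VRAM bandwidth (back-of-envelope).
def pcie_gbs(gen_gts: float, lanes: int, enc_num: int, enc_den: int) -> float:
    """Peak one-way bandwidth in GB/s: GT/s per lane x lanes x line coding."""
    return gen_gts * lanes * (enc_num / enc_den) / 8

pcie3_x16 = pcie_gbs(8.0, 16, 128, 130)    # PCIe 3.0: 8 GT/s, 128b/130b coding
pcie4_x16 = pcie_gbs(16.0, 16, 128, 130)   # PCIe 4.0: 16 GT/s
vram = 28.0 * 192 / 8                       # 28 GT/s GDDR7 on a 192-bit bus

print(f"PCIe 3.0 x16 ~{pcie3_x16:.2f} GB/s, PCIe 4.0 x16 ~{pcie4_x16:.2f} GB/s, "
      f"VRAM ~{vram:.0f} GB/s")
```

So anything forced over a PCIe 3.0 slot runs at roughly 1/40th of VRAM speed, which is why the spill shows up as stutter rather than a gentle slowdown.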
 
1653MHz - do you mean underclocking the VRAM? Or is this value an offset? (I don't have a 5070, but I thought its default memory clock is 1750MHz.)
Thank you for the reply. Yes, the default is 1750MHz, and the offset would be -97MHz at the default P-State 0.
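The tools mix units here (GPU-Z-style MHz vs. nvidia-settings MT/s). Taking the in-thread numbers at face value, 1750MHz corresponds to the 28000MT/s data rate, i.e. a x16 factor, so the two offsets quoted later in the thread are the same number in different units:

```python
# Unit conversion between the reported memory clock (MHz) and the effective
# data rate (MT/s); the x16 factor is derived from this card's 1750 MHz /
# 28000 MT/s figures, not from a spec sheet.
MTS_PER_MHZ = 28000 / 1750   # = 16 for this 5070's GDDR7

def mhz_to_mts(mhz: float) -> float:
    return mhz * MTS_PER_MHZ

def mts_to_mhz(mts: float) -> float:
    return mts / MTS_PER_MHZ

print(mhz_to_mts(-97))        # the bad offset, expressed in MT/s
print(mts_to_mhz(-1552))      # the same offset back in MHz
print(mhz_to_mts(1653))       # absolute clock of the bad spot, in MT/s
```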

Actually, since GDDR7 has a lot of error-detection mechanisms, including reporting write errors to the GPU (so it can retry the write operation), this is the first report I've seen of actually getting artifacts (instead of hangs/slowdowns) caused by GDDR7 memory tuning.
Art.jpg



Please run the memtest_vulkan tool for 5 minutes (standard test):
  • at that 1653MHz setting
  • at default frequencies
And if it reports any errors, attach its log file here; it would be interesting to see the error details.
I can do default but not 1653MHz, as it errors as soon as it's set.


Windows Event Viewer after the fact. An example; it's not always the same.
err2.png



Tried several driver versions plus the latest for Windows; Linux driver 575 does the same. I can use nvidia-settings to replicate it with a value of -1552MT/s (-97MHz), although it's shown in MHz for some reason.
ns.png


I test bandwidth but have to exclude that small ~8MHz band to avoid artifacts / a system crash.
5070_2100.png


Test result at default 1750MHz
Code:
[alex@fedora memtest_vulkan-v0.5.0_DesktopLinux_X86_64]$ ./memtest_vulkan
https://github.com/GpuZelenograd/memtest_vulkan v0.5.0 by GpuZelenograd
To finish testing use Ctrl+C

1: Bus=0x03:00 DevId=0x2F04   12GB NVIDIA GeForce RTX 5070
2: Bus=0x00:00 DevId=0x0000   32GB llvmpipe (LLVM 20.1.6, 256 bits)
(first device will be autoselected in 0 seconds)   Override index to test:
    ...first device autoselected
Standard 5-minute test of 1: Bus=0x03:00 DevId=0x2F04   12GB NVIDIA GeForce RTX 5070
      1 iteration. Passed  0.0312 seconds  written:    7.2GB 570.7GB/sec        checked:   10.9GB 587.6GB/sec
     34 iteration. Passed  1.0303 seconds  written:  239.2GB 572.2GB/sec        checked:  358.9GB 586.2GB/sec
    195 iteration. Passed  5.0259 seconds  written: 1167.2GB 572.5GB/sec        checked: 1750.9GB 586.2GB/sec
   1156 iteration. Passed 30.0070 seconds  written: 6967.2GB 572.4GB/sec        checked:10450.9GB 586.0GB/sec
   2118 iteration. Passed 30.0244 seconds  written: 6974.5GB 572.7GB/sec        checked:10461.8GB 586.3GB/sec
   3080 iteration. Passed 30.0188 seconds  written: 6974.5GB 572.8GB/sec        checked:10461.8GB 586.3GB/sec
   4042 iteration. Passed 30.0271 seconds  written: 6974.5GB 572.6GB/sec        checked:10461.8GB 586.2GB/sec
   5004 iteration. Passed 30.0217 seconds  written: 6974.5GB 572.7GB/sec        checked:10461.8GB 586.3GB/sec
   5966 iteration. Passed 30.0229 seconds  written: 6974.5GB 572.7GB/sec        checked:10461.8GB 586.3GB/sec
   6928 iteration. Passed 30.0208 seconds  written: 6974.5GB 572.7GB/sec        checked:10461.8GB 586.3GB/sec
   7890 iteration. Passed 30.0275 seconds  written: 6974.5GB 572.5GB/sec        checked:10461.8GB 586.2GB/sec
   8851 iteration. Passed 30.0011 seconds  written: 6967.2GB 572.5GB/sec        checked:10450.9GB 586.1GB/sec
Standard 5-minute test PASSed! Just press Ctrl+C unless you plan long test run.
Extended endless test started; testing more than 2 hours is usually unneeded
use Ctrl+C to stop it when you decide it's enough
^C
memtest_vulkan: no any errors, testing PASSed.
  press any key to continue...
[alex@fedora memtest_vulkan-v0.5.0_DesktopLinux_X86_64]$
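As a quick sanity check of the log above against the card's paper spec (28GT/s GDDR7 on the 5070's 192-bit bus; the arithmetic is mine, not from the tool):

```python
# Compare memtest_vulkan's sustained figures with the theoretical peak.
peak = 28.0 * 192 / 8    # GB/s: 28 GT/s x 192-bit bus / 8 bits per byte
checked = 586.3          # GB/s, read-check rate from the log above
written = 572.7          # GB/s, write rate from the log above

print(f"theoretical peak {peak:.0f} GB/s; "
      f"reads hit {checked / peak:.0%} of peak, writes {written / peak:.0%}")
```

Sustaining around 87% of theoretical peak suggests the memory subsystem is behaving normally at the default clock.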
 
BTW, what's the VRAM vendor displayed in GPU-Z, and the exact VBIOS version?

Maybe the VBIOS has a timings/frequency-range table different from the "most common" one. From experience with the 1660S I can say that there can be as many as 10 timings updates during a GPU's production period, all with minor adjustments.
 
Samsung config 0
VBIOS 98.05.36.00.4D and 98.05.36.00.4C

Tried running just the console in Fedora (no GUI), and with some quick resets, i.e. setting the fault clock, waiting 0 to some ms, and setting back to default (same program), I was able to get control back. Starting your program in another console while at the fault frequency doesn't get as far as device selection, and running it beforehand and then setting the fault frequency sees it stop with:
Runtime error: ERROR_DEVICE_LOST while getting () in context wait_for_fences

More often than not in console mode the display background changes colour rapidly rather than showing block-type artifacts, which is quite disturbing, i.e. probably epilepsy-inducing.

Code:
Jul 03 19:36:27 fedora kernel: NVRM: _kgspLogXid119: ********************************* GSP Timeout **********************************
Jul 03 19:36:27 fedora kernel: NVRM: _kgspLogXid119: Note: Please also check logs above.
Jul 03 19:36:27 fedora kernel: NVRM: GPU at PCI:0000:03:00: GPU-be9b4836-5a63-a9ef-5b03-721c54957016
Jul 03 19:36:27 fedora kernel: NVRM: GPU Board Serial Number: 0
Jul 03 19:36:27 fedora kernel: NVRM: Xid (PCI:0000:03:00): 119, pid=3182, name=memtest_vulkan, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080012b 0x230).
Jul 03 19:36:27 fedora kernel: NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x000000002080012b 0x0000000000000230.
Jul 03 19:36:27 fedora kernel: NVRM: GPU0 RPC history (CPU -> GSP):
Jul 03 19:36:27 fedora kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
Jul 03 19:36:27 fedora kernel: NVRM:      0    76   GSP_RM_CONTROL        0x000000002080012b 0x0000000000000230 0x000639059ff19625 0x0000000000000000          y
Jul 03 19:36:27 fedora kernel: NVRM:     -1    103  GSP_RM_ALLOC          0x0000000000009072 0x000000000000000c 0x000639059ff193a8 0x000639059ff19516    366us
Jul 03 19:36:27 fedora kernel: NVRM:     -2    76   GSP_RM_CONTROL        0x0000000020800a5d 0x0000000000000008 0x000639059ff192d6 0x000639059ff193a2    204us
Jul 03 19:36:27 fedora kernel: NVRM:     -3    103  GSP_RM_ALLOC          0x0000000000009072 0x000000000000000c 0x000639059ff19176 0x000639059ff192c8    338us
Jul 03 19:36:27 fedora kernel: NVRM:     -4    76   GSP_RM_CONTROL        0x0000000020800a5d 0x0000000000000008 0x000639059ff19090 0x000639059ff19170    224us
Jul 03 19:36:27 fedora kernel: NVRM:     -5    103  GSP_RM_ALLOC          0x0000000000009072 0x000000000000000c 0x000639059ff18f28 0x000639059ff19081    345us
Jul 03 19:36:27 fedora kernel: NVRM:     -6    76   GSP_RM_CONTROL        0x0000000020800a5d 0x0000000000000008 0x000639059ff18e42 0x000639059ff18f24    226us
Jul 03 19:36:27 fedora kernel: NVRM:     -7    103  GSP_RM_ALLOC          0x0000000000009072 0x000000000000000c 0x000639059ff18c5f 0x000639059ff18e34    469us
Jul 03 19:36:27 fedora kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
Jul 03 19:36:27 fedora kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
Jul 03 19:36:27 fedora kernel: NVRM:      0    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f5df98a 0x000639055f5df9a8     30us
Jul 03 19:36:27 fedora kernel: NVRM:     -1    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f4eb18c 0x000639055f4eb199     13us
Jul 03 19:36:27 fedora kernel: NVRM:     -2    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f3f699e 0x000639055f3f69a9     11us
Jul 03 19:36:27 fedora kernel: NVRM:     -3    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f3021ab 0x000639055f3021be     19us
Jul 03 19:36:27 fedora kernel: NVRM:     -4    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f20d9cb 0x000639055f20d9d8     13us
Jul 03 19:36:27 fedora kernel: NVRM:     -5    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f1191e4 0x000639055f1191f1     13us
Jul 03 19:36:27 fedora kernel: NVRM:     -6    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055f0249e6 0x000639055f0249f3     13us
Jul 03 19:36:27 fedora kernel: NVRM:     -7    4099 POST_EVENT            0x00000000000000a2 0x0000000000000000 0x000639055ef301e9 0x000639055ef301f7     14us
Jul 03 19:36:27 fedora kernel: CPU: 31 UID: 1000 PID: 3182 Comm: memtest_vulkan Tainted: G S         OE       6.15.3-200.fc42.x86_64 #1 PREEMPT(lazy)
Jul 03 19:36:27 fedora kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Jul 03 19:36:27 fedora kernel: Hardware name: Xioaxi                 X99-Special           /X99 Taichi, BIOS P1.80 04/06/2018
Jul 03 19:36:27 fedora kernel: Call Trace:
Jul 03 19:36:27 fedora kernel:  <TASK>
Jul 03 19:36:27 fedora kernel:  dump_stack_lvl+0x5d/0x80
Jul 03 19:36:27 fedora kernel:  _kgspRpcRecvPoll+0x593/0x760 [nvidia]
Jul 03 19:36:27 fedora kernel:  _issueRpcAndWait+0xd2/0x900 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? osGetCurrentThread+0x26/0x60 [nvidia]
Jul 03 19:36:27 fedora kernel:  rpcRmApiControl_GSP+0x76f/0x940 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _tlsThreadEntryGet+0x82/0x90 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? osGetCurrentThread+0x26/0x60 [nvidia]
Jul 03 19:36:27 fedora kernel:  rmresControl_Prologue_IMPL+0xd4/0x1e0 [nvidia]
Jul 03 19:36:27 fedora kernel:  resControl_IMPL+0xd6/0x1b0 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _tlsEntryAcquire+0x29/0xd0 [nvidia]
Jul 03 19:36:27 fedora kernel:  serverControl+0x47e/0x590 [nvidia]
Jul 03 19:36:27 fedora kernel:  _rmapiRmControl+0x544/0x820 [nvidia]
Jul 03 19:36:27 fedora kernel:  rmapiControlWithSecInfo+0x79/0x140 [nvidia]
Jul 03 19:36:27 fedora kernel:  rmapiControl+0x24/0x40 [nvidia]
Jul 03 19:36:27 fedora kernel:  kgrobjPromoteContext_IMPL+0x2e8/0x350 [nvidia]
Jul 03 19:36:27 fedora kernel:  kgrobjConstruct_IMPL+0x27a/0x480 [nvidia]
Jul 03 19:36:27 fedora kernel:  __nvoc_objCreate_KernelGraphicsObject+0x132/0x240 [nvidia]
Jul 03 19:36:27 fedora kernel:  __nvoc_objCreateDynamic+0x4a/0x70 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _portMemAllocNonPagedUntracked+0x2c/0x40 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? os_alloc_mem+0x104/0x120 [nvidia]
Jul 03 19:36:27 fedora kernel:  resservResourceFactory+0xc5/0x240 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _tlsEntryAcquire+0x93/0xd0 [nvidia]
Jul 03 19:36:27 fedora kernel:  _clientAllocResourceHelper+0x2aa/0x660 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _tlsThreadEntryGet+0x82/0x90 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? tlsEntryGet+0x31/0x70 [nvidia]
Jul 03 19:36:27 fedora kernel:  serverAllocResourceUnderLock+0x33b/0xa10 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? portSyncSpinlockAcquire+0x18/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? portThreadGetCurrentThreadId+0x1d/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? os_acquire_rwlock_write+0x2b/0x40 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? portThreadGetCurrentThreadId+0x1d/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? rmclientValidateLocks_IMPL+0x21/0x90 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? _serverLockClientWithLockInfo.constprop.0+0x106/0x260 [nvidia]
Jul 03 19:36:27 fedora kernel:  serverAllocResource+0x2b4/0x5c0 [nvidia]
Jul 03 19:36:27 fedora kernel:  rmapiAllocWithSecInfo+0x1f0/0x420 [nvidia]
Jul 03 19:36:27 fedora kernel:  rmapiAllocWithSecInfoTls+0x65/0x90 [nvidia]
Jul 03 19:36:27 fedora kernel:  Nv04AllocWithAccessSecInfo+0x6f/0x80 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? security_capable+0x50/0x150
Jul 03 19:36:27 fedora kernel:  RmIoctl+0xac3/0xda0 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? os_acquire_spinlock+0x12/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? portSyncSpinlockAcquire+0x18/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  rm_ioctl+0x66/0x4f0 [nvidia]
Jul 03 19:36:27 fedora kernel:  nvidia_ioctl.isra.0+0x450/0x810 [nvidia]
Jul 03 19:36:27 fedora kernel:  nvidia_unlocked_ioctl+0x1d/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  __x64_sys_ioctl+0x97/0xc0
Jul 03 19:36:27 fedora kernel:  do_syscall_64+0x7b/0x160
Jul 03 19:36:27 fedora kernel:  ? do_syscall_64+0x87/0x160
Jul 03 19:36:27 fedora kernel:  ? nvidia_unlocked_ioctl+0x1d/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? __x64_sys_ioctl+0x97/0xc0
Jul 03 19:36:27 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Jul 03 19:36:27 fedora kernel:  ? do_syscall_64+0x87/0x160
Jul 03 19:36:27 fedora kernel:  ? nvidia_unlocked_ioctl+0x1d/0x30 [nvidia]
Jul 03 19:36:27 fedora kernel:  ? __x64_sys_ioctl+0x97/0xc0
Jul 03 19:36:27 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Jul 03 19:36:27 fedora kernel:  ? do_syscall_64+0x87/0x160
Jul 03 19:36:27 fedora kernel:  ? filp_flush+0x5b/0x80
Jul 03 19:36:27 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Jul 03 19:36:27 fedora kernel:  ? do_syscall_64+0x87/0x160
Jul 03 19:36:27 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Jul 03 19:36:27 fedora kernel:  ? do_syscall_64+0x87/0x160
Jul 03 19:36:27 fedora kernel:  ? exc_page_fault+0x7e/0x1a0
Jul 03 19:36:27 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 03 19:36:27 fedora kernel: RIP: 0033:0x7fc2ec30eaad
Jul 03 19:36:27 fedora kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
Jul 03 19:36:27 fedora kernel: RSP: 002b:00007ffed0b757f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 03 19:36:27 fedora kernel: RAX: ffffffffffffffda RBX: 0000000000000030 RCX: 00007fc2ec30eaad
Jul 03 19:36:27 fedora kernel: RDX: 00007ffed0b75950 RSI: 00000000c030462b RDI: 000000000000000b
Jul 03 19:36:27 fedora kernel: RBP: 00007ffed0b75840 R08: 00007ffed0b75950 R09: 00007ffed0b75978
Jul 03 19:36:27 fedora kernel: R10: 00007fc2d5b17b54 R11: 0000000000000246 R12: 000000000000000b
Jul 03 19:36:27 fedora kernel: R13: 00000000c030462b R14: 000000000000002b R15: 00007ffed0b75850
Jul 03 19:36:27 fedora kernel:  </TASK>
Jul 03 19:36:27 fedora kernel: NVRM: _kgspLogXid119: ********************************************************************************
Jul 03 19:36:27 fedora kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Jul 03 19:36:27 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from kgrobjPromoteContext(pGpu, pKernelGraphicsObject, pKernelGraphics) @ kernel_graphics_object.c:223
Jul 03 19:36:33 fedora kernel: NVRM: Xid (PCI:0000:03:00): 119, pid=3182, name=memtest_vulkan, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 103 (GSP_RM_ALLOC) (0xcab5 0x8).
Jul 03 19:36:33 fedora kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 103!
Jul 03 19:36:33 fedora kernel: NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0005e; hParent=0xbeef0100; hObject=0xbeefa0b5; hClass=0x0000cab5; paramsSize=0x00000008; paramsStatus=0x00000000; status=0x00000065
Jul 03 19:36:39 fedora kernel: NVRM: Xid (PCI:0000:03:00): 119, pid=3182, name=memtest_vulkan, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 103 (GSP_RM_ALLOC) (0xcab5 0x8).
Jul 03 19:36:39 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: Back to back GSP RPC timeout detected! GPU marked for reset @ kernel_gsp.c:2314
Jul 03 19:36:39 fedora kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 103!
Jul 03 19:36:39 fedora kernel: NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0005e; hParent=0xbeef0101; hObject=0xbeef8500; hClass=0x0000cab5; paramsSize=0x00000008; paramsStatus=0x00000000; status=0x00000065
Jul 03 19:36:45 fedora kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:03:00 (printing 1 of every 30).  The GPU likely needs to be reset.
Jul 03 19:36:51 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from pRmApi->Control(pRmApi, pGpu->hInternalClient, pGpu->hInternalSubdevice, NV2080_CTRL_CMD_INTERNAL_LOG_OOB_XID, &params, sizeof(params)) @ gpu.c:6468
Jul 03 19:36:51 fedora kernel: NVRM: Xid (PCI:0000:03:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
Jul 03 19:36:57 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from kgrobjPromoteContext(pGpu, pKernelGraphicsObject, pKernelGraphics) @ kernel_graphics_object.c:223
Jul 03 19:37:09 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_client.c:844
Jul 03 19:37:09 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:259
Jul 03 19:37:09 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:1375
Jul 03 19:37:21 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ mem.c:179
Jul 03 19:37:27 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538
Jul 03 19:37:45 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ mem.c:179
Jul 03 19:37:51 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538

I can still type in the console, but the reset doesn't happen and the shutdown command doesn't complete if given, or I didn't wait long enough for the timeouts.

Otherwise the card seems to work okay, even at max memory clock. It seems Palit prefers to push distributors into handling problems, which would probably mean giving the distributor the card and waiting 3 weeks for the same card or a replacement.

PS: your program seems familiar; I can't help thinking we may have met on Overclock.net in the past.
 
No email acknowledgement from Palit for a while now; seems they would rather eat an RMA.

A little testing before that

Heaven result
HB1.jpg


Looking at Tomb Raider, but DLSS seems broken.
 
Some feedback from Palit email support. Basically, if the card operates properly at default settings it's not considered for warranty, so it looks like I'm stuck with it for life, as I wouldn't feel right selling a card that I know has a fault. Still, maybe I'll get lucky and it gets fixed by a new VBIOS or driver version.
 
Voltage doesn't report dropping below 0.8V (well, a drop to 0.795V sometimes), even in P-State P8?
The voltage seems normal. I had an Asus Prime 5070 Ti, granted a faulty one, and that one could not go under 0.8V no matter what I did with it. No combination of NVCP, Windows, and BIOS settings could lower it. In fact, it frequently went above that, even at idle (P8). Just seems to be how these cards roll.

You might have gotten unlucky with that VRAM, though. If I understand correctly, it still runs fine at base settings? I'd say keep it and roll with it if it does.
 
I'd say keep it and roll with it if it does.
Yeah, it's just a 6MHz band, from 1650.125MHz to 1656.125MHz, where it happens. The rest of the range from 1600MHz to 2125MHz seems fine, which makes it peculiar. Tried the OC VBIOS, as suggested by @StViolenceDay, since the manufacturer isn't going to do anything. Same result at 1650.125MHz.

It's a shame the top memory clock is limited to +6000MT/s; it would be interesting to see how far it might go. Samsung was claiming 40GT/s achievable for the 28Gbps chips, which is +12000MT/s and coincidentally aligns with the maximum in the memory table, IINM. Perhaps the chips get too hot on consumer boards at these higher clocks?
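Taking Samsung's claim and the GPU-Z offset cap at face value, the headroom arithmetic looks like this (my numbers, just restating the figures above):

```python
# Back-of-envelope on the offset cap: stock data rate, the top of the allowed
# offset range, and Samsung's claimed ceiling for these chips.
stock = 28000            # MT/s, stock GDDR7 data rate
cap_offset = 6000        # MT/s, top of the GPU-Z-visible offset range
samsung_claim = 40000    # MT/s, Samsung's claimed achievable rate

print(stock + cap_offset)       # best the sliders allow: 34 GT/s
print(samsung_claim - stock)    # offset the chips supposedly have in them
```

So the sliders stop a full 6000MT/s short of what the silicon is claimed to do, which does smell like a deliberate thermal or qualification margin on consumer boards.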

Yep, I've seen quite a few 5070/Tis all showing a 0.8V minimum; some 5060s show in the 0.6V range though. Seems strange to me, as does Blackwell generally, but maybe I'm just overthinking it, as if Blackwell wasn't given 100% effort by Nvidia. Maybe AI was distracting?

Made another Steel Nomad run with the 300W VBIOS; managed to score in the 60s, but only just.

6015.jpg


I'll probably stick to my lower 0.975V / 3.0GHz setting though, and revert to the original 250W VBIOS, or an updated one if Palit produces it.
 
Thought I'd check out the hotspot temperature today, since it's no longer supplied to APIs, and I was curious if there might be something ominous going on. :laugh:

Fans set to 70%, a bit noisy for me TBH. Fixed voltage of 1.025V, running Steel Nomad windowed. Power gets to about 270W-280W. T4 is the hottest temperature reported; GPU temp is what is normally reported.


HS2.png


Interesting: almost a 20C difference under load, well, at one point. Mostly it averages between 14 and 15. Would perhaps be even more interesting to see on higher-power GPUs?

Memory temp stopped around 68C, but one thing to be aware of on this card is that the default fan setting doesn't have the fans kicking in until the GPU temp is around 60C, so memory can get quite hot in some situations. I personally saw 96C before the fans came on, so a little adjustment is in order, I think, for my own preference. Not sure if that temperature is from the hottest chip or not; it would have been really nice to be able to request readings for each chip. And what's with nvidia-smi and memory temperature; didn't the smi department get the memo or something?
Code:
[alex@fedora ~]$ nvidia-smi -q -d temperature

==============NVSMI LOG==============

Timestamp                                 : Tue Jul 15 20:28:23 2025
Driver Version                            : 575.64.03
CUDA Version                              : 12.9

Attached GPUs                             : 1
GPU 00000000:03:00.0
    Temperature
        GPU Current Temp                  : 37 C
        GPU T.Limit Temp                  : 48 C
        GPU Shutdown T.Limit Temp         : -5 C
        GPU Slowdown T.Limit Temp         : -2 C
        GPU Max Operating T.Limit Temp    : 0 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating T.Limit Temp : N/A


Clocks are shown with requested and effective values. As can be seen, even though the requested clock drops, the effective clock can drop even further, usually unbeknownst to the user.
 
Regarding getting T4 - is it accessible via APIs as some "unspecified temp", or are you using a direct read of the PCIe BAR registers?

(I mean BAR0 representing the GPU control registers mapped and accessible in the CPU's physical address space; I suppose it's 64MB in size since the RTX 5x00 series.)
 
Can you set up the curve so that it outperforms a vanilla 3070 Ti but draws <120W of power?
 
@StViolenceDay BAR0. It would be nicer for Nvidia to provide API access; then it could run in userland instead of privileged!

@Macro Device A while back I tried running at 0.8V (the minimum settable on this card) to see how far I could go, but I only have some scribbled notes: 2GHz in Steel Nomad gave 40.3FPS at 116W; 2.1GHz crashed. Would need to run again to confirm. Be aware I'm using HW that is 10+ years old; I don't think it makes a great deal of difference in Steel Nomad, but for reference my dead RTX 3070 scored about 35FPS.
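Rough perf-per-watt from those scribbled numbers. The 3070's 220W figure is its stock TGP, which is my assumption; the thread doesn't say what that card actually drew during its run.

```python
# Perf/W comparison from the figures quoted above; the 3070 wattage is an
# assumed stock TGP, not a measured draw.
runs = {
    "5070 @0.8V 2.0GHz": (40.3, 116),   # (Steel Nomad FPS, watts)
    "RTX 3070 (assumed stock 220W)": (35.0, 220),
}
for name, (fps, watts) in runs.items():
    print(f"{name}: {fps / watts:.3f} FPS/W")
```

Even on those loose numbers the undervolted 5070 lands at roughly double the FPS per watt.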
 
Take it as a yes. Thanks!
 
Some hotspot results for the 5070

Screenshot From 2025-07-22 12-11-25.png


GTemp is from the NvAPI call NvAPI_GPU_GetThermalSettings, while Th? is from NvAPI_GPU_ThermChannelGetStatus. T? temperatures are from BAR0. Some of these readings appear to be duplicates; here's a temperature grouping list from low to high.

T2, T6, T12
T5, T11
T19, T23
T1
T9, T15, T17, T21
T18, T22
T3, T7, T13
T4, T8, T10, T14, T16, T20

GTemp seems to have usually been taken from T1, which itself may be a calculated temperature, but in the case of the 5070 it doesn't seem to be directly linked to any BAR0 temperature, unless I missed some. Instead it appears to be an average, as is Th1. For these results GTemp / Th1 is around 6C lower than T1 at the higher readings. Note that this makes the hotspot delta higher than it would be if T1 were used. It might get some people excited, even though it's not actually so bad.
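To illustrate that inflation effect with made-up numbers (chosen only to match the ~6C offset and the deltas described above, not real readings):

```python
# Illustrative numbers only: if GTemp is an average sitting below T1, the
# "hotspot minus GTemp" delta reads larger than "hotspot minus T1" for the
# exact same silicon.
t1 = 76.0        # hottest conventional sensor (hypothetical value)
gtemp = t1 - 6   # averaged value reported as the GPU temp
t4 = 90.0        # hotspot (hypothetical value)

print(t4 - gtemp)   # the delta people would quote from GTemp
print(t4 - t1)      # the delta against the actual T1 sensor
```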

The staircase-type plot of Th2 is the memory temperature, which has a granularity of 2C.


An example with a GTX 1660 Super

Screenshot From 2025-07-22 12-58-13.png

Temperature grouping

T2, T9, T10
T11, Th7
Th5
T1, T6, Th1, GTemp
T5, Th3
T3
T12, Th8
Th6
T8
T4, T7, Th2, Th4

So fewer BAR0 T? readings and more Th? readings on the 1660.

There are some changes to the VBIOS structure with the 50 series, and to BAR0 too. Previously, GPU temps started at offset 0x20400 from at least Pascal onward; that was T1 and, from my limited testing, GTemp. Hotspot T4 was at 0x2046C. Now, for the 5070, T1 is at offset 0xAD0400 and T4 at 0xAD046C, at least for my card.
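For anyone wanting to poke at those offsets, here's a minimal sketch of reading a BAR0 register on Linux by mmapping the GPU's sysfs `resource0` file (needs root, and the kernel must permit the mapping). The PCI address is taken from the kernel logs earlier in the thread; the register decode is an assumption modelled on community hotspot tools (temperature as a fixed-point value with 8 fractional bits), and it may well differ per chip, so treat any number it prints as a guess.

```python
import mmap
import os
import struct

# Offsets quoted in the post above, for this particular 5070; other cards
# and VBIOS versions may differ.
T1_OFFSET = 0x00AD0400
T4_OFFSET = 0x00AD046C

def decode_temp(reg: int) -> int:
    """ASSUMED layout: integer degrees C in bits [15:8] of the register."""
    return (reg >> 8) & 0xFF

def read_reg(bar0_path: str, offset: int) -> int:
    """Read a single 32-bit little-endian register from the mapped BAR0."""
    fd = os.open(bar0_path, os.O_RDONLY)
    try:
        m = mmap.mmap(fd, offset + 4, prot=mmap.PROT_READ)
        try:
            return struct.unpack_from("<I", m, offset)[0]
        finally:
            m.close()
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Bus address from the NVRM logs in this thread; adjust for your system.
    path = "/sys/bus/pci/devices/0000:03:00.0/resource0"
    if os.path.exists(path) and os.geteuid() == 0:
        raw = read_reg(path, T4_OFFSET)
        print(f"T4 raw=0x{raw:08X} -> {decode_temp(raw)} C (decode is a guess)")
```

Running it in userland like this still requires root, which is exactly the complaint above: an official API would avoid the privileged mapping entirely.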
 
Great, thanks for the offsets, they will be very useful!

Did you ever investigate how the VRAM temp is read from the BAR0 registers? The Nvidia proprietary bootable "MODS" diagnostic toolset can show per-VRAM-IC temps for GPUs reporting VRAM temps in GPU-Z (that's cards with GDDR6X memory, and Quadro cards with GDDR6 memory like the Ampere A2000 12GB).

So, for example, for the 3090 "MODS" reports 24 distinct temps (one for each of the 24 VRAM ICs), while GPU-Z only reports a single value (it looks like the max, but I'm not sure). Separate VRAM-IC temps are very useful for understanding why a card runs hot: if all the ICs are hot, it's a weak cooling system, but if only 1-2 VRAM ICs are 10+ degrees hotter than the others, that's weak thermal pads on those ICs.
 