
Half my Titan Z doesn't support CUDA?

DrNewcenstein

New Member
Joined
Oct 31, 2017
Messages
4 (0.00/day)
So after a fair amount of system tinkering, it's come to my attention that half of my Titan Z somehow stopped having CUDA capabilities. Latest GPU-Z (as well as the previous version) has the CUDA and PhysX boxes unchecked on #2, yet OpenCL is checked. Both halves are visible in Device Manager, GPU-Z, GPU Shark, Daz Studio, Nvidia Control Panel, MSI Afterburner, EVGA Precision X, Nvidia-smi.exe, nvflash, and pretty much anything else that's supposed to be able to see two GPUs named Titan Z.
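One way to cross-check GPU-Z's CUDA checkbox is to ask the CUDA driver directly which devices it exposes. The following is a minimal sketch, not a diagnostic tool: it assumes Python is available on the machine and that the driver library is named nvcuda.dll (Windows) or libcuda.so.1 (Linux). The function name list_cuda_devices is made up for illustration. The calls cuInit, cuDeviceGetCount, cuDeviceGet and cuDeviceGetName are from the CUDA driver API.

```python
import ctypes
import sys


def list_cuda_devices():
    """Return the names of the devices the CUDA driver exposes.

    Returns None if the driver library cannot be loaded or initialised.
    """
    try:
        if sys.platform == "win32":
            cuda = ctypes.WinDLL("nvcuda.dll")    # assumed library name on Windows
        else:
            cuda = ctypes.CDLL("libcuda.so.1")    # assumed library name on Linux
    except OSError:
        return None  # no NVIDIA driver library found on this machine

    if cuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return None

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None

    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)
        if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
            continue  # skip ordinals the driver refuses to hand out
        buf = ctypes.create_string_buffer(256)
        if cuda.cuDeviceGetName(buf, len(buf), device) == 0:
            names.append(buf.value.decode(errors="replace"))
    return names


if __name__ == "__main__":
    print(list_cuda_devices())
```

If the list holds only one Titan Z entry, the driver itself is not exposing the second GPU to CUDA. If it holds two, the unchecked box is more likely a reporting problem in the tool. Either way, this only shows what the driver enumerates, not why.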

Obviously this is a double-decker card with an internal SLI connection. Can this be disabled, and would it resolve the issue?

I've flashed the BIOS (which had previously fixed an issue where the second half was completely invisible to everything) and both halves display the correct BIOSes: 80.80.5A.00.01 and 80.80.5A.00.02.
The 80.80.5A.00.02 side is what's jacked up.

I've flashed in the ROMs found here - NVIDIA.GTXTITANZ.6144.140414_2, MSI.GTXTITANZ.6144.140414_3, NVIDIA.GTXTITANZ.6144.140414_4 (which are all "Side 2" versions).
When I put the card under stress, MSI AB monitor shows 0 activity level on the 2nd half of the Titan Z while the front half is being choke-slammed.

I've used every driver revision from 387.92 to the newest and shiniest 417, using DDU to swap them out cleanly.

I need to update my system specs listed below, but it's the non-display unit in my HP Z600 where the Quadro K4000 is primary.
I have the latest BIOS for the PC installed.
Win 7 64 Pro
 
Joined
May 8, 2016
Messages
1,741 (0.60/day)
System Name BOX
Processor Core i7 6950X @ 4,26GHz (1,28V)
Motherboard X99 SOC Champion (BIOS F23c + bifurcation mod)
Cooling Thermalright Venomous-X + 2x Delta 38mm PWM (Push-Pull)
Memory Patriot Viper Steel 4000MHz CL16 4x8GB (@3240MHz CL12.12.12.24 CR2T @ 1,48V)
Video Card(s) Titan V (~1650MHz @ 0.77V, HBM2 1GHz, Forced P2 state [OFF])
Storage WD SN850X 2TB + Samsung EVO 2TB (SATA) + Seagate Exos X20 20TB (4Kn mode)
Display(s) LG 27GP950-B
Case Fractal Design Meshify 2 XL
Audio Device(s) Motu M4 (audio interface) + ATH-A900Z + Behringer C-1
Power Supply Seasonic X-760 (760W)
Mouse Logitech RX-250
Keyboard HP KB-9970
Software Windows 10 Pro x64
Did a prompt pop up for enabling SLI?
 
Joined
Jan 8, 2017
Messages
8,943 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
This is a known bug with SLI configurations. As long as both GPUs are selectable as CUDA devices in the Nvidia Control Panel, it's fine.
 
Joined
Feb 18, 2005
Messages
5,238 (0.75/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Logitech G613
Software Windows 10 Professional x64

DrNewcenstein

New Member
Joined
Oct 31, 2017
Messages
4 (0.00/day)
Did a prompt pop up for enabling SLI?

Nope. A Titan Z is 2 GPUs on one frame, so the SLI connection is some kind of internal thing, not the external cable as you'd use with 2 physically separate GPUs in 2 separate slots. I got a "this system is multi-GPU capable" message, and NVCP lets me choose "Maximize 3D" where all GPUs work together for rendering, or Disable Multi-GPU for running multiple monitors and each GPU runs independently.

This is a known bug with SLI configurations. As long as both GPUs are selectable as CUDA devices in the Nvidia Control Panel, it's fine.

Yes, I can select All, the K4000, or either half of the Titan Z. I usually leave the K4000 out of it.

Sounds like it could be a driver bug similar to the one reported here? https://www.techpowerup.com/forums/...ond-card-missing-cuda-tooltip-in-gpu-z.244757

I would suggest you contact NVIDIA directly regarding this issue and please share their response here so that others with the same issue can find the answer.
That thread reports the issue was addressed with the 4xx drivers. However, no one in that thread explicitly mentions a Titan Z, so I don't know how different the SLI operation is between that and physically separate GPUs.

However, I'm now running into a more pressing matter: for the longest time, I ran the 387.92 drivers because they were stable. The Titan Z had issues with 389, IIRC, so I went back and stayed there until I got a 1080ti (about a year after getting a Titan X (Pascal), which I also ran on 387.92 since all I played was Skyrim and Daz Studio Iray worked just fine).
With all the reconfiguring I've been doing, as noted earlier, when I put the Titan Z and K4000 in the same machine, then did a DDU-cleaned install of 387.92, I got nothing from the Titan Z, even after several reboots and checking the simple stuff. No Device Manager appearance, not even "Other Devices" with the generic icon and triangle.
DDU'd again and put in 392-something and got the same thing.
DDU'd again and put in 411 and it worked, but came right back to the present issue where Side 2 doesn't contribute, and THEN I started getting "This device cannot be ejected", as if it were a USB device, followed by "Display Driver stopped working and recovered", along with "Daz Studio encountered and needs" errors. I use Daz Studio on this machine almost to the exclusion of all else, so for this to happen is a major issue.

Reboots usually hang after these faults, forcing a hard reset and a Safe Mode load and then a reboot, and it's 50/50 as to whether the Titan Z will reappear once Windows loads.

Another dance with DDU to put in the latest 417 drivers changed nothing.

(to clarify: my primary machine uses the Pascal and 1080ti, so putting one of those in place of the Titan Z is not where I want to be - I did it for about 2 days last week, and while I had 2 machines that were roughly equal, the loss of rendering speed in the one machine was excruciating. I went from one fast and one slow to two "ok I guess, but not as good").

To make sure it wasn't the MOBO going out, I replaced the Titan Z with one of my 780tis, and everything's fine. Swap that out for the other 780ti, everything's fine. Swap that out for a 980, everything's fine. Swap that out for a 1080ti, everything's fine. Put the Titan Z back in, and it's a dice-roll.

Magic 8-Ball says the Titan Z is having a terminal issue, but I'd prefer as many second opinions as I can get. And a Magic Cure, if there is one.

I keep telling myself I shoulda sold it 6 months ago during the GPU shortage when miners were paying $1500 for them used.

I've got a couple of other things to try before I take a hammer to it.
================================================================

EDIT:
Ok, so I went back and got the latest 417 drivers loaded *without installing GeForce Experience* this time. Card shows in Device Manager, still the same issue with GPU-Z (except the PhysX capability check mark comes and goes: look once and it's checked, refresh and it's unchecked, refresh again and it's checked. Or not).

I should note this exact GPU worked previously in this exact machine some few months ago, except for what I believe was an overheating issue that caused BSODs. But then, it was right next to the PC's PSU. I've got things shifted around a bit so there should be more clearance and less heat. Still, it is possible the heat issues have bjorked something.

At any rate, I got it loaded and stable. Disabled Multi-GPU in NVCPL, then checked GPU-Z and it shows SLI Disabled, so that's how that gets done on a Titan Z. I had no idea.

Fire up Daz Studio with a basic single-figure scene and select one Titan Z for Photoreal device (it shows 2, obviously). Renders just fine. Run the device monitor function of Nvidia-smi as that's going on and it's clearly Side A of the card. Side B shows no activity.
Render completes in 30 seconds, device monitor shows everything at rest.
Deselect Side A and select Side B, render completes instantly without actually doing anything. No figure in the image, just a transparent background. Clear sign the card is being skipped completely.
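The Side A / Side B comparison above can also be captured as numbers instead of watching the monitor by eye. Here is a rough sketch in Python, assuming nvidia-smi is on the PATH and that the driver reports utilization for these cards (some driver and card combinations return a placeholder such as [N/A] instead). The query fields index, name and utilization.gpu are real nvidia-smi fields. The helper names parse_utilization, idle_gpus and sample_once, and the 5% threshold, are made up for illustration.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse 'index, name, utilization' rows into (index, name, percent) tuples.

    percent is None when the driver reports a non-numeric value.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip anything that is not a three-field row
        index, name, util = parts
        rows.append((int(index), name, int(util) if util.isdigit() else None))
    return rows


def idle_gpus(rows, threshold=5):
    """Indices of GPUs whose reported utilization is below threshold percent."""
    return [i for i, _, util in rows if util is not None and util < threshold]


def sample_once():
    """Run nvidia-smi once and return the parsed rows (needs the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)
```

Calling sample_once() a few times during a render with only Side B selected, then passing the rows to idle_gpus(), would show whether that index ever leaves idle. A GPU that stays idle while it is the only selected render device is consistent with it being skipped, though it does not say whether the application, the driver or the hardware is responsible.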

I doubt Nvidia will want to see the debug dump from a product they stopped making.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

EDIT again because replying to your own post inserts it as an Edit:
I may have found the issue: In Device Manager>Properties sheet, on the Details pane, if I scroll down to Bus Reported Device Description, it says 3D Video Controller for Side 2 (the bad one).
Side 1 (the good one, that has all the outputs on it) shows as Video Controller (VGA Compatible).

Both are listed under Display Adapters, as they should be, so at least that part's right.

Shouldn't both of these show as Video Controller (VGA Compatible)? Can anybody with a working Titan Z check this?
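For reference, those two descriptions appear to correspond to two different PCI class codes: base class 0x03 (display controller) with subclass 0x00 for a VGA-compatible controller and subclass 0x02 for a 3D controller. The class code shows up in the Hardware Ids and Compatible Ids entries on the same Details pane as a CC_ suffix. Below is a small sketch that decodes it. The function name describe_class_code and the example ID strings are placeholders, not values read from this card, and nothing here establishes which value a healthy Titan Z reports for its second GPU.

```python
import re

# Display-controller subclasses from the PCI class code list (base class 0x03).
DISPLAY_SUBCLASSES = {
    0x00: "VGA-compatible controller",
    0x01: "XGA controller",
    0x02: "3D controller",
    0x80: "Other display controller",
}


def describe_class_code(hardware_id):
    """Decode the CC_xxxx part of a Windows PCI hardware or compatible ID.

    Returns None when the string carries no class code.
    """
    match = re.search(r"CC_([0-9A-Fa-f]{4})", hardware_id)
    if not match:
        return None
    code = int(match.group(1), 16)
    base, sub = code >> 8, code & 0xFF
    if base != 0x03:
        return "not a display controller (base class 0x%02X)" % base
    return DISPLAY_SUBCLASSES.get(sub, "unknown display subclass 0x%02X" % sub)


if __name__ == "__main__":
    # Placeholder IDs for illustration only; DEV_XXXX is deliberately not filled in.
    for hw_id in (r"PCI\VEN_10DE&DEV_XXXX&CC_030000",
                  r"PCI\VEN_10DE&DEV_XXXX&CC_030200"):
        print(hw_id, "->", describe_class_code(hw_id))
```

A GPU with no display outputs of its own is often enumerated as a 3D controller rather than a VGA-compatible one, so the difference may not be a fault by itself. Someone with a working Titan Z comparing the CC_ values would settle it.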

MORE EDITS:

DDU doesn't show a vBIOS version for the 2nd half of the Titan Z. Shows it for the 1st half. Shows it for the K4000. Hmmm. Fishy.
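Since the earlier tools did show a BIOS on both halves, it may be worth getting a second reading from the driver itself. This is a sketch under the assumption that nvidia-smi is on the PATH. The vbios_version query field is a real nvidia-smi field. The helper names parse_vbios and unreadable_vbios are made up for illustration.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,vbios_version",
    "--format=csv,noheader",
]


def parse_vbios(csv_text):
    """Parse 'index, name, vbios_version' rows into (index, name, version) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def unreadable_vbios(rows):
    """Rows whose version field is empty or a bracketed placeholder such as [N/A]."""
    return [r for r in rows if not r[2] or r[2].startswith("[")]


if __name__ == "__main__":
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for row in unreadable_vbios(parse_vbios(out)):
        print("no vBIOS version reported for GPU", row[0], row[1])
```

If the driver reports 80.80.5A.00.02 for the second half here, the blank in DDU is probably just how that tool reads the value. If the driver cannot read it either, that points more toward the card.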
 