
NVIDIA GP100 Silicon Moves to Testing Phase

throw in the nv greed factor and it is $650.

welcome to 2015 nvidia, amd has been waiting for you.

You call it greed, but in the free market, companies only charge what people are willing to pay. As long as people are buying boatloads of a product at a given price, it won't drop.

If I manufactured a widget, and people were willing to buy all I made at $200, despite some people saying it's too expensive, I'm going to sell them for $200, because I have my own expenses and bills to pay, and my family to provide for.
 
My point was that there is nothing FP32 can't do that FP16 can, but there is a lot FP32 can't do that FP64 can. We should be moving towards FP64 and not away from it, unless the architecture is designed in such a way that it can complete four FP16, two FP32, or one FP64 operation simultaneously. I know that is not true of FP64, so I doubt it is true of FP16 either. It makes more sense to make an FP64 monster and feed it with FP16 and FP32 workloads than to focus on FP16 and nerf FP64.
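
(To make the FP32 vs FP64 gap concrete, here's a minimal host-side sketch of my own; nothing GPU-specific, just standard IEEE behavior. FP32 carries roughly 7 significant decimal digits and FP64 roughly 15-16, so small contributions that FP64 keeps are silently dropped by FP32.)

```cpp
#include <cstdio>

int main()
{
    // Add ten million increments of 1e-8 to 1.0.
    // FP32 epsilon at 1.0 is ~1.2e-7, so each 1e-8 increment rounds away and
    // the FP32 accumulator never moves; FP64 keeps the contributions.
    float  acc32 = 1.0f;
    double acc64 = 1.0;
    for (int i = 0; i < 10000000; ++i) {
        acc32 += 1e-8f;
        acc64 += 1e-8;
    }
    printf("FP32 result: %.9f\n", acc32);   // prints 1.000000000
    printf("FP64 result: %.9f\n", acc64);   // prints ~1.100000000
    return 0;
}
```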
 
So, the price tag associated with this particular product indicates nothing about the final price tag. When you ship something like this you declare the replacement value of the goods plus 20%. As this is a one-off process, and the goods theoretically had a setup and confirmation run just for this part, the value will be huge. You've got setup time, operator time, development time, etc... Combine this with a limited engineering sample of the HBM2 memory, and you've got an astronomically priced one-off card. This is why people don't commission custom GPUs.

As far as performance, I'd settle for a 50% performance improvement with a much smaller cooling budget. Nvidia has done well with their recent offerings, but dropping GDDR5 and actually running these cards cooler would allow for either lower power draw (can you say gaming rig on more often?) or a smaller card that still delivers amazing performance.



The cynic inside me also has one question: is the cited 60-90% improvement raw compute, or improvements based around DX12 math? Given this is basically FUD, we'll only find out next year. Set anticipation to reasonable...


Edit:
Tripping speed-traps already? Sounds like this is gonna be one fast processor!! ba bum tss!!:D

Seriously though, goodbye 28nm, you shall not be missed, about time we moved to a smaller process :rockout:

I can see the sentiment, but heartily disagree. 28nm has had some amazing times, due in no small part to it sticking around far longer than it should have. We've seen the 6xx, 7xx, and 9xx series (arguably the 9xx is the best DX11 will be getting) from Nvidia. We've seen the 7xxx, 2xx, and 3xx series from AMD (arguably the cards that made bitcoin mining famous/infamous). 28nm is something really worth celebrating.

That said, it's time for 16/14nm. Good lord, do we need a change.



Surprisingly though, think back on zombie process nodes. There are still parts built on the 45nm process, SB-E was 32nm with a 65nm PCH, and even Intel's flagship PCH (Z170) is built on the 22nm lithographic process (CPUs really are the exception to the rule that low-cost processes drive silicon adoption, but somebody has to be the trailblazer).
 
I take it this is just the GP100 TSMC silicon chip... I suppose these can have some functionality and operational parameters checked; however, it's not until they have such chips and HBM2 added to an interposer that any real design validation can be made. It seems like it would be at least 6-8 months of everything going correctly, and moving forward without any glitches, before Nvidia has the professional (Tesla) versions fully vetted, finalized, and ready to introduce to HPC clients. Then real production would commence, and it's probably a good 10-12 weeks to have complete packaged interposers mounted to PCBs. So figure the end of 2016 for Tesla cards to start moving to fulfill the HPC orders. This news is interesting, but it would be more substantive for gamers if this was about the GP104, though much of the work is congruent and should quickly parlay into the first gaming card.
 
welcome to 2015 nvidia, amd has been waiting for you.

Actually nVIDIA is way ahead of AMD. HBM1 doesn't seem to provide any tangible performance benefits at this time, and the 4GB limitation will hit everyone like a brick towards the second half of next year. So it made ZERO sense for nVIDIA to make an HBM1 product.

HBM2 provides a MINIMUM of 16GB of VRAM at twice the speed/bandwidth. If AMD survives until HBM2 you'll see marketing from them that will make the Fury X look like a school project. But since AMD is very-very-very quiet about HBM2, I don't think they'll get there too soon.
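
(Rough numbers behind the "twice the speed/bandwidth" part, using the publicly quoted per-stack spec figures; treat the constants below as assumptions rather than measurements. Capacity scales the same way: HBM1 tops out at 1GB per stack, while HBM2's denser, taller stacks are what make 4GB+ per stack, and hence 16GB across four stacks, possible.)

```cpp
#include <cstdio>

int main()
{
    // Publicly quoted spec figures (assumptions, not measurements):
    // both generations use a 1024-bit interface per stack; HBM1 signals at
    // ~1 Gb/s per pin, HBM2 at up to ~2 Gb/s per pin.
    const double bus_bits  = 1024.0;
    const double hbm1_gbps = 1.0;
    const double hbm2_gbps = 2.0;

    const double hbm1_stack = bus_bits * hbm1_gbps / 8.0;  // GB/s per stack
    const double hbm2_stack = bus_bits * hbm2_gbps / 8.0;

    printf("HBM1: %.0f GB/s per stack, %.0f GB/s with 4 stacks\n",
           hbm1_stack, 4.0 * hbm1_stack);   // 128 and 512 (Fury X territory)
    printf("HBM2: %.0f GB/s per stack, %.0f GB/s with 4 stacks\n",
           hbm2_stack, 4.0 * hbm2_stack);   // 256 and 1024
    return 0;
}
```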
 
My point was that there is nothing FP32 can't do that FP16 can, but there is a lot FP32 can't do that FP64 can. We should be moving towards FP64 and not away from it, unless the architecture is designed in such a way that it can complete four FP16, two FP32, or one FP64 operation simultaneously. I know that is not true of FP64, so I doubt it is true of FP16 either. It makes more sense to make an FP64 monster and feed it with FP16 and FP32 workloads than to focus on FP16 and nerf FP64.

It's a speed vs accuracy argument, and for most consumer cases, FP32 is plenty. In some niches, FP16 is plenty, and for a minor increase in die area, you can pretty much double performance by using FP16 instead of FP32 (neural nets, low-fidelity image processing and rendering effects). And quite obviously, demand from enough very well-connected devs is there too, else nV wouldn't bother at all. Expect AMD to come up with something very similar.
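
For the "double the throughput for a small die cost" part, here's a minimal CUDA sketch of the packing idea (my own illustration, assuming a GPU with native FP16 math, i.e. sm_53 or newer, and the half2 intrinsics from cuda_fp16.h): one register and one instruction carry two FP16 operations, so the FP32-wide datapath effectively does double duty.

```cpp
#include <cuda_fp16.h>

// Element-wise add of N FP16 values, processed as N/2 packed half2 pairs.
// On sm_53+ each __hadd2 is a single instruction performing two FP16 adds,
// i.e. twice the work per issued instruction of the FP32 kernel below.
// Build with something like: nvcc -arch=sm_53 example.cu
__global__ void add_fp16_packed(const half2* a, const half2* b, half2* out, int n2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)
        out[i] = __hadd2(a[i], b[i]);   // two FP16 additions at once
}

// FP32 reference: one add per thread per instruction.
__global__ void add_fp32(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}
```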

I take it this is just the GP100 TSMC silicon chip... I suppose these can have some functionality and operational parameters checked; however, it's not until they have such chips and HBM2 added to an interposer that any real design validation can be made. It seems like it would be at least 6-8 months of everything going correctly, and moving forward without any glitches, before Nvidia has the professional (Tesla) versions fully vetted, finalized, and ready to introduce to HPC clients. Then real production would commence, and it's probably a good 10-12 weeks to have complete packaged interposers mounted to PCBs. So figure the end of 2016 for Tesla cards to start moving to fulfill the HPC orders. This news is interesting, but it would be more substantive for gamers if this was about the GP104, though much of the work is congruent and should quickly parlay into the first gaming card.

Interposer assembly would likely be done at the fab, since the interposer is really just another silicon die you drop other dies on, and where else besides fabs would you find the tech to do such an assembly?

Most likely SK Hynix is shipping full HBM2 wafers straight to TSMC for final assembly on the interposers.
 
Actually nVIDIA is way ahead of AMD. HBM1 doesn't seem to provide any tangible performance benefits at this time, and the 4GB limitation will hit everyone like a brick towards the second half of next year. So it made ZERO sense for nVIDIA to make an HBM1 product.

HBM2 provides a MINIMUM of 16GB of VRAM at twice the speed/bandwidth. If AMD survives until HBM2 you'll see marketing from them that will make the Fury X look like a school project. But since AMD is very-very-very quiet about HBM2, I don't think they'll get there too soon.

HBM1 is four times faster than GDDR5, so you won't see anything new with HBM2, just a higher price tag.

"School project"? Just remember that AMD released the first GDDR5 card too.
I think some of the Nvidia team went to this school.
 
Releasing something "first" is like posting "First!" on a YouTube video. The idea is to actually gain something by using that technology. When the competition's 256-bit/384-bit GDDR5 cards are matching or outperforming your HBM1 cards, which you invested heavily in and which also introduce availability problems, how is it a smart move?

AMD also released the first 1GHz processor, by a few hours. They also released the first desktop 64-bit processor. They also released the first desktop processor with an integrated memory controller. None of those things gave them a lasting advantage. Some paid off at first, but they had no idea how to exploit them. And look at them now. It's not about being "first", it's about perfecting something until it makes sense to implement it in an actual product. Which is something Apple has been doing for years.

HBM1 doesn't make sense now. It did one or two years ago, around the time nVIDIA presented their "test" HBM board. They showed it to us, as in "we're working on it", and when it makes sense, they'll release it.
 
Releasing something "first" is like posting "First!" on a YouTube video. The idea is to actually gain something by using that technology. When the competition's 256-bit/384-bit GDDR5 cards are matching or outperforming your HBM1 cards, which you invested heavily in and which also introduce availability problems, how is it a smart move?

AMD also released the first 1GHz processor, by a few hours. They also released the first desktop 64-bit processor. They also released the first desktop processor with an integrated memory controller. None of those things gave them a lasting advantage. Some paid off at first, but they had no idea how to exploit them. And look at them now. It's not about being "first", it's about perfecting something until it makes sense to implement it in an actual product. Which is something Apple has been doing for years.

HBM1 doesn't make sense now. It did one or two years ago, around the time nVIDIA presented their "test" HBM board. They showed it to us, as in "we're working on it", and when it makes sense, they'll release it.
Yeah right, I still remember the 7970 and 680, when most Nvidia cards had less memory than AMD's and almost every Nvidia user told me it was useless to have 3-4GB of memory on a video card. Where is the 680 now? Meanwhile AMD is still selling the same old chip under a new name... and it's still pretty good.
 
Interposer assembly would likely be done at the fab, since the interposer is really just another silicon die you drop other dies on, and where else besides fabs would you find the tech to do such an assembly?

Most likely SK Hynix is shipping full HBM2 wafers straight to TSMC for final assembly on the interposers.

Well, I don't know... have we heard anything about TSMC making the interposers? I suppose that could be a possibility, especially if HBM2 has some different technique. TSMC may still have an older fab process that they could transition to making an interposer, but IDK if they'd even get into the packaging on the interposer. If they intended to offer that on a large scale, I'd think they would've liked to have started cutting their teeth doing it for AMD.

AMD has its interposer coming from UMC, made on their 65nm process, with Amkor handling the packaging and assembly of the chip and HBM onto the interposer.
 
I take it this is just the GP100 TSMC silicon chip... I suppose these can have some functionality and operational parameters checked; however, it's not until they have such chips and HBM2 added to an interposer that any real design validation can be made.
That isn't how it works. The raw silicon is wired into a test rig that simulates the memory subsystems (along with other parameters, such as the PCI-E interface, for bus simulation/validation for platform and multi-GPU). The silicon is then put through a series of logic runtime verification and validation protocols (RTL - register transfer level test/validation) to make sure the logic blocks work as intended, power management (inc. clock gating and RTAPI), and power verification to ensure that the individual logic blocks behave within design parameters. Obviously there is a ton of other validation also, but the test/verification team do not need the whole card (or interposer in this case) to validate the silicon - interfacing with any DDR3 RAM would suffice.
It seems like it would be at least 6-8 months of everything going correctly, and moving forward without any glitches, before Nvidia has the professional (Tesla) versions fully vetted, finalized, and ready to introduce to HPC clients. Then real production would commence, and it's probably a good 10-12 weeks to have complete packaged interposers mounted to PCBs. So figure the end of 2016 for Tesla cards to start moving to fulfill the HPC orders.
The initial timetable was for early Q3. Deviation from that obviously depends upon a number of factors - whether the silicon needs revision, yields of 16nmFF+ (I am surprised that the TSMC naysayers haven't commented on the company's ability to produce a monolithic GPU on the process), and the HBM2 production timetable. Samsung has said volume production should start in early 2016. All in all, the best actual indicator would be an announced HPC contract, of which I don't think there are any at the moment (the announced HPC contracts are for Volta in 2018).
This news is interesting, but it would be more substantive for gamers if this was about the GP104, though much of the work is congruent and should quickly parlay into the first gaming card.
True enough. GP100 won't see the light of day - at least initially - as a GeForce card. Nvidia seems to have a built-in ready market for Tesla SKUs judging by the groundswell in deep learning programming taking place. Makes no sense to sell a card to a gamer for $1K when you can easily sell the same product for $4-5K to a company investing in coding for artificial neural networking.
Well, I don't know... have we heard anything about TSMC making the interposers? I suppose that could be a possibility, especially if HBM2 has some different technique. TSMC may still have an older fab process that they could transition to making an interposer, but IDK if they'd even get into the packaging on the interposer. If they intended to offer that on a large scale, I'd think they would've liked to have started cutting their teeth doing it for AMD.
AFAIK, most foundries looking to get into interposer manufacture were waiting on test/validation tooling to catch up with what is largely an old foundry process (which probably added to the Fury delays). This chart gives an idea of who is involved in the die stacking business, and in what capacity (it isn't up to the minute, but is pretty current).
[Chart: companies involved in the die stacking business and their roles]


AMD has its interposer coming from UMC, made on their 65nm process, with Amkor handling the packaging and assembly of the chip and HBM onto the interposer.
Correct, and I wouldn't be surprised to see Amkor (or ASE) responsible for the interposer integration for Pascal either.
 
That isn't how it works. The raw silicon is wired into a test rig that simulates the memory subsystems (along with other parameters, such as the PCI-E interface, for bus simulation/validation for platform and multi-GPU). The silicon is then put through a series of logic runtime verification and validation protocols (RTL - register transfer level test/validation) to make sure the logic blocks work as intended, power management (inc. clock gating and RTAPI), and power verification to ensure that the individual logic blocks behave within design parameters. Obviously there is a ton of other validation also, but the test/verification team do not need the whole card (or interposer in this case) to validate the silicon - interfacing with any DDR3 RAM would suffice.

Oh yeah, forgot they could do that..

You wouldn't get much use out of testing at lower clock speeds though, since that gives no info on actual power and thermal properties, and HBM/HBM2 already run at (comparatively) low speeds, so lowering them further wouldn't be too useful either for reducing crosstalk and general noise to make it work over longer distances. Also, I don't think you could sub in DDR3, since HBM (afaik) uses completely different electrical protocols for data transfer... As for testing PCI-E & RTL, it's not too hard to do such a thing on hardware if you're willing to sacrifice runtime speed, but then you don't gain anything since you miss timing issues (at least, that was my experience with one of my FPGA projects...). Personally, I think this is the first revision of final chips going out, possibly on a very custom interposer/PCB combo with a pile more memory debugging tools and measurement points...

Of course, I could be completely wrong.. It's not like I've worked on tapeout before, just inferring a lot...
 
HBM2 added to an interposer that any real design validation can be made.

Obviously there is a ton of other validation also, but the test/verification team do not need the whole card (or interposer in this case) to validate the silicon - interfacing with any DDR3 RAM would suffice.
Correct, probably not the proper choice of words; design validation can be done at the chip level, and they'll have a good idea of the performance they should see. I think there will be just a little more testing work (at least another layer for the complete package) than there has been in the past... before the rubber can meet the road.
 
Personally, I think this is the first revision of final chips going out, possibly on a very custom interposer/PCB combo with a pile more memory debugging tools and measurement points...
At this stage in development that is probably a given. I think the GP100 at this stage looks more like something out of Frankenstein's lab than any polished and recognizable consumer product. The last test/verification silicon I saw looked like a throwback to wire-wrapped DIY/homebuilts from the early 70's.
Also, I don't think you could sub in DDR3, since HBM (afaik) uses completely different electrical protocols for data transfer...
Probably a poor choice of example on my part. I was trying to convey that, to verify/validate the memory controllers, it should be possible to verify via a single DDR channel (rather than the whole wide HBM interface). Is it not possible to have multiple single DDR channels emulate HBM for test/verification purposes?
 
At this stage in development that is probably a given. I think the GP100 at this stage looks more like something out of Frankenstein's lab than any polished and recognizable consumer product. The last test/verification silicon I saw looked like a throwback to wire-wrapped DIY/homebuilts from the early 70's.

I'm not so sure... have you seen the really low-level devkits Intel sells? They clip onto the topside pads of the CPUs and give you essentially complete control over the CPU's pipeline, including (to the best of my knowledge, at least) the full ability to stop the processor and step through instructions, and even go back. Among the various bits they grant, they also pull a bunch of info about strange stuff like hit rates and such... They may well be using near-final packages with the pins broken out by the PCB instead...

Either way, can't be too far off now... 6-9 months I'd say...
 
If people want to make statements about the pricing in a negative manner, it seems churlish given the recent history going back to what... the 8800 Ultra days? Flagship = most expensive (unless of course your flagship card underperforms your main flagship card, but you still sell it for $650 anyway).
Neither brand produces 'affordable' flagships these days. Unfortunately.
throw in the nv greed factor and it is $650.
welcome to 2015 nvidia, amd has been waiting for you.
Wow, amazing how quickly everyone forgot that the Fury X was going to be an $850 card, but thanks to Nvidia putting the 980 Ti at $650, the Fury X ended up being sold at $650. Let's not forget about the GTX 970 being sold at $330, which FORCED the 290X at the time to nearly the same price from its roughly $500 price tag. So lately it's been Nvidia forcing prices down, not AMD.

60-90% performance improvement. Well, this certainly looks promising, but I guess the price will also be TITAN-like.

Anyway I'll be waiting for a full GP104 based product.
The 60-90%, being from Nvidia, can be held credible, since Nvidia does have a track record of telling the truth performance-wise about their cards (cue the people bringing up the GTX 970 issue).
 
The 60-90%, being from Nvidia, can be held credible, since Nvidia does have a track record of telling the truth performance-wise about their cards (cue the people bringing up the GTX 970 non-issue).

FTFY :P
 
Releasing something "first" is like posting "First!" on a YouTube video. The idea is to actually gain something by using that technology. When the competition's 256-bit/384-bit GDDR5 cards are matching or outperforming your HBM1 cards, which you invested heavily in and which also introduce availability problems, how is it a smart move?

AMD also released the first 1GHz processor, by a few hours. They also released the first desktop 64-bit processor. They also released the first desktop processor with an integrated memory controller. None of those things gave them a lasting advantage. Some paid off at first, but they had no idea how to exploit them. And look at them now. It's not about being "first", it's about perfecting something until it makes sense to implement it in an actual product. Which is something Apple has been doing for years.

HBM1 doesn't make sense now. It did one or two years ago, around the time nVIDIA presented their "test" HBM board. They showed it to us, as in "we're working on it", and when it makes sense, they'll release it.

Actually, releasing the first 64-bit processor is about the only thing AMD has done that had significant tangible benefits. AMD has to license x86, but Intel has to license x64 from AMD, since AMD's implementation was the one everyone went with instead of Intel's.
 
It's a speed vs accuracy argument, and for most consumer cases, FP32 is plenty. In some niches, FP16 is plenty, and for a minor increase in die area, you can pretty much double performance by using FP16 instead of FP32 (neural nets, low-fidelity image processing and rendering effects). And quite obviously, demand from enough very well-connected devs is there too, else nV wouldn't bother at all. Expect AMD to come up with something very similar.

It's more about the direction Nvidia wants to take Pascal with CUDA:

New Features in CUDA 7.5 - 16-bit Floating Point (FP16) Data
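
The storage-side use case that page describes boils down to something like this (a minimal sketch of my own, using the half type and conversion intrinsics from cuda_fp16.h; the scaling kernel is just a made-up example): keep the data in FP16 to halve memory footprint and bandwidth, and convert to FP32 in registers for the actual math.

```cpp
#include <cuda_fp16.h>

// Scale a buffer that is *stored* in FP16 (half the memory and bandwidth of
// FP32) while doing the arithmetic itself in FP32 registers for accuracy.
// This storage-only pattern doesn't require native FP16 math units.
__global__ void scale_fp16_buffer(const half* in, half* out, float scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(in[i]);   // load: FP16 -> FP32
        x *= scale;                      // compute in FP32
        out[i] = __float2half_rn(x);     // store: FP32 -> FP16, round to nearest
    }
}
```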

As far as AMD goes:

[Image: GCN 1.2 ISA documentation]

Fiji should have an improvement over Tonga. One would expect Arctic Islands to improve on Fiji.

Game-wise, I just think it will be used for artsy post-processing effects...
 
It's a speed vs accuracy argument, and for most consumer cases, FP32 is plenty. In some niches, FP16 is plenty, and for a minor increase in die area, you can pretty much double performance by using FP16 instead of FP32 (neural nets, low-fidelity image processing and rendering effects). And quite obviously, demand from enough very well-connected devs is there too, else nV wouldn't bother at all. Expect AMD to come up with something very similar.
GCN 1.2 already supports FP16 but they nerfed FP64 to get it.

Edit: @Xzibit beat me to it.
 
Why would you choose to highlight a report from over two months ago over reports from two days ago stating volume production in Q1?
The problem with the report of AMD getting "priority", as claimed in that other guy's story: is AMD even close to being ready with Arctic Islands to make use of that so-called priority? We could see Pascal in Q2, but is AMD going to have one before Q3? If not, then screwing Nvidia over, when they're willing to pay for the chips now and use them, would be kinda stupid.
 