Wednesday, September 30th 2009

NVIDIA GT300 ''Fermi'' Detailed

NVIDIA's upcoming flagship graphics processor is going by a lot of codenames. While some call it the GF100, others GT300 (based on the present nomenclature), what is certain is that NVIDIA has given the architecture the internal name "Fermi", after the Italian physicist Enrico Fermi, inventor of the nuclear reactor. It doesn't come as a surprise, then, that according to some sources the board itself is codenamed "reactor".

Based on information gathered so far about GT300/Fermi, here's what's packed into it:
  • Transistor count of over 3 billion
  • Built on the 40 nm TSMC process
  • 512 shader processors (which NVIDIA may refer to as "CUDA cores")
  • 32 cores per core cluster
  • 384-bit GDDR5 memory interface
  • 1 MB L1 cache memory, 768 KB L2 unified cache memory
  • Up to 6 GB of total memory, 1.5 GB can be expected for the consumer graphics variant
  • Half Speed IEEE 754 Double Precision floating point
  • Native support for execution of C (CUDA), C++, Fortran, support for DirectCompute 11, DirectX 11, OpenGL 3.1, and OpenCL
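Taking the bullet points above at face value, a back-of-the-envelope throughput estimate is straightforward. Note the shader ("hot") clock below is a hypothetical placeholder, not a confirmed figure; only the core count and the half-speed double-precision ratio come from the list above.

```python
# Rough theoretical-throughput sketch from the rumored Fermi specs.
# cuda_cores and the DP ratio are from the spec list; the shader clock
# is a GUESS in the GT200 ballpark, not a confirmed number.
cuda_cores = 512
shader_clock_ghz = 1.7          # hypothetical hot clock
flops_per_core_per_clock = 2    # assuming one fused multiply-add per clock

sp_gflops = cuda_cores * shader_clock_ghz * flops_per_core_per_clock
dp_gflops = sp_gflops / 2       # "half speed" IEEE 754 double precision

print(f"Single precision: ~{sp_gflops:.0f} GFLOPS")
print(f"Double precision: ~{dp_gflops:.0f} GFLOPS")
```

Whatever the final clocks turn out to be, the 1:2 double-precision ratio is the headline number here.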


Update: Here's an image added from the ongoing public webcast of the GPU Technology Conference, of a graphics card based on the Fermi architecture.

Source: Bright Side of News
Add your own comment

205 Comments on NVIDIA GT300 ''Fermi'' Detailed

#1
Benetanegia
amschip said:
I still wonder: will those 3 billion transistors really make a difference, taking into account all the added CPU-kind functionality? GT200 was much bigger than RV770, and yet that difference didn't really scale well.
Just my two cents...:)
It's not the transistor count that you have to take into account, it's the 512 SPs, which is 2.15x more than in GT200. That, paired with all the improvements in threading and load balancing, means that Fermi probably has more than twice the power of GT200. After reading the whitepapers, I don't think that any of that added "CPU-kind" functionality will cripple performance; on the contrary: latencies have been dramatically decreased, interconnect bandwidth increased, and there are added schedulers and threads...

Regarding the last sentence, that's not really accurate. If you put a GTX285 at the HD4870's reference clocks, it would more than scale, beyond HD4890 clocks even... It's just two different ways of doing things; Nvidia has had the OC advantage in almost every chip in recent years, mainly because they aim at lower clocks to begin with. And that being said, we have no clue which clocks GT300 will launch at; it could be anything between 600-800 MHz. Lower or higher is unlikely. If it's close to 600 MHz, then GT300 would be 2x as fast as GT200; if it launched near 800 MHz, it would be much faster than that. Point is, we don't know exactly how it will perform, but looking at the specs it becomes more and more evident it will not be slow.

EDIT: Before this becomes a discussion, I'm not fighting with you at all. I'm just stating the possibilities, answering your questions and trying to offer the different angles.

DaedalusHelios said:
Yeah, I don't really trust Fudzilla enough to believe it 100%. I still think it's a bit up in the air what release strategy they will have. It's not that I don't believe you, Benetanegia, it's that I don't believe Fudzilla is truthful all the time.
I don't believe it 100% either, and I'm not saying it's going to be true. But what I do think is that written words that are claimed to come from a CEO >>>>>>>>> the speculation and thoughts of a member with no info to back his claims. Since all this is speculation, and all of us are talking from speculation, I put both things in a balance and I have no doubts as to which possible, speculated reality is the more probable one. Especially since most of the other info there regarding GTC is true. Even if Fudzilla is not the most believable source, the truth is that on GT300 they've been correct over the last two days and overall as well. For instance, I think they were the first ones to mention the real codename, Fermi.

What is clear IMO is that he had already made up his mind around an idea; he didn't know who Jensen Huang is nor what GTC is, so he thought he was making his claim stronger in his second reply, while he wasn't, and he has been unable to change his position in his posts since.

Point is, even if that info is not 100% accurate, the possibility that it might not happen that way is not enough to back his claims. Uncertainty is never proof of anything, and seriously, I'm starting to believe I've traveled to an alien world or something, because I'm seeing uncertainty used as proof everywhere: like in BM: Arkham, TWIMTBP as a whole, on Spanish TV... Is the world going crazy or what?
Posted on Reply
#3
amschip
Benetanegia said:
It's not the transistor count that you have to take into account, it's the 512 SPs, which is 2.15x more than in GT200. That, paired with all the improvements in threading and load balancing, means that Fermi probably has more than twice the power of GT200. After reading the whitepapers, I don't think that any of that added "CPU-kind" functionality will cripple performance; on the contrary: latencies have been dramatically decreased, interconnect bandwidth increased, and there are added schedulers and threads...

Regarding the last sentence, that's not really accurate. If you put a GTX285 at the HD4870's reference clocks, it would more than scale, beyond HD4890 clocks even... It's just two different ways of doing things; Nvidia has had the OC advantage in almost every chip in recent years, mainly because they aim at lower clocks to begin with. And that being said, we have no clue which clocks GT300 will launch at; it could be anything between 600-800 MHz. Lower or higher is unlikely. If it's close to 600 MHz, then GT300 would be 2x as fast as GT200; if it launched near 800 MHz, it would be much faster than that. Point is, we don't know exactly how it will perform, but looking at the specs it becomes more and more evident it will not be slow.
I'm not saying it's true either, but everywhere I look it's: "OMG it has 3 billion transistors, it must be fast" :) while RV770 proved otherwise already. As for my second sentence, it's referring to the first one really. Looking at real game performance, 1.4 billion against 956 million wasn't really translating into 50% more performance, now was it? :)
Posted on Reply
#4
HTC
I really don't care which one is faster as long as the faster one isn't way faster.

Why, you ask? Because if so, the winning one can behave much like Intel vs. AMD (price wise) and, IMHO, that's a BIG no-no.

Apparently (on paper), nVidia will win this round (if and when "Fermi" is launched): the question is, by how much.

As long as they are both close to each other, then we can all benefit from their price wars.
Posted on Reply
#6
kid41212003
HTC said:
I really don't care which one is faster as long as the faster one isn't way faster.

Why, you ask? Because if so, the winning one can behave much like Intel vs. AMD (price wise) and, IMHO, that's a BIG no-no.

Apparently (on paper), nVidia will win this round (if and when "Fermi" is launched): the question is, by how much.

As long as they are both close to each other, then we can all benefit from their price wars.
It doesn't matter how much faster it is, as LONG as AMD has something affordable with good performance. That's the right way to say it.

It doesn't matter if NVIDIA puts out a $700 or $1000 or $1b GPU, because those GPUs are not meant to be mainstream. Those GPUs are not meant to offer good price/performance.

Even if the margin is small (1-5% faster), they can still sell it for 2x the price.
The performance gap is not a problem, I repeat: as long as AMD has something good on their side, we consumers are all good.
Posted on Reply
#7
HalfAHertz
Benetanegia said:
It's not the transistor count that you have to take into account, it's the 512 SPs, which is 2.15x more than in GT200. That, paired with all the improvements in threading and load balancing, means that Fermi probably has more than twice the power of GT200. After reading the whitepapers, I don't think that any of that added "CPU-kind" functionality will cripple performance; on the contrary: latencies have been dramatically decreased, interconnect bandwidth increased, and there are added schedulers and threads...

Regarding the last sentence, that's not really accurate. If you put a GTX285 at the HD4870's reference clocks, it would more than scale, beyond HD4890 clocks even... It's just two different ways of doing things; Nvidia has had the OC advantage in almost every chip in recent years, mainly because they aim at lower clocks to begin with. And that being said, we have no clue which clocks GT300 will launch at; it could be anything between 600-800 MHz. Lower or higher is unlikely. If it's close to 600 MHz, then GT300 would be 2x as fast as GT200; if it launched near 800 MHz, it would be much faster than that. Point is, we don't know exactly how it will perform, but looking at the specs it becomes more and more evident it will not be slow.

EDIT: Before this becomes a discussion, I'm not fighting with you at all. I'm just stating the possibilities, answering your questions and trying to offer the different angles.

I don't believe it 100% either, and I'm not saying it's going to be true. But what I do think is that written words that are claimed to come from a CEO >>>>>>>>> the speculation and thoughts of a member with no info to back his claims. Since all this is speculation, and all of us are talking from speculation, I put both things in a balance and I have no doubts as to which possible, speculated reality is the more probable one. Especially since most of the other info there regarding GTC is true. Even if Fudzilla is not the most believable source, the truth is that on GT300 they've been correct over the last two days and overall as well. For instance, I think they were the first ones to mention the real codename, Fermi.

What is clear IMO is that he had already made up his mind around an idea; he didn't know who Jensen Huang is nor what GTC is, so he thought he was making his claim stronger in his second reply, while he wasn't, and he has been unable to change his position in his posts since.

Point is, even if that info is not 100% accurate, the possibility that it might not happen that way is not enough to back his claims. Uncertainty is never proof of anything, and seriously, I'm starting to believe I've traveled to an alien world or something, because I'm seeing uncertainty used as proof everywhere: like in BM: Arkham, TWIMTBP as a whole, on Spanish TV... Is the world going crazy or what?
These are all valid points, but you have to remember one thing - usually, big and complex chips don't like high frequencies. We should really wait and see the final specs. I'm pretty sure it will be faster than the 5870; the question is how much faster exactly.
Posted on Reply
#8
KainXS
The way I look at it, Nvidia did the smart thing: instead of relying on the old MADD architecture, they finally upgraded to something better. As for frequencies, nobody knows, but one thing is for sure: ATI doubled the specs of their 5870 over the old gen and got what, a 40% performance boost across the board. Nvidia did nearly the same exact thing with their GT200s and got nearly the same result, because they kept the same old architecture. So I think Nvidia made the right move this time; I am expecting more from the GT300s than I did from the HD5870.

MIMD-based architectures are going to be the way of the future, not the old scalar architecture.

I was really, really surprised when I saw how small the card was, though; I was just amazed, because Nvidia usually makes their high-end cards very long, but this time they are moving in the right direction.
Posted on Reply
#9
OnBoard
Animalpak said:


Just 8-pin power :confused: And 2 pins more? No way will it run with just one power plug, or it's a miracle card.

But it looks really nice; about time people get a bit of bling too, for how much they have to pay :)
Posted on Reply
#10
newtekie1
Semi-Retired Folder
You have to remember that 8-pin isn't just the addition of 2 extra pins (really those are just ground pins anyway, so they could have been left off). The real change with the 8-pin introduction was the doubling of the power provided according to the specifications. The addition of the two pins doesn't really do anything; I think it was just done to make it easy to tell the difference in supplied/required power.
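The per-connector figures behind that claim come straight from the PCI Express specs: 75 W from the slot, 75 W from a 6-pin plug, and 150 W from an 8-pin plug. A quick sketch of the resulting board power budgets:

```python
# PCI Express power-delivery budget per the PCIe specs:
# the slot itself supplies up to 75 W, a 6-pin plug adds 75 W,
# and an 8-pin plug doubles that to 150 W.
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150

def board_power_budget(six_pins=0, eight_pins=0):
    """Maximum in-spec board power for a given connector layout."""
    return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

print(board_power_budget(eight_pins=1))              # single 8-pin card
print(board_power_budget(six_pins=1, eight_pins=1))  # GTX 280-style layout
```

So a card with a single 8-pin connector is capped at 225 W in spec, while a GTX 280-style 8-pin + 6-pin layout gets 300 W.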
Posted on Reply
#11
OnBoard
newtekie1 said:
You have to remember that 8-pin isn't just the addition of 2 extra pins (really those are just ground pins anyway, so they could have been left off). The real change with the 8-pin introduction was the doubling of the power provided according to the specifications. The addition of the two pins doesn't really do anything; I think it was just done to make it easy to tell the difference in supplied/required power.
Yep (well, more ground should allow more amps from 12 V), but in that picture there are 8 pins + 2 pins.

A GTX 280 needs 8 pins + 6 pins, and this has more than double the transistors; that's why I'm not buying it, even if it is on a smaller manufacturing process.
Posted on Reply
#12
Benetanegia
amschip said:
I'm not saying it's true either, but everywhere I look it's: "OMG it has 3 billion transistors, it must be fast" :) while RV770 proved otherwise already. As for my second sentence, it's referring to the first one really. Looking at real game performance, 1.4 billion against 956 million wasn't really translating into 50% more performance, now was it? :)
It depends. This is the average performance over all games at 2560x1600 from W1zzard's HD5870 review.

http://img.techpowerup.org/091001/perfrel_2560.gif

Compared to the HD5870, the GTX285 is doing 78% and the HD4870 is doing 50%. If we normalize 50% to being 100% and take it as the base, then:

78/50 * 100 = 156%

That is, at 2560x1600 the GTX285 is 56% faster than the HD4870 in the average of all the games that W1zzard reviews.

But wait!! ATI had another 956 million transistor card using the same chip, the HD4850. Apply the same math and that gives us that the GTX285 is 95% faster, or almost twice as fast. 40% more transistors and 2x the performance, not too shabby, isn't it? The GTX285's clock is 648 MHz, the HD4870's is 750 MHz and the HD4850's is 625 MHz.

Comparing the cards at 2560x1600 does make sense, because a lot of those extra 40% transistors went into the extra 16 ROPs that help at that resolution.

What I mean with all this is: it depends.
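The normalization in the post above is just a ratio of chart percentages. A minimal sketch of the same arithmetic (the HD4850 entry is back-derived from the post's "95% faster" result, not read off the chart):

```python
# Relative-performance chart values at 2560x1600, HD5870 = 100%.
# HD5870, GTX285 and HD4870 values are quoted in the post above;
# the HD4850's 40 is inferred from the stated "95% faster" claim.
chart = {"HD5870": 100, "GTX285": 78, "HD4870": 50, "HD4850": 40}

def relative_speed(card_a, card_b):
    """How much faster card_a is than card_b, as a percentage."""
    return (chart[card_a] / chart[card_b] - 1) * 100

print(f"GTX285 vs HD4870: +{relative_speed('GTX285', 'HD4870'):.0f}%")
print(f"GTX285 vs HD4850: +{relative_speed('GTX285', 'HD4850'):.0f}%")
```

This reproduces the +56% and +95% figures worked out in the post.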
Posted on Reply
#13
jaredpace
You know, Benetanegia,

I bet a GTX380 scores "125%" on that chart in W1zzard's review in December :)
Posted on Reply
#14
btarunr
Editor & Senior Moderator
I'm predicting that figure to be 115~120% on that chart.
Posted on Reply
#15
Benetanegia
Since the lower numbers have been taken I'll say 135-140%. We have a poll going on here. :D
Posted on Reply
#16
wolf
Performance Enthusiast
Benetanegia said:
It depends. This is the average performance over all games at 2560x1600 from W1zzard's HD5870 review.

http://img.techpowerup.org/091001/perfrel_2560.gif

Compared to the HD5870, the GTX285 is doing 78% and the HD4870 is doing 50%. If we normalize 50% to being 100% and take it as the base, then:

78/50 * 100 = 156%

That is, at 2560x1600 the GTX285 is 56% faster than the HD4870 in the average of all the games that W1zzard reviews.

But wait!! ATI had another 956 million transistor card using the same chip, the HD4850. Apply the same math and that gives us that the GTX285 is 95% faster, or almost twice as fast. 40% more transistors and 2x the performance, not too shabby, isn't it? The GTX285's clock is 648 MHz, the HD4870's is 750 MHz and the HD4850's is 625 MHz.

Comparing the cards at 2560x1600 does make sense, because a lot of those extra 40% transistors went into the extra 16 ROPs that help at that resolution.

What I mean with all this is: it depends.
I'm really starting to enjoy your posts, Benetanegia; I'm glad you found TPU, or that TPU found you :)

EDIT: as for the poll, I'll go for 130% flat :)
Posted on Reply
#17
jaredpace
You're probably right, btarunr; 20% faster, and 60% later, than a 5870 sounds about right.

:-)
Posted on Reply
#18
yogurt_21
Benetanegia said:
It depends. This is the average performance over all games at 2560x1600 from W1zzard's HD5870 review.

http://img.techpowerup.org/091001/perfrel_2560.gif

Compared to the HD5870, the GTX285 is doing 78% and the HD4870 is doing 50%. If we normalize 50% to being 100% and take it as the base, then:

78/50 * 100 = 156%

That is, at 2560x1600 the GTX285 is 56% faster than the HD4870 in the average of all the games that W1zzard reviews.

But wait!! ATI had another 956 million transistor card using the same chip, the HD4850. Apply the same math and that gives us that the GTX285 is 95% faster, or almost twice as fast. 40% more transistors and 2x the performance, not too shabby, isn't it? The GTX285's clock is 648 MHz, the HD4870's is 750 MHz and the HD4850's is 625 MHz.

Comparing the cards at 2560x1600 does make sense, because a lot of those extra 40% transistors went into the extra 16 ROPs that help at that resolution.

What I mean with all this is: it depends.
The 285 was a revision; you need to redo your numbers using the 280 to start your theory. Which, btw, is flawed, as it assumes that since the 5870 is twice the speed of the 4870, the GT300 will be twice the speed of the GT200. It could be more than twice the speed, it could be less. We have zero numbers to go on atm, just paper specs.
Posted on Reply
#19
btarunr
Editor & Senior Moderator
OnBoard said:
Just 8-pin power :confused: And 2 pins more? No way will it run with just one power plug, or it's a miracle card.
One 6-pin connector on the 'top' (a placeholder for an 8-pin), and one 8-pin at the 'rear'.

Posted on Reply
#20
lemonadesoda
Half Speed IEEE 754 Double Precision floating point
This is extraordinary performance... assuming it means what it's suggesting. This thing will wipe the floor in CUDA, math and PhysX. A new world order in computational accelerators has just opened.

I predict RIP for http://www.clearspeed.com/

PS. Who prefers the "matte" look of the ATI, or the "glossy" look of the nV?
Posted on Reply
#23
15th Warlock
lemonadesoda said:

PS. Who prefers the "matte" look of the ATI, or the "glossy" look of the nV?
Doesn't really matter that much to me; all those pretty stickers and glossy finishes will be facing down all the time anyway... :p

I wonder why no one has come up with a killer backplate design. I mean, I know it wouldn't have any practical function, but wouldn't it be nice to have something you can actually see when you stare at your case's window, instead of a PCB or an all-black backplate?... :confused:
Posted on Reply
#24
Benetanegia
yogurt_21 said:
285 was a revision, you need to redo your numbers using the 280 to start your theory. which btw is flawed as it's assuming since the 5870 is twice the speed of the 4870 that the gt300 will be twice the speed of the gt200. it could be more than twice the speed, it could be less. we have zero numbers to go on atm, just paper specs.
Why do I have to redo the numbers? The GTX285 and GTX280 use the exact same architecture; the only difference is clocks, so it's absolutely irrelevant which one I use when my point is precisely that clocks matter. GT200 didn't offer less performance per billion transistors than RV770: even though the RV770 runs 100 MHz faster, the GTX285 performs significantly better than the % increase in transistors would suggest. At similar clocks, the 1.4 billion card performs almost twice as fast as the 1 billion card. So that refutes the claim of "same performance, more transistors".

Anyway, we are not comparing GT200/RV770, we are talking about Fermi. Fermi has 3 billion transistors and, just like GT200, it will use every one of them to be significantly faster than RV870. I'm not assuming it will be twice as fast as GT200, although I think it will be somewhere around that. Twice as fast would have meant saying it would hit 156% on that chart; I said 135-140%, which is quite different. Looking at the papers, we don't have any reason to think it will be just twice as fast; what we have are some reasons to think it could be 3x faster, yet we are saying it will be less than twice.
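The performance-per-transistor point above can be sanity-checked with the thread's own numbers: the 2560x1600 chart percentages quoted earlier and the transistor counts cited in this discussion. A quick sketch:

```python
# Chart-% per billion transistors, using numbers quoted in this thread:
# GTX285 (GT200): 1.4 billion transistors, 78% on the 2560x1600 chart;
# HD4870 (RV770): 956 million transistors, 50% on the same chart.
cards = {
    "GTX285": {"transistors_b": 1.4, "perf_pct": 78},
    "HD4870": {"transistors_b": 0.956, "perf_pct": 50},
}

for name, c in cards.items():
    ratio = c["perf_pct"] / c["transistors_b"]
    print(f"{name}: {ratio:.1f} chart-% per billion transistors")
```

By this crude metric GT200 comes out slightly ahead (about 55.7 vs 52.3), which supports the claim that it doesn't offer less performance per transistor than RV770; clocks and resolution of course skew it either way.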
Posted on Reply
#25
imperialreign
Benetanegia said:
Fixed. He did say that about the top-to-bottom release. What I linked is the second (out of 6) consecutive posts made on FUD during the hours the keynote took place; the first one says:
I don't care what Fud said about the keynote address - again, I do not find them to be a reliable source. Just because Fud claims a CEO said something does not make it true.

2nd day into the conference, and I have yet to see any mention of such a release strategy on the ongoing GTC blog, nor was it mentioned in Jensen's keynote address summary . . . as well, no other tech site reporting on the conference has mentioned the GT300 release strategy.

Sorry to play devil's advocate here, but I'd think news such as that (along with the presentation of the GT300) would be big enough that other sites would've coughed it up too . . . not just Fud; and I'm 100% sure Fud is not the only "major" tech site with representatives on hand. If you've paid any attention to the tech industry over the last 10-20 years (which, I definitely get the feeling you have), you'd know that upcoming hardware release strategies (especially for the GPU markets) are like crack to the tech communities - right alongside the spec and pricing sheets.

Such a release strategy just does not make sense for nVidia, especially considering that ATI have already made it to market with their new series, allowing ATI to gain the upper hand in the pricing game . . . not to mention that everyone knows Hemlock is waiting in the wings, and ATI won't drop that bombshell until nVidia have stepped into the ring . . . this is the same market strategy both companies have used since the days of the X1000/7800 series. That being said, I'll believe it when I see it on shelves.
I don't believe it 100% either, and I'm not saying it's going to be true. But what I do think is that written words that are claimed to come from a CEO >>>>>>>>> the speculation and thoughts of a member with no info to back his claims. Since all this is speculation, and all of us are talking from speculation, I put both things in a balance and I have no doubts as to which possible, speculated reality is the more probable one. Especially since most of the other info there regarding GTC is true. Even if Fudzilla is not the most believable source, the truth is that on GT300 they've been correct over the last two days and overall as well. For instance, I think they were the first ones to mention the real codename, Fermi.
So, seeing as how everything is all just speculation . . . then it's safe to assume your initial interpretations of Fud's article are merely speculation as well? I mean, you yourself claim you don't believe Fud 100% either, yet you've twisted your understanding of the article to serve your needs . . . and you have the audacity to claim that I'm warping the message?

Again, this is the tech market we're talking about - all "rumors," whether regurgitated from news sites, or spewed from the manufacturer's mouth - are all to be taken with a grain of salt. There's a lot of sandbagging in the industry, and a lot of smoke & mirrors, too. Things can and do change overnight.
What is clear IMO is that he had already made up his mind around an idea; he didn't know who Jensen Huang is nor what GTC is, so he thought he was making his claim stronger in his second reply, while he wasn't, and he has been unable to change his position in his posts since.
I love your pretentious level of assumption. Simply because I disagree with an unbacked statement you made . . . brilliant.

You still fail to realize that, again, you yourself have twisted the claim that Fud made . . . here, let me help break down simple English context for you, seeing as how you obviously don't get it . . . now, if English is not your primary language, that's cool - I apologize . . . if not . . . then, that's just sad:
Nvidia will implement a top-to-bottom release strategy from high end to entry level (the claim that is being disputed). While he didn't talk about it during the keynote presentation (meaning Fud states that Jensen DID NOT mention the GT300 release strategy during his keynote address), this release strategy also includes a high end dual-GPU configuration that should ship around the same time ("around the same time" - meaning "not at the same time, but possibly within a month or two") as the high end single-GPU model (which contradicts the first sentence, in that the dual-GPU will not ship at the same time as the single-GPU . . . which means it won't be a "top-to-bottom" release, as the dual-GPU would be released first).
Does that make it more understandable?
Point is, even if that info is not 100% accurate, the possibility that it might not happen that way is not enough to back his claims. Uncertainty is never proof of anything, and seriously, I'm starting to believe I've traveled to an alien world or something, because I'm seeing uncertainty used as proof everywhere: like in BM: Arkham, TWIMTBP as a whole, on Spanish TV... Is the world going crazy or what?
Sounds like sandbagging to me . . . as you've come across quite unsure yourself. It's always amazed me how "holier-than-thou" people are willing to come across, while being under the impression that their shit doesn't stink.


I've said my piece; I'm done with this discussion, debate, argument, disagreement or whatever you want to call it.

I'll let time prove who's right and who's wrong. :toast:
Posted on Reply
Add your own comment