Friday, March 19th 2010

NVIDIA Claims Upper Hand in Tessellation Performance

A set of company slides leaked to the press reveals that NVIDIA is claiming the upper hand in tessellation performance. With this achievement, NVIDIA is looking to encourage leaps in geometric detail, likely in future games that make use of tessellation. NVIDIA's confidence comes from the way its GF100 GPU is designed (further explained here). Each GF100 GPU physically has 16 PolyMorph Engines, one per streaming multiprocessor (SM), which helps with distributed, parallel geometry processing. Each PolyMorph Engine has its own tessellation unit. With 15 SMs enabled on the GeForce GTX 480 and 14 on the GeForce GTX 470, there are that many independent tessellation units.
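The SM-to-tessellation-unit mapping described above can be sketched as a tiny Python snippet (the SM counts come from the slides; the "fused off" phrasing is our own shorthand):

```python
# One PolyMorph Engine (and hence one tessellation unit) per streaming
# multiprocessor; retail cards ship with some SMs disabled.

GF100_TOTAL_SMS = 16  # full GF100 die: one PolyMorph Engine per SM

ENABLED_SMS = {
    "GeForce GTX 480": 15,
    "GeForce GTX 470": 14,
}

for card, sms in ENABLED_SMS.items():
    # Each enabled SM contributes one independent tessellation unit.
    disabled = GF100_TOTAL_SMS - sms
    print(f"{card}: {sms} tessellation units ({disabled} SM(s) disabled)")
```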

NVIDIA demonstrated its claims in the presentation using the Unigine Heaven benchmark, where the GeForce GTX 480 was pitted against a Radeon HD 5870. In many scenes where tessellation is lighter, the GPUs performed neck-and-neck, with the GTX 480 ahead more often than not. But in scenes with heavy tessellation (particularly the "dragon" scene, where a highly detailed dragon model is rendered with densely tessellated meshes), the GTX 480 clocks nearly a 100% performance advantage over the HD 5870. NVIDIA has been confident about tessellation performance since January, when it detailed the GF100 architecture. The GeForce GTX 400 series graphics cards will be unveiled on the 26th of March.


Images Courtesy: Techno-Labs

145 Comments on NVIDIA Claims Upper Hand in Tessellation Performance

#1
SNiiPE_DoGG
the54thvoid said:
Something that will speed up loading times on BC2? DX11 and ATI+ BC2 = Slow load. I always miss the vehicles at map starts...
That's really an HDD/RAM issue mate. DX11 has to load more data of course, and it emphasizes any data transfer speed bottlenecks you may have.
Posted on Reply
#2
the54thvoid
I've got 6GB of DDR3 triple channel on an i7 920. Samsung Spinpoint HDD. My PC is actually quite fast.

This link covers the problem well.

http://forums.electronicarts.co.uk/battlefield-bad-company-2-pc/928007-bc2-pc-still-slow-loading-map-good-system.html

It's to do with the drivers having to reload the map shaders each time.... NV have it sussed.

I've since changed the settings.ini file to force DX9 and it loads far quicker now. Bah ATI drivers.
Posted on Reply
#4
araditus
"We can't comment on the authenticity of the results, but the figures reinforce the belief that NVIDIA's GeForce GTX 480 will become the fastest single-GPU graphics card on offer."

Scroll down on that Hexus chart. As most have said, nothing is real till the 26th, and it's only 1 game (not to mention one of the most GPU-biased games ever), with an unknown system, and I've heard of this cool program called Photoshop.
Posted on Reply
#5
DeathByTray
The slow loading times in BC2 DX11 are a known issue. ATI is working on it.
Posted on Reply
#6
Marineborn
Get a quad core or more, it'll help your load times, at least it does mine. I get into games in under 30 seconds, and I only have a 7200 RPM hard drive.
Posted on Reply
#7
CyberCT
Well I guess we'll know within a week whether Nvidia is worth it or not this round. I was thinking about upgrading to Windows 7 next month, but we'll see. Maybe I'll wait until the DX11 cards compete and drop in price a bit more.

I was actually spending a lot more time looking into the whole Eyefinity thing. The videos on YouTube that show no bezels actually do look pretty cool. I guess with their recording software or whatever they were able to zoom in a certain way to eliminate the bezels of their actual tri-monitor setup. Over the last few years I remember news of some developers coming out with large widescreen monitors that don't have bezels between what would be regular monitors in the same setup. Specifically the one from Alienware, and I forget the other one. But after some research they both cost almost $8,000... ridiculous. The bezels are still too thick to be seamless at a good price point with current monitors on the market. Heck, even the Samsung 3- or 6-monitor display that's supposedly made with Eyefinity in mind costs a heck of a lot of money.

For much less money, I could buy a 1080p DLP projector and project a 130" screen, which would wipe the floor with Eyefinity. I'm not saying ATI's technology is bad, but in the future, when bezels either get ultra thin or bezel-less merged monitors enter the market, maybe it would make more sense. I have yet to find a 1080p 3D projector for a great price, but I'm sure one will come out within the next few years. I still feel 3D is awesome for games and movies, and I have yet to hear the opposite from anyone who comes to my place to see it once in a while. Maybe you naysayers don't have it set up correctly, or your display is too small?
Posted on Reply
#8
TheMailMan78
Big Member
Benetanegia said:
Dirt 2 on DX11, which means tessellation, where we know Nvidia is much, much faster. Dirt 2 has little tessellation, maybe not even worth mentioning performance-wise, but everything adds up.
And you know Dirt 2 was developed on ATI hardware, much like Far Cry 2 was developed on Nvidia hardware, so HOW do you figure this chart makes any sense? Especially since Nvidia doesn't even have a DX11 card out yet?!? That's a MAJOR flaw in your logic. Are these charts real? Could be. However, if they showed the new Nvidia cards in a negative light, I wonder if you would be so protective.
Posted on Reply
#9
Benetanegia
the54thvoid said:
Ben?

Found something over at Hexus.

http://www.hexus.net/content/item.php?item=23032

Those damn graphs we're talking about earlier are Nvidia's own in house benchies.

Nuff said.
First of all, sorry for the confusion with the language thing. You have to admit there are many language Nazis out there, but I can see that I overreacted. I was trying to make a satirical comment about something that happens a lot, but after reading it I see I failed at my intention.

Geforce GTX 480 TDP at 250W - http://www.fudzilla.com/content/view/18170/1/

That's why you can't believe any rumors, much less make any assumptions based on them. Which one is true now? 300 W? 250 W? You see?

TheMailMan78 said:
And you know Dirt 2 was developed on ATI hardware, much like Far Cry 2 was developed on Nvidia hardware, so HOW do you figure this chart makes any sense? Especially since Nvidia doesn't even have a DX11 card out yet?!? That's a MAJOR flaw in your logic. Are these charts real? Could be. However, if they showed the new Nvidia cards in a negative light, I wonder if you would be so protective.
One word: TESSELLATION.

Regarding the issue with the language, I admit I overreacted, but it has nothing to do with arrogance, not mine at least. If anything, it's the (probably unconscious) arrogance of the average English speaker who always finds the time to correct other people's grammar that made me write about that. That was not the case (he was not correcting), and I apologise to the54thvoid, but it's absolutely NOT YOUR issue. I'm starting to think you are in love with me anyway, because you step into my discussions at every opportunity you find and, well, I'm not used to so much attention (it's usually me who has to go after women). Sorry, and I will use my arrogance this time, but you are not my type.
Posted on Reply
#10
PCpraiser100
Will the world of computing ever know that a) there are still driver issues with ATI, and b) Nvidia's upcoming series will kill your wallet?

I'm being patient with the upcoming Catalyst, are you?
Posted on Reply
#11
EchoMan
PCpraiser100 said:
Will the world of computing ever know that a) there are still driver issues with ATI

I'm being patient with the upcoming Catalyst, are you?
Not to sound nasty but I've heard the same notion ever since ATI could be found on the map (9000 series)... Pretty sure it's known.
Posted on Reply
#12
jellyrole
Their drivers improve with every release. Give them props: even with shaky drivers, which are improving, the hardware delivers results strong enough to make up for them. It shows they're doing something right and improving elsewhere too.
Posted on Reply
#13
Zubasa
PCpraiser100 said:
Will the world of computing ever know that a) there are still driver issues with ATI, and b) Nvidia's upcoming series will kill your wallet?

I'm being patient with the upcoming Catalyst, are you?
You forgot c) Nvidia's drivers simply destroy your card :p
Posted on Reply
#14
TheMailMan78
Big Member
Benetanegia said:
One word: TESSELLATION.
And you base this off of synthetic benches?

I ask because you know ATI and Nvidia handle tessellation differently, and therefore games written around their respective hardware will perform differently. To say Nvidia does tessellation "better" is ignorant of the process.

I found this post on another forum. It may explain things better for you.
What people are calling the tessellator on Direct3D 11 compatible hardware is in fact 3 different things: 1. the Hull Shader, 2. the Tessellator, 3. the Domain Shader.

To achieve the effect of tessellated geometry you have to use the 3 stages in the pipeline. The hardware tessellator in the ATI card is only item number 2, which sits between 2 new programmable software shader stages. There is not enough info on the Nvidia card to be sure whether it really has a hardware tessellator, or whether that stage is also executed on the programmable cores in software, as was implied until recently by Charlie.

Seeing the drop in performance in those benchmark graphs when mesh tessellation gets upped is not incompatible with the idea that the tessellator is working according to spec on the ATI card, and I'll try to explain why.

The main objective of the tessellator is to avoid having to do heavy vertex interpolation on animated meshes at joints with the many bones needed for realistic animation. The same applies to vertex interpolation when morphing meshes with lots of weights in facial animation. Those are heavy calculations that would be multiplied many times over when using finer meshes with much finer detail. The growth is not linear: doubling the vertex count in the U and V directions of a square patch quadruples the number of vertices that need to be processed at later stages. This would become a bottleneck for several reasons: a) when meshes are far away, they don't need all that detail; b) most of the detail needed is on the silhouette of close-up objects. In the majority of triangles/quads facing the viewer, all those vertices generated by the tessellator (or by using finer meshes) would be "lost" as redundant and unnecessary, taxing the geometry processor at the following stage. So something had to be devised, and the tessellator is the solution needed to scale better into the future.
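The scaling argument above can be sketched in a few lines of Python (a toy model: the base vertex count and refinement levels are illustrative, not taken from any real mesh):

```python
# Refining a square patch by doubling the vertex count along both the
# U and V directions multiplies the total vertex count by 4 per step,
# so detail grows quadratically, not linearly.

def patch_vertices(base_per_side: int, refinement_level: int) -> int:
    """Total vertices in a square patch after `refinement_level` doublings."""
    per_side = base_per_side * (2 ** refinement_level)
    return per_side * per_side

for level in range(4):
    print(f"level {level}: {patch_vertices(4, level)} vertices")
```

Running it shows each refinement step quadrupling the vertex count, which is exactly why brute-force fine meshes overwhelm the later pipeline stages.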

Having said that, this does not necessarily mean that using the "tessellator" (all 3 stages of it) to tessellate coarser geometry in real time, producing finer detail where it's needed and visible, will be "free" from a performance point of view, even if the stage-2 tessellator (the stage the ATI chip has implemented in hardware, and the only fixed-function one of the 3) does its job for "free". The explanation lies in the fact that for it to be active, there are 2 extra programmable stages using the programmable cores of the chip: 1) to select where detail is or is not needed (the Hull Shader), and 2) to do, for instance, displacement mapping to add detail where it's really needed and visible (the Domain Shader).

When using tessellation there will be two additional programmable pipeline stages doing calculations, AND the fixed-function tessellator in between the two. So even if the middle one works "for free", with no performance penalty on the system, the other two (Hull and Domain Shaders), which form part of the tessellation system, do impact performance, because they compete for the global unified pool of programmable processing cores. That's not counting the additional bookkeeping or the managing of FIFO queues for the newly generated vertices. It needs to be balanced, but nevertheless the cost of using the tessellator is a lot less than sending a far more detailed mesh and having to animate all those mostly irrelevant vertices. This gives selective detail only when/where needed, and allows developers to ship the same mesh assets at various degrees of detail depending on the computing resources of each card, from low to high end. Even if using the tessellator meant a drop to 1/4 of the FPS, not using it and shipping a finer mesh would mean a drop to less than 1/16 of the FPS.
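The 1/4-versus-1/16 trade-off at the end of that paragraph is back-of-the-envelope arithmetic, and can be written out explicitly (all numbers here, including the 60 FPS baseline and both overhead factors, are illustrative, not measurements):

```python
# Toy model of the trade-off: adaptive tessellation pays hull/domain
# shader cost only where detail is visible; a uniformly fine mesh pays
# full vertex-animation cost everywhere.

BASE_FPS = 60.0  # hypothetical frame rate with the coarse mesh, no tessellation

def fps_with_tessellation(base_fps: float, overhead_factor: float = 4.0) -> float:
    """Hull + domain shaders competing for the unified cores (~1/4 FPS)."""
    return base_fps / overhead_factor

def fps_with_fine_mesh(base_fps: float, overhead_factor: float = 16.0) -> float:
    """Animating every vertex of a pre-refined mesh, visible or not (~1/16 FPS)."""
    return base_fps / overhead_factor

print(fps_with_tessellation(BASE_FPS))  # 15.0
print(fps_with_fine_mesh(BASE_FPS))     # 3.75
```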

From what I could understand of this presentation:
http://www.hardwarecanucks.com/forum...roscope-5.html
It seems that Nvidia's solution to tessellation means they are using a mostly software approach to the 3 stages. The "PolyMorph engine" mentions a "Tessellator", but from what I read it seems to be hardware to improve the vertex fetching abilities. They seem to have gone from one vertex fetch per clock on the G200b to 8 vertex fetches per clock, using that parallel vertex fetching mechanism. It will surely help with tessellation by keeping vertex fetches from becoming an immediate bottleneck on the system. And they will use that 8x speed-up in vertex fetching to do the intermediary non-programmable tessellation stage in software, which will use some of the cores to do the work that is done in hardware in ATI's implementation.

To summarize:
ATI: 1 software + 1 hardware + 1 software tessellation stages
Nvidia: 1 software + 1 software + 1 software tessellation stages

Nvidia compensates for the lack of a dedicated hardware tessellator by making the vertex fetch stage 8-wide in parallel compared to the previous generation, allowing 8 new vertices to be processed per clock.

ATI seems to do it sequentially, 1 per clock (might be wrong on this one), but does not have to allocate extra programmable cores for the fixed-function part, freeing those cores to process pixels or other parts of the programmable pipeline stages.
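The 8-versus-1 vertices-per-clock comparison can be turned into a toy throughput calculation (the rates are the post's claims, not verified hardware specs, and the vertex count is arbitrary):

```python
# Clocks needed to fetch a given number of vertices at a fixed
# fetches-per-clock rate. Compares the post's claimed 8-wide parallel
# front end (Fermi-style) against a serial 1-per-clock one.

def clocks_to_fetch(num_vertices: int, fetches_per_clock: int) -> int:
    # Ceiling division: the last clock may fetch fewer vertices.
    return -(-num_vertices // fetches_per_clock)

print(clocks_to_fetch(1_000_000, 8))  # 8-wide fetch
print(clocks_to_fetch(1_000_000, 1))  # serial fetch
```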

Only when Fermi is released will comparisons with real usage scenarios be possible. Only time will tell which is the best approach to the problem, and it will all depend on price/performance/wattage, as has been said here.

But with either solution, it would not make sense to expect constant performance levels (FPS) independent of the tessellation level, since the programmable part of tessellation will always be there to steal computing resources from the other processing stages (at least if one wants to do "intelligent" selective tessellation and not brute force).

In that sense, ATI Radeon tessellation is NOT broken in any way. Part of it is free, the other part is not. I guess that synthetic benchmarks like Unigine, which seem to indicate good performance on the Nvidia card, might be cranking up the tessellation load uniformly rather than selectively, i.e. with Domain and Hull Shaders off, like ATI Radeons used to work in previous iterations. ATI opted for selective tessellation, so a huge increase in vertex processing might not be needed. Nvidia is recommending heavier tessellation because of the parallel (8x) vertex processing it implemented, but that might come at the cost of less performance when doing heavy pixel computations or other work, because of the decrease in remaining computational cores.
http://www.semiaccurate.com/forums/showpost.php?p=22193&postcount=33
Posted on Reply
#15
HalfAHertz
When I first heard 300w, I remembered our first microwave was about 300w in power. One thing led to another and I made the following with this morning's coffee...
Note: the following material is not to be used as flame bait. It's just a joke peeps!

Not satisfied with your purchase? You expect more than just Dx11 and Physx?

Then call now and order your very own Easy-bake solution at 1-800-Half-A-Hertz

Our technology harnesses all that unused extra power of your new GPU and focuses it into two perfectly sized hot plates

You can heat up your coffee, cook a balanced breakfast or just warm up on those cold winter days.

Brought to you by your friendly Half-A-Hertz & Co. and powered by NVIDIA T.M.
Posted on Reply
#16
afw
:roll: :roll: :roll: ... that's really ingenious :roll: :roll: :roll: ...
Posted on Reply
#17
phanbuey
TheMailMan78 said:
And you base this off of synthetic benches?

I ask because you know ATI and Nvidia handle tessellation differently, and therefore games written around their respective hardware will perform differently. To say Nvidia does tessellation "better" is ignorant of the process.

I found this post on another forum. It may explain things better for you.



http://www.semiaccurate.com/forums/showpost.php?p=22193&postcount=33
Keep in mind you're talking to someone who used to make elaborate (and quite good) graphs comparing old and new Nvidia shader designs and why the GTX 480 was going to beat the 5970... The Green Force is strong with this one.

No offense B, you're a smart guy, but it seems ATi definitely peed in your coffee at some point in time.

Got 3 days left :rockout:
Posted on Reply
#18
Benetanegia
TheMailMan78 said:
And you base this off of synthetic benches?

I ask because you know ATI and Nvidia handle tessellation differently, and therefore games written around their respective hardware will perform differently. To say Nvidia does tessellation "better" is ignorant of the process.

I found this post on another forum. It may explain things better for you.



http://www.semiaccurate.com/forums/showpost.php?p=22193&postcount=33
Everything based on the assumption that Fermi has no dedicated tessellator... :laugh: It has 15. Nuff said.

phanbuey said:
Keep in mind you're talking to someone who used to make elaborate (and quite good) graphs comparing old and new Nvidia shader designs and why the GTX 480 was going to beat the 5970... The Green Force is strong with this one.

No offense B, you're a smart guy, but it seems ATi definitely pee'd in your coffee at some point in time.

Got 3 days left :rockout:
Yep, in three days we will find out, but I never said it would outright beat the 5970. I said that, based on the performance scaling of past cards, it could beat it. I specified a performance range that went from 90% of the HD 5970's performance to 150%, and I said I believed it would be closer to the lower end of that range (ROP bottleneck). Furthermore, those claims were based on the specs at the time, that is, 750 MHz core and 1700 MHz shaders. The memory has been severely cut back too, so although I didn't consider it a bottleneck in those old charts, now I do think it will affect performance, though just a bit. Without very heavy calculations (only 480 cores and core/shader clock adjustments), the new specs move that range to around 80-110% of the 5970's performance. My charts were based on now-old HD 5xxx performance, untouched by 6 months of driver improvement. Things have changed quite a bit, but that means that if Fermi ends up 15-30% faster, as some of those charts suggest, I was spot on. 3 days...
Posted on Reply
#19
Wshlist
yay

I love it when they fight, now let's see prices drop, and availability improve for cards like the HD5850.
Posted on Reply
#20
Altered
That's some funny shit HalfAHertz. Made me laugh for the first time today.
Posted on Reply
#21
TheMailMan78
Big Member
Benetanegia said:
Everything based on the assumption that Fermi has no dedicated tesselator... :laugh: It has 15. Nuff said.
Its not "Nuff said" in the least.
Posted on Reply
#22
evillman
Very funny discussion. Keep going. ROFL
Posted on Reply
#23
nt300
ATI's tessellators are quad pumped ;) how about Nvidia's? I don't think so :D
Posted on Reply
#24
nt300
Here's a slap in the face for Nvidia :laugh:
So according to this article, Nvidia's Fermi fails to impress. So much for ATI dropping prices :cry:
Nvidia bends the definition of honesty in GTX480 benches :eek:
Same old same old, but this time much hotter!
http://www.semiaccurate.com/2010/03/23/nvidia-bends-definition-honesty-gtx480-benches/

NVIDIA HAS SOME interesting numbers in its GTX480 presentations, but just like the Heaven benchmark numbers, they don't seem to reflect reality. This latest 'accident' in an official presentation centers around Dirt2.

When SemiAccurate put up the first set of GTX480 benchmark numbers a month or so ago, people laughed at our claims of a small margin over a 5870 while losing badly to a 5970. The fanbois were almost apoplectic over our claims of 70C idle temps and far hotter in gaming. People with fragile egos, especially those tacked to dreams of enthusiast hardware just hate having their bubbles burst.

Then came the Nvidia benchmark snippets. First was Heaven, or the parts of it, strangely only the parts where Nvidia did well. From there, bits and pieces trickled out, all cherry picked. One that was featured in official Nvidia presentations was Dirt2, the 'ATI centric' benchmark. For some odd reason, Nvidia's GTX480 won handily, it won by so much that something seemed very odd.

As it turns out, the numbers were indeed too good to be true. If you run the Dirt2 demo, the GTX470 and GTX480 drop back to DX9 mode, as you can see in the picture above. Since they are doing far less work, frame rates go up. Running the DX11 code path drops frame rates by around 25 to 40 percent for the same work. In the game itself, DX11 works just fine on the GTX480, so it is likely that the demo lacked the correct profile for the then unreleased and 6 months out GTX4x0s.

For some odd reason, that point wasn't mentioned in the Nvidia slides SemiAccurate saw. Sources deep inside Santa Clara have told SemiAccurate that this wasn't due to the TWIMTBP budget cuts, it is probably just the old 'Nvidia honesty' coming forward once again. For some reason, the real numbers that compare DX11 to DX11 versions didn't make their press presentations even though the game had been out for months by then. Funny that.

For those unwilling to take such things as hard facts into account in order to protect their egos, we submit the above screenshot of the Dirt2 demo, running on a GTX480. The card is running at the official 1848 MHz (3696MHz effective) memory clock and the shaders are at the stock 1401MHz 'hot clock'.

Please note, in this case 'hot' is more literal than figurative, the GPU here is running at 87C, far hotter than any card that expects to have a realistic life span should be at. Unconfirmed reports from China have the card hitting 98C on furmark. Don't expect these 'puppies' to have a long life, even in dog years.

In the end, the card is too hot, too slow, and unmanufacturable. We told you so. Pop goes the ego. S|A
Posted on Reply
#25
Benetanegia
nt300 said:
ATIs tesselators are quad pumped ;) how about Nvidia? I dont think so :D
Quad pumped yes, but 1 op/cycle anyway, because it needs 4 clocks to finish the work. It's the same for Nvidia's (1 op/cycle) AFAIK, and even if it weren't, Fermi still has 16 of them. Even if those aren't quad pumped, that's still 4 times the tessellation capability.

Tessellators themselves are not the problem anyway. Triangle setup is far more important. You could have a tessellator hexa or hecta pumped (or 100 tessellators), I don't care, but it wouldn't make a difference if you can only do one tri/cycle, as is the case with Evergreen. Nvidia didn't put in 16 tessellators because that many are needed; they are there to increase availability and reduce latency by giving each SM its own tessellator to operate with. Tessellation performance will still be limited primarily by tri setup and secondly by hull/domain shaders (both of which Fermi seems to handle better thanks to its L1/L2 caches), but the HD5870/HD5770 demonstrate that Evergreen is heavily limited by tri setup/tessellator even at the low tessellation levels found in Stalker, Dirt 2, etc., and not by shader performance: when tessellation is enabled, the HD5770 loses much less than its bigger brethren.
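The bottleneck argument above boils down to a min() over pipeline stage rates, and can be sketched as follows (the rates are illustrative figures chosen to mirror the 1 tri/cycle claim, not measured hardware numbers):

```python
# End-to-end triangle throughput is capped by the slowest stage, so
# adding tessellators beyond the triangle-setup rate buys nothing.

def triangles_per_clock(num_tessellators: int,
                        tris_per_tessellator: float,
                        setup_tris_per_clock: float) -> float:
    """Sustained tris/clock: min of tessellation rate and setup rate."""
    tessellation_rate = num_tessellators * tris_per_tessellator
    return min(tessellation_rate, setup_tris_per_clock)

# One setup unit at 1 tri/clock caps the pipeline no matter how many
# tessellators feed it:
print(triangles_per_clock(1, 1.0, 1.0))   # 1.0
print(triangles_per_clock(16, 1.0, 1.0))  # 1.0
# A wider setup stage lets the extra tessellators pay off:
print(triangles_per_clock(16, 1.0, 4.0))  # 4.0
```

Which is the poster's point: 16 tessellators mainly help availability and latency per SM, while sustained throughput hinges on the setup rate.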
Posted on Reply