
NVIDIA Could Ready HD 4670 Competitor

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
34,328 (9.23/day)
Likes
17,426
Location
Hyderabad, India
System Name Long shelf-life potato
Processor Intel Core i7-4770K
Motherboard ASUS Z97-A
Cooling Xigmatek Aegir CPU Cooler
Memory 16GB Kingston HyperX Beast DDR3-1866
Video Card(s) 2x GeForce GTX 970 SLI
Storage ADATA SU800 512GB
Display(s) Samsung U28D590D 28-inch 4K
Case Cooler Master CM690 Window
Audio Device(s) Creative Sound Blaster Recon3D PCIe
Power Supply Corsair HX850W
Mouse Razer Abyssus 2014
Keyboard Microsoft Sidewinder X4
Software Windows 10 Pro Creators Update
#1
GPU Café published information on future competition lineups, which shows the entry of a "GeForce 9550 GT" stacked up against the Radeon HD 4670. Sources in the media have pointed to the possibility that ATI's RV730-based HD 4670 outperforms the NVIDIA cards in the segment where the GeForce 9500 GT currently sits. The HD 4650 could trade blows with the GeForce 9500 GT at equal or better levels of performance, while the HD 4670 surpasses it.

 
Joined
Apr 12, 2006
Messages
614 (0.14/day)
Likes
76
Location
New Sewickley, PA
System Name six pack
Processor Phenom II X6 1090T BE
Motherboard ASRock 890FX Deluxe3
Cooling Cool-it ECO ALC Water cooler
Memory 4GB G-Skill Flares CL7
Video Card(s) 1GB HIS HD4890 930core / 1025mem 4100mhz effective
Storage 320GB Seagate Barracuda (w/perpendicular recording) and 750GB Western Digital Caviar Black
Display(s) Acer B273HU 2048 x 1152 native resolution
Case Xoxide X-Sentric Professional Series w/250mm Fan
Audio Device(s) SB Audigy 2 Value
Power Supply 750 Watt Apevia Warlock
Software Windows Vista Home Premium 64bit (w/dreamscene patch)/Windows 7 Pro 64bit
Benchmark Scores 3DMark06:10,516 3Dmark Vantage: P8537
#2
$20 less for a slightly less powerful GPU. Honestly, I think that if you're going to spend over $100 on a GPU to begin with, the extra $20 for the 9600GT is nothing. I don't think there is much of a market for so many different versions of today's cards. There should only be a mainstream, performance, and enthusiast-class product for each new generation of cards. Any more products and it's just going to confuse the consumer.
 
Joined
Oct 2, 2007
Messages
156 (0.04/day)
Likes
9
System Name WeaponX2
Processor AMD64 7750 KUMA
Motherboard M2N-SLI DELUXE
Cooling Arctic Cooling 64
Memory 2x 1GB 4-4-4-10 Corsair
Video Card(s) Asus 4870 @ 870/1150
Storage 2x WD 640GB
Display(s) 19" Samsung Syncmaster
Case Black Widow
Audio Device(s) onboard
Power Supply 550w Rosewill
Software XP PRO SP2
#3
$20 less for a slightly less powerful GPU. Honestly, I think that if you're going to spend over $100 on a GPU to begin with, the extra $20 for the 9600GT is nothing. I don't think there is much of a market for so many different versions of today's cards. There should only be a mainstream, performance, and enthusiast-class product for each new generation of cards. Any more products and it's just going to confuse the consumer.
True, but if it stays the way it is, it's $40 from the 9500GT to the 9600GT.
 

newtekie1

Semi-Retired Folder
Joined
Nov 22, 2005
Messages
24,275 (5.51/day)
Likes
10,363
Location
Indiana, USA
Processor Intel Core i7 4790K@4.6GHz
Motherboard AsRock Z97 Extreme6
Cooling Corsair H100i
Memory 32GB Corsair DDR3-1866 9-10-9-27
Video Card(s) ASUS GTX960 STRIX @ 1500/1900
Storage 480GB Crucial MX200 + 2TB Seagate Solid State Hybrid Drive with 128GB OCZ Synapse SSD Cache
Display(s) QNIX QX2710 1440p@120Hz
Case Corsair 650D Black
Audio Device(s) Onboard is good enough for me
Power Supply Corsair HX850
Software Windows 10 Pro x64
#4
I'm guessing the 9550GT is going to be the 55nm part with slightly increased clocks. The 9500GT should be able to handle the 4650, and the pre-overclocked 9500GTs should be able to handle the 4670.

I think nVidia is just going overboard by adding the 9550GT. They should have just left the 9500GT at $20 cheaper and let their partners pre-overclock the cards to make up the difference in performance and price.
 

candle_86

New Member
Joined
Dec 28, 2006
Messages
3,914 (0.98/day)
Likes
227
#5
The 9600GSO is an even $100 these days; you're stupid to get one of these with the 9600GSO out there.
 

newtekie1

Semi-Retired Folder
#6
The 9600GSO is an even $100 these days; you're stupid to get one of these with the 9600GSO out there.
You have to kind of ignore the prices in the picture; the 9500GT doesn't retail for $109, so the 9550GT won't be $129. I wonder if these prices are even in USD? A 9500GT is more like $60 for the DDR2 version and $80 ($75 with a mail-in rebate) for the DDR3 version.

I'm in agreement with you, though: the 9600GSO can be had for $90 with free shipping right now from Newegg. So, IMO, these lower-class cards aren't worth saving the $10-20. The 9600GSO is even cheaper if you consider rebates; it can be had for $80.
 

candle_86

New Member
#7
Agreed. I'd love to know when we started getting two low-end series of cards. The 9500GT should be set at a lower MSRP, and the 9400GT abandoned.
 

newtekie1

Semi-Retired Folder
#8
There have always been four basic levels: the extreme low end, low end, mid-range, and high end.

With the 8 Series:
8400, 8500, 8600, 8800
With the 7 series:
7100/7200, 7300, 7600, 7800/7900
With the 6 series:
6200-TC, 6200, 6600, 6800
With the 5 series:
5200, 5500, 5600/5700, 5800/5900

Though, in today's market, I don't see a place for the extreme low end anymore.
 

candle_86

New Member
#9
There have always been four basic levels: the extreme low end, low end, mid-range, and high end.

With the 8 Series:
8400, 8500, 8600, 8800
With the 7 series:
7100/7200, 7300, 7600, 7800/7900
With the 6 series:
6200-TC, 6200, 6600, 6800
With the 5 series:
5200, 5500, 5600/5700, 5800/5900

Though, in today's market, I don't see a place for the extreme low end anymore.

Yes and no, actually.

For starters, the 5500 came out after the 5200, to replace the 5200 Ultra, which was more expensive to produce.

The 6200TC is the same low-end generation as the normal 6200; the normal PCIe 6200 was just so they had something there, and the 6200TC replaced it.

The 7100/7200 line, granted, was lower end, though the 7100GS was faster than the 7200GS, and the 7200GS was just to get rid of NV44 cores. But that's where it started.

Personally, I want it simple again:

GeForce MX for low end.

GeForce TI for high end.

GeForce MX 220, GeForce MX 240, GeForce MX 260

GeForce TI 220, GeForce TI 240, GeForce TI 260

That would simplify life enough for me.
 
Joined
Mar 1, 2008
Messages
242 (0.07/day)
Likes
46
Location
Antwerp, Belgium
Processor Intel Xeon X5650 @ 3.6Ghz - 1.2v
Motherboard Gigabyte G1.Assassin
Cooling Thermalright True Spirit 120
Memory 12GB DDR3 @ PC1600
Video Card(s) nVidia GeForce GTX 780 3GB
Storage 256GB Samsung 840 Pro + 3TB + 3TB + 2TB
Display(s) HP ZR22w
Case Antec P280
Audio Device(s) Asus Xonar DS
Power Supply Antec HCG-620M
Software Windows 7 x64
#10
I'm guessing the 9550GT is going to be the 55nm part with slightly increased clocks. The 9500GT should be able to handle the 4650, and the pre-overclocked 9500GTs should be able to handle the 4670.

I think nVidia is just going overboard by adding the 9550GT. They should have just left the 9500GT at $20 cheaper and let their partners pre-overclock the cards to make up the difference in performance and price.
I really don't think so.
The 9500 is basically a higher-clocked 8600.
The HD4600 is basically an HD3870 with a 128-bit bus (plus a faster AA unit).
Considering that these cards are mainly used by people with 19" monitors (~1280x1024), the low memory bandwidth won't be a major factor.
For reference, an HD3850 is around 2x faster than an 8600GTS @ 1280x1024.
The HD4670 will have 480GFlops (peak) and the 9500GT has around 132GFlops (peak, depending on the model). You can't close that gap with an overclock.
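Those peak numbers follow straight from shader count x flops per clock x clock speed. A minimal sketch of that arithmetic in Python; the unit counts and clocks are assumptions consistent with figures quoted elsewhere in this thread, not confirmed specs:

```python
# Peak single-precision GFLOPS = shaders * flops_per_clock * clock (MHz) / 1000.
# Unit counts and clocks are assumptions consistent with figures quoted in
# this thread, not confirmed specifications.

def peak_gflops(shaders: int, flops_per_clock: int, clock_mhz: float) -> float:
    return shaders * flops_per_clock * clock_mhz / 1000.0

# HD 4670: 320 stream processors, 1 MADD (2 flops) per SP, ~750 MHz core clock
print(peak_gflops(320, 2, 750))   # 480.0 -> the "480 GFlops (peak)" figure

# 9500 GT: 32 scalar shaders, MADD+MUL (3 flops) per SP, ~1375 MHz shader clock
print(peak_gflops(32, 3, 1375))   # 132.0 -> "around 132 GFlops", model-dependent
```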
 

newtekie1

Semi-Retired Folder
#11
I really don't think so.
The 9500 is basically a higher-clocked 8600.
The HD4600 is basically an HD3870 with a 128-bit bus (plus a faster AA unit).
Considering that these cards are mainly used by people with 19" monitors (~1280x1024), the low memory bandwidth won't be a major factor.
For reference, an HD3850 is around 2x faster than an 8600GTS @ 1280x1024.
The HD4670 will have 480GFlops (peak) and the 9500GT has around 132GFlops (peak, depending on the model). You can't close that gap with an overclock.
You are wrong; the 128-bit bus imposes a huge performance hit.

The HD4670 is just an overclocked HD4650. All the information we have seen says the 9500GT matches the HD4650, so an overclocked 9500GT should be able to match an HD4670.

And the FLOPS rating of either card doesn't matter one bit; it has no real effect on graphical performance. If it did, we wouldn't see the 9600GT, rated at 208 GFLOPS, outperforming the HD3870, rated at 496 GFLOPS.
 
Joined
Feb 18, 2006
Messages
5,100 (1.18/day)
Likes
1,255
Location
AZ
System Name Thought I'd be done with this by now
Processor i7 4790K 4.4GHZ turbo currently at 4.6GHZ at 1.16v
Motherboard MSI Z97-G55 SLI
Cooling Scythe Mugen 2 rev B (SCMG-2100), stock on gpu's.
Memory 8GB G.SKILL Ripjaws Z Series DDR3 2400MHZ 10-12-12-31
Video Card(s) EVGA GTX 760 Superclocked replaced HIS R9 290 that was artifacting
Storage 1TB MX300 M.2 OS + Games, 4x ST31000524NS in Raid 10 Storage and Backup, external 2tb backup,
Display(s) BenQ GW2255 surprisingly good screen for the price.
Case Raidmax Scorpio 668
Audio Device(s) onboard HD
Power Supply EVGA 750 GQ
Software Windows 10
Benchmark Scores no one cares anymore lols
#12
You are wrong; the 128-bit bus imposes a huge performance hit.

The HD4670 is just an overclocked HD4650. All the information we have seen says the 9500GT matches the HD4650, so an overclocked 9500GT should be able to match an HD4670.

And the FLOPS rating of either card doesn't matter one bit; it has no real effect on graphical performance. If it did, we wouldn't see the 9600GT, rated at 208 GFLOPS, outperforming the HD3870, rated at 496 GFLOPS.
Yeah, and I thought the 4670 has 12 ROPs, which would further lower performance. Or am I wrong on that?
 

WarEagleAU

Bird of Prey
Joined
Jul 9, 2006
Messages
10,809 (2.59/day)
Likes
529
Location
Gurley, AL
System Name Boddha Getta Boddha Getta Bah!
Processor AMD FX 6100 @ 4.432Ghz @1.382
Motherboard ASUS M5A99X EVO AMD 990X AMD SB950
Cooling Custom Water. EK 240MM Kit, Supreme HSF - Runs 35C
Memory 2 x 4GB Corsair Vengeance White LP @ 1.35V
Video Card(s) XFX Radeon HD 6870 980/1100
Storage WD Caviar Black 1.0TB, WD Caviar Green 1.0TB, WD 160GB
Display(s) Asus VH222/S 22: (21.5" Viewable) 1920x1080p HDMI LCD Monitor
Case NZXT White Switch 810
Audio Device(s) Onboard Realtek 5.1
Power Supply NZXT Hale 90 Gold Cert 750W Modular PSU
Software Windows 8.1 Professional 64-bit
#13
I don't think you are wrong. I also don't think the 9500GT is even on par with the HD 4650. But good luck to me trying to convince you of that.
 

candle_86

New Member
#14
320x16x8 is the RV730 core config: that's 8 ROPs and 16 TMUs, with 64x5 ALUs, of which only one set will really be used in games, of course.

So 64 x 750 = 48,000.

Now the 9500GT: a 32x16x8 core config. The shader count is 32 and only 32, but all of those will be used, unlike the extra ALUs on the 4650.

So 32 x 1400 = 44,800.

The numbers are actually fairly close on shader ops per second for most games.

So you tell me: can the 9500GT keep up?

What ATI needs is a 9600GT killer; the RV670 is supposed to stop production soon, leaving nothing to compete.
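A quick sanity check on that arithmetic, as a minimal sketch in Python; the "usable shader" counts are this post's own assumption, not a verified spec:

```python
# Shader ops per second (in millions) = usable shader units * clock (MHz).
# The "usable" unit counts are the post's assumption, not a verified spec.

hd4650_usable_alus, hd4650_core_mhz = 64, 750      # RV730: 64 of 320 ALUs assumed usable
gf9500gt_shaders, gf9500gt_shader_mhz = 32, 1400   # G96: all 32 scalar shaders usable

print(hd4650_usable_alus * hd4650_core_mhz)        # 48000
print(gf9500gt_shaders * gf9500gt_shader_mhz)      # 44800
```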
 
#15
320x16x8 is the RV730 core config: that's 8 ROPs and 16 TMUs, with 64x5 ALUs, of which only one set will really be used in games, of course.

So 64 x 750 = 48,000.

Now the 9500GT: a 32x16x8 core config. The shader count is 32 and only 32, but all of those will be used, unlike the extra ALUs on the 4650.

So 32 x 1400 = 44,800.

The numbers are actually fairly close on shader ops per second for most games.

So you tell me: can the 9500GT keep up?

What ATI needs is a 9600GT killer; the RV670 is supposed to stop production soon, leaving nothing to compete.
What you write here is pure nonsense. You need to study this matter more before making such a statement.
ATI and nVidia use a runtime compiler, and this compiler tries to make the best use of the shaders available. I don't think there is any situation where the compiler is that inefficient.

@newtekie1 about the 9600GT vs HD3870:
While GFlops are NOT the only factor that makes a chip perform a certain way, they are for sure very important. It's just crazy to say they don't matter one bit.
Using only the 9600GT vs HD3870 as a reference and drawing a conclusion from that is wrong.
The problem lies in the fact that the HD3870 has very high shader power while its other units are not that powerful. That's why you get a skewed view when using 'older' games.
To give an example, check out these numbers from Crysis at the 'Very High' setting (extremely shader heavy):
8600GTS - 4.3
HD3650 - 6.4
9600GT - 14.9
HD3870 - 16.1
9800GTX - 21.9

You can immediately see that the HD3870 is faster than the 9600GT, but even more important is the fact that the 9800GTX is 47% faster. GFlops don't matter? They matter now, and even more in the future.
 

newtekie1

Semi-Retired Folder
#16
What you write here is pure nonsense. You need to study this matter more before making such a statement.
ATI and nVidia use a runtime compiler, and this compiler tries to make the best use of the shaders available. I don't think there is any situation where the compiler is that inefficient.

@newtekie1 about the 9600GT vs HD3870:
While GFlops are NOT the only factor that makes a chip perform a certain way, they are for sure very important. It's just crazy to say they don't matter one bit.
Using only the 9600GT vs HD3870 as a reference and drawing a conclusion from that is wrong.
The problem lies in the fact that the HD3870 has very high shader power while its other units are not that powerful. That's why you get a skewed view when using 'older' games.
To give an example, check out these numbers from Crysis at the 'Very High' setting (extremely shader heavy):
8600GTS - 4.3
HD3650 - 6.4
9600GT - 14.9
HD3870 - 16.1
9800GTX - 21.9

You can immediately see that the HD3870 is faster than the 9600GT, but even more important is the fact that the 9800GTX is 47% faster. GFlops don't matter? They matter now, and even more in the future.
Using your numbers, and assuming GFLOPS matter, why doesn't the HD3870 outperform the 9800GTX? Though I don't know where you are even getting your numbers.

http://www.techpowerup.com/reviews/Galaxy/GeForce_9500_GT_Overclocked/9.html

The HD3650 doesn't outperform the 8600GTS in Crysis despite the nearly 100 GFLOP advantage the HD3650 has. Face it: GFLOPS can't be used to determine gaming performance.

I don't think you are wrong. I also don't think the 9500GT is even on par with the HD 4650. But good luck to me trying to convince you of that.
We will have to wait until the HD4650 is released and see. However, judging by the performance of the HD3650, which is about 60% of the 9500GT, and the fact that the HD4650 appears to be the HD3650 with everything on the core doubled, I think the two will be very close in the end.
 
#17
http://www.computerbase.de/artikel/...st_ati_radeon_hd_4870_x2/20/#abschnitt_crysis

Well, I also should have mentioned that since ATI and nVidia use completely different architectures, it's hard to compare their GFlops. But within one brand it's easy to see that GFlops do matter, and that's why I was pointing to the 9600GT vs 9800GTX comparison.

Your TechPowerUp review of Crysis has one flaw:
We tested the DX9 version with graphics set to "High", which is the highest non-DX10 setting in the game.

ComputerBase uses DX10 and 'Very High'. That setting is much more shader demanding!
BTW, in this same review the 9500GT scores 7.0fps, and that's only 9% more than an HD3650.

If you go here:
http://www.computerbase.de/artikel/...on_hd_4870_x2/23/#abschnitt_performancerating => these are the results of all games combined.
Here you see that the 9500GT scores only 19% more on average than an HD3650.

Enough talking; let's just wait a month.
 
Joined
Oct 5, 2007
Messages
1,714 (0.46/day)
Likes
182
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
#18
MrMilli, your point has a big flaw: who cares how these cards will perform in future games? As you have said, in the future the HD card could perform better because the demand for shader power will be higher. Again, who cares? It's not even able to use the higher settings of the most demanding games today, let alone future ones...

If the card has some shader power left over now (assuming that is true, which I don't think), then the card is bottlenecked by its other parts. That will not change in the future, and it only means that while the 9500GT drops from the 10fps it renders today down to 5fps, the HD card will maintain a framerate close to that 10. Woohoo! Big deal. The same happens with the X1000 family: now they are like 50%+ faster than their GF7 counterparts, but always at higher settings and thus at unplayable framerates.

I have said this hundreds of times: ever since the X1000 series, ATI seems more concerned about how its cards might perform in the future than about making the best card it can for the present.
 

candle_86

New Member
#19
What you write here is pure nonsense. You need to study this matter more before making such a statement.
ATI and nVidia use a runtime compiler, and this compiler tries to make the best use of the shaders available. I don't think there is any situation where the compiler is that inefficient.

@newtekie1 about the 9600GT vs HD3870:
While GFlops are NOT the only factor that makes a chip perform a certain way, they are for sure very important. It's just crazy to say they don't matter one bit.
Using only the 9600GT vs HD3870 as a reference and drawing a conclusion from that is wrong.
The problem lies in the fact that the HD3870 has very high shader power while its other units are not that powerful. That's why you get a skewed view when using 'older' games.
To give an example, check out these numbers from Crysis at the 'Very High' setting (extremely shader heavy):
8600GTS - 4.3
HD3650 - 6.4
9600GT - 14.9
HD3870 - 16.1
9800GTX - 21.9

You can immediately see that the HD3870 is faster than the 9600GT, but even more important is the fact that the 9800GTX is 47% faster. GFlops don't matter? They matter now, and even more in the future.
Then explain one thing to me, please: why do the R700 and RV770 perform more like a 64sp card than a 320sp card? The reason is that only one of those ALUs is a complex shader: 64 of them are simple, and the rest aren't even related to shader work, actually. And very few games use simple shaders, because it's harder to program for two types of shaders than just one.
 
#20
Then explain one thing to me, please: why do the R700 and RV770 perform more like a 64sp card than a 320sp card? The reason is that only one of those ALUs is a complex shader: 64 of them are simple, and the rest aren't even related to shader work, actually. And very few games use simple shaders, because it's harder to program for two types of shaders than just one.
The R600 uses 64 superscalar unified shader clusters, each consisting of 5 stream processing units, for a total of 320 stream processing units. Each of the first 4 stream processing units is able to retire a finished single-precision floating-point MAD (or ADD or MUL) instruction per clock, plus dot products (dp, special-cased by combining ALUs) and integer ADDs. The fifth unit is more complex and can additionally handle special transcendental functions such as sine and cosine. Each of the 64 shader clusters can execute 6 instructions per clock cycle (peak), consisting of 5 shading instructions plus 1 branch.
The claimed theoretical processing power in FLOPS for the 8 Series cards may not be correct at all times. For example, the GeForce 8800 GTX has 518.4 GigaFLOPs of theoretical performance, given that there are 128 stream processors at 1.35 GHz, with each SP able to run 1 multiply-add and 1 multiply instruction per clock [(MADD (2 FLOPs) + MUL (1 FLOP)) x 1350 MHz x 128 SPs = 518.4 GigaFLOPs]. This figure may not be correct, because the multiply operation is not always available, giving a possibly more accurate performance figure of (2 x 1350 x 128) = 345.6 GigaFLOPs.
So ... just to recap for you:
ATI: all 5 units can do a MADD (or ADD or MUL).
The 5th (complex) unit is a special unit: it can also do transcendentals like SIN, COS, LOG and EXP. That's it.
1 MADD (= multiply-add) = 2 Flops
1 ADD or MUL = 1 Flop
And these are all usable. The developer doesn't need to program for this; the compiler takes care of it. A real-life scenario with some bad code could be something like 2 MADDs + 1 MUL. If we average this over the 64 units, that would give 240GFlops.

nVidia: basically, each scalar unit can do 2 Flops per clock. That would result in a real-life performance of around 90GFlops.

So on shader performance, ATI will win hands down.

Considering how close the HD4870 performs to the GTX 280, and how much more texel fillrate and bandwidth the GTX has, it seems to me that shader performance is darn important these days.
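A minimal sketch in Python of the "real life" estimate described above: 2 MADDs + 1 MUL = 5 flops per 5-wide ATI unit per clock, and 2 flops per nVidia scalar unit per clock. The clock speeds are assumptions chosen to reproduce the figures quoted in this thread:

```python
# "Real life" GFLOPS estimates per the model described above; clock speeds
# are assumptions chosen to reproduce the figures quoted in this thread.

def ati_real_gflops(units_5wide: int, core_mhz: float) -> float:
    # 2 MADDs (2 flops each) + 1 MUL (1 flop) = 5 flops per 5-wide unit per clock
    return units_5wide * 5 * core_mhz / 1000.0

def nv_real_gflops(scalar_units: int, shader_mhz: float) -> float:
    # one MADD = 2 flops per scalar unit per clock
    return scalar_units * 2 * shader_mhz / 1000.0

print(ati_real_gflops(64, 775))   # 248.0 -- HD3870
print(ati_real_gflops(64, 750))   # 240.0 -- HD4670
print(nv_real_gflops(32, 1400))   # 89.6  -- 9500GT ("around 90GFlops")
```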
 

Frick

Fishfaced Nincompoop
Joined
Feb 27, 2006
Messages
14,878 (3.45/day)
Likes
5,411
System Name A dancer in your disco of fire
Processor i3 4130 3.4Ghz
Motherboard MSI B85M-E45
Cooling Cooler Master Hyper 212 Evo
Memory 4 x 4GB Crucial Ballistix Sport 1400Mhz
Video Card(s) Asus GTX 760 DCU2OC 2GB
Storage Crucial BX100 120GB | WD Blue 1TB x 2
Display(s) BenQ GL2450HT
Case AeroCool DS Cube White
Power Supply Cooler Master G550M
Mouse Intellimouse Explorer 3.0
Keyboard Dell SK-3205
Software Windows 10 Pro
#21
Though, in today's market, I don't see a place for the extreme low end anymore.
Until they make good integrated graphics (the 780G/HD3200 is kinda nice, though...) there will always be a need for the extreme low end, IMO. Like the HD3450: it's not really a bad card for an HTPC, and it's cheaper than a gaming mouse.
 
#22
So ... just to recap for you:
ATI: all 5 units can do a MADD (or ADD or MUL).
The 5th (complex) unit is a special unit: it can also do transcendentals like SIN, COS, LOG and EXP. That's it.
1 MADD (= multiply-add) = 2 Flops
1 ADD or MUL = 1 Flop
And these are all usable. The developer doesn't need to program for this; the compiler takes care of it. A real-life scenario with some bad code could be something like 2 MADDs + 1 MUL. If we average this over the 64 units, that would give 240GFlops.

nVidia: basically, each scalar unit can do 2 Flops per clock. That would result in a real-life performance of around 90GFlops.

So on shader performance, ATI will win hands down.

Considering how close the HD4870 performs to the GTX 280, and how much more texel fillrate and bandwidth the GTX has, it seems to me that shader performance is darn important these days.
Some inaccuracies and misinformation there:

- Theoretically, both ATI and Nvidia shaders can do MADD+MUL. What you quoted above was about the G80; it has long been fixed in later releases. Assuming ATI can do both at a time while Nvidia can't is stupid, considering ATI doesn't outperform Nvidia by that much even in shader-specific benchmarks...
- You so conveniently forgot that Nvidia's shaders run at double the speed when calculating the "real life" performance...
- R600 and R700 are SIMD across each cluster and VLIW within each shader. This means that the instructions for all 5 units in a shader have to be issued together at compile time (Very Long Instruction Word), and that all 80 shaders in each cluster (R600 = 80x4, R700 = 80x10) must calculate the same instruction. By contrast, Nvidia's shaders are scalar and also organized in SIMD arrays, but only 16 or 24 wide (G80/9x and GT200 respectively).

This has two effects:

1. VLIW means that even if the shaders (5 ALUs) look superscalar to the programmer, or to the drivers in this case, each shader IS a vector unit.
2. SIMD over such large arrays means that if a state change occurs, you have to calculate it in a different cluster, potentially losing a complete cluster or even the entire chip in that clock.

That's why ATI is comparable to Nvidia when it comes to "real life" shader power.
 

btarunr

Editor & Senior Moderator
Staff member
#24
Sorry for the inaccuracy. Fixed.
 
#25
@Darkmatter

- Only the GT200 can dual-issue MADD and MUL ops all the time. G8x/G9x-generation chips can't do it all the time; there are only a select few scenarios where you can dual-issue MAD and MUL ops.

- I didn't: 1375 MHz x 2 Flops x 32 shaders = 88 GFlops.

- You are wrong about it being SIMD. ATI's shader involves a MIMD 5-way vector unit, MIMD signifying (contrary to SIMD) that several different instructions can be processed in parallel. The compiler is going to try to assemble simple operations in order to fill the MIMD 5D unit, but these 5 instructions cannot be dependent on each other. So even one shader can process different instructions at a time, let alone one cluster!
I allowed for only 3 instructions per shader being done on average in my real-life calculation, because of less-than-optimal code and inefficiencies.
So basically, your conclusion is wrong!

Using my real-life calculation (it's just a simulation):
9800GTX - 432 GFlops
HD3870 - 248
HD4670 - 240
9600GT - 208
9500GT - 88
HD3650 - 87

If you check out the Crysis scores I posted previously, things start to make sense.
Now, I know the HD4670 won't beat the 9600GT in Crysis because of many factors, but what ATI has done is basically slap the HD3870's shader engine into it, plus add the RV700-generation architectural improvements.
nVidia, on the contrary, has made a die shrink of the G84 and clocked it higher.

(Please read my previous posts before you reply.)
(pls read my previous posts before you reply)