Friday, August 29 2008
GPU Café has published information on upcoming competing lineups, which shows a "GeForce 9550 GT" stacked up against the Radeon HD 4670. Media sources have pointed to the possibility that ATI's RV730-based HD 4670 outperforms the NVIDIA cards in the segments where the GeForce 9500 GT currently sits. The HD 4650 could trade blows with the GeForce 9500 GT, offering equal or better performance, while the HD 4670 surpasses it.

The arrival of a GeForce 9550 GT suggests that the 9500 GT cannot compete with the HD 4650. The chart shows a new price point of ~$129. Beyond listing prices, it suggests that the HD 4650's lead over the 9500 GT is large enough that ATI could comfortably ask $20 more than the 9500 GT within this range. GPU Café reports that the 9550 GT would be a cut-down (and die-shrunk) G94, namely the 55 nm G94b, featuring 64 shader processors and a 192-bit memory bus (and presumably memory configurations such as 384 MB or 768 MB of GDDR3).



Source: GPU Café
posted by btarunr - 12:00 AM

User comments
by Apocolypse007 (August 29th - 5:52 PM)
$20 less for a slightly less powerful GPU. Honestly, I think that if you are going to spend over $100 on a GPU to begin with, the extra $20 for the 9600GT is nothing. I don't think there is much of a market for so many different versions of today's cards. There should only be a mainstream, a performance, and an enthusiast-class product in each new generation. Any more products and it's just going to confuse the consumer.
by Black Light (August 29th - 5:57 PM)
In reply to Apocolypse007:
True, but if it stays the way it is, it's a $40 jump from the 9500GT to the 9600GT.
by newtekie1 (August 29th - 6:02 PM)
I'm guessing the 9550GT is going to be the 55 nm part with slightly higher clocks. The 9500GT should be able to handle the 4650, and pre-overclocked 9500GTs should be able to handle the 4670.

I think nVidia is just going overboard by adding the 9550GT. They should have just dropped the 9500GT's price by $20 and let their partners pre-overclock the cards to make up the difference in performance and price.
by candle_86 (August 29th - 6:04 PM)
The 9600GSO is an even $100 these days; you'd be stupid to get one of these with the 9600GSO out there.
by newtekie1 (August 29th - 6:10 PM)
In reply to candle_86:
You kind of have to ignore the prices in the picture: the 9500GT doesn't retail for $109, so the 9550GT won't be $129. I wonder if these prices are even in USD? A 9500GT is more like $60 for the DDR2 version and $80 ($75 after mail-in rebate) for the DDR3 version.

I agree with you, though: the 9600GSO can be had for $90 with free shipping right now from Newegg. So, IMO, these lower-class cards aren't worth it just to save $10-20. The 9600GSO is even cheaper if you consider rebates; they can be had for $80.
by candle_86 (August 29th - 6:13 PM)
Agreed. I'd love to know when we started getting two low-end series of cards. The 9500GT should get a lower MSRP, and the 9400GT should be abandoned.
by newtekie1 (August 29th - 6:23 PM)
There have always been four basic levels: the extreme low end, low end, mid-range, and high end.

With the 8 Series:
8400, 8500, 8600, 8800
With the 7 series:
7100/7200, 7300, 7600, 7800/7900
With the 6 series:
6200-TC, 6200, 6600, 6800
With the 5 series:
5200, 5500, 5600/5700, 5800/5900

Though, in today's market, I don't see a place for the extreme low end anymore.
by candle_86 (August 29th - 6:43 PM)
In reply to newtekie1:
Yes and no, actually.

For starters, the 5500 came out after the 5200 to replace the 5200 Ultra, which was more expensive to produce.

The 6200TC is the same low-end generation as the normal 6200; the normal PCIe 6200 was just there so they had something in that segment, and the 6200TC replaced it.

The 7100/7200 line, granted, was lower end, though the 7100GS was faster than the 7200GS, and the 7200GS was just a way to get rid of NV44 cores. That's where it started, though.

Personally, I want it simple again.

GeForce MX for the low end

GeForce Ti for the high end.

GeForce MX 220, GeForce MX 240, GeForce MX 260

GeForce Ti 220, GeForce Ti 240, GeForce Ti 260

That would simplify life enough for me.
by MrMilli (August 29th - 8:11 PM)
In reply to newtekie1:
I really don't think so.
The 9500 is basically a higher-clocked 8600.
The HD4600 is basically an HD3870 with a 128-bit bus (plus a faster AA unit).
Considering that these cards are mainly used by people with 19" monitors (~1280x1024), the lower memory bandwidth won't be a major factor.
For reference, an HD3850 is around 2x faster than an 8600GTS at 1280x1024.
The HD4670 will have 480 GFLOPS (peak) and the 9500GT around 132 GFLOPS (peak, depending on the model). You can't close that gap with an overclock.
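As a rough check of those peak figures, here is a minimal sketch of the usual peak-FLOPS arithmetic: ALUs × shader clock × FLOPs per ALU per clock. The clocks and per-clock FLOP counts are assumptions, not figures from the article: 750 MHz and one MADD (2 FLOPs) per ALU for the HD4670, and 1375 MHz (the clock MrMilli uses later in the thread) with MADD + MUL (3 FLOPs) for the 9500GT.

```python
def peak_gflops(alus: int, shader_mhz: float, flops_per_clock: int) -> float:
    """Theoretical peak: every ALU retires its best-case op mix on every shader clock."""
    return alus * shader_mhz * flops_per_clock / 1000  # MFLOPS -> GFLOPS

# Assumed configurations (see lead-in):
hd4670 = peak_gflops(alus=320, shader_mhz=750, flops_per_clock=2)    # MADD only
gf9500gt = peak_gflops(alus=32, shader_mhz=1375, flops_per_clock=3)  # MADD + MUL

print(f"HD4670: {hd4670:.0f} GFLOPS, 9500GT: {gf9500gt:.0f} GFLOPS, "
      f"ratio {hd4670 / gf9500gt:.2f}x")
```

A peak ratio of roughly 3.6x is the gap MrMilli says an overclock can't close; whether peak FLOPS predicts game performance at all is exactly what the rest of the thread argues about.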
by newtekie1 (August 29th - 8:53 PM)
In reply to MrMilli:
You are wrong; the 128-bit bus causes a huge performance hit.

The HD4670 is just an overclocked HD4650. All the information we have seen says the 9500GT matches the HD4650, so an overclocked 9500GT should be able to match an HD4670.

And the FLOPS rating of either card doesn't matter one bit and has no real effect on graphical performance. If it did, we wouldn't see the 9600GT, rated at 208 GFLOPS, outperforming the HD3870, rated at 496 GFLOPS.
by yogurt_21 (August 29th - 9:33 PM)
In reply to newtekie1:
Yeah, and I thought the 4670 has 12 ROPs, which would further lower performance. Or am I wrong on that?
by WarEagleAU (August 29th - 9:44 PM)
I don't think you are wrong. I also don't think the 9500GT is even on par with the HD 4650. But good luck to me trying to convince you of that.
by candle_86 (August 29th - 10:45 PM)
320x16x8 is the RV730 core config: that's 8 ROPs and 16 TMUs, with 64x5 ALUs, of which only one set will really be used in games, of course.

So 64 x 750 MHz = 48,000 (million shader ops per second).

Now the 9500GT has a 32x16x8 core config. The shader count is 32 and only 32, but all of those will be used, unlike the extra ALUs on the 4650.

So 32 x 1400 MHz = 44,800.

The numbers are fairly close on shader ops per second for most games, actually.

So you tell me: can the 9500GT keep up?

What ATI needs is a 9600GT killer; the RV670 is supposed to stop production soon, leaving nothing to compete.
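candle_86's comparison can be written out as a tiny calculation. This is a sketch of his own simplified model, not a claim about how the hardware actually behaves: it assumes one op per active unit per clock and that only one ALU in each five-wide RV730 group is busy in games (a premise MrMilli and DarkMatter dispute further down).

```python
# candle_86's model: only one ALU in each of the RV730's 64 five-wide groups does
# useful work in games, while all 32 of the 9500GT's scalar shaders do.
# One op per active unit per clock.
def shader_mops(active_units: int, clock_mhz: int) -> int:
    """Millions of shader operations per second."""
    return active_units * clock_mhz

rv730 = shader_mops(64, 750)        # RV730 at the 750 MHz clock candle uses
gf9500gt = shader_mops(32, 1400)    # 9500GT at a 1400 MHz shader clock
rv730_all = shader_mops(320, 750)   # the same chip if all 320 ALUs stayed busy

print(rv730, gf9500gt, round(rv730 / gf9500gt, 2), rv730_all)
```

Under the one-ALU-per-group assumption the two chips land within about 7% of each other; if all five ALUs per group were kept busy, the RV730 figure would be five times higher. Which end of that range real game code sits at is the crux of the argument that follows.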
by MrMilli (August 30th - 12:35 AM)
In reply to candle_86:
What you write here is pure nonsense. You need to study this matter more before making such a statement.
ATI and nVidia use a runtime compiler, and this compiler tries to make the best use of the shaders available. I don't think there is any situation where the compiler is that inefficient.

@newtekie1 about the 9600GT vs HD3870:
While GFLOPS are NOT the only factor that determines how a chip performs, they are certainly very important. It's just crazy to say they don't matter one bit.
Using only the 9600GT vs. HD3870 as a reference and drawing a conclusion from that is wrong.
The problem is that the HD3870 has very high shader power while its other units are not that powerful. That's why you get a skewed view when using 'older' games.
To give an example, check out these numbers (fps) from Crysis at the 'very high' setting (extremely shader-heavy):
8600GTS - 4.3
HD3650 - 6.4
9600GT - 14.9
HD3870 - 16.1
9800GTX - 21.9

You can immediately see that the HD3870 is faster than the 9600GT, but even more important is the fact that the 9800GTX is 47% faster. GFLOPS don't matter? They matter now, and they will matter even more in the future.
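The relative speedups in that argument can be checked directly from the posted frame rates. This short sketch only restates MrMilli's numbers and computes percentages; it adds no new data.

```python
# Crysis frame rates (fps) as posted by MrMilli.
crysis_fps = {"8600GTS": 4.3, "HD3650": 6.4, "9600GT": 14.9, "HD3870": 16.1, "9800GTX": 21.9}

def percent_faster(card: str, baseline: str) -> float:
    """How much faster `card` is than `baseline`, in percent."""
    return (crysis_fps[card] / crysis_fps[baseline] - 1) * 100

for card in ("HD3870", "9800GTX"):
    print(f"{card} vs 9600GT: {percent_faster(card, '9600GT'):+.1f}%")
```

This gives about +8% for the HD3870 and about +47% for the 9800GTX, matching the 47% figure quoted in the post.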
by newtekie1 (August 30th - 1:42 AM)
In reply to MrMilli:
Using your numbers, and assuming GFLOPS matter, why doesn't the HD3870 outperform the 9800GTX? Though I don't know where you are even getting your numbers.

http://www.techpowerup.com/reviews/Galaxy/GeForce_9500_GT_Overclocked/9.html

The HD3650 doesn't outperform the 8600GTS in Crysis despite the nearly 100 GFLOPS advantage the HD3650 has. Face it, GFLOPS can't be used to determine gaming performance.

Quoting WarEagleAU: "I also don't think the 9500GT is even on par with the HD 4650."
We will have to wait until the HD4650 is released and see. However, judging by the performance of the HD3650, which is about 60% of the 9500GT, and the fact that the HD4650 appears to be an HD3650 with everything on the core doubled, I think the two will be very close in the end.
by MrMilli (August 30th - 3:16 AM)
http://www.computerbase.de/artikel/hardware/grafikkarten/2008/test_ati_radeon_hd_4870_x2/20/#abschnitt_crysis

Well, I should also have mentioned that since ATI and nVidia use completely different architectures, it's hard to compare their GFLOPS across brands. But within one brand it's easy to see that GFLOPS do matter, and that's why I was pointing to the 9600GT vs. 9800GTX comparison.

The TechPowerUp review you linked has one flaw in its Crysis test:
"We tested the DX9 version with graphics set to 'High', which is the highest non-DX10 setting in the game."

ComputerBase uses DX10 and 'Very High'. This setting is much more shader-demanding!
BTW, in this same review the 9500GT scores 7.0 fps, and that's only 9% more than an HD3650.

If you go here:
http://www.computerbase.de/artikel/hardware/grafikkarten/2008/test_ati_radeon_hd_4870_x2/23/#abschnitt_performancerating => these are the results of all games combined.
Here you can see that the 9500GT scores only 19% more than an HD3650 on average.

Enough talking and let's just wait a month.
by DarkMatter (August 30th - 4:16 AM)
MrMilli, your point has a big flaw: who cares how these cards will perform in future games? As you have said, in the future the HD card could perform better because the demand for shader power will be higher. Again, who cares? It can't even handle the higher settings of the most demanding games today, let alone future ones...

If the card has some shader power to spare now (assuming that is true, which I don't think it is), then the card is bottlenecked by its other parts. That will not change in the future, and it only means that while the 9500GT drops to 5 fps from the 10 fps it renders today, the HD card will maintain a framerate close to that 10. Woohoo! Big deal. The same happens with the X1000 family: now they are 50%+ faster than their GF7 counterparts, but always at higher settings and thus at unplayable framerates.

I have said this hundreds of times: ever since the X1000 series, ATI seems more concerned about how its cards could perform in the future than about making the best card it can for the present.
by candle_86 (August 30th - 5:41 AM)
In reply to MrMilli's post on GFLOPS and the Crysis numbers:
Then please explain one thing to me: why do the R700 and RV770 perform more like a 64 SP card than a 320 SP card? The reason is that only one of those ALUs is a complex shader, 64 of those are simple, and the rest aren't even related to shader work, actually. And very few games use simple shaders, because it's harder to program for two types of shaders than for just one.
by MrMilli (August 30th - 2:09 PM)
In reply to candle_86:

The R600 uses 64 superscalar unified shader clusters, each consisting of 5 stream processing units for a total of 320 stream processing units. Each of the first 4 stream processing units is able to retire a finished single precision floating point MAD (or ADD or MUL) instruction per clock, dot product (dp, and special cased by combining ALUs), and integer ADD. The fifth unit is more complex and can additionally handle special transcendental functions such as sine and cosine. Each of the 64 shader clusters can execute 6 instructions per clock cycle (peak), consisting of 5 shading instructions plus 1 branch.

The claimed theoretical processing power for the 8 Series cards given in FLOPS may not always be correct. For example, the GeForce 8800 GTX has 518.4 GigaFLOPs of theoretical performance, given that there are 128 stream processors at 1.35 GHz, each able to run one multiply-add and one multiply instruction per clock [(MADD (2 FLOPs) + MUL (1 FLOP)) × 1350 MHz × 128 SPs = 518.4 GigaFLOPs]. This figure may not be correct because the multiply operation is not always available, giving a possibly more accurate figure of (2 × 1350 × 128) = 345.6 GigaFLOPs.
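The quoted 8800 GTX arithmetic is easy to reproduce. The only inputs are the figures in the quoted text (128 SPs, 1350 MHz, 2 FLOPs per MADD, 1 per MUL).

```python
# GeForce 8800 GTX figures from the quoted text: 128 SPs at a 1350 MHz shader clock.
sps, shader_mhz = 128, 1350

madd_plus_mul = (2 + 1) * shader_mhz * sps / 1000  # MADD (2) + co-issued MUL (1) per clock
madd_only = 2 * shader_mhz * sps / 1000            # MUL not available: MADD only

print(f"{madd_plus_mul:.1f} GFLOPS vs {madd_only:.1f} GFLOPS")
```

The two figures differ by exactly a factor of 1.5, so a full third of the headline number depends on the extra MUL actually being issuable. That is why the dual-issue question comes up again later in the thread.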
So ... just to recap for you:
ATI: all 5 units can do a MADD (or an ADD or MUL).
The 5th (complex) unit is a special unit: it can also do transcendentals like SIN, COS, LOG, EXP. That's it.
1 MADD (multiply-add) = 2 FLOPs
1 ADD or MUL = 1 FLOP
And these are all usable. The developer doesn't need to program for this; the compiler takes care of it. A real-life scenario with some bad code could be something like 2 MADD + 1 MUL per unit. Averaged over the 64 units, that would give 240 GFLOPS.

nVidia: basically, each scalar unit can do 2 FLOPs per clock. That would result in real-life performance of around 90 GFLOPS.

So on shader performance ATI will win hands down.

Considering how close the HD4870 performs to the GTX 280, and how much more texel fillrate and bandwidth the GTX has, it seems to me that shader performance is darn important these days.
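MrMilli's "real life" estimate amounts to counting FLOPs in a partly filled 5-wide instruction bundle. Here is a minimal sketch of that counting. The 750 MHz (ATI) and 1400 MHz (nVidia) clocks are the ones candle_86 used earlier in the thread. The 2 MADD + 1 MUL mix is MrMilli's assumed scenario, not a measurement.

```python
from typing import List

# FLOPs per instruction type, as listed in the recap above.
FLOPS_PER_OP = {"MADD": 2, "ADD": 1, "MUL": 1}

def bundle_flops(bundle: List[str]) -> int:
    """FLOPs one 5-wide ATI shader unit retires in a clock for the given op mix."""
    if len(bundle) > 5:
        raise ValueError("a 5-wide unit has at most 5 ALU slots")
    return sum(FLOPS_PER_OP[op] for op in bundle)

# MrMilli's 'bad code' case: 3 of 5 slots filled with 2 MADD + 1 MUL, on 64 units at 750 MHz.
ati_gflops = 64 * 750 * bundle_flops(["MADD", "MADD", "MUL"]) / 1000
# nVidia side: 32 scalar shaders x 2 FLOPs (one MADD) per clock at an assumed 1400 MHz.
nv_gflops = 32 * 1400 * 2 / 1000

print(ati_gflops, nv_gflops)
```

A fully packed bundle of five MADDs would be worth 10 FLOPs, so this scenario assumes the compiler reaches half of peak. That efficiency figure is precisely what DarkMatter challenges next.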
by Frick (August 30th - 2:23 PM)
Quoting newtekie1: "Though, in today's market, I don't see a place for the extreme low end anymore."
Until they make good integrated graphics (the 780G/HD3200 is kind of nice, though...), there will always be a need for the extreme low end, IMO. Take the HD3450: it's not a bad card for an HTPC, and it's cheaper than a gaming mouse.
by DarkMatter (August 30th - 6:33 PM)
In reply to MrMilli's recap:
There are some inaccuracies and misinformation there:

- Theoretically, both ATI and Nvidia shaders can do MADD+MUL. What you quoted above was about the G80, and that was fixed long ago in later releases. Assuming ATI can do both at a time while Nvidia can't is stupid, considering ATI doesn't outperform Nvidia by that much even in shader-specific benchmarks...
- You so conveniently forgot that Nvidia's shaders run at double the clock when you calculated the "real life" performance...
- R600 and R700 are SIMD for each cluster and VLIW for each shader. This means that the instructions for all 5 units in a shader have to be written together at compile time (Very Long Instruction Word), and that all 80 stream processors in each cluster (R600 = 80x4, R700 = 80x10) must calculate the same instruction. By contrast, Nvidia's are scalar and also organized into SIMD arrays, but only 16 or 24 wide (G80/G9x and GT200 respectively).

This has two effects:

1. VLIW means that even if the shaders (5 ALUs) look superscalar to the programmer, or to the drivers in this case, each shader IS a vector unit.
2. SIMD over such large arrays means that if a state change occurs, you have to calculate it in a different cluster, potentially losing a complete cluster or even the entire chip for that clock.

That's why ATI is comparable to Nvidia when it comes to "real life" shader power.
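DarkMatter's second point, that wide SIMD arrays waste work when lanes disagree on a branch, can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of either vendor's hardware. All lanes in a batch run in lockstep. A divergent batch executes both paths with the inactive lanes masked off. The widths of 16 and 64, the 10-cycle path costs, and the 64-pixel workload are all hypothetical; the thread itself disagrees on the real widths.

```python
from typing import List

def batch_cycles(takes_branch: List[bool], then_cost: int, else_cost: int) -> int:
    """Cycles one SIMD batch spends on an if/else when all lanes run in lockstep.
    If lanes disagree, the batch executes both paths with the inactive lanes masked."""
    cycles = 0
    if any(takes_branch):
        cycles += then_cost
    if not all(takes_branch):
        cycles += else_cost
    return cycles

def simd_efficiency(takes_branch: List[bool], width: int, then_cost: int, else_cost: int) -> float:
    """Useful lane-cycles divided by total lane-cycles occupied, for a given SIMD width.
    Every batch occupies the full width, even if it is only partly filled."""
    total = 0
    for start in range(0, len(takes_branch), width):
        total += width * batch_cycles(takes_branch[start:start + width], then_cost, else_cost)
    useful = sum(then_cost if t else else_cost for t in takes_branch)
    return useful / total

# Hypothetical workload: 64 pixels, only the first 8 take the 'then' path.
pixels = [i < 8 for i in range(64)]
for width in (16, 64):
    print(width, simd_efficiency(pixels, width, then_cost=10, else_cost=10))
```

With these numbers the 16-wide array keeps 80% of its lane-cycles busy while the 64-wide one drops to 50%, because the single divergent batch drags every lane in it through both paths.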
by GPUCafe (August 30th - 7:44 PM)
FYI, the original source is us: http://gpucafe.com/2008/08/nvidia-preparing-to-counter-attack-in-the-sub-150-segment/

And the image is not from a slide.
by btarunr (August 30th - 7:52 PM)
Sorry for the inaccuracy. Fixed.
by MrMilli (August 30th - 8:09 PM)
@DarkMatter

- Only GT200 can dual-issue MADD and MUL ops all the time. G8x/G9x chips can't; there are only a select few scenarios where they can dual-issue MADD and MUL ops.

- I didn't: 1375 MHz * 2 FLOPs * 32 shaders = 88 GFLOPS

- You are wrong about it being SIMD. ATI's shader contains a MIMD 5-way vector unit, MIMD meaning (as opposed to SIMD) that several different instructions can be processed in parallel. The compiler tries to assemble simple operations to fill the 5-wide MIMD unit, but these 5 instructions cannot depend on each other. So even one shader can process different instructions at a time, let alone one cluster!
In my real-life calculation, I assumed that only 3 instructions per shader can be issued on average, because of less-than-optimal code and inefficiencies.
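The constraint MrMilli describes, that the compiler can only fill a 5-wide unit with mutually independent operations, can be illustrated with a toy greedy scheduler. This is a sketch under simplifying assumptions, not how ATI's shader compiler actually works: every op takes one slot and one cycle, there is no transcendental-only slot, and there are no register-port limits. The op names in the example fragment are hypothetical.

```python
from typing import Dict, List, Set

def pack_vliw(deps: Dict[str, List[str]], width: int = 5) -> List[List[str]]:
    """Greedy list scheduling into VLIW bundles of up to `width` slots.
    An op may issue only after every op it depends on issued in an earlier bundle,
    so ops inside one bundle are always independent of each other."""
    issued: Set[str] = set()
    pending = list(deps)  # dicts keep insertion order, so scheduling is deterministic
    bundles: List[List[str]] = []
    while pending:
        ready = [op for op in pending if all(d in issued for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or missing op")
        bundle = ready[:width]
        bundles.append(bundle)
        issued.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles

def slot_utilization(bundles: List[List[str]], width: int = 5) -> float:
    """Fraction of ALU slots that hold a real op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)

# Hypothetical shader fragment: four independent per-channel ops, then dependent ops.
shader = {
    "mul_r": [], "mul_g": [], "mul_b": [], "mul_a": [],
    "add_r": ["mul_r"], "add_g": ["mul_g"], "add_b": ["mul_b"], "add_a": ["mul_a"],
    "sum_rgb": ["add_r", "add_g", "add_b"],
    "scale": ["sum_rgb"],
}
bundles = pack_vliw(shader)
print(len(bundles), slot_utilization(bundles))
```

This fragment packs into 4 bundles at 50% slot utilization (2.5 ops per bundle on average), close to MrMilli's assumed 3 of 5. A purely serial dependency chain would fill only one slot per bundle (20%).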
So basically your conclusion is wrong!

Using my real-life calculation (it's just a simulation), in GFLOPS:
9800GTX - 432
HD3870 - 248
HD4670 - 240
9600GT - 208
9500GT - 88
HD3650 - 87
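That table can be reproduced from unit counts and shader clocks. In the sketch below, the per-card unit counts and clocks are reference specifications assumed here. Only the 9500GT's 1375 MHz and the HD4670's 750 MHz come from the thread. The split of 5 FLOPs per ATI unit vs. 2 per nVidia SP per clock is MrMilli's own "real life" assumption, not a measured value.

```python
# (units, shader clock in MHz, FLOPs per unit per clock) under MrMilli's model:
# ATI: 5-wide units filling 2 MADD + 1 MUL = 5 FLOPs; nVidia: scalar SPs doing 1 MADD = 2 FLOPs.
# Unit counts and clocks are assumed reference specs (see lead-in).
cards = {
    "9800GTX": (128, 1688, 2),
    "HD3870": (64, 775, 5),
    "HD4670": (64, 750, 5),
    "9600GT": (64, 1625, 2),
    "9500GT": (32, 1375, 2),
    "HD3650": (24, 725, 5),
}

def effective_gflops(units: int, mhz: int, flops_per_clock: int) -> float:
    return units * mhz * flops_per_clock / 1000

for name, spec in cards.items():
    print(f"{name:8s}{effective_gflops(*spec):6.0f}")
```

Changing the ATI figure from 5 to 10 FLOPs per clock (full packing) or to 2 (one MADD per unit) moves the ATI rows by a factor of two or more, which shows how much the ranking depends on the assumed compiler efficiency.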

If you check out the Crysis scores I posted previously, things start to make sense.
Now, I know the HD4670 won't beat the 9600GT in Crysis because of many factors, but what ATI has done is basically slap the HD3870 shader engine into it and add the RV700-generation architectural improvements.
nVidia, on the contrary, has made a die shrink of G84 and clocked it higher.

(Please read my previous posts before you reply.)
by DarkMatter (August 31st - 2:41 AM)
In reply to MrMilli:
Sorry, but you are wrong. Well, in some way you could say it's MIMD, because R600/R700 is composed of SIMD arrays of 5-wide superscalar shader processors controlled through VLIWs. BUT the MULTIPLE-instruction part is INSIDE each shader, meaning that each ALU within the shader can process a different instruction, BUT every SP in the SIMD array has to share the same instruction. My claim still stands.

http://www.techreport.com/articles.x/12458/2
http://www.techreport.com/articles.x/14990/4


These stream processor blocks are arranged in arrays of 16 on the chip, for a SIMD (single instruction multiple data) arrangement, and are controlled via VLIW (very long instruction word) commands. At a basic level, that means as many as six instructions, five math and one for the branch unit, are grouped into a single instruction word. This one instruction word then controls all 16 execution blocks, which operate in parallel on similar data, be it pixels, vertices, or what have you.


And then there is still the question of whether the drivers can take the usually linear code of games (linear in the sense that, AFAIK, games calculate different data types at different times instead of calculating everything concurrently) and effectively blend different types of instructions into one VLIW instruction in real time. "Real time" is the key. R600/R700 was developed with GPGPU in mind, and there it can be used effectively, so the inclusion of VLIW makes sense. But IMO that is fundamentally impossible, for the most part, in real-time calculations. Probably, if the shaders are doing vertex calculations, the other 2 ALUs remain unused, and it's even worse if the operation needs fewer ALUs.

On the MADD+MUL you are probably right, but Nvidia DID claim they had fixed it in the 9 series.

88 GFLOPS: I thought you were talking about the 9600GT for some reason, probably because candle mentioned it. But TBH, arguing about shader power to compare graphics card performance is pointless. The card could be capable of 10 TFLOPS, but if it kept the same 8 render back-ends, it would still perform similarly to any other card with 8 ROPs and similar clocks.
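The "10 TFLOPS but only 8 ROPs" argument is a bottleneck argument: throughput is set by the slowest stage. Here is a toy sketch of it. It assumes each ROP writes one pixel per core clock and uses a hypothetical shader cost of 200 FLOPs per pixel. It ignores memory bandwidth, texturing, and any overlap between stages, so it illustrates the shape of the argument rather than predicting real frame rates. The 8 ROPs and 750 MHz are the figures used earlier in the thread.

```python
def limiting_rate(shader_gflops: float, flops_per_pixel: float, rops: int, core_mhz: float):
    """Toy bottleneck model: pixel throughput is capped by whichever stage is slower.
    Assumes one pixel per ROP per core clock; ignores bandwidth, texturing and overlap."""
    shader_rate = shader_gflops * 1e9 / flops_per_pixel  # pixels/s the shaders can shade
    rop_rate = rops * core_mhz * 1e6                      # pixels/s the ROPs can write
    return (shader_rate, "shader") if shader_rate < rop_rate else (rop_rate, "ROP")

# Hypothetical shader cost of 200 FLOPs per pixel; 8 ROPs at 750 MHz.
for gflops in (480, 10_000, 20_000):
    rate, limit = limiting_rate(gflops, flops_per_pixel=200, rops=8, core_mhz=750)
    print(f"{gflops:>6} GFLOPS -> {rate / 1e9:.1f} Gpixels/s ({limit}-bound)")
```

Going from 480 to 10,000 GFLOPS (about 21x) raises throughput only 2.5x in this model, and doubling shader power again changes nothing once the ROPs are the limit.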

Oh, and about Crysis: nonsense. The HD3870 is not faster than the 9600 GT, let alone a massively crippled one (if you insist on comparing the HD3870 with the HD4670).
by MrMilli (August 31st - 4:16 AM)
@DarkMatter

I did some more reading and found out that you are right that each cluster is SIMD. That will cause some inefficiency.
http://pc.watch.impress.co.jp/docs/2008/0626/kaigai_3.pdf

This is my source on Crysis: http://www.computerbase.de/artikel/hardware/grafikkarten/2008/test_ati_radeon_hd_4870_x2/20/#abschnitt_crysis
They use DX10, 'Very High', 1280x1024.

We'll talk about this again when benchmarks appear, which I guess will be soon.
But here is a nice little preview for you:
http://bp3.blogger.com/_4qvKWy79Suw/R5pzm6JY-BI/AAAAAAAAAPg/YUofEVeF82U/s1600-h/hd3690.gif => one chart
http://www.pcpop.com/doc/0/265/265454_5.shtml => full article (Chinese)

I don't know if you remember the Radeon HD3690, intended for the Chinese market only?
This is what it is: http://www.itocp.com/attachments/month_0801/20080117_5aca84ad09a931a1be6fzI5hDbRNoulx.jpg
Basically an HD3850 with a 128-bit bus.
I know it's 16 vs. 8 ROPs, but both will have 16 texture units... time will tell.
by newtekie1 (August 31st - 4:56 AM)
http://www.vr-zone.com/articles/Radeon_HD_4670_&_4650_3DMark_Vantage_Performance_Revealed/6007.html

First glimpse at HD4650 and HD4670 performance: the HD4650 scores P21xx in Vantage, and the 9500GT is pretty close with P19xx. The switch to 55 nm and higher clocks should bring the 9500GT within striking distance of the HD4670.
by DarkMatter (August 31st - 6:53 AM)
In reply to MrMilli:
Yeah, time will tell. I never claimed this card will be faster than the HD anyway. I do think that at reasonable settings for this kind of card, both will be pretty close. You can't take some benchmarks and say one card is better than the other because at some settings it gets 7 fps and the other only 4 fps. Neither is playable; you have to look at what they do at playable settings, because that's what they were designed for.

The HD3870 is only faster when AF/AA is disabled and/or when both cards are well below playable framerates. You can't seriously prove your point on that basis, because of course if you disable AA and AF, taking the burden off the ROPs and TMUs, all the burden will be on the shaders. But at more common settings, the more balanced card usually wins. The HD3000 series was unbalanced, and the HD46xx will be even more so. HD4xxx ROPs and TMUs are more efficient, so it will do better than HD3xxx no matter what, but IMO not to the point of leaving the competition far behind.
by DarkMatter (August 31st - 6:56 AM)
In reply to newtekie1's Vantage link:
Duh! 32 TMUs??

That could change my view completely...

I have to think about it, though.
by MrMilli (August 31st - 1:42 PM)
@DarkMatter
Don't treat me like a noob. The Crysis example was brought up to explain to newtekie1 that shader power does matter.

Great find, newtekie1. But the HD4650 GDDR2 is already beating the 9500GT GDDR3. The 9550GT would need a 1 GHz core, 2 GHz shader, and 2 GHz memory clock to get close to the HD4670. I think those frequencies are out of reach. I think the 9550GT is meant to compete with the HD4650 GDDR3.
by candle_86 (August 31st - 2:22 PM) - Reply
by: MrMilli;952976
So ... just to recap for you:
ATI: 5 units can do MADD (or ADD or MUL)
The 5th (and complex) unit is a special unit. It can also do transcedentals like SIN, COS, LOG, EXP. That's it.
1 MADD (=Multiply-Add) = 2 Flops
1 ADD or MUL = 1 Flops
And these are all usable. The developer doesn't need to program this. The compiler takes care of this. A real life scenario with some bad code could be something like 2 MADD + 1 MUL. If we average this over the 64 units then that would give 240GFlops.

nVidia: basically each scalar unit can do 2 Flops per clock. That would result in a real life performance of around 90GFlops.

So on shader performance ATI will win hands down.

Considering how close the HD4870 performs to the GTX 280 and how much more texel fillrate and bandwidth the GTX has, then it seems to me that shader performance is darn important these days.
800SP vs 240SP and it still can't catch it, i think ATI has a problem there
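The shader throughput arithmetic in MrMilli's quoted post can be checked with a small sketch. The clocks are assumptions not stated in this part of the thread (HD 4670 shaders at 750 MHz; 9500 GT with 32 SPs at a 1400 MHz shader clock), and the nVidia figure follows the post in counting only one MADD (2 flops) per SP per clock.

```python
# Back-of-the-envelope shader throughput, following the quoted post's method.
# ASSUMED clocks (not given in this part of the thread):
#   HD 4670: 64 five-wide VLIW units at 750 MHz
#   9500 GT: 32 scalar SPs at a 1400 MHz shader clock

def vliw_gflops(vliw_units: int, flops_per_unit_per_clock: int, clock_mhz: float) -> float:
    """GFLOPS for a VLIW design, given the flops each 5-wide unit issues per clock."""
    return vliw_units * flops_per_unit_per_clock * clock_mhz / 1000.0

def scalar_gflops(sps: int, flops_per_sp_per_clock: int, shader_clock_mhz: float) -> float:
    """GFLOPS for a scalar design where every SP issues a fixed number of flops per clock."""
    return sps * flops_per_sp_per_clock * shader_clock_mhz / 1000.0

# "Bad code" case from the post: 2 MADD (2 flops each) + 1 MUL (1 flop) = 5 flops/unit/clock
realistic_ati = vliw_gflops(64, 2 * 2 + 1, 750)
# Theoretical peak: all five slots doing a MADD = 10 flops/unit/clock
peak_ati = vliw_gflops(64, 5 * 2, 750)
# nVidia as counted in the post: one MADD (2 flops) per SP per clock
nvidia = scalar_gflops(32, 2, 1400)

print(f"HD 4670 'bad code' estimate: {realistic_ati:.1f} GFLOPS (peak {peak_ati:.1f})")
print(f"9500 GT MADD-only estimate:  {nvidia:.1f} GFLOPS")
```

With these assumed clocks the sketch lands on the 240 and ~90 GFLOPS figures from the post; changing the clock arguments shows how sensitive the comparison is to them.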
by GPUCafe (August 31st - 4:50 PM) - Reply
by: candle_86;954360
800SP vs 240SP and it still can't catch it, i think ATI has a problem there

Big problem for sure. They're the ones giving $100-150 price drops, right? ;)
by DarkMatter (August 31st - 7:30 PM) - Reply
by: MrMilli;954326
@Darkmatter
Don't treat me like a noob. The Crysis example was brought up to show newtekie1 that shader power does matter.

Great find, newtekie1. But the HD4650 GDDR2 is already beating the 9500GT GDDR3. The 9550GT would need a 1 GHz core, 2 GHz shaders & 2 GHz memory to get close to the HD4670. I think those frequencies are out of reach. I think the 9550GT is meant to compete with the HD4650 GDDR3.
I'd like to hear where I treated you like a noob. Your Crysis point still doesn't hold. Shader power does matter; no one, not even newtekie, said it doesn't. We just questioned the HD cards' real shader power IN GAMES.

And that also goes for your second paragraph: one card beating the other in 3DMark means nothing. It never did. 3DMark is only useful for testing OCs and such things, not for comparing different cards' or systems' real performance. Ati cards, especially since the R600, have a tremendous advantage in benchmarks, because it's a lot easier to reach much higher efficiency (as discussed above) in a fixed benchmark than in real gameplay. The lack of texture power is also mitigated in a benchmark, as nothing behind the camera will ever have to be rendered unexpectedly. It doesn't even matter whether the benchmark is something like 3DMark or the Crysis GPU benchmark. HardOCP already demonstrated that.
by Kursah (August 31st - 7:52 PM) - Reply
Who cares about these specifics, folks? Sure, it's somewhat nice to know, but damn! This has been an interesting read and rehash of technologies. ATI has disappointed me with how they advertise shaders; personally I would've counted each cluster as a shader core, instead of bragging about 320, 640, 800 or however many zillion "shaders" they fit on their GPU. Still, their strategy is improving with every generation, not just in shader count but in overall performance. Both sides are doing well; to me there is no clear winner, and I couldn't care less... what I DO care about is what will get me what I want for the budget I have to work with... sometimes that includes temps, stability, drivers, OC-ability, etc. See my sys specs to see the winner I chose! Couldn't be happier! :D

As far as these low-low-end cards go, I may pick a couple up to put in my sisters' and parents' rigs. They do little to nothing stressful beyond 2D... it just depends on whether replacing what they already have is worth it. As newtekie stated earlier... I really see no point in a strong market for these cards... we don't need multiple models in the low-end segment IMO, nor do I care about their 3D or benchmark performance... if I were to get one of these, it would be for an internet/HTPC rig that would probably never game.

:toast:
by MrMilli (September 1st - 12:25 AM) - Reply


Well, this seems to me a very accurate representation of real-life game performance. Everything is where it should be. Actually, the HD4870 should be above the GTX260, which would mean it doesn't give an advantage to ATI.
And HardOCP... please...
Most websites already concluded that the 3DMark Vantage GPU score is very representative.
by MrMilli (September 1st - 2:53 PM) - Reply
http://gpucafe.com/2008/08/nvidia-preparing-to-counter-attack-in-the-sub-150-segment/

GPU Café has found out that the 9550GT is going to be based on G94b. 64 shaders & 192bit bus. It kinda confirms that the G96 couldn't catch up with the HD4670. If this is true then the 9550GT will be very competitive. It seems that nVidia is prepared to cut profit margins in order to stay competitive since this G94b based product will be much more expensive to produce.
by DarkMatter (September 1st - 3:03 PM) - Reply
by: MrMilli;955656
http://gpucafe.com/2008/08/nvidia-preparing-to-counter-attack-in-the-sub-150-segment/

GPU Café has found out that the 9550GT is going to be based on G94b. 64 shaders & 192bit bus. It kinda confirms that the G96 couldn't catch up with the HD4670. If this is true then the 9550GT will be very competitive. It seems that nVidia is prepared to cut profit margins in order to stay competitive since this G94b based product will be much more expensive to produce.
Good to know. As for the margins, I don't think they will be much smaller than what Ati has with the HD4670. And that chip, if this is true, is bound to be significantly faster.
by MrMilli (September 1st - 4:37 PM) - Reply
PCB will be much more expensive (more layers because of the 192bit bus) and bigger.
G94b will be around 200mm² and RV730 is around 150mm².
Power consumption will be an issue too, since a 9600GT uses around 100W and needs an additional PCI-E power plug. If they want to get rid of it, they need to take it below 75W. 55nm will bring them at most 10W lower consumption at the same clock, so that's not enough.
(HD4670 has a 59W power envelope)

When we're talking about end-user prices of around $100, these things matter a lot.
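The 75W figure is the power a PCI Express x16 graphics slot can supply on its own; above it a card needs an auxiliary connector. A minimal sketch of that check using the wattages from this thread (the 90W entry simply applies the post's "at most 10W" saving to the 9600GT figure, so it is a rough assumption, not a measured value):

```python
# Does a board fit inside the PCI Express x16 slot's 75 W budget on its own?
PCIE_X16_SLOT_WATTS = 75

def needs_aux_power(board_watts: float, slot_limit: float = PCIE_X16_SLOT_WATTS) -> bool:
    """True if the card draws more than the slot alone can deliver."""
    return board_watts > slot_limit

# Wattages as quoted in the thread; the middle entry is a rough assumption
cards = {
    "9600GT (65nm, thread figure)": 100,
    "G94b at same clocks (100 W minus at most 10 W)": 100 - 10,
    "HD4670 (stated power envelope)": 59,
}

for name, watts in cards.items():
    print(f"{name}: {watts} W -> aux connector needed: {needs_aux_power(watts)}")
```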
by DarkMatter (September 1st - 5:16 PM) - Reply
by: MrMilli;955744
PCB will be much more expensive (more layers because of the 192bit bus) and bigger.
OMG, I know all that. But it won't be that much, and it should perform enough faster to be sold for a bit more. Also, G94b will be used for the GT, and the chips that don't qualify will become the 9550. Those chips are going to waste right now, so it will actually increase their current margins IMO.
by candle_86 (September 1st - 7:22 PM) - Reply
by: MrMilli;955744
Power consumption will be an issue too, since a 9600GT uses around 100W and needs an additional PCI-E power plug. If they want to get rid of it, they need to take it below 75W. 55nm will bring them at most 10W lower consumption at the same clock, so that's not enough.
(HD4670 has a 59W power envelope)
Most users have a free Molex. Also think about it: many bought a 5200 Ultra or FX5600 and both needed external power. It comes down to what's cheaper.
by MrMilli (September 2nd - 5:57 PM) - Reply
@Darkmatter

How can a 9550GT use broken G94b's if it keeps all the 64 shaders? Broken memory bus?
And I'm sticking with the production cost issue. I double-checked everything and the 9550GT should be around 35-45% more expensive to produce. nVidia can do two things: put 384MB on the card (instead of 768MB) or really use broken G94s (48 shaders?).
Overview of material:
HD4670: 6-layer pcb, ~380 chips per wafer, 128bit chip packaging
9550GT: 8-layer pcb, ~290 chips per wafer, 256bit chip packaging

Did you ever see wafer prices? pcb and chip packaging cost aren't anything to scoff at either.
Even if the 9550GT ends up only a bit more expensive but also a bit faster, ATI is bound to make a huge profit on the HD4650 & HD4670. Not only will the RV730 be a hit in its class, but the RV710 is going to destroy the 9400GT.
No matter how many disadvantages the SIMD based VLIW shader engine has, it really takes much less die space than the scalar based approach nVidia uses.

BTW a review:
http://publish.it168.com/2008/0901/20080901043806.shtml
http://en.expreview.com/2008/09/02/rv730-reviewed-prforms-close-to-3850/
by DarkMatter (September 2nd - 7:21 PM) - Reply
by: MrMilli;957496
@Darkmatter

How can a 9550GT use broken G94b's if it keeps all the 64 shaders? Broken memory bus?
And I'm sticking with the production cost issue. I double-checked everything and the 9550GT should be around 35-45% more expensive to produce. nVidia can do two things: put 384MB on the card (instead of 768MB) or really use broken G94s (48 shaders?).


So many things... Well

1- Nvidia uses a cluster approach, so they can disable both SP/TMU clusters AND ROP/MC clusters.

2- Any sources saying it will use 8 layers? If the 8800 GT could be made on a 6-layer PCB, as Nvidia wanted partners to adopt, this one can be done on 6 layers a lot more easily. I don't actually know if it will have 8, so I'm just assuming. Anyway, 192 bit is NOT 256 bit, last time I checked.

3- Which are your sources for die size?

:roll::roll: 290*8-layers / 6-layers = ~380 :roll::roll: I really hope you have sources for the die size and that the calculation wasn't made the way it looks... PCB layers have nothing to do with chips per wafer. NO COMMENT!!

4- Of course they could put 384 MB on them and could still perform a lot better. Isn't the HD3850 faster with only 256 after all?

5- SIMD + VLIW does not necessarily take less space for the same performance. G80/92 vs. R600/670 proved that. R7xx is better, but don't compare it to previous 55nm chips, as Nvidia has yet to show a real 55nm chip. Also, just looking at die photos you can clearly see that Ati packs all their units very close to each other, while Nvidia leaves some "blank" space between them so the chip doesn't get so hot. HINT: Nvidia @65nm runs cooler than Ati @55nm.

Now I'm not saying which card will be faster, but IMO neither will be a lot better than the other, as you seem to believe and want to tell everybody. It simply won't. Yeah, in your link we can see the HD4670 very close to the HD3850. The thing is that, judging by the specs, the 9550GT could be close to the 9600GT/HD3870 (shaders FTW, isn't it? Or did you suddenly change your mind?), especially at lower resolutions, which is what both these cards are supposed to be aimed at.
by MrMilli (September 3rd - 12:57 AM) - Reply
1- i know. do you really think that they have enough perfect chips (all 64 shaders) with just one memory controller/rop cluster broken? i don't think so because the G94b has been in production for like 2 months now, they will use good chips too.
let's not forget that the 9550GT will have 12 ROPs because of this.

2- true, there are some variants that use a 6-layer pcb but forget about high frequencies then. even with a 192bit bus.

3- what the hell are you talking about? what do pcb layers have to do with chips per wafer? can't you read the commas, or are you just making fun of me now? i'm talking about three different things: pcb, chips, packaging!
you want the calculation? here you go: wafer = ~70000mm², so that's (70000/150)*0.82 ≈ 380
the 0.82 stands for the yields (i had to guess that one, but i took the same for both).
All reports are saying that the RV730 will be ~150mm².
G94 = 240mm²; going from 65nm to 55nm normally saves < ~18%, so 240mm² - 18% = 196.8mm², and (70000/196.8)*0.82 ≈ 290

4- no the HD3850 256MB is slower.

5- fyi, even the RV770 is smaller than the G92b and as far as i can remember, it's much faster. lol
RV670 -> 14.36 mm x 13.37 mm = 192 mm²
RV770 -> 15.65 mm x 15.65 mm = 245 mm²
G92b ---> 16.4 mm x 16.4 mm = 269 mm² (55nm)
G92 ----> 18 mm x 18 mm = 324 mm²
G200 --> 24 mm x 24 mm = 576 mm²
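MrMilli's wafer arithmetic and die-area list can be reproduced with a short sketch. It uses his own inputs: ~70,000 mm² of usable wafer area and the guessed 0.82 yield applied to both chips. Like his calculation, it ignores edge loss and scribe lines, so the counts are optimistic rough figures rather than real gross-die numbers.

```python
# MrMilli's method: good dies per wafer ~= wafer_area / die_area * yield.
# Inputs are the thread's guesses; edge loss and scribe lines are ignored.
WAFER_MM2 = 70_000
YIELD = 0.82

def good_dies(die_mm2: float, wafer_mm2: float = WAFER_MM2, yield_: float = YIELD) -> int:
    """Whole good dies per wafer under the simple area-ratio model (rounded down)."""
    return int(wafer_mm2 / die_mm2 * yield_)

def shrunk_area(area_mm2: float, reduction: float) -> float:
    """Die area after a fractional area reduction (0.18 means 18% smaller)."""
    return area_mm2 * (1 - reduction)

g94b = shrunk_area(240, 0.18)  # G94 at 240 mm² with the assumed 18% shrink
print(f"RV730 (150 mm²): ~{good_dies(150)} good dies per wafer")
print(f"G94b ({g94b:.1f} mm²): ~{good_dies(g94b)} good dies per wafer")

# Die areas from the edge lengths listed above
dims = {
    "RV670": (14.36, 13.37),
    "RV770": (15.65, 15.65),
    "G92b": (16.4, 16.4),
    "G92": (18, 18),
    "G200": (24, 24),
}
for chip, (w, h) in dims.items():
    print(f"{chip}: {w * h:.0f} mm²")
```

Under these assumptions the model gives about 382 and 291 dies, which is where the ~380 vs ~290 comparison comes from.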

You show me one post where i said that the 9550GT will be slower after we found out that it will be G94b based! Actually I found out myself that it will be G94b based and corrected myself.
I said the 9550GT will be very competitive but it will cost nVidia money.
I do believe that they will perform comparably. I'm just saying that the 9550GT will cost ~35% more to produce compared to the HD4670 and it will have less memory at the same price point.
I don't know why i even bother replying. This is the last thing i put here. You can reply whatever you want, i won't reply anymore.
by DarkMatter (September 3rd - 2:01 AM) - Reply
by: MrMilli;958108
1- i know. do you really think that they have enough perfect chips (all 64 shaders) with just one memory controller/rop cluster broken? i don't think so because the G94b has been in production for like 2 months now, they will use good chips too.
let's not forget that the 9550GT will have 12 ROPs because of this.

You show me one post where i said that the 9550GT will be slower after we found out that it will be G94b based! Actually I found out myself that it will be G94b based and corrected myself.
I said the 9550GT will be very competitive but it will cost nVidia money.
I do believe that they will perform comparably. I'm just saying that the 9550GT will cost ~35% more to produce compared to the HD4670 and it will have less memory at the same price point.
I don't know why i even bother replying. This is the last thing i put here. You can reply whatever you want, i won't reply anymore.
You have a short memory or something, as the whole discussion between us has been based on you praising the HD card no end while saying Nvidia will have a tough time competing, when you don't actually know shit. It was me who was saying BOTH would be OK. You are trying to say Ati will pwn all the time. Because you can't use the performance argument, you are just being creative, something I can admire TBH, but it's nothing more than fairy tales coming out of your head. Enjoyable to a point, but anyone can get tired of it after a few posts.

LOL. You gotta love fanboyism.

Besides that:

-HD3850 256 is almost as fast as the 512MB variant. Within a 5% difference.

-Perform comparably? LOL. We already know how the HD4670 performs; the 9550GT will be VERY close to both the 9600GT and the 8800GS, because its specs are exactly that: a mix of the two. Depending on the game it will be close to one or the other, probably to the slower of the two. Either way it will be way faster than the HD4670 unless they clock it absurdly low, because where the GT will be slower (the same games as the 8800GS) is where the HD will be slower too, maybe even slower because of 12 vs. 8 ROPs.

-G92b is not a true 55nm chip. Neither will these ones be, probably. Anyway, apart from RV770, which I DID exclude from my claim, all other 55nm Ati chips are close to Nvidia's 65nm chips in performance/die size, DESPITE the process difference!!!!

-I love how you categorically affirm the GT will be 35% more expensive to produce, that it will need less memory to manage that, that it won't clock high enough if it has 6 layers, that it will be xxx mm2, etc., when you actually don't have a clue about the chip, like any other mortal on Earth. It's funny really.

-Also you seem to forget that, in that segment, the production cost of the card is less than all the money the intermediaries take plus the packaging, so a 35% difference in production cost can easily end up being 10% at retail. The GT can easily be more than 10% faster than the HD card.
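The pass-through argument can be made concrete with a simple hypothetical model: only the production-cost share of the shelf price grows, while the intermediaries' margins and packaging stay fixed in absolute terms. The shares below are illustrative assumptions, not figures from the thread. Under the post's premise that production is less than half of the retail price, a 35% cost gap stays under 17.5% at retail, and the 10% figure corresponds to production being roughly 29% of the shelf price.

```python
# Hypothetical model: retail = production + fixed (margins, packaging).
# Only the production part grows, so the retail increase scales with its share.

def retail_increase(cost_increase: float, production_share: float) -> float:
    """Fractional shelf-price increase when production cost rises by cost_increase."""
    return cost_increase * production_share

def share_for_target(cost_increase: float, target_retail_increase: float) -> float:
    """Production share of retail needed for a given retail-price gap."""
    return target_retail_increase / cost_increase

# Upper bound under the post's premise (production < 50% of retail)
print(f"Production at 50% of retail: +{retail_increase(0.35, 0.50):.1%} at retail")
# What share would make a 35% cost gap show up as only +10% on the shelf?
print(f"A +10% retail gap implies production is {share_for_target(0.35, 0.10):.1%} of retail")
```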

All in all, we can't affirm anything. I have not affirmed anything; YOU HAVE, presenting all your assumptions as facts. And that, my friend, is where DarkMatter always comes in.

Now I would love you to respond to the post, since this is a conversation (even a discussion is a conversation) between civil people and it's not polite to end conversations the way you did. I didn't insult you, so I have the right to a response. Say whatever you want in the post, though I would like you to reply to the content. Even better, PM me, but do it.

EDIT: I first thought I'd let this one pass, but I have decided to attack you on all fronts, since you like to fight on all of them too. lol.

G92b is actually significantly smaller than RV770. Not enough to justify the performance difference, but it's a new chip against an old one. As I said G92b is NOT a true 55nm chip.

http://www.pcper.com/article.php?aid=580
http://www.pcper.com/article.php?aid=581&type=expert

G92 - 324 mm2
G92b - 231 mm2
RV770 - 260 mm2 (256 is probably more accurate)

- 231/256*100 = 90% - So G92b is 10% smaller than RV770. A quick look at Wizzard's reviews reveals that, surprisingly, the HD4870 is around 10-15% faster than the 9800GTX+! :eek: Surprise! (Actually it was a surprise for me. I'm talking about higher resolutions and settings, FYI)

- 231/324*100 = 71.3% - Almost a 29% reduction. It seems Ati isn't the only one that can do that kind of thing, after all...

Let's extrapolate that 29% to the G94b please:

- 240*0.713 = 171

Higher than the Radeon's estimated 150, but much better than your picture, isn't it? And that's for the full G94b, the new 9600GT, so you can't actually compare them directly. You would have to compare the new 9600GT to the Radeon to do any fair perf./size comparison*. Nvidia does things differently than Ati. Where Ati tries to make a single chip and get yields as high as possible on that chip, Nvidia makes the chips bigger (faster) so that they don't have to care about defective cores. They can just use them for the second card, because even crippled they are going to be able to compete (8800GT, G80 GTS, GTX260, 8800GS... the list is long). The consequence is that Nvidia has to throw away far fewer chips, and I could even go as far as to say that it might counteract the cost of fewer dies per wafer and lower yields.

*Let's not leave loose ends and let's continue that comparison:

- According to Wizzards reviews HD3850 is 20% slower than the 9600GT.
- I'm going to make an estimate and say that, according to your links, the HD4670 is 10% slower than the HD3850 (sometimes less, sometimes more); let's be gentle and translate that into 5% cumulative, for a total of 25% slower than the 9600GT.

- 150/171*100 = 87.7% ...

OK. Let's play with your numbers...
150/196.8*100 = 76.2%. Even your (probably very wrong) estimates fall short.
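The perf/size comparison above can be laid out step by step. The die sizes are the thread's figures. The 0.75 relative performance is DarkMatter's stacked estimate (HD3850 about 20% behind the 9600GT, HD4670 taken as about 25% behind in total). The sketch also assumes, as the post does, that a full G94b performs like a 9600GT.

```python
# Perf/area comparison using the figures quoted in the thread (die sizes in mm²).
g92, g92b, rv770 = 324.0, 231.0, 256.0
rv730 = 150.0
g94_65nm = 240.0

shrink_ratio = g92b / g92             # G92 -> G92b area ratio
g94b_est = g94_65nm * shrink_ratio    # same ratio applied to G94

# Relative performance vs. the 9600GT (assumed equal to a full G94b)
hd4670_vs_9600gt = 0.75

print(f"G92b vs RV770 area: {g92b / rv770:.0%}")
print(f"G92 -> G92b area ratio: {shrink_ratio:.1%}")
for label, g94b_area in (("171 mm² (G92b ratio)", g94b_est), ("196.8 mm² (18% shrink)", 196.8)):
    area_ratio = rv730 / g94b_area
    # (perf/area of RV730) / (perf/area of G94b) = relative perf / relative area
    perf_per_area = hd4670_vs_9600gt / area_ratio
    print(f"G94b at {label}: RV730 is {area_ratio:.1%} of its area, "
          f"RV730/G94b perf-per-mm² ratio = {perf_per_area:.2f}")
```

A ratio below 1.0 means the RV730 delivers less performance per mm² than the G94b estimate it is compared against. That is the point being argued, and the result depends entirely on the assumed inputs.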

I'm willing to hear a response for this.
by MrMilli (September 10th - 9:48 PM) - Reply
Darkmatter i've waited till the numbers came in:
http://www.anandtech.com/video/showdoc.aspx?i=3405&p=7

To be honest, i stopped reading your above post halfway because it's full of mistakes.

So a HD4670 is as fast or faster than a 9600GSO. A 9600GSO is a G92 @ 192bit.
Now explain to me how a G94 @ 192bit can come close to this?
(and pls don't make up stuff)
by candle_86 (September 10th - 10:15 PM) - Reply
Same reason the 9600GT is faster than the GSO, MrMilli. The 9600GSO aka 8800GS has to be OCed to beat the 9600GT; everyone and their grandmother knows that.
by MrMilli (September 10th - 11:34 PM) - Reply
by: candle_86;970302
Same reason the 9600GT is faster than the GSO, MrMilli. The 9600GSO aka 8800GS has to be OCed to beat the 9600GT; everyone and their grandmother knows that.
Well you should also know that:
9600GT (G94) is slower than 9800GTX (G92).
9600GSO (G92 192bit) is almost as fast but slower than 9600GT.
So a G94 @ 192bit will be even slower than a 9600GSO.

... even my grandmother knows that ... pffff ... did you even read this thread?
by DarkMatter (September 11th - 1:17 AM) - Reply
by: MrMilli;970263
Darkmatter i've waited till the numbers came in:
http://www.anandtech.com/video/showdoc.aspx?i=3405&p=7

To be honest, i stopped reading your above post halfway because it's full of mistakes.

So a HD4670 is as fast or faster than a 9600GSO. A 9600GSO is a G92 @ 192bit.
Now explain to me how a G94 @ 192bit can come close to this?
(and pls don't make up stuff)
by: MrMilli;970409
Well you should also know that:
9600GT (G94) is slower than 9800GTX (G92).
9600GSO (G92 192bit) is almost as fast but slower than 9600GT.
So a G94 @ 192bit will be even slower than a 9600GSO.

... even my grandmother knows that ... pffff ... did you even read this thread?
Every time you post, it only shows your ignorance.

First of all, there are no mistakes there and I didn't make up anything. It's verified info. Search a bit. :laugh:. The fact that you stopped reading only shows you are not able or willing to read something you know goes against your beliefs and is completely true. You don't want to learn the plain truth, and your brain just screams: ALARM ALARM! STOP READING! EXTERNAL INFLUENCE DETECTED!

Second, the chip doesn't matter one bit; the actual specs of the chip do. The GS has more shaders, but they are crippled by the low ROP count and 192-bit bus AND the fact that it runs at 550MHz. The GT at 650MHz is running 18% faster, and a quick look at any of Wizzard's reviews will show you that (surprise, surprise...) the GT is around 18% faster on average. At lower resolutions the difference is smaller (ROP advantage gone, SPs FTW) and at higher ones it's bigger, because ROP count matters there.
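The ~18% figure is just the core-clock ratio. A one-line sketch of it follows. It assumes performance tracks core clock alone, which ignores the shader count, ROP and memory-bandwidth differences discussed above.

```python
# Core-clock advantage, assuming performance scales with core clock only.
def clock_advantage(fast_mhz: float, slow_mhz: float) -> float:
    """Fractional speed-up implied by the ratio of two core clocks."""
    return fast_mhz / slow_mhz - 1

print(f"9600GT (650 MHz) vs 9600GSO/8800GS (550 MHz): +{clock_advantage(650, 550):.1%}")
```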

If required, the 9550GT could easily be clocked at 750MHz.

- Because it's 55nm, it could be clocked above 700MHz.
- Because it has less stuff than the 9600GT, it could be clocked higher.
- Because Nvidia chips are nowhere near their limit, they could clock it higher if really needed.

You have to realise how the market has been until now. Nvidia has been owning all segments, so they didn't have to push the cards too hard to compete (by that I mean not reaching a point where the failure rate could eventually become a problem; RV770 anyone?). They left that work to partners instead, knowing they would do it (that's Nvidia's way of keeping them happy). Proof of that is how every single Nvidia chip based on G92 and newer can easily be overclocked 20% without making the card sweat (with stock cooling and volts), and up to 30% is also possible at stock; Ati chips simply can't do that (a 20% OC applied to 775MHz is 930MHz, 750MHz-->900MHz). That's also the reason you can find a lot of factory-OCed Nvidia cards and only a few Ati ones, and those few are usually OCed just a bit.
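The overclock figures in the post above are simple multiplications. The sketch below applies the post's 20% headroom to the two stock clocks it mentions.

```python
# Overclocked frequency for a given fractional headroom (0.20 = 20%).
def overclocked(stock_mhz: float, headroom: float) -> float:
    """Target clock after applying the headroom to a stock clock."""
    return stock_mhz * (1 + headroom)

for stock in (775, 750):
    print(f"{stock} MHz + 20% = {overclocked(stock, 0.20):.0f} MHz")
```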

The bottom line is that, when it comes to competing, Nvidia chips still have a lot of headroom. The HD4670 once again does a modest 10% OC in Wizzard's review, which just shows Ati systematically clocks its cards higher up the curve. Now Nvidia will have to clock the new cards higher, and that's all. The GS, BTW, is the Nvidia card that holds the record for overclocking at stock AFAIK, primarily because it has less stuff inside, so as I said, that's one more factor in the 9550GT's favor against the GS and 9600GT and ultimately against the HD4670.

It's going to be a tough fight, but IMO it's in Nvidia's hands. The 9550GT can be a lot faster than the 9600GSO, very close to the GT except at 1920x1200 4xAA and above, but no one will or should buy an $85 card if he wants to play at those settings anyway, and the HD4670 isn't a good performer there either. We have yet to see if Nvidia WANTS it to be faster.
by MrMilli (September 11th - 3:30 PM) - Reply
-65nm to 55nm brings a theoretical shrink of 19%. That's max 19%.
You are saying: G92 - 324 mm2 G92b - 231 mm2
Did nVidia make a shrink of 40%? Did it ever occur to you that pcper.com is wrong?

-G92b is not a true 55nm chip?? WTH! What is it then? 60nm?
Seriously, where did you read that? The chip shrank 18%, that means an almost perfect transition from 65nm to 55nm. Don't let anybody fool you, it's 55nm.

-So you are basically saying that:
Take a 9600GT, cut off ~1/4 of the chip, now clock it really high so it's close to 9600GT performance at $80. Wow, this makes a lot of business sense. *sarcasm*
nVidia will never clock it higher than 650Mhz. You can be pretty sure of that.

HD4670: http://www.newegg.com/Product/Product.aspx?Item=N82E16814500061
9500GT: http://www.newegg.com/Product/Product.aspx?Item=N82E16814500061

Those are the cheapest prices, $80. First thing nVidia needs to do before it even can release a 9550GT is to drop the 9500GT price to ~$65.
And like I have said before, the HD4670 is a true low-end product. It's cheap to make and ATI can make it even cheaper.
Just look at that simple design: http://www.computerbase.de/bild/article/866/17
Very small PCB and very simple power circuitry, comparable to the much slower 9500GT.

So you have called me:
- LOL. You gotta love fanboism.
- Everytime you post is only to show your ignorance.
That's really nice of you! I have been on topic all the time and never called you names, but you still need to say this stuff like a kid. Maybe you are a kid, I don't know.
The only reason why we have this discussion is because you are ignorant.
You look at matters with your limited knowledge of business and electronics, and always conclude that i'm wrong. Well i waited for the HD4670 to be released. Now i'll wait for the 9550GT to be released.
by DarkMatter (September 11th - 4:49 PM) - Reply
by: MrMilli;971281
-65nm to 55nm brings a theoretical shrink of 19%. That's max 19%.
You are saying: G92 - 324 mm2 G92b - 231 mm2
Did nVidia make a shrink of 40%? Did it ever occur to you that pcper.com is wrong.

-G92b is not a true 55nm chip?? WTH! What is it then? 60nm?
Seriously, where did you read that? The chip shrank 18%, that means an almost perfect transition from 65nm to 55nm. Don't let anybody fool you, it's 55nm.

-So you are basically saying that:
Take a 9600GT, cut off ~1/4 of the chip, now clock it really high so it's close to 9600GT performance at $80. Wow, this makes a lot of business sense. *sarcasm*
nVidia will never clock it higher than 650Mhz. You can be pretty sure of that.



LOL. I say you are ignorant because you effectively are, mate. And you show it every time you write. You have a hard time understanding things, so I will go part by part again:

- 65 to 55 nm is effectively a ~40% difference in size. See, it happens that chips are square and that the fab. process is the minimum distance attainable between two transistors in an array of transistors. Because transistors are laid out in a 2-dimensional array: 65^2 / 55^2 ≈ 1.397. Now if you know how to read that number, it means the 65nm version is ~40% bigger than the 55nm one (equivalently, the 55nm die is ~28% smaller). And to think I have to explain this kind of thing... :shadedshu
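The squared-scaling argument above can be checked numerically. The sketch assumes die area scales with the square of the feature size. That is an idealisation of a full-node shrink, and real shrinks often fall short of it.

```python
# Ideal area scaling between process nodes: area ~ (feature size)^2.
def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Area of the new-node die relative to the old-node die under ideal scaling."""
    return (new_nm / old_nm) ** 2

r = ideal_area_ratio(65, 55)
print(f"55nm die = {r:.1%} of the 65nm die ({1 - r:.1%} smaller)")
print(f"65nm die = {1 / r:.3f}x the 55nm die ({1 / r - 1:.1%} larger)")
```

The same ratio read in the two directions gives both the "~40% bigger" and the "~28% smaller" figures, which is why the percentages in this exchange are easy to talk past each other with.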

- In order to be a true 55nm chip, you have to redesign it. For instance, within the chip (any chip), they have to add many redundant transistors whose only job is to refresh/amplify the signal between the ones that do the math (many others are there just to serve as resistors, but more on that later). Those have to be placed at a distance determined by the resistance of the medium they sit in, just like radio repeaters. This means the optimum distance is constant for the same silicon alloy. The transition from G92 to G92b was only an optical shrink, meaning the exact same structure was used, but where at 65nm (e.g.) 5 repeaters were needed, at 55nm only 4 would be required. You have one just occupying space.

The same principle applies to the "resistors". Basically there are no resistors on chips; engineers take advantage of parasitic resistance (a bunch of adjacent transistors act like a resistor to other transistors) to set the transistors to the desired output. They try to design the whole chip so that all the working transistors (the ones that have a function, that take part in ALUs etc.) act like resistors to each other, but that is impossible to do 100%, so redundant transistors are required. Because at 55nm you need less voltage, resistor values can be smaller, so you need fewer of them. Again, because G92b is just the same chip made smaller, it has more of them than a true 55nm chip would require.

- They are not cutting 1/4 of the chip, by any means. See, again you show ignorance.
And what doesn't make sense? Competing doesn't make sense? They are probably releasing a new 9600GT (9650GT?) based on G94b with higher clocks, so it wouldn't compete with that anyway; we really don't know.

- More things: the price difference between 128bit and 192bit, or 256bit for that matter, is not that big anymore. There are already lots and lots of 256bit cards at and below $80. The HD3850, for instance, is around there, and it's comparatively A LOT more expensive to produce than the 9550GT will be. The 9600GT is already well below $100, so yeah, Nvidia wouldn't have any problem substituting the 9600GT with that cheaper card if it performed similarly. It wouldn't be the first time Nvidia has done this; Ati has done it hundreds of times too. It's common business.

Now THINK before posting again with ignorant responses, I'm getting tired of explaining everything.

EDIT: OH, and BTW nice try with those newegg links. :roll:
http://www.newegg.com/Product/Product.aspx?Item=N82E16814500062

9600GT at $80 after MIR:

http://www.newegg.com/Product/Product.aspx?Item=N82E16814125099
by btarunr (September 11th - 4:51 PM) - Reply
Calm down people, just make your point and leave it at that.
by DarkMatter (September 11th - 5:13 PM) - Reply
by: btarunr;971395
Calm down people, just make your point and leave it at that.
I'm just wondering if "ignorant" and "ignorance" have a very different, derogatory meaning or weight in English. I just mean "lack of knowledge" when I use them, and I couldn't find another word that covered the whole idea. If they are derogatory and really offensive, then sorry. Sorry MrMilli for calling you ignorant, but you have to admit you lack the basic knowledge on many things. I'll try to find a better way to express it.
by MrMilli (September 11th - 5:40 PM)
Nice stuff you wrote there, but the fact is that 55nm is an optical shrink.
http://www.xbitlabs.com/news/other/display/20070328080802.html

... has announced 55nm process technology, an optical shrink for its 65nm fabrication process, ... TSMC's 55nm process technology is a 90% linear-shrink process from 65nm including I/O and analog circuits.

So you are right, G92 to G92b is an optical shrink. That's what 55nm is all about!

PS: your first link is a DDR2 9500GT!
That 9600GT will cost you $109. If you get the rebate, it becomes $79, but the rebate can take months to arrive. Rebates are also temporary; they don't change retail prices on a permanent basis.
by DarkMatter (September 11th - 9:39 PM)
by: MrMilli;971448
Nice stuff you wrote there, but the fact is that 55nm is an optical shrink.
http://www.xbitlabs.com/news/other/display/20070328080802.html

... has announced 55nm process technology, an optical shrink for its 65nm fabrication process, ... TSMC's 55nm process technology is a 90% linear-shrink process from 65nm including I/O and analog circuits.

So you are right, G92 to G92b is an optical shrink. That's what 55nm is all about!

PS: your first link is a DDR2 9500GT!
That 9600GT will cost you $109. If you get the rebate, it becomes $79, but the rebate can take months to arrive. Rebates are also temporary; they don't change retail prices on a permanent basis.
Sorry man, but again you fail to demonstrate any bit of intelligence. 55nm, the process, IS an optical shrink of 65nm, meaning there are no other changes, such as the compounds or chemicals used. BUT is it that hard to understand that when you fab on a smaller process, you can get rid of many things that you need on the bigger one just to stabilize the chip, deliver the proper voltage throughout the whole chip, etc.? :banghead:

No changes were made to G92b; they just took the chip and made a photocopy that was 40% smaller. That's exactly why I said that G92b is just an optical shrink of G92. G92, on the other hand, was much more than a shrink of G80.

Anyway, forget about everything. It's like talking to a wall or to a 3-year-old kid who doesn't care about the lesson. :shadedshu
by MrMilli (September 11th - 10:28 PM)
Yes, DarkMatter, you know best. Actually, you know even better than TSMC. How could I be that stupid? Jesus...

Like I said in my previous post, and this comes straight from TSMC's white paper:
TSMC's 55nm process technology is a 90% linear-shrink process from 65nm including I/O and analog circuits.

http://www.beyond3d.com/content/news/529
I'll quote: TSMC's 55nm process is a 10% linear shrink of 65nm in each dimension, or 19% overall.
80nm was also a 19% shrink, but it did not affect analogue and I/O. This means the scaling wasn't as good as 55nm's in practice.
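The "10% linear, 19% overall" figure follows from squaring the linear factor, because die area scales with both width and height. A minimal Python sketch of that arithmetic, assuming an ideal uniform shrink (real layouts never scale perfectly); the function names are hypothetical, and the by-name line (55/65) is included only for comparison with TSMC's stated 90% linear factor:

```python
def area_factor(linear_factor: float) -> float:
    """Die-area factor for a uniform shrink of each dimension."""
    # Area = width x height, so the linear factor is applied twice.
    return linear_factor ** 2


def area_reduction_percent(linear_factor: float) -> float:
    """How much smaller the die gets, in percent, for an ideal shrink."""
    return (1.0 - area_factor(linear_factor)) * 100.0


# TSMC's 55nm half-node: 90% of the 65nm dimensions in each direction.
half_node = 0.90
print(f"area factor: {area_factor(half_node):.2f}")
print(f"area reduction: {area_reduction_percent(half_node):.0f}%")

# For comparison only: a shrink that followed the node names exactly.
by_name = 55 / 65
print(f"area reduction if scaled by name: {area_reduction_percent(by_name):.0f}%")
```

With the 0.90 factor this gives an area factor of 0.81, i.e. a 19% smaller die, matching the Beyond3D number; scaling strictly by the node names would give roughly 28%.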


TSMC's white papers (or any other document, for that matter) don't state anything about a reduction in repeaters, so please show me proof. And when we're talking about a reduction of only 19%, power distribution doesn't change that much, especially when both ATI and nVidia raise the clock frequencies of their chips. In that situation, power stays at the same level.

Don't mix up theory with facts. And would the admin be so kind as to just close this thread? It's been enough.
by DarkMatter (September 12th - 1:12 AM)
I'm going to fulfill your wish by sending a PM. We should have done that a long time ago, anyway.
by zithe (September 12th - 1:24 AM)
by: candle_86;952649
Then explain one thing to me, please: why do the R700 and RV770 perform more like a 64 SP card than a 320 SP card? The reason is that only one of those ALUs is a complex shader; 64 of those are simple, and the rest aren't even related to shader work, actually. And very few games use simple shaders, because it's harder to program for two types of shaders than for just one.
You mean R600/RV670? Divide 320 by 5 and what do you get? The 3870 has only 64 shaders, and each of them has 5 shader units. That explains what you're saying.

The 4870 has 800 'SPUs', right? Divide 800 by 5 and you get 160. It seems that if ATI decided to slap 300 shaders into their next card and uberclock them, it'd give a MASSIVE boost over the previous generation. Rambling...
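The division in this post, written out as a small Python sketch: the advertised stream-processor count is divided by the 5 ALUs grouped into each shader, as described above. The 5-wide grouping is taken from the post; the helper name and the divisibility check are hypothetical additions for illustration.

```python
ALUS_PER_UNIT = 5  # each shader bundles 5 ALUs, per the post above


def shader_units(advertised_sps: int, alus_per_unit: int = ALUS_PER_UNIT) -> int:
    """Convert an advertised stream-processor count into whole shaders."""
    if advertised_sps % alus_per_unit:
        raise ValueError("count is not a multiple of the shader width")
    return advertised_sps // alus_per_unit


for card, sps in [("HD 3870", 320), ("HD 4870", 800)]:
    print(f"{card}: {sps} SPs -> {shader_units(sps)} shaders")
```

This reproduces the 64 and 160 figures from the post.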

This has probably all been said already, lol...
by DarkMatter (September 12th - 3:48 AM)


As you can see, except for G80 vs. G92, all of them are pretty close to the theoretical relation. The reason G80 breaks the "rule" is that G80 had the video decoding on a sister chip, so it's difficult to know whether the chip layout is similar. For instance, the video decoding "chip" could just be sitting beside the GPU core in G92 instead of being completely integrated, occupying much more space and leaving a gap that could lead to the error. Still, the relation is close enough to the theory to "prove" that the numbers given at Beyond3D as the usual difference were wrong. IMO, what they say there applies better to CPUs, because the logic is much smaller in comparison to the rest of the chip.

G94 and RV770 were added to test whether the transistor-count-to-die-size relation was consistent enough within the same fab process, so that the relation could then be used in the comparison between processes. IMO it's good enough to make a fair guesstimate and to make all the other comparisons legitimate.
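The comparison described above can be sketched in Python: compute transistor density (transistors per mm²) for each chip, take the ratio between two processes, and compare it with the ideal density gain implied by the linear shrink factor. This assumes the "theoretical relation" is the ideal area scaling (density gain = 1 / linear²); the class and function names are hypothetical, and the chip figures are placeholders, not the data from the chart.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    transistors_millions: float  # transistor count, in millions
    die_area_mm2: float          # die size, in mm^2

    @property
    def density(self) -> float:
        """Millions of transistors per mm^2."""
        return self.transistors_millions / self.die_area_mm2


def ideal_density_ratio(linear_factor: float) -> float:
    """Density gain expected from a uniform shrink (area scales by linear^2)."""
    return 1.0 / linear_factor ** 2


def deviation_percent(old: Chip, new: Chip, linear_factor: float) -> float:
    """How far the measured density gain is from the ideal one, in percent."""
    measured = new.density / old.density
    ideal = ideal_density_ratio(linear_factor)
    return (measured / ideal - 1.0) * 100.0


# Placeholder numbers purely for illustration (NOT real G92/G92b data).
old_chip = Chip("hypothetical 65nm chip", transistors_millions=500, die_area_mm2=300)
new_chip = Chip("hypothetical 55nm shrink", transistors_millions=500, die_area_mm2=250)
print(f"{deviation_percent(old_chip, new_chip, 0.90):+.1f}% vs. ideal")
```

A result near 0% means a chip pair follows the ideal scaling; a large deviation, like the G80 vs. G92 case, suggests the layouts differ in ways a pure shrink would not explain.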

I'm going to try to make that chart bigger, but data for other chips is harder to find. Feel free to link sites where the transistor counts and die sizes of other chips are available.

Cheers.