
B580 faster than the A770, but has less compute hardware?



The B580 has fewer Shading Units, TMUs, ROPs, Execution Units, Tensor Cores, and RT Cores than the A770, yet it's still faster?

My current working theory is that they focused too much on compute with the A770 and A750 and too little on memory bandwidth.
I am starting to wonder if an A770 can outperform a B580 purely in a compute benchmark.

Does anyone have benchmarks comparing compute on the A770 vs. the B580?
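If it helps frame what such a benchmark actually measures, below is a rough pyopencl sketch of a clpeak-style peak-FP32 test. The kernel, work sizes, and constants are my own assumptions for illustration, not any published tool:

```python
import numpy as np
import pyopencl as cl

# Hypothetical peak-FP32 microbenchmark: each work-item runs a long chain of
# dependent FMAs, and the kernel is timed with OpenCL event profiling.
KERNEL = """
__kernel void fma_loop(__global float *out, float a, float b) {
    float x = a, y = b;
    for (int i = 0; i < 65536; i++) {
        x = fma(y, x, x);
        y = fma(x, y, y);
    }
    out[get_global_id(0)] = x + y;   /* keep the result live */
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
prg = cl.Program(ctx, KERNEL).build()

global_size = 1 << 20                                # 1M work-items (arbitrary)
out = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, size=4 * global_size)

evt = prg.fma_loop(queue, (global_size,), None,
                   out, np.float32(1.0001), np.float32(0.9999))
evt.wait()

seconds = (evt.profile.end - evt.profile.start) * 1e-9
flops = global_size * 65536 * 2 * 2                  # 2 FMAs/iter, 2 FLOPs each
print(f"~{flops / seconds / 1e12:.1f} TFLOPS FP32")
```

Running the same script on both cards would give a rough apples-to-apples compute comparison, though tools like clpeak do this more carefully.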
 
I don't think it's that simple. It seems maybe, I don't know, under-reported? The B series is not an iteration on Alchemist; Battlemage is a completely different architecture.

Here is our B580 review, which goes over this in text format.


Here is a video GN did with Tom explaining some of it, if video is more your speed.

 

Well that answers my question and then some.
Thanks.
 
Generally, comparing raw specs only works between GPUs of the same architecture/generation anyway. This is a rabbit hole many enthusiasts end up falling into whenever leaks or rumors for unreleased GPUs come out, but it's pointless. The fact that a card has less X and Y than a card of a previous architecture means very little, and no real performance can be ascertained that way. For example, a 1070 had less of… everything (except VRAM) than a 980 Ti and was still equal or faster. Or, the gen before that, the 970 vs. the 780 Ti. You get the picture.

Just wanted to expand a bit on what Solaris said above, as a PSA.
 
The B580 has less physical memory bandwidth: even though it uses faster 20 GT/s memory, it's only on a 192-bit bus, while the A770 pairs 16 GT/s memory with a 256-bit bus.
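As a quick sanity check of that bus-width × transfer-rate math (using the transfer rates quoted above):

```python
# Peak VRAM bandwidth = (bus width in bytes per transfer) x (transfers per second)
def peak_gbps(bus_bits, gt_per_s):
    return bus_bits / 8 * gt_per_s

print(peak_gbps(192, 20))   # B580: 480.0 GB/s
print(peak_gbps(256, 16))   # A770: 512.0 GB/s
```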

Alchemist has a lot of architectural imbalances that prevent it from being fully utilized: https://chipsandcheese.com/p/microbenchmarking-intels-arc-a770

- It needs a high workload, otherwise parts of the GPU sit idle.
- The 512 GB/s of memory bandwidth is largely wasted on Alchemist because it has a hard time utilizing it. The C&C tests also show that it can't even reach 512 GB/s except in exceptional circumstances, and that it works more like a 250-300 GB/s device (see the copy-bandwidth sketch after this list).
- Battlemage has other advances such as Fast Clear, a technique that has existed in AMD/Nvidia GPUs for more than a decade but that Intel is only using now. Fast Clear increases utilization of the whole memory subsystem, from the private caches and large shared caches down to the VRAM itself.
- Battlemage no longer emulates critical instructions that had to be emulated on Alchemist, so it's faster there too.
- Battlemage also clocks quite a bit higher, reducing the gap further.
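For context, the 250-300 GB/s figure is the kind of number a simple copy test produces. Here's a rough pyopencl sketch of such a measurement (buffer size, warm-up, and the 512 GB/s reference are my assumptions, not C&C's exact methodology):

```python
import pyopencl as cl

# Measure achieved device-to-device copy bandwidth and compare it to the paper peak.
ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

nbytes = 256 * 1024 * 1024                        # 256 MiB per buffer (arbitrary)
src = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=nbytes)
dst = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=nbytes)

cl.enqueue_copy(queue, dst, src, byte_count=nbytes).wait()   # warm-up / first touch
evt = cl.enqueue_copy(queue, dst, src, byte_count=nbytes)    # timed copy
evt.wait()

seconds = (evt.profile.end - evt.profile.start) * 1e-9
achieved = 2 * nbytes / seconds / 1e9             # read + write traffic, GB/s
peak = 512.0                                      # A770 paper spec
print(f"~{achieved:.0f} GB/s achieved of {peak:.0f} GB/s ({achieved / peak:.0%})")
```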

The compute "advantage" you are talking about is only theoretical. Battlemage basically doesn't lose a single test to Alchemist, meaning that in the real world the paper advantage is absolutely worthless.
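To put rough numbers on "theoretical": the paper FP32 rate is just shading units × 2 FLOPs per clock × clock. Using the commonly listed shader counts and Intel's quoted graphics clocks (approximate figures, since real boost clocks vary):

```python
# Paper FP32 throughput in TFLOPS = shaders x 2 FLOPs (one FMA) x clock in GHz / 1000
def fp32_tflops(shaders, ghz):
    return shaders * 2 * ghz / 1000

print(f"A770: {fp32_tflops(4096, 2.10):.1f} TFLOPS")   # ~17.2
print(f"B580: {fp32_tflops(2560, 2.67):.1f} TFLOPS")   # ~13.7
```

On paper the A770 is ahead, yet it loses in practice, which is exactly the point.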

Comparing GPUs based on shaders, fillrate, and memory bandwidth is like looking at the number of cylinders in a car and declaring one higher performing than the other. You could have a V8 with less horsepower and torque than a V6. The V6 car might also have a more efficient transmission, be more aerodynamic, and weigh less. Further, the driver behind the wheel affects performance too, and if you're in the middle of New York you'll never get to go full speed. Complex systems require complex analysis to fully understand.
 
The 1070 has a higher fillrate and more GFLOPS than the 980 Ti, so it's little surprise that it's faster. But how could that be if the 1070 has "less of... everything"?

Because the 1070 has considerably more of the most important thing: clock speed, 1822 vs. 1140 MHz. A 60% higher clock speed is a sledgehammer. You gotta look at all the specs.
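Plugging those clocks into the same shaders × 2 × clock arithmetic (1920 shading units for the 1070, 2816 for the 980 Ti) shows how the clock bump flips the paper comparison:

```python
# Paper FP32 throughput in TFLOPS from shader count and clock in MHz.
def fp32_tflops(shaders, mhz):
    return shaders * 2 * mhz / 1e6

print(f"GTX 1070:   {fp32_tflops(1920, 1822):.2f} TFLOPS")   # ~7.00
print(f"GTX 980 Ti: {fp32_tflops(2816, 1140):.2f} TFLOPS")   # ~6.42
```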
 
@Lew Zealand
I… am aware? Yes? That wasn't the only example I had. I would hope the 1070 saw a jump in clocks considering it was on a new node. Frequency is absolutely important (though it can also be architecture dependent), but the OP's question was specifically about compute resources here:
The B580 has less: Shading Units, TMUs, ROPs, Execution Units, Tensor Cores, RT Cores.
Though yeah, you are right that, at least for gaming workloads, one can brute force quite a bit with frequency. Pure compute accelerators tend to be more efficiency and constant-use oriented and tend to run lower clocks, but those are a different kettle of fish altogether. Good shout though; I should have mentioned frequency as a factor too.
 
The B580 has fewer theoretical resources even taking frequency into account.

And frequency itself is not easy to achieve either; often the losses come down to frequency differences. It's quite an execution marvel for modern 400 mm²+ GPUs to run at 2.5-3 GHz. For the GeForce 10-series generation you guys are talking about, Nvidia rearchitected the circuitry to reach higher frequencies, so a lot went into that as well.

Unlike pure compute accelerators, enthusiast products have quite high requirements. In addition to the market accepting only a fraction of the price of those GPUs, there are hundreds of thousands of games out there, and the GPU has to work with thousands of different configurations across dozens of CPU generations.

In fact, I think a company that figures out the enthusiast segment basically guarantees success in every other market, because the requirements are so stringent.
 