
AMD Debuts New 12- and 16-Core Opteron 6300 Series Processors

Even if it's way off topic:
An Nvidia Titan has ~1300 GFLOPS DP at 250W TDP, but that was not the point.
All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND your task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and then back from the GPU to main memory / the CPU after processing, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.
Isn't that what AMD's HSA and hUMA are meant to solve?
Edit: Aquinus, you speedy guy, beat me to it with more eloquence.
 
Might read into APUs again. There are benefits to be had by having HUMA on an APU, which solves the memory copying problem.

True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...
 
True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...
I just used a 7970 since I have a 7970 and knew its DP off hand. The architectural potential is there.

I got curious and looked for what DP an APU can get. This is the only thing I can find on the current A10-7850K, but it's from WCCFtech, so who knows how valid it is. 5800K on left, 7850K on right. Overclocked.
A10-7850K-GPGPU-635x420.jpg
 
I've come to note that all businesses will go with the cheapest parts available, plus most companies, let alone people, don't know who AMD is.

But I say this is really good news for them; now they just need to make the 8-core desktop parts more efficient.
 
I've come to note that all businesses will go with the cheapest parts available, plus most companies, let alone people, don't know who AMD is.

But I say this is really good news for them; now they just need to make the 8-core desktop parts more efficient.

Yeah! They need something to compete with Intel's 8 core Atom SoC. 20-watt TDP for an 8-core SoC isn't too shabby. There is a slower variant that offers lower clocks and less power usage but still retains 8 cores as well. I kind of want one.

True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...

As long as PCI-E is your bus and you have memory that is completely segregated from the CPU, you're going to have that issue. Remember how gimped Intel CPUs were when they used an MCH and how the CPU needed to communicate with the MCH to get anything out of memory. As soon as the memory controller was moved next to the CPU cores, memory access speeds started flying and latency dropped like a rock. The issue is that no software can take advantage of having stream processors and CPU cores both working on the same data. Sharing data between different CPU cores is problematic enough, forget sharing it with an array of SIMD cores.
 
I dunno. In multithreading AMD was beating Intel. Xeons are another story
Well, in this instance since the article concerns Opteron, Xeon would actually be the story worth considering as counterpoint.
57994.png

But when it comes to price for the performance, that I'd be interested to see.
Undoubtedly, but then AMD are obviously going to make a concession on processor upgrade pricing in order to make ageing C32/G34 platforms at least somewhat palatable.
[Chart source]
 
Intel uses unused resources in the CPU to get extra multi-threaded performance. AMD added extra hardware for multi-threaded performance as opposed to using just the spare resources available. A module and a single core with HT each have their own costs and benefits. With an Intel CPU, the second thread doesn't have nearly as much processing power as the first thread does, whereas with AMD, the second "thread" or "core", if you will, yields much more tangible gains than the HT thread does.
In the testing I did, disabling HTT really crippled my 920 and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores. There's really no discernible difference between them.

It may bog down faster than AMD's SMT implementation but comparing Intel Xeon 6-core processors to AMD Opteron 6-core processors really doesn't show that to be the case either. Put bluntly, there's really no evidence to support AMD's SMT is any better than Intel's SMT.
 
In the testing I did, disabling HTT really crippled my 920 and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores. There's really no discernible difference between them.

HT got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it.
I mean, if you make use of core 0 up to 100%, then core 1 should drop performance -quite- dramatically while core 2 wouldn't be affected at all.
 
HT got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it.
I mean, if you make use of core 0 up to 100%, then core 1 should drop performance -quite- dramatically while core 2 wouldn't be affected at all.
It doesn't. When I turn HT off on my 3960X I lose at worst 40% of my multithreaded performance, but it boosts single-threaded performance a little (1% maybe), as it takes load off the data management part of the CPU.
 
In the testing I did, disabling HTT really crippled my 920 and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores. There's really no discernible difference between them.

It may bog down faster than AMD's SMT implementation but comparing Intel Xeon 6-core processors to AMD Opteron 6-core processors really doesn't show that to be the case either. Put bluntly, there's really no evidence to support AMD's SMT is any better than Intel's SMT.

A while back I did some testing with my i7 and started disabling cores, with HT both on and off, to see what the performance difference between 4c/4t and 2c/4t would be on my i7. In all honesty, the numbers don't agree with you. I can try and find it again, but generally speaking, hyper threading didn't yield much more than a 30% improvement over a real core.

I'm curious, how did you test the performance of your CPU between disabling/enabling hyper threading?

Edit: Here, I found it.


attachment.png
 
HT got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it.
I mean, if you make use of core 0 up to 100%, then core 1 should drop performance -quite- dramatically while core 2 wouldn't be affected at all.
Because that's not how it works. It functions a lot like virtualization where the physical core is never exposed to the operating system. Instead, there are four physical cores handling eight virtual cores--each physical core is responsible for two virtual cores. The physical cores are designed to work on each virtual core up to 50% of the time. This is why a quad core with SMT behaves very much like a slower eight physical core processor without SMT when handling heavy multithreaded loads.
It doesn't. When I turn HT off on my 3960X I lose at worst 40% of my multithreaded performance, but it boosts single-threaded performance a little (1% maybe), as it takes load off the data management part of the CPU.
Yeah, the tests were multithreaded (four or eight threads). I didn't test anything single threaded.

A while back I did some testing with my i7 and started disabling cores, with HT both on and off, to see what the performance difference between 4c/4t and 2c/4t would be on my i7. In all honesty, the numbers don't agree with you. I can try and find it again, but generally speaking, hyper threading didn't yield much more than a 30% improvement over a real core.

I'm curious, how did you test the performance of your CPU between disabling/enabling hyper threading?

Edit: Here, I found it.
View attachment 49412
It looks like they got it fixed which is good (30% is about where it should be). My 920 is three generations older than yours.

My test method was this custom application. Basically, it does var++ for 1 second on as many threads as you tell it to, using whatever variable type you pick, and reports the value each thread reached. It repeats that 10 times and gives you the total plus the results for every thread. It shows the basic compute power of any given processor.
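For anyone who wants to picture the test method described above, here is a rough sketch of the same idea in Python. It is not the original application: the function names, the per-iteration clock check, and the use of `threading` are all assumptions made for illustration. Also note that CPython's GIL keeps threads from running bytecode in parallel, so this only shows the structure of the test (timed increment loop, N threads, repeated runs, per-thread and total results); real measurements would need a native-code or multi-process version.

```python
import threading
import time


def count_for(duration, results, index):
    """Increment a local counter until `duration` seconds have passed."""
    deadline = time.perf_counter() + duration
    counter = 0
    while time.perf_counter() < deadline:
        counter += 1  # the "var++" step from the post
    results[index] = counter


def run_benchmark(num_threads, duration=1.0, repeats=10):
    """Run the timed loop on `num_threads` threads, `repeats` times.

    Returns (grand_total, per_thread_totals). Names and defaults are
    hypothetical; they mirror the 1-second / 10-repeat description.
    """
    per_thread_totals = [0] * num_threads
    for _ in range(repeats):
        results = [0] * num_threads
        threads = [
            threading.Thread(target=count_for, args=(duration, results, i))
            for i in range(num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        for i, reached in enumerate(results):
            per_thread_totals[i] += reached
    return sum(per_thread_totals), per_thread_totals


if __name__ == "__main__":
    total, per_thread = run_benchmark(num_threads=4)
    print("total:", total)
    for i, value in enumerate(per_thread):
        print(f"thread {i}: {value}")
```

Comparing the total at 4 threads against 8 threads on a 4c/8t chip is the kind of comparison being discussed here.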

Is it even possible to disable SMT on AMD processors? Repeating the test you did with an AMD processor would tell us definitively what sort of difference SMT makes on them.
 
I would like everyone to remember what the equivalent Xeon is at that price point. I'm willing to bet that the Opteron is more cost effective, considering a 10 Core Xeon starts at 1600 USD, I think everything needs to be put into perspective. I would rather take two 16c Opterons than a single 10c Xeon, but that's just me.

The biggest problem with that is that the Opteron solutions tend to consume a lot more power. A Xeon (the benchmarks I see are mostly for E5-2660s) will consume about 95W, and a similar Opteron (in terms of performance--the Opteron 6380) consumes 151W under full load. That's a staggering difference. I would say over about a year the difference is made up in terms of cost, but the additional performance of the Xeon is not. People need to stop looking exclusively at initial investment costs and start considering things like heat generation, power consumption, and performance over time given a set cost.
 
And 7-zip is likely memory bound, making it an ideal case for HT.
 
Because that's not how it works. It functions a lot like virtualization where the physical core is never exposed to the operating system. Instead, there are four physical cores handling eight virtual cores--each physical core is responsible for two virtual cores. The physical cores are designed to work on each virtual core up to 50% of the time. This is why a quad core with SMT behaves very much like a slower eight physical core processor without SMT when handling heavy multithreaded loads.
I disagree... I think you are wrong or I am misunderstanding you. HT will enable virtual cores, we agree on that. Windows will see those as cores; Windows won't care if they are real or virtual, and if a core is there and free it will assign any task or process to it. Performance, on the other hand, will be impacted as soon as you use the virtual cores... do you mean that core 1 is not the "virtual core" of core 0?

Is it even possible to disable SMT on AMD processors? Repeating the test you did with an AMD processor would tell us definitively what sort of difference SMT makes on them.
You can't disable half a "module", but you can still use the affinity option in Windows. On my FX-8320, after installing the Windows patch, processes began being assigned differently across the CPU.
 
The biggest problem with that is that the Opteron solutions tend to consume a lot more power. A Xeon (the benchmarks I see are mostly for E5-2660s) will consume about 95W, and a similar Opteron (in terms of performance--the Opteron 6380) consumes 151W under full load. That's a staggering difference. I would say over about a year the difference is made up in terms of cost, but the additional performance of the Xeon is not. People need to stop looking exclusively at initial investment costs and start considering things like heat generation, power consumption, and performance over time given a set cost.
You're wrong and here is why:
A. The Xeon 2697 v2 pulls 130W, not 95, so 2 of them (260W) will pull almost as much as 3 of these Opterons (297W).
B. These Opterons in the news pull 99W, not 151.
C. The Cinebench R11.5 chart shows performance with perfect HT scaling, so if you're using the server for data management and tasks that use the same part of the CPU over and over again, the Xeons will be 40% slower than what Cinebench shows.
D. The Xeon 2697 v2 costs $2100 more than one of these new Opterons, a difference so big that the Xeon won't close the price gap any time soon, definitely not in a year or 2.
E. The Xeon 2660 costs $700 more than the Opterons in the news and pulls 95W while being barely faster than the old Opterons, which the new ones will either match or beat, so again the price gap won't be closed in less than 2 years.
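The break-even argument above can be put into numbers with a small sketch. The wattages (95W vs 151W) and the $700 price gap are the figures quoted in this thread; the electricity price of $0.10 per kWh and the assumption of 24/7 full load are mine, and cooling costs are left out entirely, so treat the result as a rough illustration only.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the server runs at full load all year


def annual_energy_cost(watts, price_per_kwh):
    """Yearly electricity cost of a constant `watts` draw."""
    return watts / 1000 * HOURS_PER_YEAR * price_per_kwh


def years_to_break_even(price_gap, watts_saved, price_per_kwh):
    """Years until the power saving pays back a higher purchase price."""
    return price_gap / annual_energy_cost(watts_saved, price_per_kwh)


if __name__ == "__main__":
    price_per_kwh = 0.10       # assumed rate, not from the thread
    watts_saved = 151 - 95     # Opteron 6380 vs E5-2660 figures quoted above
    print(annual_energy_cost(watts_saved, price_per_kwh))        # about 49 dollars per year
    print(years_to_break_even(700, watts_saved, price_per_kwh))  # roughly 14 years
```

With a higher electricity rate, or with cooling overhead added on top of the raw draw, the payback time shrinks proportionally, so the conclusion depends heavily on those two inputs.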
 
I disagree... I think you are wrong or I am misunderstanding you. HT will enable virtual cores, we agree on that. Windows will see those as cores; Windows won't care if they are real or virtual, and if a core is there and free it will assign any task or process to it. Performance, on the other hand, will be impacted as soon as you use the virtual cores... do you mean that core 1 is not the "virtual core" of core 0?
Assuming virtual core 0 and 1 are assigned to physical core 0, virtual core 2 and 3 are assigned to physical core 1, and so on and then you run 4 heavy threads on even numbered virtual cores and shift it to odd number virtual cores, the performance for both tests will be more or less equal. The physical core itself prioritizes threads from each virtual core and it tries to give each virtual core about equal processor time. This is why, from the software perspective, virtual core or physical core is moot.
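The numbering assumed in that explanation (virtual cores 0 and 1 on physical core 0, 2 and 3 on physical core 1, and so on) can be sketched as below. This is only the layout the post assumes; real operating systems and firmware may enumerate logical processors differently, and the function names are made up for the example.

```python
def physical_core(virtual_core, threads_per_core=2):
    """Physical core that a virtual core belongs to, under the assumed layout."""
    return virtual_core // threads_per_core


def sibling_virtual_cores(virtual_core, threads_per_core=2):
    """All virtual cores sharing a physical core with `virtual_core`."""
    first = physical_core(virtual_core, threads_per_core) * threads_per_core
    return [first + offset for offset in range(threads_per_core)]


if __name__ == "__main__":
    even = [0, 2, 4, 6]
    odd = [1, 3, 5, 7]
    # Four heavy threads on even or on odd virtual cores land on the same
    # four physical cores, one thread per core, which is why the post
    # expects both runs to perform about the same.
    print([physical_core(v) for v in even])
    print([physical_core(v) for v in odd])
```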

Their results show that if you were running an FX 8 core as a quad core, it is better to disable one core per module rather than disabling two whole modules.
Well, yeah...
Normal: 8 ALUs, 4 FPUs
Disable half modules: 4 ALUs, 4 FPUs
Disable two modules: 4 ALUs, 2 FPUs

I just wonder how much of a performance hit it takes in a generic benchmark between the first two scenarios (half of the ALUs disabled). HTT loses about 30-35%. Looking at the URL, WinRAR is the closest to what Aquinus posted, and 8 ALU/4 FPU scores about 47.6% higher than 4 ALU/4 FPU. That's slightly better than what Aquinus got, but again, that's more of a memory benchmark than a compute power benchmark.

Edit: One of the users gave a range of: 33-59%. For HTT, it looks anywhere from 2-33%: http://semiaccurate.com/2012/04/25/does-disabling-hyper-threading-increase-performance/

Another article largely mirrors these results:
http://www.extremetech.com/computin...e-effects-of-hyper-threading-software-updates

I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not. This is sad.
 
I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not. This is sad.

I wouldn't call it sad. Intel's current micro-architecture has evolved a bit since the Core and Core 2. AMD's design just isn't as mature. AMD CPUs keep up. They might not be better, but they're adequate. I wouldn't really call that sad.

What is sad is how AMD doesn't have more low power CPUs. For example, Intel has an 8c/8t Atom now; it's an SoC and has a 20-watt TDP. I don't mean to contradict you, but performance isn't what's sad about AMD CPUs lately. In all honesty, if AMD CPUs were a bit lighter on power, we probably wouldn't care as much about single-threaded performance being lacking.
 
I wouldn't call it sad. Intel's current micro-architecture has evolved a bit since the Core and Core 2. AMD's design just isn't as mature. AMD CPUs keep up. They might not be better, but they're adequate. I wouldn't really call that sad.

What is sad is how AMD doesn't have more low power CPUs. For example, Intel has an 8c/8t Atom now; it's an SoC and has a 20-watt TDP. I don't mean to contradict you, but performance isn't what's sad about AMD CPUs lately. In all honesty, if AMD CPUs were a bit lighter on power, we probably wouldn't care as much about single-threaded performance being lacking.
But that's my point. Except for price, there's nowhere AMD wins. Intel has higher performance, less heat output, and lower power consumption. When it comes to servers and HPC, where Opterons are found, upfront cost is not a selling point because Xeons save money through lower power and cooling bills. The only situation where AMD wins is if you only have X amount of money to spend right now and AMD is below that threshold while Intel is not.
 
Just the right article posted yesterday: http://www.anandtech.com/show/7711/...f-kaveri-and-other-recent-amd-and-intel-chips

It is no secret that AMD's Bulldozer family cores (Steamroller in Kaveri and Piledriver in Trinity) are no match for recent Intel cores in FP performance due to the shared FP unit in each module. As a comparison point, one core in Haswell has the same floating point performance per cycle as two modules (or four cores) in Steamroller.

That means an AMD CPU needs four times the core count to equal an Intel CPU clock-for-clock in FP performance! That makes this 16-core Opteron exactly as fast as an ordinary Intel quad-core, clock-for-clock, regarding FP performance! Didn't expect that... :)
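The arithmetic behind that claim, as a quick sketch. The 4:1 ratio is taken straight from the AnandTech quote above, which is about Steamroller; applying it unchanged to this Opteron is the post's assumption, not a measured figure, and the function name is made up.

```python
def haswell_core_equivalents(amd_cores, amd_cores_per_haswell_core=4):
    """FP throughput per clock, expressed in Haswell cores.

    The default ratio (four AMD cores, i.e. two modules, per Haswell core)
    is the figure quoted from the article; it is an assumption here.
    """
    return amd_cores / amd_cores_per_haswell_core


if __name__ == "__main__":
    print(haswell_core_equivalents(16))  # 4.0, i.e. a quad-core per clock
```

This compares per-cycle FP throughput only; clock speeds and non-FP work are outside what the ratio covers.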
 
I just wonder how much of a performance hit it takes in a generic benchmark between the first two scenarios (half of the ALUs disabled).
I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not. This is sad.

I will test it out with my own benchmark software when I get home.
 