Wednesday, January 22nd 2014

AMD Debuts New 12- and 16-Core Opteron 6300 Series Processors

AMD today announced the immediate availability of its new 12- and 16-core AMD Opteron 6300 Series server processors, codenamed "Warsaw." Designed for enterprise workloads, the new AMD Opteron 6300 Series processors feature the "Piledriver" core and are fully socket- and software-compatible with the existing AMD Opteron 6300 Series. The power efficiency and cost effectiveness of the new products make them ideal for the AMD Open 3.0 Open Compute Platform - the industry's most cost-effective Open Compute platform.

Driven by customer requests, the new AMD Opteron 6338P (12-core) and 6370P (16-core) processors are optimized to handle the heavily virtualized workloads found in enterprise environments, including the more complex compute needs of data analysis, xSQL, and traditional databases, at optimal performance per watt and per dollar.

"With the continued move to virtualized environments for more efficient server utilization, more and more workloads are limited by memory capacity and I/O bandwidth," said Suresh Gopalakrishnan, corporate vice president and general manager, Server Business Unit, AMD. "The Opteron 6338P and 6370P processors are server CPUs optimized to deliver improved performance per-watt for virtualized private cloud deployments with less power and at lower cost points."

The new AMD Opteron 6338P and 6370P processors are available today through Penguin and Avnet system integrators and have been qualified for servers from Sugon and Supermicro, at starting prices of $377 and $598, respectively. More information can be found on AMD's website.

48 Comments on AMD Debuts New 12- and 16-Core Opteron 6300 Series Processors

#1
fullinfusion
1.21 Gigawatts
Wow, nothing wrong with its price!

16 real cores :twitch:
#2
buildzoid
If there were desktop boards for these I'd be all over the 12 core variant.
#3
Pap1er
by: buildzoid
If there were desktop boards for these I'd be all over the 12 core variant.
I would also like to see a desktop board for these meat grinders.
#4
ZetZet
by: fullinfusion
Wow, nothing wrong with its price!

16 real cores :twitch:
Not all that real.
#5
buildzoid
by: ZetZet
Not all that real.
More real than Intel's 8 cores / 16 threads. The 8 extra threads only appear in specific scenarios; in others they don't exist, whereas AMD's 16 cores are always capable of doing 16 tasks simultaneously. It just doesn't scale perfectly: 1 core does 100% single-core performance, but 16 only do around 1260%, instead of the near-perfect scaling Intel has, where 1 core does 100%, 8 cores do 799%, and with Hyper-Threading it maxes out at 1038%. So in some scenarios (3D graphics rendering) the $2000 8-core Intel will beat the $600 16-core AMD, while the AMD will win at video encoding and similar dumb workloads like searching for stuff. So the AMD is a better server CPU than the Intel.
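Buildzoid's scaling numbers can be turned into a quick throughput-per-dollar check. This is only a sketch using the percentages and prices quoted in the post above, which are the poster's estimates, not benchmarks:

```python
# Scaling and price figures quoted in the post above (poster's estimates, not benchmarks).
amd_scaling = 12.60    # 16 AMD cores ~ 1260% of one core's throughput
intel_scaling = 10.38  # 8 Intel cores + Hyper-Threading ~ 1038%
amd_price, intel_price = 600, 2000  # USD

# Throughput per dollar for each chip.
amd_value = amd_scaling / amd_price
intel_value = intel_scaling / intel_price

print(round(amd_value / intel_value, 2))  # ~4x the throughput per dollar for the AMD
```

On these numbers the AMD part delivers roughly four times the aggregate throughput per dollar, which is the whole point of the server-CPU argument being made here.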
#6
Assimilator
And, sadly, the Xeons will still beat the ever living crap out of these.
#7
NC37
by: Assimilator
And, sadly, the Xeons will still beat the ever living crap out of these.
I dunno. In multithreading AMD was beating Intel. Xeons are another story, but when it comes to price for the performance? That I'd be interested to see.
#8
SIGSEGV
by: Assimilator
And, sadly, the Xeons will still beat the ever living crap out of these.
Sadly, there are no Opteron-based servers available in my country,
so I have no choice but to use (buy) Xeon servers and workstations for my lab, which is very expensive.
That is very frustrating.
#9
techy1
Soon there will be AMD marketing slides about a +400% performance increase over "other competitors'" 4-core CPUs :D
#10
ensabrenoir
...cool and at a great price..... but once again the lemming approach: a bunch of little... adequate cores. The best result would be a price reduction at Intel....bah hhhaaa hhhhaaaa:roll: yeah right. Maybe some day, but not because of this. Nevertheless, AMD is still moving in the right direction.
#11
Aquinus
Resident Wat-man
I would like everyone to remember what the equivalent Xeon is at that price point. I'm willing to bet that the Opteron is more cost effective; considering a 10-core Xeon starts at 1,600 USD, I think everything needs to be put into perspective. I would rather take two 16c Opterons than a single 10c Xeon, but that's just me.
#12
buildzoid
by: techy1
soon there will be AMD marketing slides about +400% performance increase over "other competitors" 4 core CPUs :D
It'd be true for integer math capability, but not much else.
#13
Fragman
by: ZetZet
Not all that real.
You're either too stupid or you don't know anything about AMD CPUs: they are all independent cores with their own multiplier and voltage control, and if one core goes up in speed, all the others stay down until used.
That makes for better power usage.
#14
Breit
by: Fragman
You're either too stupid or you don't know anything about AMD CPUs: they are all independent cores with their own multiplier and voltage control, and if one core goes up in speed, all the others stay down until used.
That makes for better power usage.
I don't get what the power characteristics have to do with the debate about what counts as a "real" core and what does not.
The fact is that with the Bulldozer architecture, AMD chose to implement CMT in the form of modules rather than Hyper-Threading as implemented by Intel (there called SMT). A module on an AMD CPU acts as 2 independent cores, but they nonetheless share certain functional units. So technically they are NOT 2 independent cores. It's more or less the same as with Intel's Hyper-Threading, where a core can run 2 threads simultaneously and is seen by the OS as 2 cores, but is actually only one core.
So maybe AMD's implementation of CMT/SMT in the form of modules is a step further in the direction of independent cores than Intel is with Hyper-Threading. But all that doesn't really matter at all. At the end of the day, what counts is the performance you get out of the CPU (or performance per dollar or performance per watt, whatever matters most to you).

As far as I'm concerned, they should advertise these as 6 modules / 12 threads and 8 modules / 16 threads, like Intel does with, for instance, the 8-core / 16-thread (8c/16t) nomenclature...
#15
Prima.Vera
by: Fragman
You're either too stupid or you don't know anything about AMD CPUs: they are all independent cores with their own multiplier and voltage control, and if one core goes up in speed, all the others stay down until used.
That makes for better power usage.
Wow. You must be very smart for insulting and flaming users. Please, go on...
#16
Aquinus
Resident Wat-man
by: Breit
I don't get what the power characteristics have to do with the debate about what counts as a "real" core and what does not.
The fact is that with the Bulldozer architecture, AMD chose to implement CMT in the form of modules rather than Hyper-Threading as implemented by Intel (there called SMT). A module on an AMD CPU acts as 2 independent cores, but they nonetheless share certain functional units. So technically they are NOT 2 independent cores. It's more or less the same as with Intel's Hyper-Threading, where a core can run 2 threads simultaneously and is seen by the OS as 2 cores, but is actually only one core.
So maybe AMD's implementation of CMT/SMT in the form of modules is a step further in the direction of independent cores than Intel is with Hyper-Threading. But all that doesn't really matter at all. At the end of the day, what counts is the performance you get out of the CPU (or performance per dollar or performance per watt, whatever matters most to you).

As far as I'm concerned, they should advertise these as 6 modules / 12 threads and 8 modules / 16 threads, like Intel does with, for instance, the 8-core / 16-thread (8c/16t) nomenclature...
The problem with that statement is that a module has enough shared hardware to always run two threads in tandem, where Hyper-Threading won't always, because it depends on parts of the CPU not being in use.

Intel uses unused resources in the CPU to get extra multi-threaded performance. AMD added extra hardware for multi-threaded performance, as opposed to using just the spare resources available. The performance of a module vs. the performance of a single core with HT has costs and benefits of its own. With an Intel CPU, the second thread doesn't have nearly as much processing power as the first, whereas with AMD the gains from that second "thread" or "core", if you will, are much more tangible than those from an HT thread.

It's worth mentioning that the integer units do have enough hardware to run two full threads side by side. It's the floating-point unit that doesn't, but even so, FMA is supposed to give some ability to decouple the 256-bit FP unit into two 128-bit ops at once.

I think AMD's goal is to emphasize what CPUs do best, integer math, and let GPUs do what they do best, FP math. Not to say that a CPU shouldn't do any FP math, but if there is a lot of FP math to be done, a GPU is better optimized to do those kinds of operations.

Also, I should add that I'm pretty sure AMD's clocks are controlled on a per-module basis, but parts of each module can be power-gated to improve power usage. One of the biggest benefits of having a module is that you save die space in adding that second thread without too much of a hit to single-threaded performance (relatively speaking).

by: Prima.Vera
Wow. You must be very smart for insulting and flaming users. Please, go on...
Please don't feed the trolls.
#17
Prima.Vera
by: Aquinus
Also, I should add that I'm pretty sure that AMD clocks are controlled on a per-module basis but parts of each module can be power gated to improve power usage. One of the biggest benefits of having a module is that you save die space to add that second thread without too much of a hit on single-threaded performance (relatively speaking).
Aq, agree with you.
However, I have a question. Don't you think this approach is somehow not ideal for AMD, because this way a core has a lot fewer transistors than Intel's, hence the bad performance in single-threaded applications, like games for example?
I don't understand why AMD still goes for strong GPU performance, even on its so-called top CPUs, instead of having a GPU with only the basic stuff to run the Win 7 desktop and then using the available space to increase the transistor count of each core. This way I think they would finally have a CPU to compete with the i7. Just some thoughts.
#18
Aquinus
Resident Wat-man
Well, AMD has always pushed the "future is fusion" motto. HSA has always been a constant theme of theirs. I will be thrilled when AMD has an APU where CPU and iGPU compute units are shared, further blurring the distinction between massively parallel workloads on GPUs and fast serial workloads on CPUs.

Either way, CPUs are fast enough that there definitely is a point of diminishing returns. A CPU will only go so fast, and you can only cram so many transistors into any given area. Also, in games that can utilize multi-core systems well, AMD isn't trailing behind all that much. Considering the upcoming consoles have 8c CPUs in them, there will be more of a push to utilize that kind of hardware. It's completely realistic for a machine to have at least 4 logical threads now, and as many as 8 for a consumer CPU. This wasn't the case several years ago.
#19
Breit
by: Prima.Vera
Aq, agree with you.
However, I have a question. Don't you think this approach is somehow not ideal for AMD, because this way a core has a lot fewer transistors than Intel's, hence the bad performance in single-threaded applications, like games for example?
I don't understand why AMD still goes for strong GPU performance, even on its so-called top CPUs, instead of having a GPU with only the basic stuff to run the Win 7 desktop and then using the available space to increase the transistor count of each core. This way I think they would finally have a CPU to compete with the i7. Just some thoughts.
I guess that's because it's technically very challenging and AMD might simply not be able to come up with something better? Just a guess... ;)
#20
Steevo
Dual socket with 16 cores can run 32 VMs in one rackmount tray. Company X has 320 employees running thin clients: that's 10 trays plus one spare, and assuming the same drive/memory/board cost, the AMD will win for $$$ reasons alone. Data entry jobs don't need Xeon-core performance for 10-key and typing.
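Steevo's sizing works out as follows. This is a sketch assuming one VM per core and 32 VMs per dual-socket tray, as the post does:

```python
import math

employees = 320        # thin-client users, per the post
vms_per_tray = 2 * 16  # dual-socket tray, 16 cores per socket, one VM per core

trays = math.ceil(employees / vms_per_tray)  # 10 full trays
print(trays + 1)  # plus the one spare tray mentioned -> 11
```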
#21
Breit
by: Aquinus
I think AMD's goal is to emphasize what CPUs do best, integer math, and let GPUs do what they do best, FP math. Not to say that a CPU shouldn't do any FP math, but if there is a lot of FP math to be done, a GPU is better optimized to do those kinds of operations.
Sure? In theory you might be right, but most consumer-grade hardware, at least, is not that great at FP math (I'm talking about DP FP, of course).
An ordinary Core i7-4770K quad-core has a DP performance of about 177 GFLOPS. That's for an 84W CPU (talking TDP). NVIDIA's 780 Ti, though, is rated at 210 GFLOPS DP performance (DP is crippled on consumer chips, I know), but this comes at the cost of a whopping 250W TDP, which is about 3x the power draw! So simple math tells me that the Haswell i7 is about twice as efficient at DP FP calculations as current-gen GPU hardware is...
Single precision might be a totally different story though. :)
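Breit's efficiency claim is easy to check from the numbers he quotes. These are vendor GFLOPS ratings and TDPs, not measured power draw, so treat the result as a rough ratio:

```python
# GFLOPS per watt from the figures quoted above (rated numbers, not measurements).
i7_4770k_eff = 177 / 84    # Core i7-4770K: 177 GFLOPS DP at 84 W TDP
gtx_780ti_eff = 210 / 250  # GTX 780 Ti: 210 GFLOPS DP at 250 W TDP

print(round(i7_4770k_eff / gtx_780ti_eff, 1))  # ~2.5x in the CPU's favor
```

On TDP alone the ratio comes out closer to 2.5x than 2x, which only strengthens the point being made about DP-crippled consumer GPUs.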
#22
james888
by: Breit
Sure? In theory you might be right, but most consumer-grade hardware, at least, is not that great at FP math (I'm talking about DP FP, of course).
An ordinary Core i7-4770K quad-core has a DP performance of about 177 GFLOPS. That's for an 84W CPU (talking TDP). NVIDIA's 780 Ti, though, is rated at 210 GFLOPS DP performance (DP is crippled on consumer chips, I know), but this comes at the cost of a whopping 250W TDP, which is about 3x the power draw! So simple math tells me that the Haswell i7 is about twice as efficient at DP FP calculations as current-gen GPU hardware is...
Single precision might be a totally different story though. :)
An AMD 7970 has ~1060 GFLOPS of DP performance at a 225W TDP. AMD GPUs are pretty darn great at compute, and AMD APUs will use AMD GPUs, not NVIDIA GPUs. So your comparison with a 780 Ti is silly.
#23
Breit
by: james888
An AMD 7970 has ~1060 GFLOPS of DP performance at a 225W TDP. AMD GPUs are pretty darn great at compute, and AMD APUs will use AMD GPUs, not NVIDIA GPUs. So your comparison with a 780 Ti is silly.
Even if it's way off topic:
An NVIDIA Titan has ~1300 GFLOPS DP at a 250W TDP, but that was not the point.
All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND your task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and back from the GPU to main memory / the CPU after processing, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.
#24
Aquinus
Resident Wat-man
by: Breit
Even if it's way off topic:
An NVIDIA Titan has ~1300 GFLOPS DP at a 250W TDP, but that was not the point.
All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND your task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and back from the GPU to main memory / the CPU after processing, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.
You might want to read up on APUs again. There are benefits to be had from having hUMA on an APU, which solves the memory-copying problem. The simple point is that CPUs are good at serial processing and GPUs are good at massively parallel ops. Depending on your workload, one may be better than the other. More often than not, though, CPUs are doing integer math and GPUs are doing floating-point math (single or double).

Basically, CPUs are good at working with data that changes a lot (relatively small amounts of data that change often). GPUs are good at processing (or transforming, if you will) a lot of data in a relatively fixed way.

So a simple example of what GPUs do best would be something like:
code:
add 9 and multiply by 2 to every element of [1 2 3 4 5 6 7 8 9 ... 1000]


A CPU, by contrast, would excel at something like adding all of those elements, or doing something else that reduces those values, as opposed to transforming them into a set of the same size as the input.
GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors - processors that can operate in parallel by running one kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. Since GPUs process elements independently, there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.

Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity, or else memory-access latency will limit the computational speedup.

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
See Stream Processing on Wikipedia.
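As a rough illustration of that definition, here is the arithmetic intensity of the add-and-multiply kernel from earlier; the operation and word counts are simplified assumptions (ignoring caches and vectorization):

```python
# Arithmetic intensity = operations performed / words of memory transferred.
ops_per_element = 2    # one add, one multiply per element
words_per_element = 2  # one word read (input), one word written (output)

intensity = ops_per_element / words_per_element
print(intensity)  # 1.0 -> low intensity: such a kernel tends to be memory-bound
```

A kernel like this moves as many words as it computes, so on a GPU its speedup is limited by memory bandwidth rather than ALU throughput, exactly the caveat the excerpt raises.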
#25
james888
by: Breit
Even if it's way off topic:
An NVIDIA Titan has ~1300 GFLOPS DP at a 250W TDP, but that was not the point.
All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND your task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and back from the GPU to main memory / the CPU after processing, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.
Isn't that what AMD's HSA and hUMA are meant to solve?
Edit: Aquinus, you speedy guy, beat me to it, and with more eloquence.