Wednesday, August 26th 2009

AMD Demos 48-core "Magny-Cours" System, Details Architecture

Initially slated loosely for 2010, AMD's 12-core "Magny-Cours" Opteron processors now have a narrower release window: the company expects to launch them within Q1 2010. AMD seems ready with the processors, and has demonstrated a 4-socket, 48-core machine based on them. Magny-Cours is symbolically significant as one of AMD's last processor designs before it moves to "Bulldozer", its next processor architecture built from the ground up. Its release will give AMD a competitor to the multi-core processors Intel will have on the market at that point.

At the IEEE Hot Chips 21 conference, AMD's Pat Conway presented the Magny-Cours design, which includes several key changes that boost parallelism and efficiency in high-density computing environments. Key features include: a move to Socket G34 (from Socket F), 12 cores, a multi-chip module (MCM) package housing two 6-core dies (nodes), a quad-channel DDR3 memory interface, and 6.4 GT/s HyperTransport 3 links with redesigned multi-node topologies. Let's put some of these under the magnifying glass.
Socket and Package
Loading 12 cores onto a single package while maintaining sufficient system and memory bandwidth would have been a challenge. With the six-core Istanbul monolithic die already measuring 346 mm² and carrying 904 million transistors, making a monolithic die twice that size is inconceivable, at least on the existing 45 nm SOI process. The company finally dropped its contemptuous stance on multi-chip modules, which it ridiculed back in the days of the Pentium D, and designed one of its own. Since each die is a little more than a CPU (it has its own dual-channel memory controller), AMD chooses to call it a "node": a cluster of six processing cores that connects to its neighbour on the same package using one of its four 16-bit HyperTransport links. The remaining links are available to connect to neighbouring sockets and the rest of the system in 2P and 4P multi-socket topologies.
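Taking the description above at face value, the per-package resources fall out of simple counting. The sketch below is illustrative only: the class names are hypothetical, and the doubled-Istanbul figures are a naive estimate, not anything AMD has published.

```python
from dataclasses import dataclass, field

# Illustrative resource count for one Magny-Cours package, using the
# figures given in the article. Class names are hypothetical.

@dataclass(frozen=True)
class Node:
    """One six-core die ("node")."""
    cores: int = 6
    ddr3_channels: int = 2   # dual-channel memory controller per die
    ht_links: int = 4        # four 16-bit HyperTransport links per die

@dataclass(frozen=True)
class Package:
    """Two nodes on one MCM, joined by one HT link from each node."""
    node: Node = field(default_factory=Node)
    nodes: int = 2
    intra_package_links_per_node: int = 1

    @property
    def cores(self) -> int:
        return self.node.cores * self.nodes

    @property
    def ddr3_channels(self) -> int:
        return self.node.ddr3_channels * self.nodes

    @property
    def external_ht_links(self) -> int:
        # Links left over for neighbouring sockets and system I/O.
        return (self.node.ht_links - self.intra_package_links_per_node) * self.nodes

pkg = Package()
print(pkg.cores, pkg.ddr3_channels, pkg.external_ht_links)  # 12 4 6

# Naive estimate for a hypothetical monolithic 12-core die: double Istanbul.
ISTANBUL_AREA_MM2 = 346
ISTANBUL_TRANSISTORS = 904_000_000
print(2 * ISTANBUL_AREA_MM2, 2 * ISTANBUL_TRANSISTORS)  # 692 1808000000
```

Under this simplified reading, each package offers 12 cores, four DDR3 channels and six HyperTransport links left over for multi-socket wiring, while a monolithic equivalent would land somewhere around 692 mm², which is the size problem described above.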

The socket itself gets a revamp, from the existing 1,207-pin Socket F to the 1,974-pin Socket G34. The higher pin count accommodates the HyperTransport links, four DDR3 memory channels, and other low-level I/O.

Multi-Socket Topologies
A Magny-Cours Opteron processor can work in 2P and 4P systems, for up to 48 physical processing cores. The multi-socket topologies AMD devised ensure high inter-core and inter-node bandwidth without depending on the system chipset's I/O for the task. In the 2P topology, one node from each socket uses one of its 16-bit HyperTransport links to connect to the system, another to connect to the neighbouring node on the package, and the remaining links to connect to the nodes of the neighbouring socket. Indications are that AMD will use 6.4 GT/s links (probably HyperTransport 3.1). In 4P systems, it uses 8-bit links instead to connect to the three other sockets, but ensures each node is connected to every other node, either directly or indirectly over the MCM. With a total of 16 DDR3 DRAM controllers (DCTs) in a 4P system, a staggering 170.4 GB/s of cumulative memory bandwidth is available.
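The 4P connectivity claim and the aggregate bandwidth figure can both be sanity-checked with a small model. This is a minimal sketch, not AMD's actual link map: the wiring (each die tied to its MCM partner and to the same-index die in each of the other three sockets over x8 links) and the DDR3-1333 speed grade are assumptions.

```python
from collections import deque
from itertools import product

SOCKETS = 4
DIES_PER_SOCKET = 2

def build_4p_topology():
    """Hypothetical 4P link map; nodes are (socket, die) pairs.

    Assumed wiring (not AMD's published map): each die links to its MCM
    partner, plus one x8 link to the same-index die in each other socket.
    """
    nodes = list(product(range(SOCKETS), range(DIES_PER_SOCKET)))
    graph = {n: set() for n in nodes}
    for s, d in nodes:
        graph[(s, d)].add((s, 1 - d))        # partner die on the same MCM (2 dies)
        for t in range(SOCKETS):
            if t != s:
                graph[(s, d)].add((t, d))    # x8 link to another socket
    return graph

def hops_from(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

graph = build_4p_topology()
# Diameter: the worst-case hop count between any two nodes.
diameter = max(max(hops_from(graph, n).values()) for n in graph)
print(len(graph), diameter)  # 8 2

# Aggregate peak DRAM bandwidth, assuming DDR3-1333 on every channel.
CHANNELS_PER_SOCKET = 4            # two per die x two dies
TRANSFERS_PER_SEC = 4e9 / 3        # DDR3-1333 = 1333.33 MT/s (assumed grade)
BYTES_PER_TRANSFER = 8             # 64-bit channel
per_channel_gb_s = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9
channels = SOCKETS * CHANNELS_PER_SOCKET
total = channels * per_channel_gb_s
print(f"{channels} channels, {total:.1f} GB/s")  # 16 channels, 170.7 GB/s
```

With this assumed wiring, any node reaches any other in at most two hops, either directly or through its MCM partner. The bandwidth total lands within about 0.2% of the quoted 170.4 GB/s; the gap is presumably just rounding of the per-channel figure.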

Finally, AMD projects up to 100% performance scaling with Magny-Cours compared to Istanbul. Its "future silicon", slated for 2011, is projected to almost double that again.

Source: INPAI

104 Comments on AMD Demos 48-core "Magny-Cours" System, Details Architecture

#1
mdm-adph
FordGT90Concept said:
I don't know...

Most applications most people use do not make effective use of parallel processing. If a dual-core is more than enough for you, you might as well throw the difference away on a 48-core machine. Parallel processing really doesn't help anyone but the server/super computing market.

I think it is only a matter of time before people catch on that more cores aren't necessarily better and focus will return to making each individual core faster. Even your most basic of word processors would benefit more from a single core with huge IPS than from a multi-core processor.

The server market and the consumer market did clash for a while, but I think it is only a matter of time before they go in different directions again.
Yeah, you do remember that Bill Gates was wrong about the whole 640k thing, right? :wtf:

Massive multi-core processing is the way of the future, because there's a limit to how fast you can get a single core to go. I'm not being mean, but if you're not believing this by now, you're deluding yourself.
#2
Mussels
Moderprator
When they get a 16-core CPU under 100W, it'll start appearing in desktops.

When they get it under 50W, it'll start appearing in Dells.

It's a when, not an if :)
#3
Jizzler
We're so close to `full virtualization` that a machine like this isn't really...... ok, it would still be overkill. But more realistically, a nice 2P box, say a couple Xeons (8C/16T), 24GB, 4 video cards, dedicated RAID controller and a slew of drives would make for one nice multi-user box.

Box stays out of the way, all you have are personal and purpose (kitchen, etc) terminals.
#4
Mussels
Moderprator
Jizzler said:
We're so close to `full virtualization` that a machine like this isn't really...... ok, it would still be overkill. But more realistically, a nice 2P box, say a couple Xeons (8C/16T), 24GB, 4 video cards, dedicated RAID controller and a slew of drives would make for one nice multi-user box.

Box stays out of the way, all you have are personal and purpose (kitchen, etc) terminals.
Actually, I was discussing that with a friend the other day; that's how I see things going as well.
One PC in the home does all the work, the rest just get it streamed.

We can stream 1080p content from a PC to a 360 (re-encoding to a compatible format if needed); it won't be any harder for a game to be done the same way.

"But Mussels, that would suck! If my brother started encoding a video while I was gaming, I'd lag out!"
Well, how much does it suck when he flushes the toilet while you're in the shower? People live with compromises for convenience/cheapness.
#5
Imsochobo
That's about right, the last two posts.

This will run at about 75W at around 2.3-2.5 GHz, nothing more at 45 nm.
These will definitely come with a 32 nm shrink.

These will run very cool for the number of cores they have.

Cache is reworked, memory latency is decreased, and there's quad-channel memory per CPU, meaning 8 memory slots, so they double the memory bandwidth.
This is possible because doubling the number of CPU dies per package also doubles the integrated memory controllers (IMCs) that come with them.

*Wonders how Intel responds.*
#6
pr0n Inspector
mdm-adph said:
Yeah, you do remember that Bill Gates was wrong about the whole 640k thing, right? :wtf:

Massive multi-core processing is the way of the future, because there's a limit to how fast you can get a single core to go. I'm not being mean, but if you're not believing this by now, you're deluding yourself.
He never said that.

On topic, I believe future processors will present themselves as much simpler units than they really are.
#7
Jizzler
Mussels said:
Actually, I was discussing that with a friend the other day; that's how I see things going as well.
One PC in the home does all the work, the rest just get it streamed.

We can stream 1080p content from a PC to a 360 (re-encoding to a compatible format if needed); it won't be any harder for a game to be done the same way.

"But Mussels, that would suck! If my brother started encoding a video while I was gaming, I'd lag out!"
Well, how much does it suck when he flushes the toilet while you're in the shower? People live with compromises for convenience/cheapness.
Yup. I'm going with "instead of 4 x $1K computers, I'll put $4K into a single box". But people could certainly save money and still have a great experience. As other posters have stated, i7s have a lot of oomph, and could probably support two people rather well. Got $2K for two machines? Put $1,500 into one.

A big help is fast storage... just built two of these for work:



Has an Adaptec 5805 with 8 x WD RE3. Does about 600/400 read/write and is hella responsive. I've run drive benchmarks in one VM while using another, and I couldn't perceive any loss of performance. All the while WCG is running (4T @ 100%). I love these machines :)
#8
Imsochobo
We just got three Z800s with 192 GB of memory and 2x 3.2 GHz Core i7.
The designers who use them complain about memory and graphics power, not CPU power; they didn't complain about CPU power back with 2x dual-core Xeons either.
#9
Jizzler
HP right? Was just looking at them earlier today (mostly the 400, 600 series). Though I'll probably build IT's new workstations as well.

Heh, I hope there's no complaining now. :D
#10
Imsochobo
Z800, yep.

Well, the 192 GB/92 GB machines have no issues except GPU performance; snap in a 4870 X2 and shut 'em up ;D

Well, 32 GB is an issue...
They could do:
Single-socket quad @ 3 GHz.
4870 X2.
32 GB memory; it does a better job because of the better video card.

So to put it this way, they have never complained about CPU power, apparently.
Memory and GPU power are the issue.

But the chiefs don't want the high-memory-capacity computers, so they complained about the 32 GB computer.

To put it this way:
It takes them OVER TWO hours to load some CAD files and drawings due to memory restrictions (16 GB of memory, that is). They complain because they're tired of browsing through all the newspapers on the webby.
#11
Melvis
Sweeeeeeeeeeeeeeeeet!!

I'd love to bring up Task Manager in front of my m8s when they're looking at the screen lol
#12
FordGT90Concept
"I go fast!1!11!1!"
mdm-adph said:
Massive multi-core processing is the way of the future, because there's a limit to how fast you can get a single core to go. I'm not being mean, but if you're not believing this by now, you're deluding yourself.
You have to understand how programs work to understand that multi-core, most of the time, is not a good thing. It adds many layers of complexity, whereas a single, faster core gets the same performance with pure simplicity on the coding end.

Simply put, if you got a nail and you need to hammer it in, would you rather have one really big hammer or 48 tiny hammers?


There are a few occasions where multiple cores are good, but those few times are exactly that: a few (no more than four). Faster cores are preferred over SMT.


Mussels said:
We can stream 1080p content from a PC to a 360 (re-encoding to a compatible format if needed); it won't be any harder for a game to be done the same way.
Except the high bandwidth (186.624 MB/s for 1920x1080 24-bit color and 30 FPS) and, if wireless, latency.
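The uncompressed figure above is plain arithmetic: pixels per frame × bytes per pixel × frames per second. A minimal sketch (the helper name is hypothetical, and it assumes 1 MB = 10^6 bytes, the convention that yields 186.624):

```python
def uncompressed_rate_mb_s(width, height, bits_per_pixel, fps):
    """Raw video data rate in megabytes per second (1 MB = 10**6 bytes)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps / 1e6

rate = uncompressed_rate_mb_s(1920, 1080, 24, 30)
print(rate)       # 186.624
print(rate * 8)   # 1492.992 (megabits per second)
```

At roughly 1.5 Gbit/s, the raw stream is about one and a half times what a gigabit Ethernet link can carry, before any wireless overhead.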
#13
Geofrancis
This system is like the dual six-core Xeon systems that Intel sells. Those use a multi-chip setup with 3x dual-core dies. AMD has done well to get 2x six-core CPUs on a die.

The problem with them was that it all went over the FSB, so they were limited to 2 sockets because of a lack of bandwidth between the cores. AMD doesn't have that problem because of its HyperTransport connections between all the cores.

Intel will probably build a similar system soon with six-core i7 Xeons, as they have QPI links that get around the FSB problems they had before.

It makes you think they decided they'd made a mistake making their native quads, then saw Intel's multi-chip quads and thought, hmm, we could do that with our quads!
#14
Swansen
tidas said:
yes....but can it crysis?
LOL, better question is will it blend??? :laugh::roll:
#16
Jizzler
Geofrancis:



Oh... the bandwidth. I'm getting tingly.
#17
FordGT90Concept
"I go fast!1!11!1!"
Geofrancis said:
This system is like the dual six-core Xeon systems that Intel sells. Those use a multi-chip setup with 3x dual-core dies. AMD has done well to get 2x six-core CPUs on a die.
Neither Dunnington nor Istanbul chips are MCM. The only problems associated with adding more cores are heat and power. Like I said, I think these things are probably in the neighborhood of 230W, which is massive. As far as I know, the highest wattage on a retail processor currently is IBM's POWER6 at 160W. Most consumer processors are 130W or less.
#18
thezorro
Jizzler said:
Geofrancis:


Oh... the bandwidth. I'm getting tingly.
Nice PowerPoint, but it's just paper.

just my two cents.
#19
Jizzler
I'm confused...

Intel has already shown working 4-way and 8-way Nehalem-EX systems. What's wrong with the diagram?
#20
Unregistered
The only reason Intel pulled ahead of AMD was because they entered into a pact with Satan. Supposedly the deal expires 12/21/2012 and Intel's HQ will be swallowed whole by a caldera that suddenly and mysteriously appears and then vanishes.

Of course having Rectal Hector in charge of AMD didn't hurt, but I'm giving the edge to Satan.
#21
Mussels
Moderprator
FordGT90Concept said:

Except the high bandwidth (186.624 MB/s for 1920x1080 24-bit color and 30 FPS) and, if wireless, latency.
I never said uncompressed...
#23
FordGT90Concept
"I go fast!1!11!1!"
Mussels said:
I never said uncompressed...
That means massive overhead on the compressing and uncompressing ends. If you mean some lossy format, the picture won't be nearly as good either. Then again, people traded their higher-quality CRT monitors for el cheapo LCDs and aren't complaining, so it is possible they will give it up if the price is right.

Regardless, this trend of adding more and more cores won't persist forever. Very few tasks benefit from SMT or even AMT.


Edit: Just remember, of the same architecture, a dual-core at 3.2 GHz is faster than a quad-core at 1.6 GHz. The more cores you have, the more overhead is involved in keeping them all busy.
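To put the dual-core vs. quad-core comparison in numbers, here is a minimal sketch using Amdahl's law (a model the thread doesn't name), under idealised assumptions: identical IPC on both chips, and no threading overhead, memory contention or turbo effects. `parallel_fraction` is the share of the work that can run in parallel; the function name is hypothetical.

```python
def relative_time(cores, ghz, parallel_fraction):
    """Amdahl's law runtime for one unit of work, scaled by clock speed.

    Assumes identical IPC for both chips and zero threading overhead,
    memory contention or turbo effects.
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / ghz

for p in (0.0, 0.5, 0.9, 0.99, 1.0):
    dual = relative_time(2, 3.2, p)
    quad = relative_time(4, 1.6, p)
    print(f"p={p:.2f}  dual-core 3.2 GHz: {dual:.4f}  quad-core 1.6 GHz: {quad:.4f}")

# Even 48 cores are capped by the serial part: with p = 0.9, the speedup
# over one core is 1 / (0.1 + 0.9/48), roughly 8.4x.
print(round(1 / relative_time(48, 1.0, 0.9), 2))
```

Under these assumptions the 3.2 GHz dual-core is faster for every p below 1 and only ties on a perfectly parallel workload; real synchronisation overhead would only widen that gap.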
#24
Mussels
Moderprator
FordGT90Concept said:
That means massive overhead on the compressing and uncompressing ends. If you mean some lossy format, the picture won't be nearly as good either. Then again, people traded their higher-quality CRT monitors for el cheapo LCDs and aren't complaining, so it is possible they will give it up if the price is right.

Regardless, this trend of adding more and more cores won't persist forever. Very few tasks benefit from SMT or even AMT.


Edit: Just remember, of the same architecture, a dual-core at 3.2 GHz is faster than a quad-core at 1.6 GHz. The more cores you have, the more overhead is involved in keeping them all busy.
Compress in H.264, decompress with hardware acceleration.

You seem to think it's hard, but it's a feature built into the latest Windows 7: it can recode HD video and stream it to other PCs (or consoles/extenders) on the fly. You're seeing it as starting from nothing; I'm seeing it as an application of an existing tech.
#25
Steevo
Windows 7 has multi-user support available, as have all OSes since XP (Pro). It just needs a bit of tweaking.