Tuesday, February 19th 2013

AMD "Jaguar" Micro-architecture Takes the Fight to Atom with AVX, SSE4, Quad-Core

AMD has hedged its low-power CPU bets on the "Bobcat" micro-architecture for the past two years. In that time, Intel's Atom line of low-power chips caught up in power-efficiency, CPU performance and, to an extent, iGPU performance, and recent models even feature out-of-order execution. AMD has now unveiled its next-generation "Jaguar" low-power CPU micro-architecture for APUs in the 5 W to 25 W TDP range, targeting everything from tablets to entry-level notebooks and nettops.

At its presentation at ISSCC 2013, the conference's 60th edition, AMD detailed "Jaguar," revealing a few killer features that could restore the company's competitiveness in the low-power CPU segment. To begin with, APUs with CPU cores based on this micro-architecture will be built on TSMC's 28-nanometer HKMG process. Jaguar allows for up to four x86-64 cores. The four cores, unlike Bulldozer modules, are completely independent, and only share a 2 MB L2 cache.

"Jaguar" x86-64 cores feature a 40-bit wide physical address (Bobcat features 36-bit), 16-byte/cycle load/store bandwidth, which is double that of Bobcat, a 128-bit wide FPU data-path, which again is double that of Bobcat, and about 50 percent bigger scheduler queues. The instruction set is where AMD is looking to rattle Atom. Not only does Jaguar feature out-of-order execution, but also ISA instruction sets found on mainstream CPUs, such as AVX (advanced vector extensions), SIMD instruction sets such as SSSE3, SSE4.1, SSE4.2, and SSE4A, all of which are quite widely adopted by modern media applications. Also added is AES-NI, which accelerates AES data encryption. In the efficiency department, AMD claims to have improved its power-gating technology that completely cuts power to inactive cores, to conserve battery life.

71 Comments on AMD "Jaguar" Micro-architecture Takes the Fight to Atom with AVX, SSE4, Quad-Core

#1
Aquinus
Resident Wat-man
by: Ikaruga
No, and I don't really understand why I would joke about RAM timings on my favorite enthusiast site. Do you understand that I was citing the actual latency of the chip itself, and not the latency the MC will have to deal with when accessing the memory?
For example, a typical DDR3@1600 module has about 12ns latency in a modern PC.
You mean the 32ns refresh? That's not access speed, my friend; that is how often a bit in a DRAM cell is refreshed. All DRAM needs to be refreshed, since data is stored in a capacitor that needs to be replenished as caps leak when they're disconnected from active power. Other than that, I see no mention of 32ns there.

That "32ns" sounds a lot like tRFC on DDR3 chips, not access latency.
Posted on Reply
#2
Ikaruga
by: McSteel
I believe that AIDA does round-trip latency, and Ikaruga (love that game btw) probably claims that the GDDR5 used has a CL of 32ns. 1600 MT/s CL9 DDR3 has a CL of ~11.25ns max, close to three times less.

Still, with some intelligent queues and cache management, this won't be too much of a problem.


## EDIT ##
Have I ever mentioned how I hate it when I get distracted when replying, only to find out I made myself look like an idiot by posting the exact same thing as the person before me? Well, I do.
Sorry Ikaruga.
Yes, I meant that speed, sorry for my English :shadedshu
Posted on Reply
#3
Harlequin_uk
So Sony will be accessing it with their own version of LibGCM, which along with OCL 1.2 means some awesome and better control over the hardware, something they can't really do now in the PC world as the hardware is variable. We could see `on the fly` changes to core usage depending on whether there is a high physics load or a cut-scene movie.
Posted on Reply
#4
Aquinus
Resident Wat-man
by: McSteel
claims that the GDDR5 used has a CL of 32ns. 1600 MT/s CL9 DDR3 has a CL of ~11.25ns max, close to three times less.
Isn't that kind of moot since GDDR5 can run at clocks that are 3 times faster than DDR3-1600? It's the same deal that happened when moving from DDR to DDR2 and to DDR3. Latencies increased but access times remained the same because the memory frequency increased which compensates for it and at the same time provides more bandwidth.

Yeah, there might be more latency, it's possible, but I don't think it will make that much of a difference. Also with more bandwidth you can load more data into cache in one clock than DDR3. So I think the benefits will far outweigh the costs.
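
To put rough numbers on that: CAS latency in nanoseconds is the cycle count divided by the I/O clock, and the I/O clock is half the transfer rate. A quick sketch follows. Only the DDR3-1600 CL9 row comes from this thread; the DDR and DDR2 timings are typical retail values picked for illustration, and `cas_latency_ns` is a made-up helper name.

```python
def cas_latency_ns(cl_cycles, transfer_rate_mts):
    """CAS latency in ns: CL cycles divided by the I/O clock (half the MT/s rate)."""
    # cycles / (rate / 2 MHz) * 1000 ns simplifies to cycles * 2000 / rate
    return cl_cycles * 2000 / transfer_rate_mts


# (name, CL in cycles, transfer rate in MT/s); DDR and DDR2 rows are illustrative.
modules = [
    ("DDR-400 CL3", 3, 400),
    ("DDR2-800 CL5", 5, 800),
    ("DDR3-1600 CL9", 9, 1600),
]

for name, cl, rate in modules:
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
```

The cycle counts triple across those three rows while the time in nanoseconds stays in the same ballpark, which is the point being made about each new generation.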
Posted on Reply
#5
Ikaruga
by: Aquinus
Isn't that kind of moot since GDDR5 can run at clocks that are 3 times faster than DDR3-1600? It's the same deal that happened when moving from DDR to DDR2 and to DDR3. Latencies increased but access times remained the same because the memory frequency increased which compensates for it and at the same time provides more bandwidth.

Yeah, there might be more latency, it's possible, but I don't think it will make that much of a difference. Also with more bandwidth you can load more data into cache in one clock than DDR3. So I think the benefits will far outweigh the costs.
I don't think the price is the reason why we still don't use GDDR5 as main memory in PCs; after all, they are selling graphics cards for much more than a GDDR5 RAM kit or a supporting chipset/architecture would cost. I have not really read anything about overcoming the GDDR5 latency issue in the past, so that's what made me curious.

by: McSteel
.....and Ikaruga (love that game btw)
:toast:
Posted on Reply
#6
EpicShweetness
by: sergionography
we also know it will have 18 GCN clusters = 1152 GCN cores rated at 800 MHz
and it was rated at 1.84 TFLOPS or something actually
18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.

So 45+45+ say 45 again (CPU) is 135 W+!! Something has to be amiss.
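
For what it's worth, the quoted figures are easy to sanity-check. Assuming 64 stream processors per GCN compute unit and 2 FLOPs per stream processor per clock (one fused multiply-add), which is how peak single-precision throughput is usually quoted, 18 clusters at 800 MHz work out as below. The function and constant names are only for illustration.

```python
SHADERS_PER_GCN_CU = 64   # assumption: stream processors per GCN compute unit
FLOPS_PER_CLOCK = 2       # assumption: one fused multiply-add counts as 2 FLOPs


def shader_count(compute_units):
    """Total stream processors for a given number of GCN compute units."""
    return compute_units * SHADERS_PER_GCN_CU


def peak_gflops(compute_units, clock_mhz):
    """Peak single-precision throughput in GFLOPS."""
    return shader_count(compute_units) * FLOPS_PER_CLOCK * clock_mhz / 1000


print(shader_count(18))                        # 1152 cores
print(round(peak_gflops(18, 800) / 1000, 2))   # 1.84, in TFLOPS
```

So the 1152 and 1.84 figures agree with each other, with the 1.84 being in TFLOPS.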
Posted on Reply
#7
Aquinus
Resident Wat-man
by: EpicShweetness
18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.

So 45+45+ say 45 again (CPU) is 135 W+!! Something has to be amiss.
They already said that the graphics power is going to be similar to a 7870, didn't they?
Posted on Reply
#8
sergionography
by: Aquinus
You mean the 32ns refresh? That's not access speed, my friend; that is how often a bit in a DRAM cell is refreshed. All DRAM needs to be refreshed, since data is stored in a capacitor that needs to be replenished as caps leak when they're disconnected from active power. Other than that, I see no mention of 32ns there.

That "32ns" sounds a lot like tRFC on DDR3 chips, not access latency.
And that is exactly what latency is, though: the RAM issues the data to the CPU, and after 32 ns it refreshes to send the next batch. GPUs are highly parallel, so they aren't as affected by latency, as most GPUs just need a certain amount of data to render while the RAM sends the next batch. CPUs are much more random and general-purpose than GPUs. For example, certain calculations would be issued from the RAM, but in order for the CPU to complete the process it must wait for a second batch of data; in such a case the CPU would wait for another 32 ns.

This is a big issue with CPUs nowadays, but I think it can be easily masked with a large enough L2 cache for the Jaguar cores (I think having 8 of them means 4 MB of cache that can be shared, meaning one core can have all 4 MB if it needs to). Another thing I can think of is whether all 8 GB refreshes at once, or whether Sony will allow the RAM to work in turns to feed the CPU/GPU more dynamically rather than in big chunks of data. (Note that Bulldozer/Piledriver have relatively large pools of L3 and L2 cache to mask their higher latency, and Steamroller adding a larger L1 cache as well says something, not to mention how much L3 cache affects Piledriver in Trinity, which is quite a bit slower than FX Piledriver, while Phenom II vs. Athlon II barely showed any effect due to its lower latency.)
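
The cache-masking argument can be put in rough numbers with the usual average memory access time formula: hit time plus miss rate times miss penalty. Every value below is made up for illustration, with the disputed 32 ns figure taken as the miss penalty purely for the sake of argument, and `amat_ns` is a hypothetical helper.

```python
def amat_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical: a bigger L2 halves the miss rate while the hit time stays at 5 ns.
smaller_cache = amat_ns(hit_time_ns=5.0, miss_rate=0.10, miss_penalty_ns=32.0)
larger_cache = amat_ns(hit_time_ns=5.0, miss_rate=0.05, miss_penalty_ns=32.0)

print(f"smaller cache: {smaller_cache:.1f} ns per access")
print(f"larger cache:  {larger_cache:.1f} ns per access")
```

The fewer accesses that miss the cache, the less the DRAM latency shows up in the average, whatever the real penalty turns out to be.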

by: EpicShweetness
18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.

So 45+45+ say 45 again (CPU) is 135 W+!! Something has to be amiss.
Jaguar gets 576 what?
And the highest-end Jaguar APU with its graphics cores (128 of them?) is rated at 25 W, and with a much higher clock speed than 1.6 GHz (AMD in their presentation said Jaguar will clock 10-15% higher than what Bobcat would've clocked at 28 nm), so you're talking at least over 2 GHz.
And consider Llano, with 400 outdated Radeon cores and 4 K10.5 cores clocked at at least 1.6 GHz before turbo; expect Jaguar to be much more efficient on a new node and a power-efficient architecture, say 25 W max for the CPU cores only, if not less. That leaves them with 75-100 W of headroom to work with (think HD 7970M rated at 100 W, that's 1280 GCN cores at 800 MHz; this would have 1152 GCN cores at 800 MHz, and after a year of optimization it's easily at 75 W), adding up to 100-125 W, which is very reasonable. Since it's an APU, you just need one proper cooler; think of graphics cards rated at 250 W only requiring one blower fan and a dual-slot cooler to cool both the GDDR5 chips and the GPU. In other words, the motherboard and the chip can be as big as an HD 7970 (but with 100-125 W you only need something the size of an HD 7850, which is rated at 110-130 W), but then of course add the BR drive and other goodies. Main point is, cooling is no problem unless multiple chips are involved, requiring cooling the case in general rather than the chip itself using a graphics-card-style cooler.
by: Aquinus
They already said that the graphics power is going to be similar to a 7870, didn't they?
More like between HD 7850 and HD 7970M. It seems 800 MHz is the sweet spot in terms of performance/efficiency/die size, considering an HD 7970M with 1280 GCN cores at 800 MHz is at 100 W, versus 110 W measured/130 W rated on an HD 7850 with 1024 cores at 860 MHz.
Not to mention the mobile Pitcairn loses 30-75 W measured/rated when clocked at 800 MHz (advertised TDP on desktop Pitcairn is 175 W, but it measured at 130 W according to the link I have below).

http://www.guru3d.com/articles_pages/amd_radeon_hd_7850_and_7870_review,6.html
Here is a reference in regards to the measured TDP, because the TDP advertised by AMD is higher, but also consider other parts on the board and allowing overclock headroom or whatever the case is.
Posted on Reply
#9
Prima.Vera
by: Ikaruga
I don't think the price is the reason why we still don't use GDDR5 as main memory in PCs; after all, they are selling graphics cards for much more than a GDDR5 RAM kit or a supporting chipset/architecture would cost. I have not really read anything about overcoming the GDDR5 latency issue in the past, so that's what made me curious.

:toast:
Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM only to be used in graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it... ;)
Posted on Reply
#10
Aquinus
Resident Wat-man
by: Prima.Vera
Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM only to be used in graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it... ;)
Read the entire thread before you jump to conclusions, this is mainly stemming from the PS4 discussion.

GDDR5 itself can do whatever it wants, there are no packages or CPU IMCs that will handle it though, that does not mean that it can not be used. PS4 is lined up to use GDDR5 for system and graphics memory and I suspect that Sony isn't just saying that for shits and giggles.

Also, it's not all that different. Latencies are different and performance is (somewhat, not a ton) optimized for bandwidth over latency, but other than that, communication is about the same, sans two control lines for reading and writing. It's a matter of how that data is transmitted, but your statement here is really actually wrong.

Just because devices don't use a particular bit of hardware to do something doesn't mean that you can't use that hardware to do something else. For example, for the longest time low-voltage DDR2 was used in phones and mobile devices and not DDR3. Does that mean that DDR3 will never get used in smartphones? Most of us know the answer to that and it's a solid no; GDDR5 is no different. Just because it works best on video cards doesn't mean that it cannot be used with a CPU that would be built with a GDDR5 memory controller.
Posted on Reply
#11
Ikaruga
by: Prima.Vera
Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM only to be used in graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it... ;)
Please consider switching from "write-only mode" on the forum, and read my comments if you reply to me:

many thanks:toast:
Posted on Reply
#12
btarunr
Editor & Senior Moderator
by: Prima.Vera
Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as a main memory.
Oh but you can. PS4 uses GDDR5 as system memory.
Posted on Reply
#13
Prima.Vera
by: Ikaruga
Please consider switching from "write-only mode" on the forum, and read my comments if you reply to me:

many thanks:toast:
Please don't tell me what to do, or what I am allowed to do or not.

many thanks:toast:

by: btarunr
Oh but you can. PS4 uses GDDR5 as system memory.
PS4 is NOT a PC... But if what you all say is true, then why has nobody introduced GDDR5 for PC?? It has been on video cards for a long time. And why is it called Graphics DDR then?
Posted on Reply
#14
Frick
Fishfaced Nincompoop
PS4 is pretty much a custom PC.

EDIT: With a custom OS.
Posted on Reply
#15
Ikaruga
by: Prima.Vera
Please don't tell me what to do, or what I am allowed to do or not.
Again :shadedshu. I think if you would start reading what I write, perhaps you would see that I did not tell you "what to do, or what you are allowed to do or not.", I only asked you to consider it:
by: Ikaruga
Please consider....
Posted on Reply
#16
btarunr
Editor & Senior Moderator
by: Prima.Vera
PS4 is NOT a PC... But if what you all say is true, then why has nobody introduced GDDR5 for PC?? It has been on video cards for a long time. And why is it called Graphics DDR then?
The CPU and software are completely oblivious to memory type. The only component that really needs to know how the memory works at the physical level is the integrated memory controller. To every other component, memory type is irrelevant. It's the same "load" "store" "fetch" everywhere else.

Just because GDDR5 isn't a PC memory standard doesn't mean it can't be used as system main memory. It would have comparatively high latency to DDR3, but it still yields high bandwidth. GDDR5 stores data in the same ones and zeroes as DDR3, SDR, and EDO.
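
A toy model of that point, as a sketch rather than a description of any real controller: the "software" below only ever issues stores and loads, and the memory type changes nothing except how long each access takes. The class and function names are invented for illustration, and the 12 ns and 32 ns latencies are simply the figures argued over earlier in this thread, not measured values.

```python
class MemoryController:
    """The one layer that knows which kind of DRAM sits behind it."""

    def __init__(self, memory_type, access_latency_ns):
        self.memory_type = memory_type
        self.access_latency_ns = access_latency_ns
        self.elapsed_ns = 0.0
        self._cells = {}

    def store(self, address, value):
        self.elapsed_ns += self.access_latency_ns
        self._cells[address] = value

    def load(self, address):
        self.elapsed_ns += self.access_latency_ns
        return self._cells.get(address, 0)


def fill_and_sum(memory, values):
    """Stand-in for software: plain stores and loads, no knowledge of memory type."""
    for address, value in enumerate(values):
        memory.store(address, value)
    return sum(memory.load(address) for address in range(len(values)))


ddr3 = MemoryController("DDR3", access_latency_ns=12.0)    # illustrative latency
gddr5 = MemoryController("GDDR5", access_latency_ns=32.0)  # illustrative latency

for memory in (ddr3, gddr5):
    total = fill_and_sum(memory, [1, 2, 3, 4])
    print(memory.memory_type, total, memory.elapsed_ns)
```

Both runs return the same sum from the same code path; only the accumulated time differs, which is the latency-versus-compatibility trade being discussed.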
Posted on Reply
#17
Aquinus
Resident Wat-man
by: Prima.Vera
And why is it called Graphic DDR then?
Because it is optimized for graphics, not exclusively for graphics.

You're just digging yourself into a hole.
Posted on Reply
#18
tokyoduong
by: Ikaruga
Strange they do this before Sony's PS4 announcement tomorrow. Both new consoles from MS and Sony gonna have these new cores in their CPUs, I thought Sony would ask them for all the "flare" they can get. It's also strange only four cores allowed on the PC side while there will be more in the consoles (assuming that all the leaks are correct ofc).
Game consoles are not mobile. These are designed for mobile devices with 5-25W power envelope.

by: btarunr
The CPU and software are completely oblivious to memory type. The only component that really needs to know how the memory works at the physical level is the integrated memory controller. To every other component, memory type is irrelevant. It's the same "load" "store" "fetch" everywhere else.

Just because GDDR5 isn't a PC memory standard doesn't mean it can't be used as system main memory. It would have comparatively high latency to DDR3, but it still yields high bandwidth. GDDR5 stores data in the same ones and zeroes as DDR3, SDR, and EDO.
^^tru dat!

by: Aquinus
Because it is optimized for graphics, not exclusively for graphics.

You're just digging yourself into a hole.
By graphics you probably meant bandwidth which is correct.

I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.
Posted on Reply
#19
Mussels
Moderprator
by: tokyoduong

I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.
My thoughts as well. These aren't meant to be generic multipurpose machines; they're meant to be gaming consoles with pre-set roles, and time to code each game/program to run specifically on them.


This gives game devs the ability to split that 8 GB up at will between CPU and GPU. That could really extend the life of the console and its capabilities.
Posted on Reply
#20
Aquinus
Resident Wat-man
by: tokyoduong
By graphics you probably meant bandwidth which is correct.
That goes without saying. GDDR is optimized for graphics, which performs best under high-bandwidth, high(er)-latency situations.
by: tokyoduong
I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.
I'm not willing to go that far, but I'm sure they will have stuff to mitigate any slowdown it may cause such as intelligent caching and pre-fetching.
Posted on Reply
#21
Ikaruga
by: Mussels
My thoughts as well. These aren't meant to be generic multipurpose machines; they're meant to be gaming consoles with pre-set roles, and time to code each game/program to run specifically on them.
Yes, that's one of the advantages of working on closed systems like consoles. It helps a lot, both in development speed and efficiency-wise, but the development procedure is still the same.

by: Mussels
this gives game devs the ability to split that 8GB up at will, between CPU and GPU. that could really extend the life of the console, and its capabilities.
You can't do anything else but split unified memory (this is also the case with APUs and IGPs on the PC ofc), that's why it's called unified.
The N64 was a console and developers actually released titles on it, but this doesn't change how horrid the memory latency really was on that system, and how much extra effort and work the programmers had to put in to get over that huge limitation (probably the main reason why Nintendo introduced 1T-SRAM in the GameCube, which was basically eDRAM on die).
If you really want to split unified memory into CPU and GPU memory (you can't btw, but let's assume you could), it's extremely unlikely that developers will use more than 1-2 GB for "video memory" in the PS4, not only because the bandwidth would not be enough to use more, but also because it's simply not needed (OK, the PS4 is extremely powerful on the bandwidth side and perhaps there will be some new rendering technique in the future which we don't know about yet, but current methods like deferred rendering, voxels, megatexturing, etc. will run just fine using only 1-2 GB for "rendering").
Posted on Reply