Monday, March 28th 2016

AMD "Greenland" Vega10 Silicon Features 4096 Stream Processors?

The LinkedIn profile of an R&D manager at AMD discloses key details of the company's upcoming "Greenland" graphics processor, which is also codenamed Vega10. Slated for an early-2017 launch, according to AMD's GPU architecture roadmap, "Greenland" will be built on AMD's "Vega" GPU architecture, which succeeds the "Polaris" architecture due later this year.

The LinkedIn profile of Yu Zheng, an R&D manager at AMD (now redacted), screen-captured by 3DCenter.org, reveals the "shader processor" (stream processor) count of Vega10 to be 4,096. This may look identical to the SP count of "Fiji," but one must take into account that "Greenland" is two generations of Graphics Core Next technology ahead of "Fiji," and that the roadmap slide hints at HBM2 memory, which could be faster. Add to that AMD's claim of a 2.5X leap in performance-per-Watt with Polaris over the current architecture, and Vega, which follows Polaris, could be faster still.

In related news, AMD could be putting the final touches on its first chips based on the "Polaris" architecture: a performance-segment chip codenamed "Ellesmere" or Polaris10, and a mid-range chip codenamed "Baffin" or Polaris11. "Ellesmere" is rumored to feature 36 GCN 4.0 compute units, which works out to 2,304 stream processors, and a 256-bit wide GDDR5 (or GDDR5X?) memory interface, with 8 GB as the standard memory amount. The specs of "Baffin" aren't as clear; the only specification making the rounds is its 128-bit wide GDDR5 memory bus. Products based on both these chips could launch in Q3 2016. Sources: 3DCenter, 1, 2
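The "36 compute units works out to 2,304 stream processors" step is a simple multiplication. A minimal sketch, assuming GCN 4.0 keeps the 64 stream processors per compute unit that earlier GCN chips use (which is how Fiji's 64 CUs give its 4,096 SPs); the function names are illustrative only.

```python
# Assumption: each GCN compute unit (CU) contains 4 SIMD units of 16 lanes,
# i.e. 64 stream processors, as in earlier GCN generations.
SP_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Total stream processor count for a GCN chip with the given CU count."""
    return compute_units * SP_PER_CU


def compute_units(stream_processors: int) -> int:
    """Inverse: the CU count implied by a stream processor count."""
    if stream_processors % SP_PER_CU:
        raise ValueError("not a whole number of compute units")
    return stream_processors // SP_PER_CU


print(stream_processors(36))  # rumored "Ellesmere" configuration
print(compute_units(4096))    # Fiji and the rumored Vega10 SP count
```

Read the other way, the rumored 4,096 SPs for Vega10 would imply 64 CUs, the same count as Fiji, if the 64-SP-per-CU layout carries over.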

21 Comments on AMD "Greenland" Vega10 Silicon Features 4096 Stream Processors?

#1
the54thvoid
Anyone not seeing the elephant in the room? Polaris releases late 2016, generously maybe by autumn/fall. A few months later, new architecture is released again. Makes zero sense unless they're doing a reverse Nvidia. Vega becomes the Titan style part, with Polaris 10 & 11 being the x80 and x70 parts.

FUD I say.
#2
uuuaaaaaa
the54thvoid said:
Anyone not seeing the elephant in the room? Polaris releases late 2016, generously maybe by autumn/fall. A few months later, new architecture is released again. Makes zero sense unless they're doing a reverse Nvidia. Vega becomes the Titan style part, with Polaris 10 & 11 being the x80 and x70 parts.

FUD I say.
These are my predictions:

Polaris (10 and 11) - May / July - Expect at least R9 390 X class performance cards for cheap. (Much like the HD2900 XT to the HD 3870 transition) R9 470 and R9 480 parts.
Vega - ( September 2016 to January 2017) I expect these to be the R9 490 X cards.

Or you could shift these up one tier, with Polaris 11/10 being the 480- and 490-class cards and Vega being the Fiji successor.
#3
Mathragh
Going by what Raja told Ryan from PCPer, I expect RTG to use two chips (on an interposer maybe?) on their top Vega part.
#4
john_
I think I was expecting this. Who knows how many years companies will have to stay at 14nm. Who knows how much more expensive the transition to 10nm will be.

So they will give a 15-25% performance improvement, mostly from higher frequencies and architectural changes, for 3/4 of the power consumption thanks to the 14nm/16nm process, compared to today's models and that's it for this summer.
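john_'s estimate can be turned into a performance-per-watt figure with one division. A minimal sketch; the 15-25% and 3/4-power numbers are his guesses, not measurements, and the function name is illustrative.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance-per-watt versus the baseline card.

    perf_ratio:  new performance / old performance (1.15 means +15%)
    power_ratio: new power draw / old power draw (0.75 means 3/4 the power)
    """
    return perf_ratio / power_ratio


# The two ends of the guessed range, both at 3/4 of today's power draw.
for perf in (1.15, 1.25):
    gain = perf_per_watt_gain(perf, 0.75)
    print(f"+{perf - 1:.0%} performance at 3/4 power -> {gain:.2f}x perf/W")
```

Under those guesses the gain works out to roughly 1.53x to 1.67x perf/W, noticeably smaller than the 2.5X figure AMD quotes for Polaris.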

Add to that GDDR5/5X rather than HBM, the rumors that Polaris remains a feature level 12_0 card, and that Pascal still doesn't know what async compute is, and we are already starting to question whether this summer's models are the cards we are waiting for, and not those that will come later.
#5
NC37
the54thvoid said:
Anyone not seeing the elephant in the room? Polaris releases late 2016, generously maybe by autumn/fall. A few months later, new architecture is released again. Makes zero sense unless they're doing a reverse Nvidia. Vega becomes the Titan style part, with Polaris 10 & 11 being the x80 and x70 parts.

FUD I say.
Probably because most likely Polaris is just Fiji 2.0. Fury with a die shrink. Could be the entire Polaris line will just be variations of Fury with different stream processor or memory counts.

Things really don't seem to get interesting till Vega. The last time AMD hyped up "performance per watt" we got Fury and were underwhelmed. 2.5x means squat. It's just more of that hype train trying to make Polaris look really great when it's Vega that folks should be more keen on.

AMD needs something to get them through till next year. Polaris will likely do. I just don't feel so bad about having to jump on a 390 before I planned to spend.
#6
Xajel
I think this also relates to the delay of HBM2, as both AMD and NV are choosing GDDR5X for their next-gen high-end... while at the same time, AMD thinks that HBM2 holds such great potential that they're eager to release a product with it as soon as possible...
#7
BiggieShady
Ah, R&D Manager spilling out company secrets on his LinkedIn profile ... I wish him good luck on the job market
#8
vega22
the54thvoid said:
Vega becomes the Titan style part, with Polaris 10 & 11 being the x80 and x70 parts.
Seems vega_num is their new internal code-naming scheme.

But I think they also want to switch to a more NVIDIA-style lineup, with a tier of card between the gamer cards and FirePro.

As for Greenland next year, I doubt it unless Polaris fails.
#9
FordGT90Concept
4096 is pretty disappointing for 14nm. The only incentive you'd have to buy Vega10 would be power consumption compared to Fiji (and minor stuff like DP 1.3, HDMI 2.0a, D3D 12_1 feature level, etc.). They better have something more potent in the works (and not dual GPU) if they want to compete with NVIDIA's top cards. Vega10 should be an upper mid-range card.
#10
HD64G
FordGT90Concept said:
4096 is pretty disappointing for 14nm. The only incentive you'd have to buy Vega10 would be power consumption compared to Fiji (and minor stuff like DP 1.3, HDMI 2.0a, D3D 12_1 feature level, etc.). They better have something more potent in the works (and not dual GPU) if they want to compete with NVIDIA's top cards. Vega10 should be an upper mid-range card.
Vega 11 (or 9 if reversed naming) anyone? ;)
#11
kiddagoat
Considering the games coming down the pipe and how porting to PC has become the norm in some cases.... if you haven't bought a new GPU in the last 3 years, anything from the Fury line or the next two AMD lines should be more than enough to hold you over for a while.

Same on the NV side: if you have a Maxwell or a Pascal and only game at 1080p, possibly 1440p, you should be good for a while. I think GPUs are finally falling in line with CPUs; they are hitting that size wall and are more about cutting power and improving energy efficiency. Sure, a die shrink will help performance, but they are only going to get so fast.

Developers are lazy and don't really optimize their code. Hence the ports we have been getting on PC lately. They optimize for PS4 and XBone because the hardware is static. With PC they have all the different configurations and unless either side does something to get developers off their ass and actually code for the PC hardware, it is going to be the same dog and pony show it has been for the past 6-8 years.

We've had pretty awesome hardware for a while now and it just isn't utilized properly.

Some of us are still on first-gen i7s and FX-8350s. Unless you run synthetic benchmarks all day, you'd be hard pressed to notice a difference. Hardware is starting to hit a limit, and finally all this bloat and laziness from developers is going to bite them in the ass.

Now I know some studios do much better than others with their optimization and getting their products to run on various platforms, but a fair majority don't.


I am not too worried though, my 2x Nanos should be more than plenty for awhile.
#12
efikkan
With the top chip in Vega/"5th generation GCN" holding 4096 shader processors (the same as Fiji), let's at least hope that it brings greater architectural changes than Polaris, which AMD itself considers a minor change (except for the shrink, of course).

It may be a smart move for AMD to target the upper mid-range rather than the high-end, after the Fiji blunder where they spent all their resources on a high-end product that didn't sell well. The $300-550 market is, after all, the most profitable market, and this should let AMD cover as much of the market as they can with their limited resources.
#13
efikkan
kiddagoat said:
Considering the games coming down the pipe and how porting to PC has become the norm in some cases.... if you haven't bought a new GPU in the last 3 years, anything from the Fury line or the next two AMD lines should be more than enough to hold you over for a while.

Same on the NV side: if you have a Maxwell or a Pascal and only game at 1080p, possibly 1440p, you should be good for a while. I think GPUs are finally falling in line with CPUs; they are hitting that size wall and are more about cutting power and improving energy efficiency. Sure, a die shrink will help performance, but they are only going to get so fast.
Unlike CPUs, GPUs can continue to scale efficiently provided we can put more shader cores on the dies, so for a while GPUs have benefited a lot from shrinks, while CPUs have really not improved much since Sandy Bridge. But as we all know, the shrinks are fewer and farther between. We might be looking at two shrinks in the next decade or so, and the benefits from the shrinks will also decline.

But in terms of demand, the demand is actually increasing at a higher rate than in the last ten years, since gamers now want higher resolutions and higher frame rates at the same time. And we are still not at the point where GPUs are "powerful enough" so game developers can achieve everything they want, and we can expect performance requirements to continue to increase for new games.

The jump from 28nm to 14/16nm is actually "two steps", except for the interconnect, which is still at 20nm. So Pascal is probably going to be the largest performance gain we have seen for a long time, and we are probably not going to see a similar increase for Volta and post-Volta. Currently a single GTX 980 Ti is still not "powerful enough" for 4K gaming, and is not close to 60 FPS in all games at stock speed. And for those who want higher frame rates, even though the GTX 980 Ti is OK for 1440p, it's not enough to push 120-144 FPS in all games at stock speed. With Pascal probably increasing gaming performance by >60%, we will still not be at 4K 120 FPS. If Volta (2018) is not going to be another shrink, then we might get as little as 20% more performance, which is not going to keep up with demand.
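The gap between these targets is easiest to see as pixels per second and per-frame time budget. A rough sketch, assuming frame rate scales linearly with GPU throughput (real games only approximate this); the function names are illustrative, and no benchmark data is implied.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be rendered per second at a resolution and frame rate."""
    return width * height * fps


def frame_time_ms(fps: float) -> float:
    """Time budget per frame, in milliseconds."""
    return 1000.0 / fps


def required_speedup(current_fps: float, target_fps: float) -> float:
    """Factor by which GPU throughput must grow to reach target_fps,
    assuming frame rate scales linearly with GPU performance."""
    return target_fps / current_fps


targets = {
    "1440p @ 60": (2560, 1440, 60),
    "1440p @ 144": (2560, 1440, 144),
    "4K @ 60": (3840, 2160, 60),
    "4K @ 120": (3840, 2160, 120),
}
baseline = pixel_rate(2560, 1440, 60)
for name, (w, h, fps) in targets.items():
    rate = pixel_rate(w, h, fps)
    print(f"{name:12} {rate / 1e6:7.1f} Mpix/s  {rate / baseline:4.2f}x  "
          f"{frame_time_ms(fps):5.2f} ms/frame")

# Even a card that already holds 60 FPS at 4K needs 2x (+100%) to reach 120 FPS.
print(required_speedup(60, 120))
```

Under the linear-scaling assumption, 4K at 120 FPS is 4.5 times the pixel rate of 1440p at 60 FPS, and a >60% generational gain falls short of the 2x needed even from a 4K/60 starting point.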

kiddagoat said:

Developers are lazy and don't really optimize their code. Hence the ports we have been getting on PC lately. They optimize for PS4 and XBone because the hardware is static. With PC they have all the different configurations and unless either side does something to get developers off their ass and actually code for the PC hardware, it is going to be the same dog and pony show it has been for the past 6-8 years.

We've had pretty awesome hardware for a while now and it just isn't utilized properly.
You are touching on a very important subject. Game developers have gotten used to performance leaps every two years or so, so by the time a game is released they expect people to buy more powerful hardware than it was developed on. We all know that performance gains in hardware are going to decrease over time, so writing good code is going to become increasingly important.

The gaming consoles, which use outdated low-end hardware, are a big problem. And as long as developers keep making games for these machines and porting them to PCs by cranking up the model details, they are going to continue to suck. The current API call mania (Direct3D 12/Vulkan/etc.) is not going to help the situation. Every developer knows that batching the draws is the only efficient way to render, and when doing efficient batching the API overhead is low anyway.

Game engines are using way too much abstraction to use GPUs efficiently. If the API overhead is a problem for a game, then the engine CPU overhead is going to be even larger. Doing all kinds of GPU manipulation through thousands of API calls is a step backwards. Scaling with API calls is not going to work well with the raw performance of Pascal, Volta, post-Volta and so on.
#14
Steevo
More interested in the SOC/IP side of the comment. Perhaps we will see the first hardware driver acceleration with compute and an x86-64 core on a GPU? The timeline fits: one or two HMA Zen cores coupled to a GPU die, sharing resources? All the performance with no driver issues from different hardware configs, a simple scalable architecture.

Perhaps I'm all wrong here, but what else does SOC mean?
#15
AsRock
System on chip? It could mean a few things, but that's what I first think of when the term is used.
#17
BiggieShady
efikkan said:
Game engines are using way too much abstraction to use GPUs efficiently.
I'd say it's rather that game engines are using abstractions that allow very efficient use of GPUs if you know how to prepare content for it.
Modern engines batch draw calls automatically if different surfaces share textures, materials or shaders. When designing optimal art for 3d games it's all about reusing stuff while making it look like you are not reusing stuff.
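A minimal sketch of the batching BiggieShady describes: draws that share a shader and material are sorted together so render state is bound once per group rather than once per object. The DrawItem type and the scene contents are hypothetical, and whether each group then becomes a single draw call (through instancing or merged buffers) depends on the engine and API.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class DrawItem:
    mesh: str      # hypothetical mesh identifier
    shader: str    # shader program used for the draw
    material: str  # texture/material set bound for the draw


def state_key(item: DrawItem) -> tuple[str, str]:
    """The render state a draw needs bound before it can be issued."""
    return (item.shader, item.material)


def count_state_binds(items: list[DrawItem]) -> int:
    """How many times (shader, material) state is bound when drawing in this order."""
    binds, current = 0, None
    for item in items:
        key = state_key(item)
        if key != current:
            binds += 1
            current = key
    return binds


def build_batches(items: list[DrawItem]) -> list[tuple[tuple[str, str], list[str]]]:
    """Sort draws by render state, then group meshes that share the same state."""
    ordered = sorted(items, key=state_key)
    return [(key, [i.mesh for i in group])
            for key, group in groupby(ordered, key=state_key)]


scene = [
    DrawItem("rock_a", "pbr", "stone"),
    DrawItem("tree_a", "pbr", "bark"),
    DrawItem("rock_b", "pbr", "stone"),
    DrawItem("tree_b", "pbr", "bark"),
    DrawItem("rock_c", "pbr", "stone"),
]
print(count_state_binds(scene))                         # 5 in submission order
print(count_state_binds(sorted(scene, key=state_key)))  # 2 after sorting by state
for key, meshes in build_batches(scene):
    print(key, meshes)
```

This is also where the art-side advice applies: the more surfaces share a material, the fewer groups the sort produces.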
Optimizing on a shader level is done one time for all eternity ... essentially there is one optimal "physically based" lighting shader all games use these days, with diffuse, gloss/specular, emission, occlusion, normal and displacement textures (with additional detail diffuse+normal textures on top that are visible when close up), that allows sky-based global illumination. Very little optimization room left there.
All optimization on the CPU side is basically how to feed GPU command queues while minimizing the number of context switches on the GPU.
The hidden part that can make every game look unoptimized is the occlusion culling algorithm, whose importance is often underestimated. Too many engines are used in a way that unnecessarily draws occluded objects.
The real problem starts when devs push consoles to the limits, relying heavily on the low API overhead and low latencies that a heterogeneous memory design allows, only to reach a locked 30 fps ... then do a straight port onto PCI-e bus latencies and a higher-overhead API. It may be feasible for the Jaguar cores in the PS4 to write something directly into video RAM every frame; doing that over PCI-e on PC would introduce extra latency.
Once you start using the benefits of HSA on consoles, you get a less scalable port to PC, simply because of the modular nature of the PC.
The opposite way would be: develop optimally for pc, then leverage use of hsa on consoles to get acceptable performance... but I'm digressing and borderline rambling
Caring1 said:
Silicon On Chip?
It's system on chip; every SoC is an ASIC, but not every ASIC is an SoC.
#18
efikkan
BiggieShady said:
I'd say it's rather that game engines are using abstractions that allow very efficient use of GPUs if you know how to prepare content for it.
Modern engines batch draw calls automatically if different surfaces share textures, materials or shaders. When designing optimal art for 3d games it's all about reusing stuff while making it look like you are not reusing stuff.
No, when game engines add an abstraction layer above the actual API and create a structure which calls upon each object to render itself, we end up with the opposite of batching. New games like Ashes of the Singularity keep bragging about the number of API calls they are able to push through, which is evidence of inefficient coding.

If you render a bunch of meshes in a single API call, the GPU works far more efficiently than if you do it with thousands of small API calls. Even an old GTX 680 is able to render millions of polygons on the screen at a high frame rate, but no game is pushing through geometry at that level due to inefficient usage of the GPU.

BiggieShady said:

All optimization on the CPU side is basically how to feed GPU command queues while minimizing the number of context switches on the GPU.
Well, not quite. The GPU itself is way better at scheduling its GPU threads/batches and evening out the load, way better than even an infinitely powerful CPU could ever do.

BiggieShady said:

The hidden part that can make every game look unoptimized is the occlusion culling algorithm, whose importance is often underestimated. Too many engines are used in a way that unnecessarily draws occluded objects.
The GPUs themselves are to a large extent able to automatically cull a lot, that was actually one of the hardware improvements between GF100 and GF110.

Still both vertex shaders and compute shaders can be utilized for efficient culling. It's actually way more efficient to do the fine detailed culling on the GPU in the shader, rather than calculating it in the CPU and passing each part of a mesh as separate API calls.
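The per-object test behind view-frustum culling is small enough to run once per instance in a vertex or compute shader. It is sketched here in Python for readability; the plane representation, the box-shaped stand-in for a real frustum, and the object list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    """Plane dot(n, p) + d = 0, with the normal n pointing into the visible volume."""
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, x: float, y: float, z: float) -> float:
        return self.nx * x + self.ny * y + self.nz * z + self.d


def sphere_visible(planes, center, radius):
    """Conservative test: False only if the bounding sphere lies entirely
    on the outside of at least one plane."""
    x, y, z = center
    return all(p.signed_distance(x, y, z) >= -radius for p in planes)


def cull(planes, objects):
    """Return the names of objects whose bounding spheres may be visible."""
    return [name for name, center, radius in objects
            if sphere_visible(planes, center, radius)]


# Hypothetical box-shaped view volume (a real perspective frustum has slanted
# side planes): -10 <= x <= 10, -10 <= y <= 10, 0.1 <= z <= 100.
view_volume = [
    Plane(1, 0, 0, 10), Plane(-1, 0, 0, 10),
    Plane(0, 1, 0, 10), Plane(0, -1, 0, 10),
    Plane(0, 0, 1, -0.1), Plane(0, 0, -1, 100),
]
objects = [
    ("near_crate", (0.0, 0.0, 5.0), 1.0),
    ("behind_camera", (0.0, 0.0, -20.0), 1.0),
    ("far_tower", (0.0, 0.0, 150.0), 5.0),
    ("edge_rock", (10.5, 0.0, 20.0), 1.0),
]
print(cull(view_volume, objects))  # ['near_crate', 'edge_rock']
```

On the GPU, this kind of test would typically run per instance in a compute shader that writes the surviving instances into a buffer consumed by an indirect draw, so the CPU never issues per-object calls. The test is conservative: a sphere straddling a plane, like `edge_rock`, is kept.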
#19
Steevo
efikkan said:
New games like Ashes of the Singularity keep bragging about the number of API calls they are able to push through, which is evidence of inefficient coding.

If you render a bunch of meshes in a single API call, the GPU works far more efficiently than if you do it with thousands of small API calls. Even an old GTX 680 is able to render millions of polygons on the screen at a high frame rate, but no game is pushing through geometry at that level due to inefficient usage of the GPU.
It's not the number of API calls so much as the reuse of existing objects (instancing), which can be drawn again without the call needing to ask the CPU for geometry information and cull status (batch processing). Prior to DX12/Mantle/Vulkan, the basic scene geometry was done on the CPU, and the majority of culls and texture data was handled and shuffled to the GPU along with the scene render instructions from the CPU. That is exactly why a faster processor could render more FPS, and why benchmarking a CPU at smaller pixel counts (lower resolution) was a normally accepted way to do things. Modern GPUs, however, are aware of texture location through drivers and hardware and can hold more textures in memory (with the onus on the developer of the 3D application to have the correct textures in DX12), instead of relying on drivers that have to be optimized for every game, which in all previous versions made drivers 10% actual software and 90% game optimization data.

You are mistaking hardware efficiency and muscle for poor coding, when it's actually showing off the hardware and software improvements.
#20
GC_PaNzerFIN
BiggieShady said:
Ah, R&D Manager spilling out company secrets on his LinkedIn profile ... I wish him good luck on the job market
There have been so many of these lately that I call the bluff and say some of it might be planted there on purpose to create hype. By this P&R team manager. :p
#21
N3M3515
john_ said:
So they will give a 15-25% performance improvement, mostly from higher frequencies and architectural changes, for 3/4 of the power consumption thanks to the 14nm/16nm process, compared to today's models and that's it for this summer.
I don't think anything below a 50% perf increase is worthwhile.