Friday, March 3rd 2017

AMD Responds to Ryzen's Lower Than Expected 1080p Performance

The folks at PC Perspective have shared a statement from AMD in response to their question as to why AMD's Ryzen processors show lower than expected performance at 1080p (despite posting good high-resolution, high-detail frame rates). Essentially, AMD is reinforcing the need for developers to optimize their games for AMD's CPUs, claiming that most titles have so far been tuned only for Intel's architecture. AMD also points out that it has already sent out some 300 developer kits so that content creators can get accustomed to Ryzen, and expects that number to grow to about a thousand developers over the course of 2017. AMD expects gaming performance to only improve from its launch-day level. Read AMD's statement after the break.
AMD's John Taylor had this to say:

"As we presented at Ryzen Tech Day, we are supporting 300+ developer kits with game development studios to optimize current and future game releases for the all-new Ryzen CPU. We are on track for 1000+ developer systems in 2017. For example, Bethesda at GDC yesterday announced its strategic relationship with AMD to optimize for Ryzen CPUs, primarily through Vulkan low-level API optimizations, for a new generation of games, DLC and VR experiences.

Oxide Games also provided a public statement today on the significant performance uplift observed when optimizing for the 8-core, 16-thread Ryzen 7 CPU design - optimizations not yet reflected in Ashes of the Singularity benchmarking. Creative Assembly, developers of the Total War series, made a similar statement today related to upcoming Ryzen optimizations.

CPU benchmarking deficits to the competition in certain games at 1080p resolution can be attributed to the development and optimization of the game uniquely to Intel platforms - until now. Even without optimizations in place, Ryzen delivers high, smooth frame rates on all "CPU-bound" games, as well as overall smooth frame rates and great experiences in GPU-bound gaming and VR. With developers taking advantage of Ryzen architecture and the extra cores and threads, we expect benchmarks to only get better, and enable Ryzen to excel at next generation gaming experiences as well.

Game performance will be optimized for Ryzen and continue to improve from at-launch frame rate scores."

Two game developers also chimed in.

Oxide Games, creators of the Nitrous game engine that powers Ashes of the Singularity:

"Oxide games is incredibly excited with what we are seeing from the Ryzen CPU. Using our Nitrous game engine, we are working to scale our existing and future game title performance to take full advantage of Ryzen and its 8-core, 16-thread architecture, and the results thus far are impressive. These optimizations are not yet available for Ryzen benchmarking. However, expect updates soon to enhance the performance of games like Ashes of the Singularity on Ryzen CPUs, as well as our future game releases." - Brad Wardell, CEO Stardock and Oxide

And Creative Assembly, the creators of the Total War series and, more recently, Halo Wars 2:

"Creative Assembly is committed to reviewing and optimizing its games on the all-new Ryzen CPU. While current third-party testing doesn't reflect this yet, our joint optimization program with AMD means that we are looking at options to deliver performance optimization updates in the future to provide better performance on Ryzen CPUs moving forward. "
Source: PC Perspective

126 Comments on AMD Responds to Ryzen's Lower Than Expected 1080p Performance

#102
FordGT90Concept
"I go fast!1!11!1!"
caleb: I don't think anybody will recode already published titles to utilize more cores, but let's see.
If this dev kit includes a C/C++ compiler targeting Ryzen, all they should have to do is point it at their code base, recompile, and push out an update through digital distribution. I think the only games that will get that treatment, though, are ones actively receiving updates (CS:GO, DOTA2, PD2, MMOs, etc.).


FX didn't get special compiler treatment because that was putting lipstick on a pig.
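As for what that Ryzen recompile would look like in practice, here's a minimal sketch, assuming GCC 6 or later (which added a znver1 target for Zen); the loop is a made-up example of the kind of code that benefits:

// Hypothetical hot loop. Rebuilt with, e.g.:
//   g++ -O3 -march=znver1 game.cpp
// the auto-vectorizer can emit AVX2 code scheduled for Zen, with no source changes.
#include <cstddef>

void scale(float* out, const float* in, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;  // no loop-carried dependency, so it vectorizes cleanly
}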
Posted on Reply
#103
Batou1986
theoneandonlymrk: @Batou1986, you get what you pay for, Intel or AMD; if you buy a cheap low-end motherboard, you get low-end performance and will end up a moaner.

I overclocked my friend's 6320 on your board two nights ago, because he finally got an Evo 212 like I told him.
Nightmare: his throttled all the time at stock settings, and I was forced into BCLK overclocking by the crapness of his board. I've had his chip easily do 4.5 GHz in my rig, but not in his; 4.3 max.
Point being, your reference and perception have been affected by your purchase choices, and you should have chosen better, imho.
My man, I think you need to stop making assumptions. The crapness of my board has nothing to do with the lack of performance from the FX series.
I easily beat the benchmarks for an 8350 because all 8 cores are running 4.2 GHz as a non-turbo speed, and I have none of the throttling issues you mentioned, even when running Linpack for hours.
Running at 5 GHz is not going to make DCS, Star Citizen, or any number of other games that have issues with AMD CPUs run any better for me.

It's great that AMD kinda caught up to Intel with Ryzen.
But if it's going to be the same as the FX series, where certain applications, like DCS World and CryEngine games, perform worse specifically because of AMD CPUs, that's a major issue that can't just be ignored.

Also, you need to stop repeating that devs are all making games for 8 cores and using that as a reason your "8-core" CPU is still OK. It's a well-known fact that the FX series has 4 full CPU cores and 4 limited CPU cores, and this makes a HUGE difference in performance when comparing it with a true 8-core CPU.
Posted on Reply
#104
efikkan
FordGT90Concept: FX didn't get special compiler treatment because that was putting lipstick on a pig.
Bulldozer did get compiler optimizations in major compilers such as GCC and LLVM; in fact, GCC alone has four levels of them (-march=bdver1 through bdver4).
FordGT90Concept: If this dev kit includes a C/C++ compiler targeting Ryzen, all they should have to do is point it at their code base, recompile, and push out an update through digital distribution. I think the only games that will get that treatment, though, are ones actively receiving updates (CS:GO, DOTA2, PD2, MMOs, etc.).
Compiler optimizations have been available for a long time, ever since the ISA was planned; you can see some of the results here. Such optimizations usually cover specific edge cases and help with vectorization, which can gain a bit in some applications. But games are usually limited by cache misses and branch mispredictions, and compiler optimization does little about those, so game developers can't just throw a compiler at the problem.
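To illustrate why, a sketch (Entity and update are hypothetical stand-ins for game code): the hot branches typically depend on runtime data and guard side effects, and no recompile removes them.

#include <cstddef>

struct Entity { bool active; /* position, AI state, ... */ };

// hypothetical per-entity game logic, stubbed so the sketch compiles
void update(Entity&) { /* pathfinding, physics, ... */ }

void tick(Entity* entities, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (entities[i].active)   // data-dependent branch: mispredicts on mixed data
            update(entities[i]);  // a call with side effects cannot be made branchless
}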
Posted on Reply
#105
TheoneandonlyMrK
Batou1986: My man, I think you need to stop making assumptions. The crapness of my board has nothing to do with the lack of performance from the FX series.
I easily beat the benchmarks for an 8350 because all 8 cores are running 4.2 GHz as a non-turbo speed, and I have none of the throttling issues you mentioned, even when running Linpack for hours.
Running at 5 GHz is not going to make DCS, Star Citizen, or any number of other games that have issues with AMD CPUs run any better for me.

It's great that AMD kinda caught up to Intel with Ryzen.
But if it's going to be the same as the FX series, where certain applications, like DCS World and CryEngine games, perform worse specifically because of AMD CPUs, that's a major issue that can't just be ignored.

Also, you need to stop repeating that devs are all making games for 8 cores and using that as a reason your "8-core" CPU is still OK. It's a well-known fact that the FX series has 4 full CPU cores and 4 limited CPU cores, and this makes a HUGE difference in performance when comparing it with a true 8-core CPU.
And you think an HT core is a full core? You're so, so wrong. The FX series is a closer fit to dual cores, and that's its inherent problem: each core had fewer actual resources and no micro-op cache, so under-utilisation happens.
Intel, on the other hand, had a micro-op cache and could, if needed, devote a whole core (two FX cores' worth of resources) to one thread, leveraging a wider execution pipe, the micro-op cache and better caches, plus being two process nodes ahead. But those are all old advantages, and it's clear AMD now has the raw per-core and multi-core performance, so with a few tweaks here and there on this brand-new uarch, I'm sure it will be fine.

Then I might buy one. But as I said, if I bought one, running two 480s at 4K, I could not do better buying anything Intel in any metric, apparently, so I could happily dodge 1080p my whole life. Alas, I'm skint, so I'm still dreaming.
Posted on Reply
#106
DeathtoGnomes
caleb: I don't think anybody will recode already published titles to utilize more cores, but let's see.
I don't see why not, when Trion/Rift has.
Posted on Reply
#107
Camm
Well, to be fair, games don't need to be recoded to use more cores to benefit Ryzen, but they could do with being compiled with an AMD-friendly compiler. Contrary to belief, it's not quite as simple as just using the AMD compiler and off you go, but the work to do it wouldn't be extravagant either.

Depending on how well Ryzen sells, I can see plenty of recentish games getting patches.
Posted on Reply
#108
BiggieShady
efikkan: game developers can't just throw a compiler at the problem.
To be fair, you are partially right: there are a number of compiler optimizations that can help with prefetching to reduce cache misses: ece-research.unm.edu/jimp/611/slides/chap5_3.html ... I say partially because they do so either at the expense of added instructions or by covering edge cases for this particular purpose (gaming) ... it's a win-some-lose-some situation. You can't know for certain until you fire up a CPU profiler on Ryzen with the specific game.
Trouble is, game devs will find little incentive to do so for past projects ... and for new ones, compilers will get tuned as time goes by because of Zen in the console space.
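For reference, the kind of hint those slides describe looks like this in GCC/Clang (a sketch only; __builtin_prefetch is a real intrinsic, but the stride and look-ahead distance here are arbitrary):

#include <cstddef>

float sum_strided(const float* data, std::size_t n, std::size_t stride) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; i += stride) {
        // hint: start fetching a future element; out-of-range hints are harmless no-ops
        __builtin_prefetch(&data[i + 8 * stride]);
        acc += data[i];  // by now an earlier hint has, hopefully, hidden the miss
    }
    return acc;
}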
Posted on Reply
#109
medi01
efikkan: Resorting to conspiracy theories?
CPU optimization is a conspiracy theory?
The need to take into account that the 8-core chip is actually 4+4 (the OS would have to do that), each cluster with its own L3, is a conspiracy theory?

"lower than expected" is a fact nowadays? Expected by whom?

I have seen StarCraft 2 benchmarks with Ryzen doing a 16 fps minimum and a 31 fps average (on a 980); are you freaking kidding me?
This is plain and outright bullshit; there is no desktop CPU less than 4 years old that would score like that in that game.

There is an expected single-thread advantage for Intel's 4-core parts, and AMD has actually acknowledged it.
AMD states that they are 6% behind Skylake in IPC; taking the higher clock into account, that's a flat 20% advantage for the 7700K in single-core tasks. Who "expected" something, pretty please?
Haswell was an "unlikely, but hopefully" target. It ended up at Broadwell levels, jeez.


/double facepalm
Posted on Reply
#110
efikkan
BiggieShady: To be fair, you are partially right: there are a number of compiler optimizations that can help with prefetching to reduce cache misses: ece-research.unm.edu/jimp/611/slides/chap5_3.html ...
This primarily refers to instruction sets other than x86, since modern x86 architectures have a prefetcher with a large instruction window. If a prefetching hint is to give any help before dereferencing a pointer, the programmer has to insert the hint earlier in the code. Large numbers of cache misses usually occur when traversing a list, but using such hints inside a loop provides no benefit, since the CPU can already see what's inside the loop, and you can't know the memory address of data several iterations ahead without dereferencing pointers; trying anyway will probably reduce cache efficiency and cause a performance penalty. For this reason, manual prefetching is discouraged.
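The list case in code, a minimal sketch: each next pointer is only known after the previous node has been fetched, so there is no valid address to hint at ahead of time.

struct Node { int value; Node* next; };

int sum_list(const Node* n) {
    int acc = 0;
    for (; n != nullptr; n = n->next)  // dependent load: the address of the next
        acc += n->value;               // node only arrives with this cache miss
    return acc;
}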
BiggieShady: I say partially because they do so either at the expense of added instructions or by covering edge cases for this particular purpose (gaming) ... it's a win-some-lose-some situation. You can't know for certain until you fire up a CPU profiler on Ryzen with the specific game.
Please explain what this means.
BiggieShady: Trouble is, game devs will find little incentive to do so for past projects ... and for new ones, compilers will get tuned as time goes by because of Zen in the console space.
Compilers are already "tuned", so we wouldn't see any major change there, but, as I've mentioned, optimizing compilers can't do much about branch mispredictions and cache misses.
For a compiler to eliminate some branching, the CPU has to offer instructions that allow certain conditionals to be converted into branchless code (such as conditional moves). Otherwise, a compiler can't help here.
Data cache misses usually occur because of list traversal, and the only way to eliminate them is to rewrite the whole codebase to lay the data out in a native array; no compiler can ever do that. This is largely a result of how the developer chose to do OOP.
Code cache misses are, once again, usually a result of code structure; OOP and lists of arbitrary elements are the greatest challenge here. Once again, the solution is to restructure the code, which is outside the realm of a compiler. One optimization I can think of that would help is inlining small functions, but compilers already do that, like GCC at -O2, which enables -finline-small-functions.
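Sketching the data-layout rewrite from two paragraphs up: the same data laid out contiguously, so the hardware prefetcher can stream it. This is exactly the restructuring that is outside a compiler's reach.

#include <vector>

int sum_array(const std::vector<int>& values) {
    int acc = 0;
    for (int v : values)  // sequential access pattern: the prefetcher stays ahead of the loop
        acc += v;
    return acc;
}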
Posted on Reply
#111
BiggieShady
efikkan: Please explain what this means.
I was pointing out that only running a CPU profiler while debugging a specific game on Ryzen can show where those nanoseconds are lost inside a frame compared to older CPU architectures. Then critical sections get either rewritten for the specific arch, or the libraries get compiled with different options. Mostly a combination of both.
efikkan: Compilers are already "tuned", so we wouldn't see any major change there, but, as I've mentioned, optimizing compilers can't do much about branch mispredictions and cache misses.
Let's not forget this is a completely new arch. I'm not saying compilers should suddenly start doing the impossible ... but within the realm of what is possible, there seems to be headroom. I'm guessing here that a CPU with such a large cache shouldn't suffer much from cache misses; branch misprediction is another story, but either way both would produce a stuttery experience, and the fps seems extremely steady, only lower on average.
Posted on Reply
#112
efikkan
BiggieShady: I was pointing out that only running a CPU profiler while debugging a specific game on Ryzen can show where those nanoseconds are lost inside a frame compared to older CPU architectures. Then critical sections get either rewritten for the specific arch, or the libraries get compiled with different options. Mostly a combination of both.
First of all, profilers will not measure large problems like cache misses accurately. Hardly any games can get substantial benefits from AMD-specific tweaks this way, since compiler optimizations are limited to small patterns of instructions. Unless there are some big "bugs" in the Zen architecture, there is little to gain from this. Almost all larger problems would need a rewrite, and they are not AMD-specific in any way.
BiggieShady: Let's not forget this is a completely new arch. I'm not saying compilers should suddenly start doing the impossible ... but within the realm of what is possible, there seems to be headroom.
Why does there seem to be headroom? Do you even know how a compiler works? You clearly don't seem to.
BiggieShady: I'm guessing here that a CPU with such a large cache shouldn't suffer much from cache misses ...
You know what a kB is, right?
Just rendering a single frame will process several hundred MB, and at 60 FPS there is a lot of data flowing through.
With 512 kB of L2 cache per core and 8 MB of shared L3, not even 1% of that data is in cache at any point.
BiggieShady: branch misprediction is another story, but either way both would produce a stuttery experience, and the fps seems extremely steady, only lower on average.
So since the FPS is "stable", there are no branch mispredictions and cache misses? I'm sorry, but you clearly don't know at what scale these things happen. We are not talking about single stalls causing milliseconds of latency, known as stutter; we are talking about clock cycles, which are on the ns scale, and since there are many thousands of them every second, they add up to a steady performance drop rather than noticeable stutter. A single local branch misprediction causes ~20 clocks of idle; a non-local one adds a cache miss as well (a code cache miss), so +~250 clocks. A data cache miss is ~250 clocks on modern CPUs.
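To put those cycle counts on a wall-clock scale, a quick worked example (assuming a ~4 GHz clock, i.e. 0.25 ns per cycle):

250 cycles × 0.25 ns/cycle = 62.5 ns per data cache miss
one 60 FPS frame = 16.7 ms ≈ 16,700,000 ns ≈ 267,000 such miss latencies

So no individual miss is visible as stutter; only the aggregate count drags the average frame rate down.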
Posted on Reply
#113
bug
geon2k2: There are, for sure, but in general these parts are not used for gaming, so this whole lower 1080p gaming performance might not be an issue at all. At least not now. We will discuss this once more when the R5/R3 parts arrive.
Quite frankly, I'm not worried about FHD gaming at all (I game at 1920x1200 atm). In a couple of years I hope 4K will become much more affordable. What's giving me pause is that AMD has only matched an architecture that has been stagnant for years. AMD themselves said Zen is their workhorse for the next four years. And if Intel comes up with something by then (which they probably will), AMD may not get a chance to cash in properly. Then again, I was never that good at predicting things ;)
Posted on Reply
#114
rruff
bug: And if Intel comes up with something by then (which they probably will), AMD may not get a chance to cash in properly. Then again, I was never that good at predicting things ;)
On the other hand, if Intel doesn't have some secret weapon, I think AMD stands to gain ~100% in market share by 2018 (which would put them at ~35%) on the back of Ryzen and its refinements. This is *so* much better than Bulldozer. Ryzen may be down a little on raw speed, but its power efficiency is very good, which bodes well for laptops and servers.
Posted on Reply
#115
BiggieShady
efikkan: So since the FPS is "stable", there are no branch mispredictions and cache misses? I'm sorry, but you clearly don't know at what scale these things happen. We are not talking about single stalls causing milliseconds of latency, known as stutter; we are talking about clock cycles, which are on the ns scale, and since there are many thousands of them every second, they add up to a steady performance drop rather than noticeable stutter. A single local branch misprediction causes ~20 clocks of idle; a non-local one adds a cache miss as well (a code cache miss), so +~250 clocks. A data cache miss is ~250 clocks on modern CPUs.
Of course I'm not talking about a single cache miss ... rather about max frame times ... with each cache miss at 62.5 ns, an excess of cache misses in one frame compared to the previous one would show up as a much bigger variation in maximum frame time ... you say it adds up to a steady performance drop, but I say it should affect measured frame time variations in a non-steady manner.
Posted on Reply
#116
Patriot
londiste: the disadvantage does not melt away at higher resolution due to anything to do with the CPU; higher resolutions simply bring the GPU limit quite a bit lower.
with the 1080 Ti (and hopefully Vega) out soon, Titan XP-level performance will be more accessible than ever. That performance level is the same at 1440p as GTX 1080 performance is at 1080p.
Not necessarily true... There are actually quite a few 8- to 12-thread games on the market.
When there is a falloff in framerate on the Intel side and the AMD side stays flat at higher res... that shows a CPU bottleneck plain and clear. If the gap does anything other than stay constant... the difference is more than the GPU.
Posted on Reply
#117
EarthDog
@Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe is 'quite a few'...
Posted on Reply
#118
yoyo2004
EarthDog: @Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe is 'quite a few'...
Rise of the Tomb Raider uses all my 8 cores/threads...
Posted on Reply
#119
Patriot
EarthDog: @Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe is 'quite a few'...
The last two Tomb Raiders, CryEngine games, Frostbite engine games (BC2 used 9 threads, BF3 up to 12 threads... BF4, Battlefront, BF1... and whatever else uses it).
GTA5; Sniper Elite is a showcase of it...

Idk, how many AAA titles does it take? I'm sure there are more... and as DX12 and Vulkan become more prevalent, I'm sure that is the trend.

Point stands... If the gap does anything other than stay constant when you change resolutions... the difference is more than the GPU.

Even in games that just hit 4-6 threads hard... having spare threads means anything that hiccups in the background doesn't hurt you.
Posted on Reply
#120
XiGMAKiD
Well, there's hope that AMD's push to bring much more affordable multicore to the masses could result in developers making prettier, higher-performance games, even though it won't be easy; much like AMD's push for lower-level APIs with Mantle and, more recently, Vulkan/DX12.
Posted on Reply
#121
Patriot
XiGMAKiD: Well, there's hope that AMD's push to bring much more affordable multicore to the masses could result in developers making prettier, higher-performance games, even though it won't be easy; much like AMD's push for lower-level APIs with Mantle and, more recently, Vulkan/DX12.
They are also pushing 1000+ dev units out... they are giving away Ryzen to game devs...
Posted on Reply
#122
XiGMAKiD
Patriot: They are also pushing 1000+ dev units out... they are giving away Ryzen to game devs...
Well that's a good start
Posted on Reply
#123
akumod77
Why not compare a Ryzen against the i7-7700K at the same clock speed, memory timings, and core/thread count?

For example, since Ryzen won't OC much, clock them both at 3.9-4.1 GHz, 4c/8t. I know we'd be gimping the i7-7700K, but I'm just curious what the results of an "almost the same" setup would be. Gaming & productivity benches needed.

Posted on Reply
#125
efikkan
BiggieShady: Of course I'm not talking about a single cache miss ... rather about max frame times ... with each cache miss at 62.5 ns, an excess of cache misses in one frame compared to the previous one would show up as a much bigger variation in maximum frame time ... you say it adds up to a steady performance drop, but I say it should affect measured frame time variations in a non-steady manner.
You still don't understand the time scale here.
Fluctuations around 1-2 ms are very noticeable, and I would claim anything below ~0.2 ms is hard to notice.
For comparison, 0.2 ms = 200 μs = 200,000 ns.
Patriot: Not necessarily true... There are actually quite a few 8- to 12-thread games on the market.
When there is a falloff in framerate on the Intel side and the AMD side stays flat at higher res... that shows a CPU bottleneck plain and clear. If the gap does anything other than stay constant... the difference is more than the GPU.
XiGMAKiD: Well, there's hope that AMD's push to bring much more affordable multicore to the masses could result in developers making prettier, higher-performance games, even though it won't be easy; much like AMD's push for lower-level APIs with Mantle and, more recently, Vulkan/DX12.
For both of you:
Multithreading in games mainly comes down to freeing up the rendering thread to work undisturbed building a queue. Granted, Direct3D 12 allows you to use multiple threads to build a single queue, but there's really not much point to it: having several threads querying the driver this way creates a number of synchronization issues, so the gains will be minimal. The gains from multiple threads will therefore mostly be limited to one thread per queue, and since most games use 1-2 queues for most of the load, there is not huge potential here. It's not like we can just throw four threads at it and scale nicely.

If a game is bottlenecked by the CPU, the cause is usually the computations done between API calls. So, e.g., precalculating animations on a different thread can help a bit, but of course it mostly comes down to the code structure of the game engine. This is why I started by mentioning "freeing up the rendering thread".
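A minimal sketch of that idea (Scene, update_animations and submit_draw_calls are hypothetical stand-ins for engine code; a real engine would double-buffer the data the two threads touch, as done here with two Scene objects):

#include <future>

struct Scene { /* entities, transforms, animation state ... */ };

// hypothetical engine hooks, stubbed so the sketch compiles
void update_animations(Scene&) { /* skinning, IK, particle simulation ... */ }
void submit_draw_calls(const Scene&) { /* build and submit the API command queue */ }

void frame(Scene& current, Scene& next) {
    // worker thread: CPU-heavy precalculation for the next frame
    auto anim = std::async(std::launch::async, [&] { update_animations(next); });

    submit_draw_calls(current);  // rendering thread works undisturbed on its queue
    anim.wait();                 // join before the frames are swapped
}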
Patriot: They are also pushing 1000+ dev units out... they are giving away Ryzen to game devs...
Too little, too late…
This is all about PR; sending out some dev kits is not going to make developers rewrite their games overnight. In ~99% of cases, reducing the bloat would require a major rewrite, which is not something that can be done in ten hours or so.
Posted on Reply