
Intel Gen12 Xe iGPU Could Match AMD's Vega-based iGPUs

We really don't know if RDNA was made with mobile in mind, or at least I don't. Porting RDNA to mobile may not bring many benefits if it wasn't tailored for mobile platforms in the first place.
Wait, what? A GPU architecture is a GPU architecture, and AMD builds all of theirs to be modular and scalable (which, frankly, all GPU architectures are to some extent due to the parallel nature of the workload). The only criterion for it being "built for mobile" or not is efficiency, where RDNA clobbers GCN - a 40 CU 5700 XT at ~220W matches or beats a 60 CU Radeon VII at ~275W on the same node, after all, and that's the least efficient implementation of RDNA. AMD has specifically said that RDNA is their GPU architecture (singular) for the coming decade, so GCN is going the way of the dodo in all markets - it's just that iGPUs generally lag a bit architecturally (having to combine multiple architectures makes it more of a challenge to get everything to line up properly, leading to delays). Of course they also have CDNA for compute accelerators, but those aren't technically GPUs. And RDNA dGPUs all have GDDR6, which is a significant bandwidth advantage compared to any laptop or desktop APU platform, but that advantage isn't any bigger than in the DDR3/GDDR5 era - and arguably current/upcoming LPDDR4X designs are much more usable than anything previous. I would be shocked if next-gen APUs didn't use some version of RDNA, as there is absolutely no reason for them not to implement it at this point.
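
To put that efficiency gap in crude numbers: assuming roughly equal gaming performance (which is roughly what reviews show) and taking the rated board powers at face value, a quick back-of-the-envelope comparison looks like this - the figures here are the ones quoted above, not fresh measurements:

```python
# Back-of-the-envelope perf/W: RX 5700 XT (RDNA, 40 CU) vs Radeon VII (GCN, 60 CU).
# Assumption: roughly equal gaming performance, per the comparison above.
perf_5700xt, power_5700xt = 1.00, 220   # relative performance, rated board power (W)
perf_vii,    power_vii    = 1.00, 275

advantage = (perf_5700xt / power_5700xt) / (perf_vii / power_vii) - 1
print(f"RDNA perf/W advantage at equal performance: ~{advantage:.0%}")  # ~25%
# And that is with 40 CUs doing the work of 60, on the same 7nm node,
# before any mobile-specific power tuning.
```
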
Yeah, I don't remember seeing any major reviews publish their numbers with LPDDR4X @ 4266 MHz; the vast majority of reviews you are seeing have 2666 or 3200 MHz regular DDR4 & yet they smash every other Intel IGP out there & nearly match or beat the MX250 ~ in that sense there are still plenty of performance gains to be had. Remember, at CES we didn't have final retail versions of laptops nor fine-tuned drivers to make the IGP shine. I'd say (IGP) Vega is still king of the hill for about a year or so!
Have there been any proper reviews of the U-series at all? I've only seen leaked ones (that I trust to a reasonable extent, particularly that Notebookcheck Lenovo leak, though we don't know all the details of the configurations for those laptops), and otherwise there are H-series reviews with DDR4-3200 like the Asus G14. And as AnandTech has shown, that implementation roughly sits between the desktop 3200G and 3400G with DDR4-2933 (slightly closer to the 3400G on average), and soundly beats the 3500U (15W Picasso Vega 8) with DDR4-2400. Of course this is a 35W chip in a chassis with significant cooling capacity, so 15W versions might perform worse, but they could make up for that or even beat it if they have LPDDR4X - all iGPUs are starved for bandwidth, after all. Also, at least according to that Notebookcheck leak, the (possibly 25W-configured) 4800U with LPDDR4X consistently beats the MX 250 and 330, while lagging about 5% behind the MX 350.
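
For context on the bandwidth side, the peak theoretical numbers for a standard 128-bit (dual-channel) interface work out as follows; these are spec-sheet maximums shared with the CPU cores, not real-world figures:

```python
# Peak theoretical bandwidth over a 128-bit (dual-channel) memory interface.
# GB/s = transfer rate (MT/s) * bus width (bits) / 8 bits-per-byte / 1000
def peak_bandwidth_gbps(mt_per_s, bus_bits=128):
    return mt_per_s * bus_bits / 8 / 1000

for name, rate in {
    "DDR4-2400": 2400,      # worst case for Picasso laptops
    "DDR4-2933": 2933,      # AnandTech's desktop comparison point
    "DDR4-3200": 3200,      # typical Renoir H-series config
    "LPDDR4X-4266": 4266,   # best case for thin-and-light Renoir
}.items():
    print(f"{name:>13}: {peak_bandwidth_gbps(rate):.1f} GB/s")
# 38.4 / 46.9 / 51.2 / 68.3 GB/s respectively - LPDDR4X-4266 is ~33% more
# than DDR4-3200, which matters a lot when every CU is fighting for bytes.
```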

But as this news post says, it's possible that Tiger Lake Xe iGPUs can match it - though frankly I doubt that given Intel's driver track record. They have often managed to get close to AMD iGPUs with Iris Plus SKUs in synthetics like 3DMark, yet have consistently lagged far, far behind in real-world gaming. I expect a push for better and more frequently updated drivers with Xe, but it'll take time to get them out of the gutter. And by then, RDNA APUs will be here.
 
More players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
 
More players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
Huawei and Microsoft are heavily stirring up the status quo with their ARM push. Huawei has made a decent Taishan ARM server CPU, while Microsoft is porting more and more to ARM.

I wonder how long x86 has left outside gaming and workstation segments?
 
Given that AMD has simply rehashed the same design since APUs began, it makes sense that Intel can catch them. The core problem is the shared VRAM. AMD solves for it in consoles but does nothing for the PC market. All Intel needs to do is solve for that and they can beat any APU, which they have shown a willingness to do with their Iris platform.
It depends on which Vega you mean; the Vega cores in the latest Ryzen Mobile 4000 series are much faster than the previous ones.
 
It depends on which Vega you mean; the Vega cores in the latest Ryzen Mobile 4000 series are much faster than the previous ones.
Faster by how much, and under what thermal envelope and workloads?
 
Huawei and Microsoft are heavily stirring up the status quo with their ARM push. Huawei has made a decent Taishan ARM server CPU, while Microsoft is porting more and more to ARM.

I wonder how long x86 has left outside gaming and workstation segments?

If Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
 
If Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.

Well spoken. I also think history by now has pointed out to us that one does not exclude the other.

That goes for gaming. It goes for x86 / ARM. The market has become so all encompassing, there IS no one size fits all. It also echoes in MS's Windows RT attempt, for example. People want Windows for specific reasons. Windows phone..., same fate. And even within Windows x86, the cross compatibility just doesn't happen.
 
If Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
Yeah, I also forgot to mention Apple, but you didn't. Thanks for the reminder. :)

Microsoft is missing a huge chunk of the mobile and wearables pie. ARM is their re-entry trajectory for these markets, so I really don't think they have abandoned ARM.
 
Faster by how much, and under what thermal envelope and workloads?
AMD claims around 59% increased perf/CU for Renoir over Picasso. I haven't seen any detailed reviews yet doing like-for-like comparisons, but leaks and preliminary data suggest it's not far off at least. But again, a significant part of this is due to faster RAM. The best case scenario for Picasso was DDR4-2400, and now pretty much the worst case scenario is DDR4-3200 with LPDDR4X-4266 being a shoo-in for anything thin and light. That'll be an immense boost for the 15W SKUs (and especially the ones configured to 25W).
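
As a rough sanity check of what that 59% figure would imply, taking AMD's marketing number at face value (the CU counts here are just common configurations, e.g. Vega 10 in the 3700U and Vega 8 in the 4800U):

```python
# What AMD's claimed ~59% perf-per-CU uplift (Renoir vs Picasso) would imply,
# taking the marketing figure at face value. Illustrative, not a measurement.
picasso_cus   = 10      # e.g. Vega 10 in the Ryzen 7 3700U
renoir_cus    = 8       # e.g. Vega 8 in the Ryzen 7 4800U
per_cu_uplift = 1.59    # AMD's claimed per-CU gain

equivalent = renoir_cus * per_cu_uplift
print(f"Renoir Vega {renoir_cus} ~= a hypothetical Picasso Vega {equivalent:.1f}")  # ~12.7
# ...and that is before the bandwidth jump from DDR4-2400 to LPDDR4X-4266,
# which is where much of the real-world gain will come from.
```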
 
More players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
Nvidia tried their hand at ARM SoCs back in the early 2010s, and bowed out due to stiff competition and small margins. While they still make them for their automotive customers (... and Nintendo, though that design is ancient by now), they aren't likely to return to custom ARM chips for consumer or enterprise use any time soon - designing the chips is too expensive and difficult and competition against entrenched players with years of experience is likely too much to take on (though I could see them buying an ARM server vendor if that suited their long term goals). And of course they don't have (and are never getting) an X86 licence (why would Intel grant them one?), so that door is shut.

If Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
While I tend to mostly agree with you, Windows on ARM has promise simply due to the emulated compatibility layer (and the reportedly upcoming expansion of it to 64-bit). That would make thin-and-light ARM Windows laptops pretty great if the performance was good enough. Of course Qualcomm and the others are still miles behind Apple in this regard, so the ideal combo there would be an iPad or ARM MacBook running WoA :p
 
Yeah, I also forgot to mention Apple, but you didn't. Thanks for the reminder. :)

Microsoft is missing a huge chunk of the mobile and wearables pie. ARM is their re-entry trajectory for these markets, so I really don't think they have abandoned ARM.
The thing is, they tried to have a presence in that market and completely fumbled away their progress. Nokia made some great phones, and WP8 was decent; WP8.1 was the pinnacle of MS’s mobile efforts. W10M was an utter disaster. As was MS buying Nokia. As was Windows RT. I know, because I was one of those heavily invested in MS’s mobile consumer push—I had purchased several Nokia WPs, both Surface RT and Surface 2 (even a Surface 3 non-pro), and I even tried the MS Band 2. The pre-MS Lumias were great. The Lumia 950 was literally a hot mess—mine got blazing hot doing absolutely nothing. Band 2 was a great idea, but the thing was so poorly made that it fell apart inside 3 months, and its warranty replacement did too.

I was all-in with MS, but I’ve had so many bad experiences with their hardware that I’ve vowed to never buy anything with their name on it that isn’t a mouse or keyboard. I’ll use their OS and Office, but that’s it–they’ve shown no real commitment to anything else. I don’t even trust Surface. If you look at that brand’s track record, few devices have really been successful. Their App Store is a joke too. The few apps I’ve purchased or tried from there won’t even install correctly or run after the fact. MS can’t even master what other software companies have managed to do–install software on Windows!
 
The thing is, they tried to have a presence in that market and completely fumbled away their progress. Nokia made some great phones, and WP8 was decent; WP8.1 was the pinnacle of MS’s mobile efforts. W10M was an utter disaster. As was MS buying Nokia. As was Windows RT. I know, because I was one of those heavily invested in MS’s mobile consumer push—I had purchased several Nokia WPs, both Surface RT and Surface 2 (even a Surface 3 non-pro), and I even tried the MS Band 2. The pre-MS Lumias were great. The Lumia 950 was literally a hot mess—mine got blazing hot doing absolutely nothing. Band 2 was a great idea, but the thing was so poorly made that it fell apart inside 3 months, and its warranty replacement did too.

I was all-in with MS, but I’ve had so many bad experiences with their hardware that I’ve vowed to never buy anything with their name on it that isn’t a mouse or keyboard. I’ll use their OS and Office, but that’s it–they’ve shown no real commitment to anything else. I don’t even trust Surface. If you look at that brand’s track record, few devices have really been successful. Their App Store is a joke too. The few apps I’ve purchased or tried from there won’t even install correctly or run after the fact. MS can’t even master what other software companies have managed to do–install software on Windows!
All you say is true. I still remember my Lumia 1020, its godly camera and buggy microphones...:rolleyes:

However, Microsoft failing a few times doesn't mean they will also fail the next time they give it a try. Let's face it, the future is all mobile and MS has zero presence in mobile, so 2+2=4? It's only a matter of time until we see their next attempt at it.
 
All you say is true. I still remember my Lumia 1020, its godly camera and buggy microphones...:rolleyes:

However, Microsoft failing a few times doesn't mean they will also fail the next time they give it a try. Let's face it, the future is all mobile and MS has zero presence in mobile, so 2+2=4? It's only a matter of time until we see their next attempt at it.
I still have my Lumia 950 and still love the phone. It's just that the app availability is :banghead:

Their Android apps, like Office and the Edge browser, are pretty good IMO.
 
All Intel needs to do is solve for that and they can beat any APU, which they have shown a willingness to do with their Iris platform.
Easier said than done when you need your eSRAM special just to meet the challenge at mainstream settings.

There is nothing to "beat"; you simply need faster memory, and at this point that means DDR5. AMD had the closest thing to a solution with the HBC thing but that never made its way to APUs.
Total disagreement. Logic dictates AMD ought to have a full house with the advent of compute graphics, however Nvidia still holds together with grace. Numbers aren't the issue, it is internal bandwidth and Nvidia knows it best to adopt mobile rasterization for this purpose.
One could say, yeah it is just a bunch of 3D stages separated nicely into performance numbers, however that would overlook the runtime of data in flight. It is the architecture that makes bandwidth possible. AMD has its shot, not by the dint of its memory, but heterogeneous memory address space. If they can keep addressing to a minimum with cpu serving the scalar graphics pipeline.
 
Total disagreement. Logic dictates AMD ought to have a full house with the advent of compute graphics, however Nvidia still holds together with grace. Numbers aren't the issue, it is internal bandwidth and Nvidia knows it best to adopt mobile rasterization for this purpose.
One could say, yeah it is just a bunch of 3D stages separated nicely into performance numbers, however that would overlook the runtime of data in flight. It is the architecture that makes bandwidth possible. AMD has its shot, not by the dint of its memory, but heterogeneous memory address space. If they can keep addressing to a minimum with cpu serving the scalar graphics pipeline.

I read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
 
I read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
Fury X has bandwidth, too. The difference of external bandwidth is it is not agnostic and only available as a benchmark. Not internal bandwidth, though. It is truly agnostic.
 
Fury X has bandwidth, too. The difference of external bandwidth is it is not agnostic and only available as a benchmark. Not internal bandwidth, though. It is truly agnostic.

Again, I have no idea what you are trying to say, there is no such thing as internal or external bandwidth for VRAM, it's just bandwidth, that's it.

A GPU is engineered to function with any amount of memory bandwidth available but to operate optimally with a specific minimal level. There is no point in putting faster GPUs in APUs when the memory isn't getting faster and there is no going around that, it's a hard limit.

This is probably the last time I respond, as I have no clue what exactly you are arguing against; what you're writing is just borderline incoherent to me.
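
The "hard limit" is essentially the classic roofline model. A minimal sketch with illustrative numbers (a ballpark Vega 8 iGPU for compute, dual-channel DDR4-3200 for bandwidth, neither measured):

```python
# Minimal roofline model: attainable throughput = min(compute peak, bandwidth * AI).
# Numbers are illustrative ballparks, not measurements of any specific chip.
def attainable_gflops(peak_gflops, bandwidth_gbps, flops_per_byte):
    return min(peak_gflops, bandwidth_gbps * flops_per_byte)

peak_gflops    = 1800.0   # ~Vega 8 iGPU at ~1.75 GHz (8 CU * 64 lanes * 2 FLOP * clock)
bandwidth_gbps = 51.2     # dual-channel DDR4-3200, shared with the CPU

for ai in (4, 8, 16, 32, 64):
    print(f"{ai:>2} FLOP/byte -> {attainable_gflops(peak_gflops, bandwidth_gbps, ai):6.0f} GFLOP/s")
# Below ~35 FLOP/byte the iGPU is memory-bound: a bigger or faster GPU changes
# nothing, only faster memory does. That is the 'hard limit' in one line.
```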
 
You cannot change the gpu regime using bandwidth as a springboard. The gpu access patterns are the same. It takes 4 cycles to do a full read on anisotropic filtering primer (don't expect articulate nomenclature please). You cannot apply supersampling to leverage the memory bandwidth better, in supposition that since the gpu is big-die and the memory is cutting edge hbm that you will leverage it to the full extent, doing more work in the same number of cycles. You will not. The rendering depends on rasterization hardware, it is not throughput, well it says it is, but truly it is latency dependent. AF takes multiple bilinear attempts, supersampling takes multiple AF attempts. It is always conformant, never divergent. It is just that in the past, pixel shaders did the work, now the compute does the same. Is it faster? Well, only if you are programming it. It is just a proxy for hardware buffers (forgot the name).
A GPU is engineered to function with any amount of memory bandwidth but to operate optimally with a specific minimal level. There is no point in putting faster GPUs in APUs when the memory isn't getting faster and there is no going around that, it's a hard limit.
Well, if you have a better gpu, you have a more up to date gpu compiler. You can only utilise performance that the compiler can optimize the rasterization pipeline for. If you want all things equal, you have to look at the pipeline. As again, AMD has hardware support to parallelize the pipeline, but it is what it is.

The external bandwidth has data the gpu is unaware of. That is the difference. At full speed, it takes 250MHz to read memory end to end. Every cycle a single module of the GDDR5 system bursts just 4 bytes. It is not online memory like the registers are. Those are crazy.

I guess the correct term is,
it may take up to 768 threads to completely hide latency.
^that was how quick the registers are. 250MHz vs 768Hz.

Plus, GDDR5 is aligned. You get seriously worse performance when you introduce timings. Signals need to phase in.
 
I read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
Sadly that is par for the course with that user's posts - they tend to be word salad of the highest order. I think it's a language barrier thing, but it's further compounded by an outright refusal on their part to even attempt to clarify what they are trying to say. (Maybe they are using a translation service? That would definitely complicate any explanation, though they seem to fundamentally refuse to accept even the slightest suggestion that they have been unclear about anything whatsoever, and seem to assume that any failure of comprehension is entirely due to the reader's lack of knowledge rather than their writing. It's rather fascinating.) Sometimes parts of it make sense, but I've never seen a post of theirs longer than a few sentences make sense as a whole, and typically not even that.

Sadly, that doesn't stop me from trying. You might call this tilting at windmills, but one day - one day! - I want to have a coherent and comprehensible discussion with them.
You cannot change the gpu regime using bandwidth as a springboard. The gpu access patterns are the same. It takes 4 cycles to do a full read on anisotropic filtering primer (don't expect articulate nomenclature please). You cannot apply supersampling to leverage the memory bandwidth better, in supposition that since the gpu is big-die and the memory is cutting edge hbm that you will leverage it to the full extent, doing more work in the same number of cycles. You will not. The rendering depends on rasterization hardware, it is not throughput, well it says it is, but truly it is latency dependent. AF takes multiple bilinear attempts, supersampling takes multiple AF attempts. It is always conformant, never divergent. It is just that in the past, pixel shaders did the work, now the compute does the same. Is it faster? Well, only if you are programming it. It is just a proxy for hardware buffers (forgot the name).

Well, if you have a better gpu, you have a more up to date gpu compiler. You can only utilise performance that the compiler can optimize the rasterization pipeline for. If you want all things equal, you have to look at the pipeline. As again, AMD has hardware support to parallelize the pipeline, but it is what it is.

The external bandwidth has data the gpu is unaware of. That is the difference. At full speed, it takes 250MHz to read memory end to end. Every cycle a single module of the GDDR5 system bursts just 4 bytes. It is not online memory like the registers are. Those are crazy.

I guess the correct term is,
^that was how quick the registers are. 250MHz vs 768Hz.

Plus, GDDR5 is aligned. You get seriously worse performance when you introduce timings. Signals need to phase in.
You misunderstand the issues being raised against you. You are claiming that increasing external memory bandwidth wouldn't help iGPUs because they are limited by internal restrictions. While parts of what you say are true, the whole is not. While there are of course bandwidth limitations to the internal interconnects and data paths of any piece of hardware, these interconnects have massive bandwidth compared to any external memory interface, and these internal pathways are thus rarely a bottleneck. For an iGPU this is especially true as the external memory bandwidth is comparatively tiny. Compounding this is the fact that architecturally the iGPUs are the same as their larger dGPU siblings, meaning they have the same internal characteristics. If what you say was true, then a Vega 64 at the same clocks as a Vega 8 iGPU would perform the same as they would both be limited by internal bandwidth. They obviously don't, and thus aren't.

Beyond this, your post is full of confused terminology and factual errors.

Simple ones first: how is supersampling (a form of anti-aliasing) related to anisotropic filtering (texture filtering)? And how does the computational cost of performing an operation like that become "bandwidth"? What you are describing is various aspects of the processing power of the GPU. Processing of course has its metrics, but bandwidth is not one of them, as bandwidth is a term for data transfer speed and not processing speed (unless used wrongly or metaphorically). Of course this could be relevant through the simple fact that no shader can compute anything without having data to process, which is dependent on external memory. You can't do anisotropic filtering on a texture that isn't available in time. But other than that, what you are saying here doesn't relate much to bandwidth.

Second: the statement "it takes 250MHz to read memory end to end" is meaningless. Hz is a measure of cycles per second. Any amount of memory can be read end to end at any rate of cycles/second if given sufficient time. Do you mean to read a specific amount of memory end to end within a specific time frame over a specific bus width? You need to specify all of these data points for that statement to make sense. Also, the point of memory bandwidth is to be able to deliver lots of data rapidly, but not to read the entire memory end to end - most data in VRAM is unused at any given time. The point of increased memory bandwidth is thus not to be able to deliver the full amount of memory faster, but to be able to keep delivering the necessary amount of data to output a frame at either a higher detail level/resolution at the same rate, or at the same detail level/resolution at a higher rate.
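
To put some very rough numbers on that (ignoring compression, caching and overdraw, and with the texture/geometry multiplier being a pure assumption):

```python
# Crude per-frame memory traffic estimate - illustrative only, ignores DCC,
# caches, overdraw and CPU traffic competing for the same bus.
width, height   = 1920, 1080
bytes_per_pixel = 4        # 32-bit colour target
fps             = 60
read_multiplier = 20       # assumed texture/geometry/buffer reads per pixel written

framebuffer_gbps = width * height * bytes_per_pixel * fps / 1e9
total_gbps       = framebuffer_gbps * (1 + read_multiplier)

print(f"Framebuffer writes alone:          {framebuffer_gbps:.2f} GB/s")   # ~0.5 GB/s
print(f"With assumed texture/buffer reads: ~{total_gbps:.0f} GB/s")        # ~10 GB/s
# The point: what matters is sustaining the working set for each frame,
# not reading the whole of VRAM 'end to end'.
```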

Also, how do 768 threads become 768Hz? A thread is not a cycle. 768 threads means 768 parallel (or sequential, though that would be rare for a GPU) threads at the given speed, for however long the threads are running. The statement you quoted seems to be saying that at a given speed (I assume this is provided in your source) 768 threads would be needed to overcome the latency of the system as compared to the X number of threads (again, I assume provided in your source) the current system actually has. (Btw, where is that quote from? No source provided, and not enough context to know what you're quoting.) The quote certainly doesn't seem to say what you mean it to say.
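
For what it's worth, the relationship that quote seems to be gesturing at is just Little's law (concurrency = latency x issue rate). Here is a minimal sketch where the two inputs are picked purely so the product lands on the quoted 768 - they aren't from any particular GPU spec:

```python
# Little's law applied to latency hiding: in-flight work = latency * issue rate.
# Both inputs below are assumptions chosen to reproduce the quoted '768 threads'.
memory_latency_cycles = 384   # assumed round-trip DRAM latency, in GPU cycles
requests_per_cycle    = 2     # assumed memory requests issued per cycle

threads_needed = memory_latency_cycles * requests_per_cycle
print(f"Threads needed in flight to hide latency: {threads_needed}")  # 768
# 768 here is a count of parallel work items, not a frequency - there is no
# sensible way to turn '768 threads' into '768 Hz'.
```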
 