That totally depends on how Windows intends to detect which threads can be moved to the slow cores. If, for instance, these cores are only used for low-priority background threads, then the implications will be minor, but so will the efficiency gains. There are usually thousands of these threads, but they only add up to a few percent of a single core's load on average.
The issue isn't just the load: having a lot of context switches for those threads is slow in itself, and again, it can cause the CPU to evict cache data that is important to the game from L1/L2 (or maybe L3) into the higher levels of the hierarchy. In the end, it can cause big latency hits for games.
But in any user application, and especially in games, all threads will have medium or high priority, even if the load is low. So using statistics alone to determine where to run a thread risks causing serious latency.
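To illustrate that point about priorities: games typically raise the priority of their latency-sensitive threads explicitly, so priority alone gives the scheduler little to differentiate on. A minimal sketch of what that looks like on Windows (the workloads here are made-up placeholders, not any real engine's code):

```cpp
// Minimal sketch (Win32): a game raising the priority of its latency-critical
// threads. Both threads end up "high priority", so a scheduler that looks only
// at priority cannot tell which of them would tolerate being parked on a slow core.
#include <windows.h>
#include <thread>

// Placeholder workloads; a real engine would do per-frame rendering / audio mixing here.
void renderLoop() { /* ... per-frame work ... */ }
void audioMixer() { /* ... audio mixing ... */ }

int main() {
    std::thread render([] {
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
        renderLoop();
    });
    std::thread audio([] {
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
        audioMixer();
    });
    render.join();
    audio.join();
    return 0;
}
```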
Most user tasks aren't highly demanding either.
Most things in a game are synchronized, some with the game simulation, some with rendering, etc. If one thread is causing delays, it will have cascading effects, ultimately causing increased frame times (stutter) or, even worse, delays to the game tick, which may cause game-breaking bugs.
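As a rough illustration of that cascading effect (the job names, durations and the 60 FPS budget are just assumptions for the sketch): in a fork/join frame loop, the frame cannot present until every worker has finished, so the slowest worker sets the frame time.

```cpp
// Sketch of a fork/join frame loop: one worker landing on a slow core
// stretches the whole frame, because the join waits for the slowest job.
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>
#include <vector>

void simulate()  { std::this_thread::sleep_for(std::chrono::milliseconds(4)); }
void animate()   { std::this_thread::sleep_for(std::chrono::milliseconds(3)); }
void cullScene() { std::this_thread::sleep_for(std::chrono::milliseconds(14)); } // "ran on a slow core"

int main() {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();

    // Fork: independent per-frame jobs.
    std::vector<std::future<void>> jobs;
    jobs.push_back(std::async(std::launch::async, simulate));
    jobs.push_back(std::async(std::launch::async, animate));
    jobs.push_back(std::async(std::launch::async, cullScene));

    // Join: the frame is gated on the slowest job.
    for (auto& j : jobs) j.wait();

    auto ms = std::chrono::duration<double, std::milli>(clock::now() - start).count();
    std::printf("frame time: %.1f ms (budget at 60 FPS: 16.7 ms)\n", ms);
    return 0;
}
```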
That's really only the game logic/simulation side of things. Yes, those parts could cause issues if they fall out of sync, but many other parts of a game aren't synchronized at all.
Networking may or may not be a big deal; at the very least it risks higher latencies.
The physical-medium latency of networking is far higher than any extra latency the little cores would likely add.
Graphics are super sensitive. Most people will be able to spot small fluctuations in frame times.
Frame-time fluctuations aren't only CPU-dependent; they can be a GPU issue too.
I see fairly little pop-in on my screens.

Asset popping is mostly a result of "poor" engine design, since many engines rely on feedback from the GPU to determine which higher-detail textures or meshes to load, which will inevitably lead to several frames of latency. This is of course more noticeable if asset loading is slower, but it's still there no matter how fast your SSD and CPU may be. The only proper way to solve it is to pre-cache assets, which a well-tailored engine can easily do, but which the GPU will not be able to predict.
Still not synchronized. The main game loop won't have a mutex/semaphore waiting for assets to load.
Volatile memory is a finite resource, and there's far less of it than HDD/SSD capacity; even a 'well tailored engine' might not have all the assets it needs cached.
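For what it's worth, both points can be true at once: a streaming engine typically polls for finished loads instead of blocking on them, and draws a lower-detail asset until the full-detail one is resident. A simplified sketch (the types and the loader are hypothetical, not from any particular engine):

```cpp
// Simplified sketch of non-blocking asset streaming: the frame never waits on
// a load; it just uses whatever detail level is resident right now.
#include <chrono>
#include <cstdio>
#include <future>
#include <string>
#include <thread>

struct Texture { std::string name; int detail; };

// Pretend disk/decompress work running on a background thread.
Texture loadHighDetail(const std::string& name) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return {name, /*detail=*/2048};
}

int main() {
    Texture lowDetail{"rock", 256};  // always resident
    auto pending = std::async(std::launch::async, loadHighDetail, std::string("rock"));
    Texture resident = lowDetail;

    for (int frame = 0; frame < 10; ++frame) {
        // Poll, never block: swap in the high-detail version once it's ready.
        if (pending.valid() &&
            pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            resident = pending.get();
        }
        std::printf("frame %d draws '%s' at %d px\n", frame, resident.name.c_str(), resident.detail);
        std::this_thread::sleep_for(std::chrono::milliseconds(16)); // fake frame
    }
    return 0;
}
```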
Don't forget that most apps on Android are laggy anyway, and it's impossible for the end user to know what causes the individual cases of stutter or misinterpreted user input.
So I wouldn't say that this is a good case study that hybrid CPUs work well.
I'm not talking about Android specifically. Android is the worst-case scenario, since the architecture is really made to support a vast array of devices, with each OEM providing the HALs needed to support them. For example, audio used to have unacceptably high latency because of those abstractions; just check this article:
https://superpowered.com/androidaudiopathlatency
Linux, however, is doing a good job at it. It has energy-aware scheduling and quite a few related features,
or patches like this
Heuristics help the average case but do little for the worst case, and the worst case is usually what causes latency.
It all depends. If the scheduler has a flag that says 'this task cannot be put on a little core', then it shouldn't hit that worst-case scenario. That was of course just an example; we don't know exactly how the scheduler, or the hardware Intel will put into the chip to assist it, actually works.
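We don't know how Intel's hardware assistance will expose such a flag, but as a sketch of what an application-side hint can already look like on Windows today (the core mask is purely illustrative, not how Alder Lake will number its cores):

```cpp
// Sketch (Win32): pinning a latency-critical thread to a set of cores so the
// scheduler cannot migrate it elsewhere. The mask below (logical processors 0-7)
// is hypothetical; real code would query the topology instead of hard-coding it.
#include <windows.h>
#include <cstdio>

int main() {
    // Allow this thread to run only on logical processors 0..7
    // (hypothetically, the "big" cores).
    DWORD_PTR bigCoreMask = 0xFF;
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), bigCoreMask);
    if (previous == 0) {
        std::printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    std::printf("thread restricted to mask 0x%llx (was 0x%llx)\n",
                (unsigned long long)bigCoreMask, (unsigned long long)previous);
    // ... latency-critical work here ...
    return 0;
}
```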
There is a key detail that you are missing. Even if the small cores have IPC comparable to Skylake, it's important to understand that IPC does not equate to performance, especially when multiple cores may be sharing resources. If they are sharing L2, then the real-world impact will vary a lot, especially since L2 is very closely tied to the pipeline, so any delay there is far more costly than a delay in, e.g., L3.
Of course a delay in L2 is far more costly than one in L3; they have quite different latencies. And no, L2 generally isn't that tightly tied to the pipeline; L1I and L1D are the caches tied to it.
Now, about sharing resources: yes, that's true. But also keep in mind that the L2 is very big, at 2 MB per 4 Gracemont cores, which is more than Skylake (and its refreshes like Comet Lake) had per core, namely 256 KB/core. I find it unlikely that this would cause an issue: even assuming all 4 cores are competing for it, they get 512 KB each. The worrying part isn't resource starvation, since there is more than enough to feed 4 little cores; it's how a big L2 slice like that behaves in terms of latency. Its latency will obviously be considerably higher than that of a 512 KB L2, but then again, these aren't high-performance cores, so it might not end up being noticeable.
Another thing that takes a little strain off the L2 is that each little core has 96 KB of L1: 64 KB of L1I and 32 KB of L1D.
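Spelling the arithmetic out (these are just the figures quoted above, not confirmed specs):

```cpp
// Quick arithmetic on the cache figures quoted above: worst-case per-core L2
// share for a Gracemont cluster vs. Skylake's private L2.
#include <cstdio>

int main() {
    const int gracemontL2KB   = 2048; // 2 MB shared per 4-core cluster
    const int coresPerCluster = 4;
    const int skylakeL2KB     = 256;  // private L2 per Skylake core

    std::printf("Gracemont L2 per core (all 4 competing): %d KB\n",
                gracemontL2KB / coresPerCluster);
    std::printf("Skylake private L2 per core: %d KB\n", skylakeL2KB);
    std::printf("Gracemont L1 per core: 64 KB L1I + 32 KB L1D = 96 KB\n");
    return 0;
}
```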
Anyway, we can't know for sure until Intel releases Alder Lake; until then this is all unsupported speculation. It also depends on how good 10 nm ESF ends up being and how they clock those little cores. Knowing Intel, they might clock them pretty aggressively for desktop parts, so 3.6 GHz or more has a real chance of happening, or they could just clock them like Tremont and keep them at 3.3 GHz. Nobody knows.