I haven't said a single thing about cores, only about performance. You also failed to post the very next line of the article:
It's also easier to dumb down system requirements to core count, because it's a quick way to dismiss a wide range of CPUs. For example, games no longer run properly, or at all, on dual-core CPUs, so in that sense you require at minimum a quad-core to game. Having said that, most modern and demanding games don't run well on quad-cores, even if they support SMT (simultaneous multi-threading). That sounds like I'm contradicting my own argument right off the bat, but once again, it's first and foremost about overall CPU performance.
Yes, because a minimum core count is nowadays the prerequisite before you can even talk about overall performance. Without it, games won't run well at all. You could theoretically condense the power of a quad-core into a single-core chip, but it would still fail for a very basic reason: games today are written for multi-core processors and expect at least a certain number of independent cores. If you tried to run multi-core-optimised code on a single core, its pipeline would stall constantly, because much of the code depends on results produced elsewhere, so you could never keep that theoretical chip's pipeline full. That matters a lot, because a full pipeline means more instructions executed per cycle; feed it multi-core-optimised code and you basically starve it. The alternative is trying to do four cores' worth of work in one pipeline, which simply doesn't fit. Even if that code could enter the pipeline quickly, you would need the ALUs, FPUs and other execution logic quadrupled, and in the end you'd have built a single core full of internal bottlenecks. Your 1% lows would be terrible and the framerate would swing wildly during play. That's why we benefited so much from having more than one core: it's not just about total processing power, it's also about a more efficient overall chip design.

Obviously, some compute tasks don't scale well across cores, but most code can and does, just not to infinity (mostly due to the nature of MIMD code, i.e. multiple instructions, multiple data). That's also why Intel switched to P and E cores: the split works for poorly scaling code and for well-scaling code alike, whether the poorly scaling code needs many complex instructions or is light on instructions but simply waits on the results of previous work. Again, not to infinity, because then you end up with a GPU, in other words a powerful SIMD processor: one that scales to many cores very easily but only processes one, or very few, different instructions at a time. That's basically what much of the 3D model data in games is: instruction-light vector data, mostly single-precision floating point.
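If you want to see what I mean in code form, here's a toy C++ sketch (my own illustration, nothing from the article, and all sizes are made up): the first function is a pure dependency chain where every step has to wait for the previous result, so extra cores can't help, while the second splits independent work across threads, which is the MIMD-style pattern modern games are built around.

```cpp
// Toy example, not from the article: a serial dependency chain vs. independent
// work split across threads. All sizes and constants here are made up.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Every iteration needs the previous result, so extra cores (and even a deep
// pipeline) can't do much: the chain executes one step after another.
long long dependent_chain(const std::vector<int>& data) {
    long long acc = 1;
    for (int v : data)
        acc = acc * 31 + v;   // must finish before the next step can start
    return acc;
}

// Independent partial sums: each thread owns its own slice, so the work scales
// with core count -- the kind of code games are written for today.
long long parallel_sum(const std::vector<int>& data, unsigned threads) {
    std::vector<long long> partial(threads, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / threads;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = (t + 1 == threads) ? data.size() : begin + chunk;
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

int main() {
    std::vector<int> data(1 << 22, 1);
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::cout << dependent_chain(data) << ' ' << parallel_sum(data, cores) << '\n';
}
```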
Anyway, it's very simple: you can only talk about overall performance once you have enough cores to begin with, and even then you shouldn't expect performance to scale with core count (some games can utilize a lot of cores, others can't). So as long as you have enough cores, or to state it more precisely, what you should care about is the performance of each core the game can actually utilize, not of every core on the chip.
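To put that another way, here's a hypothetical sketch of an engine with a fixed job-thread budget (the number 4 is made up, not from any real engine): on a chip with more cores than that, the extras simply idle, so how fast the used cores are is what decides the result.

```cpp
// Hypothetical sketch: an engine with a fixed job-thread budget (JOB_THREADS is
// a made-up number). On a chip with more cores than this, the extra cores just
// sit idle for this workload, so per-core speed is what matters.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

constexpr unsigned JOB_THREADS = 4;

int main() {
    std::atomic<long long> work_done{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < JOB_THREADS; ++i)
        workers.emplace_back([&] {
            for (int j = 0; j < 1000000; ++j)
                work_done.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();

    std::cout << "cores reported: " << std::thread::hardware_concurrency()
              << ", job threads used: " << JOB_THREADS
              << ", work done: " << work_done << '\n';
}
```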
So why does Techspot talk about cache? The reason is simple: the CPU core processes data and requests it from RAM, but what actually decides which data gets pulled in ahead of time is the branch predictor (together with the cache). Since the cache is very small (mostly for performance reasons, although cost plays a part too) and can't hold the whole game's code, which lives in RAM, the branch predictor has to guess which code will most likely be needed next. Most of the time it's right, but not always, and because it's right most of the time the cache stays filled with what the CPU actually needs. Some data simply is too big to fit into cache, which costs CPU, cache and RAM cycles and therefore hurts performance, but most code fits and is executed reasonably fast. Obviously a bigger cache raises the chances of the working set fitting, so you get better performance, but designing a big cache is often simply impractical: its physical size increases the cost of the chip, it can also mean lower performance (larger caches are slower to access), and it can crowd out other, more important parts of the CPU. Intel and AMD don't touch the branch predictors across a lineup, every core gets the same ones, but the cache (more precisely the L3 cache) is resized according to core count. So if you have more cores than a task like gaming can use, you still get more free L3 cache per core, which can claw back some of the performance lost on large working sets (or compensate for a branch predictor that makes the CPU fetch more data than it needs). That's why you see somewhat improved performance (in some instances a lot, especially outside of gaming) between chips with different core counts in the same game, even when not all cores are utilized.
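Here's a rough way to see the cache effect yourself (the sizes are my own assumptions, real L3 capacities vary by CPU): the same sequential sum runs over a small working set and a large one, and the per-element cost climbs once the data stops fitting in cache and has to stream from RAM.

```cpp
// Rough sketch, sizes are assumptions: the same sequential sum over a working
// set that fits in L3 vs. one that has to stream from RAM. The per-element
// cost goes up once the data no longer fits in cache.
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

double seconds_per_element(std::vector<int>& v, int passes) {
    volatile long long sink = 0;   // keep the work from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p)
        sink = sink + std::accumulate(v.begin(), v.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / (double(passes) * v.size());
}

int main() {
    std::vector<int> small((4u << 20) / sizeof(int), 1);     // ~4 MB, fits in most L3 caches
    std::vector<int> large((256u << 20) / sizeof(int), 1);   // ~256 MB, nowhere near fitting
    std::cout << "ns/element, small set: " << 1e9 * seconds_per_element(small, 64) << '\n'
              << "ns/element, large set: " << 1e9 * seconds_per_element(large, 1)  << '\n';
}
```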
As the article shows, you can compensate for a lack of L3 cache by running the chip at a higher clock speed. It works, but it doesn't eliminate the bottleneck of having too little cache; you basically waste CPU cycles on empty or underutilized pipelines while waiting on RAM. So if you raise the clock speed, you end up using more electricity to achieve a task that a bigger cache could have handled at a lower power level, not to mention that raising the clock isn't always possible for various reasons.
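A back-of-envelope model makes this clearer (every number in it is a made-up assumption, and real CPUs overlap some misses with other work, so treat it purely as an illustration of the trend): frame time is compute cycles divided by the clock, plus miss stalls that are fixed in nanoseconds, so raising the clock only shrinks the compute half.

```cpp
// Back-of-envelope model, every number is a made-up assumption: frame time =
// compute cycles / clock + cache-miss stalls. The stalls are fixed in
// nanoseconds (RAM latency doesn't speed up with the CPU clock), so raising
// the clock only shrinks the compute part. Real CPUs overlap some of these
// misses with other work; this is only meant to show the trend.
#include <iostream>

int main() {
    const double compute_cycles = 40e6;    // hypothetical work per frame
    const double l3_misses      = 200e3;   // hypothetical L3 misses per frame
    const double ram_latency_ns = 80.0;    // rough DRAM round-trip latency

    const double clocks_ghz[] = {4.0, 5.0};
    for (double ghz : clocks_ghz) {
        double compute_ms = compute_cycles / (ghz * 1e9) * 1e3;
        double stall_ms   = l3_misses * ram_latency_ns * 1e-6;
        std::cout << ghz << " GHz: " << (compute_ms + stall_ms) << " ms/frame ("
                  << compute_ms << " ms compute + " << stall_ms << " ms stalls)\n";
    }
}
```

With these made-up numbers, a 25% clock bump improves frame time by only about 8%, because the stall portion doesn't shrink at all.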
And so we end up with the TL;DR: you need a certain minimum core count (ignoring SMT, which merely helps fill gaps in the pipelines but doesn't behave like a real core or add execution resources), and you'd better have enough L3 cache for the intended task.
BTW, there's even more nuance to performance than this, but it's a reasonably short, somewhat advanced description of what happens inside a CPU while gaming.