What about a new GPU architecture doctrine where raster and ray tracing cores are modular, so designers could add more ray tracing cores without affecting raster performance?
They're radically different kinds of calculations.
If you look at the papers on Turing and Ampere, you will see that INT (integer) functions were added, and the new cores carry those functions because they're built to handle them faster. Efficiency in GPUs is often obtained by reducing the functionality of cores, making them more single-purpose. An example of that is how Nvidia doesn't offer fast high-precision floating point on GeForce; it did on Titan, but eventually killed that too.
But in the history of GPU development there is always a shift back and forth, as new functionality is added, and refined, and at some point becomes a known quantity. There's always a phase where the resources a GPU gets are not perfectly aligned with what games want. That's why there are differences in performance between engines/GPU families/games.
Pascal to Turing: less performance per shader, lower performance per clock too, and lower clocks, but the functionality was expanded. Ampere iterated on that with a further refinement of shader count, and TDP was sacrificed to remain competitive in raster performance.
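To put rough arithmetic on that trade-off, here is a minimal sketch. It assumes raster throughput scales as shader count × work per shader per clock × clock speed, which is a simplification, and every number in it is made up for illustration, not a measurement of any real GPU. The function names are hypothetical too.

```python
def relative_throughput(shaders, work_per_shader_per_clock, clock_ghz):
    """Toy model: throughput = shader count x per-shader work per clock x clock."""
    return shaders * work_per_shader_per_clock * clock_ghz


def shaders_needed(target, work_per_shader_per_clock, clock_ghz):
    """Shader count required to reach `target` throughput in the toy model."""
    return target / (work_per_shader_per_clock * clock_ghz)


# Hypothetical "old" design: fewer, faster, more capable-per-clock shaders.
old = relative_throughput(shaders=2304, work_per_shader_per_clock=1.0, clock_ghz=2.0)

# Hypothetical "new" design: each shader does 25% less raster work per clock
# and runs at a lower clock, so it needs more shaders just to break even.
break_even = shaders_needed(old, work_per_shader_per_clock=0.75, clock_ghz=1.5)

print(old)         # 4608.0
print(break_even)  # 4096.0
```

In this toy model the new design needs roughly 78% more shaders just to match the old one in raster, which is where the extra die area and power budget go.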
Either way, yes, I do agree the ideal situation is one where you don't waste die space on cores that are going to be idle at any point in time. We have yet to see if that is feasible.