Originally Posted by Steevo
That is exactly what I was talking about: fetch and decode, plus a hardware scheduler to push the work to the GPU cores or the CPU cores based on the type of work and load. Even if the GPU is twice as slow at one instruction, if the CPU cores are busy and it's not....
And if the GPU and CPU can read and write to a shared cache, the GPU could execute instructions A, C, F, H, I, J, and M while the CPU runs B, which is dependent on A, and D and E, which are then stored for F to run two iterations of. The checked results are then stored back for the CPU to run G... and so on and so forth.
Take a computer architecture course and you will understand why this is not feasible, not just because of the hardware but because of x86 as well. First of all, your logic fails when you consider the application. Your GPU does not execute x86 instructions; it is told what to do, does it, and hands the result back. The video card doesn't do arbitrary work, either: it applies the same operation to everything in the buffer provided to it.
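To make the "same thing to everything in the buffer" point concrete, here is a rough sketch in plain Python. The names `run_kernel` and `scale_and_bias` are made up for illustration and are not any real GPU API; the loop only stands in for what real hardware would do across many threads at once.

```python
from typing import Callable, List


def run_kernel(kernel: Callable[[float], float], buffer: List[float]) -> List[float]:
    """Apply one operation to every element of a buffer.

    Hypothetical stand-in for a GPU dispatch: the host hands over a buffer and
    a single operation, and gets a buffer of results back. On real hardware
    the per-element work runs in parallel; here it is an ordinary loop.
    """
    return [kernel(x) for x in buffer]


def scale_and_bias(x: float) -> float:
    # The "kernel": identical work for every element, and no element
    # depends on the result of any other element.
    return 2.0 * x + 1.0


result = run_kernel(scale_and_bias, [0.0, 1.0, 2.0, 3.0])
print(result)  # [1.0, 3.0, 5.0, 7.0]
```

Notice that nothing in the kernel looks at a neighbouring element's result. That independence is what lets the work spread across thousands of threads.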
What if instruction B depends on data from instruction A, or F depends on data from instruction E? Between the PCI-E latency and the time it takes for the GPU to execute the instruction, store the result, and send it back over PCI-E, you have just cut your performance tenfold. GPUs aren't designed for procedural code. They're designed to process large amounts of data in a uniform way, and I think you're confused about what a GPU can actually do. GPUs process many kilobytes to many megabytes of data per instruction, not just two operands.
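A back-of-the-envelope model of that dependency problem, as a sketch only. Every number below is an assumed placeholder, not a measurement: 1 ns per CPU instruction, 2 ns per GPU instruction (the "twice as slow" case from the quote), and 1000 ns for each one-way trip across the bus. The function name `chain_time_ns` is likewise made up for this example.

```python
# Assumed, illustrative costs in nanoseconds (placeholders, not measurements).
CPU_OP_NS = 1           # one instruction on the CPU
GPU_OP_NS = 2           # same instruction on the GPU, "twice as slow"
PCIE_ONE_WAY_NS = 1000  # one trip across the bus, in either direction


def chain_time_ns(placement: str) -> int:
    """Time for a chain of instructions where each depends on the previous one.

    `placement` has one character per instruction: 'C' for CPU, 'G' for GPU.
    The data starts on the CPU and the final result must end up there.
    """
    total = 0
    location = "C"
    for device in placement:
        if device != location:
            # The operand lives on the other side, so it crosses the bus first.
            total += PCIE_ONE_WAY_NS
            location = device
        total += CPU_OP_NS if device == "C" else GPU_OP_NS
    if location != "C":
        total += PCIE_ONE_WAY_NS  # bring the final result back to the CPU
    return total


all_cpu = chain_time_ns("CCCCCCCC")
interleaved = chain_time_ns("GCGCGCGC")
print(all_cpu, interleaved)  # 8 8012
```

The exact ratio depends entirely on the assumed numbers, but the shape of the result does not: once dependent instructions alternate between devices, transfer time swamps the time spent actually computing.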
Learn what you're talking about before you claim that something can be done when people who do this for a living, with 8+ years of schooling behind them, say it can't. Honestly, what you're describing isn't feasible, and I think I pointed this out before.