Originally Posted by W1zzard
Explaining TLB is kinda difficult, most articles I found, including Wikipedia are junk.
These slides http://www.ece.cmu.edu/~ece548/handouts/05vmarch.pdf
do a decent job. Gotta go through the whole presentation though to understand the underlying concept.
+1: Good stuff. Thanks for the link. I already knew a good amount about virtual memory and paging, so the level of this was pretty good, at least for me. Pardon me if I do a bad job of explaining this, but I think there is a middle ground to describe what happens here.
Basically it's just a set of mappings of virtual memory relative to the cache. Keep in mind that cache does not reside in physical or virtual memory; it stands on its own and does its own thing. Whenever the cache is accessed, the real address you're trying to access gets translated and checked against the cache. The cache itself only stores the data; where the data sits in the cache is derived from the translation of the virtual address.
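To make the translation step concrete, here's a minimal sketch of how a TLB lookup sits in front of the full page table walk. This is a toy model, not how any specific GPU does it; all the names (`tlb`, `page_table`, `translate`) and the page-table contents are made up for illustration.

```python
# Hypothetical sketch: check the TLB for a cached virtual-to-physical
# translation first; only fall back to the (slow) page table on a miss.

PAGE_SIZE = 4096                  # bytes per page (a common choice)

tlb = {}                          # virtual page number -> physical page number
page_table = {0: 7, 1: 3, 2: 9}  # made-up full mapping, normally in memory

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                # TLB hit: fast path
        ppn = tlb[vpn]
    else:                         # TLB miss: walk the page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn            # cache the translation for next time
    return ppn * PAGE_SIZE + offset

print(translate(4100))  # vpn 1, offset 4 -> 3 * 4096 + 4 = 12292
```

The point is just that the same access is cheap or expensive depending on whether the translation is already cached, which is the same hit-or-miss story as the data cache itself.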
This is just specifics on how cache itself works as one of the faster tiers of memory (SRAM is fast stuff) and doesn't tell you much more than what cache is supposed to do: hold on to data that we think we will need again soon. The issue, if I were to take a guess, is that memory access times vary. Since the GPU isn't getting data in the same amount of time every time an instruction is called, that adds variability to how long it takes to fetch data, process it, and store it. So even though a particular set of instructions might execute in x amount of time in one circumstance, it could take y amount of time in another, because the composition of the cache is different and you now have a different number of hits and misses than you did before.
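That hit/miss variability can be sketched with a toy direct-mapped cache: the same sequence of accesses sees different hit counts depending on what happens to be resident already. The set index here comes straight from the address, which is the "where it sits is derived from the address" idea above. Everything here (`NUM_SETS`, `LINE_SIZE`, the access pattern) is invented for the example.

```python
# Toy direct-mapped cache: each line of memory maps to exactly one set.
# The same access stream gets different hit/miss counts depending on
# the starting contents of the cache, hence different timing.

NUM_SETS = 4
LINE_SIZE = 64

def run(addresses, cache=None):
    cache = {} if cache is None else cache   # set index -> resident line
    hits = misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        idx = line % NUM_SETS                # which set this address maps to
        if cache.get(idx) == line:           # tag match -> hit
            hits += 1
        else:                                # miss -> evict whatever was there
            cache[idx] = line
            misses += 1
    return hits, misses

addrs = [0, 64, 0, 64]           # two cache lines, each reused once
print(run(addrs))                # cold cache: (2, 2)
warm = {0: 0, 1: 1}              # same two lines already resident
print(run(addrs, warm))          # warm cache: (4, 0)
```

Same instructions, same addresses, different hit/miss split, so a real machine would spend a different amount of time on each run.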
As I said, don't take my word on that; I'm speculating pretty hardcore right here and I could be very wrong, but it would explain why this happens.