I never looked it up until now and what I see fully supports what I am saying:
Heh, kinda looks like I authored it, but I didn't. XD
Fetching extra data isn't necessarily an instruction--it is the continuation of a previous instruction, carried out over more clocks. Again, a detriment to the IPC figure.
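To put some rough numbers on that (the figures here are made up purely for illustration, this is just the standard CPI-with-stalls arithmetic): every clock an instruction spends waiting on a data fetch is a clock where nothing retires, so the average IPC drops.

```python
# Toy model: how memory stalls stretch instructions over extra clocks
# and drag the IPC figure down. All numbers are hypothetical.

def effective_ipc(instructions, base_cpi, miss_rate, miss_penalty):
    """Average instructions completed per clock, stalls included.

    base_cpi     -- cycles per instruction when every access hits cache
    miss_rate    -- fraction of instructions that miss and must fetch
    miss_penalty -- extra clocks spent finishing the fetch on a miss
    """
    total_cycles = instructions * (base_cpi + miss_rate * miss_penalty)
    return instructions / total_cycles

# Ideal case: 1 clock per instruction, no misses -> IPC of 1.0
print(effective_ipc(1_000_000, 1.0, 0.0, 0))    # 1.0
# Same program, but 2% of instructions stall 50 extra clocks -> IPC halved
print(effective_ipc(1_000_000, 1.0, 0.02, 50))  # 0.5
```

Same instruction count both times; only the clocks spent continuing those fetches changed, and the IPC figure fell with them.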
Indeed, if you believe Wikipedia, a site made up of information provided by people who usually have no clue what they are talking about. I would be even more inclined to believe the Wikipedia article if it didn't have
This article does not cite any references or sources.
in huge letters at the top of it.
Though even if we go by what Wikipedia says, and I'm not saying we should, nothing there actually supports anything you have said. IPC can vary even on the same processor depending on what software is run; I don't see how that helps your argument. Different software uses different instruction sets, and that can have a great impact on the IPC a given processor achieves. The article says nothing about IPC differing because of the useful work done on the processor, or because of cache misses.
And if you had read further down, you would have seen:
The useful work that can be done with any computer depends on many factors besides the processor speed. These factors include the processor architecture, the internal layout of the machine, the speed of the disk storage system, the speed of other attached devices, the efficiency of the operating system, and most importantly the high level design of the application software in use.
For users and purchasers of a computer system, Instructions Per Clock is not a particularly useful indication of the performance of their system. For an accurate measure of performance relevant to them, application benchmarks are much more useful. Awareness of its existence is useful, in that it provides an easy-to-grasp example of why clock speed is not the only factor relevant to computer performance.
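That quoted point is easy to show with arithmetic (my numbers below are hypothetical, picked only to illustrate it): instructions retired per second is clock rate times IPC, so a lower-clocked chip with better IPC can come out ahead.

```python
# Hypothetical comparison: clock speed alone doesn't decide throughput.
# Work done per second = clock rate (Hz) * average IPC for the workload.

def instructions_per_second(clock_hz, ipc):
    """Rough instruction throughput for a given clock and average IPC."""
    return clock_hz * ipc

high_clock = instructions_per_second(3.8e9, 0.6)  # 3.8 GHz, poor IPC
low_clock  = instructions_per_second(2.4e9, 1.1)  # 2.4 GHz, better IPC

print(high_clock)             # 2.28e9 instructions/second
print(low_clock)              # 2.64e9 instructions/second
print(low_clock > high_clock) # True: the "slower" chip does more work
```

And since IPC itself shifts from one program to the next, the ranking can flip with the workload, which is exactly why the article points buyers at application benchmarks instead.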
Heh, yeah. Small caches are a mixed blessing. They take up less silicon space, so they are cheaper to manufacture, and because they are simpler in design they tend to overclock better; however, they also aren't as efficient at completing work, clock for clock.
Here is another area where you are wrong. In the case of the Celeron and Sempron, they don't take up less silicon space, and they aren't actually any cheaper to manufacture than their full-cache brothers. Celerons and Semprons are the same dies as their bigger brothers; they simply have sections of L2 cache that are defective. The defective sections are disabled, and the processors are sold with less cache. They are not any cheaper to produce, they cost exactly the same, but they can be sold at a much lower price, because if they were not sold they would be a total loss and just get thrown out. It is much better to sell a processor for next to no profit (which is usually the case with the Celeron and Sempron) than it is to take the loss of throwing it away.
Sometimes they do overclock better with the smaller cache, as having less cache means there is less to go unstable. However, the instability has moved away from the cache, and we are starting to see both the high end and low end maxing out at very similar speeds. Even that varies greatly from processor to processor, even within the same stepping (the G0 Q6600s being a good example of this).
One advantage, though, is the relatively low power consumption and heat output of the lower-cache chips. Disabling large sections of L2 really helps lower heat production, especially since the L2 takes up the majority of the die.