
Hyperthreading in i7 vs P4?

The main reason Intel insists on HT is that it needs only a very small chunk of physical die for HT functionality and gets a boost ranging from 20-50% in various applications, meaning very little effort buys some major performance boosts. Plus it's using more of the existing CPU instead of passing around half-empty threads. So to speak.
 
I just think of HT as threads, not cores, lol. I'd go off on a rant if I ever thought about them being cores.
Same with what AMD calls modules. To me a core is a core, a module is a core with extra bits that let it do 2x as many of some operations in the same clock, and HT is just forcing an extra thread to work in the core, letting it do 2x as many of some operations in the same clock. I'd prefer a Pentium D dual core to a Pentium 4 with Hyper-Threading. I'd prefer an i5 quad to an i3 with Hyper-Threading. And I'd prefer a 6-core i7 with HT turned off to a 4-core i7 with HT turned on.
This is because I count the actual number of cores, not the number of threads it can sometimes run in a perfect situation. Whether that's because of the lack of FP and other components in a module, or because it's just squeezing in computation when it's able to thanks to scheduling, it's not really the same as having actual cores.
Still, like I said, I can't call them cores because they just aren't. It's like buying 2 TVs on eBay where one of them is fine but the other one will only come on whilst the adverts are on. I'd be pissed.
 
The main reason Intel insists on HT is that it needs only a very small chunk of physical die for HT functionality and gets a boost ranging from 20-50% in various applications, meaning very little effort buys some major performance boosts. Plus it's using more of the existing CPU instead of passing around half-empty threads. So to speak.

Where do you get those numbers from? Even in highly parallel workloads like 7zip, I've seen this not to be the case; 30% is about the max you're going to get, even now. I've done a little bit of testing by disabling/enabling cores and HT to get some numbers for 7zip. What's even more interesting is that having HT disabled with 4 cores enabled makes 4-thread 7zip workloads perform better than with HT enabled, probably because of how the threads are being scheduled. In this case, enabling HT on 4c/8t only provides a 20% boost over 4c/4t in performance when running 7zip with 8 threads. HOWEVER, notice how 4c/8t is almost exactly twice as fast as 2c/4t, which would indicate that more cores means better scaling. If you were to do the same test with a 4-module CPU, you would see better scaling than HT. HT uses extra resources when they're available, whereas modules have dedicated extra resources for running a second thread. I think you can guess which results in better integer performance scaling. :p

HT isn't what makes your i7 fast, man. ;)
ht-png.49520
 
Even in highly parallel workloads like 7zip, I've seen this not to be the case; 30% is about the max you're going to get, even now.

Hyperthreading isn't for parallel workloads in which most of the threads are processing similar instructions that use the same execution units in each core.

The purpose of hyperthreading is to minimize the idle execution units when processing different threads. Like, for example, playing call of duty while recording it with fraps and also listening to music in the background. Or watching some netflix while compressing a backup with 7zip running in the background.

htt-8.png
 
HT isn't what makes your i7 fast, man. ;)
ht-png.49520
24% increase with 8 threads.
11% decrease with 4 threads.
33% increase when looking at home-field advantage (8 on 8, 4 on 4). This is the default behavior that programmers developing multithreaded software choose.

The point that is easy to miss is that in the scenario where performance suffered, I think Windows can be safely blamed for allocating the threads improperly (e.g. it had two threads running on one physical core). Regardless of the underlying reason why it was slower, the processor was also ~50% idle, where in the other two instances it was 0% idle (presumably). The small hit this one task took covers up the fact that the processor was ready and able to do other, unrelated work simultaneously.

Hyper-threading is typically responsible for about a -5% to +50% performance change depending on workload.

The fact that the processor handled 200% threads slightly better than 100% threads speaks volumes about both Windows and the processor's scheduler.
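
To make the thread-placement point concrete: one way to keep two busy threads off the same physical core is to pick one logical CPU per core and pin the work to those. Below is a minimal sketch in Python, assuming you already have a map from logical CPU id to physical core id. The function name and both example topologies are made up for illustration; real CPU numbering varies by system and OS.

```python
def one_cpu_per_core(core_of):
    """Pick the lowest-numbered logical CPU on each physical core.

    core_of is a hypothetical mapping: logical CPU id -> physical core id.
    """
    chosen = {}
    for cpu in sorted(core_of):
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(core_of[cpu], cpu)
    return sorted(chosen.values())


# Hypothetical 4c/8t layout where HT siblings are numbered next to each other.
adjacent = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
# Hypothetical 4c/8t layout where siblings are numbered 4 apart.
split = {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}

print(one_cpu_per_core(adjacent))  # [0, 2, 4, 6]
print(one_cpu_per_core(split))     # [0, 1, 2, 3]
```

On Linux the resulting list could be passed to `os.sched_setaffinity(0, cpus)`. Whether that would close the 4-thread gap seen in the 7zip numbers is untested here.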
 

I would agree with you if the improvement from 100% to 200% weren't something along the lines of a 2% gain from 8 threads to 16 threads (4c/8t case) and ~6% from 4 threads to 8 threads (2c/4t case). 2% is within the margin of error, so I would consider that unchanged from the previous value. The "percent increase" values are the improvement in performance over the previous number of threads, so the improvement for 8 threads would be over 4 threads. The part that makes me think it's the scheduler is not the improvements over the max number of threads, but rather the difference in improvement for 4c/8t @ 8 threads versus 2c/4t @ 4 threads where the instance with no HT threads excelled. Also the other big thing that makes me think the scheduler is at fault is that 4c/4t running 4 threads was faster than the 4c/8t configuration running 4 threads.

Also, the table here is confusing. I should redo the percent increases, which really show the amount of performance per core relative to 1 thread, not the improvement from 4 threads to 8 threads, which is really what I want here.
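
The two ways of reading a "percent increase" column are easy to mix up, so here is a small sketch in Python of both calculations. The throughput figures are invented for illustration (they are not the numbers from the attached table), and the function names are hypothetical.

```python
def step_gain(throughput):
    """Percent change of each thread count over the previous thread count."""
    counts = sorted(throughput)
    return {
        cur: 100.0 * (throughput[cur] - throughput[prev]) / throughput[prev]
        for prev, cur in zip(counts, counts[1:])
    }


def scaling_vs_one_thread(throughput):
    """Speedup of each thread count relative to the 1-thread result."""
    base = throughput[1]
    return {n: throughput[n] / base for n in sorted(throughput)}


# Made-up 4c/8t results: thread count -> benchmark score (higher is better).
scores = {1: 1000.0, 2: 2000.0, 4: 3800.0, 8: 4560.0}

print(step_gain(scores))              # gain over the previous row
print(scaling_vs_one_thread(scores))  # speedup over 1 thread
```

With these made-up scores, going from 4 to 8 threads is a 20% step gain, while the same row reads as a 4.56x speedup over 1 thread. Both are true; they just answer different questions.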

Hyperthreading isn't for parallel workloads in which most of the threads are processing similar instructions that use the same execution units in each core.

Actually it is, because no workload purely uses one part of the CPU. All HT does is take advantage of the fact that superscalar architectures gave Intel the opportunity to cram more work into the pipeline while a thread is waiting for some reason. It sees an opening, so it puts something in. Instructions always get put into the pipeline at the beginning, so it's not really just grabbing an unused part; it waits for unused parts. Also, saying that HT isn't for parallel workloads is really funny, since the only purpose of more than 1 core or 1 thread is parallel, thread-like workloads. I don't mean GPGPU-like tasks where all you're really doing is transforming matrices, just multiple threads in general, because anything happening at the same time is "parallel processing", whether the tasks are the same or different. Stuff running in parallel doesn't imply that the instructions are the same or that parts of the CPU can't be shared. A lot of the reason for more threads is to do a task alone and asynchronously, which would benefit the most from HT. Threads might share less of the CPU because of the similar kinds of instructions used, but they can still share some of it. HT suffers when you have a lot of context switching, or when a pipeline stall is due to a branch misprediction, because the entire pipeline has to be wiped out to recover. HT suffers even more when you have a ton of locking going on, which can slow everything else down depending on how much locking there is.

Also keep in mind what you just said
The purpose of hyperthreading is to minimize the idle execution units when processing different threads. Like, for example, playing call of duty while recording it with fraps and also listening to music in the background. Or watching some netflix while compressing a backup with 7zip running in the background.
A lot of applications (in particular games) have a really hard time making code run in tandem (in the sense that two threads are trying to do parts of the same task), versus say, a game logic thread, versus a network communications thread, versus threads that do network access asynchronously with the network comm. thread, versus a thread that does GPU dispatch for rendering. So one application can have a lot of different threads, so to say that HT isn't designed for parallel workloads is funny because you're lumping "parallel workloads" in with tasks you would give GPGPU devices, which CPUs suck at doing in the first place.
 
Whatever the case, HAT, you are LONG overdue for a full system upgrade, but I guess you already know that ;)

HAT did have an i7 setup, but I now have the CPU and he went back to the Q6600.
 
I would agree with you if

I was just trying to point out that using 7zip is a terrible way to measure hyperthreading performance.
 
I was just trying to point out that using 7zip is a terrible way to measure hyperthreading performance.
...and I'm telling you that it's not as terrible as you think, there are just limits to what it can do.
 
In my usage HT is hit or miss. Most of the programs I use don't seem to make use of it so it is just not something I feel I should pay for.
 
Running MSE full virus scan while playing SC2 is possible with HT, don't notice any performance hit. With i5 at same clock it seemed to have more latency. Can't really tell a big difference otherwise, i7 is overkill for my use.
 
I was just trying to point out that using 7zip is a terrible way to measure hyperthreading performance.

I disagree with that, as 7zip with 8 threads is by far the fastest compressor and also produces the smallest archives. I'm using a custom Ultra profile for LZMA2 with a larger dictionary and word size, eating up around 12-14GB of RAM when working. But it goes through gigabytes of data like it's nothing.
 
Somebody already said this, but you people aren't getting the point.
HYPER THREADING IS NOT FOR PARALLEL TASKING
It is for utilizing CPU resources that would normally sit unused during a given clock cycle. They aren't "cores", so please stop thinking of them as such.
Given the above, 7-Zip is INDEED a terrible benchmark, and so is gaming.
In fact, any type of workload that requires PARALLEL processing of related data is EXACTLY WHAT YOU DON'T WANNA USE hyper-THREADING FOR

a crude analogy would be using two separate kitchen mixers to make cake batter with one for mixing flour and sugar and baking powder and one for mixing eggs and milk (parallel)
what would be a good idea would be to use one mixer for the cake and another for the frosting (hyper threading)
 
Also the other big thing that makes me think the scheduler is at fault is that 4c/4t running 4 threads was faster than the 4c/8t configuration running 4 threads.
As I pointed out, that's because the processor is ready to accept more work. That is not necessarily a bad thing. The fact of the matter is no programmer, with a task like 7-zip, is going to only use half of the cores available unless the user explicitly tells it to use less. It would use 8 on 4c/8t and 4 on 4c/4t which is a 33% increase--not too shabby. Apples to apples, we'd have to compare 4 threads on 4c/8t to 2 threads on 4c/4t. The former would thoroughly trounce the latter.
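
The "use every thread unless the user says otherwise" default described above is simple to sketch. This is only an illustration in Python, not how 7-Zip itself picks its thread count; `default_worker_count` is a hypothetical helper. The one real call is `os.cpu_count()`, which reports logical CPUs, so a 4c/8t chip with HT on shows up as 8.

```python
import os


def default_worker_count(requested=None):
    """Return the user's explicit thread count, else one worker per logical CPU."""
    if requested is not None:
        # The user asked for a specific count; never go below one worker.
        return max(1, requested)
    # os.cpu_count() counts logical CPUs and may return None if unknown.
    return os.cpu_count() or 1


print(default_worker_count())   # logical CPU count on this machine
print(default_worker_count(4))  # 4
```

That is why the "home-field" comparison (8 threads on 4c/8t against 4 threads on 4c/4t) is the one most users would actually see.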


a crude analogy would be using two separate kitchen mixers to make cake batter with one for mixing flour and sugar and baking powder and one for mixing eggs and milk (parallel)
what would be a good idea would be to use one mixer for the cake and another for the frosting (hyper threading)
The analogy would be needing to make four batches of cake dough with one mixer. HT enabled would add 33% more to each batch and pull it off when it is prepared for cooking. A normal mixer would have to be run four times. With HT enabled, it would only have to run three times to get four batches mixed.
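
The arithmetic behind that mixer analogy, as a quick Python check. It assumes the "33%" is shorthand for exactly one third extra per run; `runs_needed` is a made-up helper, and exact fractions are used so rounding doesn't muddy the result.

```python
from fractions import Fraction
from math import ceil


def runs_needed(batches, extra_per_run):
    """Mixer runs needed when each run handles 1 + extra_per_run batches."""
    capacity = 1 + Fraction(extra_per_run)
    return ceil(Fraction(batches) / capacity)


print(runs_needed(4, 0))                  # 4 runs without the extra capacity
print(runs_needed(4, Fraction(1, 3)))     # 3 runs with one third extra per run
print(runs_needed(4, Fraction(33, 100)))  # 4 runs if the extra is literally 33%
```

So three runs works out only if the extra per run is a full third; at a literal 33% the fourth batch doesn't quite fit.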
 
Somebody already said this, but you people aren't getting the point.
HYPER THREADING IS NOT FOR PARALLEL TASKING
It is for utilizing CPU resources that would normally sit unused during a given clock cycle. They aren't "cores", so please stop thinking of them as such.
Given the above, 7-Zip is INDEED a terrible benchmark, and so is gaming.
In fact, any type of workload that requires PARALLEL processing of related data is EXACTLY WHAT YOU DON'T WANNA USE hyper-THREADING FOR

a crude analogy would be using two separate kitchen mixers to make cake batter with one for mixing flour and sugar and baking powder and one for mixing eggs and milk (parallel)
what would be a good idea would be to use one mixer for the cake and another for the frosting (hyper threading)

You fail to understand what the CPU is doing under the hood. Hyperthreading is simply pipeline-level parallelism, and it depends on the particular opcodes being executed as well as the opcodes that were run before them and are still in the pipeline. HT just tries to fill the gaps in the pipeline to squeeze a little more juice out of it, that is all. Depending on the code running and what kinds of instructions are being executed, parallel tasks that are sufficiently complex will benefit from HT. If all you're doing is adding, or strictly using the ALU, or any task that a GPU would excel at, you're not going to realize any benefit from it because a single component in the CPU is becoming your bottleneck. But using the blanket statement that HT isn't for parallel applications is ludicrous, and I think we can see that 7-zip takes advantage of it pretty well. A better statement is that hyper-threading is not an optimal solution for highly parallel tasks, but improves efficiency by getting more done without adding more cores or altering too much of the core itself.

The simple point is that you can only squeeze so much power out of a CPU for any given task and that some tasks benefit from HT more than others, but saying "it isn't for parallel processing" is nuts because it's extra performance you might not otherwise have. It would be stupid to reject it.
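
A toy model of the "fill the gaps" idea, in Python. This is a deliberately crude sketch and not how a real core works: it assumes a single execution unit, treats each thread as a string where `X` is a cycle that needs the unit and `.` is a stall cycle (say, waiting on memory), and lets a second thread use the unit whenever the first one is stalled. All names here are invented for the example.

```python
def cycles_one_at_a_time(threads):
    """Cycles needed when each thread runs to completion before the next starts."""
    return sum(len(t) for t in threads)


def cycles_gap_filling(threads):
    """Cycles needed when stalled cycles of one thread can be used by another."""
    pos = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        cycles += 1
        unit_busy = False  # only one thread may use the execution unit per cycle
        for i, t in enumerate(threads):
            if pos[i] >= len(t):
                continue  # this thread has finished
            if t[pos[i]] == ".":
                pos[i] += 1  # a stall simply elapses, no unit needed
            elif not unit_busy:
                unit_busy = True
                pos[i] += 1  # this thread gets the unit this cycle
            # otherwise the thread wants the unit but must wait a cycle
    return cycles


stall_heavy = ["X.X.X.X.", "X.X.X.X."]
alu_bound = ["XXXXXXXX", "XXXXXXXX"]

print(cycles_one_at_a_time(stall_heavy), cycles_gap_filling(stall_heavy))  # 16 9
print(cycles_one_at_a_time(alu_bound), cycles_gap_filling(alu_bound))      # 16 16
```

In this toy model the stall-heavy pair nearly doubles its throughput, while the pair that hammers the one unit every cycle gains nothing, which is the shape of the argument above: the benefit depends on how many gaps there are to fill.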

Also, do you find that using all caps makes you more correct or does it just make you feel better?
 
As I pointed out, that's because the processor is ready to accept more work. That is not necessarily a bad thing. The fact of the matter is no programmer, with a task like 7-zip, is going to only use half of the cores available unless the user explicitly tells it to use less. It would use 8 on 4c/8t and 4 on 4c/4t which is a 33% increase--not too shabby. Apples to apples, we'd have to compare 4 threads on 4c/8t to 2 threads on 4c/4t. The former would thoroughly trounce the latter.



The analogy would be needing to make four batches of cake dough with one mixer. HT enabled would add 33% more to each batch and pull it off when it is prepared for cooking. A normal mixer would have to be run four times. With HT enabled, it would only have to run three times to get four batches mixed.
Assume you have TWO mixers but need to bake 3 cakes: 1 chocolate, 1 vanilla and 1 carrot. Now, each mixing bowl only holds so much and each mixer can only do so much work.

Now, a carrot cake can be made by starting with a vanilla cake mix and adding other ingredients to the vanilla cake batter. This is where hyper-threading comes in: I can put both vanilla cake mixes in one mixer. Conversely, I can make a vanilla cake by taking a carrot cake recipe, omitting the carrots, and adding extra vanilla and other stuff.

Now, a chocolate cake is a totally different beast. While it shares some components with carrot and vanilla, it's best left in its own mixer. If I had 3 mixers, I wouldn't need to fuss with reserving some of the vanilla batter for my carrot cake, and I could just dedicate one mixer to each cake (physical cores). Of course you can share things like measuring cups, spoons, bowls and pans between the 3 (hyper-threading), but if I need to measure 1/2 cup of sugar for the carrot cake and 1 & 3/4 cups of flour for the carrot cake, well, I only have one 2-cup measuring cup, so one is going to have to wait until I am done with the other.
 
Why on Earth would 4c/8t be running with only 2 threads for comparison with 4c/4t? That's like comparing apples and oranges.

Use same thread count for both and the HT enabled will triumph pretty much every time easily.
 
You guys went into oblivion :D.

Anyways... HT and data prefetch logic differ greatly between generations... And HT does work now... (the only exceptions are wooden old game engines and progs made by lazy and uneducated coders). As an old dual-socket mobo user I can assure you that during the ancient days of dual PIII, then Socket A, then 940, then Socket F, barely anything touched the second stone in the socket in daily tasks, and so it was with the first HT-enabled P4 crap... NetBurst was so unsuccessful that HT didn't help much either... K8 killed it. So HT gained a bit of a bad reputation at its debut, as it even made things run slower. But not anymore. If the program is compiled recently, it is aware of many CPU features, so compilation flags are enabled and optimizations are made... (yea, the slow flag upon detecting an AMD CPU lol :D)

Last true AMD core was K10.5, then they split it to smaller shorter pipe cores and I would like to say, they are comparable with HT, the design win in this scenario is power consumption as you can power off more parts of the cpu for idle process. You know they just can't use HT due to legal reasons, the idea is fine. If it runs better without adding much costs? 5-10% gain is about the same we gain going to next generation of CPU's. Those ain't old +100% more speed days...
 
Why on Earth would 4c/8t be running with only 2 threads for comparison with 4c/4t? That's like comparing apples and oranges.

Use same thread count for both and the HT enabled will triumph pretty much every time easily.

Then compare just the 8 threads with 4c/8t with 4 threads at 4c/4t and 2c/4t. It gets the same point across...

Is it an improvement: YES!
The debate is how much of an improvement it is in certain situations, not whether it's an improvement at all.

I think most complaints about HT are more a result of a CPU scheduler not taking advantage of certain resources in the right manner than of HT making matters worse.
 