
Core 2 Quad NUMA affinity and scheduling

In the beginning of the Core 2 line there were 2 cores on the same die, and communication between the 2 cores was internal. In the Core 2 Quad there are 2 duo dies glued together, completely separate from each other, and their communication goes through the FSB - that reduces performance and increases latency. In addition, using the FSB for inter-core communication takes FSB bandwidth away from other FSB traffic (PCI, RAM, southbridge etc.). The OS (in this case Linux) is not aware of this, so it cannot deal with the problem automatically, and neither does Windows. This requires user intervention, to tell the system how to run its processes in the most efficient way.
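for reference, on Linux you can check which logical CPUs actually sit on the same die by looking at which ones share an L2 cache in sysfs. a quick C sketch of that check (the sysfs paths are the standard layout; I haven't run this on the quad yet, so treat it as an assumption):

```c
/* Print, for each CPU, which CPUs share its L2 cache.
 * On a Core 2 Quad the two cores of one die share an L2,
 * so this reveals which cores belong to which die.
 * Sketch only; assumes the usual sysfs layout. */
#include <stdio.h>

int main(void)
{
    char path[128], buf[64];

    for (int cpu = 0; cpu < 4; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index2/shared_cpu_list", cpu);
        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            continue;
        }
        if (fgets(buf, sizeof(buf), f))
            printf("cpu%d shares L2 with: %s", cpu, buf);
        fclose(f);
    }
    return 0;
}
```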

This means I need to control the affinity of processes manually. What I am asking is whether anyone has experience with this, and advice on which strategy is best to run as fast as possible (a rough affinity sketch follows the list below), such as:

1. run light loads on one core and heavy loads on the others
2. run light loads spread across all cores and the heavy loads with affinity
3. run light loads spread across one die and the heavy loads with affinity
4. group all the processes that interact with each other onto the same die and separate the processes that don't
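here is the rough sketch I mentioned above of pinning a process to one die with sched_setaffinity, assuming purely for illustration that CPUs 0/1 are one die and 2/3 the other (the sysfs check above should confirm the real numbering); taskset -c 0,1 <command> from the shell does the same thing:

```c
/* Pin the calling process (and anything it later forks/execs) to CPUs 0 and 1.
 * The 0/1 = "die 0" pairing is an assumption for illustration only.
 * Compile with: gcc -o pin pin.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* first core of the assumed die 0 */
    CPU_SET(1, &set);   /* second core of the assumed die 0 */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {   /* pid 0 = this process */
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPUs 0-1\n");
    /* ...child processes started from here inherit the affinity mask... */
    return 0;
}
```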
 
In the Core 2 Quad there are 2 duo dies glued together, completely separate from each other, and their communication goes through the FSB - that reduces performance and increases latency. In addition, using the FSB for inter-core communication takes FSB bandwidth away from other FSB traffic (PCI, RAM, southbridge etc.).
I don't believe that is correct.

First, they are not glued together. They are on the same die with a communications path measured in micrometers or even nanometers.

Second, communications between the two pairs does NOT occur over the same bus as communications with PCIe, RAM etc. That bus is on the motherboard and can be measured in inches.
this means I need to control affinity of processes manually.
Ummm, no it doesn't.

The problem with your scenario is that it is no where near completely defined.

Setting CPU affinity manually forces Windows to use only the core or cores designated by that setting for that specific application. Windows will run that app only on those cores (which may be busy) even if other cores are doing nothing.

If you just leave it alone, the OS will assign the application to run on the least busy core. That's a good thing. And modern operating systems know how to optimize those settings quite well.

I am not saying there is no reason to manually dink with these settings, but it is application specific and typically only advantageous with specific older programs not designed for multi-core (or multi-CPU) systems.
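For the record, that is all "setting affinity" amounts to programmatically on Windows - one call restricting which logical processors the scheduler may use for the process. A minimal sketch (the 0x3 mask selecting CPUs 0 and 1 is purely illustrative):

```c
/* Restrict the current process to logical CPUs 0 and 1 on Windows.
 * The 0x3 mask (binary 11) is illustrative, not a recommendation. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR mask = 0x3;   /* bit 0 and bit 1 -> logical processors 0 and 1 */

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    puts("process restricted to CPUs 0 and 1");
    return 0;
}
```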
 
(...) and neither does Windows. This requires user intervention, to tell the system how to run its processes in the most efficient way.
Could you provide any sources proving that Windows has no knowledge of how to handle C2Q CPUs?
 
If I could provide sources proving OS mis-scheduling, I wouldn't be asking the question, because then and there I would be informed about the solution to the problem. In addition, you should know that Windows is closed source: nobody can prove whether the Windows scheduler does or doesn't know about the hardware relationships and architecture of the Core 2 Quad (and if it does know, how it handles it). My searches on the internet show no results with information about either the problem or the solution.
 
I don't believe that is correct.

First, they are not glued together. They are on the same die with a communications path measured in micrometers or even nanometers.

Core 2 Quads were definitely made out of two separate dies and they most definitely communicate through the FSB as well:

Remember how AMD used to boast about the fact that their quad cores are true "quad cores" on just one die?

To OP: don't bother, it doesn't make that much of a difference.
 
ok, thanks :)
I will just enable multicore processor support in the kernel options and be done with it
 
Then please decide. Either:
the OS (in this case Linux) is not aware of this, so it cannot deal with the problem automatically, and neither does Windows.
or
"I don't know".

Assuming you find some kind of "solution" - how are you going to measure the improvement?
How are you going to prove that "solution" actually works?
What would be the baseline for that measurement, without knowing how the OS handles the hardware?
 
I would bet Windows already has some sort of scheduling optimization for Core 2 processors.

Modern Ryzen CPUs are made in the exact same way using two dies and they run just fine.
 
https://bitsum.com/

You might want to play with process lasso. I think this is what you're after.
 
Modern Ryzen CPUs are not made the same way. Yes, they have two different dies, but they have a native interconnect between the dies; the Core 2 Quad made the cores talk to each other through the FSB. I thought this was common knowledge, at least among the "old" veterans. @Vya Domus is right, though, that Windows (7+ at least, afaik) already handles this as best it can. Setting affinity can help in certain circumstances, but if you're using this processor for everyday use and don't have a very specific application in mind, there is no setting that you can just set and have it fix the issue. It has to be handled on an application-specific basis.
 
Modern Ryzen CPUs are not made the same way. Yes, they have two different dies, but they have a native interconnect between the dies; the Core 2 Quad made the cores talk to each other through the FSB. I thought this was common knowledge, at least among the "old" veterans.

It's the same in the sense that there are two dies with limited communication capability between them, just like the Core 2 Quad had, and thus it's susceptible to the same theoretical problems/limitations.
 
@Papahyooie it is going to be the E5450 Core 2 Quad modded with the 771-to-775 adapter, and I'm going to run avisynth with x264 on it along with a couple of other light apps, all on Linux.
My plan is to start 2 separate encoding/filtering instances of avs2yuv + x264, each instance running 2 deinterlacing threads. My thinking was that I should keep each instance pinned to a specific die (one instance per die), since they are independent tasks: there should be no communication between them, and it would avoid migrations between dies. On top of that I planned to distribute the other 2 light loads evenly across the 2 dies.
Right now this box is running 2 instances with 1 deinterlacing thread each on an E8600 at 3800 MHz, and it gives 0.06 fps.
I made this thread to see if I can maximize utilization of the quad when I get it, so I can get to 0.12 fps.
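something like this is what I had in mind for launching the two instances, one pinned to each die - just a sketch, the CPU numbering and the avs2yuv/x264 command lines are placeholders and not my real settings:

```c
/* Launch two independent encode pipelines, each pinned to one (assumed) die.
 * CPU numbering and the command strings are illustrative placeholders. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void launch_pinned(const char *cmd, int cpu_a, int cpu_b)
{
    pid_t pid = fork();
    if (pid == 0) {                        /* child */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu_a, &set);
        CPU_SET(cpu_b, &set);
        sched_setaffinity(0, sizeof(set), &set);   /* pin before exec */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        perror("execl");                   /* only reached if exec fails */
        _exit(127);
    }
}

int main(void)
{
    /* placeholder command lines for the two instances */
    launch_pinned("avs2yuv clip1.avs - | x264 --demuxer y4m -o out1.mkv -", 0, 1);
    launch_pinned("avs2yuv clip2.avs - | x264 --demuxer y4m -o out2.mkv -", 2, 3);

    while (wait(NULL) > 0)                 /* wait for both children */
        ;
    return 0;
}
```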
 
The only case where it (affinity) would do anything is when an app can use two cores max and scales with cache.
Then pinning it to two cores on separate dies would be a good idea.
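A quick sketch of that idea, assuming CPUs 0 and 2 sit on different dies (check sysfs first): pin each of the two worker threads to its own die so each gets a whole L2 to itself.

```c
/* Two worker threads, each pinned to a core on a different (assumed) die,
 * so neither has to share its die's L2 with the other. Sketch only.
 * Compile with: gcc -pthread threads.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    printf("worker pinned to cpu %d\n", cpu);
    /* ...do the cache-heavy work here... */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int cpu_a = 0, cpu_b = 2;   /* assumed: one core on each die */

    pthread_create(&t1, NULL, worker, &cpu_a);
    pthread_create(&t2, NULL, worker, &cpu_b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```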
 
@agent_x007 I could do that, i.e. tell avisynth to do 4-thread deinterlacing and run only 1 instance, but there is a problem: the CPU is never at 100%, so I lose speed. The ideal situation for avisynth is to run 1:1, one instance per CPU rather than one thread per CPU; that is the problem. On the other hand I can't do 1:1 because instances use 1.5 GB of RAM each, unlike threads, so...
 
Umm, people are getting their facts mixed up here.

-Core2Quads WERE 2x Core2Duos 'glued' together.
-They DO communicate over the FSB if cache data needs to be shared.
-Windows DOES know about Core2Quads and knows how to do processor affinity properly.
-You claim that Windows doesn't know how to handle affinity for the Core2Quads, yet you said that Windows is closed source so "nobody can prove whether the windows scheduler does or doesn't know", yet you are trying to fix this issue because you "know" that it doesn't work properly with Core2Quads...
 
The lack of understanding of the hardware abstraction layer, multi-socket vs. multi-core, and NUMA in this thread is baffling. Also, chasing high performance with a 771-to-775 mod :wtf:
Just run the tasks and let the OS do its thing. Cutting through the management the OS should be doing with its thread scheduler is probably why your performance is so low... that, and the fact that it's a Wolfdale/Yorkfield core. It is old.
 
@Gasaraki all of Windows by principle "doesn't work properly", just so you know - it's the worst OS in the world, and people use it only because they are forced to. So how would you know what it does or doesn't do? You are not its developer and neither is anyone else. I'm not here to judge you or anyone else for using Windows; if you want to believe you know things that nobody could possibly know, that's fine. But here is the thing you are missing: this isn't about who knows what, this is about trust. An OS that sucks in so many departments cannot be trusted to do anything right. Just go and look at what the Windows 10 scheduler is doing to your processes, and since some people already mentioned Ryzen - look at how awful Ryzen's performance is on Windows 10. How dare you claim to prove what Windows does or doesn't know, when everything it does is plain and simple bullshit.

I'm not running Windows, if you hadn't noticed, and now you know why.
 
The FSB is part of the motherboard and is used to establish communications between the CPU, RAM and graphics solution via PCIe. You are suggesting the two halves of this CPU communicate over that bus. I'm just saying they communicate directly over an on-die bus.
 
I don't have it yet, @_JP_, but I was thinking to plan ahead. So you also say it's best to leave it to the scheduler to deal with.
 
@Gasaraki all of Windows by principle "doesn't work properly", just so you know - it's the worst OS in the world

Hold on mate, that can't be true, there are OSes out there that come with KDE as the default DE :laugh:

I'd be interested in before and after benches if you can
 
UI preference is a subject in itself; I was referring more to the kernel side of the operating systems :)

Yes, I will do some testing when I get it, but so far my impression from people here is that it doesn't have as much of an impact on performance as I was afraid of.
 
The FSB is part of the motherboard and is used to establish communications between the CPU, RAM and graphics solution via PCIe. You are suggesting the two halves of this CPU communicate over that bus. I'm just saying they communicate directly over an on-die bus.

I can't provide a source at the moment, but I'm pretty sure that's not true. I have always been under the impression that the C2Q dies talked over the FSB, which was one of the architecture's major drawbacks for multithreaded applications. QPI, the on-die interconnect, didn't come until the i7's first iteration. Like I said... I thought this was common knowledge, so I haven't really researched it in forever. I may need to dig up some old articles and verify.

AM4 ryzen CPUs have only one die ;)
Perhaps I'm thinking of the Epyc line. Intel had made the accusation that they were "glued together"... which was especially juicy, considering that Intel had essentially done the same thing with the C2Q a few years back. Regardless, the huge difference is that the C2Q's multiple dies didn't talk to each other over an on-package interconnect (if the above is indeed true... I'll see if I can verify), whereas AMD's modern lines do have an on-die interconnect (as does modern Intel).
 
Core 2 Quads were definitely made out of two separate dies and they most definitely communicate through the FSB as well:

View attachment 95314

Remember how AMD used to boast about the fact that their quad cores are true "quad cores" on just one die?

To OP: don't bother, it doesn't make that much of a difference.

It actually makes no difference.

Why?

NUMA is for CPUs that have separate memory buses, to indicate which CPU core has direct access to which portion of memory. The C2Q does not have NUMA; both dies share the same FSB link to the same northbridge to the same memory controller. NUMA helps nothing here, as memory access is uniform.

Ryzen is in a similar boat. EPYC and Threadripper are not.
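If you want to see it for yourself on Linux, libnuma will report either no NUMA support at all or a single node on a C2Q, because every core reaches RAM through the same FSB/northbridge. A minimal sketch, assuming libnuma is installed (compile with: gcc numa_check.c -lnuma):

```c
/* Report how many NUMA nodes the kernel sees.
 * On a Core 2 Quad this is "no NUMA support" or a single node,
 * since all cores reach RAM through the same FSB/northbridge. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        puts("kernel reports no NUMA support on this system");
        return 0;
    }
    printf("NUMA nodes configured: %d (max node id %d)\n",
           numa_num_configured_nodes(), numa_max_node());
    return 0;
}
```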
 
I was not afraid of a memory access penalty, which as you say is uniform, but of penalties associated with inter-die communication, such as: process migration from one die to the other, and inter-process communication between processes running on separate dies (which in my case would be avs2yuv.exe sending its finished frames from one die while the x264 encoder process runs on the other).
 