CISC vs RISC - Does it affect cooling?

Vya Domus · Dec 8, 2019

R-T-B said:
There you go, no example necessary.

At this point I am fairly convinced you never had an example in mind to begin with.

R-T-B said:
Implementing an instruction takes transistors. CISC has more instructions.

As I explained above, the difference this makes in the real world is practically zero. Half or more of most CPUs' transistor budgets these days goes to cache, for crying out loud.

The EPYC 7742, a 64-core x86 CPU, is 32 billion transistors, and Graviton2, an ARM-based CPU that also has 64 cores, is 30 billion transistors. Not quite what you'd expect, is it?
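Taking those transistor counts at face value, the gap between the two chips works out to roughly 6 to 7%, depending on which chip is used as the baseline:

```latex
\frac{32 - 30}{32} = 0.0625 \approx 6.3\,\%,
\qquad
\frac{32 - 30}{30} \approx 0.067 \approx 6.7\,\%
```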

FordGT90Concept · Dec 8, 2019

windwhirl said:
Does anyone know if that's actually true or not?

No. Most Intel Atoms are passively cooled, for example.

All processors have thermal/power design envelopes to meet. It just happens that most CISC processors target high power applications.

Superscalar processors (like Ryzen and Core) have CISC frontends with RISC execution units.

The fundamental difference between CISC and RISC is that CISC takes care of a lot of memory operations internally where RISC doesn't: CISC instructions can operate directly on memory operands, while RISC is load/store. That allows complex operations to be handled by dedicated integrated circuits that accelerate them. Because of those circuits, for example, CPUs can do a much better job of transcoding video streams than the fixed-function ASICs that RISC platforms like ARM rely on; those tend to sacrifice accuracy for performance.
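A small sketch of the memory-operand vs. load/store difference, assuming GCC at -O2. The function name is illustrative, and the assembly in the comments is typical output rather than guaranteed (registers and exact sequences vary by compiler version and flags): the same read-modify-write becomes one instruction with a memory operand on x86-64 and a load, register add, store sequence on AArch64.

```cpp
#include <cassert>
#include <cstdint>

// One read-modify-write on memory: *p = *p + x.
//
// Roughly what GCC emits at -O2 (illustrative only):
//   x86-64 (memory operand):        AArch64 (load/store):
//     add  QWORD PTR [rdi], rsi       ldr  x2, [x0]
//     ret                             add  x1, x2, x1
//                                     str  x1, [x0]
//                                     ret
void add_to_memory(std::int64_t* p, std::int64_t x)
{
    *p += x;
}
```

The work done is the same either way; only how it is spelled in the ISA differs.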

R-T-B · Dec 9, 2019

biffzinker said:
Only on the front end for decoding though, the back end is RISC (K.)

I'd say RISC won the battle between CISC and RISC, seeing as an x86 processor decodes its instructions into an internal RISC ISA for the back end.

Again, the waters are muddied to oblivion. But still, implementing that takes extra transistors; the point stands.
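To make "CISC front end, RISC back end" concrete, here is a deliberately simplified, hypothetical model of a decoder cracking a memory-destination add into load, add, and store micro-ops. All type and function names are invented for illustration; real cores differ (Intel cores, for example, split a store further into store-address and store-data micro-ops and can fuse pairs back together).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified micro-op model; not how any real decoder is specified.
enum class UopKind { Load, Add, Store };

struct Uop {
    UopKind kind;
    std::string dst;
    std::string src;
};

// Crack the x86-style instruction "add [addr_reg], src_reg" into RISC-like steps:
//   tmp <- load [addr_reg]
//   tmp <- tmp + src_reg
//   store [addr_reg] <- tmp
std::vector<Uop> crack_add_mem_reg(const std::string& addr_reg, const std::string& src_reg)
{
    const std::string tmp = "tmp0";  // hypothetical internal (renamed) register
    return {
        {UopKind::Load, tmp, "[" + addr_reg + "]"},
        {UopKind::Add, tmp, src_reg},
        {UopKind::Store, "[" + addr_reg + "]", tmp},
    };
}
```

One architectural instruction turns into three simple steps, which is why the "extra transistors" live mostly in the decoder rather than in the execution units.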

Vya Domus said:
At this point I am fairly convinced you never had an example in mind to begin with.

You dismissed the only example I could provide, besides logical ones, as "pointless without power consumption figures".

So until I establish that fewer instructions cost less power than more, showing you assembly instruction listings is pointless.

Vya Domus said:
As I explained above, the difference this makes in real world is practically zero.

...

Alrighty then. Sure. Never mind. ISA doesn't take much die space at all. Ignore Google.

Vya Domus · Dec 9, 2019

R-T-B said:
So until I establish that fewer instructions cost less power than more, showing you assembly instruction listings is pointless.

Fewer instructions do generally mean less power, but that's meaningless if you want to talk about power consumption in real-world use cases. When you write something, the purpose is to get something done; that something needs a minimum amount of computation, and that computation isn't fundamentally different depending on the ISA used.

In other words, if something needs, say, X instructions on a CISC processor, there is very little chance that it would need fewer than X instructions on a RISC architecture, because by definition the ISA is less capable, and on average you'll need either the same number of instructions or more to get the same thing done. RISC is supposedly more efficient when the computation is less complicated, but life's a bitch and that's not always the case; in fact, most of the time it's not, because computers do all sorts of complicated shit nowadays. As I keep saying, this was true once but not anymore. And by the way, RISC doesn't really mean fewer instructions are necessary.

The reason you can't convince me that something like ARM is somehow intrinsically more power efficient is this:

void func(double &a, double &b)
{
    b = a * a;
}

The assembly output for the function above, from GCC with no optimization flags (the default -O0), looks like this (x86-64 first, then ARM64):

push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
movsd xmm1, QWORD PTR [rax]
mov rax, QWORD PTR [rbp-8]
movsd xmm0, QWORD PTR [rax]
mulsd xmm0, xmm1
mov rax, QWORD PTR [rbp-16]
movsd QWORD PTR [rax], xmm0
nop
pop rbp
ret

sub sp, sp, #16
str x0, [sp, 8]
str x1, [sp]
ldr x0, [sp, 8]
ldr d1, [x0]
ldr x0, [sp, 8]
ldr d0, [x0]
fmul d0, d1, d0
ldr x0, [sp]
str d0, [x0]
nop
add sp, sp, 16
ret

If for something this basic there is almost no difference in the instructions required, how could you still argue that one is more efficient than the other in any real way?

R-T-B said:
Alrighty then. Sure. Never mind. ISA doesn't take much die space at all. Ignore Google.

I just gave you an example of two comparable CPUs, one x86 and one ARM, that are within roughly 6% of each other in transistor count. That's as close as you can get to a fair comparison; I did my best to provide evidence, Google and all, that the ISA has little impact on how CPUs are actually built. What does make a difference is what they're built for, where they're going to be used, and what the constraints are.

I guess you could tell me that this 6%, incidentally, must be because of the different ISAs ...

R-T-B · Dec 9, 2019

Vya Domus said:
I just gave you an example of two comparable CPUs, one x86 and one ARM, that are within roughly 6% of each other in transistor count.

Yes. Clock speeds aside, I have repeatedly stated that this debate is purely conceptual, as the lines between CISC and RISC are nowadays purely academic. Nearly everything but MIPS and a few odd ducks blurs the lines now.

R-T-B said:
Also this. RISC ain't what it used to be. POWER is nearly at CISC-level instruction counts. Back in the day, some RISCs lacked hardware integer multiply instructions. That obviously changed.

Grog6 · Dec 9, 2019

Ahh, but with a CISC processor, it wouldn't have to break up the multiply:

function:            # function(x, y, *lower, *higher); assuming the Microsoft x64 ABI: x=%rcx, y=%rdx, lower=%r8, higher=%r9
movq %rcx,%rax       # Store x into %rax
mulq %rdx            # Multiply %rax by y; mulq stores the 128-bit result as %rdx (high) : %rax (low)
movq %rax,(%r8)      # Move low half into *lower
movq %rdx,(%r9)      # Move high half into *higher
ret

That's a 64-bit multiply in x86-64 code.
The FMUL instruction is 7 clocks max, IIRC. (The last time it mattered to me was in the '90s, lol.)
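What mulq does in one instruction (a full unsigned 64×64→128-bit product, high half in %rdx and low half in %rax) can be sketched in portable C++ by splitting each operand into 32-bit halves, roughly what an ISA without a widening multiply has to do in software. The struct and function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
    std::uint64_t lo;  // what mulq leaves in %rax
    std::uint64_t hi;  // what mulq leaves in %rdx
};

// Schoolbook multiply on 32-bit halves: x = x1*2^32 + x0, y = y1*2^32 + y0.
U128 mul_64x64_128(std::uint64_t x, std::uint64_t y)
{
    const std::uint64_t mask = 0xFFFFFFFFu;
    const std::uint64_t x0 = x & mask, x1 = x >> 32;
    const std::uint64_t y0 = y & mask, y1 = y >> 32;

    const std::uint64_t p00 = x0 * y0;  // weight 2^0
    const std::uint64_t p01 = x0 * y1;  // weight 2^32
    const std::uint64_t p10 = x1 * y0;  // weight 2^32
    const std::uint64_t p11 = x1 * y1;  // weight 2^64

    // Middle column: three values below 2^32 each, so the sum cannot overflow 64 bits.
    const std::uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);

    U128 r;
    r.lo = (mid << 32) | (p00 & mask);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

Four 32-bit multiplies plus shifts, masks, and adds stand in for the single mulq.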

A multiply done with sequential adds scales with the operand.

That's the complex part of CISC: dedicated math units, multiply instructions, and more.

Now, doing it with ADD instructions might be faster on some architectures, but it probably isn't.

Smaller multiply ops are easier.
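The "sequential add" point can be sketched as two software multiplies, roughly what a core without a hardware multiplier would fall back to: naive repeated addition, whose loop count grows with the value of the operand, and shift-and-add, whose loop count grows only with the operand's bit length. Function names are hypothetical, and the iteration counters exist only to make the scaling visible.

```cpp
#include <cassert>
#include <cstdint>

struct MulResult {
    std::uint64_t product;     // wraps modulo 2^64, like a 64-bit register
    std::uint64_t iterations;  // loop trips taken
};

// Naive: add x to itself y times. Cost grows linearly with the value of y.
MulResult mul_repeated_add(std::uint64_t x, std::uint64_t y)
{
    MulResult r{0, 0};
    for (std::uint64_t i = 0; i < y; ++i) {
        r.product += x;
        ++r.iterations;
    }
    return r;
}

// Shift-and-add: one step per bit of y. Cost grows with the bit length of y.
MulResult mul_shift_add(std::uint64_t x, std::uint64_t y)
{
    MulResult r{0, 0};
    while (y != 0) {
        if (y & 1) r.product += x;  // add the shifted multiplicand for each set bit
        x <<= 1;
        y >>= 1;
        ++r.iterations;
    }
    return r;
}
```

For 1000 × 1000, the first loop runs 1000 times and the second 10 times (1000 fits in 10 bits), while a hardware multiply instruction does it in one.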

EDIT: I see we're saying the same thing, in the code.

EDIT2: ARM processors are no longer what I'd consider RISC processors at all; I had missed all the functionality added in later updates to the core technology.
I found the newer opcode listings; the list I posted above is no longer definitive.

I think we're down to the same argument as Intel vs. M68k from 30 years ago.

The architectures are different; some people like one, others like the other.

Switching assembly languages always felt like switching to Spanish or German in my brain; ARM is like learning French, by comparison.

PIC for me was like learning a trade language; few words, but you can get the important stuff through. (You giva me money, I giva you beer.)

If you write in C or other languages, it really doesn't matter anyway; the compiler is the only one who knows where it goes.

Deleted member 185158 · Dec 9, 2019

Well, I dug out an LG G Stylo phone with a cracked screen. I'm updating the system software so I can (hopefully) hook it up to the PC, get files off it, and go from there.
I'll get the phone's system specs and some temperature readouts, then work out a way to cool it. The battery is in the way, so this will take some modifications.
Right up my alley; this will be a fun side project. Android is upgrading 1 of 54 items..... it's gonna take a while lol.

I don't know what you guys are talking about and the effects of writing the letter C and how it affects cooling situations with these types of processors.

FordGT90Concept · Dec 9, 2019

The problem is that most of the assembly is function overhead (prologue and epilogue)...

Code:

x86 SSE                           ARM                 Description
push rbp                          sub sp, sp, #16     prologue: x86 saves the caller's frame pointer; ARM reserves 16 bytes of stack (8 for each pointer parameter)
mov rbp, rsp                                          x86 sets up its frame pointer; as a leaf function it can use the red zone below rsp, so it needs no equivalent of ARM's #16
mov QWORD PTR [rbp-8], rdi        str x0, [sp, 8]     spill the a pointer (passed in rdi / x0) to the stack
mov QWORD PTR [rbp-16], rsi       str x1, [sp]        spill the b pointer (passed in rsi / x1) to the stack
mov rax, QWORD PTR [rbp-8]        ldr x0, [sp, 8]     reload the a pointer into a general-purpose register
movsd xmm1, QWORD PTR [rax]       ldr d1, [x0]        load the double that a points to into floating point register (1)
mov rax, QWORD PTR [rbp-8]        ldr x0, [sp, 8]     reload the a pointer again (nothing is reused without optimization)
movsd xmm0, QWORD PTR [rax]       ldr d0, [x0]        load the same double into floating point register (0)
mulsd xmm0, xmm1                  fmul d0, d1, d0     multiply the two registers (0*1), with the result stored in (0)
mov rax, QWORD PTR [rbp-16]       ldr x0, [sp]        reload the b pointer into a register
movsd QWORD PTR [rax], xmm0       str d0, [x0]        store the result in floating point register (0) to the address b points to
nop                               nop                 no operation...not sure why GCC is adding this
pop rbp                           add sp, sp, 16      epilogue: x86 restores the caller's frame pointer; ARM releases its 16 bytes
ret                               ret                 return to the caller
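Most of that listing is -O0 overhead: the prologue, the epilogue, and stack spills. As a rough sketch, assuming a recent GCC (exact output varies with version and flags), enabling optimization collapses the same function to a load, a multiply, and a store on both ISAs:

```cpp
#include <cassert>

// Same function as in the thread. With optimization (e.g. g++ -O2), GCC typically
// emits something like the following; treat it as illustrative, not guaranteed output:
//
//   x86-64:                           AArch64:
//     movsd  xmm0, QWORD PTR [rdi]      ldr   d0, [x0]
//     mulsd  xmm0, xmm0                 fmul  d0, d0, d0
//     movsd  QWORD PTR [rsi], xmm0      str   d0, [x1]
//     ret                               ret
void func(double &a, double &b)
{
    // a and b are references, passed as pointers (rdi/rsi on x86-64, x0/x1 on AArch64)
    b = a * a;
}
```

At that point the two ISAs differ by nothing but mnemonics, which is the same conclusion the unoptimized comparison points toward.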

You'd have to add more code that actually does work (especially things that aren't basic math) to see x86's advantage. A good example would be an AVX instruction.

Vya Domus · Dec 9, 2019

FordGT90Concept said:
A good example would be an AVX instruction.

AVX is used almost exclusively for basic math; well, let's just call it math. Here's a function that adds two arrays of floats, first with NEON and then with AVX, followed by GCC's unoptimized output for each (ARM first, then x86):

Code:

#include <arm_neon.h>  // NEON intrinsics

// Assumes count is a multiple of 4 (one float32x4_t holds 4 floats).
void add_float(float* dst, float* src1, float* src2, int count)
{
     for (int i = 0; i < count; i += 4)
     {
         float32x4_t in1, in2, out;
         in1 = vld1q_f32(src1);
         src1 += 4;
         in2 = vld1q_f32(src2);
         src2 += 4;
         out = vaddq_f32(in1, in2);
         vst1q_f32(dst, out);
         dst += 4;
     }
}

Code:

#include <immintrin.h>  // AVX intrinsics; build with -mavx

// Assumes count is a multiple of 8 (one __m256 holds 8 floats).
void add_float(float* dst, float* src1, float* src2, int count)
{
     for (int i = 0; i < count; i += 8)
     {
         __m256 in1 = _mm256_loadu_ps(src1);
         src1 += 8;
         __m256 in2 = _mm256_loadu_ps(src2);
         src2 += 8;
         __m256 out = _mm256_add_ps(in1, in2);
         _mm256_storeu_ps(dst, out);
         dst += 8;
     }
}

Code:

add_float(float*, float*, float*, int):
        sub     sp, sp, #176
        str     x0, [sp, 24]
        str     x1, [sp, 16]
        str     x2, [sp, 8]
        str     w3, [sp, 4]
        str     wzr, [sp, 172]
.L6:
        ldr     w1, [sp, 172]
        ldr     w0, [sp, 4]
        cmp     w1, w0
        bge     .L7
        ldr     x0, [sp, 16]
        str     x0, [sp, 32]
        ldr     x0, [sp, 32]
        ldr     q0, [x0]
        str     q0, [sp, 144]
        ldr     x0, [sp, 16]
        add     x0, x0, 16
        str     x0, [sp, 16]
        ldr     x0, [sp, 8]
        str     x0, [sp, 40]
        ldr     x0, [sp, 40]
        ldr     q0, [x0]
        str     q0, [sp, 128]
        ldr     x0, [sp, 8]
        add     x0, x0, 16
        str     x0, [sp, 8]
        ldr     q0, [sp, 144]
        str     q0, [sp, 64]
        ldr     q0, [sp, 128]
        str     q0, [sp, 48]
        ldr     q1, [sp, 64]
        ldr     q0, [sp, 48]
        fadd    v0.4s, v1.4s, v0.4s
        str     q0, [sp, 112]
        ldr     x0, [sp, 24]
        str     x0, [sp, 104]
        ldr     q0, [sp, 112]
        str     q0, [sp, 80]
        ldr     x0, [sp, 104]
        ldr     q0, [sp, 80]
        str     q0, [x0]
        ldr     x0, [sp, 24]
        add     x0, x0, 16
        str     x0, [sp, 24]
        ldr     w0, [sp, 172]
        add     w0, w0, 4
        str     w0, [sp, 172]
        b       .L6
.L7:
        nop
        add     sp, sp, 176
        ret

Code:

add_float(float*, float*, float*, int):
        push    rbp
        mov     rbp, rsp
        and     rsp, -32
        sub     rsp, 200
        mov     QWORD PTR [rsp-80], rdi
        mov     QWORD PTR [rsp-88], rsi
        mov     QWORD PTR [rsp-96], rdx
        mov     DWORD PTR [rsp-100], ecx
        mov     DWORD PTR [rsp+196], 0
.L6:
        mov     eax, DWORD PTR [rsp+196]
        cmp     eax, DWORD PTR [rsp-100]
        jge     .L7
        mov     rax, QWORD PTR [rsp-88]
        mov     QWORD PTR [rsp-72], rax
        mov     rax, QWORD PTR [rsp-72]
        vmovups ymm0, YMMWORD PTR [rax]
        vmovaps YMMWORD PTR [rsp+136], ymm0
        add     QWORD PTR [rsp-88], 32
        mov     rax, QWORD PTR [rsp-96]
        mov     QWORD PTR [rsp-64], rax
        mov     rax, QWORD PTR [rsp-64]
        vmovups ymm0, YMMWORD PTR [rax]
        vmovaps YMMWORD PTR [rsp+104], ymm0
        add     QWORD PTR [rsp-96], 32
        vmovaps ymm0, YMMWORD PTR [rsp+136]
        vmovaps YMMWORD PTR [rsp-24], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp+104]
        vmovaps YMMWORD PTR [rsp-56], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp-24]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsp-56]
        vmovaps YMMWORD PTR [rsp+72], ymm0
        mov     rax, QWORD PTR [rsp-80]
        mov     QWORD PTR [rsp+64], rax
        vmovaps ymm0, YMMWORD PTR [rsp+72]
        vmovaps YMMWORD PTR [rsp+8], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp+8]
        mov     rax, QWORD PTR [rsp+64]
        vmovups YMMWORD PTR [rax], ymm0
        nop
        add     QWORD PTR [rsp-80], 32
        add     DWORD PTR [rsp+196], 8
        jmp     .L6
.L7:
        nop
        leave
        ret

Again, the differences are small; modern x86 and ARM are so alike that it's not worth claiming one definitely has an advantage. Of course, the AVX version is probably more power efficient, since the loop executes fewer instructions by processing 8 floats at a time instead of 4, but that's not because of a difference in ISA philosophy; ARM just hasn't implemented 256-bit instructions.
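Both intrinsic loops assume count is a multiple of the vector width; otherwise the last iteration reads and writes past the end of the arrays. A portable sketch (the function name and the `Width` template parameter are hypothetical) that processes `Width` floats per block, where 4 stands in for one 128-bit NEON register and 8 for one 256-bit AVX register, then finishes the leftovers one at a time. It returns the number of full-width blocks so the 4-vs-8 difference is visible; whether a compiler actually turns the inner loop into SIMD instructions depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t Width>
std::size_t add_float_blocked(float* dst, const float* src1, const float* src2, std::size_t count)
{
    std::size_t i = 0;
    std::size_t blocks = 0;
    // Full-width blocks: the inner loop is what one vector add would cover.
    for (; i + Width <= count; i += Width, ++blocks) {
        for (std::size_t j = 0; j < Width; ++j)
            dst[i + j] = src1[i + j] + src2[i + j];
    }
    // Scalar tail for the count % Width leftover elements.
    for (; i < count; ++i)
        dst[i] = src1[i] + src2[i];
    return blocks;
}
```

For 1,000 floats that is 250 trips through a 4-wide loop versus 125 through an 8-wide one, which is where the "fewer instructions" saving comes from, independent of CISC vs. RISC.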

londiste · Dec 9, 2019

Neon kind of defeats the point of RISC.

It adds instructions as well as the hardware to back them up.

FordGT90Concept · Dec 9, 2019

Vya Domus said:
Of course, the AVX version is probably more power efficient, since the loop executes fewer instructions by processing 8 floats at a time instead of 4, but that's not because of a difference in ISA philosophy; ARM just hasn't implemented 256-bit instructions.

But that's exactly the point: x86 will get it done faster using less power. ARM will take substantially more time to do it.
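Since the thread is ultimately about cooling: the cooler has to keep up with the power draw while a job runs, but the total heat a given job produces is the energy it consumes, i.e. average power times run time. With made-up numbers, purely for illustration, a chip drawing more power can still produce less total heat if it finishes sooner:

```latex
E = \int_{0}^{t} P(\tau)\,d\tau = \bar{P}\,t

% hypothetical figures, for illustration only
E_{1} = 10\,\mathrm{W} \times 1\,\mathrm{s} = 10\,\mathrm{J},
\qquad
E_{2} = 4\,\mathrm{W} \times 3\,\mathrm{s} = 12\,\mathrm{J}
```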
