CISC vs RISC - Does it affect cooling?

Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
There you go, no example necessary.

At this point I am fairly convinced you never had an example in mind to begin with.

Implementing an instruction takes transistors. CISC has more instructions.

As I explained above, the difference this makes in the real world is practically zero. Half or more of most CPUs' transistor budget these days is just the cache, for crying out loud.

The EPYC 7742, a 64-core x86 CPU, is 32 billion transistors, and Graviton2, an ARM-based CPU that also has 64 cores, is 30 billion transistors. Not quite what you'd expect, is it?
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Does anyone know if that's actually true or not?
No. Most Intel Atoms are passively cooled, for example.

All processors have thermal/power design envelopes to meet. It just happens that most CISC processors target high power applications.

Superscalar processors (like Ryzen and Core) have CISC frontends with RISC execution units.


The fundamental difference between CISC and RISC is that CISC takes care of a lot of memory operations internally where RISC doesn't. That allows complex operations to be micromanaged in integrated circuits to accelerate them. Because of those circuits, for example, CPUs can do a much better job at transcoding video streams than the ASICs that RISC architectures like ARM use. Those ASICs tend to sacrifice accuracy for performance.
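To make the memory-operand point concrete, here is a minimal sketch. The function name `bump` is made up for illustration, and the instruction sequences in the comments are what optimizing compilers typically emit for this pattern, not output measured for this thread.

```cpp
#include <cassert>
#include <cstdint>

// Increment a 64-bit counter that lives in memory.
//
// Typical optimized x86-64 code: one read-modify-write instruction
//     add QWORD PTR [rdi], 1
//
// Typical optimized AArch64 code: an explicit load, add and store,
// because arithmetic instructions only work on registers there
//     ldr x1, [x0]
//     add x1, x1, 1
//     str x1, [x0]
void bump(std::int64_t* counter) {
    *counter += 1;
}
```

The work done is the same either way: the value still has to be loaded, incremented and written back. The x86 encoding just hides those steps inside one instruction.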
 
Joined
Aug 20, 2007
Messages
20,709 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
Only on the front end for decoding though, the back end is RISC (K.)

I'd say RISC won the battle between CISC and RISC when an x86 processor decodes to an internal RISC ISA for its backend.

Again, the waters are muddied to oblivion. But still, implementing that takes extra transistors: point stands.

At this point I am fairly convinced you never had an example in mind to begin with.

You dismissed the only example I could provide besides logical ones as "pointless without power consumption figures".

So until I establish that fewer instructions cost less power than more, showing you assembly instruction listings is pointless.

As I explained above, the difference this makes in the real world is practically zero.

...

Alrighty then. Sure. Never mind. ISA doesn't take much die space at all. Ignore Google.
 
Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
So until I establish that fewer instructions cost less power than more, showing you assembly instruction listings is pointless.

Fewer instructions does generally mean less power, but that's meaningless if you want to talk about power consumption in real-world use cases. When you write something, the purpose is to get something done. That something needs a minimum amount of computation, and that computation isn't fundamentally different depending on the ISA used.

In other words, if something needs, say, X instructions on a CISC processor, there is very little chance that it would need fewer than X instructions on a RISC architecture, because by definition the ISA is less capable and on average you'll need either the same number of instructions or more to get the same thing done. RISC is supposedly more efficient when the computation is less complicated, but life's a bitch and that's not always the case; in fact most of the time it's not, since computers do all sorts of complicated shit nowadays. As I keep saying, this was true once but not anymore. And by the way, RISC doesn't necessarily mean fewer instructions.

The reason you can't convince me that something like ARM is somehow intrinsically more power efficient is because of this :

void func(double &a, double &b)
{
b=a*a;
}

The assembly output for the function above looks like this for x86-64 (first listing) and ARM64 (second listing) on GCC with no optimization flags:

push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
movsd xmm1, QWORD PTR [rax]
mov rax, QWORD PTR [rbp-8]
movsd xmm0, QWORD PTR [rax]
mulsd xmm0, xmm1
mov rax, QWORD PTR [rbp-16]
movsd QWORD PTR [rax], xmm0
nop
pop rbp
ret

sub sp, sp, #16
str x0, [sp, 8]
str x1, [sp]
ldr x0, [sp, 8]
ldr d1, [x0]
ldr x0, [sp, 8]
ldr d0, [x0]
fmul d0, d1, d0
ldr x0, [sp]
str d0, [x0]
nop
add sp, sp, 16
ret

If, for something this basic, there is almost no difference in the instructions required, how could you still argue one is more efficient than the other in any real manner?
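For anyone who wants to check the count rather than eyeball it, a small sketch follows. The helper name `count_instructions` and the two constant names are invented here; the strings are simply the two listings from this post pasted in, one instruction per line.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts non-blank lines. In the listings below every non-blank line is
// exactly one instruction, so this is the instruction count.
std::size_t count_instructions(const std::string& listing) {
    std::istringstream in(listing);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") != std::string::npos) {
            ++n;
        }
    }
    return n;
}

// x86-64 listing from the post (GCC, no optimization flags).
const std::string kX86Listing = R"(
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
movsd xmm1, QWORD PTR [rax]
mov rax, QWORD PTR [rbp-8]
movsd xmm0, QWORD PTR [rax]
mulsd xmm0, xmm1
mov rax, QWORD PTR [rbp-16]
movsd QWORD PTR [rax], xmm0
nop
pop rbp
ret
)";

// ARM64 listing from the post (GCC, no optimization flags).
const std::string kArm64Listing = R"(
sub sp, sp, #16
str x0, [sp, 8]
str x1, [sp]
ldr x0, [sp, 8]
ldr d1, [x0]
ldr x0, [sp, 8]
ldr d0, [x0]
fmul d0, d1, d0
ldr x0, [sp]
str d0, [x0]
nop
add sp, sp, 16
ret
)";
```

`count_instructions(kX86Listing)` gives 14 and `count_instructions(kArm64Listing)` gives 13, so the two differ by a single instruction. A static count like this says nothing about power on its own, which is the point being made.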

Alrighty then. Sure. Never mind. ISA doesn't take much die space at all. Ignore Google.

I just gave you an example of two comparable CPUs, one x86 and one ARM, that are within 6% of each other in terms of transistors used. That's as close as you can ever get to a fair comparison. I did my best to provide evidence that the ISA has little impact on the way CPUs are actually built, with Google and all. What does make a difference is what they're built for, where they are going to be used, and what the constraints are.

I guess you could tell me that this 6%, incidentally, must be because of the different ISAs ...
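The 6% figure is plain arithmetic on the two transistor counts quoted earlier in the thread (32 billion for the EPYC 7742 and 30 billion for Graviton2). A quick sketch, with `percent_gap` being a made-up helper name:

```cpp
#include <cassert>
#include <cmath>

// Gap between two positive quantities, as a percentage of the larger one.
double percent_gap(double a, double b) {
    const double larger = std::fmax(a, b);
    assert(larger > 0.0);  // both inputs are expected to be positive counts
    return std::fabs(a - b) / larger * 100.0;
}
```

With the counts in billions, `percent_gap(32.0, 30.0)` comes out to 6.25. Measured against the smaller chip instead it would be about 6.7%, so "within 6%" is a rounding of either figure.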
 
Joined
Aug 20, 2007
Messages
20,709 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
I just gave you an example of two comparable CPUs, one x86 and one ARM, that are within 6% of each other in terms of transistors used.

Yes. Clockspeeds aside, I have been repeatedly stating this debate is purely conceptual, as the lines between CISC and RISC are nowadays purely academic. Nearly everything but MIPS and a few odd ducks blurs the lines now.

Also this. RISC ain't what it used to be. POWER is nearly at a CISC-level instruction count. Back in the day, some RISCs lacked hardware integer multiply instructions. That obviously changed.
 
Joined
Oct 21, 2006
Messages
621 (0.10/day)
Location
Oak Ridge, TN
System Name BorgX79
Processor i7-3930k 6/12cores@4.4GHz
Motherboard Sabertoothx79
Cooling Capitan 360
Memory Muhskin DDR3-1866
Video Card(s) Sapphire R480 8GB
Storage Chronos SSD
Display(s) 3x VW266H
Case Ching Mien 600
Audio Device(s) Realtek
Power Supply Cooler Master 1000W Silent Pro
Mouse Logitech G900
Keyboard Rosewill RK-1000
Software Win7x64
Ahh, but with a CISC processor, it wouldn't have to break up the multiply:

function(x, y, *lower, *higher)
#Register names assume the Microsoft x64 calling convention: x in %rcx, y in %rdx, lower in %r8, higher in %r9
movq %rcx,%rax #Store x into %rax
mulq %rdx #Multiplies %rax by y; the 128-bit product lands in %rdx:%rax
movq %rax,(%r8) #Move the low 64 bits into *lower
movq %rdx,(%r9) #Move the high 64 bits into *higher

That's for a 64-bit multiply, in x86-64 code.
The Fmul instruction is 7 clocks max, IIRC. (Last time it mattered to me was ~90's, lol)

A multiply done as sequential adds scales with the size of the operand.

That's the complex part of CISC: dedicated math units, multiply instructions, and more.

Now, doing it with ADD instructions might be faster on some architectures, but it probably isn't.
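A rough sketch of the two approaches in C++, for comparison. The names `mul_wide` and `mul_shift_add` are made up here, and `unsigned __int128` is a GCC/Clang extension rather than standard C++, so treat this as an illustration under those assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// 64x64 -> 128-bit multiply using the compiler's 128-bit integer type
// (GCC/Clang extension). Returns {low 64 bits, high 64 bits}.
std::pair<std::uint64_t, std::uint64_t> mul_wide(std::uint64_t x, std::uint64_t y) {
    const unsigned __int128 product = static_cast<unsigned __int128>(x) * y;
    return {static_cast<std::uint64_t>(product),
            static_cast<std::uint64_t>(product >> 64)};
}

// The same product built from shifts and adds: one conditional 128-bit add
// per bit of y, so the work grows with the operand width.
std::pair<std::uint64_t, std::uint64_t> mul_shift_add(std::uint64_t x, std::uint64_t y) {
    std::uint64_t lo = 0, hi = 0;      // 128-bit accumulator
    std::uint64_t x_lo = x, x_hi = 0;  // x shifted left by i bits, as 128 bits
    for (int i = 0; i < 64; ++i) {
        if ((y >> i) & 1u) {
            const std::uint64_t old_lo = lo;
            lo += x_lo;
            hi += x_hi + (lo < old_lo ? 1u : 0u);  // propagate the carry
        }
        x_hi = (x_hi << 1) | (x_lo >> 63);
        x_lo <<= 1;
    }
    return {lo, hi};
}
```

Both functions return the same pair for any inputs. The first is a single multiply on hardware that has a widening multiply instruction, while the second always walks all 64 bits of the operand.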

Smaller multiply ops are easier.

EDIT: I see we're saying the same thing, in the code.

EDIT2: The ARM processors are no longer what I'd consider a RISC processor at all; I've missed all the added functionality with the later updates to the core technology.
I found the newer opcode listings; the list I posted above is no longer definitive.

I think we're down to arguing the same argument as Intel vs M68k, from 30 years ago. :)

The architecture is different; some like one, others like others.

Switching assembler languages was always like switching to Spanish or German, in my brain; ARM is like learning French, by comparison.

PIC for me was like learning a trade language; few words, but you can get the important stuff through. (You giva me money, I giva you beer.) :)

If you write in C or other languages, it really doesn't matter anyway; the compiler is the only one who knows where it goes. :D
 
D

Deleted member 185158

Guest
Well, I dug out an LG G Stylo phone with a cracked screen. I'm updating the system software so I can hook it up to the PC, (hopefully) get files off it, and go from there.
I will obtain the system specs of the phone and get some temp readouts, then configure a way to cool it. The battery is in the way, so this will take some modifications.
Right up my alley, will be a fun side project. Android is upgrading 1 of 54 items..... it's gonna take a while lol.

I don't know what you guys are talking about and the effects of writing the letter C and how it affects cooling situations with these types of processors.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
The problem is that most of the assembly is function overhead (prologue and epilogue)...
Code:
x86 SSE                           ARM                 Description
push rbp                          sub sp, sp, #16     function prologue (ARM reserves 16 bytes of stack, 8 for each pointer parameter)
mov rbp, rsp                                          set up the frame pointer; locals are addressed relative to it, as ARM does relative to sp
mov QWORD PTR [rbp-8], rdi        str x0, [sp, 8]     spill the (a) pointer to the stack
mov QWORD PTR [rbp-16], rsi       str x1, [sp]        spill the (b) pointer to the stack
mov rax, QWORD PTR [rbp-8]        ldr x0, [sp, 8]     reload the (a) pointer into a register
movsd xmm1, QWORD PTR [rax]       ldr d1, [x0]        load the double at (a) into floating point register (1)
mov rax, QWORD PTR [rbp-8]        ldr x0, [sp, 8]     reload the (a) pointer into a register
movsd xmm0, QWORD PTR [rax]       ldr d0, [x0]        load the double at (a) into floating point register (0)
mulsd xmm0, xmm1                  fmul d0, d1, d0     multiply the two registers (0*1), with the result being stored in (0)
mov rax, QWORD PTR [rbp-16]       ldr x0, [sp]        reload the (b) pointer into a register
movsd QWORD PTR [rax], xmm0       str d0, [x0]        store the floating point register (0) result to the (b) address
nop                               nop                 no operation...not sure why GCC is adding this
pop rbp                           add sp, sp, 16      function epilogue
ret                               ret                 return to the caller
You'd have to add more code that actually does work (especially things that aren't basic math) to see x86's advantage. A good example would be an AVX instruction.
 
Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
A good example would be an AVX instruction.

AVX is used almost exclusively for basic math; well, let's just call it math. Here's a function that adds floats, first with NEON and then with AVX, followed by the unoptimized assembly for each (ARM64 first, then x86-64):

Code:
#include <arm_neon.h>

// Adds src1 and src2 into dst, 4 floats per iteration (count must be a multiple of 4).
void add_float(float* dst, float* src1, float* src2, int count)
{
     for (int i = 0; i < count; i += 4)
     {
         float32x4_t in1, in2, out;
         in1 = vld1q_f32(src1);
         src1 += 4;
         in2 = vld1q_f32(src2);
         src2 += 4;
         out = vaddq_f32(in1, in2);
         vst1q_f32(dst, out);
         dst += 4;
     }
}
Code:
#include <immintrin.h>

// Adds src1 and src2 into dst, 8 floats per iteration (count must be a multiple of 8).
void add_float(float* dst, float* src1, float* src2, int count)
{
     for (int i = 0; i < count; i += 8)
     {
         __m256 in1 = _mm256_loadu_ps(src1);
         src1 += 8;
         __m256 in2 = _mm256_loadu_ps(src2);
         src2 += 8;
         __m256 out = _mm256_add_ps(in1, in2);
         _mm256_storeu_ps(dst, out);
         dst += 8;
     }
}
Code:
add_float(float*, float*, float*, int):
        sub     sp, sp, #176
        str     x0, [sp, 24]
        str     x1, [sp, 16]
        str     x2, [sp, 8]
        str     w3, [sp, 4]
        str     wzr, [sp, 172]
.L6:
        ldr     w1, [sp, 172]
        ldr     w0, [sp, 4]
        cmp     w1, w0
        bge     .L7
        ldr     x0, [sp, 16]
        str     x0, [sp, 32]
        ldr     x0, [sp, 32]
        ldr     q0, [x0]
        str     q0, [sp, 144]
        ldr     x0, [sp, 16]
        add     x0, x0, 16
        str     x0, [sp, 16]
        ldr     x0, [sp, 8]
        str     x0, [sp, 40]
        ldr     x0, [sp, 40]
        ldr     q0, [x0]
        str     q0, [sp, 128]
        ldr     x0, [sp, 8]
        add     x0, x0, 16
        str     x0, [sp, 8]
        ldr     q0, [sp, 144]
        str     q0, [sp, 64]
        ldr     q0, [sp, 128]
        str     q0, [sp, 48]
        ldr     q1, [sp, 64]
        ldr     q0, [sp, 48]
        fadd    v0.4s, v1.4s, v0.4s
        str     q0, [sp, 112]
        ldr     x0, [sp, 24]
        str     x0, [sp, 104]
        ldr     q0, [sp, 112]
        str     q0, [sp, 80]
        ldr     x0, [sp, 104]
        ldr     q0, [sp, 80]
        str     q0, [x0]
        ldr     x0, [sp, 24]
        add     x0, x0, 16
        str     x0, [sp, 24]
        ldr     w0, [sp, 172]
        add     w0, w0, 4
        str     w0, [sp, 172]
        b       .L6
.L7:
        nop
        add     sp, sp, 176
        ret
Code:
add_float(float*, float*, float*, int):
        push    rbp
        mov     rbp, rsp
        and     rsp, -32
        sub     rsp, 200
        mov     QWORD PTR [rsp-80], rdi
        mov     QWORD PTR [rsp-88], rsi
        mov     QWORD PTR [rsp-96], rdx
        mov     DWORD PTR [rsp-100], ecx
        mov     DWORD PTR [rsp+196], 0
.L6:
        mov     eax, DWORD PTR [rsp+196]
        cmp     eax, DWORD PTR [rsp-100]
        jge     .L7
        mov     rax, QWORD PTR [rsp-88]
        mov     QWORD PTR [rsp-72], rax
        mov     rax, QWORD PTR [rsp-72]
        vmovups ymm0, YMMWORD PTR [rax]
        vmovaps YMMWORD PTR [rsp+136], ymm0
        add     QWORD PTR [rsp-88], 32
        mov     rax, QWORD PTR [rsp-96]
        mov     QWORD PTR [rsp-64], rax
        mov     rax, QWORD PTR [rsp-64]
        vmovups ymm0, YMMWORD PTR [rax]
        vmovaps YMMWORD PTR [rsp+104], ymm0
        add     QWORD PTR [rsp-96], 32
        vmovaps ymm0, YMMWORD PTR [rsp+136]
        vmovaps YMMWORD PTR [rsp-24], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp+104]
        vmovaps YMMWORD PTR [rsp-56], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp-24]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsp-56]
        vmovaps YMMWORD PTR [rsp+72], ymm0
        mov     rax, QWORD PTR [rsp-80]
        mov     QWORD PTR [rsp+64], rax
        vmovaps ymm0, YMMWORD PTR [rsp+72]
        vmovaps YMMWORD PTR [rsp+8], ymm0
        vmovaps ymm0, YMMWORD PTR [rsp+8]
        mov     rax, QWORD PTR [rsp+64]
        vmovups YMMWORD PTR [rax], ymm0
        nop
        add     QWORD PTR [rsp-80], 32
        add     DWORD PTR [rsp+196], 8
        jmp     .L6
.L7:
        nop
        leave
        ret

Again, the differences are small; modern x86 and ARM are so alike that it's not worth saying one definitely has an advantage. Of course the AVX version is probably more power efficient, since the loop needs to go over fewer instructions as it processes 8 instead of 4 floats at a time, but that's not because of an ISA philosophy difference; NEON just doesn't have 256-bit instructions.
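To show that the iteration count depends only on the vector width and not on the ISA, here is a portable scalar sketch. The template `add_float_chunked` and its `Width` parameter are invented for this illustration; it uses no NEON or AVX intrinsics, and the inner loop only stands in for one vector instruction.

```cpp
#include <cassert>
#include <cstddef>

// Adds src1 and src2 into dst, Width elements per outer iteration, and
// returns how many full-width iterations ran. Leftover elements that do
// not fill a whole chunk are handled one at a time at the end.
template <std::size_t Width>
std::size_t add_float_chunked(float* dst, const float* src1, const float* src2,
                              std::size_t count) {
    static_assert(Width > 0, "Width must be positive");
    assert(count == 0 || (dst && src1 && src2));
    std::size_t iterations = 0;
    std::size_t i = 0;
    for (; i + Width <= count; i += Width) {
        // Stand-in for one vector load/add/store group.
        for (std::size_t lane = 0; lane < Width; ++lane) {
            dst[i + lane] = src1[i + lane] + src2[i + lane];
        }
        ++iterations;
    }
    for (; i < count; ++i) {  // scalar tail
        dst[i] = src1[i] + src2[i];
    }
    return iterations;
}
```

For 1024 floats, `add_float_chunked<4>` runs 256 outer iterations and `add_float_chunked<8>` runs 128. The loop overhead halves because the width doubled, whichever instruction set supplies the wider registers.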
 
Joined
Feb 3, 2017
Messages
3,475 (1.33/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Neon kind of defeats the point of RISC :)
It adds instructions as well as hardware to back them up.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Of course the AVX version is probably more power efficient, since the loop needs to go over fewer instructions as it processes 8 instead of 4 floats at a time, but that's not because of an ISA philosophy difference; NEON just doesn't have 256-bit instructions.
But that's exactly the point: x86 will get it done faster using less power. ARM will take substantially more time to do it.
 