Engineers Boost Computer Processor Performance By Over 20 Percent

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
34,496 (9.18/day)
Likes
17,519
Location
Hyderabad, India
#1
Researchers from North Carolina State University have developed a new technique that allows graphics processing units (GPUs) and central processing units (CPUs) on a single chip to collaborate – boosting processor performance by an average of more than 20 percent.

“Chip manufacturers are now creating processors that have a ‘fused architecture,’ meaning that they include CPUs and GPUs on a single chip,” says Dr. Huiyang Zhou, an associate professor of electrical and computer engineering who co-authored a paper on the research. “This approach decreases manufacturing costs and makes computers more energy efficient. However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren’t as efficient as they could be. That’s the issue we’re trying to resolve.”

GPUs were initially designed to execute graphics programs, and they are capable of executing many individual functions very quickly. CPUs, or the “brains” of a computer, have less computational power – but are better able to perform more complex tasks.

“Our approach is to allow the GPU cores to execute computational functions, and have CPU cores pre-fetch the data the GPUs will need from off-chip main memory,” Zhou says.

“This is more efficient because it allows CPUs and GPUs to do what they are good at. GPUs are good at performing computations. CPUs are good at making decisions and flexible data retrieval.”

In other words, CPUs and GPUs fetch data from off-chip main memory at approximately the same speed, but GPUs can execute the functions that use that data more quickly. So, if a CPU determines what data a GPU will need in advance, and fetches it from off-chip main memory, that allows the GPU to focus on executing the functions themselves – and the overall process takes less time.
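
As a rough illustration of that argument, here is a toy timing model in Python. It is only a sketch: the cycle counts are made-up values, not figures from the paper, and it assumes the CPU prefetch runs fully ahead of the GPU and costs the GPU nothing. The names `gpu_time` and `cpu_prefetch` are hypothetical.

```python
# Toy model only. All latencies are assumed illustration values, not measurements.
MEM_LATENCY = 400  # cycles for an off-chip main memory access (assumed)
L3_LATENCY = 40    # cycles for a hit in the shared on-chip cache (assumed)
COMPUTE = 20       # cycles of GPU computation per element (assumed)


def gpu_time(addresses, shared_cache):
    """Total cycles for the GPU to load and then process each address in order."""
    total = 0
    for addr in addresses:
        # A load is cheap if the line is already in the shared cache.
        total += L3_LATENCY if addr in shared_cache else MEM_LATENCY
        shared_cache.add(addr)
        total += COMPUTE
    return total


def cpu_prefetch(addresses):
    """CPU helper: touch every address the GPU will need, filling the shared cache.

    Assumption: this finishes before the GPU reaches each address.
    """
    return set(addresses)


addresses = list(range(8))
baseline = gpu_time(addresses, set())                    # GPU fetches everything itself
assisted = gpu_time(addresses, cpu_prefetch(addresses))  # CPU fetched ahead of time

print(baseline, assisted)  # 3360 480
```

The gap in this model is far larger than the reported gains because the model ignores bandwidth limits, cache capacity, and prefetches that arrive late. It only shows where the saving comes from.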

In preliminary testing, Zhou’s team found that its new approach improved fused processor performance by an average of 21.4 percent.
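
To relate that figure to run time, the short Python sketch below converts a performance gain into the implied reduction in execution time. It assumes "performance" means work completed per unit time on a fixed workload, which the announcement does not state explicitly. The function name `time_reduction` is hypothetical.

```python
def time_reduction(perf_gain_percent):
    """Percent less run time implied by a percent gain in performance.

    Assumes performance = work / time on a fixed workload, so a gain of g%
    is a speedup of (1 + g/100) and the new time is old_time / speedup.
    """
    speedup = 1 + perf_gain_percent / 100
    return (1 - 1 / speedup) * 100


print(round(time_reduction(21.4), 1))  # 17.6
```

Under that assumption, a 21.4 percent performance gain corresponds to a run that finishes in roughly 17.6 percent less time.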

This approach has not been possible in the past, Zhou adds, because CPUs and GPUs were located on separate chips.

The paper, “CPU-Assisted GPGPU on Fused CPU-GPU Architectures,” will be presented Feb. 27 at the 18th International Symposium on High Performance Computer Architecture, in New Orleans. The paper was co-authored by NC State Ph.D. students Yi Yang and Ping Xiang, and by Mike Mantor of Advanced Micro Devices (AMD). The research was funded by the National Science Foundation and AMD.

The paper abstract follows.

“CPU-Assisted GPGPU on Fused CPU-GPU Architectures”

Authors: Yi Yang, Ping Xiang, Huiyang Zhou, North Carolina State University; Mike Mantor, Advanced Micro Devices

Presented: Feb. 27, 18th International Symposium on High Performance Computer Architecture, New Orleans

Abstract: This paper presents a novel approach to utilize the CPU resource to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In our model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar to the latest Intel Sandy Bridge and AMD accelerated processing unit (APU) platforms. In our proposed CPU-assisted GPGPU, after the CPU launches a GPU program, it executes a pre-execution program, which is generated automatically from the GPU kernel using our proposed compiler algorithms and contains memory access instructions of the GPU kernel for multiple threadblocks. The CPU pre-execution program runs ahead of GPU threads because (1) the CPU pre-execution thread only contains memory fetch instructions from GPU kernels and not floating-point computations, and (2) the CPU runs at higher frequencies and exploits higher degrees of instruction-level parallelism than GPU scalar cores. We also leverage the prefetcher at the L2-cache on the CPU side to increase the memory traffic from CPU. As a result, the memory accesses of GPU threads hit in the L3 cache and their latency can be drastically reduced. Since our pre-execution is directly controlled by user-level applications, it enjoys both high accuracy and flexibility. Our experiments on a set of benchmarks show that our proposed preexecution improves the performance by up to 113% and 21.4% on average.
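
The idea of deriving a pre-execution program from a GPU kernel can be sketched in Python as follows. This is not the authors' compiler algorithm. It is a minimal illustration that assumes a kernel can be represented as a flat list of operations, that each thread handles one element, and that stores are dropped along with the floating-point work. The `Op` type, the `KERNEL` list, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str        # "load", "fmul", "fadd", or "store"
    array: str = ""  # array touched by a load or store, empty for arithmetic


# Hypothetical kernel body for c[i] = a[i] * b[i] + a[i], one element per thread.
KERNEL = [Op("load", "a"), Op("load", "b"), Op("fmul"), Op("fadd"), Op("store", "c")]


def pre_execution_ops(kernel):
    """Keep only the memory fetches; drop floating-point work and stores."""
    return [op for op in kernel if op.kind == "load"]


def pre_execution_addresses(kernel, block_dim, blocks):
    """(array, index) pairs a CPU helper would touch for the given thread blocks."""
    loads = pre_execution_ops(kernel)
    return [
        (op.array, block * block_dim + thread)
        for block in blocks
        for thread in range(block_dim)
        for op in loads
    ]


# Two thread blocks of two threads each cover elements 0..3 of arrays a and b.
print(pre_execution_addresses(KERNEL, block_dim=2, blocks=[0, 1]))
```

Because the stripped-down program has no arithmetic to wait on, a CPU thread walking this address list can plausibly run ahead of the GPU threads, which is the property the abstract relies on.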
 

FreedomEclipse

~Technological Technocrat~
Joined
Apr 20, 2007
Messages
17,215 (4.38/day)
Likes
5,407
Location
London,UK
System Name Codename: Rapture X Mk.VI {Still....MoonPig Edition}
Processor Intel 3930k@4.5Ghz
Motherboard Asus P9X79 PRO
Cooling Corsair H105 {2x Corsair ML 120 Pro}|VRM: Antec Spotcool 100
Memory 32GB DDR3 Kingston HyperX Beast 2400Mhz {8x4GB}
Video Card(s) MSI 1070 Gaming X (Samsung)
Storage 512GB Samsung 850 Pro (Boot)|1x 512GB Crucial MX100|2x 3TB Toshiba DT01ACA300
Display(s) Asus PB278Q 27"
Case Corsair 760T (White) {1x140mm NB PK-3, 2x Corsair AF140}
Audio Device(s) Creative SB Z {Speakers: Logitech Z-5500 }
Power Supply Corsair AX760
Mouse Logitech G900 Chaos Spectrum
Keyboard Duckyshine Dead LED(s) III
Software Windows 7 7600 x64
Benchmark Scores ( ͡°( ͡° ͜ʖ( ͡° ͜ʖ ͡°)ʖ ͡°) ͡°)
#3
For a moment, I thought there was going to be hope for BD :p

/troll
 
Joined
Oct 30, 2008
Messages
1,538 (0.46/day)
Likes
380
System Name Lailalo / Edelweiss
Processor FX 8320 @ 4.5Ghz / i7 3610QM @2.3-3.2Ghz
Motherboard ASrock 990FX Extreme 4 / Lenovo Y580
Cooling Cooler Master Hyper 212 Plus / Big hunk of copper
Memory 16GB Samsung 30nm DDR3 1600+ / 8GB Hyundai DDR3 1600
Video Card(s) XFX R9 390 / GTX 660M 2GB
Storage Seagate 3TB/1TB + OCZ Synapse 64GB SSD Cache / Western Digital 1TB 7200RPM
Display(s) LG Ultrawide 29in @ 2560x1080 / Lenovo 15.6 @ 1920x1080
Case Coolermaster Storm Sniper / Lenovo Y580
Audio Device(s) Asus Xonar DG / Whatever Lenovo used
Power Supply Antec Truepower Blue 750W + Thermaltake 5.25in 250W / Big Power Brick
Software Windows 10 Pro / Windows 10 Home
#4
Saw this coming/predicted it even before APUs came out. When NV showcased using GPUs for CPU tasks years ago...it was like just one massive hint of where future tech was going. But can AMD capitalize on it? Curious to see. Intel can utilize the same idea, but their GPU tech is so far behind that I could see them leveraging the CPU side even more to compensate. So then it is a matter of how far AMD can take it to offset their weakness on the x86 side.

Either way, forces both companies to innovate. Innovation is good!!
 
Joined
Mar 24, 2010
Messages
4,599 (1.61/day)
Likes
921
Location
Independent in Imperialistic
System Name Oh the name!
Processor i7 7700K
Motherboard MSI Z270 Xpower
Cooling EK 360 Extreme
Memory 16Gb G.Skill TridentZ 3866
Video Card(s) nVidia 1080 Ti Flanders Edition
Storage 1 Intel PCIE SSD750, 2 Sam 840Evo 1TB SSD, WD Black 2TB, Toshiba 3TB
Display(s) Acer Predator X1 (32")
Case Rajintek Paean
Audio Device(s) onboard
Power Supply Corsair AX860
Mouse Mad Catz Pro X
Keyboard Corsair K70
Software W10Pro
#5
Ha! Wait a second! Didn't they say (and we believed it...) that we've had this feature ever since we installed our first PhysX-enabled video card? Hohoho! HOHOHOHO!
 
Joined
Mar 10, 2010
Messages
5,130 (1.79/day)
Likes
1,665
Location
Manchester uk
System Name Quad GT evo V
Processor FX8350 @ 4.8ghz1.525c NB2.64ghz Ht2.84ghz
Motherboard Gigabyte 990X Gaming
Cooling 360EK extreme 360Tt rad all push/pull, cpu,NB/Vrm blocks all EK
Memory Corsair vengeance 32Gb @1333 cas9
Video Card(s) Rx vega 64 waterblockedEK + Rx580 waterblockedEK
Storage samsung 840(250), WD 1Tb+2Tb +3Tbgrn 1tb hybrid
Display(s) Samsung uea28"850R 4k freesync, samsung 40" 1080p
Case Custom(modded) thermaltake Kandalf
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup
Power Supply corsair 1000Rmx
Mouse CM optane
Keyboard CM optane
Software Win 10 Pro
Benchmark Scores 15.69K best overall sandra so far
#6
Saw this coming/predicted it even before APUs came out. When NV showcased using GPUs for CPU tasks years ago...it was like just one massive hint of where future tech was going. But can AMD capitalize on it? Curious to see. Intel can utilize the same idea, but their GPU tech is so far behind that I could see them leveraging the CPU side even more to compensate. So then it is a matter of how far AMD can take it to offset their weakness on the x86 side.

Either way, forces both companies to innovate. Innovation is good!!
+1 :) But ARM and Nvidia, IMHO, make this more than a two-horse race from here on, so I am liking AMD's open-standards policy regarding HSA, as hopefully most innovators will at least try to get those standards working across platforms. But I've a fiver that says Nvidia will make up some more stuff only they can use.
 

naoan

New Member
Joined
Jul 12, 2009
Messages
303 (0.10/day)
Likes
62
System Name AMD?
Audio Device(s) onboard
Software 7 X64
#7
This stuff would probably remain as an abstract unless AMD took the aggressive stance.
 
Joined
Mar 10, 2010
Messages
5,130 (1.79/day)
Likes
1,665
Location
Manchester uk
System Name Quad GT evo V
Processor FX8350 @ 4.8ghz1.525c NB2.64ghz Ht2.84ghz
Motherboard Gigabyte 990X Gaming
Cooling 360EK extreme 360Tt rad all push/pull, cpu,NB/Vrm blocks all EK
Memory Corsair vengeance 32Gb @1333 cas9
Video Card(s) Rx vega 64 waterblockedEK + Rx580 waterblockedEK
Storage samsung 840(250), WD 1Tb+2Tb +3Tbgrn 1tb hybrid
Display(s) Samsung uea28"850R 4k freesync, samsung 40" 1080p
Case Custom(modded) thermaltake Kandalf
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup
Power Supply corsair 1000Rmx
Mouse CM optane
Keyboard CM optane
Software Win 10 Pro
Benchmark Scores 15.69K best overall sandra so far
#8
This stuff would probably remain as an abstract unless AMD took the aggressive stance.
Hopefully. I prefer open standards that everybody works to; that way the devs have to work harder to make their chip better than the others, rather than trying to differentiate with underused additional features that software has to be specifically tailored for.

And just when you start to think your PC might last a while and all... tut, by next year I'll be looking at mine with that "hmmmm, upgrade time" eye :)
 
Joined
Mar 27, 2008
Messages
697 (0.19/day)
Likes
70
Location
Zagreb, Croatia
Processor C2D E8400@3.9GHz (488x8, 1.4v :( )
Motherboard Abit IP35-E
Cooling Thermaltake Sonic Tower+120mm fan
Memory 2GB kingmax ddr1066@976MHz 5-5-5-15
Video Card(s) Radeon X1800GTO @700/1400MHz with Accelero S1+Glacialtech fancard
Storage 2xSeagate Barracuda 7200.10 160GB
Display(s) Samsung SyncMaster 793s... just you laugh...
Case some Aplus case
Audio Device(s) Realtek ALC888
Power Supply Chieftec 450W
Software Win7 x64
#9
This looks very good for AMD in the coming years.
 
Joined
Mar 24, 2011
Messages
2,286 (0.92/day)
Likes
528
Location
Burlington, VT
Processor Intel i5-2500k
Motherboard MSI P67A-GD65
Cooling Deep Cool Gammax 400
Memory 8GB (4x2GB) G.Skill Ripjaws X DDR3-1600
Video Card(s) Gigabyte GTX 1060 Windforce OC 6GB
Storage Samsung EVO 850 256GB / WD Caviar Black 1TB
Display(s) Acer GD235HZbid 120hz LCD
Case Rosewill Challenger Mid-Tower
Audio Device(s) Onboard
Power Supply Corsair 650W 650-TX
Software Windows 10
#10
Keep in mind, they didn't actually physically accomplish anything yet. With the help of AMD engineers they modeled how a supposed performance gain could potentially occur, but have yet to get it functioning. When APUs were first introduced, I figured they would find a way to have the GPU and CPU process simultaneously when a discrete GPU was present, but apparently they didn't care much for developing that idea. This entire study should be taken with a truckload of grains of salt.
 
Joined
Mar 27, 2008
Messages
697 (0.19/day)
Likes
70
Location
Zagreb, Croatia
Processor C2D E8400@3.9GHz (488x8, 1.4v :( )
Motherboard Abit IP35-E
Cooling Thermaltake Sonic Tower+120mm fan
Memory 2GB kingmax ddr1066@976MHz 5-5-5-15
Video Card(s) Radeon X1800GTO @700/1400MHz with Accelero S1+Glacialtech fancard
Storage 2xSeagate Barracuda 7200.10 160GB
Display(s) Samsung SyncMaster 793s... just you laugh...
Case some Aplus case
Audio Device(s) Realtek ALC888
Power Supply Chieftec 450W
Software Win7 x64
#11
Well, they can't use it now since there is no software support for something like this. But in 5-10 years... Intel has nothing like this, and I bet heterogeneous computing is going to gain some serious ground in the near future simply because AMD made a chip that makes it commercially viable.
 
Joined
Nov 2, 2008
Messages
767 (0.23/day)
Likes
414
Processor Intel Core i3-4370
Motherboard Gigabyte GA-H97-D3H
Cooling Zalman CNPS9500 AT
Memory 16GB Crucial Ballistix Sport DDR3-1600
Video Card(s) Gigabyte GV-N75TOC-2GI GeForce GTX 750 Ti WindForce
Storage Crucial MX100 256GB SSD
Display(s) Dell S2316M LCD
Case Fractal Design Define R4 Black Pearl
Audio Device(s) Realtek ALC1150
Power Supply Corsair CX600M
Mouse Logitech M500
Keyboard Lenovo KB1021 USB
Software Windows 10 Professional x64
#12
What happens if the GPU is busy doing video-related work and the CPU throws a calculation request at it? Does the display stutter or freeze? Or does the GPU perform the calculations more slowly? In that case, the CPU might be able to perform the calculations faster just because it isn't bogged down with other work. Definitely an issue to take into consideration.