Intel Optimizes PyTorch for Llama 2 on Arc A770, Higher Precision FP16

btarunr · Feb 26, 2024

Intel just announced optimizations for PyTorch (IPEX) that take advantage of the AI acceleration features of its Arc "Alchemist" GPUs. PyTorch is a popular machine learning library that is often associated with NVIDIA GPUs, but it is actually platform-agnostic and runs on a variety of hardware, including CPUs and GPUs. Performance may not be optimal without hardware-specific tuning, however. Intel provides that tuning through the Intel Extension for PyTorch (IPEX), which extends PyTorch with optimizations designed for Intel's compute hardware.

Intel released a blog post detailing how to set up Llama 2 inference with PyTorch (IPEX) on its Arc "Alchemist" A770 graphics card. The model requires 14 GB of GPU RAM, so the 16 GB version of the A770 is recommended. This development could be seen as a direct response to NVIDIA's Chat with RTX tool, which allows GeForce users with >8 GB RTX 30-series "Ampere" and RTX 40-series "Ada" GPUs to run PyTorch-LLM models on their graphics cards. NVIDIA achieves lower VRAM usage by distributing INT4-quantized versions of the models, while Intel uses a higher-precision FP16 version. In theory, the difference in precision should not have a significant impact on the results.
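As a rough check on the memory figures above, the weights-only footprint of a model can be estimated as parameter count times bits per parameter. The sketch below assumes the 7-billion-parameter Llama 2 variant (the parameter count is not stated above; it is inferred from the 14 GB figure) and ignores activations, the KV cache, and runtime overhead, so real usage will be somewhat higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: the 7-billion-parameter Llama 2 variant.
PARAMS_7B = 7e9

fp16_gb = weight_memory_gb(PARAMS_7B, 16)  # 16 bits (2 bytes) per weight
int4_gb = weight_memory_gb(PARAMS_7B, 4)   # 4 bits (half a byte) per weight

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.1f} GB")
```

Under these assumptions, FP16 weights come to about 14 GB, which lines up with the recommendation for the 16 GB A770, while INT4 weights come to about 3.5 GB, roughly a quarter of that.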


Scrizz · Feb 26, 2024

interesting

