News Posts matching #Tensor

NVIDIA TensorRT Boosts Stable Diffusion 3.5 Performance on NVIDIA GeForce RTX and RTX PRO GPUs

Generative AI has reshaped how people create, imagine and interact with digital content. As AI models continue to grow in capability and complexity, they require more VRAM, or video random access memory. The base Stable Diffusion 3.5 Large model, for example, uses over 18 GB of VRAM - limiting the number of systems that can run it well. Quantization lets noncritical layers of the model be removed or run at lower precision. NVIDIA GeForce RTX 40 Series and the Ada Lovelace generation of NVIDIA RTX PRO GPUs support FP8 quantization to help run these quantized models, and the latest-generation NVIDIA Blackwell GPUs also add support for FP4.
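For a rough sense of why precision matters here, the sketch below estimates the weight footprint of a model of roughly SD3.5 Large's size at different precisions. The 8-billion-parameter figure and the per-precision byte counts are illustrative assumptions, not NVIDIA or Stability AI numbers; real VRAM use also includes activations, the text encoders and the VAE.

```python
# Back-of-envelope weight footprint for an ~8B-parameter diffusion model.
# The parameter count is an assumption for illustration; actual VRAM use
# also covers activations, text encoders and the VAE.
PARAMS = 8e9

BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,  # 16-bit weights
    "FP8": 1.0,        # supported on Ada Lovelace and Blackwell
    "FP4": 0.5,        # added with Blackwell
}

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>9}: ~{PARAMS * nbytes / 2**30:.1f} GiB of weights")
```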

NVIDIA collaborated with Stability AI to quantize its latest model, Stable Diffusion (SD) 3.5 Large, to FP8 - reducing VRAM consumption by 40%. Further optimizations to SD3.5 Large and Medium with the NVIDIA TensorRT software development kit (SDK) double performance. In addition, TensorRT has been reimagined for RTX AI PCs, combining its industry-leading performance with just-in-time (JIT), on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs. TensorRT for RTX is now available as a standalone SDK for developers.
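For developers curious what the TensorRT workflow looks like in practice, here is a minimal sketch of the long-standing TensorRT Python flow: parse an ONNX model and build a serialized engine with reduced precision enabled. The file names are placeholders, FP8 quantization of a model like SD3.5 involves more than a builder flag, and the assumption here is that TensorRT for RTX layers its just-in-time, on-device engine building on top of a flow like this rather than replacing it outright.

```python
# Minimal sketch of the classic TensorRT Python flow (ONNX -> serialized engine).
# "model.onnx" / "model.engine" are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()      # explicit batch in recent TensorRT
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # allow reduced-precision kernels

engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```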

NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results

NVIDIA is working with companies worldwide to build out AI factories—speeding the training and deployment of next-generation AI applications that use the latest advancements in training and inference. The NVIDIA Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training—the 12th since the benchmark's introduction in 2018—the NVIDIA AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark's toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

The NVIDIA platform was the only one that submitted results on every MLPerf Training v5.0 benchmark—underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks. The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. In addition, NVIDIA collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs.

Doudna Supercomputer Will Be Powered by NVIDIA's Next-gen Vera Rubin Platform

Ready for a front-row seat to the next scientific revolution? That's the idea behind Doudna—a groundbreaking supercomputer announced today at Lawrence Berkeley National Laboratory in Berkeley, California. The system represents a major national investment in advancing U.S. high-performance computing (HPC) leadership, ensuring U.S. researchers have access to cutting-edge tools to address global challenges. "It will advance scientific discovery from chemistry to physics to biology and all powered by—unleashing this power—of artificial intelligence," U.S. Energy Secretary Chris Wright said at today's event.

Also known as NERSC-10, Doudna is named for Nobel laureate and CRISPR pioneer Jennifer Doudna. The next-generation system announced today is designed not just for speed but for impact. Powered by Dell Technologies infrastructure with the NVIDIA Vera Rubin architecture, and set to launch in 2026, Doudna is tailored for real-time discovery across the U.S. Department of Energy's most urgent scientific missions. It's poised to catapult American researchers to the forefront of critical scientific breakthroughs, fostering innovation and securing the nation's competitive edge in key technological fields.

NVIDIA & Microsoft Accelerate Agentic AI Innovation - From Cloud to PC

Agentic AI is redefining scientific discovery and unlocking research breakthroughs and innovations across industries. Through deepened collaboration, NVIDIA and Microsoft are delivering advancements that accelerate agentic AI-powered applications from the cloud to the PC. At Microsoft Build, Microsoft unveiled Microsoft Discovery, an extensible platform built to empower researchers to transform the entire discovery process with agentic AI. This will help research and development departments across various industries accelerate the time to market for new products, as well as speed and expand the end-to-end discovery process for all scientists.

Microsoft Discovery will integrate the NVIDIA ALCHEMI NIM microservice, which optimizes AI inference for chemical simulations, to accelerate materials science research with property prediction and candidate recommendation. The platform will also integrate NVIDIA BioNeMo NIM microservices, tapping into pretrained AI workflows to speed up AI model development for drug discovery. These integrations equip researchers with accelerated performance for faster scientific discoveries. In testing, researchers at Microsoft used Microsoft Discovery to detect a novel coolant prototype with promising properties for immersion cooling in data centers in under 200 hours, rather than months or years with traditional methods.

NVIDIA Launches GeForce RTX 5060 Series, Beginning with RTX 5060 Ti This Week

NVIDIA today announced the GeForce RTX 5060 series, with a combined announcement of the GeForce RTX 5060, the GeForce RTX 5060 Ti 8 GB, and the RTX 5060 Ti 16 GB. The latter two will be available from tomorrow, 16th April, which is also when media reviews of the RTX 5060 Ti 16 GB and 8 GB go live. The RTX 5060, meanwhile, is expected to be launched in May. The RTX 5060 Ti introduces the new 5 nm "GB206" silicon, which the SKU maxes out. This chip features 36 streaming multiprocessors (SM) across 3 GPCs. These work out to 4,608 CUDA cores, 144 Tensor cores, 36 RT cores, 144 TMUs, and 48 ROPs. The chip features a 128-bit GDDR7 memory interface driving 8 GB or 16 GB of 28 Gbps (GDDR7-effective) memory for 448 GB/s of memory bandwidth, which is a 55% increase over the RTX 4060 Ti.
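The bandwidth figure follows directly from the bus width and data rate; here is a quick sanity check. The 288 GB/s baseline for the RTX 4060 Ti is implied by the quoted 55% uplift.

```python
# Memory bandwidth = bus width (in bytes) x effective per-pin data rate.
bus_width_bits = 128        # RTX 5060 Ti GDDR7 interface
data_rate_gbps = 28         # GDDR7-effective, per pin
bandwidth = bus_width_bits / 8 * data_rate_gbps
print(bandwidth)            # 448.0 GB/s; the RTX 4060 Ti manages 288 GB/s
print(bandwidth / 288 - 1)  # ~0.56, i.e. the quoted ~55% uplift
```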

NVIDIA recommends the RTX 5060 Ti for maxed-out 1080p gameplay, including with ray tracing, although we expect it to be unofficially capable of 1440p gameplay with fairly high settings and ray tracing. You also get the new features introduced with the RTX 50 series, including Neural Rendering and DLSS 4 Multi Frame Generation. NVIDIA is pricing the RTX 5060 Ti 8 GB at $375, and the 16 GB sibling about $50 higher, at $425, although these could be fairy-tale prices given the unpredictable world trade environment and scarcity profiteering by scalpers.

Official: Nintendo Switch 2 Leveled Up With NVIDIA "Custom Processor" & AI-Powered Tech

The Nintendo Switch 2, unveiled April 2, takes performance to the next level, powered by a custom NVIDIA processor featuring an NVIDIA GPU with dedicated RT Cores and Tensor Cores for stunning visuals and AI-driven enhancements. With 1,000 engineer-years of effort across every element—from system and chip design to a custom GPU, APIs and world-class development tools—the Nintendo Switch 2 brings major upgrades. The new console enables up to 4K gaming in TV mode and up to 120 FPS at 1080p in handheld mode. The Nintendo Switch 2 also supports HDR and AI upscaling to sharpen visuals and smooth gameplay.

AI and Ray Tracing for Next-Level Visuals
The new RT Cores bring real-time ray tracing, delivering lifelike lighting, reflections and shadows for more immersive worlds. Tensor Cores power AI-driven features like Deep Learning Super Sampling (DLSS), which upscales lower-resolution frames for sharper detail without the performance cost of native rendering. Tensor Cores also enable AI-powered face tracking and background removal in video chat use cases, enhancing social gaming and streaming. With millions of players worldwide, the Nintendo Switch has become a gaming powerhouse and home to Nintendo's storied franchises. Its hybrid design redefined console gaming, bridging TV and handheld play.

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.
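As a hedged, minimal example of what developing on RTX can look like in practice, the sketch below uses PyTorch to detect a CUDA-capable GPU and run a small, hypothetical model under autocast, which lets the framework dispatch matrix math to Tensor Cores. It is not tied to the session or tools mentioned above.

```python
# Hedged sketch: detect a CUDA GPU and run a toy model in reduced precision,
# which lets PyTorch route matmuls to Tensor Cores on RTX hardware.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("Running on:", torch.cuda.get_device_name(device))
else:
    print("No CUDA GPU found; falling back to CPU.")

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
x = torch.randn(8, 1024, device=device)

# FP16 on the GPU engages Tensor Cores; BF16 keeps the CPU fallback working.
amp_dtype = torch.float16 if device.type == "cuda" else torch.bfloat16
with torch.autocast(device_type=device.type, dtype=amp_dtype), torch.no_grad():
    y = model(x)
print(y.shape)   # torch.Size([8, 1024])
```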

Imagination's New DXTP GPU for Mobile and Laptop: 20% More Power Efficient

Today Imagination Technologies announces its latest GPU IP, Imagination DXTP, which sets a new standard for the efficient acceleration of graphics and compute workloads on smartphones and other power-constrained devices. Thanks to an array of micro-architectural improvements, DXTP delivers up to 20% improved power efficiency (FPS/W) on popular graphics workloads when compared to its DXT equivalent.

"The global smartphone market is experiencing a resurgence, propelled by cutting-edge AI features such as personal agents and enhanced photography," says Peter Richardson, Partner & VP at Counterpoint Research. "However, the success of this AI-driven revolution hinges on maintaining the high standards users expect: smooth interfaces, sleek designs, and all-day battery life. As the market matures, consumers are gravitating towards premium devices that seamlessly integrate these advanced AI capabilities without compromising on essential smartphone qualities."

NVIDIA Outlines Cost Benefits of Inference Platform

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text (words or word fragments) that a large language model (LLM) processes and generates. With AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task. Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
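To make that concrete, here is a toy calculation; every figure in it is a made-up assumption, not an NVIDIA or vendor number.

```python
# Toy token economics - all figures below are illustrative assumptions.
price_per_million_tokens = 2.00    # USD billed per million generated tokens
tokens_per_second_per_gpu = 5_000  # sustained generation throughput per GPU

revenue_per_gpu_hour = tokens_per_second_per_gpu * 3600 / 1e6 * price_per_million_tokens
print(f"Revenue per GPU-hour: ${revenue_per_gpu_hour:.2f}")   # $36.00
# Doubling throughput (or halving energy per token) directly improves the
# return per GPU and per joule - the "more tokens at a lower cost" goal.
```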

ADLINK Launches the DLAP Supreme Series

ADLINK Technology Inc., a global leader in edge computing, unveiled its new "DLAP Supreme Series", an edge generative AI platform. By integrating Phison's innovative aiDAPTIV+ AI solution, this series overcomes memory limitations in edge generative AI applications, significantly enhancing AI computing capabilities on edge devices. Without incurring high additional hardware costs, the DLAP Supreme series achieves notable AI performance improvements, helping enterprises reduce the cost barriers of AI deployment and accelerating the adoption of generative AI across various industries, especially in edge computing.

Lower AI Computing Costs and Significantly Improved Performance
As generative AI continues to penetrate various industries, many edge devices encounter performance bottlenecks due to insufficient DRAM capacity when executing large language models, affecting model operation and even causing issues such as inadequate token length. The DLAP Supreme series, leveraging aiDAPTIV+ technology, effectively overcomes these limitations and significantly enhances computing performance. Additionally, it enables edge devices to perform generative language model training on-device, improving their autonomous learning and adaptability.
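A rough rule of thumb shows why DRAM becomes the bottleneck once training enters the picture; the model size below is an assumption for illustration, and activation memory comes on top of it.

```python
# Fine-tuning with Adam in mixed precision typically needs roughly
# 16 bytes per parameter for model state alone (FP16 weights + grads,
# FP32 master weights, two Adam moments), before counting activations.
params = 7e9            # assumed 7B-parameter language model
bytes_per_param = 16
print(f"~{params * bytes_per_param / 2**30:.0f} GiB of model state")  # ~104 GiB
```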

NVIDIA NIM Microservices and AI Blueprints Usher in New Era of Local AI

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology. Countless times, industry-defining technological breakthroughs have been invented in one place—a garage. This week marks the start of the RTX AI Garage series, which will offer regular content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level. These models—offered as NVIDIA NIM microservices—are powered by new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second of performance and up to 32 GB of VRAM, and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.
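For context on how such models are typically consumed once a NIM microservice is running locally, here is a hedged sketch: LLM-focused NIM microservices generally expose an OpenAI-compatible HTTP API, and the host, port and model id below are placeholders assumed for illustration.

```python
# Hedged sketch of calling a locally running LLM NIM microservice.
# Endpoint, port and model id are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Explain DLSS in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```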

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Google's Upcoming Tensor G5 and G6 Specs Might Have Been Revealed Early

Details of what are claimed to be Google's upcoming Tensor G5 and G6 SoCs have popped up over on Notebookcheck.net; the site claims to have found the specs on a public platform, without going into any further detail. Those who were betting on the Tensor G5—codenamed Laguna—delivering vastly improved performance over the Tensor G4 are likely to be disappointed, at least on the CPU side of things. As previous rumours have suggested, the chip is expected to be manufactured by TSMC, using its N3E process node, but the Tensor G5 will retain a single Arm Cortex-X4 core, although it will see a slight upgrade to five Cortex-A725 cores vs. the three Cortex-A720 cores of the Tensor G4. The G5 loses two Cortex-A520 cores in favour of the extra Cortex-A725 cores. The Cortex-X4 will also remain clocked at the same peak 3.1 GHz as on the Tensor G4.

Interestingly, it looks like Google will drop the Arm Mali GPU in favour of an Imagination Technologies DXT GPU, although the specs listed by Notebookcheck don't match any of the configurations listed by Imagination Technologies. The G5 will continue to support 4x 16-bit LPDDR5 or LPDDR5X memory chips, but Google has added support for UFS 4.0 storage, the lack of which has been a point of complaint with the Tensor G4. Other new additions include support for 10 Gbps USB 3.2 Gen 2 and PCI Express 4.0. Some improvements to the camera logic have also been made, with support for up to 200-Megapixel sensors, or 108 Megapixels with zero shutter lag, but whether Google will use such a camera is anyone's guess at this point in time.