News Posts matching #CUDA

New NVIDIA EGX Edge Supercomputing Platform Accelerates AI, IoT, 5G at the Edge

NVIDIA today announced the NVIDIA EGX Edge Supercomputing Platform - a high-performance, cloud-native platform that lets organizations harness rapidly streaming data from factory floors, manufacturing inspection lines and city streets to securely deliver next-generation AI, IoT and 5G-based services at scale, with low latency.

Early adopters of the platform - which combines NVIDIA CUDA-X software with NVIDIA-certified GPU servers and devices - include Walmart, BMW, Procter & Gamble, Samsung Electronics and NTT East, as well as the cities of San Francisco and Las Vegas.

Primate Labs Introduces GeekBench 5, Drops 32-bit Support

Primate Labs, developers of the ubiquitous benchmarking application GeekBench, have announced the release of version 5 of the software. The new version brings numerous changes, and one of the most important (since it affects compatibility) is that it will only be distributed in a 64-bit version. Some under-the-hood changes include additions to the CPU benchmark tests (including machine learning, augmented reality, and computational photography) as well as increases in the memory footprint for tests so as to better gauge the impact of your memory subsystem on your system's performance. Also introduced are different threading models for CPU benchmarking, allowing for changes in workload attribution and the corresponding impact on CPU performance.

On the Compute side of things, GeekBench 5 now supports the Vulkan API, which joins CUDA, Metal, and OpenCL. GPU-accelerated compute for computer vision tasks such as Stereo Matching, and augmented reality tasks such as Feature Matching are also available. For iOS users, there is now a Dark Mode for the results interface. GeekBench 5 is available now, 50% off, on Primate Labs' store.

NVIDIA Brings CUDA to ARM, Enabling New Path to Exascale Supercomputing

NVIDIA today announced its support for Arm CPUs, providing the high performance computing industry a new path to build extremely energy-efficient, AI-enabled exascale supercomputers. NVIDIA is making available to the Arm ecosystem its full stack of AI and HPC software - which accelerates more than 600 HPC applications and all AI frameworks - by year's end. The stack includes all NVIDIA CUDA-X AI and HPC libraries, GPU-accelerated AI frameworks and software development tools such as PGI compilers with OpenACC support and profilers. Once stack optimization is complete, NVIDIA will accelerate all major CPU architectures, including x86, POWER and Arm.

"Supercomputers are the essential instruments of scientific discovery, and achieving exascale supercomputing will dramatically expand the frontier of human knowledge," said Jensen Huang, founder and CEO of NVIDIA. "As traditional compute scaling ends, power will limit all supercomputers. The combination of NVIDIA's CUDA-accelerated computing and Arm's energy-efficient CPU architecture will give the HPC community a boost to exascale."

NVIDIA's SUPER Tease Rumored to Translate Into an Entire Lineup Shift Upwards for Turing

NVIDIA's SUPER teaser hasn't crystallized into something physical as of now, but we know it's coming - NVIDIA themselves saw to it that our (singularly) collective minds would be buzzing about what that teaser meant, looking to steal some thunder from AMD's E3 showing. Now, that teaser seems to be coalescing into something amongst the industry: an entire lineup upgrade for Turing products, with NVIDIA pulling their chips up one rung of the performance ladder across their entire lineup.

Apparently, NVIDIA will be looking to increase performance across the board, by shuffling their chips in a downward manner whilst keeping the current pricing structure. This means that NVIDIA's TU106 chip, which powered their RTX 2070 graphics card, will now be powering the RTX 2060 SUPER (with a reported core count of 2176 CUDA cores). The TU104 chip, which powers the current RTX 2080, will in the meantime be powering the SUPER version of the RTX 2070 (a reported 2560 CUDA cores are expected to be onboard), and the TU102 chip which powered their top-of-the-line RTX 2080 Ti will be brought down to the RTX 2080 SUPER (specs place this at 8 GB GDDR6 VRAM and 3072 CUDA cores). This paves the way for an even more powerful SKU in the RTX 2080 Ti SUPER, which should be launched at a later date. Salty waters say the RTX 2080 Ti SUPER will feature an unlocked chip which could be allowed to convert up to 300 W into graphics horsepower, so that's something to keep an eye - and a power meter - on, for sure. Less defined talks suggest that NVIDIA will be introducing an RTX 2070 Ti SUPER equivalent with a new chip as well.

Manli Introduces its GeForce GTX 1650 Graphics Card Lineup

Manli Technology Group Limited, a major manufacturer of graphics cards and other components, today announced an affordable new member of the 16 series family - the Manli GeForce GTX 1650. The Manli GeForce GTX 1650 is powered by the award-winning NVIDIA Turing architecture. It is equipped with 4 GB of GDDR5 memory, a 128-bit memory controller, and 896 CUDA cores, with the core frequency set at 1485 MHz and able to boost dynamically up to 1665 MHz. Moreover, the Manli GeForce GTX 1650 draws only 75 W and requires no external power connection.

NVIDIA Extends DirectX Raytracing (DXR) Support to Many GeForce GTX GPUs

NVIDIA today announced that it is extending DXR (DirectX Raytracing) support to several GeForce GTX graphics models beyond its GeForce RTX series. These include the GTX 1660 Ti, GTX 1660, GTX 1080 Ti, GTX 1080, GTX 1070 Ti, GTX 1070, and GTX 1060 6 GB. The GTX 1060 3 GB and lower "Pascal" models don't support DXR, nor do older generations of NVIDIA GPUs. NVIDIA has implemented real-time raytracing on GPUs without specialized components such as RT cores or tensor cores, by essentially implementing the rendering path through shaders, in this case, CUDA cores. DXR support will be added through a new GeForce graphics driver later today.

The GPU's CUDA cores now have to calculate BVH (bounding volume hierarchy) traversal, intersection, reflection, and refraction. The GTX 16-series chips have an edge over "Pascal" despite lacking RT cores, as the "Turing" CUDA cores support concurrent INT and FP execution, allowing more work to be done per clock. NVIDIA in a detailed presentation listed out the kinds of real-time ray-tracing effects available through the DXR API, namely reflections, shadows, advanced reflections and shadows, ambient occlusion, global illumination (unbaked), and combinations of these. The company put out detailed performance numbers for a selection of GTX 10-series and GTX 16-series GPUs, and compared them to RTX 20-series SKUs that have specialized hardware for DXR.
Update: Article updated with additional test data from NVIDIA.

Details on GeForce GTX 1660 Revealed Courtesy of MSI - 1408 CUDA Cores, GDDR5 Memory

Details on NVIDIA's upcoming mainstream GTX 1660 graphics card have been revealed, which will help put its graphics-crunching prowess up to scrutiny. The new graphics card from NVIDIA slots in below the recently released GTX 1660 Ti (which provides roughly 5% better performance than NVIDIA's previous GTX 1070 graphics card) and above the yet-to-be-released GTX 1650.

The 1408 CUDA cores in the design amount to a roughly 8% reduction in computing cores compared to the GTX 1660 Ti's 1536, but most of the savings (and performance impact) likely come from the 6 GB (8 Gbps) GDDR5 memory this card is outfitted with, in place of the 1660 Ti's GDDR6 implementation. The amount of GPU resources cut by NVIDIA is so low that we imagine these chips won't be coming from harvesting defective dies as much as from actually fusing off CUDA cores present in the TU116 chip. Using GDDR5 is still cheaper than the GDDR6 alternative (for now), and this also avoids straining the GDDR6 supply (if that was ever a concern for NVIDIA).

NVIDIA Adds New Options to Its MX200 Mobile Graphics Solutions - MX250 and MX230

NVIDIA has added new SKUs to its low-power mobility graphics lineup. The MX230 and MX250 come in to replace the GeForce MX130 and MX150, but... there's really not that much of a performance improvement to justify the increase in the series' tier. Both solutions are based on Pascal, so there are no Turing performance uplifts at the execution level.

NVIDIA hasn't disclosed any CUDA core counts or other specifics on these chips; we only know that they are paired with GDDR5 memory and feature Boost functionality for increased performance in particular scenarios. The strange thing is that NVIDIA's own performance scores compare their MX130, MX150, and now MX230 and MX250 to Intel's UHD 620 IGP part... and while the old MX150 was reported by NVIDIA as offering an up to 4x performance uplift compared to that Intel part, the new MX250 now claims only 3.5x the performance. Whether this is because of new testing methodology, or some other reason, only NVIDIA knows.

NVIDIA Readies GeForce GTX 1660 Ti Based on TU116, Sans RTX

It looks like RTX technology won't make it to sub-$250 market segments as the GPUs aren't fast enough to handle real-time raytracing, and it makes little economic sense for NVIDIA to add billions of additional transistors for RT cores. The company is hence carving out a sub-class of "Turing" GPUs under the TU11x ASIC series, which will power new GeForce GTX family SKUs, such as the GeForce GTX 1660 Ti, and other GTX 1000-series SKUs. These chips offer "Turing Shaders," which are basically CUDA cores that have the IPC and clock-speeds rivaling existing "Turing" GPUs, but no RTX capabilities. To sweeten the deal, NVIDIA will equip these cards with GDDR6 memory. These GPUs could still have tensor cores which are needed to accelerate DLSS, a feature highly relevant to this market segment.

The GeForce GTX 1660 Ti will no doubt be slower than the RTX 2060, and be based on a new ASIC codenamed TU116. According to a VideoCardz report, this 12 nm chip packs 1,536 CUDA cores based on the "Turing" architecture, and the same exact memory setup as the RTX 2060, with 6 GB of GDDR6 memory across a 192-bit wide memory interface. The lack of RT cores and a lower CUDA core count could make the TU116 a significantly smaller chip than the TU106, and something NVIDIA can afford to sell at sub-$300 price-points such as $250. The GTX 1060 6 GB is holding the fort for NVIDIA in this segment, besides other GTX 10-series SKUs such as the GTX 1070 occasionally dropping below the $300 mark at retailers' mercy. AMD recently improved its sub-$300 portfolio with the introduction of Radeon RX 590, which convincingly outperforms the GTX 1060 6 GB.

NVIDIA Introduces RAPIDS Open-Source GPU-Acceleration Platform

NVIDIA today announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.

RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU's importance in data analytics, an array of companies is supporting RAPIDS - from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle.

VUDA is a CUDA-Like Programming Interface for GPU Compute on Vulkan (Open-Source)

GitHub developer jgbit has started an open-source project called VUDA, which takes inspiration from NVIDIA's CUDA API to bring an easily accessible GPU compute interface to the open-source world. VUDA is implemented as a wrapper on top of the highly popular next-gen graphics API Vulkan, which provides low-level access to hardware. VUDA comes as a header-only C++ library, which means it's compatible with all platforms that have a C++ compiler and that support Vulkan.

While the project is still young, its potential is enormous, especially due to its open-source nature (it uses the MIT license). The page on GitHub comes with a (very basic) sample that could be a good start for using the library.

Intel is Adding Vulkan Support to Their OpenCV Library, First Signs of Discrete GPU?

Intel has submitted the first patches with Vulkan support to their open-source OpenCV library, which is designed to accelerate Computer Vision. The library is widely used for real-time applications as it comes with first-class optimizations for Intel processors and multi-core x86 in general. With Vulkan support, existing users can immediately move their neural network workloads to the GPU compute space without having to rewrite their code base.

At this point in time, the Vulkan backend supports Convolution, Concat, ReLU, LRN, PriorBox, Softmax, MaxPooling, AvePooling, and Permute. According to the source code changes, this is just "a beginning work for Vulkan in OpenCV DNN, more layer types will be supported and performance tuning is on the way."

It seems that now, with their own GPU development underway, Intel has found new love for the GPU-accelerated compute space. The choice of Vulkan is also interesting as the API is available on a wide range of platforms, which could mean that Intel is trying to turn Vulkan into a CUDA killer. Of course there's still a lot of work needed to achieve that goal, since NVIDIA has had almost a decade of head start.

NVIDIA "TU102" RT Core and Tensor Core Counts Revealed

The GeForce RTX 2080 Ti is indeed based on an ASIC codenamed "TU102." NVIDIA was referring to this 775 mm² chip when talking about the 18.5 billion-transistor count in its keynote. The company also provided a breakdown of its various "cores," and a block-diagram. The GPU is still laid out like its predecessors, but each of the 72 streaming multiprocessors (SMs) packs RT cores and Tensor cores in addition to CUDA cores.

The TU102 features six GPCs (graphics processing clusters), which each pack 12 SMs. Each SM packs 64 CUDA cores, 8 Tensor cores, and 1 RT core. Each GPC packs six geometry units. The GPU also packs 288 TMUs and 96 ROPs. The TU102 supports a 384-bit wide GDDR6 memory bus, supporting 14 Gbps memory. There are also two NVLink channels, which NVIDIA plans to later launch as its next-generation multi-GPU technology.
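The per-cluster figures above multiply out to full-chip totals. A minimal sketch of that arithmetic, using only the counts quoted in the post (the function and constant names are illustrative, not NVIDIA terminology):

```python
# Unit counts as quoted in the post above.
GPCS = 6            # graphics processing clusters
SMS_PER_GPC = 12    # streaming multiprocessors per GPC
CUDA_PER_SM = 64
TENSOR_PER_SM = 8
RT_PER_SM = 1


def tu102_totals():
    """Multiply the per-SM figures out to full-chip totals."""
    sms = GPCS * SMS_PER_GPC
    return {
        "sms": sms,
        "cuda_cores": sms * CUDA_PER_SM,
        "tensor_cores": sms * TENSOR_PER_SM,
        "rt_cores": sms * RT_PER_SM,
    }


if __name__ == "__main__":
    for name, count in tu102_totals().items():
        print(f"{name}: {count}")
```

The SM total matches the 72 SMs mentioned in the first paragraph, and the quoted 288 TMUs work out to four per SM.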

NVIDIA GeForce RTX 2000 Series Specifications Pieced Together

Later today (20th August), NVIDIA will formally unveil its GeForce RTX 2000 series consumer graphics cards. This marks a major change in the brand name, triggered by the introduction of the new RT Cores, specialized components that accelerate real-time ray-tracing, a task too taxing on conventional CUDA cores. Ray-tracing and DNN acceleration require SIMD components to crunch 4x4x4 matrix multiplication, which is what RT cores (and tensor cores) specialize in. The chips still have CUDA cores for everything else. This generation also debuts the new GDDR6 memory standard, although unlike GeForce "Pascal," the new GeForce "Turing" won't see a doubling in memory sizes.

NVIDIA is expected to debut the generation with the new GeForce RTX 2080 later today, with market availability by the end of the month. Going by older rumors, the company could launch the lower RTX 2070 and higher RTX 2080+ by late September, and the mid-range RTX 2060 series in October. Apparently the high-end RTX 2080 Ti could come out sooner than expected, given that VideoCardz already has some of its specifications in hand. Not a lot is known about how "Turing" compares with "Volta" in performance, but given that the TITAN V comes with tensor cores that can [in theory] be re-purposed as RT cores, it could continue on as NVIDIA's halo SKU for the client-segment.

NVIDIA Releases GeForce 388.71 WHQL Drivers

NVIDIA today released the latest version of their GeForce software suite. Version 388.71 is a game-ready one, which brings the best performance profile for the phenomenon that is PlayerUnknown's Battlegrounds. For professionals, there's added support for CUDA 9.1, and Warframe SLI profiles have been updated. There are also many 3D Vision profiles that have been updated for this release, so make sure to check them out after the break, alongside other bug fixes and known issues.

As always, users can download these drivers right here on TechPowerUp. Just follow the link below.
DOWNLOAD: NVIDIA GeForce 388.71 WHQL

NVIDIA Announces TITAN V "Volta" Graphics Card

NVIDIA, in a shock move, announced its new flagship graphics card, the TITAN V. This card implements the "Volta" GV100 graphics processor, the same one which drives the company's Tesla V100 HPC accelerator. The GV100 is a multi-chip module, with the GPU die and three HBM2 memory stacks sharing a package. The card features 12 GB of HBM2 memory across a 3072-bit wide memory interface. The GPU die has been built on the 12 nm FinFET+ process by TSMC. NVIDIA TITAN V maxes out the GV100 silicon, if not its memory interface, featuring a whopping 5,120 CUDA cores and 640 Tensor cores (specialized units that accelerate neural-net building/training). The CUDA cores are spread across 80 streaming multiprocessors (64 CUDA cores per SM), spread across 6 graphics processing clusters (GPCs). The TMU count is 320.

The GPU core is clocked at 1200 MHz, with a GPU Boost frequency of 1455 MHz, and an HBM2 memory clock of 850 MHz, translating into 652.8 GB/s memory bandwidth (1.70 Gbps stacks). The card draws power from a combination of 6-pin and 8-pin PCIe power connectors. Display outputs include three DP connectors and one HDMI connector. With a wallet-scorching price of USD $2,999, and available exclusively through the NVIDIA store, the TITAN V is evidence that with Intel deciding to sell client-segment processors for $2,000, it was a matter of time before GPU makers sought out that price-band. At $3k, the GV100's margins are probably more than made up for.
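The 652.8 GB/s figure follows from the 3072-bit interface and the 850 MHz memory clock. A minimal sketch of that arithmetic, assuming the usual double-data-rate convention for HBM2 (two transfers per clock per pin); the function names are illustrative:

```python
def hbm2_data_rate_gbps(memory_clock_mhz):
    """Per-pin data rate in Gbit/s, assuming two transfers per clock (DDR)."""
    return memory_clock_mhz * 2 / 1000


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    rate = hbm2_data_rate_gbps(850)            # TITAN V memory clock from the post
    bandwidth = memory_bandwidth_gbs(3072, rate)
    print(f"{rate:.2f} Gbps per pin, {bandwidth:.1f} GB/s")
```

The same formula applies to the GDDR5X and GDDR6 cards elsewhere on this page, given their bus width and per-pin data rate.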

NVIDIA Announces SaturnV AI Supercomputer Powered by "Volta"

NVIDIA at the Supercomputing 2017 conference announced a major upgrade of its new SaturnV AI supercomputer, which, when complete, the company claims will be not just one of the world's top-10 AI supercomputers in terms of raw compute power, but also the world's most energy-efficient. The SaturnV will be a cluster supercomputer with 660 NVIDIA DGX-1 nodes. Each such node packs eight NVIDIA GV100 GPUs, which takes the machine's total GPU count to a staggering 5,280 (that's GPUs, not CUDA cores). They add up to an FP16 performance that's scraping the ExaFLOP (1,000-petaFLOP or 10^18 FLOP/s) barrier, while its FP64 (double-precision) compute performance nears 40 petaFLOP/s (40,000 TFLOP/s).

SaturnV should beat Summit, a supercomputer being co-developed by NVIDIA and IBM, which in turn should unseat Sunway TaihuLight, which is currently the world's fastest supercomputer. This feat gains prominence as NVIDIA SaturnV and NVIDIA+IBM Summit are both machines built by the American private-sector, which are trying to beat a supercomputing leader backed by the mighty Chinese exchequer. The other claim to fame of SaturnV is its energy-efficiency. Before its upgrade, SaturnV achieved an energy-efficiency of a staggering 15.1 GFLOP/s per Watt, which was already the fourth "greenest." NVIDIA expects the upgraded SaturnV to take the number-one spot.

25+ Companies Developing Level 5 Robotaxis on NVIDIA CUDA GPUs

NVIDIA today unveiled the world's first artificial intelligence computer designed to drive fully autonomous robotaxis. The new system, codenamed Pegasus, extends the NVIDIA DRIVE PX AI computing platform to handle Level 5 driverless vehicles. NVIDIA DRIVE PX Pegasus delivers over 320 trillion operations per second -- more than 10x the performance of its predecessor, NVIDIA DRIVE PX 2.

NVIDIA DRIVE PX Pegasus will help make possible a new class of vehicles that can operate without a driver -- fully autonomous vehicles without steering wheels, pedals or mirrors, and interiors that feel like a living room or office. They will arrive on demand to safely whisk passengers to their destinations, bringing mobility to everyone, including the elderly and disabled.

NVIDIA Announces the Tesla V100 PCI-Express HPC Accelerator

NVIDIA formally announced the PCI-Express add-on card version of its flagship Tesla V100 HPC accelerator, based on its next-generation "Volta" GPU architecture. Based on the advanced 12 nm "GV100" silicon, the GPU is a multi-chip module with a silicon substrate and four HBM2 memory stacks. It features a total of 5,120 CUDA cores, 640 Tensor cores (specialized CUDA cores which accelerate neural-net building), GPU clock speeds of around 1370 MHz, and a 4096-bit wide HBM2 memory interface, with 900 GB/s memory bandwidth. The 815 mm² GPU has a gargantuan transistor-count of 21 billion. NVIDIA is taking institutional orders for the V100 PCIe, and the card will be available a little later this year. HPE will develop three HPC rigs with the cards pre-installed.

NVIDIA Announces Its Volta-based Tesla V100

Today at its GTC keynote, NVIDIA CEO Jensen Huang took the wraps off some of the features of their upcoming V100 accelerator, the Volta-based accelerator for the professional market that will likely pave the way to the company's next-generation 2000 series GeForce graphics cards. If NVIDIA goes on with its product carvings and naming scheme for the next-generation Volta architecture, we can expect to see this processor on the company's next-generation GTX 2080 Ti. Running the nitty-gritty details (like the new Tensor processing approach) on this piece would be impossible, but there are some things we know already from this presentation.

This chip is a beast of a processor: it packs 21 billion transistors (up from 15.3 billion found on the P100); it's built on TSMC's 12 nm FF process (evolving from Pascal's 16 nm FF); and measures a staggering 815 mm² (up from the P100's 610 mm²). This is such a considerable leap in die-area that we can only speculate on how yields will be for this monstrous chip, especially considering the novelty of the 12 nm process that it's going to leverage. But now, the most interesting details from a gaming perspective are the 5,120 CUDA cores powering the V100 out of a total possible 5,376 in the whole chip design, which NVIDIA will likely leave for their Titan Xv. These are divided into 84 Volta Streaming Multiprocessor Units, each carrying 64 CUDA cores (84 x 64 = 5,376, from which NVIDIA is cutting 4 Volta Streaming Multiprocessor Units, most likely for yields, which accounts for the announced 5,120). Even in this cut-down configuration, we're looking at a staggering 42% higher pure CUDA core-count than the P100's. The new V100 will offer up to 15 FP32 TFLOPS, and will still leverage a 16 GB HBM2 implementation delivering up to 900 GB/s bandwidth (up from the P100's 721 GB/s). No details on clock speed or TDP as of yet, but we already have enough details to enable a lengthy discussion... Wouldn't you agree?
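The core-count arithmetic in the paragraph above can be checked with a short sketch. The SM figures come from the post; the P100's 3,584 CUDA cores are not stated here and are an assumption brought in to reproduce the quoted uplift:

```python
CUDA_PER_SM = 64        # from the post
FULL_SMS = 84           # SMs in the full chip design, from the post
DISABLED_SMS = 4        # SMs cut on the V100, from the post
P100_CUDA_CORES = 3584  # assumption: not stated in the post


def cuda_cores(sm_count, per_sm=CUDA_PER_SM):
    """Total CUDA cores for a given number of enabled SMs."""
    return sm_count * per_sm


def uplift_percent(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    full = cuda_cores(FULL_SMS)
    enabled = cuda_cores(FULL_SMS - DISABLED_SMS)
    print(f"full design: {full}, V100 as announced: {enabled}")
    print(f"uplift over P100: {uplift_percent(enabled, P100_CUDA_CORES):.1f}%")
```

Under that assumption the uplift comes out just under 43%, in line with the roughly 42% quoted above.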

NVIDIA Announces the TITAN Xp - Faster Than GTX 1080 Ti

NVIDIA GeForce GTX 1080 Ti cannibalized the TITAN X Pascal, and the company needed something faster to sell at USD $1,200. Without making much noise about it, the company launched the new TITAN Xp, and with it, discontinued the TITAN X Pascal. The new TITAN Xp features all 3,840 CUDA cores physically present on the "GP102" silicon, all 240 TMUs, all 96 ROPs, and 12 GB of faster 11.4 Gbps GDDR5X memory over the chip's full 384-bit wide memory interface.

Compare these to the 3,584 CUDA cores, 224 TMUs, 96 ROPs, and 10 Gbps GDDR5X memory of the TITAN X Pascal, and 3,584 CUDA cores, 224 TMUs, 88 ROPs, and 11 GB of 11 Gbps GDDR5X memory across a 352-bit memory bus, of the GTX 1080 Ti. The GPU Boost frequency is 1582 MHz. Here's the catch - the new TITAN Xp will be sold exclusively through GeForce.com, which means it will be available in very select markets where NVIDIA's online store has a presence.

NVIDIA Unveils New Line of Quadro Pascal GPUs

NVIDIA today introduced a range of Quadro products, all based on its Pascal architecture, that transform desktop workstations into supercomputers with breakthrough capabilities for professional workflows across many industries. Workflows in design, engineering and other areas are evolving rapidly to meet the exponential growth in data size and complexity that comes with photorealism, virtual reality and deep learning technologies. To tap into these opportunities, the new NVIDIA Quadro Pascal-based lineup provides an enterprise-grade visual computing platform that streamlines design and simulation workflows with up to twice the performance of the previous generation, and ultra-fast memory.

"Professional workflows are now infused with artificial intelligence, virtual reality and photorealism, creating new challenges for our most demanding users," said Bob Pette, vice president of Professional Visualization at NVIDIA. "Our new Quadro lineup provides the graphics and compute performance required to address these challenges. And, by unifying compute and design, the Quadro GP100 transforms the average desktop workstation with the power of a supercomputer."

AMD Radeon Technology Will Be Available on Google Cloud Platform in 2017

At SC16, AMD announced that Radeon GPU technology will be available to Google Cloud Platform users worldwide. Starting in 2017, Google will use AMD's fastest available single-precision dual GPU compute accelerators, Radeon-based AMD FirePro S9300 x2 Server GPUs, to help accelerate Google Compute Engine and Google Cloud Machine Learning services. AMD FirePro S9300 x2 GPUs can handle highly parallel calculations, including complex medical and financial simulations, seismic and subsurface exploration, machine learning, video rendering and transcoding, and scientific analysis. Google Cloud Platform will make the AMD GPU resources available for all their users around the world.

"Graphics processors represent the best combination of performance and programmability for existing and emerging big data applications," said Raja Koduri, senior vice president and chief architect, Radeon Technologies Group, AMD. "The adoption of AMD GPU technology in Google Cloud Platform is a validation of the progress AMD has made in GPU hardware and our Radeon Open Compute Platform, which is the only fully open source hyperscale GPU compute platform in the world today. We expect that our momentum in GPU computing will continue to accelerate with future hardware and software releases and advances in the ecosystem of middleware and libraries."

NVIDIA Announces Xavier, Volta-based Autonomous Transportation SoC

At its inaugural European edition of the GPU Technology Conference (GTC), NVIDIA announced Xavier, an "AI supercomputer for the future of autonomous transportation." An evolution of its Drive PX2 board, which leverages a pair of "Maxwell" GPUs with some custom logic and an ARM CPU to provide cars with the compute power necessary to deep-learn the surroundings and self-drive or assist-drive, Xavier is a refinement over Drive PX2 in that it merges three chips - two GPUs and one control logic chip - into an SoC.

You'd think that NVIDIA refined its deep-learning tech enough to not need a pair of "Maxwell" SoCs, but Xavier is more than that. The 7 billion-transistor chip, built on a 16 nm FinFET process, offers more raw compute performance thanks to leveraging NVIDIA's next-generation "Volta" architecture, one more advanced than even its current "Pascal" architecture. The chip features a "Volta" GPU with 512 CUDA cores. The CVA makes up the vehicle I/O, while an image processor that's capable of 8K HDR video streams feeds the chip with visual inputs from various cameras around the vehicle. An 8-core ARM CPU performs general-purpose compute. NVIDIA hopes to get the first engineering samples of Xavier out to interested car-makers by Q4-2017.

NVIDIA Launches Maxed-out GP102 Based Quadro P6000

Late last week, NVIDIA announced the TITAN X Pascal, its fastest consumer graphics offering targeted at gamers and PC enthusiasts. The reign of TITAN X Pascal being the fastest single-GPU graphics card could be short-lived, as NVIDIA announced a Quadro product based on the same "GP102" silicon, which maxes out its on-die resources. The new Quadro P6000, announced at SIGGRAPH alongside the GP104-based Quadro P5000, features all 3,840 CUDA cores physically present on the chip.

Besides 3,840 CUDA cores, the P6000 features a maximum FP32 (single-precision floating point) performance of up to 12 TFLOP/s. The card also features 24 GB of GDDR5X memory, across the chip's 384-bit wide memory interface. The Quadro P5000, on the other hand, features 2,560 CUDA cores, up to 8.9 TFLOP/s FP32 performance, and 16 GB of GDDR5X memory across a 256-bit wide memory interface. It's interesting to note that neither card features full FP64 (double-precision) machinery, and that is cleverly relegated to NVIDIA's HPC product line, the Tesla P-series.
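The post gives core counts and peak FP32 throughput but no clock speeds. As a rough sketch, the clock each figure implies can be backed out, assuming the common convention that peak FP32 counts one fused multiply-add (2 FLOPs) per CUDA core per clock. That convention and the function name are assumptions, not figures from the announcement:

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock


def implied_clock_mhz(peak_tflops, cuda_cores):
    """Clock (MHz) at which cuda_cores would reach peak_tflops of FP32."""
    flops = peak_tflops * 1e12
    return flops / (FLOPS_PER_CORE_PER_CLOCK * cuda_cores) / 1e6


if __name__ == "__main__":
    # Core counts and TFLOP/s figures as quoted in the post.
    for name, tflops, cores in [("Quadro P6000", 12.0, 3840),
                                ("Quadro P5000", 8.9, 2560)]:
        print(f"{name}: ~{implied_clock_mhz(tflops, cores):.0f} MHz")
```

These are upper-bound estimates derived from rounded marketing figures, so actual boost clocks may differ.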