Wednesday, September 12th 2018

NVIDIA Announces Tesla T4 Tensor Core GPU

Fueling the growth of AI services worldwide, NVIDIA today launched an AI data center platform that delivers the industry's most advanced inference acceleration for voice, video, image and recommendation services. The NVIDIA TensorRT Hyperscale Inference Platform features NVIDIA Tesla T4 GPUs based on the company's breakthrough NVIDIA Turing architecture and a comprehensive set of new inference software.

Delivering the fastest performance with lower latency for end-to-end applications, the platform enables hyperscale data centers to offer new services, such as enhanced natural language interactions and direct answers to search queries rather than a list of possible results. "Our customers are racing toward a future where every product and service will be touched and improved by AI," said Ian Buck, vice president and general manager of Accelerated Business at NVIDIA. "The NVIDIA TensorRT Hyperscale Platform has been built to bring this to reality - faster and more efficiently than had been previously thought possible."
Every day, massive data centers process billions of voice queries, translations, images, videos, recommendations and social media interactions. Each of these applications requires a different type of neural network residing on the server where the processing takes place.

To optimize the data center for maximum throughput and server utilization, the NVIDIA TensorRT Hyperscale Platform includes both real-time inference software and Tesla T4 GPUs, which process queries up to 40x faster than CPUs alone.

NVIDIA estimates that the AI inference industry is poised to grow in the next five years into a $20 billion market.

Industry's Most Advanced AI Inference Platform
The NVIDIA TensorRT Hyperscale Platform includes a comprehensive set of hardware and software offerings optimized for powerful, highly efficient inference. Key elements include:

  • NVIDIA Tesla T4 GPU - Featuring 320 Turing Tensor Cores and 2,560 CUDA cores, this new GPU provides breakthrough performance with flexible, multi-precision capabilities, from FP32 to FP16 to INT8, as well as INT4. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 teraflops of peak performance for FP16, 130 tera-operations per second (TOPS) for INT8 and 260 TOPS for INT4.
  • NVIDIA TensorRT 5 - An inference optimizer and runtime engine, NVIDIA TensorRT 5 supports Turing Tensor Cores and expands the set of neural network optimizations for multi-precision workloads.
  • NVIDIA TensorRT inference server - This containerized microservice software enables applications to use AI models in data center production. Freely available from the NVIDIA GPU Cloud container registry, it maximizes data center throughput and GPU utilization, supports all popular AI models and frameworks, and integrates with Kubernetes and Docker.
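The T4 figures in the list above can be roughly reproduced with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not NVIDIA's published methodology. It rests on several assumptions that the announcement does not state: a 1,590 MHz boost clock, 64 FP16 fused multiply-adds (FMAs) per Tensor Core per clock, INT8 and INT4 running at 2x and 4x the FP16 rate, and one FP32 FMA per CUDA core per clock. The constant and function names are illustrative only.

```python
# Back-of-the-envelope peak-throughput estimate for the Tesla T4.
# ASSUMPTIONS (not stated in the announcement):
#   - boost clock of 1,590 MHz
#   - each Turing Tensor Core performs 64 FP16 FMAs per clock
#   - INT8 and INT4 run at 2x and 4x the FP16 per-clock rate
#   - each CUDA core performs one FP32 FMA per clock
# One FMA counts as two operations (a multiply plus an add).

TENSOR_CORES = 320          # from the announcement
CUDA_CORES = 2560           # from the announcement
BOARD_POWER_W = 75          # from the announcement
BOOST_CLOCK_GHZ = 1.59      # assumed, not from the announcement
FP16_FMA_PER_TC_PER_CLK = 64  # assumed Tensor Core rate
OPS_PER_FMA = 2


def peak_tera_ops(units: int, fma_per_unit_per_clk: int, clock_ghz: float) -> float:
    """Peak throughput in tera-operations per second for `units` identical cores."""
    giga_ops = units * fma_per_unit_per_clk * OPS_PER_FMA * clock_ghz
    return giga_ops / 1000.0


def t4_estimates() -> dict:
    """Estimated peak rates per precision, in TFLOPS (float) or TOPS (integer)."""
    fp16 = peak_tera_ops(TENSOR_CORES, FP16_FMA_PER_TC_PER_CLK, BOOST_CLOCK_GHZ)
    return {
        "FP32 (CUDA cores)": peak_tera_ops(CUDA_CORES, 1, BOOST_CLOCK_GHZ),
        "FP16 (Tensor Cores)": fp16,
        "INT8 (Tensor Cores)": fp16 * 2,
        "INT4 (Tensor Cores)": fp16 * 4,
    }


if __name__ == "__main__":
    for precision, rate in t4_estimates().items():
        per_watt = rate / BOARD_POWER_W
        print(f"{precision:>20}: {rate:6.1f} peak, {per_watt:5.2f} per watt")
```

Under these assumptions, the estimates come out at about 65, 130 and 260 for FP16, INT8 and INT4, in line with the quoted figures. The same arithmetic gives roughly 8.1 TFLOPS of FP32 throughput within the 75 W budget.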
Supported by Technology Leaders Worldwide
Support for NVIDIA's new inference platform comes from leading consumer and business technology companies around the world.

"We are working hard at Microsoft to deliver the most innovative AI-powered services to our customers," said Jordi Ribas, corporate vice president for Bing and AI Products at Microsoft. "Using NVIDIA GPUs in real-time inference workloads has improved Bing's advanced search offerings, enabling us to reduce object detection latency for images. We look forward to working with NVIDIA's next-generation inference hardware and software to expand the way people benefit from AI products and services."

Chris Kleban, product manager at Google Cloud, said: "AI is becoming increasingly pervasive, and inference is a critical capability customers need to successfully deploy their AI models, so we're excited to support NVIDIA's Turing Tesla T4 GPUs on Google Cloud Platform soon."

More information, including details on how to request early access to T4 GPUs on Google Cloud Platform, is available here.

Additional companies, including all major server manufacturers, voicing support for the NVIDIA TensorRT Hyperscale Platform include:

"Cisco's UCS portfolio delivers policy-driven, GPU-accelerated systems and solutions to power every phase of the AI lifecycle. With the NVIDIA Tesla T4 GPU based on the NVIDIA Turing architecture, Cisco customers will have access to the most efficient accelerator for AI inference workloads - gaining insights faster and accelerating time to action."
- Kaustubh Das, vice president of product management, Data Center Group, Cisco

"Dell EMC is focused on helping customers transform their IT while benefiting from advancements such as artificial intelligence. As the world's leading provider of server systems, Dell EMC continues to enhance the PowerEdge server portfolio to help our customers ultimately achieve their goals. Our close collaboration with NVIDIA and historical adoption of the latest GPU accelerators available from their Tesla portfolio play a vital role in helping our customers stay ahead of the curve in AI training and inference."
- Ravi Pendekanti, senior vice president of product management and marketing, Servers & Infrastructure Systems, Dell EMC

"Fujitsu plans to incorporate NVIDIA's Tesla T4 GPUs into our global Fujitsu Server PRIMERGY systems lineup. Leveraging this latest, high-efficiency GPU accelerator from NVIDIA, we will provide our customers around the world with servers highly optimized for their growing AI needs."
- Hideaki Maeda, vice president of the Products Division, Data Center Platform Business Unit, Fujitsu Ltd.

"At HPE, we are committed to driving intelligence at the edge for faster insight and improved experiences. With the NVIDIA Tesla T4 GPU, based on the NVIDIA Turing architecture, we are continuing to modernize and accelerate the data center to enable inference at the edge."
- Bill Mannel, vice president and general manager, HPC and AI Group, Hewlett Packard Enterprise

"IBM Cognitive Systems is able to deliver 4x faster deep learning training times as a result of a co-optimized hardware and software on a simplified AI platform with PowerAI, our deep learning training and inference software, and IBM Power Systems AC922 accelerated servers. We have a history of partnership and innovation with NVIDIA, and together we co-developed the industry's only CPU-to-GPU NVIDIA NVLink connection on IBM Power processors, and we are excited to explore the new NVIDIA T4 GPU accelerator to extend this state of the art leadership for inference workloads."
- Steve Sibley, vice president of Power Systems Offering Management, IBM

"We are excited to see NVIDIA bring GPU inference to Kubernetes with the NVIDIA TensorRT inference server, and look forward to integrating it with Kubeflow to provide users with a simple, portable and scalable way to deploy AI inference across diverse infrastructures."
- David Aronchick, co-founder and product manager of Kubeflow

"Open source cross-framework inference is vital to production deployments of machine learning models. We are excited to see how the NVIDIA TensorRT inference server, which brings a powerful solution for both GPU and CPU inference serving at scale, enables faster deployment of AI applications and improves infrastructure utilization."
- Kash Iftikhar, vice president of product development, Oracle Cloud Infrastructure

"Supermicro is innovating to address the rapidly emerging high-throughput inference market driven by technologies such as 5G, Smart Cities and IOT devices, which are generating huge amounts of data and require real-time decision making. We see the combination of NVIDIA TensorRT and the new Turing architecture-based T4 GPU accelerator as the ideal combination for these new, demanding and latency-sensitive workloads and plan to aggressively leverage them in our GPU system product line."
- Charles Liang, president and CEO, Supermicro

31 Comments on NVIDIA Announces Tesla T4 Tensor Core GPU

#1
First Strike
Real intriguing, seems to be a TU104 cutdown. Cut from 48 SM to 40 SM, 6 GPC to 5 GPC. What's the point?

First Strike said:
Real intriguing, seems to be a TU104 cutdown. Cut from 48 SM to 40 SM, 6 GPC to 5 GPC. What's the point?
Oh 75W TDP, super-binned I see.
@btarunr You seemed to forget to mention the memory side of things.:confused:
#2
Arjai
inference. Used 11 times, throughout this article..

"or assumed to be true"
"includes hypotheses"

BS, to me.
#3
cucker tarlson
Arjai said:
inference. Used 11 times, throughout this article..

"or assumed to be true"
"includes hypotheses"

BS, to me.
why ?
don't know much about AI acceleration,but isn't it all about creating the most accurate outcome based on statistical data ?
#4
First Strike
Arjai said:
inference. Used 11 times, throughout this article..
"or assumed to be true"
"includes hypotheses"

BS, to me.
Ya, you just trolled on a scientific jargon that is quite fundamental in neural network AI field.
A neural network AI = training model&algorithm + inferencing algorithm. Now you called the second part BS.
#5
notb
Arjai said:
inference. Used 11 times, throughout this article..

BS, to me.
"Inference" is a mathematical term. That's how we call the thing neural networks do. :-)
Hence, it appears a lot in a text about a product designed for running neural networks.

Open a text about a gaming GPU and check how many times words "game" and "gaming" appear. Is that also BS? :-)
#6
ZoneDymo
things are getting pretty tense
#7
techy1
ZoneDymo said:
things are getting pretty tense
*ba dum tss*
:roll:
#8
DeathtoGnomes
This is all fine and dandy until Dave opens the airlock.
#9
techy1
DeathtoGnomes said:
This is all fine and dandy until Dave opens the airlock.
I am sorry Dave I'm afraid I can't do that
#12
jabbadap
theoneandonlymrk said:
Could this be the pro incarnation of the 2060?
uhm, how about no? RTX 2070 has 2304 ccs this one has 2560.
#13
Vayra86
ZoneDymo said:
things are getting pretty tense
Real Tense. ;)

Arjai said:
inference. Used 11 times, throughout this article..

"or assumed to be true"
"includes hypotheses"

BS, to me.
This is pure gold. BS to you doesn't make it BS in the real world. But it's telling - also the person who gave you a big +3 on that post doesn't surprise me one bit...

The fact is, if RT and deep learning has a right to exist, its precisely in this segment of the market (and not gaming GPUs). You did notice this isn't a Geforce release, I hope?
#14
medi01
Something something, every day, massive, billions, even more expensive, something.
#15
theoneandonlymrk
jabbadap said:
uhm, how about no? RTX 2070 has 2304 ccs this one has 2560.
awe did a question cause you to be sarcastic umn No, you decided to answer like a tool.
#16
DeathtoGnomes
theoneandonlymrk said:
awe did a question cause you to be sarcastic umn No, you decided to answer like a tool.
come on keep it civil, no name calling, Mr. Tool. :nutkick:
#17
jabbadap
theoneandonlymrk said:
awe did a question cause you to be sarcastic umn No, you decided to answer like a tool.
Well let's try this way. No it's not. 1.) it's server part not a pro(quadro) part, 2.) it has more cuda cores than 2070.

Well one thing that it might have been sort of 2060, if full tu106 would have 2560cc(which i doubt because of 3 GPC:s with Turing SM structure is 2304cc, I doubt nvidia would change that). And there is rumors that RTX 2070 is tu106 not a cut down tu104, which would make a rtx 2070 as successor of gtx1060 not gtx1070.
#18
theoneandonlymrk
DeathtoGnomes said:
come on keep it civil, no name calling, Mr. Tool. :nutkick:
Hope you realise that's just you ,i said Like a tool.
@jabbadap that is better ty.
#19
Vayra86
medi01 said:
Something something, every day, massive, billions, even more expensive, something.
Fantastic contribution! Thanks man, where would we be without your wisdom.
#20
T4C Fantasy
CPU & GPU DB Maintainer
jabbadap said:
uhm, how about no? RTX 2070 has 2304 ccs this one has 2560.
Yea and 2070 is TU106 which maxes at 2304

Its a fact that 2070 is TU106, not a rumor xD
#21
Arjai
For the sake of toast. Will some of you take a deep breath, maybe engage brain, before instantly become some know it all prick?

BS, to me. Perhaps it is not, to you. Maybe, since everyone here has become sooooo sensitive, I should have phrased it, "No offense to you, or you, or you but, this all seems like a pile of malarkey, to me."

In fact, keep an eye out for my new phrase...:shadedshu:
#22
cucker tarlson
T4C Fantasy said:
Yea and 2070 is TU106 which maxes at 2304

Its a fact that 2070 is TU106, not a rumor xD
Weird,but seems true. Full 106 with 14gbps ddr6 chip will breathe down the neck of a cut 104 xx80 card this time. Unless 8 SM per GPC vs 12 on 106 will make a difference.

Arjai said:
For the sake of toast. Will some of you take a deep breath, maybe engage brain, before instantly become some know it all prick?

BS, to me. Perhaps it is not, to you. Maybe, since everyone here has become sooooo sensitive, I should have phrased it, "No offense to you, or you, or you but, this all seems like a pile of malarkey, to me."

In fact, keep an eye out for my new phrase...:shadedshu:
If you complain that everyone's so sensitive these days, why do you seem to be the only one triggered by anything that comes from nvidia or intel ?
#23
Fluffmeister
This card certainly packs a punch for something that fits in a 75 watt package.
#25
jabbadap
cucker tarlson said:
8.1TFlops FP32 at 75W,crazy.
Yeah, that would be one crazy low profile htpc card, if only nvidia would make Geforce variant of it...