
Tesla K20 GPU Compute Processor Specifications Released

btarunr

Editor & Senior Moderator
Specifications of NVIDIA's Tesla K20 GPU compute processor, which was launched back in May, are finally disclosed. We've known since then that the K20 is based on NVIDIA's large GK110 GPU, a chip that has yet to power a GeForce graphics card. Apparently, NVIDIA is leaving some of the silicon disabled, which lets it harvest chips better. According to a specifications sheet compiled by Heise.de, Tesla K20 will feature 13 SMX units, compared to the 15 physically present on the GK110 silicon.

With 13 streaming multiprocessor (SMX) units, the K20 will be configured with 2,496 CUDA cores (as opposed to the 2,880 physically present on the chip). The core will be clocked at 705 MHz, yielding single-precision floating-point performance of 3.52 TFLOP/s and double-precision floating-point performance of 1.17 TFLOP/s. The card packs 5 GB of GDDR5 memory with a memory bandwidth of 200 GB/s. Dynamic Parallelism, Hyper-Q, and GPUDirect with RDMA are part of the new feature set. The TDP of the GPU is rated at 225 W, and understandably, it uses a combination of 6-pin and 8-pin PCI-Express power connectors. Built on the 28 nm process, the GK110 packs a whopping 7.1 billion transistors.
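A quick sanity check on those throughput figures, assuming the usual Kepler rates of 2 single-precision FLOPs per core per cycle (FMA) and a 1/3 double-precision rate for GK110:

```python
# Theoretical peak throughput for Tesla K20, from the figures above.
cores = 13 * 192              # 13 SMX units x 192 CUDA cores each = 2496
clock_hz = 705e6              # 705 MHz core clock
sp_tflops = cores * clock_hz * 2 / 1e12   # 2 SP FLOPs/cycle/core (FMA)
dp_tflops = sp_tflops / 3                 # GK110 DP runs at 1/3 the SP rate
print(f"single precision: {sp_tflops:.2f} TFLOP/s")  # -> 3.52
print(f"double precision: {dp_tflops:.2f} TFLOP/s")  # -> 1.17
```

Both results line up with the numbers in the specifications sheet.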



 
So, buy 5870s. Got it :P
 
Seems like a repeat of GF100/110. Hardly surprising if the die is 500mm^2+

The first Fermi Teslas (M2050/M2070) out of the gate were basically GTX 470 spec. The M2090, released more recently, is pretty much a GTX 580.

Would be interesting to know whether these Teslas are the same SKUs that ORNL is taking delivery of, or whether they are higher spec, since Oak Ridge seemed to be the high-profile launch customer.
In other words, it can almost match Tahiti.
Any comparison probably depends on actual performance efficiency rather than hypothetical. Unless you know what K20 brings to the table, a theoretical comparison is largely useless.

BTW: The original site no longer features any specifications.
 
those cores.....my god.
 
An estimated 20 PFLOP/s peak!!! :eek: :twitch: And 3.52 TFLOP/s single precision, 1.17 TFLOP/s double precision.
Nice peak.
I wish for 20 PFLOP/s on a future GPU. :D
 
5GB of memory? That's not evenly divisible by the 384-bit memory bus it was rumored to have. Has it been reduced to 320-bit, which could produce an even 5GB?
 
5GB of memory? That's not evenly divisible by the 384-bit memory bus it was rumored to have. Has it been reduced to 320-bit, which could produce an even 5GB?

Mixing and matching. Just like 2 GB is made possible on 192-bit.
 
LOL. 7 billion transistors! I remember my old 3dfx Voodoo 3 had 7 million transistors and was the fastest when released. :))))
 
Mixing and matching. Just like 2 GB is made possible on 192-bit.

True, that is possible. But would it really be done on a high-end compute card where consistent and predictable performance is important? It would be a headache for developers to have to track which addresses they write and determine which data should go in the more or less interleaved parts of the memory space.
 
It's probably twenty 256MB chips on a 320-bit bus.
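The chip-count arithmetic behind that guess can be sketched out; this is speculation like the rest of the thread, not a confirmed board layout, and it assumes identical power-of-two GDDR5 chip densities with a 32-bit interface per chip (two per channel in clamshell mode):

```python
# Which bus widths divide exactly 5 GB into identical power-of-two GDDR5 chips?
total_mib = 5 * 1024
for bus_bits in (256, 320, 384):
    channels = bus_bits // 32              # one 32-bit channel per chip position
    for chips_per_channel in (1, 2):       # 2 = clamshell mode
        chips = channels * chips_per_channel
        if total_mib % chips == 0:
            cap_mib = total_mib // chips
            if cap_mib & (cap_mib - 1) == 0:   # chip size must be a power of two
                print(f"{bus_bits}-bit: {chips} x {cap_mib} MiB chips")
```

Only the 320-bit configurations come out even (ten 512 MiB or twenty 256 MiB chips), which is why a uniform 5 GB doesn't fit a 384-bit bus without mixed densities.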
 
True, that is possible. But would it really be done on a high-end compute card where consistent and predictable performance is important? It would be a headache for developers to have to track which addresses they write and determine which data should go in the more or less interleaved parts of the memory space.

Low level video memory management is handled by API>CUDA>driver. Apps are oblivious to that. Apps are only told that there's 5 GB of memory, and to deal with it.
 
That die shot definitely has 384bits worth of memory bus...
 
Any comparison probably depends on actual performance efficiency rather than hypothetical. Unless you know what K20 brings to the table, a theoretical comparison is largely useless.

In case you didn't know, Mark Harris himself points out that he works for Nvidia.

So you might want to check who runs the sites you're linking to if you want to link to unbiased information.

It would be like linking to sites/blogs run by AMD employees to make a point or further a viewpoint about an AMD product.

Just silly.
 
The report is a scientific paper published by the University of Aizu. It has nothing to do with Nvidia. Take your useless trolling elsewhere.

Talk about idiot fanboyism.

That site is run by Mark Harris, an Nvidia employee. Are you so naive as to think he's going to post unbiased research links on his site/blog?
Nvidia would find a way to fire him in a second if he posted links to research papers that put Nvidia in a bad light.

It only took me one mouse click to find out he was an Nvidia employee. Come on now. Who's trolling now?

At least show both sides, or attempt to, so you won't seem like an Nvidia cheerleader.

The performance of DGEMM in Fermi using this algorithm is shown in Figure 3, along with the DGEMM performance from CUBLAS 3.1. Note that the theoretical peak of the Fermi, in this case a C2050, is 515 GFlop/s in double precision (448 cores × 1.15 GHz × 1 instruction per cycle). The kernel described achieves up to 58% of that peak.

That's from a study by Oak Ridge National Laboratory along with the University of Tennessee and the University of Manchester in the UK.

58% is lower than 90% in DGEMM. Maybe Kepler GK100/110 makes a jump, who knows, but the chip on the GTX 280 was only 34% in DGEMM.

What do I know, though. I would think Oak Ridge National Laboratory does, since they use the darn things. ;)
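The comparison being argued here is about sustained rather than peak throughput. A sketch using only the figures quoted in this thread (515 GFLOP/s DP peak at 58% efficiency for the C2050, and a nominal 1 TFLOP/s DP peak at 90% for Tahiti; these are the thread's numbers, not independent benchmarks):

```python
# Sustained DGEMM throughput = theoretical DP peak x measured efficiency.
cards = {
    "Tesla C2050 (Fermi)":    (515.0,  0.58),   # (DP peak GFLOP/s, DGEMM eff.)
    "FirePro W9000 (Tahiti)": (1000.0, 0.90),
}
for name, (peak_gflops, efficiency) in cards.items():
    sustained = peak_gflops * efficiency
    print(f"{name}: ~{sustained:.0f} GFLOP/s sustained DGEMM")
```

On those figures, Tahiti's sustained number (~900 GFLOP/s) is roughly triple Fermi's (~299 GFLOP/s), which is the gap being argued over.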
 
Talk about idiot fanboyism.
Sure - I'll use your quotes (and mine since you obviously can't RTFP) as examples
That's from a study by Oak Ridge National Laboratory along with...
Yup. Which just goes to prove that real-world and theoretical numbers differ. Which is exactly what I noted. Likewise, I made no assumption based upon a part whose performance is unknown... or do you have access to Kepler information that everyone outside of Nvidia and the HPC projects doesn't?
Unless you know what K20 brings to the table, a theoretical comparison is largely useless.
So what is the DGEMM efficiency of Kepler ?
All I see here is a brief synopsis of Fermi
And of course, at no point did I make an AMD vs Nvidia comparison- quite the opposite in fact
Any comparison probably depends on actual performance efficiency rather than hypothetical

Get back under your bridge Xzibitroll - I'm sick of having to explain simple compound sentences to you.
 
Talk about idiot fanboyism.

That site is run by Mark Harris, an Nvidia employee. Are you so naive as to think he's going to post unbiased research links on his site/blog?
Nvidia would find a way to fire him in a second if he posted links to research papers that put Nvidia in a bad light.

It only took me one mouse click to find out he was an Nvidia employee. Come on now. Who's trolling now?

At least show both sides, or attempt to, so you won't seem like an Nvidia cheerleader.



That's from a study by Oak Ridge National Laboratory along with the University of Tennessee and the University of Manchester in the UK.

58% is lower than 90% in DGEMM. Maybe Kepler GK100/110 makes a jump, who knows, but the chip on the GTX 280 was only 34% in DGEMM.

What do I know, though. I would think Oak Ridge National Laboratory does, since they use the darn things. ;)

http://www.techpowerup.com/gpudb/923/NVIDIA_Tesla_C2050.html

The previous-gen NVIDIA architecture calculates floating point by shader clock, so the C2050 would be 1 TFLOP of single precision.
 
http://www.techpowerup.com/gpudb/923/NVIDIA_Tesla_C2050.html

The previous-gen NVIDIA architecture calculates floating point by shader clock, so the C2050 would be 1 TFLOP of single precision.

Those tests are done in double precision. For single precision it would be SGEMM.
The C2050 is 515 GFlop/s in double precision, so it only achieves 58% of what's advertised.

Kepler would have to make up a lot of ground in efficiency.

The point I was trying to make was...

Pointing to a 90% efficiency of Tahiti in DGEMM as if it's a bad thing, especially from the site/blog of an Nvidia employee.
As compared to what? Nvidia's Fermi at 58% efficiency in DGEMM? That Nvidia employee doesn't have a link to that on his site. Wonder why?
Even if Tahiti ran at 58%, it would still be twice as fast in DGEMM compared to Fermi.

Given the K20 is similar in spec to the W9000 and W8000, it would have to bring its efficiency up in such a comparison.
Maybe the K20 has better efficiency, but when someone says "hey, look, AMD can only do 90%" while failing to mention that Nvidia only does 58%, that's kind of cheerleading to me.

We need to see Kepler's DGEMM efficiency to see what percentage of its advertised specs it reaches.

:toast:

Update:
Nvidia's marketing slides put the DGEMM efficiency of K20 at 80% and Fermi at 60-65%. Since Oak Ridge National Laboratory put Fermi 2% shy of 60%, I would say the window would be 78-80% efficiency for the K20. So we are more than likely going to see a draw between the K20 and W9000 in DGEMM if the marketing slides' 80% efficiency is met.
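Back-of-envelope on that "draw" claim, using the article's 1.17 TFLOP/s DP peak for the K20 and the 78-80% efficiency window guessed above:

```python
# Projected sustained DGEMM for K20 if the slide-deck efficiency holds.
k20_peak_gflops = 1170            # 1.17 TFLOP/s double precision (article spec)
for efficiency in (0.78, 0.80):
    sustained = k20_peak_gflops * efficiency
    print(f"{efficiency:.0%}: ~{sustained:.0f} GFLOP/s sustained")
```

That lands at roughly 913-936 GFLOP/s, against the ~900 GFLOP/s a W9000 would sustain at 90% of a 1 TFLOP/s peak; close enough to call a draw.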
 
Update:
Nvidia's marketing slides put the DGEMM efficiency of K20 at 80% and Fermi at 60-65%.
As per usual the troll can't even parse a sentence without altering the content to suit its needs:
Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency
Nvidia whitepaper May 2012. (pdf)
Still, coming from someone who openly admits to lying, and up until recently didn't even know the difference between a 3D rendering card and a math co-processor, it's hardly surprising.
I lied i just wanted to

Keep up with the straw man AMD vs Nvidia bullshit and the hypothetical numbers game. I'll stand by my preference for real world testing*
Any comparison probably depends on actual performance efficiency rather than hypothetical. Unless you know what K20 brings to the table, a theoretical comparison is largely useless.

*By your reasoning, the AMD FirePro W9000 (3.99 TF SP, 1 TF DP) should be four times faster than a Quadro 6000 (1 TF SP, 515 GF DP)... after all, numbers don't lie, right?
No...
No...
No
 