Monday, March 16th 2020

Complete Hardware Specs Sheet of Xbox Series X Revealed

Microsoft just put out the complete hardware specs sheet of its next-generation Xbox Series X entertainment system. The list of hardware can go toe to toe with any modern gaming desktop, and even at its production scale, we're not sure if Microsoft can break even at around $500, possibly counting on game and DLC sales to recover some of the costs and turn a profit. To begin with the semi-custom SoC at the heart of the beast, Microsoft partnered with AMD to deploy its current-generation "Zen 2" x86-64 CPU cores. Microsoft confirmed that the SoC will be built on the 7 nm "enhanced" process (very likely TSMC N7P). Its die size is 360.45 mm².

The chip packs 8 "Zen 2" cores, with SMT enabling 16 logical processors, a humongous step up from the 8-core "Jaguar enhanced" CPU driving the Xbox One X. CPU clock speeds are somewhat vague. The specs sheet points to 3.80 GHz nominal and 3.66 GHz with SMT enabled. Perhaps the console can toggle SMT somehow (possibly depending on whether a game requests it). There's no word on the CPU's cache sizes.

The graphics processor is another key component of the SoC given its lofty design goal of being able to game at 4K UHD with real-time ray-tracing. This GPU is based on AMD's upcoming RDNA2 graphics architecture, which is a step up from "Navi" (RDNA), in featuring real-time ray-tracing hardware optimized for DXR 1.1 and support for variable-rate shading (VRS). The GPU features 52 compute units (3,328 stream processors, provided each CU has 64 stream processors in RDNA2). The GPU ticks at an engine clock speed of up to 1825 MHz, and has a peak compute throughput of 12 TFLOPS (not counting the CPU). The display engine supports resolutions of up to 8K, even though the console's own performance target is 4K at 60 frames per second, with up to 120 FPS supported. Variable refresh-rate is supported.

The memory subsystem is similar to what we reported earlier today - a 320-bit GDDR6 memory interface holding 16 GB of memory (mixed chip densities). It's becoming clear that Microsoft isn't implementing a hUMA common memory pool approach. 10 GB of the 16 GB runs at 560 GB/s bandwidth, while 6 GB of it runs at 336 GB/s. Storage is another area that's receiving big hardware uplifts: the Xbox Series X features a 1 TB NVMe SSD with 2400 MB/s peak sequential transfer rate, and an option for an additional 1 TB NVMe storage through an expansion module. External storage devices are supported, too, over 10 Gbps USB 3.2 gen 2. The console is confirmed to feature a Blu-ray drive that supports 4K UHD Blu-ray playback. All these hardware specs combine toward what Microsoft calls the "Xbox Velocity Architecture." Microsoft is also working toward improving the input latency of its game controllers.
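
The two bandwidth figures can be related to the bus width with some simple arithmetic. The sketch below assumes that the faster 10 GB pool is spread across the full 320-bit interface and that both pools run at the same per-pin data rate; neither assumption is confirmed by the specs sheet, and the helper names are illustrative only.

```python
# Relate the quoted bandwidth figures to the memory bus width.
BUS_WIDTH_BITS = 320          # from the specs sheet
FAST_POOL_GB_PER_S = 560      # 10 GB pool
SLOW_POOL_GB_PER_S = 336      # 6 GB pool


def per_pin_rate_gbps(bandwidth_gb_per_s, bus_width_bits):
    """Per-pin data rate (Gbps) implied by a bandwidth and a bus width."""
    return bandwidth_gb_per_s * 8 / bus_width_bits


def implied_bus_width_bits(bandwidth_gb_per_s, pin_rate_gbps):
    """Bus width (bits) implied by a bandwidth at a given per-pin rate."""
    return bandwidth_gb_per_s * 8 / pin_rate_gbps


# Assumption: the fast pool uses the whole 320-bit interface.
pin_rate = per_pin_rate_gbps(FAST_POOL_GB_PER_S, BUS_WIDTH_BITS)

# Assumption: the slow pool runs at the same per-pin rate.
slow_pool_width = implied_bus_width_bits(SLOW_POOL_GB_PER_S, pin_rate)

print(f"per-pin rate: {pin_rate:g} Gbps")
print(f"slow pool effective width: {slow_pool_width:g} bits")
```

If those assumptions hold, the slower pool behaves as if it sits on a narrower slice of the same interface, which would fit the "mixed chip densities" note.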

128 Comments on Complete Hardware Specs Sheet of Xbox Series X Revealed

You can't handle the truth when you censor a debate when you can't win.

Meanwhile, NVIDIA PR throws in RT cores' TFLOPS into marketing.

Expect AMD PR to weaponize RT cores TFLOPS when "Big Navi" arrives.

Why debate about FP32 general-purpose shader compute (not generalize like SSE) when future game titles have significant RT workloads?
Current shaders accelerate Z-buffer accelerated structures while RT cores accelerate BVH accelerated structures.
Lol, censoring the debate? It's not my fault you're not able to keep a civil tone in a discussion or keep yourself from personal attacks. That's your own responsibility, not mine. You need to calm down and stop projecting your own missteps onto me.

And again, as addressed in my previous post: Nvidia adopting a bad marketing practice does not in any way make it a good marketing practice. You apparently need to be spoon fed, so let's go through this point by point.

-TFLOPS in GPU performance metrics is generally accepted to mean FP32 TFLOPS, as that is the "baseline" industry-standard operation (single-precision compute) as opposed to higher or lower precisions (FP64, FP16, INT8, INT4, etc.).

-In GPUs these operations are performed by shader cores, which are fundamentally FP32 compute cores (though sometimes with various degrees of FP64 support either through dedicated hardware or the ability to combine two FP32 cores), which can also perform lower precision workloads either natively at the same speed or faster by combining several operations in one core.

-FP32 compute is a very broad category of general compute operations. Some of these operations can be done by various forms of specialized hardware, or can be done in lower precisions at higher speed (through methods like rapid packed math) without sacrificing the quality of the end result.

-Due to FP32 being a broad category a lot of FP32 operations can also be performed more efficiently by making specialized hardware for a subset of operations. This hardware, by virtue of being specialized for a specific subcategory of operations, is not capable of performing general FP32 compute operations.

-As the operations done on the specialized hardware can also be done on FP32 hardware, you can give an approximation of the equivalent FP32 performance necessary to match the performance of the specialized hardware. I.e. you can say things like "to match the performance of our RT cores you would need X number of FP32 FLOPS". These calculations are then dependent on - among other things - how efficient your implementation of said operation through general FP32 compute is. Two different solutions will very likely perform differently, and will thus result in different numbers for the same hardware.

-This is roughly equivalent to how fixed-function video encode/decode blocks can do this specialized subset of work faster and more efficiently than the same work performed on a CPU or GPU. That doesn't mean you can run your OS or games off a video encode/decode block, as this block is only capable of a small set of operations.

-These comparisons can't be expanded to other tasks, as the specialized hardware is not capable of general FP32 compute. FP32 hardware can do RT; RT hardware can't do FP32. I.e. you cannot say that "our RT cores are capable of X FP32 FLOPS" - because that statement is fundamentally untrue - your RT hardware is capable of zero FP32 FLOPS. That your F1 car (specialized hardware) can do some of the things your Civic (general hardware) can do - driving on a flat surface - and is "X times better" at that (i.e. faster around a track) does not mean that this can be transferred to the other things the general hardware can do - your F1 car has nowhere to put your groceries and would get stuck on the first speed bump you encountered, so it is fundamentally incapable of grocery shopping. It would also be fundamentally incapable of driving your friends around, or letting you listen to the radio while commuting. Just because specialized hardware can be compared to general hardware in the task the specialized hardware can do does not mean this comparison can be expanded into the other tasks that general hardware can do - because the specialized hardware is fundamentally incapable of doing these things.

-So, to sum up: AMD made a claim in marketing that, while technically true, needs to be understood in a very specific way to be true, and is very easy to misunderstand and thus misrepresent the capabilities of the hardware in question. The Xbox Series X is capable of 12.1 TFLOPS of FP32 compute. When performing combined rasterization and RT graphics workloads, it is capable of performing an amount of RT compute that would require 13 TFLOPS of FP32 compute to achieve if said workload was run on pure FP32 hardware (which it isn't, it's run on RT hardware). It is not, and will never be, capable of 25 TFLOPS of FP32 compute. Nvidia copying this does not in any way make it less problematic - I would say it makes it a lot more problematic, as there's no way of knowing if the two companies' ways of performing RT workloads on FP32 cores are equally performant, and unless they are, any comparisons are entirely invalid. Especially problematic is the fact that conversions like this make worse performance look better: if your RT-through-FP32 implementation is worse than the competition, you can claim that your RT hardware is equivalent to more FP32 hardware than theirs is. This tells us nothing of actual performance, only performance relative to something unknown and unknowable.
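
That last point can be shown with a toy calculation. Every number below is made up for illustration, and the "FLOPs per ray" cost model is a deliberate oversimplification of how a shader-based fallback would behave; nothing here reflects real AMD or Nvidia hardware.

```python
# Toy model: identical RT hardware, different shader-fallback efficiency.

def fp32_equivalent_tflops(rays_per_second, fp32_flops_per_ray):
    """FP32 TFLOPS a shader-only implementation would need to match
    the given ray throughput of dedicated RT hardware."""
    return rays_per_second * fp32_flops_per_ray / 1e12


# Hypothetical: both vendors' RT hardware traces the same number of rays.
RT_HARDWARE_RAYS_PER_SECOND = 10e9

# Hypothetical cost of tracing one ray in general FP32 shaders.
VENDOR_A_FLOPS_PER_RAY = 1000   # efficient shader implementation
VENDOR_B_FLOPS_PER_RAY = 2000   # less efficient shader implementation

vendor_a = fp32_equivalent_tflops(RT_HARDWARE_RAYS_PER_SECOND,
                                  VENDOR_A_FLOPS_PER_RAY)
vendor_b = fp32_equivalent_tflops(RT_HARDWARE_RAYS_PER_SECOND,
                                  VENDOR_B_FLOPS_PER_RAY)

print(f"Vendor A claims {vendor_a:g} 'RT TFLOPS'")
print(f"Vendor B claims {vendor_b:g} 'RT TFLOPS'")
```

In this toy model the RT hardware is identical, yet the vendor with the less efficient shader fallback gets to quote the bigger "equivalent" number.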

This just boils down to a very clear demonstration of how utterly useless FP32 FLOPS are as a metric of GPU performance. The translation from FP32 compute (TFLOPS) into gaming performance was already not 1:1, as it depends on drivers, hardware utilization, and architectural features. This now adds another stack of abstraction layers on top, meaning that any numbers made in this way are completely and utterly incomparable. Comparing FLOPS from pure shader hardware across AMD and Nvidia was already comparing apples and oranges, but now it's more like comparing apples and ... hedgehogs. Or something.

Btw, I would sincerely like to see you point out what of the above (or my previous posts on this) makes me an AMD fanboy. The ball's in your court on that one.
FP64 for life!

*Runs away*
I definitely prefer my games in FP64. I also like the CPU load for the games to run on the CPU's video encode/decode block only ;)