
Alleged Intel Sapphire Rapids Xeon Processor Image Leaks, Dual-Die Madness Showcased

Lot of negative energy ITT. We need an Intel resurgence to keep AMD pricing in check.

You need some negativity, or you can't see what's positive anymore either. Intel's on the shitlist now until they get serious again. Times change, eh?
 
Max 96 cores per die, most likely.
That, or stick to 64 cores and reduce the number of CCXs in those designs to cut down on latency and leave more headroom for higher clock speeds per core. It doesn't matter too much at this point; AMD is basically competing with itself, though they should keep their foot on the gas, because Intel will certainly find its footing eventually. I suppose they could also keep it at 64 cores, reduce the CCX count, and throw in some APU tech for good measure.
 
Well, wait for it...
It's also good that Intel is finally getting its 10nm CPUs out soon.

When that happens, and it's soon, we can realistically compare and see how 'good' AMD's 7nm Ryzen really is.

It's just useless to compare a 14nm CPU against a 7nm CPU... and sure, AMD wants it that way.

I think we'll see the final battle next year when 10nm Alder Lake arrives; it will show where the CPU world is heading.

But before that, there's Intel's March 2021 release, Rocket Lake, the answer to AMD's Vermeer CPUs; still 14nm, but the last of its kind.
It should beat AMD's Vermeer easily, and I mean in gaming performance.

But then, in June 2021, comes Alder Lake, built on 10nm tech and also a hybrid CPU,
so it will finally show whether Intel can step back up to the top of the CPU world.
 
But before that, there's Intel's March 2021 release, Rocket Lake, the answer to AMD's Vermeer CPUs; still 14nm, but the last of its kind.

It should beat AMD's Vermeer easily, and I mean in gaming performance.

But then, in June 2021, comes Alder Lake, built on 10nm tech and also a hybrid CPU,
so it will finally show whether Intel can step back up to the top of the CPU world.
I believe even the end of 2021 is a best-case scenario for Alder Lake. If Intel had these ready in volume, they would have skipped Rocket Lake, which now seems to be pushed all the way to March.

Still, I wonder how much it matters when Zen 3 is so hard to find while Comet Lake is in plentiful supply. It's really hard to tell how large the shipping quantities are when it's sold out everywhere. But I saw one shop estimating the next batch for April, which makes me wonder what's going on…
 
128 cores comes after Genoa (up to 96c), with the Zen 5 architecture (no product name for that architecture yet). Milan tops out at 64 cores, just like Rome. The cited article clearly states that the 128-core figure is a 2-socket (2S) configuration.
 
Max 96 cores per die, most likely.

You mean per package; that means either 12 cores × 8 dies or 8 cores × 12 dies. Either way, it's not happening until 5nm.
 
Intel has spent 3 years spreading FUD about AMD's glue.

The irony and shame here is fantastic.

If you can't beat them, join them...
 
One of the things I have not seen mentioned in this thread is the new flagship instruction set feature that Sapphire Rapids will introduce: AMX.
I see some discussion about AVX-512, but that's not the "new thing" with Sapphire Rapids; AMX is!
In short, AMX is the Advanced Matrix Extensions. It basically gives you 8KB of new register file space in the form of 8 separate, dynamically size-configurable, two-dimensional matrix registers (tmm0-tmm7), compared to AVX-512's 32 separate one-dimensional 64-byte vector registers (zmm0-zmm31) = 2KB. That's 4x the register file space!
Those of us who actually work with AVX-512 in programming can see speedups of 2x to 4x (I've seen even higher in special cases) compared to AVX2. It does require a good grasp of linear algebra, multivariate calculus and SIMD algorithms, because today's compilers have a really hard time automatically translating high-level programming languages into high-performing vectorized AVX-512 machine code.
To get the value out of AVX-512, the designer/programmer needs to vectorize the algorithms, not just write sequential programs as usual and hope the compiler somehow figures it out (it won't). This requires theoretical knowledge of math and computer science, but if you learn how to do it right the reward is great!
An 18-core consumer Skylake/Cascade Lake part like the 10980XE has 36 FMA units (2 per core, on port 0 and port 5), and if you know what you are doing you can execute 36 * 16 = 576 single-precision mult+add flops (or 32-bit integer ops) roughly every CPU cycle (throughput). That comes in very handy if you are comfortable vectorizing your algorithms.
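To make that concrete, here is a minimal sketch of what hand-vectorized code looks like with AVX-512 intrinsics in C; the function and array names are mine, and it assumes an AVX-512F-capable CPU, a length divisible by 16, and a compiler invocation like "gcc -O2 -mavx512f":

```c
#include <immintrin.h>

/* acc[i] += a[i] * b[i], 16 floats per FMA instruction. */
void fma_arrays(const float *a, const float *b, float *acc, int n)
{
    for (int i = 0; i < n; i += 16) {
        __m512 va   = _mm512_loadu_ps(a + i);    /* load 16 floats */
        __m512 vb   = _mm512_loadu_ps(b + i);
        __m512 vacc = _mm512_loadu_ps(acc + i);
        vacc = _mm512_fmadd_ps(va, vb, vacc);    /* fused multiply-add */
        _mm512_storeu_ps(acc + i, vacc);
    }
}
```

Each _mm512_fmadd_ps performs 16 single-precision multiply-adds, so keeping both FMA ports fed with instructions like this is how you approach the throughput numbers above; a naive scalar loop rarely compiles to this on its own.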

So moving to AMX will require even more knowledge to really capitalize and make full use of the extra register capacity and instruction set.

A few points about the claim that AVX-512 and AMX are "only" for AI and neural nets: it is true that NNs and AI benefit greatly from AVX-512, not only because they come naturally to vectorize and translate to the AVX-512 registers (zmm0-zmm31), but also because AVX-512 provides versatility, generality and richness in the instruction set you can use on the zmm## registers, and it integrates seamlessly with the rest of your x64 code.
However, that is far from the only application for AVX-512. Again, if you know what you are doing as an architect/designer/programmer and spend a bit of effort vectorizing the algorithms and the code, you can get similar benefits in the inner loops of other classes of problems and applications (image processing, sound processing, etc.); anything, really, that involves processing lots of data by applying floating-point or integer calculations in some sort of uniform pattern (loops). This is an advantage AVX-512-based general CPUs have over specialized GPUs (fewer but more general flops/iops vs. more but specialized flops/iops).

A lot of weight is put on the 10-20% IPC improvement on the same code (gen over gen, AMD or Intel, doesn't matter), which is nice, but if you know what you are doing and vectorize your algorithms, you can get 200-400% improvements with AVX-512 today, and maybe another ???% with AMX. We'll see... I have high hopes; AMX does look great on paper. See the Intel® Architecture Instruction Set Extensions Programming Reference.
 
Right, that's why AMD is winning high profile contracts left and right. Apparently it's a lot easier for companies to switch than you think.

Intel hasn't updated their Xeon platform with anything noteworthy for around 2 years now; that's an eternity in the server space, and customers just won't settle for an inferior, slower and more costly platform. These systems are custom-built with huge support teams dedicated to their maintenance; it's not like you plop in some racks of Intel hardware and they miraculously "just work" while the AMD ones don't. If that fabled Intel support and ecosystem were worth so much, they'd never switch, except they do, because it isn't.

Oh, and Intel's memory technology business side is so good that apparently they're looking to sell it. Hmm.

Sadly no; so far only the Big Tech companies with the resources to develop their own infrastructure are able to do so, mostly cloud computing businesses like AWS, Google and Microsoft, which all have in-house development teams.

Other big companies that aren't tech companies don't have that luxury. They have already paid an IT vendor to write custom software that runs on Xeon hardware; think McDonald's, Unilever, Coca-Cola, P&G, Nestlé, the banking industry and so on. Commissioning new software for a whole new architecture, and even a new hardware vendor, is not cheap. The applications aren't even centralized, so a transition to a new system risks downtime and lost sales, not to mention that they have to retest all of the smaller applications that interface with the ones running on the server.

These companies don't write their own software; they would rather pay the Intel tax, and pay for additional redundant security measures, than risk downtime during the transition, or unforeseen future downtime from new software on a new architecture disrupting business processes or costing sales. For a company at that scale, even one day of lost sales can equal tons of money, and they would rather not take the risk. Even the cost of changing software requirements and retesting every software module is significant, and for a multinational, each country has to do its own testing as well, since business requirements vary per country.

For now, the only way for AMD to completely take over the server market is for all of these big companies to move to the cloud, and that seems unlikely to happen soon; I don't see it happening in the next 20 years, especially where the banking industry is involved.
 
In the server and datacenter market, it's not only about raw horsepower but more about the whole ecosystem.

Intel has spent so much time nurturing the Xeon platform; that's their core business. Even if the performance isn't on top, the features their customers rely on can't simply be replaced by AMD.

So far I haven't seen anything similar to Optane for the datacenter environment from AMD; while Optane is kind of janky on the desktop, it is fully utilized on the datacenter platform.
Irrelevant when AMD's EPYC is beating anything Intel has to offer in the datacenter and server market. The upgrade cycle is a long one, and Intel has entrenched itself in that market through years of domination; I believe Intel owns over 90% of it. If AMD can capture even 10% or 15%, that is more than enough to keep AMD pumping out fast, efficient and highly innovative processors. A slight increase in share is a huge win for the company; that's how much Intel controls that market. But it's slowly turning in AMD's favour, and the industry now sees AMD as a compelling, proven alternative. Things are going to look quite interesting 2-5 years from now.

Intel has spent 3 years spreading FUD about AMD's glue.

The irony and shame here is fantastic.
AMD's Zen caught Intel with its knickers in a knot, and they haven't recovered; you need to ask yourself why they created a job position named "Marketing Strategist", i.e. damage control.
The last time I witnessed Intel doing weird things because it was behind in innovation and technology was back in the days of the legendary Athlon 64, when AMD took the price/performance crown.
 
It does require a good grasp of linear algebra, multivariate calculus and SIMD algorithms, because today's compilers have a really hard time automatically translating high-level programming languages into high-performing vectorized AVX-512 machine code.
To get the value out of AVX-512, the designer/programmer needs to vectorize the algorithms, not just write sequential programs as usual and hope the compiler somehow figures it out (it won't). This requires theoretical knowledge of math and computer science, but if you learn how to do it right the reward is great!
The good news is that many workloads are essentially vectorizable in nature, but unfortunately the coding practices taught in school these days tell people to ignore the real problem (the data) and instead focus on building a complex "architecture" that hides and scatters the data and state all over the place, making SIMD (or any decent performance, really) virtually impossible.

Vectorizing the data is always a good start, but it may not be enough. Whenever you see a loop iterating over a dense array, there is some theoretical potential there, but the solution may not be obvious. Often the solution is to restructure the data with similar data grouped together (the data-oriented approach, sketched below) rather than the typical "world modelling"; then the potential for SIMD often becomes obvious. But as we know, a compiler can never do this; the developer still has to do the groundwork.
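As an illustration of that restructuring, here is a hypothetical before/after in C; the types and field names are made up for the example:

```c
/* Typical "world modelling": each object's fields are interleaved
   in memory (array of structures), which defeats SIMD. */
struct particle_aos { float x, y, mass; };

/* Data-oriented layout: similar data grouped into dense arrays
   (structure of arrays), so loops over one field are contiguous. */
struct particles_soa {
    float *x, *y, *mass;
    int    count;
};

/* This loop now touches one dense array with a uniform operation;
   a compiler (or intrinsics) can map it straight onto SIMD. */
void apply_gravity(struct particles_soa *p, float g, float dt)
{
    for (int i = 0; i < p->count; i++)
        p->y[i] += g * dt;
}
```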

So moving to AMX will require even more knowledge to really capitalize and make full use of the extra register capacity and instruction set.
I haven't had time to look into the details of AMX yet, but my gut feeling tells me that any 2D data should have some potential here, like images, video, etc.
 
Do CPUs use math tricks to shortcut calculations? Or is this possible only for analog systems like the human brain?
 
Do CPUs use math tricks to shortcut calculations?

Kind of; it is possible, for example, for a CPU to execute an integer multiplication by shifting bits instead of doing the actual multiplication.
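This is the classic strength-reduction idea; a tiny self-contained illustration (the numbers are arbitrary):

```c
/* Multiplying by a power of two is the same as a left shift, which
   is cheaper in hardware. Compilers apply this routinely at -O1+. */
#include <assert.h>

int main(void)
{
    int x = 13;
    assert((x * 8) == (x << 3));  /* both are 104 */
    return 0;
}
```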
 
Well, wait for it...
It's also good that Intel is finally getting its 10nm CPUs out soon.

When that happens, and it's soon, we can realistically compare and see how 'good' AMD's 7nm Ryzen really is.

It's just useless to compare a 14nm CPU against a 7nm CPU... and sure, AMD wants it that way.

I think we'll see the final battle next year when 10nm Alder Lake arrives; it will show where the CPU world is heading.

But before that, there's Intel's March 2021 release, Rocket Lake, the answer to AMD's Vermeer CPUs; still 14nm, but the last of its kind.
It should beat AMD's Vermeer easily, and I mean in gaming performance.

But then, in June 2021, comes Alder Lake, built on 10nm tech and also a hybrid CPU,
so it will finally show whether Intel can step back up to the top of the CPU world.
With recent reports claiming 50%+ yields, and given the size of one side/half of an Intel die here, I'm not surprised there are shortages. Hopefully they get their leadership sorted soon and regain some direction.
 
Do CPUs use math tricks to shortcut calculations? Or is this possible only for analog systems like the human brain?
CPUs can do some minor optimization in real time, but this is limited to things that are guaranteed to produce identical results, probably specific patterns of instructions that have an "optimal" alternative. CPUs can also unroll some loops and eliminate some redundant instructions in real time.

But compilers are able to do much more, as they have more time and more context. I don't know how many tricks like these compilers actually apply, though, since many such integer tricks rest on assumptions the compiler may not be able to verify. Nevertheless, many of these operations have dedicated instructions that are branchless and faster anyway.
Still, compilers do things like turning "set an integer variable to 0" into an xor of the register with itself, because it's faster.
GCC and other compilers also offer flags like -ffast-math, which apply a lot of tricks to floating-point code that may sacrifice some precision or IEEE 754 compliance in order to gain extra performance.
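As a hypothetical example of the -ffast-math trade-off: IEEE 754 float addition is not associative, so with strict semantics the compiler must add the elements in source order, and a reduction like this stays scalar; with "gcc -O3 -ffast-math" it is allowed to reassociate the sum and vectorize the loop, at the cost of slightly different rounding:

```c
/* Strict IEEE 754: additions happen in order, one at a time.
   With -ffast-math the compiler may split this into several
   partial sums and use SIMD, changing the rounding slightly. */
float sum(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```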
 
CPUs can do some minor optimization in real time, but this is limited to things that are guaranteed to produce identical results, probably specific patterns of instructions that have an "optimal" alternative. CPUs can also unroll some loops and eliminate some redundant instructions in real time.

But compilers are able to do much more, as they have more time and more context. I don't know how many tricks like these compilers actually apply, though, since many such integer tricks rest on assumptions the compiler may not be able to verify. Nevertheless, many of these operations have dedicated instructions that are branchless and faster anyway.
Still, compilers do things like turning "set an integer variable to 0" into an xor of the register with itself, because it's faster.
GCC and other compilers also offer flags like -ffast-math, which apply a lot of tricks to floating-point code that may sacrifice some precision or IEEE 754 compliance in order to gain extra performance.
Most of these transformations are ordinary algebraic operations, not tricks like the ones I linked, which take real shortcuts.

I read more about math tricks. There are hundreds, maybe thousands of them. LoL! Maybe many of them would be useful to implement inside a CPU architecture. Maybe CPU architects and programmers need help with this from highly educated mathematicians with lots of experience?
 
I read more about math tricks. There are hundreds, maybe thousands of them. LoL! Maybe many of them would be useful to implement inside a CPU architecture. Maybe CPU architects and programmers need help with this from highly educated mathematicians with lots of experience?
While no one is perfect, I do believe Intel has several thousand engineers working on chip design, and math usually accounts for about a third of an MS degree, so I don't think that's the problem.

I do wonder what kind of tricks you are talking about, though. Like replacing a single multiplication with a shift operation, or more like an approximation such as the fast inverse square root?
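For reference, the latter is the well-known Quake III trick, shown here in a union-based form to stay within defined behaviour:

```c
/* "Fast inverse square root": approximates 1/sqrt(x) with a
   bit-pattern guess, refined by one Newton-Raphson iteration. */
#include <stdint.h>

float q_rsqrt(float x)
{
    union { float f; uint32_t i; } v = { .f = x };
    v.i = 0x5f3759df - (v.i >> 1);          /* initial bit-level guess */
    v.f *= 1.5f - (0.5f * x * v.f * v.f);   /* one Newton-Raphson step */
    return v.f;
}
```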
There are many transformations that are algorithmically correct but may introduce errors when implemented in fixed precision. The CPU can't do any such things in real time, and never will, but the compiler can, if you ask it to.

Also keep in mind that a CPU decodes 5-6 instructions per clock cycle, so anything optimized in real time needs to be very simple, hardcoded logic. That's not to say someone can't come up with a smarter way of implementing things; it's not that many years ago that someone came up with a way to do multiplication with fewer transistors.
 
Also keep in mind that a CPU decodes 5-6 instructions per clock cycle, so anything optimized in real time needs to be very simple, hardcoded logic.
Yes, this is the problem: the implementation is limited to binary (0/1). Someone would have to make a different architecture that is more complex at the lowest level, one not limited to the binary number system but able to work with all the major number systems directly in hardware, not in the software layers of the operating system and applications. But that seems to be an impossible task for engineers?
 