@Maxx Finally something I can work with.
Reserving best cells for pSLC doesn't seem to be an advantage. Controllers shuffle writes around to use the best cells anyway (like you noted), so the overall number of writes is still the same. That seems to be the only argument in favor of "static cache is better for endurance".
I also don't understand the "but also because it doesn't have to convert back to native flash" part. I would appreciate it if you could elaborate a bit.
Static pSLC does use the top layers, which due to HAR have better data retention characteristics. Samsung discusses this in a digest for their
6th generation V-NAND (92L, reference 3), citing
this article (see figure 6). For a source on the static SLC part,
see here (start reading at line 53, column 5, then refer to 330-1 in figure 3). Static pSLC is dedicated for the life of the device and never converts back to native flash in operation, so it does not have the additive wear associated with dynamic pSLC. Because of this, its wear zone (and garbage collection) is separate from that of native flash and dynamic pSLC (which share a zone), such that the lifetime of the flash is the worse of the two zones. Drives like Intel's 545s, which has static pSLC, count SLC writes separately from TLC writes for this reason. There are multiple patents related to this, like
this one. There are plenty of
other good patents with more detail on endurance with SLC vs. XLC as well (one lists 40K P/E for static SLC in this case). I do have articles suggesting the average P/E of a dynamic SLC block, again since it comes from the logical pool of native flash, is a bit lower, e.g. around 30K by comparison.
Fundamentally, all else being equal, static pSLC improves the endurance of the flash for these reasons. However (again all else being equal), as I stated above, this may need to be balanced against other factors as the device wears.
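To make the two-zone accounting concrete, here's a toy model of it: each zone wears independently, and the drive's endurance is the worse of the two. All numbers (zone sizes, P/E ratings, write rates) are made-up illustrative values, not from any datasheet:

```python
# Toy model of the two-zone wear accounting described above.
# Zone sizes, P/E ratings, and write rates are hypothetical
# illustrative numbers, not from any datasheet.

def drive_lifetime(zones):
    """zones maps name -> (capacity_gb, pe_rating, gb_written_per_day).
    Returns (days until the first zone wears out, name of that zone):
    the drive's lifetime is the worse of the independently-worn zones."""
    days = {name: cap * pe / rate
            for name, (cap, pe, rate) in zones.items()}
    worst_zone = min(days, key=days.get)
    return days[worst_zone], worst_zone

# Hypothetical drive: a small static pSLC zone absorbing heavy host
# write traffic, and a large native TLC zone taking the remainder.
zones = {
    "static_slc": (12, 30_000, 100),   # 12 GB cache, heavily written
    "native_tlc": (500, 1_500, 40),    # 500 GB native flash
}
limit_days, worst_zone = drive_lifetime(zones)
print(worst_zone, round(limit_days))  # static_slc 3600
```

With these made-up rates, the small static SLC zone is the limiting factor even with its much higher P/E rating, which is why a controller might want to reallocate over time.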
This patent illustrates how and why you might want to reallocate over time; see figure 3B. Specifically, also read [0018] and [0019], then [0021] and [0022].
If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E per cycle. To put it another way: if TLC has 10,000 P/E cycles, then static SLC mode has 25,000 P/E cycles, while dynamic SLC mode (counted as full TLC erases) has only 10,000. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as planar SLC flash did a decade ago).
Sorry I can't find the link to your discord in your profile. Can you share it here?
Here is the patent I'm referencing (hosted on my domain/site); start at line 4 in column 7. To see why they count it as a full TLC erase anyway, start at line 59 in column 6. If you check my sources above (in my quote-reply to bug), you'll see 40K P/E being referenced, which is a real possibility - most datasheets are at 30-40K in static/permanent SLC mode (as with QLC for Chia drives). I've posted these on my discord; specifically, Micron's datasheets list SLC- and TLC-mode endurance (B17A, for example, would be 30,000/1,500). It's also correct that, on the same node, native SLC will be 100K+. I've illustrated this by comparing Kioxia's 96L flash - they have digests for TLC, QLC, and SLC (XL Flash), which we also have access to.
I'm on Reddit under NewMaxx and also run a subreddit with the same name (/r/newmaxx) - which links to my server. Not sure on rules about posting these things here.
But if the cache is dynamic, each write is still of the "less harmful" kind.
More to the point, if the static SLC cell only has 1,000 P/E cycles left, that's it: it has 1,000 P/E cycles and there's nothing you can do about it. But if a dynamic cache cell is down to 1,000, you can switch to another cell with more P/E cycles available that was previously allotted to main storage.
Not all the cells will be written equally in absolute cycles; the controller picks the cells with the least
effective wear for dynamic SLC mode and cycles through them over time. Blocks and their properties (differences) are tracked in tables, for example with a bias for programming, because of variation. This variation implies that the dynamic/native zone will have lower average endurance/cycles than a dedicated (static) zone, because the average cell/block is in the middle of the deck. Worth noting from my source above: the lower cells do program faster, and this is also a characteristic of P/E cycling (i.e. worn-out flash performs worse with reads due to ECC and has worse data retention, but programs faster due to material breakdown). So we're talking writes per block in comparison, but it's a bit irrelevant when you consider my sources above.
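A minimal sketch of that selection step: the controller tracks per-block state in a table and picks the least-worn blocks from the native pool for dynamic SLC mode. The block table, wear-bias factor, and all numbers here are hypothetical, not from any controller's actual firmware:

```python
# Sketch of dynamic-SLC block selection by least effective wear.
# The block table, wear_bias factor, and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Block:
    block_id: int
    pe_cycles: int       # absolute P/E cycles consumed so far
    wear_bias: float     # per-block variation factor tracked in the table

    @property
    def effective_wear(self) -> float:
        # Weight raw cycle counts by per-block quality/variation.
        return self.pe_cycles * self.wear_bias

def pick_dynamic_slc_blocks(pool, n):
    """Choose the n least-worn blocks from the native pool for SLC mode."""
    return sorted(pool, key=lambda b: b.effective_wear)[:n]

pool = [Block(0, 900, 1.1), Block(1, 500, 1.0),
        Block(2, 700, 0.8), Block(3, 950, 0.9)]
chosen = pick_dynamic_slc_blocks(pool, 2)
print([b.block_id for b in chosen])  # [1, 2]
```

Rerunning the selection periodically as cycle counts grow is what cycles the dynamic SLC role through the pool over time.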