
Effect of SLC Caching on SSD Endurance

Joined
Dec 17, 2011
Messages
360 (0.07/day)

The Kingston KC3000 has 2000 GB of TLC NAND. It can use almost all of it (1930 GB) in SLC mode, which gives 1930/3 ≈ 643 GB of SLC cache.

I keep wondering though. Isn't this writing to the NAND twice? Say I write 100 GB. First I consume 300 GB worth of NAND when writing in SLC mode. Then I consume 100 GB worth of NAND in TLC mode. Of course, writing in SLC isn't nearly as harmful, but it is somewhat harmful, isn't it? We don't even get to choose whether we are willing to give up this SLC caching for better endurance.

@Chris_Ramseyer Can you offer some insights into how harmful (or harmless) SLC caching is to NAND endurance?

@W1zzard Can you offer some insights to this?
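
To put numbers on the question in the opening post, here is a rough sketch of its arithmetic. It assumes a TLC cell holds 3 bits and an SLC-mode cell holds 1, so data written in SLC mode occupies three times as many cells. The function names are made up for illustration, and the sketch says nothing about how much wear each kind of write actually causes.

```python
BITS_PER_TLC_CELL = 3  # assumption: TLC stores 3 bits per cell, SLC mode stores 1


def slc_cache_capacity_gb(tlc_capacity_gb):
    """Capacity available when a TLC region is run in SLC mode (1 bit per cell)."""
    return tlc_capacity_gb / BITS_PER_TLC_CELL


def nand_touched_gb(host_write_gb):
    """TLC-equivalent NAND capacity touched by one cached host write.

    Returns (slc_stage, tlc_stage): the write first lands in SLC mode,
    occupying 3x the cells, and is later rewritten in TLC mode.
    """
    slc_stage = host_write_gb * BITS_PER_TLC_CELL
    tlc_stage = host_write_gb
    return slc_stage, tlc_stage


print(round(slc_cache_capacity_gb(1930)))  # 643
print(nand_touched_gb(100))                # (300, 100)
```

Whether those 300 GB worth of SLC-mode cells age as much as 300 GB of TLC writes is exactly what the rest of the thread argues about.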
 
You mean apart from some/many of the SSD reviews?

Sustained Write Performance SLC Cache
You do realise that if I write 100 GB to the 980 Pro, wait for 15 minutes, the drive moves the data from the SLC cache to the TLC so that the SLC cache is ready to be used again right?

It's not 100% copy of PrimoCache because SLC caches come in lots of different sizes & are implemented differently. But the net result is mostly the same.
I want to know. What do you think PrimoCache does?
 
There's a simple way to test this: use CDM or any other benchmark to write totally random data to a small file and loop it 5-10 times so that it fits well within the SLC cache size. Check the write (throughput) speeds in the benchmark application and check the actual data written through something like Process Hacker, HD Sentinel or any other utility in real time. Now admittedly, like I said, it's not a 100% copy of PrimoCache, so all that data will not be "trimmed" like it would be with PrimoCache, but the writes should be much lower than requested by the (benchmark) app. It's important to use a utility which measures data written in real time.

What do you think PrimoCache does?
What do you mean? Do you have an idea of what it does/doesn't do from that screenshot I posted?
 
Maybe yes, but in terms of SSD/NAND endurance, no.
And once again: if we're dealing with the same number of writes, what makes static cache better for endurance? Are you referring to the fact that, when dealing with static cache, the user-facing storage is only written to once?
 
And once again: if we're dealing with the same number of writes, what makes static cache better for endurance? Are you referring to the fact that, when dealing with static cache, the user-facing storage is only written to once?
Did you read the fact that SLC static is in OP (so a non-accessible user space)? Theoretically, how can you do writes on SLC if it isn't accessible to the user?
Also, keep in mind that some pSLC SSDs exist, which means that the entire SSD is in SLC cache; look at the LX3030 or the P200 for example (you can read that SLC static improves endurance).
 
What do you mean? Do you have an idea of what it does/doesn't do from that screenshot I posted?
PrimoCache and TLC SSD's SLC cache have different objectives and thus different approaches.

The history behind SLC caching is that when Samsung released their first TLC SSD, the 840, back in 2012, they found that compared to its MLC counterpart (840 Pro) it had comparable read speeds but much slower write speeds. So to compensate for the slow write speed of native TLC NAND, they introduced SLC caching with the 840 Evo in 2013. The objective of SLC caching was to increase write speeds for TLC NAND SSDs. This is why the SLC cache needs to be emptied: so that it is ready to be filled again the next time you write lots of data.

PrimoCache was created to increase read/write speeds for data that is accessed more frequently while not affecting the read/write speeds for data that is accessed infrequently. For example, if you have a movie stored on your drive, you probably don't access it frequently, so it stays in the slower storage, while something like frequently accessed Windows system files stays in high-speed storage. This is why PrimoCache uses Deferred Caching.

Did you read the fact that SLC static is in OP (so a non-accessible user space)? Theoretically, how can you do writes on SLC if it isn't accessible to the user?
Also, keep in mind that some pSLC SSDs exist, which means that the entire SSD is in SLC cache; look at the LX3030 or the P200 for example (you can read that SLC static improves endurance).
How should I explain this... What Bug and I are saying is: think of the SSD's entire NAND pool (OP or not). Irrespective of whether you use static or dynamic SLC cache, whenever I make a write to the SSD, I am writing to the NAND pool twice: first to the SLC portion of the entire NAND pool, then to the TLC portion. We are concerned about how the endurance of the entire NAND pool is affected because we are writing to it twice. Hope that makes sense.
 
Did you read the fact that SLC static is in OP (so a non-accessible user space)? Theoretically, how can you do writes on SLC if it isn't accessible to the user?
Also, keep in mind that some pSLC SSDs exist, which means that the entire SSD is in SLC cache; look at the LX3030 or the P200 for example (you can read that SLC static improves endurance).
Technically, you are not writing to the cache, that's something the SSD does on its own.

Long story short, can you explain, step by step, what in the static nature of a cache affects a drive's endurance? Pretend I know nothing about SSDs.
 
"This site is blocked due to a security threat." :(
Will try a computer that's not managed by my employer later.
Now I am on iPhone, I can’t download it and post it.
 
Now I am on iPhone, I can’t download it and post it.
No worries, I'll get to it later today. Linux isn't scared as easily as Cisco Umbrella :D

Thanks @blanarahul , I doubt I'll go through all that. I was looking for a simple explanation, I doubt I need a 19 page document for that.
 
Thanks @blanarahul , I doubt I'll go through all that. I was looking for a simple explanation, I doubt I need a 19 page document for that.
I'll post some excerpts.
[0017] One downside to the use of SLC cache is that it increases the amount of times data is written to the physical memory because data is written twice: once to the SLC cache, and then later to MLC storage. Instances in which same data is written multiple times to flash is called Write Amplification (WA). WA can be defined as the actual amount of information physically written to the storage media in comparison to the logical amount intended to be written over the life of that data as it moves throughout the memory device. In addition to the use of SLC cache, an amount of WA is also affected by other necessary tasks on the NAND such as garbage collection. The larger the SLC cache, the more likely a write request is to be serviced by SLC cache. Consequentially, the larger the SLC cache the greater the likelihood of an increase in write amplification.
You and I aren't wrong in worrying about the effect of SLC caching on endurance.
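
The definition of WA in [0017] can be turned into a small calculation. This is only a sketch: it ignores garbage collection, and `folded_fraction` is a hypothetical parameter (not from the patent) for the share of cached data that actually gets rewritten to native flash later.

```python
def write_amplification(physical_gb, logical_gb):
    """Physical data written to NAND divided by the logical data the host wrote."""
    if logical_gb <= 0:
        raise ValueError("logical_gb must be positive")
    return physical_gb / logical_gb


def physical_writes_gb(host_gb, folded_fraction=1.0):
    """NAND writes for a host write that lands in the SLC cache first.

    folded_fraction is a hypothetical knob: 1.0 means all cached data is later
    rewritten to native flash, 0.0 means none of it is (for example, data that
    is deleted or overwritten while still in the cache).
    """
    return host_gb + host_gb * folded_fraction


host = 100
print(write_amplification(physical_writes_gb(host), host))       # 2.0
print(write_amplification(physical_writes_gb(host, 0.0), host))  # 1.0
```

Note that WA only counts bytes. It treats an SLC-mode write and a TLC write as equal, so on its own it does not say how much wear each of the two writes causes.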

[0018] There are two types of SLC cache: static SLC cache in which blocks can only be used for in SLC mode; and dynamic SLC cache in which blocks can be used in SLC mode or TLC mode. Most current mobile storage devices use dynamic SLC cache. The maximum program/erase cycle (PEC) of the dynamic blocks is same as a TLC block regardless of whether the block is being used for SLC or TLC mode. Thus, for dynamic SLC cache, the terabytes written (TBW) of a dynamic SLC block is limited to the TBW of a TLC block.
This is extremely concerning and does somewhat answer the question I had. A drive like the Kingston KC3000 with dynamic SLC caching is trading endurance for speed.

[0019] Currently, the static SLC cache size is fixed and the dynamic SLC cache size is dynamic. The present subject matter makes the static SLC cache size dynamically based on maximum logical saturation (LS) in a device lifetime, in various embodiments. For static SLC cache, the maximum PEC is 20-40 times of dynamic SLC cache, which means that static SLC cache may have 20-40 times data written in the same time period compared to the same size dynamic SLC cache.
So dynamic SLC caching is bad for endurance.

[0050] FIG. 3B illustrates an example table for providing dynamic size of SLC static cache. In various embodiments, the device monitors the highest LS and changes the static SLC cache size based on the monitored highest LS. Thus, a memory device residing in different devices may have different static SLC cache sizes. In the depicted example, if the LS is A%, the SLC cache size is determined using the equation: (100% - A%) / 3. In addition, assuming a current OP for the 100% LS is 7%, the OP static SLC cache is determined using the equation: ((100% - A%) / 3 + 7%) / A% = (121 - A) / (3A). As shown in the table of FIG. 3B, the largest number of blocks of GC to free one block is not increased. Thus, a device using the memory controller of the present subject matter can get the increased TBW benefit from the static SLC cache without increasing the worst-case GC to free additional storage.
I don't understand how they reached the conclusion they did regarding the TBW benefit from static SLC cache. There is no mention of TBW/endurance anywhere else in the paper.
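
For anyone who wants to check the two equations in [0050], here is a minimal sketch. The function names are mine, not the patent's; percentages are passed as plain numbers (A = 50 means 50% LS), and the 7% base OP is the value the excerpt assumes.

```python
def static_slc_cache_pct(ls_pct):
    """Static SLC cache size, in percent of capacity, for logical saturation A%."""
    return (100 - ls_pct) / 3


def op_ratio(ls_pct, base_op_pct=7):
    """Left-hand side of the excerpt's equation: ((100 - A) / 3 + 7) / A."""
    return (static_slc_cache_pct(ls_pct) + base_op_pct) / ls_pct


def op_ratio_simplified(ls_pct):
    """Right-hand side of the excerpt's equation: (121 - A) / (3A)."""
    return (121 - ls_pct) / (3 * ls_pct)


# The two forms should agree for any logical saturation between 1% and 100%.
for a in (25, 50, 75, 100):
    assert abs(op_ratio(a) - op_ratio_simplified(a)) < 1e-12
    print(a, round(op_ratio(a), 4))
```

At A = 100 the cache term drops to zero and the ratio comes out at 0.07, which matches the 7% OP the excerpt assumes for a completely full drive.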
 
You and I aren't wrong in asking about the effect on endurance of SLC caching. I'll post more as I read more.


I don't understand how they reached the conclusion they did regarding the TBW benefit from static SLC cache. There is no mention of TBW/endurance anywhere else in the paper.


This is concerning and does somewhat answer the question I had.
Ah, you went for the red herring :(

If you understand something well enough, you can explain it in plain language to somebody who doesn't understand the first thing about the subject. If you can't, you'll do exactly what Black did.

PS Of course writing in SLC mode will still eat one P/E cycle. You're still physically writing to a 3-bit cell. SLC mode means you're only setting the cell at the max or min voltage level, which means the voltage doesn't need to be as strict, as you don't need to discern between 8 levels anymore. But the wear is still there; the cell continues to lose charge capacity.
 
Ah, you went for the red herring :(
I am afraid I don't understand.
Of course writing in SLC mode will still eat one P/E cycle. You're still physically writing to a 3-bit cell. SLC mode means you're only setting the cell at the max or min voltage level, which means the voltage doesn't need to be as strict, as you don't need to discern between 8 levels anymore. But the wear is still there; the cell continues to lose charge capacity.
It is because the voltage doesn't need to be strict for SLC mode that I was expecting it to consume less than one P/E cycle. If a write consumes one P/E cycle either way, TLC or SLC, then we are throwing half the endurance away for speed that most people will rarely use.
 
I am afraid I don't understand.
He was unable to explain plainly how he got to the conclusion, he just dumped a (seemingly useless) document on us instead. And you went for it ;)
 
He was unable to explain plainly how he got to the conclusion, he just dumped a (seemingly useless) document on us instead. And you went for it ;)
I mean citations do have uses. But so do explanations, yeah.
 
Did you read the fact that SLC static is in OP (so a non-accessible user space)? Theoretically, how can you do writes on SLC if it isn't accessible to the user?
Also, keep in mind that some pSLC SSDs exist, which means that the entire SSD is in SLC cache; look at the LX3030 or the P200 for example (you can read that SLC static improves endurance).
The drive can still use the OP space; the OS can't.
 
He was unable to explain plainly how he got to the conclusion, he just dumped a (seemingly useless) document on us instead. And you went for it ;)
Oh, I understand, I'm sorry that I wasn't able to explain this to "the guy who is right 90% of the time".
 
Oh, I understand, I'm sorry that I wasn't able to explain this to "the guy who is right 90% of the time".
You managed to get even the sarcasm wrong. Kudos.

And there's no need to apologize, there's still space left to put in a few words how a static cache improves endurance. I'll be around.
 
I keep wondering though. Isn't this writing to the NAND twice? Say I write 100 GB. First I consume 300 GB worth of NAND when writing in SLC mode. Then I consume 100 GB worth of NAND in TLC mode. Of course, writing in SLC isn't nearly as harmful, but it is somewhat harmful, isn't it? We don't even get to choose whether we are willing to give up this SLC caching for better endurance.

I can absolutely answer this for you in detail but it will have to be at a later time. I have discussed this a lot on my discord server. I'll be brief here for now on a quick post (so there might be some errors) but feel free to hit me up directly and/or on discord.

"Writing to TLC in SLC mode causes about as much wear as writing to SLC in SLC mode" from an above comment. This is absolutely false. Native SLC has higher endurance, for one thing, but also there's critical differences in static and dynamic pSLC. The former has its own wear zone, is in OP space, and is made up of the cells with the best data retention (top layers). The latter shares a zone with the native flash (e.g. TLC). Black unfortunately linked the wrong patent for this discussion; Intel has one where they clarify that on the balance, a dynamic SLC write that later goes to TLC is approximately 0.4 times as impactful as a TLC erase but they count it conservatively as a full TLC erase. Micron in their Dynamic Write Acceleration document also specifically talks about "additive wear" which means rewriting to TLC increases wear.

"Anything written in SLC mode uses 1/4 the lifespan of QLC writes" is also false. You can see with the pSLC Chia drives made from QLC, which is rated up to 1000-1500 P/E (64-96L Intel), that the flash is rated for 30K P/E in permanent (static) SLC mode. It's not a linear progression regardless; for example, you only need one read point for SLC but 7 for TLC and 15 for QLC which amounts to 7/3 (points/bits) or 2.33 for TLC and 15/4 or 3.75 for QLC nominally (see: Kioxia's 96L QLC ISSCC digest). For programming it's more complex but you need verification reads there as well.

"SLC mode *reduces* the number of write operations* - no, a page is a page. pSLC mode is just one page per word line while TLC is three pages per word line. SSDs generally write with page granularity which is 16k with modern consumer flash, sooner or later it may get moved to native flash and takes up the same amount of space. "Folding" is taking 3 SLC blocks and compressing them into a single TLC block, but each SLC block is made from a TLC block. As such you are doing a SLC write, a SLC read, and then a TLC write, with the TLC write being an average for writing all 3 of its pages (lower/middle/upper). If you mean that writing to SLC can defer writes and avoid writing to TLC which, on the balance, is better for wear, then that's true; DRAM on a SSD works similarly (for metadata updates) and likewise host memory (RAM) caching writes before committing to non-volatile media does as well.

"Intelligent behavior files ... would stay in SLC" - this is actually true. Modern SSDs have behavioral profiles and algorithms for SLC caching and will retain certain user data in SLC to improve read performance. It's also good to defer writes to reduce additive wear.

"What makes static cache better for endurance" - because it uses the best cells/blocks of each die, but also because it doesn't have to convert back to native flash. Dynamic does which as mentioned above is typically counted as a native flash erase; the SSD will cycle through all available flash (addressed logically) based on wear. The average lifespan of the deck is going to be weaker because the lower cells/blocks have worse data retention (but faster program speed) due to differences in the critical dimension and related coupling capacitance, as caused by uneven etching from the required high aspect ratio. As with all things this can become more complex because space used for static SLC may reduce what's available for ECC and/or spare, and in fact many patents (including Black's) allow for rebalancing as the flash is worn, a good example being focusing on OP early on (to reduce write amplification) then reallocating for more ECC near end of life, and similarly static SLC can be reallocated as dynamic-native. (note: static having its own wear/GC zone means that endurance is "worst of" between that zone and dynamic-native, and drives can balance writes accordingly later in lifespan, and in fact random writes -> SLC mode vs. sequential -> TLC is one strategy)

Back to the OP - yes, programming in pSLC mode has less impact since the charge threshold is much larger and you can use fewer pulses (ISPP). It's a bit more complicated than that when involving other factors (temperature incl. dwell/swing, wear level of the flash, architecture, et cetera) but on the balance it's an order-of-magnitude less harmful. I'm also discounting direct-to-native (e.g. direct-to-TLC) which can be done for algorithmic reasons, and we do see many modern drives get "stuck" in such a mode (for example, launch 980 PRO with benchmarks).
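
The read-point figures quoted above (1 for SLC, 7/3 for TLC, 15/4 for QLC) follow from the number of charge levels a cell has to distinguish. A quick sketch, assuming a cell storing n bits has 2^n levels and therefore needs 2^n - 1 reference points between them:

```python
def read_points(bits_per_cell):
    """Reference voltages needed to tell apart 2**bits_per_cell charge levels."""
    return 2 ** bits_per_cell - 1


for name, bits in [("SLC", 1), ("TLC", 3), ("QLC", 4)]:
    points = read_points(bits)
    print(f"{name}: {points} read points, {points / bits:.2f} per bit")

# SLC: 1 read points, 1.00 per bit
# TLC: 7 read points, 2.33 per bit
# QLC: 15 read points, 3.75 per bit
```

This only covers reads. As the post says, programming is more complex, so these ratios are not a wear model.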
 
@Maxx Finally something I can work with.
Reserving best cells for pSLC doesn't seem to be an advantage. Controllers shuffle writes around to use the best cells anyway (like you noted), so the overall number of writes is still the same. That seems to be the only argument in favor of "static cache is better for endurance".
I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.
 
I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.
Intel has one where they clarify that on the balance, a dynamic SLC write that later goes to TLC is approximately 0.4 times as impactful as a TLC erase but they count it conservatively as a full TLC erase.
If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).
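
The 25,000 vs. 10,000 figures above are just the TLC budget divided by how much of a TLC erase each SLC-mode cycle is charged as. A sketch of that arithmetic, where the 10,000-cycle TLC rating is the hypothetical number from the post and the 0.4 weight is the figure attributed to the Intel patent:

```python
def slc_mode_cycles(tlc_rated_cycles, wear_per_slc_cycle):
    """SLC-mode cycles that fit in a TLC wear budget.

    wear_per_slc_cycle is the fraction of one TLC erase charged per SLC-mode
    cycle: 0.4 if counted at the estimated real impact, 1.0 if counted
    conservatively as a full TLC erase.
    """
    return tlc_rated_cycles / wear_per_slc_cycle


TLC_RATED = 10_000  # hypothetical TLC rating used in the post

print(round(slc_mode_cycles(TLC_RATED, 0.4)))  # 25000
print(round(slc_mode_cycles(TLC_RATED, 1.0)))  # 10000
```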

I have discussed this a lot on my discord server
Sorry I can't find the link to your discord in your profile. Can you share it here?
 
If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).
But if the cache is dynamic, each write is still of the "less harmful" kind.
More to the point, if the static SLC cell only has 1,000 p/e cycles left, that's it, it has 1,000 p/e cycles and there's nothing you can do about it. But if dynamic cache cell is down to 1,000, you can switch to another cell having more p/e cycles available that was previously allotted to main storage.
 
@Maxx Finally something I can work with.
Reserving best cells for pSLC doesn't seem to be an advantage. Controllers shuffle writes around to use the best cells anyway (like you noted), so the overall number of writes is still the same. That seems to be the only argument in favor of "static cache is better for endurance".
I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.
Static pSLC does use the top layers which due to HAR have better data retention characteristics. Samsung discusses this in a digest for their 6th generation V-NAND (92L, reference 3), citing this article (see figure 6). For a source on the static SLC part, see here (start reading at line 53, column 5, then refer to 330-1 in figure 3). Static pSLC is dedicated for the life of the device and never converts back to native flash in operation, so it does not have the additive wear associated with dynamic pSLC. Because of this, the wear zone (and garbage collection) is separate from native flash and dynamic pSLC (which share a zone), such that the lifetime of the flash is the worst of the two zones. Drives like Intel's 545s, which has static pSLC, count SLC writes separately from TLC for this reason. There are multiple patents related to this, like this one. Plenty of other good patents with more details on endurance with SLC vs. XLC as well (listing 40K P/E for static SLC in this case - I do have articles suggesting the average P/E of a dynamic SLC block, again since it comes from the logical pool of native flash, is a bit lower, e.g. 30K relative).
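
The "worst of the two zones" point can be illustrated with a toy model. This is only a sketch under loud assumptions: each zone is reduced to a single used/rated cycle count, the 40,000 and 1,500 ratings are figures quoted in this thread, and the used-cycle counts are invented for the example. Real controllers track wear per block.

```python
def remaining_fraction(used_cycles, rated_cycles):
    """Share of a zone's rated P/E cycles that is still unused (never below 0)."""
    return max(0.0, 1.0 - used_cycles / rated_cycles)


def limiting_zone(zones):
    """Return (name, remaining_fraction) of the most worn zone.

    zones maps a zone name to a (used_cycles, rated_cycles) tuple. The drive's
    remaining life is taken to be that of the zone with the least life left.
    """
    name = min(zones, key=lambda z: remaining_fraction(*zones[z]))
    return name, remaining_fraction(*zones[name])


zones = {
    "static_pslc": (10_000, 40_000),  # invented usage, 40K rating from the thread
    "dynamic_native": (600, 1_500),   # invented usage, 1,500 rating from the thread
}

name, left = limiting_zone(zones)
print(name, round(left, 2))  # dynamic_native 0.6
```

In this made-up case the static zone has plenty of headroom while the shared dynamic-native zone sets the limit, which is the situation where a controller might start shifting writes between zones.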

Fundamentally, all else being equal, static pSLC improves the endurance of the flash for these reasons, however (again all else being equal) as I stated above this may need to be balanced with other factors as the device is worn. This patent illustrates how and why you might want to reallocate over time - see figure 3B. Specifically also read [0018] and [0019], then [0021] and [0022].

If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).


Sorry I can't find the link to your discord in your profile. Can you share it here?
Here is the patent I'm referencing (hosted on my domain/site), start at line 4 in column 7. To see why they count it as a full TLC erase anyway, start at line 59 in column 6. If you check my sources above (in my quote-reply to bug) you'll see 40K P/E being referenced which is a real possibility - most datasheets are at 30-40K in static/permanent SLC mode (as with QLC for Chia drives). I've posted these on my discord, specifically Micron's datasheets list SLC and TLC mode endurance (B17A for example would be 30000/1500). Also correct that in the same node, native SLC will be 100K+. I've illustrated this by comparing Kioxia's 96L flash - they have digests for TLC, QLC, and SLC (XL Flash) which we have access to also.

I'm on Reddit under NewMaxx and also run a subreddit with the same name (/r/newmaxx) - which links to my server. Not sure on rules about posting these things here.

But if the cache is dynamic, each write is still of the "less harmful" kind.
More to the point, if the static SLC cell only has 1,000 p/e cycles left, that's it, it has 1,000 p/e cycles and there's nothing you can do about it. But if dynamic cache cell is down to 1,000, you can switch to another cell having more p/e cycles available that was previously allotted to main storage.

Not all the cells will be written equally in absolute cycles, the controller picks the cells with the least effective wear for dynamic SLC mode and cycles through over time. Blocks and their properties (differences) are tracked in tables, for example with bias for programming, because of variation. This variation implies that the dynamic-native zone will have lower average endurance/cycles than dedicated (static) because the average cell/block is in the middle of the deck. Worth noting from my source above, the lower cells do program faster and this is also a characteristic of P/E cycling (i.e. worn out flash performs worse with reads due to ECC and has worse data retention but programs faster due to material breakdown). So we're talking writes per block in comparison, but it's a bit irrelevant when you consider my sources above.
 
Nice explanation @Maxx, or should I say NewMaxx, haha.
 