I'd like to correct or clarify some statements from previous posts.
An SSD swaps the files around the whole space to use each cells evenly...
What you probably meant is that SSDs try to distribute writes evenly across cells. This is what an SSD's wear levelling algorithm does. If the operating system writes a file to, say, block address 42, deletes it, and then writes it to block 42 again, the same chunk of data, written to the same block address, will end up in a different flash cell. On an HDD, by contrast, the chunk of data written to block address 42 will always end up at the exact same physical location (same platter, same track, ...).
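To make the block 42 example concrete, here is a minimal toy sketch in Python. It is not how any real controller is implemented: the class name, the per-cell bookkeeping, and the "pick the least-worn free cell" policy are all assumptions made purely for illustration.

```python
class ToyFlashTranslationLayer:
    """Hypothetical toy model: maps logical block addresses to physical cells."""

    def __init__(self, num_cells):
        self.erase_counts = [0] * num_cells      # wear per physical cell
        self.mapping = {}                        # logical address -> physical cell
        self.free_cells = set(range(num_cells))  # cells ready to be written

    def delete(self, logical_address):
        # Release the cell backing this address (if any) and count the erase.
        old = self.mapping.pop(logical_address, None)
        if old is not None:
            self.erase_counts[old] += 1
            self.free_cells.add(old)

    def write(self, logical_address):
        # Overwriting an address releases its old cell first.
        self.delete(logical_address)
        # Pick the least-worn free cell; ties go to the lowest index.
        # (This toy does not handle a completely full drive.)
        target = min(self.free_cells, key=lambda c: (self.erase_counts[c], c))
        self.free_cells.remove(target)
        self.mapping[logical_address] = target
        return target


ftl = ToyFlashTranslationLayer(num_cells=4)
first = ftl.write(42)    # write the file to block address 42
ftl.delete(42)           # delete it again
second = ftl.write(42)   # write it to block address 42 once more
print(first, second)     # two different physical cells
```

The host sees address 42 both times, but the second write lands on a different physical cell, because the cell used first now has a higher erase count than its neighbours.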
SSD's do housekeeping which moves data around internally, so that file wont be just left in the same part of nand.
This makes it sound as if an SSD constantly shuffles all of its data around internally whenever it is left alone. That is not the case; doing so would dramatically shorten the lifespan of the SSD, because each flash cell survives only a limited number of write cycles.
The one time an SSD might move data around is when data is actively being read and the SSD controller notices an above-average voltage drop in the cell being read. Whenever this situation occurs, a number of things may happen, depending on the SSD controller and its firmware:
- The data might actually be unreadable if the voltage dropped below a certain readability threshold. In this case the controller has to report a read error to the operating system and the data is lost.
- The controller might actually try a refresh of this cell before finally marking it as bad. The cell might still be okay, because a certain voltage drop is normal when a cell is being left alone for a long time.
- The data may still be readable. In this case the controller returns the data to the operating system and will normally try to write the data to the same cell again (recharge). After that it reads the data again and checks the voltage level of the cell. If it's still below a certain threshold the cell is marked as bad and the data will be written to a spare cell.
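The cases above can be sketched as a small decision function. This is only an illustration of the reasoning, not real firmware logic: the function name, the normalized voltage values, and both thresholds (`readable_min`, `healthy_min`) are made up for the example.

```python
from enum import Enum


class ReadOutcome(Enum):
    OK = "data returned, cell is healthy"
    READ_ERROR = "read error reported to the host, data lost"
    REFRESHED = "data returned, cell recharged in place"
    RELOCATED = "data returned, cell marked bad, data moved to a spare cell"


def handle_read(voltage, voltage_after_rewrite,
                readable_min=0.3, healthy_min=0.6):
    """Hypothetical controller reaction to the voltage seen during a read.

    Voltages are normalized to 0..1; the thresholds are assumptions.
    """
    if voltage < readable_min:
        # Below the readability threshold: nothing left to recover.
        return ReadOutcome.READ_ERROR
    if voltage >= healthy_min:
        # No unusual voltage drop, nothing to do.
        return ReadOutcome.OK
    # Readable but weak: rewrite the same cell, then check it again.
    if voltage_after_rewrite >= healthy_min:
        return ReadOutcome.REFRESHED
    # The cell does not hold its charge any more: retire it.
    return ReadOutcome.RELOCATED


print(handle_read(0.5, 0.7).value)
print(handle_read(0.5, 0.4).value)
```

In this picture, adjusting a single threshold changes which cells get refreshed or retired, without anything about the flash itself changing.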
The firmware update that fixed the problems with the 870 Evo drives very likely changes exactly this part of the controller's operation. My guess is that the voltage threshold for cells has simply been adjusted to match the real-life behavior of the TLC flash used in these drives. This should have nothing to do with bad flash or anything like that. Another explanation for the early 870 failures could be a bug in the wear levelling algorithm that pushed some cells harder than others, causing them to die prematurely. This is very unlikely, though, because Samsung has a lot of experience with these algorithms and surely would not develop one from scratch for this specific kind of drive.
I have also personally never heard of SSDs that regularly scan all of their cells completely on their own. This would dramatically increase power consumption. I am not saying there are none, but normally they only do that either when a read is actually requested by the host or when performing a full/extended S.M.A.R.T. test.
Another occasion when data is moved around is when a single bit of a TLC cell is erased. The remaining 2 bits may be moved by the garbage collector later.
I migh be wrong, but TRIM operation only move empty cells, not the filled one.
TRIM does not really move cells. The only thing it does is zero out the sectors/cells that are being deleted. Why do we do this? Flash cells need to be erased (with a zero voltage) before they can be rewritten, and that takes time. That's why writing to an empty cell is faster than writing to a cell that already holds a charge. Since we want fast writes, we zero out the cells when deleting data, so that writing to them again will be fast.
BTW: This is not the same as writing an actual "zero" to a specific sector. TRIM is more than that. It's telling the SSD controller: "erase this cell". Depending on the controller this cell may also be used as a spare cell in case other cells are failing.
EDIT: To be more precise, TRIM is telling the garbage collector of the SSD that a certain block/cell may be cleared. Normally this clear operation is not synchronous, because it is expensive. And the garbage collector might actually move filled cells. E.g. when 1 bit of a TLC cell was erased the garbage collector might move the other two bits of this cell to a new cell so the original cell can be completely cleared (for improved performance).
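The split between TRIM (cheap, just a hint) and garbage collection (expensive, done later) can be sketched roughly like this. Note the assumptions: the model works on pages inside an erase block rather than on bits inside a cell, since NAND flash is programmed page by page and erased block by block, and all names (`ToyBlock`, `trim`, `garbage_collect`) are hypothetical.

```python
class ToyBlock:
    """Hypothetical erase block: a list of pages, None meaning 'erased'."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.valid = [p is not None for p in self.pages]


def trim(block, page_index):
    # TRIM only tells the controller the page is no longer needed.
    # Nothing is erased at this point; the old data is still physically there.
    block.valid[page_index] = False


def garbage_collect(block, spare):
    # Later, asynchronously: copy the pages that are still valid to a spare
    # block, then erase the whole source block in one go.
    moved = 0
    for i, data in enumerate(block.pages):
        if block.valid[i]:
            dest = spare.pages.index(None)  # first erased page (toy: spare must have room)
            spare.pages[dest] = data
            spare.valid[dest] = True
            moved += 1
    block.pages = [None] * len(block.pages)
    block.valid = [False] * len(block.pages)
    return moved


block = ToyBlock(["a", "b", "c"])
spare = ToyBlock([None, None, None])
trim(block, 1)                          # the host deleted the data in page 1
still_there = block.pages[1]            # not erased yet
moved = garbage_collect(block, spare)   # moves "a" and "c", erases the block
print(still_there, moved, spare.pages, block.pages)
```

So the filled pages are the ones that get moved, and the trimmed page is simply left behind and wiped together with the rest of its block.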
To perform a housekeeping of a cell, you need to have a non-failure signal after reading the source cell.
This is correct.