If you happened to be at the Flash Memory Summit, you might have caught Kent Smith, the LSI SandForce Senior Marketing Director, delivering a presentation on the virtues of over-provisioning.
Over-provisioning is the practice of setting aside additional flash capacity to improve SSD performance and longevity, and Kent’s message is simple: don’t be afraid to take advantage of it. Let’s take a look at a few of Kent’s slides, courtesy of LSI.
Some end users might not be too fond of the prospect of sacrificing capacity for performance. While fewer consumer drives are shipping with additional over-provisioning (above and beyond the ~7% spare area required for the drive to function), adding it later is a straightforward task. Consumer workloads are fairly gentle on flash storage, but drives in TRIM-less environments really need extra OP, as do drives exposed to heavy random workloads.
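If you want to put rough numbers on that, OP is usually quoted as spare area divided by user-visible capacity, and the built-in ~7% falls out of the gap between raw NAND (measured in GiB) and advertised capacity (measured in GB). Here is a quick back-of-the-envelope sketch; the 128 GiB / 128 GB drive and the 20 GB of unpartitioned space are purely illustrative numbers, not figures from Kent's slides:

```python
def overprovisioning_pct(physical_bytes: int, user_bytes: int) -> float:
    """OP as commonly defined: spare area relative to the user-visible capacity."""
    return 100.0 * (physical_bytes - user_bytes) / user_bytes

GiB = 2**30
GB = 10**9

raw_nand = 128 * GiB      # hypothetical drive with 128 GiB of raw flash...
advertised = 128 * GB     # ...sold as a "128 GB" drive

print(overprovisioning_pct(raw_nand, advertised))   # ~7.4% built-in spare area

# Leaving 20 GB unpartitioned (ideally after a secure erase) effectively
# hands that space back to the controller as extra spare area:
partitioned = advertised - 20 * GB
print(overprovisioning_pct(raw_nand, partitioned))  # ~27% total OP
```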
As seen in the slide above, sequential writes aren’t affected by additional over-provisioning, though SandForce’s compression technology can enhance writes with easily compressible data. The real benefit from over-provisioning comes when the drive is subjected to a highly random workload for protracted periods of time.
As you can see, the additional over-provisioning makes a substantial difference with both traditional and LSI SandForce products, even up to the 80% level. Speaking specifically about SF-powered drives, Kent is keen to illustrate that the SF approach to real-time compression/deduplication offers several key advantages. First, the performance gain seen when SandForce hits easily compressible data is pretty substantial.
Secondly, even when the OS thinks the drive is full, the amount of data actually written to the flash is typically much less. That leaves more flash available for background tasks, because SF was able to reduce the amount of data that hit the flash in the first place. It’s a sort of built-in OP effect that SF drives can leverage even when they aren’t explicitly over-provisioned, as long as the data on the drive is compressible. Either way, steady-state random performance continues to scale, and write amplification stays low, both very good things.
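To make the "built-in OP" idea concrete, here is a toy model, not LSI's actual accounting: if the host fills the drive logically but the data compresses to a fraction of its size on the flash, the space the compressed data doesn't occupy behaves like extra spare area. The capacities and the 2:1 ratio below are assumptions for illustration only:

```python
def effective_op_pct(physical_bytes: int, logical_full_bytes: int,
                     compression_ratio: float) -> float:
    """
    Toy illustration of the 'built-in OP' effect: flash not occupied by the
    compressed data acts like spare area, relative to the logical capacity.
    """
    flash_actually_used = logical_full_bytes * compression_ratio
    return 100.0 * (physical_bytes - flash_actually_used) / logical_full_bytes

GiB, GB = 2**30, 10**9
raw_nand, advertised = 128 * GiB, 128 * GB

print(effective_op_pct(raw_nand, advertised, 1.0))   # incompressible data: ~7%
print(effective_op_pct(raw_nand, advertised, 0.5))   # 2:1 compressible: ~57%
```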
https://www.google.com/patents/US20120054415
SandForce presumably uses some sort of differential information update. When a block is modified, the controller computes the difference between the old data and the new data; if that difference is small, it can be encoded in a smaller number of bits in the flash page. The catch is that once you store only the difference, the old data cannot be garbage-collected until the new data is reassembled and rewritten to a different location.
Difference encoding also takes more time (an extra read, processing, etc.), so it is best avoided when the write buffer is close to full. The controller is always free to choose whether or not to apply differential encoding on any given write.
It is definitely not deduplication. You can think of it as compression.
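For illustration only, here is a small sketch of the kind of differential update the comment above is speculating about. None of this is confirmed SandForce behavior; the block size, delta format, thresholds, and the buffer-pressure cutoff are all assumptions:

```python
# Toy sketch of a speculated differential block update -- not actual
# SandForce firmware logic.  A "block" here is just a fixed-size bytes object.

BLOCK_SIZE = 4096
DELTA_THRESHOLD = 256    # store a delta only if it stays this small (arbitrary)
BUFFER_PRESSURE = 0.9    # skip the extra work when the write buffer is this full


def make_delta(old: bytes, new: bytes) -> list:
    """Return a list of (offset, changed_bytes) runs describing new relative to old."""
    delta, i = [], 0
    while i < len(new):
        if old[i] != new[i]:
            start = i
            while i < len(new) and old[i] != new[i]:
                i += 1
            delta.append((start, new[start:i]))
        else:
            i += 1
    return delta


def delta_size(delta) -> int:
    # 6 bytes of offset/length bookkeeping per run (arbitrary choice)
    return sum(6 + len(run) for _, run in delta)


def apply_delta(old: bytes, delta) -> bytes:
    # Reassembly step: needed before the old block can be garbage-collected,
    # since the stored delta only makes sense relative to the old data.
    buf = bytearray(old)
    for offset, run in delta:
        buf[offset:offset + len(run)] = run
    return bytes(buf)


def choose_encoding(old: bytes, new: bytes, buffer_fill: float) -> str:
    """Decide between a differential write and a full-block write."""
    if buffer_fill >= BUFFER_PRESSURE:
        return "full"                 # no time for the extra read/diff work
    delta = make_delta(old, new)
    if delta_size(delta) <= DELTA_THRESHOLD:
        return "delta"                # small change: encode just the difference
    return "full"


if __name__ == "__main__":
    old = bytes(BLOCK_SIZE)
    new = bytearray(old)
    new[100:110] = b"0123456789"      # modify 10 bytes
    new = bytes(new)

    print(choose_encoding(old, new, buffer_fill=0.3))   # -> "delta"
    print(choose_encoding(old, new, buffer_fill=0.95))  # -> "full"
    assert apply_delta(old, make_delta(old, new)) == new
```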