Computational Storage: Go Beyond Computation Offloading


The key to launching the commercialization of computational storage is to drop the conventional “computation offloading” mindset.

As a hot topic today, computational storage has a beautifully simple rationale: moving computational tasks closer to where data reside could improve system performance and efficiency. Unfortunately, large-scale commercial success has so far remained elusive, which warrants skepticism about its practical viability.

As proponents of computational storage, we should not conveniently ignore the skepticism or dismiss non-believers as naysayers who dare not change or innovate. Instead, true proponents should honestly study and analyze the skepticism to develop even better arguments and rationale to convert the non-believers. This is certainly not a pleasant exercise (after all, who enjoys questioning their own beliefs?). But it will most likely reveal the best (if not the only) path for computational storage to evolve from a nice idea on paper/slideware into mainstream products. This article undertakes exactly this unpleasant exercise.

Most proponents justify computational storage by centering on the value of computation offloading and drawing a parallel with the success of GPUs, SmartNICs, AI accelerators, and video codecs. How might one refute this argument? We all know that modern computing systems are built on the principle of abstraction. In their legendary computer architecture textbook, Professors Patterson and Hennessy (joint recipients of the 2017 ACM Turing Award) list eight great ideas in computer architecture, where abstraction ranks second, after the No. 1 idea of “design for Moore’s Law.” By relocating certain operations across existing abstraction layers (e.g., application, OS, file system, driver, and hardware), computation offloading inevitably breaks the principle of abstraction and demands changes to cross-layer interfaces.

The practical viability of computation offloading depends on whether its benefits can offset the cost of breaking the existing abstractions. Sitting at the very bottom of the storage I/O stack and connected to the layers above via standard cross-layer interfaces (e.g., POSIX, NVMe, SATA), computational storage faces a much higher abstraction-breaking cost than the GPUs, SmartNICs, and other accelerators mentioned above. Moreover, unlike those accelerators, which target domain-specific, self-contained computational tasks (e.g., network processing in SmartNICs), computational storage has no consensus today on what “computational functions” it should provide, which makes its benefits much harder to quantify. Skeptics therefore have a valid point: computational storage faces a bigger challenge in establishing a commercially justifiable cost vs. benefit trade-off. The ongoing standardization efforts in the SNIA and NVMe communities aim to curtail the abstraction-breaking cost. Nevertheless, history teaches us that it takes a long time (e.g., 5+ years) for the entire ecosystem to embrace new interfaces.

Why, then, do we still strongly believe in the commercial viability of computational storage today? The bigger challenge in establishing a commercially justifiable cost vs. benefit trade-off is not necessarily inherent to computational storage. Instead, it stems from the intent to explicitly offload computational tasks from applications/OS into storage drives via APIs. Therefore, to kick off its commercialization journey, we must drop the “computation offloading” mindset and instead focus on native in-storage computation that is transparent to all other abstraction layers. By eliminating the abstraction-breaking cost, transparent in-storage computation makes it much easier to establish a commercially justifiable cost vs. benefit trade-off.

To further enhance the benefits, the transparent in-storage computation should have two properties: (1) wide applicability and (2) poor efficiency when implemented on CPUs/GPUs. Intuitively, general-purpose lossless data compression is a good candidate. Besides its almost universal applicability, lossless data compression (e.g., the well-known LZ77 and LZ77-based algorithms such as LZ4, Snappy, and ZSTD) is dominated by random data accesses that cause very high CPU/GPU cache miss rates, which leads to low hardware utilization and hence low throughput. Native in-storage compression can therefore transparently exploit runtime data compressibility to reduce storage cost without consuming any host CPU/GPU cycles and without incurring any abstraction-breaking cost.
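To make the random-access argument concrete, the toy Python compressor below follows the greedy LZ77 idea: at each position it probes a hash table keyed on the next few bytes, jumps to the earlier position that probe returns, and extends the match byte by byte. It is a sketch under assumptions (a 4-byte hash key, a roughly 64 KB window, Python tuples as tokens, and the hypothetical function names lz77_compress and lz77_decompress), not the actual encoding used by LZ4, Snappy, or ZSTD. Both the hash-table probe and the back-reference target are data-dependent addresses, which is why such loops are hard for CPU caches and prefetchers to serve efficiently.

```python
def lz77_compress(data: bytes, window: int = 65535, min_match: int = 4) -> list:
    """Toy greedy LZ77-style compressor (not the LZ4/Snappy/ZSTD formats).

    Emits ("lit", byte) for literals and ("match", distance, length) for
    back-references into previously seen data.
    """
    table = {}   # 4-byte prefix -> most recent position where it was seen
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len = best_dist = 0
        if i + min_match <= n:
            key = data[i:i + min_match]
            cand = table.get(key)      # hash-table probe: a data-dependent address
            table[key] = i
            if cand is not None and i - cand <= window:
                # Compare against an arbitrary earlier position in the window.
                # In a native implementation this jump often misses in CPU caches.
                length = 0
                while i + length < n and data[cand + length] == data[i + length]:
                    length += 1
                if length >= min_match:
                    best_len, best_dist = length, i - cand
        if best_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decompress(tokens: list) -> bytes:
    """Rebuild the original bytes; copies go byte by byte so that
    overlapping matches (distance < length) work correctly."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            start = len(out) - dist
            for k in range(length):
                out.append(out[start + k])
    return bytes(out)
```

For example, lz77_compress(b"abcabcabcabc") yields three literals followed by one overlapping back-reference, ("match", 3, 9), and lz77_decompress restores the input exactly.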

The benefit of native in-storage compression goes far beyond “transparently reducing the storage cost.” The design of any data management system (e.g., relational database, key-value store, or file system) is subject to trade-offs among read/write performance, implementation complexity, and storage space usage. In-storage compression essentially decouples host-visible logical storage space usage from physical storage space usage, which allows data management systems to deliberately trade logical storage space for higher read/write performance and/or lower implementation complexity without increasing the true physical storage cost.
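A quick way to see this decoupling: a page that is only half full still occupies a full page of logical (host-visible) space, but after compression its physical footprint is roughly half. The sketch below uses Python's zlib purely as a stand-in for a drive's built-in compressor, and assumes a simplified page layout (random, incompressible record bytes followed by zero-filled free space), a hypothetical 8 KB page size, and hypothetical helper names; none of this describes how any particular drive or database actually works.

```python
import random
import zlib

# Hypothetical page size; real databases commonly use 4 KB to 16 KB pages.
PAGE_SIZE = 8192


def make_page(fill_fraction: float, seed: int = 0) -> bytes:
    """Build a simplified page: the first `fill_fraction` of it holds
    incompressible (random) record bytes, the rest is zero-filled free space."""
    rng = random.Random(seed)
    used = int(PAGE_SIZE * fill_fraction)
    records = bytes(rng.getrandbits(8) for _ in range(used))
    return records + bytes(PAGE_SIZE - used)


def logical_vs_physical(page: bytes) -> tuple:
    """Return (logical bytes seen by the host, physical bytes after compression).

    zlib is only a stand-in for the drive's internal compressor (an assumption).
    """
    return len(page), len(zlib.compress(page))


if __name__ == "__main__":
    for fill in (1.0, 0.5, 0.25):
        logical, physical = logical_vs_physical(make_page(fill))
        print(f"fill={fill:.2f} logical={logical} physical={physical}")
```

On a fully packed page the two numbers are nearly equal, while on a half-full page the physical size drops to roughly half. Because the free space costs almost nothing physically, a system could, for instance, leave more room in each page to reduce page splits or to simplify space management without paying the full storage price.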

This opens up a new design space for innovating data management systems without demanding any changes to the existing abstractions, which could significantly strengthen the value proposition of in-storage transparent compression. The following example illustrates this. It has become fashionable for developers of data management systems to adopt the log-structured merge tree (LSM-tree) instead of the classical B-tree as the core data structure, because of the LSM-tree’s widely cited advantages over the B-tree in storage cost and write amplification. Although these advantages hold on conventional storage devices, the arrival of in-storage compression can completely invalidate them. Recent research shows that, on storage with built-in transparent compression, one could modify the B-tree implementation to close (or even reverse) its gap with the LSM-tree in storage cost and write amplification. Since B-trees power almost all relational databases today, it is far more valuable to mitigate the B-tree’s major shortcomings than to replace it with another data structure that has its own set of drawbacks.

This is why we strongly believe in computational storage today. The key is to initiate its commercialization by dropping the conventional “computation offloading” mindset. As one of the lowest-hanging fruits, in-storage transparent compression can bring significant benefits without breaking existing abstractions. It is very plausible that computational storage drives with transparent compression will become the “normal” SSDs in the not-too-distant future. In addition to compression, native in-storage transparent computation could include encryption, deduplication, virus detection, fault detection and tolerance, etc. Once the entire ecosystem has widely adopted first-generation computational storage drives with native transparent computation and has embraced the new cross-layer interfaces that support explicit computation offloading (e.g., those being developed by SNIA/NVMe), computational storage will be ready to build on its initial commercial success, enter its next chapter, and unleash its full potential. That will be the topic of a future article.

Dr. Tong Zhang

About Dr. Tong Zhang

Dr. Tong Zhang is a well-established researcher with significant contributions to the areas of data storage systems and VLSI signal processing. He is a co-founder and Chief Scientist of ScaleFlux, responsible for developing key techniques and algorithms for Computational Storage products and exploring their optimal use in mainstream application domains such as databases. He is currently a Professor at Rensselaer Polytechnic Institute (RPI). His current and past research spans a wide range of areas, including databases, file systems, solid-state and magnetic data storage devices and systems, digital signal processing and communication, error correction coding, VLSI architectures, and computer architecture.
