Object Storage and Erasure Coding: the Future of Storage?


Why erasure coding replaces RAID for big data storage.

According to an IDC study from 2015, the data we create and copy annually will reach 44 zettabytes (ZB) by 2020, or 44 trillion gigabytes (GB). Approximately one-third of that data will pass through the cloud. On top of that, there will be 200 billion smart objects in use by 2020, or 26 IoT devices for every person.

“All this data is really not going to be useful unless you can find the right tools to extract value from it,” said Erik Ottem, the director of product marketing at Western Digital Corporation, in a recent webinar on the future of data storage. “And ideally what you’re trying to do is not only manage it, but extract information that will lead to better decisions.”

Traditional storage infrastructures built on RAID and replication are not prepared to handle the vast quantity of data, or the large share of it that is unstructured, Ottem says. The variety of disparate sources (think about all those embedded IoT devices) doesn’t help, either.

These traditional systems rely on RAID for redundancy, and replication is used to make sure that systems stay online during moments of crisis. Organizational strategies depend mostly on expanding folder hierarchies, and management comes primarily from adding more rack-based instances. These have worked extremely well in the past, but RAID suffers from higher data loss risk with larger disk sizes, and hierarchical data structures — with folders nested within folders within folders — bog down any type of analytics.

Using object storage and erasure coding for data resiliency

Object storage is the core component of a new generation of data storage technology. In object storage, a piece of data is given rich metadata and a globally unique identifier so that it can be discovered no matter where it is (a different server rack, or a different part of the world). Block storage, by contrast, breaks information down into raw chunks to be reassembled as needed.
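To make the idea concrete, here is a minimal in-memory sketch of an object store: a flat namespace where each object carries its own metadata and is addressed by a globally unique ID rather than a folder path. The names `ToyObjectStore`, `put`, `get`, and `find` are hypothetical and chosen for illustration; they are not the API of any Western Digital product or other real system.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical flat store: no folders, every object lives under one unique ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())  # globally unique identifier
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ToyObjectStore()
scan_id = store.put(b"...sensor bytes...", source="iot-sensor", site="rack-12")
store.put(b"...log bytes...", source="web-log", site="rack-03")
matches = store.find(source="iot-sensor")  # lookup by metadata, not by path
```

Because lookups go through metadata and IDs instead of nested folders, an analytics tool can ask for "everything from IoT sensors" without walking a directory tree.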

According to Ottem, erasure coding replaces RAID as the go-to data protection method when it comes to large capacities. Erasure coding allows IT staff to restore corrupted data by using redundant information about that data stored elsewhere in the array, even if it’s in a different location. It’s the basis of the error-correction technology that allows DVDs to be played despite scratches. Ottem says that erasure coding can create “15 nines of data availability,” or a data loss probability of 0.0000000000001 percent (about one in a quadrillion).
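The simplest possible erasure code is a single XOR parity shard, sketched below. This is an illustration of the principle only: it can rebuild one lost shard, whereas production object stores typically use Reed-Solomon-style codes that survive several simultaneous losses. The function names `encode` and `recover` and the choice of four data shards are assumptions made for this example, not details from the webinar.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal data shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: list) -> list:
    """Rebuild the single missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the missing one
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


original = b"object storage with erasure coding"
shards = encode(original, k=4)
shards[2] = None  # simulate a failed drive or unreachable site
rebuilt = recover(shards)
restored = b"".join(rebuilt[:4])[:len(original)]  # drop parity and padding
```

Each shard can live on a different drive or in a different location; as long as enough of them survive, the original bytes can be recomputed rather than restored from a full copy.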

Equally important is spreading storage across geographies, rather than using completely redundant sites, to keep the system available during adverse events. This can reduce cost significantly while offering the same reliability as redundant sites.
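A rough way to see where the savings come from is to compare raw capacity overhead. The sketch below assumes an idealized code in which any k of the k + m shards can rebuild the data, with shards spread evenly across sites; the 4 + 2 layout and the three-site deployment are hypothetical numbers picked for illustration, not figures from Ottem's talk.

```python
def replication_profile(copies: int) -> dict:
    """Full copies: raw capacity is `copies` times the data size."""
    return {"overhead": float(copies), "losses_tolerated": copies - 1}


def erasure_profile(k: int, m: int) -> dict:
    """k data shards plus m parity shards; any k of the k + m rebuild the data."""
    return {"overhead": (k + m) / k, "losses_tolerated": m}


def site_failures_tolerated(k: int, m: int, sites: int) -> int:
    """Whole-site outages survivable when shards are spread evenly over sites."""
    shards_per_site = -(-(k + m) // sites)  # ceiling division, worst case
    return m // shards_per_site


three_copies = replication_profile(3)        # three fully redundant copies
spread_4_2 = erasure_profile(k=4, m=2)       # hypothetical 4 + 2 layout
sites_ok = site_failures_tolerated(4, 2, 3)  # two shards at each of three sites
```

Under these assumptions, both schemes survive the loss of two pieces, but the erasure-coded layout stores 1.5 times the data instead of 3 times, and it still rides out the loss of one entire site.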

By combining these technologies inside an object storage system, companies will be able to store more data at lower cost, which eliminates the need to discard old data just because the traditional systems are getting bogged down. Ottem even says that many companies are taking old data off tape drives in cold storage and bringing it into object storage systems to increase its accessibility to analytics tools. When old data is kept rather than deleted, and made more accessible, it becomes available to new analytics tools, which are in near-constant development.

Use cases for object storage

How are these object storage-based innovations used today? Hal Woods, the CTO of Western Digital’s Data Center System business unit, spoke about a successful application with the San Diego Supercomputer Center (SDSC) and local children’s hospitals. After a six-day process to sequence a terminally ill child’s entire genome, the SDSC takes advantage of a massive dataset of previous patient data, all held in object storage on SSDs, to determine the best course of treatment moving forward.

This proves the point, perhaps, that no data is completely worthless, even if it’s years old. One never knows when an analytics tool will extract some new critical insight that could help a business make the right decisions, or, more importantly, save a life.

Joel Hans

About Joel Hans

Joel Hans is a copywriter and technical content creator for open source, B2B, and SaaS companies at Commit Copy, bringing experience in infrastructure monitoring, time-series databases, blockchain, streaming analytics, and more. Find him on Twitter @joelhans.
