Alluxio Aims to Make Big Data Readily Available In-Memory


Latency improvements gained from in-memory data handling could prove to have great benefits for many users.

When it comes to providing insights in real time, one of the biggest challenges comes down to the physics of accessing data that is increasingly distributed across the enterprise. Accessing data residing on-premises or in the cloud involves a lot of latency that adversely impacts the performance of an application.

To address that issue, Alluxio, formerly known as Tachyon, has created namesake open source software that moves data being accessed into memory, regardless of whether that data resides on a local storage system or in a public cloud. Alluxio can serve data at memory speed when data is local, or at the speed of the network when data is remote. Once remote data has been accessed, however, Alluxio pushes that data into the memory of the remote cluster.
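The behavior described above resembles a read-through cache: the first read of remote data pays the network cost, and later reads are served from memory. The sketch below illustrates that general pattern only; it is not Alluxio's implementation or API, and the names `ReadThroughCache`, `fetch_remote` and `remote_reads` are hypothetical.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical illustration of read-through caching; not Alluxio's API."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        # fetch_remote stands in for a slow read from remote storage.
        self._fetch_remote = fetch_remote
        self._memory: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often the slow path was taken

    def read(self, path: str) -> bytes:
        # Fast path: the data is already held in memory.
        if path in self._memory:
            return self._memory[path]
        # Slow path: fetch over the "network", then keep a copy in memory.
        data = self._fetch_remote(path)
        self.remote_reads += 1
        self._memory[path] = data
        return data


# Usage: a dict plays the role of the remote store (an assumption for the demo).
remote_store = {"s3://bucket/report.csv": b"id,value\n1,42\n"}
cache = ReadThroughCache(remote_store.__getitem__)
first = cache.read("s3://bucket/report.csv")   # goes to the remote store
second = cache.read("s3://bucket/report.csv")  # served from memory
```

A production system would also need eviction, consistency with the underlying store and distribution across nodes, none of which this sketch attempts.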


To drive adoption, Alluxio just appointed Bob Wiederhold to be its executive chairman. Prior to joining Alluxio, Wiederhold was CEO of Couchbase, a provider of a document database based on an open source project. Wiederhold says he plans to enable Alluxio to build an ecosystem around its open source software.

“That’s where my experience will come in,” says Wiederhold.

The core challenge, says Wiederhold, is making all the data that exists in the enterprise more readily available without having to move it to some central repository. Most enterprise IT organizations have data residing in multiple repositories. Alluxio today makes it possible for applications that invoke the Apache Spark in-memory computing framework or an Apache Hadoop MapReduce interface to access data stored in an Amazon S3-compatible service, an OpenStack Swift distributed file system or the Hadoop Distributed File System (HDFS).

That can be accomplished using either a command line interface or by mounting an Alluxio distributed file system. But Alluxio also makes available a Java application programming interface (API) and supports multiple clients, including REST, Go and Python. Each client can then communicate with a single Alluxio master node that manages multiple worker nodes.
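The client, master and worker relationship described above can be pictured with a toy model in which the master only tracks where data lives and the workers hold the data in memory. This is a simplified sketch under that assumption, not Alluxio's actual protocol or client API; `Worker`, `Master`, `register`, `locate` and `client_read` are hypothetical names.

```python
from typing import Dict, Iterable, Optional


class Worker:
    """Hypothetical worker node that keeps file contents in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blocks: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blocks[path] = data

    def get(self, path: str) -> bytes:
        return self._blocks[path]


class Master:
    """Hypothetical master node that records which worker holds each path."""

    def __init__(self, workers: Iterable[Worker]) -> None:
        self._workers = {w.name: w for w in workers}
        self._locations: Dict[str, str] = {}

    def register(self, path: str, worker_name: str) -> None:
        # Record only metadata; the bytes stay on the worker.
        self._locations[path] = worker_name

    def locate(self, path: str) -> Optional[Worker]:
        name = self._locations.get(path)
        return self._workers.get(name) if name is not None else None


def client_read(master: Master, path: str) -> bytes:
    # The client asks the master where the data is, then reads from that worker.
    worker = master.locate(path)
    if worker is None:
        raise FileNotFoundError(path)
    return worker.get(path)


# Usage with two in-process workers standing in for cluster nodes.
w1, w2 = Worker("worker-1"), Worker("worker-2")
master = Master([w1, w2])
w2.put("/data/events.log", b"login\nlogout\n")
master.register("/data/events.log", "worker-2")
contents = client_read(master, "/data/events.log")
```

Keeping the lookup on a single master gives every client one place to ask, while the workers carry the memory footprint.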

Short of moving that data, the only other approach to this problem would be to rely on one of several data virtualization techniques. But while data virtualization works within the context of a traditional data warehouse, modern applications now require access to data in near real time, says Wiederhold.


Wiederhold notes that DataOps is quickly emerging as a distinct IT discipline as the volume of data that needs to be stored, analyzed and managed grows. Developers in the age of agile programming and DevOps are building applications faster than ever, each of which assumes data is going to be readily available. But moving data closer to where an application winds up running can be cost-prohibitive. Alluxio enables IT operations teams to address that issue by ensuring the data that needs to be accessed is readily available in memory, says Wiederhold.

As data becomes increasingly distributed across the enterprise, the way enterprise IT organizations approach data and storage management is about to fundamentally change. Most organizations will need to be able to manage distributed data in a federated fashion that makes it both more readily accessible and simpler to manage at scale. The challenge many of them now face is that application developers are running out of patience waiting for IT operations teams to resolve these issues once and for all.
