Getting the Edge with Data About Data

As IoT data becomes a more important part of enterprise business operations, the ability to reduce latency in data analytics and processing can make a difference. It raises the promise of real-time to a new level.

There’s a lot of data moving across IoT networks, to the point where identifying and locating the data that materially matters may slow things down. Metadata, or data about data, holds the keys to the data kingdom, especially when it comes to indexing and identifying unstructured data. But just as data can overwhelm enterprise functions, the sheer volume of metadata can slow things down even further.

A new proposal, presented at the recent IEEE Edge Computing conference, offers a way to tackle the terabytes of metadata being assigned within many application domains. Its authors call the goal “efficient and scalable metadata,” a term that wouldn’t even have been necessary in the pre-edge, batch era.

The researchers, Bing Zhang of the University of Illinois and Tevfik Kosar of the University at Buffalo, propose a system that moves metadata across IoT networks faster and more efficiently. They also devised a way to cache metadata and predict which entries will be accessed next, which could reduce latency in data access and movement. “We replayed approximately 20 million metadata access operations from real audit traces, in which our system achieved 80% accuracy during prefetch prediction and reduced the average fetch latency 50% compared to the state-of-the-art mechanisms,” they write.

Already, “more than 50% of all I/O operations are due to metadata-intensive computing and the requests to read file attributes dominate in all workloads,” Zhang and Kosar state. They say more aggressive prefetch routines, which move data from storage to temporary memory in anticipation of upcoming user requests, can work better with metadata than with the data itself.
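To make the idea of prefetching concrete, here is a minimal sketch in Python. It is not Zhang and Kosar's system. It assumes a simple "last successor" predictor (whatever path followed this one last time will probably follow it again) layered on a small least-recently-used cache. The class name `PrefetchingMetadataCache`, the `stat` method, and the dictionary standing in for remote metadata storage are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class PrefetchingMetadataCache:
    """Small LRU cache that prefetches the last observed successor of each path."""

    def __init__(self, backing_store, capacity=2):
        self.backing_store = backing_store  # assumption: dict of path -> attributes
        self.capacity = capacity
        self.cache = OrderedDict()          # path -> attributes, oldest entry first
        self.successor = {}                 # path -> path that followed it last time
        self.previous = None
        self.hits = 0
        self.misses = 0

    def _store(self, path):
        """Copy one entry into the cache and evict the least recently used if full."""
        if path not in self.backing_store:
            return
        self.cache[path] = self.backing_store[path]
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def stat(self, path):
        """Return the attributes for path, then prefetch the predicted next path."""
        if path in self.cache:
            self.hits += 1
            self.cache.move_to_end(path)
        else:
            self.misses += 1
            self._store(path)
        attributes = self.cache.get(path)

        # Learn: the previously requested path was followed by this one.
        if self.previous is not None:
            self.successor[self.previous] = path
        self.previous = path

        # Predict: pull in whatever followed this path the last time it was seen.
        predicted = self.successor.get(path)
        if predicted is not None and predicted not in self.cache:
            self._store(predicted)
        return attributes


# Hypothetical metadata store and a cyclic access pattern over three files.
store = {"/a": {"size": 1}, "/b": {"size": 2}, "/c": {"size": 3}}
cache = PrefetchingMetadataCache(store, capacity=2)
for path in ["/a", "/b", "/c"] * 3:
    cache.stat(path)
print(cache.hits, cache.misses)
```

A plain two-entry LRU cache would miss on every request in this cyclic pattern, because each entry is evicted just before it is needed again. Once the predictor has seen the cycle once, the next entry is already in place when the request arrives. Metadata entries are small, which is part of why fetching a few extra ones speculatively is cheap.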

The authors tested such an architecture using Yahoo Hadoop grid trace logs from the Yahoo! Webscope dataset, which consist of continuous daily metadata operations on a Hadoop name node in 2010. The system achieved “an 80% prediction rate on its metadata operation and reduced the average fetch latency 50% compared to other state-of-the-art mechanisms,” they report. “This is friendly to IoT network, where IoT devices with the limited computing and storage capabilities can achieve the same average fetching latency as the proximity edge/fog compute node.”
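The two figures reported here, prediction rate and average fetch latency, both come from replaying a recorded trace. The sketch below shows how such a replay could be scored. It is an illustration only: the `replay` function, the one-step successor predictor, and the latency values of 1 ms for a prefetched entry and 20 ms for a remote fetch are assumptions, not numbers or methods from the paper.

```python
def replay(trace, local_ms=1.0, remote_ms=20.0):
    """Replay a list of paths; return (prediction accuracy, average latency in ms)."""
    successor = {}    # path -> path that followed it most recently
    predicted = None  # path prefetched ahead of the current request
    previous = None
    predictions = 0
    correct = 0
    total_ms = 0.0

    for path in trace:
        if predicted is not None:
            predictions += 1
            if predicted == path:
                correct += 1
        # A correctly prefetched entry is served locally; anything else is remote.
        total_ms += local_ms if predicted == path else remote_ms

        # Update the predictor, then prefetch for the next request.
        if previous is not None:
            successor[previous] = path
        previous = path
        predicted = successor.get(path)

    accuracy = correct / predictions if predictions else 0.0
    average_ms = total_ms / len(trace) if trace else 0.0
    return accuracy, average_ms


# Hypothetical trace: a repeating pattern with one unexpected request ("/d").
trace = ["/a", "/b", "/c", "/a", "/b", "/d", "/a", "/b", "/c"]
accuracy, average_ms = replay(trace)
print(f"accuracy={accuracy:.2f} average={average_ms:.2f} ms")
```

Without prefetching, every request in this toy setup would cost the full remote latency, so any correct prediction lowers the average directly.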

About Joe McKendrick

Joe McKendrick is RTInsights Industry Editor. He is a regular contributor to Forbes on digital, cloud and Big Data topics. He served on the organizing committee for the recent IEEE International Conference on Edge Computing. Follow him on Twitter @joemckendrick.
