Cloudera Details Data Management Plan for IoT


Cloudera’s new edge offering will make it simpler to manage streaming data being generated by networked IoT devices.

At the recent Strata Data Conference, Cloudera debuted a pair of data management frameworks specifically designed to address the challenges associated with managing data across highly distributed Internet of Things (IoT) environments.

A Cloudera Edge Management offering will make it simpler to manage streaming data being generated by IoT devices, while Cloudera Flow Management provides a no-code set of tools for ingesting massive amounts of data.

These two offerings are the first two products Cloudera has launched since closing its merger with Hortonworks. Both are being added to the open source Cloudera DataFlow platform, which enables organizations to apply streaming analytics to data in motion at scale. As that platform began to be widely deployed, it became apparent that IT organizations also needed a framework to manage data residing at the IoT edge, says Vikram Makhija, vice president and general manager for cloud at Cloudera.

Cloudera Edge Management enables edge devices to collect data using lightweight agent software deployed on each device. Once that agent software is installed, it becomes possible to control both those devices and the data they generate using a set of visual Edge Flow Manager tools. That same framework can also be employed to distribute artificial intelligence (AI) models out to those IoT edge devices, adds Makhija.

Meanwhile, Cloudera Flow Management provides a data ingestion engine that can span as many as 300 processors. That offering also incorporates an instance of the open source Apache NiFi registry to provide a means for managing how that data will actually flow.

As more devices get connected to the Internet, it’s clear organizations will need a different approach to managing data, both as it resides on those platforms and as it streams into IoT gateways, on-premises IT environments and public clouds.

As adoption of IoT accelerates, Makhija says it is clear IT organizations are starting to appreciate the fact that IoT and edge computing in general are fundamentally different from other classes of IT projects.

“The skills are very different,” says Makhija.

In fact, that skills shortage is one of the primary obstacles an organization needs to overcome when building and deploying IoT applications. IT organizations need to be smart about what data needs to be processed at the edge in near real time versus streamed back to, for example, analytics applications running in a centralized data warehouse. That skills requirement complicates an already fierce debate occurring between IT and operational technology (OT) departments over which organization might be better suited to manage an IoT project. But as it turns out, both departments are often short of the skills and experience required to succeed.

In many ways, IoT applications represent the most complex set of distributed instances of computing any organization is likely to attempt to build. It’s relatively simple to connect a few devices to a single database. But once those devices and databases start to multiply, it quickly becomes apparent that managing all the components of a distributed IoT application at scale can be a major challenge.

At the same time, however, those IoT applications represent an opportunity to extend the value of investing in IT into a broad range of new use cases that can be applied in almost any vertical industry. Arguably, IT investments have focused broadly on two areas. The first is increasing the productivity of the individual worker, while the second is automating back-end processes. With the rise of IoT, the size and scope of any IT project has expanded considerably. The real challenge will be determining what data needs to be processed at the edge to drive a specific process in real time versus what aggregate of that data needs to be analyzed by a back-end system. Of course, the analytics results generated by the back-end system then need to be fed back out to the edge to optimize a process. Creating that virtuous cycle of distributed computing will require a very keen understanding of precisely what data needs to be where and when.
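To make the edge-versus-back-end split concrete, here is a minimal Python sketch. It is not based on any Cloudera API. The `Reading` type, its `latency_sensitive` flag, and the `route` and `summarize` helpers are all hypothetical names introduced for illustration, and the sketch assumes each reading already carries a marker saying whether it drives a real-time process.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """One sensor reading (hypothetical shape, for illustration only)."""
    device_id: str
    value: float
    latency_sensitive: bool  # assumed flag: True if it drives a real-time process


def route(readings):
    """Split readings into those handled at the edge and those sent upstream."""
    edge = [r for r in readings if r.latency_sensitive]
    backend = [r for r in readings if not r.latency_sensitive]
    return edge, backend


def summarize(readings):
    """Aggregate readings per device so only a summary streams to the back end."""
    by_device = {}
    for r in readings:
        by_device.setdefault(r.device_id, []).append(r.value)
    return {device: mean(values) for device, values in by_device.items()}


if __name__ == "__main__":
    sample = [
        Reading("pump-1", 1.0, False),
        Reading("pump-1", 3.0, False),
        Reading("valve-7", 9.5, True),
    ]
    edge, backend = route(sample)
    print("process at edge:", [r.device_id for r in edge])
    print("send upstream:", summarize(backend))
```

In practice the routing rule would rarely be a single flag. It would more likely depend on latency budgets, bandwidth costs and the results fed back from the back-end analytics, which is exactly the judgment the article argues requires scarce skills.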
