Making the most of streaming edge data requires a special infrastructure architecture that accommodates the data throughout its lifecycle.
The widescale use of smart sensors and IoT presents organizations with a wealth of data that can be used in multiple ways to enhance operations, cut costs, improve customer engagement, reduce equipment and service downtime and outages, and more.
However, making the most out of the data gushing off these devices requires a special infrastructure architecture. Specifically, any solution must be able to accommodate the data throughout its lifecycle.
That means a solution must support certain core features. It should be able to:
- Ingest the variety of data available in organizations today. That could include IoT data, sensor data, video data, log files, and more.
- Analyze data in real time or batch and offer alerts based on dynamic indicators or when a measurement exceeds certain thresholds.
- Centralize data from the edge to the core data center or cloud for cross-edge stream analysis or model development and training.
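To make the second capability concrete, the threshold-based alerting described above can be sketched as a minimal stream consumer. This is an illustrative sketch only: the `Reading` and `Alert` record shapes, the field names, and the 85-degree ceiling are hypothetical, not drawn from any particular product.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Reading:
    device_id: str
    temperature_c: float

@dataclass
class Alert:
    device_id: str
    message: str

# Hypothetical operating ceiling; a real deployment would load
# per-device thresholds from a configuration store.
MAX_TEMP_C = 85.0

def threshold_alerts(stream: Iterable[Reading]) -> Iterator[Alert]:
    """Emit an alert for every reading that exceeds the threshold."""
    for reading in stream:
        if reading.temperature_c > MAX_TEMP_C:
            yield Alert(
                reading.device_id,
                f"temperature {reading.temperature_c} exceeds {MAX_TEMP_C}",
            )
```

Because the function is a generator over any iterable, the same logic can sit at the edge (consuming readings from a local device) or in the core (consuming a centralized stream).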
Multiple use cases emerge when data lifecycle management is addressed
Since organizations began deploying sensors and IoT devices, the primary focus has been on acting on the data as it is generated. The process might involve simple monitoring, such as triggering an alert when a device’s temperature exceeds its specified operating range. Or an organization might apply AI to the streaming data as it is generated to spot anomalies.
The analysis might be done on the device itself. And in that case, an alert of something operating out of range or in an unusual manner would be sent to a control system or management platform. Alternatively, the data might be streamed to those systems or platforms and be examined or analyzed there.
In such cases, all that matters is that the data is examined or analyzed as it is generated. Often, that was all the data was used for.
Centralizing data collection
More recently, as the volume of data grew and the ability to apply AI and machine learning became more common, organizations set out to do more with that data. For example, one approach would be to build a machine learning model that looks at the collective data from a sensor or IoT device to identify patterns.
A perfect example is organizations that want to move to predictive maintenance operations. Without data, maintenance teams would routinely inspect and replace parts based on a manufacturer’s specs and timetables. The availability of real-time status data about a device allowed departments to shift to condition-based maintenance. If the data noted departures from the norm (e.g., a device running hot), the team could replace the part before it failed.
When the data is captured and stored, many organizations have moved to predictive maintenance approaches based on ML-trained models of a device’s health. For instance, an ML model might find that if a device’s temperature rises ten degrees in X amount of time, or increases Y percent above the normal operating temperature, there is a high probability of failure within 24 hours. Such models would be trained by analyzing the entire volume of collected data.
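The temperature-rise rule described above can be expressed as a simple check over a window of stored readings. This is a sketch under stated assumptions: the ten-degree rise and the percent-above-normal figures stand in for the article’s X and Y placeholders, and the function and parameter names are hypothetical.

```python
def likely_to_fail(
    temps: list[float],
    window: int,
    rise_threshold: float = 10.0,   # stands in for the article's "X degrees"
    pct_threshold: float = 0.10,    # stands in for the article's "Y percent"
    normal_temp: float = 60.0,      # hypothetical normal operating temperature
) -> bool:
    """Flag a device as at-risk if its temperature rose `rise_threshold`
    degrees within the last `window` readings, or sits `pct_threshold`
    above the normal operating temperature."""
    recent = temps[-window:]
    if len(recent) < 2:
        return False
    if recent[-1] - min(recent) >= rise_threshold:
        return True
    if recent[-1] >= normal_temp * (1 + pct_threshold):
        return True
    return False
```

A trained ML model would replace these hand-set thresholds with values learned from the full history of collected readings, but the input (a window of stored data) and the output (a failure-risk flag) keep the same shape.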
Such applications represent a significant advance over approaches that simply examine or analyze the streaming data as it is generated. Here, the data must be stored in a cloud or on-premises database and made available to the compute engines that train and run the ML model.
Leveraging data for other uses over time
Centralized storage of stream data allows new applications based on access to the data over time. Sticking with the manufacturing example, the IoT data from equipment is typically used by and managed in supervisory control and data acquisition (SCADA) and other operational technology (OT) systems. Increasingly, there is an interest in connecting OT and IT systems.
For instance, an organization might build an application that marries those insights and actions to an ERP system used to order parts. Such an application might ensure that required spare parts are in stock when needed, avoiding unnecessary downtime while waiting for a critical part to ship. It also helps ensure spare parts are not overstocked, reducing the need for excess storage capacity and the chance of holding outdated parts.
Similar needs abound across industries
Manufacturing has been the poster child for streaming data due to the wide-scale adoption of IoT. But many other industries need the same capabilities of ingesting data, analyzing it in real time and over time, using the data to develop ML models, and exploring historic data for insights.
For example, retailers have clickstream and customer interaction data that can be used in multiple ways over time. Real-time analysis of this streaming data could be used to personalize recommendations. The stored data might be used to study purchasing patterns by time of year, customer spending history, or many other parameters. And the collective historic data set could be studied to plan future offerings.
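Studying purchasing patterns by time of year, as described above, only requires grouping the stored history by month. A minimal sketch, assuming purchase records of the form (date, customer ID, amount); the sample records and field layout are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stored purchase history: (purchase date, customer ID, amount).
purchases = [
    (date(2023, 11, 3), "c1", 40.0),
    (date(2023, 11, 21), "c2", 15.0),
    (date(2023, 12, 5), "c1", 90.0),
]

def revenue_by_month(records: list[tuple[date, str, float]]) -> dict[tuple[int, int], float]:
    """Aggregate stored purchase history into per-month revenue totals."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for when, _customer, amount in records:
        totals[(when.year, when.month)] += amount
    return dict(totals)
```

The same grouping pattern works for the other parameters mentioned (per-customer spending history, product category, and so on) by changing the aggregation key.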
Similarly, financial services companies might analyze real-time transactions for signs of fraud, explore a collection of transactions over time to develop models that assess a particular customer’s credit risk, and, over the long term, find patterns that inform new financial offerings appealing to different classes of customers.
In all cases, the same core infrastructure is needed, along with the ability to manage streaming edge data over its lifecycle, from generation to analysis that may happen years later. Such infrastructure must be able to store and move the data across tiered storage; when needed for analysis, the data might be migrated to higher-performance systems to speed computational workflows.
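A tiering decision of the kind described above often reduces to an age-based policy. The sketch below is purely illustrative: the tier names and the 7-day and 90-day cutoffs are hypothetical assumptions, and real systems would also weigh access frequency and pending analysis jobs.

```python
def tier_for(age_days: int) -> str:
    """Hypothetical age-based tiering policy: recent data stays on
    high-performance (hot) storage, mid-age data moves to warm
    storage, and old data is archived to cold storage."""
    if age_days <= 7:
        return "hot"
    if age_days <= 90:
        return "warm"
    return "cold"
```

When an analysis or model-training job is scheduled against an old data set, the lifecycle manager would temporarily promote the needed data back to the hot tier, matching the migration step the article describes.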