“If your organization benefits from distributed data sources, it’s not a question of whether to perform edge processing, the question is how.”
The growth of the Internet of Things – and the demands IoT data is placing on both corporate and cloud capacity – is giving rise to a new way of handling real-time data, referred to as edge analytics. In the following Q&A, Jack Norris, senior vice president of data and applications for MapR, discusses the opportunities and challenges of this new model of computing.
RTI: What applications are best suited for edge analytics?
Norris: The use cases for edge analytics continue to grow with the proliferation of IoT sources including consumer wearables, vehicles, industrial equipment, RFID sensors, and more. But it is not simply about the proliferation of these data sources. The value derived from the deployment of these IoT devices depends on the ability to effectively capture and harness the generated data.
Edge computing challenges
RTI: What are the challenges with edge analytics?
Norris: The challenges for edge analytics include the volume of data generated at the edge, which can overwhelm available resources, and the need to coordinate the analysis of IoT data close to the source.
Many core IoT use cases, like vehicles and oil rigs, operate in conditions with limited connectivity, which makes sending massive streams of data back to a central analytics core impractical. The issue is how to support a significant number of locations that are generating high volumes of data in low-bandwidth environments. One option is for organizations to deploy a full-scale, standalone cluster at each IoT site. But this option fails to take full advantage of data from other IoT sites that, taken in aggregate, could yield deeper insights.
Alternatively, organizations could send IoT data directly to a central cluster for processing. But this option is not well-suited to IoT environments with limited connectivity or bandwidth, and also limits the possibility of analyzing IoT data directly at the source.
If your organization benefits from distributed data sources, it’s not a question of whether to perform edge processing; the question is how. Data gravity, latency, costs, and government regulations all drive decisions about where to perform processing. It is not a binary choice of edge versus centralized. The organization that has the flexibility to perform distributed processing within and across edge, on-premises, and cloud environments will have the advantage.
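To make the bandwidth trade-off concrete, here is a minimal sketch of one way an edge site might condense a window of raw sensor readings into a small summary before sending anything upstream. The names `WindowSummary`, `summarize_window`, and the site identifier are hypothetical and chosen only for illustration; they do not come from MapR or any particular product.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class WindowSummary:
    """Compact record an edge site could forward instead of raw readings."""
    site_id: str
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_window(site_id: str, readings: Sequence[float]) -> WindowSummary:
    """Reduce one window of raw readings to a fixed-size summary.

    The raw readings stay at the edge; only the summary would need to
    cross the low-bandwidth link (hypothetical design, for illustration).
    """
    if not readings:
        raise ValueError("readings must not be empty")
    return WindowSummary(
        site_id=site_id,
        count=len(readings),
        minimum=min(readings),
        maximum=max(readings),
        average=mean(readings),
    )


if __name__ == "__main__":
    # One window of pressure readings from a hypothetical oil rig sensor.
    print(summarize_window("rig-7", [10.0, 12.0, 14.0]))
```

However many readings a window holds, the summary stays the same size, which is the property that matters on a constrained link. A real deployment would choose its own statistics and window length.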
Edge computing: security, use cases, and management
RTI: What about security concerns with placing processing near the edge, such as remote locations?
Norris: The reality is that an enterprise-grade edge analytics solution improves the security and reliability of edge deployments. For many applications, the computation and analysis of IoT data close to the source is critical, allowing more efficient and faster decision-making locally while also allowing subsets of the data to be securely and reliably transported to a central location for analytics.
Edge computing allows computation and data processing to take place across a number of connected devices. For example, retail operations, medical equipment, and industrial machinery can all benefit from a lightweight data platform that can effectively handle the complex flow of data from devices and machinery to cloud and on-premises environments. With a distributed system, the information is collected and encrypted at the edge. The edge processing and analytics should be part of a larger connected system rather than a loosely coupled, easily compromised network of independent devices.
For example, by 2020, more than 250 million vehicles will be connected globally, albeit over a range of connection bandwidths and with periodic spans of no connectivity. Connected vehicles need a platform that is secure, scalable, and reliable, that can handle massive volumes at high speed, and that can support diverse analytics and workloads so that many users and applications around the world can leverage car data. With connected cars there is not a single application, but many applications that span the entire vehicle lifecycle.
Another example is hospitals. The typical hospital doesn’t have an Internet connection or bandwidth issue but, because of HIPAA regulations, may want to deploy edge processing capabilities to collect measurements and diagnostic data and to process data flows while leaving specific patient data on site. Centralizing the information can provide better failure analytics to improve medical equipment uptime and patient diagnostics.
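The hospital pattern of keeping patient data on site while sharing equipment data can be sketched roughly as follows. This is an illustrative assumption, not a compliance procedure: the field names in `PATIENT_FIELDS` and the function `split_record` are hypothetical, and which fields count as patient data would have to be decided by the organization itself.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of fields treated as patient-specific and kept on site.
PATIENT_FIELDS = frozenset({"patient_id", "patient_name", "diagnosis"})


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one device record into an on-site part and a shareable part.

    Returns (local, shareable): `local` holds the patient-specific fields,
    `shareable` holds everything else (for example equipment telemetry).
    """
    local = {k: v for k, v in record.items() if k in PATIENT_FIELDS}
    shareable = {k: v for k, v in record.items() if k not in PATIENT_FIELDS}
    return local, shareable


if __name__ == "__main__":
    record = {"patient_id": "p1", "device_id": "mri-2", "error_code": 17}
    local, shareable = split_record(record)
    print("kept on site:", local)
    print("sent to central cluster:", shareable)
```

An allow-list of shareable fields would be a more cautious design than the block-list shown here, since any field not explicitly listed would then stay on site by default.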
RTI: How can multiple edge applications be managed? Are there issues with maintenance, versioning and data cleansing?
Norris: The key to efficient management is to take the notion of a cluster, a distributed set of separate servers managed as a single entity, and extend that beyond the four walls of a data center. Many of the features that make a large cluster easy to manage, such as location awareness, self-healing, and automated data replication, can be extended to the edge. In this way the management of independent edge devices can be streamlined as part of a large distributed cluster.
A related consideration is that data-in-motion (the data that is streaming across locations) and data-at-rest need to be managed and analyzed as one. The coming generation of intelligent applications must be able to react in real time to incoming IoT data while factoring in historical context such as buying patterns, maintenance records, or patient history. As the typical enterprise becomes more distributed, benefits derive from processing closer to the source and being able to act faster where the action is happening.
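As a rough illustration of reacting to streaming data in light of historical context, the sketch below flags an incoming reading that deviates strongly from the stored history for the same device. The z-score rule, the `threshold` default, and the function name `is_anomalous` are assumptions made for this example; the interview does not describe a specific method.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(reading: float, history: Sequence[float], threshold: float = 3.0) -> bool:
    """Return True if `reading` is more than `threshold` standard deviations
    away from the mean of `history` (data-at-rest for the same device).

    With fewer than two historical values there is no meaningful spread,
    so the reading is never flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the history
    if sigma == 0:
        # A perfectly flat history: any different value counts as a deviation.
        return reading != mu
    return abs(reading - mu) / sigma > threshold


if __name__ == "__main__":
    past_temperatures = [70.0, 71.0, 69.0, 70.0]  # hypothetical maintenance log
    for incoming in (70.5, 90.0):
        print(incoming, is_anomalous(incoming, past_temperatures))
```

The point of the sketch is that the decision uses both kinds of data at once: the incoming value is data-in-motion, and the history it is compared against is data-at-rest.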
RTI: Who “owns” the data at the edge, and how is ownership and maintenance of data sources managed?
Norris: There is no one model for data management and ownership at the edge. For many organizations, edge processing will be managed as an extension of a central deployment with the same administrator. Self-healing capabilities and automated data replication will mean that administrative tasks are required only infrequently at the edge. For other use cases, such as edge processing deployed at hospitals, local ownership and control at each hospital are augmented by the shared data that is aggregated across locations for improved efficiencies and intelligence.