As Apache Kafka-driven projects become more complex, Hortonworks aims to simplify them with its new Streams Messaging Manager (SMM).
About the only thing harder than setting up a real-time streaming analytics application based on open source Apache Kafka software is arguably managing and securing it. To address those issues, Hortonworks has unveiled Streams Messaging Manager (SMM), an open source monitoring and management tool for Kafka environments.
The goal is to give IT operations teams visibility into Kafka environments that they currently cannot see into because they lack the tools, says Hortonworks’ Jamie Engesser, vice president of product management. Today those teams are unable to see, for example, who is employing Kafka to publish services, who is consuming those services or where there might be bottlenecks, says Engesser.
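As a rough illustration of the kind of bottleneck metric a monitoring tool like SMM surfaces, consider consumer lag: the gap between the latest offset producers have written to a partition and the offset a consumer group has committed. The sketch below uses a hypothetical helper, `partition_lag`, with hard-coded offsets; in a live cluster those numbers would come from Kafka's admin APIs.

```python
# Hypothetical sketch: computing per-partition consumer-group lag,
# the basic "bottleneck" signal a Kafka monitoring tool exposes.
# Lag = latest produced offset minus the group's committed offset;
# a steadily growing lag means a consumer is falling behind.

def partition_lag(end_offsets, committed_offsets):
    """Return {partition: lag} for a consumer group.

    end_offsets: {partition: latest offset written by producers}
    committed_offsets: {partition: last offset the group committed}
    (In practice both maps would be fetched from the Kafka cluster.)
    """
    return {
        p: end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    }

# Example with assumed offsets: partition 1 is 500 messages behind,
# flagging it as a likely bottleneck.
lags = partition_lag({0: 1000, 1: 1500}, {0: 1000, 1: 1000})
print(lags)  # {0: 0, 1: 500}
```

The same per-partition numbers are what operators would otherwise assemble by hand from Kafka's command-line tooling; a dashboard simply aggregates them across topics and consumer groups.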
“There’s no visibility,” he says.
He notes that monitoring and management tools are needed not just to optimize tuning, replication and synchronization but also to track the lineage of data from the network edge to the cloud. As compliance mandates around the world become more challenging to meet, IT operations teams need to be able to audit where data has been streaming.
SMM addresses that issue by providing monitoring and management tools that can reach all the way out to the network edge, says Engesser. In fact, he says that 30 percent of the support revenue Hortonworks now generates stems from distributed big data applications that reach out to the network edge.
Apache Kafka external support needed
Engesser says Hortonworks is betting that as the complexity challenges surrounding Hadoop and Kafka become more apparent, it’s only a matter of time before IT organizations rely more on external support.
Hortonworks is also moving to make it simpler to manage data flows across instances of the Hadoop distribution it curates. Version 3.2 of Hortonworks DataFlow adds tighter integration with version 3.0 of the Hortonworks Data Platform (HDP). The additional capabilities include enhanced resiliency to smooth workflows across large clusters, support for version 3.0 of the Apache Hive data warehouse software and more granular control over multitenant environments using Kerberos keytab isolation.
In general, IT organizations are being challenged to manage data at scale at a time when investments in artificial intelligence (AI) applications are starting to escalate. The machine and deep learning algorithms that drive the models on which those AI applications depend require access to massive amounts of data for training. Engesser says big data platforms such as Hadoop provide a mechanism not only to centrally manage that data but also to score data collected via the open source TensorFlow machine learning framework at the edge of the network.
There’s no doubt that the IT operations issues surrounding big data are becoming more challenging with each passing day. Data now flows in and out of modern data warehouses to drive a new generation of real-time analytics applications. Instead of analyzing a sample of the data an organization collects, it’s now feasible not only to analyze all the data an organization owns but also to stream external data into the data warehouse and correlate it against multiple sources.
The paradox behind managing all that data to drive AI models is that the sheer volume of data that needs to be managed will inevitably require IT administrators to rely more on AI technologies to manage it at scale.