IT leaders’ biggest challenge is now figuring out how to construct the DataOps pipelines required to feed data into streaming analytics applications in real time.
If data is the new oil, then data, much like oil itself, is only truly valuable once it’s refined. The challenge facing IT leaders now is figuring out how to construct the DataOps pipelines required to feed data into streaming analytics applications in real time.
Roy Schulte, vice president and distinguished analyst with Gartner and author of “Event Processing: Designing IT Systems for Agile Companies,” says data pipelines are becoming a more strategic concern because it’s now more feasible for organizations to construct various models of events using machine learning and deep learning algorithms. As the number and types of models increase, Schulte says organizations are going to want to swap those models in and out of streaming analytics applications to gain insights into data as it moves across the enterprise.
“You could think of it as A/B testing for models,” says Schulte. “Organizations will be able to switch in and out whichever one works best.”
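The champion/challenger pattern Schulte describes can be sketched in a few lines. This is a minimal illustration, not any vendor’s API: the model interface, the traffic split, and the `promote` step are all assumptions made for the example.

```python
import random

class ChampionChallengerScorer:
    """Hypothetical sketch: route each streaming event to one of two models
    and tag the result, so the models' accuracy can be compared offline."""

    def __init__(self, champion, challenger, challenger_share=0.1):
        self.champion = champion          # model currently in production
        self.challenger = challenger      # candidate model under evaluation
        self.challenger_share = challenger_share  # fraction of traffic for the challenger

    def score(self, event):
        # Randomly split traffic between the two models and record which
        # model produced each score.
        if random.random() < self.challenger_share:
            return {"model": "challenger", "score": self.challenger(event)}
        return {"model": "champion", "score": self.champion(event)}

    def promote(self):
        """Swap the challenger in as the new champion once it proves better."""
        self.champion, self.challenger = self.challenger, self.champion
```

Because the swap is just a reference change inside the scorer, the surrounding streaming application never stops to redeploy when a better model wins the comparison.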
See also: The 5 phases of a big data project
After decades of waiting for the event processing technologies that drive streaming analytics to be widely implemented, Schulte observes that streaming analytics is now going mainstream, in part because organizations need to optimize processes and associated business decisions by processing events in real time. Various types of algorithms are now being applied within streaming analytics applications because the number of factors that need to be considered within a few milliseconds requires artificial intelligence. Humans simply can’t make the required number of calculations in real time.
In fact, Schulte notes this precise scenario has already played out on Wall Street. Most trades today are executed in milliseconds based on events that machine learning algorithms are trained to track. What’s changing is that machine and deep learning models are now being more widely employed as the cost of processing and storing massive amounts of data has dropped. That makes it feasible for more organizations to cost-effectively apply various forms of advanced analytics across the enterprise.
Internet of Things (IoT) projects require similar capabilities. To minimize latency, most IoT applications require real-time analytics to be applied as close as possible to the sensors that collect the data to, for example, trigger an event. That results in streaming analytics software being installed on a local IoT gateway.
Still need to focus on DataOps
The analytics generated locally are then streamed back to an analytics application hosted in a cloud. That approach allows patterns and insights to be generated using data produced by thousands of IoT gateways and endpoints without incurring the massive costs associated with transferring raw data over a wide area network (WAN).
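The economics of that approach can be sketched in a few lines: a gateway summarizes each window of raw sensor readings locally and forwards only a compact summary to the cloud. The window size and the summary fields below are illustrative assumptions, not a specific gateway product’s behavior.

```python
class EdgeAggregator:
    """Hypothetical sketch of edge aggregation on an IoT gateway: raw
    readings stay local, and only one small summary record per window
    needs to cross the WAN."""

    def __init__(self, window_size=1000):
        self.window_size = window_size
        self.buffer = []  # raw readings held on the gateway

    def ingest(self, reading):
        """Accumulate a raw reading; emit a summary once per full window."""
        self.buffer.append(reading)
        if len(self.buffer) < self.window_size:
            return None  # nothing sent upstream yet
        summary = {
            "count": len(self.buffer),
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": sum(self.buffer) / len(self.buffer),
        }
        self.buffer = []
        return summary  # only this compact record is streamed to the cloud
```

With a window of 1,000 readings, the cloud application receives one record where it would otherwise receive a thousand, which is where the WAN savings come from.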
Schulte says while much of the focus today is on the data architects needed to craft predictive analytics infused with AI capabilities, organizations will still need to hire and train data engineers capable of constructing the data pipelines required to feed analytics applications employing multiple types of models. Unfortunately, Schulte says, too many IT organizations underappreciate the DataOps expertise required to construct those pipelines, which is likely to result in longer times to generate a return on investments in advanced analytics.
None of this means investments in analytics running on traditional data warehouse platforms are going to decline. There will always be a need to analyze events after the fact. Schulte concedes that it surprises him that streaming analytics is turning out to be a killer application for event processing after all these years. But Schulte says the rise of streaming analytics to provide insights into data in motion validates the value of developing event processing expertise like never before.