Streamlio’s co-founder and chief product officer talks about the next generation of stream processing and working with Apache.
Our “6Q4” series features six questions for the leaders, innovators, and operators in the real-time analytics arena who are leveraging big data to transform business as we know it.
RTInsights sat down with Streamlio’s Karthik Ramasamy to talk about why it may finally be stream processing’s time to take center stage and break out of its traditional niches, and about how Streamlio is working with the Apache community.
Q1: Although stream processing and messaging have been around for a while, they’ve only recently become hot technologies. What’s changed?
The “big data” hype of the last decade focused on how to accumulate ever-larger amounts of data. Meanwhile, other technology developments were making it steadily easier to generate and handle data as a stream, from application logs to events generated by cloud services, user interactions, IoT devices, and many other sources.
Converting that data into batches is inefficient, and scaling “big data” solutions further to accommodate it brings escalating complexity and operational challenges. So companies are shifting their focus to understanding and acting on data as quickly as possible. That has led to renewed interest in stream processing and to a new generation of open source technologies that lower the barriers to trying stream processing and relieve pressure on traditional data pipelines and repositories.
Q2: How did your team decide which new technologies were needed for streaming and messaging?
When we asked companies how they were approaching the need to incorporate this “fast data” using data-driven and streaming approaches, we found them struggling to piece together a zoo of technologies: data collectors, replication technologies, messaging technologies, queuing solutions, stream processing engines, data lakes, and more.
They had been forced to do that because each of the technologies available to date addressed only one part of the problem in isolation, leaving it to the user to figure out how to fit the parts together. Seeing that challenge stall or derail these projects, we realized that a fundamentally new approach was needed, one based on new technology designed from the start to deliver an integrated solution.
Q3: Is stream processing a technology niche, or is it something that you see being more broadly adopted?
To date, stream processing has clearly been limited to a niche, largely financial services and a few other domains, such as online advertising, where speed of response was directly tied to revenue. However, we’re now seeing it finally break out of that niche, both because companies recognize that speed of action on data has become critical to differentiation and because streaming data sources, internal and external, have become available to almost any business.
Q4: What are some examples of new and interesting applications and use cases that you’re seeing?
Beyond the well-known use cases in processing financial transactions or deciding which online advertisements to display, new use cases are emerging in a broad range of environments. For example, we’ve talked with companies in the energy sector that are building technology to help businesses understand and optimize their energy consumption.
We’ve also seen companies developing new service platforms on top of cellular networks, 5G in particular, that will deliver interactive experiences to people wherever they are. More generally, we’re seeing a growing number of solutions for real-time customer interaction that rely on stream processing and analytics to make them possible.
Q5: You worked with the Apache community to help create open source technologies like Apache Pulsar, Apache Heron, and Apache BookKeeper. What can we expect next?
We’re seeing a surge of interest in ways to make data accessible even before it lands in a static data repository such as a database or data lake. That’s driving a need for frameworks and interfaces that are both familiar and easy to integrate, whether SQL access to stream data or Python access for transforming data streams. We’re also seeing people think about the intersection of analytics and streaming data, which is driving work to help companies deploy analytics that operate on data in motion.
Q6: Describe Streamlio’s involvement with the Pulsar project.
Streamlio was founded by some of the key architects of the streaming and real-time platforms at Twitter and Yahoo to bring to market the technologies developed inside those companies. We continue to be very active contributors to the ongoing development of Apache Pulsar, which recently became a top-level Apache project. We also offer support and services to organizations that want to use Pulsar for their streaming data projects, both on-premises and in cloud environments.