How long does it take for a generative AI tool like ChatGPT or Bard to respond to our prompts? Less than a second? Is that real-time? No matter how you define real-time, the speed at which we can get answers from a complex LLM trained on massive amounts of data is astonishing. In fact, it’s hard to think about the responsiveness of generative AI models without considering real-time data processing.
As artificial intelligence is used more frequently to automate workflows that can’t pause for a couple of minutes awaiting the next instruction or operation, we see real-time in play again. Real-time applications and experiences are becoming commonplace thanks to a surge in the adoption of a complex and demanding technology stack.
At last year’s Current event, the conversation focused on streaming versus batch data processing.
Current 2023, however, is catching us at a moment when technology leaders and innovative organizations can look at much more of the data workflow in a real-time frame. Technology exists for the real-time collection, transformation, and ingestion of data through to the visualization and consumption of data by other analytics and reporting tools and applications.
Here, we’ve collected the forward-looking thoughts of some of the leaders in the real-time analytics space participating in Current 2023. We asked them two questions looking forward to 2024:
- What will be the top trend in real-time use cases or the supporting technology stack?
- How will the wide-scale adoption of generative AI impact streaming analytics?
You’ll recognize some company names, but others will be new to you since this space is rife with startups stepping up to the challenge of delivering real-time capabilities to enterprises.
After reading them, you’ll see the dramatic technology shifts underway and the opportunities that arise from increasing the speed of business to real-time.
Reduce real-time complexity to increase value and fuel GenAI
Eric Sammer, CEO, Decodable
If 2023 has been about people really understanding what AI can do for them, 2024 is going to be all about launching those features and services, especially in the context of real-time, online, user-facing applications.
For real-time data infrastructure itself, customers are pushing hard for reduced complexity and faster time to value. That’s driving a pushback toward integrated platforms that help teams do more in less time, and away from a stack of point solutions that require large teams to build around and maintain.
I think we’ll see streaming processing and analytics impacting generative AI rather than the other way around. Stream processing is already driving online feature extraction, data cleansing and normalization, enrichment, and anonymization of sensitive data being pumped into models that power online applications. This is only going to become more critical as generative AI is more tightly integrated with critical systems.
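The kind of per-event cleansing, normalization, and anonymization described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not any vendor’s implementation, and the event field names are hypothetical:

```python
import hashlib

def clean_and_anonymize(events):
    """Per-event cleansing, normalization, and anonymization for a
    stream feeding a model. Field names are hypothetical."""
    for event in events:
        # Cleansing: drop malformed events early.
        if "user_id" not in event or "amount" not in event:
            continue
        yield {
            # Anonymization: replace the identifier with a one-way hash.
            "user_hash": hashlib.sha256(str(event["user_id"]).encode()).hexdigest()[:16],
            # Normalization: coerce free-form fields into consistent shapes.
            "amount": round(float(event["amount"]), 2),
            "country": str(event.get("country", "unknown")).strip().lower(),
        }

stream = [
    {"user_id": 42, "amount": "19.999", "country": " US "},
    {"amount": 5},  # malformed: no user_id, dropped
]
cleaned = list(clean_and_anonymize(stream))
```

In a real pipeline this transform would run continuously inside a stream processor rather than over an in-memory list.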
Turning to Private Clouds for Real-Time Infrastructure
Hojjat Jafarpour, Founder and CEO, DeltaStream
One trend we have seen growing, not only in real-time use cases but across the wider spectrum of data use cases, is demand for private SaaS, also known as Bring Your Own Cloud (BYOC). Private SaaS helps with data security and compliance and can also reduce costs.
Access to data, especially fresh data, has a crucial role in many Generative AI applications, and we believe this will drive demand for streaming and real-time analytics.
Models and GenAI will enhance the accuracy of real-time decision-making
Ken Exner, Chief Product Officer, Elastic
Real-time data analytics will be enhanced by AI and machine learning models that can make instant decisions based on incoming data streams. These models will be used in predictive maintenance, fraud detection, recommendation systems, and more. On the security side, AI-powered systems will analyze network traffic and system logs in real-time to detect and respond to security breaches.
Generative AI will have several significant impacts on streaming analytics, but one of the most notable and already visible is its use in predictive analytics and anomaly detection. GenAI enhances predictive analytics and anomaly detection by generating synthetic data, filling gaps, simulating scenarios, and improving model training. This helps with data scarcity and improving the accuracy of models for real-time decision-making.
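As a minimal illustration of real-time anomaly detection on an incoming stream, a rolling z-score check flags values far from the recent mean. The window and threshold below are illustrative choices, not anything specific to Elastic’s products:

```python
from collections import deque
import math

def zscore_anomalies(values, window=20, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the
    rolling mean of the last `window` points."""
    recent = deque(maxlen=window)
    flags = []
    for v in values:
        if len(recent) >= 2:
            mean = sum(recent) / len(recent)
            var = sum((x - mean) ** 2 for x in recent) / len(recent)
            std = math.sqrt(var)
            flags.append(std > 0 and abs(v - mean) > threshold * std)
        else:
            flags.append(False)  # not enough history yet
        recent.append(v)
    return flags

# The spike at position 7 stands out against the steady baseline.
values = [10, 11, 10, 11, 10, 11, 10, 95, 11, 10]
flags = zscore_anomalies(values, window=5, threshold=3.0)
```

Production systems use far more sophisticated models, but the per-event shape is the same: maintain a small amount of state and decide as each value arrives.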
What is real-time? It’s before data goes stale
David Wang, VP Product Marketing, Imply
The big topic in streaming today is about leveraging that fast-moving, high-velocity data in real-time before it goes stale. Companies like Lyft, Pinterest, and Reddit have built a real-time data architecture using Apache Kafka, Flink, and Druid. This open-source architecture delivers the scale, freshness, and reliability needed to handle the full gamut of real-time applications.
Developer productivity is critical as real-time app services explode and streaming data feeds GenAI
Guillaume Aymé, CEO, Lenses.io
Event-driven app development is moving from a niche practice to broad adoption across many engineering teams within a business. These organizations are now re-architecting their most critical and core business services to respond to real-time data.
With this comes an explosion in the population of developers building real-time services. As a result, we’ve seen business concerns shift from reducing the operational burden of managing real-time data infrastructure (by moving to the cloud) to ensuring developer productivity and data governance.
We’re seeing GenAI projects emerge as a major use case for streaming data and infrastructure, adding huge urgency to projects. Businesses are looking at ways of integrating new sources of data into Large Language Models via streaming data to ensure the freshness of the models. In particular, we’ve noticed organizations with huge sources of unstructured data previously held only in cloud storage, such as AWS S3, now integrating it as live streams of data via solutions such as Apache Kafka to feed downstream Generative AI models. For example, an airline may want to integrate many complex data sources related to the scheduling of its own flights, as well as those of other carriers, into its AI models to drive automation and offer new levels of assistance to operations staff as well as directly to customers. This data is required in real-time to offer any value.
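A sketch of the first step of that kind of integration: turning stored documents (e.g., objects pulled from S3) into keyed, JSON-encoded messages ready for a streaming platform. The topic and field names here are hypothetical, and the actual producer send is omitted since it needs a live cluster:

```python
import json

def to_stream_messages(documents, topic="flight-ops-documents"):
    """Convert stored unstructured documents into keyed, JSON-encoded
    messages for a streaming platform. Topic and fields are hypothetical."""
    messages = []
    for doc in documents:
        messages.append({
            "topic": topic,
            # Key by object path so updates to the same document land
            # on the same partition, preserving per-document ordering.
            "key": doc["object_key"].encode("utf-8"),
            "value": json.dumps({
                "source": "s3",
                "object_key": doc["object_key"],
                "text": doc["text"],
            }).encode("utf-8"),
        })
    return messages

docs = [{"object_key": "schedules/2023-09-25.txt",
         "text": "Flight AA12 delayed 40m"}]
msgs = to_stream_messages(docs)
```

With a live cluster, each message would then be handed to a producer, e.g. with confluent-kafka: `producer.produce(m["topic"], key=m["key"], value=m["value"])`.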
Consumers demand real-time, and GenAI will supplant streaming analytics
DeVaris Brown, CEO and Co-founder, Meroxa
There are a bunch of trends that are pushing the need for real-time data systems: IoT and edge computing, autonomous vehicles, real-time collaboration for remote work, preventative monitoring for healthcare, and fraud detection in finance. All of these are driven by consumer demand. With more data being generated by faster networks and more devices being used by consumers, companies will need to adopt a real-time data strategy for personalization and monitoring to ensure customer satisfaction.
Streaming analytics will become trivial going forward because of generative AI. You will be able to use natural language to do ETL instead of having to Frankenstein a bunch of different point solutions together. Additionally, generating automated insights using LLMs is now possible. This will reduce the need to have specialized talent to make sense of the data for KPI dashboards, observability alerting, etc.
Rich insights call for real-time and historical data
Vinoth Chandar, Founder and CEO, Onehouse
Streaming data is becoming mainstream. Now it’s time to merge fresh data with historical data, in near real-time and real-time, making it available for a wide range of analytics technologies – from business intelligence dashboards to AI and machine learning models. This requires a radical shift away from the batch processing models prevalent in data lakes and data warehouses toward more incremental data processing approaches while using open, interoperable technologies to future-proof organizations’ stacks.
Real-time data, and GenAI even more so, still require quality and governance
Dr. Tendü Yoğurtçu, CTO, Precisely
Real-time data quality, such as cleansing and standardizing data as it is streamed, will be a major trend to support growing volumes of data and an increased focus by businesses on streaming analytics.
Data privacy and security, such as anonymization and masking of data as it is streamed, will be another major trend supporting growing requirements around compliance and risk minimization.
To enable better business insights, organizations will need to utilize their own data as well as third-party datasets, enriching streamed data with attributes such as demographics, location-based risk, and consumer behavior.
The wide-scale adoption of GenAI will accelerate streaming analytics on unstructured data. Common use cases include anomaly detection on text, PII detection, and sentiment analysis. Many solutions already leverage Natural Language Processing (NLP). However, GenAI can democratize access to these solutions for organizations that do not necessarily have strong data science teams.
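As a toy illustration of masking PII in a text field as it streams through, here are two regex-based detectors. The patterns are deliberately simple; real deployments use far more robust detection than this sketch:

```python
import re

# Simplified patterns for two common PII types (illustrative only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text):
    """Mask emails and US SSNs in a text field as it streams through,
    leaving the rest of the record intact."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

masked = mask_pii("Contact jane.doe@example.com, SSN 123-45-6789, re: order 7781")
```

The same function can sit inside a stream processor so every record is masked before it reaches downstream analytics or model training.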
Interactive interfaces and GenAI keep humans in advanced business applications and streaming analytics
Rayees Pasha, Head of Product, RisingWave Labs
As real-time use cases mature, we predict that the user focus will move from simple automated machine-based decisioning to advanced business applications that require interactive interfaces. The new use cases will span multiple domains, with FinTech, Web3, and real-time AI leading the way. Case in point: one of our customers is building a real-time analytical capability for timely insights, performing complex analysis of on-chain data and presenting results to users with sub-millisecond latency.
We believe generative AI will power a new class of streaming analytics applications. One fast-emerging use case is data enrichment. With widespread AI/ML adoption, model explainability becomes critical. Generative AI output can help address that by serving as a data enrichment source via streaming analytics. For example, if a stock analyst needs to explain predictions, streaming analytics, along with annotations from GenAI, can help explain stock price trends to an end user.
Visibility and Governance for event streams
Jonathan Schabowsky, Field CTO, Solace
Visibility and governance of real-time event streams will be a top trend. For example: How do you identify event streams that are already available in your organization? How do you make them easily visible to developers for re-use within new applications? How do you lifecycle-manage and govern these streams, similar to how REST APIs have long been governed? Tools will emerge to tackle these challenges.
Four Scenarios for streaming analytics and GenAI convergence
Sijie Guo, Founder and CEO, StreamNative
With the rise of AI, data volumes will increase as generative AI models produce vast amounts of content, including text, images, videos, and more. This also poses new challenges in terms of data integrity, ethics, and infrastructure. So there is stronger demand for streaming data platforms like Apache Pulsar and StreamNative to provide real-time messaging and streaming capabilities.
The potential wide-scale adoption of generative AI by 2024 will have multifaceted impacts on streaming analytics. Here’s a breakdown of potential scenarios and repercussions:
- Increased Data Volumes: Generative AI models can produce a vast amount of content, including text, images, videos, and more. Streaming analytics platforms will need to scale efficiently to handle this influx of data in real-time.
- Data Quality & Authenticity: Generative AI models, especially those like GPT and DALL·E from OpenAI, can create synthetic data that’s often indistinguishable from real data. This can lead to questions of data authenticity, making real-time data verification and validation even more crucial.
- Infrastructure Evolution: Streaming analytics infrastructure will have to adapt to support AI workloads better. This might involve more GPU-oriented processing, lower latency responses, and better integration with AI platforms.
- Continuous Model Training: Generative AI models, especially when used in real-time applications, will need continuous training and updating. Streaming analytics can provide insights into model performance, helping data scientists make real-time adjustments.
How to deliver contextual insights in real-time for GenAI use cases
Alok Pareek, Co-founder and Executive Vice President of Engineering and Products, Striim
Every major enterprise is striving to be data-driven, and this requires consistent, high-quality, and enriched business data for intelligent decision-making. In 2024, we expect an acceleration in the adoption of transactional change streams in analytical and AI applications. These CDC and continuous EL (Extract and Load) streams will serve as the backbone for the data mesh pattern as different teams deliver data products in a decentralized manner that need to be deployed with consistent real-time data. Transformations will shift to the data lake or lakehouse, which will serve as a real-time staging tier.
Streaming analytics will play a prominent role in Generative AI applications because of its functional capabilities in three areas:
- First, adding real-time context for prompt engineering to deliver richer personalized interactions.
- Second, faster aggregation of event data to deliver baseline facts with a higher frequency to increase data quality in generative AI use cases.
- The third and final area is to create vector encodings in real-time.
As models become more prevalent, streaming analytics will additionally play a role in incremental online model refreshes.
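The third capability, creating vector encodings in real-time, can be illustrated with a toy, dependency-free feature-hashing sketch. Production systems would call an embedding model per event; this only shows the per-event shape:

```python
import hashlib

def hash_embed(text, dim=8):
    """Toy real-time vector encoding via feature hashing: each token
    bumps one of `dim` buckets. Illustrative only; real pipelines call
    an embedding model instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    # L2-normalize so vectors are comparable across event lengths.
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec

vec = hash_embed("payment declined for card ending 4242")
```

Encoding each event as it arrives, rather than in nightly batches, is what keeps a vector store fresh enough to serve GenAI retrieval use cases.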
Real-time ML will deliver context for AI applications
Mike Del Balso, CEO and Co-founder, Tecton
Real-time machine learning will really take off as a differentiating capability. It will not only become increasingly important to incorporate real-time context into AI applications but also to have the right tools to support easy creation, operation, and updates to real-time AI applications as well. Real-time feature platforms will become very widely adopted to unlock this capability.
We’ll begin to see LLMs being used to operate on event streams to support various use cases, from early issue detection to general pattern and feature extraction. LLMs will also become faster for inference, and we’ll see them increasingly used in streaming and real-time applications.
Real-time leverages open-source infrastructure (and the AI baby needs a baby monitor)
Michael Benjamin, Chief Growth Officer, Timeplus
Our market continues to optimize costs and mitigate risks, delivering value with continuous intelligence and awareness with continuous query processing. But we see two enhancements in 2024: better intelligence powered by machine learning and broader access with open source and AI adoption. One already sees more fully managed services, Python library integrations, low code, and AI-powered sidekicks. Real-time AI and streaming queries from natural language will continuously deliver insights and quickly identify and eliminate issues.
The widespread adoption of ChatGPT made many of us “first-time parents” to a unique, highly imaginative child. Unlike previous AI debuts (such as Tay in 2016), there have not yet been major rogue events – bots aren’t making medical diagnoses or advising investors (despite what some want). But a rogue event seems inevitable. As it has done before, streaming analytics will yield continuous awareness, this time of human-to-AI interactions, to improve the guardrails. The baby needs a baby monitor.