Rockset Bridges Divide Between SQL and Kafka

While real-time analytics is clearly on the rise, no one should expect the need for batch-oriented data warehouses to decline any time soon.

SQL may be the lingua franca for querying data but the need to format data in a way SQL can consume has been a limitation as far as generating real-time insights from streaming data is concerned.

Now Rockset is connecting to the open source Kafka distributed streaming event processing platform to make SQL tables based on raw data available in seconds.

See also: The Case for Continuous Intelligence

Rockset makes use of a Converged Indexing capability coupled with a Distributed SQL Processing Engine to filter, aggregate and create joins across different datasets from different sources in milliseconds without any upfront schema definitions required, says Rockset CEO Venkat Venkataramani.

Those SQL tables can then be exposed through a JDBC interface to various dashboards, including Tableau, Apache Superset, Redash, and Grafana. IT organizations can also join Kafka event streams with data residing in data stores such as Amazon DynamoDB, Amazon Kinesis, Amazon S3, and Google Cloud Storage. Optionally, IT organizations can also access those SQL tables via application programming interfaces (APIs) that Rockset has created.
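To make the idea concrete, here is a minimal sketch of the kind of query described above: raw events with no declared schema are loaded into a SQL table and joined with reference data from another store. It does not use Rockset or Kafka. An in-memory SQLite database stands in for the SQL backend, and the event fields, the `users` table, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw events, as they might arrive on a Kafka topic.
# No schema is declared up front; the first event has no "amount" field.
events = [
    {"user_id": 1, "action": "click"},
    {"user_id": 2, "action": "purchase", "amount": 30.0},
    {"user_id": 1, "action": "purchase", "amount": 12.5},
]

# Hypothetical reference data that might live in a separate data store.
users = [(1, "EU"), (2, "US")]


def build_db(events, users):
    """Load events and reference data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Infer the column set from the events themselves (sketch only:
    # the keys are trusted here and interpolated directly into the SQL).
    columns = sorted({key for event in events for key in event})
    conn.execute(f"CREATE TABLE events ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO events VALUES ({placeholders})",
        [tuple(event.get(col) for col in columns) for event in events],
    )
    conn.execute("CREATE TABLE users (user_id, region)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    return conn


def revenue_by_region(conn):
    """Filter, join, and aggregate: purchase revenue per user region."""
    query = """
        SELECT u.region, SUM(e.amount) AS revenue
        FROM events AS e
        JOIN users AS u ON e.user_id = u.user_id
        WHERE e.action = 'purchase'
        GROUP BY u.region
        ORDER BY u.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_db(events, users)))
```

The missing `amount` field in the first event is simply stored as NULL, which is one simple way to tolerate events of varying shape. How Rockset itself handles this is not described in the article.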

Those capabilities will enable organizations to employ familiar SQL tools against data the moment it is captured by Kafka, notes Venkataramani. The issue organizations struggle with today is building the complex data pipelines required to create schemas for data that then needs to be loaded into a database, which means hours can go by before data can be operationalized.

That’s typically not what businesses want from a real-time analytics application. Most businesses are under pressure to make better decisions faster, so there is a clear need for a faster way to create SQL tables, says Venkataramani.

Confluent, which spearheads the development of Kafka under an Apache license, recently added the ability to control Kafka Connect connectors directly from SQL along with a new pull query feature that allows users to look up values from tables within KSQL. Complementing KSQL’s stream processing capabilities, Rockset provides a SQL analytics backend for ad hoc queries and BI dashboards on Kafka data.

While real-time analytics is clearly on the rise, Venkataramani cautions that no one should expect the need for batch-oriented data warehouses to decline any time soon. Rather, use cases for batch-oriented and real-time analytics will continue to evolve in a complementary fashion, says Venkataramani.

“I think batch will never go away,” says Venkataramani.

What is clear is that while real-time analytics will enable new use cases that would not previously have been possible, there’s also a subset of batch-oriented analytics processes that will no longer be necessary to run. The challenge IT organizations face now is striking a balance between emerging real-time analytics and legacy analytics platforms.

In the meantime, business leaders are starting to have a greater appreciation of what’s possible thanks to the rise of real-time platforms such as Kafka. Rather than being viewed as a burden to be borne by the IT department, data is becoming an asset that businesses seek to exploit. Business leaders may not appreciate all the time and effort required to master a platform such as Kafka, but they do intuitively understand that if their organization doesn’t find a way to exploit data, it’s only a matter of time before their rivals do.
