Real-time Analytics News for the Week Ending May 4


In this week’s real-time analytics news: LinkedIn announced the launch of LakeChime, a new data trigger service designed to streamline data management within modern data lakes.

Keeping pace with news and developments in the real-time analytics market can be a daunting task. We want to help by providing a summary of some of the important real-time analytics and AI news items our staff came across this week. Here is our list:

LinkedIn announced the launch of LakeChime, a new data trigger service designed to streamline data management within modern data lakes. LakeChime addresses significant infrastructure challenges associated with managing large-scale data by offering a unified solution that simplifies data triggers across both traditional and modern table formats. LakeChime supports both kinds of data triggers: classical partition triggers, which kick off workflows when new partitions become available, and modern snapshot triggers, which kick off workflows when new data snapshots land. Additionally, LakeChime is backed by an RDBMS, making it well suited to handling large-scale data triggers in very large data lakes.
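LakeChime's API is internal to LinkedIn, but the two trigger styles the announcement describes can be sketched abstractly. In the toy Python sketch below, every name is a hypothetical stand-in, not LakeChime's actual interface:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataLake:
    """Toy stand-in for a lake that emits partition and snapshot events."""
    partition_triggers: list = field(default_factory=list)
    snapshot_triggers: list = field(default_factory=list)
    fired: list = field(default_factory=list)

    def on_partition(self, table: str, fn: Callable):
        # Classical partition trigger: fire when a new partition lands.
        self.partition_triggers.append((table, fn))

    def on_snapshot(self, table: str, fn: Callable):
        # Modern snapshot trigger: fire when a new table snapshot is committed.
        self.snapshot_triggers.append((table, fn))

    def add_partition(self, table: str, partition: str):
        for t, fn in self.partition_triggers:
            if t == table:
                self.fired.append(fn(partition))

    def commit_snapshot(self, table: str, snapshot_id: int):
        for t, fn in self.snapshot_triggers:
            if t == table:
                self.fired.append(fn(snapshot_id))

lake = DataLake()
lake.on_partition("events", lambda p: f"backfill workflow for partition {p}")
lake.on_snapshot("events", lambda s: f"incremental workflow for snapshot {s}")

lake.add_partition("events", "ds=2024-05-01")
lake.commit_snapshot("events", 42)
print(lake.fired)
```

The point of a unified service is that both styles register through one place, so a workflow can depend on partition availability, snapshot availability, or both.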

The Apache Software Foundation (ASF) announced Apache Hive 4.0. For over a decade, Apache Hive has been the cornerstone of data warehouse and data lake architectures, empowering organizations to perform analytics at scale while seamlessly managing vast amounts of data through SQL queries. Key highlights of Apache Hive 4.0 include:

  • Hive Iceberg Integration: Streamlines data management with seamless integration of Apache Iceberg tables.
  • Improved Transaction and Locking Capability: Enhances Hive’s ACID compliance with improved transaction handling and locking mechanisms.
  • Table Maintenance: Introduces compaction mechanisms for both Hive ACID and Iceberg tables to optimize storage and performance.
  • Hive Docker Support: Simplifies deployment with official Apache Hive Docker images for easier setup and configuration. Explore the Docker images on Docker Hub for seamless deployment.
  • Compiler Improvements: Anti-join support, branch pruning, column histogram statistics, HPL/SQL support, scheduled queries, new and improved cost-based optimization (CBO) rules leading to better query plans.
  • Materialized Views Support: Enables the creation and management of materialized views for accelerated query processing.
  • Runtime Optimizations: Enhances query performance with optimizations in Apache Tez and Apache Hive LLAP, ensuring faster data processing.
  • Hive Replication: Introduces improved replication features both for external and ACID tables for efficient data distribution and disaster recovery.
  • Support for Apache Ozone: Introduces support for Apache Ozone, enabling seamless integration with Ozone-based object stores for scalable and efficient storage solutions.

The United States Department of Homeland Security (DHS) announced the establishment of the Artificial Intelligence Safety and Security Board (the Board). The Board will advise the DHS Secretary, the critical infrastructure community, other private sector stakeholders, and the broader public on the safe and secure development and deployment of AI technology in the nation’s critical infrastructure.

The Board will develop recommendations to help critical infrastructure stakeholders, such as transportation service providers, pipeline and power grid operators, and internet service providers, more responsibly leverage AI technologies. It will also develop recommendations to prevent and prepare for AI-related disruptions to critical services that impact national or economic security, public health, or safety.

Real-time analytics news in brief

Tableau and Databricks announced updates to their strategic partnership with the introduction of a new Tableau Delta Sharing connector and the “Explore in Tableau” feature. The Delta Sharing connector, developed in collaboration with the Linux Foundation, is an open and secure method to facilitate real-time data sharing between Tableau and Databricks. The “Explore in Tableau” feature allows users to connect to data with a single click, fostering quicker insights. This feature is designed to integrate data insights more directly into user experiences, enabling users to stay within their workflow in the browser and enhancing productivity and data governance.

Amazon Web Services (AWS) announced the general availability of Amazon Q, a generative artificial intelligence (AI)-powered assistant for accelerating software development and leveraging companies’ internal data. Amazon Q not only generates highly accurate code but also tests and debugs it. It also has multi-step planning and reasoning capabilities that can transform existing code (e.g., perform Java version upgrades) and implement new code generated from developer requests.

Aqua Security unveiled new capabilities specifically designed to secure the development and operation of generative AI applications leveraging Large Language Models (LLMs). As more and more businesses embrace LLMs, new attack vectors are introduced into their applications and operations. Aqua Security offers a comprehensive approach to LLM security, delivering code integrity, real-time monitoring of LLM-powered application workloads, and GenAI assurance policies to serve as guardrails for developers of LLM-powered applications.

Azul announced that Azul Intelligence Cloud, Azul’s cloud analytics solution, which provides actionable intelligence from production Java runtime data to dramatically boost developer productivity, now supports Oracle JDK and any OpenJDK-based JVM (Java Virtual Machine) from any vendor or distribution. Azul Intelligence Cloud consists of two services: Azul Vulnerability Detection, which eliminates false positives by accurately identifying and prioritizing known security vulnerabilities, and Code Inventory, which helps identify unused and dead code by precisely detailing which custom and third-party code is actually run.

Baffle announced enterprise-grade data security for Amazon Relational Database Service (Amazon RDS) and Amazon Aurora from Amazon Web Services (AWS). Transparent Data Encryption (TDE) is a common mechanism to secure data at rest in enterprise databases. Baffle’s new solution goes beyond TDE by protecting data in PostgreSQL databases at the application tier. In addition, working with AWS’s Trusted Language Extensions for PostgreSQL, it is now possible to run SQL queries on encrypted data stored within Amazon RDS and Amazon Aurora, making them the only Postgres database-as-a-service (DBaaS) offerings with this functionality.
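Baffle’s implementation details aren’t public, but the general technique behind serving equality predicates over encrypted columns can be illustrated with a deterministic token scheme. The sketch below uses HMAC tokens and SQLite purely for illustration; it is not Baffle’s or AWS’s API:

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"  # in practice, a managed secret, never hard-coded

def det_token(value: str) -> str:
    """Deterministic token: equal plaintexts map to equal tokens, so the
    database can serve equality predicates without ever seeing plaintext."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, ssn_token TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, det_token("123-45-6789")), (2, det_token("987-65-4321"))],
)

# The application tier tokenizes the query value; the database matches tokens.
row = db.execute(
    "SELECT id FROM users WHERE ssn_token = ?",
    (det_token("123-45-6789"),),
).fetchone()
print(row)  # (1,)
```

The trade-off with any deterministic scheme is that equal values produce equal ciphertexts, which is exactly what makes indexed equality lookups possible.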

CData Software introduced CData Sync Cloud, a new software-as-a-service (SaaS) tool in the ETL/ELT market. Sync Cloud is a CData-hosted ETL/ELT solution that brings all the benefits of CData Sync to the cloud market. Unlike other cloud ETL tools, Sync Cloud charges users based on the number of data connections they require, keeping costs predictable as data replication scales. With a single fixed-price license purchase, customers maintain cost consistency throughout their contract duration, irrespective of the volume of data rows replicated.

Confluent announced AI Model Inference, an upcoming feature on Confluent Cloud for Apache Flink, to enable teams to easily incorporate machine learning into data pipelines. Confluent introduced the Confluent Platform for Apache Flink, a Flink distribution that enables stream processing in on-premises or hybrid environments with support from the company’s Flink experts. Confluent also unveiled Freight clusters, a new cluster type for Confluent Cloud that provides a cost-effective way to handle large-volume use cases that aren’t time-sensitive, such as logging or telemetry data.
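The announcement doesn’t include Flink code, but the pattern AI Model Inference enables (scoring each event with a model as it moves through a streaming pipeline, rather than in a separate batch job) can be sketched in plain Python. Here, `sentiment_model` and the record shape are hypothetical stand-ins:

```python
from typing import Iterable, Iterator

def sentiment_model(text: str) -> str:
    """Stub model: a real deployment would call a hosted model endpoint."""
    return "positive" if "great" in text.lower() else "negative"

def enrich_stream(records: Iterable[dict]) -> Iterator[dict]:
    # Streaming inference: each event is scored as it flows through the
    # pipeline, so downstream consumers see enriched records immediately.
    for record in records:
        yield {**record, "sentiment": sentiment_model(record["review"])}

events = [
    {"id": 1, "review": "Great product"},
    {"id": 2, "review": "Arrived broken"},
]
scored = list(enrich_stream(events))
print(scored)
```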

Crunchy Data announced a new offering, Crunchy Bridge for Analytics. Crunchy Bridge for Analytics gives users the ability to retrieve and interact with their data lake using PostgreSQL commands, through extensions and a vectorized, parallel query engine. This tool is available today within Crunchy Bridge, Crunchy Data’s fully managed cloud Postgres service.

DataRobot launched new AI observability functionality with real-time intervention for generative AI solutions, which are available across all environments, including cloud, on-premises, and hybrid. This latest release brings AI observability for any AI asset and environment into the DataRobot AI Platform. It delivers cross-environment AI observability, real-time generative AI intervention and moderation, and generative AI alerts and diagnostics.

Dremio announced new capabilities that make its Apache Iceberg lakehouse more flexible, fast, and easy to use in every data environment. To start, Dremio expanded its Apache Iceberg lakehouse to suit any environment, whether cloud, on-premises, or hybrid, including highly regulated, air-gapped, or data sovereignty-governed settings. Additionally, Dremio introduced generative AI capabilities to streamline data discovery and analysis in the Dremio Platform. GenAI Text-to-SQL enables intuitive querying through natural language, while advanced GenAI-driven data descriptions and labeling facilitate fast, accurate data discovery and curation.

Nokod Security announced the general availability of the Nokod Security Platform. This platform enables organizations to protect against security threats, vulnerabilities, compliance issues, and misconfigurations introduced by low-code/no-code (LCNC) applications and robotic process automations (RPAs). The Nokod Security Platform provides citizen developers clear step-by-step guidance for fixing security issues as well as automated remediation options that can be triggered with the click of a button.

Predactiv announced the launch of its next-generation platform. The Predactiv platform empowers users to fully harness the value of their data and activate insights and audiences across the entire digital ecosystem. With Predactiv’s technology, clients can integrate and enrich their datasets with Predactiv’s data or any other source, transforming the combined data into useful, actionable results.

Qdrant and Vultr are partnering to provide scalability and performance for vector search workloads. The collaboration enables the deployment of a fully managed vector database on Vultr’s adaptable platform, catering to the specific needs of diverse AI projects. Specifically, Qdrant’s new Qdrant Hybrid Cloud offering and its Kubernetes-native design, coupled with Vultr’s straightforward virtual machine provisioning, allow for simple setup when prototyping and building next-gen AI apps.
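As a rough illustration of the workload a managed vector database like Qdrant serves (nearest-neighbor search over embeddings), here is a minimal cosine-similarity sketch; the collection contents and scoring loop are toy stand-ins, not Qdrant’s API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "collection": id -> embedding (real embeddings come from a model).
collection = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.0],
    "doc-c": [0.7, 0.3, 0.1],
}

def search(query, top_k=2):
    # Brute-force scan; a real vector database uses approximate indexes
    # (e.g., HNSW) to answer this at scale.
    ranked = sorted(collection.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

print(search([1.0, 0.0, 0.0]))
```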

Rivery announced its strategic partnership with Sigma to bring non-technical stakeholders closer to data by creating a better understanding of vital data streams in cloud data platforms like Snowflake and Databricks to improve key decision-making. The collaboration of Rivery and Sigma empowers businesses by harnessing Rivery’s suite of over 200 fully managed data sources, including marketing, CRM, finance, databases, and other common sources. This expedites the flow of data into data warehouses and data lakes and enables seamless orchestration of transformations within the same pipelines.

Salesforce announced the general availability of Einstein Copilot, the conversational AI assistant for CRM, along with new capabilities designed to scale adoption of generative AI. Unique to Einstein Copilot are Copilot Actions, which are pre-programmed capabilities that enable Einstein Copilot to not only answer questions using business data but also string together workflows to get things done on behalf of users.

Sigma announced the first three features of its AI Toolkit for Business as well as expanded functionalities to build a data application without writing code. The new features of the AI Toolkit are AI Functions, which allows a user to add new columns into tables that are populated from the most popular LLM technologies like OpenAI; AI Forecasting in Sigma for Snowflake customers; and Sigma Copilot, which is an intelligent assistant for both new and advanced Sigma users.

SymphonyAI unveiled its Apex Enterprise IT Copilot, designed to accelerate problem resolution, boost agent productivity, and elevate customer satisfaction by dynamically providing on-demand information and tailored support. The copilot is part of SymphonyAI Apex, a predictive and generative AI-based IT service management/enterprise service management (ITSM/ESM) platform. The copilot and Apex are powered by SymphonyAI’s predictive and generative Eureka Gen AI platform.

Teradata announced an open and connected approach to supporting the Apache Iceberg and Linux Foundation Delta Lake open table formats (OTFs). Specifically, Teradata’s platform offers first-party services for Apache Iceberg and Linux Foundation Delta Lake OTFs with full support for cross-reading and cross-writing data stored in multiple OTFs. This interoperability extends to AWS Glue, Unity, and Apache Hive catalogs and works in multi-cloud and multi-data lake environments.

If your company has real-time analytics news, send your announcements to [email protected].

In case you missed it, here are our most recent previous weekly real-time analytics news roundups:

Salvatore Salamone

About Salvatore Salamone

Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
