Open source data warehouses like MariaDB are blurring the line between real-time analytics and traditional data warehouse applications.
The line between real-time analytics and more traditional data warehouse applications is blurring as traditional data warehouses become significantly better at ingesting data rapidly. MariaDB AX, an open source data warehouse based on a relational database project that has drawn support from IBM, Microsoft, Google, and Alibaba, is now being extended via a series of adapters that provide access to streaming analytics.
Shane Johnson, senior director of product marketing for MariaDB, says the beta release of a Streaming Data Adapter for Apache Kafka will enable MariaDB AX to consume messages and write data to the underlying storage engine employed by the MariaDB database. Johnson says MariaDB will also add adapters for other streaming messaging platforms in the future.
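The consume-and-write pattern Johnson describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MariaDB adapter itself: an in-memory list stands in for a Kafka topic, SQLite stands in for the MariaDB AX storage engine, and the names `batches`, `ingest`, `SAMPLE_MESSAGES` and the `events` table are hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Iterator, List


def batches(messages: Iterable[str], size: int) -> Iterator[List[dict]]:
    """Decode JSON messages and group them into batches of at most `size`."""
    batch: List[dict] = []
    for raw in messages:
        batch.append(json.loads(raw))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def ingest(conn: sqlite3.Connection, messages: Iterable[str], batch_size: int = 2) -> int:
    """Write each batch in its own transaction and return the rows written."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (sensor TEXT, reading REAL)")
    written = 0
    for batch in batches(messages, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (sensor, reading) VALUES (:sensor, :reading)",
                batch,
            )
        written += len(batch)
    return written


# Hypothetical messages standing in for records read from a streaming topic.
SAMPLE_MESSAGES = [
    '{"sensor": "s1", "reading": 20.5}',
    '{"sensor": "s2", "reading": 19.0}',
    '{"sensor": "s1", "reading": 21.0}',
]

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
total = ingest(conn, SAMPLE_MESSAGES)
```

Writing in small batches rather than row by row is the usual trade-off in this kind of pipeline: larger batches raise throughput, smaller ones shorten the delay before data becomes queryable.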
Meanwhile, a separate Streaming Data Adapter for MariaDB MaxScale will make it possible to analyze operational data within seconds of its creation, says Johnson. That adapter uses MariaDB MaxScale, a database proxy, and change-data-capture streams to continuously and automatically replicate data from the operational environment to MariaDB AX, explains Johnson.
The MariaDB project is also making available a Bulk Adapter that can be employed to continuously collect and write large amounts of data to MariaDB AX for analysis, or to publish machine learning results to MariaDB AX for data scientists to analyze using SQL queries.
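Change-data-capture replication boils down to replaying a stream of insert, update and delete events against a copy of the data. The sketch below shows that idea only; the event format, the `orders` table and the `apply_change` helper are assumptions made for illustration, SQLite stands in for the analytical store, and none of it reflects the actual MaxScale stream format.

```python
import sqlite3

# Hypothetical change events, in the order the operational database produced them.
CHANGES = [
    {"op": "insert", "id": 1, "status": "new"},
    {"op": "insert", "id": 2, "status": "new"},
    {"op": "update", "id": 1, "status": "shipped"},
    {"op": "delete", "id": 2},
]


def apply_change(conn: sqlite3.Connection, change: dict) -> None:
    """Replay a single change event against the replica table."""
    op = change["op"]
    if op == "insert":
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            (change["id"], change["status"]),
        )
    elif op == "update":
        conn.execute(
            "UPDATE orders SET status = ? WHERE id = ?",
            (change["status"], change["id"]),
        )
    elif op == "delete":
        conn.execute("DELETE FROM orders WHERE id = ?", (change["id"],))
    else:
        raise ValueError(f"unknown operation: {op!r}")


replica = sqlite3.connect(":memory:")  # stand-in for the analytical copy
replica.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

for change in CHANGES:
    with replica:  # one transaction per event keeps the replica consistent
        apply_change(replica, change)

rows = replica.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

Because events are applied in their original order, the replica ends up in the same state as the source without any periodic bulk reload.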
Johnson says these capabilities not only make data warehouses based on relational databases more relevant than ever; they eliminate the need to rely on cumbersome extract, transform and load (ETL) processes to transfer data.
Support for custom analytical functions for relational and non-relational data such as JSON is also being added to MariaDB AX. An application programming interface (API) for creating user-defined aggregate and window functions, combined with support for text and binary data, will enable end users to analyze structured, semi-structured and unstructured data using custom analytical functions.
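To make the idea of a user-defined aggregate over semi-structured data concrete, here is a small sketch. It does not use the MariaDB AX API, which the article does not detail; it relies on Python's built-in `sqlite3` module, whose `create_aggregate` hook follows the same general pattern of accumulating one row at a time and then producing a final value. The `json_field_avg` function, the `readings` table and the sample payloads are hypothetical.

```python
import json
import sqlite3


class JsonFieldAvg:
    """Aggregate that averages one numeric field across JSON documents."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def step(self, doc: str, field: str) -> None:
        # Called once per row: pull the field out of the JSON text.
        value = json.loads(doc).get(field)
        if isinstance(value, (int, float)):
            self.total += value
            self.count += 1

    def finalize(self):
        # Called once per group: return NULL if no row carried the field.
        return self.total / self.count if self.count else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("json_field_avg", 2, JsonFieldAvg)
conn.execute("CREATE TABLE readings (device TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("a", '{"temp": 20.0}'),
        ("a", '{"temp": 22.0}'),
        ("b", '{"temp": 30.0}'),
        ("b", '{"humidity": 40}'),  # no "temp" field, so it is skipped
    ],
)

result = conn.execute(
    "SELECT device, json_field_avg(payload, 'temp') "
    "FROM readings GROUP BY device ORDER BY device"
).fetchall()
```

Once registered, the function is called from ordinary SQL like any built-in aggregate, which is the appeal of this approach for analysts who already work in SQL.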
Finally, MariaDB AX now supports GlusterFS to provide high data availability without relying on a storage area network (SAN). It also adds tools that automatically take advantage of multiple concurrent connections to back up and restore distributed data residing on multiple servers.
In general, Johnson says that because the volume of data a relational database is able to store and process has increased, platforms such as MariaDB are still preferred for analyzing large amounts of structured and semi-structured data. In contrast, large amounts of other data, such as binary data, are being stored on platforms such as Hadoop.
Goal? To make more data available
The primary driver of these efforts is to make more data available to analytics applications. Today most primary data is already being made accessible via multiple forms of non-volatile memory. Storage systems based on Flash memory create an opportunity to also make large volumes of data stored in secondary storage systems available in near real time to drive, for example, artificial intelligence (AI) applications that require access to large volumes of data.
Johnson says the MariaDB community is now advancing open source database technology in areas that the rival MySQL project, now owned by Oracle, has no incentive to address. Oracle is largely conflicted over which capabilities to provide via its commercial namesake database and which to provide via the open source MySQL project, which it gained through its acquisition of Sun Microsystems, completed in 2010, says Johnson.
It remains to be seen how much progress the MariaDB community can make in terms of usurping rival commercial and open source databases. But what is clear is that the collective resources of the MariaDB community are focused squarely on advanced analytics applications.