The Alliance believes that by committing to open data standards, access, and integration between data platforms and applications, it can significantly accelerate business transformations and close the data to value gap.
Have you ever felt like your organization spends too much of its time building integrations between its various business systems and platforms, or testing out yet another analytics tool that promises to bridge the gap between one dataset and another? A new alliance, the Data Cloud Alliance, might soon have some answers.
Announced in early April, the Data Cloud Alliance already features heavy-hitters and up-and-comers in databases and data management, including Google Cloud, Accenture, Deloitte, Elastic, MongoDB, Redis, and more. For the time being, they’re focused on breaking down barriers between various systems and platforms to ensure that organizations aren’t prevented from undergoing necessary digital transformation because of inaccessible data.
“By committing to open data standards, access, and integration between the most popular data platforms and applications today, we believe we can significantly accelerate business transformations and close the data to value gap,” says Gerrit Kazmaier, VP and GM of Databases, Data Analytics and Business Intelligence at Google Cloud.
The Alliance’s members believe that solving the issues around managing and analyzing the proliferation of data requires common digital data standards and a “commitment to open data.”
Members of the Data Cloud Alliance will contribute to common industry data models, open standards, and end-to-end integrations that simplify the deployment and maintenance of complex data lakes and analytics pipelines. They’re also looking into the challenges around data governance, privacy, and loss prevention, which are major concerns for organizations in heavily regulated industries or those that deal with personally identifiable information (PII).
But the Alliance also seems to recognize that none of these complex pipelines will work without skilled people to maintain them. In their release, the Data Cloud Alliance says its members will implement new educational efforts to bridge the skills gap and get more people on modern data and analytics platforms.
Each member of the Alliance will provide APIs, infrastructure, and the integrations necessary for organizations to move data between any number of platforms and environments, whether that’s on-premises or in public/private/hybrid clouds. These commitments ideally combine to accelerate the adoption of best practices for data analytics and AI/ML applications across industries, and especially for organizations that have traditionally been left behind by the sheer complexity of data.
As for what changes for organizations right now, the answer seems to be “nothing,” at least for a while. The Data Cloud Alliance website is light on details beyond the commitments above.
The only specific initiative, platform, or environment mentioned, aside from Google Cloud itself, is Delta Lake, an open-source storage framework for building a lakehouse architecture on top of a data lake, compatible with Apache Spark, PrestoDB, Kafka, Snowflake, and more. It addresses data reliability challenges by making transactions ACID-compliant, scaling to petabytes of data, and keeping previous versions of data accessible for full audit trails.
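To make the “ACID transactions plus previous versions” idea more concrete, here is a minimal Python sketch of a versioned, append-only table. It is a toy illustration only and not Delta Lake’s actual implementation or API: the VersionedTable and ConflictError names, the in-memory snapshots, and the expected_version check are all assumptions made for this example.

```python
from __future__ import annotations

import copy
from typing import Any


class ConflictError(RuntimeError):
    """Raised when a writer tries to commit on top of a stale version."""


class VersionedTable:
    """Toy in-memory table with atomic commits and readable history.

    Hypothetical illustration; real systems persist a transaction log.
    """

    def __init__(self) -> None:
        # Version 0 is the empty table; each commit appends a new snapshot.
        self._versions: list[tuple[dict[str, Any], ...]] = [()]

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def read(self, version: int | None = None) -> list[dict[str, Any]]:
        """Return the rows as of `version` (the latest commit by default)."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"unknown version {version}")
        # Copy on the way out so callers cannot rewrite history.
        return copy.deepcopy(list(self._versions[version]))

    def commit(self, new_rows: list[dict[str, Any]], expected_version: int) -> int:
        """Append all rows as one new version, or change nothing at all."""
        # Optimistic concurrency: reject writers that read an older version.
        if expected_version != self.latest_version:
            raise ConflictError(
                f"expected version {expected_version}, "
                f"table is at {self.latest_version}"
            )
        # Validate everything before touching state, so a bad row
        # cannot leave a half-written commit behind.
        for row in new_rows:
            if not isinstance(row, dict):
                raise TypeError(f"row must be a dict, got {type(row).__name__}")
        staged = tuple(copy.deepcopy(row) for row in new_rows)
        self._versions.append(self._versions[-1] + staged)
        return self.latest_version


table = VersionedTable()
v1 = table.commit([{"id": 1, "status": "new"}], expected_version=0)
v2 = table.commit([{"id": 2, "status": "new"}], expected_version=v1)

current_rows = table.read()            # both rows
audit_rows = table.read(version=v1)    # only the first row, as of version 1
```

Reading with an explicit version is what makes an audit trail possible: older snapshots are never overwritten, so any past state of the table can be reproduced on demand.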
David Meyer, SVP of Products, Databricks, says, “Databricks is excited to partner with Google Cloud to foster data sharing based on open standards like Delta Lake. The Data Cloud Alliance reinforces our commitment to open data sharing and the open data lakehouse paradigm, which empowers data teams to collaborate more effectively.”
Delta Lake was started by Databricks back in October 2017 and then made open source in early 2019. Later that year, The Linux Foundation announced it was taking ownership of the project to drive more adoption and contributions under a neutral, open governance model that could grow its community beyond the existing Databricks customers. Delta Lake has since been implemented by thousands of organizations, including some big names like Comcast, Viacom, Alibaba, Tencent, and more.
Another interesting point came from Mark Van de Wiel, Field CTO at Fivetran, who singled out the “first leg of analytics—data integration, particularly from SaaS and database sources” as a major area of concern. Fivetran seems to be concerned mostly with “time-to-data-driven,” the moment at which an organization is managing enough data, and making sense of it meaningfully enough, to confidently say it is undergoing genuine digital transformation.
Time will tell whether the Data Cloud Alliance comes up with meaningful open standards the way OpenMetrics and OpenTelemetry have for the observability industry, especially since it’s lacking any truly neutral groups at this point. A pessimistic viewpoint is that the Alliance will centralize all its efforts around Google Cloud itself, which would limit its impact even if it does result in new standards, easier integrations, or better skills development resources.
But the Alliance does seem bullish on itself — Lan Guan, Accenture Cloud First Data & AI lead, said, “With the Data Cloud Alliance, we are teaming with our ecosystem partners to be maniacally focused on open standards for data exchange across the cloud continuum.”