Going Inside SAP’s New Data Strategy

SAP’s new data catalog capability has changed how the enterprise solutions firm helps clients to tackle big data.

Written by Michael Vizard
Jun 12, 2018

At the heart of SAP’s emerging distributed approach to managing data is a new data catalog capability that has been added to SAP Data Hub.

SAP Data Hub combines data virtualization capabilities with an instance of the Apache Spark in-memory computing framework, making it possible to read and write data across a distributed computing environment regardless of whether that data is stored in the SAP HANA database or somewhere else. SAP originally made that Apache Spark distribution available in an offering known as Vora.

Now SAP has eliminated Vora as a standalone offering in favor of embedding that capability alongside the data pipelining and virtualization software included in version 2.3 of SAP Data Hub, says Ken Tsai, global vice president and head of product marketing for cloud platform and data management.

To keep track of what data is located where, SAP Data Hub now also includes a Data Catalog that captures all the metadata that exists within a distributed computing environment, says Tsai. That functionality will prove to be a critical element of SAP’s approach to processing data inside and outside of HANA in near real time, he adds.
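The idea of a catalog that records where data lives, without moving the data itself, can be illustrated with a minimal Python sketch. This is not SAP Data Hub's API: the `DatasetEntry` and `DataCatalog` classes, the system labels and the dataset names are all hypothetical, chosen only to show metadata being captured and queried.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata about one dataset; the data itself stays where it lives."""
    name: str
    system: str      # illustrative label such as "hana" or "hdfs"
    location: str    # hypothetical table name or path
    columns: tuple   # column names only, never values


@dataclass
class DataCatalog:
    """Hypothetical in-memory catalog keyed by dataset name."""
    entries: dict = field(default_factory=dict)

    def register(self, entry: DatasetEntry) -> None:
        self.entries[entry.name] = entry

    def locate(self, name: str) -> str:
        """Answer 'what data is located where' for one dataset."""
        entry = self.entries[name]
        return f"{entry.system}:{entry.location}"

    def find_by_column(self, column: str) -> list:
        """List every dataset, in any system, that exposes a given column."""
        return sorted(e.name for e in self.entries.values() if column in e.columns)


catalog = DataCatalog()
catalog.register(DatasetEntry("orders", "hana", "SALES.ORDERS", ("order_id", "customer_id")))
catalog.register(DatasetEntry("clickstream", "hdfs", "/logs/clicks", ("customer_id", "url")))

print(catalog.locate("orders"))
print(catalog.find_by_column("customer_id"))
```

In this sketch a query planner could consult the catalog first and only then decide which system should process the request.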

As an in-memory database, HANA has emerged as a cornerstone of the SAP approach to processing data in real time. But it’s not feasible to move every piece of data in the enterprise into HANA. By including an instance of Apache Spark in SAP Data Hub, it becomes possible to process data outside of HANA at speeds that keep pace with the rate at which data is processed within HANA.

Coming soon: container services

As part of that effort, both HANA and SAP Data Hub will soon be running as a set of container services hosted on a Kubernetes cluster, adds Tsai. Kubernetes makes it simpler to deploy either HANA or SAP Data Hub anywhere as part of a hybrid cloud computing strategy that will enable IT organizations to more easily process data wherever it’s located across what SAP describes as an emerging intelligent enterprise. That intelligent enterprise will, for example, be able to process in near real time the massive amounts of data required by the machine and deep learning algorithms that drive artificial intelligence (AI) models, says Tsai.

“Algorithms are meaningless without being able to access data,” says Tsai.

Tsai also notes the SAP approach to data management will enable IT organizations to process queries anonymously without having to redact data in a way that makes it unreadable. That will prove to be a critical requirement in healthcare applications, where researchers need to be able to study data involving a larger number of patients without having to sacrifice data sets because the underlying data has been masked in a way that a query can’t process, explains Tsai.
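The article does not describe how SAP implements this. As a general illustration of the principle only, the Python sketch below uses keyed hashing (pseudonymization), a common technique that hides identities while leaving records queryable. The key, the record layout and the function names are assumptions made for the example.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; a real deployment would manage this key securely.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(patient_id: str) -> str:
    """Replace an identifier with a stable keyed hash so rows stay linkable."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up sample records, for illustration only.
records = [
    {"patient_id": "P001", "diagnosis": "asthma"},
    {"patient_id": "P002", "diagnosis": "asthma"},
    {"patient_id": "P001", "diagnosis": "diabetes"},
    {"patient_id": "P001", "diagnosis": "asthma"},  # repeat visit
]

# The analyst only ever sees the pseudonymized view.
anonymized = [
    {"patient": pseudonymize(r["patient_id"]), "diagnosis": r["diagnosis"]}
    for r in records
]


def patients_per_diagnosis(rows):
    """Count distinct patients per diagnosis without knowing who they are."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["diagnosis"]].add(row["patient"])
    return {diagnosis: len(patients) for diagnosis, patients in seen.items()}


counts = patients_per_diagnosis(anonymized)
print(counts)
```

Because the same patient always maps to the same token, the repeat visit is still counted once, which is the property that blanket redaction would destroy.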

In general, hybrid cloud computing has proven to be an elusive goal because each cloud and on-premises IT environment processes and stores data differently. Via a combination of HANA and the Vora technology now embedded in SAP Data Hub, SAP is setting out to solve that challenge by putting in place a layer of data processing software that shares access to a common framework for processing metadata. It may take a while for HANA and SAP Data Hub to become federated across all of enterprise IT. But once they do, SAP is betting that the future of IT will once again be driven more by the ability to process data efficiently than by the underlying platform on which that data happens to reside.

RT Insights

Analysis and market insights on real-time analytics including Big Data, the IoT, and cognitive computing. Business use cases and technologies are discussed.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved