Digital transformation efforts should take a serious look at real-time data APIs and data virtualization as they might be the keys to making data readily available.
Digital transformation is more than an aspirational phrase. It is a high-level, strategic initiative for many organizations today, one with C-level sponsorship and visibility. Digital transformation aims to leverage emerging digital technologies in the areas of data and analytics to streamline customer and partner interactions across such diverse areas as sales, support, development, and supply-chain logistics.
Often, such transformations proceed in three phases: first by improving internal business processes, next by improving external interactions, and finally by introducing new business models that enable the company to monetize its assets in new ways. It is not easy to work in the reverse order and introduce a new business model without first improving business processes. This is because the first phase requires that companies take a good hard look at their IT systems to ensure that different systems can interoperate and talk to each other via capabilities such as application programming interfaces (APIs).
Digital transformation and the API economy
APIs have been with us for decades, but within the last several years, they have been used in increasingly powerful ways. APIs expose the data or the functionality of a specific system and make it available to other systems. APIs exist within an ecosystem or economy, and, in the last ten years, entire businesses have been established on top of APIs.
Consider Uber. We think of the company as having introduced a brand-new business model, and it most certainly has. But what really sets Uber apart is that APIs have enabled the company to grow much more quickly than it would have grown otherwise. Uber can focus on its core business, connecting drivers with riders, while letting APIs handle all of the “peripheral” activities such as tracking cars, processing payments, integrating data with a phone, sending SMS messages to tell passengers that their driver has arrived, or sending out transactional emails. Uber has not had to build its own mapping technology, as it uses Google for Android or MapKit for iOS. Similarly, Uber is integrated with Braintree for all billing-related activities.
Instead of simply consuming information through APIs, Uber is now also exposing its own. Today, the company works with a network of travel and hospitality partners, which integrate their applications via APIs. Once passengers arrive at their destination airport, airlines can use their apps to offer them an Uber. In this way, Uber has become a platform, which extends the company’s business model in a very important way. According to Fortune magazine, the top 5 companies of 2018 based on market capitalization were all platform companies: Apple, Google, Amazon, Microsoft, and Facebook. Just a decade before, none of the top 5 companies (Exxon, General Electric, Microsoft, AT&T, and Procter & Gamble) were. Note that in 2008, Microsoft was still simply a software company and was not yet a platform company.
Two key API trends to consider: RESTful APIs and microservices
In the 2010s, RESTful APIs became the de facto standard for web services. REST is not a detailed specification but a paradigm or a way of working. With RESTful APIs, the many potential operations are reduced to just a handful, which map to HTTP methods such as GET, POST, and PUT. Rather than having to deal with the complexity of SOAP-based APIs, developers can retrieve data with simple GET requests. In addition to being inherently simple, RESTful APIs are also extremely lightweight, so they can be easily and flexibly deployed. Protocols and standards have grown up around RESTful APIs, such as OData, which is critical for Microsoft products like SharePoint, and OAuth for authorization, which is often used alongside SAML-based authentication. And now we are seeing other specifications and query languages gain ground, like OpenAPI (formerly the Swagger specification) and GraphQL.
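To make the "handful of operations" idea concrete, here is a minimal sketch of how a single resource might map HTTP methods to actions. It runs entirely in memory; the OrdersResource class, its handle method, and the order fields are assumptions made up for illustration, not part of any real framework.

```python
from typing import Any, Dict, Optional, Tuple


class OrdersResource:
    """Hypothetical in-memory resource; stands in for a real web service."""

    def __init__(self) -> None:
        self._orders: Dict[int, Dict[str, Any]] = {}
        self._next_id = 1

    def handle(self, method: str, order_id: Optional[int] = None,
               body: Optional[Dict[str, Any]] = None) -> Tuple[int, Any]:
        """Dispatch on the HTTP method and return (status code, payload)."""
        if method == "GET":
            # Read one order, or list all of them when no id is given.
            if order_id is None:
                return 200, list(self._orders.values())
            if order_id in self._orders:
                return 200, self._orders[order_id]
            return 404, None
        if method == "POST":
            # Create a new order; the service assigns the identifier.
            new_id = self._next_id
            self._next_id += 1
            self._orders[new_id] = {"id": new_id, **(body or {})}
            return 201, self._orders[new_id]
        if method == "PUT":
            # Replace an existing order in full.
            if order_id not in self._orders:
                return 404, None
            self._orders[order_id] = {"id": order_id, **(body or {})}
            return 200, self._orders[order_id]
        return 405, None  # method not allowed
```

A caller would create an order with handle("POST", body={"item": "burger"}) and read it back with handle("GET", 1). The whole interface is the method name plus a resource identifier, which is the simplicity the paragraph above describes.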
In a microservices architecture, applications are built up by composing and integrating microservices, which interface, integrate, and communicate with each other using RESTful APIs. In this way, one can think of microservices architecture as “service-oriented architecture (SOA) done right.” SOA was popular at the start of the millennium as the way to introduce efficiency by reducing data-integration complexities. Yet in practice, SOA services were hosted on monolithic application servers, and that combination became the standard way to build large applications, so scalability was an issue. Microservices, in contrast, are extremely light and independent.
However, when companies adopt a microservices architecture, DevOps teams potentially have to deploy, manage, and secure hundreds or even thousands of microservices. This led to a rise in the popularity of container technologies such as Docker and container orchestration platforms such as Kubernetes, which simplify and automate much of the management process.
Picking up where container capabilities leave off, API management tools address the exposure of APIs to the external world. API management tools enable companies to securely publish select APIs, keep others internal, and determine exactly which groups or individuals are authorized to access the data. Such tools were not as necessary in the SOA era, but they are becoming more and more necessary in microservices architectures.
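The kind of decision an API management tool makes can be sketched as a small policy table. This is only an illustration under assumed names: the paths, the groups, ApiPolicy, and is_allowed are all hypothetical, and real products express such rules through their own configuration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ApiPolicy:
    published: bool                 # True if exposed to external callers
    allowed_groups: FrozenSet[str]  # groups authorized to call the API


# Hypothetical policy table: one published API, one kept internal.
POLICIES: Dict[str, ApiPolicy] = {
    "/menu": ApiPolicy(True, frozenset({"partners", "staff"})),
    "/payroll": ApiPolicy(False, frozenset({"staff"})),
}


def is_allowed(path: str, group: str, external: bool) -> bool:
    """Return True only if the API exists, is visible from where the
    caller sits, and the caller's group is authorized."""
    policy = POLICIES.get(path)
    if policy is None:
        return False
    if external and not policy.published:
        return False  # internal-only APIs are hidden from the outside
    return group in policy.allowed_groups
```

Under these assumptions, an external partner can call /menu, while /payroll is reachable only by staff on the internal network.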
Data virtualization in the API ecosystem
Not all data APIs can be real-time data APIs. Traditional data integration techniques used in most digital transformation efforts rely on the physical movement of data from one place to another, where it can then be accessed by the API. But as we all know, sometimes data is delivered via scheduled batches rather than in real-time. Also, such techniques often cannot support modern data types like streaming data and data from social media feeds.
Data virtualization (DV) is a modern data integration approach that provides real-time views of many different kinds of data without having to move it to a new location. This technology is critical for real-time data APIs, as it enables organizations to seamlessly expose their integrated, curated data assets and data services as RESTful APIs, so they can be easily accessed by external entities in real-time. These could be straightforward internal or external data sets, or they could take other more sophisticated forms, such as open government initiatives to share information with development partners. Data virtualization enables phone apps to provide real-time data, such as tracking information. With real-time data and APIs, developers are limited only by their imaginations.
DV can support the API ecosystem in myriad ways, but it typically follows three basic patterns:
1) Data virtualization as a service provider: Suppose we had a common microservices architecture like the one described above in which an organization had several microservices and exposed them, internally and externally, using an API management tool.
Typically, microservices are not the only type of information that an organization might want to expose. In this pattern, DV would be deployed above the organization’s disparate data sources and provide views of the combined data to an API gateway, which delivers the data to consumers. In parallel with the data virtualization layer, the microservices would also deliver their data to the API gateway. The API gateway would control what is exposed, how different individuals and/or groups can access it, and how it must be secured.
2) Data virtualization as the integration layer for microservices: In this pattern, the DV layer would be established above all data sources, including the microservices, and that layer would provide views of the combined data to the API gateway similar to the first pattern.
Here, the DV layer is integrating the microservices, treating them as data sources, and combining views of their data with views from databases and other applications, such as SAP ECC. The advantage of this pattern is that it keeps the microservices very lightweight. They are no longer responsible for such domains as security, auditing, and logging, as those tasks can now be performed by the data virtualization layer itself.
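As a rough sketch of this second pattern, the code below treats a database and a microservice as two sources behind one virtual view. Everything here is assumed for illustration: the in-memory SQLite table stands in for an enterprise database, orders_microservice stands in for a remote service call, and VirtualLayer is a toy, not a data virtualization product.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Optional


def make_customer_db() -> sqlite3.Connection:
    """Stand-in for an existing database of record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Grace")])
    return conn


def orders_microservice(customer_id: int) -> List[Dict[str, Any]]:
    """Stand-in for a call to a hypothetical orders microservice."""
    data = {1: [{"order_id": 10, "total": 25.0}], 2: []}
    return data.get(customer_id, [])


class VirtualLayer:
    """Combines both sources at query time; no data is copied or staged."""

    def __init__(self, db: sqlite3.Connection,
                 order_source: Callable[[int], List[Dict[str, Any]]]) -> None:
        self._db = db
        self._order_source = order_source

    def customer_orders_view(self, customer_id: int) -> Optional[Dict[str, Any]]:
        row = self._db.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        # The microservice is queried like any other data source.
        return {"id": row[0], "name": row[1],
                "orders": self._order_source(row[0])}
```

Because the view is assembled on request, cross-cutting concerns such as logging or access checks could live in VirtualLayer rather than in each microservice, which is the advantage described above.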
3) Data virtualization as a data services layer for the microservices layer: In the final pattern, DV is established below the microservices layer, acting as its data services layer. This abstracts away many of the complexities surrounding how the microservices get their data, including such details as its location or required interface. Companies do not have to build a JDBC stack into the microservices themselves just so that they can access data from an Oracle or SQL Server database, and they do not have to embed SQL queries directly into their microservices.
The importance of the data virtualization layer
Administrators do not need to worry about where the data comes from; all of that is handled by the data virtualization layer, which can also handle more complex requests, such as aggregations, displacements, or other types of reports. Such an arrangement is reminiscent of Uber once again. Uber focused on what it had to do and ignored the “periphery,” which was handled by APIs. Similarly, in this pattern, the microservices developers can focus just on what each microservice is supposed to do, without worrying about how the microservices are going to get data. The data virtualization layer takes care of that: the microservice makes a simple RESTful API call to get the data. This is a powerful pattern because it captures one of the core benefits of the API ecosystem: removing complexity and enabling developers to focus on what they are building.
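From the microservice's side, the data services pattern might look like the sketch below. The base URL, the order_summary view, the weight_kg field, and the shipping rule are all invented for illustration, and the fetch function is injected so the example runs without a network.

```python
import json
from typing import Callable, Dict
from urllib.parse import urlencode

# Hypothetical endpoint of a data virtualization layer.
DV_BASE_URL = "https://dv.example.com/views"


def build_view_url(view: str, **filters: object) -> str:
    """Build the REST URL for a published view, with optional filters."""
    query = urlencode(sorted(filters.items()))
    url = f"{DV_BASE_URL}/{view}"
    return f"{url}?{query}" if query else url


def shipping_estimate(order_id: int, fetch: Callable[[str], str]) -> Dict[str, int]:
    """Microservice logic: no SQL and no database driver, only one
    call to the data layer followed by the business rule."""
    payload = json.loads(fetch(build_view_url("order_summary", order_id=order_id)))
    days = 2 if payload["weight_kg"] <= 5 else 5
    return {"order_id": order_id, "estimated_days": days}


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET against the data layer."""
    return json.dumps({"order_id": 42, "weight_kg": 3.5})
```

In a deployed service, fake_fetch would be replaced by a real HTTP client call. The microservice code would stay the same even if the underlying data moved from one database to another.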
There is an extension to this pattern as well. So far, we have covered fundamental data APIs, which are “read-only,” but there is also a write-back version of this pattern. In the write-back version, data virtualization supports separate APIs for writing and reading, which follows the command query responsibility segregation (CQRS) pattern. CQRS is widely regarded as good practice in microservices architecture development.
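The read/write split can be sketched in a few lines. The inventory example, the class names, and the shared dictionary are assumptions chosen for brevity; a real deployment would put a data layer behind both sides.

```python
from typing import Dict


class InventoryCommands:
    """Write side: changes state and returns nothing."""

    def __init__(self, store: Dict[str, int]) -> None:
        self._store = store

    def set_stock(self, sku: str, quantity: int) -> None:
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        self._store[sku] = quantity


class InventoryQueries:
    """Read side: returns data and never changes state."""

    def __init__(self, store: Dict[str, int]) -> None:
        self._store = store

    def stock(self, sku: str) -> int:
        return self._store.get(sku, 0)


# Both sides share one backing store here for simplicity.
store: Dict[str, int] = {}
commands = InventoryCommands(store)
queries = InventoryQueries(store)
```

Keeping the two interfaces apart means each can be secured, scaled, and exposed on its own, for example by publishing the query API widely while restricting the command API.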
Real-world example: A fast-food restaurant chain
Consider the case of a fast-food restaurant chain that chose to remain anonymous. This chain has restaurants across the United States and Canada and had built a smartphone app that enables customers from all locations to order food for pickup or takeout. Using the app, customers can find the closest restaurant, see a menu, and even call up nutritional information for their choices. And all this information is retrieved through RESTful APIs served up by a data virtualization platform.
While menu information is stored in databases, Excel spreadsheets, and other sources, the DV layer combines it and publishes the combined data as an API. The resulting data is location-specific, as the menu changes by geography. The queries and resulting output are fairly complex, but all of that is hidden from the developers who write the apps. All they have to do is make a call to a RESTful API to deliver the appropriate data given the location, which is usually a restaurant identifier, and then all of that information is available in the app. The app is now used by millions of users, and the data is available without noticeable latency.
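A toy version of such a location-specific menu lookup is sketched below. The chain's real schema is not public, so the table layout, restaurant identifiers, prices, and calorie figures are all invented; an in-memory SQLite table and a CSV string stand in for the databases and spreadsheets.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, List

# Invented spreadsheet export with nutritional information.
NUTRITION_CSV = """item,calories
burger,550
salad,220
poutine,740
"""


def build_menu_db() -> sqlite3.Connection:
    """Invented menu table; items and prices vary by restaurant."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE menu (restaurant_id TEXT, item TEXT, price REAL)")
    conn.executemany("INSERT INTO menu VALUES (?, ?, ?)", [
        ("US-001", "burger", 5.99),
        ("US-001", "salad", 4.49),
        ("CA-001", "burger", 7.49),
        ("CA-001", "poutine", 6.99),
    ])
    return conn


def load_nutrition(text: str) -> Dict[str, int]:
    """Parse the spreadsheet export into an item-to-calories lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["item"]: int(row["calories"]) for row in reader}


def menu_for_restaurant(conn: sqlite3.Connection, nutrition: Dict[str, int],
                        restaurant_id: str) -> List[Dict[str, Any]]:
    """Combine both sources into the payload an app would receive."""
    rows = conn.execute(
        "SELECT item, price FROM menu WHERE restaurant_id = ? ORDER BY item",
        (restaurant_id,),
    ).fetchall()
    return [{"item": item, "price": price, "calories": nutrition.get(item)}
            for item, price in rows]
```

The app developer would only ever see the result of menu_for_restaurant for a given restaurant identifier; the join across sources stays hidden behind the API.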
Data virtualization for digital transformation
If real-time data APIs underpin digital transformation, data virtualization underpins real-time data APIs. Data virtualization offers a reliable, flexible way to support real-time data APIs, no matter how complex the infrastructure. Companies interested in digital transformation should take a serious look at data virtualization, as it might be the missing link to making data readily available and freeing up scarce developer resources.