The Semantic Layer’s Role in Analytics and Data Integration


For organizations performing advanced analytics against data that is both too wide and too big, semantic layers are making a difference in how information is found, used, and leveraged.

Organizations looking to derive insight from their big data continue to face significant challenges because today’s data is inherently hybrid, diverse, and ever-changing. Worse, data tends to be stored in a way that isn’t accessible to everyone. Architectural limitations rooted in conventional data management are to blame, and they severely limit an organization’s ability to use data to its advantage. According to Gartner, “the growing levels of data volume and distribution make it hard for organizations to exploit their data assets efficiently and effectively. Data and analytics leaders need to adopt a semantic approach to their enterprise data; otherwise, they will face an endless battle with data silos.”

To avoid these battles and achieve data analytics success, organizations need a modern approach: democratized, self-service data platforms that leverage a semantic layer to shift from “columns to concepts.” With a reusable, resilient data foundation that can also address unanticipated questions, organizations are making data accessible and “insight generation” possible. This is where the semantic aspect becomes critical.

A semantic layer represents a connected network of real-world entities such as events, objects, situations, and concepts, regardless of whether the underlying data is stored in data lakes, data warehouses, or other data sources. Operating between the storage and consumption layers of the modern enterprise’s data analytics stack, a semantic layer acts as the glue that connects all available data to the business meaning it represents for the enterprise. Unlike relational tables that only IT experts can leverage, a semantic layer presents data in a form that citizen data scientists and business analysts can also understand, use, and deploy to their advantage.
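As a rough illustration of this "glue" role, the following minimal Python sketch maps two differently shaped sources onto one shared business concept. Everything in it is an assumption made for illustration: the source names, the column names, the `SEMANTIC_MAP` structure, and the `Customer` concept are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass

# Hypothetical rows from two storage systems that name the same things differently.
warehouse_rows = [{"cust_id": 1, "cust_nm": "Acme Corp"}]
lake_rows = [{"customerKey": 2, "displayName": "Globex"}]

# Hypothetical semantic mapping: business property -> physical column, per source.
SEMANTIC_MAP = {
    "warehouse": {"id": "cust_id", "name": "cust_nm"},
    "lake": {"id": "customerKey", "name": "displayName"},
}


@dataclass(frozen=True)
class Customer:
    """A business concept that consumers see instead of raw columns."""
    id: int
    name: str
    source: str


def to_customers(source, rows):
    """Translate physical rows from one source into Customer concepts."""
    mapping = SEMANTIC_MAP[source]
    return [
        Customer(id=row[mapping["id"]], name=row[mapping["name"]], source=source)
        for row in rows
    ]


# Consumers work with one concept, wherever the underlying data is stored.
customers = to_customers("warehouse", warehouse_rows) + to_customers("lake", lake_rows)
names = sorted(c.name for c in customers)
print(names)
```

In this sketch an analyst only ever asks for customers by `id` and `name`; the mapping absorbs the differences between the physical schemas.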


Moving From Columns to Concepts

The dominant data model for storing enterprise data has long been the relational data model. This conventional “columns” approach was more abstract than other data models, but it was still a leaky abstraction because it was overwhelmingly concerned with the structural and physical representation of data: tables, columns, rows, and foreign keys between tables. Yet, while the relational approach is still important and, to some degree, foundational to the modern world, it creates an unnecessary strain when used as the single approach to enterprise data management. Signs of strain typically include a flimsy model of reference, an unavoidable dependence on string encodings, a massive cognitive dissonance problem, and the inability to separate storage from compute.

With a semantic data layer, organizations shift the focus from columns to concepts, resulting in encoded knowledge that more closely matches the view of the enterprise, including business objects and relationships that bring real business meaning to the data. Rather than looking backward in a rigid, metrics-like approach, users can move away from traditional rearview-mirror views to a real snapshot in time of what the business is doing and what is revealed by the indicators, metrics, and contextualized data. Given these benefits, one might wonder why organizations took so long to adopt the concept.

In reality, the semantic layer premise has been around for 30+ years. It was often promoted by BI vendors as a way to help companies build purpose-built dashboards, and it was both rigid and complex. As a result, organizations have turned to semantic layers powered by an Enterprise Knowledge Graph, which represents real-world entities and their complex relationships to one another, to streamline the shift from columns to concepts. A knowledge graph-powered semantic layer is capable of providing numerous points of view at the same time and can model complex relationships even if the data is big, siloed, and/or changing. Also, because it describes people, places, things, and how they relate, a semantic data layer promotes self-service and data democratization, and it enables users to answer their questions by interacting with the data directly in a format that makes the most sense to them.
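To make the shift from columns to concepts more concrete, here is a minimal Python sketch that turns relational rows into a small graph of subject, predicate, object triples and then answers a question phrased in terms of people and places rather than joins. The tables, the predicate names (`is_a`, `works_in`, `located_in`), and the helper functions are all hypothetical, and a real Enterprise Knowledge Graph platform would offer far more than this toy pattern matcher.

```python
# Hypothetical relational rows: meaning is implied by column names and foreign keys.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Grace", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "name": "Research", "site": "Berlin"},
    {"dept_id": 20, "name": "Operations", "site": "Boston"},
]


def build_graph(employees, departments):
    """Re-express the rows as (subject, predicate, object) triples."""
    triples = set()
    dept_names = {d["dept_id"]: d["name"] for d in departments}
    for d in departments:
        triples.add((d["name"], "is_a", "Department"))
        triples.add((d["name"], "located_in", d["site"]))
    for e in employees:
        triples.add((e["name"], "is_a", "Person"))
        # The foreign key becomes an explicit, named relationship.
        triples.add((e["name"], "works_in", dept_names[e["dept_id"]]))
    return triples


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def people_at_site(triples, site):
    """Answer 'who works at this site?' by following two relationships."""
    depts = {s for s, _, _ in match(triples, predicate="located_in", obj=site)}
    return sorted(s for s, _, o in match(triples, predicate="works_in") if o in depts)


graph = build_graph(employees, departments)
print(people_at_site(graph, "Berlin"))  # prints ['Ada']
```

The question is asked in business terms (people, departments, sites), and the same triples can be read from the person's, the department's, or the site's point of view without reshaping the data.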

Case in Point: Boehringer Ingelheim

Boehringer Ingelheim, a global pharmaceutical company, realized it needed to connect data from disparate parts of the enterprise to increase research and operational efficiencies, increase output, and ultimately accelerate drug research. With numerous teams of researchers working independently to develop new treatments, data was often siloed within teams, making it difficult to link targets, genes, and disease data across different parts of the company. The company needed a broader approach that would establish a technical foundation for data sharing across the entire organization. That foundation had to link data from across teams, support ontologies to capture how terms relate to one another, and be flexible enough to connect internal experimental results with external, publicly available data of varying quality and formats.

By building a semantic layer on top of their existing data lake, they were able to provide a consolidated one-stop shop for 90 percent of their R&D data. Connecting metadata from across workflow systems, they could integrate information about how samples were generated and stored, which studies were currently underway or completed, and how specific data points were created and stored. The semantic layer allows bioinformaticians to access and work with the data, with no cleaning required, and the data arrives already linked to the proper entities. Users can now search for a particular disease, study, or gene and then explore the results “Wikipedia-style.” Analysts can see directly in the data model how one piece of data relates to the rest of the R&D data, and they can use a light query builder to pull reports with no SPARQL knowledge required.

Why “Meaning” Matters

Democratizing data and generating insights have never been more important to achieving a competitive advantage. Whether organizations are performing advanced analytics to drive decision-making or modeling complex relationships among people, places, and things in data that is both too wide and too big, semantic layers are making a difference in how information is found, used, and leveraged.

By moving from columns to concepts, organizations not only accelerate insights but also let decision-makers use data from any point in the value chain and experience its benefits. Once everyone is engaged, performing cross-domain analysis, and achieving a 360-degree view that includes not just product information but all the domain and business objects that matter to the business, a truly holistic view becomes possible.

Navin Sharma

About Navin Sharma

Navin Sharma is VP of Product at Stardog, the leading Enterprise Knowledge Graph (EKG) platform provider. For more information, follow Stardog @StardogHQ.
