Cognite unifies siloed industrial data within organizations to help them get the most out of their generative AI efforts.
ChatGPT has created an explosion in interest in generative AI. As businesses become more comfortable with the technology, they are looking for more sophisticated uses than merely a front-end to existing applications. One area that is getting much attention because of the potential benefits is employing generative AI to improve industrial workflows.
RTInsights recently sat down with Ben Skal, Senior Director of Product Marketing at Cognite, to talk about where generative AI might help in this area, what is required for it to work, and the impact it could have in industrial environments.
Here is a summary of our conversation.
RTInsights: Popularized by the explosive use of ChatGPT, everyone’s talking about generative AI. In what general areas is this technology being considered to improve industrial workflows?
Skal: I would like to start this answer by addressing what the benefit of generative AI is for industrial companies. At the 10,000-foot level, generative AI will make it easier for people out in the field, subject matter experts, production managers, and even data scientists to work with data. It’s about providing simple access to complex industrial data.
As far as what problems it is capable of solving, generative AI isn’t a use case on its own. Generative AI is a technology, and it’s going to be targeted at many of the operational use cases industrial companies have been trying to solve for a long, long time. That includes asset performance, optimizing throughput, improving the ability to plan and execute maintenance, addressing challenges related to improving quality, and more.
Generative AI will 1) improve industrial companies’ ability to generate insights from industrial data and 2) speed up the ability to problem solve. For example, generative AI could find all of the relevant information about a pump and do it in a way that will reduce the complexity and shift away from what is the norm today. What you normally see is that 80% of the time is spent gathering the data, and only 20% is actually used to do the assessment of the problem. We can flip that script where only 10% to 20% of the time is used to actually find the right information, and most of the time can be spent on actually doing the assessment.
RTInsights: What are some of the issues organizations must be aware of to ensure that the results are accurate, useful, and that type of thing? And can you also talk about hallucinations?
Skal: Generative AI thrives on context. To be able to get accurate, trusted responses, generative AI needs industrial context. If you consider a lot of the large language models (LLMs) that exist today, they were trained on billions of data points from the internet. But as far as understanding an individual user’s industrial context, what’s happening at a specific site or across a specific enterprise, that context doesn’t exist. And so, you need to be able to provide industrial context to an LLM to provide accurate, deterministic answers.
If you already have a data lake with centralized data, it may be tempting to combine it with a large language model in hopes that it will allow everyone to start using generative AI. Having the data centralized is great, but industrial data is complex. So, without context, asking an LLM to answer a question using a data lake likely won’t provide the correct answer because it doesn’t know how the time series data is related to a specific asset, work order, engineering drawing, or document. What needs to happen is all of those data sources need to be connected and mapped in a foundational way so that when a user asks an LLM a question, it can understand the context of that question and answer it in a deterministic way.
Without that context, you run a high risk of getting a hallucination (an incorrect or invented response from a large language model). If ChatGPT or any large language model is trying to answer questions on data where it doesn’t understand the context, the risk of hallucinations is very, very high. That’s why providing context to those large language models is a critical issue that organizations must be aware of.
RTInsights: What are the requirements to address these issues?
Skal: There are a few steps to address context in LLMs. It starts with being able to unify data. In most industrial environments today, data still exists in individual silos across operations, IT, and engineering. Being able to unify that data is what we call “the industrial data problem.” At Cognite, we’ve been talking about solving industrial data and AI problems for a while. You first need to unify the data before you can provide context to the LLMs.
Unifying data still doesn’t provide context. Context is the ability to represent the relationships between all of the different data sources. That means being able to connect time series data to the related work order to the appropriate engineering diagrams or models of a specific asset. You need to map and create those relationships in an automated way. We’re talking about hundreds of thousands of data points, and automating that process can save thousands of hours. Artificial intelligence is a key to automating this process, for example, using optical character recognition (OCR) to identify equipment on previously flat piping and instrumentation diagrams (P&IDs) or process flow diagrams (PFDs).
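The automated mapping step described above can be illustrated with a minimal sketch: matching equipment tags (as they might come out of OCR on a P&ID) against time series names. All names and tags here are invented for illustration; a production contextualization pipeline would use far more robust matching.

```python
import re

def normalize_tag(tag: str) -> str:
    """Reduce an equipment tag (e.g. 'P-101A', 'p 101 a') to a canonical form."""
    return re.sub(r"[^A-Z0-9]", "", tag.upper())

def map_tags_to_series(ocr_tags, series_names):
    """Map tags found on a diagram to time series whose names contain the tag."""
    mapping = {}
    for tag in ocr_tags:
        canon = normalize_tag(tag)
        mapping[tag] = [s for s in series_names if canon in normalize_tag(s)]
    return mapping

ocr_tags = ["P-101A", "V 203"]
series_names = [
    "pump.P101A.discharge_pressure",
    "valve.V203.position",
    "pump.P102.flow",
]
print(map_tags_to_series(ocr_tags, series_names))
# 'P-101A' matches the P101A series; 'V 203' matches the V203 series
```

Even this toy version shows why automation pays off: the same normalize-and-match rule applies uniformly across hundreds of thousands of tags instead of being done by hand.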
Once you start mapping these relationships, you start to build a data foundation that we refer to as an industrial knowledge graph. That knowledge graph has nodes, which are all these different entities, the time series, work orders, and 3D models, connected with edges that link these entities in a way that makes sense to you as a user.
Next, when you type a prompt to the large language model about a specific piece of equipment, you want to provide the LLM with a specific subset of that data through a process called retrieval-augmented generation (RAG).
Let’s go through an example. Imagine I ask a large language model, “Can you show me what the maximum operating pressure of this specific piece of equipment is designed for?” The retrieval augmented generation will take this prompt and say, let’s collect all the information we know about that piece of equipment. It will then provide that context to the LLM and, I’m being overly simplistic, say, “Answer that question with this subset of information.” And most importantly, so that it can be trusted, “Show me the document that you used to answer that question.” In that way, the end user can see the response to the question. “The max operating pressure is X, and here’s the document the LLM used to provide that answer.”
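The retrieve-then-ground flow in that example can be sketched as follows. The document store, asset IDs, and prompt wording are illustrative assumptions; the point is only the shape of RAG: retrieve the subset of data about the equipment, then instruct the model to answer from that subset and cite its source.

```python
def retrieve(knowledge, asset_id):
    """Retrieval step: collect everything linked to the asset in the data foundation."""
    return [d for d in knowledge if d["asset"] == asset_id]

def build_prompt(question, context_docs):
    """Augmentation step: ground the question in the retrieved context only."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in context_docs)
    return (
        "Answer using ONLY the context below, and cite the source document.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy document store standing in for the industrial knowledge graph.
knowledge = [
    {"asset": "pump-101", "source": "OEM-spec.pdf",
     "text": "Maximum operating pressure: 16 bar."},
    {"asset": "pump-102", "source": "OEM-spec-2.pdf",
     "text": "Maximum operating pressure: 12 bar."},
]

docs = retrieve(knowledge, "pump-101")
prompt = build_prompt("What is the maximum operating pressure?", docs)
print(prompt)  # only pump-101 context reaches the model, with its source cited
```

Because the source identifier travels with each snippet, the model's answer can point back to the document it used, which is what makes the response verifiable by the end user.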
There are two more critical requirements. The first is ensuring that data doesn’t get exposed over a public network. When I am interacting with a large language model, I don’t want any of my proprietary data to be exposed. So, being able to interact with a large language model while keeping all the data private is very important for IP and security.
The final piece is that you need access control over what data is shared with the large language model. So, even though I have this data foundation built in an industrial knowledge graph, I only want the LLM to access the data sets a given user is authorized to see.
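That access-control gate can be sketched as a simple filter applied before any data reaches the model. The group names and data sets below are hypothetical; real systems would enforce this in the data platform's permission layer rather than in application code.

```python
def accessible(datasets, user_groups):
    """Keep only the data sets the requesting user may expose to the LLM."""
    return [d for d in datasets if d["acl"] & user_groups]

# Each data set carries an access-control list of permitted groups.
datasets = [
    {"name": "maintenance-logs", "acl": {"maintenance", "admin"}},
    {"name": "hr-records", "acl": {"hr"}},
]

# A maintenance user sees only maintenance data; the retrieval step
# (and therefore the LLM) never receives the rest.
print(accessible(datasets, {"maintenance"}))
```

The key design point is that filtering happens before retrieval, so unauthorized data can never appear in a prompt, rather than trusting the model to withhold it.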
RTInsights: How does Cognite help in general and in areas like improving semantic search?
Skal: Cognite is in a really strong position to help industrial companies adopt generative AI solutions because, from the beginning, we’ve been about building that data foundation. We solve the industrial data problem. Our Industrial DataOps platform, Cognite Data Fusion®, connects all the siloed data sources and puts them into context. And then, importantly, it provides open and stable APIs that allow anybody to interact with that data through our own applications and through any applications our partners, manufacturers, industrial customers, etc., may already use. For example, you may use a visualization application like Power BI or Grafana, or a Jupyter Notebook for data science. Those applications work natively with Cognite’s open API.
Because Cognite Data Fusion® started with the idea of building that data foundation, we have been able to quickly adopt a comprehensive set of generative AI capabilities – called Cognite AI – that enables LLMs to interact with that data in the industrial knowledge graph.
It works very similarly to what I was just saying. The user puts in a prompt and asks a very specific question. It could be something like, “Show me all the pumps manufactured before this date by a specific manufacturer that are in poor asset health and don’t have a work order available.” Answering a question like this might require multiple sources. That prompt then gets translated using retrieval-augmented generation to provide all of the relevant pump information available in our core product, Cognite Data Fusion®.
That is how we provide the industrial context from the knowledge graph to an LLM to be able to give those trusted, deterministic responses. Now, as far as how we see end users interacting with LLMs, I’ll touch on three examples.
The first is semantic search. And that is the example that we discussed above, finding very specific information in a data model about a specific piece of equipment. We intend this to be done both at the desk and out in the field. It would be used when an operator walking around a facility is trying to understand whether a pump is operating within its normal limits without having to go through all the documentation to find those limits. Instead, the operator would type a question, and generative AI would answer what it believes the operating limits to be and provide the OEM specification document for that pump so the answer can be verified.
The second piece is around summarization. You can think about taking a month of maintenance or shift reports and saying, “Provide me a summary of all the activities related to our critical assets.” The LLM would create a summary from these reports.
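The summarization use case amounts to selecting the relevant reports and handing them to the model with a summarization instruction. The report fields and asset names below are invented for illustration; only the prompt-assembly pattern is the point.

```python
def summarization_prompt(reports, critical_assets):
    """Assemble an LLM summarization prompt from shift reports,
    keeping only entries that mention a critical asset."""
    relevant = [r for r in reports if r["asset"] in critical_assets]
    body = "\n".join(f"{r['date']} {r['asset']}: {r['note']}" for r in relevant)
    return "Summarize all activities related to our critical assets:\n" + body

# A toy month of shift reports.
reports = [
    {"date": "2023-06-01", "asset": "compressor-A", "note": "Vibration check, within limits."},
    {"date": "2023-06-02", "asset": "pump-7", "note": "Seal replaced."},
]

prompt = summarization_prompt(reports, {"compressor-A"})
print(prompt)  # only critical-asset entries are sent to the LLM
```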
The third piece is code generation. And this is really, really interesting because it’s going to allow people who don’t have coding experience and aren’t professional coders, like me, to be able to create their own applications. Being able to use natural language to say, “Build an application that will provide all open work orders and time series data when I search for a piece of equipment.” And the LLM writes the code for an application to do just that. Now, that’s a really simple application, but the idea is that end users who weren’t trained how to code can now be able to create their own applications on the fly.
RTInsights: Can you give us some examples of how all of this will benefit industrial customers in the near term and further out into the future?
Skal: In the near term, the first things that you will see in Cognite Data Fusion are related to semantic searching. That will allow people to search for documents out in the field with the generative AI-powered search. They will be able to search through data models, drill down, and find very specific information very quickly. This will be in Cognite Data Fusion this year.
In the longer term, and I’ll say longer term is six to 12 months, you will see the ability to create applications using natural language. You’ll also start seeing co-pilots in products. Co-pilots will extend beyond search for use cases like root cause analysis, for example. We know workforce turnover in industrial facilities is a real problem, and it’s not so challenging to train someone on a steady-state process, but it’s really challenging to train people on how to react when processes are disrupted.
Imagine having a co-pilot search when you have an issue with a piece of equipment or a process. In such a use case, you might ask, “Provide me with all the relevant data for this specific process.” Or “I think the problem is with this specific piece of equipment. What are the potential issues? What do you recommend is the next step for troubleshooting?”
Broadly speaking, what’s going to be interesting is the deconstruction of how users interact with data because of generative AI. Traditionally, users mostly interacted with data through applications. I needed a prebuilt application that allowed me to track asset health or predict product quality. How we think about applications will change dramatically with a co-pilot that can create applications or generate insights on the fly. In the not-too-distant future, a subject matter expert with no coding experience will be able to build their own tailored applications using natural language. This will dramatically increase the rate at which new applications can be developed without having to rely heavily on developers.
We just announced a new no-code workspace called Industrial Canvas. It’s a collaborative environment where all contextualized data (drawings, 3D models, work orders, and time series) is available in a single workspace, and a co-pilot is able to find, populate, and support root cause analysis within that environment. Industrial Canvas creates an open environment with simple access to complex industrial data and enables end users to interact with their data, using generative AI in their language and on their terms.