Graph analytics is gaining favor for business applications in which insights must be derived from massive unstructured, connected datasets. But the computational issues differ from those of running analytics on traditional, structured data. RTInsights recently sat down with Katana Graph co-founders Keshav Pingali, CEO, and Chris Rossbach, CTO. We discussed what makes graph analytics different, how to speed the time to results, and how the company’s partnership with Intel fits into the equation. Here is a summary of our conversation.
RTInsights: How does graph analytics differ from traditional analytics?
Pingali: The way we see it at Katana is that as datasets become larger, they tend to become unstructured, and they also tend to become sparse.

I’ll give you an example of what I mean by that. We are all familiar with social networks. A social network graph has a vertex for each person in an organization, and if two people know each other, you put an edge between them in the graph.

If you consider the social network graph of Katana, we have about 25 employees at this point, and everybody knows everybody else, so it’s a very structured graph. That kind of data can be put into SQL tables, and you can use SQL queries with it.

But if you imagine a bigger company, each person will know fewer and fewer people overall. That’s what we mean by sparse. The people one person knows will be a very different group than the people someone else in the company knows. That’s what we mean by unstructured.

It turns out that once data gets to that size – once data becomes sparse and unstructured – it makes sense to process it using what are called graph algorithms. You could use SQL, but it would be very inefficient because SQL is not intended for these sorts of sparse, unstructured datasets.
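To make that concrete, here is a minimal Python sketch (the people and relationships are invented for illustration) that stores a “who knows whom” network as a sparse adjacency list and answers a two-hop question by following only the edges that actually exist. A SQL engine would express the same traversal as repeated self-joins over a relationship table, which gets expensive as the hop count grows.

```python
# Sparse social graph as an adjacency list: each person maps only to the
# people they actually know, so storage grows with edges, not people^2.
from collections import deque

knows = {
    "ana": {"bo", "cy"},
    "bo":  {"ana", "dee"},
    "cy":  {"ana"},
    "dee": {"bo", "eve"},
    "eve": {"dee"},
}

def within_two_hops(start: str) -> set[str]:
    """People reachable from `start` in at most two 'knows' edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        person, dist = frontier.popleft()
        if dist == 2:
            continue  # don't expand past the two-hop limit
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return seen - {start}

print(within_two_hops("ana"))  # {'bo', 'cy', 'dee'}
```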
Rossbach: Yes, the key issue is that sparsity and irregularity argue for a different set of algorithms and different systems support. Graphs obviously are the most succinct and natural way to capture the data being represented. Still, once you represent the data in that form, the kinds of algorithms you need are different. The runtime and the infrastructure you need at the lower layers to implement graph algorithms efficiently are also very different from what you would see in a traditional relational database.
RTInsights: Different as in complex to build? Expensive? Time-consuming? All the above?
Rossbach: I’m not sure I would argue that graph analytics necessarily has higher or lower complexity than a relational database. It’s just a very different way of approaching the problem, which means that the kinds of components you would assemble to solve the problem are different.

You can solve graph problems with a relational database. But, just as Keshav said, that’s very inefficient because a relational database is designed to capture data as very dense, structured groups of things that go together in tables. The algorithms and the storage layers that support that are all designed to be very efficient when dealing with data with that kind of structure. If you’re dealing with sparse data, using that kind of infrastructure no longer makes sense. Further, suppose you’re willing to customize the storage layer, compute engine, and all the lower layers to fit the kinds of data you’re computing over. In that case, you can achieve massive gains in efficiency and performance. That’s the key motivation behind Katana.
RTInsights: What types of datasets lend themselves to graph analytics?
Pingali: Okay, that’s my favorite subject, so let me give you my view on that. We tend to see graphs everywhere there are large, unstructured, sparse datasets.

Katana is engaged with a few pharma companies, and in pharma, they have what are called knowledge graphs. PubMed is an example of a very famous knowledge graph. It is a graph that contains vertices for all biologically active entities, like viruses, bacteria, animals, and more. It also has vertices for biologically active compounds like arsenic, for example, which obviously can kill you if you take it. And it has vertices for authors of articles, as well as vertices for the articles they have written. If you write an article about the effect of arsenic on human beings, then many edges get added to the graph to capture all that connected information.

Pharma companies are trying to mine all the knowledge that currently exists in the articles that have been written in the biological area to find promising treatments for, say, COVID. That’s an example of an area where graphs, and in particular knowledge graphs, are very, very important. We’re providing the computer science expertise, machine learning expertise, and systems expertise that will enable these folks to do their work more efficiently.
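As a rough illustration of the structure Pingali describes (the author, article, and query below are hypothetical examples, not PubMed data), a knowledge graph can be sketched as typed vertices plus labeled edges, and one article really does add several edges at once:

```python
# A tiny knowledge-graph sketch: typed vertices, labeled directed edges.
nodes = {}   # id -> {"kind": ...}
edges = []   # (src, label, dst)

def add_node(node_id, kind):
    nodes[node_id] = {"kind": kind}

def add_edge(src, label, dst):
    edges.append((src, label, dst))

add_node("arsenic", "compound")
add_node("homo_sapiens", "organism")
add_node("dr_roe", "author")        # hypothetical author
add_node("article_42", "article")   # hypothetical article

# One article about arsenic's effect on humans adds several edges at once.
add_edge("dr_roe", "wrote", "article_42")
add_edge("article_42", "mentions", "arsenic")
add_edge("article_42", "mentions", "homo_sapiens")

def co_mentioned_compounds(organism):
    """Compounds mentioned by any article that also mentions `organism`."""
    articles = {s for s, l, d in edges if l == "mentions" and d == organism}
    return {d for s, l, d in edges
            if s in articles and l == "mentions"
            and nodes[d]["kind"] == "compound"}

print(co_mentioned_compounds("homo_sapiens"))  # {'arsenic'}
```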
Another area where we are currently engaged is security, in identity management and online intrusion detection in networks. You have a computer network, and bad guys are trying to break in. You can build a graph that captures all the activities going on in the network. Then, if you see certain forbidden patterns, you raise an alarm. We worked with BAE to build an intrusion detection system for them as part of a DARPA [Defense Advanced Research Projects Agency] project, and it was very successful.
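Here is a minimal sketch of the “forbidden pattern” idea (the events and the pattern are invented; real intrusion-detection rules are far richer): treat network activity as labeled edges and raise an alarm if some path matches a suspicious sequence of actions in order.

```python
# Activity graph: edges are (actor, action, target). A forbidden pattern is a
# sequence of action labels; an alarm fires if some path matches it in order.
activity = [
    ("ext_host", "connect", "gateway"),
    ("gateway", "login", "db_server"),
    ("db_server", "escalate", "root_shell"),
]

FORBIDDEN = ("connect", "login", "escalate")

def matches_forbidden(edges, pattern):
    """Depth-first search for a path whose edge labels equal `pattern`."""
    by_src = {}
    for src, label, dst in edges:
        by_src.setdefault(src, []).append((label, dst))

    def walk(node, i):
        if i == len(pattern):
            return True  # matched every label in the pattern
        return any(label == pattern[i] and walk(dst, i + 1)
                   for label, dst in by_src.get(node, ()))

    return any(walk(src, 0) for src in {e[0] for e in edges})

if matches_forbidden(activity, FORBIDDEN):
    print("ALARM: forbidden activity pattern detected")
```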
Still another area where graphs show up a lot is the financial services sector. All major banks want what are known as 360-degree views of their customers. For example, if you have a mortgage, they may come to you and say, “Oh, you know what? Maybe you should refinance your mortgage because we’ve looked at your spending patterns, and we think this might be a better deal for you.”
The final area I want to mention is that you can use graph analytics in workflows for designing electronic circuits on chips. The gates or pins are the vertices, and the wires are the edges of the graph. We’re currently engaged with some chip design companies. We’ve shown them that we can use our graph engine to do many things, like partitioning the circuit, placing the gates, and wiring the gates, faster than they can with their current approach.
So basically, we find graphs everywhere we look, from circuits to pharma to banking to online security.
Rossbach: I’d like to add a little bit from a programmer/developer-facing view on that question. As Keshav stated, there’s a very wide range of areas and use cases where a graph is a natural data structure for thinking about problems. But graphs also make it much easier to reduce your data, which shortens the time to insight when you work with a dataset as a graph rather than in other modalities.
Consider the traditional way I might try to understand a large dataset. I must first put the data into a database, which means I need to introduce a schema. There’s a much lower barrier with graphs because you don’t necessarily need a schema. You just need to define the nodes and the edges. That’s an incredibly attractive property of graph datasets in terms of shortening the time to insight and action.
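A small sketch of that lower barrier (the records below are invented): heterogeneous facts go straight into a node/edge structure with no up-front schema, and attributes can differ from node to node.

```python
# No CREATE TABLE step: nodes and edges accept whatever attributes arrive.
graph = {"nodes": {}, "edges": []}

def upsert_node(node_id, **attrs):
    graph["nodes"].setdefault(node_id, {}).update(attrs)

def add_edge(src, dst, **attrs):
    graph["edges"].append({"src": src, "dst": dst, **attrs})

# Mixed-shape records from different sources, ingested as-is.
upsert_node("cust_17", kind="customer", segment="retail")
upsert_node("acct_9", kind="account", opened="2021-03-04")
upsert_node("txn_301", kind="transaction", amount=129.95)

add_edge("cust_17", "acct_9", rel="owns")
add_edge("acct_9", "txn_301", rel="posted")

# A relational design would need agreed columns per table before any of
# this could be loaded; here the model can evolve record by record.
print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "edges")
```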
RTInsights: Would this translate into shorter times to develop the applications?
Rossbach: Yes, I believe so. If you look at the traditional development lifecycle for applications that consume large amounts of data, a big chunk of time is allocated to data design, data cleaning, data management, and data input. You have to manage your data. You have to get it in and out of your system. The degree of effort required to create a model and enforce it is much, much lower with a graph database. There is value in having a very clean and easy programming model, as well.
RTInsights: At the heart of your offering is the ability to scale up graph analytics. Why is that important?
Rossbach: Nobody is going to have less data tomorrow than they have today. Graph analytics is a great way of dealing with data you don’t understand yet. From Katana’s perspective, scaling out graph algorithms is something Keshav’s group has spent many, many years researching. It is a hard problem, and they are getting good at it. As technology trends move toward bigger and bigger data, it’s an area where Katana has a fundamental advantage.
Pingali: And just to give you a few numbers, one of the companies we’re working with has a graph with more than four trillion edges. That’s how big their graph is, and obviously, they want to process it instantaneously: faster, please! They don’t want to wait for insights from a graph of that size. And they also want to be able to ingest new data faster, which is equally important. We are not talking about a static graph but a graph that updates as transactions are happening. These activities happen in real time, so you need to ingest that new data and then update your graph in real time. That’s another problem that we’re addressing. There might be a billion events every 15 minutes that need to be ingested and applied to the graph. That gives you an idea of the problem’s scale and why having scale-out solutions for graph analytics is so important.
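A toy sketch of the real-time ingestion problem Pingali raises (the event stream and fields are illustrative, not Katana’s implementation): edges arrive as a stream and the in-memory graph is updated incrementally rather than rebuilt.

```python
# Incremental graph updates from an event stream. A production engine would
# shard this across hosts and batch updates; the shape of the work is the same.
from collections import defaultdict

adjacency = defaultdict(set)
edge_count = 0

def ingest(event):
    """Apply one (src, dst) transaction edge to the live graph."""
    global edge_count
    src, dst = event
    if dst not in adjacency[src]:
        adjacency[src].add(dst)
        edge_count += 1

stream = [("a", "b"), ("b", "c"), ("a", "b"), ("c", "a")]  # duplicates arrive too
for event in stream:
    ingest(event)

print(edge_count, "unique edges after ingesting", len(stream), "events")  # 3 ... 4
```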
RTInsights: How do you accomplish the scale-up capability?
Rossbach: A lot of that is through great, well-researched, clever algorithms. Careful partitioning of graphs makes it possible to distribute them over more and more computers without being overwhelmed by the communication overhead. Intuitively, I can see why people think that doing distributed graph computation is difficult. If you partition a large graph, you put it on lots of different hosts to compute over it, and fundamentally those computers are going to have to talk to each other.
We’ve had great success in the past doing parallel distributed processing of big data with legacy engines like MapReduce and Spark because partitioning the data is very straightforward for traditional dense or relational data. With graph algorithms, that is not so much the case: when you’re traversing edges and following paths in the graph, it’s much more difficult to predict when you’re going to need a piece of data that is in some other partition. And how are you going to get a piece of data that’s in some other partition? You can imagine that it is quite bad for performance if done inefficiently.
In addition to the algorithmic innovations that Keshav’s group has developed, there’s also key research into how you partition graphs in a way that maximizes efficiency and minimizes communication.
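To illustrate why partitioning quality matters (the hosts, vertices, and hash placement below are invented for illustration), this sketch places vertices on hosts and counts “cut” edges, i.e., edges whose endpoints land on different hosts. Every cut edge implies cross-host communication during a traversal, which is exactly what a good partitioner minimizes.

```python
# Partition vertices over K hosts and measure the edge cut: the number of
# edges whose endpoints live on different hosts.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
K = 2  # number of hosts

def host_of(vertex: str) -> int:
    # Deterministic toy hash; real systems use graph-aware partitioners
    # to minimize the cut instead of placing vertices blindly.
    return sum(map(ord, vertex)) % K

naive_cut = sum(1 for u, v in edges if host_of(u) != host_of(v))

# A placement that keeps the tightly connected triangle {a, b, c} together.
better = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
better_cut = sum(1 for u, v in edges if better[u] != better[v])

print(f"naive placement cuts {naive_cut} of {len(edges)} edges")   # 4 of 6
print(f"better placement cuts {better_cut} of {len(edges)} edges")  # 2 of 6
```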
That’s at the top layer. If you start following the technology stack downwards, what you see in Katana is a tiered design that preserves, all the way down through the stack, the performance benefits derived from clever partitioning and minimized communication.
Some of our customers have graphs with trillions of edges. We store those in a way that makes it possible for us to quickly get that data into the graph engine’s memory. That helps us deliver performance. Doing that involves storage layer design, runtime design, and so on, all the way up the stack.
RTInsights: What has your relationship with Intel allowed you to do?
Rossbach: Speed is the answer. By collaborating with Intel, we can take advantage of emerging CPU features and exploit them for performance. Knowing how to optimize Katana’s software runtime every time Intel comes out with a new performance-focused solution, and being able to take advantage of it quickly and effectively, is an advantage. But I think there are also big advantages when it comes to the panoply of emerging hardware that we’re seeing come out of Intel.
Some of that boils down to already-shipping, more mature technologies, like Intel Optane Persistent Memory. Persistent memory has a compelling advantage. I was talking about how we operate over partitioned graphs to deal with scale. Well, if you’re storing things in persistent memory, it’s already in memory. That’s one advantage.
There’s also the fact that persistent memory is often used as a lower tier in the memory hierarchy to give the abstraction of much, much higher DRAM capacity than is physically available. And guess what? The more DRAM you have, the larger your host partitions can be. That translates directly to less communication when you’re doing compute over irregular data. So, I would say graph computing is one of the domains where Optane has very compelling performance.
Intel also is coming out with some very beefy and compelling GPUs. We’ve invested a lot of effort in figuring out how to do graph algorithms well on GPUs. Collaborating with Intel is bringing an advantage there.
And, of course, there are many efforts at Intel with other forms of hardware acceleration. Things like FPGAs are clearly something that most people know about. Then there are graph accelerators. That’s a research area that both Keshav and I have worked in quite a bit. We’ve explored both hardware acceleration algorithms and runtime support for them.
Pingali: Intel, like every other company in the tech space, realizes the importance of AI and machine learning. Everybody’s trying to use AI and machine learning applications to drive the hardware that they build and to rebrand themselves as AI companies.

We’re working with Intel to understand how to redesign some of their hardware for AI and machine learning applications. That complements what Chris said about using their current hardware offerings like Optane. We’re also working with them to see where they might go next.
And then, from our perspective, they have a huge customer base, and most of those customers need graphs. They approach Intel and ask: “Can you optimize this graph application for us?” What this partnership allows us to do is to step in at that point and say, “Oh, by the way, here’s what we can do for graphs on Intel hardware.”
RTInsights: So, it’s almost as though the rising tide will lift all the boats.
Pingali: That’s right. We see this as a win-win for everybody. People are very familiar with relational databases and SQL, but they are not as familiar with graph databases and graph analytics. The entire area is a wonderful opportunity for startups like us because we know what to do. Having Intel as a partner helps us with this missionary activity, so to speak, of proselytizing graphs and making converts of everybody!
Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain, clearly, what it is they do.