
LinkedIn Open Sources Tool to Deploy TensorFlow on Hadoop


LinkedIn tool reduces the amount of time required to create AI models by making massive amounts of data stored in Hadoop more accessible.

Written By
Michael Vizard
Sep 13, 2018

At the Strata Data Conference today, LinkedIn announced it is releasing as an open source project the code it developed to run the open source TensorFlow framework for building artificial intelligence (AI) applications on Hadoop clusters managed by Yet Another Resource Negotiator (YARN).

TensorFlow on YARN (TonY) was originally developed to facilitate access to the massive amounts of data LinkedIn requires to feed AI models built using TensorFlow that employ deep learning algorithms, also known as neural networks. LinkedIn uses those models to enhance the relevance of feeds and to enable its Smart Replies capability on its social media network.
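For context, here is a minimal sketch of the kind of distributed TensorFlow entry point a tool like TonY would launch inside each YARN container. It assumes the launcher passes the cluster layout to each task through the standard TF_CONFIG environment variable that distributed TensorFlow expects; the paths and names are illustrative, not taken from LinkedIn's code.

```python
import json
import os

import tensorflow as tf

# Assumption: the launcher (e.g., TonY) sets TF_CONFIG in every worker and
# parameter-server container it starts on YARN. This is the standard way
# distributed TensorFlow (1.x) discovers the cluster.
tf_config = json.loads(os.environ["TF_CONFIG"])
cluster = tf.train.ClusterSpec(tf_config["cluster"])
task = tf_config["task"]  # e.g., {"type": "worker", "index": 0}

# Start this container's gRPC server and join the cluster.
server = tf.train.Server(cluster, job_name=task["type"], task_index=task["index"])

if task["type"] == "ps":
    # Parameter servers only hold model variables and serve them to workers.
    server.join()
else:
    # Workers build the graph; variables are placed on the parameter servers.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        # Model definition and training loop go here, reading training data
        # directly from storage the cluster already holds, e.g.
        # tf.data.TFRecordDataset("hdfs://namenode/data/train.tfrecord")  # hypothetical path
        pass
```

The appeal Hung describes is that training can read directly from the data already stored in Hadoop rather than staging copies of it on a separate training cluster.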

See also: LinkedIn sees edge computing as the future of the data center

TonY proved especially useful in reducing the amount of time required to create AI models by making massive amounts of data stored in Hadoop more accessible, says Jonathan Hung, senior software engineer for LinkedIn.

“We wanted to speed up the training,” says Hung.

That’s critical because AI training takes place on graphics processing units (GPUs), which are expensive resources that need to be used as efficiently as possible, notes Hung.

Training remains the most challenging aspect of building AI models. Each model needs access to massive amounts of data to increase the accuracy of the machine and deep learning algorithms being applied. Hadoop provides a natural source for that data, which can be aggregated and managed more easily across multiple clusters.

The baseline for TonY has already been completed. LinkedIn expects organizations to extend TonY for use cases that go beyond social media networks, says Hung.

TensorFlow has emerged as a flexible framework for building AI applications that can be deployed on top of everything from Hadoop to Kubernetes clusters. That’s critical because while AI models tend to be built and trained in the cloud or the data center, the models themselves tend to be deployed as close as possible to the processes they are intended to automate, often at the network edge. AI models need to interact with those processes in near real time, and deploying them in distant data centers often creates latency issues that prevent AI recommendations from being generated in near real time. In many cases, those AI models tap directly into data as it streams from the network edge.

Just about every application will soon either have AI models embedded within it or be able to access an AI model via REST application programming interfaces (APIs). Most organizations are still in the early stages of mastering AI. In fact, AI will force many of them to finally embrace more consistent approaches to data management. Data may be the new oil, but few organizations have the pipelines and refineries in place to process it in a way AI models can consume. Those pipelines and refineries will require IT organizations not only to acquire new tools, but also to master the processes required to pervasively embed AI models across the distributed enterprise.
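As a rough illustration of that REST pattern, the snippet below queries a model exported behind TensorFlow Serving’s HTTP prediction endpoint; the host, port, model name, and feature values are hypothetical stand-ins.

```python
import requests

# Hypothetical endpoint: a model named "feed_ranker" served by TensorFlow
# Serving on its default REST port (8501).
URL = "http://edge-node.example.com:8501/v1/models/feed_ranker:predict"

# A single stand-in feature vector; a real application would send the
# features its model was trained on.
payload = {"instances": [[0.42, 1.7, 3.1, 0.05]]}

# Near-real-time use cases leave little room for latency, so keep the
# timeout tight and fail fast if the model server cannot answer in time.
resp = requests.post(URL, json=payload, timeout=1.0)
resp.raise_for_status()

print(resp.json()["predictions"])
```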

Naturally, that will take a while to occur. But as more open source AI tools become available, the range of AI expertise in the enterprise should increase considerably in the months and years ahead.
