Hadoop Data in the Dark? How Governance, Metadata Can Help

Data governance and metadata synchronization can prevent Hadoop data from going dark.

Written By
Joel Hans
Oct 18, 2016

“We have to evolve how we manage data,” said Philip Russom, TDWI’s senior research director for data management, during a recent webinar on Hadoop’s role in big data.

For those entrenched in the big data world, this isn’t necessarily a new idea. The issue has been around ever since the advent of big data itself.

According to Russom, though, properly managing data comes down to how businesses navigate governance, control metadata, and make analytics accessible. His research suggests that the difficulties in all three areas can be addressed with Hadoop’s open-source architecture.

Russom said, “Hadoop is known for its linear scalability. Hadoop can become, essentially, a bigger and better data staging area for both warehousing and data integration.” It’s not just a storage space, but also a processing engine for handling massive volumes of data, which makes it relevant for companies that get their data from sensors or telematics.

He added: “Hadoop has desirable use cases, but it can be a challenge in terms of data governance. Don’t forget—Hadoop is still kind of new, and it’s still kind of spartan in a lot of ways. That’s part of the secret sauce.”

Most relevant to businesses is governance, followed by managing metadata. Once those are in place, businesses can start creating systems that allow “self-service” analytics, essentially enabling non-programmer employees to use analytics to make big business decisions.

Hadoop data governance challenges

Russom noted that while Hadoop offers enormous advantages in linear scalability and the ability to offload data analytics, it does have some governance concerns. “Hadoop is good with high availability, but … Hadoop replicates data into multiple places,” he said. “That’s actually a data lineage problem. It’s hard to know where data went, how many copies there are. That’s a governance issue.”
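
To make the lineage problem Russom describes concrete, here is a minimal Python sketch of a catalog that records where each copy of a dataset lives and what it was copied from. It is an illustration only: `LineageCatalog`, its methods, and the file paths are hypothetical names invented for this example, not part of Hadoop or any Hadoop tool, and the sketch tracks copies made by data pipelines rather than inspecting HDFS block replicas.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LineageCatalog:
    """Hypothetical in-memory lineage catalog (not a Hadoop API)."""

    # Content fingerprint -> set of locations that hold that exact content.
    copies: dict = field(default_factory=lambda: defaultdict(set))
    # Location -> the location it was copied from (None for an original).
    parent: dict = field(default_factory=dict)

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # A SHA-256 digest identifies identical content across locations.
        return hashlib.sha256(content).hexdigest()

    def register(self, location: str, content: bytes, copied_from=None) -> str:
        """Record that `location` holds `content`, optionally as a copy."""
        fp = self.fingerprint(content)
        self.copies[fp].add(location)
        self.parent[location] = copied_from
        return fp

    def locations_of(self, content: bytes) -> list:
        """Answer 'how many copies are there, and where?'"""
        return sorted(self.copies.get(self.fingerprint(content), set()))

    def trace(self, location: str) -> list:
        """Answer 'where did this data come from?' by walking back to the source."""
        path = [location]
        while self.parent.get(path[-1]) is not None:
            path.append(self.parent[path[-1]])
        return path


# Example with made-up paths: one file copied through two later stages.
catalog = LineageCatalog()
data = b"sensor_id,reading\n42,17.5\n"
catalog.register("/landing/sensors.csv", data)
catalog.register("/staging/sensors.csv", data, copied_from="/landing/sensors.csv")
catalog.register("/warehouse/sensors.csv", data, copied_from="/staging/sensors.csv")

print(catalog.locations_of(data))
print(catalog.trace("/warehouse/sensors.csv"))
```

The design choice here is to key copies by a content fingerprint, so identical data is recognized wherever it lands. A production governance tool would need persistent storage and integration with the platform itself, which this sketch does not attempt.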

To tackle this, Russom insists that companies need to think of data governance as much more than business compliance, or negotiating the often-complex regulations that define a business’s activities. Technical standards need to be incorporated into the wider picture. These guidelines can cover data usage, privacy, and security, and if they’re not followed, a business can expose itself to legal issues and potential erosion of its brand’s value.

Metadata is a part of this as well. Jean-Michel Franco, the director of product marketing at Talend, said in the webinar that delivering metadata by design and synchronizing it across data platforms is critical not only to keeping that metadata under control, but also to establishing the self-service tools that enable all employees to make analytics-based decisions.

Where Hadoop gets it right

“The primary path to getting business value from big data, and a lot of new data, like machine data, is through analytics,” Russom said. “There are challenges around Hadoop, but I don’t see them stopping anybody.”

Cost and complexity are additional challenges that can make companies stumble if they’re not prepared (see “Help! I’ve Been Told I’m Supposed to Use the Cloud and Hadoop But I Don’t Know Why”). TDWI has even published an online tool to help gauge Hadoop readiness.

Franco said that Accolade, which brands itself as an “on-demand healthcare concierge,” has been using Hadoop-enabled analytics to achieve important efficiency gains in its processes. Now it can better individualize services for those in need of medical care, which has resulted in a 75 percent drop in onboarding effort and time. Many other industries are finding similar success.

While speed can be key to any big data investment, companies may be better served by first ensuring that whatever tools they use will assist in the governance process.

Joel Hans

Joel Hans is a copywriter and technical content creator for open source, B2B, and SaaS companies at Commit Copy, bringing experience in infrastructure monitoring, time-series databases, blockchain, streaming analytics, and more. Find him on Twitter @joelhans.
