How Twitter Overcame Its Real-Time Data Challenges

Learn how Twitter addressed its real-time processing needs with a distributed, fault-tolerant stream processing engine.

Written By
Kat Campise
Jul 11, 2017

Twitter’s real-time, back-end infrastructure processes more than 500 million tweets per day. However, in 2013, when Karthik Ramasamy’s company, Locomatix, was purchased by Twitter, the 7-year-old social media platform needed to upgrade its real-time processing and analytics engine. Ramasamy, until recently an engineering manager at Twitter, told the audience at Qubole’s 2017 Data Platforms Conference how the company tackled this challenge.

Ramasamy says that data has its highest value when it’s first produced. Certainly, in the realm of marketing, understanding shifts in consumer behavior and responding to them proactively requires real-time data analytics. Being proactive rather than reactive is the fine line that separates success from failure in any industry.

Heron is born

To address Twitter’s real-time processing needs, Ramasamy and his team developed Heron, a “real time, distributed, and fault-tolerant-stream processing engine,” which has been in use since 2014.

Heron established a stable architecture for Twitter to achieve the following:

  • Provide backward interface compatibility with Apache Storm.
  • Extract, transform and load data in real time.
  • Disaggregate and classify data as it’s being created by Twitter users.
  • Quickly identify and take action regarding fraudulent Twitter accounts.
  • Improve the speed of real-time trending.
  • Perpetually and rapidly update machine learning models to match real-time data processing.
  • Classify, almost immediately, the mass of media flowing through users’ Twitter feeds.
  • Analyze machine (server) health quickly to predict the probability of failures in network and memory capacity.
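
To make the "real-time trending" item above more concrete, here is a minimal sketch of the general idea: a transform step that pulls hashtags out of a stream of tweets, followed by a sliding-window count that reports the current leaders. This is plain Python and is not Heron's API or Twitter's implementation; the function names, the window size, and the sample tweets are all assumptions made up for illustration.

```python
import re
from collections import Counter, deque
from typing import Iterable, Iterator, List, Tuple

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(tweets: Iterable[str]) -> Iterator[str]:
    """Transform step: emit one lowercased hashtag per occurrence."""
    for text in tweets:
        for tag in HASHTAG.findall(text):
            yield tag.lower()


def trending(
    tags: Iterable[str], window: int, top_n: int
) -> Iterator[List[Tuple[str, int]]]:
    """After each incoming tag, yield the top_n tags among the last `window` tags."""
    recent = deque()    # tags currently inside the window, oldest first
    counts = Counter()  # running count of each tag inside the window
    for tag in tags:
        recent.append(tag)
        counts[tag] += 1
        if len(recent) > window:
            # Evict the oldest tag so counts reflect only the window.
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        yield counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample input; a real system would read from a live stream.
    sample = ["Go #Heron", "#heron beats #storm", "#Storm", "#storm again"]
    for snapshot in trending(extract_hashtags(sample), window=3, top_n=1):
        print(snapshot)
```

Because both steps are generators, each tweet is processed as it arrives rather than in a batch, which is the property the list above is describing. A production engine would additionally distribute this work across machines and recover from failures, which this sketch does not attempt.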

Building a culture of data

Ramasamy emphasized that a data-driven culture is essential for reaping the benefits of a well-engineered, real-time, data-oriented system. Self-service is a key part of giving internal users access to the data, but a centralized data team is essential for implementing a self-service framework, he said.

Ramasamy echoed the stance of LinkedIn’s Shrikanth Shankar on having specialized sub-groups within the data team. For example, a dedicated ETL (extract, transform, and load) team or person could focus on ensuring the usability of data throughout an organization. If a company has reached the point in its big data initiative where it has hired data scientists, the ETL team could create an interface designed specifically for the data-pulling requirements of data science work (deriving actionable insight using statistical and machine learning models).

Depending on the size of the enterprise, data scientists can be assigned to each department, such as marketing, production, and so forth. This shortens the delay between ETL and analysis, because each department gets a streamlined path from gathering raw data to determining its utility (not all data is actionable).

For additional information on Heron use cases, or to read extensive details regarding Twitter’s transition to Heron for its real-time processing, visit the Heron documentation resources on GitHub.

Kat Campise

Kat Campise is a journalist and data scientist. She has a Ph.D. in educational psychology from the University of Nevada-Las Vegas.