The massive streaming service relies on cloud services for its computing and storage needs. Its challenge: keeping this extensive network operating at peak performance and fixing bottlenecks in real time.
Name of Organization: Netflix
Industry: Media and entertainment
Location: Los Gatos, Calif.
Opportunity or Challenge Encountered
Netflix, a leading internet television and movie network with more than 100 million members in more than 190 countries, is truly a cloud company. The service, which streams at least 125 million hours of shows and movies each day, uses cloud services for nearly all its computing and storage needs, including databases, analytics, recommendation engines, and video transcoding.
The challenge is being able to keep this extensive network – purported to consume close to half of the bandwidth of the internet – operating at peak performance, with the capability to prevent or rapidly fix bottlenecks in real time. Netflix uses Amazon Web Services (AWS) to support hundreds of functions that in total use more than 100,000 server instances, which “results in an extremely complex and dynamic networking environment where applications are constantly communicating inside AWS and across the Internet,” as documented in a case study.
In particular, Netflix needed to identify performance-improvement opportunities, such as finding applications that communicate across regions and colocating them. It also wanted to increase uptime by rapidly detecting and mitigating application outages.
How Netflix Met the Challenge
Netflix managers determined that the company needed a new data source that combines Amazon Virtual Private Cloud (VPC) flow logs with application metadata, providing greater insight into communication among applications and regions. Flow logs record information about the traffic between IP addresses, but they identify endpoints only by IP address, and the company’s administrators were often able to see only one side of these two-way communications.
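To make the raw input concrete: a VPC flow log record is a single space-separated line. The sketch below parses a record in the default (version 2) format into a small Python object. It is an illustration only, not Netflix’s Dredge code; `FlowRecord` and `parse_flow_record` are hypothetical names, and records whose log status is not `OK` (such as `NODATA`), which carry `-` placeholders instead of traffic values, are simply skipped.

```python
from dataclasses import dataclass
from typing import Optional

# Field order of the default (version 2) VPC flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


@dataclass
class FlowRecord:  # hypothetical container, not an AWS type
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP
    packets: int
    bytes: int
    start: int  # Unix seconds
    end: int
    action: str  # ACCEPT or REJECT


def parse_flow_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None if it carries no traffic data."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    if rec["log_status"] != "OK":  # NODATA / SKIPDATA rows have no per-flow values
        return None
    return FlowRecord(
        srcaddr=rec["srcaddr"],
        dstaddr=rec["dstaddr"],
        srcport=int(rec["srcport"]),
        dstport=int(rec["dstport"]),
        protocol=int(rec["protocol"]),
        packets=int(rec["packets"]),
        bytes=int(rec["bytes"]),
        start=int(rec["start"]),
        end=int(rec["end"]),
        action=rec["action"],
    )


record = parse_flow_record(
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(record.srcaddr, "->", record.dstaddr, record.bytes, "bytes")
```

Nothing in the parsed record says which application owns either endpoint; that is the gap the application metadata is meant to fill.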
The solution Netflix deployed, known internally as “Dredge,” centralizes flow logs using Amazon Kinesis Streams. “The application reads the data from Amazon Kinesis Streams in real time and enriches IP addresses with application metadata to provide a full picture of the networking environment,” the case study reports. In the process, the data no longer has to be written to a database first, as it originally was.
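The enrichment step the case study describes amounts to joining each flow with an IP-to-application lookup. A minimal sketch, assuming flows arrive as dicts with `srcaddr`, `dstaddr`, and `bytes` keys, and that some metadata source (represented here by a plain dict) maps an IP address to an application name, region, and availability zone. `AppInfo`, `enrich`, and the sample application names are hypothetical, not part of Dredge or any AWS API, and reading from Kinesis is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AppInfo:  # hypothetical metadata record for one IP address
    app: str
    region: str
    zone: str


UNKNOWN = AppInfo(app="unknown", region="unknown", zone="unknown")


def enrich(flow: dict, ip_to_app: Dict[str, AppInfo]) -> dict:
    """Return a copy of `flow` annotated with metadata for both endpoints."""
    src = ip_to_app.get(flow["srcaddr"], UNKNOWN)
    dst = ip_to_app.get(flow["dstaddr"], UNKNOWN)
    return {
        **flow,
        "src_app": src.app, "src_region": src.region, "src_zone": src.zone,
        "dst_app": dst.app, "dst_region": dst.region, "dst_zone": dst.zone,
        # Only flag cross-region traffic when both regions are actually known.
        "cross_region": (
            UNKNOWN.region not in (src.region, dst.region)
            and src.region != dst.region
        ),
    }


# Hypothetical lookup table; a real system would keep this continuously updated.
metadata = {
    "10.0.1.5": AppInfo("api-gateway", "us-east-1", "us-east-1a"),
    "10.1.2.7": AppInfo("playback", "us-west-2", "us-west-2b"),
}
flow = {"srcaddr": "10.0.1.5", "dstaddr": "10.1.2.7", "bytes": 1200}
print(enrich(flow, metadata))
```

Because IP addresses are reassigned as instances are launched and terminated, a production version would need a time-aware lookup (which application held an address when the flow started); this sketch assumes a static mapping.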
Netflix uses the OLAP querying functionality of Druid, an open source analytics engine, to quickly slice the data by region, availability zone, and time window, then visualize it to gain insight into how the network is behaving and performing.
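Druid itself is queried through Druid SQL or its native JSON queries; the sketch below only mimics the idea of a time-bucketed rollup in plain Python. It assumes enriched flows are dicts with `start` (Unix seconds), `bytes`, and `src_`/`dst_`-prefixed app and region keys. The hypothetical `rollup` function buckets flows into fixed time windows and sums bytes per chosen dimensions, and the hypothetical `colocation_candidates` ranks application pairs whose traffic crosses regions (ignoring endpoints whose region is `"unknown"`), the kind of colocation opportunity the case study mentions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def rollup(
    flows: Iterable[dict], dims: Sequence[str], window_s: int = 60
) -> Dict[tuple, int]:
    """Sum bytes per (window start, *dims), like a GROUP BY with time granularity."""
    totals: Dict[tuple, int] = defaultdict(int)
    for f in flows:
        window = f["start"] - f["start"] % window_s  # floor to window boundary
        totals[(window, *(f[d] for d in dims))] += f["bytes"]
    return dict(totals)


def colocation_candidates(flows: Iterable[dict]) -> List[Tuple[Tuple[str, str], int]]:
    """Rank (source app, destination app) pairs by bytes sent across regions."""
    pairs: Dict[Tuple[str, str], int] = defaultdict(int)
    for f in flows:
        regions = (f["src_region"], f["dst_region"])
        if "unknown" not in regions and regions[0] != regions[1]:
            pairs[(f["src_app"], f["dst_app"])] += f["bytes"]
    return sorted(pairs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical enriched flows.
flows = [
    {"start": 960, "bytes": 500, "src_app": "api-gateway", "dst_app": "playback",
     "src_region": "us-east-1", "dst_region": "us-west-2"},
    {"start": 990, "bytes": 300, "src_app": "api-gateway", "dst_app": "playback",
     "src_region": "us-east-1", "dst_region": "us-west-2"},
    {"start": 1075, "bytes": 200, "src_app": "api-gateway", "dst_app": "search",
     "src_region": "us-east-1", "dst_region": "us-east-1"},
]
print(rollup(flows, ("src_region", "dst_region")))
print(colocation_candidates(flows))
```

Swapping the `dims` argument (for example to zone keys) or the `window_s` argument changes the slice without touching the aggregation logic, which is roughly what an OLAP engine lets analysts do interactively at far larger scale.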
Benefits From the Netflix Initiative
By employing Amazon Kinesis Streams, Netflix has been able to process billions of traffic flows, representing multiple terabytes of log data, every day. “Events show up in our analytics in seconds,” John Bennett, a senior software engineer at Netflix, said in the case study. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
With the Dredge implementation in place, Netflix administrators are now able to identify new ways to optimize applications, whether that means moving an application from one region to another or changing to a more appropriate network protocol for a specific type of traffic.