
AI and Real-Time Data Deliver Big Hit for Music Streaming Service


Saavn, India’s biggest music streaming service, is an example of how companies can use big data technologies to maximize streaming analytics for big gains on multiple fronts.

It’s no accident that Saavn, India’s biggest music streaming service, has seen its daily average users grow 9x in the last two years, leading to 500 million streams a month. The company is democratizing its data so that data analysts, marketers and product managers can make the right decisions from real-time data, without the headaches of the previous approach.

The system behind that shift is called Sniper, and it’s the technological backbone that made possible the company’s move to targeting users based on context.

“As soon as we moved towards contextual as well as personalized targeting, we saw at least a 3x improvement across the board, just in terms of CTRs,” says Sriranjan Manjunath, the CTO and head of engineering at Saavn.

That was a clear sign that Saavn needed to do more. And now, Sniper is a prime example of how companies of all types can use modern big data technologies to maximize streaming analytics for significant gains on multiple fronts.

Saavn and India’s complex music industry

Streaming music for Indian audiences is no easy task. There is a diverse pool of music across hundreds of genres, more than 800 music labels and 26 official languages to support. On top of that, Saavn wanted its program to work across the massive range of devices that are commonly used in India — everything from old “dumb” phones to the iPhone 8.

Collecting and making sense of the company’s growing data was another challenge entirely. Scaling became an issue as its data silos filled up and became increasingly segregated.

“We started noticing there were too many repetitive queries being run by the analysts themselves,” Manjunath says. “A lot of product managers on the team want to use the same set of data, and they want the most up-to-date data, so they run the same queries over and over. It’s not the most efficient way to function.”

That was when Saavn started to build out the specifications for its new platform. They wanted to democratize the data, and enable more complex queries, such as “users who signed up but haven’t downloaded the app yet,” or “highly engaged women users who use a premium device.”

“Everyone in the company should have access to the same amount of data, should all be speaking the same language, and should not be waiting for insights,” Manjunath says.

Building out Sniper

The Sniper infrastructure relies on a few different processing areas that work in conjunction with each other. The real-time pipeline pulls data into Kafka, and then into clusters running Apache Storm, to which the company has added custom code that allows it to scale as needed. In the batch pipeline, logs are sent to an Amazon Elastic MapReduce (EMR) cluster running Hadoop, which normalizes the data, processes it, and sends it to Cassandra as well.

Finally, the AI pipeline uses PySpark, which bridges Python and the Apache Spark API. There, they can start to use some basic prediction models to put even the least-understood user into one of a few distinct categories. Manjunath says, “We have written some pieces for gender prediction, age prediction, genre prediction and churn-risk prediction, but we also have others, and it’s very easily extensible.”

By default, Saavn tries to collect as little data on its customers as possible, which makes these predictive models difficult to build. But once a user listens to roughly 20 songs, Manjunath says, the company can predict their gender with 85 percent accuracy using a host of attributes, such as the beats per minute of their chosen songs, the artists, the time of day and the location.
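
To make the idea concrete, here is a minimal sketch of how a listening history might be reduced to the kinds of attributes mentioned above before it is handed to a prediction model. It is not Saavn’s code: the function name, the field names (bpm, artist, hour, city) and the hard cutoff of 20 plays are all assumptions made for illustration, and the model itself is left out.

```python
from collections import Counter
from statistics import mean

# Assumption: the "roughly 20 songs" from the article, treated as a hard cutoff.
MIN_PLAYS = 20


def listening_features(plays):
    """Summarize a user's plays into model inputs (hypothetical schema).

    Each play is a dict with the assumed keys 'bpm', 'artist', 'hour'
    and 'city'. Returns None when there is too little history to use.
    """
    if len(plays) < MIN_PLAYS:
        return None

    def most_common(key):
        # Most frequent value for the given field across all plays.
        return Counter(p[key] for p in plays).most_common(1)[0][0]

    return {
        "mean_bpm": mean(p["bpm"] for p in plays),
        "top_artist": most_common("artist"),
        "top_hour": most_common("hour"),
        "top_city": most_common("city"),
    }
```

A feature dictionary like this could then feed any classifier; users with fewer plays simply stay unclassified.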

Eventually, all this information makes its way into Cassandra. The company went through a few different schema designs before settling on the final version, which is a mapped event stream that partitions events by day, platform, and a hash of the device’s ID. This schema means sessions are easy to fetch, and even though Cassandra writes are cheap, relatively speaking, they’re minimized as much as possible. Cassandra feeds multiple serving layers, such as the push notification and email systems.
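
A rough sketch of what such a partition key could look like follows. The article only says that events are partitioned by day, platform and a hash of the device’s ID; the bucket count, the choice of MD5, the use of UTC days and every name below are assumptions for illustration, not Saavn’s actual schema.

```python
import hashlib
from datetime import datetime, timezone

# Assumption: the device hash is folded into a fixed number of buckets.
NUM_BUCKETS = 64


def partition_key(ts, platform, device_id, buckets=NUM_BUCKETS):
    """Build a (day, platform, device_bucket) key for one event.

    `ts` must be a timezone-aware datetime; the day is taken in UTC.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # A stable hash, so the same device always maps to the same bucket.
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return (day, platform, bucket)


# Two events from the same device on the same day share a partition,
# which is what makes a day's session cheap to read back in one query.
morning = datetime(2017, 9, 1, 8, 0, tzinfo=timezone.utc)
evening = datetime(2017, 9, 1, 21, 30, tzinfo=timezone.utc)
same_partition = (
    partition_key(morning, "android", "device-abc")
    == partition_key(evening, "android", "device-abc")
)
```

Keying on the day keeps partitions from growing without bound, while the device hash spreads one day’s traffic across the cluster.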

And that’s important, because the company sends out some 30 million push notifications a day.

Making Sniper work for different teams

As noted, Saavn’s switch to contextual targeting has resulted in massive CTR improvements, particularly on push notifications. Before Sniper, Saavn’s marketing team might send the same push notification to all users based on the most popular song or artist at that particular moment. Now, the company can send something like, “Your September workout mix is here!” to a user who works out a lot, or has listened to multiple workout playlists in the morning.
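
The difference between the two approaches can be sketched as a simple rule. This is purely illustrative: the field names, the morning window of 5 a.m. to noon and the threshold of two workout plays are assumptions, and the real system presumably draws on far richer signals.

```python
def pick_notification(plays, month, fallback):
    """Choose a push message for one user (hypothetical rule).

    `plays` is a list of dicts with the assumed keys 'playlist_tag'
    and 'hour' (0-23). `fallback` is the one-size-fits-all message
    that every user would have received before contextual targeting.
    """
    morning_workouts = sum(
        1
        for p in plays
        if p["playlist_tag"] == "workout" and 5 <= p["hour"] < 12
    )
    # "Multiple workout playlists in the morning" is read as two or more.
    if morning_workouts >= 2:
        return f"Your {month} workout mix is here!"
    return fallback
```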

The new system allows all users, whether they’re data analysts, marketers or product managers, to download the same sets of data, in real time, for whatever purpose they might require. The querying software allows them to choose filters, combine them using logical operators, and narrow by time periods. The data is now being used to target recommendations, advertisements, and more.
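
As a sketch of how filters, logical operators and time windows might compose, consider the following. It is not Sniper’s query interface, which the article does not describe; the User fields and the helper names (all_of, any_of, negate, active_between) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class User:
    # Hypothetical user attributes, chosen to match the article's examples.
    user_id: str
    signed_up: bool
    has_app: bool
    last_active: date


Filter = Callable[[User], bool]


def all_of(*filters: Filter) -> Filter:
    """Logical AND: the user must pass every filter."""
    return lambda u: all(f(u) for f in filters)


def any_of(*filters: Filter) -> Filter:
    """Logical OR: the user must pass at least one filter."""
    return lambda u: any(f(u) for f in filters)


def negate(f: Filter) -> Filter:
    """Logical NOT."""
    return lambda u: not f(u)


def active_between(start: date, end: date) -> Filter:
    """Time-period filter, inclusive on both ends."""
    return lambda u: start <= u.last_active <= end


# "Users who signed up but haven't downloaded the app yet",
# narrowed to those active in September 2017.
segment = all_of(
    lambda u: u.signed_up,
    negate(lambda u: u.has_app),
    active_between(date(2017, 9, 1), date(2017, 9, 30)),
)
```

Because every filter is just a function from a user to a boolean, a segment defined once can be reused by any team, which fits the goal of everyone working from the same data.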

And with millions of push notifications sent daily, even small gains in engagement can mean significant gains for Saavn’s bottom line. To learn more about the company’s infrastructure, check out Manjunath’s talk.

Joel Hans

About Joel Hans

Joel Hans is a copywriter and technical content creator for open source, B2B, and SaaS companies at Commit Copy, bringing experience in infrastructure monitoring, time-series databases, blockchain, streaming analytics, and more. Find him on Twitter @joelhans.
