Sponsored by Tinybird

Overcoming the Obstacles to Real-Time Analytics


A discussion about what real time really means, the challenges companies face when trying to do real-time analytics, and the benefits that can be realized when those challenges are overcome.

Real time is garnering tremendous attention these days across many industries and for many use cases. Unfortunately, most businesses encounter problems due to the way their infrastructure was set up years ago when batch was the name of the game. Data had to be captured, stored, and then analyzed before any insights could be published.

Such approaches are hard to scale and have latency issues that make them impractical for real-time applications. RTInsights recently sat down with Jorge Gomez Sancha, Co-Founder and CEO of Tinybird, to talk about what real time really means, the challenges companies face when trying to do real-time analytics and the benefits that can be realized when those challenges are overcome.

Here is a recap of our conversation.

RTInsights: Real time has many meanings in different contexts. Can you explain what real time really means? And what does it mean to Tinybird?

Sancha: When it comes to analytics, real time is about the ability to act on data immediately after that data has been generated. That is different from how data is normally used. Typically, data is captured and stored somewhere. Then some processes run over it with technologies like Airflow, dbt, or something like that, to model it and store it in a data warehouse. If you want to do something with it, you still need to run some other process that will actually publish that data in a consumable format and put it in a place where you can scale whatever you’re trying to build. This means that there’s a lot of complex infrastructure that you need to maintain. But also, you can’t act on the data as soon as it is generated because there are many handoffs and delays in that infrastructure.

Learn more: The Data Journey: Unlocking data for the right now

When we talk about real time, we mean two things: high-frequency ingestion, where you can ingest data very, very quickly, and low-latency consumption. Once you’ve ingested that data, you need to be able to query it with sub-second latency so that you can act on it while it’s fresh. But you also need to be able to scale it. The longer the queries run, the harder it is to scale because you need more CPUs and more infrastructure.

So, the only way to scale real-time use cases is by having very low latency. That serves two purposes. One is acting on data as it is generated. And the second is that it reduces the resources you need to run the queries.

RTInsights: In what industries have you seen the most uptake in applying real-time analytics?

Sancha: We see real time used in many different contexts. One of the most successful applications has been in banking and trading, including automated and high-frequency trading. That is one of the biggest success stories for real time in the sense that it changed the industry completely and changed how investments are made.

We believe that real time can have the same effect on nearly any industry, and it is now being applied across so many different fields. More recently, we’ve seen it take off in retail eCommerce. A lot of retail companies are trying to leverage real time more and more because the competition is so fierce. That’s number one. Being able to react faster to what’s going on is a competitive advantage. And it is becoming even more so. We always say that speed compounds. If you’re reacting faster than your competition on a regular basis, over time, you will be much, much better because all those small advantages add up.

In retail, that’s especially true because of the competition. Price pressure is always there, and prices are coming down. There are many shops selling the same things. So, being able to recognize faster than your competitor that “Hey, this campaign is not working. Let’s try something else” is a competitive differentiator. And being able to see immediately what’s going on with the changes you’ve made leads to better results, faster iteration, better decisions, a faster feedback loop, and ultimately better conversion rates, which is what eCommerce stores really care about.

Another use case for real time can be found in travel tech and eCommerce. The use case there is personalized recommendations and customization of the user experience when booking travel. If a prospective traveler is looking at different websites to book a destination or an experience, a travel company can use that information, plus what similar buyers are looking for, to figure out the buyer’s intent. Some booking companies have had this for a while. For example, there is a feature on Booking.com: when you are looking at a destination, you are told, “Three other people are looking at this.” Traditionally, that has not been easy to build, and you needed to be someone like Booking.com to make it possible. Now, real time enables you to do that, and companies like Tinybird make it very, very easy. That has a very strong impact on the user experience. Those kinds of things can be done now with analytics.
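To make that concrete, here is a minimal sketch of the kind of query behind a “people are looking at this” feature: counting how many distinct sessions viewed the same destination in the last few minutes over a stream of page-view events. The table name, column names, parameter placeholder, and SQL dialect are illustrative assumptions, not any particular vendor’s syntax.

-- Illustrative only: how many other visitors viewed this destination recently?
-- Table, columns, and the :destination_id placeholder are hypothetical.
SELECT
    COUNT(DISTINCT session_id) AS people_looking_now
FROM page_view_events
WHERE destination_id = :destination_id               -- supplied by the application
  AND event_time >= NOW() - INTERVAL '5 minutes';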

But that’s just a simple use case. Now, based on a traveler’s traffic patterns during a session and previous sessions, and without knowing who they are, a booking company using real time can already tell that they’re interested in a certain thing. So, they put it at the top of the traveler’s list. The traveler finds it immediately, and it matches their intent. The booking company needs to be able to make those decisions in a split second when a customer comes back to the website. If the application can run a query and tell you, the traveler, “This is what you came here for; here it is,” that’s very powerful. And it leads to a better user experience and better conversion rates.

Other places where real time is really, really important are security and infrastructure, where you’re looking for patterns and anomalies to detect in real time what’s going on and where you might have security or performance issues. Recently, machine learning has been used to solve some of these problems. But many use cases that are built with machine learning can be solved with a simple SQL query if you can make it run in real time. The reason machine learning works is that once you build a model, you only need a single event to determine what that event is, where it fits, and where it should fall. But building that model and getting to that point is expensive and hard to do. With real-time analytics, you can use the data you have about everybody visiting the website right now and figure out what’s going on. You can apply that to security. You can apply that to fraud detection.
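As a hedged illustration of that point, a simple threshold query over fresh events can stand in for an anomaly model in many cases. The table, columns, window, and threshold below are assumptions made up for the sketch, not a recommended configuration.

-- Illustrative only: flag IP addresses with an unusually high request rate
-- in the last minute, a simple real-time stand-in for an anomaly model.
SELECT
    ip_address,
    COUNT(*) AS requests_last_minute
FROM request_events
WHERE event_time >= NOW() - INTERVAL '1 minute'
GROUP BY ip_address
HAVING COUNT(*) > 1000                                -- threshold is an arbitrary assumption
ORDER BY requests_last_minute DESC;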

You can apply a similar idea to ad tech and gaming, as well. It’s a really interesting area, especially the mixture of gaming and ad tech. In mobile games, the game developers are making split-second decisions about what to offer you and where they think you might be interested in actually spending some money. It might be buying a crate of gems for some game or something like that. But to be effective, they need to be able to base the decision on all of the events they are receiving. In that way, they can determine whether this user looks like a fit for a particular campaign. They can determine if you’re likely to spend in the game or if you should be served ads as a monetization strategy instead. That’s something that is really hard to do with traditional data warehousing technology and where real-time technology can help solve problems.

RTInsights: What challenges do businesses face when they try to move to real-time architectures?

Sancha: There are so many. It’s a structural challenge nowadays, centered on the “modern data stack,” a concept that has taken hold over the last seven to ten years. If you look at the modern data stack, it has a lot of components from different vendors or open source projects for each different job: capturing data, storing raw data, ETL, data warehousing, and analytics. I think that approach has become a blueprint for a lot of companies to say, “Okay, this is the way it is done.” The problem is that the blueprint was made when data warehousing and batch were the way to do things. It’s centered around the data warehouse. And that approach makes a lot of sense for batch: “Hey, I have data coming in, and it’s okay if I process it every hour and do something with it every hour.”

But that stack – centered around the data warehouse – is not designed for real-time use cases. You know the expression, “When all you have is a hammer, everything looks like a nail.” That’s exactly what happens. We see companies banging their heads against the wall trying to build real-time applications and systems with data warehousing technology such as Snowflake, BigQuery, Redshift, or others. They are just not designed for real time because the latency is not low enough to provide a good user experience at scale. They’re pretty fast for BI: they can answer in three or four seconds over huge amounts of data, which is great, but that’s not fast enough for real time. And they’re not designed for high concurrency. They’re not built to do thousands or more queries per second. They’re designed to do maybe 100 at most, and even 100 can be prohibitively expensive with any of this data warehousing technology. Can you imagine a business strategy built on real-time apps that could only support 100 concurrent users? It just doesn’t work. And scaling the data warehouse up is just not a good way to do it. It’s not cost-effective.

This is a big challenge, and it’s interesting to understand the journey that companies go through. As companies grow and become more mature in their development, they start creating different data silos, either because of how they’re structured or because they’re moving to microservices that solve different problems. All those microservices start generating data in different places, and that’s when they apply the modern data stack. The first order of business is, “Okay, we need to start capturing this in a particular way. We need a single source of analytical truth where people can make queries.” That helps solve the BI use case for the executives. And that’s when you get into data modeling, and you start hiring data engineers and people who will help bring order and structure to all of that data so the business can take advantage of it and feel good about its veracity. Honestly, many companies are still stuck here.

But then they do get to a certain point where the issue is how to actually do something useful with all of that structured and modeled data. This is what I call the “post-Snowflake problem.” What are these companies going to do when Snowflake, or whatever data warehouse they use, can’t solve their next data problem? It doesn’t matter how much they invest in trying to take advantage of the data warehouse for all of their use cases; it’s just not designed to tackle all of these new ones. It’s designed to tackle BI, and maybe data science and exploration, but not application building and not automation, because those things require very low latency, and you need to be able to act as soon as the data is available. So that last part – taking all of that data, all of the understanding you’ve created around it, and exploiting it at scale with very low latency – is the moment where they hit a wall and where companies like Tinybird can help.

RTInsights: How does Tinybird help companies do real time?

Sancha: We wanted to work with large amounts of data in the same way we have worked with small amounts of data, which is using SQL to query it and building APIs to publish it. We wanted to focus just on helping developers solve problems and removing all the concerns about infrastructure, ETL, technology components, and plumbing in general.

Tinybird accelerates and simplifies building applications on top of large amounts of data. You can work in real time with all of your streaming data. You can do joins with data that you may have in the data warehouse. You can then expose APIs that are low latency, and you can do that at scale without worrying about infrastructure and just using SQL – which most developers know because it’s such a simple and intuitive language.
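For a rough idea of what that looks like in practice, the sketch below shows the kind of aggregation a developer might write over streaming events and then publish behind a low-latency HTTP endpoint. The table, columns, and the :product_id parameter placeholder are assumptions for illustration; they are not Tinybird’s exact syntax.

-- Illustrative only: per-minute views and revenue for one product over the
-- last hour, the sort of query that would be exposed as a parameterized API.
SELECT
    DATE_TRUNC('minute', event_time) AS minute,
    COUNT(*)                         AS views,
    SUM(purchase_amount)             AS revenue
FROM ecommerce_events
WHERE product_id = :product_id                        -- supplied as an API parameter
  AND event_time >= NOW() - INTERVAL '1 hour'
GROUP BY minute
ORDER BY minute;

The point of a sketch like this is that the SQL itself becomes the consumable interface, rather than a separate publishing pipeline.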


We help companies, developers, and data engineers accelerate their data by letting them take advantage of it faster and at scale. We also accelerate their development and speed to market, which is how quickly they can build things on top of that data. That’s where Tinybird fits in: we give you really great, dev-friendly tools to ingest data, transform it, and exploit it using just SQL and APIs.

RTInsights: Can you share some examples or customer use cases?

Sancha: Of course. We work with companies at different stages. For instance, one of the biggest fast fashion retailers in the world is a Tinybird customer. These guys are super focused on analytics, and they’ve built a bunch of things in-house. They were using a big SaaS analytics tool, and they weren’t able to understand in real time what was going on with their business across the world; they had a lag of around 45 minutes to an hour. So they started building an application with Tinybird that could help them report in real time, across every country, on things like orders and revenue, applying each country’s exchange rate, applying different time zones, applying everything. That’s how they started. That drastically changed how they were working because all of the teams started to use the product they built with Tinybird for day-to-day decisions. It wasn’t just technical people using it. The actual people designing the clothes wanted to understand, “Hey, we just launched this product. How is it selling? Where is it selling more? Is it in Asia? Is it in Europe? Where is it working? Should we continue this for next year?” And the marketers wanted to know which campaigns were working so they could tweak them as soon as something wasn’t working. Given their size, real time can have million-dollar impacts on these kinds of decisions.
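As a rough sketch of that kind of real-time report, the query below totals today’s orders and revenue by country, converting each order to a single currency with the latest exchange rate. The table names, columns, and currency handling are assumptions made for the illustration, not the retailer’s actual schema.

-- Illustrative only: today's orders and revenue by country, converted to EUR.
SELECT
    o.country,
    COUNT(*)                       AS orders_today,
    SUM(o.amount * fx.rate_to_eur) AS revenue_eur
FROM orders AS o
JOIN latest_exchange_rates AS fx
  ON fx.currency = o.currency
WHERE o.order_time >= DATE_TRUNC('day', NOW())        -- "today" in the reporting time zone
GROUP BY o.country
ORDER BY revenue_eur DESC;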

That was the first use case, but then they started using it for alerts, like detecting whether, suddenly during Black Friday, sales stop moving as fast as expected. That is an indication that something upstream is broken; maybe the eCommerce site is down or something like that. At their scale, even an hour of downtime could mean millions of dollars lost because Black Friday involves crazy volumes. They’re also doing stock management and personalized recommendations. They started doing more and more things because real time is very sticky. Once you start working in real time, it enables you to work in different ways, to automate, and to think about how to operate the business differently.
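A hedged sketch of that kind of alert: compare order volume in the last ten minutes against the same window a week earlier, and flag a sharp drop. The table, columns, window sizes, and 50% threshold are all assumptions for illustration.

-- Illustrative only: has order volume dropped sharply versus a week ago?
SELECT
    now_window.order_count AS orders_last_10_min,
    baseline.order_count   AS orders_same_window_last_week,
    now_window.order_count < 0.5 * baseline.order_count AS possible_outage
FROM
    (SELECT COUNT(*) AS order_count
     FROM orders
     WHERE order_time >= NOW() - INTERVAL '10 minutes') AS now_window,
    (SELECT COUNT(*) AS order_count
     FROM orders
     WHERE order_time >= NOW() - INTERVAL '7 days' - INTERVAL '10 minutes'
       AND order_time <  NOW() - INTERVAL '7 days') AS baseline;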

We also work with startups, companies like Vercel, for instance. Vercel is a platform or a service that enables you to deploy applications in production at scale. It is very similar to what we do, but for code. They started using Tinybird for user-facing analytics so that every Vercel user could understand how their applications are being used and whether performance is good or not. That is built over Tinybird in real time. But then, once the data was there, they started adding more data and more use cases, from usage-based billing to additional metrics and APM and things like that. They’re adding so many new real-time use cases to their platform. All of that’s built with Tinybird.

We’re also in FinTech. We help companies in the crypto world that are trying to deliver better analytics for investors in real time. The blockchain is famously slow when it comes to querying large amounts of data, so there are companies out there like KIRO or Vanguard Finance that want to extract that information and make it available in real time for investors to exploit, or, as market makers, to buy assets and run bots that do different things in the market.

We’re in travel as well. A good example is The Hotels Network, which makes personalized recommendations over thousands of hotel reservation websites. They tell their customers that they turn “lookers into bookers,” which is exactly what I mentioned earlier. They use Tinybird so that their customers can put the best offer in front of a prospective traveler, which really helps these hoteliers improve their conversion rates.

As you can see, the types of companies we’ve been working with are very varied. But they all have one thing in common: they want to move fast, and they want to disrupt. They want to do things that others aren’t doing because others don’t have the technology for it. That will change as more people latch on to real time, but right now, it is still a huge competitive advantage.



About Salvatore Salamone

Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
