Tinybird’s Alejandro (Alex) Martín Valledor talks about the challenges of building real-time products, technologies that can help, and the benefits retailers can realize by moving to real time.
Retail is one of the most competitive industries in the world. Always known for razor-thin margins, the retail industry experiences changing purchasing patterns, economic influences, and supply chain disruptions that make business success all the more challenging.
Increasingly, many are looking to real-time data and analytics as a way to be more responsive and personalize offerings. RTInsights recently sat down with Alejandro (Alex) Martín Valledor, product manager at Tinybird, to talk about the challenges of building real-time products, technologies that can help, and the benefits retailers can realize by moving to real time.
Here is a summary of our conversation.
RTInsights: Tell us a little bit about your experience building data and analytics products in the retail industry.
Alex: For the last six, almost seven years, I worked at a top 10 global retail company, and my focus was on building data products that people used within the organization. Whether they’re business analysts, marketing, or revenue operations, these people always need data to make their day-to-day decisions. My job was to provide them with quality, accurate, and timely data to make those decisions.
Of course, these decisions can be critical and costly, like when to ship goods from one continent to another to meet demand in a different market. So, having the right information at the right time is very, very important. The teams that we served were accustomed to using web analytics dashboards or other business intelligence dashboards like Google Analytics, but we were trying to offer a new way forward: that is, real time. What retailers really want and even need is real-time information to make decisions.
RTInsights: What projects were you and are you focused on?
Alex: My role at first was very technical, actually building analytics products and data pipelines. I was prototyping these analytics products, like a full stack developer or some kind of profile like that. And then eventually, what we created was consolidated into a single suite of internal analytics products that needed to evolve to meet the demands across the business. Thousands of internal users relied on these dashboards and data products on a daily basis, so my role changed to take more ownership of this process. My responsibility was to understand what kind of data the business needed, then architect and build the data pipelines and analytics products that our internal analysts would use to make critical business decisions.
RTInsights: What unique challenges do retail companies face with data and analytics?
Alex: The biggest problem is data quality and consistency. In medium and large organizations, you have many information silos where different systems generate different data. If you want to build a useful data product on top of that information, you have to first consolidate it from the various silos into a single repository and then identify the sources of truth for every single entity within a domain.
Choosing the source of truth and consolidating data in a single place is a big challenge for any company in retail dealing with large amounts of data from various domains. You want to implement a company- or organization-wide data model. You want to build services and systems that reference the same data entities. If you want to make that cross-functional analysis work, you have to be able to draw relationships across all of these systems.
But once you manage to get your data together in, for instance, a cloud data warehouse as a single source of truth, you need to implement a data model so that anyone in the company with the requisite technical skills can read and analyze it and make use of it. Building this data model is a big challenge, and it’s something that never really stops.
When you have that model, and it’s useful and accurate, you unlock the potential of different teams across the company – data analysts, data scientists, revenue teams, marketing, etc. – to use this information in a way that actually changes the performance of the business. Well-modeled data can deliver insights that improve what the C-suite really cares about: revenue, burn rate, customer acquisition costs, advertising spend, and so on.
But the challenges don’t stop at data modeling.
Even if you build this data model, and it’s accurate and reliable, it often isn’t timely. You end up sending day-old or week-old insights to these teams making decisions, which, compared to 10 years ago, is quite a big improvement, but nowadays, we can do even better.
Now, we want to publish these metrics in an interactive fashion, and it has to be low latency. If your data scientists want to build long-running queries that could take up to 10 or 15 minutes on yesterday’s data, that’s no problem for them. But if you want to connect your website to your data infrastructure to build a personalized user experience, or if you want to have a real-time inventory console, or if you want the performance of a single-day flash sale on the day of the sale, then you need something better.
And this is the challenge: Publishing these complex metrics with low latency and high concurrency on data that’s just seconds old. This is really, really hard, and it’s something I’ve seen a lot in my experience in retail. Everybody wants to be able to do this right now.
RTInsights: Why does everybody want to move to this real-time architecture? Why does it matter for large retail companies?
Alex: There is a lot of hype around real time, and honestly, some companies are pushing for it before they even have a good data model. Still, the benefits of real time for data-driven teams at big retailers are huge.
Real time is a more efficient way to satisfy customer demand because you can adapt faster and better to their needs or what they are looking for. In the old or legacy model, you would display a static set of offerings on your website, and then probably after hours, a day, or even days or weeks, you change that. With that approach, you cannot react fast. And reacting in real time makes you much more efficient at satisfying demand.
For example, I’ve experienced first-hand use cases where you can better satisfy demand on your website to drive up sales. The items that are on the top of your website get most of the attention. If some of those items are out of stock, you’re missing a very important opportunity to satisfy customer demand. So being able to react in real time to remove products that are out of stock and show alternatives can have a serious impact, especially for really large retailers that have hundreds of thousands or millions of daily visitors to their eCommerce site.
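As a rough illustration of the out-of-stock case Alex describes, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the interview: the `StockEvent` shape, the `Storefront` class, the product IDs, and the rule that a product with no known stock level is treated as available. It only shows the idea of updating what the page displays as soon as an inventory event arrives.

```python
from dataclasses import dataclass


@dataclass
class StockEvent:
    """Hypothetical inventory event: the latest stock level for one product."""
    product_id: str
    units_available: int


class Storefront:
    """Keeps a ranked product list and hides items that are out of stock."""

    def __init__(self, ranked_product_ids, alternatives):
        self.ranked_product_ids = list(ranked_product_ids)
        self.alternatives = dict(alternatives)  # product_id -> substitute id
        self.stock = {}  # product_id -> last known units available

    def apply(self, event):
        # Each event overwrites the last known stock level for its product.
        self.stock[event.product_id] = event.units_available

    def in_stock(self, product_id):
        # Assumption: products we have not heard about yet are available.
        return self.stock.get(product_id, 1) > 0

    def top_items(self, n):
        """Return up to n items to display, swapping in alternatives."""
        shown = []
        for product_id in self.ranked_product_ids:
            candidate = product_id
            if not self.in_stock(candidate):
                # Fall back to the configured substitute, if there is one.
                candidate = self.alternatives.get(product_id)
            if candidate is None or not self.in_stock(candidate):
                continue
            if candidate in shown:
                continue
            shown.append(candidate)
            if len(shown) == n:
                break
        return shown


store = Storefront(
    ranked_product_ids=["tee-white", "jeans-blue", "cap-red"],
    alternatives={"tee-white": "tee-black"},
)
before = store.top_items(2)
store.apply(StockEvent("tee-white", 0))  # the top item sells out
after = store.top_items(2)
print(before, after)
```

In a real deployment the events would come from the inventory system through a streaming platform rather than from direct method calls, and the ranking itself would likely be data-driven as well.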
Also, you can deliver a personalized user experience. So, if you just bought a t-shirt or some other product, the platform can immediately suggest something that goes with it based on actual data models. Real-time personalization is really valuable here, and it drops your customer acquisition costs because you have an engaged customer who has shown a willingness to buy, and you can put something that you know they will love – with very high confidence – in front of them. This is a very common pattern we see these days, but without real time those data models used to determine what is relevant to the customer are usually out-of-date or inaccurate.
For example, if you navigate to Shein or any other hot eRetailer that is drawing in lots of people these days, you’ll see that as you refresh the page, the items they show you are continuously changing. This is because they are adapting to consumers in real time. And this is something you can only get if your data platform, infrastructure, and culture are driving towards that goal of building data pipelines that support low latency and high concurrency on super-fresh data.
As a side note, when I was in retail, it was so obvious that consumers were always changing their minds. People don’t plan their purchases as much as you’d think they do. Maybe they are navigating through Instagram or TikTok or whatever, and they see something they want, and they just buy it. These new digital channels are growing a lot, and they work in very different ways. You have to adapt a lot more to the customers and give them many more of these high-speed, personalized experiences.
Another benefit I’ve seen is that these real-time architectures enable really powerful alerting and monitoring systems. Once your company’s metrics and information are available in real time, you can react faster to a drop in visits or sales, and you can figure out the problem faster. Without real time, if you analyze your sales the next day, for example, your Monday sales on Tuesday, you will only notice a problem hours after it happens. But if you have your operational information in real time, you will notice the issue seconds or minutes after it happens. So you can take quick action and solve the problem fast.
Think about it for a massive retailer on Black Friday. There’s the potential to do tens of thousands of transactions every minute. If you have a DDoS attack or something wrong with your website, and you don’t notice it for 15 minutes, that could literally be hundreds of thousands of dollars lost. It’s a big deal for these retailers.
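The alerting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SalesDropMonitor` class, the three-minute window, the 50% threshold, and the order counts are all made up for the example and do not describe any specific retailer’s or vendor’s system.

```python
from collections import deque


class SalesDropMonitor:
    """Flags a minute whose order count falls far below the recent average."""

    def __init__(self, window_minutes=5, drop_ratio=0.5):
        # Holds only the most recent per-minute counts; older ones drop off.
        self.recent = deque(maxlen=window_minutes)
        self.drop_ratio = drop_ratio

    def observe(self, orders_this_minute):
        """Record one minute of orders; return True if it looks like a drop."""
        alert = False
        # Only compare once the window is full, to avoid noisy early alerts.
        if len(self.recent) == self.recent.maxlen:
            baseline = sum(self.recent) / len(self.recent)
            alert = orders_this_minute < baseline * self.drop_ratio
        self.recent.append(orders_this_minute)
        return alert


monitor = SalesDropMonitor(window_minutes=3, drop_ratio=0.5)
per_minute_orders = [1000, 1020, 980, 1010, 300]  # made-up numbers
alerts = [monitor.observe(count) for count in per_minute_orders]
print(alerts)
```

With these numbers, only the last minute is flagged, one minute after the drop begins rather than the next day. A production monitor would also need to handle seasonality and to keep anomalous minutes from dragging down the baseline, which this sketch does not attempt.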
RTInsights: If real time is so great, what’s keeping eCommerce giants from effectively moving in that direction?
Alex: There are different reasons. The first one, and perhaps one that we might underestimate, is that this is a big change, both in technical terms and in organizational and cultural terms. It’s not just a matter of swapping out some tools or upgrading your servers.
Earlier I mentioned the process of consolidating and modeling data before you can move to real time because you need to have confidence in your data. Well, it takes even the most forward-thinking customers three, five, or even seven years to get there. You have entire data and engineering teams that now aren’t just responsible for whether their service works properly but also for whether it provides accurate information. This is a big cultural change. It usually means a different kind of staff and a different approach to treating data.
Also, building your data platform the right way, and in a way that enables real time, is another huge challenge. It takes time, and you will make mistakes along the way. It’s a continuously evolving effort.
And finally, there is the issue of choosing the right technology. These days, we are seeing people trying to build things on top of the wrong deck. For example, cloud data warehouses are very popular these days. And the reason they are popular is that they were critical for that path on the data journey, to consolidate and model. So, these companies invested big in platforms like BigQuery, Redshift, and Snowflake. They enable company-wide data analytics, and it’s very cool. It’s that single source of truth these people want. However, they aren’t one size fits all, and one thing they don’t do well is real time. They are architecturally unfit to serve low-latency results on highly concurrent requests over very fresh data. They just can’t do it.
So these companies have to further shift their technology and culture away from, or at least in parallel to, the data warehouse, to accomplish these real-time use cases. And I think many companies make the mistake of trying to expose the information in the data warehouses directly, maybe by building a REST framework or something directly on top of the data warehouse. Now there are technologies, like publication or application services, that actually can enable low-latency, high-concurrency requests on the data warehouse. Tinybird, for example, is one of them. You can have your information from your company and publish it using these real-time pipelines or inputs.
In this way, you can keep a single source of truth and make it available for an interactive use case with high concurrency and very low latency response times.
But still, that service might be working on data that is hours or days old. There are a lot of technical reasons for this, but data warehouses simply become a bottleneck if you want to achieve the trio of low latency, high concurrency, and high data freshness.
RTInsights: What is the ideal real-time analytics architecture for online retailers?
Alex: Of course, there is no perfect architecture, no silver bullet for this. But generally speaking, I think there is a pattern that usually works quite well: Instead of modeling your data in a data warehouse, you model it before it hits the warehouse, in a streaming architecture. This approach is a bit complex and challenging for some to accept, but it’s the only way to go if you want real time.
Basically, it consists of handling or modeling the information that your company produces as events. So the systems you build no longer have databases that are the single source of truth. Instead, they produce events. Other systems can consume those events in almost real time via streaming platforms.
Then, of course, you can sink that data into your data warehouse to continue doing the long-running analytical queries for things like BI and AI/ML. But the benefit of this approach is now you can do real-time analytics with a platform like Tinybird, for example.
So, the same events go to the analytical repository like Snowflake or BigQuery, or whatever you’re using, and at the same time, the events go to something like Tinybird. This lets you build those ultra-low latency analytics services on data that has just been created, and you keep using your data warehouse for the use cases that don’t need that speed.
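The fan-out pattern described here can be sketched with a small in-memory Python example. It is a toy model under stated assumptions, not Tinybird’s API or any real streaming client: `EventBus`, `WarehouseSink`, `RealTimeSales`, the event fields, and the batch size are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory stand-in for a streaming platform topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        # Every subscriber receives every event, independently of the others.
        for handler in self.subscribers:
            handler(event)


class WarehouseSink:
    """Buffers events and loads them in batches, as a warehouse loader might."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.loaded_batches = []

    def handle(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            # A full buffer is "loaded" and a new one is started.
            self.loaded_batches.append(self.buffer)
            self.buffer = []


class RealTimeSales:
    """Updates a per-product running total as soon as each event arrives."""

    def __init__(self):
        self.units_sold = defaultdict(int)

    def handle(self, event):
        self.units_sold[event["product_id"]] += event["quantity"]


bus = EventBus()
warehouse = WarehouseSink(batch_size=3)
realtime = RealTimeSales()
bus.subscribe(warehouse.handle)
bus.subscribe(realtime.handle)

bus.publish({"type": "purchase", "product_id": "tee-white", "quantity": 2})
bus.publish({"type": "purchase", "product_id": "tee-white", "quantity": 1})

print(realtime.units_sold["tee-white"])  # already reflects both purchases
print(len(warehouse.loaded_batches))     # nothing loaded yet; batch not full
```

The point of the sketch is the asymmetry: the real-time consumer can answer queries on both purchases immediately, while the warehouse path is still waiting for its batch to fill. Both consumers read the same events, so they stay consistent with one source.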
This is a big step forward because it means that from the moment the events are produced, you can start using them to influence the business. If a user buys something on the website, they can have a personalized offer or experience influenced by their actions in that same session. This is truly “future stuff” for eRetailers, and platforms like Tinybird are making it possible.
Learn more: The Data Journey: Unlocking data for the right now