To realize maximum value from today’s massive data volumes, enterprises must consider a new
generation of GPU-based data processing technologies.
Continued digitalization and increasing connectivity have created an explosion of big data—one that most legacy systems cannot handle. Although they are the backbone of the data pipeline for most enterprises, these systems are unable to support modern business needs. Reports take too long to be relevant in today’s fast-paced market, and some complex queries can’t be completed—leaving critical business questions unanswered.
To better understand the effect of massive data volumes on enterprises and how a GPU technology can enable positive business outcomes, RTInsights sat down with Ami Gal, CEO and cofounder of SQream. Here is a summary of our conversation.
RTInsights: Most IT people and even business users are familiar with the concept of big data and the challenges it creates for enterprises. Have we entered a new phase, an era of massive data?
Gal: Yes, definitely, we are in an unprecedented era of unlimited data growth. The number of tools for sharing data is exploding, and everyone with a device is both a data creator and a data consumer, before we even consider the data aggregated by businesses and other organizations. The rollout of 5G will accelerate this growth even further.
Big data is becoming a thing of the past. What we are now seeing is massive data on a scale that is hard to imagine—we are expected to reach 35 zettabytes this year, and 175 zettabytes by 2025. The most amazing thing about this is that estimates say 90 percent of our current data was created in the last two years. So as you can see, big data is really not what we are talking about anymore. It is literally massive.
RTInsights: What are the implications of the massive data era? What happens when companies try to process these massive data volumes using legacy technology?
Gal: Massive data requires a change of mindset. We need to look differently at how we process our data. Data sets are growing exponentially, yet many organizations are still using legacy systems, and even many of the newer technologies cannot scale to the magnitude required to deliver the full value of an organization’s data assets. Data preparation is long and arduous, queries run for hours or even days on limited dimensions, and sometimes don’t complete at all. And don’t even think about trying to run complex JOINs.
All this means that organizations using technologies with limited capabilities are essentially compromising: they access and analyze only a portion of their data. Think of the potential insights and BI they are missing out on, even though the data is sitting right under their noses.
RTInsights: How do these technology inadequacies affect the business?
Gal: Here’s one example: Many organizations are heavily invested in using AI to develop, test, and optimize algorithms that make their processes faster and more accurate. As they refine these algorithms, they bring in an increasing number of data sources. Imagine running these algorithms on millions of records, training and updating them every few minutes, day after day. Taking a query process like this down from days to hours, or hours to minutes, can make a huge difference in the way an organization does business.
Risk management is another example, an area where a small change can have a huge impact on the organization. Or take ad-tech and bid optimization: our customer PubMatic saw a huge difference in its business once it was able to analyze far more of its data in much shorter timeframes.
RTInsights: What should IT and business leaders do to address the challenges of the massive data era?
Gal: The first step in addressing the challenge of massive data lies in understanding that it is much more than big data. Relying on legacy technology and systems to access and analyze this data is not feasible for an organization that wants to keep up, and certainly not for one that wants to stay ahead of the competition. Providing effective customer service, optimizing networks, enabling competitive pricing: all of these depend on the ability to prepare, analyze, and respond quickly, using massive data stores as the basis for critical, valuable insights.
GPU technology works together with the CPU to give organizations the parallel processing power of thousands of cores per processor. With a product like SQream, they can easily integrate these capabilities into their existing infrastructure and ecosystems. Whether in the cloud or on-premises, they can ingest, process, and analyze significantly more data, much more rapidly, than with a CPU-only configuration, with support for multiple applications and frameworks. Most importantly, they can easily scale as their data grows, at a fraction of the cost.
RTInsights: How does this technology support the enterprise and enable positive business outcomes?
Gal: Using GPU parallel processing to implement massive data analytics effectively means the entire process is faster and easier, and covers much more data, from preparation to integration to analysis. The organization can even run queries on raw data. It can query faster, across multiple dimensions, on terabytes to petabytes of data, even implementing complex joins in the analytics process, all in dramatically less time.
Because SQream is built on a disk-based architecture, reading from disk rather than holding everything in memory, we can do a lot more with a lot less effort, even on massive workloads. We use a number of methods together with our GPU technology to enable these processes. Encoding and compression are two of the main ones, both of which speed up processing and save disk space. With encoding, we convert customer data into a common format to increase performance and decrease data size. With compression, we transform the data into a smaller format without sacrificing accuracy.
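Gal’s description of encoding and compression can be sketched in a few lines. The snippet below is purely illustrative and is not SQream’s proprietary scheme: dictionary encoding replaces repeated column values with small integer codes, a general-purpose compressor then shrinks the encoded bytes further, and the codebook guarantees the original values can be recovered exactly, which is the "without sacrificing accuracy" property.

```python
import zlib

def dictionary_encode(values):
    """Replace each distinct value with a small integer code.
    Illustrative only; not SQream's actual encoding scheme."""
    codebook = {}
    codes = []
    for v in values:
        if v not in codebook:
            codebook[v] = len(codebook)  # assign the next available code
        codes.append(codebook[v])
    return codebook, codes

# A low-cardinality column shrinks dramatically once encoded and compressed.
column = ["US", "DE", "US", "FR", "US", "DE"] * 1000
codebook, codes = dictionary_encode(column)

raw_bytes = ",".join(column).encode()  # naive text representation
encoded = bytes(codes)                 # one byte per row (all codes < 256)
compressed = zlib.compress(encoded)

print(len(raw_bytes), len(encoded), len(compressed))
```

Real columnar engines choose an encoding per column (run-length, delta, dictionary, and so on) based on the data’s statistics; the key point is that decoding the codes back through the codebook reproduces the original column exactly.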
Another method we use is chunking, in which data is stored vertically by column and horizontally in chunks. This allows us to selectively filter and delete data from large tables that can contain millions or even billions of chunks. We’ve also created a transparent method that allows full data skipping across tables and columns, which speeds up the query process even more dramatically.
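To make the chunking and data-skipping idea concrete, here is a minimal, hypothetical sketch (the layout and helper names are my own, not SQream’s internal format): each chunk of a column carries min/max metadata, so a filter can eliminate whole chunks without ever reading their rows.

```python
CHUNK_SIZE = 4  # tiny chunk size, chosen only to keep the example readable

def make_chunks(column):
    """Split a column into chunks, recording min/max metadata per chunk."""
    chunks = []
    for i in range(0, len(column), CHUNK_SIZE):
        chunk = column[i:i + CHUNK_SIZE]
        chunks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return chunks

def query_greater_than(chunks, threshold):
    """Return (chunks scanned, matching values) for `value > threshold`."""
    scanned, hits = 0, []
    for c in chunks:
        if c["max"] <= threshold:
            continue  # data skipping: metadata rules out the whole chunk
        scanned += 1
        hits.extend(v for v in c["rows"] if v > threshold)
    return scanned, hits

chunks = make_chunks([1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8])
scanned, hits = query_greater_than(chunks, 9)
print(scanned, hits)  # → 1 [10, 11, 12, 13]  (1 of 3 chunks scanned)
```

The payoff grows with scale: on a table with billions of chunks, a selective filter that touches only the few chunks whose metadata ranges overlap the predicate avoids reading the vast majority of the data.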
Finally, because SQream uses SQL, it is easy to adopt; there is no need to learn a new language. All in all, our proprietary GPU-accelerated, disk-based solution provides organizations with a much more powerful technology, one that can scale with them and lets them effectively analyze all of their growing data stores, no matter how big those stores become.