Sponsored by Ahana and Intel

The Case for Ahana Cloud for Presto


A discussion on why many companies are turning to a managed service like Ahana Cloud for Presto rather than deploying and managing Presto themselves.

The use of Presto, the open-source distributed SQL query engine for running interactive analytic queries against data sources, is on the rise. As with many open-source solutions, companies have different deployment options, ranging from do-it-yourself to running it on a cloud to using a fully managed service.

RTInsights recently sat down with Asif Kazi, Principal Solutions Engineer at Ahana, where we discussed the various Presto deployment options, their differences, and why many are turning to managed services like Ahana Cloud for Presto. Here is a summary of our conversation.

RTInsights: Why run Presto on a cloud?


Kazi: Flexibility is the biggest reason. The key drivers for the cloud are agility, elasticity, and cost savings. Additionally, when you have Presto running on a service such as Kubernetes, features like availability and performance come easily. You’re no longer taking care of the heavy lifting of operationally managing your environment—that means a lot to businesses. Another reason to run Presto on cloud is that many companies are moving to open data analytics today. Cloud enables you to do that easily.

RTInsights: What are some of the common issues when you run Presto on a cloud?

Kazi: There are three flavors of Presto on cloud today: you are using a managed service provider, using a serverless offering, or doing it yourself. When you look at Presto, it was originally based on the Hadoop infrastructure, so there are lots of configuration knobs and tweaks. Doing it yourself is not trivial: you need to manage everything from upgrades and configuration to tuning and sizing. A managed service takes care of many of those things.

RTInsights: What are some of the reasons that lead companies to Ahana Cloud for Presto?

Kazi: As I mentioned, there are three primary options customers can use to run Presto. Each of these options has pros and cons.

1. Amazon Athena, a serverless Presto service

The primary reason [Amazon] Athena customers come to Ahana is that they run into what we call the Athena wall. There’s limited configurability and visibility. You end up seeing queuing, and performance isn’t consistent. Customers end up coming to us because they’ve essentially reached the limitations of Athena. With Ahana, you can get the performance that you need at the scale that you need.

2. Amazon EMR Presto, a managed platform for running big data frameworks

With EMR, you’re running a big cluster, and then you’re often running it for a longer time than you traditionally need to. It turns out to be cost-prohibitive, not just from an infrastructure standpoint but also from an operational and personnel management standpoint because you’ve got to manage and operate the cluster. This works for larger businesses but not necessarily for small or medium-sized companies.

3. Do-it-yourself Presto

The last one, do-it-yourself, is probably the most expensive, both in terms of instance cost as well as personnel. It’s just not something practical for small to mid-sized teams. That’s where Ahana comes in. We provide a zero to Presto experience in literally 30 minutes. We are pre-configured with default optimized tunables for most common workloads. You can run the cluster at the scale and size that you want and at the price you want to pay. That gives customers the flexibility that they need.

RTInsights: What are the benefits of using Ahana Cloud for Presto?

Kazi: The main benefit of Ahana Cloud is that it is a fully managed PrestoDB service. We’re the first and only managed PrestoDB service that’s available out there. We can get a customer’s query infrastructure running quickly with an onboarding session guiding the customer through the process. And the important thing: it is pay as you go. This is the traditional cloud and AWS model: you’re only paying when you’re using the service and you’re only paying for what you use. That’s a great advantage. Also, Ahana is available through the AWS Marketplace, so it’s a click of a button away. You can subscribe to it, and it shows up on your AWS bill like any other service. You don’t have to go through a separate procurement system.

RTInsights: What’s the difference between AWS Athena and Ahana Cloud?

Kazi: Athena is also based on Presto. It’s Amazon’s version of it. The thing you need to remember about Athena is that it’s quite far behind as far as releases are concerned. Engine version 1 is based on Presto 0.172. That’s really, really far behind. And the latest version, engine version 2, is still on 0.217. The latest version of Presto is 0.253 right now. So, Athena is over 35 releases behind, which means that you’re not getting the performance optimizations, tuning, bug fixes, and more that are now available. We’re more up to speed in terms of the current Presto releases.
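The release gap quoted above can be checked with simple arithmetic. The sketch below assumes that each Presto release increments the number after "0." by one, which is how the interview counts releases; the helper name `releases_behind` is hypothetical and exists only for this illustration.

```python
def releases_behind(current: str, latest: str) -> int:
    """Count releases between two Presto versions written as '0.N'.

    Assumption: every release bumps N by exactly one.
    """
    current_n = int(current.split(".")[1])
    latest_n = int(latest.split(".")[1])
    return latest_n - current_n


# Version numbers as quoted in the interview.
print(releases_behind("0.217", "0.253"))  # Athena engine version 2 -> 36
print(releases_behind("0.172", "0.253"))  # Athena engine version 1 -> 81
```

Under that assumption, engine version 2 trails by 36 releases, consistent with the "over 35" figure.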

The other difference between AWS Athena and Ahana is that AWS Athena is a fully serverless service, which means you have no ability to control the infrastructure. Customers often complain about running into either query queuing or unreliable performance. You have no visibility into the errors or the troubleshooting that you need to do. It’ll just give you an error message with no details, often asking you to contact support or use the community forums. You’ve got little or no visibility into what to fix or how to fix it. That means that you’re scratching your head in terms of getting to the bottom of any problem. We’ve seen a lot of customers migrate away from Athena as they run into the Athena wall.

Another issue to consider in the Athena versus Ahana discussion is that Athena essentially charges per terabyte scanned: $5 per terabyte. Now, if a single large query scans four terabytes, you pay $20 in less than an hour. But on the same data with Ahana, when you look at instance use and cost, it is on the order of 10 to 25 cents, which is nothing compared to the cost of the terabytes scanned. The difference becomes more pronounced and cost-prohibitive as you move higher up in the spectrum and are scanning more data with more concurrent queries.
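The pricing comparison above is easy to work through. The following sketch uses only the figures quoted in the interview ($5 per terabyte scanned, a four-terabyte query, and an instance cost of 10 to 25 cents); the function names are hypothetical, and amounts are kept in whole cents to avoid floating-point rounding.

```python
ATHENA_CENTS_PER_TB = 500  # $5 per terabyte scanned, as quoted


def athena_query_cost_cents(tb_scanned: int) -> int:
    """Cost of one query under per-terabyte-scanned pricing, in cents."""
    return ATHENA_CENTS_PER_TB * tb_scanned


def cost_ratio(athena_cents: int, instance_cents: int) -> float:
    """How many times more the scan-based price is than the instance cost."""
    return athena_cents / instance_cents


athena = athena_query_cost_cents(4)  # four terabytes scanned
low = cost_ratio(athena, 25)         # against the 25-cent end of the range
high = cost_ratio(athena, 10)        # against the 10-cent end of the range

print(f"Athena: ${athena / 100:.2f}")      # Athena: $20.00
print(f"Ratio: {low:.0f}x to {high:.0f}x")  # Ratio: 80x to 200x
```

On these quoted numbers, the scan-based price for that one query is roughly 80 to 200 times the instance cost. Actual costs depend on instance type, cluster size, and query runtime, none of which are specified in the interview.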

RTInsights: What interesting use cases are you seeing?

Kazi: The beauty of Presto is that it’s a SQL-on-anything engine, and it provides federated querying with SQL-on-anything. That means instead of having to move data, you can query data in place. The beauty of that is no ETL [extract, transform, load]. And if you look at it holistically from a customer standpoint, if you take the time to productivity of using a data source, sometimes those ETL pipelines run 30 minutes, one hour, two hours, or even three days. This is not counting the time and resources to build the ETL data pipeline. ETL in some cases impedes productivity. With Ahana, you’re completely cutting down on that cost and can simply query the data in place. And the other beauty of using Ahana is that it is ANSI SQL compliant, which means that people who are already used to using SQL can use it without dealing with learning something new or making changes to their underlying database.
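To make "query data in place" concrete, here is a minimal sketch of what a federated query can look like. Presto addresses tables as `catalog.schema.table`, so a single statement can join data from two different sources. The catalog, schema, table, and column names below are all hypothetical, and the small helper simply lists which catalogs a statement touches; it is not part of Presto or Ahana.

```python
import re

# Hypothetical names throughout: a data-lake catalog ("hive") joined with an
# operational database catalog ("mysql") in one SQL statement, with no ETL step.
FEDERATED_QUERY = """
SELECT c.region, sum(o.total) AS revenue
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.region
"""


def catalogs_used(sql: str) -> list[str]:
    """Return the catalog part of each catalog.schema.table after FROM/JOIN."""
    pattern = r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+"
    return re.findall(pattern, sql, flags=re.IGNORECASE)


print(catalogs_used(FEDERATED_QUERY))  # ['hive', 'mysql']
```

The point of the sketch is only that both sources appear in one statement; which connectors and catalogs are available depends on how a given cluster is configured.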

RTInsights: Is there a particular size company that uses Ahana Cloud?

Kazi: We are seeing customers across the entire spectrum. There are startups with one or two data engineers. They don’t want to be managing that entire infrastructure, and they’ve already run into the limitations of Athena. They love us because it is quick and easy to get started. They don’t have to deal with the management complexities of DIY.

Then there are mid-market customers who are trying to manage the EMR Presto clusters themselves because Athena does not meet their needs. But again, it’s becoming cost-prohibitive as their size and scale grow, and they don’t want to be adding additional EMR clusters and managing them themselves. The Presto service that Ahana provides is a quick and easy solution that abstracts the complexities of managing the cluster in a cost-effective manner.

Enterprise customers, similar to mid-market customers, turn to us because they need scale when they’re trying to run queries across terabytes and petabytes of data. Not having the flexibility of managing the backend infrastructure on a service such as Athena is a big problem for those customers. Ahana offers a compelling value proposition. They no longer need to manage those EMR Presto clusters, and they get the speed and efficiency that Ahana provides at a fraction of the cost, without the pain.

RTInsights: Where does Intel come in?

Kazi: We are an Intel partner. We work with our customers and prospects through an Intel Accelerator Program (IAP). The goal is to drive the use of Intel-optimized R and C class instances leveraging the latest generation Intel Xeon processors for big data use cases so that customers can see the performance and value benefits of using those instances. Customers are adopting this program to accelerate their “Path to Production.” Customers evaluate their workloads on the Intel instances and see the benefits, and then hopefully, at the end of it, not just move to production with the Intel instances but also talk about their experience and the value that they saw in leveraging those instances.

Salvatore Salamone

About Salvatore Salamone

Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
