Why is Cloud Repatriation Happening?


More and more organizations that went all-in on cloud early are now finding that some analytics workloads run better on-premises and are pulling those workloads back.

There’s a popular notion that all analytics workloads are headed to the cloud, and that any organization whose workloads aren’t there yet is lagging behind. There are certainly situations where moving analytics workloads to the cloud makes sense, but despite the industry hype, cloud isn’t the one and only right way. A recent IDC survey showed that a significant number of organizations, more than half, are pulling some workloads back out of the cloud to on-premises.

This reversal of the trend has been dubbed “Cloud Repatriation.” Like salmon swimming upstream, any company pulling workloads off the cloud has to face a tremendous amount of resistance. So, why would any smart organization decide to do the exact opposite of what all the industry advice says is the right thing to do?

Regulatory Compliance

One obvious reason to pull workloads into a local data center is to comply with regulations like GDPR. New data protection regulations are still hitting companies fairly regularly, and staying in compliance often means pulling workloads back in-house. Some organizations have been pulling workloads off the public cloud proactively to avoid issues with future privacy regulations.

For example, when a new German privacy regulation said companies could not allow data about German citizens to leave the country, any analysis of that data had to be moved off public clouds and into local data centers.

Risk when scaling cloud

When a company needs to scale up significantly, it often has enough warning to plan ahead, test, experiment, and do whatever is needed to get its data centers hardened and ready for the coming expansion. When the expansion finally happens, things generally go smoothly because the IT team has already tested and prepared for every contingency it could.

In public clouds, someone outside the organization handles scaling, and it happens all at once. There’s no way to plan or test or otherwise know that there won’t be any infrastructure scaling failures when the higher workload hits. Cloud scaling is essentially a black box, handled entirely by a third party. And it’s up to that third party, who has no stake in your company’s success, to troubleshoot any problems – after the larger workload is already supposed to be running. Scaling on a public cloud means handing over the specifications and hoping nothing goes wrong. Hope is not a good IT strategy.

Troubleshooting and optimization

Problems don’t just occur when scaling. Certain complex queries needed to power reports may run slower than expected. Organizations may need to optimize resource allocation for different teams. DevOps teams need control of IT systems in order to do their jobs, but cloud service providers tend to abstract a lot of that. There may be problems they need to trace back to the origin and fix. With foundational services hidden, IT teams are left with the same old complaints but no way to address them.

Hidden costs of cloud

People think moving to the cloud is going to save them a ton of money, but that is not the way things pan out. The advantage of cloud managed services is more about ease of use, speed to deployment, and easy administration than about cost savings. Cloud costs are far higher than most organizations expected when they made the leap. Organizations that moved to the public cloud thinking it would be cheaper because they’re not directly paying for hardware are likely to be disappointed.

“Every three months, we could re-purchase the hardware.”

Boaz ben-Yaacov, CEO, Catch Media Inc.

Cloud analytics workload costs often vary wildly and unpredictably from month to month. It is not easy to explain to a CFO why your department needs six times the budget this month that it needed last month. Not only do costs fluctuate on cloud analytics platforms, but they often do so “automatically.” Some cloud analytics services tout compute auto-scaling as a feature while charging by how much compute you use. This means costs scale upward automatically, without the organization’s knowledge or control. Money seems to just vanish, and sometimes organizations don’t know why until, or even after, they receive a big bill. Getting control of the quarterly budget back is one of the most frequent drivers for pulling workloads back from the cloud.
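To make the arithmetic behind that kind of swing concrete, here is a minimal sketch of usage-based billing under auto-scaling. Every number in it (the price per node-hour, the node counts, the length of the busy period) is a made-up assumption for illustration, not a real vendor rate, and the function names are hypothetical.

```python
# All prices and usage figures are hypothetical illustrations, not real vendor rates.
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def autoscaled_node_hours(baseline_nodes: int, peak_nodes: int, peak_hours: float) -> float:
    """Node-hours consumed when a service runs at baseline_nodes and
    auto-scales up to peak_nodes for peak_hours of the month."""
    baseline_hours = HOURS_PER_MONTH - peak_hours
    return baseline_nodes * baseline_hours + peak_nodes * peak_hours


def monthly_compute_cost(node_hours: float, price_per_node_hour: float) -> float:
    """Usage-based bill: every node-hour consumed is charged."""
    return node_hours * price_per_node_hour


PRICE = 2.0  # assumed dollars per node-hour

# Quiet month: the service never leaves its 4-node baseline.
quiet_month = monthly_compute_cost(autoscaled_node_hours(4, 4, 0), PRICE)

# Busy month: heavy queries push the service to 40 nodes for 400 hours.
busy_month = monthly_compute_cost(autoscaled_node_hours(4, 40, 400), PRICE)

ratio = busy_month / quiet_month
print(f"quiet: ${quiet_month:,.0f}  busy: ${busy_month:,.0f}  ratio: {ratio:.1f}x")
```

Under these assumed figures, nobody changed the configuration between the two months. The only thing that changed was how hard the workload pushed the auto-scaler, and the bill followed.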

“Every startup I’ve worked with has stepped in at least one cost-related bear trap that resulted in a huge spike in monthly spend. And once you’re in the cloud ecosystem, you can’t get out again; your only option is to drop everything and fix it.”

Alex Rasmussen, Data Systems Consultant, Bits on Disk


Ironically, cloud lock-in, the very thing designed to keep people from pulling workloads out once they’re in, is a big reason that businesses pull workloads back. People who bought IBM Netezza appliances back in the day thought they were getting an efficient way to do analytics. Then they found out that to use the analytics software, they not only had to run it on IBM hardware, but it also only connected well to other IBM services. They found themselves locked into an ecosystem that was hard to escape. Modern public clouds offer analytic databases that only work on that company’s platform and only use that company’s services, leaving many with a feeling of déjà vu.

Adding insult to injury are “egress fees,” the charges a cloud provider applies when data is moved out of its platform. Data and data workloads need to move, flow, and change with the times and with requirements. Egress fees do nothing but put a big roadblock in the way of data engineers, and they clearly benefit no one but the vendor trying to lock the door once the customer is inside its system. Making a decision now to free data analytics from monolithic systems provides a lot more freedom to move and reconfigure as needed in the future.
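For a rough sense of how egress fees act as a roadblock, the sketch below estimates a one-time exit charge and how long it would take to recoup. The per-gigabyte fee, dataset size, and monthly savings are all assumptions chosen for illustration. Real egress pricing varies by provider and tier, so treat this as arithmetic under stated assumptions rather than a quote.

```python
# Hypothetical figures for illustration only; real egress pricing varies by provider and tier.
GB_PER_TB = 1024


def egress_cost(dataset_tb: float, fee_per_gb: float) -> float:
    """One-time charge for moving dataset_tb terabytes out of a cloud
    at fee_per_gb dollars per gigabyte."""
    return dataset_tb * GB_PER_TB * fee_per_gb


def months_to_recoup(move_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to pay off the exit charge."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return move_cost / monthly_savings


cost = egress_cost(dataset_tb=500, fee_per_gb=0.09)       # assumed 500 TB at $0.09/GB
payback = months_to_recoup(cost, monthly_savings=10_000)  # assumed $10,000 saved per month
print(f"exit charge: ${cost:,.0f}, recouped after {payback:.1f} months")
```

The larger the dataset grows while it sits in the cloud, the larger this exit charge becomes, which is why the fee works as a lock on the door.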

Changing trends

What this all adds up to is this: the way most companies do analytics is changing. The most common data analytics implementation pattern is no longer purely on-prem or purely cloud. Now, it’s a hybrid blend of both. More and more organizations that went all-in on cloud early are now finding that some analytics workloads run better on-prem and are pulling those workloads back. Cloud repatriation is no longer an isolated or unusual thing for a company to do; many other companies are swimming upstream right alongside them.

Paige Roberts

About Paige Roberts

Paige Roberts is Open Source Relations Manager at Vertica, a Micro Focus company. In 23 years in the data management industry, Paige Roberts has worked as an engineer, a trainer, a support technician, a technical writer, a marketer, a product manager, and a consultant. She has built data engineering pipelines and architectures, documented and tested open-source analytics implementations, spun up Hadoop clusters, picked the brains of stars in data analytics and engineering, worked with a lot of different industries, and questioned a lot of assumptions. Now, she promotes understanding of Vertica, distributed data processing, open-source, high scale data engineering, and how the analytics revolution is changing the world.
