
Where to Train Your AI Dragon



Written By
Liran Zvibel
Jun 18, 2024

AI is coming for your business. If you’re not already leveraging generative AI to drive operational efficiencies and boost worker productivity, you likely will be soon: McKinsey estimates that generative AI could add up to $7.9 trillion annually to the global economy.

Although AI promises to deliver a significant return on investment in the form of increased operational efficiencies, enhanced worker creativity and productivity, and the creation of new value streams, the associated costs to train and implement it are high, both financially and in terms of its environmental impact. To plan and budget for new AI projects, you’ll want to know a few things about where these projects will run.

See also: Groups Focus on Infrastructure for AI and High-Performance Workloads

Not Your Average Cloud Load

The world has a lot of data center capacity, but availability for new projects is declining. In the U.S., for example, the Silicon Valley region has a data center vacancy rate of under 3%.

Traditional public cloud services configured for application hosting can be ill-suited for AI workloads, which demand exceptionally high data throughput and processing performance. A standard enterprise data center network architecture is also unlikely to deliver enough performance to keep the high-powered GPUs that run AI workloads operating at capacity, and paying for idle GPUs can be very expensive, not to mention the waste heat they generate.
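
To make the cost of idle GPUs concrete, here is a minimal back-of-the-envelope sketch in Python. The cluster size, hourly rate, and idle share below are illustrative assumptions, not real vendor pricing:

num_gpus = 512             # assumed cluster size
price_per_gpu_hour = 2.50  # assumed hourly rate per GPU, in USD
idle_fraction = 0.30       # assumed share of time GPUs sit waiting on data

hours_per_month = 24 * 30
monthly_bill = num_gpus * price_per_gpu_hour * hours_per_month
wasted = monthly_bill * idle_fraction

print(f"Monthly GPU bill:   ${monthly_bill:,.0f}")   # $921,600
print(f"Paid for idle time: ${wasted:,.0f}")         # $276,480

Even a modest idle fraction compounds into a large monthly number at cluster scale.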

While traditional hyperscale cloud providers are quickly gearing up to support AI workloads, other options exist. Many data processing businesses were created over the past several years to support similar GPU-intensive loads. Notably, many former crypto mining companies have begun shifting their GPU resources to create cloud services specifically designed to support AI workloads.

These GPU clouds differ from public hyperscaler cloud data centers in a few key ways that make them well-suited for AI projects. Architecturally, they resemble high-performance computing (HPC) clusters: the jobs they take on often consume the entire facility while running. They use next-generation GPU compute hardware rather than the general-purpose CPUs used for transaction processing. They also typically leverage modern, data pipeline-oriented storage architectures to meet their performance and scale requirements, and their jobs run “closer to the iron,” without passing through virtual machines or containers.

Many GPU farms set up for the cryptocurrency business were also built with financial efficiency in mind, so they frequently leverage advanced data center cooling technologies, run entirely on renewables, or were built in proximity to renewable energy sources such as hydroelectric generators.


Pay Per GPU, By the Hour

For workloads on GPU cloud installations, you often pay by the number of GPUs and the time you hold them, not by the actual work performed, so it pays to optimize.

Consider Atomwise, a pharmaceutical research company using AI for drug discovery and one of our customers. An AI experiment could take several months to run on its GPU-based data pipeline, which regularly needed to ingest petabytes of unstructured data and access tens of millions of files. Atomwise also needed to complete multiple I/O steps to train its model: import the data, clean it, generate descriptors, and package the data. The differing requirements of each stage created storage silos and a performance bottleneck; one training cycle could take up to four days to run. By adding software optimization that centralized the data for the entire pipeline and managed the transfer of data from storage to the GPU servers, Atomwise reduced model training time by 20x, so projects that used to take up to three months can now be completed in less than a week.

For Atomwise, this optimization meant that processing that used to take a year could be completed in just 12 days. Not only did this improve the company’s competitive position in advancing research in fields like oncology and rare diseases, but it also saved a substantial amount of money in compute expenses.
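
Because billing scales with GPU count times wall-clock hours, a reduction in training time flows almost directly through to the bill. A rough sketch of that arithmetic, using the 20x speedup cited above and otherwise hypothetical numbers:

num_gpus = 128             # assumed cluster size
price_per_gpu_hour = 2.00  # assumed rate, USD
baseline_days = 90         # "up to three months" per project, per the example above
speedup = 20               # training-time reduction cited above

baseline_cost = num_gpus * price_per_gpu_hour * baseline_days * 24
optimized_cost = baseline_cost / speedup

print(f"Baseline:  {baseline_days} days, ${baseline_cost:,.0f}")                  # 90 days, $552,960
print(f"Optimized: {baseline_days / speedup:.1f} days, ${optimized_cost:,.0f}")   # 4.5 days, $27,648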

Optimizing data architectures for GPUs and AI workloads is a developing science. Benchmarks do exist, but different types of AI workloads put varying levels of strain on data infrastructure. For example, research shows that training generative AI language models generates much smaller input/output operations than training models for imaging, yet both modalities put extreme stress on storage and data transfer capabilities.
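
As an illustration of how modality changes the I/O profile, the sketch below estimates the sustained read bandwidth a storage layer must deliver for a text workload versus an imaging workload. The per-sample sizes and throughput rates are assumptions chosen only to show the shape of the difference, not benchmark results:

def required_read_gbps(samples_per_sec_per_gpu, sample_bytes, num_gpus):
    """Sustained read rate the storage layer must deliver, in GB/s."""
    return samples_per_sec_per_gpu * sample_bytes * num_gpus / 1e9

NUM_GPUS = 256

# Tokenized text samples are small (assumed ~8 KB), so reads are small and numerous.
llm = required_read_gbps(samples_per_sec_per_gpu=50, sample_bytes=8_000, num_gpus=NUM_GPUS)

# Training images are far larger (assumed ~500 KB), so each read moves much more data.
img = required_read_gbps(samples_per_sec_per_gpu=20, sample_bytes=500_000, num_gpus=NUM_GPUS)

print(f"Language model: ~{llm:.2f} GB/s sustained reads")  # ~0.10 GB/s
print(f"Imaging model:  ~{img:.2f} GB/s sustained reads")  # ~2.56 GB/s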

Although hyperscale cloud vendors have deep resources and can afford to invest in new technologies, some customers report that they can’t always deliver the high-touch attention required to match their services to these new workloads. In addition, their operations are generally staffed around time-sliced, managed processes, which differ markedly from AI workloads.

On the other hand, service providers with racks of GPUs and HPC gear typically sit on thousands of expensive, sought-after processors well-suited to AI workloads, often powered by nearby clean energy resources. They also tend to be newer to the enterprise hosting business and will work harder to provide more custom and dedicated resources.


Staging the Work

Business tech and research teams are still learning how to spec and acquire services for their new AI workloads. However, one thing that is universal for these new jobs is the need to focus on data storage and the movement of data into processing systems. It is typically those two legs (storage and networking) of the data center triad that have an outsize influence on the cost and efficiency of a project. We find the third leg, compute, is generally at the mercy of the other two.
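
A simple way to see why compute ends up at the mercy of storage and networking: GPU utilization is capped by the ratio of delivered to required data bandwidth. The figures below are made up purely for illustration:

required_gbps = 4.0   # read bandwidth the GPUs need to stay fully busy (assumed)
delivered_gbps = 1.5  # what the storage/network path actually sustains (assumed)

utilization = min(1.0, delivered_gbps / required_gbps)
print(f"Effective GPU utilization: {utilization:.0%}")  # 38% -- the GPUs stall the rest of the time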

We are in the early days of learning how to evaluate service providers on their AI workload capabilities. Technology executives must evaluate all options, including those outside their traditional data center partners, and discuss their needs with the providers they are considering. Workloads should be trialed on candidate systems, and business teams should work on correlating (and adapting) benchmarks such as MLPerf, where applicable, to their needs.

Selecting the right service provider and tuning your AI workloads can dramatically reduce associated training and inference costs. The exercise improves time-to-market and competitiveness while lowering expense and the environmental impact of this new technology.

Liran Zvibel

Liran Zvibel co-founded WEKA, the AI-native data platform company, in 2013. Today, he guides the company’s long-term vision and strategy as its Chief Executive Officer. He earned a Bachelor of Science degree in Mathematics and Computer Science from Tel Aviv University.
