AI Is Wasting Energy and the Bill Is Due

Washington and Big Tech signed a pledge but missed an opportunity to lower the tab.

Apr 10, 2026
5 minute read

Virginia families saw their electricity bills rise $11 a month this January, the first Dominion Energy rate increase since 1992. The average D.C.-area household is absorbing roughly $21 extra per month. The cause is not a mystery. It is the data center cluster down the road.

Washington noticed. In February, Peter Navarro said on Fox News that AI companies “need to pay for all, all of the costs.” By March, major hyperscalers had signed a Ratepayer Protection Pledge, committing to cover the full cost of the grid infrastructure they require. A meaningful step.

Energy is no longer a hidden issue. It shows up in everyday utility bills and in public policy, proof that this is not only a technical problem but an economic one.

But shifting who pays does not change how much gets consumed. The IEA projects global data center electricity consumption will more than double by 2030. Reassigning the bill does not shrink it. And the question of how to actually reduce it is getting almost none of the attention.

The scale of investment flowing into AI infrastructure is remarkable. More chips. Bigger data centers. More gigawatts. That buildout is necessary. What is striking is how little attention and capital are flowing toward the other side of the equation: whether the AI systems consuming all that power are being run efficiently in the first place. AI workloads currently run at 30-50% GPU utilization, meaning half or more of the capacity already built sits idle at any given time. This reflects a system design issue. GPU utilization, model routing, inference architecture, and system-level efficiency are not secondary concerns. They are an enormous opportunity.
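
To put that utilization figure in perspective, a back-of-the-envelope sketch (the 100 MW facility and the 40% midpoint are illustrative assumptions, not data from any specific deployment):

```python
# Idle capacity implied by low GPU utilization (illustrative figures only).
provisioned_mw = 100   # hypothetical facility: 100 MW of GPU capacity
utilization = 0.40     # midpoint of the 30-50% range cited above

idle_mw = provisioned_mw * (1 - utilization)
print(f"{idle_mw:.0f} of every {provisioned_mw} MW provisioned does no useful work")
# -> 60 of every 100 MW: capacity the grid must supply but workloads never use
```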

See also: AI and Efficiency Pressures? New Tech Solutions for Modern Data Centers

The Efficiency Illusion

The industry’s rebuttal is efficiency. Google claims the energy per Gemini prompt fell 33-fold in 12 months. Real gains. Entirely beside the point.

This is Jevons Paradox. In the 1860s, more fuel-efficient steam engines did not reduce coal consumption. They made coal-powered machinery economical in markets that previously couldn’t afford it, and consumption exploded. Cheaper AI queries do not reduce grid demand. They expand the number of use cases that can run AI, which expands query volume, which expands hardware requirements, which expands the power draw. A 33-fold efficiency gain means nothing for the grid if query volume grows 100-fold in response, and by every available measure, that is exactly what is happening.
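
The arithmetic behind that claim is simple, as a quick sketch shows (the multipliers are the illustrative figures from above, not forecasts):

```python
# Back-of-the-envelope Jevons arithmetic (illustrative numbers only).
efficiency_gain = 33   # energy per query falls 33-fold
volume_growth = 100    # query volume grows 100-fold over the same period

# Net grid demand scales with query volume divided by per-query efficiency.
net_demand = volume_growth / efficiency_gain
print(f"Net grid demand still grows roughly {net_demand:.1f}x")
# -> ~3x: the efficiency gain is swamped by volume growth
```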

But there’s a second problem the industry almost never discusses: most AI deployments do not need the model they are running.

A significant share of enterprise workloads are routed to the most powerful available model by default, not by necessity. Frontier models with hundreds of billions of parameters are being invoked to handle tasks that a well-tuned model a fraction of the size would handle just as accurately, at a fraction of the cost and a fraction of the power draw. It is like a light switch that turns on every light in the house when you just need to find the bathroom.
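
What right-sizing looks like in practice is a router that defaults to the small model and escalates only when needed. A minimal sketch, assuming hypothetical model names and a crude keyword heuristic standing in for a real complexity classifier:

```python
# Hypothetical router: send a task to the smallest model likely to handle it.
# Model names and the keyword heuristic are illustrative assumptions.

SMALL_MODEL = "slm-3b"        # task-specific small language model
FRONTIER_MODEL = "llm-400b"   # general-purpose frontier model

COMPLEX_SIGNALS = ("multi-step reasoning", "novel", "open-ended")

def route(task: str) -> str:
    """Default to the small model; escalate only when the task demands it."""
    if any(signal in task.lower() for signal in COMPLEX_SIGNALS):
        return FRONTIER_MODEL
    return SMALL_MODEL

print(route("Extract the invoice total from this document"))  # -> slm-3b
print(route("Open-ended strategy analysis for a new market"))  # -> llm-400b
```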

This is the current state of AI deployment across enterprises. Two-thirds have not yet scaled AI across their organizations, and the systems being built today are inefficient from the start, mainly because workflows are not being redesigned around them.

See also: How AI Is Forcing an IT Infrastructure Rethink

The Shift

GPU utilization at most enterprise deployments runs well below theoretical capacity. Inference workloads are routed to oversized models by default. Multi-model architectures duplicate compute across redundant calls. The efficiency gap at the system level is enormous, and it remains largely unaddressed. 

However, an encouraging shift is starting to happen. The most sophisticated companies are moving away from monolithic frontier models and toward intelligent orchestration across smaller, task-specific ones.

AT&T’s Chief Data Officer Andy Markus says, “I believe the future of agentic AI is many, many, many small language models” (VentureBeat). AT&T was processing 8 billion tokens a day through large models and hit a wall. Not technically, but economically. They rebuilt their AI orchestration layer around a multi-agent architecture in which large models coordinate smaller, task-specific ones, breaking requests into smaller steps and routing each step independently based on its complexity. The result: a 90% reduction in AI spend while actually scaling throughput, processing more than 27 billion tokens a day across more than 100,000 employees.
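
The pattern described here, a coordinating model decomposing a request and routing each step independently, can be sketched generically. This is an illustration of decompose-then-route orchestration, not AT&T’s actual implementation; the planner, step labels, and model names are assumptions:

```python
# Generic decompose-then-route orchestration (illustrative, not AT&T's code).
# A coordinating model breaks a request into steps; each step is routed
# independently to the cheapest model that can handle its complexity.

from dataclasses import dataclass

@dataclass
class Step:
    description: str
    complexity: str  # "simple" or "complex", assumed labels from a planner

def plan(request: str) -> list[Step]:
    """Stand-in for a large coordinating model that decomposes the request."""
    return [
        Step("classify the customer's intent", "simple"),
        Step("look up the account record", "simple"),
        Step("draft a policy-compliant resolution", "complex"),
    ]

def route_step(step: Step) -> str:
    return "llm-400b" if step.complexity == "complex" else "slm-3b"

for step in plan("Resolve this billing dispute"):
    print(f"{step.description!r} -> {route_step(step)}")
```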

This is not an isolated case. NVIDIA Research published a position paper titled “Small Language Models are the Future of Agentic AI,” making the case that SLMs are sufficiently powerful, operationally more suitable, and necessarily more economical for the majority of what enterprise agents actually do. Gartner projects that by 2027, organizations will use task-specific SLMs three times more than LLMs.

The logic is straightforward. Most agent tasks are narrow and repetitive. They do not require a generalist with encyclopedic world knowledge. They require a specialist that executes reliably and cheaply. This is changing how enterprises deploy AI, moving from a one-model-fits-all approach to systems that route each task based on factors such as accuracy and cost. Routing those tasks to frontier models is the AI equivalent of hiring a neurosurgeon to file paperwork.
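
One way to make that accuracy-versus-cost decision concrete is to pick the cheapest model that clears the task's accuracy bar. A minimal sketch; the accuracy and cost figures are placeholders, not benchmark results:

```python
# Cost-aware model selection (placeholder accuracy/cost figures).
# Choose the cheapest model that meets the task's accuracy requirement.

MODELS = [
    # (name, expected task accuracy, relative cost per call), assumed values
    ("slm-3b",   0.94, 1.0),
    ("llm-70b",  0.96, 12.0),
    ("llm-400b", 0.97, 60.0),
]

def select_model(required_accuracy: float) -> str:
    candidates = [m for m in MODELS if m[1] >= required_accuracy]
    if not candidates:
        raise ValueError("no model meets the accuracy requirement")
    return min(candidates, key=lambda m: m[2])[0]

print(select_model(0.93))  # -> slm-3b: the specialist wins on cost
print(select_model(0.97))  # -> llm-400b: only the frontier model qualifies
```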


What’s Next

The infrastructure buildout will continue, and it should. The Ratepayer Protection Pledge will ensure hyperscalers absorb more of its cost, and that matters. But neither development addresses the efficiency gap, because the efficiency gap is not a supply problem. It is a system design problem.

Practitioners working on inference-time optimization are already documenting 10x to 100x efficiency gains in production through smarter routing and right-sized model selection, largely by eliminating redundant compute. These are not lab results. They are production outcomes, achievable today, on existing hardware. Capturing even a fraction of that would do more to reduce grid demand than every renewable procurement pledge Big Tech has announced.
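
One such optimization is simply deduplicating repeated inference calls. A minimal sketch, where call_model is a hypothetical stand-in for a real inference client:

```python
# Deduplicating redundant inference calls (illustrative sketch).
# call_model is a hypothetical stand-in for a real inference client.

import functools

@functools.lru_cache(maxsize=10_000)
def call_model(model: str, prompt: str) -> str:
    print(f"  [compute] {model} invoked")   # only prints on a cache miss
    return f"response from {model}"

# Identical requests hit the cache instead of the GPU.
call_model("slm-3b", "Summarize this support ticket")
call_model("slm-3b", "Summarize this support ticket")  # cached, no compute
```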

The industry spent the last three years racing to scale. The next race is using what we built more efficiently.

Calvin Cooper is an AI Industrialist focused on the intersection of technology, capital markets, and the physical economy. He is Co-Founder and COO of NeuroMetric AI, building intelligent orchestration systems and task-specific small language models for faster, cheaper, and more accurate AI performance. He is a Director at Pilot Wave Holdings, a private equity firm acquiring and transforming manufacturing, infrastructure, and essential services businesses through applied intelligence. Previously, Cooper was a venture capitalist at NCT Ventures and founded Rhove, which he took from inception through acquisition and a Nasdaq direct listing (AIRE). He serves as an Advisor at the Milken Institute, researching private capital market policy to meet the AI industrial challenge.
