When most people interact with an AI tool, they experience it as software. You type something, a response appears. It feels lightweight, instant, almost ethereal. That experience is a remarkably effective illusion.
Behind every AI interaction is a chain of physical infrastructure that is neither lightweight nor cheap: vast data centres consuming industrial quantities of electricity, custom silicon costing tens of thousands of pounds per unit, specialised cooling systems, high-bandwidth networking, and storage arrays measured in petabytes. The gap between how AI feels to use and what it actually requires to run is one of the most important things businesses do not understand when they start planning their AI strategy.
This blog is about closing that gap.

Why Large Language Models Need So Much Hardware
To understand the infrastructure demand, it helps to understand what a large language model actually is at a physical level.
A model like the ones powering the leading AI assistants today is, at its core, a very large file containing billions of numerical parameters. GPT-4, for reference, is estimated to have somewhere in the region of one trillion parameters. Even more efficient models used widely in business contexts typically run to tens of billions. Each of those parameters is a number that needs to be stored and processed during every inference, every time the model generates a response.
This is not like running a conventional software application. When you load a word processor, the programme itself is small and the documents it works with live separately. When you run a large language model, the model itself is the thing that needs to be held in memory and processed continuously. A model with 70 billion parameters, stored in a common numerical format, requires somewhere between 35 and 140 gigabytes of memory just to load, depending on the precision used. That is before you handle any user requests.
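That range is straightforward back-of-the-envelope arithmetic: parameter count multiplied by bytes per parameter at a given precision. A minimal sketch, using the byte sizes of the numerical formats commonly used for inference:

```python
# Approximate memory needed just to load a 70-billion-parameter model,
# at precisions commonly used for inference.
PARAMS = 70e9  # 70 billion parameters

BYTES_PER_PARAM = {
    "fp16 / bf16 (half precision)": 2.0,
    "int8 (quantised)": 1.0,
    "int4 (aggressively quantised)": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gigabytes = PARAMS * nbytes / 1e9
    print(f"{precision:32s} ~{gigabytes:4.0f} GB")  # 140, 70, and 35 GB
```

This covers the weights alone. Serving real traffic also needs memory for activations and per-request caching, which is why practical deployments require headroom beyond these figures.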
Memory is the binding constraint. GPUs (graphics processing units) are used for AI inference and training because their architecture is highly suited to the parallel matrix operations that neural networks require. But GPU memory is expensive, scarce, and limited per chip. Running large models efficiently typically requires multiple high-end GPUs working in concert, each with significant dedicated memory. Nvidia’s H100 chips, which became the reference hardware for serious AI workloads, cost upwards of $30,000 each at launch and were in short supply for much of 2023 and 2024. A single server rack configured for serious AI workloads can cost more than a luxury house.
Training is an order of magnitude more demanding than inference. Where inference runs the model's forward pass once per request, training a frontier model requires running that pass, plus a corresponding parameter update, not once but billions of times across the entire training dataset, adjusting the parameters incrementally as it goes.
The compute required for training the largest models is measured in exaFLOPs — quantities of calculation that are difficult to visualise. The energy consumption of a single training run for a large frontier model is comparable to the annual electricity usage of hundreds of homes.
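To make "hundreds of homes" concrete: one widely cited estimate puts the energy for training a GPT-3-scale model at roughly 1,300 MWh, and a typical UK household uses around 2,700 kWh of electricity per year. Both figures are approximations, used here only for scale:

```python
# Rough scale comparison: energy of one large training run vs household usage.
# Both inputs are published approximations, not precise measurements.
TRAINING_RUN_KWH = 1_300_000    # ~1,300 MWh, an estimate for a GPT-3-scale run
UK_HOME_KWH_PER_YEAR = 2_700    # typical annual UK household electricity use

equivalent_homes = TRAINING_RUN_KWH / UK_HOME_KWH_PER_YEAR
print(f"One training run ~= annual electricity of {equivalent_homes:.0f} homes")
```

The division lands in the high hundreds, which is the order of magnitude the paragraph describes. Frontier models trained since then are generally believed to require substantially more.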

The Ripple Effect on Global Supply Chains
The hardware demands of AI have had material effects on global supply chains that go well beyond the technology sector.
The GPU shortage of 2022 to 2024 was the most visible example. Nvidia had a near-monopoly on the high-performance GPUs suited to AI training and inference, and demand dramatically outstripped supply. This drove up prices for anyone trying to access GPU compute, created long waiting lists for cloud GPU instances, and triggered a scramble among hyperscalers to secure supply well in advance of their needs.
This in turn affected the semiconductor supply chain further upstream. Advanced chip fabrication is dominated by a small number of foundries, most notably TSMC in Taiwan. The concentration of global chip manufacturing capacity in a small number of facilities and geographies has become a significant geopolitical concern, with governments in the US, EU, and UK investing heavily in efforts to diversify domestic semiconductor capability.
Beyond chips, AI infrastructure has created significant demand for other components that are not typically top of mind: high-bandwidth memory (HBM), which is the specialised memory used in AI accelerators, is produced by a very small number of manufacturers. Networking equipment capable of connecting thousands of GPUs with the low latency and high bandwidth that AI training requires is a specialised market. The power infrastructure needed to run large data centres at the scale AI demands has created supply constraints in electrical transformers and cooling equipment.
The rare earth elements and specialist materials that go into advanced semiconductors connect the AI supply chain to mining operations in regions including China, the Democratic Republic of Congo, and parts of South America. Businesses that think of AI as a purely digital phenomenon are, in a practical sense, connected to physical resource extraction and geopolitical risk in ways they may not have considered.

What This Means for Cost and Accessibility
For businesses, the infrastructure reality translates into a cost structure that is often poorly understood at the planning stage.
If you access AI through a cloud API, the costs appear simple: you pay per token, per image, per call. What you are actually paying for is a fraction of the infrastructure described above, amortised across many users. This makes API access genuinely affordable for moderate usage. A business running a few thousand AI-assisted queries per day can access state-of-the-art model capability for costs that are commercially manageable.
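As an illustration of the per-token economics: the prices and token counts below are hypothetical, chosen only to show the shape of the calculation, not any provider's actual rates.

```python
# Sketch of a monthly API bill for a moderate-usage deployment.
# All figures are illustrative assumptions, not real pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.003    # USD, assumed
PRICE_PER_1K_OUTPUT_TOKENS = 0.015   # USD, assumed
INPUT_TOKENS_PER_QUERY = 500         # prompt plus context, assumed
OUTPUT_TOKENS_PER_QUERY = 300        # generated response, assumed
QUERIES_PER_DAY = 3_000

cost_per_query = (
    INPUT_TOKENS_PER_QUERY / 1_000 * PRICE_PER_1K_INPUT_TOKENS
    + OUTPUT_TOKENS_PER_QUERY / 1_000 * PRICE_PER_1K_OUTPUT_TOKENS
)
daily_cost = cost_per_query * QUERIES_PER_DAY
print(f"cost per query: ${cost_per_query:.4f}")
print(f"daily: ${daily_cost:.2f}, monthly (30 days): ${daily_cost * 30:,.2f}")
```

At these assumed rates, a few thousand queries a day comes to hundreds of dollars a month: the "commercially manageable" territory described above. The same arithmetic also shows how quickly the bill scales if token counts or query volumes grow.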
The picture changes significantly when you move towards higher volume, lower latency, or custom deployments. A business that needs AI responses in milliseconds rather than seconds, at very high volume, cannot rely on shared API access and needs dedicated compute capacity. A business that wants to run a fine-tuned or custom model needs to either rent dedicated GPU instances (which can cost hundreds of pounds per hour for serious configurations) or invest in on-premise hardware.
The accessibility gap between large and small organisations is significant and worth naming directly. Hyperscalers and large enterprises can negotiate preferential access to GPU compute, lock in supply ahead of demand spikes, and invest in the engineering expertise needed to optimise their infrastructure usage. Smaller businesses are working with shared, on-demand resources, at pricing that reflects market rates rather than negotiated terms. This does not make AI inaccessible to smaller organisations, but it does mean the economics look different, and planning needs to account for this honestly.
Open source models have materially improved the accessibility picture. Models that would have been considered frontier capability eighteen months ago are now available as open weights, runnable on hardware that a mid-sized business could realistically own or rent. This is a genuinely important development for businesses that want more control over their AI infrastructure and are willing to invest in the technical capability to manage it.

How Cloud Providers Shape What AI Is Available and at What Cost
It is difficult to overstate the degree to which the major cloud providers — AWS, Azure, and Google Cloud — shape the AI landscape for most businesses. They are simultaneously infrastructure providers, model developers, and distributors of third-party AI services. Their decisions about what to offer, how to price it, and where to invest shape what is practically accessible for the vast majority of organisations that are not building their own infrastructure.
This creates a set of dependencies that businesses should think through carefully. When you build a product or internal workflow on top of a cloud provider’s AI offering, you are exposed to their pricing decisions, their availability guarantees, their model deprecation schedules, and their choices about which capabilities to develop or retire. Cloud providers have generally behaved well in this regard, but the history of technology is full of platforms that extracted value from ecosystems they had enabled once those ecosystems were sufficiently dependent.
The concentration of AI infrastructure in a small number of hyperscaler data centres also creates geographic and regulatory considerations. Where your data is processed matters for compliance with GDPR and sector-specific regulations. The physical location of data centres affects latency. Outages at a major cloud provider can affect AI-dependent workflows across thousands of businesses simultaneously, as has happened on multiple occasions.
None of this argues against using cloud-based AI. For most businesses, it is the right choice: the economies of scale are real, the infrastructure management burden is significant, and the pace of capability development is faster than most organisations could match independently. The point is to make that choice with eyes open to the dependencies it creates.

Why Infrastructure Awareness Matters When Planning AI Adoption
Businesses that understand the infrastructure layer make better AI adoption decisions. This shows up in a few specific ways.
Realistic cost modelling becomes possible. If you understand that AI inference has a non-trivial compute cost that scales with usage, you can model your AI-dependent workflows honestly rather than being surprised when API bills climb faster than anticipated as adoption grows.
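That "climbing faster than anticipated" dynamic is easy to model. A toy projection, assuming a fixed blended cost per query and steady month-on-month adoption growth (both numbers invented for illustration):

```python
# Toy projection: a fixed per-query cost compounding with adoption growth.
# The growth rate and unit cost are illustrative assumptions.
COST_PER_QUERY = 0.006    # USD, assumed blended API cost per query
MONTHLY_GROWTH = 0.20     # assumed 20% month-on-month growth in usage

daily_queries = 2_000.0   # starting volume
for month in range(1, 13):
    monthly_bill = daily_queries * 30 * COST_PER_QUERY
    print(f"month {month:2d}: {daily_queries:8,.0f} queries/day "
          f"-> ${monthly_bill:8,.2f}/month")
    daily_queries *= 1 + MONTHLY_GROWTH
```

Under these assumptions the bill grows more than sevenfold over the year, roughly doubling every four months. The unit cost never changed; adoption did. That is the shape of the surprise this paragraph warns about.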
Vendor evaluation improves. When you understand what infrastructure a vendor is actually running and how they have structured their compute capacity, you can ask better questions about reliability, latency guarantees, and pricing sustainability. A startup offering AI capabilities at below-market pricing on the basis of subsidised cloud credits is a different proposition from a provider with a sustainable cost structure.
Risk planning becomes more grounded. Understanding that your AI capabilities depend on a specific cloud region, a specific model version, or a specific GPU availability situation allows you to think about what your contingency looks like if any of those things change. For AI-critical workflows, that contingency planning is worth doing.
It also produces more honest conversations about what AI can and cannot do at a given price point. The capabilities of a system running on a large, well-provisioned model are different from those of a system running a smaller, more efficient model on limited hardware. Infrastructure shapes capability, and understanding that relationship helps you set appropriate expectations.

The Infrastructure Is Not Going Away
The hardware intensity of AI is not a transitional phase that will resolve once the technology matures. It is a structural feature of how current AI systems work, and while efficiency improvements are real and ongoing, demand tends to absorb those gains rather than reduce overall resource requirements.
What is likely to change is the accessibility of meaningful capability at lower cost. Inference efficiency has improved dramatically, open source models are closing the gap with proprietary ones, and the competitive dynamics of the cloud market create pressure on pricing. For most business use cases, the combination of better models and more efficient infrastructure is making AI more accessible over time, not less.
But the hardware layer will remain. The supply chains, the geopolitical dependencies, the energy requirements, and the cloud concentration dynamics are features of the AI landscape that any serious business strategy needs to account for. The businesses that treat AI as a purely digital, purely software phenomenon are operating with an incomplete picture. And incomplete pictures lead to planning decisions that do not hold up in practice.


