New research reveals LLM agent costs scale quadratically. Cache reads hit 87% of the bill by session end. A single feature costs $12.93.
A detailed analysis from exe.dev has exposed a fundamental economic problem with LLM-powered coding agents: their costs scale quadratically, not linearly, as conversations grow longer. This finding has significant implications for every company deploying AI agents for software development, customer support, or any task requiring extended multi-turn interactions.
The core issue is architectural. Each time an AI agent processes a new message in a conversation, it must re-read the entire conversation history. The cost of these cache reads accumulates in a triangle pattern: generating the first response requires reading one message, the second response two, the third three, and so on. This produces a quadratic cost curve where total expenditure grows in proportion to the square of the conversation length, not its linear size.
The numbers from exe.dev's analysis are striking. By the time a conversation reaches 27,500 tokens, cache reads account for half the total cost. By the end of a typical coding session, cache reads consume 87% of the bill. A single feature-level conversation, the kind a developer might have while building one component of an application, costs $12.93 on average. The analysis sampled 250 real conversations and confirmed the quadratic pattern held consistently across different use cases and conversation lengths.
For companies scaling AI agent deployments, this finding challenges the assumption that AI coding assistants become cheaper with scale. Instead, the opposite may be true: as agents handle more complex, longer tasks, per-task costs accelerate. This research represents one of the first rigorous, data-driven examinations of real-world AI agent economics, a topic that most coverage has addressed only in theoretical terms.
Each new turn in a conversation requires the LLM to re-read the entire conversation history. These cache reads accumulate in a triangle pattern, producing quadratic growth where doubling conversation length roughly quadruples the cost.
At approximately 27,500 tokens of conversation history, cache reads equal all other API costs combined. Beyond this point, cache reads dominate the bill and their share continues to grow.
According to exe.dev's analysis of 250 real conversations, a single feature-level conversation with a coding agent costs $12.93 on average, including all API charges.
By the end of a typical coding session, cache reads account for 87% of the total bill, meaning only 13% of the cost goes toward processing new inputs and generating new outputs.
Companies can reduce costs by structuring agent interactions as many short, independent conversations rather than long monolithic sessions. Task decomposition and conversation splitting significantly improve cost efficiency under quadratic scaling.
The pattern is robust. Exe.dev sampled 250 real production conversations across varying lengths, complexity levels, and task types, and the quadratic pattern held consistently because the underlying cause (re-reading history on every turn) is fundamental to how current LLM APIs work.
The fundamental insight from exe.dev's analysis is deceptively simple. When an LLM-powered agent conducts a multi-turn conversation, each new turn requires the model to process the entire conversation history. Modern LLM APIs offer "cache reads" at reduced rates compared to fresh input tokens, but the volume of these cache reads grows in a predictable and problematic pattern.
Consider a conversation with N messages. Generating the first response requires reading one message; generating the second requires re-reading two; generating the Nth requires re-reading all N. The total number of message-reads across the entire conversation is therefore 1 + 2 + ... + N, the formula for triangular numbers: N(N + 1)/2. This is quadratic growth.
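The triangular accumulation is easy to compute directly. A short Python sketch (the message counts here are illustrative, not drawn from exe.dev's data):

```python
def total_reads(n_messages: int) -> int:
    """Total message-reads over a conversation where each new
    response re-reads the full history: the nth triangular number."""
    return n_messages * (n_messages + 1) // 2

# Doubling the conversation roughly quadruples the read volume:
print(total_reads(50))   # 1275 reads for a 50-message conversation
print(total_reads(100))  # 5050 reads for 100 messages (~4x, not 2x)
```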
In practical terms, this means that doubling the length of a conversation does not double the cost. It roughly quadruples it. Tripling the conversation length increases costs by roughly nine times. For short interactions (a few messages), the difference between linear and quadratic scaling is negligible. For the extended, multi-turn conversations that characterize real coding agent sessions, the difference is enormous.
Exe.dev's analysis identified a critical threshold at approximately 27,500 tokens of conversation history. At this point, the cumulative cost of cache reads equals the cumulative cost of all other API charges (input tokens for new messages, output tokens for responses). Beyond this threshold, cache reads dominate the bill and their share continues to grow.
To put this in context, 27,500 tokens is roughly equivalent to 20,000 words, or about 40 pages of text. For a coding agent working through a complex feature, this threshold can be reached within 30 to 60 minutes of active development. Many coding sessions last significantly longer, meaning most substantive agent interactions operate in the quadratic-dominated cost regime.
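The crossover can be illustrated with a toy simulation. Every per-token price and turn size below is an assumption chosen for illustration (roughly in line with published rates that discount cache reads to about a tenth of fresh input), not a figure from exe.dev's data; with these assumptions the crossover lands around 33,000 tokens of history, the same order of magnitude as the measured 27,500-token threshold:

```python
# Toy crossover simulation. All prices and turn sizes are assumptions
# made for illustration; they are not exe.dev's measured figures.
PRICE_INPUT = 3.00 / 1_000_000    # $/token, fresh input (assumed)
PRICE_CACHED = 0.30 / 1_000_000   # $/token, cache read (assumed 10% rate)
PRICE_OUTPUT = 15.00 / 1_000_000  # $/token, output (assumed)
NEW_INPUT = 350                   # fresh input tokens per turn (assumed)
NEW_OUTPUT = 250                  # output tokens per turn (assumed)

history = 0       # tokens of conversation history accumulated so far
cache_cost = 0.0  # cumulative cost of re-reading that history
other_cost = 0.0  # cumulative cost of fresh input and output
turn = 0
while cache_cost <= other_cost:
    turn += 1
    cache_cost += history * PRICE_CACHED  # re-read everything so far
    other_cost += NEW_INPUT * PRICE_INPUT + NEW_OUTPUT * PRICE_OUTPUT
    history += NEW_INPUT + NEW_OUTPUT

print(f"Cache reads overtake all other costs at turn {turn}, "
      f"with ~{history:,} tokens of history")
```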
By the end of a typical coding session, exe.dev found that cache reads account for 87% of the total bill. This means that only 13% of the cost goes toward the "useful" work of processing new inputs and generating new outputs. The remaining 87% is the overhead of re-reading what the model has already seen.
This ratio has a direct analogy in traditional computing: it resembles a system where 87% of CPU time is spent on garbage collection or memory management rather than actual computation. In such systems, the standard engineering response is to redesign the architecture. The same pressure will apply to AI agent systems.
The analysis quantified a concrete, relatable metric: the average cost of a single feature-level conversation with a coding agent is $12.93. This figure accounts for all API costs including input tokens, output tokens, cache reads, and any tool-use overhead.
At first glance, $12.93 per feature might seem reasonable. A human developer earning $150,000 per year costs roughly $75 per hour fully loaded. If a feature takes a developer four hours, that is $300 in human cost versus $12.93 in agent cost, a 23x savings.
But the quadratic scaling changes the picture when you consider volume and complexity. If a team ships 20 features per week across multiple agents, the weekly cost is approximately $260. Reasonable. But if those features grow in complexity, requiring longer conversations, the cost per feature escalates rapidly. A feature requiring twice the conversation length costs roughly four times as much: $52 instead of $13. A highly complex feature requiring three times the conversation length costs roughly nine times as much: $117.
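The escalation follows directly from the quadratic model. A minimal sketch, assuming cost scales with the square of conversation length and taking the $12.93 average as the baseline:

```python
BASE_COST = 12.93  # average feature-level conversation cost from the analysis

def feature_cost(length_multiple: float, base: float = BASE_COST) -> float:
    """Approximate cost of a conversation that is length_multiple times
    longer than the baseline, assuming quadratic scaling in length."""
    return base * length_multiple ** 2

print(round(feature_cost(1), 2))  # 12.93 - the baseline feature
print(round(feature_cost(2), 2))  # 51.72 - twice the length, ~4x the cost
print(round(feature_cost(3), 2))  # 116.37 - three times the length, ~9x
```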
For companies planning to deploy hundreds or thousands of agents across their engineering organizations, these costs compound quickly and unpredictably.
The strength of exe.dev's analysis lies in its empirical foundation. Rather than modeling costs theoretically, the team sampled 250 real conversations from production coding agent sessions. The quadratic pattern held consistently across conversations of varying lengths, complexity levels, and task types.
This consistency matters because it rules out the possibility that the quadratic pattern is an artifact of specific use cases or edge cases. The underlying cause, re-reading conversation history on every turn, is fundamental to how current LLM APIs operate. Until the architecture changes, the quadratic cost curve will persist.
The quadratic cost problem creates several strategic challenges for companies investing in AI agents.
First, cost predictability becomes difficult. Linear cost models, which most financial planning assumes, will underestimate actual costs by margins that widen as usage grows. Finance teams accustomed to forecasting cloud compute costs on a per-unit basis will find that agent costs exhibit the opposite of economies of scale: each additional unit of work costs more than the last.
Second, the economics favor short conversations. Companies that can structure their agent interactions as many short, independent conversations rather than few long ones will achieve significantly better cost efficiency. This architectural insight may drive changes in how agent-based products are designed, favoring task decomposition and conversation splitting over monolithic, long-running sessions.
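Under the triangular-read model, the savings from splitting are straightforward to quantify. This sketch ignores the overhead of re-establishing context in each new conversation, which eats into the savings in practice:

```python
def total_reads(n: int) -> int:
    """Message-reads for one conversation of n turns (triangular number)."""
    return n * (n + 1) // 2

def split_reads(n: int, k: int) -> int:
    """Message-reads when the same n turns are split into k
    independent conversations of n // k turns each."""
    return k * total_reads(n // k)

n = 120
print(total_reads(n))     # 7260 reads as one monolithic session
print(split_reads(n, 4))  # 1860 reads as four 30-turn sessions
```

Splitting 120 turns into four independent 30-turn conversations cuts total reads by roughly 3.9x, and the gain grows with session length.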
Third, the competitive landscape among LLM providers may shift toward models and APIs that address the cache read problem. Providers that offer better caching strategies, sliding-window attention, or conversation summarization features will have a meaningful cost advantage for agent workloads.
The quadratic cost problem is one of the most important and underreported findings in the current AI deployment landscape. Most media coverage of AI agents focuses on capability (what agents can do) rather than economics (what agents cost at scale). Exe.dev's analysis provides one of the first rigorous, data-backed examinations of a cost structure that could limit the economic viability of AI agents for many use cases.
As companies race to deploy agents in response to competitive pressure from AI-first companies, understanding the true cost curve is essential. The quadratic scaling problem does not make agents unviable, but it does mean that naive deployment strategies, those that assume linear cost scaling, will produce budget surprises and may undermine the ROI case for agent adoption.