New research reveals LLM agent costs scale quadratically. Cache reads hit 87% of the bill by session end. A single feature costs $12.93.
A detailed analysis from exe.dev has exposed a fundamental economic problem with LLM-powered coding agents: their costs scale quadratically, not linearly, as conversations grow longer. This finding has significant implications for every company deploying AI agents for software development, customer support, or any task requiring extended multi-turn interactions.
The core issue is architectural. Each time an AI agent processes a new message in a conversation, it must re-read the entire conversation history. The cost of these cache reads accumulates in a triangle pattern: generating the first response requires reading one message, the second response two, the third three, and so on. This produces a quadratic cost curve where total expenditure grows in proportion to the square of the conversation length, not its linear size.
The numbers from exe.dev's analysis are striking. By the time a conversation reaches 27,500 tokens, cache reads account for half the total cost. By the end of a typical coding session, cache reads consume 87% of the bill. A single feature-level conversation, the kind a developer might have while building one component of an application, costs $12.93 on average. The analysis sampled 250 real conversations and confirmed the quadratic pattern held consistently across different use cases and conversation lengths.
For companies scaling AI agent deployments, this finding challenges the assumption that AI coding assistants become cheaper with scale. Instead, the opposite may be true: as agents handle more complex, longer tasks, per-task costs accelerate. This research represents one of the first rigorous, data-driven examinations of real-world AI agent economics, a topic that most coverage has addressed only in theoretical terms.
Each new turn in a conversation requires the LLM to re-read the entire conversation history. These cache reads accumulate in a triangle pattern, producing quadratic growth where doubling conversation length roughly quadruples the cost.
At approximately 27,500 tokens of conversation history, cache reads equal all other API costs combined. Beyond this point, cache reads dominate the bill and their share continues to grow.
According to exe.dev's analysis of 250 real conversations, a single feature-level conversation with a coding agent costs $12.93 on average, including all API charges.
By the end of a typical coding session, cache reads account for 87% of the total bill, meaning only 13% of the cost goes toward processing new inputs and generating new outputs.
Companies can reduce costs by structuring agent interactions as many short, independent conversations rather than long monolithic sessions. Task decomposition and conversation splitting significantly improve cost efficiency under quadratic scaling.
The pattern is robust. Exe.dev sampled 250 real production conversations across varying lengths, complexity levels, and task types, and the quadratic pattern held consistently because the underlying cause (re-reading history on every turn) is fundamental to how current LLM APIs work.
The fundamental insight from exe.dev's analysis is deceptively simple. When an LLM-powered agent conducts a multi-turn conversation, each new turn requires the model to process the entire conversation history. Modern LLM APIs offer "cache reads" at reduced rates compared to fresh input tokens, but the volume of these cache reads grows in a predictable and problematic pattern.
Consider a conversation with N messages. Generating the first response requires reading one message; generating the second requires re-reading two; generating the Nth requires re-reading all N. The total number of message-reads across the entire conversation is therefore 1 + 2 + ... + N, the formula for triangular numbers: N(N + 1)/2. This is quadratic growth.
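The triangular accumulation is easy to compute directly. A short Python sketch (the message counts here are illustrative, not drawn from exe.dev's data):

```python
def total_reads(n_messages: int) -> int:
    """Total message-reads over a conversation where each new
    response re-reads the full history: the nth triangular number."""
    return n_messages * (n_messages + 1) // 2

# Doubling the conversation roughly quadruples the read volume:
print(total_reads(50))   # 1275 reads for a 50-message conversation
print(total_reads(100))  # 5050 reads for 100 messages (~4x, not 2x)
```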
In practical terms, this means that doubling the length of a conversation does not double the cost. It roughly quadruples it. Tripling the conversation length increases costs by roughly nine times. For short interactions (a few messages), the difference between linear and quadratic scaling is negligible. For the extended, multi-turn conversations that characterize real coding agent sessions, the difference is enormous.
Exe.dev's analysis identified a critical threshold at approximately 27,500 tokens of conversation history. At this point, the cumulative cost of cache reads equals the cumulative cost of all other API charges (input tokens for new messages, output tokens for responses). Beyond this threshold, cache reads dominate the bill and their share continues to grow.
To put this in context, 27,500 tokens is roughly equivalent to 20,000 words, or about 40 pages of text. For a coding agent working through a complex feature, this threshold can be reached within 30 to 60 minutes of active development. Many coding sessions last significantly longer, meaning most substantive agent interactions operate in the quadratic-dominated cost regime.
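The crossover can be illustrated with a toy simulation. Every per-token price and turn size below is an assumption chosen for illustration (roughly in line with published rates that discount cache reads to about a tenth of fresh input), not a figure from exe.dev's data; with these assumptions the crossover lands around 33,000 tokens of history, the same order of magnitude as the measured 27,500-token threshold:

```python
# Toy crossover simulation. All prices and turn sizes are assumptions
# made for illustration; they are not exe.dev's measured figures.
PRICE_INPUT = 3.00 / 1_000_000    # $/token, fresh input (assumed)
PRICE_CACHED = 0.30 / 1_000_000   # $/token, cache read (assumed 10% rate)
PRICE_OUTPUT = 15.00 / 1_000_000  # $/token, output (assumed)
NEW_INPUT = 350                   # fresh input tokens per turn (assumed)
NEW_OUTPUT = 250                  # output tokens per turn (assumed)

history = 0       # tokens of conversation history accumulated so far
cache_cost = 0.0  # cumulative cost of re-reading that history
other_cost = 0.0  # cumulative cost of fresh input and output
turn = 0
while cache_cost <= other_cost:
    turn += 1
    cache_cost += history * PRICE_CACHED  # re-read everything so far
    other_cost += NEW_INPUT * PRICE_INPUT + NEW_OUTPUT * PRICE_OUTPUT
    history += NEW_INPUT + NEW_OUTPUT

print(f"Cache reads overtake all other costs at turn {turn}, "
      f"with ~{history:,} tokens of history")
```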
By the end of a typical coding session, exe.dev found that cache reads account for 87% of the total bill. This means that only 13% of the cost goes toward the "useful" work of processing new inputs and generating new outputs. The remaining 87% is the overhead of re-reading what the model has already seen.
This ratio has a direct analogy in traditional computing: it resembles a system where 87% of CPU time is spent on garbage collection or memory management rather than actual computation. In such systems, the standard engineering response is to redesign the architecture. The same pressure will apply to AI agent systems.
The analysis quantified a concrete, relatable metric: the average cost of a single feature-level conversation with a coding agent is $12.93. This figure accounts for all API costs including input tokens, output tokens, cache reads, and any tool-use overhead.
At first glance, $12.93 per feature might seem reasonable. A human developer earning $150,000 per year costs roughly $75 per hour fully loaded. If a feature takes a developer four hours, that is $300 in human cost versus $12.93 in agent cost, a 23x savings.
But the quadratic scaling changes the picture when you consider volume and complexity. If a team ships 20 features per week across multiple agents, the weekly cost is approximately $260. Reasonable. But if those features grow in complexity, requiring longer conversations, the cost per feature escalates rapidly. A feature requiring twice the conversation length costs roughly four times as much: $52 instead of $13. A highly complex feature requiring three times the conversation length costs roughly nine times as much: $117.
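The escalation follows directly from the quadratic model. A minimal sketch, assuming cost scales with the square of conversation length and taking the $12.93 average as the baseline:

```python
BASE_COST = 12.93  # average feature-level conversation cost from the analysis

def feature_cost(length_multiple: float, base: float = BASE_COST) -> float:
    """Approximate cost of a conversation that is length_multiple times
    longer than the baseline, assuming quadratic scaling in length."""
    return base * length_multiple ** 2

print(round(feature_cost(1), 2))  # 12.93 - the baseline feature
print(round(feature_cost(2), 2))  # 51.72 - twice the length, ~4x the cost
print(round(feature_cost(3), 2))  # 116.37 - three times the length, ~9x
```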
For companies planning to deploy hundreds or thousands of agents across their engineering organizations, these costs compound quickly and unpredictably.
The strength of exe.dev's analysis lies in its empirical foundation. Rather than modeling costs theoretically, the team sampled 250 real conversations from production coding agent sessions. The quadratic pattern held consistently across conversations of varying lengths, complexity levels, and task types.
This consistency matters because it rules out the possibility that the quadratic pattern is an artifact of specific use cases or edge cases. The underlying cause, re-reading conversation history on every turn, is fundamental to how current LLM APIs operate. Until the architecture changes, the quadratic cost curve will persist.
The quadratic cost problem creates several strategic challenges for companies investing in AI agents.
First, cost predictability becomes difficult. Linear cost models, which most financial planning assumes, will underestimate actual costs by margins that widen as usage grows. Finance teams accustomed to forecasting cloud compute costs on a per-unit basis will find that agent costs exhibit the opposite of economies of scale: each additional unit of work costs more than the last.
Second, the economics favor short conversations. Companies that can structure their agent interactions as many short, independent conversations rather than few long ones will achieve significantly better cost efficiency. This architectural insight may drive changes in how agent-based products are designed, favoring task decomposition and conversation splitting over monolithic, long-running sessions.
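Under the triangular-read model, the savings from splitting are straightforward to quantify. This sketch ignores the overhead of re-establishing context in each new conversation, which eats into the savings in practice:

```python
def total_reads(n: int) -> int:
    """Message-reads for one conversation of n turns (triangular number)."""
    return n * (n + 1) // 2

def split_reads(n: int, k: int) -> int:
    """Message-reads when the same n turns are split into k
    independent conversations of n // k turns each."""
    return k * total_reads(n // k)

n = 120
print(total_reads(n))     # 7260 reads as one monolithic session
print(split_reads(n, 4))  # 1860 reads as four 30-turn sessions
```

Splitting 120 turns into four independent 30-turn conversations cuts total reads by roughly 3.9x, and the gain grows with session length.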
Third, the competitive landscape among LLM providers may shift toward models and APIs that address the cache read problem. Providers that offer better caching strategies, sliding-window attention, or conversation summarization features will have a meaningful cost advantage for agent workloads.
The quadratic cost problem is one of the most important and underreported findings in the current AI deployment landscape. Most media coverage of AI agents focuses on capability (what agents can do) rather than economics (what agents cost at scale). Exe.dev's analysis provides one of the first rigorous, data-backed examinations of a cost structure that could limit the economic viability of AI agents for many use cases.
As companies race to deploy agents in response to competitive pressure from AI-first companies, understanding the true cost curve is essential. The quadratic scaling problem does not make agents unviable, but it does mean that naive deployment strategies, those that assume linear cost scaling, will produce budget surprises and may undermine the ROI case for agent adoption.