Guojin Securities: AI Agents Drive Non-Linear Growth in Computing Demand, Focus on Industrial Chain Investment Opportunities


March 23. 248,000 GitHub stars, a fourfold increase in Token consumption, and 1445% growth in enterprise inquiries: these data points outline a key shift underway in the AI industry. The paradigm leap from Prompt to long Agent has already begun. OpenRouter platform data show that multi-step reasoning is rapidly replacing single-turn interactions, and Anthropic's real-world tests indicate that a single Agent consumes about four times as many tokens as conversational mode, with multi-Agent systems reaching up to 15 times. As Agent runtimes continue to lengthen, demand for computing power is entering a new phase of nonlinear expansion.

Paradigm shift in computing demand: from Prompt to long Agent

1) The interaction paradigm of artificial intelligence is undergoing a fundamental transformation. AI systems have evolved from single-turn question-and-answer tools into autonomous Agents capable of reasoning, planning, and continuous operation. Data from the OpenRouter platform confirm the trend: multi-step reasoning and chained tool calls are rapidly replacing traditional single-turn interactions. The open-source Agent framework OpenClaw, released just over four months ago, has surpassed 248,000 GitHub stars to top the global open-source project charts, marking the move of long-running Agents from experiment to production deployment. 2) Token consumption for Agent tasks far exceeds that of traditional Q&A scenarios: Anthropic's real-world tests show that a single Agent consumes about four times as many tokens as conversational mode, and multi-Agent systems can reach 15 times. NVIDIA's January 2026 technical blog likewise states that next-generation AI factories must be capable of handling hundreds of thousands of input tokens to support the long contexts that Agentic reasoning requires. The paradigm shift has occurred, and a new logic for computing-demand growth is taking shape.
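The multipliers cited above translate into a simple back-of-the-envelope projection. The baseline of 2,000 tokens per conversational task below is a hypothetical assumption for illustration; only the 4x and 15x multipliers come from the figures quoted in the text.

```python
# Illustrative arithmetic on the token-consumption multipliers cited above.
BASELINE_TOKENS = 2_000      # assumed tokens per single-turn Q&A task (hypothetical)
SINGLE_AGENT_MULT = 4        # single Agent vs. conversation, per Anthropic's tests
MULTI_AGENT_MULT = 15        # multi-Agent system vs. conversation

single_agent_tokens = BASELINE_TOKENS * SINGLE_AGENT_MULT   # 8,000 tokens
multi_agent_tokens = BASELINE_TOKENS * MULTI_AGENT_MULT     # 30,000 tokens

# Moving from a single Agent to a multi-Agent system adds another ~3.75x on top.
assert multi_agent_tokens / single_agent_tokens == 3.75
```

Under these assumptions, shifting the same task from chat to a multi-Agent pipeline multiplies token demand fifteenfold before any increase in task volume, which is the crux of the nonlinear-growth argument.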

Nonlinear increase in computing power driven by long Agents

Several core factors drive long Agents' demand for computing power: 1) Technical mechanisms. First, the computational cost of the self-attention mechanism in large models grows quadratically with context length; second, the inference decode stage is inherently limited by memory bandwidth. As the KV Cache expands linearly with context, GPU utilization keeps falling and throughput bottlenecks sharpen; mainstream vendors' pricing already reflects these physical costs, with Google Gemini 3.1 Pro and Alibaba Cloud Qwen both adopting tiered pricing by context length. 2) The rise of multi-Agent collaboration architectures introduces additional communication overhead. Gartner data show that enterprise inquiries about multi-Agent systems surged 1445% from Q1 2024 to Q2 2025; meanwhile, Google DeepMind research notes that compressing and transferring global context among parallel Agents inevitably incurs a "coordination tax," with communication costs rising nonlinearly in the number of Agents. 3) Jevons' paradox further amplifies these effects: Microsoft CEO Satya Nadella predicts that gains in model inference efficiency, while lowering unit costs, will stimulate even faster growth in usage.

In summary, the increase in Agent runtime is an inevitable technological trend. In the foreseeable future, demands for memory bandwidth, interconnect throughput, and intelligent computing density will continue to expand at a nonlinear rate.
