Cloudflare integrates Kimi K2.5, processing 7 billion tokens a day and cutting security-audit costs by 77%

BlockTempo

Cloudflare’s Workers AI platform has officially integrated Moonshot AI’s Kimi K2.5, with support for a 256K context window, multi-turn tool calling, and visual input. Cloudflare’s internal security-audit agent processes over 7 billion tokens daily, and switching to this model cut costs by 77% compared with a mid-tier commercial model.
(Background: Cursor reportedly used Kimi K2.5 without disclosing it; developers captured network traffic, prompts were deleted, and official statements shifted rapidly)
(Additional context: Cloudflare, which blocks web crawlers, launched a “one-click full-site crawler API” that supports RAG, incremental updates, and model training)

Table of Contents


  • A security agent processing 7 billion tokens a day
  • Cloudflare’s three major platform improvements
  • Underlying inference engine: custom Infire, not an off-the-shelf framework

Cloudflare’s Workers AI platform quietly made a major move. According to the official Cloudflare blog, Moonshot’s Kimi K2.5 is now the default model for the Agents SDK starter, and Cloudflare engineers are using it for real security audits at substantial cost savings.

Kimi K2.5 is one of the few open-source models to offer a full set of frontier capabilities: a 256K context window, multi-turn tool calling, visual input, and structured output. For long-context, tool-using agent tasks, these features are highly practical.
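As a rough sketch of how those capabilities map onto a chat-completion-style request to Workers AI — note that the model identifier, tool name, and field layout below are illustrative assumptions, not confirmed Cloudflare API values:

```python
import json

# Hypothetical request body for a chat-completion-style Workers AI call.
# The model name and the fetch_file tool are illustrative, not official.
payload = {
    "model": "@cf/moonshotai/kimi-k2.5",  # assumed identifier
    "messages": [
        {"role": "system", "content": "You are a security-audit assistant."},
        {"role": "user", "content": "Review this diff for injection risks."},
    ],
    # Multi-turn tool calling: declare tools the model may invoke.
    "tools": [{
        "type": "function",
        "function": {
            "name": "fetch_file",  # hypothetical helper
            "description": "Fetch a source file by path for review.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
    # Structured output: constrain the reply to JSON.
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)  # what would be POSTed to the inference endpoint
```

The long context window matters here because a single audit turn can carry whole files plus prior tool results without truncation.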

A security agent processing 7 billion tokens a day

Cloudflare engineers directly used Kimi K2.5 as the main programming agent in the OpenCode environment, deploying a public code review agent called “Bonk” integrated into automated pipelines.

Even more impressive is the internal security audit scenario. This agent handles over 7 billion tokens daily. Using a standard-tier commercial model for the same workload would cost about $2.4 million per year. With Kimi K2.5, costs are cut by 77%, saving nearly $1.85 million.

This figure isn’t advertising; it’s a transparent account from Cloudflare engineers shared on their official blog.

Cloudflare’s three major platform improvements

Simply changing the model isn’t enough. Cloudflare also introduced three platform-level improvements targeting the cost and efficiency of long conversation agent scenarios:

  • Prefix Caching: Tokens already processed in multi-turn conversations are not charged again; cache hits enjoy discounted prices. Over long tasks, this saves significant costs.
  • Session Affinity Header: An x-session-affinity request header routes all requests from the same session to the same model instance, raising cache hit rates. OpenCode and the Agents SDK starter support this natively.
  • Asynchronous batch inference API: Requests exceeding synchronous rate limits can be queued asynchronously, typically completing within 5 minutes during internal testing. Suitable for code scanning and research-type agents that don’t require immediate responses.
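The session-affinity mechanism can be sketched as follows; the `x-session-affinity` header name comes from the article, while the token placeholder and helper function are assumptions for illustration:

```python
import uuid

# One stable ID per agent conversation. Reusing it on every turn routes
# the whole session to the same model instance, so previously processed
# prefix tokens stay in cache and are billed at the discounted rate.
session_id = str(uuid.uuid4())

def build_headers(api_token: str) -> dict:
    """Headers for one turn of a multi-turn agent conversation."""
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        "x-session-affinity": session_id,  # identical across turns
    }

turn_1 = build_headers("YOUR_API_TOKEN")  # placeholder token
turn_2 = build_headers("YOUR_API_TOKEN")  # same affinity value as turn_1
```

Generating a fresh ID per conversation (rather than per request) is the point: affinity only helps if consecutive turns land on a warm cache.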

Underlying inference engine: custom Infire, not an off-the-shelf framework

Cloudflare didn’t adopt an off-the-shelf inference framework. Instead, it serves the model on its own Infire inference engine with a customized core, combining data parallelism, tensor parallelism, and expert parallelism with a disaggregated prefill architecture.

Kimi K2.5 is currently the first large-model inference deployment of this kind on Workers AI, signaling Cloudflare’s ambitions in AI infrastructure: tightly integrated with its web platform and cost-competitive.
