Cloudflare’s Workers AI platform officially integrates Moonshot AI’s Kimi K2.5, supporting 256K context, multi-turn tool calls, and visual input. Cloudflare’s internal security audit agent processes over 7 billion tokens daily, and switching to this model reduces costs by 77% compared to mid-tier commercial models.
(Background: Cursor used Kimi K2.5 for training but did not disclose it; developers uncovered this through packet captures, prompts were quietly deleted, and the official statements kept shifting)
(Additional context: Cloudflare, best known for blocking web crawlers, launched a “one-click full-site crawling API” that supports RAG, incremental updates, and model training)
Cloudflare’s Workers AI platform quietly made a major move: according to the official Cloudflare blog, Moonshot AI’s Kimi K2.5 is now the default model in the Agents SDK starter. Cloudflare engineers are also using it for real security audits, at a substantial cost saving.
Kimi K2.5 is one of the few open-source models that ships with cutting-edge specifications: a 256K context window, multi-turn tool calling, visual input, and structured output. For long-context, reasoning-heavy agent tasks, these features matter in practice.
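To make "multi-turn tool calling" concrete, the flow can be sketched as a plain dispatch loop: the model either returns a final answer or requests a tool, and tool results are fed back as new turns. This is an illustrative sketch only; `runModel` is a stand-in stub, and none of these names are the Workers AI or Moonshot API.

```typescript
// Minimal sketch of a multi-turn tool-call loop (all names are illustrative
// assumptions, not Cloudflare's API; runModel stands in for a real model call).

type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, string> } | null;

// Local tools the agent may invoke between model turns.
const tools: Record<string, (args: Record<string, string>) => string> = {
  read_file: (args) => `contents of ${args.path}`,
};

// Stubbed model: first turn requests a tool, second turn answers.
function runModel(history: Message[]): { text: string; toolCall: ToolCall } {
  const sawToolResult = history.some((m) => m.role === "tool");
  return sawToolResult
    ? { text: "Audit complete: no issues found.", toolCall: null }
    : { text: "", toolCall: { name: "read_file", args: { path: "src/auth.ts" } } };
}

function agentLoop(prompt: string): string {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < 8; turn++) {             // cap turns defensively
    const { text, toolCall } = runModel(history);
    if (!toolCall) return text;                      // final answer, stop looping
    const result = tools[toolCall.name](toolCall.args);
    history.push({ role: "tool", content: result }); // feed tool result back in
  }
  throw new Error("turn limit exceeded");
}
```

For example, `agentLoop("Audit src/auth.ts")` completes one tool round-trip before the model returns its final answer; a large context window matters precisely because every round-trip appends more history.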
Cloudflare engineers directly used Kimi K2.5 as the main programming agent in the OpenCode environment, deploying a public code review agent called “Bonk” integrated into automated pipelines.
Even more striking is the internal security-audit scenario: this agent processes over 7 billion tokens a day. Running the same workload on a mid-tier commercial model would cost roughly $2.4 million a year; with Kimi K2.5, costs drop by 77%, saving nearly $1.85 million.
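The savings figure is easy to check against the stated numbers (taking the blog's $2.4M baseline and 77% reduction at face value):

```typescript
// Reproducing the cost arithmetic from the reported figures.
const commercialAnnualCost = 2_400_000; // ~$2.4M/yr on a mid-tier commercial model
const savingsRate = 0.77;               // reported 77% cost reduction

const annualSavings = commercialAnnualCost * savingsRate;     // 1,848,000
const kimiAnnualCost = commercialAnnualCost - annualSavings;  // 552,000

console.log(annualSavings, kimiAnnualCost);
```

That works out to about $1.85M saved per year, matching the article's figure, with the remaining bill around $550K.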
This figure isn’t advertising; it’s a transparent account from Cloudflare engineers shared on their official blog.
Swapping in a new model wasn’t enough on its own. Cloudflare also introduced three platform-level improvements aimed at the cost and efficiency of long-conversation agent workloads:
Rather than adopting an off-the-shelf inference framework, Cloudflare built a custom serving core on its in-house Infire inference engine, combining data parallelism, tensor parallelism, and expert parallelism with a disaggregated prefill architecture that separates prompt processing from token generation.
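As rough intuition for the expert-parallelism piece: in a mixture-of-experts model, each token's router activates only a small top-k subset of experts, and expert parallelism shards those experts across devices so a token only touches the devices hosting its chosen experts. A toy routing sketch (illustrative only; Infire's internals are not public in this detail, and all constants here are made up):

```typescript
// Toy top-k expert routing, the core idea behind expert parallelism:
// each token activates only k experts, and experts are sharded across devices.

const NUM_EXPERTS = 8;
const TOP_K = 2;
const NUM_DEVICES = 4; // experts 0-1 on device 0, 2-3 on device 1, ...

// Pick the indices of the k highest router scores for one token.
function topKExperts(routerScores: number[], k: number): number[] {
  return routerScores
    .map((score, expert) => ({ score, expert }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.expert);
}

// With experts sharded evenly, the routing decision determines
// which devices this token's computation is dispatched to.
function devicesForToken(routerScores: number[]): number[] {
  const perDevice = NUM_EXPERTS / NUM_DEVICES;
  const experts = topKExperts(routerScores, TOP_K);
  return [...new Set(experts.map((e) => Math.floor(e / perDevice)))];
}
```

A token whose router favors experts 3 and 6 is dispatched only to devices 1 and 3; every other expert stays idle for that token, which is what keeps per-token compute low despite a large total parameter count.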
Kimi K2.5 is currently the first large-model inference case of its kind on Workers AI, and it signals Cloudflare’s ambitions in AI infrastructure: tightly integrated with its web platform, and aggressively cost-optimized.