Moonshot AI open-sources FlashKDA, speeding up Kimi Linear inference by 1.7 to 2.2 times


ME News, April 22 (UTC+8). According to monitoring by Dongcha Beating, Moonshot AI (known in Chinese as "the Dark Side of the Moon") has open-sourced FlashKDA on GitHub under the MIT license: a set of kernels designed to accelerate model inference on NVIDIA Hopper-series GPUs (H100, H20, etc.). It targets KDA, the attention mechanism Moonshot AI proposed last year in the Kimi Linear paper. When a large model reads long text, the compute cost of traditional attention grows quadratically with sequence length; linear attention reduces this to linear growth, and KDA is an improved variant along this route. The Kimi Linear architecture alternates three KDA layers with one traditional-attention layer.
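The quadratic-versus-linear trade-off and the 3:1 layer interleaving can be sketched roughly as follows. This is an illustrative back-of-the-envelope model, not Moonshot's code: full softmax attention costs on the order of n² · d operations for sequence length n and head dimension d, while linear attention maintains a d × d state and costs on the order of n · d².

```python
# Illustrative cost model (an assumption for exposition, not Moonshot's code).
def softmax_attention_flops(n: int, d: int) -> int:
    # QK^T score matrix (n*n*d) plus the weighted sum over values (n*n*d)
    return 2 * n * n * d

def linear_attention_flops(n: int, d: int) -> int:
    # a d x d recurrent state, updated and read once per token
    return 2 * n * d * d

def kimi_linear_layer_types(num_layers: int) -> list[str]:
    # Kimi Linear interleaves 3 KDA layers with 1 traditional-attention layer
    return ["full" if (i + 1) % 4 == 0 else "KDA" for i in range(num_layers)]

d = 128
for n in (4_096, 65_536):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n}: softmax/linear cost ratio = {ratio:g}")

print(kimi_linear_layer_types(8))
```

Under this model the advantage of linear attention grows with n/d, which is why the gap matters most for very long inputs.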

Previously, a Triton-language implementation of KDA already existed in the open-source library flash-linear-attention (abbreviated fla). FlashKDA rewrites it with NVIDIA's low-level GPU library CUTLASS, specifically to extract maximum performance from Hopper GPUs. In official tests on the H20, FlashKDA runs the same forward computation 1.7 to 2.2 times faster than the Triton version, with the largest gains in batched inference over variable-length inputs. Note, however, that the official comparison only benchmarks against Moonshot's own Triton version and does not include other linear-attention implementations.
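One plausible reason variable-length batching is a favorable scenario: a kernel that handles ragged batches natively avoids the wasted work of padding every sequence up to the longest one in the batch. A rough sketch of that padding overhead (an illustration, not the official benchmark methodology):

```python
# Illustrative sketch of padding waste in batched linear attention,
# where per-sequence work is roughly proportional to sequence length.
def padded_work(lengths: list[int]) -> int:
    # naive batching: every sequence is padded to the batch maximum
    return len(lengths) * max(lengths)

def varlen_work(lengths: list[int]) -> int:
    # variable-length-aware kernel: each sequence costs its true length
    return sum(lengths)

batch = [512, 1024, 3072, 8192]  # hypothetical mixed-length batch
ratio = padded_work(batch) / varlen_work(batch)
print(f"padded/varlen work ratio: {ratio:.2f}")
```

The more uneven the lengths in a batch, the larger this ratio, so a CUTLASS kernel with native variable-length support has more headroom there than on uniform batches.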

For now, only the forward computation has been open-sourced, meaning you can run the model (inference) but not train it; training still requires the original Triton version. Requirements: Hopper or later GPUs (SM90 architecture and up), CUDA 12.9 or above, and PyTorch 2.4 or above. FlashKDA has also been merged upstream into fla as a new backend (PR #852); existing users can switch over by changing a single line of configuration.
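The "one line of configuration" presumably selects the kernel backend in an fla-based model config. The key and value names below are hypothetical placeholders for illustration only; the real option name is defined in fla PR #852, which should be consulted directly.

```python
# Hypothetical sketch of the single-line backend switch.
# "backend", "triton", and "cutlass" are illustrative names, NOT the
# actual fla configuration API; see fla PR #852 for the real key.
kda_config = {
    "num_heads": 16,        # hypothetical model settings
    "head_dim": 128,
    "backend": "triton",    # before: the default Triton KDA kernel
}

kda_config["backend"] = "cutlass"  # after: the one-line switch to FlashKDA
```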

(Source: BlockBeats)
