Analysis: DeepSeek's open-source TileKernels matches the V4 architecture specifications previously disclosed by Yifan Zhang in multiple places

According to monitoring by Beating, DeepSeek's open-source TileKernels kernel library shows multiple correspondences with the V4 architecture specifications previously disclosed by Yifan Zhang.

According to Zhang, V4 uses Hyper-Connections (HC) in place of standard residual connections. What TileKernels open-sources is an mHC (Manifold-Constrained Hyper-Connections) kernel: DeepSeek's improved variant of the HC scheme that ByteDance's Seed team proposed in 2024, which constrains the mixing matrices to be doubly stochastic and thereby resolves the signal-divergence problem the original HC exhibits in large-scale training. Since mHC is itself a form of Hyper-Connections, and the original HC cannot support stable large-scale training, V4 most likely uses mHC.

Zhang also says that V4 uses a fused MoE mega-kernel to run MoE layers with 384 experts, 6 of which are activated per token. TileKernels' MoE module covers top-k expert selection, token-to-expert mapping, and fused expert dispatch and combine.
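The MoE routing steps named above (top-k selection, token-to-expert mapping, dispatch and combine) can be sketched in plain NumPy. This is an illustrative reference implementation under assumed shapes and function names, not TileKernels' actual code; the library's whole point is to fuse these stages into a single GPU kernel rather than run them as separate passes.

```python
import numpy as np

NUM_EXPERTS = 384  # per the article's V4 spec
TOP_K = 6          # experts activated per token

def topk_route(router_logits: np.ndarray, k: int = TOP_K):
    """Pick the k highest-scoring experts per token; softmax their weights."""
    expert_ids = np.argsort(-router_logits, axis=-1)[:, :k]              # (T, k)
    topk_logits = np.take_along_axis(router_logits, expert_ids, axis=-1)  # (T, k)
    exp = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)  # each row sums to 1
    return expert_ids, weights

def dispatch_combine(x: np.ndarray, expert_ids, weights, experts):
    """Route each token to its selected experts, then combine weighted outputs."""
    out = np.zeros_like(x)
    for e, expert_fn in enumerate(experts):
        # Token-to-expert mapping: (token, slot) pairs that selected expert e.
        tok, slot = np.nonzero(expert_ids == e)
        if tok.size:
            out[tok] += weights[tok, slot, None] * expert_fn(x[tok])
    return out
```

Because the per-token weights are normalized over the selected experts, passing identity functions as all 384 experts reproduces the input exactly, which is a convenient sanity check for the dispatch/combine path.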

TileKernels also includes an Engram kernel, a conditional memory module introduced in a DeepSeek paper this January, though Zhang's V4 specifications do not mention Engram. The library supports SM90 (Hopper) and SM100 (Blackwell) GPUs but not Huawei Ascend. The Information previously reported that V4 was trained on Blackwell, and that DeepSeek also spent several months adapting the model to Huawei and Cambricon chips.
