The strongest open-source model, DeepSeek V4, is finally here: 1.6 trillion parameters, MIT licensed, and long-context memory at just one-tenth of V3.2's.

According to Beating Monitoring, DeepSeek has released an open-source preview of the V4 series under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated per token. Both support a context window of roughly 1 million tokens.
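
As a rough sanity check on the sparsity these MoE figures imply, the per-token active fraction can be computed directly. A minimal sketch; the function name and print formatting are illustrative, the parameter counts are the ones reported above:

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of an MoE model's parameters activated for each token."""
    return active_params / total_params

# V4-Pro: 1.6T total parameters, 49B active per token
pro = active_fraction(1.6e12, 49e9)
# V4-Flash: 284B total parameters, 13B active per token
flash = active_fraction(284e9, 13e9)

print(f"V4-Pro activates {pro:.1%} of its parameters per token")    # 3.1%
print(f"V4-Flash activates {flash:.2%} of its parameters per token")  # 4.58%
```

Both models thus route each token through only a few percent of their weights, which is how total parameter count and inference cost are decoupled.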

Three architecture upgrades. First, a hybrid attention mechanism (compressed sparse attention, CSA, plus heavily compressed attention, HCA) sharply reduces long-context overhead: at a 1-million-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the memory that stores historical key/value states during inference) is only 10% of V3.2's. Second, manifold-constrained hyper-connections (mHC) replace traditional residual connections, improving the stability of cross-layer signal propagation. Third, training uses the Muon optimizer for faster convergence. The pretraining corpus exceeds 32 trillion tokens.
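
To see what a 10x cache reduction means in absolute terms, here is a back-of-the-envelope KV-cache estimator. The layer count, per-layer KV width, and 2-byte elements below are assumptions chosen for illustration, not published V4 dimensions; only the ~10% ratio comes from the article:

```python
def kv_cache_bytes(context_len: int, n_layers: int, kv_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Dense-attention KV cache: keys + values for every layer and position."""
    return 2 * context_len * n_layers * kv_dim * bytes_per_elem

# Assumed illustrative shape: 64 layers, 1024-wide KV, fp16 elements.
full = kv_cache_bytes(context_len=1_000_000, n_layers=64, kv_dim=1024)
compressed = full // 10  # article: V4's cache is ~10% of V3.2's

print(f"dense KV cache at 1M tokens: {full / 2**30:.1f} GiB")       # 244.1 GiB
print(f"~10%  KV cache at 1M tokens: {compressed / 2**30:.1f} GiB")  # 24.4 GiB
```

Under these assumed dimensions, the compression is the difference between a cache that spills across multiple GPUs and one that fits on a single device.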

Post-training has two phases: domain experts are first trained separately with SFT and GRPO reinforcement learning, then unified into a single model via online distillation. V4-Pro-Max (the highest reasoning mode) is claimed to be the strongest open-source model currently available, with top-tier coding benchmark scores and significantly narrowed gaps to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max, given a sufficient reasoning budget, approaches Pro on reasoning performance, but its smaller parameter count limits it on pure-knowledge and complex agent tasks. Weights are stored in mixed FP4 and FP8 precision.
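
The distillation step, in its generic form, trains the unified student to match each expert teacher's softened output distribution. A toy sketch: the article gives no details of V4's actual objective, and the temperature and KL(teacher || student) direction here are common-practice assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over a logit vector."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero divergence; mismatched logits give a positive loss.
print(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))      # 0.0
print(distill_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```

Minimizing this loss over the teachers' outputs is what lets separately trained experts be merged into one model without re-running their RL pipelines.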
