DeepSeek V4 Released: 1.6T-Parameter Flagship Supports Roughly 1 Million Tokens of Context, Inference Compute Only 27% of V3.2

According to Beating Monitoring, DeepSeek has released an open-source preview of the V4 series under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6 trillion total parameters and 49 billion parameters active per token, and V4-Flash, with 284 billion total parameters and 13 billion parameters active per token. Both support a context window of roughly 1 million tokens.
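For readers unfamiliar with the total-versus-active parameter distinction in MoE models, the toy sketch below illustrates top-k expert routing: every token passes through only a few experts, so the per-token "active" parameters are a small slice of the total. The expert count, dimensions, and top_k here are illustrative assumptions, not DeepSeek V4's actual configuration.

```python
# Minimal MoE sketch: per-token top-k routing over a pool of experts.
# Sizes and top_k are illustrative assumptions only.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed by only top_k experts, so the parameters
        # actually used per token are far fewer than the model's total.
        for t in range(x.shape[0]):
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[e](x[t])
        return out

x = torch.randn(4, 64)   # 4 tokens
y = TinyMoE()(x)         # only 2 of the 8 experts run for each token
```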

The release lists three architectural upgrades. A hybrid attention mechanism (compressed sparse attention, CSA, plus heavily compressed attention, HCA) sharply reduces long-context overhead: at a 1-million-token context, V4-Pro needs only 27% of V3.2's FLOPs per token, and its KV cache (the memory that stores historical keys and values during inference) is only 10% the size of V3.2's. A manifold-constrained hyper-connection scheme (mHC) replaces traditional residual connections, stabilizing cross-layer signal propagation. Training is accelerated with the Muon optimizer, and the pretraining corpus exceeds 32 trillion tokens.
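To make the KV-cache claim concrete, here is back-of-the-envelope arithmetic for cache size at a 1-million-token context. The layer count, KV-head count, head dimension, and byte width are assumptions chosen only for illustration; the ~10% figure comes from the article, where it is stated relative to V3.2 rather than to this generic dense baseline.

```python
# Back-of-the-envelope KV-cache arithmetic at a 1M-token context.
# All architectural numbers below are illustrative assumptions.
ctx_len      = 1_000_000   # tokens of context
n_layers     = 60          # assumed
n_kv_heads   = 8           # assumed
head_dim     = 128         # assumed
bytes_per_el = 1           # assumed (e.g. FP8 storage)

# Dense-style baseline: K and V cached for every token, layer, and KV head.
dense_cache = 2 * ctx_len * n_layers * n_kv_heads * head_dim * bytes_per_el
compressed  = 0.10 * dense_cache   # article: V4-Pro's cache ~10% of V3.2's

print(f"dense-style cache : {dense_cache / 2**30:.1f} GiB")   # ~114 GiB
print(f"compressed cache  : {compressed  / 2**30:.1f} GiB")   # ~11 GiB
```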

Post-training has two phases: domain-expert models are first trained separately with SFT and GRPO reinforcement learning, then unified into a single model via online distillation. V4-Pro-Max (the highest reasoning mode) is claimed to be the strongest open-source model currently available, with top-tier coding benchmark results and a significantly narrowed gap to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max, given a sufficient reasoning budget, approaches Pro-level reasoning performance but is limited on pure knowledge and complex agent tasks by its parameter scale. Weights are stored in mixed FP4+FP8 precision.
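As a rough illustration of the second phase, the sketch below shows a generic online-distillation step, in which a student model is trained to match a teacher's token distribution on sequences the student itself generates. The generate() call, the KL objective, and the temperature are assumptions for the sketch; the article does not describe DeepSeek's actual distillation recipe for merging the domain experts.

```python
# Generic online-distillation step (illustrative only, not DeepSeek's recipe):
# the student generates on-policy sequences, the teacher scores them, and the
# student is updated to match the teacher's output distribution.
import torch
import torch.nn.functional as F

def online_distill_step(student, teacher, prompts, optimizer, temperature=1.0):
    with torch.no_grad():
        sequences = student.generate(prompts)      # assumed generation API
        teacher_logits = teacher(sequences)        # (batch, seq, vocab)
    student_logits = student(sequences)

    # KL(teacher || student) over the vocabulary at every position.
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    loss = F.kl_div(s_logprobs, t_logprobs, log_target=True, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```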
