The strongest open-source model, DeepSeek V4, is finally here: 1.6 trillion parameters, MIT licensed, and long-context memory at just one-tenth of V3.2's.

According to Beating Monitoring, DeepSeek has released an open-source preview of the V4 series under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated per token. Both support a context window of roughly 1 million tokens.
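
As a rough sanity check on the sparsity these MoE figures imply, the per-token active fraction can be computed directly. A minimal sketch; the function name and print formatting are illustrative, the parameter counts are the ones reported above:

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of an MoE model's parameters activated for each token."""
    return active_params / total_params

# V4-Pro: 1.6T total parameters, 49B active per token
pro = active_fraction(1.6e12, 49e9)
# V4-Flash: 284B total parameters, 13B active per token
flash = active_fraction(284e9, 13e9)

print(f"V4-Pro activates {pro:.1%} of its parameters per token")    # 3.1%
print(f"V4-Flash activates {flash:.2%} of its parameters per token")  # 4.58%
```

Both models thus route each token through only a few percent of their weights, which is how total parameter count and inference cost are decoupled.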

Three architecture upgrades. First, a hybrid attention mechanism (compressed sparse attention, CSA, plus heavily compressed attention, HCA) sharply reduces long-context overhead: at a 1-million-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the memory that stores historical key/value states during inference) is only 10% of V3.2's. Second, manifold-constrained hyper-connections (mHC) replace traditional residual connections, improving the stability of cross-layer signal propagation. Third, training uses the Muon optimizer for faster convergence. The pretraining corpus exceeds 32 trillion tokens.
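
To see what a 10x cache reduction means in absolute terms, here is a back-of-the-envelope KV-cache estimator. The layer count, per-layer KV width, and 2-byte elements below are assumptions chosen for illustration, not published V4 dimensions; only the ~10% ratio comes from the article:

```python
def kv_cache_bytes(context_len: int, n_layers: int, kv_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Dense-attention KV cache: keys + values for every layer and position."""
    return 2 * context_len * n_layers * kv_dim * bytes_per_elem

# Assumed illustrative shape: 64 layers, 1024-wide KV, fp16 elements.
full = kv_cache_bytes(context_len=1_000_000, n_layers=64, kv_dim=1024)
compressed = full // 10  # article: V4's cache is ~10% of V3.2's

print(f"dense KV cache at 1M tokens: {full / 2**30:.1f} GiB")       # 244.1 GiB
print(f"~10%  KV cache at 1M tokens: {compressed / 2**30:.1f} GiB")  # 24.4 GiB
```

Under these assumed dimensions, the compression is the difference between a cache that spills across multiple GPUs and one that fits on a single device.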

Post-training has two phases: domain experts are first trained separately with SFT and GRPO reinforcement learning, then unified into a single model via online distillation. V4-Pro-Max (the highest reasoning mode) is claimed to be the strongest open-source model currently available, with top-tier coding benchmark scores and significantly narrowed gaps to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max, given a sufficient reasoning budget, approaches Pro on reasoning performance, but its smaller parameter count limits it on pure-knowledge and complex agent tasks. Weights are stored in mixed FP4 and FP8 precision.
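
The distillation step, in its generic form, trains the unified student to match each expert teacher's softened output distribution. A toy sketch: the article gives no details of V4's actual objective, and the temperature and KL(teacher || student) direction here are common-practice assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over a logit vector."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero divergence; mismatched logits give a positive loss.
print(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))      # 0.0
print(distill_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```

Minimizing this loss over the teachers' outputs is what lets separately trained experts be merged into one model without re-running their RL pipelines.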
