DeepSeek V4-Pro Internal Review: Coding Pass Rate Approaches Opus 4.5, 52% of Testers Endorse as Default Model

According to monitoring by Dongcha Beating, DeepSeek has taken the unusual step of disclosing internal dogfooding data for V4. The team collected about 200 real R&D tasks from more than 50 engineers, covering feature development, bug fixes, refactoring, and diagnostics, across a tech stack that includes PyTorch, CUDA, Rust, and C++. After strict filtering, 30 tasks were retained as the evaluation set. On this set, V4-Pro-Max achieved a 67% pass rate, well above Sonnet 4.5's 47% and close to Opus 4.5's 70%, but below Opus 4.5 Thinking's 73% and Opus 4.6 Thinking's 80%. Haiku 4.5 passed only 13%. In an internal survey (N=85), all respondents reported using V4-Pro for agentic coding in their daily work: 52% said V4-Pro can serve as their default primary coding model, 39% leaned toward agreeing, and fewer than 9% disagreed. The main complaints were basic errors, misreading of vague prompts, and occasional overthinking.
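Because the evaluation set has only 30 tasks, each reported percentage maps onto a small whole number of tasks. The sketch below back-calculates those counts. It assumes each pass rate is simply the number of tasks passed divided by 30 and rounded to a whole percent; the report does not state this, and the function name `implied_passes` is purely illustrative.

```python
# Assumption (not stated in the report): pass rate = tasks passed / 30,
# rounded to the nearest whole percent.
TOTAL_TASKS = 30

# Pass rates as reported, in percent.
reported = {
    "Haiku 4.5": 13,
    "Sonnet 4.5": 47,
    "V4-Pro-Max": 67,
    "Opus 4.5": 70,
    "Opus 4.5 Thinking": 73,
    "Opus 4.6 Thinking": 80,
}


def implied_passes(percent, total=TOTAL_TASKS):
    """Return the task count whose rounded percentage equals `percent`, or None."""
    for passed in range(total + 1):
        if round(100 * passed / total) == percent:
            return passed
    return None


implied = {model: implied_passes(pct) for model, pct in reported.items()}

for model, passed in implied.items():
    print(f"{model}: {reported[model]}% ~ {passed}/{TOTAL_TASKS} tasks")
```

If that assumption holds, one task is worth roughly 3.3 percentage points, so the gap between V4-Pro-Max (67%) and Opus 4.5 (70%) would amount to a single task.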
