Kimi K2 Thinking sets new records on reasoning, coding, and agent benchmarks

Jin10 Data, November 8: According to the official website of Moonshot AI, Kimi K2 Thinking has set new records on benchmarks for reasoning, coding, and agent capabilities. K2 Thinking achieved a state-of-the-art 44.9% on Humanity's Last Exam (HLE), 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified, results the company presents as evidence of strong generalization in its most advanced thinking agent model.
