Kimi K2 Thinking set new records in benchmark assessments evaluating reasoning, coding, and agent capabilities.
Jin10 Data, November 8: According to the official website of Moonshot AI, Kimi K2 Thinking has set new records on benchmarks for reasoning, coding, and agent capabilities. K2 Thinking posted a state-of-the-art score of 44.9% on Humanity's Last Exam (HLE), along with 60.2% on BrowseComp and 71.3% on SWE-Bench Verified. The company presents these results as evidence of strong generalization in what it calls its most advanced thinking agent model.