Lobster Large Model Evaluation Rankings Are Here! MiniMax and Kimi Make the Top Three
In the past week, “Lobster Farming” has become a huge craze.
Long lines of people have been waiting outside Tencent’s headquarters to get free “lobsters” installed, second-hand platforms like Xianyu offer “lobster” installation services for anywhere from tens to hundreds of yuan, and major cloud providers have launched one-click deployment tutorials and services. But here, “lobster” doesn’t refer to the crayfish we eat; it refers to “OpenClaw.” “Claw” evokes both a lobster’s pincer and a tool that grabs things, fitting the product’s role as a tool, and OpenClaw’s mascot is a cute lobster.
The OpenClaw website describes it as “The AI that actually does things.” It can help you clean your inbox, send emails, manage schedules, check in for flights, and more, all by sending it commands through connected chat apps such as WhatsApp, Telegram, Feishu, and DingTalk.
OpenClaw cannot be used out of the box: it has to be deployed and configured, and skills are added to it over time, which is why the process is called “Lobster Farming.” The first challenge when deploying OpenClaw is choosing which large model to serve as its “brain.” The PinchBench website was created to answer exactly that question.
PinchBench benchmarks large models specifically for OpenClaw, evaluating how well they perform on OpenClaw tasks. So far, the site has tested 33 of the world’s leading large models.
The data shows that, by success rate, Google’s Gemini-3-Flash-Preview ranks first at 95.1%. The domestic models MiniMax-m2.1 and Kimi-k2.5 also made the top three, with success rates of 93.6% and 93.4%, outperforming many Claude models.
On testing cost, MiniMax-m2.1 and Kimi-k2.5 also fare well, balancing high success rates against much lower costs than Gemini-3-Flash-Preview: $0.14 and $0.20 respectively, versus $0.72 for Gemini.
In task-completion speed, MiniMax-m2.1 and Kimi-k2.5 land around the average of the seven models whose success rates exceed 90%.
No wonder OpenClaw’s founder, Peter Steinberger, said in a podcast interview that he considers MiniMax 2.1 the best open-source model currently available (at the time, he had not yet tested the latest MiniMax and Kimi models).