OpenAI releases GPT-5.4, its most powerful professional model: it operates computers autonomously and adds plugins that let AI handle Excel and financial analysis

2026-03-08 11:45:11

Just one day after releasing GPT-5.3 Instant, a faster and more discerning GPT-5 series model, OpenAI announced on Thursday, March 5, Eastern Time, the launch of its new flagship base model, GPT-5.4, which is now available across ChatGPT, the API, and development tools such as Codex.

OpenAI describes GPT-5.4 as “the most capable and efficient professional frontier model to date,” aimed at enterprise office work and complex knowledge work. Compared with previous versions, the biggest change is stronger AI agent capabilities: for the first time in the API and Codex, GPT-5.4 offers native “computer operation,” which lets agents execute complex workflows across software applications.

Beyond generating text and code, GPT-5.4 is the first general model from OpenAI with native computer control: it can operate software directly, browse the web, and drive the mouse and keyboard to complete tasks. It also integrates deeply with enterprise applications such as spreadsheets and financial analysis tools, embedding directly into Microsoft Excel and Google Sheets.

In ChatGPT, GPT-5.4 can show its thought process up front, so users can redirect a task while the model is still responding. It also improves deep web search and context retention in long, reasoning-heavy conversations.

Industry experts believe the upgrades in GPT-5.4 mark the transition of AI models from “dialogue tools” to digital agents that execute tasks automatically, pushing further into enterprise productivity software and professional knowledge work.

OpenAI also launched two versions this Thursday: GPT-5.4 Thinking, which is better at complex reasoning, and GPT-5.4 Pro, a high-performance version, aimed at paid users and high-end enterprise clients.

In the OSWorld-Verified computer control benchmark, GPT-5.4 achieved a success rate of 75.0%, surpassing the human average of 72.4% and jumping sharply from GPT-5.2’s 47.3%. According to figures released with the accompanying financial services suite, GPT-5.4’s score on OpenAI’s internal investment banking benchmark rose to 88.0%, from 43.7% with GPT-5.

Early testing organizations have given positive feedback. Daniel Swiecki, head of AI solutions at investment firm Walleye Capital, said GPT-5.4 improved accuracy by 30 percentage points in internal finance and Excel assessments. Brendan Foody, CEO of AI talent platform Mercor, called it “the best model we’ve tried so far” and noted GPT-5.4 ranked first in Mercor’s APEX-Agents benchmark for professional services.

First Native Computer Control in General Models: Breaking Single-Round Q&A Limits

The most groundbreaking feature of GPT-5.4 is its native computer control capability, a first for OpenAI in a general model. Through the API and Codex, the model can operate a computer much as a person would, completing multi-step workflows across applications.

Specifically, GPT-5.4 can control a computer either by writing code with libraries such as Playwright or by reading screenshots and issuing mouse and keyboard commands in response. Developers can also configure custom confirmation policies to match different levels of risk tolerance.
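To make the idea of a custom confirmation policy concrete, here is a minimal Python sketch. It does not use the OpenAI API: the `Action` type, the risk table, and the function names are all hypothetical, and a real deployment would define its own action categories and thresholds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Action:
    """A hypothetical UI action proposed by the model (not an OpenAI type)."""
    kind: str    # e.g. "click", "type", "submit_payment"
    target: str  # human-readable description of the UI element


# Hypothetical risk table, invented for illustration.
RISK_BY_KIND = {
    "scroll": Risk.LOW,
    "click": Risk.LOW,
    "type": Risk.MEDIUM,
    "submit_payment": Risk.HIGH,
}


def needs_confirmation(action: Action, tolerance: Risk) -> bool:
    """Require human sign-off when the action is riskier than the tolerance."""
    # Unknown action kinds are treated as high risk.
    risk = RISK_BY_KIND.get(action.kind, Risk.HIGH)
    return risk.value > tolerance.value


def run_actions(
    actions: Iterable[Action],
    tolerance: Risk,
    confirm: Callable[[Action], bool],
) -> List[Action]:
    """Return the actions that would be executed under the given policy."""
    executed = []
    for action in actions:
        if needs_confirmation(action, tolerance) and not confirm(action):
            continue  # the human declined, so this step is skipped
        executed.append(action)
    return executed
```

With a tolerance of `Risk.LOW`, only scrolls and clicks run unattended, while a tolerance of `Risk.HIGH` lets every known action through without a prompt. The `confirm` callback is where a human approval step would plug in.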

Benchmark data show substantial progress: on OSWorld-Verified, which tests desktop navigation, GPT-5.4 achieved a success rate of 75.0%, well above GPT-5.2’s 47.3% and ahead of the human benchmark of 72.4%. On the WebArena-Verified browser control tests, its success rate was 67.3%, higher than GPT-5.2’s 65.4%. On Online-Mind2Web, it achieved a 92.8% success rate using only screenshots.

On web search, BrowseComp results show GPT-5.4 improving by 17 percentage points over GPT-5.2, with GPT-5.4 Pro reaching a record score of 89.3%.

Mainstay, a real estate tech company, reports that in testing across about 30,000 property tax portals, GPT-5.4 achieved a first-attempt success rate of 95% and 100% within three attempts, a significant improvement over previous computer control models (success rates around 73-79%), with speeds about three times faster and token consumption reduced by approximately 70%.

Tool Search Mechanism Rebuilt to Significantly Reduce Token Consumption

As the tool ecosystem expands, managing tool calls efficiently has become a bottleneck for deploying agent systems. GPT-5.4 introduces a “Tool Search” mechanism in the API, fundamentally changing how tools are defined and transmitted.

Previously, every request had to preload all tool definitions in the prompt. In large systems this could consume thousands or even tens of thousands of tokens per request, raising cost and latency and diluting the context. Under the new mechanism, the model receives only a lightweight list of tools and retrieves a full definition only when it needs one.
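The difference between preloading every definition and looking them up on demand can be sketched in a few lines of Python. This is an illustration of the general idea only, not OpenAI's Tool Search implementation: the registry, tool names, schemas, and the keyword matching are all invented, and serialized character count stands in crudely for token count.

```python
import json

# Hypothetical tool registry; names and schemas are invented for illustration.
FULL_DEFINITIONS = {
    "get_invoice": {
        "summary": "Fetch an invoice",
        "description": "Fetch an invoice by its ID from the billing system.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
    "send_email": {
        "summary": "Send an email",
        "description": "Send an email to one recipient.",
        "parameters": {
            "type": "object",
            "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
            "required": ["to", "body"],
        },
    },
    "create_ticket": {
        "summary": "Open a support ticket",
        "description": "Open a support ticket in the help desk.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
    },
}


def lightweight_index(defs):
    """Return only names and one-line summaries: what is sent up front."""
    return [{"name": name, "summary": d["summary"]} for name, d in defs.items()]


def search_tools(defs, query, limit=2):
    """Naive keyword lookup that returns full definitions for matching tools."""
    words = query.lower().split()
    hits = [
        name
        for name, d in defs.items()
        if any(w in (name + " " + d["description"]).lower() for w in words)
    ]
    return {name: defs[name] for name in hits[:limit]}


def rough_size(obj):
    """Characters of serialized JSON, a crude stand-in for token count."""
    return len(json.dumps(obj))
```

Comparing `rough_size(lightweight_index(FULL_DEFINITIONS))` with `rough_size(FULL_DEFINITIONS)` shows the up-front saving. The gap widens as parameter schemas grow, which is the situation the article describes for large tool ecosystems.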

OpenAI provides concrete data: in 250 tasks of the Scale MCP Atlas benchmark, with all 36 MCP servers enabled, the tool search mode reduced total token usage by 47% compared to exposing all MCP functions directly in context, while maintaining the same accuracy.

Wade Foster, CEO of Zapier, says GPT-5.4 performed excellently in tool-use benchmarks spanning hundreds of real workflows, calling it “the most sustainable model to date.”

Financial and Enterprise Applications: Deep Excel Integration and Investment Banking Performance Doubled

Alongside GPT-5.4, OpenAI released the “OpenAI Financial Services” suite for enterprises and financial institutions, featuring core products like ChatGPT for Excel and Google Sheets (beta)—embedding ChatGPT directly into spreadsheet cells to build, analyze, and update complex financial models.

The suite integrates data partners like FactSet, MSCI, Third Bridge, and Moody’s, and introduces reusable Skills functions covering high-frequency financial tasks such as earnings previews, comparable company analysis, DCF valuation, and investment memos.

In internal investment bank benchmarks, GPT-5.4 Thinking scored 88.0%, up from 43.7% with GPT-5. In simulated junior investment banker spreadsheet modeling tasks, GPT-5.4 averaged 87.3%, far above GPT-5.2’s 68.4%.

Niko Grupen, head of legal AI applications at Harvey, reports GPT-5.4 scored 91% in the BigLaw Bench assessment, stating it “outperforms other models in structured complex transaction analysis, maintaining accuracy across lengthy contracts, and providing the detailed insights legal practitioners need.”

Knowledge Work and Hallucination Suppression: Fully Benchmarking Against Professionals

OpenAI demonstrates GPT-5.4’s capabilities across multiple real-world professional benchmarks. In GDPval, which covers 44 knowledge work tasks across professions—including sales demos, accounting spreadsheets, manufacturing charts—GPT-5.4 matched or exceeded industry professionals in 83.0% of cases, up from 71.0% with GPT-5.2.

In presentation quality assessments, human reviewers preferred GPT-5.4 outputs 68.0% of the time, citing better visual aesthetics, richer visual diversity, and more effective image generation.

On hallucinations and factual errors, OpenAI states that GPT-5.4 is its “most factually accurate model to date”: on a set of de-identified prompts that had been flagged for factual errors, the rate of errors in individual statements fell by 33% compared with GPT-5.2, and the probability that a full response contained any error fell by 18%.
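These percentages read most naturally as relative reductions rather than percentage-point drops. Assuming that reading, the short Python sketch below shows what they would mean for a given starting point. The baseline rates are hypothetical placeholders, since the article does not report GPT-5.2's absolute error rates.

```python
def apply_relative_reduction(baseline_rate: float, reduction: float) -> float:
    """Scale a baseline rate down by a relative reduction (0.33 means 33% lower)."""
    return baseline_rate * (1.0 - reduction)


# Hypothetical baselines, chosen only to illustrate the arithmetic.
claim_error_before = 0.06     # share of individual statements with an error
response_error_before = 0.30  # share of responses containing any error

claim_error_after = apply_relative_reduction(claim_error_before, 0.33)
response_error_after = apply_relative_reduction(response_error_before, 0.18)

print(f"statement-level error rate: {claim_error_before:.3f} -> {claim_error_after:.3f}")
print(f"response-level error rate:  {response_error_before:.3f} -> {response_error_after:.3f}")
```

Under these made-up baselines, a 33% relative reduction moves the statement-level rate by about two percentage points, not 33, which is why the distinction matters when comparing the two figures.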

In programming, GPT-5.4 performs on par with or better than GPT-5.3-Codex on SWE-Bench Pro, with lower latency across reasoning settings. The Codex /fast mode can boost token generation speed by up to 1.5 times, using the same model and intelligence but optimized for speed. GitHub’s Mario Rodriguez notes that GPT-5.4 excels at logical reasoning and at executing complex, tool-dependent multi-step workflows, calling it “the model enterprise should adopt from day one.”

Two Versions Cover Different User Needs: Context Window Up to 1 Million Tokens

GPT-5.4 Thinking targets general professional scenarios requiring deep reasoning, while GPT-5.4 Pro is designed for the most complex tasks, pushing performance limits.

On ChatGPT, GPT-5.4 Thinking is available from this Thursday to Plus ($20/month), Team, and Pro users, replacing GPT-5.2 Thinking, which will be retired on June 5, 2026. GPT-5.4 Pro is limited to Pro ($200/month) and Enterprise plans. Free users can access GPT-5.4 in limited capacity via system routing. Enterprise and education users can enable early access through admin settings.

On the API side, GPT-5.4 is available under the gpt-5.4 identifier, and GPT-5.4 Pro as gpt-5.4-pro, both usable on the Codex platform. The maximum output is 128,000 tokens, consistent with previous models. Both API and Codex support a maximum context window of 1 million tokens—the largest OpenAI has offered—suitable for planning, executing, and verifying long, multi-step tasks.

Pricing Higher Than Previous Generation, but Efficiency Gains Offset Cost Increases

GPT-5.4’s API pricing is higher than GPT-5.2’s, as follows:

GPT-5.4: $2.50 per million input tokens, $15 per million output tokens (GPT-5.2 was $1.75 input / $14 output)
GPT-5.4 Pro: $30 per million input tokens, $180 per million output tokens (GPT-5.2 Pro was $21 input / $168 output)
Batch and Flex pricing enjoy 50% discounts; Priority processing is charged at double the standard rate

Note that when a single input exceeds 272,000 tokens, the excess is billed at double the standard rate. In Codex, the default compression limit is 272,000 tokens, but developers can manually increase this to handle larger prompts, with excess tokens incurring higher charges.
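The pricing rules above can be turned into a rough cost estimator. The Python sketch below follows the figures as stated in this article. Two points are assumptions rather than confirmed billing behavior: that only the input tokens beyond 272,000 are doubled, and that the Batch, Flex, and Priority multipliers apply to the whole total, including the long-input surcharge.

```python
from decimal import Decimal

# USD per million tokens (input, output), as listed in the article.
PRICES = {
    "gpt-5.4": (Decimal("2.50"), Decimal("15")),
    "gpt-5.4-pro": (Decimal("30"), Decimal("180")),
}

LONG_INPUT_THRESHOLD = 272_000  # input tokens beyond this are billed at double

# Batch and Flex are half price; Priority is double the standard rate.
TIER_MULTIPLIER = {
    "standard": Decimal("1"),
    "batch": Decimal("0.5"),
    "flex": Decimal("0.5"),
    "priority": Decimal("2"),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> Decimal:
    """Estimate the USD cost of one request under the rules described above."""
    input_rate, output_rate = PRICES[model]
    base_input = min(input_tokens, LONG_INPUT_THRESHOLD)
    excess_input = max(input_tokens - LONG_INPUT_THRESHOLD, 0)
    # Assumption: only the excess input tokens are charged at the doubled rate.
    billable_input = base_input + 2 * excess_input
    usd = (billable_input * input_rate + output_tokens * output_rate) / Decimal(1_000_000)
    # Assumption: the tier multiplier applies to the full total.
    return usd * TIER_MULTIPLIER[tier]
```

For example, `estimate_cost("gpt-5.4", 100_000, 10_000)` gives $0.40 at the standard tier. A 372,000-token input with no output gives $1.18, because the 100,000 tokens past the threshold count twice.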

OpenAI explains the higher prices with three points: first, GPT-5.4 offers stronger capabilities in programming, computer control, deep research, high-level document generation, and tool invocation; second, it reflects significant technological advances from their research roadmap; third, the more efficient reasoning mechanism consumes fewer reasoning tokens for the same tasks, somewhat offsetting the price increase. OpenAI also states that even with the price hike, GPT-5.4 remains cheaper than comparable leading models from competitors.

Disclaimer: The content and data in this article are for reference only and do not constitute investment advice. Verify before use. Operate at your own risk.
