OpenAI debuts GPT‑5.1-Codex-Max coding model, which has already completed a 24-hour task internally

OpenAI has introduced GPT‑5.1-Codex-Max, its latest frontier agentic coding model, now available in the Codex development environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model on Codex-integrated surfaces.

The new model is designed to serve as a persistent, high-context development agent capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

It follows closely behind Google, which released its powerful new Gemini 3 Pro model yesterday, yet it still outperforms or matches Gemini 3 Pro on key coding benchmarks:


On SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at very high reasoning effort, outperforming Gemini 3 Pro’s 76.2%.

It also led on Terminal-Bench 2.0 with 58.1% accuracy versus Gemini’s 54.2%, and matched Gemini’s rating of 2439 on LiveCodeBench Pro, a competitive coding benchmark scored with Elo ratings.

Compared with the most advanced configuration of Gemini 3 Pro, its Deep Think model, Codex-Max also holds a slight edge on agentic coding benchmarks.

Performance testing: incremental gains on key tasks

GPT‑5.1-Codex-Max shows measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks.

On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase over GPT‑5.1-Codex’s 66.3%. On SWE-Bench Verified (n=500), it reached 77.9% accuracy at very high reasoning effort, outperforming GPT‑5.1-Codex’s 73.7%.

Performance on Terminal-Bench 2.0 (n=89) showed more modest improvements, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared with 52.8% for GPT‑5.1-Codex.

All evaluations were run with compaction enabled and very high reasoning effort.

These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under prolonged reasoning loads.
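The gaps reported above can be checked with a few lines of arithmetic. The short Python sketch below uses only the scores quoted in this article; the dictionary layout and the `gains` helper are illustrative choices, not anything published by OpenAI.

```python
# Scores (percent accuracy) as quoted in this article.
# Each entry: benchmark -> (GPT-5.1-Codex-Max, GPT-5.1-Codex)
SCORES = {
    "SWE-Lancer IC SWE": (79.9, 66.3),
    "SWE-Bench Verified": (77.9, 73.7),
    "Terminal-Bench 2.0": (58.1, 52.8),
}


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        points, percent = gains(new, old)
        print(f"{name}: +{points} points ({percent}% relative)")
```

Separating percentage points from relative change avoids a common confusion: the 13.6-point jump on SWE-Lancer IC SWE is a much larger relative improvement than the raw difference alone suggests.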

Technical architecture: long-horizon reasoning through compaction

The main architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over long input-output sessions using a mechanism called compaction.

This allows the model to retain key contextual information and discard irrelevant details as it approaches the limit of its context window, effectively allowing it to work continuously across millions of tokens without degrading performance.
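OpenAI has not published how compaction is implemented, so the following is only a toy Python sketch of the general idea: when a session's history nears a token budget, older turns are collapsed into a short summary and the most recent turns are kept verbatim. Everything here is an assumption made for illustration, including the whitespace-based token count, the `summarize` placeholder (a real agent would presumably ask the model itself to write the summary), and the 80% trigger threshold.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages: list[str]) -> str:
    """Hypothetical summarizer: keeps only the first five words of each message."""
    heads = [" ".join(m.split()[:5]) for m in messages]
    return "SUMMARY: " + " | ".join(heads)


@dataclass
class Session:
    max_tokens: int          # size of the (hypothetical) context window
    keep_recent: int = 2     # most recent messages preserved verbatim
    trigger: float = 0.8     # compact once usage exceeds this fraction
    history: list[str] = field(default_factory=list)

    def used(self) -> int:
        """Total tokens currently held in the history."""
        return sum(count_tokens(m) for m in self.history)

    def add(self, message: str) -> None:
        """Append a message, compacting if the budget threshold is crossed."""
        self.history.append(message)
        if self.used() > self.trigger * self.max_tokens:
            self.compact()

    def compact(self) -> None:
        """Replace everything except the most recent messages with one summary."""
        cut = len(self.history) - self.keep_recent
        if cut <= 0:
            return
        old, recent = self.history[:cut], self.history[cut:]
        self.history = [summarize(old)] + recent


if __name__ == "__main__":
    session = Session(max_tokens=60)
    for step in range(1, 9):
        session.add(
            f"step {step}: ran the test suite and fixed a failing assertion in module {step}"
        )
        print(step, session.used(), len(session.history))
```

This toy version is deliberately lossy: each round of summarization discards more of the early history. The point is only to show the shape of the loop, in which the token count stays under the window size no matter how many steps are added.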

Internally, the model was observed to perform tasks lasting longer than 24 hours, including multi-step refactorings, test-driven iteration, and autonomous debugging.

Compaction also improves token efficiency. At medium reasoning effort, GPT‑5.1-Codex-Max used roughly 30% fewer thinking tokens than GPT‑5.1-Codex while delivering comparable or higher accuracy, which has implications for both cost and latency.
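To see why a roughly 30% cut in tokens spent on reasoning matters for cost, consider a back-of-the-envelope calculation. The Python sketch below is purely illustrative: the baseline token count and the per-million-token price are made-up placeholders, not OpenAI pricing, and only the 30% figure comes from the article.

```python
def thinking_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a given number of reasoning tokens at a flat rate."""
    return tokens / 1_000_000 * usd_per_million


# Hypothetical inputs chosen only for illustration.
BASELINE_TOKENS = 2_000_000   # reasoning tokens the older model spends on some task
PRICE_PER_MILLION = 10.0      # placeholder price in USD, not a real rate
REDUCTION = 0.30              # "roughly 30% fewer" tokens, as quoted in the article

reduced_tokens = round(BASELINE_TOKENS * (1 - REDUCTION))
baseline_cost = thinking_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced_cost = thinking_cost(reduced_tokens, PRICE_PER_MILLION)

if __name__ == "__main__":
    print(f"baseline: {BASELINE_TOKENS:,} tokens, ${baseline_cost:.2f}")
    print(f"reduced:  {reduced_tokens:,} tokens, ${reduced_cost:.2f}")
    print(f"saved:    ${baseline_cost - reduced_cost:.2f}")
```

Because the rate is flat in this sketch, the cost saving tracks the token saving one for one; latency should fall too, since fewer tokens need to be generated, though by how much depends on factors the article does not describe.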

Platform integration and use cases

GPT‑5.1-Codex-Max is currently available in multiple Codex-based environments, meaning OpenAI’s own integrated tools and interfaces built specifically for code-centric AI agents. These include:

  • Codex CLI: the official OpenAI command-line tool (@openai/codex), which already runs GPT‑5.1-Codex-Max.

  • IDE extensions: likely developed or maintained by OpenAI, although no specific third-party IDE integrations are mentioned.

  • Interactive coding environments: such as those used to demonstrate front-end simulation apps like CartPole or the Snell’s Law Explorer.

  • Internal code review tools: used by OpenAI engineering teams.

For now, GPT‑5.1-Codex-Max is not available via a public API, although OpenAI says API access is coming soon. Users who want to work with the model in terminal environments today can do so by installing and using Codex CLI.

It is currently unconfirmed whether or how the model will be integrated with third-party IDEs, unless they are built on top of the CLI or a future API.

The model can interact with live tools and simulations. Examples shown in the release include:

  • An interactive CartPole policy gradient simulator that visualizes reinforcement learning training and activations.

  • A Snell’s Law optics explorer that supports dynamic ray tracing across refractive indices.

These interfaces illustrate the model’s ability to reason in real time while maintaining an interactive programming session – effectively combining computation, visualization, and implementation in one loop.
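The physics behind the Snell's Law demo is compact enough to sketch. Snell's law states that n1 · sin(θ1) = n2 · sin(θ2), where n1 and n2 are the refractive indices of the two media and θ1 and θ2 are the angles of incidence and refraction, both measured from the surface normal. The Python function below is a minimal illustration of that formula written for this article, not code from OpenAI's demo; the function name and the example indices (1.0 for air, 1.33 for water) are assumptions chosen for illustration.

```python
import math
from typing import Optional


def refraction_angle(theta1_deg: float, n1: float, n2: float) -> Optional[float]:
    """Angle of refraction in degrees from Snell's law, n1*sin(t1) = n2*sin(t2).

    Returns None when total internal reflection occurs (no refracted ray).
    """
    sin_theta2 = n1 / n2 * math.sin(math.radians(theta1_deg))
    if abs(sin_theta2) > 1.0:
        return None  # total internal reflection
    return math.degrees(math.asin(sin_theta2))


if __name__ == "__main__":
    # Light passing from air (n ~ 1.0) into water (n ~ 1.33).
    for angle in (0, 15, 30, 45, 60):
        print(angle, refraction_angle(angle, 1.0, 1.33))
    # Water to air at a steep angle: no refracted ray exists.
    print(refraction_angle(60, 1.33, 1.0))
```

A ray tracer like the one described would apply this calculation at every interface a ray crosses, which is why changing a refractive index in the explorer bends the whole path at once.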

Cybersecurity and safety constraints

While GPT‑5.1-Codex-Max does not meet OpenAI’s “High” capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and network access disabled by default.

OpenAI reports no increase in malicious use, but has introduced enhanced monitoring systems, including activity routing and mechanisms to disrupt suspicious behavior. Codex remains confined to a local workspace unless developers opt in to broader access, mitigating risks such as prompt injection from untrusted content.

Deployment context and developer usage

GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT‑5.1-Codex, which was a more general-purpose model.

OpenAI says that 95% of its internal engineers use Codex weekly, and that since adopting it these engineers have shipped about 70% more pull requests on average, highlighting the tool’s impact on internal development velocity.

Despite its autonomy and persistence, OpenAI emphasizes that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

Outlook

GPT‑5.1-Codex-Max represents a significant evolution in OpenAI’s strategy for agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities for software engineering tasks. By extending its context management and compaction strategies, the model can handle tasks at the scale of full repositories rather than individual files or snippets.

With a continued emphasis on agent-based workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-powered development environments – while emphasizing the importance of governance in increasingly autonomous systems.
