Codex Security: now in research preview

AI Summary7 min read

TL;DR

Codex Security, an AI-powered application security agent, is now in research preview. It uses deep project context and agentic reasoning to identify high-confidence vulnerabilities with actionable fixes, reducing noise and false positives for security teams.

Key Takeaways

•Codex Security leverages OpenAI's frontier models to build deep project context, identify complex vulnerabilities, and provide actionable fixes, reducing false positives and noise.
•It creates editable threat models, validates findings in sandboxed environments, and learns from user feedback to improve precision and align with real-world risk.
•The tool has demonstrated significant improvements during beta, cutting noise by up to 84% and reducing false positive rates by over 50% across repositories.
•Codex Security is available in research preview for ChatGPT Enterprise, Business, and Edu customers, with free usage for the first month.
•It supports the open-source community by prioritizing high-confidence issues and has already helped discover critical vulnerabilities in projects like OpenSSH, GnuTLS, and Chromium.

Tags

ProductSecurity

Today we’re introducing Codex Security, our application security agent. It builds deep context about your project to identify complex vulnerabilities that other agentic tools miss, surfacing higher-confidence findings with fixes that meaningfully improve the security of your system while sparing you from the noise of insignificant bugs.

Context is essential when evaluating real security risks, but most AI security tools simply flag low-impact findings and false positives, forcing security teams to spend significant time on triage. At the same time, agents are accelerating software development, making security review an increasingly critical bottleneck. Codex Security addresses both challenges. By combining agentic reasoning from our frontier models with automated validation, it delivers high-confidence findings and actionable fixes so teams can focus on the vulnerabilities that matter and ship secure code faster.

Formerly known as Aardvark⁠, Codex Security began last year as a private beta with a small group of customers. In early internal deployments, it surfaced a real SSRF, a critical cross-tenant authentication vulnerability, and many other issues which our security team patched within hours. Early deployments with external testers helped us improve how users provide relevant product context and move from onboarding to securing their code. We also significantly improved the quality of our findings over the course of the beta: scans on the same repositories over time show increasing precision, in one case cutting noise by 84% since initial rollout. We’ve reduced the rate of findings with over-reported severity by more than 90%, and false positive rates on detections have fallen by more than 50% across all repositories. These improvements help Codex Security better align reported severity with real-world risk and reduce unnecessary triage burden for security teams, and we expect the signal-to-noise ratio to continue to improve.

Starting today, Codex Security is rolling out in research preview to ChatGPT Pro, Enterprise, Business, and Edu customers via Codex web with free usage for the next month.

How Codex Security works

Codex Security leverages OpenAI’s frontier models and the Codex agent. It can reduce noise and accelerate remediation by grounding vulnerability discovery, validation, and patching in system-specific context.

Build system context and create an editable threat model: After configuring a scan, it analyzes your repository to understand the security-relevant structure of the system and generates a project-specific threat model that can capture what the system does, what it trusts, and where it is most exposed. Threat models can be edited to keep the agent aligned with your team.
Prioritize and validate issues: Using the threat model as context, it searches for vulnerabilities and categorizes findings based on expected real-world impact in your system. Where possible, it pressure-tests findings in sandboxed validation environments to distinguish signal from noise. Users can see this analysis in the validated findings. When Codex Security is configured with an environment tailored to your project, it can validate potential issues directly in the context of the running system. That deeper validation can reduce false positives even further and enable the creation of working proof-of-concepts, giving security teams stronger evidence and a clearer path to remediation.
Patch issues with full system context: Finally, Codex Security proposes fixes to the discovered issues that align with system intent and surrounding behavior. This enables patches that can improve security while minimizing regressions, making them safer to review and land. Users can filter the findings so they stay focused on what matters most to their team and has the highest security impact.

Codex Security can also learn from your feedback over time to improve the quality of its findings. When you adjust the criticality of a finding, it can use that feedback to refine the threat model and improve precision on subsequent runs as it learns what matters in your architecture and risk posture.

It’s designed to operate at scale and surface the highest-confidence findings with easy-to-accept patches. Over the last 30 days, Codex Security scanned more than 1.2 million commits across external repositories in our beta cohort, identifying 792 critical findings and 10,561 high-severity findings. Critical issues appeared in under 0.1% of scanned commits, showing that the system can identify security impacting issues in large volumes of code while minimizing noise to reviewers.

"As a company laser-focused on product security, NETGEAR was pleased to join the early access program, and the results exceeded expectations. Codex Security integrated effortlessly into our robust security development environment, strengthening the pace and depth of our review processes. Its findings were impressively clear and comprehensive, often giving the sense that an experienced product security researcher was working alongside us."

— Chandan Nandakumaraiah, Head of Product Security at NETGEAR and Member of CVE Board

Supporting the open source community

Open source software forms the foundation of modern systems, including our own. We've been using Codex Security to scan the open-source repositories we rely on most, sharing high impact security findings we identify with maintainers to help strengthen that foundation.

In our conversations with maintainers, a consistent theme emerged: the challenge isn’t a lack of vulnerability reports, but too many low-quality ones. Maintainers told us they need fewer false positives and a more sustainable way to surface real security issues without creating additional triage burden. These conversations helped shape how we’re supporting the open source community with Codex Security. Rather than generating large volumes of speculative findings, we are building a system that prioritizes high-confidence issues that maintainers can act on quickly.

As part of this work, we reported critical vulnerabilities to a number of widely used open-source projects including OpenSSH⁠(opens in a new window), GnuTLS⁠(opens in a new window), GOGS⁠(opens in a new window), Thorium⁠(opens in a new window) libssh, PHP, and Chromium, and more. Fourteen CVEs have been assigned with dual reporting on two — we've shared some examples in the Appendix.

We recently started onboarding an initial cohort of open-source maintainers into Codex for OSS, our program to support the ecosystem with free ChatGPT Pro and Plus accounts, code review, and Codex Security. Projects like vLLM have already used Codex Security to find and patch issues as part of their normal workflow.

We plan to expand the program in the coming weeks so more maintainers have a direct path to better security, stronger review workflows, and support for the open-source work the ecosystem depends on. If you’re an open-source maintainer and interested, please get in touch⁠.

Get started

We’ll be rolling out Codex Security access to ChatGPT Enterprise, Business, and Edu customers over the coming days. Check out our docs⁠(opens in a new window) to learn more about setting up Codex Security for your team.

Appendix

Examples of high impact OSS vulnerabilities discovered by Codex Security:

GnuTLS certtool Heap-Buffer Overflow (Off-by-One) — CVE-2025-32990⁠(opens in a new window)
GnuTLS Heap Buffer Overread in SCT Extension Parsing — CVE-2025-32989⁠(opens in a new window)
GnuTLS Double-Free in otherName SAN Export — CVE-2025-32988⁠(opens in a new window)
2FA Bypass GOGS — CVE-2025-64175⁠(opens in a new window)
Unauth bypass GOGS — CVE-2026-25242⁠(opens in a new window)
Path traversal (arbitrary write) — download_ephemeral, download_children (agent) — CVE-2025-35430⁠(opens in a new window)
LDAP injection (filters & DN) — LdapUserMap::new / get_unix_info / basic_auth_ldap — CVE-2025-35431⁠(opens in a new window)
Unauthenticated DoS & mail abuse — resend_email_verification — CVE-2025-35432⁠(opens in a new window) , CVE-2025-35436⁠(opens in a new window)
Session not rotated on password change — User::update_user — CVE-2025-35433⁠(opens in a new window)
Disabled TLS verification — Elasticsearch client — CVE-2025-35434⁠(opens in a new window)
DoS: division by zero — /api/streams/depth/.../{split} — CVE-2025-35435⁠(opens in a new window)
gpg-agent stack buffer overflow via PKDECRYPT --kem=CMS (ECC KEM) — CVE-2026-24881⁠(opens in a new window)
Stack-based buffer overflow in TPM2 PKDECRYPT for RSA and ECC due to missing ciphertext length validation — CVE-2026-24882⁠(opens in a new window)
CMS/PKCS7 AES-GCM ASN.1 params stack buffer overflow — CVE-2025-15467⁠(opens in a new window)
PKCS#12 PBMAC1 PBKDF2 keyLength overflow + MAC bypass — CVE-2025-11187⁠(opens in a new window)