Strengthening cyber resilience as AI capabilities advance
TL;DR
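As AI's cybersecurity capabilities advance rapidly, OpenAI is building layered safeguards to support defensive use and limit misuse. Through a trusted access program, the Aardvark security tool, the Frontier Risk Council, and related efforts, it focuses on giving defenders an advantage and strengthening the resilience of the ecosystem.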
Key Takeaways
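- AI models' cybersecurity capabilities are improving rapidly, increasing the dual-use risk that they can be used for both defense and attack.
- OpenAI aims to give defenders a significant advantage by strengthening its models for defensive cybersecurity work and building tools that support workflows such as code auditing and vulnerability patching.
- A layered safety approach combines access controls, infrastructure hardening, detection systems, and expert red teaming to mitigate the risk of misuse.
- Initiatives such as the trusted access program, the Aardvark security researcher, and the Frontier Risk Council expand defensive capacity and strengthen security posture across the ecosystem.
- Through collaboration with industry, OpenAI is developing a shared understanding of threat models and building an ecosystem of independent evaluations to promote a consistent understanding of model capabilities.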
Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyberdefense as well as new dual-use risks that must be managed carefully. For example, capabilities assessed through capture-the-flag (CTF) challenges have improved from 27% on GPT‑5 in August 2025 to 76% on GPT‑5.1-Codex-Max in November 2025.
We expect that upcoming AI models will continue on this trajectory; in preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework. By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects. This post explains how we think about safeguards for models that reach these levels of capability, and how we ensure they meaningfully help defenders while limiting misuse.
As these capabilities advance, OpenAI is investing in strengthening our models for defensive cybersecurity tasks and creating tools that enable defenders to more easily perform workflows such as auditing code and patching vulnerabilities. Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced.
As in other dual-use domains, defensive and offensive cyber workflows often rely on the same underlying knowledge and techniques. We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes. Cybersecurity touches almost every field, which means we cannot rely on any single category of safeguards (such as restricting knowledge or using vetted access alone) but instead need a defense-in-depth approach that balances risk and empowers users. In practice, this means shaping how capabilities are accessed, guided, and applied so that advanced models strengthen security rather than lower barriers to misuse.
We see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.
Mitigating malicious uses
Our models are designed and trained to operate safely, supported by proactive systems that detect and respond to cyber abuse. We continuously refine these protections as our capabilities and the threat landscape change. While no system can guarantee complete prevention of misuse in cybersecurity without severely impacting defensive uses, our strategy is to mitigate risk through a layered safety stack.
At the foundation, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring. We complement these measures with detection and response systems, and with dedicated threat intelligence and insider-risk programs, so that emerging threats are identified and blocked quickly. These safeguards are designed to evolve with the threat landscape. We assume change, and we build so we can adjust quickly and appropriately.
Building on this foundation:
- Model training: We are training our frontier models to refuse or safely respond to requests that would enable clear cyber abuse, while remaining maximally helpful for legitimate defensive and educational use cases.
- Detection systems: We refine and maintain system-wide monitoring across products that use frontier models to detect potentially malicious cyber activity. When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement. Our enforcement combines automated and human review, informed by factors like legal requirements, severity, and repeat behavior. We also work closely with developers and enterprise customers to align on safety standards and enable responsible use with clear escalation paths.
- End-to-end red teaming: We are working with expert red teaming organizations to evaluate and improve our safety mitigations. Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might. This helps us identify gaps early and strengthen the full system.
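The detection bullet above describes three possible responses to unsafe-looking activity (blocking output, routing to a safer or less capable model, escalating for enforcement) and names legal requirements, severity, and repeat behavior as inputs. As a rough illustration only, that decision could be sketched as below. Everything here is an assumption made for the example: the `Signal` fields, the threshold values, and the ordering of the checks are hypothetical and do not describe OpenAI's actual systems.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ROUTE_TO_SAFER_MODEL = "route_to_safer_model"
    BLOCK_OUTPUT = "block_output"
    ESCALATE_FOR_REVIEW = "escalate_for_review"


@dataclass(frozen=True)
class Signal:
    # All fields are hypothetical inputs invented for this sketch.
    risk_score: float      # 0.0 (benign) to 1.0 (clearly abusive), from an assumed classifier
    prior_violations: int  # earlier confirmed violations on the same account
    legal_hold: bool       # True if a legal requirement mandates review


def choose_action(signal: Signal,
                  route_threshold: float = 0.4,
                  block_threshold: float = 0.7,
                  repeat_limit: int = 2) -> Action:
    """Map one monitoring signal to a response. Thresholds are illustrative."""
    if not 0.0 <= signal.risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    # Legal requirements always go to review (automated plus human).
    if signal.legal_hold:
        return Action.ESCALATE_FOR_REVIEW
    # Repeat behavior combined with a risky request is escalated.
    if (signal.risk_score >= route_threshold
            and signal.prior_violations >= repeat_limit):
        return Action.ESCALATE_FOR_REVIEW
    # High severity: withhold the output entirely.
    if signal.risk_score >= block_threshold:
        return Action.BLOCK_OUTPUT
    # Moderate severity: answer with a safer or less capable model.
    if signal.risk_score >= route_threshold:
        return Action.ROUTE_TO_SAFER_MODEL
    return Action.ALLOW
```

For instance, `choose_action(Signal(risk_score=0.5, prior_violations=0, legal_hold=False))` returns `Action.ROUTE_TO_SAFER_MODEL`, while the same score on an account with three prior violations is escalated. A real system would likely weigh many more signals than a single score.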
Ecosystem initiatives to strengthen cyber resilience
OpenAI has invested early in applying AI to defensive cybersecurity use cases and our team closely coordinates with global experts to mature both our models and their application. We value the global community of cybersecurity practitioners toiling to make our digital world safer and are committed to delivering powerful tools that support defensive security. As we roll out new safeguards, we will continue to work with the cybersecurity community to understand where AI can meaningfully strengthen resilience, and where thoughtful safeguards are most important.
Alongside these collaborations, we are establishing a set of efforts designed to help defenders move faster, ground our safeguards in real-world needs, and accelerate responsible remediation at scale.
Trusted access programs for cyberdefense
We will soon introduce a trusted access program through which we will explore providing qualifying users and customers working on cyberdefense with tiered access to enhanced capabilities in our latest models for defensive use cases. We're still exploring the right boundary between capabilities we can make broadly available and those that require tiered restrictions, which may influence the future design of this program. We aim for this trusted access program to be a building block towards a resilient ecosystem.
Expanding defensive capacity with Aardvark
Aardvark, our agentic security researcher that helps developers and security teams find and fix vulnerabilities at scale, is now in private beta. It scans codebases for vulnerabilities and proposes patches that maintainers can adopt quickly. It has already identified novel CVEs in open-source software by reasoning over entire codebases. We plan to offer free coverage to select non-commercial open-source repositories to contribute to the security of the open-source software ecosystem and supply chain. Apply to participate here.
Frontier Risk Council
We will be establishing the Frontier Risk Council, an advisory group that will bring experienced cyber defenders and security practitioners into close collaboration with our teams. This council will start with a focus on cybersecurity, and expand into other frontier capability domains in the future. Members will advise on the boundary between useful, responsible capability and potential misuse, and these learnings will directly inform our evaluations and safeguards. We will share more on the council soon.
Developing a shared understanding of threat models with the industry
Finally, we anticipate cyber misuse may be viable from any frontier model in the industry. To address this, we work with other frontier labs through the Frontier Model Forum, a nonprofit backed by leading AI labs and industry partners, to develop a shared understanding of threat models and best practices. In this context, threat modeling helps mitigate risk by identifying how AI capabilities could be weaponized, where critical bottlenecks exist for different threat actors, and how frontier models might provide meaningful uplift. This collaboration aims to build a consistent, ecosystem-wide understanding of threat actors and attack pathways, enabling labs, maintainers, and defenders to improve their mitigations and ensure critical security insights propagate quickly across the ecosystem. We are also engaging with external teams to develop cybersecurity evaluations. We hope an ecosystem of independent evaluations will further help build a shared understanding of model capabilities.
Together, these efforts reflect our long-term commitment to strengthening the defensive side of the ecosystem. As models become more capable, our goal is to help ensure those capabilities translate into real leverage for defenders: grounded in real-world needs, shaped by expert input, and deployed with care. Alongside this work, we plan to explore other initiatives and cybersecurity grants to help surface breakthrough ideas that may not emerge from traditional pipelines, and to crowdsource bold, creative defenses from across academia, industry, and the open-source community. This is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security.