Early experiments in accelerating science with GPT-5


Science shapes everything from human health to energy production, from national security to our understanding of the universe. If AI can accelerate science—shortening the time it takes to generate new ideas, or to move from an idea to a tested result—the benefits compound across society.

But the pace of innovation remains a constraint. Even when the right idea exists, turning it into a product or treatment can take years. In a recent survey, 60 percent of people in the U.S. said scientific and medical breakthroughs reach them too slowly; 73 percent said we need better ways to accelerate discovery; and 69 percent identified scientific leadership as a top national priority.

Today, we’re releasing “Early science acceleration experiments with GPT‑5,” a paper co-authored with collaborators at universities and national laboratories including Vanderbilt, UC Berkeley, Columbia, Oxford, Cambridge, Lawrence Livermore National Laboratory, and The Jackson Laboratory. It compiles early case studies across math, physics, biology, computer science, astronomy, and materials science in which GPT‑5 helped researchers synthesize known results in a novel way, conduct powerful literature review, accelerate tough computations, and even generate novel proofs of unsolved propositions. The paper also documents limitations. Our goal is to give the community a clear view of what these systems can and cannot do today in research settings.

These case studies show how, in the hands of experts, GPT‑5 is accelerating scientific discovery, and why that acceleration matters:

  • Biology: In a study led by Derya Unutmaz, M.D., scientists spent months trying to explain a puzzling change in human immune cells. GPT‑5 identified the likely mechanism within minutes from an unpublished chart and suggested an experiment that proved it. This kind of speed could help researchers understand diseases faster and develop better treatments.
  • Mathematics: In another case, researchers Mehtaab Sawhney and Mark Sellke were tackling a decades-old open problem originally proposed by Paul Erdős. They were stuck on the final step, and GPT‑5 contributed a new idea about how one odd number breaks the pattern, which helped them complete the proof. Advances like this strengthen the mathematical foundations that many algorithms and security techniques ultimately rely on.
  • Algorithms & optimization: Researchers Sébastien Bubeck and Christian Coester were testing whether a common decision-making method used in robotics and routing was as reliable as people assumed. GPT‑5 found a new, clear example showing the method can fail and also improved a classic result in optimization, the math used to figure out the best way to solve a problem. This type of advance helps engineers better understand the decision-making systems used in robotics, routing, and other real-world applications.

What is OpenAI for Science? 

The mission of OpenAI for Science is to accelerate scientific discovery: to help researchers explore more ideas, test hypotheses faster, and uncover insights that would otherwise take significant time to reach. We do this by pairing frontier models with the right tools, workflows, and collaborations.

We work closely with researchers across academia, industry, and national labs. These collaborations help us understand where the models are useful, where they fail, and how to integrate them into the scientific process—from literature review and proof generation to modeling, simulation, and experimental design.

Our approach combines two complementary beliefs. Specialized scientific tools, such as simulation engines, protein databases, and computer algebra systems, are essential for efficiency and precision. At the same time, scaling foundation models continues to unlock new reasoning abilities: connecting ideas across fields, sketching proofs, proposing mechanisms, and navigating large literatures conceptually rather than by keyword. Where specialized tools exist, we want to use them; where general reasoning is required, we build models designed to handle it. Both paths reinforce each other.

How scientists are working with GPT‑5 today

The most meaningful progress comes from human–AI teams. Scientists set the agenda: they define questions, choose methods, critique ideas, and validate results. GPT‑5 contributes breadth, speed, and the ability to explore many directions in parallel.

Using GPT‑5 effectively is a skill. Researchers learn how to pose questions, when to push back, how to break problems into steps, and what to validate independently. Productive work often looks like dialogue—researcher and model iterating until a promising direction emerges or the idea is discarded.

The current state of GPT‑5 in scientific work 

Across these early studies, GPT‑5 appears able to shorten parts of the research workflow when used by experts. It does not run projects or solve scientific problems autonomously, but it can expand the surface area of exploration and help researchers move faster toward correct results.

  • One emerging capability is conceptual literature search. GPT‑5 can often identify deeper relationships between ideas and retrieve relevant material across languages and less accessible sources. Researchers report finding references, connections, and theses they did not previously know.
  • In mathematics and theoretical computer science, where structure is explicit and feedback loops are fast, GPT‑5 is especially helpful. Mathematicians have used GPT‑5 to generate viable proof outlines in minutes, transforming work that otherwise might have taken days or weeks. In physics and computational domains, the model can propose simplifying transformations or point to analogous structures in other fields.
  • In biology and other empirical sciences, the model can propose mechanistic hypotheses and design wet-lab experiments to validate them.

We are beyond the point where models only summarize existing knowledge: under expert oversight, GPT‑5 can now make early but meaningful contributions to research. The pace of improvement suggests the potential for deeper acceleration as capabilities and tools advance.

What this looks like in practice: a few case studies

Independent rediscovery of known results at the scientific frontier

Tightening a theorem in convex optimization

Optimization is the math of finding the “best” option—like the lowest training loss or the shortest route in a network. Gradient descent is a basic optimization method that takes repeated small steps downhill on a function. A recent paper by Guy Barzilai, Ohad Shamir, and Moslem Zamani asked when the sequence of values visited by gradient descent forms a convex curve over time (a curve with no dips), which makes the algorithm’s behavior easier to analyze and control. The first version of the paper proved this only for very small, conservative step sizes.
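
To make the statement concrete, here is a minimal numerical sketch (our illustrative function and step size, not the paper’s argument): run gradient descent on a smooth convex function and check that the visited values form a convex, no-dips sequence, i.e. that their second differences are nonnegative.

```python
import numpy as np

# Gradient descent on f(x) = x^2 / 2, an L-smooth convex function with L = 1.
f = lambda x: 0.5 * x**2
grad = lambda x: x
eta = 0.9  # illustrative step size below 1/L = 1

x, values = 5.0, []
for _ in range(20):
    values.append(f(x))
    x -= eta * grad(x)

# The value sequence is "convex over time" when its second differences
# f(x_{k+1}) - 2*f(x_k) + f(x_{k-1}) are all nonnegative.
second_diffs = np.diff(values, n=2)
print(bool((second_diffs >= -1e-12).all()))  # True for this run
```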

Sébastien Bubeck gave GPT‑5 the weaker version of the result and asked whether the condition could be improved. The model proposed a sharper step-size bound and a cleaner, more standard proof, which Bubeck then checked carefully by hand; with more thinking time, an internal run of the model even derived the optimal bound from scratch.

GPT‑5’s contribution: GPT‑5 helped Sébastien Bubeck explore a sharper step-size condition and suggest a cleaner proof for a recent convex optimization theorem, which he verified independently.

Recovering hidden symmetries around black holes

In general relativity, rotating black holes are described by the Kerr solution, and waves moving around them satisfy a complicated differential equation. Physicists look for symmetries of such equations—transformations that leave them unchanged—because symmetries lead to conserved quantities and simple structure. Recent work by Alex Lupsasca showed that the Kerr wave equation has a hidden symmetry structure forming an SL(2,ℝ) algebra, which helps explain why certain tidal responses vanish.
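
Here, “forming an SL(2,ℝ) algebra” means the three symmetry generators satisfy the standard sl(2,ℝ) commutation relations (the generic textbook form; sign and normalization conventions vary, and Lupsasca’s generators are built specifically from the Kerr geometry):

$$[L_0, L_{\pm}] = \mp L_{\pm}, \qquad [L_+, L_-] = 2L_0.$$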

When we asked GPT‑5 Pro directly about the full Kerr problem, it initially failed and reported no interesting symmetries. After Lupsasca gave it a simpler “warm-up” version of the same structure in flat space, we returned to the Kerr case; this time, after about 18 minutes of internal reasoning, the model produced the full set of symmetry generators that close into SL(2,ℝ), matching the human result.

GPT‑5’s contribution: GPT‑5 Pro reconstructed the hidden SL(2,ℝ) symmetry algebra of the Kerr black hole wave equation once given an appropriate warm-up problem, and Lupsasca confirmed the result.

Mechanistic insight in immunology

A key question in modern immunotherapy, especially in CAR-T cancer treatments that rely on engineered T cells, is how to keep beneficial T cells active and durable without pushing them into an exhausted, dysfunctional state. Established literature has shown that transiently limiting glucose metabolism can durably reprogram T cells to be more proinflammatory. In an earlier study, Derya Unutmaz and colleagues briefly treated human CD4+ T cells (a key class of immune cells) with 2-deoxyglucose (2DG), a compound that interferes with glucose metabolism. After removing 2DG and then priming the CD4+ T cells with IL-2 (a signaling molecule that tells T cells to proliferate), they saw a lasting shift toward a proinflammatory Th17-like state—a subtype of T cells involved in both protection and autoimmune disease—and spent months of experiments and reading to arrive at a plausible mechanism explaining this effect.

Years later, he gave GPT‑5 Pro an unpublished figure of flow cytometry scatterplots showing different T cell subsets after treatment with varying glucose and 2DG levels—and asked what might explain the data and what experiments to run next. In about a dozen minutes of back-and-forth, the model suggested that disrupted N-linked glycosylation (how cells attach sugar chains to proteins) during priming was the driver, and predicted that memory (rather than naïve) T cells were responsible. GPT‑5 then proposed specific follow-up experiments, including an elegant mannose rescue experiment that restored N-glycosylation without restoring glycolysis. The lab had previously conducted the mannose rescue experiment, and the results matched the model predictions exactly.

GPT‑5 Pro then analyzed unpublished data on CD8+ T cells pulsed with 2DG, and predicted that transient 2DG exposure during CAR-T generation would enhance killing efficiency against target cancer cell lines. GPT‑5 Pro’s predictions matched the lab’s unpublished experimental data.

GPT‑5’s contribution: GPT‑5 analyzed unpublished data to derive non-obvious and valuable mechanistic hypotheses, identified the acting T-cell subpopulation, and suggested follow-up experiments, which Unutmaz’s lab later tested and confirmed.

Deep literature search

Linking a new geometric result to other fields

Nikita Zhivotovskiy and his collaborators proved a new theorem in convex geometry—the study of “well-behaved” shapes where any line between two points stays inside the shape. Convex geometry underlies many models in machine learning and statistics. Once the theorem was done, the natural next question was: where else could this result be useful?
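
For reference, the “well-behaved” property above is the standard convexity condition:

$$K \subseteq \mathbb{R}^n \text{ is convex} \iff (1-t)\,x + t\,y \in K \ \text{ for all } x, y \in K \text{ and } t \in [0,1].$$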

Instead of guessing search terms and scanning the literature by hand, Zhivotovskiy gave GPT‑5 the formal statement of the theorem and asked which areas it might connect to. The model pointed to work in density estimation, learning theory, and multi-objective optimization, and surfaced specific references, including several he had not seen and some in other languages.

GPT‑5’s contribution: GPT‑5 helped Nikita Zhivotovskiy identify concrete connections and references across several fields, including materials he had not encountered.

Cleaning up—and contributing to—the Erdős problem database

Paul Erdős posed more than a thousand problems, many of which are tracked on a public website. Some problems are still listed as “open” even though solutions exist in obscure journals or non-English papers. Mehtaab Sawhney and Mark Sellke used GPT‑5 as a literature-search assistant over this database: for each supposedly open problem, they asked it to search for solutions or major partial progress.
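
The batch pattern they describe can be sketched as follows; this is a hypothetical illustration using the OpenAI Python SDK (the actual sessions were interactive, and the problem text and prompt wording here are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: map problem numbers to their formal statements.
open_problems = {
    848: "<formal statement of the problem>",
}

for number, statement in open_problems.items():
    # Ask the model to hunt for existing solutions or partial progress.
    response = client.chat.completions.create(
        model="gpt-5",  # illustrative model name
        messages=[{
            "role": "user",
            "content": (
                f"Erdős Problem #{number}: {statement}\n"
                "Is this problem actually solved in the literature? Cite any "
                "full solutions or major partial progress, including "
                "non-English and hard-to-find sources."
            ),
        }],
    )
    print(number, response.choices[0].message.content)
```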

GPT‑5 found full solutions for several problems still marked open, identified substantial partial results for others, and flagged a misprint in one problem statement. For Erdős Problem #848, human comments on the site had already outlined much of the structure; GPT‑5 proposed a key density estimate, and Sawhney and Sellke corrected and tightened it into a complete proof that closed the problem.

GPT‑5’s contribution: GPT‑5 assisted in locating missed solutions and proposed a density estimate that Sawhney and Sellke refined into a complete proof of Erdős Problem #848.

Clique-avoiding codes: a cautionary tale

Error-correcting codes add redundancy to data so you can recover information even when bits are corrupted. This project examined a special kind of binary code where each position corresponds to an edge in a graph, and the goal is to rule out any codeword that looks like a “clique” (a fully connected set of nodes). The challenge was to determine how many parity checks are fundamentally required to prevent these structured errors. GPT‑5 reframed the question using quadratic equations over a finite field and highlighted a classical result, the Chevalley–Warning theorem, which immediately pointed to the correct lower bound—showing that only about half as many constraints were needed as previously thought.
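
For reference, here is the Chevalley–Warning theorem in its standard form (the general statement; the specific parity-check bound follows from the quadratic reformulation described above):

$$\text{If } f_1,\dots,f_r \in \mathbb{F}_q[x_1,\dots,x_n] \text{ satisfy } \sum_{i=1}^{r} \deg f_i < n, \text{ then } \#\{x \in \mathbb{F}_q^n : f_1(x) = \cdots = f_r(x) = 0\} \equiv 0 \pmod{p},$$

where p is the characteristic of 𝔽_q. In particular, a system of r quadratics with no constant terms in more than 2r variables has a nontrivial common zero, which is the kind of counting leverage that turns “each parity check is a quadratic equation” into a lower bound on the number of checks.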

An unexpected twist emerged afterward: the exact same bound, and essentially the same proof, had appeared years earlier in a short research paper. GPT‑5 had reproduced the argument without citing its source, only identifying the prior work when asked again in a fresh session. This underscored an important lesson for AI-assisted mathematics: models can generate correct and elegant reasoning, but they may not reliably attribute where those ideas originally came from. Careful verification and attention to attribution remain essential.

GPT‑5’s contribution: GPT‑5 provided the key reformulation and the classical theorem that led to the optimal lower bound. However, the model did not identify the prior publication until explicitly asked, underscoring the need for careful human checks on attribution.

Working in tandem with AI

Using GPT‑5 as a research partner in combinatorics

Tim Gowers, a Fields Medal–winning combinatorialist, ran a series of experiments treating GPT‑5 as a “research partner” rather than a tool for homework-style problems. He gave the model hard combinatorics questions he was actively thinking about and asked it to suggest constructions, find counterexamples, or critique partial arguments.

In multiple cases, GPT‑5 quickly spotted flaws or missing cases in candidate constructions and proposed simpler alternatives or counterexamples; in others, it stalled or failed to make progress. Gowers’ overall conclusion was that the model is already useful as a very fast, very knowledgeable critic that can stress-test ideas and save time, even though it does not yet meet his bar for full co-authorship.

GPT‑5’s contribution: GPT‑5 acted as a fast critic for Tim Gowers, spotting flaws, missing cases, and simpler alternatives during exploratory combinatorics work.

Interpreting cosmology models

Cosmology uses simplified models to describe the large-scale behavior of the universe, including dark energy and the expansion history. Those models often exist in several mathematically equivalent forms, and small algebraic slips can derail a calculation. Robert Scherrer used GPT‑5 to sanity-check derivations, explore toy versions of cosmological models, and translate between different parameterizations of dark energy.
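
As a generic illustration of what translating between parameterizations involves (the CPL form below is a standard example from the literature, not necessarily one of the models Scherrer worked with), the same dark-energy equation of state can be written in terms of the scale factor a or the redshift z:

$$w(a) = w_0 + w_a (1 - a) \quad\Longleftrightarrow\quad w(z) = w_0 + w_a \frac{z}{1+z}, \qquad a = \frac{1}{1+z}.$$

A dropped factor of 1+z in exactly this kind of substitution is the sort of algebraic slip that can derail a downstream calculation.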

GPT‑5 was particularly useful in catching algebraic mistakes, suggesting equivalent formulations of the same physical idea, and pointing Scherrer to existing results in the literature that matched the models he was independently deriving. This reduced the friction between having an idea on paper and getting it into a form that could be compared with data.

GPT‑5’s contribution: GPT‑5 assisted Robert Scherrer by checking derivations, suggesting equivalent formulations, and pointing to matching results in the literature.
