Collective intelligence for AI-assisted chemical synthesis

TL;DR

MOSAIC is an AI framework using specialized experts to generate executable chemical synthesis protocols from vast literature, achieving a 71% success rate and enabling novel compound discovery.

Key Takeaways

  • MOSAIC leverages collective intelligence from millions of chemical reactions to create reproducible experimental protocols with confidence metrics.
  • The system achieved an overall 71% success rate in experimental validation, synthesizing over 35 novel compounds spanning pharmaceuticals, materials, agrochemicals, and cosmetics.
  • MOSAIC can discover new reaction methodologies not present in its training data, advancing chemical synthesis capabilities.
  • The framework partitions chemical space into searchable expert regions, providing a scalable strategy for AI-assisted discovery.

Tags

Cheminformatics, Chemical synthesis, Technology, Science, Humanities and Social Sciences, multidisciplinary

Abstract

The exponential growth of scientific literature presents an increasingly acute challenge across disciplines. Hundreds of thousands of new chemical reactions are reported annually, yet translating them into actionable experiments remains an obstacle [1,2]. Recent applications of large language models (LLMs) have shown promise [3,4,5,6], but systems that work reliably for diverse transformations across de novo compounds have remained elusive. Here we introduce MOSAIC (Multiple Optimized Specialists for AI-assisted Chemical Prediction), a computational framework that enables chemists to harness the collective knowledge of millions of reaction protocols. MOSAIC is built on the Llama-3.1-8B-instruct architecture [7] and trains 2,498 specialized chemical experts within Voronoi-clustered spaces. This approach delivers reproducible and executable experimental protocols with confidence metrics for complex syntheses. With an overall 71% success rate, experimental validation demonstrates the realization of over 35 novel compounds, spanning pharmaceuticals, materials, agrochemicals, and cosmetics. Notably, MOSAIC also enables the discovery of new reaction methodologies that are absent from the experts’ training data, a cornerstone for advancing chemical synthesis. This scalable paradigm of partitioning vast domains into searchable expert regions offers a generalizable strategy for AI-assisted discovery wherever accelerating information growth outpaces efficient knowledge access and application.
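
The idea of partitioning a space into Voronoi regions, each served by its own expert, can be illustrated with a small sketch. The abstract does not describe how MOSAIC embeds reactions or selects an expert, so everything below is an assumption made for illustration: the function name `nearest_expert`, the use of Euclidean distance, and the toy two-dimensional centroids are all hypothetical. The sketch only shows the defining property of a Voronoi partition, namely that a query belongs to the region of its closest centroid.

```python
import math
from typing import Sequence, Tuple


def nearest_expert(
    query: Sequence[float],
    centroids: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Return (index, distance) of the centroid closest to the query.

    Hypothetical helper: a point lies in the Voronoi cell of a centroid
    exactly when that centroid is its nearest neighbor, so routing a
    query to an expert reduces to a nearest-centroid search.
    """
    if not centroids:
        raise ValueError("at least one centroid is required")
    best_index, best_distance = -1, math.inf
    for index, centroid in enumerate(centroids):
        distance = math.dist(query, centroid)  # Euclidean distance (assumed metric)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Toy example: three made-up centroids standing in for expert regions.
centroids = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
expert_id, distance = nearest_expert((4.0, 1.0), centroids)
print(f"route query to expert {expert_id} (distance {distance:.3f})")
```

In this toy setting the distance to the chosen centroid could also serve as a crude indication of how well a query is covered by its region; whether MOSAIC's confidence metrics work this way is not stated in the abstract.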

Author information

Author notes
  1. These authors contributed equally: Haote Li, Sumon Sarkar

Authors and Affiliations

  1. Department of Chemistry, Yale University, New Haven, Connecticut, USA

    Haote Li, Sumon Sarkar, Wenxin Lu, Patrick O. Loftus, Tianyin Qiu, Yu Shee, Abbigayle E. Cuomo, John-Paul Webster, Robert H. Crabtree, Timothy R. Newhouse & Victor S. Batista

  2. Chemical Development, Boehringer-Ingelheim Pharmaceuticals Inc, Ridgefield, Connecticut, USA

    H. Ray Kelly, Vidhyadhar Manee, Sanil Sreekumar & Frederic G. Buono

Authors
  1. Haote Li
  2. Sumon Sarkar
  3. Wenxin Lu
  4. Patrick O. Loftus
  5. Tianyin Qiu
  6. Yu Shee
  7. Abbigayle E. Cuomo
  8. John-Paul Webster
  9. H. Ray Kelly
  10. Vidhyadhar Manee
  11. Sanil Sreekumar
  12. Frederic G. Buono
  13. Robert H. Crabtree
  14. Timothy R. Newhouse
  15. Victor S. Batista
