Collective intelligence for AI-assisted chemical synthesis
TL;DR
MOSAIC is an AI framework that uses specialized expert models, trained on a vast body of reaction literature, to generate executable chemical synthesis protocols. It achieved a 71% success rate in experimental validation and enabled the synthesis of novel compounds.
Key Takeaways
- MOSAIC leverages collective intelligence from millions of chemical reactions to create reproducible experimental protocols with confidence metrics.
- The system achieved a 71% success rate in experimental validation, synthesizing over 35 novel compounds across multiple industries.
- MOSAIC can discover new reaction methodologies not present in its training data, advancing chemical synthesis capabilities.
- The framework partitions chemical space into searchable expert regions, providing a scalable strategy for AI-assisted discovery.
Abstract
The exponential growth of scientific literature presents an increasingly acute challenge across disciplines. Hundreds of thousands of new chemical reactions are reported annually, yet translating them into actionable experiments remains an obstacle [1,2]. Recent applications of large language models (LLMs) have shown promise [3,4,5,6], but systems that work reliably for diverse transformations across de novo compounds have remained elusive. Here we introduce MOSAIC (Multiple Optimized Specialists for AI-assisted Chemical Prediction), a computational framework that enables chemists to harness the collective knowledge of millions of reaction protocols. MOSAIC is built on the Llama-3.1-8B-instruct architecture [7] and trains 2,498 specialized chemical experts within Voronoi-clustered spaces. This approach delivers reproducible and executable experimental protocols, with confidence metrics, for complex syntheses. Experimental validation, with an overall success rate of 71%, demonstrates the realization of over 35 novel compounds spanning pharmaceuticals, materials, agrochemicals and cosmetics. Notably, MOSAIC also enables the discovery of new reaction methodologies that are absent from the experts' training data, a cornerstone for advancing chemical synthesis. This scalable paradigm of partitioning vast domains into searchable expert regions offers a generalizable strategy for AI-assisted discovery wherever accelerating information growth outpaces efficient knowledge access and application.
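The abstract does not say how reactions are embedded or how a new query is assigned to one of the 2,498 experts. As a minimal sketch of the general idea of Voronoi partitioning, assuming each reaction is already represented as a fixed-length vector and each expert owns the Voronoi cell of one centroid, a query would be routed to the expert(s) whose centroids lie nearest. The function names, the Euclidean metric, and the toy 2-D vectors below are hypothetical illustrations, not MOSAIC's actual implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cells(query: Vector, centroids: Sequence[Vector], k: int = 1) -> List[Tuple[int, float]]:
    """Return the k closest centroids as (cell index, distance), nearest first.

    The Voronoi cell of centroid i is the set of points closer to i than to
    any other centroid, so the first entry is the cell containing the query.
    Ties keep the lower index because Python's sort is stable.
    """
    dists = [(i, euclidean(query, c)) for i, c in enumerate(centroids)]
    dists.sort(key=lambda t: t[1])
    return dists[:k]


def partition(points: Sequence[Vector], centroids: Sequence[Vector]) -> Dict[int, List[int]]:
    """Assign each training point (by index) to the Voronoi cell it falls in.

    In this hypothetical setup, each cell's points would form the training
    set of one specialist model.
    """
    cells: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for j, p in enumerate(points):
        cell, _ = nearest_cells(p, centroids, k=1)[0]
        cells[cell].append(j)
    return cells


# Toy example: three hypothetical expert centroids in a 2-D embedding space.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (9.0, 0.5), (0.5, 8.0), (2.0, -1.0)]
print(partition(points, centroids))  # {0: [0, 3], 1: [1], 2: [2]}

# Route a new query to its two nearest experts (cell 1 first, then cell 0).
print(nearest_cells((8.0, 1.0), centroids, k=2))
```

Routing to more than one nearby cell (k > 1) is one plausible way to handle queries near cell boundaries. Whether MOSAIC does this, or how its confidence metrics relate to such distances, is not stated in the abstract.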
Author information
These authors contributed equally: Haote Li, Sumon Sarkar
Authors and Affiliations
Department of Chemistry, Yale University, New Haven, Connecticut, USA
Haote Li, Sumon Sarkar, Wenxin Lu, Patrick O. Loftus, Tianyin Qiu, Yu Shee, Abbigayle E. Cuomo, John-Paul Webster, Robert H. Crabtree, Timothy R. Newhouse & Victor S. Batista
Chemical Development, Boehringer-Ingelheim Pharmaceuticals Inc, Ridgefield, Connecticut, USA
H. Ray Kelly, Vidhyadhar Manee, Sanil Sreekumar & Frederic G. Buono