Ultra-high-throughput mapping of genetic design space

AI Summary

TL;DR

CLASSIC combines long- and short-read sequencing to screen over 100,000 gene circuit designs (5-20 kb) in human cells. It enables machine learning models to predict circuit behavior and reveal part composability rules, accelerating synthetic biology design cycles.

Key Takeaways

  • CLASSIC platform combines long- and short-read NGS to screen complex genetic constructs of arbitrary length
  • Can profile over 100,000 gene circuit designs (5-20 kb) in a single experiment in human cells
  • Enables training of machine learning models that accurately predict circuit behavior across design landscapes
  • Reveals genetic part composability rules that govern circuit performance
  • Accelerates synthetic biology design-build-test-learn cycles for complex genetic systems
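
The idea of "part composability rules" in the takeaways above can be illustrated with a deliberately simple stand-in: an additive model in which each part contributes a fixed effect at its position, regardless of which parts surround it. This is not the machine-learning model used in the study. It is a minimal sketch that assumes each design is a tuple of part names (one per position), that each design appears once, and that each design has one scalar measured output such as log expression. The function names and part labels (`fit_additive_model`, `pA`, `a1`, etc.) are hypothetical. Designs whose measured output deviates strongly from the additive prediction hint at part combinations that do not compose additively.

```python
from collections import defaultdict


def fit_additive_model(designs, y, n_sweeps=20):
    """Fit y ~ intercept + sum over positions p of effects[p][part at p].

    Uses backfitting: each sweep re-estimates one position's part effects
    as the mean partial residual, holding the other positions fixed.
    designs: list of equal-length tuples, one part name per position
             (hypothetical format; each design assumed to appear once).
    y: list of measured outputs (e.g. log expression), same length as designs.
    """
    n_pos = len(designs[0])
    intercept = sum(y) / len(y)
    effects = [{} for _ in range(n_pos)]
    for _ in range(n_sweeps):
        for p in range(n_pos):
            sums, counts = defaultdict(float), defaultdict(int)
            for design, yi in zip(designs, y):
                # Partial residual: the part of yi that position p must explain.
                others = sum(effects[q].get(design[q], 0.0)
                             for q in range(n_pos) if q != p)
                sums[design[p]] += yi - intercept - others
                counts[design[p]] += 1
            effects[p] = {part: sums[part] / counts[part] for part in sums}
    return intercept, effects


def predict(intercept, effects, design):
    """Additive prediction: intercept plus each part's positional effect."""
    return intercept + sum(effects[p].get(part, 0.0)
                           for p, part in enumerate(design))


def composability_residuals(designs, y, intercept, effects):
    """Observed minus additive prediction for each design.

    Large residuals flag part combinations that do not combine additively.
    """
    return {d: yi - predict(intercept, effects, d) for d, yi in zip(designs, y)}


if __name__ == "__main__":
    # Toy 2 x 2 library: two promoters crossed with two activators.
    designs = [("pA", "a1"), ("pA", "a2"), ("pB", "a1"), ("pB", "a2")]
    y = [1.0, 2.0, 3.0, 10.0]  # (pB, a2) is boosted beyond additivity
    b, eff = fit_additive_model(designs, y)
    for d, r in composability_residuals(designs, y, b, eff).items():
        print(d, round(r, 3))
```

If the toy outputs were exactly additive (for example 1, 2, 3 and 4), every residual would be zero; boosting one combination makes the residuals nonzero, flagging an interaction between parts. Backfitting is used here only because it needs nothing beyond the standard library; at the scale of 10⁵ designs, a regression or neural-network model would typically be fitted instead.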

Tags

Genetic circuit engineering, High-throughput screening, Machine learning, Next-generation sequencing, Synthetic biology, Science, Humanities and Social Sciences, multidisciplinary

Abstract

Massively parallel genetic screens have been used to map sequence-to-function relationships for a variety of genetic elements[1–5]. However, as these approaches interrogate only short sequences, it remains challenging to perform high-throughput assays on constructs containing combinations of multiple sequence elements arranged across multi-kb length scales. Overcoming this barrier could accelerate synthetic biology: by screening diverse gene circuit designs and learning ‘composition to function’ mappings, genetic part composability rules could be revealed, enabling rapid identification of behaviour-optimized design variants[6,7]. Here we introduce CLASSIC (combining long- and short-range sequencing to investigate genetic complexity), a genetic screening platform that combines long- and short-read next-generation sequencing (NGS) modalities to quantitatively assess pools of constructs of arbitrary length containing diverse genetic part compositions. We show that CLASSIC can measure expression profiles of over 10⁵ gene circuit designs (5–20 kb) in a single experiment in human cells. The resulting datasets can be used to train machine-learning models that accurately predict circuit behaviour across expansive circuit design landscapes, revealing part composability rules that govern circuit performance. Our study shows that, by expanding the throughput of each design–build–test–learn cycle, CLASSIC enhances the pace and scale of synthetic biology and establishes an experimental basis for data-driven design of complex genetic systems.
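
One way to picture how long and short reads can be combined is sketched below. It assumes, as in many barcoded pooled screens, that each construct carries a short DNA barcode: full-length long reads link each barcode to the part composition of its construct, and cheaper short reads count barcodes in cell populations sorted into expression bins. The barcode scheme, sorting bins, data formats, and function names (`build_barcode_map`, `estimate_expression`) are hypothetical illustrations, not the CLASSIC pipeline itself.

```python
from collections import Counter, defaultdict


def build_barcode_map(long_reads):
    """Map each barcode to its most frequently observed part composition.

    long_reads: iterable of (barcode, composition) tuples, where composition
    is a tuple of part names parsed from one full-length read (hypothetical).
    """
    votes = defaultdict(Counter)
    for barcode, composition in long_reads:
        votes[barcode][composition] += 1
    # Majority vote guards against occasional mis-called long reads.
    return {bc: c.most_common(1)[0][0] for bc, c in votes.items()}


def estimate_expression(barcode_map, bin_counts, bin_values):
    """Estimate a count-weighted mean expression level per design.

    bin_counts: dict of bin name -> Counter(barcode -> short-read count)
    bin_values: dict of bin name -> representative expression level of the bin
    Barcodes that map to the same composition are pooled together.
    """
    weighted_sum = defaultdict(float)
    total = defaultdict(int)
    for bin_name, counts in bin_counts.items():
        for barcode, n in counts.items():
            composition = barcode_map.get(barcode)
            if composition is None:
                continue  # barcode never seen in long reads: design unknown
            weighted_sum[composition] += n * bin_values[bin_name]
            total[composition] += n
    return {comp: weighted_sum[comp] / total[comp] for comp in total}


if __name__ == "__main__":
    # Toy data: two barcodes, each linked to a hypothetical three-part design.
    long_reads = [
        ("AAGT", ("promoter_A", "activator_1", "reporter")),
        ("AAGT", ("promoter_A", "activator_1", "reporter")),
        ("CCTA", ("promoter_B", "activator_2", "reporter")),
    ]
    bin_counts = {
        "low": Counter({"AAGT": 10, "CCTA": 2}),
        "high": Counter({"CCTA": 8}),
    }
    bin_values = {"low": 1.0, "high": 100.0}
    barcode_map = build_barcode_map(long_reads)
    for design, level in estimate_expression(barcode_map, bin_counts, bin_values).items():
        print(design, round(level, 2))
```

A practical pipeline would also normalize each bin's counts by sequencing depth and by the number of cells sorted into that bin, and would filter out barcodes with too few reads; those steps are omitted here for brevity.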

Fig. 1: Using CLASSIC to systematically map the design space of complex genetic programs.
Fig. 2: Using CLASSIC to quantitatively profile a synthetic gene circuit design landscape.
Fig. 3: ML-aided mapping of single-input circuit design space reveals gene circuit design rules.
Fig. 4: ML-guided exploration of multi-input gene circuit behaviour.
Fig. 5: Analysis of digital logic gene circuit design rules in a >10⁹-member design space.

Data availability

All Nanopore and Illumina sequencing datasets generated in this study are available from the Sequence Read Archive (SRA; BioProject: PRJNA1347054).

Code availability

All custom scripts used for Nanopore sequencing data analysis are available at GitHub (https://github.com/cbashorlab/WIMPY). Code associated with Illumina data analysis and model training is available at GitHub (https://github.com/cbashorlab/CLASSIC). Any other scripts used in the analyses are available on request.

References

  1. de Boer, C. G. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 38, 56–65 (2020).

  2. Castillo-Hair, S. et al. Optimizing 5′UTRs for mRNA-delivered gene editing using deep learning. Nat. Commun. 15, 5284 (2024).

  3. Angenent-Mari, N. M., Garruss, A. S., Soenksen, L. R., Church, G. & Collins, J. J. A deep learning approach to programmable RNA switches. Nat. Commun. 11, 5057 (2020).

  4. Sahu, B. et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022).

  5. Jones, E. M. et al. Structural and functional characterization of G protein-coupled receptors with deep mutational scanning. eLife 9, e54895 (2020).

  6. Zhang, C., Tsoi, R. & You, L. Addressing biological uncertainties in engineering gene circuits. Integr. Biol. 8, 456–464 (2016).

  7. Kitano, S., Lin, C., Foo, J. L. & Chang, M. W. Synthetic biology: learning the way toward high-precision biological design. PLoS Biol. 21, e3002116 (2023).
