Molecular basis of polyadenylated RNA fate determination in the nucleus
Abstract
Eukaryotic genomes generate a plethora of polyadenylated (pA+) RNAs1,2, which are packaged into ribonucleoprotein particles (RNPs). To ensure faithful gene expression, functional pA+ RNPs, including protein-coding RNPs, are exported to the cytoplasm, whereas transcripts within non-functional pA+ RNPs are degraded in the nucleus1,2,3,4. How cells distinguish these opposing fates remains unknown. The DExD-box ATPase UAP56 (also known as DDX39B) is a central component of functional pA+ RNPs, and promotes their docking to the nuclear pore complex-anchored TREX-2 (refs. 5,6), which triggers transcript release from UAP56 to facilitate export7. Here we reveal that the poly(A) tail exosome targeting (PAXT) connection8 binds a TREX-2-like module, which releases pA+ RNAs from UAP56 for decay by the nuclear exosome. The core of this module consists of a LENG8–PCID2–SEM1 trimer, which we show is structurally and biochemically equivalent to the central GANP–PCID2–SEM1 trimer of TREX-2. Mutagenesis and transcriptomic data demonstrate that the nuclear fate of pA+ RNPs is governed by the contending actions of nucleoplasmic PAXT and nuclear pore complex-associated TREX-2, which interpret RNA-bound UAP56 as a signal for RNA decay or export, respectively. As RNA targets of PAXT are generally short and intron-poor, we propose an overall model for pA+ RNP fate determination whereby the distinct sub-nuclear localizations of PAXT and TREX-2 govern the degradation of short non-functional pA+ RNAs while allowing export of their longer and functional counterparts.
Main
RNA polymerase II (RNAPII) extensively transcribes mammalian genomes, yielding a wide range of unadenylated and polyadenylated RNAs1,2,3,4. Moreover, individual transcription units that generate standard full-length transcripts also give rise to an array of shorter isoforms9,10. Thus, functional RNAs are produced alongside a wealth of futile RNAPII products. Whereas mature functional pA+ RNAs, such as protein-coding mRNAs, are exported from the nucleus to the cytoplasm, their non-functional counterparts are typically retained and degraded3,4. This is primarily achieved by the nucleoplasmic PAXT connection, which consists of a heterodimeric core of the RNA helicase MTR4 and the Zn-finger protein ZFC3H1 (refs. 8,11). Additional, and less well-described, interactions with the nuclear pA+ RNA-binding protein PABPN1 and other transiently interacting RNP components, sometimes referred to as extended PAXT components, may aid in directing transcript turnover by the 3′−5′ exonucleolytic exosome complex12,13,14,15. However, how these interactions provide a biochemical basis by which PAXT distinguishes non-functional pA+ RNAs remains a major unresolved question.
Prior to their nuclear export, pA+ RNAs are packaged with proteins into pA+ RNPs. Central to this process is the export factor and DExD-box ATPase UAP56, which is recruited to pA+ RNPs in preparation for their nuclear export5,16. At the nuclear envelope, the activity of the nuclear pore complex (NPC)-associated GANP–PCID2–SEM1 (GANP–PS) trimer of TREX-2 (ref. 6) facilitates the release of RNA from UAP56, enabling export7,17. Again, how pA+ RNP sorting is orchestrated to favour the selective export of functional pA+ RNAs is unknown.
Here we interrogate two TREX-2-like human complexes, SAC3D1–PCID2–SEM1 (SAC3D1–PS) and LENG8–PCID2–SEM1 (LENG8–PS), in which the conserved SAC3D1 and LENG8 proteins, respectively, replace GANP. The GANP–PS, SAC3D1–PS and LENG8–PS complexes are structurally similar and share the ability to release UAP56 from RNA. Notably, we show that LENG8–PS offers PAXT a module that acts on UAP56 to promote transcript turnover, in contrast to the RNA export activity of TREX-2. Our findings reveal that nuclear pA+ RNA export and decay utilize a shared biochemical mechanism to act on pA+ RNPs but with fundamentally different outcomes. Based on the substrate preference of PAXT and its separate nuclear localization from TREX-2, we propose a general model for pA+ RNP fate determination.
TREX-2-like complexes release RNA from UAP56
To mediate the docking of export-competent pA+ RNPs at the NPC, UAP56 binds the five-subunit TREX-2 complex7 (GANP, PCID2, SEM1, CETN2 or CETN3, and ENY2). Within this complex, UAP56 contacts the TREX-2 complex core (TREX-2M), which comprises PCID2, SEM1 and the SAC3 domain of the scaffolding subunit GANP (ref. 7; Fig. 1a, left). The ability of TREX-2M to release UAP56 from the pA+ RNP depends on the conserved ‘wedge loop’ within the SAC3 domain7,17 (Fig. 1b). Notably, similar SAC3 domains are found in the UAP56-interacting LENG8 and SAC3D1 proteins7. Although they are broadly conserved amongst eukaryotes, these proteins share no sequence features with GANP or each other aside from the SAC3 domain (Fig. 1a,b). Moreover, proteome-wide AlphaFold2 screens suggested that SAC3 domains of LENG8 or SAC3D1 form complexes with PCID2 and SEM1 (refs. 18,19,20), thus mimicking TREX-2M. Finally, and central to the present study, LENG8 co-immunoprecipitated with PAXT core components ZFC3H1 and MTR4 (refs. 8,21; also see Fig. 2 below) and was shown to interact with PCID2 and SEM1 in both human and yeast cells22,23. Collectively, this prompted us to investigate these TREX-2M-like complexes in more detail.
a, Cartoons of TREX-2 and TREX-2-like core complexes (top) and their domain architectures (bottom). PCID2, dark blue; SEM1, mid blue; SAC3 domain-containing proteins GANP, LENG8 and SAC3D1, shades of blue. Wedge loop positions are shown as grey bars. Regions included in the atomic models in e,f are indicated by black lines. WH, winged helix. b, Multiple sequence alignment of wedge loop sequences of human GANP (UniProt O60318), LENG8 (UniProt Q96PV6) and SAC3D1 (UniProt A6NKF1), with a conserved tyrosine residue anchoring the wedge loop on the SAC3 domain (Yanchor) and the central wedge loop arginine (Rwedge) highlighted. Colouring by conservation (blue letters, conserved residue; blue background, invariant residue). c,d, UAP56 RNA release assay, demonstrating the stimulatory effects of LENG8–PSM (c) or SAC3D1–PSM (d) complexes. Bead-immobilized 15-nucleotide poly-uridine RNA was incubated with UAP56 and ATP to form UAP56–ADP-Pi–RNA complexes7 and subsequently challenged with recombinant LENG8–PSM or SAC3D1–PSM complexes, or their respective wedge loop mutants. Remaining UAP56–ADP-Pi–RNA complexes were analysed by SDS–PAGE and Coomassie staining. e,f, Cartoon representation of cryo-EM structures of UAP56–RNA–LENG8–PSM (e) or UAP56–RNA–SAC3D1–PSM (f) complexes at resolutions ranging from 6 to 12 Å for UAP56–RNA–LENG8–PSM and 2.6 to 4.5 Å for UAP56–RNA–SAC3D1–PSM. SEM1, blue; PCID2, dark blue; LENG8(491–800), light blue; SAC3D1(48–404), green blue; UAP56, shades of pink; flexible region of UAP56 N-terminal domain, dashed line; RNA, black. e, Bottom left, UAP56 domain structure. g, Details of the SAC3D1 wedge loop–UAP56–nucleotide interactions for UAP56–RNA–SAC3D1–PSM (left) or UAP56–RNA–TREX-2M (right) (stick representation). Superpositions of the wedge loop anchoring tyrosine and the central arginine of SAC3D1 or GANP are shown in the middle. GANP, light blue; other colours as in a. 
h, Cartoon model of UAP56–RNA–TREX-2 or TREX-2-like complex interactions, including RNA release from UAP56. Left to right: (1) pre-RNA release state: TREX-2 and TREX-2-like complexes bind RNA-clamped UAP56; (2) post-RNA release state: RNA is unclamped from UAP56, leaving UAP56 in an open conformation bound to the TREX-2 or TREX-2-like complex; (3) dissociation.
a, Immunofluorescence analyses of central TREX-2 and TREX-2-like components. Anti-Flag antibody (left column)- and DAPI (mid column)-stained HeLa cell lines (merged signals, right column), expressing C-terminally 3×Flag-tagged endogenous GANP (top row), LENG8 (second row), SAC3D1 (third row) or PCID2 (bottom row). Scale bars, 10 μm. b, Volcano plots of Flag IP–MS analyses of extracts from 3×Flag-tagged GANP (left), LENG8 (mid) or PCID2 (right) cells from a. log2 fold label-free quantification (LFQ) changes of interactor signals in the individual immunoprecipitations over their maternal HeLa cell line control were plotted against −log10-transformed two-sided limma-moderated Student’s t-test P values calculated over biological triplicate data. TREX-2-like, core PAXT, exosome, TREX-2 and NPC components are colour-coded and labelled. c, Heat map of mean intensity-based absolute quantification (iBAQ) values, with control immunoprecipitation values subtracted, from triplicate immunoprecipitation experiments with 3×Flag-tagged GANP, LENG8 and PCID2, conducted without (−) or with (+) Pierce universal nuclease treatment. Displayed proteins as in b, and with EXOSC1–EXOSC5 and EXOSC10. d, Colocalization coefficients between ZFC3H1 and Flag immunofluorescence signals in maternal HeLa cells (from Extended Data Fig. 4j) and HeLa cells expressing 3×Flag-tagged ZFC3H1, LENG8, PCID2 or GANP. Red (displayed in magenta) and green channels were used for the detection of ZFC3H1 and Flag signals, respectively. Example cells with staining overlap and total numbers (n) of cells imaged are indicated above the plot. In all box plots, the centre line is the median, box edges delineate the interquartile range and whiskers represent the distribution of data points within 1.5× interquartile range (Methods). e, AlphaFold2 model of interacting regions of ZFC3H1 (top) and LENG8 (bottom), shown in cartoon representation with interface residue as sticks (middle). 
The conserved F301 of LENG8, which is critical for interaction, is highlighted. f, Volcano plot as in b, but displaying ZFC3H1(Δ730−747)–3×Flag relative to wild-type ZFC3H1–3×Flag sample data. Constructs were expressed in HeLa cells expressing ZFC3H1–2×HA–dTAG that were treated with dTAGV-1 to deplete endogenous ZFC3H1. Note additional colour coding of extended PAXT components and UAP56 (see g). WT, wild type. g, As in f, but for LENG8(F301A)–3×Flag relative to wild-type LENG8–3×Flag constructs expressed in dTAGV-1-treated cells expressing LENG8–2×HA–dTAG. h, Cartoon depicting localization-distinct TREX-2-like modules.
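As a minimal sketch of how the two axes of the volcano plots in b, f and g relate to the underlying data, the Python snippet below computes a log2 fold change and a −log10-transformed P value for a single protein. The function name, the example intensities and the choice to average log2-transformed LFQ values across replicates are illustrative assumptions. The P value is taken as an input because the limma-moderated test itself is not reproduced here.

```python
import math


def volcano_point(ip_lfq, control_lfq, p_value):
    """Return (log2 fold change, -log10 P) for one protein.

    ip_lfq and control_lfq are LFQ intensities from replicate
    immunoprecipitations and controls. Averaging on the log2 scale is an
    assumption of this sketch, not a statement of the authors' pipeline.
    p_value is supplied by the caller (for example from a moderated t-test).
    """
    mean_log2_ip = sum(math.log2(v) for v in ip_lfq) / len(ip_lfq)
    mean_log2_ctrl = sum(math.log2(v) for v in control_lfq) / len(control_lfq)
    return mean_log2_ip - mean_log2_ctrl, -math.log10(p_value)


# Hypothetical triplicate intensities for one interactor (made-up numbers).
x, y = volcano_point([8.0, 8.0, 8.0], [2.0, 2.0, 2.0], 0.001)
print(f"log2 fold change = {x:.2f}, -log10 P = {y:.2f}")
```

Proteins that are strongly and reproducibly enriched over the control land in the upper right of such a plot, which is where the colour-coded interactors would be expected.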
Given the critical role of UAP56 in pA+ RNP export via TREX-2, we hypothesized that LENG8 and SAC3D1 might target UAP56-bound RNPs to different cellular fates. To address this, we first explored the structure–function relationships of LENG8 or SAC3D1 with UAP56 in vitro. As previously achieved for GANP (ref. 7), we purified stable recombinant complexes of the SAC3 domain-containing constructs of LENG8(491–800) or SAC3D1(48–404) in the presence of PCID2–SEM1 (constituting LENG8–PSM or SAC3D1–PSM; Supplementary Fig. 2a,b). Both complexes could bind UAP56 in the presence of the non-hydrolysable ATP analogue adenylyl-imidodiphosphate (AMP-PNP) and a 15-nucleotide poly-U RNA substrate (Extended Data Fig. 1a,b, lanes 1–5). TREX-2M facilitates the release of ADP and Pi from UAP56, thus accelerating the rate-limiting step in the disassembly of UAP56–RNA complexes, releasing free UAP56 available for RNA re-binding, and resulting in an increased apparent ATPase activity7. Similarly, the LENG8–PSM or SAC3D1–PSM complexes stimulated the apparent ATPase rate of UAP56 in the presence of RNA and ATP in vitro, revealing approximately 290-fold and 60-fold stimulation, respectively (Extended Data Fig. 1c,d). Moreover, substituting a highly conserved arginine residue in the LENG8 or SAC3D1 wedge loops with an alanine7 (LENG8(R563A) or SAC3D1(R102A)) (Fig. 1b, Extended Data Fig. 1e and Supplementary Fig. 2a,b) did not affect UAP56 binding (Extended Data Fig. 1f,g and Supplementary Fig. 2c), but largely abrogated the ATPase stimulatory activity on UAP56 (Extended Data Fig. 1h,i). Finally, to test whether LENG8–PSM and SAC3D1–PSM, like TREX-2M (ref. 7), would promote the release of RNA from UAP56, we incubated UAP56 with RNA and ATP to form UAP56–ADP-Pi–RNA complexes, which we immobilized on streptavidin beads via the biotinylated RNA7. These complexes were then challenged with either LENG8–PSM or SAC3D1–PSM, revealing that both moieties released UAP56 efficiently (Fig. 1c,d, compare lanes 4 and 5), whereas the respective wedge loop mutants did not (Fig. 1c,d, lane 6). Of note, mutating three residues targeting the UAP56 N-terminal domain (NTD) and UAP56 RecA2-binding interfaces of LENG8 (LENG8(TRR)) led to diminished LENG8–PSM–UAP56 interaction (Extended Data Fig. 1j,k and Supplementary Fig. 2a) and parallel declines in both the apparent ATPase activity (Extended Data Fig. 1h, lane 7) and the release of UAP56 from RNA (Extended Data Fig. 1l). We conclude that TREX-2-like complexes, like TREX-2, bind UAP56 and trigger the release of its bound RNA through a shared mechanism.
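The fold stimulation values quoted above are ratios of apparent ATPase rates measured with and without the stimulating complex. The following Python sketch illustrates that arithmetic under stated assumptions: the rate is estimated as the least-squares slope of product formed over time, and all function names and time-course values are hypothetical, chosen only so that the ratios come out at 290 and 60. It is not a description of how the rates in Extended Data Fig. 1c,d were obtained.

```python
def apparent_rate(times, product):
    """Least-squares slope of product formed versus time.

    Treating the slope of a linear time course as the apparent ATPase
    rate is an assumption made for this illustration.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_p = sum(product) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, product))
    return sxy / sxx


def fold_stimulation(basal_rate, stimulated_rate):
    """Ratio of the stimulated rate to the basal rate."""
    if basal_rate <= 0:
        raise ValueError("basal_rate must be positive")
    return stimulated_rate / basal_rate


# Hypothetical time courses (arbitrary units), not measured data.
times = [0.0, 1.0, 2.0, 3.0]
basal = apparent_rate(times, [0.0, 0.5, 1.0, 1.5])             # UAP56 + RNA + ATP
with_leng8 = apparent_rate(times, [0.0, 145.0, 290.0, 435.0])  # plus LENG8-PSM
with_sac3d1 = apparent_rate(times, [0.0, 30.0, 60.0, 90.0])    # plus SAC3D1-PSM

print(fold_stimulation(basal, with_leng8))
print(fold_stimulation(basal, with_sac3d1))
```

Applying the same ratio to a wedge loop mutant would give a value close to 1 if, as reported, the mutation largely abrogates stimulation.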
Although previous structural studies of UAP56–TREX-2M complexes had revealed their protein–protein interfaces, it remained unclear how the wedge loop functions in releasing UAP56 from RNA. To investigate the molecular basis for this function, we analysed LENG8–PSM and SAC3D1–PSM complexes with UAP56 in the presence of 15-nucleotide poly(U) RNA and ATP or AMP-PNP using cryo-electron microscopy (cryo-EM). This revealed a fraction of complexes without UAP56, enabling us to solve the structures of apo LENG8–PSM and SAC3D1–PSM at 3.5 Å and 3.6 Å resolution, respectively (Extended Data Table 1). Both complexes showed the same V-shaped architecture previously observed for TREX-2M (Extended Data Figs. 2a and 3 and Supplementary Figs. 3a–c and 4a–c) and a yeast LENG8–PSM complex23,24. Unexpectedly, two-dimensional class averages of the UAP56-engaged fractions of LENG8–PSM and SAC3D1–PSM suggested that UAP56 could be in a closed, RNA-bound state, prior to its release via the wedge loop (Extended Data Fig. 2b). Together with our previously reported UAP56–TREX-2M structure7, in which UAP56 was captured after RNA release, this enabled us to investigate the RNA-releasing mechanism of SAC3 domain-containing complexes. We resolved the cryo-EM structures of UAP56–LENG8–PSM and UAP56–SAC3D1–PSM complexes in the pre-RNA release state (Fig. 1e,f, Extended Data Fig. 3, Supplementary Figs. 3d–g and 4f,g and Extended Data Table 1). A severe bias in particle orientation limited resolution to 6–12 Å for UAP56–LENG8–PSM in the RNA-clamped pre-release state. Reconstitution of the complex with ATP yielded a higher resolution structure at 4.9 Å containing density only for the UAP56 NTD (Supplementary Fig. 4d,e). We could, however, resolve the structure of UAP56–RNA–SAC3D1–PSM to 2.6 Å, enabling a detailed structural analysis. 
The structure of the pre-release state shared key architectural features with UAP56–TREX-2M, including the anchoring of the NTD of UAP56 at the base of the SAC3D1–PS complex. Truncating the UAP56 NTD reduced the affinity of UAP56 for both SAC3D1–PSM and LENG8–PSM by more than 30-fold, as measured by grating-coupled interferometry (Extended Data Fig. 2c) and supported by in vitro pulldown assays (Extended Data Fig. 1a,b, lanes 6 and 7). Thus, the UAP56 NTD is equally important for TREX-2-like complex and TREX-2-complex7 interactions. In addition, the UAP56–SAC3D1–PSM structure provided insights into the action of the wedge loop. In the structure, this region (residues Y100–P111; Fig. 1b and Extended Data Fig. 1e) is bound near the two RecA lobes through largely electrostatic interactions between the peptide backbone and UAP56 residues R135 in RecA1 and K334 in RecA2 (Fig. 1g and Extended Data Fig. 2d). The critical R102 wedge loop residue in SAC3D1 forms a hydrogen bond with UAP56 E354, positioning R102 close to F381 in the RecA2 lobe of UAP56. By contrast, in the post-release state observed for UAP56–TREX-2M, this central wedge loop arginine (R102 in SAC3D1, R678 in GANP) replaced UAP56 F381 in the nucleotide binding site (Fig. 1g, right). The positioning of the wedge loop arginine in the clamped state might prime it to replace UAP56 F381 in a subsequent step, releasing RNA from UAP56 (Fig. 1g,h and ref. 7).
The RNA-clamped RecA lobes of UAP56 are bound between PCID2, the wedge loop and the SAC3 domain in these SAC3 domain-containing complexes. Notably, the protein–protein interfaces between UAP56 and PCID2 in both TREX-2M and the TREX-2M-like complexes involve only a few specific interactions, except for the UAP56 NTD7, suggesting that PCID2–SEM1 has an architectural role in ensuring specificity for UAP56. Indeed, superposition of the evolutionarily related and RNA-bound form of the DExD-box ATPase EIF4A3 (ref. 16) onto the UAP56–SAC3D1–PSM structure revealed clashes between EIF4A3 and PCID2 (Extended Data Fig. 2e). Consistently, LENG8–PSM bound UAP56, but not the closely related DExD-box proteins EIF4A3 and DDX19 in vitro (Extended Data Fig. 2f and Supplementary Fig. 2d) and did not stimulate the EIF4A3 ATPase (Extended Data Fig. 2g).
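The superposition argument above amounts to asking whether atoms of the superposed helicase come too close to atoms of PCID2. A minimal Python sketch of such a check is given below, assuming that both sets of coordinates have already been superposed into a common frame. The function name, the 3.0 Å distance cutoff and the toy coordinates are hypothetical; the source does not state how clashes were evaluated.

```python
import math


def count_clashes(atoms_a, atoms_b, cutoff=3.0):
    """Count atom pairs, one from each set, closer than `cutoff` ångströms.

    atoms_a and atoms_b are sequences of (x, y, z) coordinates in the same
    frame. The default cutoff is an illustrative choice, not a value from
    the study.
    """
    return sum(
        1
        for a in atoms_a
        for b in atoms_b
        if math.dist(a, b) < cutoff
    )


# Toy coordinates standing in for a superposed helicase and PCID2.
helicase_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
pcid2_atoms = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(count_clashes(helicase_atoms, pcid2_atoms))
```

This all-pairs loop is quadratic in the number of atoms; for full-size structures a spatial index such as a k-d tree or cell list would usually be preferred.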
We conclude that human cells contain three structurally and biochemically equivalent SAC3 domain-containing complexe

