To lay the foundation for a new avenue of research into engineering nitrogen-fixing endosymbionts, we investigated protein transport to B. bigelowii's symbiotic partner UCYN-A (the nitroplast) and carried out proof-of-principle endosymbiosis experiments.
The majority of proteins within mitochondria and chloroplasts are nuclear-encoded: they are expressed by the host and imported into the organelle. Proteins destined for an organelle are usually marked by a targeting sequence at one terminus, also known as a transit peptide, which directs the protein to its destination and is then cleaved off.
This is also the case for UCYN-A: in their 2024 study, Coale et al. [1] used proteomics to identify proteins encoded by the host and imported into the nitroplast. Examining these sequences, they noticed that many show hallmarks of organellar import: most carry a 120-amino-acid C-terminal extension compared with their orthologues. This extension is reminiscent of the targeting sequences of proteins imported into mitochondria [2] and chloroplasts [3]. They termed the putative targeting sequence uTP (UCYN-A Transit Peptide, with a lowercase “u” to distinguish it from uridine triphosphate).
Our investigations began with an in-depth computational analysis of B. bigelowii's proteome [1], [7] to identify potential signals marking proteins for import into UCYN-A. Based on these results, we designed fluorescent protein-transit peptide constructs for expression in model organisms to test whether the identified signals direct proteins to UCYN-A. To pave the way for transplanting the nitroplast into new organisms, we also explored the feasibility of physically inserting UCYN-A into a new host through cell fusion experiments. Furthermore, we established a culture of B. bigelowii and tested a new protocol for isolating UCYN-A. Together, these experiments aim to elucidate the mechanisms of UCYN-A's endosymbiotic relationship and lay the groundwork for future engineering of nitrogen-fixing symbionts into new host organisms.
If, like other organelles, UCYN-A relies on proteins imported from the host for normal functioning, characterizing the import system and the targeting sequence is essential before transplanting the organelle into a new host organism. Building upon the work of Coale et al. [1], we aimed to advance the understanding of uTP by identifying its precise sequence.
Starting from the raw proteomics data of [1], we selected 368 host-encoded proteins significantly enriched in UCYN-A and performed a multiple sequence alignment (MSA). Using the alignment, we identified a strongly conserved C-terminal region in many of the imported proteins, similar to that reported by [1]. We then selected a subset of 206 proteins with highly similar (>60% sequence identity) C-terminal alignments, indicating that these are likely to contain uTP.
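The identity-based filtering step can be sketched as follows. The exact criterion is not stated beyond ">60% sequence identity", so this sketch assumes that identity is measured between each protein's last `window` alignment columns and the column-wise majority consensus of those columns, counting only gap-free positions. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_seqs):
    """Majority residue per alignment column, ignoring gaps ('-'); '-' if a column is all gaps."""
    consensus = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != "-"]
        consensus.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(consensus)


def percent_identity(a, b):
    """Fraction of identical residues between two aligned sequences, over gap-free columns."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def select_by_c_terminal_identity(alignment, window, threshold=0.6):
    """IDs whose last `window` alignment columns exceed `threshold` identity to the consensus."""
    tails = {pid: seq[-window:] for pid, seq in alignment.items()}
    consensus = column_consensus(list(tails.values()))
    return [pid for pid, tail in tails.items() if percent_identity(tail, consensus) > threshold]


# Hypothetical toy alignment (equal-length, gapped sequences).
alignment = {
    "p1": "MKT--LLWEEWRER",
    "p2": "MA-KTLLWEEWRER",
    "p3": "MKTQQ-LWEEWKER",
    "p4": "MQQQQQQAGSTPLA",
}
print(select_by_c_terminal_identity(alignment, window=7))  # -> ['p1', 'p2', 'p3']
```

Comparing against a consensus keeps the cost linear in the number of proteins; an all-versus-all pairwise identity matrix would be a stricter but quadratic alternative.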
Consistent with [1], motif analysis revealed 8 conserved motifs in the C-terminal region (Fig 2). Further investigation of motif co-occurrence and relative positioning uncovered common patterns: two motifs consistently appeared near the start of the C-terminal region at fixed positions, followed by various combinations of the remaining motifs. This arrangement suggests a possible sub-organellar localization mechanism, in which the first two motifs could target proteins to UCYN-A while the subsequent motifs specify localization within the endosymbiont. This would be analogous to chloroplast targeting, where a bipartite N-terminal targeting sequence specifies stromal and thylakoidal localization. However, more research is needed to test this hypothesis.
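The co-occurrence and positioning analysis can be sketched as below. This assumes that motif hits are available per protein as (motif ID, start position within the C-terminal region) pairs, for instance from a motif scanner. The data layout, function names and toy hits are hypothetical. A pattern is read as the motif IDs ordered by start position, and a motif whose start position barely varies across proteins is a candidate for the "fixed position" motifs described above.

```python
from collections import Counter
from statistics import pstdev


def motif_patterns(hits):
    """Map each protein to its motif IDs ordered by start position (its 'pattern')."""
    return {
        pid: tuple(motif for motif, _ in sorted(protein_hits, key=lambda hit: hit[1]))
        for pid, protein_hits in hits.items()
    }


def pattern_counts(hits):
    """How many proteins share each ordered motif pattern."""
    return Counter(motif_patterns(hits).values())


def position_spread(hits):
    """Population standard deviation of each motif's start position across proteins.

    A spread near zero marks a motif that sits at a fixed position in the region.
    """
    starts = {}
    for protein_hits in hits.values():
        for motif, start in protein_hits:
            starts.setdefault(motif, []).append(start)
    return {motif: pstdev(positions) for motif, positions in starts.items()}


# Hypothetical hits: (motif ID, start position within the C-terminal region).
hits = {
    "p1": [("M1", 0), ("M2", 20), ("M5", 45)],
    "p2": [("M2", 20), ("M1", 0), ("M3", 52), ("M5", 80)],
    "p3": [("M1", 0), ("M2", 20), ("M5", 60)],
}
print(pattern_counts(hits).most_common())
print(position_spread(hits))
```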
We investigated the relationship between uTP sequences and the mature domain: the functional core of a protein that remains after the transit peptide is cleaved off and that performs the protein's primary function. Given the observed diversity in uTP sequences, understanding their connection to specific mature domains is crucial for designing effective uTP constructs for future experiments. Because certain uTP variants may only be compatible with specific proteins, we explored potential correlations between uTP motif patterns and mature domain sequences by training classifiers to predict the appropriate uTP (that is, the correct combination of motifs) from a given mature domain sequence. The classifiers were evaluated using a permutation test [10] (Fig 4): all four yielded p < 0.05, and three of them (Logistic Regression, Random Forest and SVC) reached p = 0.001.
Classifier | Accuracy | Permutation p-value | F1 score |
---|---|---|---|
Logistic Regression | 68% | 0.0010 | 0.61 |
Random Forest | 68% | 0.0010 | 0.49 |
SVC | 45% | 0.0010 | 0.34 |
Decision Tree | 9% | 0.0380 | 0.13 |
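The permutation test asks how often a classifier trained on randomly shuffled labels scores at least as well as on the real labels. Below is a minimal sketch using the estimator p = (1 + #{permuted score ≥ observed}) / (1 + n_permutations), which is the same estimator used by scikit-learn's `permutation_test_score`. Our actual classifiers (logistic regression, random forest, SVC, decision tree) are replaced here by a leave-one-out 1-nearest-neighbour classifier on toy 2-D features, chosen only to keep the example dependency-free. The function names and data are hypothetical.

```python
import random


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    n = len(X)
    correct = 0
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / n


def permutation_test(score_fn, X, y, n_permutations=999, seed=0):
    """Return (observed score, p-value) of score_fn under random label permutations.

    p = (1 + #{permuted score >= observed}) / (1 + n_permutations), so the
    smallest attainable p-value is 1 / (n_permutations + 1).
    """
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between features and labels
        if score_fn(X, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Toy data: two well-separated clusters, so the labels are genuinely predictable.
X = [(0.1 * i, 0.0) for i in range(10)] + [(5 + 0.1 * i, 5.0) for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
accuracy, p_value = permutation_test(loo_1nn_accuracy, X, y, n_permutations=199)
print(accuracy, p_value)
```

Because of the "+1" terms, the p-value can never be smaller than 1/(n_permutations + 1). For example, if 999 permutations were used (the number is an assumption, as it is not stated here), 0.001 would be the floor, reached when no permuted label set scored as well as the real labels.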
For in vivo characterization, we constructed candidate uTP sequences by concatenating the consensus sequences of the discovered motif patterns. The uTP classifiers were used to select the most suitable motif combination for the fluorescent proteins we planned to use, mVenus and mNeonGreen, and the two sequences with the highest confidence values (uTP1 and uTP2) were selected for in vivo experiments. These sequences were also submitted to the Parts Registry.
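The construction and selection of candidate uTPs can be sketched as below. This assumes equal-length motif instances (as produced by typical motif finders) and a classifier that outputs a probability per motif pattern for a given mature domain. The motif IDs, toy sequences, probabilities and function names are all hypothetical placeholders; they are not the motifs or scores behind uTP1 and uTP2.

```python
from collections import Counter


def consensus(instances):
    """Majority residue at each position of equal-length motif instances."""
    if len({len(s) for s in instances}) != 1:
        raise ValueError("motif instances must all have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*instances))


def build_utp(pattern, motif_consensus):
    """Concatenate the consensus sequences of the motifs in `pattern`, in order."""
    return "".join(motif_consensus[m] for m in pattern)


def top_candidates(pattern_probs, motif_consensus, k=2):
    """The k most probable motif patterns, turned into candidate uTP sequences."""
    ranked = sorted(pattern_probs.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(build_utp(pattern, motif_consensus), prob) for pattern, prob in ranked]


# Hypothetical motif instances and classifier output for one mature domain.
motif_consensus = {
    "M1": consensus(["ACDEF", "ACDEY", "AKDEF"]),
    "M2": consensus(["GHIKL", "GHIKL", "GHVKL"]),
    "M3": consensus(["MNPQR", "MNPQR", "MSPQR"]),
}
pattern_probs = {("M1", "M2"): 0.7, ("M1", "M3"): 0.2, ("M1", "M2", "M3"): 0.1}
print(top_candidates(pattern_probs, motif_consensus))
```

Plain concatenation inserts no linkers between motifs; whether spacer residues are needed between them is a separate design question that this sketch does not address.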
uTP1 Amino Acid Sequence
WLEEWRERLECWWGPVGTQTQLGACMGALGLHLGSRLDNEQETQTISAIVAEPGCEWVEEAAPGLPDFPEPFSLPPIPRL
uTP2 Amino Acid Sequence
WLEEWRERLECWWLDPKTQTQLGACMGALGLHLGSRLDIAPYFAWRAALLGRAPPPSARAEPGCEWVEEALDDLPDFPEPFSLPPIPRL
To further validate the constructed sequences, we examined their predicted structures. Structure prediction was performed on all 206 selected uTP-containing B. bigelowii proteins to uncover the 3D conformation of uTP. The predicted structures were aligned, and a consensus structure was created by averaging the aligned regions. This revealed a highly conserved structural region (per-residue standard deviation < 1.8 Å) consisting of two alpha-helices arranged in a U-bend (Fig 5). We then predicted the structures of mNeonGreen and mVenus fused to uTP1 or uTP2 and aligned the consensus structure onto them, obtaining good agreement (RMSD ≤ 4.0 Å). This suggests that the uTP in our constructs is likely to adopt a conformation similar to that in native uTP-containing proteins.
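Both numbers above rely on superposing structures before measuring distances. The prediction and alignment tools are not named here, so the sketch below uses the standard Kabsch algorithm (via NumPy's SVD) as a stand-in for computing the RMSD after optimal rigid superposition. It also assumes that the "per-residue standard deviation" is the RMS distance of each residue's superposed position from the mean position, which may differ from the exact definition used. Function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimally superposing P onto Q (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation taking P onto Q
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def per_residue_spread(aligned):
    """Per-residue RMS distance to the mean position for an (M, N, 3) stack of superposed structures."""
    aligned = np.asarray(aligned, dtype=float)
    mean = aligned.mean(axis=0)  # (N, 3) consensus coordinates
    return np.sqrt(((aligned - mean) ** 2).sum(axis=2).mean(axis=0))


# Sanity check: a rotated and translated copy superposes back with ~0 RMSD.
rng = np.random.default_rng(0)
P = rng.normal(size=(25, 3))
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(P, Q))
```

Note that both functions assume residue i in one structure corresponds to residue i in the other. Establishing that correspondence (the structural alignment itself) is a separate step not shown here.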
For mitochondrial and chloroplast proteins there is a well-known overlap: numerous proteins carry twin or ambiguous targeting sequences that direct them to both organelles. To investigate whether the putative UCYN-A import system similarly overlaps with other known cellular transport systems, we ran established protein localization prediction tools on the list of putative UCYN-A-imported proteins. The predictions were largely inconclusive, although a sizeable minority (28%) of the proteins were classified as secreted (Fig 6). This suggests that the UCYN-A import system, like other protein transport mechanisms, might be related to the Sec system.
We also searched public databases (NCBI, PDB, AlphaFold/Proteome) for sequence and structural homologs of uTP and found no significant matches.
The inconclusive homology search and localization prediction results, together with the fact that transport signals are in most cases found at the N-terminus of proteins [2],[3],[8], whereas uTP lies at the C-terminus, suggest that the protein import machinery associated with uTP is quite distinct from other known systems in the cell.
The ultimate goal with the uTP sequences we identified is to determine whether they are indeed responsible for protein import into UCYN-A. Conventional methods to test this would require a genetic manipulation toolbox for B. bigelowii, which is not yet available and is beyond the scope of this project. We therefore turned to two model eukaryotes, C. reinhardtii and S. cerevisiae, to study uTP's behavior, and designed an experiment to test uTP's function without modifying B. bigelowii.
We started from an S. cerevisiae backbone, pUDE1311, and a C. reinhardtii backbone, pOpt2-mVenusBle, to design constructs expressing fluorescent proteins (FPs) tagged with either known transit peptides or uTP. Unmodified, pUDE1311 expresses ymNeongreen and pOpt2-mVenusBle expresses mVenus, a YFP variant. Both carry AmpR for selection in E. coli; in addition, pOpt carries a Zeocin resistance gene for selection in C. reinhardtii, while pUDE carries URA3 for auxotrophic selection in S. cerevisiae CEN.PK 113-5D, a uracil-auxotrophic strain. We designed two constructs for expression in yeast and three in algae. In yeast, one construct had uTP fused to the C-terminus of ymNeongreen, and the other had MTS1, a mitochondrial transit peptide [2], fused to the N-terminus of the FP. In algae, one construct had uTP fused to the C-terminus of mVenus, while the other two had a chloroplast transit peptide (cTP, [3]) or a mitochondrial transit peptide (mTP, [11]) fused to the N-terminus of mVenus. Plasmid maps for our constructs and vectors can be found on our Materials and Methods page. We planned to observe where uTP-tagged FPs localize in these species, which lack UCYN-A; based on the dry-lab analysis above, we hypothesized that they would be distributed uniformly throughout the cytoplasm. Cells transformed with the unmodified pUDE and pOpt plasmids would serve as controls for uniform cytoplasmic distribution, and cells expressing FPs fused to the known transit peptides as controls for organellar localization.
We used Gibson assembly to construct the uTP-FP and transit-peptide-FP plasmids and transformed E. coli with them. Transformants were plated on selective medium, colonies from each plate were screened by colony PCR to confirm the presence of the insert, and positive colonies were grown in liquid culture to amplify the plasmid for isolation. The isolated plasmid constructs were then used to transform S. cerevisiae and C. reinhardtii, and transformants were selected by colony PCR for analysis. For our transformation, selection and culturing protocols, see our Materials and Methods page.
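Gibson assembly joins fragments through terminal sequence homology, so each insert is amplified with primers whose 5' tails copy the vector sequence flanking the insertion site. Below is a minimal sketch, assuming a circular vector linearised at a single position `cut_pos`. The 20-nt arm and annealing lengths are placeholder defaults, not the values used in our design. Real primer design would also check melting temperature and secondary structure, which this sketch does not do. All names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def gibson_primers(insert, vector, cut_pos, overlap=20, anneal=20):
    """Forward/reverse primers adding vector homology arms to `insert`.

    The circular vector is assumed to be opened between vector[cut_pos - 1] and
    vector[cut_pos]; arms wrap around the origin if needed.
    """
    n = len(vector)
    circ = vector + vector  # lets the arms span the origin of a circular plasmid
    upstream = circ[(cut_pos - overlap) % n:][:overlap]  # vector sequence left of the cut
    downstream = circ[cut_pos % n:][:overlap]            # vector sequence right of the cut
    forward = upstream + insert[:anneal]                  # 5' arm + insert start
    reverse = reverse_complement(insert[-anneal:] + downstream)
    return forward, reverse


# Toy example with short arms so the output is easy to read.
vector = "AAAACCCCGGGGTTTTACGT"
fwd, rev = gibson_primers("ATGCATGCAA", vector, cut_pos=8, overlap=4, anneal=4)
print(fwd, rev)
```

The expected assembled plasmid is then simply `vector[:cut_pos] + insert + vector[cut_pos:]`, which is the reference to compare sequencing reads against.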
All inserts appeared correct in the diagnostic PCRs. Due to technical difficulties, however, we could not send our assembled constructs for sequencing until very late in the project, and only when the results came back did we notice that our pUDE1311-uTP2 construct was not what we had intended. Through an unexplained artifact of our Gibson assembly procedure, the uTP insert had assembled into pUC19, a plasmid used as a control for the assembly. We do not know how this happened, as the overhangs on the uTP insert have no overlap anywhere on pUC19. In addition, a few hundred base pairs between the pUC19 sequence and the insert match neither, which is also puzzling. The sequencing results can be found at the bottom of our Materials and Methods page. Because pUC19 lacks both the fluorescent protein and a yeast promoter, this setback, coming so late in our timeline, prevented us from progressing further with our experiments. However, imaging our two controls, S. cerevisiae transformed with pUDE1311 and with the pUDE1311-MTS1 construct, showed that the experimental pipeline works despite the problems with the uTP construct.
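A claim such as "the insert overhangs have no overlap with pUC19" can be checked by searching for each homology arm on both strands of the circular plasmid. Below is a minimal exact-match sketch; function names and toy sequences are hypothetical. It cannot detect partial or mismatched homology, for which an alignment tool such as BLAST is more appropriate.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_exact_matches(query, plasmid):
    """(strand, start) of every exact match of `query` on a circular plasmid.

    '+' hits are matches of the query itself; '-' hits are matches of its
    reverse complement, i.e. the query lies on the opposite strand.
    """
    query, plasmid = query.upper(), plasmid.upper()
    if not query:
        raise ValueError("query must be non-empty")
    circ = plasmid + plasmid[: len(query) - 1]  # allow matches spanning the origin
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        start = circ.find(q)
        while start != -1:
            hits.append((strand, start))
            start = circ.find(q, start + 1)
    return hits


plasmid = "AAAACCCCGGGGTTTTACGT"
print(find_exact_matches("ACGTAA", plasmid))   # match spanning the origin, plus its other-strand hit
print(find_exact_matches("GATTACA", plasmid))  # no match on either strand
```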
Because of time constraints and the long culture time required for selection after transformation, we were unable to image our C. reinhardtii transformants.
To verify the function of uTP in vivo, we aimed to deliver purified His-tagged uTP-mNeongreen from our transformed S. cerevisiae cultures into B. bigelowii cells by electroporation, followed by high-resolution imaging to confirm localization of the fluorescent protein to the nitroplast. However, this experiment had to be abandoned because uTP had not been correctly assembled into the yeast backbone, as described above.
Thanks to the generous help of Dr. Kyoko Hagino, a pioneer in B. bigelowii research, we obtained a culture of B. bigelowii FR-21 [1]. This species is known to be difficult to work with; nevertheless, we found suitable growth conditions and cultured it in our lab in Delft, establishing, to our knowledge, the first B. bigelowii culture in Europe. Our culture conditions, which follow Dr. Hagino's advice, can be found on our Materials and Methods page.
To enable close study of UCYN-A by future iGEM teams and lay groundwork for UCYN-A transplantation efforts, we developed a new protocol for isolating it from B. bigelowii cultures that is simpler than the method reported by [1]. Whereas Coale et al. relied on a multistep procedure involving a Percoll gradient and centrifugation, we lysed the host cells and then separated the lysate with a sorting flow cytometer. We confirmed the presence of UCYN-A in the isolate by PCR, using primers targeting a prokaryote-specific region of the 16S rRNA gene.
The sorted populations were not completely pure: UCYN-A was present in each of the three fractions. However, comparing our cytometry plots with those in [1], we hypothesize that the third fraction is the most enriched in UCYN-A. This is supported by the relative band intensities on the PCR gel, where the third lane is the strongest. Since an equal number of cells was collected for each sample and all PCR conditions were identical, this suggests that the third population had the highest concentration of UCYN-A, although endpoint PCR band intensity is only a rough, semi-quantitative indicator of the amount of template DNA in the starting sample.
For future investigations, we recommend using qPCR to quantify UCYN-A DNA in different isolates more precisely.
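The following sketch shows how qPCR would make this comparison quantitative. A standard curve from a dilution series relates Ct to log10 starting copies. Its slope gives the amplification efficiency via E = 10^(-1/slope) - 1, and samples can then be quantified absolutely from the curve or compared by a fold difference of (1 + E)^ΔCt. The dilution series and Ct values below are hypothetical, and E = 1.0 (perfect doubling per cycle) is assumed unless measured.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept for a dilution series."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope):
    """Efficiency E from the standard-curve slope; E = 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Starting copy number implied by a Ct value on the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)


def fold_difference(ct_a, ct_b, efficiency=1.0):
    """How many times more template sample A started with than sample B."""
    return (1 + efficiency) ** (ct_b - ct_a)


# Hypothetical dilution series (10^2 to 10^6 copies) with a slope close to -3.32.
log10_copies = [2, 3, 4, 5, 6]
cts = [38.0 - 3.32 * x for x in log10_copies]
slope, intercept = fit_standard_curve(log10_copies, cts)
print(round(amplification_efficiency(slope), 3))
print(fold_difference(ct_a=24.0, ct_b=26.0))  # -> 4.0 at 100% efficiency
```

Normalising each fraction's UCYN-A signal to the number of sorted cells, as done for the endpoint PCR above, would still be needed to compare fractions fairly.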
Once the proteins that a host must supply to UCYN-A and their import mechanism have been identified, the next step towards transplanting UCYN-A would be to physically insert it into the new host cells. We investigated a protocol for this insertion step, using S. cerevisiae as the host and E. coli as a stand-in for UCYN-A.
Fusion of E. coli into S. cerevisiae, using polyethylene glycol (PEG) to make the host cells permeable, has been reported previously [5]. As a first step, we attempted to replicate these results, using E. coli NCM3722 expressing PlsB-msGFP2 and fluorescence microscopy to assess the outcome of fusion.
Despite multiple attempts at the fusion procedure, we were unable to obtain conclusive results. GFP expression in our E. coli strain was weak, which we hypothesize was caused by stress during the fusion procedure, so the yeast cells' native autofluorescence interfered with fluorescence microscopy: its intensity was too similar to that of the E. coli for us to draw conclusions from our images. To circumvent these problems, we proposed either selectively staining bacterial RNA or staining E. coli with DAPI prior to fusion. Due to time constraints, we could not implement either of these changes.
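The Z'-factor is a standard assay-quality metric that can put a number on "too similar". It compares per-cell intensities of a signal population (for example fluorescent E. coli) with a background population (for example yeast autofluorescence): Z' = 1 - 3(σ_signal + σ_background) / |μ_signal - μ_background|. Values of 0.5 or higher are conventionally taken as good separation, while values at or below 0 mean the two distributions overlap. This is a suggested check for future staining experiments, not part of the analysis we performed, and the intensities below are made up.

```python
from statistics import mean, stdev


def z_prime(signal, background):
    """Z'-factor separating two intensity distributions (needs >= 2 values each)."""
    gap = abs(mean(signal) - mean(background))
    if gap == 0:
        return float("-inf")  # identical means: no separation at all
    return 1 - 3 * (stdev(signal) + stdev(background)) / gap


# Made-up per-cell intensities (arbitrary units).
well_separated = z_prime([100, 102, 98], [10, 12, 8])
overlapping = z_prime([10, 20, 30], [12, 22, 28])
print(round(well_separated, 3), round(overlapping, 1))
```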