Evolution stands as one of nature's most remarkable and powerful forces. Through countless random mutations across the genome, paired with the persistent selection of the variants best adapted to their environment, life on Earth has flourished in unimaginable ways over billions of years. This complex dance of change and survival has given rise to the vast array of biological processes that we witness today.
As college students, we stand on the edge of discovery and innovation, much like the organisms that have developed through the ages.
Embrace the infinite possibilities that evolution has laid before us. Every challenge you face is an opportunity for growth, like the natural selection that shapes life itself. Just as species adapt to their surroundings, we too can adapt, learn, and evolve in our own journey.
Let the wave of evolution inspire everyone to go beyond the boundaries, explore new ideas, and contribute to the ever-evolving narrative of life. Remember, with each step you take, you are participating in the magnificent story of existence: an undefined adventure waiting to unfold. Seize it!
INSPIRATION
This year, the iGEM 2024 Évry Paris-Saclay team focused on developing a powerful tool for in vivo directed evolution of proteins. Protein evolution, especially enzyme evolution, plays a key role in developing and optimizing new metabolic pathways, creating innovative industrial processes, and designing in vivo and in vitro enzymatic biosensors. Protein engineering consists of mutating DNA sequences to generate variants with enhanced or novel properties, such as stability and specific biochemical functions, including specificity, regioselectivity, and stereoselectivity. It can be managed through three main strategies: i) rational design, ii) semi-rational approaches, and iii) directed evolution.
Rational design focuses on creating enzyme variants by targeting specific residues through site-directed mutagenesis, which requires a deep understanding of the enzyme's 3D structure, mechanisms, and biochemical parameters. This method generates small, focused libraries with low genetic diversity.
Semi-rational approaches combine site-directed mutagenesis and directed evolution, focusing on specific sequences, motifs, or regions of the enzyme without requiring full knowledge of its structure.
These two approaches employ in silico tools, including predictive computational approaches and de novo protein design, to leverage data accumulated over 3.8 billion years of evolution. Structure-based strategies use Multiple Sequence Alignment (MSA) to identify consensus sequences, functional domains, and target regions to mutate. Recently, quantum mechanical and molecular dynamics calculations, combined with machine-learning algorithms, were exploited to develop tools such as RoseTTAFold [Watson et al., 2022], PocketGen [Zhang et al., 2024], AlphaFold3 (AF3) [Abramson et al., 2024], 310.AI, or EvolutionaryScale [Hayes et al., 2024].
Directed evolution introduces random mutations into the gene of interest (GOI) without previous knowledge of the enzyme's structure or function (Figure 1). One of the best-known methods is the continuous culture of strains whose growth depends on enzyme activity or on the ability to grow in a stressful medium, an approach known as adaptive evolution. However, this often results in a limited number of variants with low genetic diversity [Bloom & Arnold, 2009; Wang et al., 2021].
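As an illustration of how an MSA can point to consensus residues and candidate positions to mutate, here is a minimal sketch in Python. The aligned sequences are invented toy data rather than real XylS homologs, and the helper names `consensus` and `conservation` are hypothetical; real pipelines rely on dedicated alignment and scoring tools.

```python
from collections import Counter

GAP = "-"


def consensus(alignment):
    """Return the most frequent non-gap residue at each aligned column."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    residues = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != GAP)
        # A column made only of gaps keeps the gap symbol.
        residues.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(residues)


def conservation(alignment):
    """Fraction of sequences that match the consensus at each column."""
    cons = consensus(alignment)
    n = len(alignment)
    return [sum(seq[i] == cons[i] for seq in alignment) / n
            for i in range(len(cons))]


# Hypothetical toy alignment (not real protein data).
toy_alignment = ["MKTLV", "MKSLV", "MRTLV", "MKT-V"]

print("consensus:", consensus(toy_alignment))
# Columns that are not fully conserved are candidate regions to mutate.
variable = [i for i, score in enumerate(conservation(toy_alignment)) if score < 1.0]
print("variable columns (0-based):", variable)
```

Fully conserved columns often mark residues that are risky to change, whereas variable columns are more tolerant targets. This is only a heuristic reading of the alignment.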
Figure 1. Directed evolution cycle (adapted from Moore et al., 2018).
To increase genetic diversity, large variant libraries can be generated using both in vitro and in vivo directed evolution tools [Morrison et al., 2020]. Various methods exist for in vitro directed evolution, such as the widely used error-prone polymerase chain reaction (ep-PCR), though these will not be discussed here as they fall outside the scope of our project. Although creating diversity is not the limiting step of in vitro directed evolution, these approaches are time- and resource-intensive. Success depends on the ability to screen large numbers of mutants to identify the 'optimal' variant. Traditional screening in 96-well plates can process a few hundred variants, whereas more advanced microfluidic-based methods can screen billions [Gantz et al., 2023], though these require specialized equipment and expertise.
Recently, in vivo mutagenesis techniques have been developed to accelerate and improve targeted protein evolution by combining genetic diversification and selection directly within the organism, reducing the need for ultra-high-throughput screening. In addition, in vivo directed evolution experiments are typically cheaper to carry out than their in vitro counterparts: because the mutagenesis occurs in the living cell, they require fewer reagents and enzymes and less equipment.
Early in vivo methods used UV irradiation or mutagenic chemicals added to the culture medium to generate genetic diversity. These approaches are cost-effective, require no specialized equipment, and are easy to implement. However, since they involve non-targeted mutagenesis, mutations occur randomly throughout the genome, necessitating a time-consuming screening process to identify the desired variants. Despite this limitation, these methods are still employed for generating GMO-free microorganisms and proteins.
More recently, phage-assisted continuous evolution (PACE) was developed as an advanced system that automatically selects the variants that most enhance the activity of the protein of interest during the evolutionary process [Esvelt et al., 2011; Miller et al., 2020].
Targeted in vivo mutagenesis methods have since attracted growing interest as a way to overcome this lack of specificity and the resulting loss of efficiency. CRISPR/Cas9, the well-known genome editing method introduced in 2012, revolutionized the field of gene editing by enabling precise and targeted modifications to the DNA of living organisms. The system was further developed to allow mutations to be generated in a window around a specific genomic location by fusing a non-cleaving dead Cas9 (dCas9) to base deaminases and other base-modifying enzymes [Komor et al., 2016; Gaudelli et al., 2017; Zhao et al., 2021; Tong et al., 2023], to name but a few. The main drawback of CRISPR-based targeted evolution methods is that they can only target a limited number of bases in the vicinity of the guide RNA binding site, usually fewer than 100 bases [Hess et al., 2016]. While highly efficient, these methods can cause off-target effects, their success is influenced by sequence composition, and they can be expensive.
Recently, Tian et al. [2024] introduced an innovative approach using an orthogonal DNA polymerase (O-DNAP) to create an independent replication system within Escherichia coli cells. This system generates mutations exclusively in the targeted replicon, without disrupting normal cellular processes.
Targeting mutations to a specific sequence can also be achieved during transcription. Tools based on bacteriophage T7 RNA polymerase (T7RNAP) [Moore et al., 2018] have proven highly effective. These systems rely on a T7RNAP fused to a base deaminase (BD); the polymerase specifically recognizes a T7 promoter placed upstream of the coding sequence (CDS) of the GOI, in E. coli but also in eukaryotic cells. During transcription of the target gene, the deaminase within the BD-T7RNAP fusion protein randomly deaminates nucleotides, mainly on the non-template strand (the strand that T7RNAP does not read). After DNA replication, these deaminated bases give rise to C→T and A→G mutations carried by both strands (Figure 4). Several methods leverage BD-T7RNAP fusions, including the MutaT7 system [Moore et al., 2018], T7-DIVA [Álvarez et al., 2020], and enhanced MutaT7 (eMutaT7) [Park & Kim, 2021], which aim to improve the MutaT7 system by increasing its efficiency and mutation rate.
Ideally, an in vivo directed evolution system should combine a mutagenesis rate comparable to that of in vitro methods such as error-prone PCR with an autonomous selection system that amplifies the best-adapted sequences and removes the need for high-throughput screening [Wang et al., 2021].
PHAGEVO
PROJECT OVERVIEW
The aim of our project is to develop a powerful directed evolution tool by combining two technologies: PANCE (Phage-Assisted Non-Continuous Evolution), which allows directed evolution with built-in selection of the best variants [Esvelt et al., 2011; Miller et al., 2020], and Evolution.T7, the tool developed by the iGEM Evry Paris-Saclay 2021 team [https://2021.igem.org/Team:Evry_Paris-Saclay], which allows targeted mutagenesis of a specific gene of interest.
Among available in vivo continuous directed evolution methods, the PACE technology and its derivatives such as PANCE are among the most commonly used [McLure et al., 2022]. PANCE avoids a long and tedious screening step, because the phage population recovered at the end of the experiment should already show improved fitness under the applied selective pressure compared with the initial phage population. However, its evolution step relies on random mutagenesis of the whole system and not only of the gene of interest. Interesting variants can therefore be lost if the system becomes non-functional due to mutations in genes essential for bacterial survival, bacteriophage reproduction, or selection of the mutated protein.
In contrast, Evolution.T7 is a targeted in vivo mutagenesis method that focuses mutations on a specific gene sequence but lacks a built-in selection step such as the one PANCE provides. It therefore remains necessary to screen a large number of variants after the mutagenesis step to recover the interesting ones.
By combining these two methods in PHAGEVO, we aim to accelerate and improve the discovery of interesting new variants by uniting the best aspects of in vivo continuous evolution and targeted mutagenesis, with the objective of increasing mutation rates and selective pressure while reducing the failure rate of the system.
In parallel, we aim to compare the ability of the PHAGEVO system to find the best variants of a protein with that of an AI-based model. We could summarize it as: “Who is the better engineer, nature or AI?”
As a proof of concept, our project focuses on the evolution of a transcription factor, XylS, recently engineered to detect plastic degradation metabolites [Li et al., 2022]. This mutated XylS could then be used to characterize the activity of plastic degradation enzymes, addressing the lack of an efficient detection system for their activity. It could also be used to detect small quantities of plastic pollution in water, where current techniques cannot detect plastic particles smaller than 100 nm [Qian et al., 2024].
Finally, our new directed evolution system and our nature-versus-AI comparison (who's the best?) could be applied to many other biological topics and fields that involve proteins.
THE PHAGEVO TECHNOLOGY
PACE is a continuous directed-evolution system based on the M13 phage [Esvelt et al., 2011], and PANCE is its non-continuous version [Miller et al., 2020; Roth et al., 2019] (Figure 2). We chose the non-continuous version as it is easier to set up in the lab and is high-throughput. The infectivity of M13 phages depends on the filamentous phage protein III (pIII), encoded by gene III (gIII), which binds specifically to the F-pilus for entry into the bacterial cytoplasm. In PANCE, gIII is deleted from the phage genome and replaced by the gene of the protein to evolve (protein of interest, POI), giving the selection phage (SP in Figure 2). gIII is instead expressed from a separate plasmid (accessory plasmid, AP in Figure 2). To evolve the POI toward the desired function, the expression level of gIII must depend on the POI and rise with POI fitness. As a result, the only phages to display pIII on their capsid (and therefore the only infective phages) are those whose genome encodes an active POI. Deleterious mutations of the POI are eliminated because phages carrying an inactive POI lack pIII and are unable to infect bacteria and multiply. Over several cycles of infection, the medium becomes enriched in phages expressing a POI with beneficial mutations, through competition between the different phages. In classical PACE/PANCE, host bacterial cells also carry a mutagenesis plasmid (MP in Figure 2) to increase mutation rates. The most potent one is MP6, which expresses dnaQ926, dam, seqA, emrR, ugi and PmCDA1; together these genes introduce mutations, impair the DNA repair system, and interfere with the DNA methylation system.
Figure 2. The principle of PACE and PANCE systems for continuous and non-continuous evolution with SP for selection phage, AP for accessory plasmid, MP for mutagenesis plasmid and POI for protein of interest. The red dots correspond to mutations introduced into the genome and plasmids. Phages are periodically isolated and introduced in a medium with new host cells (adapted from Brödel et al., 2018).
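The enrichment logic of PANCE can be sketched with a deliberately simplified, deterministic toy model in Python. Everything numerical here is an assumption made for illustration (the burst size, the dilution factor, the linear link between POI activity and phage progeny, and the three example variants); it is not a measured or published parameter set, and the function names are hypothetical.

```python
def pance_passages(titers, activities, burst=100.0, dilution=50.0, passages=5):
    """Propagate phage variants through serial passages.

    titers:     dict mapping variant name -> starting phage titer
    activities: dict mapping variant name -> relative POI activity (0 to 1)
    burst:      assumed progeny per phage at full POI activity
    dilution:   assumed dilution applied when transferring to fresh host cells
    Returns the list of titer dicts, one per passage (index 0 is the start).
    """
    history = [dict(titers)]
    for _ in range(passages):
        # pIII production, and hence infectious progeny, scales with POI activity.
        grown = {v: t * burst * activities[v] for v, t in titers.items()}
        # Only a diluted aliquot is carried over to the next batch of host cells.
        titers = {v: t / dilution for v, t in grown.items()}
        history.append(titers)
    return history


def frequencies(titers):
    """Convert titers into population fractions."""
    total = sum(titers.values())
    return {v: (t / total if total else 0.0) for v, t in titers.items()}


# Hypothetical starting population: a rare improved variant among many others.
start = {"inactive": 1e6, "wild_type": 1e6, "improved": 1e2}
activity = {"inactive": 0.0, "wild_type": 0.4, "improved": 1.0}

history = pance_passages(start, activity, passages=20)
for n in (0, 5, 10, 20):
    print(n, {v: round(f, 4) for v, f in frequencies(history[n]).items()})
```

In this toy setting a variant persists only if its activity times the burst size exceeds the dilution factor; otherwise it is washed out over the passages, which is the intuition behind tuning the selection stringency.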
This mutation system is limited by the fact that mutations can occur anywhere in the plasmids and the bacterial genome. Some of these mutations could alter the replicative functions of the plasmid or the antibiotic resistance gene (selection gene) and thus limit the selection of potentially interesting new variants. To confine these mutations to the GOI, and therefore considerably increase the recovery of new variants with the PANCE system, in our project we combine it with Evolution.T7 [https://2021.igem.org/Team:Evry_Paris-Saclay].
The Evolution.T7 tool is based on the orthogonal T7 RNA polymerase (T7RNAP) linked to a base deaminase (BD), either a cytosine deaminase (CD) or an adenosine deaminase (AD), which allows for the rapid generation of genetic diversity in a GOI in vivo in E. coli. When the BD-T7RNAP fusion protein is expressed, the sequence flanked by the T7 promoter and the T7 terminator(s) gets mutated, as the CD or AD randomly deaminates nucleotides mainly on the non-template strand (the strand that T7RNAP does not read). Upon DNA replication, these deaminated bases lead to C→T or A→G transition mutations, depending on whether CD or AD was used (Figures 3 and 4).
Figure 3. The mutation mechanism of the Evolution.T7 system with a base deaminase (BD) fused to the T7 RNA polymerase (T7RNAP) (adapted from Moore et al., 2018).
Figure 4. Mutation mechanisms through base deamination. (A) In DNA, deamination of cytosine by cytosine deaminase converts it to deoxyuridine, (B) which pairs with adenine and leads to a C→T mutation. (C) Deamination of adenine by adenine deaminase converts it to deoxyinosine, (D) which pairs with cytosine and causes an A→G mutation.
To also introduce T→C and G→A mutations, Evolution.T7 additionally uses a mutated polymerase, T7RNAPCGG-R12-KIRV, specific to an altered T7CGG promoter sequence. This promoter is placed in the reverse orientation downstream of the target region to compensate for the above-mentioned bias of deaminations occurring mainly on the non-template strand (Figure 5).
Figure 5. Schematic of the general organization of the Evolution.T7 system. The GOI is flanked upstream and downstream by the PT7 (sense) and PT7CGG (antisense) promoters, respectively, and by four T7 terminators (B0015, Sba_000587, T7wt, Sba_000451).
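The strand logic described above (sense transcription yielding C→T or A→G, antisense transcription yielding G→A or T→C when read on the coding strand) can be sketched in Python. This is a toy model under stated assumptions: deamination hits only the non-template strand, every target base has the same independent probability `rate`, and the example sequence and function names are hypothetical.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate(strand, deaminase, rate, rng):
    """Deaminate target bases on one strand: CD gives C->T, AD gives A->G."""
    target, product = {"CD": ("C", "T"), "AD": ("A", "G")}[deaminase]
    return "".join(
        product if base == target and rng.random() < rate else base
        for base in strand
    )


def mutate_goi(coding, deaminase, rate, rng, antisense=False):
    """Return the coding strand after deamination and replication.

    antisense=False: transcription from the sense promoter, so the
    non-template strand is the coding strand itself.
    antisense=True: transcription from the reverse promoter, so the
    non-template strand is the complementary strand.
    """
    if not antisense:
        return deaminate(coding, deaminase, rate, rng)
    other = reverse_complement(coding)
    return reverse_complement(deaminate(other, deaminase, rate, rng))


def substitutions(before, after):
    """Count substitution types, written as 'X>Y' on the coding strand."""
    return Counter(f"{a}>{b}" for a, b in zip(before, after) if a != b)


rng = random.Random(0)
goi = "ATGCCGTAA"  # hypothetical toy sequence, not a real gene
for deaminase in ("CD", "AD"):
    for antisense in (False, True):
        # rate=1.0 converts every target base, which makes the spectrum obvious.
        mutated = mutate_goi(goi, deaminase, 1.0, rng, antisense)
        print(deaminase, "antisense" if antisense else "sense",
              mutated, dict(substitutions(goi, mutated)))
```

Using both deaminases with both promoter orientations covers all four transition mutations, which is the rationale for the dual-promoter layout shown in Figure 5.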
EVOLUTION OF THE XYLS TRANSCRIPTION FACTOR AS A PROOF OF CONCEPT
Plastic components, such as phthalate esters (PAEs) and polyethylene terephthalate (PET), are degraded in nature in various ways. They can break down into toxic microplastics under physical forces, or into non-toxic monomers like phthalic acid (PA) and terephthalic acid (TPA) through the action of bacterial enzymes, such as PAE hydrolases [Huang et al., 2019; Bhattacharyya et al., 2022] or PETases [Liu et al., 2023].
These derivatives could be detected thanks to recently reported variants of the transcriptional activator XylS of Pseudomonas putida [Li et al., 2022], which is naturally activated by benzoic acid derivatives (Figure 6). These variants are sensitive to PA and TPA, but only with low affinity and specificity. Enhancing XylS specificity toward plastic degradation products would be promising for the design of new-generation biosensors that can in turn be used to evolve PAE hydrolases and PETases for efficient plastic degradation.
Figure 6. Fluorimetric detection of PA and TPA using XylS variant proteins via whole-cell biosensors (adapted from Li et al., 2022).
AI-BASED MODELING
Our project's computational component focuses on developing a tool to predict new variants of the XylS protein capable of binding specific synthetic ligands: PA and TPA. We started with PocketGen [Zhang et al., 2024] as our baseline model, which we then refined and customized to fit our goals. This refinement involved introducing constraints, such as incorporating mutagenic domain characteristics from the PHAGEVO tool, to guide mutations within biologically relevant regions of the protein.
PocketGen is designed to generate protein pockets by predicting both amino acid sequences and 3D structures required for ligand binding. We leveraged its architecture to adapt the XylS protein to recognize and bind PA and TPA, which aren't its natural ligands. Our modifications included adjusting input features to account for these ligands' chemical properties and integrating domain-specific constraints to direct mutations toward regions most likely to influence ligand specificity without compromising protein stability.
We began by inputting the wild-type XylS protein into our modified PocketGen model, specifying PA and TPA as target ligands. The model generated a set of variant sequences predicted to form binding pockets compatible with these ligands. Throughout this process, we assessed the predicted sequences for structural feasibility and binding affinity using computational evaluations.
To validate our predictions, we synthesized the proposed XylS variants and conducted wet-lab experiments to test their binding capabilities. We assessed each mutant for its ability to bind PA and TPA and for any changes in regulatory activity compared to both wild-type XylS and previously characterized mutants. These experimental results provided valuable feedback, allowing us to further refine our model and improve its predictive accuracy.
By integrating computational modeling with experimental validation, our approach showcases the potential of AI-based tools in protein engineering. The iterative refinement of PocketGen, tailored with specific constraints and validated through experimental feedback, demonstrates the effectiveness of combining deep learning models with domain-specific knowledge. This approach not only speeds up the discovery of functional protein variants but also deepens our understanding of protein-ligand interactions, contributing to advancements in synthetic biology and biotechnology applications.
CONCLUSIONS
Our project describes a cutting-edge approach for the evolution and selection of novel proteins, with XylS variants having enhanced binding affinities for PA and TPA as our first example. By creating this system, our team aims to provide a new XylS with improved sensitivity for the detection of plastic monomers in water, which can in turn be used to evolve enzymes for enhanced plastic degradation.
Furthermore, the methodology developed in this project could be extended to other applications in biosensors with other proteins that interact directly or indirectly with nucleic acids such as other transcription factors or DNA polymerases.
REFERENCES
- Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper J. (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493-500.
- Álvarez B, Mencía M, De Lorenzo V, Fernández LÁ. (2020) In vivo diversification of target genomic sites using processive base deaminase fusions blocked by dCas9. Nat Commun 11, 6436.
- Bhattacharyya M, Basu S, Dhar R, Dutta TK. (2022) Phthalate hydrolase: distribution, diversity and molecular evolution. Environ Microbiol Rep 14, 333–346.
- Bloom JD, Arnold FH. (2009) In the light of directed evolution: pathways of adaptive protein evolution. Proc Natl Acad Sci USA 106, 9995-10000.
- Brödel AK, Isalan M, Jaramillo A. (2018) Engineering of biomolecules by bacteriophage directed evolution. Curr Opin Biotechnol 51, 32-38.
- Esvelt KM, Carlson JC, Liu DR. (2011) A system for the continuous directed evolution of biomolecules. Nature 472, 499-503.
- Gantz M, Neun S, Medcalf EJ, van Vliet LD, Hollfelder F. (2023) Ultrahigh-throughput enzyme engineering and discovery in in vitro compartments. Chem Rev 123, 5571-5611.
- Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR. (2017) Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471.
- Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, Lin Z, Verkuil R, Tran VQ, Deaton J, Wiggert M, Badkundri R, Shafkat I, Gong J, Derry A, Molina RS, Thomas N, Khan Y, Mishra C, Kim C, Bartie LJ, Nemeth M, Hsu PD, Sercu T, Candido S, Rives A. (2024) Simulating 500 million years of evolution with a language model. bioRxiv 2024.07.01.600583.
- Hess GT, Frésard L, Han K, Lee CH, Li A, Cimprich KA, Montgomery SB, Bassik MC. (2016) Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat Methods 13, 1036-1042.
- Huang H, Zhang XY, Chen TL, Zhao YL, Xu DS, Bai YP. (2019) Biodegradation of structurally diverse phthalate esters by a newly identified esterase with catalytic activity toward Di(2-ethylhexyl) phthalate. J Agric Food Chem 67, 8548-8558.
- Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. (2016) Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.
- Li J, Nina MRH, Zhang X, Bai Y. (2022) Engineering transcription factor XylS for sensing phthalic acid and terephthalic acid: an application for enzyme evolution. ACS Synth Biol 11, 1106–1113.
- Liu F, Wang T, Yang W, Zhang Y, Gong Y, Fan X, Wang G, Lu Z, Wang J. (2023) Current advances in the structural biology and molecular engineering of PETase. Front Bioeng Biotechnol 11, 1263996.
- McLure RJ, Radford SE, Brockwell DJ. (2022) High-throughput directed evolution: a golden era for protein science. Trends in Chemistry 4, 278-291.
- Miller SM, Wang T, Liu DR. (2020) Phage-assisted continuous and non-continuous evolution. Nat Protoc 15, 4101-4127.
- Moore CL, Papa III LJ, Shoulders MD. (2018) A processive protein chimera introduces mutations across defined DNA regions in vivo. J Am Chem Soc 140, 11560-11564.
- Morrison MS, Podracky CJ, Liu DR. (2020) The developing toolkit of continuous directed evolution. Nat Chem Biol 16, 610-619.
- Park H, Kim S. (2021) Gene-specific mutagenesis enables rapid continuous evolution of enzymes in vivo. Nucleic Acids Res 49, e32-e32.
- Qian N, Gao X, Lang X, Deng H, Bratu TM, Chen Q, Stapleton P, Yan B, Min W. (2024) Rapid single-particle chemical imaging of nanoplastics by SRS microscopy. Proc Natl Acad Sci USA 121, e2300582121.
- Roth TB, Woolston BM, Stephanopoulos G, Liu DR. (2019) Phage-assisted evolution of Bacillus methanolicus methanol dehydrogenase 2. ACS Synth Biol 8, 796-806.
- Tian R, Rehm FB, Czernecki D, Gu Y, Zürcher JF, Liu KC, Chin JW. (2024) Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science 383, 421-426.
- Tong H, Wang X, Liu Y, Liu N, Li Y, Luo J, Ma Q, Wu D, Li J, Xu C, Yang H. (2023) Programmable A-to-Y base editing by fusing an adenine base editor with an N-methylpurine DNA glycosylase. Nat Biotechnol 41, 1080-1084.
- Wang Y, Xue P, Cao M, Yu T, Lane ST, Zhao H. (2021) Directed evolution: methodologies and applications. Chem Rev 121, 12384-12444.
- Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, Ahern W, Borst AJ, Ragotte RJ, Milles LF, Wicky BIM, Hanikel N, Pellock SJ, Courbet A, Sheffler W, Wang J, Venkatesh P, Sappington I, Torres SV, Lauko A, De Bortoli V, Mathieu E, Barzilay R, Jaakkola TS, DiMaio F, Baek M, Baker D. (2022) Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models. bioRxiv, 2022.12.09.519842.
- Zhang Z, Shen WX, Liu Q, Zitnik M. (2024) Efficient generation of protein pockets with PocketGen. Nature Machine Intelligence 6, 1382–1395.
- Zhao D, Li J, Li S, Xin X, Hu M, Price MA, Rosser SJ, Bi C, Zhang X. (2021) Glycosylase base editors enable C-to-A and C-to-G base changes. Nat Biotechnol 39, 35-40.