Project Description

Project Overview


Protein engineering underpins the development of novel biologics, medicines, and enzymes. Directed evolution (DE) remains the predominant approach; it earned Frances Arnold a share of the 2018 Nobel Prize in Chemistry, awarded alongside George Smith and Gregory Winter for their work on phage display. Recently, rapid advances in machine learning have positioned computational de novo design as a potential complement to DE in the protein engineering toolkit. Despite their importance to synthetic biology, many protein engineering methods remain out of reach for scientists outside industry or well-funded academic labs. The high cost and specialized nature of reagents for library generation and of bioreactors for culture-based selection pose significant barriers to entry. Platforms for de novo design are similarly inaccessible because of the steep learning curve of computational methods and their perceived commercial value. Lowering these barriers is crucial to democratizing synthetic biology on a global scale.


In-Vivo Approach (Directed Evolution)

To lower the barriers to entry for DE, our team addressed the accessibility of both the biological tools (wet lab) and the hardware tools (dry lab). Through a literature review and interviews with scientists who currently use Phage-Assisted Continuous Evolution (PACE) for protein engineering, we identified one of the largest obstacles to broader adoption: the cost of hardware and reagents. In response, we developed optogenetic parts to replace the chemical inducers used in PACE and created complementary bioreactor hardware with integrated light-based controls.


Our wet lab team reduced the cost barrier by using well-characterized optogenetic gene expression systems. We determined that a combined induction and repression system would provide the control that PACE requires. We cloned the UirS/UirR system from synthetic DNA and successfully characterized its expression under UV-activated induction and green-light-activated repression.


Concurrently, our dry lab team focused on designing an affordable, open-source turbidostat bioreactor. Turbidostats account for the bulk of the initial cost of PACE and other continuous culture methods. After several design iterations, we developed a simplified Arduino-based bioreactor with UV, blue, green, and red LEDs for optogenetic control of cells in continuous culture. With access to a 3D printer, the hardware kit can be assembled for under $200.
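As a rough illustration of the control logic a turbidostat relies on, the Python sketch below simulates a culture whose optical density (OD) is held near a setpoint by diluting whenever the reading exceeds it. This is not our Arduino firmware: the function names, the exponential growth model, and every numeric value (setpoint, growth rate, time step, dilution fraction) are assumptions chosen only to show the idea.

```python
import math


def turbidostat_step(od, setpoint, growth_rate_per_h, dt_h, dilution_fraction):
    """Advance the culture by one time step, then dilute if OD exceeds the setpoint.

    All parameters are illustrative assumptions, not measured values.
    Returns the new OD and whether the (hypothetical) media pump fired.
    """
    # Assumed model: unrestricted exponential growth over the time step.
    od = od * math.exp(growth_rate_per_h * dt_h)
    # Simple on/off (bang-bang) control: dilute only when above the setpoint.
    pump_on = od > setpoint
    if pump_on:
        # Replacing a fraction of the culture with fresh media lowers OD
        # by the same fraction.
        od = od * (1.0 - dilution_fraction)
    return od, pump_on


def simulate(od0=0.05, setpoint=0.5, growth_rate_per_h=1.0,
             dt_h=1.0 / 60.0, dilution_fraction=0.05, steps=600):
    """Run the control loop and return a list of (od, pump_on) tuples."""
    od = od0
    trace = []
    for _ in range(steps):
        od, pump_on = turbidostat_step(
            od, setpoint, growth_rate_per_h, dt_h, dilution_fraction
        )
        trace.append((od, pump_on))
    return trace


if __name__ == "__main__":
    trace = simulate()
    final_od, _ = trace[-1]
    pump_events = sum(1 for _, on in trace if on)
    print(f"final OD: {final_od:.3f}, pump events: {pump_events}")
```

In this sketch the dilution fraction per pump event must exceed the growth per time step, otherwise the culture outgrows the pump and OD drifts above the setpoint. On real hardware the OD value would come from a calibrated optical sensor rather than a growth model.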


In-Silico Approach (Machine Learning)

For this approach, we identified highly regarded and experimentally validated models for de novo protein design using structure-based inference. Based on an extensive literature review, we decided to use a combination of RFdiffusion and LigandMPNN to generate de novo structures from a given protein backbone. These tools generated candidate small peptide binders complementary to a known binding site on HIV reverse transcriptase, a disease-relevant target with a well-resolved structure and a well-described mechanism. Using the methodologies developed by ColabDesign, we built a large library of peptide structures and then performed pseudo-binding assays using machine-learning-based predictions validated on public data.


Our goal was to develop a user-friendly, one-shot approach for designing small peptide binders to target proteins. The workflow allows users with little to no machine learning experience to input a target protein and generate a set of candidate binding peptides. Our model outputs binders of appropriate length with predicted nanomolar affinities, making de novo design of peptides with novel function readily accessible.
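The final selection step of such a workflow can be sketched as a simple filter and sort over scored candidates. The Python example below is a minimal illustration, not our pipeline: the `Candidate` class, the `select_binders` function, the length window, and the affinity cutoff are all hypothetical, and the sequences and Kd values in the usage example are made-up placeholders rather than results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A designed peptide and its predicted affinity (hypothetical structure)."""
    sequence: str
    predicted_kd_nm: float  # predicted dissociation constant, in nanomolar


def select_binders(candidates, min_len=8, max_len=20, max_kd_nm=1000.0):
    """Keep candidates within the length window and below the Kd cutoff.

    The defaults are assumptions: a cutoff of 1000 nM keeps only
    sub-micromolar (nanomolar-range) predictions. Results are sorted
    from strongest to weakest predicted binding (lowest Kd first).
    """
    kept = [
        c for c in candidates
        if min_len <= len(c.sequence) <= max_len
        and c.predicted_kd_nm < max_kd_nm
    ]
    return sorted(kept, key=lambda c: c.predicted_kd_nm)


if __name__ == "__main__":
    # Placeholder sequences and values for illustration only.
    library = [
        Candidate("ACDEFGHIKL", 50.0),
        Candidate("ACD", 5.0),              # too short, filtered out
        Candidate("MNPQRSTVWYAC", 5000.0),  # predicted Kd too weak
        Candidate("GHIKLMNPQR", 8.0),
    ]
    for c in select_binders(library):
        print(c.sequence, c.predicted_kd_nm)
```

Keeping the thresholds as function arguments lets a user adjust what counts as an appropriate length or an acceptable predicted affinity without touching the rest of the workflow.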