Project Description

AI-Driven Enzyme Engineering for PFAS Degradation Using Protein LLM, Expression Classifier, and Cell-Free Expression


PFAS, or "forever chemicals," are toxic compounds linked to cancers, infertility, and reduced vaccine effectiveness at concentrations below 1 part per trillion (ppt).

Current methods to degrade PFAS are costly and can produce highly toxic byproducts. Our high school iGEM team is using AI to design enzymes that break down PFAS through a four-step process:

1. Using protein large language models (pLLMs) and an enzyme classifier to generate novel PFAS-degrading enzymes.

2. Computationally validating the predicted catalytic activity of these enzymes against PFAS.

3. Assessing the expressibility of the enzymes in a cell-free system to ensure feasibility for lab testing.

4. Sending successfully expressed enzymes to a specialized lab to confirm PFAS degradation activity, and using the expression results as input for an expression classifier.

By combining AI-driven protein design with experimental validation, our approach seeks to provide an efficient and cost-effective solution to combat the environmental and health risks posed by PFAS.



Our Four Step Plan

Our enzyme generation pipeline contains four interconnected steps, or modules. In the first step, we use a protein large language model (pLLM) to generate novel PFAS-degrading enzymes. Next, we computationally validate that each enzyme is predicted to have activity with PFAS. In the third step, we plan to use an expression classifier that predicts whether sequences coming from the second step are expressible in a cell-free expression system called TXTL. Enzymes predicted to be expressible are then sent to the wet lab team to be expressed in TXTL. We experimentally determine whether each enzyme is successfully expressed, both to confirm enzyme yield and to strengthen future expression predictions. Lastly, we will send our enzymes to a separate lab to test them on a PFAS substrate. The interactions and feedback between our modules allow us to tailor our process toward an optimal system for creating novel PFAS-degrading enzymes.

See below for a more in-depth explanation of our process.


First Step

In the first step, we begin by fine-tuning protein large language models (pLLMs). Fine-tuning means continuing to train an existing model on a targeted dataset so that it produces a specific kind of output. As our training data, we use natural enzyme candidate sequences predicted to have PFAS-degrading capabilities. This steers the model toward generating PFAS-degrading enzymes, since those are the sequences it learns from.

Quick Explanation - Protein Large Language Models (pLLMs)

We use three pLLMs for novel enzyme generation: ZymCTRL, ProtGPT2, and ESM2. After fine-tuning, each pLLM will be biased toward the proteins within the training data. This means that its output should retain the key catalytic and structural motifs necessary for PFAS degradation while still having novel aspects. Any sequences generated at this step that contain erroneous amino acids are filtered out, as are sequences whose folded structure is dissimilar to natural proteins.
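The amino acid filter described above could look something like the following Python sketch. It is only an illustration under stated assumptions, not our exact implementation: it assumes "erroneous" means any character outside the 20 standard one-letter amino acid codes, and the function names are hypothetical. The structure-based filter is not shown because it depends on a separate structure prediction tool.

```python
# The 20 standard amino acids in one-letter code (assumption: anything
# else, such as X, B, Z, or *, counts as "erroneous").
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(seq: str) -> bool:
    """Return True if seq is non-empty and uses only standard amino acids."""
    seq = seq.strip().upper()
    return len(seq) > 0 and set(seq) <= STANDARD_AMINO_ACIDS


def filter_valid_sequences(seqs: list[str]) -> list[str]:
    """Keep only generated sequences that pass is_valid_sequence.

    Hypothetical helper; returned sequences are normalized to uppercase.
    """
    return [s.strip().upper() for s in seqs if is_valid_sequence(s)]


if __name__ == "__main__":
    generated = ["MKTAYIAKQR", "MKXBZ", "mstl", ""]
    print(filter_valid_sequences(generated))
```

In this toy run, the second sequence is dropped because it contains ambiguity codes, and the empty string is dropped because it carries no residues.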

Second Step

In the second step, vetted generated sequences (gen-seqs) are put through a computational process that gauges their predicted catalytic activity with PFAS. If a sequence is predicted not to have activity with PFAS, we will alter our training data and other aspects of the protein generation procedure. This creates a feedback loop within the system that selects sequences that are more likely to degrade PFAS.
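One way to picture this gate and its feedback signal is the Python sketch below. Everything in it is an assumption made for illustration: `score_fn` stands in for whatever computational activity predictor is used, and the `threshold` and `min_pass_rate` values are arbitrary placeholders rather than numbers from our pipeline.

```python
from typing import Callable


def gate_by_activity(
    seqs: list[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> tuple[list[str], list[str]]:
    """Split sequences into (predicted active, predicted inactive).

    score_fn is a hypothetical stand-in for the activity predictor.
    """
    active: list[str] = []
    inactive: list[str] = []
    for seq in seqs:
        if score_fn(seq) >= threshold:
            active.append(seq)
        else:
            inactive.append(seq)
    return active, inactive


def should_revise_generation(
    n_active: int, n_total: int, min_pass_rate: float = 0.2
) -> bool:
    """Flag the generation setup for revision when too few sequences pass."""
    if n_total == 0:
        return True  # nothing was generated, so something needs to change
    return n_active / n_total < min_pass_rate


if __name__ == "__main__":
    # Toy scoring function with no biological meaning: fraction of histidines.
    def toy_score(seq: str) -> float:
        return seq.count("H") / len(seq)

    gen_seqs = ["HHHA", "AAAA", "HAAA"]
    active, inactive = gate_by_activity(gen_seqs, toy_score)
    print(active, inactive)
    print(should_revise_generation(len(active), len(gen_seqs)))
```

Keeping the scoring function as a parameter means the same gate can be reused if the activity predictor changes.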

Third Step

Sequences predicted to have activity with PFAS move on to the third step, where their expressibility in a cell-free expression system (TXTL) is predicted computationally. Sequences predicted not to be expressible in TXTL feed back into our initial generation run, so that we select for sequences that are expressible as well as active against PFAS. Sequences predicted to be expressible are then expressed in TXTL in the lab. The results of this in-lab expression experiment will allow us to make more informed computational predictions for future generation runs.
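The way lab results could feed back into the expression predictions is sketched below in Python. This is a simplified illustration with hypothetical names (`LabResult`, `precision_of_predictions`, `extend_training_set`), not our actual classifier code. Because only sequences predicted to be expressible reach the lab, the sketch measures how often those positive predictions were correct and adds each outcome to a labeled training set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    """Outcome of one TXTL experiment for a predicted-expressible sequence."""
    sequence: str
    expressed: bool


def precision_of_predictions(results: list[LabResult]) -> float:
    """Fraction of predicted-expressible sequences that actually expressed."""
    if not results:
        raise ValueError("no lab results to evaluate")
    return sum(r.expressed for r in results) / len(results)


def extend_training_set(
    training_set: dict[str, bool], results: list[LabResult]
) -> dict[str, bool]:
    """Return a new sequence -> label mapping that includes the lab outcomes.

    The input mapping is left unchanged.
    """
    updated = dict(training_set)
    for r in results:
        updated[r.sequence] = r.expressed
    return updated


if __name__ == "__main__":
    # Made-up example data, not real experimental results.
    results = [
        LabResult("MKT", True),
        LabResult("MSA", False),
        LabResult("MLL", True),
        LabResult("MGG", True),
    ]
    print(precision_of_predictions(results))
    print(len(extend_training_set({"MAA": True}, results)))
```

Retraining the classifier on the extended set is the step that would turn these labels into better predictions; it is omitted here because it depends on the model used.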

Fourth Step

In the fourth step, successfully expressed enzymes will be sent to a professional lab that can handle PFAS. The lab’s team of experts will use our enzymes in a PFAS degradation assay to validate their catalytic activity. If degradation is not observed, we will alter our generation procedure and our prediction mechanisms.

And Finally...

If degradation is observed, we will have effectively reached our goal of creating a novel PFAS-degrading enzyme!