ENGINEERING | GCM-KY

Overview

Part Engineering
- References
Vcell Engineering
- Cycle 1
- Cycle 2
- Cycle 3
AI initiatives
Umbrella Sampling

Design

Goal: Our project’s goal was to detect PFAS (specifically PFOA) using E. coli as a biosensor.

With this goal in mind, we designed three main constructs:

prmA Promoter construct: We tested the prmA promoter last year in Rhodococcus jostii but were not able to obtain significant results. This year we retested the promoter in E. coli, placing it upstream of a superfolder GFP gene to see whether E. coli would fluoresce when exposed to PFOA.
FAB-GFP Conjugate construct: Using the FAB-GFP conjugate molecule created by Dr. Bryan Berger, initially meant to react to fatty acids, we designed a system that would theoretically detect PFAS binding, initiating GFP fluorescence. We hypothesized that PFAS would bind similarly to fatty acids due to receptor similarities.
Synthetic Transcription Factor construct: Based on work done by Dr. Dossani in yeast, we designed a system of two plasmids: one containing a synthetic transcription factor that would respond to estradiol and another containing a hybrid promoter linked to GFP. Since there was already some research indicating that PFAS could interact with estradiol receptors, we wanted to see whether this system would function in E. coli as a method of PFAS detection.

BsaI and SapI restriction sites were added to both sides of each gene insert before synthesis by GenScript, so that a restriction digest could be used to check for the presence of the inserts in the plasmids.
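As a rough illustration of how flanking sites can be checked in silico before ordering, the sketch below scans a sequence for the BsaI (GGTCTC) and SapI (GCTCTTC) recognition sequences on both strands. The example sequence is a made-up placeholder, not one of our inserts, and the function names are hypothetical.

```python
# Recognition sequences (5'->3') of the two Type IIS enzymes named above.
SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq):
    """Map each enzyme to a sorted list of (0-based start, strand) hits."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        found = []
        # "+" is the site as written; "-" is the site on the opposite strand.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                found.append((start, strand))
                start = seq.find(motif, start + 1)
        hits[enzyme] = sorted(found)
    return hits


# Placeholder insert (NOT a real part): a forward BsaI site, a few
# arbitrary bases, and a BsaI site on the reverse strand.
example = "GGTCTCA" + "ATGAAACCC" + "TGAGACC"
example_hits = find_sites(example)
```

An insert flanked as intended should report one site near each end; extra hits inside the coding region would indicate internal sites that would also be cut in the digest.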

Build

All part sequences are deposited in the registry and are publicly available. All genes were synthesized by GenScript and subcloned into pUC57-Kan or pUC57.

prmA Promoter construct: The prmA promoter construct was transformed into E. coli and plated on ampicillin-selective agar plates. After heat shock transformation, successful colonies were identified via colony screening, followed by plasmid extraction using a miniprep. Nanodrop analysis indicated DNA concentrations ranging from 200-300 ng/µL, confirming successful plasmid isolation for further testing in PFAS detection experiments.

FAB-GFP Conjugate construct: The FAB-GFP conjugate plasmid was transformed into E. coli and plated on kanamycin-selective agar. White colonies were selected through blue-white screening, indicating successful transformation. Plasmid DNA was extracted using a miniprep, with concentrations ranging from 170-350 ng/µL. A restriction digest using the Eco31I enzyme confirmed the integrity of the construct, preparing it for PFAS-binding efficacy tests.

Synthetic Transcription Factor construct 1 (Kanamycin): The synthetic transcription factor (STF) construct was transformed into E. coli and screened via blue-white colony screening. White colonies resistant to both antibiotics were selected, and plasmid DNA concentrations ranged from 160-250 ng/µL after miniprep. A restriction digest confirmed the proper assembly of the STF construct, facilitating its use in estradiol receptor interaction experiments for PFAS detection.
Synthetic Transcription Factor construct 2 (Ampicillin): The STF construct with ampicillin resistance was transformed into E. coli. Following colony screening, white colonies were cultured, and plasmid DNA was extracted using miniprep. Nanodrop analysis confirmed DNA concentrations, and a restriction digest verified the structural integrity of the construct, readying it for testing in PFAS detection through synthetic transcription factor mechanisms.

Restriction digest (Eco31I/BsaI) and gel electrophoresis were performed on all plasmids to confirm that our inserts were present.

All inserts appear to be present.

Test

prmA Promoter construct: After successfully transforming the E. coli with the prmA promoter construct, we tested its ability to induce GFP expression in response to PFAS exposure. A 96-well plate was prepared with E. coli cultures containing the prmA promoter construct and exposed to varying concentrations of PFOA. Fluorescence readings were taken every 90 minutes using a plate reader over a 12-hour period.
FAB-GFP Conjugate construct: To test the efficacy of the FAB-GFP construct in detecting PFAS, E. coli cultures containing the construct were exposed to increasing concentrations of PFOA in a 96-well plate format. Fluorescence was measured at 90-minute intervals for a 12-hour period.
Synthetic Transcription Factor construct 1 (Kanamycin): The synthetic transcription factor (STF) construct with kanamycin resistance was tested for its response to estradiol receptor activation in the presence of PFAS. Cultures of E. coli containing the STF construct were grown in 96-well plates and exposed to various concentrations of PFOA (0 to 1000 ppm). Fluorescence measurements were recorded every 90 minutes using a plate reader.
Synthetic Transcription Factor construct 2 (Ampicillin): The synthetic transcription factor (STF) construct with ampicillin resistance was tested for its response to estradiol receptor activation in the presence of PFAS. Cultures of E. coli containing the STF construct were grown in 96-well plates and exposed to various concentrations of PFOA (0 to 1000 ppm). Fluorescence measurements were recorded every 90 minutes using a plate reader.

PREFACE: We planned to collect data up to the 48-hour time point in 1.5-hour increments. However, the fluorometer we were using had technical issues, so data points were collected only up to the 3-hour time point.

prmA-GFP

Our first construct used the prmA promoter characterized by United States Air Force Academy (2019) and Stockholm (2020), as well as by previous literature. The results of the fluorescence of our construct are shown below:

Graph 1 displays the fluorescence values over time of a colony from plate 1 after subtracting the fluorescence of the LB broth and PFOA solution at the corresponding times. The graph shows higher fluorescence when PFOA was added, as seen in the difference between the blue line, which represents 0 µM PFOA, and the other lines, which represent higher concentrations. This difference indicates that PFOA may have led to an uptick in GFP production. However, the blank-subtracted fluorescence intensities are all negative, which shows that the LB broth with PFOA solution had a higher fluorescence than the wells containing cells. Although we aren’t sure of the reason, it may be that the addition of cells reduced the natural fluorescence of the LB broth, or that the fluorometer’s measurements were in error. More testing is needed to determine whether the constructs work.
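The blank subtraction described above can be sketched as follows. The readings are made-up placeholder numbers in arbitrary units, not our measured data, and the function name is hypothetical. The sketch only shows how negative values arise whenever the blank well reads higher than the well containing cells.

```python
def subtract_blank(sample, blank):
    """Subtract the blank-well reading from the sample-well reading
    at each matching time point."""
    if len(sample) != len(blank):
        raise ValueError("sample and blank must cover the same time points")
    return [s - b for s, b in zip(sample, blank)]


# Placeholder readings at 0, 1.5 and 3 hours (arbitrary units).
# These are illustrative values only, NOT data from our plate reader.
cells_with_pfoa = [100.0, 120.0, 150.0]
lb_plus_pfoa_blank = [130.0, 140.0, 160.0]

corrected = subtract_blank(cells_with_pfoa, lb_plus_pfoa_blank)
```

With these placeholder numbers every corrected value is negative, because the blank is brighter than the sample at each time point, which is the same pattern seen in our graphs.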

The data in Graph 2 imply that all cells produce basal fluorescence over time, based on the increasing fluorescence readings across all cells. The fluorescence may be affected by PFAS, since the fluorescence at any given time point appears to be ordered by PFAS concentration. However, more testing is needed to determine whether the ordering is statistically significant and not an artifact of inaccuracies in the fluorometer’s readings.

prmA tested with H2O2

Graph 3 displays the fluorescence values over time of a colony taken from plate 1, containing E. coli transformed with the prmA gene construct, after subtracting the LB broth fluorescence values at the corresponding times. There appears to be an uptick in GFP production at higher concentrations of H2O2: the blue line, which represents 0 µM hydrogen peroxide, has a lower fluorescence value than the other lines. However, the blank-subtracted fluorescence intensities are once again all negative, which shows that the blank had a higher fluorescence than the wells containing cells.

The data in Graph 4 implies that all cells produce basal fluorescence over time based on the increasing fluorescence reading across all cells. The small spacing between curves makes it very difficult to establish any causality between fluorescence and H2O2, so more testing is needed to determine if the ordering is statistically significant and not an artifact of any inaccuracies in the fluorimeter’s readings.

FAB-GFP

The FAB-GFP mechanism was taken from previous literature (https://www.nature.com/articles/s41598-023-41953-1). The authors found that the FAB-GFP complex was capable of fluorescing in E. coli after exposure to PFAS.

Graph 5 displays the fluorescence values over time of a colony taken from plate 1, containing E. coli transformed with the FAB-GFP construct, after subtracting the LB broth fluorescence values at the corresponding times. Fluorescence was higher when PFOA was added, as seen by comparing the 0 µM PFOA line to the others at each time point. However, the trend does not hold among the nonzero concentrations. The 50 µM line begins lower than the 5 µM line but, from the 1.5-hour mark onwards, remains higher than both the 5 µM and the 250 µM lines. Additionally, at 0 hours, the 250 µM line was lower than the 5 µM line, which does not follow the expected direct relationship between PFOA concentration and fluorescence intensity. Although the data show that PFOA concentration does not directly correlate with fluorescence intensity, they still suggest that adding PFOA increases fluorescence intensity compared to no PFOA at all.

The data in Graph 6 imply that all cells produce basal fluorescence over time, based on the increasing fluorescence readings across all cells. The fluorescence may be affected by PFAS, since the fluorescence at any given time point appears to be ordered by PFAS concentration. However, more testing is needed to determine whether the ordering is statistically significant and not an artifact of inaccuracies in the fluorometer’s readings.

Estrogen Receptor Synthetic Transcription Factor

The estrogen receptor synthetic transcription factor (STF) and its promoter were taken from previous literature (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5873372/). These researchers primarily developed the STF to detect various concentrations of estradiol in yeast cells. They also developed a hybrid promoter containing binding sites for the STF to upregulate the transcription of GFP. Since PFAS are reported to act as agonists of estrogen receptors, we hypothesized that the estrogen receptor STF may also bind PFAS, causing a conformational change that upregulates transcription from the hybrid promoter pLex.

Graph 7 displays the fluorescence values over time of a colony taken from plate 1, containing E. coli transformed with the Estrogen Receptor STF construct, after subtracting the LB broth fluorescence values at the corresponding times. Since this graph doesn’t follow the trend seen with the other constructs, in which fluorescence intensity increased when PFAS (PFOA) was added, we hypothesize that in practical use this construct is less likely to provide accurate results via fluorescence.

The data in Graph 8 imply that all cells produce basal fluorescence over time, based on the increasing fluorescence readings across all cells. The amount of fluorescence at any time point appears to be inversely related to PFAS concentration. However, more testing is needed to determine whether the ordering is statistically significant and not an artifact of inaccuracies in the fluorometer’s readings.

Attached is the raw data from lab testing

After testing, there was no significant difference in fluorescence as the amount of PFOA added increased. This applies to every construct we tested (prmA, FAB-GFP, STF). For a detailed look at results, please refer to our results page.

Learn

Overall, all three of our constructs had inconclusive results. A common reason for this may be the technical issues our team faced in the lab with the fluorometer. Further issues and areas for experimentation are listed below.

prmA construct

After testing E. coli transformed with the prmA-GFP construct, the results were inconclusive. According to USAFA 2019, there is a precedent that the prmA promoter is sensitive to PFAS concentrations. However, those findings were obtained in Rhodococcus jostii, so the prmA promoter may not be induced by PFAS in E. coli because transcriptional elements native to Rhodococcus may be absent in E. coli. Based on an extensive literature review, we hypothesize that the prmA promoter requires R. jostii-specific Fis family transcriptional regulators and cAMP-receptor proteins to function properly. We supported this hypothesis by running a BLAST search of R. jostii cAMP-receptor proteins and Fis family transcriptional regulators against the E. coli genome, which did not return any sequences with >50% identity. E. coli therefore likely lacks the transcriptional machinery the prmA promoter needs to function as it does in R. jostii. It was also hypothesized that the prmA promoter is induced by an abundance of H2O2 caused by oxidative stress from the presence of PFAS. Our team therefore also tested prmA-GFP in the presence of H2O2, but those results were inconclusive as well. It may be that H2O2 abundance is not the mechanism that causes the prmA promoter to upregulate transcription, or there may have been human error when preparing the H2O2 concentrations. Further research is needed to identify the mechanism of prmA’s sensitivity to PFAS.

FAB-GFP

When we tested the construct in our lab to confirm the published findings, the results were inconclusive. One major reason may be that our genetic construct had a few missense mutations. Based on the molecular dynamics simulations, there are many possibilities for improving binding affinity through mutations, which may translate to better results in a practical setting. Additionally, according to the Virtual Cell results (detailed below), the limiting factor in the fluorescence of FAB-GFP is the availability of PFAS in the environment (using association constants given by the original author). This opens two avenues to explore: decreasing binding affinities even further, because of the simplicity of Virtual Cell and its models, and finding a way to increase the fold change in fluorescence of bound FAB-GFP compared to unbound (essentially making FAB-GFP fluoresce more when PFAS or a fatty acid is bound to it), which requires a better structural understanding of the protein.

Estrogen receptor STF construct

There may be several reasons why our fluorescence results were inconclusive. One major reason may be that PFAS does not bind correctly to the STF, which would prevent it from activating expression from the hybrid promoter. Another may be that the conformational change needed to activate expression from the hybrid promoter does not occur correctly. In addition, there may be key native transcription factors necessary to induce the hybrid promoter that are found only in yeast cells and are absent in E. coli, which would mean significantly less transcription in our E. coli. The last contributing factor may be that the VP16 activation domain did not function as desired in our E. coli. VP16 is a herpes simplex virus protein, and the virus mainly targets eukaryotic cells. The RNA polymerase in eukaryotic cells is fundamentally different from that in prokaryotic cells, which means that VP16 may not have successfully recruited the polymerase to the DNA as desired.

These reasons call for future research and testing to confirm the viability of the estrogen receptor STF as a potential mechanism for biosensing PFAS in E. coli.

Mann, M. M., & Berger, B. W. (2023, September 13). A genetically-encoded biosensor for direct detection of perfluorooctanoic acid. Scientific Reports. https://www.nature.com/articles/s41598-023-41953-1
Smathers, R. L., & Petersen, D. R. (2011, March 1). The human fatty acid-binding protein family: Evolutionary divergences and functions. Human Genomics. https://humgenomics.biomedcentral.com/articles/10.1186/1479-7364-5-3-170
Tuttle, A. R., Trahan, N. D., & Son, M. S. (2021). Growth and maintenance of Escherichia coli laboratory strains. Current Protocols, 1(1). https://doi.org/10.1002/cpz1.20
Ali Azam T, Iwata A, Nishimura A, Ueda S, Ishihama A. Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J Bacteriol. 1999 Oct;181(20):6361-70. doi: 10.1128/JB.181.20.6361-6370.1999. PMID: 10515926; PMCID: PMC103771.
Dossani, Z. Y., Reider Apel, A., Szmidt-Middleton, H., Hillson, N. J., Deutsch, S., Keasling, J. D., & Mukhopadhyay, A. (2018, March). A combinatorial approach to synthetic transcription factor-promoter combinations for yeast strain engineering. Yeast (Chichester, England). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5873372/
ACS Publications: Chemistry journals, books, and references published ... (n.d.). http://pubs.acs.org/doi/full/10.1021/ja026939x?mobileUi=0

VCell Engineering

Overall Purpose

Virtual Cell (VCell) is a tool for simulating complex reaction pathways and computing the predicted concentration of each species at any given point in time. It can simulate reactions deterministically, using differential equations and mass-action kinetics, or stochastically, by probabilistically simulating individual chemical reaction events. Our aim with VCell this year was to computationally predict which gene construct would be the most effective in producing GFP in the presence of PFAS. We simulated only a single cell for time and simplicity; future work could expand on ours by simulating multiple cells and testing new pathways for PFAS detection.
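To make the deterministic/stochastic distinction concrete, here is a minimal sketch of both approaches for a toy system with constant GFP production and first-order GFP degradation. This is not one of our VCell Biomodels. The rate constants are arbitrary placeholders, and the stochastic part uses the Gillespie direct method rather than the Gibson next-reaction solver we selected in VCell; as we understand it, both sample the same underlying stochastic process.

```python
import random


def deterministic(k_prod, k_deg, t_end, dt=0.01):
    """Mass-action ODE dN/dt = k_prod - k_deg * N, solved with forward Euler."""
    n = 0.0
    for _ in range(int(round(t_end / dt))):
        n += (k_prod - k_deg * n) * dt
    return n


def gillespie(k_prod, k_deg, t_end, rng):
    """Gillespie direct method for two reactions:
    production (propensity k_prod) and degradation (propensity k_deg * N).
    Returns the molecule count at time t_end. Assumes k_prod > 0."""
    n, t = 0, 0.0
    while True:
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        # Waiting time to the next reaction event is exponential.
        t += rng.expovariate(a_total)
        if t > t_end:
            return n
        # Choose which reaction fires, weighted by propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1


# Placeholder rate constants (hypothetical, not taken from our models):
K_PROD = 5.0   # molecules per second
K_DEG = 0.01   # per second, so the steady state is K_PROD / K_DEG = 500
```

Calling `deterministic(K_PROD, K_DEG, 1000.0)` gives a single smooth answer close to 500, while repeated calls to `gillespie(K_PROD, K_DEG, 1000.0, random.Random(seed))` with different seeds give integer counts that scatter around that value.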

Cycle 1

In Cycle 1, we built the structure that later cycles would follow. We first researched the constructs to determine how to model them, then designed the models, starting with the prmA_GFP construct and ending with the pLex_GFP construct. Each construct used the same geometry.

Geometry:

Cell Size: 3.5 µm^3

Environment: 1000 µm^3

Cell size was based on Harvard BioNumbers.

Environment size was determined by taking the reciprocal of the approximate maximum density of E. coli in batch culture, about 10^9 cells per milliliter (Tuttle et al., 2021), which gives about 1000 µm^3 per cell.
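The environment volume follows from a unit conversion, sketched here with the cell density quoted above. The function name is illustrative only.

```python
# 1 mL = 1 cm^3, and 1 cm = 1e4 µm, so 1 mL = (1e4)^3 = 1e12 µm^3.
UM3_PER_ML = 1e12


def volume_per_cell_um3(cells_per_ml):
    """Average culture volume available to each cell, in cubic micrometers."""
    return UM3_PER_ML / cells_per_ml


# Approximate maximum batch-culture density from the text: 1e9 cells/mL.
environment_um3 = volume_per_cell_um3(1e9)
```

At 10^9 cells per milliliter this evaluates to 1000 µm^3 per cell, the environment size used in the models.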

Once a construct was designed, we would input rate constants and run stochastic simulations with the construct. In the simulations, we would use varying amounts of PFAS and see how much GFP was produced.

Simulation Parameters:

Time: 10,000 seconds

PFAS (µM): 0, 0.01, 0.1, 1, 5, 10, 50

Solver: Gibson

Most rate constants were estimated, and few were taken from the literature. This is a problem we would like to address in future studies: since most of the rates were estimated, we cannot say for certain how accurate the models are.

Construct 1: prmA_GFP

Design

We designed the reaction network based on last year's reaction network, PFAS Detector V2. We changed the reaction diagram so that when the pRMA_operon and PFAS bind to form a complex, the complex produces GFP. The GFP used in our reaction diagram is a variant called SF_GFP (superfolder GFP). Some reactions were taken from the literature or pulled from the PFAS Detector V2 Biomodel, but others were estimated. This was the first of the 4 constructs we diagrammed and simulated in VCell, and it served as a learning guide for the creation of later constructs.

Cell volume was chosen as 3.5 µm^3, while the environment volume was set to 1000 µm^3, and time was measured in seconds.

Model name: pRMA_GFP construct 1 v1.3

Owner: Pillow123

Build

prmA_GFP Construct #1:

Test & Results

We ran the model with a variety of different starting conditions with stochastic solvers. We were able to visualize SuperFolder GFP or SF_GFP production in response to varying levels of PFAS.

The graph was made using the Google Sheets platform by extracting the data tables from VCell and pasting them onto sheets. We were then able to create a graph with multiple independent variables.

Conclusion and Learnings

The data show that with a higher molarity of PFAS in the system, the construct produces more SF_GFP. There is a direct relationship between PFAS concentration and SF_GFP production. This construct helped us prepare for later models, as they followed a similar structure. However, since most of the rates were estimated, we cannot say for certain how accurate this model is. In future studies, we will look for more rate constants to improve the accuracy of this model’s output.

Construct 2: FAB_GFP

Design

We designed the reaction network building on a paper we found on this topic. The FAB_GFP gene would be constantly produced by the pConst promoter and would bind with the PFAS to produce the complex. Similar to the construct before, most rate constants were estimated; however, the binding affinity between the PFAS and the FAB_GFP protein was calculated, and other rate constants were pulled from literature studies. This was the 2nd of the 4 constructs we had, and it provided a base design for future constructs that we designed.

Cell volume was chosen as 3.5 µm^3, while the environment volume was set to 1000 µm^3, and time was measured in seconds.

Model name: FAB_GFP_Construct v1.1

Owner: Pillow123

Build

FAB_GFP Construct #2:

Test & Results

The results for PFAS_FAB_GFP differed from those of the prmA_GFP construct, but the simulations were set up in the same way. The same settings were used (1000 seconds and the same solver). The rate constants differed, which led to different numbers, but both constructs provided convincing results.

The graph was also made using Google Sheets. The data was exported from VCell as an HDF5 file and then put into Google Sheets, where the graph was made. The graph uses several data ranges, allowing us to plot several lines. The legend and the distinct line colors make the lines easier to tell apart.

Conclusion and Learnings

We can see a direct relationship between the molarity of PFAS and the amount of PFAS_FAB_GFP present in the cell: the higher the micromolar concentration, the higher the count of PFAS_FAB_GFP. This is similar to the pRMA_GFP graph, which shows the same relationship. This model served as a template for later constructs and provided rate constants for later models.

Construct 3: Synthetic Transcription Factor

Design

This construct had a similar design to the FAB_GFP construct; the main difference is that when PFAS and the synthetic transcription factor were bound, they produced GFP. The FAB_GFP model was very helpful here. Rate constants were estimated and similar to those of the FAB_GFP model, but the binding affinity of PFAS and the synthetic transcription factor was calculated. This was the third of the 4 constructs, with the last being pLex_GFP.

Cell volume was chosen as 3.5 µm^3, while the environment volume was set to 1000 µm^3, and time was measured in seconds.

Model name: Synthetic_Transcription_factorv1.1

Owner: Pillow123

Build

SyntTranscriptionFactor_GFP Construct #3:

Test & Results

We ran stochastic simulations on the reaction diagram with varying molarities of PFAS to see the effect on GFP production. Different rate constants were used, which caused variance in the amount of GFP produced between the constructs.

Unlike in the other graphs, GFP stayed near 0 for the first 100 seconds at every PFAS concentration. The curves then separated at around 1250 seconds.

Conclusion and Learnings

Like all the other graphs, this one shows a direct relationship: the more PFAS in the system, the more GFP was produced. Since there was no promoter leak, the 0 µM PFAS condition produced no GFP, which makes logical sense. A future study on this construct could simulate the effects of promoter leakage on GFP production at different PFAS concentrations.

Construct 4: pLex_GFP

Design

We designed the reaction network by building on last year’s reaction network, PFAS Detector V2, as we did for the prmA_GFP construct. We replaced the prmA_operon with the pLex_operon and changed the GFP variant: this construct uses plain GFP rather than the SF_GFP variant used in the prmA_GFP construct. Like the prmA_operon, the pLex_operon has a leak, which we estimated using scientific studies. The pLex_operon binds PFAS to create the complex, which then produces the GFP_mRNA.

Cell volume was chosen as 3.5 µm^3, while the environment volume was set to 1000 µm^3, and time was measured in seconds.

Model name: pLex_GFP Construct

Owner: Pillow123

Build

pLex_GFP Construct #4:

Test & Results

We can see that each PFAS concentration reaches a different GFP level at the 2000-second mark, and all of the curves begin to bend at that point. The 2000-second mark represents their “max.”

Conclusion and Learnings

With this last construct, we can analyze all the graphs and come to a conclusion on which pathway was the most successful in detecting PFAS. However, additional cycles may be necessary, as the rate constants were heavily estimated and few were based on scientific research. Looking at the graph, the 0.01 µM curve produced the least GFP while the 50 µM curve produced the most, showing a direct relationship between PFAS concentration and GFP count. Due to promoter leak, the 0 µM curve produced more than the 0.1 µM and 0.01 µM curves. This was also the only graph that produced a true stochastic curve, while the others produced exponential graphs.

VCell Modeling Cycle 2

Construct 3: SyntTranscriptionFactor_GFP

Design

Further research shows that a combination of constructs 3 and 4 enhances the production of GFP in the presence of PFAS. Due to an error in the creation of constructs 3 and 4, which were supposed to work together to produce the GFP, we had to recreate construct 3, which is a combination of the 2 constructs. When PFAS is introduced into the system and binds with the synthetic transcription factor, it allows it to bind to the pLex promoter, which induces transcription. Using this information, a new model was developed using previous rate constants from the models, and new simulations were run.

Cell volume was chosen as 3.5 µm^3, while the environment volume was set to 1000 µm^3, and time was measured in seconds.

Model name: Construct 3_Cycle 2

Owner: Aryanshah16

Build

SyntTranscriptionFactor Construct:

The previous constructs 3 and 4 were combined into one Biomodel.

Test & Results

We ran stochastic simulations on the reaction diagram with varying molarities of PFAS to see the effect on GFP production. Different rate constants were used, which caused variance in the amount of GFP produced between the constructs.

The graph was made using the Google Sheets platform by extracting the data tables from VCell and pasting them onto sheets. We were then able to create a graph with multiple independent variables.

Conclusion and Learnings

The rate constant for PFOA binding to the synthetic transcription factor directly correlates with the amount of GFP produced. For example, multiplying the rate constant by ½ results in a 50% decrease in GFP production. GFP was produced in similar amounts at each PFAS concentration.

Cycle 3

Design

In the previous iterations, some rate constants were fully estimated without sources. In this cycle, a source was added for each rate constant that did not previously have one. All sources can be found within our publicly available VCell Biomodels. PFAS testing ranges were also recomputed. Regulations are at the parts per trillion (ppt) level, so we used 1 ppt as our minimum concentration. Assuming 1 ppt = 1 ng/L and a molar mass of 414 g/mol for PFOA, a common model PFAS, 1 ppt is approximately 2E-6 µM of PFOA. Additionally, we realized that stationary-phase E. coli are smaller than 3.5 µm^3 (Azam et al., 1999) and are closer to 1 µm^3. Since a real-life application of our biosensor would likely use cells that are already at steady state, we ran simulations in which protein and mRNA are at steady state before PFAS is introduced. Steady-state concentrations were determined using deterministic models that ran until concentrations stopped changing. For the synthetic transcription factor Biomodel, a new degradation pathway was introduced for synthetic transcription factor molecules bound to PFOA. For the FAB-GFP model, PFAS is regenerated when the PFAS.FAB-GFP protein is degraded.
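The concentration conversion above can be checked with a short calculation. The sketch assumes 1 ppt in water corresponds to 1 ng/L and uses the 414 g/mol molar mass quoted in the text. The function names are illustrative, and the molecule count uses the 1000 µm^3 environment volume from our model geometry.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
PFOA_MOLAR_MASS = 414.0        # g/mol, as quoted in the text


def ppt_to_micromolar(ppt, molar_mass=PFOA_MOLAR_MASS):
    """Convert parts per trillion to µM, assuming 1 ppt = 1 ng/L in water."""
    grams_per_litre = ppt * 1e-9
    mol_per_litre = grams_per_litre / molar_mass
    return mol_per_litre * 1e6


def molecules_in_volume(micromolar, volume_um3):
    """Number of molecules in a volume given in µm^3 (1 µm^3 = 1e-15 L)."""
    litres = volume_um3 * 1e-15
    return micromolar * 1e-6 * litres * AVOGADRO


one_ppt_um = ppt_to_micromolar(1.0)
molecules_per_environment = molecules_in_volume(one_ppt_um, 1000.0)
```

Under these assumptions, 1 ppt of PFOA is roughly 2.4E-6 µM, which corresponds to between one and two PFOA molecules in a 1000 µm^3 environment. This is why stochastic simulation matters at regulatory concentrations.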

Build

Model name: Construct3_Cycle3_dgl Model owner: dglVcell

Model name: FAB_GFP_Construct_v1.1_Douglas Model owner: dglVcell

Test

Simulations were run for 100 minutes (6000 seconds). Every plot shown is stochastic using the Gibson solver.

The top 2 graphs are for the synthetic transcription factor construct. The bottom two graphs are for the FAB-GFP construct.

Learn

The improved models showed very intriguing results. A PFAS-inducible transcription factor with a relatively strong binding affinity appears to produce the greatest amount of signal in response to a very small amount of PFAS. Even when there is only about a single PFAS molecule per cell (at 10^-6 uM), the synthetic transcription factor system can produce a large amount of GFP (about 5000 molecules per cell). The FAB-GFP system appears to be limited by the amount of PFAS that can bind to the GFP. At extremely low concentrations of PFAS, like those set as the minimal safety limits, there may be only one bound FAB-GFP producing fluorescence in a cell. For both constructs, it appears that the cell actually depletes the amount of PFAS that is available externally due to strong binding affinities. It is important to note that while the binding affinity of FAB-GFP has been experimentally characterized by the original creators, the binding affinity of the synthetic transcription factor was set to be the empirical binding affinity of human estrogen receptor alpha to PFOA, which may not accurately reflect the actual binding affinity of PFOA or other PFAS to the entire synthetic transcription factor.

AI Initiatives

At an earlier stage in the project, we attempted to use AI to design proteins that could be used to bind to PFAS, acting as a biosensor. While we were not able to incorporate a fully functional model into the final project, we have documented our efforts and progress so that the work could be expanded on.

Umbrella Sampling

Overall Purpose

Umbrella sampling is an enhanced-sampling technique used in molecular dynamics. It is especially useful in this project because it applies biasing restraints to improve the sampling of complex energy landscapes of protein structures, supporting calculations of free energy and other thermodynamic variables. It can overcome large energy barriers and sample rare events that plain simulations would not reach.
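As a minimal sketch of the idea, umbrella sampling splits a collective variable (CV), such as a protein-ligand distance, into overlapping windows and restrains each window's simulation near its center with a harmonic bias. The code below only computes window centers and the bias energy. It is not our pipeline, and the CV range, window count, and force constant are hypothetical values chosen for illustration.

```python
def window_centers(cv_start, cv_end, n_windows):
    """Evenly spaced restraint centers along the collective variable."""
    if n_windows < 2:
        raise ValueError("need at least two windows")
    step = (cv_end - cv_start) / (n_windows - 1)
    return [cv_start + i * step for i in range(n_windows)]


def bias_energy(cv_value, center, k):
    """Harmonic umbrella restraint: U = 0.5 * k * (cv - center)^2."""
    return 0.5 * k * (cv_value - center) ** 2


# Hypothetical setup: pull a ligand from 1.0 nm to 3.0 nm in 5 windows,
# with a force constant of 1000 kJ/mol/nm^2.
centers = window_centers(1.0, 3.0, 5)
penalty = bias_energy(2.2, centers[2], 1000.0)
```

Each window produces a histogram of CV values biased toward its center; an analysis method such as WHAM then removes the known bias and stitches the windows into a single free-energy profile.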

Design

The program was designed as a pipeline between various molecular dynamics tools. However, the most important part of the pipeline is the Umbrella Sampling process indicated by the caption on the arrow between the windows rectangle and metafile/cv files rectangle.

Build

Below is pseudocode outlining the program above, including the SMD pulling loop, umbrella sampling, WHAM analysis, and the binding coefficient calculation. It paraphrases the code in simple words; the full Python code is in the software GitLab repository.

              
                BEGIN

                # Setup Directories
                DEFINE master_path as current directory
                DEFINE test_path as subdirectory "tests"
                DEFINE forcefield_path, pdb_path
                IF directories do not exist
                    CREATE necessary directories (e.g., pdb, windows, cv, hist)

                # Test Folder Naming
                SET test_name = nextFolder()
                DEFINE new_path as directory for current test
                IF new_path does not exist
                    CREATE directory structure for new test (cv, windows, hist)

                PRINT "New folder created"

                # Initialize System
                LOAD PDB file from pdb_path
                CREATE Modeller to check and add missing hydrogens
                APPLY forcefield to the system

                # Define System and Simulation
                SET system with forcefield, constraints, and parameters
                INITIALIZE Langevin integrator for temperature and timestep
                SET positions and velocities for the system
                MINIMIZE system energy

                # Parameters for Steered Molecular Dynamics (SMD)
                DEFINE starting and ending values for collective variable (CV)
                DEFINE pulling force constant, velocity, and number of windows
                DEFINE steps and recording intervals for simulation

                # Add pulling force to system
                CREATE pullingForce as harmonic force on CV
                ADD pullingForce to the system
                INITIALIZE simulation with pulling force

                # Perform Steered Molecular Dynamics (SMD) Loop
                FOR each window
                    PERFORM simulation steps
                    UPDATE pulling force parameter r0
                    RECORD collective variable (CV) values
                    IF CV reaches target value for the current window
                        SAVE system coordinates
                    END IF
                    UPDATE progress bar
                END FOR

                # Save Windows and CV Data
                FOR each saved window
                    SAVE window coordinates as PDB file
                END FOR

                # Run Simulations on Each Window
                DEFINE function run_window(window_index)
                    LOAD coordinates from corresponding window
                    SET system positions to window coordinates
                    RESET temperature and run simulation for steps
                    RECORD collective variable (CV) values for each window
                    SAVE CV values to file
                END FUNCTION

                CALL run_window() for each window

                # Generate Histogram and Save
                FOR each window
                    LOAD CV data
                    GENERATE histogram of CV values
                    SAVE histogram plot
                    CREATE metafile entry for WHAM analysis
                END FOR

                # WHAM Analysis
                PREPARE WHAM input arguments
                RUN WHAM on metafile to compute PMF
                SAVE WHAM output and log

                # Post-WHAM Processing
                LOAD PMF data from WHAM output
                PARSE coordinates, free energies, and probabilities
                PLOT free energy and probability graphs
                SAVE graph as PNG file

                # Compute Equilibrium Constant (Kd)
                COMPUTE delta_G_bound and delta_G_unbound from PMF
                CALCULATE standard free energy difference (ΔG)
                COMPUTE dissociation constant (Kd) from ΔG
                OUTPUT Kd value

                END
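As one hedged illustration of the "number of windows" and "CREATE metafile entry" steps, the sketch below spaces window centers evenly along the collective variable and builds one metafile line per window. It assumes a WHAM implementation that reads lines of the form `timeseries_path window_center force_constant`, as Grossfield's WHAM program does; the tool, the file names (`cv/cv_0.txt` and so on), and the helper names are assumptions, not details confirmed by the project code. The CV range, window count, and force constant reuse Li, Lf, num_win, and fc_pull from the test setup purely as example values.

```python
def window_centers(cv_start, cv_end, num_windows):
    """Evenly spaced target CV values, including both endpoints (an assumption)."""
    if num_windows < 2:
        raise ValueError("need at least two windows")
    step = (cv_end - cv_start) / (num_windows - 1)
    return [cv_start + i * step for i in range(num_windows)]


def metafile_lines(centers, force_constant, cv_dir="cv"):
    """Build one metafile line per window.

    Each line is: <CV time-series path> <window center> <force constant>.
    The cv_<index>.txt naming is hypothetical.
    """
    return [
        f"{cv_dir}/cv_{i}.txt {center:.6f} {force_constant:.1f}"
        for i, center in enumerate(centers)
    ]


# Example values: CV pulled from 3.0 nm to 3.2 nm across 100 windows
centers = window_centers(3.0, 3.2, 100)
lines = metafile_lines(centers, 1000.0)
print("\n".join(lines[:3]))
```

Whichever WHAM tool is used, its force constant convention and energy units should be checked against the units of the simulation before the metafile is written.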

Test & Results

This program was tested with a PDB containing protonated PFOA in FAB-GFP. Although deprotonated PFOA could be simulated in antechamber, we were unable to generate a forcefield file that the program could use to simulate deprotonated PFOA. We therefore used protonated PFOA, for which a forcefield file could be generated with a program called LigParGen. The file is named and stored as:

umbrellasampling/tests/forcefields/pfoa_protonated.xml

1. The protonated PFOA in the PDB was made by manually adding a hydrogen atom to PFOA in ChimeraX and inserting the edited ligand back into FAB-GFP.
2. To prevent NaN coordinates, the system's energy was minimized, which moved the protein and PFOA. The conformational differences are shown as follows:
UNL 1 refers to the protonated PFOA. The conformational changes of the FAB-GFP protein in window 0 that follow from energy minimization (via OpenMM) are in red. The hydrogen added in ChimeraX is in solid green.
UNL 377 refers to the deprotonated PFOA. The conformational changes of the FAB-GFP protein that follow from energy minimization (via antechamber) are in green with reduced opacity.

The setup of the test includes the following parameters:

Li	Lf	index1	index2	num_win	fc_pull	v_pulling	total_steps	increment_steps	wTotal	wDelta
3.0 nm	3.2 nm	14 (Hydrogen UNL 1)	5439 (Hydrogen SER 349)	100	1000.0 kJ/mol/nm^2	0.02 nm/ps	30,000	10	100,000	10

The window 0 PDB file looks as follows, with the two pulled atoms indicated:

The results of the PMF graph and histogram graph are as shown below.

The Kd and ΔG° calculated from the PMF graph are:

Dissociation Constant	ΔG°
2.26 M	0.48224 kcal/mol
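The two values in the table appear consistent with the standard relation ΔG° = RT ln(Kd / c°) for the binding free energy, evaluated near room temperature. The sketch below checks this under stated assumptions: a temperature of 298.15 K and a standard concentration c° of 1 M, neither of which is given above. The function names are hypothetical and are not taken from the project code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_binding_dg(dg_bind_kcal, temperature_k=298.15, c_standard=1.0):
    """Dissociation constant (M) from the standard binding free energy (kcal/mol)."""
    return c_standard * math.exp(dg_bind_kcal / (R_KCAL * temperature_k))


def binding_dg_from_kd(kd_molar, temperature_k=298.15, c_standard=1.0):
    """Standard binding free energy (kcal/mol) from the dissociation constant (M)."""
    return R_KCAL * temperature_k * math.log(kd_molar / c_standard)


kd = kd_from_binding_dg(0.48224)   # close to the reported 2.26 M
dg = binding_dg_from_kd(2.26)      # close to the reported 0.48 kcal/mol
print(f"Kd = {kd:.2f} M, dG = {dg:.3f} kcal/mol")
```

Under this convention a positive ΔG° corresponds to a Kd above 1 M, which indicates very weak binding.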

Learn

Although we were unable to simulate deprotonated PFOA with the program, the major takeaway from the test was the stark difference between the behaviors of protonated and deprotonated PFOA.
1. When the energy was minimized, the protonated PFOA was placed outside the binding pocket of FAB-GFP, which explains the very high dissociation constant, or low binding strength.
2. The deprotonated PFOA had a much lower dissociation constant, and thus much higher binding strength, which is explained by its position inside the binding pocket of FAB-GFP.

References

Blinov, M. L., J. C. Schaff, D. Vasilescu, Moraru, II, J. E. Bloom, and L. M. Loew. 2017. Compartmental and Spatial Rule-Based Modeling with Virtual Cell. Biophysical journal 113:1365-1372. PMC5627391
Cai, L., Friedman, N., & Xie, X. S. (2006). Stochastic protein expression in individual cells at the single molecule level. Nature, 440(7082), 358–362. https://doi.org/10.1038/nature04599
Cowan, A. E., Moraru, II, J. C. Schaff, B. M. Slepchenko, and L. M. Loew. 2012. Spatial modeling of cell signaling networks. Methods Cell Biol 110:195-221. PMC3288182
Schaff, J., C. C. Fink, B. Slepchenko, J. H. Carson, and L. M. Loew. 1997. A general computational framework for modeling cellular structure and function. Biophysical journal 73:1135-1146. PMC1181013
Ali Azam T, Iwata A, Nishimura A, Ueda S, Ishihama A. Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J Bacteriol. 1999 Oct;181(20):6361-70. doi: 10.1128/JB.181.20.6361-6370.1999. PMID: 10515926; PMCID: PMC103771.
Tuttle, A. R., Trahan, N. D., & Son, M. S. (2021). Growth and maintenance of Escherichia coli laboratory strains. Current Protocols, 1(1). https://doi.org/10.1002/cpz1.20