Integrating Computational Simulations

Integrating computational simulations with mathematical and physical models was a fundamental approach to evaluating the performance of our parts and overall system. This process proved crucial, not only during the design and simulation stages—where multiple iterations were explored before laboratory experiments—but also after acquiring experimental data, allowing for continuous refinement.

In the context of this theoretical-experimental integration, our modeling efforts aimed to:

By integrating measurements from various project components, our modeling efforts played a central role in both the development and comprehension of the newly designed BARBIE1 protein via Molecular Dynamics simulations and the statistical thermodynamic analysis of the proposed filtration system.

What results are you going to find on the model page:

Protein Engineering - Methodology:

How we modeled a new protein using a diffusion process for better binding with plastic;

How we improved our pipeline for generating protein for specific ligands;

Computational Simulations:

Parts Evaluation - with the newly designed part, we evaluated some important properties not only for our pipeline but also for the filter, such as the comparison with the original protein, homo-oligomer, electrostatic, and hydrophobicity analysis;

Plastic Interaction through Molecular Dynamics - creating a computational simulation of our protein and plastic molecule in solution, it was possible to comprehend the binding sites, assess the interaction energy, and the interaction in different conditions (i.e., pH and temperature);

Spidroin Formation through Molecular Dynamics - moreover, we also simulated a spidroin system in different pH conditions, allowing us to visualize the fiber formation in time;

Protein Corona Formation:

Computational Simulation - in real life, knowing the protein is relatively smaller than the microplastic size, it is expected to create a protein layer around the particle, which is explored in the protein corona’s computational simulation;

Analytical Model - for properly simulating this behavior, we created an analytical model to allow visualizing the protein-corona creation around microplastic;

Statistical Models:

Filtering Microplastics - using statistical physics, we created a thermodynamic model for calculating retention probability of our system, which allowed us to assess its efficiency;

Microplastics Concentration for Saturation - using the Monte Carlo algorithm, we tested randomly several configurations for plastic adsorption, in a way it was possible to find the minimum energetic state that represents the saturation concentration;

Integrated Filter Modeling - in the light of the efficiency and saturation data, we were able to assess the filter modeling in time properly and its saturation;

From Laboratory to Industry:

Scalability Modeling - using the Monod model, it was possible to calculate important steps on the expression and purification for our system to bring it from the laboratory to the industry;

Price Expectations - with the knowledge acquired from the scalability.

Protein Engineering - Methodology

In this section, we describe the starting point of our project: Protein Design and Engineering. In particular, we clarify both of the methodologies cycles adopted.


Cycle 1

By integrating multiple AI-based methodologies, the first protein design cycle enabled us to design a better plastic-binding protein for retaining plastics in our filter.

As the starting point, the Bacillus anthracis protein BaCBM2 structure model was generated by the AlphaFold2 software1 since it already shows important plastic binding properties2. The next step was performing docking assays with six types of plastic oligomers: PP, PE, PET, NY, PVC, and PS. We made the docking predictions using the Gnina software3 with relaxed parameters to screen many binding sites, including the possible artifacts.

Subsequently, the overlaps produced by the docking assays were removed using ChimeraX software4, also employed for structural visualization. Reverse folding was then performed on the protein output from the docking using the LigandMPNN tool5, which was configured with optimal parameters for sequence diversity (temperature coupling) and higher weights for plastic-binding residues, such as the aromatic residues described in the original BaCBM2 paper6. This process generated 6,000 sequences for each ligand, resulting in a total of 6 plastics × 6,000 sequences = 36,000 sequences, as illustrated in Figure 1.

Figure 1. The protein-ligand docking representation of the BARBIE1 protein with six different plastics is shown, with PS, PVC, and PP in the top panel, while Nylon-6,6, PE, and PET are displayed in the bottom panel.
Figure 1. The protein-ligand docking representation of the BARBIE1 protein with six different plastics is shown, with PS, PVC, and PP in the top panel, while Nylon-6,6, PE, and PET are displayed in the bottom panel.

As an initial approach, the consensus sequence from the 36,000 sequences was retrieved to capture structural features shared equally across all tested plastics. This innovative strategy led to the creation of our optimized protein sequence, sensitive to multiple plastic types, which we named BARBIE1!

The BARBIE1 primary sequence was then modeled using AlphaFold2 and compared to BaCBM2, showing a highly similar fold (Root Mean Square Deviation = 0.56 Å) and experimentally comproved using circular dichroism (circular dichroism link). The BARBIE1 protein model was redocked with the same plastics as before, using the Gnina software. The results, showing its predicted affinities, are illustrated in the bar plot in Figure 2. When comparing the predicted affinities of the inspiration, BaCBM2, with the designed protein, a substantial increase in affinity is observed across all plastics, particularly for PE, PP, and PS, highlighting the potential of the processing pipeline developed by our team.

Figure 2. Dissociation constant comparative for 6 different plastics between BARBIE1 and BaCBM2.
Figure 2. Dissociation constant comparative for 6 different plastics between BARBIE1 and BaCBM2.

In addition to testing oligomeric forms, we also aimed to evaluate the affinity using longer plastic chains to ensure this would be a valuable parameter for future analysis and implementation. Therefore, we tested PE and PET ligands with 50 and 25 repeating units, respectively. As shown in Figure 3, the previous trend of BARBIE1 maintaining a higher dissociation constant (KD) compared to BaCBM2 was preserved.

Figure 3. Dissociation constant comparative between BARBIE1 and BaCBM2 for a PE molecule with 50 units and PET with 25 units.
Figure 3. Dissociation constant comparative between BARBIE1 and BaCBM2 for a PE molecule with 50 units and PET with 25 units.

Cycle 2

After successfully designing an improved plastic-binding protein, we optimized it for designing protein binders for any type of small ligand.

To assess the viability of designing high-affinity binders for specific ligands, we selected 11 known aquatic pollutants for ligand-driven design, including the plastics from the previous cycle and herbicides, drugs, and heavy metals. These chemicals were chosen as target ligands based on a literature review of current water contaminants (Table 1) to apply a combined strategy of protein hallucination - which is the process of generating new sequences that do not necessarily have correspondence with the real world but have specific interest properties - and reverse folding, that is generating an amino acid sequence from the tridimensional structure.

Table 1. Explored pollutants related to their ligand group, protein diffusion base, and reference that justifies studying their removal from potable water.

Group Molecule Diffusion base Reference
Plastic oligomers Nylon-6,6 (NY66) BARBIE1/BaCBM2 Koelmans et al., 2019 [7]
Polyethylene (PE) BARBIE1/BaCBM2
Polyethylene terephthalate (PET) BARBIE1/BaCBM2
Polypropylene (PP) BARBIE1/BaCBM2
Polystyrene (PS) BARBIE1/BaCBM2
Polyvinyl chloride (PVC) BARBIE1/BaCBM2
Heavy metals Mercuric chloride (HgCl2) 7QDL (RCSB PDB) Cappelletti, 2019 [8]
Lead (Pb) 5VGF (RCSB PDB) Needleman, 2015 [9]
Arsenate (AsO4) 1TA4 (RCSB PDB) Mohammed, 2015 [10]
Herbicides Glyphosate (GPJ) 7FMF (RCSB PDB) Van Bruggen, 2018 [11]
Drugs Desvenlafaxine (29J) 4MM7 (RCSB PDB) Argaluza, 2021 [12]

For each contaminant, 11 protein structures solved in the presence of the target ligands were selected from the RCSB database. The exception was plastic pollutants, for which we used the previously docked BaCBM2 model as a reference, eliminating the need for an experimentally solved structure in our updated pipeline.

These base structures were automatically processed to retain only ligand-binding residues (<5 Å) and then input into the RosettaFold2 Diffusion All-Atom (RFAAdiff)13 model. Through hallucination, this process generated three distinct backbone structures to bind each ligand, with parameters set for generating oligomers with 80 amino acids in length.

Subsequently, each of the three backbones produced by RFAAdiff for each ligand was submitted to LigandMPNN. The five sequences with the highest ligand confidence parameters were selected for further analysis, resulting in 15 different sequences per ligand (3 x 5 = 15 sequences) and a total of 75 sequences across all 11 contaminants. These 15 new protein sequences for each ligand were then modeled in AlphaFold2, and all five generated models were submitted to Gnina for ligand docking. Each model's top five non-overlapping docking poses for each model were retained and combined into a single PDB file for the next step (15 sequences x 5 models x 5 poses = 375 PDB files per ligand).

Next, the combined structures were subjected to LigandMPNN for reverse folding, generating 20 sequences for each PDB file. In total, we produced 7,500 sequences for each ligand, with 2,500 featuring similar folding but different amino acid compositions. Based on LigandMPNN scores and clustering highly similar sequences using the CD-HIT tool14, this set of 82,500 sequences (11 x 7,500) was filtered down to 6,000 distinct sequences. These sequences were used to create a multi-ligand binder plasmid library for yeast surface display.

Alternatively, sequences can be drastically filtered by applying rigorous cutoff parameters throughout the pipeline to enable small-scale wet lab tests. However, using more relaxed cutoffs during a post-design filtering process, we preserved sequence diversity while significantly reducing the scale and cost of laboratory testing. This approach allows for extensive exploration of our pipeline and offers novel insights into diverse protein-ligand interactions, thereby increasing the chances of identifying high-affinity binders. Both strategies are highly adaptable and can be applied to any type of small molecule, offering a flexible approach for designing and testing high-affinity binders for various contaminants.

Computational Simulations

As part of the dry lab organization, multiple studies were conducted to create insights about our solvated PBP with plastic. We not only studied protein properties such as oligomer state and its hydrophobicity but also how the system changes in different pH and temperature conditions.

Parts Evaluation

With our new improved part, it was necessary to compare it with the original protein. By developing these tests, it was possible to understand the fundamental properties of the new sequence, as well as the effectiveness of the protein engineering pipeline.

Protein Comparison

Using protein alignment, it was possible to note that the designed protein preserves the efficient scaffold of polymer binding modules (CBM2).

To further our knowledge of the protein properties, we compared our resulting sequence BARBIE1 with the original BaCBM2 in different tests. First, we compared each protein's tertiary structure, as shown in Figure 4. BARBIE1 and BaCBM2 proteins presented high similarity, especially in their secondary structures, in which the presence of the same amount of beta sheets in both proteins was notable. However, our results demonstrated that, in the upper left part of the BARBIE1 structure, both proteins showed differences between them.

This subtable contrast between the original and modified protein is shown in Figure 4 (c), in which we aligned the structures using VMD and calculated the root mean square deviation (RMSD). As it is possible to analyze the aligned structures, it is very similar visibly, with a resulting RMSD of 0.56 Å (RMSDs < 2 Å indicates high similarity).

Figure 4. Structural models. (a) BARBIE1, (b) BaCBM2 protein, (c) protein alignment of both proteins.
Figure 4. Structural models. (a) BARBIE1, (b) BaCBM2 protein, (c) protein alignment of both proteins.

Homo-Oligomer analysis

Based on our analysis using AlphaFold 3 (AF3), we concluded that both the original protein BaCBM2 and the engineered protein BARBIE1 are monomers in solution.

Proteins with carbohydrate-binding modules (CBM) can not only be found as single domains but also as part of larger multi-domain proteins. An example of this is a component of the Piromyces equi cellulase and hemicellulose complex NCP-1. Separately, the protein has a weak interaction with its ligands, but its interaction with two modules of the protein changes the ligand specificity.

In light of the importance of understanding the thermodynamics basis for structural composition, the newest version of AF3 was used. The model was used to optimize protein-protein interaction and test high-order oligomers. To effectively model the water filter system closer to reality, the first step was to predict the state of the proteins.

Although the optimized Barbie1 (B1-CBM) protein oligomer state was unknown and needed to be defined, the spidroin protein that forms the nanofiber has already been explored in literature. Therefore, the prediction of the oligomer state of the spidroin protein is not necessary.

Figure 5. Protein comparison of BARBIE1 (a), BaCBM2 (b), 1A3N (c), and the quaternary structure of 1A3N forming the deoxy human hemoglobin (d).
Figure 5. Protein comparison of BARBIE1 (a), BaCBM2 (b), 1A3N (c), and the quaternary structure of 1A3N forming the deoxy human hemoglobin (d).

The proposed proteins to be evaluated in Alpha Pulldown18 are the BARBIE1, BaCBM2, and 1A3N (Figures 5a, 5b, and 5c, respectively), which is a reference protein that forms a pocket in the middle region to store oxygen molecules. Since this reference protein was resolved with more subunits, it can serve as a baseline for the prediction of the other proteins.

About the predicted template modeling (pTM) values:

  • 🟩: If the value is above 0.5, the overall prediction for the complex may be similar to the actual structure;
  • 🟨: If the pTM value is below 0.5, the template might not be similar to the true structure, which is likely incorrect.

Concerning the interface predicted template modeling (iPTM) values:

  • 🟩: Values higher than 0.8 represent high confidence in prediction;
  • 🟨: Values between 0.6 and 0.8 are a gray zone in which it is unlikely to assure if the prediction is accurate or not;
  • 🟥: For values lower than 0.6, it is possible to ensure the prediction is incorrect when compared to the actual structure.

Table 2. pTM and iPTM values calculated by AlphaFold 3 for BARBIE1, BaCBM2, and 1A3N with different repeating units.

Subunits BARBIE1 (pTM) BaCBM2 (pTM) 1A3N (pTM) BARBIE1 (iPTM) BaCBM2 (iPTM) 1A3N (iPTM)
1 🟩 0.86 🟩 0.89 🟩 0.85 - - -
2 🟨 0.48 🟩 0.50 🟩 0.88 🟥 0.09 🟥 0.10 🟩 0.86
3 🟨 0.44 🟨 0.40 🟨 0.47 🟥 0.21 🟥 0.12 🟥 0.26
4 🟨 0.32 🟨 0.35 🟩 0.82 🟥 0.12 🟥 0.16 🟨 0.78
5 🟨 0.25 🟨 0.28 🟩 0.50 🟥 0.10 🟥 0.12 🟥 0.39
6 🟨 0.25 🟨 0.28 🟩 0.69 🟥 0.09 🟥 0.13 🟨 0.64
7 🟨 0.25 🟨 0.26 🟨 0.42 🟥 0.13 🟥 0.14 🟥 0.34
8 🟨 0.20 🟨 0.23 🟨 0.26 🟥 0.13 🟥 0.14 🟥 0.35
9 🟨 0.21 🟨 0.22 🟨 0.32 🟥 0.14 🟥 0.13 🟥 0.39
10 🟨 0.21 🟨 0.22 🟨 0.34 🟥 0.13 🟥 0.14 🟥 0.27

As it can be seen in Table 2, both BARBIE1, BaCBM2, and 1A3N achieved high scores as monomers. On the other hand, only the 1A3N as a multimer achieved higher values, such as its two and three subunits iPTM, as expected. Therefore, it is possible to affirm that it is very unlikely for both B1-CBM and BaCBM2 to form a multimer, which can be assured by the deoxy human hemoglobin results. This result allowed us to model B1-CBM as a monomer, simplifying the system. This is consistent when compared to the SEC-MAALS measurement on the Result page (link para https://2024.igem.wiki/cnpem-brazil/results em “results page”).

Electrostatic and Hydrophobicity assessment

Through analysis results, it was possible to define that BARBIE1 is more hydrophobic and has more negative charges compared to BaCBM2.

We assessed the electrostatic and hydrophobic surface of each protein using the ChimeraX tool, which can be valuable for identifying binding sites for ligands, as well as stability and its behavior when solvated.

On the upper part of Figure 6, both proteins are represented with the electrostatic surface. While on one hand the red parts stand for the negative electrostatic, the blue parts stand for the positive electrostatic. Therefore, it is notable a major presence of negative values for BARBIE1 when compared to BaCBM2, which may indicate a higher affinity with positive ions or positive charged ligands.

Figure 6. Electrostatic (first row) and hydrophobicity (second row) surface assessment of BARBIE1 (on the left) and BaCBM2 (on the right).
Figure 6. Electrostatic (first row) and hydrophobicity (second row) surface assessment of BARBIE1 (on the left) and BaCBM2 (on the right).

Following on, the hydrophobicity surface is shown in the lower part of Figure 6. By doing this, we could also understand the water interacting regions, corresponding to the more hydrophilic parts, and possible binding sites, generally indicated by hydrophobic parts. The hydrophilic regions are represented as cyan and the hydrophobic regions as yellow.

In general, while on one hand, hydrophilic regions are exposed to aqueous solvents in the exterior of the structure, hydrophobic regions are buried in its interior. When confronting both structures, we noticed an inverse behavior on the BARBIE1 structure compared to BaCBM2: the hydrophobic parts are not buried, but exposed to the surface.

Since the hydrophobic sites were exposed to the surface, it may point to a better understanding of our results. The better scores were achieved by choosing hydrophobic amino acids in the protein primary structure, which enabled the creation of more pockets and the subsequent increase of KD. Therefore, although choosing hydrophobic amino acids is better for a higher plastic affinity, it is also worse for water solubility.

Plastic Interaction through Molecular Dynamics

With a deeper in silico understanding of the BARBIE1 protein, we studied the interaction between the protein and plastic molecules through molecular dynamics. As a proof of concept of the system, only the polystyrene (PS) was analyzed in the experiments, as in other parts of the project.

Plastic Binding Proteins Interactions

With the initial molecular simulations, it was possible to assess fundamental interaction properties, such as how the proteins bind to the plastic molecule, its stability in time, and the interaction energy.

To comprehend the binding between BARBIE1 with PS, we used Charmm-GUI to create a solvated system. These results were then used as input to Gromacs19. Considering that building a microplastic particle is a very expensive task in computation terms, we used a plastic molecule instead.

The PS molecule was built with 10 repeating units, as shown in Figure 7. For PS there are three principal taticities formation, that is the extent to where the functional group is. Knowing that atactic taticity is the only commercially important form, with the phenyl group randomly distributed on both chain sides, we chose this form to create the in silico PS plastic molecule.

Figure 7. On the left, it is represented a single molecule of styrene. On the right, it is represented a molecule with 10 repeating units.
Figure 7. On the left, it is represented a single molecule of styrene. On the right, it is represented a molecule with 10 repeating units.

The simulated system reproduced the same used in the wet lab experiments, which was in room temperature (300 K), neutral pH (pH=7), and 0.15 mM of NaC salt concentration. As represented in Figure 8, the system is colored with its solvent as blue, BARBIE1 as magenta, the styrene molecule in gray and white, and the ions in green. The system configuration is set with periodic boundary conditions (PBCs) with an edge distance of 15 Å.

Figure 8. Representation of the solvated system of BARBIE1 in magenta, polystyrene in gray and white, and the water molecules in blue.
Figure 8. Representation of the solvated system of BARBIE1 in magenta, polystyrene in gray and white, and the water molecules in blue.

Once generated, after the simulations, we minimized and equilibrated the system. This standard process was made to adjust the temperature and pressure, remove initial artifacts, properly distribute the energy, and stabilize the molecular structure.

Therefore, we minimized the solution with 5.000 steps using the Verlet method for cut-off scheme, which is the short range interaction. Then, we equilibrated the system using an integration step of 0.001 and 125.000 steps, the Verlet method was used as a cut-off scheme, V-rescale as the temperature coupling, and the C-rescale as the pressure coupling. In this initial simulation step, the solution was heated to 303.15 K.

Aiming to compare the BARBIE1 and BaCBM2, another similar system was generated for BaCBM2. It is fundamental to note that since calculating MDs for large molecules, such as proteins, it is not viable using quantum mechanics. For this reason, we are not able to check the actual bind between protein and ligand, but it is possible to check its energies and behavior through classical dynamics.

The calculated systems results showed that both proteins quickly adhered to the plastic as a result of their affinity for it. To quantify this proximity between the proteins and PS, we calculated the minimum distance between plastic and protein for each simulation, as shown in Figure 9a.

A complementary analysis to the minimum distance is the root mean square deviation (RMSD). This method calculates the deviation trajectory through the simulation allowing us to elucidate the molecule or atom's stability over time, a useful result for understanding the protein-ligand interaction. In Figure 9b, we calculated the RMSD for the polystyrene molecule.

Figure 9. (a) Calculated distances (Å) between protein and the PS plastic molecule for both BARBIE1 and BaCBM2. (b) RMSD for both proteins over time.
Figure 9. (a) Calculated distances (Å) between protein and the PS plastic molecule for both BARBIE1 and BaCBM2. (b) RMSD for both proteins over time.

Interaction Energy Between Protein and Ligand

The energy interaction between the protein and the plastic was calculated using both molecular dynamics and experimental tests, showing close proximity between them.

An important piece of information about our system is the interaction energy between protein and ligand, which can elucidate our knowledge of their dynamics and give us data to model the filtration system used in our project. We used Gromacs MMPBSA21, a free energy calculator that considers the protein, the ligand, and the combination of both components to calculate this interaction.

From the MMPBSA output, we noticed that the energy (kcal/mol) varied through the simulation due to the protein-ligand interactions. It was also interesting to notice that the van der Waals contribution was considerably larger than the coulombic, due to the nature of the ligand molecules. To calculate the thermodynamic properties, we used the partition function considering these interaction energy values, defined as:

Equation 1

where Ei is the van der Waals interaction energy values in each simulation frame, k is the Boltzmann constant, and T is the system temperature. In general, this partition function can be applied to describe an equilibrated system’s statistical properties.

Then, the partition function was used for calculating the probability of each conformation from the molecular dynamics trajectory through the equation:

Equation 2

This process is represented in Figure 10a, in which the probability sum of all states is equal to one, and a notable peak represents the most probable energy state, corresponding to over 80%.

Finally, by multiplying the state probability for the van der Waals energy, we found the weighted van der Waals energies, in which the interaction energy is the sum of all possible energies. The least energy shown in Figure 10b as a molecular dynamics diagram represents the interaction energy state between BARBIE1 and the polystyrene plastic molecule.

Figure 10. Interaction energies calculation for BARBIE1 and Polystyrene. (a) plot of the van der Waals energy calculation. (b) Corresponding probability for each energy state, where the highest probability state is the most likely to occur, represented in the BARBIE1-PS molecular dynamics interaction.
Figure 10. Interaction energies calculation for BARBIE1 and Polystyrene. (a) plot of the van der Waals energy calculation. (b) Corresponding probability for each energy state, where the highest probability state is the most likely to occur, represented in the BARBIE1-PS molecular dynamics interaction.

As represented in Figure 10b, the most favorable energy can be divided by the interaction area between the plastic molecule and protein. For doing so, the protein, ligand, and complex area were calculated in VMD, making the energy/area ratio to be equal to -0.71 meV/Angstrom². This shows coherency and representativity in the result.

The same process was followed for BaCBM2, which allowed us to obtain an interaction energy of -0.44 eV. In parallel to the in silico results, our wet lab group also conducted experimental tests for comparison, utilizing the protein-corona information discussed on the Results page (link para https://2024.igem.wiki/cnpem-brazil/results na palavra “results page”).

Once we calculated the protein dissociation constant, we applied this result in the Gibbs energy formula, given by:

Equation 3

where R is the gas constant and T the temperature in Kelvin. Knowing that the resulting KD equals 6x10-12 M, the equation resulted in ΔG as -0.60 eV.

This is an interesting result considering the theoretical interaction is not so distant.

While the theoretical energy was calculated to a small PS molecule, the experimental result was obtained with a 100 nm PS particle. We then normalized each energy value according to the contact surface between each calculated result, which is a factor of 9.86 nm2 for the theoretical interaction and 13 nm2 to the experimental interaction surface of the protein with the ligand.

This allowed us to observe that while the theoretical normalized energy is -0.0446 eV, the experimental normalized energy is equal to -0.0459 eV, showing a great similarity between both!

Plastic Interaction in Different Conditions

By varying the protein-plastic solution conditions, it was possible to verify the stability of the molecular structure and resistance in the binding properties.

In the literature, it is already assessed the increase of affinity (Kd) between BaCBM2 and plastic due to the crystallinity arrangement of plastic. Studying similar behaviors for BARBIE1 is fundamental to understanding both molecular properties and how we can successfully implement the protein in the water filter. For example, if BARBIE1 decreases plastic affinity with the increase of temperature, it will be important for cleaning the filter after use.

Consequently, bearing in mind it is not easy to experimentally examine the molecular pattern, we performed new MDs in different conditions such as different pHs (1, 5, 7, and 9), and temperatures (30, 50, and 70 ºC). As seen in previous results, the polystyrene RMSD was stable during the simulation when it is close to the protein, which makes it a viable way to examine the affinity.

Firstly, the assessment of different pHs was made for BARBIE1 stability, which showed us an interesting trend in Figure 11a. Although we varied to acid or basic solutions, the BARBIE1 RMSD kept similar to a control simulation, which we created at 303 K and neutral pH. While there were subtable oscillations for each simulation, this result indicates there may be no stability difference for BARBIE, maintaining its structure.

Furthermore, the different temperature simulations can be seen in Figure 11b. Similarly to the pH variance, the temperature does not seem to influence the BARBIE1 structure. In other words, just like the pH, we hypothesized that the temperature does not influence the protein denovelation, demonstrating a high-temperature resistance.

Figure 11. RMSD analysis of BARBIE1. (a) RMSD calculated at different pH values. (b) RMSD calculated at different temperatures.
Figure 11. RMSD analysis of BARBIE1. (a) RMSD calculated at different pH values. (b) RMSD calculated at different temperatures.

It is important to note that, differently from previous simulations, the polystyrene molecule started next to BARBIE1. Since we want to check how the affinity would vary, we used as an initial state for the conditions simulation the last frame from the standard simulation in which BARBIE1 was close to the PS.

In Figure 12, the RMSD for polystyrene calculation also showed a constancy for the plastic molecule both for pH variation (a) and temperature (b). For this reason, since it is not changing through the simulation, it must still be attracted to the protein. In conclusion, we might expect the affinity to keep under temperature and pH change.

Figure 12. RMSD calculated for the polystyrene molecule in the BARBIE1 solution. (a) RMSD at different pH values. (b) RMSD at different temperatures.
Figure 12. RMSD calculated for the polystyrene molecule in the BARBIE1 solution. (a) RMSD at different pH values. (b) RMSD at different temperatures.

Spidroin Formation through Molecular Dynamics

Through molecular dynamics, it was possible to observe the formation of spidroin fiber in acid conditions.

Just as important as uncovering the plastic binding protein properties, it is also interesting to understand the developing mechanism behind the other fundamental part of our filter: the spidroin mesh. For this reason, in the following section, we seek to integrate experimental knowledge into the theoretical model for the creation of spidroin.

As already experimentally assessed in literature, it is known that Nt2RepCt fiber formation is given in a dimer state under an acid pH. This information was crucial for setting the systems to be simulated, which we stated as a control system in a neutral pH (pH=7) and the specified acid pH. The amino acids sequence was used in AlphaFold 3 to generate its tertiary structure.

Using similar conditions for simulation as used for BARBIE1, the spidroin system builder used NaCl as ions in a 0.15 mM concentration, 303.3 K as temperature, and a 20 Å distance to the edges. We used CHARMM3623 as the system’s force field to generate the system in two different pHs, 5 and 7.

As a result, Figure 13 shows the structure in each generated system. While on one hand we have spidroin at the neutral pH on (a), on the other hand we have the protein at the acid pH on (b). There is some notable difference between the structures that is already an indication for fiber formation.

Figure 13. Nt2RepCt generated by AF3 after solvating the system in neutral pH on (a) and in an acid pH (b).
Figure 13. Nt2RepCt generated by AF3 after solvating the system in neutral pH on (a) and in an acid pH (b).

After a simulation of 100 nanoseconds for each system, the final frame is represented in Figure 14. As already expected, the neutral pH could not open its structure and compose a fiber formation. Whereas, the acidified system was opened with time in a perceptibly visible configuration, in which the N terminal glycine and the C terminal lysine were colored in purple.

Figure 14. Final frame of Nt2RepCt simulation in neutral pH on (a) and acid pH on (b).
Figure 14. Final frame of Nt2RepCt simulation in neutral pH on (a) and acid pH on (b).

In order to quantify the difference between each system, we used VMD24 software to calculate the distance between the atoms of each terminal, as shown in Figure 15a. There is a visible distinction between each pH system, highlighting the fiber formation. While there is an average distance between residues of 63.6 Å for basic pH, the average distance for acid pH is 104.5 Å, which we can comprehend as an average fiber length. Although we assessed the temperature condition, in the results page (link para https://2024.igem.wiki/cnpem-brazil/results na palavra “results page”) we experimentally explored the fiber formation.

Furthermore, the root mean square deviation was also calculated and the result is exposed in Figure 15b. Compared to the initial structure, both systems found a minimum energy state, as it is possible to visualize by the linearity created through the time. Basically, the protein leaves from an initial to a stable point, in a way the lower the RMSD, the lower is the variation.

Figure 15. (a) Distance between N terminal of glycine and C terminal from lysine in basic and acid pH. (b) RMSD calculation for Nt2RepCt in both acid and neutral pH.
Figure 15. (a) Distance between N terminal of glycine and C terminal from lysine in basic and acid pH. (b) RMSD calculation for Nt2RepCt in both acid and neutral pH.

Just as we did in the BARBIE1 and polystyrene system, we modeled a dimer spidroin with a PS molecule. By simulating this system with similar conditions as described before, we could extract the interaction energy of the Nt2RepCt with PS. With respect to the plastic energy interaction, spidroin takes a longer time to interact with PS when compared to BARBIE1, which resulted in a lower interaction energy of -0.46 eV.

Protein Corona Formation

By combining computational simulations with an analytical model, we explored the relationship between protein concentration, protein-plastic interaction, and the size of microplastics with the dynamics of protein corona formation.

Although we were able to better understand how the plastic binding proteins bind to the plastic molecules through Molecular Dynamics, it is important to recognise that most microplastics in real life are not single chain molecules. In reality, microplastics are most likely to have a nanoparticle structure and be way bigger than single molecules, once it has multiple chains.

Thus, it creates a greater number of regions for proteins to bind, shaping an interesting structure with a protein layer around the particle. As shown on Figure 16, this phenomenon increases the hydrodynamic radius, which is a parameter that describes the size of an object in a fluid.

Figure 16. Representation of a protein corona formation between microplastic and plastic binding proteins.
Figure 16. Representation of a protein corona formation between microplastic and plastic binding proteins.

Knowledge about the formation of protein coronas on micro- and nanoplastic particles (MNPs) has been expanding, especially due to its relation to the toxicity of these particles and their dispersion in the environment. In the context of the BARBIE project, protein corona formation is an essential process for the proposed sensing methodology. This approach involves the preliminary formation of protein coronas, followed by the deposition of this MNP-protein complex on the sensor's working electrode for microplastic detection.

Regarding protein corona formation on nanoparticles, D. Dell'Orco et al. proposed a way to analyze this process similar to the study of chemical reaction kinetics. They considered a system with only two possible states: without corona formation (NP + Prot) and with corona formation (NP ⋅ Prot). The analysis then focused on the number of complexes (NP ⋅ Prot) formed over time.

Since our analysis aimed to study the protein-particle interaction more deeply and is based on experimental dynamic light scattering (DLS) (link para https://2024.igem.wiki/cnpem-brazil/results na palavra “dynamic light scattering (DLS)”), our team developed a model that allows the examination of the average behavior of the particles. This enabled an understanding of the dynamics of protein corona formation and connected the fiber formation dynamics with the binding results of the BaCBM2 and BARBIE1 proteins. Additionally, a computational simulation of the system was developed to enhance the visualization of the corona formation process and compare the simulation data with the theoretical model.

Experimentally, this refers to a water container containing a specific concentration of proteins and polystyrene particles. Generally, the environment is saturated with proteins, meaning the number of proteins is greater than the number of plastic particles. Additionally, the system is agitated, so the particles and proteins have some velocity. These factors were considered in the model construction, with MNPs and proteins approximated as spherical objects.

Computational Simulation

Through a computational simulation of a protein-corona solution using kinetics chemistry, a logistic growth was found, which can be expected from the experimental values.

For the computational simulation, a model of solid spheres in a two-dimensional space was used to describe the proteins and plastic particles. This model was implemented in Python and allows for a visualization of the system.

Methodology of simulation

Particle Definition

In the code, particles and proteins are represented as objects of the "Particle" class, where each particle has the following attributes:

  • Position: Defined by x and y coordinates stored in a numpy array, this attribute identifies where the particle is located within the system, as we are working in a two-dimensional space.
  • Radius: This attribute defines the size of the particle, influencing how it interacts with other particles and the system boundaries.
  • Velocity: Also defined by an "np.array" matrix, this determines how fast the particle moves through the system.
  • Mass: Calculated as the square of the particle's radius, this attribute determines the particle's inertia and how it behaves under external forces.
Initial Conditions for Particle Positions

The initial positions of the particles must be random and arranged so that they do not overlap. The approach chosen to solve this problem involves using a position matrix.

First, the space occupied by the particles was approximated to a square circumscribed around the particles. Based on this concept, the space of possible positions was represented as a matrix of potential particle generation positions, indicated by elements set to 0. The position of a particle was determined by randomly selecting an index containing a 0 in the matrix.

After selecting a position, the region equivalent to a square with sides equal to twice the particle diameter was filled with elements set to 1, limiting the generation of other particles in this area. For particles with different radii, the side of the square was determined by the sum of their diameters.

Particle Movement

The movement of particles is characterized by updating their positions over time. In each simulation iteration, particles were displaced by an amount proportional to their velocity and the defined time step ("dt"). Its was achieved using the "add(self, dt)" method of the "Particle" class, which adjusts the particle's position based on its velocity.

Particle Collision

Due to the size of the particles, when a two-dimensional collision occurs, the velocities were first decomposed into radial and tangential directions (Figure 17), using the rotation matrix.

Figure 17. Bidimensional particle collision diagram considering the relative velocity and position for each particle.
Figure 17. Bidimensional particle collision diagram considering the relative velocity and position for each particle.

After the collision, the velocities in the tangential direction remain constant, while the velocities in the radial direction change as in a one-dimensional collision, following the conservation of momentum.

Equation 4 Equation 5 Equation 6-1 Equation 6-2

These results allowed us to find the final velocities in both the radial and tangential directions. Subsequently, the inverse rotation matrix can be used to obtain the final velocity values in the x and y directions.

Particles Overlap

There are some errors associated with computational implementation, one of which is particle overlap. This overlap causes the particles to become "stuck" together throughout the simulation, occurring during the collision process.

Computationally, a collision is understood as a relationship between the distance between particles and their radii.

Equation 8

If, after the collision process, the particles do not have sufficient relative velocity to invalidate this equation, another collision will occur, causing the particles to move closer again. This leads the particles into a chain of collision processes, trapping them together.

To address this issue, a condition was imposed on the collision process to prevent the initiation of this chain of collisions. Among various possible implementations, one alternative involves the relative velocity between the particles in the radial direction.

Equation 9

In each collision, a probability of protein adsorption on the particle surface was assumed, which depends on the protein-particle interaction energy, the number of adsorbed proteins relative to the maximum coverage, and the constants α and β, which are related to the interaction geometry as well as the medium conditions. The expression used for the adsorption probability is given by:

Equation 10

After aggregation, the total momentum of the two bodies is conserved, so that the mass of the aggregate becomes the sum of the two particles, and the velocity of the aggregate is given by the equation:

Equation 11
Simulation of the Protein-Corona

In this way, the system can be simulated to evaluate the formation of aggregates. For this purpose, a GIF was generated to visualize the formation of a protein corona on plastic particles.

Figure 18. Protein-corona simulation. Blue represents the microplastics, whereas the plastic binding protein is represented in magenta. Once the PBP encounters the MP, there is a chance of aggregating in its surface and contributing to the protein-corona layer.

To compare with the experimental data generated by DLS, triplicate simulations were conducted for 20 different protein concentrations. To facilitate comparison with DLS, an estimation of the hydrodynamic radius of each aggregate at a specific time was made, considering the radius of the microplastics, the radius of the proteins, and the number of aggregated proteins using the expression:

Equation 12

The simulation data corresponding to the behavior of the hydrodynamic radius as the concentration of proteins in the medium increases is presented in Figure 20.

It is noticeable that the simulation allows for a visualization of the system and understanding of how the behavior observed in DLS can be explained at the particle interaction scale. However, this type of approach is somewhat limited in making estimates that closely resemble real data due to its constraints, such as modeling the system in only two dimensions.

Analytical Model

Once more finding a logistic growth from the analytical protein corona model, we reiterate an expected behavior from the experimental data, which was indeed found.

To address the gap in the simulation's ability to describe the system with parameters close to reality, an analytical model was constructed that allows for a three-dimensional approach to the system. This model also facilitates the analysis of different conditions without the high computational cost associated with simulations.

To understand the analytical approach, we can first consider a plastic particle suspended in water. Due to the relative motion between this particle and the proteins in the solution, over a small time interval, the particle moves a certain distance dS = <v> dt. During this movement, the number of proteins encountered along its path can be visualized using the concept of a "collision cylinder," which takes into account the sizes of both the plastic particle and the proteins.

Figure 19. Collision cylinder exemplification for microplastic-protein system.
Figure 19. Collision cylinder exemplification for microplastic-protein system.

The number of collisions between the plastic particle and the proteins over this time interval represents the rate at which these collisions occur. This rate depends on factors such as the concentration of proteins, the average relative speed between the proteins and the particle, and the combined sizes of the particle and the proteins.

In this way, the number of collisions (dN) between the particle and the proteins over time (dt), which refers to the particle-protein collision rate, is given by the equation:

Equation 13

Therefore, considering the assumed expression for the adsorption probability in a collision process, the variation in the number of proteins adsorbed onto a plastic particle can be described by the equation, which is an ordinary differential equation (ODE):

Equation 14

Thus, it is possible to solve this ODE to obtain the average number of proteins aggregated on the surface of the plastic particles over time n(t), as shown in the following equation:

Equation 15

Considering the hydrodynamic radius from the previous equation and working with a constant capture time, it is possible to obtain an analytical expression for the hydrodynamic radius (R_Total) as a function of the protein concentration, as shown by the following equation:

Equation 16

Based on this theoretical construction, the following values presented in Table 3 were considered.

Table 3. Protein-corona simulation constants.

Average radius of plastic particles (R_MP) 50 nm
Average radius of proteins (R_Prot) 1.25 nm
Maximum number of proteins per particle (n_max) 1600
Average relative velocity between particles and proteins (<v_rel>) 2 m/s
Protein-plastic interaction energy (ε) - 0.44
Alpha parameter (α) -0.44 eV
Beta parameter (β) 1e10 eV-1
Epsilon parameter (E) -0.44 eV

In this way, it becomes possible to compare the data generated by the computational simulation, the data from the analytical model, and the experimental data explored in results.

Figure 20. Generated data from the analytical model solving, (b) protein-corona creation from the computational simulation, and (c) the experimental data found from the protein titulation.
Figure 20. Generated data from the analytical model solving, (b) protein-corona creation from the computational simulation, and (c) the experimental data found from the protein titulation.

Statistical Models

In the present section we describe our developed model, in which our team was able to advance in the scientific knowledge of the parts involved in the B.A.R.B.I.E. project. From design, formation, and function of the parts, we assessed essential information to efficiently build a water filter.

With our designed system, it is notable that it can adsorb particles in its topology when passing water through it. For this reason, an interesting approach to model the water filtration is through thermodynamic modeling, in which we can calculate the adsorption energy from the interaction of each component and understand how the particles would react to the filtering surface. By combining experimental data with theoretical insights, we gained a more comprehensive understanding of the filtration mechanism than would be possible through either approach individually.

Therefore, as illustrated in Figure 21, our water filter was modeled in a two-step strategy: random walk and a canonical ensemble. While the first strategy was relevant for reaching out the movement of microplastics in the filter and its efficiency, the second strategy was fundamental to energetically assess the filter saturation. Once we could estimate the filter’s saturation and efficiency, we are able to estimate its viability in scalability and price expectations.

Figure 21. Model roadmap of B.A.R.B.I.E. Project. By integrating molecular dynamics and experimental values, the main purpose is to properly understand the filter viability.
Figure 21. Model roadmap of B.A.R.B.I.E. Project. By integrating molecular dynamics and experimental values, the main purpose is to properly understand the filter viability.

Filtering Microplastics

Using random walk based simulation, we evaluated the system filtration efficiency (E_0) in different scenarios, raising evidences of the filtration enhancement by using BARBIE1 in the process.

The random walk is a mathematical and statistical concept that describes the process of a sequential movement, in which every step is taken in a random way. Its main idea is that at each instant of time, an object realizes a movement in one direction or magnitude ruled by a probabilistic rule without a standard pattern.

In our case, we developed a random walk based model. It can be applied in the sense that the microplastics will be inserted in a filter with many layers that compose the hydrogel. Each microplastic in the system would have a probabilistic chance dependent on its interaction energy to be retained or to pass to the next layer. If the particle is not retained, it can randomly move due to eventual collisions inside the filter.

It is very important to state that although this technique might be very useful for assessing our filter method, further experimental information is required to properly approximate it to reality. Since many of these data are not trivial to be calculated, some assumptions will be made. Once we can find all of the information, the modeling can be refined.

As an initial point, it is also important to highlight that although we could not capture the hydrogel in the Transmission Electron Microscope, we can use well-established results from literature. The hydrogel structure using the Scanning Electron Microscope (SEM) is shown in Figure 22a, in which is visible the fiber’s structures creating pores. Thus, there is a chance for the plastic particles to be retained or to pass.

Figure 22. (a) shows the captured hydrogel on the Scanning Electron Microscope. On (b), the identified fibers in the image.
Figure 22. (a) shows the captured hydrogel on the Scanning Electron Microscope. On (b), the identified fibers in the image.

With this important information, we assessed the hydrogel porosity by summing all fiber lengths divided by the captured mesh, resulting in a 0.63 porosity.

Both fiber diameter and porosity were then integrated using Python, in which we created the filter simulation. The filter was simulated using the following pipeline:

Filter assemble:

Defining the grid size, porosity of the nanofibrils, layers count, pore limit size, and concentration of the plastic binding protein. This function creates an object that contains our filter. The components present are represented as values in our matrixes.

Microplastics creation:

For a specific amount of microplastics with a determined maximum size, this function creates particles without overlapping with a random size with MS as its maximum possible size of diameter.

Simulation run:

With the defined filter and microplastics positions, it is possible to run the simulation.

Considering in a first moment that the water filter would be only applied to domestic use, we can also suppose mostly nanoplastics would be present in the samples. Therefore, we used a particle size of 100 nm, which is understood as one of the most dangerous particles for human health.

This is a point of attention, since there is not enough information about microplastics concentration in drinking water with high precision, especially regarding their size. Moreover, with the proposed sensor in our project, we are going to better understand this problem and properly design the filter.

In Figure 23, we present the water filter mesh creation. Represented as a matrix as shown in Figure 23a, the idea is that each biological component is represented as an integer. We define that the pores, the fiber and BARBIE1 would be represented as integer values from 0 to 2, respectively. This encoding is shown on Figure 23b, where pores, fiber and BARBIE1 are represented as the green, yellow or pink squares in the mesh, respectively. With this representation, we created a logic for creating larger meshes taking in consideration the hydrogel properties.

Figure 23. Computationally designed mesh filter system representation. (a)  The matrix is numerically represented, whereas on (b) it is represented as an image.
Figure 23. Computationally designed mesh filter system representation. (a) The matrix is numerically represented, whereas on (b) it is represented as an image.

With this strategy, we were able to recreate the mesh captured on SEM, as shown on Figure 24. Likewise, it is shown in a simplified mode on Figure 23, the yellow parts represent the mesh’s pores, on pink the BARBIE1 protein, and on green the spidroin fiber. In this initial representation, we used a BARBIE1 concentration of 0.3. Figure 24c shows the length distribution of these fibers, with an average length of 52 nm, while Figure 24d presents the diameter distribution, resulting in an average diameter of 28 nm, which are valuable information for computationally reconstructing the mesh.

Figure 24. Filter Transmission Electron microscopy and computational representation. (a) the mesh’s fibers captured on TEM. (b)  Fiber length average and the fiber’s diameter (c). The computational reconstruction with these parameters is then built (d), with the 6-layer filter shown on (e) and each layer from (e-1) to (e-6).
Figure 24. Filter Transmission Electron microscopy and computational representation. (a) the mesh’s fibers captured on TEM. (b) Fiber length average and the fiber’s diameter (c). The computational reconstruction with these parameters is then built (d), with the 6-layer filter shown on (e) and each layer from (e-1) to (e-6).

Thereafter, we can proceed to the microplastic insertion. For this purpose, we need to calculate the on-site interaction energy given by the microplastic position in the system. The chosen approach was calculating the superficial interaction energy, which takes into account the contact area and interaction energy.

Equation 17

where:

  • E_int is the total interaction energy of the microplastic in a given system position;
  • ε_S-P and ε_B-P are the interaction energy between spidroin-plastic and BARBIE1-plastic, respectively;
  • A_S-P and A_B-P are the interacting surface areas of spidroin-plastic and BARBIE1-plastic, respectively.

If we consider ε_S-P*A_S-P as the spidroin interaction energy E_S and ε_B-P*A_B-P as the BARBIE1 interaction energy E_B, we simplify the interaction energy writing as E_int = E_S + E_B. With this term, we can calculate part of the system’s partition function.

In general, the partition function (Z) is the sum of all possible energetical states of a given system. For example, for this specific case, the microplastic can have two possible configurations: (i) when it is bound to the proteins Z_bind, and (ii) when it does not have any interaction, i.e. not bound Z_free.

With respect to the Z_free state term, we can comprehend it as the particle flow in the system. Since water is passing through the filter, we can argue that this is a combination of the particle’s movement and an energy term associated with a possible phase transition (the possible adsorption). In other words, the particle’s movement is the kinetic energy and the phase transition energy is the particle’s chemical potential.

Thus, we can propose Z_free as:

Equation 18

where μ_0 can be calculated as the polystyrene chemical potential multiplied by the particle’s volume and K_E as m*v2/2, considering a styrofoam density of 15 kg/m3.

The velocity term on the other hand can be calculated as v = h/ΔT - where h and ΔT is the filter’s height and the time variation, respectively. Knowing that the flow is given by ρ = V/ΔT, where V is the volume, we can divide it by the velocity term, finding out that ρ/v = V/h, which is the section area A_sec. Dividing both sides by ρ, it is possible to note that the velocity is calculated by the ratio between flow and the section area.

Thus, with only two possible states for the particles, the partition function Z can be calculated as:

Equation 19

with β representing 1/k_BT, in which k_B stands for the Boltzmann constant and T the temperature. With this, we can calculate the retention probability Pretention by the bound state function divided by the total partition function:

Equation 20

Dividing both numerator and divisor by the binded state function, we can then find the retention probability, which is similar to the Langmuir adsorption model and given by:

Equation 21

The resulting equation is a sigmoid function, in which the retention probability increases as far as the interaction energy decreases. As shown in Figure 25, we simulated the interaction energy of microplastics in the system, resulting in the left histogram. As shown, the values vary in a range of -200 and 150 eV.

Figure 25. Energy distribution of microplastics when in contact with the filter proteins on the left (a) and sigmoid function found for probability calculation on the right (b).
Figure 25. Energy distribution of microplastics when in contact with the filter proteins on the left (a) and sigmoid function found for probability calculation on the right (b).

Therefore, with the known energy range and the sigmoid function, we were able to calculate the retention result for each microplastic. Given a random value R_V between zero and one, we compute the retaining probability and compare the values. With this, we create the piecewise function:

Equation 22

At last, it is possible to simulate the microplastics flow through the filter. In Figure 26, the filter is divided in many layers, in which the microplastics (in light blue) are positioned as shown on Figure 26d. On Figure 26b, a zoomed vision of the mesh is represented to illustrate the binding between microplastic and BARBIE1 (a) and Nt spidroin (b).

Figure 26. Microplastics filtration in the proposed system is represented in multiple layers. On (b), it is shown the zoomed version of the mesh with (a) and (c) representing the binding between MPs-BARBIE and MPs spidroin, respectively.
Figure 26. Microplastics filtration in the proposed system is represented in multiple layers. On (b), it is shown the zoomed version of the mesh with (a) and (c) representing the binding between MPs-BARBIE and MPs spidroin, respectively.

As it is possible to note, with six layers, 83 ± 2 % of the imputed microplastics are retained. Although it is an already interesting result, it is very important to note that each of these layers are extremely thin, with an estimated size around 100 nm. Thus, if we consider the final hydrogel for the filter as a parallelepiped with an height of 1 mm and 5 cm of width, the total layers amount would be of 100.000.

Moreover, an interesting approach for the designed system is changing the plastic binding protein property. The first possible study is removing the plastic binding protein in order to evaluate how it behaves without our differential, that is the BARBIE1. Secondly, we can also try to evaluate how it would behave using the original protein instead, namely BaCBM2.

On Figure 27 both experiment’s results are shown. The first experiment is testing the microplastics filtration without the PBP (BARBIE1), using only spidroin for retaining. On Figure 27a, the meshes comparison is represented with the captured particles plot. Figure 27b is showing that although the water filter shows a great potential for filtering particles, using BARBIE1 indicates a higher efficiency, actually justifying the integration.

Figure 27. Simulations employing different conditions. The mesh with and without PBP (BARBIE1)  representations (a-1) and (a-2) and the estimated efficiency (b). The mesh with BARBIE1 and BaCBM2  representations (c-1) and (c-2) and the estimated efficiency (d).
Figure 27. Simulations employing different conditions. The mesh with and without PBP (BARBIE1) representations (a-1) and (a-2) and the estimated efficiency (b). The mesh with BARBIE1 and BaCBM2 representations (c-1) and (c-2) and the estimated efficiency (d).

At last, it was also evaluated how the porosity of the hydrogel alteres the microplastics filtration. Although we assessed important hydrogel mesh properties using SEM, it is worth noting that the porosity is dependent on the protein’s concentration and can vary widely. Thus, understanding the porosity influence in the water filter efficiency is fundamental.

Represented on the left of Figure 28a, the illustrated porosities are 0, 0.2, 0.4, 0.6, 0.8, and 1. Considering porosities from 0 to 1 with a 0.1 increment, the microplastics filtered were counted and are plotted on Figure 28b. Knowing that a higher porosity indicates a larger amount of pores, it creates lesser retaining sites for the microplastics. And, as expected, with the increase of porosity, the efficiency decreased. The result shown on Figure 28b clarifies the inverse correlation between the porosity and particle retention.

Figure 28. Microplastics filtration evaluation regarding the porosity. (a) the filter with different porosities from 0 to 1 and (b) the MPs filtration count for each system.
Figure 28. Microplastics filtration evaluation regarding the porosity. (a) the filter with different porosities from 0 to 1 and (b) the MPs filtration count for each system.

Finally, for further modeling, we simulated our standard model i.e 6 mesh layers with a 0.3 BARBIE1 concentration, and 0.63 as porosity - 1.000 times, which resulted in an efficiency of 0.83 ± 0.02%.

In conclusion, by integrating both experimental and theoretical results into the model, we assessed different filter parameters - such as the plastic binding proteins, porosity, and number of layers - affect the filtration efficiency. This approach provides a deeper understanding of our own proposal and also helps in the practical implementation of the water filter. For instance, knowing the initial filtration efficiency enables us to predict how it changes over time, allowing us to estimate when the filter needs replacement before it loses effectiveness, thus preventing the consumption of microplastics!

Microplastics Concentration for Saturation

Through simulation, we discovered that for a 1 µm mesh, the saturation point is reached with a total of 28 microplastic particles, each with a diameter of 100 nm.

Based on the previous model, it's important to note that the number of available retention sites decreases over time as the microplastics particles occupy them. Eventually, the system will reach a saturation point where it can no longer filter additional microplastics.

Therefore, knowing there is a concentration limit for the filter, regular cleaning of the filter is necessary to maintain its effectiveness. If corrective maintenance is used (cleaning or replacing the filter only after it fails), it requires maintaining a large inventory of replacement materials, as it’s difficult to predict when the filter will need attention. This approach is inefficient. However, by properly modeling the filter’s service life, we can predict when maintenance is required, allowing for a more proactive and efficient upkeep schedule.

Predictive maintenance is particularly valuable from a circular economy perspective!

It optimizes inventory management, reduces waste, improves energy efficiency, lowers production costs, and facilitates upcycling processes.

To effectively determine the filter's saturation point, we can apply statistical physics to consider all possible system configurations. Since nature tends to favor energy minimization, given a certain amount of microplastics, we can explore different arrangements to find the lowest-energy, most probable configuration. The average number of microplastics in this configuration is the most likely to be observed in real life conditions.

One viable method to assess this is by creating a canonical ensemble, a concept from statistical mechanics that describes a physical system with controlled properties like particle number, volume, and temperature. Similar to our previous model, the canonical ensemble calculates the partition function Z. By accounting for interaction energies, we can determine the probabilities of different system configurations using the Boltzmann distribution, where lower-energy configurations are more likely to occur than higher-energy ones.

However, it is computationally expensive to calculate all possible arrangements for a large system like the one we designed. This is the reason why we chose two important settings for our calculation: (i) considering a small system and (ii) using Monte Carlo method for assessing configurations in a random way.

Essentially, the Monte Carlo (MC) method calculates random position states for the particle. By calculating the energy of a possible new state, it can be compared to the actual system. If the new energy is less than the previous one, the actual system is now the tested one. This strategy has two important advantages, which is (i) flexibility for many different problems, and (ii) simulating complex systems.

Similarly to the previous simulation, we defined the partition function considering free state using the free energy term and the bound state as the interaction energy. However, as we now need to calculate the thermodynamic stability, we can also calculate the Helmholtz free energy (F). The function allows us to comprehend the probability for each state to occur.

Equation 23

where k_B is the Boltzmann constant, T is the temperature, and Z the partition function.

With the set up system, we run a total of 2.000 systems with 50 steps for trying to insert microplastics into the system. The energy distribution from the simulation can be seen on Figure 29a, showing a range from -725 to -625 eV with a peak at -650 eV. This interaction energy distribution is related to the system configuration and the respective particle’s arrangement.

For this reason, the system within each bin of the histogram can either be similar or differ within the same group. To better assess this variability, we calculated the RMSD (root-mean-square deviation) values, as shown in Figure 29b. In the upper histogram, the energy distribution is displayed as in Figure 29a, while the right-hand histogram shows the RMSD distribution. Since we are considering a mesh size equivalent to 100 nm, the RMSD is measured in angstroms.

With a peak around 13.2 Å, we observe that the average contrast between the meshes is not significantly high. In contrast, at lower energy levels, the RMSD values show only slight increases, whereas at higher energy levels, they show slight decreases, indicating strong consistency across the simulated meshes.

To further illustrate this analysis, three different energy values are plotted. Comparing figures 29 (a-1) through (a-3), we can conclude that although the meshes may appear similar, the positioning of the microplastics is crucial to the overall energy. In (a-1), the particles are visibly more dispersed, while in (a-3), they are more aggregated. This demonstrates that when fewer binding sites are left unoccupied, the calculated energy is higher due to increased interaction energy.

Figure 29. Monte Carlo simulation result. On (a), the energy distribution according to its interaction energy with (a-1) to (a-3) illustrates three different energy states, and on (b) the RMSD jointplot showing the similarity between systems according to the energy distribution.
Figure 29. Monte Carlo simulation result. On (a), the energy distribution according to its interaction energy with (a-1) to (a-3) illustrates three different energy states, and on (b) the RMSD jointplot showing the similarity between systems according to the energy distribution.

Finally, with the calculated system, it is possible to analyze the microplastics distribution in the simulated meshes. On Figure 30a, the particle count distribution shows an average of around 27 to 30 particles, suggesting that the filter tends to stabilize at this concentration. Using the mesh energy, the Helmholtz Free Energy is then calculated, which reveals a minimum free energy occurring within this same particle range. This indicates that the system naturally stabilizes around this concentration due to energy minimization.

Figure 30. On (a), the microplastics count distribution is shown and on (b) the Helmholtz free energy in function of the particles count, showing a parabolic tendency.
Figure 30. On (a), the microplastics count distribution is shown and on (b) the Helmholtz free energy in function of the particles count, showing a parabolic tendency.

An interesting behavior appears naturally in the plot, which is a parabolic tendency. In statistical physics, this method is very used to simplify thermodynamic properties of complex systems for its equilibrium states. Fitting a quadratic function highlights this behavior, allowing us to calculate the minimum point for the function, which is equal to 28 microplastics particles.

In light of the modeling results, it is possible to affirm this approach was effective for assessing our system and its viability. Understanding the saturation concentration is fundamental for performing maintenance before the filter loses its ability to remove microplastics. Additionally, the model is designed to be rerun with experimental data, allowing for improved accuracy and alignment with real-world conditions. It can also serve as a valuable reference and inspiration for other iGEM teams working on similar projects.

Integrated Filter Modeling

Understanding the functioning of the B.A.R.B.I.E. filtration system and the influence of parameters on the filter's behavior is fundamental.

As a conclusion, the proposed mathematical model for the filter uses the results obtained from statistical models and atomic simulations in order to robustly and well-foundly describe the filter's operation, resulting in important information for the application and understanding of the project's feasibility.

To understand the filter modeling, we can first start by analyzing the total number of plastic particles retained in the filter (N). At the initial moment, this value should be zero, and over time, this value will increase at a rate of dN/dt. This rate of increase is related to the filter's efficiency over time (E(t)), as well as to the rate of microplastics passing through the filter, which can be given by the product of the water flow rate (v) and the concentration of microplastics in the water (C_MP). Thus, the general expression is given by:

Equation 24

Furthermore, the filter's efficiency should depend on the initial efficiency (E_0) and vary over time according to the saturation of the filter, that is, the number of retained particles (N) and the maximum number of particles (N_max), so that the expression can be given by:

Equation 25

Thus, substituting the term E(t) in the first equation, it is possible to find an ordinary differential equation in which the variation of the number of retained particles depends on the number of retained particles itself:

Equation 26

This expression basically describes the dynamics of the filter and has some interesting characteristics. First, it is possible to note that the term for initial efficiency (E_0) refers to the efficiency of the ideal filter and can be determined by the statistical capture model, that is, it depends on parameters such as the interaction energy between the CBM protein and the plastic, the interaction energy between the fiber and the plastic, the porosity of the filter, the average size of MPs, and the diameter of the fiber.

Second, the term for the maximum number of particles (N_max) can be found through the statistical physics model, meaning it not only depends on the interaction energy but also on the system's temperature and the chemical potential of MPs in water. Finally, the term for the water flow rate (v) is related to the porosity of the filter. The analytical solution for this expression is given by:

Equation 27

It can be noted that the amount of microplastics retained increases over time, but the retention rate decreases, such that the total number of retained particles tends to the maximum value of the system in equilibrium (N_max), which corresponds to the saturation of the filter.

Figure 31. Number of microplastics retained in the filter over time.
Figure 31. Number of microplastics retained in the filter over time.

It is also possible to find how the system's efficiency varies over time, where the efficiency decay over the filter's usage time is evident. From an application perspective, this result is excellent, as it would allow the maintenance of the filtration system based on the efficiency of microplastic capture.

Equation 28
Figure 32. Capture efficiency of the filter over time.
Figure 32. Capture efficiency of the filter over time.

Finally, to estimate the effectiveness of the filtration system, it is possible to describe the total number of filtered particles (N_f), considering that the filtration system should be replaced when the efficiency drops below the accepted limit (E_f). The expression is given by:

Equation 29
Figure 33. Total number of particles captured depending on the accepted efficiency.
Figure 33. Total number of particles captured depending on the accepted efficiency.

Thus, it is possible to define a limit efficiency of 0.5, meaning the minimum acceptable for the filtration system is to capture at least half of the microplastics in the water. The filter should be replaced once this value is reached. Based on the initial efficiency and saturation of the filtration system calculated, it is possible to determine that the system would be able to filter approximately 2.7e14 plastic particles per filter.

From laboratory to industry

The final goal for our project is impacting people’s lives and help them avoid consuming microplastics and its contaminants. Thus, it is fundamental to think in a scalability process to reach more people in an accessible way.

Kinetics model of protein production

Through the modeling of protein production kinetics, we obtained parameters that can guide the optimization of production and inform scaling strategies for larger systems.

In this section, we describe the mathematical framework and computational model developed to simulate the production of proteins in bench scale, following the Monod growth kinetics. The model was designed to estimate the dynamics of cell growth, substrate consumption, protein production, and the impact of purification steps. These aspects are crucial for future scaling-up models of the production process in an industrially relevant environment. Below, we outline the key equations and their implementation in our model.

The growth of bacterial cells follows the Monod equation, which describes how the specific growth rate (μ) depends on the substrate concentration. The Monod equation is expressed as:

Equation 30

where μ_{max} is the maximum specific growth rate, S represents the substrate concentration, and K_S is the substrate saturation constant. For our model, we assumed a μ_{max} of 0.6 h-1 and a K_S of 0.1 g/L, which are typical for E.coli BL21 grown in glucose.

As cells grow, their biomass concentration (X) changes over times according to:

Equation 31

This equation captures the rate of change of biomass, where the growth rate μ is dependent on the availability of substrate.

Simultaneously, the consumption of substrate is directly related to the production of biomass. The substrate consumption rate is described by:

Equation 32

Here, Y_{X/S} is the yield coefficient, which represents the efficiency of conversion from substrate to cells. In our model, we set Y_{X/S} = 0.5 grams of cell per gram of substrate, meaning that for every gram of substrate consumed, 0.5 g of cells are produced, this value is known for E. coli BL21 growing on glucose.

Protein production is initiated once the cell concentration reaches a critical point, corresponding to an optical density at 600 nm (OD600) of 0.8. This is approximately equivalent to a cell concentration of 0.184 g/L, based on established relationships between OD600 and cell biomass. At this point, the system is induced to produce the desired protein using a chemical inducer such as IPTG. The rate of protein production is proportional to the rate of biomass growth and can be expressed as:

Equation 33

where Y_{P/X} is the yield of grams of protein per gram of cells, set to 0.5 in this model, and k_{deg} = 0.1 h-1 represents the rate of protein degradation. These values were estimated based on typical values for similar systems. However, they can be fine-tuned and optimized by conducting experimental measurements specific to the target protein in our system. By gathering experimental data on the actual yield and degradation rates under the chosen production conditions, we can further refine the model to increase its accuracy and provide more precise predictions of protein production and resource utilization. This experimental validation is crucial to ensure that the model reflects the real behavior of the system when scaled up for industrial production.

Before induction (i.e., when X < X_{induction}), protein degradation occurs, but there is no net production. Once induction occurs, protein begins to accumulate in the system, with some loss due to degradation over time. In Figure 34, we observe the dynamics of cell growth, substrate consumption, and protein production over time in a batch culture system. The bacterial cell concentration (X) increases until it reaches the induction point, which corresponds to an OD600 of 0.8. At this point, protein production (P) is initiated, while substrate concentration (S) continues to decrease as it is consumed by the growing cells. After induction, protein production reaches a maximum and subsequently declines due to degradation, while cell growth reaches a plateau.

Figure 34. Dynamics of cell growth, substrate consumption, and protein production in a batch culture. Cells (X) grow and consume substrate (S) until the point of induction (dashed line), after which protein (P) is produced. The degradation of protein is also accounted for, leading to a decrease in protein concentration over time.
Figure 34. Dynamics of cell growth, substrate consumption, and protein production in a batch culture. Cells (X) grow and consume substrate (S) until the point of induction (dashed line), after which protein (P) is produced. The degradation of protein is also accounted for, leading to a decrease in protein concentration over time.

Following the production phase, we simulate the protein purification process, which consists of several stages, each with a specific recovery efficiency. First, the cell lysate undergoes a clarification step to remove cell debris. In our model, we assume a 2% loss of total protein during this stage. After clarification, the clarified lysate is processed through affinity chromatography, where the protein binds to a resin. This step typically recovers about 90% of the protein. The total binding capacity of the resin is defined by the product of its capacity (Q) and volume (V_{R}), as in:

Equation 34

In our model, we use a resin with a capacity of 10 mg/mL and a volume of 5 mL, providing a total binding capacity of 50 mg. The number of cycles required to process the entire protein load is determined by the total protein present and the capacity of the column:

Equation 35

Each cycle is assumed to take approximately one hour, and the total time for affinity chromatography is the product of the number of cycles and the time per cycle. The overall purification efficiency is the product of the recovery efficiencies of each stage:

Equation 36

Thus, the final amount of purified protein is determined by the combined efficiency of these steps.

This kinetic model was developed to describe protein production at the laboratory scale. The process follows a standard lab approach, including affinity purification, which is commonly employed in small-scale experiments.

Note: While this model was designed for bench-scale simulations, it could be adapted for large-scale production. However, adjustments would be necessary in the purification process, as the affinity purification techniques used in the laboratory are not directly scalable to industrial volumes without significant modifications.


Future directions

The developed kinetic model outlines the steps to estimate costs and optimize production for scaling protein synthesis from lab to industrial scale.

Scaling up the production of proteins from laboratory to industrial scale is a complex task. Understanding and predicting the costs associated with each step is critical for economic feasibility. The model we have developed provides a robust framework for estimating these costs considering the production process, from biomass generation to protein purification. By modeling the dynamics of cell growth, substrate consumption, and protein production, we can predict yields and link these predictions to the materials, reagents, and equipment required at each step of the process.

The complexity of industrial protein production means that the total cost is influenced by several factors, including the efficiency of resource utilization, the duration of each phase, and the recovery rates in downstream purification. Therefore, future accurate modeling of these variables on a larger scale will help to identify potential bottlenecks, optimize production, and estimate the financial investment needed for the scaling-up.

References

1 Yang, Z., Zeng, X., Zhao, Y. et al. AlphaFold2 and its applications in the fields of biology and medicine. Sig Transduct Target Ther 8, 115 (2023). https://doi.org/10.1038/s41392-023-01381-z
2 Rennison, A. P., Westh, P., & Møller, M. S. Protein-plastic interactions: The driving forces behind the high affinity of a carbohydrate-binding module for polyethylene terephthalate. Sci. Total Environ. 870, 161948 (2023). https://doi.org/10.1016/j.scitotenv.2023.161948
3 McNutt, A.T., Francoeur, P., Aggarwal, R. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021). https://doi.org/10.1186/s13321-021-00522-2
4 Pettersen, E.F., Goddard, T.D., Huang, C.C., Meng, E.C., Couch, G.S., Croll, T.I., Morris, J.H. & Ferrin, T.E. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021). https://doi.org/10.1002/pro.3943
5 Dauparas, J., Lee, G.R., Pecoraro, R., An, L., Anishchenko, I., Glasscock, C. & Baker, D. Atomic context-conditioned protein sequence design using LigandMPNN. bioRxiv https://doi.org/10.1101/2023.12.22.573103 (2023).
6 Weber, J., Petrović, D., Strodel, B., Smits, S.H.J., Kolkenbrock, S., Leggewie, C. & Jaeger, K.E. Interaction of carbohydrate-binding modules with poly(ethylene terephthalate). Appl. Microbiol. Biotechnol. 103, 4801–4812 (2019). https://doi.org/10.1007/s00253-019-09760-9
7 Koelmans, A. A. et al. Microplastics in freshwaters and drinking water: Critical review and assessment of data quality. Water Research 155, 410–422 (2019).
8 Cappelletti, S., Piacentino, D., Fineschi, V., Frati, P., D’Errico, S., & Aromatario, M. Mercuric chloride poisoning: symptoms, analysis, therapies, and autoptic findings. A review of the literature. Critical Reviews in Toxicology (2019).
9 Needleman, H. L. & Bellinger, D. The Health Effects of Low Level Exposure to Lead. Annual Review of Public Health 12, 111–140 (1991).
10 Mohammed Abdul, K. S., Jayasinghe, S. S., Chandana, E. P. S., Jayasumana, C. & De Silva, P. M. C. S. Arsenic and human health effects: A review. Environmental Toxicology and Pharmacology 40, 828–846 (2015).
11 Van Bruggen, A. H. C. et al. Environmental and health effects of the herbicide glyphosate. Science of The Total Environment 616-617, 255–268 (2018).
12 Argaluza, J. et al. Environmental pollution with psychiatric drugs. World Journal of Psychiatry 11, 791–804 (2021).
13 Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024). https://doi.org/10.1126/science.adl2528
14 Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). https://doi.org/10.1093/bioinformatics/bts565
15 Flint, J., Nurizzo, D., Harding, S.E., Longman, E., Davies, G.J., Gilbert, H.J. & Bolam, D.N. Ligand-mediated dimerization of a carbohydrate-binding module reveals a novel mechanism for protein–carbohydrate recognition. J. Mol. Biol. 337, 417–426 (2004). https://doi.org/10.1016/j.jmb.2003.12.081
16 Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). https://doi.org/10.1038/s41586-024-07487-w
17 Andersson, M., Jia, Q., Abella, A. et al. Biomimetic spinning of artificial spider silk from a chimeric minispidroin. Nat. Chem. Biol. 13, 262–264 (2017). https://doi.org/10.1038/nchembio.2269
18 Yu, D., Chojnowski, G., Rosenthal, M. & Kosinski, J. AlphaPulldown—a python package for protein–protein interaction screens using AlphaFold-Multimer. Bioinformatics 39, btac749 (2023). https://doi.org/10.1093/bioinformatics/btac749
19 Kern, N.R., Lee, J., Choi, Y.K. et al. CHARMM-GUI Multicomponent Assembler for modeling and simulation of complex multicomponent systems. Nat. Commun. 15, 5459 (2024). https://doi.org/10.1038/s41467-024-49700-4
20 Abraham, M.J., Murtola, T., Schulz, R., Páll, S., Smith, J.C., Hess, B. & Lindahl, E. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015). https://doi.org/10.1016/j.softx.2015.06.001
21 Maul, J., Frushour, B. G., Kontoff, J. R., Eichenauer, H., Ott, K.-H., & Schade, C. Polystyrene and Styrene Copolymers. Ullmann’s Encyclopedia of Industrial Chemistry. doi:https://doi.org/10.1002/14356007.a21_615.pub2
22 Kumari, R., Kumar, R., Open Source Drug Discovery Consortium & Lynn, A. g_mmpbsa—A GROMACS tool for high-throughput MM-PBSA calculations. J. Chem. Inf. Model. 54, 1951–1962 (2014). https://doi.org/10.1021/ci500020m
23 Andersson, M., Jia, Q., Abella, A. et al. Biomimetic spinning of artificial spider silk from a chimeric minispidroin. Nat Chem Biol 13, 262–264 (2017).
24 Huang, J. & MacKerell, A.D. Jr. CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J. Comput. Chem. 34, 2135–2145 (2013). https://doi.org/10.1002/jcc.23354
25 Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996). https://doi.org/10.1016/0263-7855(96)00018-5
26 Dawson, A.L., Bose, U., Ni, D. et al. Unravelling protein corona formation on pristine and leached microplastics. Microplast. Nanoplast. 4, 9 (2024). https://doi.org/10.1186/s43591-024-00086-6
27 Dell'Orco, D., Lundqvist, M., Oslakovic, C., Cedervall, T. & Linse, S. Modeling the time evolution of the nanoparticle-protein corona in a body fluid. PLoS One 5, e10949 (2010). https://doi.org/10.1371/journal.pone.0010949
28 Doran, P. M. Bioprocess Engineering Principles. Elsevier eBooks (Elsevier BV, 2013). doi:https://doi.org/10.1016/c2009-0-22348-8
29 Lytle, D. A. et al. Lead Particle Size Fractionation and Identification in Newark, New Jersey’s Drinking Water. Environmental Science & Technology 54, 13672–13679 (2020).
30 Ali, N., Katsouli, J., Marczylo, E.L., Gant, T.W., Wright, S. & Bernardino de la Serna, J. The potential impacts of micro-and-nano plastics on various organ systems in humans. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2023.161948 (2023).
31 H. Höcker, Shih, H. & Flory, P. J. Thermodynamics of polystyrene solutions. Part 3.—Polystyrene and cyclohexane. Transactions of the Faraday Society 67, 2275–2281 (1971).