Model | KULeuven - iGEM 2024

Project workflow depicting movement from literature search to structure prediction to optimization

Literature research: Finding proteins that naturally bind metals

1. Selection of Target Proteins

The first step in our dry lab workflow was to identify metal-binding proteins that could be used in the wet lab and serve as targets for further optimization through software prediction and simulation. An initial round of selection was performed on a group of proteins found through literature analysis, and their structures were predicted with AlphaFold [1]. We then narrowed the list using the confidence of the predicted structure, the evolutionary distance from E. coli, and the target metal ligands. E. coli has its own native metal-binding protein, ModA, which binds metals that are practical under iGEM lab conditions (Mo and W). Although the literature confirms which metals bind to ModA, further dry lab modeling required a tool that could give us a quantitative baseline for binding. Unfortunately, we could not find a tool that predicts Mo or W binding, because most tools are trained on data for proteins that bind more common metals such as Cu and Zn. We therefore pivoted to the proteins Csp1 (Cu binding, from Methylosinus trichosporium) and SmtB (Cd and Zn binding, from Synechococcus sp. PCC 7942).
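As an illustration of the confidence filter, the sketch below averages the pLDDT score over the C-alpha atoms of a predicted structure and keeps candidates above a cutoff. It assumes the prediction is available as PDB-format text with pLDDT stored in the B-factor column; predictions delivered as mmCIF would need a different parser. The function names and the cutoff of 70 are our own illustrative choices, not values taken from the project.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column over C-alpha atoms.

    Assumption: AlphaFold-style PDB text, where the B-factor field
    (columns 61-66) holds the pLDDT confidence score.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB records: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return sum(scores) / len(scores)


def shortlist(candidates, min_plddt=70.0):
    """Return the names of candidates whose mean pLDDT meets the cutoff.

    `candidates` maps a protein name to its PDB text; the cutoff is hypothetical.
    """
    return sorted(name for name, pdb_text in candidates.items()
                  if mean_plddt(pdb_text) >= min_plddt)
```

Calling `shortlist({"Csp1": csp1_pdb, "SmtB": smtb_pdb})` with the PDB text of each prediction would return the names that pass the cutoff, in alphabetical order.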

Determine structure with AlphaFold and metal binding capacity with MIB2

2. Assessment of Baseline Metal Binding

Lu et al. [2] developed MIB2, a metal ion-binding site prediction and modeling server. It allowed us to input amino acid sequences, identify residues potentially involved in metal ion binding, and simulate docking of the metal ion within the protein's three-dimensional structure.
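Server predictions like these can be sanity-checked against basic coordination chemistry. The sketch below is not the MIB2 algorithm; it is a generic heuristic that lists residues whose side chains commonly coordinate metal ions (Cys, His, Asp, Glu) and flags predicted positions that fall outside that set. The function names are hypothetical.

```python
# Side chains that commonly coordinate metal ions: Cys, His, Asp, Glu.
# Generic chemical heuristic only; this is NOT how MIB2 predicts sites.
COORDINATING = set("CHDE")


def candidate_sites(sequence):
    """Return 1-based (position, residue) pairs for typical coordinating residues."""
    return [(pos, aa)
            for pos, aa in enumerate(sequence.upper(), start=1)
            if aa in COORDINATING]


def unsupported(predicted_positions, sequence):
    """Return predicted positions (1-based) whose residue is not a typical metal ligand."""
    typical = {pos for pos, _ in candidate_sites(sequence)}
    return sorted(pos for pos in predicted_positions if pos not in typical)
```

A position reported by a predictor but returned by `unsupported` is not necessarily wrong, since backbone atoms and other side chains can also take part in binding, but it is worth a closer look.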

Selective mutation of proteins with ProteinMPNN

3. Generation of Optimized Sequences

Based on the MIB2 results, we used ProteinMPNN, a deep learning tool from Dauparas et al. [3], to design new protein sequences. We instructed ProteinMPNN to keep the binding residues identified in the previous step unchanged while allowing mutations anywhere else in the sequence. Ideally, some of these mutations would act as gain-of-function mutations, improving metal binding through greater structural stability and additional molecular contacts with the ligand. With ProteinMPNN, 10 mutated sequences were generated for our target proteins, and their structures were predicted with AlphaFold.
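The constraint described above can be expressed and checked in a few lines. The sketch below builds a fixed-positions record and verifies that a designed sequence kept the wild-type residue at every fixed position. The record layout (design name, then chain, then 1-based positions) is our assumption about the dictionary that ProteinMPNN's fixed-positions helper script writes and should be checked against the repository. The function names are hypothetical.

```python
import json


def fixed_positions_record(name, chain, positions):
    """Build one JSON line: design name -> chain -> sorted 1-based fixed positions.

    Assumption: this mirrors the layout used by ProteinMPNN's helper
    script for fixed positions; verify before passing it to the tool.
    """
    return json.dumps({name: {chain: sorted(set(positions))}})


def mutations(wild_type, design):
    """List substitutions such as 'A3G' (wild-type residue, 1-based position, new residue)."""
    if len(wild_type) != len(design):
        raise ValueError("sequences must have equal length")
    return [f"{w}{pos}{d}"
            for pos, (w, d) in enumerate(zip(wild_type, design), start=1)
            if w != d]


def keeps_fixed(wild_type, design, positions):
    """True if the design carries the wild-type residue at every fixed position."""
    return all(wild_type[pos - 1] == design[pos - 1] for pos in positions)
```

Running `keeps_fixed` on every generated sequence before structure prediction is a cheap way to confirm that the binding residues were preserved.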

Evaluate best mutated sequences, i.e. those with high metal binding capacity, solubility and stability, and send to wet lab for experimental verification.

4. Verification of Optimization

The plan was then to compare the quantitative MIB2 results for the wild-type proteins with those for the newly generated mutated constructs. At this point, however, the MIB2 server had crashed, so we could not verify our results with it. To continue the workflow while MIB2 was unavailable, we used other tools, namely mebipred (Aptekmann et al. [4]) and LMetalSite (Yuan et al. [5]), to look for any improvement over wild type. Based on these results, we sent specific mutated sequences to the wet lab to test experimentally whether the predicted binding capacity could be confirmed. We were also able to find a more quantitative tool, CheckMyMetal (Zheng et al. [6]), as a replacement for MIB2. Comparing the wild-type and the selected mutated sequences showed small improvements in the predicted binding of their respective metals, which helped inform the wet lab's experimental workflow.
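The wild-type versus mutant comparison amounts to ranking designs by their score change relative to the wild type. The sketch below assumes each tool's output has already been reduced to one number per sequence and that higher means stronger predicted binding; both are assumptions, and the scores passed in are placeholders rather than project results.

```python
def rank_by_improvement(wild_type_score, design_scores):
    """Sort designs by score change relative to wild type, largest gain first.

    Assumption: higher score = stronger predicted binding. For a tool
    where lower is better, negate the scores before calling this.
    """
    deltas = {name: score - wild_type_score
              for name, score in design_scores.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def improved(wild_type_score, design_scores, min_gain=0.0):
    """Return names of designs whose gain over wild type exceeds min_gain."""
    return [name
            for name, delta in rank_by_improvement(wild_type_score, design_scores)
            if delta > min_gain]
```

Setting `min_gain` above zero gives a simple guard against sending designs to the wet lab whose predicted improvement is within the noise of the predictor.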

5. Machine Learning Model

We also aimed to develop a machine-learning model that could identify the key features of metal-binding proteins by learning from a library of mutated variants. The goal was a neural network that would analyze this library, recognize important mutations, and predict new, potentially more effective sequences for metal-binding proteins. To build the library, we worked from the hypothesis that mutations improving a protein's metal-binding capability would increase the survival rate of E. coli at high metal concentrations. Our plan was therefore to use error-prone PCR to generate a diverse set of mutations in the target metal-binding protein, transform the mutated genes into E. coli, and screen the resulting library at a metal concentration slightly higher than the original strain carrying the unmutated protein could survive.

We expected the E. coli strains that survived the increased metal concentration to carry more effective metal-binding proteins. Survival would imply that the mutations in those strains were beneficial, allowing us to select these variants for inclusion in our library. The neural network would then learn from this library which specific mutations contribute to improved metal-binding efficiency. However, the results from the toxicity assays were not as conclusive as we had hoped, and a clear, consistent trend in protein effectiveness was crucial for building the library, since the library would form the training data for our neural network.
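To show what such training data could look like, the sketch below turns (sequence, survived) pairs into one-hot feature vectors and 0/1 labels, a common input format for a neural network. This is only an assumed encoding: the source does not specify the network's input representation, and all names here are hypothetical.

```python
# The 20 standard amino acids, in a fixed order that defines the encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Flatten a protein sequence into a 0/1 vector with 20 entries per residue."""
    vector = [0] * (len(sequence) * len(AMINO_ACIDS))
    for pos, aa in enumerate(sequence.upper()):
        # Raises KeyError for non-standard residue letters.
        vector[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vector


def build_dataset(variants):
    """Turn (sequence, survived) pairs into feature rows and 0/1 labels."""
    features = [one_hot(seq) for seq, _ in variants]
    labels = [1 if survived else 0 for _, survived in variants]
    return features, labels
```

Because error-prone PCR mostly introduces substitutions, variants of one protein share a length, so every row has the same number of features.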

References

[1] Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., ... & Jumper, J. M. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493–500. https://doi.org/10.1038/s41586-024-07487-w

[2] Lu, C. H., Chen, C. C., Yu, C. S., Liu, Y. Y., Liu, J. J., Wei, S. T., & Lin, Y. F. (2022). MIB2: metal ion-binding site prediction and modeling server. Bioinformatics, 38(18), 4428–4429. https://doi.org/10.1093/bioinformatics/btac534

[3] Dauparas, J., Anishchenko, I., Bennett, N., Bai, H., Ragotte, R. J., Milles, L. F., Wicky, B. I. M., Courbet, A., de Haas, R. J., Bethel, N., Leung, P. J. Y., Huddy, T. F., Pellock, S., Tischer, D., Chan, F., Koepnick, B., Nguyen, H., Kang, A., Sankaran, B., Bera, A. K., … Baker, D. (2022). Robust deep learning-based protein sequence design using ProteinMPNN. Science, 378(6615), 49–56. https://doi.org/10.1126/science.add2187

[4] Aptekmann, A. A., Buongiorno, J., Giovannelli, D., Glamoclija, M., Ferreiro, D. U., & Bromberg, Y. (2022). mebipred: identifying metal-binding potential in protein sequence. Bioinformatics, 38(14), 3532–3540. https://doi.org/10.1093/bioinformatics/btac358

[5] Yuan, Q., Chen, S., Wang, Y., Zhao, H., & Yang, Y. (2022). Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning. Briefings in Bioinformatics, 23(6), bbac444. https://doi.org/10.1093/bib/bbac444

[6] Zheng, H., Cooper, D. R., Porebski, P. J., Shabalin, I. G., Handing, K. B., & Minor, W. (2017). CheckMyMetal: a macromolecular metal-binding validation tool. Acta Crystallographica Section D: Structural Biology, 73(Pt 3), 223–233. https://doi.org/10.1107/S2059798317001061