Modeling

Our in silico models of our proteins of interest

Purpose and Methods Keratin 31 Q-Sensor and Biosensor KerDZ (Keratinase) KerDZ and Biosensor Interaction KerDZ and Keratin MaSP1 (Mini-Spidroin) CelCD Conclusions and Future Directions

Purpose and Methods

In order to visualize the protein encoded by our selected gene sequences, we modeled each protein using the amino acid sequences derived from coding sequences from NCBI or literature with our additional modifications (e.g., 6XHis-tags). This modeling was done using AlphaFold2 AlphaFold2
as it generates models with high accuracy and high similarity to experimentally solved structures.⁶ These models were used to assess the folding of our selected proteins as well as the likelihood of the folding of the synthetic constructs designed for our project. In some cases, interactions between proteins (protein-protein docking) were also modeled using the HDOCK HDOCK
server to assess the likelihood of the reaction prior to in vitro assays in the lab.^{5, 11, 12, 13}

All modeling was completed with AlphaFold2 using Google CoLab or Neurosnap (for larger proteins that required more CPU). AlphaFold2 in Google CoLab and Neurosnap both provide access to AlphaFold2 with a user-friendly interface.^{6, 8} AlphaFold2 generates several models and ranks them based on different factors. AlphaFold2 completes analyses of the models using predicted local distance difference tests (pLDDT) to show the confidence of the model at each residue.⁶ Additionally, AlphaFold2 produces a sequence coverage plot to display the regions where the model was based upon experimentally solved structures. To further assess models, Ramachandran plots of the highest ranking AlphaFold2 model were made using SWISS-MODEL SWISS-MODEL
for each of the proteins to analyze the phi and psi angles of each amino acid residue.¹⁰ Animated GIFs of the models were generated with FirstGlance and Jmol.⁴

**Fig 1. |** Colour scheme for predicted local distance difference test (pLDDT) of protein models in subsequent figures.⁴ Higher pLDDT indicates higher confidence of the amino acid residue’s positioning.

Keratin 31

Keratin 31 from sheep (Ovis aries) was selected as the focus of our project as wool is primarily derived from sheep. Keratin was modeled in AlphaFold2 to visualize the structure and to produce a model to use in protein-protein interactions with a keratinase. Overall, the model displayed high confidence throughout the structure. Regions of low confidence likely represent regions involved in intermolecular interactions within a native wool fiber.² The Ramachandran plot also implies that the areas of low confidence do not have acceptable phi and psi angles. This was expected due to the nature of interactions of these regions.

**Fig 2b. |** Keratin 31 multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2.

**Fig 2c. |** Keratin 31 model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 2d. |** Keratin 31 Ramachandran plot generated by SWISS-MODEL.

Q-Sensor and Biosensor

These models allowed us to test the folding of our FRET-compatible linkers (RL1, RL2, and RL3) for our Q-sensor and biosensor.⁷ Modeling also visualized and verified the interactions between the domains involved in the quenching system (mScarlet and ShadowR) for both our Q-sensor and biosensor. The linkers used had low pLDDT values in both the Q-sensor and biosensor, however, this was expected as AlphaFold2 is unable to fully model linkers. The regions where the linkers were did not have coverage as shown by the sequence coverage plots. Despite this, the linkers do show alpha helical structures and allow the quenching system to interact. The Ramachandran plots for the linkers generally show that the phi and psi angles are in acceptable ranges, which provides some validation for the generated structure of the linkers. For the biosensor, we modeled with a placeholder for the nanobody. We did not complete the final model containing the nanobody as we did not continue with developing the biosensor due to what we learned in our integrated human practices. Regardless, our models for the biosensor suggested low confidence and unacceptable phi and psi angles in some of the regions of the natural keratin protein, similar to our keratin 31 model. These regions were associated with the hydrophobic regions found in natural keratin that assist with natural protein-protein interactions in a native fiber.² This may be a future area of focus and an engineering opportunity for iGEM teams that would like to develop a biosensor containing keratin 31.

**Fig 3a. |** Q-Sensor predicted local distance difference test (pLDDT) plot generated by AlphaFold2. Regions of low pLDDT represent the linkers where AlphaFold2 could not accurately model due to the lack of homologs.

**Fig 3b. |** Q-Sensor multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2. Regions of low coverage represent the linkers with a lack of homologs.

**Fig 3c. |** Q-Sensor model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 3d. |** Q-sensor Ramachandran plot generated by SWISS-MODEL.

**Fig 4a. |** Biosensor predicted local distance difference test (pLDDT) plot generated by AlphaFold2. Regions of low pLDDT represent the linkers where AlphaFold2 could not accurately model due to the lack of homologs as well as regions of keratin’s natural hydrophobic regions for intermolecular interactions.

**Fig 4b. |** Biosensor multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2. Regions of low coverage represent the linkers with a lack of homologs.

**Fig 4c. |** Biosensor model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 4d. |** Biosensor Ramachandran plot generated by SWISS-MODEL.

KerDZ (Keratinase)

As suggested, KerDZ has two different domains.³ The first is the pro-protein domain which is involved in the proper in vivo folding of the protein. This domain is cleaved upon the protein’s secretion by other proteases. As such, this domain was not modeled as it does not complete the cleaving of keratin. The other domain was the protease domain containing the catalytic site involved in keratin degradation. This domain was modeled separately based on the amino acid region that was part of the proteolytic domain and used as a standalone to better represent the interactions between KerDZ and keratin within in vitro assays or natural keratin degradation. This model had high confidence for all of the residues which is due to the alignment with experimentally derived protein structures as shown by the sequence coverage. The Ramachandran plot also shows that nearly all of the residues have acceptable phi and psi angles.

**Fig 5a. |** Domains of *kerDZ* derived from NCBI’s Conserved Domain Search.⁹

**Fig 5b. |** KerDZ (Keratinase) predicted local distance difference test (pLDDT) plot generated by AlphaFold2.

**Fig 5c. |** KerDZ (Keratinase) multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2.

**Fig 5d. |** KerDZ (Keratinase) model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 5e. |** KerDZ Ramachandran plot generated by SWISS-MODEL.

KerDZ and Biosensor Interaction

Preliminary modeling for the interaction between the keratinase (KerDZ) and the biosensor (containing keratin 31) to visualize how KerDZ would interact with our synthetic construct was also completed using HDOCK.^{5, 11, 12, 13} To do this, the highest quality models (highest ranking model) for the protease domain of KerDZ and the biosensor were used as the inputs. HDOCK completes protein docking and generates several models of the highest likely interactions between the two chains. The output provides a 3D model that highlights the domains where the interaction occurs. It also yields a docking score and confidence score. These results showed that there is high confidence (> 0.90) that the KerDZ interacts with the keratin 31 portion of our synthetic construct, implying proper catalytic activity. This also has implications for the interaction with natural keratin 31 found in textiles. Docking scores below -200 are also indicative of a good model. The docking scores all are < -200 for the top 10 models. The ligand rmsd is not a measure of the docking accuracy and is only used to show the difference between the 3D structure of the model and the input structures.

**Fig 6. |** KerDZ (Keratinase) and Biosensor interaction model (Rank 1) generated by HDOCK and visualized with Jmol. GIF was generated by FirstGlance.

**Table 1. |** HDOCK output containing docking score, confidence score, and ligand rmsd of the top 10 protein-protein interactions for KERDZ and biosensor (containing keratin 31).

KerDZ and Keratin

Similarly, an HDOCK model was done for KerDZ and keratin. The results also suggested that there are interactions that can occur between KerDZ and keratin. Confidence scores are slightly lower than the KerDZ and biosensor prototype model, however, the docking scores are around -200, suggesting the interaction is likely. Furthermore, a confidence above 0.7 still suggests a likely interaction.^{5, 11, 12, 13} The ligand rmsd is not a measure of the docking accuracy and is only used to show the difference between the 3D structure of the model and the input structures. This models only one of the many interactions with natural wool proteins. Modeling other interactions of natural keratin proteins with KerDZ may be done.

**Fig 7b. |** KerDZ (Keratinase) Dand Keratin 31 interaction model (Rank 1) generated by HDOCK and visualized with Jmol. GIF was generated by FirstGlance.

**Table 2. |** HDOCK output containing docking score, confidence score, and the ligand rmsd of the top 10 protein-protein interactions for KERDZ and keratin 31.

MaSP1 (Mini-Spidroin)

Additionally, we modeled the monomer of the synthetic spider silk protein (mini-spidroin, (A₃I)₃-A₁₄) to visualize the structure. The model of the monomer shows the formation of several alpha helices which is expected for soluble mini-spidroin, specifically the N and C termini.¹ There are regions of high and low confidence. The low confidence regions (alanine-rich) are involved in the protein-protein interactions when forming the spider silk fiber and should form beta sheets when the fiber forms. We also modeled the interaction of two monomers with AlphaFold2 in Neurosnap to visualize a potential interaction. This consisted of two identical chains of the mini-spidroin. This model also resulted in regions of high and low confidence. Spider silk fibers may consist of several proteins, therefore, it is likely that protein modeling consisting of several constituents of spider silk may be required to yield a high quality model. This may be a future avenue for iGEM teams to come. Furthermore, different amino acids may be ideal to form the beta sheets required in fiber formation which may be tested by future iGEM teams.¹ The Ramachandran plots also highlight the regions with low pLDDT as these regions also had unacceptable phi and psi angles which suggests these regions are not accurately modeled with the current software available.

MaSP1 Monomer

**Fig 8a. |** Mini-spidroin (monomer) predicted local distance difference test (pLDDT) plot generated by AlphaFold2. Low pLDDT (confidence) regions represent the alanine-rich region of the protein involved in interactions that form the fiber.

**Fig 8b. |** Mini-spidroin (monomer) multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2..

**Fig 8c. |** Mini-spidroin (monomer) model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance. Low confidence regions (red) represent the alanine-rich region of the protein involved in interactions that form the fiber.

**Fig 8d. |** Mini-spidroin (monomer) Ramachandran plot generated by SWISS-MODEL.

MaSP1 Dimer

**Fig 8e. |** Mini-spidroin (dimer) predicted local distance difference test (pLDDT) plot generated by AlphaFold2.

**Fig 8f. |** Mini-spidroin (dimer) multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2.

**Fig 8g. |** Mini-spidroin (dimer) model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 8f. |** Mini-spidroin (dimer) Ramachandran plot generated by SWISS-MODEL.

CelCD

The cellulase, CelCD, was modeled with AlphaFold2 in Neurosnap. Overall, the model had high confidence which is likely due to the high sequence alignment with experimental structures as suggested by the sequence coverage plot. Further directions with this model may be to model the interactions with cellulose molecules. As this is an enzyme, modifications to its structure may be modeled to optimize enzymatic activity. Overall, the Ramachandran plot shows that the majority of the residues have acceptable phi and psi angles which verifies the confidence of the model.

**Fig 9b. |** CelCD (Cellulase) multiple sequence alignment (MSA) sequence coverage plot generated by AlphaFold2.

**Fig 9c. |** CelCD (Cellulase) model generated by AlphaFold2 and visualized with Jmol. GIF was generated by FirstGlance.

**Fig 9d. |** CelCD Ramachandran plot generated by SWISS-MODEL.

Conclusions and Future Directions

Overall, the modeling completed provides preliminary knowledge into the parts we have worked on, such as their folding, different domains that may be of interest, and their interactions with other molecules. In return, this generates new avenues for future work. Future directions for protein modeling would include remodeling with newer software, such as AlphaFold3, and refining these models with molecular dynamics. Studying interactions between monomers of spidroin may also be done to provide a better understanding of the fibers produced. This may include constructing polymers with one type of spidroin or polymers consisting of several different spidroins (similar to a natural spider silk fiber). Furthermore, there are many helpful tools in optimizing enzymes with protein modeling. This may be a future avenue to include in the engineering cycle of keratinase and cellulase production. For example, NeuroFold (in NeuroSnap) provides an easy to use interface to optimize an enzyme in different conditions.⁸ Moreover, deeper analysis of these enzymes with their ligands and substrates may be completed with protein docking.

References

Arndt, T.; Greco, G.; Schmuck, B.; Bunz, J.; Shilkova, O.; Francis, J.; Pugno, N.; Jaudzems, K.; Barth, A.; Johansson, J.; Rising, A. Engineered spider silk proteins for biomimetic spinning of fibers with toughness equal to dragline silks. Adv. Funct. Mater. 2022, 32(23), 2200986. https://doi.org/10.1002/adfm.202200986
Dowling, L. M.; Crewther, W. G.; Parry, D. A. Secondary structure of component 8c-1 of α-keratin. An analysis of the amino acid sequence. Biochem. J. 1986, 236(3), 705-712. https://doi.org/10.1042/bj2360705
Elhoul, M. B.; Jaouadi, N. Z.; Rekik, H.; Benmrad, M. O.; Mechri, S.; Moujehed, E.; Kourdali, S.; El Hattab, M.; Badis, A.; Bejar, S.; & Jaouadi, B. Biochemical and molecular characterization of new keratinoytic protease from Actinomadura viridilutea DZ50. Int. J. Biol. Macromol. 2016, 92, 299-315. https://doi.org/10.1016/j.ijbiomac.2016.07.009
FirstGlance in Jmol http://FirstGlance.Jmol.Org (accessed 2024-09-10).
Huang, S. Y.; Zou, X. An iterative knowledge‐based scoring function for protein–protein recognition. Proteins. 2008, 72(2), 557-579. https://doi.org/10.1002/prot.21949
Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nat. 2021, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
Kolossov, V. L.; Spring, B. Q.; Sokolowski, A.; Conour, J. E.; Clegg, R. M.; Kenis, P. J.; Gaskins, H. R. Engineering redox-sensitive linkers for genetically encoded FRET-based biosensors. Exp. Biol. Med, 2008, 233(2), 238-248. https://doi.org/10.3181/0707-RM-192
Neurosnap—Computational Biology, Simplified. https://neurosnap.ai/ (accessed 2024-06-07).
Wang, J.; Chitsaz, F.; Derbyshire, M. K.; Gonzales, N. R.; Gwadz, M.; Lu, S.; Marchler, G. H.; Song, J. S; Thanki, N.; Yamashita, R.; et al. The conserved domain database in 2023. Nucleic Acids Res. 2023, 51(D1), D384-D388. https://doi.org/10.1093/nar/gkac1096
Waterhouse, A. M.; Studer, G.; Robin, X.; Bienert, S.; Tauriello, G.; Schwede, T. The structure assessment web server: for proteins, complexes and more. Nucleic Acids Res. 2024, 52(W1), W318-W323. https://doi.org/10.1093/nar/gkae270
Yan, Y.; Tao, H.; He, J.; Huang, S. Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 2020, 15(5), 1829-1852. https://doi.org/10.1038/s41596-020-0312-x
Yan, Y.; Wen, Z.; Wang, X.; Huang, S. Y. Addressing recent docking challenges: A hybrid strategy to integrate template‐based and free protein‐protein docking. Proteins. 2017, 85(3), 497-512. https://doi.org/10.1002/prot.25234
Yan, Y.; Zhang, D.; Zhou, P.; Li, B.; Huang, S. Y. HDOCK: a web server for protein–protein and protein–DNA/RNA docking based on a hybrid strategy. Nucleic Acids Res. 2017, 45(W1), W365-W373. https://doi.org/10.1093/nar/gkx407