Modeling was used extensively in our project to optimize our experimental parameters, explore additional protein binders, and aid in our understanding of how our solution could realistically be implemented in a clinical setting. Our models formed a core component of our iGEM work.
There are various ways of attaching an antibiotic to our ABM, and the methods differ greatly in the rate at which they break down and release the antibiotic. To decide what an ideal conjugation method’s breakdown rate would be, we modeled how the pharmacokinetics of a conjugated antibiotic dose changed as we varied the deconjugation rate constant.
We began by building a pharmacokinetic model without our ABM, which let us check our results against models and dosing strategies used in the literature and in clinical settings. We initially used a two-compartment model, as these can be more accurate, though we later switched to one-compartment models because data for them are easier to find.
An example of a two-compartment model: the drug moves between the central compartment and the peripheral compartment but is eliminated only from the central compartment. The Dose_Central species is for oral drugs, which are absorbed into the body at a relatively slow rate.
Wetlab had chosen third-generation cephalosporins as our target antibiotic class because of their shorter half-lives and extensive use, particularly in treating febrile neutropenia. Because of this, we chose to use cefotaxime’s data for our initial model [1]. The results are shown to the right. These results were consistent with the dosing schedule for moderate to severe infections, which gives 2 g every 8 hours.
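The two-compartment structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not our actual model file. It assumes an IV bolus (no absorption step), assumes the tabulated cefotaxime rate constants (Ke = 1.8, k12 = 1.05, k21 = 1.21) are in h−1, and uses a hypothetical central volume, V_CENTRAL_L, which is not a value from our model.

```python
def simulate_two_compartment(dose_mg, ke, k12, k21, t_end_h, dt_h=0.01):
    """Integrate drug amounts (mg) after an IV bolus into the central compartment.

    Central:     dAc/dt = -(ke + k12) * Ac + k21 * Ap
    Peripheral:  dAp/dt =  k12 * Ac - k21 * Ap
    Integration uses the classical fourth-order Runge-Kutta scheme.
    """
    def deriv(ac, ap):
        return (-(ke + k12) * ac + k21 * ap, k12 * ac - k21 * ap)

    ac, ap = float(dose_mg), 0.0
    times, central, peripheral = [0.0], [ac], [ap]
    steps = int(round(t_end_h / dt_h))
    for i in range(1, steps + 1):
        k1 = deriv(ac, ap)
        k2 = deriv(ac + 0.5 * dt_h * k1[0], ap + 0.5 * dt_h * k1[1])
        k3 = deriv(ac + 0.5 * dt_h * k2[0], ap + 0.5 * dt_h * k2[1])
        k4 = deriv(ac + dt_h * k3[0], ap + dt_h * k3[1])
        ac += dt_h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        ap += dt_h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        times.append(i * dt_h)
        central.append(ac)
        peripheral.append(ap)
    return times, central, peripheral


V_CENTRAL_L = 10.0  # hypothetical central volume (L), for illustration only

# 2 g cefotaxime bolus followed over one 8-hour dosing interval.
times, central, peripheral = simulate_two_compartment(
    dose_mg=2000.0, ke=1.8, k12=1.05, k21=1.21, t_end_h=8.0
)
central_conc_mg_per_l = [amount / V_CENTRAL_L for amount in central]
```

Dividing the central amount by the central volume gives the plasma concentration that is compared against the MIC.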
Having created a benchmark model, we then added our ABM to it, adding a Conjugated_Antibiotic and an ABM species representing the antibiotic-ABM-albumin complex and the ABM-antibiotic complex, respectively. Because our moiety is tightly bound to albumin and is relatively small, we used albumin’s data for our ABM both when bound and unbound. This gave us all the data we needed to model the effect of our ABM linker’s deconjugation constant. Shown below is the architecture of our model.
After interviewing clinicians, Human Practices found that extending dosing intervals from once every 8 hours (q8h) to once every 12 hours (q12h) or once a day (q24h) would be ideal. We found that a deconjugation constant (kd) of around 0.8 h−1 is the maximum value for q12h dosing, with the minimum being 0.5 h−1. For q24h dosing, the maximum deconjugation constant was 0.23 h−1 and the minimum was 0.16 h−1. When the deconjugation rate exceeds 0.8 h−1, the concentration of cefotaxime drops below the MIC before 9.6 hours, resulting in a %T>MIC of less than 80%. Conversely, if the rate falls below 0.16 h−1, the extension becomes too long to be effective.
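The link between the deconjugation constant and %T>MIC can be illustrated with a simplified one-compartment sketch. It is not our full model. It assumes the conjugate is given as a bolus, releases free antibiotic at rate kd, and is itself cleared at albumin’s Ke. The dose is expressed as antibiotic-equivalent mass, the rate constants are assumed to be in h−1, and the volume of distribution VD_L is a hypothetical value.

```python
import math


def released_antibiotic_conc(t_h, dose_mg, vd_l, k_dec, k_e, k_e_conj):
    """Free-antibiotic concentration (mg/L) at time t_h after a conjugate bolus.

    Conjugate:  dC/dt = -(k_dec + k_e_conj) * C
    Free drug:  dA/dt =  k_dec * C - k_e * A
    The closed-form solution of this linear system is evaluated directly.
    """
    k_loss = k_dec + k_e_conj
    c0 = dose_mg / vd_l
    if math.isclose(k_e, k_loss):
        # Limiting case when both rate constants coincide.
        return c0 * k_dec * t_h * math.exp(-k_e * t_h)
    return c0 * k_dec / (k_e - k_loss) * (
        math.exp(-k_loss * t_h) - math.exp(-k_e * t_h)
    )


def percent_time_above_mic(conc_fn, mic, interval_h, dt_h=0.01):
    """Percentage of the dosing interval during which conc_fn(t) exceeds mic."""
    n = int(round(interval_h / dt_h))
    above = sum(1 for i in range(n) if conc_fn((i + 0.5) * dt_h) > mic)
    return 100.0 * above / n


VD_L = 18.0  # hypothetical volume of distribution (L), for illustration only

# Cefotaxime values from the table (Ke = 1.8, MIC = 0.5 mg/L), albumin Ke = 0.0021.
pct_q12h = percent_time_above_mic(
    lambda t: released_antibiotic_conc(t, 2000.0, VD_L, 0.8, 1.8, 0.0021),
    mic=0.5,
    interval_h=12.0,
)
```

Repeating the last call for a range of kd values gives the upper and lower bounds for each dosing interval. The exact numbers depend on the assumed volume of distribution.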
Once wetlab settled on esterification as their conjugation method, we were able to fully model the effect of our ABM. The deconjugation constant was estimated to be 0.1-0.2 h−1. Around the same time, Human Practices were deciding on which antibiotic to use, and we were able to assist. Human Practices decided on five candidate antibiotics, shown in the table below, along with the pharmacokinetic parameters and the sources of the models we found. We modeled each one and determined which gave the best %T>MIC with an improved dosing strategy.
Antibiotic | Dose | Molar Mass (g/mol) | Ke | k12 | k21 | MIC (mg/L) | %T>MIC q12h | %T>MIC q24h | Source |
---|---|---|---|---|---|---|---|---|---|
ABM | N/A | N/A | 0.0021 | 0.0271 | 0.0168 | N/A | N/A | N/A | Jonghan Kim, William L. Hayton, John M. Robinson, Clark L. Anderson, Kinetics of FcRn-mediated recycling of IgG and albumin in human: Pathophysiology and therapeutic implications using a simplified mechanism-based model, Clinical Immunology, Volume 122, Issue 2, 2007, Pages 146-155, ISSN 1521-6616, https://doi.org/10.1016/j.clim.2006.09.001. |
Cefotaxime | 2g | 455 | 1.8 | 1.05 | 1.21 | 0.5 | 222.80% | 111.40% | Fu KP, Aswapokee P, Ho I, Matthijssen C, Neu HC. Pharmacokinetics of cefotaxime. Antimicrob Agents Chemother. 1979 Nov;16(5):592-7. doi: 10.1128/AAC.16.5.592. |
Cefdinir | 2g | 395 | 0.48 | N/A | N/A | 8 | 0.00% | 0.00% | 박경윤, “Population pharmacokinetic analysis of cefdinir following a single oral dose in healthy adults,” https://doi.org/000000145211. |
Ceftriaxone | 2g | 554.48 | 0.1169 | 0.1344 | 3.962 | 0.5 | 362.23% | 181.11% | Patel IH, Chen S, Parsonnet M, et al. Pharmacokinetics of ceftriaxone in humans. Antimicrob Agents Chemother. https://doi.org/10.1128/aac.20.5.634. |
Ceftazidime | 2g | 546.58 | 0.4415 | N/A | N/A | 8 | 102.93% | 51.46% | Leroy A, Leguy F, Borsa F, Spencer GR, Fillastre JP, Humbert G. Pharmacokinetics of ceftazidime in normal and uremic subjects. Antimicrob Agents Chemother. 10.1128/AAC.25.5.638. |
Cefepime | 2g | 480.56 | 0.4545 | N/A | N/A | 8 | 105.49% | 52.75% | Shi Z, Chen X, Tian L, Wang Y, Zhang G, et al. Population Pharmacokinetics and Dosing Optimization of Ceftazidime in Infants. Antimicrob Agents Chemother. https://doi.org/10.1128/aac.02486-17. |
Human Practices chose cefepime as a candidate for our target antibiotic because it has a short half-life and lacks alternative methods of extending its half-life. In addition, cefepime is used frequently, so finding a way to improve its dosing will have a greater impact.
To model cefepime’s pharmacokinetics we used a one-compartment model as shown below. We then varied the deconjugation constant from 0.00 to 1.00 h−1 in increments of 0.01 h−1 and created a 3D graph of antibiotic concentration vs. time vs. deconjugation constant from the data generated.
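The sweep described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not our original script. It reuses a simple one-compartment release model, assumes cefepime’s tabulated Ke (0.4545) and albumin’s Ke (0.0021) are in h−1, and uses a hypothetical volume of distribution of 18 L. The resulting grid (one row per deconjugation constant) is the kind of data a 3D surface plot is drawn from.

```python
import math


def free_conc(t_h, c0, k_dec, k_e, k_e_conj):
    """Free-antibiotic concentration for a conjugate bolus (one compartment)."""
    k_loss = k_dec + k_e_conj
    if math.isclose(k_e, k_loss):
        return c0 * k_dec * t_h * math.exp(-k_e * t_h)
    return c0 * k_dec / (k_e - k_loss) * (
        math.exp(-k_loss * t_h) - math.exp(-k_e * t_h)
    )


def sweep_deconjugation(c0, k_e, k_e_conj, k_dec_values, times_h):
    """Return a grid: rows follow k_dec_values, columns follow times_h."""
    return [
        [free_conc(t, c0, k_dec, k_e, k_e_conj) for t in times_h]
        for k_dec in k_dec_values
    ]


def last_time_above(row, times_h, mic):
    """Latest sampled time at which the concentration exceeds mic, or None."""
    above = [t for t, c in zip(times_h, row) if c > mic]
    return above[-1] if above else None


K_DEC_VALUES = [i / 100 for i in range(101)]  # 0.00 to 1.00 h^-1 in 0.01 steps
TIMES_H = [j * 0.1 for j in range(361)]       # 0 to 36 h
C0 = 2000.0 / 18.0                            # 2 g dose / hypothetical 18 L

surface = sweep_deconjugation(C0, 0.4545, 0.0021, K_DEC_VALUES, TIMES_H)
crossing_times = [last_time_above(row, TIMES_H, 8.0) for row in surface]
```

Each entry of crossing_times is the last sampled time above the MIC of 8 mg/L for one deconjugation constant, which is how a maximum and minimum acceptable constant can be read off for a given dosing interval.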
Below is a graph showing the relationship between drug concentration, time, and deconjugation constant for a 2g dose of cefepime. The graph indicates that the optimal deconjugation constant is around 0.1 h−1, where the antibiotic concentration crosses the MIC at 12.6 hours. The maximum deconjugation constant for q12h dosing is approximately 0.3 h−1, while the minimum is 0.05 h−1, ensuring %T>MIC remains above 80%.
Shown below is a graph illustrating the relationship between drug concentration, time, and deconjugation constant for a 6g dose of cefepime. This graph indicates that the optimal deconjugation constant is approximately 0.08 h−1, at which point the antibiotic concentration crosses the MIC at 29.9 hours. The maximum deconjugation constant for q24h is around 0.15 h−1, while the minimum is 0.02 h−1, ensuring that %T>MIC remains above 80%.
As we modeled cefepime, we realized that its toxicodynamic indicators, AUC and CMax, were lower when it was given as an ABM conjugate. In theory, this means we could increase the dose of cefepime while still having lower toxicity than a normal dose, giving us a longer dosing interval without increased toxicity.
We modified the earlier cefepime models for doses of 4 g, 6 g, and 8 g, and also studied doses that mixed conjugated and unconjugated antibiotic with a combined mass of 6 g.
Dose | AUC | CMax |
---|---|---|
Unconjugated (2g per 8hr) over 24h | 741.6876 | 57.6145 |
Unconjugated (2g) single dose | 203.0432 | 57.6145 |
2g ABM single dose | 203.3449 | 15.2316 |
4g ABM single dose | 406.7785 | 30.4700 |
6g ABM single dose | 610.1743 | 45.7054 |
8g ABM single dose | 813.5851 | 60.9453 |
Both area under the curve (AUC) and maximum concentration (CMax) are lower for ABM doses of 2 g, 4 g, and 6 g than for the unconjugated regimen of 2 g every 8 hours over 24 hours. For an 8 g ABM dose, however, both AUC and CMax exceed those of the unconjugated regimen.
In terms of dosing schedule, the 6 g and 8 g doses are suitable for q24h dosing given their time above the minimum inhibitory concentration. The 2 g dose is suitable for q12h dosing, and the 4 g dose is not suitable for either q12h or q24h.
The makeup of a 6 g dose was varied, and we found that around 85% of a 6 g dose should be made up of our ABM conjugate for a q24h dosing schedule, as this achieves a %T>MIC of 80%.
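AUC and CMax can be computed from any sampled concentration-time curve with the trapezoidal rule and a maximum. The sketch below shows the calculation on a toy curve. The function names are illustrative and this is not the code used to produce the table.

```python
def auc_trapezoid(times_h, conc):
    """Area under the concentration-time curve using the trapezoidal rule."""
    return sum(
        0.5 * (conc[i] + conc[i + 1]) * (times_h[i + 1] - times_h[i])
        for i in range(len(times_h) - 1)
    )


def c_max(conc):
    """Maximum sampled concentration."""
    return max(conc)


# Toy triangular curve: rises to 2 at t = 1 h and returns to 0 at t = 2 h.
toy_times_h = [0.0, 1.0, 2.0]
toy_conc = [0.0, 2.0, 0.0]
toy_auc = auc_trapezoid(toy_times_h, toy_conc)
toy_cmax = c_max(toy_conc)
```

Because the underlying model is linear, both metrics scale in proportion to dose. This matches the table, where the 4 g ABM values are twice the 2 g ABM values.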
Our objective is to attach a drug to an ‘Albumin Binding Moiety (ABM)’ which will bind to albumin in the bloodstream. The goal is to see if this binding results in our drug depleting at a lower rate, resulting in an extended half-life.
The drug we will be running simulations on is cefepime, a cephalosporin. The ABM we will be conjugating is Maleimide-PEG-OH ester.
PK-Sim was used for modeling.
In PK-Sim, we entered information for cefepime and modeled a 6 gram dose to get 24 hour coverage. Simulations were done specifically in the kidneys as cefepime is mainly cleared renally.
The extended half-life should come from our cefepime-ABM binding to albumin in the bloodstream, but there is a lack of available PK data for our ABM. This means we can’t simulate the effect of the conjugated ABM directly in PK-Sim. To get around that, we changed the ‘fraction unbound (fu)’ value in PK-Sim to replicate the effects of cefepime-ABM binding to albumin. This parameter represents the fraction of drug in the bloodstream that is not bound to albumin. A high fu means a higher fraction of our drug is unbound within the bloodstream, and a low fu means more of our drug is bound to albumin. When modeling, we can choose a low fu to replicate the effect of ABM binding.
The simulated individual was PK-Sim’s healthy European male, and administration was intravenous (IV).
The dosage seems very high, but it was calculated from another team member’s experiments to achieve 24 hours of coverage. The main purpose of this experiment is to model how our ABM extends the half-life of our drug.
Almost all parameter values were collected from the literature. Other values were PK-Sim defaults and were filled in automatically. The loading dose was determined experimentally by a Kinetics team member to achieve an effective 24-hour coverage.
Parameter | Value | Source/Reference |
---|---|---|
Renal Clearance (ml/min/kg) | 2.1 | Paper: Obach et al. 2008 [8] |
Lipophilicity (log units) | 0.6914 | Paper: AboulMagcl et al. 2021 [9] |
Molecular Weight (g/mol) | 517 | |
Loading Dose (g) | 6 | Experimentally |
Solubility (mg/ml) | 10 | Product Information: Cayman Chemical [10] |
MIC (mg/l) | 8 | Paper: Weinstein et al. 2020 [11] |
Neurotoxic Concentration Level (mg/l) | 20 | Paper: Boschung-Pasquier et al 2020 [12] |
Cefepime Kidney Concentration. 6 gram dosage with default fu value
We modeled 3 different fraction unbound values: 0.2, 0.55, and 0.75. A value of 0.2 simulates the effect of ABM binding, 0.55 is the default fu value in PK-Sim, and 0.75 is an example where most of the drug is free in the bloodstream.
As seen in the second graph, lower fraction unbound values give higher drug concentrations and higher fraction unbound values give lower drug concentrations. With this trend, we can reasonably assume that our bound cefepime will have a longer half-life, but at the expense of higher drug concentrations.
However, all 3 CMax values are extremely high. This seems to be a huge limitation on our PBPK model because of the lack of ABM data. While we tried to compensate for that with a lower fu value, it doesn’t account for the fact that if our ABM-Cefepime were bound to albumin, we’d get a lower CMax.
Hence, we decided to rerun the simulations and add a ‘Binding Partner’ to the model to explicitly show the interactions of our ABM. We chose to add albumin as a binding partner with three different dissociation constant (Kd) values: 0.001, 0.05 and 1 µM. We also fixed the fu at 0.2 at all of these Kd values.
A Kd of 0.001 µM simulates strong binding, 0.05 µM moderate binding, and 1 µM weak binding. A lower Kd would represent our drug-ABM being more tightly bound to albumin, which we hoped would lower our CMax.
We tried modeling that in PK-Sim but it didn’t have any effect on our CMax. This tells us fu dominates the concentration values in our limited PBPK model. In fact, all the concentrations stayed the same, so Kd didn’t do anything.
We can conclude that a lower fu gives longer half-lives but at the cost of a higher CMax. However, that conclusion goes against what we expect cefepime-ABM to do in the bloodstream. We then modeled Kd to better replicate the expected behavior of the ABM, but it had no effect. So while we still have an extended half-life, our CMax is far too high and we would expect neurotoxic effects, even though concentrations stay above the neurotoxic level only briefly. The main conclusion is that our PBPK model is extremely limited: without PK data on our ABM, we can’t properly simulate its effects, and fraction unbound dominates the concentration values.
In the future, we would like to obtain PK data for our ABM and simulate from there. We could then see exactly how long our coverage lasts, along with our CMax and half-life, and further engineer the chemistry of our ABM to optimize these values. Ultimately, we could create a single-dose treatment that provides coverage for a defined time at appropriate concentration levels.
The objective of this model was to predict the longevity of the bound cefepime module (albumin-linked cefepime constructed using an Albumin Binding Moiety [ABM] ) to estimate impact on metabolic clearance by the liver. These results will suggest whether or not the linker (and subsequent albumin connection) successfully extends the life of the cefepime antibiotic while still exceeding the limits set by the minimum inhibitory concentration (MIC), or range of effectiveness, of the antibiotic.
This model will be run using predictive clearance information of the albumin-bound cefepime complex as derived from deconjugation data (derivation seen in “Finding Optimal Deconjugation Constant”) combined with known kinetic parameters of liver carboxylesterase 2 (CES2) from literature. Applying these parameters to the Michaelis-Menten model was particularly advantageous given the task as the equation effectively describes the relationship between substrate concentration and rate of reaction in a metabolic clearance-specific context. Ultimately, this equation allows for clear quantification of the impacts changes in concentration may have on metabolic clearance rates.
CES2 has a typical Km of 0.375 mM and a Vmax of 0.426 mM/hour [13] when interacting with carboxylic acid groups as part of the 4-MUBA substrate, which has chemical characteristics similar to those of the proposed linking mechanism. As the interaction between the carboxylesterase and this specific linker complex has not been previously tested, the CES2 affinities are uncertain, so the median values of the 4-MUBA Km range were used to find the most realistic estimate of time before MIC.
The figure above depicts the fluctuation of antibiotic concentration over time according to Michaelis-Menten Kinetics. Here, the y-axis represents time (0 to 36 hours), the x-axis represents the parameter index derived via the Michaelis-Menten equation (Vmax, Km and their fluctuations) and the z-axis represents antibiotic concentration at each point.
The MIC surface is displayed in red, allowing viewers to identify the points of intersection, the corresponding points in time, and the antibiotic concentration levels at the MIC boundary. This red plane visualizes how close the antibiotic is to the MIC threshold at each point. Based on where the blue-toned concentration surface crosses the MIC plane, CES2 will metabolize the complex within 15 to 21 hours, depending on initial antibiotic concentration.
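The Michaelis-Menten clearance described above can be sketched as follows. This is a simplified single-substrate illustration rather than our full surface calculation. It uses the Km and Vmax values quoted above, cefepime’s molar mass and MIC from the earlier table, and a hypothetical starting concentration, START_MM. The time to reach a threshold comes from the integrated Michaelis-Menten rate law, and a simple Euler integration is included as a cross-check.

```python
import math


def time_to_reach(s0, s_target, vmax, km):
    """Time for substrate to fall from s0 to s_target under dS/dt = -vmax*S/(km+S).

    Integrated Michaelis-Menten rate law:
        vmax * t = (s0 - s_target) + km * ln(s0 / s_target)
    """
    return ((s0 - s_target) + km * math.log(s0 / s_target)) / vmax


def mm_concentration_curve(s0, vmax, km, t_end_h, dt_h=0.001):
    """Explicit-Euler integration of dS/dt = -vmax * S / (km + S)."""
    s = float(s0)
    curve = [s]
    for _ in range(int(round(t_end_h / dt_h))):
        s = max(s - dt_h * vmax * s / (km + s), 0.0)
        curve.append(s)
    return curve


def mg_per_l_to_mm(conc_mg_per_l, molar_mass_g_per_mol):
    """Convert mg/L to mmol/L (mM)."""
    return conc_mg_per_l / molar_mass_g_per_mol


KM_MM = 0.375          # CES2 Km for 4-MUBA (mM)
VMAX_MM_PER_H = 0.426  # CES2 Vmax for 4-MUBA (mM/hour)
MIC_MM = mg_per_l_to_mm(8.0, 480.56)  # cefepime MIC of 8 mg/L in mM
START_MM = 5.0         # hypothetical starting concentration (mM)

hours_above_mic = time_to_reach(START_MM, MIC_MM, VMAX_MM_PER_H, KM_MM)
```

Evaluating time_to_reach over a range of starting concentrations and Km or Vmax values gives one crossing time per parameter set.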
The albumin binding moiety we selected for our initial experiments was ABD035 [14], a modification of the albumin binding domain of Streptococcal Protein G (1GJS [15]). To bind the albumin binding moiety to the antibiotic, we must use a Maleimide-PEG-OH linker. The carboxylic acid on the antibiotic binds to the hydroxyl group on the linker. The maleimide must bind to a thiol group, which is found on the amino acid cysteine. For the linker to bind to the albumin binding moiety, there must therefore be a cysteine in the moiety.
Hence, we decided to use simple protein structure modeling with PyMOL [25] to select residues for cysteine substitution that would enable our experimental team to effectively bind the albumin binding moiety to the linker without disrupting the effectiveness of the antibiotic or the affinity of the albumin binding moiety for the albumin.
We folded the sequence of our albumin binding moiety along with the sequence of albumin using AlphaFold2. We then loaded the structure into PyMOL and ChimeraX [23] and selected amino acids on the binder that were facing away from the binding site. We substituted a cysteine in each of those locations and refolded the sequence with AlphaFold2 to see how the substitution altered the structure. We evaluated how much each cysteine substitution altered the binding by looking at the predicted aligned error (PAE) between Chain A and Chain B, a metric calculated by AlphaFold2.
Fig. 1: Potential residues to mutate (orange) and the residues we ultimately selected (yellow) on ABD (gray), in context of human serum albumin (white). Visualized in ChimeraX [23].
We identified a location on the moiety for the cysteine that minimally affected the structure and binding of the moiety to the albumin, selecting residues E22 and R29 as potential residues to mutate for wet lab.
Within the wide range of albumin binding moieties (ABMs) currently available, we found 1TF0 [16], [17] to be highly promising and useful for running modeling experiments, since the crystal structure includes both human serum albumin and an ABM (Chain B = Peptostreptococcal albumin-binding protein), but we noted three possible avenues to improve it: increasing its affinity to albumin, increasing its stability at a range of temperatures, and decreasing its size. To address these issues with the ABM and ensure that our antibiotic’s half-life would effectively be extended, we used computational tools to redesign 1TF0 and to create entirely novel albumin-binding moieties. Additionally, we were filled with curiosity as to how modern computational tools can design novel binders, for other potential future projects on designing binders for other binding pockets on albumin or for other proteins.
To fulfill these goals, we used the AI-based computational biology tools RFdiffusion, ProteinMPNN, and AlphaFold2 and the mathematical protein modeling and analysis software Rosetta. RFdiffusion is a method developed at the University of Washington that uses diffusion modeling to create novel protein backbones [18]. We approached design with RFdiffusion in two ways: (1) designing backbones from complete Gaussian noise in the context of other given molecules such as albumin, and (2) designing with partial diffusion by adding varying degrees of noise to the existing backbone. RFdiffusion has been used by other iGEM teams in previous years, including by team INSAENSLyon1s in 2023, who created a manual for RFdiffusion.
ProteinMPNN, also developed at the University of Washington, uses neural networks to create novel sequences for a given backbone while taking into account neighboring proteins [19]. For this reason, we used ProteinMPNN to design multiple candidate sequences for each backbone created from partial and full diffusion in the context of albumin. ProteinMPNN has also been used in multiple previous iGEM projects, including UOregon 2023 and ANU-Australia 2023. These two projects also used or suggested using AlphaFold, the final program in our computational design pipeline. AlphaFold2 is an AI-based program created by Google DeepMind which folds a given protein sequence in the context of other proteins or molecules [20].
For assessing the sequences to select the sequences for wet lab experimentation, we explored the use of the Rosetta InterfaceAnalyzer, in addition to using the metrics generated by AlphaFold2. Rosetta is a computational algorithm-based software program for protein design developed at the University of Washington which has been used by many iGEM teams for modeling in the past [21]. It can be used for purposes such as protein design and ligand docking, but we opted to use it to calculate energy scores for our albumin-ABM interface.
After first establishing this pipeline, we then adjusted it to build alternative ABM designs. To create designs smaller than the original 53-residue 1TF0 protein, we used full noise diffusion and specified shorter sequence lengths. RFdiffusion and ProteinMPNN have been shown to create backbones and sequences which exhibit high thermostability, increasing the likelihood that our designs would be stable at a variety of temperatures compared to the original structure [18].
This component of our modeling efforts included more than just modeling, incorporating the process of design as well. However, since a large part of it still included modeling and measurements, we have included this work here. We hope that our pipeline will serve as a useful guide to using some of the cutting-edge AI-based tools of synthetic biology for future iGEM teams.
After reviewing the wide variety of ABMs currently available [22], we selected the protein 1TF0, which includes albumin bound to an ABM. This ABM was most appealing for our work due to its documented three-helix structure and binding complex which allowed it to be more easily modified with existing protein modeling tools for remodeling and conjugation to the PEG-OH linker.
The crystal structure was downloaded and loaded into PyMOL, with Chain A being the human serum albumin and Chain B being the albumin binding moiety we are mutating. The Chain A residues closest to Chain B (residues A180-A380), consisting of the binding pocket and neighboring residues, were selected and exported as a new file along with Chain B, named 1tf0_trimmed.pdb. Truncating the protein decreased the size of our input (457 to 162 KB) and allowed for faster computation speeds.
Fig. 2: 1TF0 truncated, including Chain A (residues 180-380) human serum albumin (purple), and albumin binding moiety (pink). Visualized with ChimeraX [23].
RFDiffusion was used for backbone structure generation, installed locally on three team members' computers (1 Windows, 2 Mac) per the instructions in the RFDiffusion Github [24]. RFDiffusion is operated using a run_inference.py script, with variable configurations/settings. We split our experiments into “full diffusion” and “partial diffusion” runs, where the former required the selection of hotspots (the binding pocket) to design a completely novel backbone, and the latter required specifying the degree of noising/denoising of the original structure.
Five full diffusion runs were completed, generating structures of length 20, 30, 35, 40, and 53 by adjusting the 'contigmap.contigs=[A180-380/0 40-40]' flag input. Hotspot residues were selected by modeling the 1TF0 protein in PyMOL [25] and selecting three residues on Chain A that had polar interactions with the original albumin binding moiety: 212, 267, 318. An example full diffusion run, run_full.sh, is attached below.
#!/bin/zsh
/Users/username/Desktop/RFdiffusion/scripts/run_inference.py \
inference.output_prefix=output_pdb/full_diff/6-28/1tf0_40aa \
inference.input_pdb=input_pdb/1tf0_trimmed.pdb \
'contigmap.contigs=[A180-380/0 40-40]' \
'ppi.hotspot_res=[A318,A267,A212]' \
inference.num_designs=10 \
denoiser.noise_scale_ca=0 \
denoiser.noise_scale_frame=0
Multiple partial diffusion runs were completed. To test different degrees of noise, the diffuser.partial_T parameter was set to 20, 25, and 30, with higher numbers corresponding to more noise. An example partial diffusion run, run_partial.sh, is attached below.
#!/bin/zsh
/Users/username/Desktop/RFdiffusion/scripts/run_inference.py \
inference.output_prefix=output_pdb/partial_diff/T30 \
inference.input_pdb=input_pdb/1tf0_trimmed.pdb \
'contigmap.contigs=[A180-380/0 53-53]' \
inference.num_designs=10 \
diffuser.partial_T=30
The output of this script was a PDB file containing the original Chain A structure along with a new binding moiety backbone consisting entirely of glycines.
Note that RFDiffusion is originally configured to be used with a GPU, which some of our computers did not have. To circumvent this, our advisor, Dr. DiMaio, helped us put together another environment file, copied below, as well as make adjustments to the SE3transformer in the package (removing lines that depend on packages we did not install).
name: SE3nv
channels:
  - defaults
  - conda-forge
  - pytorch
  - dglteam
dependencies:
  - python==3.9
  - pytorch==2.2
  - torchaudio
  - torchvision
  - torchdata
  - dgl==2.0.0
  - pip
  - pip:
      - hydra-core
      - pyrsistent
      - pandas
      - pydantic
ProteinMPNN was also installed locally on the three team members’ computers per the instructions in the ProteinMPNN Github [26]. The bash script run_mpnn.sh was written and used to generate 10 sequences for each RFDiffusion backbone.
#!/bin/zsh
/Users/username/Desktop/ProteinMPNN/protein_mpnn_run.py \
--pdb_path /Users/username/Desktop/RFdiffusion/output_pdb/full_diff/6-28/1tf0_40aa_$1.pdb \
--pdb_path_chains "A" \
--out_folder result/6-28 \
--num_seq_per_target 10 \
--sampling_temp "0.1" \
--seed 13 \
--batch_size 1
AlphaFold2 was utilized by way of the accessible ColabFold notebook [27], [28]. Default settings were kept, with only query_sequence and jobname being altered for each sequence. The num_models setting under the “Run Prediction” cell was adjusted from the default setting of 5 to 1, to allow for faster processing. The original 1TF0 Chain A sequence was input first, with the ProteinMPNN generated sequences following, separated by a colon.
ColabFold automatically generated zip folders for all outputs, which were saved locally on a computer.
AlphaFold structures were filtered by the predicted aligned error (PAE) between Chain A and Chain B. A Jupyter notebook was developed to calculate the average PAE from the automatically generated JSON files contained in the zip folders. Average PAEs lower than 7.5 were considered acceptable, and those structures were further evaluated using Rosetta.
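The averaging step can be sketched as below. This is an illustrative reconstruction, not the notebook itself. It assumes the PAE matrix is stored in the JSON file under either a "predicted_aligned_error" or a "pae" key (the key name varies between output versions, so check your own files), and that Chain A occupies the first len_chain_a rows and columns.

```python
import json

PAE_CUTOFF = 7.5  # acceptance threshold used in our filtering


def mean_interchain_pae(pae, len_chain_a):
    """Average PAE over the two off-diagonal blocks (A vs. B and B vs. A)."""
    total, count = 0.0, 0
    n = len(pae)
    for i in range(n):
        for j in range(n):
            # Keep only residue pairs that lie in different chains.
            if (i < len_chain_a) != (j < len_chain_a):
                total += pae[i][j]
                count += 1
    if count == 0:
        raise ValueError("len_chain_a must be between 1 and len(pae) - 1")
    return total / count


def load_pae(path):
    """Read a PAE matrix from a JSON file (key names are an assumption)."""
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, list):
        data = data[0]
    for key in ("predicted_aligned_error", "pae"):
        if key in data:
            return data[key]
    raise KeyError("no PAE matrix found in " + path)


# Toy 3-residue example: Chain A has 2 residues, Chain B has 1.
toy_pae = [[0, 1, 9], [1, 0, 7], [5, 3, 0]]
toy_mean = mean_interchain_pae(toy_pae, len_chain_a=2)
toy_passes = toy_mean < PAE_CUTOFF
```

For the truncated albumin input, len_chain_a would be the number of Chain A residues in the query sequence.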
Rosetta was installed on one computer (Mac M2). Structures meeting the 7.5 threshold were relaxed using the relax.sh script [29], [30], [31], and then analyzed using the interface_analyzer.sh script [32], attached below (where the $1 argument is the path to the PDB file being analyzed).
#!/bin/zsh
# relax.sh: relax the structure given as $1, constrained to its starting coordinates
/Applications/rosetta/main/source/bin/relax.static.macosclangrelease \
-s "$1" \
-relax:constrain_relax_to_start_coords \
-relax:coord_constrain_sidechains \
-relax:ramp_constraints false \
-missing_density_to_jump \
-beta \
-default_repeats 2 \
-optimization:default_max_cycles 200
#!/bin/zsh
# interface_analyzer.sh: score the interface of the structure given as $1
/Applications/rosetta/main/source/bin/InterfaceAnalyzer.static.macosclangrelease \
-s "$1" \
-out:file:score_only scoreinterf.sc
From the score file written by the interface_analyzer script (scoreinterf.sc), the dG (free energy), dSASA (solvent-accessible surface area), and dG/dSASA quotient metrics were used to evaluate the energy of the interface. These can be extracted using the command below:
cat scoreinterf.sc | awk '{print $4,$5,$6,$7,$8,$9,$10,$NF}'
Lower (more negative) dG and smaller dSASA values were considered more favorable. Plots of PAE scores and dG/dSASA quotients were used to select the best sequences (most negative dG/dSASA, lowest PAE). For each of the following three groups, one sequence was selected for experimental testing: (1) sequences generated with partial diffusion, (2) sequences generated with full diffusion of length 53, and (3) sequences generated with full diffusion of length 40.
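As an alternative to picking columns by position, the score file can be parsed by column name. The sketch below assumes the usual Rosetta score-file layout, in which every data line starts with "SCORE:" and the first such line is the header, and assumes the InterfaceAnalyzer columns are named dG_separated and dSASA_int. Check the header of your own scoreinterf.sc before relying on these names.

```python
def parse_score_lines(lines):
    """Turn Rosetta score-file lines into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and blank lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # e.g. the description column
        rows.append(row)
    return rows


def dg_per_dsasa(row):
    """Interface energy per unit of buried surface area."""
    return row["dG_separated"] / row["dSASA_int"]


# Toy input with made-up numbers, for illustration only.
example_lines = [
    "SEQUENCE: ",
    "SCORE: total_score dG_separated dSASA_int description",
    "SCORE: -100.0 -30.0 1500.0 design_1",
]
example_rows = parse_score_lines(example_lines)
example_ratio = dg_per_dsasa(example_rows[0])
```

To use it on real output, pass the lines of scoreinterf.sc, for example parse_score_lines(open("scoreinterf.sc")), and sort the rows by dg_per_dsasa.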
The partial diffusion sequences and the full diffusion sequences of length 40 had no cysteines, but each of the full diffusion sequences of length 53 had one cysteine. These cysteines were replaced with serines. Each of the six designs folded by AlphaFold2 was loaded into PyMOL, and amino acids on the binder that faced away from the albumin binding site were selected. On binders that had 3 helices, amino acids were selected only from the helix farthest from the binding site. On binders that had 2 helices, amino acids were selected from either helix. The selected amino acids were replaced with cysteine one at a time for each sequence, and each mutant was folded again in AlphaFold2. The outputs were then evaluated again based on AlphaFold-generated predicted aligned error (PAE) values and dG/dSASA values generated by Rosetta. One sequence with low PAE and dG/dSASA values was selected from each of the three groups to be tested by our Wet Lab team. Additionally, a Met was appended to the start of each selected sequence to check that these scores were not affected.
In total, 105 backbones were generated for different experimental conditions using RFDiffusion. PyMOL identified 9 polar contacts between the two chains of the 1TF0 PDB file. Combinations of 3 residues in the pocket (from a selection of residue numbers 212, 229, 267, 318, 321, 322, 325) were tested as hotspots for diffusion to select a combination that provided the largest number of reasonable structures. Ultimately, we selected residues K212, N267, and N318 (Fig. 3) to be the hotspots for further testing based on their coverage of the interface (distance from each other), proximity to the original albumin binding moiety (all distances < 3 Å), and a preliminary analysis based on the number of reasonable structures produced (triple or double helices in pocket).
Fig. 3: 1TF0 interface, showing chain A Albumin (purple), chain B binder (pink), and hotspot residues (blue). Visualized in ChimeraX
Partial diffusion structures of length 53 retained the original three-helical structure of the albumin binding moiety regardless of the partial_T value. Full diffusion generated single-helix, double-helix, and triple-helix structures (examples shown in Fig. 4). Runs of length 30 and below generated only single-helix structures and were not analyzed further.
Fig. 4: Example RFDiffusion backbones.
134 of the sequences generated using ProteinMPNN from double- and triple-helix backbones were then folded using AlphaFold. Obviously misaligned and clashing structures had far higher PAEs than normal-looking structures. Fig. 5 shows examples of the PAE matrix generated by AlphaFold, a square 2-dimensional matrix whose dimension equals the sum of the lengths of all chains modeled, graphed using Python. The upper right and lower left quadrants, which appear to have a higher PAE, represent the PAE between Chains A and B and are what we used to calculate the average PAE.
Fig. 5: Example PAE plots for a structure with low error (left) and high error (right). The structures are visualized using ChimeraX below the plots (pink = ABM, purple = albumin). The structure with higher PAE has an obvious clash between amino acids, pointed out by the yellow arrow.
The native sequence folded in AlphaFold had an average PAE of 6.12. Six sequences of length 40 and four sequences of length 53 generated using the full diffusion protocol had PAE < 7.5, and twelve sequences of length 53 generated using the partial diffusion protocol had PAE < 7.5. Of these low-error partial diffusion designs, 4 were generated with T = 20, 6 with T = 25, and 2 with T = 30, suggesting that consistently good designs can be found with different degrees of noising. All structures meeting the 7.5 threshold had acceptable dG and dSASA (solvent-accessible surface area) values, with dG_separated < -20 for all structures (negative dG suggests a favorable interaction) and dSASA < 1850 Å². The dG_separated value represents the change in Rosetta energy when the two chains are separated as opposed to in complex with each other, while dSASA_int is the solvent-accessible surface area buried at the interface [32]. We divide dG by dSASA because dG alone can be misleading (large interfaces will have more energy, even if the number of interactions is not great for that area); the quotient gives a better estimate of how good the binding is per unit area. Generally, previous successful Rosetta designs (likely calculated with an older version of InterfaceAnalyzer) have had dSASA in the range 1000 to 1600 Å²; however, the dG/dSASA quotient should be more informative [33].
The PAE and dG/dSASA measurements were plotted (Fig. 6) for the three categories we explored with AlphaFold: (1) sequences generated with partial diffusion, (2) sequences generated with full diffusion of length 53, and (3) sequences generated with full diffusion of length 40. For each category, one sequence was selected for experimental validation.
Fig. 6: Plots of dG_cross/dSASA and PAE for three diffusion experiments. The optimal sequences with low dG_cross/dSASA and PAE are in the lower left on the plot. Purple points represent sequences that were selected for wet lab testing. A. Sequences of length 53 created with partial diffusion (N = 11). B. Sequences of length 53 created with full diffusion (N = 4). C. Sequences of length 40 created with full diffusion (N = 6).
To enable the conjugation of the antibiotic to the ABM, a cysteine was mutated into each protein. By modeling our folded sequences in PyMOL, we could select possible locations for the mutation. The main selection criterion was that the amino acid was oriented away from the interface. One of the sequences already included a cysteine, which we mutated to serine, the most similar amino acid, so that it would not interfere with our linker chemistry.
Fig. 7: Plots of dG_cross/dSASA and PAE for cysteine mutations in three selected sequences. Purple points represent the original non-mutated sequence, green points the cysteine-mutated sequence we chose. A. Sequences of length 53 created with partial diffusion (5 mutations). B. Sequences of length 53 created with full diffusion (3 mutations). C. Sequences of length 40 created with full diffusion (3 mutations).
Since our wet lab team uses plasmid expression to create these proteins, we modeled the addition of a methionine at the N terminus of the binding moiety. All N termini pointed away from the interface, so the added methionine did not clash with Chain A or significantly change the structure. The final three sequences, folded with AlphaFold, are shown in Fig. 8. The two structures of length 53 were triple helices, while the shorter sequence (40 residues) formed a double helix. The PAE changed minimally as a result of this addition. Table 1 shows the AlphaFold and Rosetta scores for the three designed sequences, along with those of the other sequences our wet lab was testing (ABD); note that the ABD sequences were longer, at 65 amino acids including the methionine. The final three sequences, including the cysteine mutations and N-terminal methionine, are copied below.
Table 1: The AlphaFold2 PAE score and Rosetta metrics for our three designed sequences (including methionine at the N terminus) and ABD.
Fig. 8: Final designed albumin binding moieties (pink) bound to albumin (purple). N terminal methionine is shown in red, cysteine is shown in yellow.
> 1512K44C|DIFFUSION NOVEL STRUCTURE LENGTH 53
MSRKKERAEELYNTALLSARRGNKKAAERAAEIILEDTGDEEAACKAREALKAI
> 2630E2C|PARTIAL DIFFUSION STRUCTURE LENGTH 53
MSCEEKKIEEHKKKILAELDALGINNKLIKAEIRKSKIPEDMETLFEEIKAERA
> 1712E8C|DIFFUSION NOVEL STRUCTURE LENGTH 40
MDLLKKADCKAKEANELQRKGGKLSDIMKLVKEAEELREAA
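The mutation labels in the headers above can be checked against the sequences with a short script. This sketch assumes the labels follow the usual convention of design ID, original residue, position, new residue (for example `1512K44C`), and that positions are numbered without the added N-terminal methionine. The helper names `parse_label` and `check_design` are hypothetical.

```python
import re

# Final sequences copied from the records above (with N-terminal methionine).
FINAL_SEQUENCES = {
    "1512K44C": "MSRKKERAEELYNTALLSARRGNKKAAERAAEIILEDTGDEEAACKAREALKAI",
    "2630E2C": "MSCEEKKIEEHKKKILAELDALGINNKLIKAEIRKSKIPEDMETLFEEIKAERA",
    "1712E8C": "MDLLKKADCKAKEANELQRKGGKLSDIMKLVKEAEELREAA",
}


def parse_label(label):
    """Split a label such as '1512K44C' into (design_id, old, position, new)."""
    match = re.fullmatch(r"(\d+)([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    design_id, old, position, new = match.groups()
    return design_id, old, int(position), new


def check_design(label, sequence):
    """Return (length without Met, mutation site matches label, cysteine count)."""
    _, _, position, new = parse_label(label)
    if not sequence.startswith("M"):
        raise ValueError("expected an N-terminal methionine")
    body = sequence[1:]  # numbering assumed to exclude the added methionine
    site_ok = body[position - 1] == new
    return len(body), site_ok, sequence.count("C")


for label, seq in FINAL_SEQUENCES.items():
    print(label, check_design(label, seq))
```

A check like this catches off-by-one numbering errors before sequences are sent for synthesis, and confirms that each design carries a single cysteine for conjugation.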
We successfully used an accessible, machine-learning-based pipeline to design new sequences for albumin binding moieties. We were able to create a two-helix design, 40 residues long, that was shorter than both the native peptostreptococcal albumin-binding protein and the ABD sequence our wet lab used for their experiments.
We observed that attempting shorter designs resulted in long single-helix binders, which RFdiffusion seems to find favorable but which lack the stabilizing tertiary structure that we would like to see in our designs. This could be a limitation of using computational tools, since theoretical computational success does not necessarily equate to real-world success. Experimental validation is therefore essential; the values we gain from the wet lab could also be used in the future to refine our models and determine the weights of different scoring metrics. To read more about how we planned to integrate these modeling measurements into our engineering design cycle, see our Engineering page; to read about how we planned wet lab experiments, see our Experiments page.
Interestingly, our three designed sequences all had better PAE values than the two ABD sequences, yet with the exception of the sequence of length 40, the ABD had more favorable dG/dSASA scores. We hope that further experimentation will help us determine which of these metrics is a better predictor of binding affinity, if either.
[1] “Cefotaxime Dosage & Rx Info | Uses, Side Effects,” MPR. Accessed: Sep. 27, 2024. [Online]. Available: https://www.empr.com/drug/cefotaxime/
[2] J. Kim, W. L. Hayton, J. M. Robinson, and C. L. Anderson, “Kinetics of FcRn-mediated recycling of IgG and albumin in human: Pathophysiology and therapeutic implications using a simplified mechanism-based model,” Clin. Immunol., vol. 122, no. 2, pp. 146–155, Feb. 2007, doi: 10.1016/j.clim.2006.09.001.
[3] K. P. Fu, P. Aswapokee, I. Ho, C. Matthijssen, and H. C. Neu, “Pharmacokinetics of cefotaxime,” Antimicrob. Agents Chemother., vol. 16, no. 5, pp. 592–597, Nov. 1979.
[4] “Population pharmacokinetic analysis of cefdinir following a single oral dose in healthy adults.”
[5] I. H. Patel et al., “Pharmacokinetics of ceftriaxone in humans,” Antimicrob. Agents Chemother., vol. 20, no. 5, pp. 634–641, Nov. 1981, doi: 10.1128/aac.20.5.634.
[6] A. Leroy, F. Leguy, F. Borsa, G. R. Spencer, J. P. Fillastre, and G. Humbert, “Pharmacokinetics of ceftazidime in normal and uremic subjects,” Antimicrob. Agents Chemother., vol. 25, no. 5, pp. 638–642, May 1984, doi: 10.1128/AAC.25.5.638.
[7] Z.-R. Shi et al., “Population Pharmacokinetics and Dosing Optimization of Ceftazidime in Infants,” Antimicrob. Agents Chemother., vol. 62, no. 4, pp. e02486-17, Mar. 2018, doi: 10.1128/AAC.02486-17.
[8] R. S. Obach, F. Lombardo, and N. J. Waters, “Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 670 Drug Compounds,” Drug Metab. Dispos., vol. 36, no. 7, pp. 1385–1405, Jul. 2008, doi: 10.1124/dmd.108.020479.
[9] A. M. AboulMagd, N. S. Abdelwahab, M. M. Abdelrahman, H. M. Abdel-Rahman, and N. F. Farid, “Lipophilicity study of different cephalosporins: Computational prediction of minimum inhibitory concentration using salting-out chromatography,” J. Pharm. Biomed. Anal., vol. 206, p. 114358, Nov. 2021, doi: 10.1016/j.jpba.2021.114358.
[10] “Product Information Cefepime (hydrochloride hydrate).” Cayman Chemical, 2022. [Online]. Available: https://cdn.caymanchem.com/cdn/insert/23633.pdf
[11] CLSI, Performance Standards for Antimicrobial Susceptibility Testing, 30th ed. in CLSI Supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute, 2020. [Online]. Available: https://www.nih.org.pk/wp-content/uploads/2021/02/CLSI-2020.pdf
[12] L. Boschung-Pasquier et al., “Cefepime neurotoxicity: thresholds and risk factors. A retrospective cohort study,” Clin. Microbiol. Infect., vol. 26, no. 3, pp. 333–339, Mar. 2020, doi: 10.1016/j.cmi.2019.06.028.
[13] P. G. Ferreira, “Insights into human carboxylesterase 2 stability and activity in vivo and in vitro,” M.S. Dissertation, Faculdade de Ciências e Tecnologia, Biotechnology, 2012.
[14] A. Jonsson, J. Dogan, N. Herne, L. Abrahmsén, and P.-Å. Nygren, “Engineering of a femtomolar affinity binding protein to human serum albumin,” Protein Eng. Des. Sel., vol. 21, no. 8, pp. 515–527, Aug. 2008, doi: 10.1093/protein/gzn028.
[15] RCSB Protein Data Bank, “RCSB PDB - 1GJS: Solution structure of the Albumin binding domain of Streptococcal Protein G.” Accessed: Sep. 27, 2024. [Online]. Available: https://www.rcsb.org/structure/1gjs
[16] RCSB Protein Data Bank, “RCSB PDB - 1TF0: Crystal structure of the GA module complexed with human serum albumin.” Accessed: Sep. 27, 2024. [Online]. Available: https://www.rcsb.org/structure/1tf0
[17] S. Lejon, I.-M. Frick, L. Björck, M. Wikström, and S. Svensson, “Crystal Structure and Biological Implications of a Bacterial Albumin Binding Module in Complex with Human Serum Albumin,” J. Biol. Chem., vol. 279, no. 41, pp. 42924–42928, Oct. 2004, doi: 10.1074/jbc.M406957200.
[18] J. L. Watson et al., “De novo design of protein structure and function with RFdiffusion,” Nature, vol. 620, no. 7976, pp. 1089–1100, Aug. 2023, doi: 10.1038/s41586-023-06415-8.
[19] J. Dauparas et al., “Robust deep learning based protein sequence design using ProteinMPNN,” Science, vol. 378, no. 6615, pp. 49–56, Oct. 2022, doi: 10.1126/science.add2187.
[20] Z. Yang, X. Zeng, Y. Zhao, and R. Chen, “AlphaFold2 and its applications in the fields of biology and medicine,” Signal Transduct. Target. Ther., vol. 8, no. 1, pp. 1–14, Mar. 2023, doi: 10.1038/s41392-023-01381-z.
[21] “Software,” Rosetta Commons. Accessed: Sep. 27, 2024. [Online]. Available: https://rosettacommons.org/software/
[22] A. Ullah, G. Shin, and S. I. Lim, “Human serum albumin binders: A piggyback ride for long-acting therapeutics,” Drug Discov. Today, vol. 28, no. 10, p. 103738, Oct. 2023, doi: 10.1016/j.drudis.2023.103738.
[23] E. C. Meng et al., “UCSF ChimeraX: Tools for structure building and analysis,” Protein Sci., vol. 32, no. 11, p. e4792, 2023, doi: 10.1002/pro.4792.
[24] RosettaCommons/RFdiffusion. (Jul. 22, 2024). Python. RosettaCommons. Accessed: Jul. 22, 2024. [Online]. Available: https://github.com/RosettaCommons/RFdiffusion
[25] “PyMOL | pymol.org.” Accessed: Sep. 27, 2024. [Online]. Available: https://www.pymol.org/
[26] J. Dauparas, dauparas/ProteinMPNN. (Sep. 27, 2024). Jupyter Notebook. Accessed: Sep. 27, 2024. [Online]. Available: https://github.com/dauparas/ProteinMPNN
[27] “Google Colab.” Accessed: Sep. 27, 2024. [Online]. Available: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
[28] “ColabFold: making protein folding accessible to all | Nature Methods.” Accessed: Sep. 27, 2024. [Online]. Available: https://www.nature.com/articles/s41592-022-01488-1
[29] M. D. Tyka et al., “Alternate states of proteins revealed by detailed energy landscape mapping,” J. Mol. Biol., vol. 405, no. 2, pp. 607–618, Jan. 2011, doi: 10.1016/j.jmb.2010.11.008.
[30] F. Khatib et al., “Algorithm discovery by protein folding game players,” Proc. Natl. Acad. Sci. U. S. A., vol. 108, no. 47, pp. 18949–18953, Nov. 2011, doi: 10.1073/pnas.1115898108.
[31] J. B. Maguire et al., “Perturbing the energy landscape for improved packing during computational protein design,” Proteins, vol. 89, no. 4, pp. 436–449, Apr. 2021, doi: 10.1002/prot.26030.
[32] “InterfaceAnalyzer.” Accessed: Sep. 27, 2024. [Online]. Available: https://docs.rosettacommons.org/docs/latest/application_documentation/analysis/interface-analyzer
[33] P. B. Stranges and B. Kuhlman, “A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds,” Protein Sci. Publ. Protein Soc., vol. 22, no. 1, pp. 74–82, Jan. 2013, doi: 10.1002/pro.2187.