Overview
In recent years, a variety of new viruses have emerged, causing great harm to human health. In order to develop broad-spectrum antiviral drugs with fewer side effects, we designed a nanodisc that mimics the human host cells of a virus, so that viruses bind to the nanodisc instead of to human cells.
Our modeling work accompanied the experiments throughout the project: from the initial design of the membrane scaffold protein (MSP), through experimental planning and data analysis, to verifying the function of the nanodiscs and making predictions for future experiments. Along the way we used differential equation modeling, design of experiments, and computer simulation to solve a series of problems.
On the one hand, our modeling provided strong theoretical support for the experiments; on the other hand, the data obtained from the experiments also improved our model.
In the protein structure section, we used AlphaFold2 to predict the 3D structure of the MSP from the amino acid sequence of the multi-polymerized MSP provided by the experimental group. The predictions show that the structures of the conserved regions, tags, and mCherry parts of the MSP are in line with our expectations, which supports proceeding with further experiments.
In the experimental design part, we used the properties of orthogonal arrays to reduce the number of experiments while still obtaining high-quality data, and through further analysis of the experimental data we identified the factors affecting the formation of nanodiscs.
In order to optimize the function of nanodiscs, we carried out directed evolution of LDL-R membrane proteins and simulated molecular docking of three mutants, which provided ideas for the design of membrane proteins in further experiments.
In the verification of nanodisc function, we used partial differential equation modeling and cellular automata to simulate the protective effect of nanodiscs on human cells under different affinity settings of viruses and nanodiscs, which to a certain extent verified the intended function of the nanodiscs.
Finally, since the nanodiscs will eventually be used as antiviral drugs to enter the human body, we simulated their distribution and metabolism in the human body using pharmacokinetic modeling to facilitate the subsequent experiments.
Part 1: Protein Structure Prediction
Since the MSP we use is a self-constructed multi-polymerized MSP, we wanted to determine whether it can function properly before conducting wet lab experiments, in order to reduce costs. In addition, after performing directed evolution, we also needed to predict the protein structures to analyze the effects of the evolution. Therefore, we used AlphaFold2 for protein structure prediction.
1.1 Method
AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multiple sequence alignments, into the design of the deep learning algorithm [1]. Although the predicted results can only serve as references and do not represent the real structure of the proteins, this approach allows us to perform structural predictions for any protein of interest at a low cost.
1.2 Structure prediction of the MSPs
Our multi-polymerized MSP consists of three basic components: a tag, a conserved region, and a catcher. To observe the experimental results better, we also incorporated the fluorescent protein mCherry, which emits fluorescence when the multi-polymerized MSP is correctly connected. We examined the structures of each component in the predicted results to ensure that no destructive effects have occurred between them.
It can be seen that the conserved regions of the MSPs exhibit a regular and stable α-helical structure, which allows them to stably bind lipid molecules. Both the tag and mCherry are composed of a β-barrel structure. These structures align with our expectations.
To flexibly extend the size of the nanodiscs as needed, we linked multiple MSPs end to end to make larger nanodiscs. However, unintended connections between different Tags and Catchers are also possible, so we needed to confirm through structure prediction that such interactions do not occur. The predicted results show that using different Tags and Catchers can prevent the cyclization of a single monomer to some extent, but there is still a certain chance of it occurring. The experimental group therefore proposed another scheme in which a single MSP carries either two Tags or two Catchers, giving a more controllable self-assembly process for the nanodiscs. We repeated the predictions for this scheme, and the results showed that the probability of self-aggregation was much lower, which is more favorable for our experiments.
Between different components, we also incorporated a flexible GS linker to reduce interactions between parts. The predicted results show that the structural confidence of the MSPs without the GS linker is lower than that of the MSPs with it, indicating that the linker provides a certain buffering effect.
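AlphaFold writes its per-residue confidence score (pLDDT, on a 0 to 100 scale) into the B-factor column of the PDB files it outputs, so a comparison such as "with GS linker" versus "without GS linker" can be summarized as a mean pLDDT. The following is a minimal sketch of that calculation; the function name and the file names in the usage note are hypothetical and are not taken from our actual pipeline.

```python
def mean_plddt(pdb_lines):
    """Average pLDDT over C-alpha atoms of an AlphaFold PDB model.

    AlphaFold stores pLDDT in the B-factor field (columns 61-66) of each
    ATOM record. One C-alpha atom per residue is used so that residues
    with many atoms are not over-weighted.
    """
    scores = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name in the fixed-width PDB format.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)
```

With two predicted models on disk (hypothetical file names), `mean_plddt(open("msp_with_gs.pdb"))` and `mean_plddt(open("msp_without_gs.pdb"))` could then be compared directly. A mean score is only a coarse summary; per-residue values around the linker region would be more informative.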
1.3 Structure prediction of the LDL-R after directed evolution
After performing directed evolution on LDL-R, we need to obtain relatively accurate structural predictions to simulate its molecular docking with VSV G. We chose three different mutants for structure prediction and observed whether they had significant structural changes as a basis for subsequent molecular docking. It can be seen that there are no obvious changes in the protein structure after directed evolution, suggesting that its binding function is likely to be retained.
Part 2: Design of Experiments
2.1 Introduction
Design of experiments (DOE) is a statistical method used to plan and analyze experiments. Based on a priori process knowledge, values (called levels) are assigned to continuous and categorical factors suspected of influencing the response of interest, thereby defining the extent of the design space of interest [2].
DOE has several key advantages over one-factor-at-a-time (OFAT) experiments. DOE typically requires fewer experimental treatments than OFAT methods, resulting in lower costs and shorter development time. DOE provides a systematic experimental process with three phases of scoping, screening, and optimization, which helps to progressively refine the experimental design and ultimately optimize the system. However, in practice, it may not be necessary to conduct all three types of experiments. For example, scoping experiments may not be necessary if there is sufficient a priori knowledge of the system or process under study.
Due to time constraints, we were unable to complete a large number of experiments. In order to reduce the number of experiments while ensuring the quality of the data obtained from the experiments, as well as to better detect experimentally relevant influencing factors, we used experimental design to plan and refine the experiments.
2.2 Screening Experiments
2.2.1 Design matrix under nested design
At the beginning of the experiment, based on our experiments and literature references, we identified five factors of interest that might affect the formation of nanodiscs: MSP protein species, lipid species, lipid-to-MSP protein molar ratio, fabrication time, and temperature. We measured nanodisc diameters by dynamic light scattering (DLS) and judged the formation of the final nanodiscs by the ratio of the experimentally obtained diameter to the theoretical optimal diameter. To facilitate the modeling, we defined the response value from this ratio and used screening experiments to test whether each factor has a significant effect on nanodisc formation.
It should be noted that, among these five factors, the MSP protein species and the lipid-to-MSP protein molar ratio are correlated: different species of MSP proteins have different lengths, and combining a longer MSP protein with a smaller lipid-to-MSP protein molar ratio is bound to lead to experimental failure. To avoid such extreme combinations, we used a sliding level design, a type of nested design, in which the levels of the sliding factor (the lipid-to-MSP protein molar ratio) are set according to the type of MSP protein [3].
Meanwhile, to minimize the number of experiments, we used an orthogonal experimental design, which reduces the $2^5 = 32$ experiments originally needed to traverse all level combinations to 16 experiments without affecting the data analysis.
The following table gives the design array we used, where +1 and -1 indicate the corresponding high and low level values or different kinds of experimental factors, respectively.
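As an illustration of how a 16-run orthogonal array for five two-level factors can be obtained, the sketch below builds a standard half fraction of the $2^5$ design using the generator E = A·B·C·D and checks that the columns are balanced and mutually orthogonal. This is a generic textbook construction, not necessarily the array we used; in particular it ignores the sliding levels of the nested factor, and the function names are our own for this example.

```python
from itertools import product

def half_fraction_design():
    """16-run half fraction of a 2^5 design with generator E = A*B*C*D."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        # The fifth column is the product of the first four (the generator).
        runs.append((a, b, c, d, a * b * c * d))
    return runs

def is_orthogonal(runs):
    """True if every column is balanced and all column pairs are orthogonal."""
    cols = list(zip(*runs))
    balanced = all(sum(col) == 0 for col in cols)
    pairwise = all(
        sum(x * y for x, y in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    return balanced and pairwise

design = half_fraction_design()
for run in design:
    print(" ".join(f"{level:+d}" for level in run))
print("orthogonal:", is_orthogonal(design))
```

Balance means each factor appears eight times at each level, and orthogonality is what allows the main effects to be estimated independently of one another.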
2.2.2 Analysis of Variance (ANOVA)
In order to explore the factors that have a significant effect on the formation of nanodiscs, for the above design, we first build the following model:
where
Since the columns of the orthogonal table have the property of being orthogonal to each other and each factor occurs the same number of times at each level in the design, it is possible to perform a sum-of-squares decomposition of the above model and verify that the cross terms of the product terms in it are all zero:
Further, to analyze the effect of different factors on nanodisc formation, the above parameters can be estimated and F-tested. For example, to test the main effect of A, the value of the following F-test statistic can be calculated:
Proceeding in the same way for each factor gives the following ANOVA results:
Effect | Degrees of freedom | F | P value |
---|---|---|---|
A | 1 | 13.789 | 0.005 |
B | 1 | 0.322 | 0.58 |
C(A) | 2 | 0.499 | 0.623 |
D | 1 | 0.000 | 0.98 |
E | 1 | 1.450 | 0.259 |
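The sum-of-squares decomposition and F-test described above can be sketched numerically. The example below is a simplified illustration under stated assumptions: it uses a generic 16-run two-level orthogonal design with five main effects only (no nested C(A) term, so the residual has 10 degrees of freedom rather than the 9 in our analysis), and the response values are synthetic, generated with a strong effect for factor A. None of the numbers are our experimental data.

```python
from itertools import product
import random

def main_effect_f_stats(design, y):
    """F statistic for each main effect in a balanced, orthogonal +/-1 design."""
    n, k = len(y), len(design[0])
    grand_mean = sum(y) / n
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    ss_effects = []
    for j in range(k):
        # For +/-1 coding the effect sum of squares is contrast^2 / n (1 d.f.).
        contrast = sum(run[j] * v for run, v in zip(design, y))
        ss_effects.append(contrast ** 2 / n)
    df_resid = n - 1 - k
    ms_resid = (ss_total - sum(ss_effects)) / df_resid
    return [ss / ms_resid for ss in ss_effects]

# Generic 16-run half fraction (assumed design, generator E = A*B*C*D).
design = [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]

# Synthetic response: only factor A has a real effect, plus Gaussian noise.
rng = random.Random(0)
y = [10.0 - 7.0 * run[0] + rng.gauss(0.0, 1.0) for run in design]

f_stats = main_effect_f_stats(design, y)
for name, f in zip("ABCDE", f_stats):
    print(f"{name}: F = {f:.2f}")
```

Because the columns are orthogonal, the effect sums of squares add up without cross terms, so each F statistic is simply the effect sum of squares divided by the residual mean square. Each F value would then be compared with the F distribution on (1, residual) degrees of freedom to obtain a P value.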
From the above results, it can be seen that different MSP protein species have a significant effect on the formation of nanodiscs. The difference in the length of the MSP proteins due to their different species may indeed have an effect on the formation of nanodiscs, which is in line with our intuitive expectations.
By contrast, lipid type, lipid-to-protein molar ratio, temperature, and time did not have a significant effect on nanodisc formation at the levels we set. This may be because the difference between the high and low levels was not large enough to reveal the effect of these factors on the final results.
For the temperature factor, all nanodisc samples had to be incubated on ice for half an hour to one hour before being transferred to the set temperature for the designated time. For samples incubated for only two hours, this ice incubation makes up a substantial part of the total time and is likely to affect the results. In addition, after incubation was complete, samples waited on ice while the DLS measurements were carried out, which further lengthened the time they spent near 0 °C and may have affected the modeling results.
2.2.3 Regression analysis
The ANOVA results show that the MSP protein species has a significant effect on the formation of nanodiscs. To further determine the magnitude of the influence of each factor on nanodisc formation, we established the following regression model:
where
Regression analysis of the above models yields the following results:
| Term | Estimate | Std. Error | T value | P value |
|---|---|---|---|---|
| Intercept | 11.33 | 1.82 | 6.239 | 0.000 |
| | 1.08 | 1.82 | 0.596 | 0.566 |
| | -7.09 | 1.82 | -3.903 | 0.004 |
| | 3.30 | 2.57 | 1.285 | 0.231 |
| | 1.60 | 2.57 | 0.627 | 0.547 |
| | 0.04 | 1.82 | 0.022 | 0.983 |
| | -2.30 | 1.82 | -1.266 | 0.237 |
From the above figure, it can be seen that the model fits well and the residuals basically satisfy the assumptions of normal distribution and homoscedasticity. The regression results demonstrate that lipid type has a significant effect on the formation of nanodiscs at a significance level of 0.05, and the effect of DOPC lipids is greater and more favorable to the formation of nanodiscs in the -1 setting compared to the lipid setting of 70% PC + 30% PS in the +1 setting, which is in line with the observations and expectations of our experimenters.
The above regression model
Therefore, in future experiments, we will further investigate the interaction effects among factors.
2.3 Response Surface Methodology
Response Surface Methodology (RSM) is a statistical methodology for modeling and analyzing the relationships between multiple independent variables and the response they produce in a dependent variable. It is mainly used to estimate the effects of one or more independent variables (inputs), and of the interactions between them, on one or more response variables (outputs), and it can also help to determine the settings of these variables that optimize (maximize or minimize) a particular objective function [4]. Since the MSP used in the experiments was designed by our team, there are no references to guide the choice of parameters for making nanodiscs with it. We therefore need to explore and optimize the parameter settings for our self-designed MSP, and response surface design provides a good way to do so.
In subsequent experiments, we will use response surface design to conduct experiments and analysis for the factors of interest, building on the existing results. The whole experimental design process provides a scientific and rigorous basis for setting experimental parameters, not only for our project but potentially also as a new idea and paradigm for other teams and experiments in synthetic biology.
Part 3: Directed Evolution of Membrane Proteins
In order to make receptor proteins on nanodiscs bind competitively to viruses and thus protect human cells, we would like to obtain receptor proteins that have a higher affinity for viral proteins or are more stable.
3.1 Directed Evolution
To improve the properties of the proteins used in our project, we turned to directed evolution, which optimizes the function of biomolecules by mimicking the process of natural selection, allowing protein properties to be tuned and optimized.
We use an online tool called EVcouplings, which predicts the effects of mutations at different positions of our protein based on evolutionary sequence covariation, to guide the protein toward better properties. The effects of these mutations are assessed independently, along with their effect on the overall protein structure (called apparent fitness). Both parameters correlate with the overall evolutionary fitness of the mutant and thus help identify mutations likely to improve its function.
We performed directed evolution on the low-density lipoprotein receptor (LDL-R) used in the project; the following is the mutational landscape generated by EVcouplings:
We selected three mutants at positions with low conservation (below 0.3) that are predicted to be more stable after mutation, and predicted their structures to facilitate further work.
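The selection rule can be written as a simple filter. The sketch below is illustrative only: the field names `conservation` and `predicted_effect`, the convention that a positive predicted effect means a more favorable mutation, and all candidate entries are placeholders invented for this example, not EVcouplings output columns or our real mutants.

```python
def select_mutants(candidates, max_conservation=0.3, top_n=3):
    """Keep mutants at weakly conserved positions with a favorable
    predicted effect, ranked from most to least favorable."""
    eligible = [
        m for m in candidates
        if m["conservation"] < max_conservation and m["predicted_effect"] > 0
    ]
    eligible.sort(key=lambda m: m["predicted_effect"], reverse=True)
    return eligible[:top_n]

# Placeholder candidates (made-up values for illustration).
candidates = [
    {"mutant": "mut_a", "conservation": 0.12, "predicted_effect": 1.4},
    {"mutant": "mut_b", "conservation": 0.25, "predicted_effect": 0.6},
    {"mutant": "mut_c", "conservation": 0.45, "predicted_effect": 2.0},   # too conserved
    {"mutant": "mut_d", "conservation": 0.08, "predicted_effect": -0.3},  # unfavorable
    {"mutant": "mut_e", "conservation": 0.29, "predicted_effect": 0.9},
    {"mutant": "mut_f", "conservation": 0.10, "predicted_effect": 0.2},
]

selected = select_mutants(candidates)
print([m["mutant"] for m in selected])
```

Restricting the search to weakly conserved positions is a cautious choice: highly conserved residues are more likely to be essential for folding or binding, so mutating them carries a higher risk of losing function.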
3.2 Molecular Docking
In order to verify the binding affinity of the receptor proteins obtained from our directed evolution to the viral proteins, we selected three mutants for molecular docking simulations with the viral proteins using the HDOCK server developed by Huang Lab. The following are the results obtained from each molecular docking:
For all three mutants, the docking scores differed little from that of the original protein. A more negative docking score implies a more probable binding model, but the score should not be regarded as the real binding affinity between the two molecules; the exact magnitude of the affinity needs to be verified experimentally. The docking score of mutant2 was similar to that of the original protein while its RMSD was smaller, suggesting that mutant2 might be more stable than the original LDL-R and could be considered a better-functioning membrane protein.
Directed evolution and molecular docking provide new ideas for further experiments, giving us the opportunity to find more effective receptor proteins and further improve the efficacy of nanodiscs. However, computer-guided directed evolution is not guaranteed to yield more functional receptor proteins, and its results still require extensive experimental support.
Part 4: Function Verification of Nanodiscs
4.1 Introduction
In the study of biological systems, understanding the interactions and diffusion processes of viruses, nanoparticles, and living cells is crucial. This model aims to describe these interactions and diffusion processes in both space and time. By extending the variables
4.2 Mathematical Model
We can use differential equations to describe this system. Let:
- $V(t)$ represent the number of viruses at time $t$.
- $C(t)$ represent the number of healthy human cells at time $t$.
- $N(t)$ represent the number of nanoparticles at time $t$.
4.2.1 Case 1: Viruses Prefer to Infect Nanoparticles
- The rate at which viruses bind to human cells is $k_1 V(t) C(t)$.
- The rate at which viruses bind to nanoparticles is $k_2 V(t) N(t)$.
Since viruses prefer to infect nanoparticles, we have:
$$
\begin{aligned}
\frac{dV(t)}{dt} &= -k_2 V(t) N(t) - k_1 V(t) C(t) \\
\frac{dC(t)}{dt} &= -k_1 V(t) C(t) \\
\frac{dN(t)}{dt} &= -k_2 V(t) N(t)
\end{aligned}
$$
4.2.2 Case 2: Viruses Randomly Choose to Infect Human Cells or Nanoparticles
Assume the probability of viruses choosing to infect human cells and nanoparticles is equal, i.e., each is $\frac{1}{2}$.
- The rate at which viruses bind to human cells is $\frac{k_1}{2} V(t) C(t)$.
- The rate at which viruses bind to nanoparticles is $\frac{k_2}{2} V(t) N(t)$.
Therefore, we have:
$$
\begin{aligned}
\frac{dV(t)}{dt} &= -\frac{k_2}{2} V(t) N(t) - \frac{k_1}{2} V(t) C(t) \\
\frac{dC(t)}{dt} &= -\frac{k_1}{2} V(t) C(t) \\
\frac{dN(t)}{dt} &= -\frac{k_2}{2} V(t) N(t)
\end{aligned}
$$
4.2.3 Summary
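The differential equations for the first case are:
$$
\begin{aligned}
\frac{dV(t)}{dt} &= -k_2 V(t) N(t) - k_1 V(t) C(t) \\
\frac{dC(t)}{dt} &= -k_1 V(t) C(t) \\
\frac{dN(t)}{dt} &= -k_2 V(t) N(t)
\end{aligned}
$$
The differential equations for the second case are:
$$
\begin{aligned}
\frac{dV(t)}{dt} &= -\frac{k_2}{2} V(t) N(t) - \frac{k_1}{2} V(t) C(t) \\
\frac{dC(t)}{dt} &= -\frac{k_1}{2} V(t) C(t) \\
\frac{dN(t)}{dt} &= -\frac{k_2}{2} V(t) N(t)
\end{aligned}
$$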
These equations describe the dynamic changes of viruses, human cells, and nanoparticles in different scenarios.
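The system for $V$, $C$, and $N$ can be integrated numerically to compare affinity settings. The following is a minimal sketch using a hand-written fourth-order Runge-Kutta step; the rate constants, initial counts, step size, and the assumption that "viruses prefer nanoparticles" corresponds to $k_2 > k_1$ are illustrative choices for this example, not fitted values.

```python
def derivs(state, k1, k2):
    """Right-hand side of the V, C, N system."""
    v, c, n = state
    bind_cells = k1 * v * c   # viruses binding to healthy human cells
    bind_discs = k2 * v * n   # viruses binding to nanoparticles
    return (-bind_discs - bind_cells, -bind_cells, -bind_discs)

def rk4_step(state, dt, k1, k2):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))
    s1 = derivs(state, k1, k2)
    s2 = derivs(shifted(state, s1, dt / 2), k1, k2)
    s3 = derivs(shifted(state, s2, dt / 2), k1, k2)
    s4 = derivs(shifted(state, s3, dt), k1, k2)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, s1, s2, s3, s4)
    )

def simulate(k1, k2, v0=100.0, c0=100.0, n0=100.0, dt=0.01, steps=1000):
    """Return (V, C, N) after steps * dt time units."""
    state = (v0, c0, n0)
    for _ in range(steps):
        state = rk4_step(state, dt, k1, k2)
    return state

# Illustrative rate constants (assumed): nanoparticles bind 10x faster.
high_affinity = simulate(k1=0.001, k2=0.01)
equal_affinity = simulate(k1=0.001, k2=0.001)
# Case 2 as written above is the same system with both constants halved.
case2 = simulate(k1=0.001 / 2, k2=0.01 / 2)

for label, (v, c, n) in [("k2 = 10*k1", high_affinity),
                         ("k2 = k1", equal_affinity),
                         ("case 2 (halved rates)", case2)]:
    print(f"{label}: V={v:.1f}  C={c:.1f}  N={n:.1f}")
```

A useful sanity check is that $V - C - N$ stays constant, since every virus that disappears removes exactly one cell or one nanoparticle in this model. Note also that halving both constants only slows the dynamics; the protective effect depends on the ratio $k_2 / k_1$.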
4.3 Visualization
To further study the spatial propagation characteristics of viruses, human cells, and nanoparticles, we use a Cellular Automaton (CA) model for visualization. A cellular automaton is a discrete model in which space is divided into a grid, and each grid cell can be in one of several states and updates its state according to certain rules.
4.3.1 Basic Settings of the Cellular Automaton Model
Grid: We use a two-dimensional grid, and each cell can be in one of the following states:
- Empty cell
- Healthy human cell
- Infected human cell
- Nanoparticle
- Virus
Initial state: Set the initial distribution of the states of the cells in the grid at the initial moment. For example, randomly distribute a certain number of healthy human cells, nanoparticles, and viruses.
State update rules:
1. Virus spread:
- If a virus cell is adjacent to a healthy human cell, the healthy human cell is likely to become infected. However, the virus is not lysed and can still infect other cells.
- If a virus cell is adjacent to a nanoparticle cell, the nanoparticle lyses the virus cell, and the virus disappears.
2. Priority selection rules:
- Case 1: Viruses prefer to infect nanoparticles.
- Case 2: Viruses randomly choose to infect human cells or nanoparticles.
4.3.2 Specific Implementation of State Update Rules
In the visualization of the cellular automaton model, different colors represent various states within the grid. Specifically, normal cells are shown in blue, viruses are shown in purple, infected cells are shown in red, and nanodiscs are shown in green.
Case 1: Viruses Prefer to Infect Nanoparticles
For each virus cell, check its neighboring cells:
- If there is a nanoparticle cell, the virus cell becomes an empty cell, and the nanoparticle cell becomes an empty cell.
- If there are no nanoparticle cells but healthy human cells, the healthy human cell becomes an infected human cell.
Case 2: Viruses Randomly Choose to Infect Human Cells or Nanoparticles
For each virus cell, check its neighboring cells:
- If there are both nanoparticle cells and healthy human cells, randomly choose one to infect.
- If there are only nanoparticle cells, the virus cell becomes an empty cell, and the nanoparticle cell becomes an empty cell.
- If there are only healthy human cells, the healthy human cell becomes an infected human cell.
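The update rules for both cases can be sketched in a few lines. This is a simplified illustration, not our visualization code: it assumes a von Neumann neighborhood (up, down, left, right), processes the viruses one after another within a step, picks nanoparticle or cell with equal probability in Case 2, and does not move viruses between steps. The state constants and function names are our own for this example.

```python
import random

EMPTY, HEALTHY, INFECTED, NANODISC, VIRUS = range(5)

def neighbours_in_state(grid, r, c, state):
    """Coordinates of the 4-neighbours of (r, c) that are in `state`."""
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]) and grid[rr][cc] == state:
            found.append((rr, cc))
    return found

def step(grid, prefer_nanodisc, rng=random):
    """Apply one update. prefer_nanodisc=True is Case 1, False is Case 2."""
    new = [row[:] for row in grid]
    viruses = [(r, c) for r, row in enumerate(grid)
               for c, s in enumerate(row) if s == VIRUS]
    for r, c in viruses:
        discs = neighbours_in_state(new, r, c, NANODISC)
        cells = neighbours_in_state(new, r, c, HEALTHY)
        if discs and cells:
            # Case 1 always picks a nanoparticle; Case 2 picks either kind 50/50.
            pool = discs if prefer_nanodisc else rng.choice([discs, cells])
        else:
            pool = discs or cells
        if not pool:
            continue
        tr, tc = rng.choice(pool)
        if new[tr][tc] == NANODISC:
            new[r][c] = EMPTY        # virus is neutralized
            new[tr][tc] = EMPTY      # nanoparticle is used up
        else:
            new[tr][tc] = INFECTED   # virus persists and can infect again
    return new

# Small random grid with a fixed seed (illustrative initial state).
rng = random.Random(1)
grid = [[rng.choice([EMPTY, HEALTHY, HEALTHY, NANODISC, VIRUS]) for _ in range(10)]
        for _ in range(10)]
for _ in range(5):
    grid = step(grid, prefer_nanodisc=True, rng=rng)
print(sum(row.count(INFECTED) for row in grid), "infected cells after 5 steps")
```

Running the same initial grid with `prefer_nanodisc=False` gives the Case 2 dynamics, so the two settings can be compared on identical starting conditions.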
The cellular automaton simulations show that when the affinity between the nanodiscs and the virus is higher, the number of viruses decreases rapidly, which suggests that the nanodiscs are effective in reducing the spread of viruses. When the affinity between the virus and the nanodisc is the same as that between the virus and the human cell, so that the virus randomly infects either a nanodisc or a cell, the virus spreads rapidly in the initial stage and the spread then gradually slows down over time. Finally, the spread accelerates again once the infected cells and nanodiscs reach a certain level.
Part 5: Diffusion of Nanodiscs in The Human Body
Our project aims to make a nanodisc-based drug for treating infectious diseases. We assume that nanodisc drugs are metabolized in the human body in a similar way to ordinary drugs. For the intravenous injection scenario, in which there is no absorption process and clearance follows first-order kinetics, we therefore developed an open two-compartment pharmacokinetic model based on ordinary differential equations (ODEs).
| Symbol | Description |
|---|---|
| | Central compartment dosage after the start of IV drip |
| | Peripheral compartment dosage after the start of IV drip |
| | Volume of central compartment |
| | Volume of peripheral compartment |
| | Central compartment blood concentration |
| | Peripheral compartment blood concentration |
| | Intravenous dosage |
| | First-order rate constants for drug transport from central compartment to peripheral compartment |
| | First-order rate constants for drug transport from peripheral compartment to central compartment |
| | Intercompartmental clearance |
| | Elimination clearance |
In response to the set of pharmacokinetic equations for the dosage,
In response to the set of pharmacokinetic equations for the blood concentrations,
where
To avoid subsequent analytical errors caused by incorrect model assumptions, we introduced a stochastic perturbation term into the original ordinary differential equation model, which divides the model error into measurement error and systematic stochastic error. We used stochastic differential equations (SDEs) to establish this pharmacokinetic model.
To notate the above system of ordinary differential equations in general form:
Introduce stochastic disturbances to model stochastic differential equations based on pharmacokinetics [6]:
where
To ensure the consistency of the final result, we assume that
The systematic error
| Parameter | Covariate effect | Estimate (90% CI) |
|---|---|---|
| | | 10.1 (9.4-11.1) |
| | | 15.6 (11.1-25.2) |
| | | 7.7 (5.5-9.6) |
| | | 10.8 (9.6-12.3) |
Data sources: https://doi.org/10.1016/j.bja.2021.10.054.
By solving the ordinary differential equations, we obtain a plot of drug concentration over time, which can be used to predict changes in drug concentration in the body. This helps to refine the mode and time interval of administration and to keep the drug concentration within the therapeutic window, optimizing the therapeutic effect.
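The structure of such a model can be sketched as follows. The code uses the standard open two-compartment form with a constant-rate IV infusion, written in terms of elimination clearance, intercompartmental clearance, and the two compartment volumes, and integrates it with the Euler-Maruyama scheme so that an optional stochastic perturbation can be switched on. All names and parameter values are illustrative placeholders rather than the estimates in the table above, and the multiplicative noise on the central compartment is one simple assumed form, not necessarily the perturbation term of our SDE model.

```python
import math
import random

def simulate_pk(dose_mg, infusion_h, cl, v1, q, v2,
                sigma=0.0, t_end=12.0, dt=0.01, seed=0):
    """Two-compartment IV infusion model (amounts a1, a2 in mg).

        da1/dt = R(t) - CL*c1 - Q*(c1 - c2)
        da2/dt = Q*(c1 - c2),   with c1 = a1/V1 and c2 = a2/V2

    sigma > 0 adds an assumed noise term sigma*a1*dW to the central
    compartment (Euler-Maruyama); sigma = 0 gives plain forward Euler.
    Returns the time grid and the central concentration c1 (mg/L).
    """
    rng = random.Random(seed)
    rate = dose_mg / infusion_h          # constant infusion rate, mg/h
    a1 = a2 = 0.0
    times, conc = [], []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        times.append(t)
        conc.append(a1 / v1)
        r_in = rate if t < infusion_h else 0.0
        c1, c2 = a1 / v1, a2 / v2
        da1 = r_in - cl * c1 - q * (c1 - c2)
        da2 = q * (c1 - c2)
        noise = sigma * a1 * math.sqrt(dt) * rng.gauss(0.0, 1.0) if sigma else 0.0
        a1 = max(a1 + da1 * dt + noise, 0.0)   # amounts cannot be negative
        a2 = a2 + da2 * dt
    return times, conc

# Placeholder parameters: CL and Q in L/h, V1 and V2 in L (assumed values).
params = dict(cl=10.0, v1=10.0, q=8.0, v2=15.0)
t, c_det = simulate_pk(1000.0, 0.5, **params)
_, c_sto = simulate_pk(1000.0, 0.5, sigma=0.2, seed=1, **params)

peak = max(c_det)
print(f"deterministic peak: {peak:.1f} mg/L at t = {t[c_det.index(peak)]:.2f} h")
print(f"stochastic run, concentration at 2 h: {c_sto[200]:.1f} mg/L")
```

Comparing the simulated curve against an assumed therapeutic window for different doses and infusion times is then a matter of calling `simulate_pk` in a loop. A production analysis would use a tested ODE/SDE solver and estimated parameters instead of this fixed-step sketch.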
References
[1] Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
[2] Gilman J, Walls L, Bandiera L, Menolascina F. Statistical Design of Experiments for Synthetic Biology. ACS Synth Biol. 2021 Jan 15;10(1):1-18. doi: 10.1021/acssynbio.0c00385. Epub 2021 Jan 7. PMID: 33406821.
[3] ZHANG Runchu, ZHENG Haitao, LAN Yan, AI Mingyou, LIN Yi, YANG Guijun. Experimental design and analysis and parameter optimization[M]. China Statistics Press,2003
[4] MYERS R H, MONTGOMERY D C, ANDERSON-COOK C M. Response surface methodology: process and product optimization using designed experiments[M]. 4th ed. Hoboken, N.J: Wiley, 2016.
[5] Chu, J. (2012). Development and application of pharmacokinetic models using stochastic differential equations. Huazhong University of Science and Technology, Hubei, China.
[6] Kristensen NR, Madsen H, Ingwersen SH. Using stochastic differential equations for pharmacokinetic/pharmacodynamic model development. Journal of Pharmacokinetics and Pharmacodynamics, 2005, 32(1):109-113
[7] Umberto Picchini, Susanne Ditlevsen, Andrea De Gaetano. Maximum likelihood estimation of a time-inhomogeneous stochastic differential model of glucose dynamics. Mathematical Medicine and Biology,2008,25:141-155
[8] Gong G. L. (2008). An overview of stochastic differential equations and their applications (pp. 50-169). Tsinghua University Press, Beijing, China.
[9] Lawrence C. Evans. An Introduction to Stochastic Differential Equations. Department of Mathematics, UC Berkeley: 91
[10] Grassin-Delyle S, Semeraro M, Lamy E, Urien S, Runge I, Foissac F, Bouazza N, Treluyer JM, Arribas M, Roberts I, Shakur-Still H. Pharmacokinetics of tranexamic acid after intravenous, intramuscular, and oral routes: a prospective, randomised, crossover trial in healthy volunteers. Br J Anaesth. 2022 Mar;128(3):465-472. doi: 10.1016/j.bja.2021.10.054. Epub 2022 Jan 5. PMID: 34998508.