Our project, which designs MOVE as a fundamental improvement to RNA pesticides, has raised several questions that are worth investigating through in-silico methods.
Due to the vast scope of our project, we needed to allocate lab time and material budgets to only the most critical experiments. Modeling has been essential in determining
which experiments to prioritize. By running computer simulations, we can accelerate the engineering process without incurring additional financial costs. Furthermore,
we aim to address concerns raised by Wet lab experiments and propose potential solutions. Our technology has a wide range of applications. Additionally, as long as surface displaied proteins are properly designed, any desired protein can be displayed. To ensure the widespread
adoption of MOVE technology in society, it is essential to evaluate and promote its modularity, effectiveness, and productivity. In the Dry Lab, we focus on how modeling can significantly
contribute to visualizing and improving these aspects. We are also paying attention to modularizing the model itself, making it easily accessible for future MOVE users.
Figure 1. Role of the model in our project
Models Overview
RNA-mMV Model
In this model, “the production process of MOVE in E. coli”, “the retention process of MOVE in farm field environments”,
and “the action of shRNA within the target organism” are primarily described using ordinary differential equations (ODEs).
This model demonstrates the feasibility of MOVE production prior to Wet Lab experiments and proposes improvements to the
medium conditions for more reliable production. Additionally, it evaluates the effectiveness of MOVE as a pesticide,
visualizing its modularity by demonstrating its efficacy across various species.
Figure 2. Overview of RNA-mMV Model
Surface Presentation Model
In this model, we performed structural prediction and molecular docking of surface-displayed proteins to confirm that
the surface-displayed proteins can function on the E. coli surface without being affected by the SpyCatcher/SpyTag
conjugation. This simulation process is also available to MOVE users who wish to display other enzymes on MOVE.
Figure 3. Overview of Surface Presentation Model
Continuous Cascade Bioreactor Model
In this model, we performed Metabolic Control Analysis (MCA) and a parameter scan to optimize the system when
cultivating MOVE in a continuous cascade bioreactor. MOVE is a highly versatile technology that can be applied in
various fields, and optimizing its production is essential. This model demonstrates the potential for efficient,
low-cost, large-scale production of MOVE, which is necessary for its societal implementation. The results obtained
here will be used in the economic model to demonstrate the potential for business success.
Figure 4. Overview of Continuous Cascade Bioreactor Model
Abstract
We divided the process from MOVE creation to its use as a pesticide and the effects of RNAi into six
blocks and developed models for each section to run simulations. Each model is primarily described using
ODEs (Ordinary Differential Equations). Particularly in the m-MV production model, we addressed the concern
of whether shRNA would be encapsulated in m-MVs, predicted its feasibility, and provided suggestions for protocol
improvements. Furthermore, we evaluated the effectiveness of using MOVE as a pesticide and determined the
appropriate application amount for effective use. These results will be utilized in the economic model
and
serve as a basis for entrepreneurship
business success.
Introduction
Due to the vast scope of our project, we needed to allocate laboratory time and material budgets exclusively
to the most critical experiments. Modeling helps in determining which experiments those are.
By running computer simulations, we can accelerate the engineering process without incurring additional financial costs.
In this model, we simulate whether the transformed E. coli for MOVE production can successfully produce MOVE,
and whether the produced MOVE can effectively exhibit its therapeutic effects. We conducted the simulations following
the flow shown in the diagram below. In particular, whether a sufficient amount of shRNA is encapsulated
within the multi layered membrane vesicle(m-MV) can be estimated by the model, and this was a crucial factor for the completion of MOVE.
Additionally, this model aims to create a modular model that can be applied to multiple organisms by simulating
the effects of MOVE in several types of organisms.
Figure 1: workflow of m-MV Model
shRNA Production
Purpose
shRNA is the component that demonstrates the pesticide effect of MOVE and plays the leading role
in this grand system. Therefore, modeling the production concentration of shRNA is crucial in evaluating
the overall design and feasibility of MOVE. However, since the wet lab experimental data does not provide
sufficient information regarding the parameters, we ran simulations and applied them to the shRNA encapsulation
model of the m-MV.
Methods and Modeling
For the production of shRNA, we used the pBluescript vector with the pUC origin of replication, and employed
the T7 promoter for transcription. When RNA polymerase (RNAP) binds to the T7 promoter on pBluescript,
transcription is initiated, first producing single-stranded RNA (ssRNA) before it adopts a secondary structure.
In this study, we utilized a simplified model of this process. In this model, we assume that transcription
is switched on and off based on the binding and dissociation of the T7 promoter and RNAP. Furthermore,
the produced ssRNA, due to the properties of its nucleotide sequence, adopts a secondary structure
after a certain period of time, transforming into short hairpin RNA (shRNA). Based on these assumptions,
we can formulate the following differential equations.
This part of the equation ( equation(1)) describes the process leading up to the production of ssRNA within E. coli.
The transcription of ssRNA depends on the concentrations of the promoter and RNAP, and is determined
by the transcription rate constant (Tr). Additionally, ssRNA can be degraded by RNase, and this degradation
is governed by the degradation rate (Kdeg).
dtd[shRNA]=kfold[ssRNA]−kdeg2[shRNA](2)
This equation (ezuation(2)) represents the production concentration of shRNA. It depends on the folding rate of ssRNA
and the degradation rate of shRNA, expressed as (kfold) and (kdeg2), respectively.
Here, Tr~ and Km are calculated as follows. Km is the Michaelis constant, which is derived from
the dissociation rate constant of RNAP (koff) and the binding rate constant of RNAP (kon) based
on the assumption that Km=koff/kon. However, the kon value provided in Kim’s paper is obtained
in a cell-free system, and it is influenced by differences in diffusion coefficients within a bacterial cell[2]
.
To account for it, we used the correction factor ε, which represents the ratio of diffusion coefficients
between the cell-free system and the intracellular environment. The ssRNA production rate constant Tr~
is calculated by dividing the transcription rate by the number of nucleotides in the ssRNA, giving the
number of ssRNA molecules produced per second.
※ The kdeg2 is set to 0 because One Shot™ BL21 Star™ (DE3) Chemically Competent E. coli was used as the chassis, which has enhanced resistance to shRNA degradation and mRNA stability.
Result
Figure 2. Concentration of ssRNA (represented by blue line) and shRNA (represented by yellow line) in E. coli
Figure 2. Concentration of ssRNA (represented by blue line) and shRNA (represented by yellow line) in E. coli
As shown in Figure 2, single-stranded RNA reaches a steady state shortly after transcription initiation,
and the subsequent concentration change of shRNA within E. coli proceeds linearly. From this, it can
be concluded that ssRNA is rapidly converted into shRNA within E. coli, and the concentration of shRNA changes
at an approximate rate of 1.2 µM per hour. In terms of the number of molecules, shRNA is produced at a rate
of 7.0×102 molecules per hour per E. coli cell.
m-MV production
Purpose
m-MV serves as a protective membrane to enhance the durability of shRNA and is a fundamental part of our project.
Therefore, modeling the production of m-MV is crucial in evaluating the overall design and feasibility of MOVE.
However, due to time and cost constraints, the wet lab experimental data does not provide sufficient information,
so we conducted simulations and applied them to the shRNA encapsulation model of the m-MV.
Methods and Modeling
This model mathematically describes the production of poly(3-hydroxybutyrate)(PHB) in genetically engineered E. coli strains and
simulates the production of m-MV. The model is based on PHB production in an E. coli strain into which
the PHA synthase gene (phaCRe from R. eutropha), the 3-ketothiolase gene (phaARe from R. eutropha),
and the acetoacetyl-CoA reductase gene (phaBRe from R. eutropha) have been introduced through genetic
engineering. The model is constructed using the results of flux analysis related to PHB production and
biomass formation in Cupriavidus necator DSM 545[7]
, the original source of these genes, and it comprehensively
analyzes the sequence of processes leading to MV formation due to the internal pressure of PHB.
Preliminary analysis of experimental data
Estimation of PHB Production
Wet lab data will be analyzed to identify trends and special features that should be considered in the model.
This step is based on the analysis of Markan Lopar’s paper to obtain the data needed for the analysis
in Dry and to derive the relationship between media conditions and PHB and m-MV production.
In Markan Lopar’s paper, four different medium conditions are considered, and we will also conduct our
analysis based on these conditions. These medium conditions mainly vary in the amount of nitrogen. The glucose
consumption, nitrogen consumption, biomass production, and PHB production for each of the four medium conditions
can be examined using the differential equations derived from the metabolic flux.
The following table shows the medium conditions and the corresponding parameters based on metabolic flux analysis.
Using these values, we will simulate the steady-state PHB (wt%) concentration under each medium condition.
Gext is the external glucose concentration. MwG, MwNH_3, MwBIO, and MwPHB represent the molar molecular
weights of each molecule. BIO represents the biomass in g/L.
Table2
parameters based on metabolic flux analysis[7]
parameter
Batch
R1
R2
R3
r_1
1
1
1
1
r_37
0.1998
0.02804
0.7952
1
r_38
0
0
0
0
r_40
0.5576
0.7543
0.1205
0
r_41
0.4918
0.6653
0.1063
0
Estimating m-MV production
The m-MV production concentration was estimated based on the results from Koh’s paper published in 2023[6]
.
In this paper, the property of GFP being encapsulated only in m-MVs was utilized to measure the production
of m-MVs from the GFP intensity of the produced MVs.In this study, GFP intensity was measured, and what is known is the relative concentration in PHB (wt%),
but the production concentration of m-MV is not clear. Since our prediction required the production rate of m-MV,
we used the data from this previous research to calculate the absolute concentration of m-MV. The calculation method
involved computing the concentration of m-MV from the number of m-MVs in Figure 3, using that as the absolute
concentration, and then determining the m-MV production concentration relative to PHB (wt%) through third-order
polynomial fitting.
Figure 3a. An illustrative representation intended to depict the occurance of GFP encupsulation. The data are shown as the mean ± standard deviation from three independent experiments[6]
. Figure 3b. Fluorescence microscopic image of purified MVs encapsulating GFP. Scale bars, 5 μm[6]
.
Figure 3a. An illustrative representation intended to depict the occurance of GFP encupsulation. The data are shown as the mean ± standard deviation from three independent experiments[6]. Figure 3b. Fluorescence microscopic image of purified MVs encapsulating GFP. Scale bars, 5 μm[6].
Result
Estimation of PHB Production
Table3
Simulation results for steady-state values of PHB(wt%) concentration
Reacter
R1
R2
R3
PHB(wt%)
25%
3.3%
86%
As a result of calculations for each medium condition, the steady-state PHB (wt%) concentration was found to
be as shown in Table 3. Since m-MV production does not occur unless the PHB concentration is at least 60 (wt%),
it was revealed that medium conditions such as R2 are required for m-MV production. This indicates that the transformant
transitions between a state where growth is the primary focus and a state where PHB and m-MV production are predominant,
depending on the nitrogen levels in the medium. This is highly valuable information for the efficient production of m-MV
and is a significant outcome of the in-silico analysis.
Estimating m-MV production
Figure 4. Polynomial fitting results for m-MV production
Based on Koh’s paper, a third-order polynomial fitting of m-MV production yielded the following equation.
x represents the 10−2wt% concentration of PHB. This will be used primarily in the creation of the Bioreacter model.
The following equation is the m-MV production per 24h of 1gDWC with PHB(10−2wt%) as a variable.
In our project, we aim to create MOVE, which consists of m-MVs containing shRNA, using recombinant E. coli that
simultaneously produces shRNA and PHB. To achieve this, it was necessary to quantitatively understand the conditions
under which shRNA is incorporated into m-MVs, ensuring that at least one shRNA molecule is encapsulated within each
m-MV. For more details on the mechanism of MOVE production, please refer to the Wet Design
. While shRNA and m-MV
production are carried out in the same recombinant E. coli, if the shRNA production is insufficient and the intracellular
shRNA concentration is low, there is a possibility that shRNA will not be incorporated into the m-MVs. In such a case,
empty m-MVs may be produced, which could reduce the pesticide effectiveness of MOVE.This modeling attempts to improve
the project by determining how often shRNA encapsulation within m-MVs occurs under various conditions during MOVE
production. To address the issue of potential shRNA deficiency, we evaluated how the concentration of shRNA within
the transformants changes under different culture conditions and proposed a method to enhance the reliability of
the system.
Methods and Modeling
From previous simulations, it was determined that each E. coli cell produces shRNA at a rate of 7.04×102
molecules per hour, and the MV production rate is 3.8×10−1 MVs per hour.
Figure 5. State of shRNA production and m-MV production in Transformed E. coli
Figure 5. State of shRNA production and m-MV production in Transformed E. coli
As mentioned earlier, E. coli that produce PHB can exist in two states: one where biomass production is
predominant and another where PHB and m-MV production are dominant. These states are influenced by the
amount of nitrogen in the medium. Therefore, by controlling these states effectively, the number of MVs
containing shRNA produced per hour per liter of medium can vary.First, we will verify how much shRNA is
encapsulated in m-MVs based on the cultivation methods for m-MV production from previous studies. Let’s
consider the recombinant E. coli strain with a doubling time of td and shRNA production per doubling time
Ps. When E. coli divide, they distribute shRNA almost equally between the two daughter cells.
Figure 6. A hypothetical model of how shRNA levels in transformed E. coli change with doubling time.
one E. coli produces 7.0×102 shRNAs per hour and divides into 2 individuals at doubling time. When splitting,
the number of shRNAs possessed in the E. coli is considered to be about half of the number just before
splitting. The figure focuses on the number of shRNAs possessed by only one individual, and the other
individual at the time of division is abbreviated.
Figure 6. A hypothetical model of how shRNA levels in transformed E. coli change with doubling time.
one E. coli produces 7.0 *10^2 shRNAs per hour and divides into 2 individuals at doubling time. When splitting,
the number of shRNAs possessed in the E. coli is considered to be about half of the number just before
splitting. The figure focuses on the number of shRNAs possessed by only one individual, and the other
individual at the time of division is abbreviated.
one E. coli produces 700 shRNAs per hour and divides into 2 individuals at doubling time. When splitting,
the number of shRNAs possessed in the E. coli is considered to be about half of the number just before
splitting. The figure focuses on the number of shRNAs possessed by only one individual, and the other
individual at the time of division is abbreviated.In vitro studies on the variation in RNA production
according to the dry biomass of E. coli are common, and many of these studies have simulated RNA based
on its concentration. Many previous iGEM projects have modeled RNA production using the concentration of
E. coli in this way, but we assume that the designed shRNA will be encapsulated into m-MVs. This means
that the number of shRNA molecules we need to know is within a single E. coli cell. In this situation,
it is more appropriate to reconsider the number of shRNA molecules inside one E. coli cell, rather than
modeling shRNA based on its concentration.We constructed a model consisting of the doubling time-dependent
number of shRNAs in E. coli with reference to a model that uses shRNA concentration to determine parameters.
The equation for the number of shRNAs in E. coli is as follows.
td is the doubling time, n is the number of doublings, Ps is the number of shRNA molecules produced
per doubling time (equal to vs×td), and vs represents the number of shRNA molecules produced per unit
of time. Copies(t) represents the number of shRNA molecules in a single E. coli cell at time t.
The doubling time of E. coli varies depending on the nitrogen concentration in the reactor[7]
. Under medium
conditions with sufficient nitrogen, biomass production (i.e., cell growth) is prioritized over PHB production.
However, when the nitrogen level decreases, the carbon source is primarily used for PHB production, and the rate
of biomass production in E. coli matches the rate of biomass loss due to m-MV release, resulting in an apparent
specific growth rate of μ=0. At this point, the intracellular amount of shRNA, considering the release of
shRNA through m-MVs, is calculated as follows.
Vm represents the number of m-MVs produced per unit of time, and Vm-MV and VE. coli represent the
volumes of m-MV and E. coli, respectively.In the prior research conducted by Professor Koh, the cultivation
method for m-MV production involved inoculating LB medium (100 mL) and culturing for 48 hours. Since m-MV production
requires cultivation under nitrogen limitation, it is assumed that nitrogen was depleted at some point during the
48-hour period, causing a shift to nitrogen-limited metabolic flux. The consumption of glucose, nitrogen, biomass
production, and PHB production can be simulated again using the differential equations from the MV production
concentration prediction model. When nitrogen reaches zero, it is regarded as a nitrogen-limited medium condition,
and the metabolic flux parameters were adjusted accordingly.
The change in PHB(wt%) concentration at this point can be determined as follows.
PHB(wt%)=PHB+BIOPHB(13)
the previous study’s culture conditions in our wet lab are as follows.
Table4
previous study’s culture conditions
Glucose concentration
20g/L
Initial BIO amount
1.3g
Initial nitrogen amount in LB medium
2.3g
Culture time
48 hours
Simulation
We ran the time evolution of intracellular shRNA numbers for each doubling time until a pseudo-steady
state was reached. This was performed at specific growth rates (μ) of 1.0, 0.50, 0.20, and 0.10. The longer
the doubling time, the higher the intracellular shRNA levels reach in the pseudo-steady state.
Figure 7. The relationship between doubling time and shRNA concentration
Figure 8. shRNA concentration under m-MV production conditions
Table5
Number of intracellular shRNAs at each doubling time.
μ
1.0
0.5
0.20
0.10
m-MV production conditions
Doubling time (h)
0.693
1.39
3.47
6.93
inf
Number of shRNAs at steady state
704~1408
1408~2816
2440~4880
4877~9754
about 37000
As shown in graph 2, after approximately three doubling times, the intracellular shRNA amount reaches
nearly its maximum. Table 5 demonstrates the degree of this. The figure simulates the intracellular
shRNA levels in a non-growing, m-MV-producing state. Under the R2 medium condition, the biomass production
and the biomass loss due to m-MV production are almost equal, leading to a state where the number of cells
remains nearly constant. In this condition, the intracellular shRNA amount decreases only due to the release
of m-MVs, but since this reduction is minimal relative to the total shRNA amount, it converges at a very high
level.
Figure 9. m-MV shRNA number distribution in each state
Figure 9 simulates the distribution of shRNA encapsulated into m-MVs based on the actual number of shRNA molecules
in each cell. The figure below shows the results. It was found that the resulting MOVE would be in the following
states. We examined the cases where the number of shRNA molecules per cell was 1,000, 3,000, 5,000, 10,000, 20,000,
and 40,000. When the intracellular shRNA amount is 5,000 or less, more than 8% of the m-MVs produced will be empty.
The simulation results for the changes in NH3, PHB, biomass, and PHB(wt%) under the cultivation conditions of
previous studies are as follows.
Figure 10a. Changes in NH_3, PHB, and biomass under the culture conditions of the previous study. Figure 10b.PHB (wt%) changes under the culture conditions of the previous study.
After 18 hours of inoculation, nitrogen is depleted, and the system transitions primarily to PHB production.
At this point, biomass production is almost balanced with m-MV production, and the growth of transformants halts.
Around 30 hours after inoculation, the PHB (wt%) reaches 70%, and the production of m-MVs begins in earnest.
The graph also shows that the PHB concentration shifts as nitrogen becomes limited. After switching to the PHB
production flux, approximately 8,900 shRNA molecules are produced in the cells over a period of about 10 hours,
with an average of about four shRNA molecules being encapsulated in each m-MV, and hardly any empty m-MVs being
produced. Furthermore, to ensure even more shRNA is encapsulated in m-MVs, it has been found that simply adding
glucose under nitrogen-limited conditions and continuing incubation for one or two additional nights will be
effective.These results successfully address the concern raised in Wet lab experiments about the possibility
of shRNA not being encapsulated in m-MVs. Additionally, we propose a solution for cases where shRNA is not
encapsulated during experiments and offer suggestions for creating a more effective MOVE.
Reslut
It was found that, using the cultivation method from previous studies, an average of four shRNA
molecules can be encapsulated in each m-MV, with no empty m-MVs produced. MOVE can be recovered
at a concentration detectable through experiments. Additionally, if sufficient encapsulation of
shRNA is not confirmed in the experiments, it has been determined that simply adding glucose under
nitrogen-limited conditions and continuing incubation for one or two additional nights will be effective.
This guarantees the feasibility of MOVE production in the Wet lab and has motivated us to continue
the experimental work.
There are two points to note in this model
(1) The occurrence of empty m-MVs
The number of shRNA molecules encapsulated in m-MVs must be at least one or more. To reduce the number
of empty m-MVs to nearly zero, the intracellular shRNA amount should be increased to 10,000 or more.
(2) The number of m-MVs produced
The production of m-MVs in each E. coli cell depends on the PHB (wt%), which in turn depends on the
nitrogen concentration in the medium. However, the PHB (wt%) reaches its maximum at around 80%, meaning
that further increases in m-MV production at the cellular level are not possible. To solve this, the
only option is to increase the number of transformants. However, E. coli with high PHB (wt%) have a
low specific growth rate and are not suitable for growth. We have considered solving this issue using
a continuous cascade system.
The bioreactor design model is explained on the page for the continuous bioreactor model.
MOVE durability
Purpose
We will simulate how long MOVE can maintain the protective state of shRNA in outdoor environments.
This is crucial to demonstrate that MOVE protects shRNA and enhances the effectiveness of the pesticide.
We will conduct supporting experiments in the Wet lab and fit the obtained results using polynomial fitting.
These results will then be formulated into ordinary differential equations and connected to the m-MV encapsulation
model, among others. The following are assumed for MOVE in the external environment and for internal shRNA loss.
MOVE decays over time (UV, heat, pH, ions)
MOVE is washed away from plant leaves and stems by rain runoff
UV light destroys shRNAs inside MOVE
We aim to improve upon issue 2 by using Mgfp-5, a mussel adhesive protein, as a surface display protein.
Methods and Modeling
We will fit the experimental results using polynomial fitting.
Result
In this case, the experiments we had planned in the Wet lab were not completed in time for the wiki freeze.
MOVE uptake
Purpose
We will measure the uptake efficiency of MOVE by the target pathogens in relation to MOVE concentration through
experiments and use the results to formulate differential equations. However, due to time constraints, we were
unable to conduct this experiment in the Wet lab. Below, we outline the simulation method to be used once experimental
data is obtained.
Methods and Modeling
Figure 11. Schematic diagram of the MOVE uptake model
Let S_target represent the surface area of the target pathogen [m²], C_MOVE the density of MOVE [mol/m²],
and c the number of shRNA molecules encapsulated in a single MOVE. V_target is the volume of the target pathogen [L].
N_MOVE represents the number of moles of MOVE dispersed, and S_land is the area of the farmland [m²]. Alpha (α) is a
coefficient representing the surface area of the farmland, used to calculate the surface area including the leaves and
stems of crops. This coefficient depends on the structure of the crops and farmland. Mu (μ) represents the uptake rate
by the pathogen and ranges from 0 to 1.
RNAi effect
Purpose
This model mathematically describes the process of GFP (Green Fluorescent Protein) expression suppression
through the RNAi (RNA interference) mechanism within yeast cells. The model comprehensively represents the
entire process, starting from the introduction of shRNA (short hairpin RNA), the generation of siRNA (small
interfering RNA), the formation of the RISC (RNA-induced silencing complex), and the degradation of GFP mRNA.
Through this model, we aim to verify the RNAi effect of shRNA introduced into yeast cells via MOVE and demonstrate
its therapeutic efficacy.
Methods and Modeling
Figure 12. Overview of RNAi
To represent these processes, we constructed a system of coupled ordinary differential equations. Each equation
describes the concentration changes of the respective molecular species as a function of time[8]
.
Figure 13. Model at shRNA uptake concentration of 0.5 nM
Figure 14. Protein inhibition rate by shRNA incorporation concentration
Figure 13 shows that the RNA interference mechanism is successfully modeled. Figure 14 demonstrates that,
when the intracellular shRNA concentration reaches 100nM, up to 90% of protein expression is inhibited,
indicating that with an appropriate gene target, it is possible to sufficiently eliminate the organism.
In the case of Pyricularia oryzae (rice blast fungus), with alb1 as the target gene, alb1 is essential
for the fungus to extend its hyphae into the plant. Since temporary inhibition of alb1 critically impacts
the survival and reproduction of the fungus, it is expected that RNAi will have a significant fungicidal effect.
MOVE application volume calculation
We consider the number of MOVE particles to be spread per 1ha. We mainly use the results from the MOVE uptake model
and the RNAi effect model. According to the RNAi effect model, when the shRNA concentration in the pathogen reaches
100nM, protein expression is inhibited by 90% through RNA interference, which is expected to be sufficient for pathogen
control. Using the MOVE uptake model, we found that to achieve an shRNA concentration of 100nM inside the pathogen,
it is appropriate to spread 4.0×10−7 MOVE per 1ha. However, this value is based on α=1 and μ=1.
The range of these values is still uncertain as we have not yet been able to experimentally calculate the uptake
efficiency of MOVE, but we estimate that it would not be unreasonable to assume it falls within a range of 1 to 10
times this value. We are using this estimated value to calculate the economic effect in our Economic model
and to
link it to Entrepreneurship
.
Summary
In this section, we were able to demonstrate the feasibility of MOVE production at a laboratory scale and prove the e
ffectiveness of RNAi interference as a pesticide. Before the experiment, we showed that MOVE could be produced in the
Wet lab and proposed improvements to the protocol. Furthermore, by simulating the effectiveness of MOVE at an outdoor
scale, which would be difficult to achieve experimentally in the Wet lab, and by calculating the amount of MOVE to be
dispersed, we provided data as a basis for feasibility in the economic model and entrepreneurship business model.
Modularization
This is the file of the model we used. This makes it easy to reproduce the simulation we have done here.Click here
.
[9]Deerberg et al. (2008). Journal of Molecular Biology, 379(3), 567-581
[10]Bartlett DW, Davis ME. Insights into the kinetics of siRNA-mediated gene silencing from live-cell and live-animal bioluminescent imaging. Nucleic Acids Res. 2006 Jan 12;34(1):322-33. doi: 10.1093/nar/gkj439. PMID: 16410612; PMCID: PMC1331994.
[11]Groenenboom et al. (2005). Journal of Theoretical Biology, 236(3), 327-341.
[12]Pete Corish, Chris Tyler-Smith, Attenuation of green fluorescent protein half-life in mammalian cells, Protein Engineering, Design and Selection, Volume 12, Issue 12, December 1999, Pages 1035–1040
Abstract
We are considering not only encapsulating shRNA in m-MV, but also displaying enzymes on its surface to give
functionality to the membrane surface layer. Surface display is accomplished by joining enzymes with scaffold
proteins, and we used Spycatcher/SpyTag for this joining. By utilizing this system, it becomes possible to display
any enzyme on the surface of MOVE with proper design, allowing MOVE to have various characteristics. This is very
important for the modularity of MOVE. In Dry Lab, we will confirm whether enzymes can function sufficiently on
the MOVE surface through structure prediction using Alphafold, prediction of protein state during membrane
presentation using OPM (orientation of protein), and molecular docking analysis using AutoDock Vina and HADDOCK 2.4
web server. Please see the figure below for details of the simulation.
Figure 1. Overview of Surface Presentation Model
Method
Below, we describe the evaluation procedure for surface display on MOVE. This is
summarized very concisely and is available to anyone who wants to perform SDT
Feasibility Analysis. This contributes to the modularization of the model.
Analysis Flowchart
Structure acquisition/prediction: Obtain known structures from PDB
Selection of E. coli outer membrane proteins:
Select appropriate outer membrane proteins such as OmpA and Lpp. This varies depending on the Da of the enzyme to be surface-displayed.
Spycatcher/SpyTag sequence design:
The SpyTag sequence (AHIVMVDAYKPTK) can be attached to any position on the enzyme, not limited to the N-terminus or C-terminus, and still bind to Spycatcher. Incorporate the sequence into a part that minimally affects the active site.
Confirm how many surface-displayed enzymes are presented per MOVE:
Using our expression level model for surface-displayed proteins, we can easily simulate this by inputting the base pair length.
Scaffold protein and enzyme Spycatcher/Spytag fusion design:
Use Alphafold3 to model the fusion structure.
Execute molecular dynamics simulation:
Use OPM to observe the state of the surface-displayed protein. This allows us to confirm whether the enzyme portion is successfully displayed on the membrane surface.
Add linker or redesign SpyTag position (if necessary):
If interference is observed, add a linker or change the position of SpyTag.
Gene Design for Surface Display
We set the enzymes we use according to our purpose. Here, we will simulate and evaluate the enzymes we are actually
dealing with: Mgfp-5, ChitinaseC (hereafter ChiC), and Endo-1, 3(4)-β-glucanase (glucanase).Here, we conducted an
analysis according to the above flowchart for ChiC and glucanase, which are displayed to degrade chitin, a component
of fungal cell walls, and allow shRNA to penetrate into the fungus, as well as for Mgfp-5, a mussel adhesive protein
surface-displayed to prevent MOVE from falling off leaves and stems due to water flow from rain or pesticide spraying.
Figure 2. Design of the fusion sequence of Chic and Spytag
It is known that Spytag can be inserted at any position on the enzyme and still bind to Spycatcher. Therefore,
we should aim to retain the enzyme’s activity during surface display by designing Spytag at the position that has
the least impact on the enzyme and surface presentation.
Figure 3. Use the PDB data to identify regions that do not overlap with the motifs[1]
.
Figure 3. Use the PDB data to identify regions that do not overlap with the motifs[1].
I will use PDB data to determine where the motifs are located in the enzyme and also identify the active site.
By inserting the SpyCatcher sequence into the flexible linker region, the fusion protein’s activity is expected
to be effectively maintained.
Figure 4. Design of the fusion sequence of Chic and Spytag
Figure 5. Design of the fusion sequence of OmpA and Spycatcher
Here we show the ChiC_ST sequence and the OmpA_SC sequence. We needed detection using Histag for our experiments.
By inserting the ST sequence at the C-terminus and N-terminus of the Histag, we enabled both detection using Histag
and surface display therapy. For the OmpA_SC sequence, we use the same one as in the paper. This design is achieved
by confirming the enzyme motif from the literature.
Models for expression levels of surface-presenting proteins
We use the Spycatcher/Spytag system to display multiple proteins on the surface of m-MVs.
The number of these proteins displayed on m-MVs is crucial for confirming whether MOVE functions sufficiently.
We have created a model to verify MOVE’s functionality by simulating DNA expression, RNA transcription,
and Protein expression levels using ODEs.
Based on this equation, the approximate number of surface-displayed proteins per
individual cell is predicted. From the steady-state expression level of the protein,
the number of protein molecules is calculated and divided by the surface area of the cell
membrane, and the number of enzymes per m-MOVE is determined. I have prepared a program to
perform these calculations.
Structure prediction using Alphafold3
AlphaFold is a groundbreaking tool that uses deep learning to predict protein
structures with high accuracy from amino acid sequences. Moreover, in May of this year,
an improved version, AlphaFold3, was announced[2]
, significantly enhancing its performance
in complex predictions and interactions with non-protein molecules such as nucleic acids.
By simply inputting an amino acid sequence, it can predict the structure of unknown proteins,
making it incredibly useful for designing fusion proteins. For example, it serves as the first
checkpoint to assess how the fusion affects the 3D structure, whether the function is retained,
and what the optimal linker length should be. In this design process, AlphaFold3 is the most
accessible and powerful tool.
Figure 6. Design of sequences for structure prediction of fusion proteins
When fusing these, since the reactive residues of Spycatcher/Spytag during binding are located in the middle
of each sequence, they cannot be described as a single-chain amino acid. In this state, for structure prediction
using Alphafold, it is basic to input a single-chain amino acid sequence. By defining the amino acid sequence of
the fusion protein as described above, we can predict the approximate structure of the fusion protein. However,
it should be noted that this is only an approximation and there are limitations to its accuracy.
Prediction of surface presentation at OPM
While AlphaFold allows for the confirmation of whether the 3D structure of
protein blocks remains intact, using Orientations of Proteins in Membranes (OPM) is more reliable to confirm whether
the enzyme portion is indeed displayed on the surface when surface-displayed
on a membrane[3]
. OPM is a web service that predicts how proteins are positioned
within biological membranes and provides information about this arrangement.
It is particularly useful for studying membrane-related proteins, as it clarifies
the orientation in which a protein is inserted into the membrane, which parts are
exposed to the membrane surface, and which parts are embedded within the membrane.
This allows researchers to easily identify transmembrane regions and domains that
bind to the membrane surface. The input for OPM is the PDB file of the fusion
protein predicted by AlphaFold.
Activity Prediction via Docking Simulation or Molecular Dynamics (MD)
HADDOCK is molecular docking software used to predict interactions between molecules
such as proteins, ligands, DNA, and RNA. Unlike traditional docking methods, HADDOCK
utilizes experimental data provided by the user (e.g., NMR, mass spectrometry, cross-linking)
to perform binding predictions. By using information on binding sites and contact residues,
it can predict more realistic complex structures, taking into account molecular flexibility.
It can be accessed via a web interface, making it relatively easy to perform analyses.On the
other hand, molecular dynamics (MD) simulations simulate the movement of molecules over time
and track the dynamic behavior of proteins and ligands. They can predict long-term interactions,
structural stability, and changes in binding states that cannot be handled by docking simulations.
Performing these analyses on fusion proteins is powerful for confirming their activity.In our case,
since the structure of the fusion protein did not change at the motif level, we concluded that it was
unnecessary to specifically use AutoDock Vina or HADDOCK for the fusion protein and instead relied on
confirmations from existing literature.
Result
Output in the expression level model of surface-presenting proteins
Figure 7. Amount of surface presentation when expressing three types of proteins.
Table1
Concentration of Pro-Sucf complex at 200 min
Concentration at 200 min
M
Mgfp5-Suc complex
2.0×10−06
CHiC-Suc complex
2.0×10−07
Glucanase-Suc complex
9.1×10−08
When calculated as the number presented per single MOVE, we found that Mgfp5 is presented in hundreds,
ChiC in tens, and glucanase is only presented in single-digit orders. This indicates that the surface presentation
of these three types of patterns is unlikely to be successful.
From the results, we were able to determine the amount of protein presentation. When presenting three types of
surface-displayed proteins, it became clear that suc was significantly deficient, indicating the need to increase
the copy number of the plasmid encoding suc. The number of enzymes presented per MOVE is limited, and while
experimental confirmation is necessary, improvements may be needed. ptf16 was insufficient due to its low copy number.
Figure 8. ChiC and scaffold protein production.
As a trial, we simulated the intracellular protein production levels of Sucf and ChiC. From this, we can see
that the production of Sucf is overwhelmingly insufficient.
There are two ways to solve this problem. Reduce the types of proteins displayed on the surface. If we limit the types
of surface-displayed enzymes to one, sucf will be used for only one enzyme, so more Mgfp5 should be displayed.
Additionally, there is a technique to place the scaffold protein on a high-copy vector to increase the number of
Sucf presentations.
Figure 9a.(left) Amount of surface presentation when expressing one types of proteins. Figure 9b.(right) Amount of surface presentation when expressing three types of proteins by using pBluescript II SK(-) .
Table2
Concentration of Pro-Sucf complex at 200 min of using pBluescript II SK(-)
Concentration at 200 min
M
Mgfp5-Suc complex
0.0002
CHiC-Suc complex
2.0×10−5
Glucanase-Suc complex
9.1×10−6
When calculated as the number presented per MOVE, it appears that tens of thousands of Mgfp5
are presented, and thousands of ChiC and glucanase are presented, suggesting that the three surface
presentations on MOVE will be successful.
The results when using pBluescript II SK(-) as a vector are as follows. This would increase the
amount of enzymes presented on MOVE, making it better. The number of enzymes that should be surface-displayed
on MOVE varies depending on the purpose, and wet lab experiments are ultimately essential to guarantee functionality,
but this model can provide some assurance of success probability. Additionally, as the number of enzyme types
increases, the number of enzymes that can be loaded onto individual MOVEs decreases. This is due to the limitation
of suc concentration, and increasing scaffold protein expression levels is essential for presenting three types of enzymes.
For future prospects, it is necessary to use high copy number vectors and high transcription promoters.
Output with Alphafold3
Figure 10. Bioreactor safely converges all concentrations
Figure 11. Structure Prediction of the Fusion Protein Using AlphaFold3
From the results of the 3D structure prediction of the fusion protein and the heatmap showing the Expected
Position Error, we can see that the structure of the motifs themselves doesn’t change much, and the linker between
them becomes a flexible linker. This indicates that the scaffold protein and the enzyme are likely to function without
interfering with each other’s roles, which does not negate our protein design.
Output in OPM (orientetions of protein)
Figure 12. Bioreactor safely converges all concentrations
The purple color represents the scaffold protein OmpA, while the red color indicates the enzyme ChiC region.
The blue represents the outer membrane of the lipid bilayer, and the red represents the inner membrane of the
lipid bilayer. This shows the surface presentation for a 10nm MV. The average diameter of m-MVs is 100mM, so
the curvature is smaller than this and closer to a plane. It has been confirmed that even in a planar state,
the surface-presented protein can stably present its active site, indicating that the surface presentation is
likely to be successful.
Activity Prediction via Docking Simulation or Molecular Dynamics (MD)
Figure 13. Interactions between ChBDChiC and tri-N-acetylchitotriose. The ligand molecule is well bound on the binding site by two stacking interactions (Trp59-NAG1 and Trp60-NAG3) and two hydrogen bonds (between Trp60-N and NAG2-O7 (carbonyl moiety of N-acetyl group of NAG2) and Trp59-NE1 and NAG2-O6). Hydrogen bonds are indicated by broken lines. Only Trp59 and Trp60 are responsible for the interactions [1]
.
Figure 13. Interactions between ChBDChiC and tri-N-acetylchitotriose. The ligand molecule is well bound on the binding site by two stacking interactions (Trp59-NAG1 and Trp60-NAG3) and two hydrogen bonds (between Trp60-N and NAG2-O7 (carbonyl moiety of N-acetyl group of NAG2) and Trp59-NE1 and NAG2-O6). Hydrogen bonds are indicated by broken lines. Only Trp59 and Trp60 are responsible for the interactions [1].
To verify whether the fusion protein can maintain its binding affinity with the substrate
even when surface-displayed, a comparison was made with the OPM database. The results predicted that the activity of the binding site would be maintained even in the
surface-displayed state, as the active site was not embedded within the membrane, and no significant
changes were observed in the shape of the motif containing the active site. However, if there are concerns
with the AlphaFold3 or OPM simulations, it may be necessary to consider redesigning the protein sequence.
Furthermore, it is important to actually perform Activity Prediction as an additional verification to confirm
how the protein’s structural changes affect its binding affinity to the substrate. This will allow for a more
definitive judgment on whether the fusion protein can appropriately maintain its interaction with the substrate
on the membrane surface.
Modularization
This is the file of the model we used. This makes it easy to reproduce the simulation we have done here.Click here
.
Models for single bioreactors and continuous cascade bioreactors were created for the large-scale production of MOVE.
Each model is described by about 80 ordinary differential equations using COPASI. The created models were used to
identify parameters to improve MOVE production efficiency using the MCA Tool, and were optimized through parameter
scanning. As a result, the continuous cascade bioreactor enables approximately 100 times more efficient MOVE production
compared to a single bioreactor. This result is utilized in the Economic model
, contributing to the improvement of
entrepreneurship’s
business model by calculating drug prices and profits.
Introduction
In our project, we aim to create MOVE by encapsulating shRNA into m-MVs. Since MOVE can encapsulate various shRNAs, it has the potential for applications in many different fields. However, to expand its range of
applications and establish it as a modular system, efficient production is essential. To achieve efficient MOVE
production, we developed a bioreactor model and chose to use a continuous cascade system [1]
.Based on the previous
analysis, it is understood that the E. coli producing MOVE adjusts its metabolic flux depending on the nitrogen
content in the medium (as shown in the Dry shRNA-MV model).The continuous cascade system offers the advantage of
effectively controlling changes in metabolic flux compared to a single-batch process. This model aims to improve the
production yield of MOVE through the use of a continuous cascade bioreactor, contributing to its application in various
biotechnologies.
Feed back from Wet!
Wet lab experiments have shown that sufficient aerobic conditions are necessary for PHB production.
Without adequate aerobic conditions, pyruvate undergoes fermentation into alcohol, and PHB is not produced.
Figure 1. Wet lab PHB production results. Under medium conditions with 2% glucose, the top image shows anaerobic
conditions, and the bottom image shows aerobic conditions. Under anaerobic conditions, the cell count is low,
and there is minimal PHB fluorescence detected by Nile Red. Under aerobic conditions,
an increase in both cell count and PHB fluorescence can be observed.
In reactors with a high glucose concentration in the culture medium, cell density increases, and the amount of PHB
per unit volume also rises. However, this also increases the required amount of oxygen. Due to limitations in equipment
and the physical properties of oxygen uptake, adjusting the balance between oxygen supply and the amount of glucose added
is essential for MOVE production.
A system like a continuous cascade bioreactor inevitably involves a high level of complexity due to the interactions
between different reactors. Therefore, careful analysis and design are required to achieve the intended engineering goals.
Modeling is a valuable tool for accelerating the DBTL (Design, Build, Test, Learn) cycle. In our case, modeling was
essential for addressing the following question:
How can we establish a stable continuous cascade system while maximizing the production of the target compound, MOVE?
This general question was expanded upon in various sections during the modeling work, allowing us to establish
strong links with Entrepreneurship by addressing the following aspects of the project:
Demonstrating the feasibility
of the project based on an efficient MOVE production bioreactor model.
Optimizing the process
through model-driven identification of key control parameters.
These results are then integrated into an economic model to assess scalability and profitability at an
industrial level, driving the feasibility of the project forward.
Method
To understand, predict, and ultimately control the behavior of the bioreactor, we developed a mechanical
dynamic model of the bioreactor. The concentration balance of extracellular substrates and/or products in
the bioreactor is modeled and described by a system of ordinary differential equations (ODEs). The final
model integrates reactions that represent a wide range of processes, including microbial growth, metabolite
production and utilization, transport between reactors, and diffusion. For the metabolic model of the PHB-producing
E. coli, we used the same model as described in Markan Lopar’s paper[1]
.
Representation of the model
Before constructing the mathematical model, we used standardized Systems Biology Graphical Notation (SBGN)
to represent all the elements of the system and their interactions (Figure 2). For more details on the meaning
of the various symbols, please refer to SBGM.
Figure 2. Representation of the model in Systems Biology Graphical Notation [3]
.
Figure 2. Representation of the model in Systems Biology Graphical Notation [3].
Figure 3. Six-step bioreactor cascade for continuous MOVE production.
Construction of the system of ODEs
In this section, we explain the structure of the model, the rate laws used to represent each reaction,
and the balance equations for each component of the system. The units used in the model are liters (L) for volume,
minutes (hour) for time, and millimoles (mmol) and grams of dry weight (gDW) for the amounts of metabolites and biomass,
respectively.
Glucose , NH3 , O2 uptake
To ensure that E. coli responds to changes in extracellular nutrient concentrations (Glucose),
the uptake of glucose is defined for both organisms using the Monod equation (Monod 1949), allowing
the specific consumption rates to be calculated. The consumption rate q (mmol.gDCW⁻¹.h⁻¹) is expressed based on
the maximum uptake rate of E. coli q_max (mmol.gDCW⁻¹.h⁻¹) and the half-saturation constant K_M (mM).
qGlu=qGlumax×(KGlu+GluGlu)(1)
O2 gas transfer
The O2 balance can be represented by considering three processes: the input of O2 into the reactor through
bubbling of O2-enriched air (I_O2), the transfer of O2 from the gas phase to the liquid phase (T_O2), and
finally, the output of O2 from the reactor (O_O2).
They are added by the flow rate v, which is a constant concentration. and can be expressed
by the concentration in the medium.
dtdGlu=Glu_input⋅ν(6)dtdNH3=NH3_input⋅ν(7)
Analysis Method
We conducted the analysis using COPASI. We prepared a metabolic and reactor model by combining 82 equations,
including the metabolic flux equations for PHB-producing E. coli and the reactor equations we developed. After
measuring the sensitivity of each parameter using the MCA (Metabolic Control Analysis) tool built into COPASI,
we optimized the controllable parameters that showed high sensitivity for MOVE production using a Parameter Scan.
Parameter
The parameters for the PHB-producing E. coli were based on Markan Lopar’s paper. A list of the equations and
parameters used is attached in the appendix. The following parameters are specific to the reactor design and
were newly added in our model.
Figure 3. m-MV production concentration in a single bioreactor
The model simulation of the single bioreactor confirmed that all species’ concentrations converge over time,
indicating a stable model reactor. To produce MOVE, it is necessary to create a nitrogen-depleted medium condition,
but this results in the cell concentration in the medium stabilizing at a low level, leading to lower MOVE production.
Simulation results of continuous cascade bioreactor
Figure 4. Bioreactor safely converges all concentrations
Figure 5a. Concentration of BIO in each bioreactor. Figure 5b. NH3 concentration in each bioreactor.
Figure 6a. Concentration of PHB in each bioreactor. Figure 6b. Concentration of m-MV in each bioreactor.
The model simulation of the continuous cascade bioreactor confirmed that all species’ concentrations converge over time,
indicating a stable model reactor. As expected, the results show that pre-culturing occurs in R1, followed by main
culturing in R2, where cell concentration increases significantly and NH3 is consumed. In R3, PHB is produced,
and in R4, R5, and R6, MOVE is produced on a large scale. The metabolic flux of the cells changes in each bioreactor,
supporting efficient, specialized production of MOVE. Overall, these simulations are consistent and demonstrate the
theoretical feasibility of the project.
Optimization of MOVE production
Through MCA, we were able to narrow down the key parameters we should focus on to optimize MOVE production efficiency.
Among these, the easily adjustable parameters—glucose supply and the ratio of glucose added between R3–R6 relative to
R1–R2—were found to have a significant impact. While increasing the glucose supply might seem like an obvious approach
to increase MOVE production, the limitation imposed by oxygen supply makes it necessary to find the optimal value.
Additionally, since the roles of the first and second halves of the continuous bioreactor differ, it is important to
adjust the glucose supply between these sections accordingly.
Figure 7. Bioreactor safely converges all concentrations
The areas where the concentration of m-MV is not displayed are where the O2 concentration became negative.
In these regions, anaerobic conditions prevail, making PHB production difficult, as indicated by the Wet lab
results. From this analysis, it was determined that when r =
3.715
and glucose addition is
990.335 mmol/h
,
0.00213 mmol
of MOVE can be recovered in 1 hour.
In a single bioreactor, it was found that MOVE could be recovered at a rate of 2e-5 mmol/h. This allows for the
production of pesticides for approximately 0.05 hectares in 1 hour. In contrast, the continuous cascade bioreactor
was able to recover MOVE at a rate of
0.00213 mmol/h
, which enables the production of pesticides for approximately
5 hectares in 1 hour. This represents a recovery rate about
100
times higher than that of the single bioreactor,
demonstrating that the continuous cascade bioreactor allows for much more efficient MOVE production. This significantly
enhances the feasibility of the business model in the economic analysis.
Process Scale and Profitability
As explained previously, our idea was to design an industrial biotech plant that would utilize our model
to provide quantitative data and produce MOVE. Here is a brief description of some of the elements for which
our model proved to be a useful predictive tool.
Cost Reduction Through Medium Optimization
As repeated before, a crucial factor for m-MV production is the nitrogen concentration in the medium.
m-MVs are only produced under nitrogen-depleted conditions, and sufficient cell growth cannot be achieved
without adequate nitrogen levels. Therefore, we can simulate the optimal nitrogen input for each reactor
to ensure the ideal concentration in the medium. This prevents unnecessary nitrogen addition, reduces the
risk of decreased MOVE production, and maintains cell density by avoiding nitrogen depletion.
Increase in MOVE Production Through Production Efficiency Optimization
Due to physical and equipment limitations on oxygen supply, simply increasing the glucose concentration
in the medium does not lead to an increase in MOVE production. An excessive increase in glucose addition
creates anaerobic conditions in the reactor, inhibiting the production of PHB and MOVE. We were able to
identify the maximum glucose addition amount (R1, R2: 178.2 g/h; R3, R4, R5, R6: 662.0 g/h) that avoids oxygen
depletion, optimizing MOVE production.
Cost savings with continuous bioreactors
In the continuous bioreactor, it was found that MOVE production costs could be reduced by approximately 27
times compared to the single bioreactor when considering NH3 and glucose as costs. This demonstrates the validity
of using a continuous bioreactor for organisms with multiple fluxes depending on medium conditions. The continuous
bioreactor model is effective for various transformants with multiple fluxes under different medium conditions.
Summary
In the continuous cascade bioreactor, significant improvements in both MOVE production per unit of medium and
MOVE production per unit of time can be expected compared to the single bioreactor. This model has clarified
how to design a bioreactor for efficient MOVE production. Additionally, it has demonstrated the potential for
producing MOVE at the lowest possible cost, providing a foundation for proving the theoretical feasibility of
the business in the economic model.
Modularization
This is the file of the model we used. This makes it easy to reproduce the simulation we have done here.Click here
.
[3] Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M, Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A, Kim S, Kolpakov F, Luna A, Sahle S, Schmidt E, Watterson S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL, Kohn K, Kitano H. The Systems Biology Graphical Notation. Nat Biotechnol. 2009 Aug;27(8):735-41. doi: 10.1038/nbt.1558. Epub 2009 Aug 7. Erratum in: Nat Biotechnol. 2009 Sep;27(9):864. PMID: 19668183.