OVERVIEW

The need for a computational model

We are in an age where experimental data required to solve certain problems is already available; many interactomes are studied, and the data is ever-increasing. The simplest of mathematical or computational models uses some of these available datasets, applies mathematical relations (mostly correlated to the available experimental data), and predicts the outcomes if some of the parameters of the experiments were changed. This approach helps us predict what outcomes we would get from experiments and also helps us improve their design, reduce the number of trials, and test extremities. The modelling approaches that we would be working with are:

Metabolic Modelling: It involves an interactome of metabolites in a biological organism known as GSMM (Genome-scale Metabolic Model) and analysis of the metabolic pathways using Flux Balance analysis (FBA).
Structural Modelling: We used AlphaFold2 to predict the structures of our engineered proteins and performed molecular docking with their respective substrates to study the catalytic domains.

Metabolic Modelling

Introduction

Our idea is to metabolically engineer Pseudomonas putida to grow solely on the monomers of PET (Polyethylene terephthalate), namely EG (Ethylene Glycol) and TPA (Terephthalic acid), and produce sesquiterpenoids like α-santalol, β-santalol, α-exo-bergamotol and epi-β-santalol as secondary metabolites.
Genes were designed to add a few metabolic pathways required to obtain the target metabolite. Hence, a GSMM was required to predict these changes in the metabolic network.
A Genome-Scale Metabolic Model is a set of inter-dependent linear equations, where the equations represent reactions (assumed to be at their steady state) catalysed by proteins obtained from the genomic information of the organism. They all have a default objective of ‘Biomass’, which is a merged form of various equations (reactions) and elements (metabolites) required for the model (organism) to survive in an assumed live state backed up by actual wet lab experiments and metabolomics study of the organism.
iamge s loading

Our approach

Objectives of the model:

To be able to take up Ethylene Glycol and Terephthalic acid and produce santalol and isomers.
Predict the probable metabolic path taken by our metabolites
Predict the genes to knock or overexpress to get a better yield of our product.

Background and selection of model:

Application of GSMM has become trivial due to the vast availability of data and well-designed pipelines, but validating a model and checking how closely it resembles the biological system requires time and experimental validation. Thus, we opted for modifying pre-existing models and came across 3 specific models:

Model name	No. of Metabolites	No. of reactions	No.of genes	Year of publication
iJP815	892	948	818	2008
PpuQY1140	1104	1171	1140	2017
iJN1463	2160	2936	1462	2019

Trivia on models:

Model iJP815:
This GSMM was developed by Puchalaka et al., 2008 ^[1], It was one of the early attempts to to make a GSMM of Pseudomonas putida KT2440 to address it’s metabolic wiring as well as biocatalysis. The model was validated by making various knockout variants of the bacterium and high throughput data analysis. The auxotrophy was correctly predicted by the model in 75% of the cases.

Model PpuQY1140:
The GSMM was developed for Pseudomonas putida KT2440 by Qianqian et al., 2017 ^[2]. They compared four previously published models specifically, iJN746 (a precursor of iJN1463), iJP815, iJP962 (a later version of iJP815) and MBEL1071. Further, the discrepancies between the models leading to inconsistent pathway calculation results were identified. The group also corrected the models based on literature to ensure all the calculated synthesis and uptake pathways are similar, further built the pathway consensus model (PpuQY1140), and updated it to the latest genome annotation available while publishing. Hence the GSMM had stoichiometrically correct reactions but no formula or charge information on the metabolites.

Model iJN1463:
Built by Nogales et al., 2020^[3], this new GSMM incorporated several hundred additional genes and associated reactions and new nutrients supporting growth. This model was validated experimentally by in vivo growth screens on previously untested carbon and nitrogen source. The knockout predictions of essential genes were also very accurate. This model considers some metabolites to accumulate in the bacterium and treats them as sinks. This is the only GSMM to have accounted for the formula and charge of the metabolites and thus, the carbon flux (c-flux) could also be tested.
The models will be compared later based on how accurately the predicted metabolic pathway opted by Ethylene Glycol and Terephthalic acid coincides with the possible pathway predicted via experimental methods.

Flux Balance Analysis (FBA)

This is one of the core methods to analyse a GSMM. In a Flux Balance Analysis, we set a reaction of interest as our objective, and then the linear programming algorithms of COBRApy, like GLPK (GNU Linear Programming Kit), solve the matrix of linear equations to provide maximum value to the objective reaction. Generally, the default objective of a GSMM is to solve for Biomass, which represents the growth of our bacteria. For our model to grow on EG and TPA, we add them as new metabolites and connect them to existing metabolites by introducing their respective uptake and assimilation reactions. Following this, to make our model grow solely on EG and TPA, we set the uptakes of all other carbon sources in the model to zero and made it a feasible solution for Biomass to grow on TPA and EG. Further, our target metabolites, santalene and santalol, were added as new metabolites, and their corresponding enzymatic reactions to synthesise them were connected with the existing MEP pathway reactions.

The assumptions of the analysis are:

The fluxes of the entire model are calculated assuming that reactions and their individual fluxes have achieved a steady state, making the stoichiometric matrix easy to solve by forming linear equations.
Each reaction represents the function of an enzyme or an import-export system.
Terephthalic acid and ethylene glycol were given as uptake reactions from the extracellular environment. The uptake values were initially set as per the model’s preexisting uptake bounds for other metabolites.
Santalol synthesised as part of the model’s reactions would be secreted into the extracellular environment, acting as a carbon sink.
The Biomass reaction is an accumulated endpoint reaction of essential metabolites that were determined from the models listed above. The biomass reaction flux acts as a proxy for growth and is the condition for organism survival in silico. The following reactions and metabolites were added to each of the models.

Metabolite ID	Metabolite Names	Formula	Compartment
EGe	Ethylene Glycol	C2H6O2	Extracellular
EGc	Ethylene Glycol	C2H6O2	Cytosol
TPAe	Terephthalic acid	C8H6O4	Extracellular
TPAc	Terephthalic acid	C8H6O4	Cytosol
DCDc	1,2-dihydroxy-3,5-cyclohexadiene-1,4- dicarboxylate	C8H8O6	Cytosol
a_santalene_c	santalene	C15H24	Cytosol
a_santalol_c	santalol	C15H24O	Cytosol

The reactions added were:

Reaction name	Stoichiometry	Stoichiometry	compartment	Direction	Source
EG_media	nothing-> EGe	Uptake	Extracellular	To the cell	This study
EG_Uptake	EGe<-> EGc	Uptake	Exchange	Bi-directional	This study
EG_oxred	EGc+ NAD+ <-> Glycolaldehyde + NADH + H+	Connecting	Cytosol	Bi-directional	Franden et.al 2018
TPA_media	nothing-> TPAe	Uptake	Extracellular	To the cell	This study
TPA_Uptake	TPAe<-> TPAc	Connecting	Extracellular	Bi-directional	This study
TphA	TPAc + H+ + O2 + NADPH <-> NADP + DCD	Connecting	Cytosol	Bi-directional	Brandenburg et.al 2022
TphB	NADP + DCD <-> NADPH + H+ + Protocathechurate	Connecting	Cytosol	Bi-directional	Brandenburg et.al 2022
nDXS	ribulose-5-phosphate<-> formate +DHPB	betterment/optional	Cytosol	Bi-directional	Jordi
SaSSy	FPPS<-> santalene + PPi	Product specific	Cytosol	Bidirectional	Brenda
P450	Santalene + O2 <-> santalol + H2O	Product specific	Cytosol	Bidirectional	This study
Santalol_secretion	Santalol->nothing	secretion	Exchange	Outside the cell	This study

The above reactions were all to all the 3 models in a similar fashion, the simulations were tested while keeping the nDXS knocked out stage as we couldn't obtain the enzyme.

Modifications to the GSMMs:

The above-mentioned reactions and metabolites were added to each of the models. Except PpuQY1140 in which the Glyoxylate Shunt catalysed by malate synthase was borrowed from iJN1463 and added.

The default upper and lower bounds of the reactions were changed from -999999 to 999999 to a more feasible -1000 to 1000.

Simulating growth coupled production of santalol

In an ideal biological scenario, if the organism produces a secondary metabolite in excess, its growth gets sacrificed. To mimic this in our model, we set two objectives:

Santalol_secretion: To obtain our product.

Biomass: To mimic organism survival and growth

The objectives were solved for minimum and maximum fluxes through them, and the data was plotted as a production envelope using the ‘plot_flux_space’ function from the Strain design Package (Schneider et al.,2022)^[4]. The function gives a graphical output of the flux space and array of the points used for building the flux space. The x-axis represents the Biomass as an objective, and the y-axis represents santalol secretion as an objective. After selecting the desired flux of santalol secretion paired with the highest possible Biomass flux at that condition, a new objective was set to model:
new_objective = 1.0 Biomass + 1.0 Santalol_secretion
Where the upper bound of the objectives corresponds to the X and Y coordinates of the chosen point in the Production envelope plot built with strain design. The new objective states that the solver of the model should give equal priority to Biomass and Santalol_secretion.

Diverting flux towards santalol production pathway:

To find which genes to upregulate in Pseudomonas putida, we first kept the objective as Biomass. At this point, it mimicked its laboratory counterpart which can grow solely on EG and TPA (generously provided by Dr Oliver Brandenberg). For the bacteria to grow and produce santalol, we change the objective to “new_objective”. Then we noted the change of flux and pathways in the model and observed a branching point in utilisation of pyruvate. The reaction catalysed by Deoxy xylulose-5-phosphate synthase (DXS) which catalysed the following reaction:
Pyruvate + Glyceraldehyde-3-phosphate -> 1-Deoxy xylulose-5-phosphate
It was inactive when Biomass was the objective and activated when santalol secretion was set as a co-objective. This reaction leads to the MEP pathway and helps in the production of santalol, as observed from the flux mapping of all three modified GSMMs. Also, Dr Jordi (Jordi et al., 2019) ^[5] mentioned in his paper that the overexpression of DXS and DXR increases the flux of IPP/DMAPP to produce santalene.

Analysis

1. iJP815 :

The original model, without any modifications, revealed the following Uptake and Secretion values while using COBRApy’s own Summary function.

The objective for the solver was Biomass:

Uptake and Secretions after the model was modified with the mentioned set of reactions and the objective was :

Flux-space plot by Strain design:

The point chosen for this plot for further analysis was: [0.35238943384615384, 1.762291626], where 0.35238943384615384 mmol/gDw/Hr is the highest possible flux of Biomass reaction if we want a flux of 1.762291626 mmol/gDw/Hr for santalol production.

Uptake and Secretions of the model when both santalol production and Biomass were considered as objective:

The flux followed by the model when the solver computes for ‘new_objective’ is in the supplementary (S1).

2. PpuQY1140 :

The original model, without any modifications, gave the following Uptake and Secretion values while using COBRApy’s own Summary function.

The objective for the solver was Biomass.

Uptake and Secretions after the model was modified with the mentioned set of reactions and the objective was :

Flux Space plot by Strain design:

The X and Y axes represent fluxes of Biomass and santalol secretion respectively and the unit of flux is mmol/gDw/Hr.
The point chosen for this plot for further analysis was: [0.37542665025641025, 1.608374572], where 0.37542665025641025 mmol/gDw/Hr is the highest possible flux of Biomass reaction if we want a flux of 1.608374572 mmol/gDw/Hr for santalol production.

Uptake and Secretions of the model when both santalol production and Biomass were considered as objective:

The flux followed by the model when the solver computes for ‘new_objective’ provided in supplementary (S2).

3. iJN1463 :

The original model, without any modifications, gave the following Uptake and Secretion values while using COBRApy’s own Summary function. The objective for the solver was Biomass.

Uptake and Secretions after the model was modified with the mentioned set of reactions and the objective was :

Flux Space plot by Strain design:

The X and Y axes represent fluxes of Biomass and Santalol secretion, respectively, and the unit of flux is mmol/gDw/Hr.
The point chosen for this plot for further analysis was: [0.529592391025641, 1.786491986], where 0.529592391025641 mmol/gDw/Hr is the highest possible flux of Biomass reaction if we want a flux of 1.786491986 mmol/gDw/Hr for santalol production.

Uptake and Secretions of the model when both Santalol production and Biomass were considered as objectives:

The flux followed by the model when the solver computes for ‘new_objective’ is represented below:

Inference

The models we have chosen are from 3 different periods, and the later developers were influenced by the previously published models and corrected in the one they proposed. Their methods included both high-throughput and experimental validation of the model. The most consensus information was the expression of DXS enzyme to reach our target molecule.

Also, when we compared three of our GSMMs to a proposed pathway of ethylene glycol uptake for Pseudomonas putida (Franden et al., 2018) ^[6], the iJN1463 mode of EG metabolism resembles it the most. The iJP815 lacks the Hydroxypropanoate metabolism, and the unmodified PpuQY1140 lacks the glyoxylate shunt natively, which was later introduced following the iJN1463 glyoxylate shunt stoichiometry. The only GSMM among the chosen ones to have charge and formula for the metabolites is iJN1463 which was thus used for further analysis.

Estimation of the yield of santalol:

Now, the objective of our model was to estimate the yield of santalol to the amount of PET monomers used. Note that here ‘santalol’ refers to the total number of combined moles of the 4 isomers produced. Refer to the structural modelling section for binding affinity studies of these isomers with P450-CPR fusion protein.
The unit of flux is mmol/gDw/hr which stands for (millimoles per gram dry weight per hour). Gram dry weight refers to the dried mass of the bacterial separated from the culture broth. It is in linear relationship with the absorbance of the bacterial culture.
for example,
EG uptake flux =10 mmol/gDw/hr
which means 10 mmols of ethylene glycol can be uptaken by a synchronous bacterial culture of 1 gram of dry mass per hour.
Now, referring to the Uptake and Secretion table of iJN1463 in the case of dual objective, we find the following:
EG uptake flux = 10 mmol/gDw/hr
TPA uptake = 10 mmol/gDw/hr
Flux of santalol secretion= 1.786 mmol/gDw/hr
PET contains repeating units of the following structure which contains 1 molecule of terephthalate and 1 molecule of ethylene glycol.

Thus, in ideal conditions, 1 mole PET would give 1 mole of ethylene glycol and 1 mole of terephthlalic acid.
As the GSMM uptakes 10 millimoles of TPA and EG of both per hour, it is equivalent to utilising 10 millimoles of PET per hour.
The molar mass of Terephthalic acid is 166.13 g/mol (as per PubChem);
The molar mass of Ethylene glycol is 62.07g/mol ( as per PubChem);
The original sandalwood oil contains four isomers of santalol in a fixed ratio which, as per the literature, is present in the highest amount in sandalwood oil.
The mass of santalol =220.35g/mol
As the calculations are done for 1gDw of bacterial culture, the unit is omitted for simplicity in the calculation.
Thus,
Mass of TPA taken up per hour = 10 mmol x 166.13 g/mol = 1661.3 mg
Mass of EG taken up per hour= 10 mmol x 62.07g/mol = 620.7 mg
Mass of ‘santalol’ produced per hour = 1.786 mmol x 220.35 g/mol = 393.5451 mg
Thus, from Flux Balance Analysis we conclude that :
We can obtain 393.5451 mg of a mixture of santalol isomers from 2282mg (10 millimoles) of PET in 1 hour and ideal conditions.
As the catalytic sites of the enzymes are the same as those of the natural p450 of S. album , we expect similar ratios of the isomers to be produced and thus retaining their natural properties.

Structural Modelling

Introduction:

Structural studies of our designed proteins form the core of our project. It gives a deeper understanding of how the enzymes fold, function, and interact with substrates to obtain desired products. In our study, we worked with proteins involved in santalol synthesis whose structures had not been experimentally solved. We adopted various engineering strategies such as truncation, fusion, and modification of their amino acid sequences. Here, we justify our claims and predict the catalytic domain of a multi-substrate eukaryotic transmembrane P450, SaCYP736A167, which is the key enzyme required for the final step of santalol synthesis belonging to the Santalum album.

Objectives:

To examine if the modification of the transmembrane domain of SaCYP736A167 affects its function.
To look for Probable heme interaction sites within predicted P450.
To determine the binding energies of isomers of santalene and santalol.
To visualise the catalytic site of the P450-CPR fusion protein.

Approach:

As the structure of SaCYP736A167 was not solved, we used Google Colabfold, which uses Alphafold2 to predict protein structures using machine learning approaches. We preferred Alphafold2 over homology modelling as its predictions against the experimentally determined structures have an average RMSD of 0.28 Å ^[1]. Unlike homology modelling, these results suggest that AlphaFold 2 predicted structures fall within the experimental accuracy range and give us the whole protein structure.
SaCYP736A167 is a P450 with a heme molecule at its catalytic site. Thus, our docking approach was to first dock heme with the predicted structure of SaCYP736A167 (from now it would be referred to as predicted P450) and then compare the obtained data with interaction data of existing P450s ^[2], if the data correlates, it would justify our docking strategy and give us the catalytic centre around which we can target the docking of the isomers of santalene and santalol.
Finally, if the catalytic centre is away from the sites we truncated or fused, we assume the protein's function is unaltered.

Background

The binding and conversion of sesquiterpenes ie. α-santalene, β-santalene, epi-β-santalene, α-exo-bergamotene to sesquiterpenols i.e. α-santalol, β-santalol, epi-β-santalol, α-exo-bergamotol is not an independent event. It is dependent on a heme group, oxygen molecule, and FADH2.

Santalene + O2 + FADH2-> Santalol + H2O + FAD+

We performed the docking analysis by considering a Heme (lacking the iron centre) as our only cofactor, which could affect the binding energy of individual ligands. To address this issue, we considered the difference in binding energies between the santalene and santalol isomers for analysis, given all other cofactors are common to them and thus affect them in a similar manner.
Thus, in this study, the difference in binding energy carries more significance than the individual binding energies of each molecule.

Assumptions:

The sesquiterpenes (α-santalene, β-santalene, epi-β-santalene, and α-exo-bergamotenes) are present in almost equal effective concentrations around the protein.
The amount of product produced is inversely proportional to the binding energy differences between the substrate and product, i.e., it requires less energy to convert one form to another; thus, one conversion cycle can get over quicker than the one requiring more energy leading to more production of that specific compound.

Tools

Molecular docking is a computational technique to predict a ligand's preferred orientation and binding affinity when interacting with a target molecule, typically a protein. This method is crucial in drug design as it helps identify how small molecules (ligands) can bind to specific sites on proteins, which can influence biological activity.

Key Tools :

AutoDock Tools: The Graphical user interface for Autodock built by ‘The Scripps research Institute’. It was used to prepare the molecules for Docking in AutoDock GPU ^[4].
AutoDock GPU: It leverages the power of graphics processing units (GPUs) to accelerate the docking process significantly compared to CPU-based methods. By utilising the parallel processing capabilities of GPUs, AutoDock GPU can handle larger datasets and perform docking simulations more rapidly, making it suitable for high-throughput screening applications ^[5].
Nvidia DiffDock: Nvidia DiffDock is another advanced docking tool that utilises deep learning techniques to enhance the accuracy and speed of molecular docking simulations. It focuses on predicting ligand poses and binding affinities by incorporating trained neural networks. We used this to identify the probable regions of alpha santalene on the predicted P450 to get an idea of the interacting region ^[6].
PyMOL:It is a powerful molecular visualisation tool used to analyse the docking results obtained from Autodock ^[7].
Avogadro2^[8]:It is a molecular editor and a valuable visualisation tool for preparing ligands for docking studies. It allows for building and editing molecular structures, optimising ligand geometries, and exporting files in formats compatible with docking software. It was used to optimise our Ligands' geometry and apply Gastieger charges to it ^[3].
ChimeraX and Chimera 1.7
ChimeraX ^[9] and its predecessor Chimera 1.7 provide advanced molecular visualisation and analysis capabilities. More than being visualisation software, they excel in their analysis capabilities, Chimera 1.7 was used for energy minimization of the predicted structures, and Chimera X was used to predict the structures using Alphafold2.

Methodology:

Retrieval and preparation of receptor:

The Nucleotide sequence of SaCYP736A167 was obtained from the NCBI database under the accession ID (KU169302.1) ^[11]. The mRNA predictor plus.py was used to obtain the probable protein sequence and then compared it to the NCBI accession number(AMR44190.1) to validate the result, there was a 100% sequence identity. ChimeraX was used to predict the structure of the receptor using Alphafold2.
To simulate the receptor in an aqueous environment, energy minimization was performed using AMBER ff14sb force field.

Retrieval and preparation of ligand:

The sdf files of ligands, santalene, santalene, epi- santalene, exo- bergamotene, and heme were downloaded from PubChem.
The geometry of the ligands was optimised by using Avogadro2. This step is similar to the energy minimization of the receptors.

Preparation of structures for docking:

Once the receptor and ligand are retried and prepared for docking, they were imported in AutoDock tools for further analysis.

Observations

Docking of heme molecule with predicated P450

The nonpolar Hydrogen atoms of predicted P450 were merged, and Kollman charges (-38.764) were added to it.
Heme molecule in .mol2 format was loaded and found that 30 nonpolar Hydrogens,16 aromatic carbons were detected, and 10 rotatable bonds thus, torsional degrees of freedom were set to 10.
Grid box preparation: The heme was docked with the predicted P450 in Nvidia DiffDock to obtain the probable binding site and narrow the grid box. CxG domain (in our case CPG) specific to the catalytic site of P450 was considered inside the grid box.
Amino acids in P450 that are interacting with Heme are:
Arg 97, Val 113, Ile 114, Pro 432, Arg 438, Cysteine 440, Prolin 441, Glycine 442
Grid for heme docking with P450
Interactions: The interaction of Heme with the CPG motif validated our docking. So, the best conformation of the heme and predicted P450 interaction was saved as complex in .PDBQT format was used as the receptor macromolecule in subsequent docking studies. This macromolecule henceforth will be referred to as P450-heme.
Interaction of heme

Docking of α santalene molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry optimised α-santalene were preserved.
3 rotatable bonds,3 aromatic carbons, torsional degrees of freedom set to 3
The grid box origin (centre) remained the same, and the box volume was increased to account for the interacting domains of the santalene isomers, as we assumed them to be spread around the heme catalytic site. For the new grid, the amino acids interacting with heme were also taken into account: Arg 97, Val 113, Ile 114, Pro 432, Arg 438, Cysteine 440, Prolin 441, Glycine 442
This grid was common to all the rest of the Sesquiterpenes and Sesquiterpenols.
Common grid
Interaction results:
Nonpolar bonds with P450-heme: Leu 367, 368, Gly 481
Polar interactions: none
3.6 Å interaction mesh of α santalene

Docking of α santalol molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry-optimised α-santalol were preserved.
23 non-polar hydrogens were found, 3 aromatic carbons were detected, and 5 rotatable bonds were found; thus, Torsional degrees of freedom (TORSDOF) were set to 5
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar bonds: Gly 365, Leu 367, Leu 368, Ala 483
Polar bonds: Leu 482 (1.9 Å)
α-santalol interaction

Docking of β santalene molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry-optimised β-santalene were preserved.
3 rotatable bonds were detected,3 aromatic carbons were found, and torsional degrees of freedom were set to 3
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar bonds with P450-heme: Gly 365, Leu 368, Phe480
Polar interactions: none
β-santalene interaction

Docking of β-santalol molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry-optimised β-santalol were preserved.
23 non-polar hydrogens were found, 3 aromatic carbons were detected, and 5 rotatable bonds were found, thus Torsional degrees of freedom(TORSDOF) was set to 5
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar :Leu 367, 368, Phe 480, Leu 482, Ala483
Polar: Leu 482 (1.9 Å)
β-santalol interaction

Docking of epi β-santalene molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry-optimised epi β-santalene were preserved.
3 rotatable bonds were detected,3 aromatic carbons were found, and torsional degrees of freedom were set to 3
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar bonds:Leu 368
Polar interactions: none
epi β-santalene interaction

Docking of epi β santalol molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry-optimised epi β-santalol were preserved.
23 non-polar hydrogens were found, 3 aromatic carbons were detected, and 5 rotatable bonds were found; thus, Torsional degrees of freedom(TORSDOF) were set to 5
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar: Leu 367, Leu482
Polar: Leu 482(2.0Å)
epi β-santalol interaction

Docking of α-exo-bergamotene molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry optimised α-exo-bergamotene were preserved.
3 rotatable bonds were detected,3 aromatic carbons were found, and torsional degrees of freedom were set to 3
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar bonds with P450-heme:Leu 368
Polar interactions: none
α-exo-bergamotene interaction

Docking of α-exo-bergamotol molecule with P450-Heme

The charges of the P450-heme complex were preserved.
The charges of the geometry optimized α-exo-bergamotol were preserved.
23 non-polar hydrogens were found, 3 aromatic carbons were detected, and 5 rotatable bonds were found; thus, Torsional degrees of freedom(TORSDOF) were set to 5
The Common GPF file was used to set the grid.
Interaction results:
Nonpolar :Pro 364,Leu 482
Polar: Leu 482(2.1 Å)
α-exo-bergamotol interaction

Results and discussion:

The difference between the binding energy from the substrate to the product is illustrated below:
table

As it is evident from the above table, the conversion of α-santalene to α-santalol takes the least amount of energy; thus, as per our assumption, alpha santalol is going to be produced more comparatively to other isomers.
β-santalene has two equally probable states, one with the binding energy of -7.81Kcal/mol and -7.42 Kcal/mol; thus, an average binding energy was taken for calculations. The sites are depicted below:

Thus, due to the bistable nature of β-santalene, we predict a considerable amount of β-santalol would be in this form too.

Based on number of the residues interacting with each of the santalene and santalol, we plotted the following histogram:

From this data, it is evident that α santalol and β santalol have the highest number of interacting residues, i.e. 5, and thus are more likely to interact with protein hence, will be present in the highest amount in the sesquiterpene mixture produced by predicted P450.

We found several amino acids were more involved than others in the interaction study we performed and thus predicted them to be critical catalytic sites responsible for the function of predicted P450, as depicted below.

The most important observation was the exclusive polar interaction of Leu 482 with all the sesquiterpenols. Thus, we predicted this residue to be essential for the functioning of the predicted P450.

Based on the polar interaction bond length between the hydrogen atom of santalol and the oxygen atom of leucine isomers, we obtained the following data:
table

Here, too, α-santalol and β-santalol have the shortest bond lengths and, thus, much better interaction (inverse square relationship) than epi-β-santalol and α-exo-bergamotol, explaining the relative difference in their abundance.
All these molecular binding preferences towards α-santalol and β-santalol get amplified in their catalytic conditions, leading to a greater abundance of α-santalol and β-santalol.

Prediction of the probable active site of our fusion protein:
The Best docking conformation of β-santalene and heme was aligned to our fusion protein with a BC linker to predict the interaction domain of our fusion proteins.
table

The same was aligned to our fusion protein with a BM linker to predict the interaction domain of our fusion protein.

table

As the illustrations show, the catalytic centre, including the heme and santalene isomer, is away from the linker domain where the amino acid modifications were performed.

In the project design, the rationale of fusion proteins was explained, and we back it up with the computations we performed.

Transmembrane tendencies comparison between the native SaCYP736A167 and its truncated variant:

table

The figure depicts the native and the truncated forms of SaCYP736A167

The transmembrane tendencies of the same are represented below:(The Expasy Protscale Transmembrane tendency server was used for the following analysis)^[12]
table The transmembrane tendencies of the first few amino acids depict:

The score of native SaCYP736A167(predicted P450) around(0.5 to 1.5)
The score of truncated p450 is around(-1.5 to 0)

This shows that truncation of the transmembrane domain truly reduced the transmembrane tendencies.

Transmembrane tendencies comparison between the native CPR1 and its truncated variant:

table

The figure depicts the native and the truncated forms of SaCPR1

The transmembrane tendencies of the same are represented below:
table

The score of native SaCPR1 around(0.5 to 0)
The score of truncated p450 is around(-1.5 to 0.25)

This shows that truncation of the transmembrane domain truly reduced the transmembrane tendencies.

The Fusion proteins were built claiming that they were not transmembrane proteins:

The Transmembrane tendencies of the fusion protein are represented below:

As depicted in the transmembrane tendency plots, the N-terminal tendencies vary from -1 to 1 (with starting residues at -0.5), and the C-terminal tendencies are within (-0.5 to 5), implying that both domains have less tendency to get embedded in the membrane.

As analysed by the Swiss-Expasy protparam ^[13] server, The instability index of The fusion protein with BC linker was 37.64 and the one with the BM linker was 38.6 , as both the values were below 40 the fusion proteins were stable.

Future Perspectives:

The predicted interacting residues in the catalytic sites could be validated by mutating them and observing their effect on the catalytic activity of SaCYP736A167.
Obtaining a Heme molecule with an iron molecule and performing the docking with the presence of an Oxygen molecule and FADH2 would predict a more realistic docking scenario.
The SaCYP736A167 could be crystallised with the above-mentioned cofactors to experimentally validate the structure of the protein as well as catalytic sites.
From this analysis we propose that P450-CPR fusion proteins designed in this study are stable under catalytic conditions in our chassis. As per Expasy Protparam the protein might be stable >10 hr in E.coli under in-vivo conditions.

References

Metabolic Modelling

Puchałka J, Oberhardt MA, Godinho M, Bielecka A, Regenhardt D, Timmis KN, Papin JA, Martins dos Santos VA. Genome-scale reconstruction and analysis of the Pseudomonas putida KT2440 metabolic network facilitates applications in biotechnology. PLoS Comput Biol. 2008 Oct;4(10):e1000210. doi: 10.1371/journal.pcbi.1000210. Epub 2008 Oct 31. PMID: 18974823; PMCID: PMC2563689.
Yuan Q, Huang T, Li P, Hao T, Li F, Ma H, Wang Z, Zhao X, Chen T, Goryanin I. Pathway-Consensus Approach to Metabolic Network Reconstruction for Pseudomonas putida KT2440 by Systematic Comparison of Published Models. PLoS One. 2017 Jan 13;12(1):e0169437. doi: 10.1371/journal.pone.0169437. PMID: 28085902; PMCID: PMC5234801.
Nogales J, Mueller J, Gudmundsson S, Canalejo FJ, Duque E, Monk J, Feist AM, Ramos JL, Niu W, Palsson BO. High-quality genome-scale metabolic modelling of Pseudomonas putida highlights its broad metabolic capabilities. Environ Microbiol. 2020 Jan;22(1):255-269. doi: 10.1111/1462-2920.14843. Epub 2019 Nov 11. PMID: 31657101; PMCID: PMC7078882.
Schneider P, Bekiaris PS, von Kamp A, Klamt S. StrainDesign: a comprehensive Python package for computational design of metabolic networks. Bioinformatics. 2022 Oct 31;38(21):4981-4983. doi: 10.1093/bioinformatics/btac632. PMID: 36111857; PMCID: PMC9620819.
Hernandez-Arranz S, Perez-Gil J, Marshall-Sabey D, Rodriguez-Concepcion M. Engineering Pseudomonas putida for isoprenoid production by manipulating endogenous and shunt pathways supplying precursors. Microb Cell Fact. 2019 Sep 9;18(1):152. doi: 10.1186/s12934-019-1204-z. PMID: 31500633; PMCID: PMC6734295.
Franden MA, Jayakody LN, Li WJ, Wagner NJ, Cleveland NS, Michener WE, Hauer B, Blank LM, Wierckx N, Klebensberger J, Beckham GT. Engineering Pseudomonas putida KT2440 for efficient ethylene glycol utilisation. Metab Eng. 2018 Jul;48:197-207. doi: 10.1016/j.ymben.2018.06.003. Epub 2018 Jun 7. Erratum in: Metab Eng. 2021 May;65:263-265. doi: 10.1016/j.ymben.2021.01.001. PMID: 29885475.

Metabolic Modelling

Stevens AO, He Y. Benchmarking the Accuracy of AlphaFold 2 in Loop Structure Prediction. Biomolecules. 2022 Jul 14;12(7):98
5. doi: 10.3390/biom12070985. PMID: 35883541; PMCID: PMC9312937.
Syed K, Mashele SS. Comparative analysis of P450 signature motifs EXXR and CXG in the large and diverse kingdom of fungi: identification of evolutionarily conserved amino acid patterns characteristic of P450 family. PLoS One. 2014 Apr 17;9(4):e95616. doi: 10.1371/journal.pone.0095616. PMID: 24743800; PMCID: PMC3990721.
J.Gasteiger, M. Marseli, Iterative Equalization of Oribital Electronegatiity A Rapid Access to Atomic Charges, Tetrahedron Vol 36 p3219 1980
Morris, G. M., Huey, R., Lindstrom, W., et al. (2009). AutoDock Tools 1.5.6 User Guide. Scripps Research Institute. Available at: AutoDock .
Forli, S., Huey, R., Pique, M. E., et al. (2016). Computational Docking of Ligands to Proteins with AutoDock-GPU. Journal of Computational Chemistry, 37(16), 1627-1636. doi: 10.1002/jcc.24392.
Corso, Gabriele, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. “Diffdock: Dif https://arxiv.org/abs/2210.01776fusion steps, twists, and turns for molecular docking.” arXiv preprint arXiv:2210.01776 (2022)
Schrodinger, LLC. (2020). The PyMOL Molecular Graphics System, Version 2.4. Available at: PyMOL.
Hanwell, M.D., Curtis, D.E., Lonie, D.C., Vandermeersch, T., Zurek, E. and Hutchison, G.R. (2012) Avogadro: An Advanced Semantic Chemical Editor, Visualisation, and Analysis Platform. Journal of Cheminformatics, 4, 17.
http://dx.doi.org/10.1186/1758-2946-4-17
Meng EC, Goddard TD, Pettersen EF, Couch GS, Pearson ZJ, Morris JH, et al. UCSF ChimeraX: Tools for structure building and analysis. Protein Science. 2023; 32(11):e4792. https://doi.org/10.1002/pro.4792
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera—a visualisation system for exploratory research and analysis Journal of Computational Chemistry, 25(13), 1605-1612
Celedon JM, Chiang A, Yuen MM, Diaz-Chavez ML, Madilao LL, Finnegan PM, Barbour EL, Bohlmann J. Heartwood-specific transcriptome and metabolite signatures of tropical sandalwood (Santalum album) reveal the final step of (Z)-santalol fragrance biosynthesis. Plant J. 2016 May;86(4):289-99. doi: 10.1111/tpj.13162. Epub 2016 Apr 15. PMID: 26991058
https://web.expasy.org/protscale/pscale/Transmembranetendency.html
https://web.expasy.org/protparam/