Part 1 Integrated Protein Engineering
Overview
Aimed to improve DHA-PC production, we integrated Protein Engineering methods to improve LACS's activity. Improving enzyme's catalytic activity will facilitate the process of reaction, enhancing LACS's ability to produce DHA. To engineer LACS, we used protein modeling methods to identify beneficial changes to LACS's sequence.
Methods of our IPE are integrated and multifaceted, therefore the modeling results of IPE is comprehensively supported. Our workflow (as illustrated in figure1 below)included three sections on different scales. We optimized mRNA and improved transcription performance. We analyzed general properties of proteins and screened promising mutations. We examined residue-scale mechanism of reaction and established comprehensive analytic result. With a diverse of analytic methods, we constructed a Mutagenesis Candidate Pool (MCP), with each site rigorously generated and validated.
Several approaches we applied were distinctive and novel compared to traditional protein engineering methods. mRNA is a novel perspective of protein engineering. Micro-scale, mechanism-based modification of residues is based on newly-characterized mechanism of LACS. Therefore, our IPE approaches demonstrate innovation.
Experimental enzyme assay verified significant leap in LACS activity, which is a consistent result to our modeling simulations. With modeling approaches, we assessed the change in activity for each candidate in MCP, and chose the most promising candidate to further monitor in lab. Later we expressed chosen candidate and the mutant's ability to catalyze DHA had significant improvement (see result page of experiment result), which supported our analytic result.
Our integration of multiple methods provide a good example of engineering. For conducting protein engineering, the IPE procedure we had performed can be a systematic sample manual. Our three sections, mRNA optimization, semi-rational optimization and rational optimization can be performed accordingly to conduct in-depth study on a unfamiliar protein.

Background: Reaction Mechanism
In our workflow of protein engineering, the study on reaction mechanism of LACS was conducted after our semi-rational engineering. However, we illustrate it here in the beginning, in order to provide a greeting introduction and allow a preview of our enzyme LACS.
Firstly we introduce a few structural features and denotations of little fundamental parts in LACS:
- DHA & CoA & ATP: Small molecule that are the substrates of LACS and will enter the enzyme during formation of DHA-PC.
- C-terminal and N-terminal: LACS1 has a sequence with 660 amino acids, separated into two rather isolated domains, connected by a linker between. The C-terminal has approximately 100 residues and the N-terminal has approximately 500 residues, as illustrated in sketch.
- L-motif: A short sequence of peptide containing the linker between two isolated parts.
- Pocket: a hollow chamber in between the two domains that holds DHA and other substrates.
- G-motif: A short sequence of peptide containing a gating residue that covers the tunnel entrance.
- A-motif: A short sequence of peptide containing ATP binding site.
- P-motif: A short sequence of peptide containing a gating residue that covers the tunnel entrance.
The mechanism of LACS's stepwise reaction had been characterized in literature. This is the critical knowledge that supported our rational mutagenesis. We simplified and sketched the key steps as below:
- ATP enters the pocket between C-terminal and N-terminal.
- DHA enters pocket, interacts with surrounding residues and binds with ATP.
- CoA enters pocket, and Phosphate PPi is released from ATP.
- DHA leave AMP and rebind with CoA.
- DHA-CoA and AMP simultaneously leave LACS and LACS twist back to original form.

I. mRNA Optimization
Overview: Backstage Engineering
mRNA optimization is a novel strategy of protein engineering. Engineering protein usually focuses on changing peptide for a better version of enzyme, but it is rather uncommon to shift the focus onto switching mRNA codons while keeping the peptide to be the same.
Our inspiration came from a rather new RNA-assessing algorithm that had been deployed during the pandemic for vaccine design. Baidu of China developed LinearFold by using dynamic programming to parse mRNA sequence. Before starting to edit amino acids, this newly-developed RNA assessing algorithm inspired us to consider performing a novel workflow of RNA optimization.
We first searched on existing approaches of RNA optimization, discovering that it is rather an uncommon practice in protein engineering. The transcription process of DNA to RNA and the translation process of RNA to amino acid sequence are usually neglected when conducting protein engineering because they seem like foundational routines that lack a chance of being engineered.
However, we exploited mRNA's potential to be optimized after we studied the detailed biological procedures upstream and downstream to RNA formation. The RNA of a sequence can be modified while keeping the amino acid sequence consistent after translation because multiple codons are mapped to one amino acid in the process of translation. Hence, while having to express the same sequence of protein, we can optimize its upstream RNA sequence, with the aim to improve the RNA's performance for transaction and translation.
Our workflow included the following steps: after obtaining cDNA by parsing the original sequence, we adopted several methods for codon optimization, including:
- Increasing C/G content,
- Increasing Preferred Codons,
- Reducing Minor secondary Structures.
Later on, we evaluated our optimized mRNA sequence with Minimum Free Energy calculation, and validated its improvement in performance.
cDNA Acquisition
With the original sequence of the gene encoding LACS1, we first intercepted the open reading frame as cDNA, as shown in figure3. We then converted it to mRNA sequence and peptide for later analyses.

Codon Optimization
Increasing C/G Content
G-C form stronger base pairs due to the presence of three hydrogen bonds compared to the two hydrogen bonds found in A-T pairs. mRNA sequences rich in C and G are generally more stable and less likely to degrade rapidly within the cellular environment. Therefore increasing G/C content enhances mRNA stability and translational efficiency.
Increasing Preferred Codons
Different organisms differ in preferences for particular codons, due to different content of corresponding transfer RNAs (tRNAs). By using codons that are more frequently used by the host organism, the rate of translation can be increased, further leading to higher protein expression levels.
Reducing Minor Secondary Structures
Secondary structures of mRNA such as hairpins and internal loops can impact the ribosome's access to themRNA, thereby reducing protein synthesis process. By employing computational algorithms that predict RNA folding, we can modify the mRNA sequence to minimize these structures, particularly in the 5' untranslated region (5' UTR), which is crucial for translation initiation. This optimization can lead to higher yields of mRNA's transportation through nucleopore and translation to protein.
Structure Visualization and MFE Optimization
LinearFold is the first RNA folding algorithm that can accurately predict the secondary structure of an input RNA sequence in linear runtime. It uses innovative 5' -> 3' dynamic programming plus beam search, and can predicts RNA folding with linear runtime and linear space. Consequently, it has less system requirement while being substantially faster. Its performance of time complexity, comparing to other RNA calculative algorithms, is illustrated below in figure4.

When input a certain nucleotide sequence, LinearFold algorithm calculates different forms of possible secondary structure, and can provide a configuration with the lowest Minimum Free Energy (MFE). The MFE of sequence can be compared to assess their difference in stability, and structure of the lowest MFE can be observed to assess a sequence's possible performance.
We computed the secondary structure with minimum MFE for three sequence, the original open reading frame (ORF), the optimized yet not mutated sequence, referred to as WT, and the mutated WT (Mut). The predicted structure and MFE value are shown below in figure5 to figure10. We observed that MFE showed increment from ORF to WT to Mut, indicating that each iteration evolved to a simpler secondary structure. This change stayed in accordance with our goal of mRNA optimization, and is expected to facilitate the production of enzyme.
The detailed energy calculation of each secondary structure is shown in figure11-figure13. The three figures represent the unoptimized Open Reading Frame, the optimized LACS1, and the mutated LACS1 respectively. One way to present the energy calculation is mountain plot, which represents a secondary structure in a plot of height versus position, where the height at a certain position is given by the number of base pairs enclosing the base at the position. It is worth noting that loops will show plateaus and helices correspond to slopes.
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Conclusion and Discussion
To summarize our work in mRNA optimization, our methods included Increasing C/G content, Increasing Preferred Codons, and Reducing Minor Secondary Structures. By conducting these optimization methods, we obtained processed mRNA sequence (WT) that express the same peptide chain and will result in exactly the same protein. Later on, we assessed mRNA changes with assessment methods including calculating MFE and comparing secondary structure. This assessment included sequence from mRNA optimization and protein optimization. To our surprise, the two iterations (ORF -> WT and WT->Mut) both showed improved performance in stability of mRNA secondary structure, indicating that our mRNA optimization methods were proven to be effective, and that our protein engineering provided a mutagenesis that also improved mRNA stability.
II. Semi-Rational Protein Engineering
Overview: Elementary Observation
Semi-Rational approaches refer to a set of protein engineering strategies that mainly base on general properties of amino acids and proteins, and are usually adopted when starting the study of a certain protein, to provide some elementary insights of the target protein. The Semi-Rational approaches we took into consideration include Stability Estimation, Conservation Analysis, Pocket Prediction and Alanine Scanning, each with details illustrated below.
Overall, Semi-Rational Approaches were utilized differently from how we deployed Rational Approaches. During our analyses, we noticed the inaccuracy of most of the Semi-Rational Approaches' theoretical basis. Therefore, most of these analytic methods are more proficient when used as assessments. Therefore, in our protein engineering workflow, most mutagenesis candidates were generated by Rational Approaches, while Semi-Rational Approaches were mostly used to assess the candidates.
Stability Evaluation
Stability of a protein can be indicated with free energy calculation. The algorithm of FoldX performs free energy calculations based on an empirical force field. This calculation can help evaluate the energetic change of each mutation and therefore observe the effects of replacement on protein's stability.
The algorithm breaks down the free energy (ΔG) into components, and used various energy terms such as van der Waals, solvation, hydrogen bonding, electrostatics, and entropy, as shown in FoldX's energy function below.
ΔG=Wvdw⋅ΔGvdw+WsolvH⋅ΔGsolvH+WsolvP⋅ΔGsolvP+ΔGwb+ΔGhbond+ΔGel+ΔGKon+Wmc⋅T⋅ΔSmc+Wsc⋅T⋅ΔSscParameters:
Wvdw, WsolvH, WsolvP, Wmc and Wsc: the weighting factors applied to the raw energy terms. Wvdw is set to 0.33 by derivation from vapor to water energy transfer. Th rest are set to 1.Variables:
∆Gvdw: the sum of the van der Waals contributions of all atoms with respect to the same interactions with the solvent.
∆GsolvH and ∆GsolvP: the differences in solvation energy for apolar and polar groups respectively when these change from the unfolded to the folded state.
∆Gwb: the extra stabilising free energy provided by a water molecule making more than one hydrogen bond to the protein (water bridges) that cannot be taken into account with non-explicit solvent approximations.
∆Ghbond: the free energy difference between the formation of an intra-molecular hydrogen bond compared to inter-molecular hydrogen-bond formation (with solvent).
∆Gel is the electrostatic contribution of charged groups, including the helix dipole.
∆Gkon: a term only when working with oligomeric proteins or protein complexes. Reflects the effect of electrostatic interactions on the association constant kon (applies only to the subunit binding energies).
∆Smc is the entropy cost of fixing the backbone in the folded state; this term is dependent on the intrinsic tendency of a particular amino acid to adopt certain dihedral angles.
∆Ssc is the entropic cost of fixing a side chain in a particular conformation.
🔗reference
We applied the results of FoldX for evaluating whether a candidate in MCP is beneficial to activity. FoldX calculates the free energy change before and after a single-site mutation. When input the index of a site, FoldX simulates a site saturation mutagenesis (SSM) and calculate ΔΔG for each mutagenesis. A negative ΔΔG value suggests that the mutation is likely to be stabilizing, while a positive value suggests it is likely to be destabilizing.
Alanine Scanning
Alanine Scanning involves replacement and evaluation. It traverses through the protein sequence and replace each amino acid residue with alanine, which is the smallest amino acid with the simplest side chain of methyl side chain. The minimal side chain of alanine minimizes the impact on the three-dimensional structure of the protein while eliminating any functionality that the side chain may contribute. Therefore, this practice allow a prediction of certain single site's contribution to the entire protein's stability.
Conservation Analysis
The concept of conservation for a certain site can be interpreted as similarity among the same family of enzyme. A certain type of enzyme will have a whole family of slightly diverse sequence that originated from different species. For mutation during evolution of species, these subtypes may differ in various extend. However, for these subtypes have to possess the same function of enzyme, their sequence will present relatively similar, especially in some period of the sequence that constitute fundamental structural or catalytic region of the enzyme. A site that varies is considered not conserved, while a site that stays constant is considered conserved.
Conservation has several indications, and can provide insight for conducting mutagenesis. Conserved amino acids may be critical to a protein's structure or may be directly involved in catalytic process. Mutating such conserved amino acids could disrupt the enzyme's normal structure and activity, even possibly causing it to lose catalytic activity. Similarly, sites with low conservation are less unlikely to be associated with critical protein properties, and mutating them has a lower chance of making a big difference. Therefore, sites with medium conservation are usually considered ideal for conducting mutagenesis.🔗reference
We ran computation to interpret the conservation per site of our target enzyme LACS1. Degree of conservation is usually calculated by first performing Multiple Sequence Alignment (MSA). Matching periods of sequence will be aligned and a scoring matrix will grade the conserve degree of each site. Result of conservation per site is visualized in figure14. Analyzed with our later-acquired knowledge of the pocket, the conservation result below shows an interesting property that the motifs surrounding the pocket chamber tend to show higher conservation, which is a consistent conclusion with their supposedly important role in structure and in reaction.

Pocket Prediction
A pocket of an enzyme is typically a chamber on the protein's surface that can accommodate the enzyme's substrate. In our case, the pocket is the cavity that will take in fatty acid DHA.
The properties of a protein pocket are multifaceted and include characteristics like size, shape, depth, hydrophilicity, hydrophobicity, electrostatic potential, and flexibility. These properties can be analyzed once the location of the pocket is identified, and are vital for designing site mutagenesis in protein engineering.
Therefore, identifying pocket location and shape is the first step of conducting study on pockets. Pockets can be shallow and broad or deep and narrow, and their surfaces can be lined with a variety of amino acid residues that contribute to their chemical nature. Predicting the location and properties of protein pockets involves analyzing the protein's surface structure, which can be achieved through several computational algorithms. The method we applied requires ligand input, and examines the surface of the protein 3D structure to identify possible pocket, as shown in figure15 and figure16.
![]() |
![]() |
Conclusion and Discussion
In semi-rational engineering, each method provided us an elementary observation of LACS's several properties, while each method has its limitations.
For example, Alanine Scanning is indeed considered capable of estimating the stability change when the mutagenesis make a substitution to alanine, and can identify sites that are critical to a protein's functionality. However, after studying its algorithm, we discarded this method for its lack of precision.
Similarly, we believe that semi-rational method generally over-simplified the sophisticated interactions between residues, which is a relatively common limitation of semi-rational approaches that led us to seek for more grounded rational methods.
Despite their limitations, these semi-rational methods can still be of use. We noted that most semi-rational methods grade all sites with certain algorithm on certain property in a general way. Therefore, we believe that semi-rational approaches are more suitable for broadly screening a mass of mutatable sites, or to be applied to certain selected sites for further grading and evaluation. In our workflow, semi-rational methods were eventually applied in the latter way. We generated mutagenesis candidates to MCP, and assessed each of them with several semi-rational methods. gradings with semi-rational methods provided a guidance for final round selection, as shown in our Result part.
In conclusion, Semi-Rational Approaches have limitations, but when combined with Rational Approaches that are more specific, they can serve as supplementary analyses that can satisfy certain demand for studying certain properties. Such integrated application can maximum the potential of Semi-Rational Engineering Approaches.
III. Rational Protein Engineering
Overview: Playing with Residues
The concept of Rational Protein Engineering Approaches refers to methods that modify specific residues directly based on detailed knowledge of the protein's regional structures and its detailed, amino-acid-level mechanism of catalyzing reaction, in order to rationally locate which site to be engineered and to rationally decide which amino acid to change to.
With strict accordance to catalytic mechanism, our rational engineering rigorously plotted mutagenesis. We first identified literature that characterized detailed features of LACS, including its stepwise reaction process, and key residues contributing to each step of the reaction process in structural or chemical ways. Base on these knowledge, our rational engineering aims to modify these key residues. For each key residue that impacts a certain step of the reaction with the detailed mechanism of residue impacting the reaction having been characterized, we analyzed what kind of amino acids will better facilitate the mechanism of the corresponding step of reaction, and propose to mutate the current amino acid to another that is more conductive to the reaction. The proposals of mutation are then added to our MCP to be later evaluated with other analytic methods and be further expressed in downstream experiment to perform enzyme assay and verify improvement in enzyme catalytic activity.
Rational approaches provide more promising design. Comparing to semi-rational approaches, results of rational approaches have a higher chance of hitting the jackpot. While semi-rational approaches apply algorithms based on general enzymatic reactions property, rational provide more specific insights based on reaction mechanisms, usually detailed to certain residues that directly play a part in the reaction.
Our integration of multiple methods provided a good example of rational engineering. We integrated analytic approaches on properties of enzyme, each analysis detailed to specific residues and each suggested rational mutagenesis. In this part we also combined analyses of previously introduced semi-rational methods, integrating all these approaches and performed coherent, concerted protein engineering.
Modeling Enzyme Structure
Downstream analyses in rational engineering require a protein structure model of our enzyme. Study on enzymes usually use electron microscopy to scan the structure of protein, and perform a series of process to obtain the protein's structure. The structure model is then published onto protein databases and can be used to perform later analyses.
However, LACS1, the subtype we focused on, has not been studied on before. It doesn't have a published structure model. As a solution, we obtained it through homologous modeling. Our first attempt was using alphafold, which applied homologous modeling and calculated the structure model of LACS1 using protein model of LACS alternative subtype published by previous researches and the known sequence of LACS1. The result of alphafold is shown in figure17 with confidence illustration in figure18.
![]() |
![]() |
Figure17 and figure18, Protein Structure Modeled by Alphafold and a per-residue confidence score (pLDDT) between 0 and 100 graded by Alphafold's algorithm.
Alphafold's model provide an interesting insight. The residue error value is presented in figure19 and figure20.Residue error value appeared to have an obvious jump at the 510th amino acid in the sequence. Among region 1-510 and 510-660 separately, two arbitrary residues present considerably low residue error. However, when choosing one residue from region 1-510 and another from region 510-660, residue error is obviously higher. Meanwhile, we have learned from literature and alignment result that the linker period (L-motif, see for detail below) in our enzyme LACS1 resides at the 513-518 amino acid. The linker is to connect between the two separated domains of C-terminal (1-512) and the N-terminal (519-660), and the distance between two arbitrary residue from the two domains respectively is supposed to be obviously longer. Therefore, the conclusion from reading literature and our own analysis and the result of Alphafold calculation stayed consistent.

Researching on Domains
Enzyme domains refer to distinct parts of an enzyme that have specific functions or can perform specific roles in the universal function of the enzyme. These domains can be considered subunits or modules within the enzyme that contribute to its activity and stability. Enzymes can be composed of one or more domains, and these domains can act independently or in concert to compose the enzyme's catalytic function.
Domains are usually located by matching unstudied enzymes to previously studied ones. Researches that use scanning electron microscope on the 3D structure of enzymes can also identify the stepwise reaction mechanisms of enzymes at the microscopic level, and structures that carry out specific functions during the reaction process will be identified as important domains. These domains will be later labeled in the enzyme's amino acid sequence. Other unstudied subtypes of the same enzyme family will have similar sequence, and can be studied by matching domains to pre-characterized ones with preforming multiple sequence alignment (MSA). From several papers that performed MSA on LACSs from multiple species, as example shown in figure21, we identified domains including catalytic domains, substrate-binding domains and possible membrane-binding domains. However, these knowledge lack precision of specific residue, leading us to searching for more detailed characterization of enzyme motifs.

Characterizing Critical Motifs
Enzyme motifs refer to short, conserved sequences within a protein that are involved in its function, often directly interacting with the substrate or other molecules. While a domain is a larger region of a protein that can possibly include multiple motifs, a motif is usually a short sequence that directly interact with substrates or show other fundamental functions. In motifs, characterized critical residues are considered as promising site for single-site mutagenesis, for changing these residues can possibly bring immense change of property for the entire enzyme.
In literature that studied stepwise reaction details of LACS in Thermus thermophilus, critical motifs and residues within motifs were characterized. Each of these motifs and residues played important role in the reaction process. The four critical motifs are described in detail below and shown in figure22:
The A-motif refers to the short period of sequence that interacts with ATP. In ttLACS, it contains several adenine-binding residues that interact with ATP, including Tyr324.
The L-motif refers to the fragment that links the C-terminal and N-terminal of LACS. It was identified as Asp432 to Leu437 in ttLACS, and was matched to Asp513 to Leu518 in LACS1 with highly similar sequence.
The G-motif refers to the motif containing the gating residue. The gating residue covers the entrance of fatty acid and removes when triggered by ATP's appearance. The G residue is identified as Trp234 in ttLACS. However, we aligned our LACS1 to other mammalian LACS and located the G residue in LACS1 to be Phe272, in order to remain consistent with the two common gates Tyr and Phe.
The P-motif refers to the period that forms a loop and binds the phosphate of ATP.

Outlining an Intuitive Pocket
The paper providing template for us had characterized ttLACS, another subtype of LACS from Thermus Thermophilus. The precise location of the pocket in ttLACS was denoted in two ways, including labeling the amino acids surrounding the pocket, and visually depicting the pocket in ttLACS's 3D structure model.
Accordingly, we took similar two parallel measures to locate the pocket. Firstly, as for labeling residues, we performed MSA and matched the sequence of our enzyme LACS1 to ttLACS, which is the subtype characterized in literature. From result of MSA, we matched each motif of ttLACS to LACS1, thereby locating and highlighting LACS1's four motifs, as shown in figure22.
Meanwhile, we also delineated an outline of the pocket in both 3D structure model of template ttLACS and target LACS1, by simultaneously highlighting the four motifs in the two model. The four motifs are marked in consistent colors throughout our project. Figure23 figure25 below are annotations of motifs on 3D protein model, and figure24 is our abstracted sketch illustration of the motifs' layout.
As supposed, the four motifs enclosed a chamber in both structure model, visually showing an open space that can possibly be a convincing pocket.


Verifying Pocket Accessibility
A pocket refers to a tunnel-shaped chamber for holding substrate. In our case, the pocket of LACS take in fatty acid DHA. A valid tunnel shall have a length deep enough and a diameter wide enough to allow carry fatty acid. Contrarily, if certain amino acids are too close to each other after the peptide chain folds into the protein's spatial structure, then they cannot form a channel that is large enough to accommodate fatty acid. Therefore, to be a legal pocket to intake and carry substrate, a space must have everywhere wider than a threshold. Luckily, we have computational methods to verify that for a certain chosen space.
We applied CAVER's algorithm,set the A-motif as the supposed binding site in tunnel, defined parameters including the minimum probe radius, and ran computation to calculate all possible tunnels. According to literature,A-motif binds with ATP,and the binding site will simultaneously form bond with fatty acid, making A-motif the center of catalyzing. Hence, the A-motif was chosen as the starting location. The result is displayed below in figure26, showing all the possible tunnel for substrate entrance.
Among the result set, we located a tunnel that overlaps with the previously identified intuitive pocket area, as shown in figure27. It was located for accordance with literature characterization, including passing the G-motif and ending at the open area between N-terminal and C-terminal.
We verified the chosen tunnel's accessibility by observing its radius graph in figure28, which indicated the tunnel's radius along its length. By observing its overall structure, we believe that this pocket has sufficient length and width to be a reasonable pocket for accepting and accommodating fatty acids. We further ran computation and obtained its radius graph in figure28, and therefore validated its accessibility for substrates.



Integrated Pocket Engineering
Modifying the pocket is the most sophisticated work in rational engineering. The pocket is a chamber enclosed by a mass of surrounding residues. Many of them are proven to interact with the substrates in different steps of the reaction process, each interacting in different mechanism. Rigorously based on specific mechanisms of each residue-substrate interaction, we designed single-site mutagenesis with multifaceted strategies, and added them to our MCP to later assess them with further evaluation methods.
Our analyses established a comprehensive view of the enzyme's pocket. In previous process of locating the pocket, we first determined the intuitive position of the pocket by enclosing the space with motif, and then calculated the structural features such as the length and width of the corresponding chamber. Based on the results of previous analysss, we identified the position of the pocket in the structural model of the protein and wanted to further optimize its various other properties.
Our multifaceted strategies aimed in multiple properties of the pocket. Our optimization perspective included the following:
Firstly, the Physical and chemical property of the pocket. Our attempt focused on tuning eletric charge and hydrophobicity to increase the affinity during reaction.
Secondly, the structural features. Our attempt to expand the pocket's width by modifying residues with large volume, in order to optimize the shape of pocket to facilitate substrate's entrance.
Thirdly, the biological interaction process and properties of specific residues during characterized reaction.These are our rational approaches of optimizing the pocket.
Therefore we titled this part as integrated pocket engineering. The integrated strategy of physicochemical property, structural features and biological interactions will each be detailed illustrated below.
i. Switching G-motif
Firstly, we considered whether the four critical motifs can be mutated to improve enzyme activity. We first performed multiple sequence alignment (MSA) with sequence of LACS1 and other subtypes of LACS that had previously been characterized in literature, and observed the conservation of each motif. From result of MSA, we located the four motifs and part of each were shown below (figure29-figure32).
We discovered that the L-sequence, P loop and A motif all display relatively conserved among the superfamily of adenylate-forming enzymes, which indicates that maintaining the precise motif sequence might be important for maintaining catalytic activity or other necessary functions. Therefore, we gave up the attempt to modify these sequence with relatively high conservation.
However, the G-motif showed a rather promising prospect of being modified, for it shows moderate conservation. A paper focused on mammalian LACS pointed out that the after aligning multiple sequences of LACS subtypes, it appears that the gating motif remain basically the same, while mainly differ in the aromatic gating residue of either phenylalanine (Phe/F) or tyrosine (Tyr/Y). Moreover, the paper characterized multiple isoforms including substituting the Gating motif and excluding it. Experimental data of significant change in activity suggests that the gating motif, especially the gating residue, is structurally important to the activity of the enzyme, with the possible mechanism of forbidding fatty acid's entrance as indicated in the paper on Thermus thermophilus's dimer LACS. Therefore, we also intend to carry out a switch of the gating residue of our target enzyme LACS1.
We observed the MSA result with G period as shown in figure30. MSA matched pre-characterized G-motif in other subtypes to a similar period of sequence in our enzyme LACS1, and showed G-residue alternating from Phe and Tyr. Therefore, the existence of G-motif and gating residue in our enzyme were consequently verified. The gating residue in LACS1 is shown to be Phe272, and F272Y was then added to our MCP waiting to be assessed.
![]() |
![]() |
![]() |
![]() |
ii. Tuning Charge
Tunnel charge will affect the activity of enzymes that use fatty acids as substrates mainly in two ways: exterior zone attracting substrate and internal zone interacting with substrates. Adding more positive charge to the tunnel can enhance attractions and facilitate interactions.
To tune the charge of LACS's tunnel, we plotto mutate the current residues to others that possess more positive electric charge. Different amino acids have varying side chain groups, and thus possess different electric charges. A pocket with a more positively charged periphery can better attract fatty acid to the pocket, and a more positively charged core area is more beneficial to fatty acid's reaction. Therefore, we attempt to introduce more positive charge to the pocket.
We analyzed charge property of the exterior and internal area of the pocket, and visualized them in figure33 and figure34. Figure33 showed exterior electric charge by marking negative residues red and marking positive residues blue. Figure34 showed internal electric charge specified to the surrounding residues of previously selected pocket. Base on visualization, we selected several candidates for MCP, by mutating a negative residue to a neutral one, or mutating a neutral residue to a positive one.


iii. Improving Hydrophobicity
The hydrophobicity of a pocket can significantly influence the rate and efficiency of internal reaction, especially when fatty acid participates. A hydrophobic environment, showing a lack of water molecules, will facilitate reactions involving fatty acids, as it reduces pocket's repulse interaction to fatty acid. To enhance the reaction, we aim to modulate the hydrophobicity of the channel, creating a more favorable environment for the reaction. By doing so, we can increase the likelihood of substrate interaction and reduce the energetic barrier for the reaction.
We managed to increase the hydrophobicity of pocket and surrounding residues. Amino acid side chains vary in their hydrophobicity, which is a crucial factor in determining the overall hydrophobicity of the chamber they enclose. Side chains that are more hydrophobic, such as those found in leucine, isoleucine, and valine, have a greater affinity for waterless environments and can effectively repel water molecules, thereby enhance a more hydrophobic environment. We first computed hydrophobicity along the chosen pocket, with result shown in figure35. Then by ranking different amino acid's hydrophobicity and strategically replacing amino acids with those that possess more hydrophobic side chains, we can enhance the pocket's hydrophobicity. This modification can create a more vacuum-like space to facilitate fatty acid's reaction.

iv. Expand pocket width
We plan to expand the pocket's width because the diameter of pocket is a main factor impacting its selectivity towards different fatty acid substrates. For DHA with a lengthy 22 carbon atoms and 6 degrees of unsaturation, its structure will be more twisted comparing to other saturated fatty acids. Therefore, it requires a particularly wider tunnel. Based on this property, we plan to expand pocket width to increase its selectivity and affinity towards DHA, and specifically increase catalytic activity.
We first studied the structure of the computed pocket, to identify the critical residues that restricted the diameter of pocket's thinnest period. The chamber of LACS is predicted to be the pocket. It's diameter is restricted by the surrounding residues that enclose the chamber. The pocket has a varying length along its depth, for residues are rather distant from each other at the deepest core of the tunnel, and are very close to each other at the middle of the tunnel. This thinnest period of tunnel is referred to as the bottleneck of the tunnel. It became our target spot to expand.
The three residues restricting the bottleneck is highlighted in figure36 and figure37. We made intricate plot for switching each. We plotted to modify the three restricting residues by switching each to a smaller residue, in order to enable a wider pocket and make room for the substrate. Meanwhile, we also kept control of proper electric charge and proper hydrophobicity of the residue to mutate to, to maintain the pocket's other property while expanding the tunnel's width.


Validation
Molecular Docking Study
Abstract
We employed molecular docking techniques to explore the effects of point mutations on protein-ligand interactions. The selection of point mutations was guided by stability analyses conducted with the FoldX software, ensuring that the mutated protein structures remained stable. We opted to utilize AutoDock, a widely recognized molecular docking software, for its frequent use in protein optimization and modeling validation. AutoDock offers a variety of scoring functions that predict the binding affinity between ligands and receptors, such as proteins, with high accuracy, aligning with our plan to simulate the binding modes of proteins with candidate ligands and predict their binding free energies.
Method
In our protein-ligand docking study, we implemented a series of targeted parameter optimizations and methodological customizations to ensure the accuracy and reliability of the simulation results.
- We incorporated parameters related to MG and, following Dr. Wu Sijin's advice, set the charge to 0.8 to more accurately simulate experimental conditions.
- Considering the large number of samples and limited time, while ensuring the quality of the docking results for each site, we set the number of docking runs for each site to 100.
- Homology modeling, given that parts of our enzyme's structure undergo rotation during the reaction, to ensure that the enzyme structure for DHA docking is as close to the actual situation as possible. We plan to use the rotated structure of a homologous LACS enzyme as a template for homology modeling of our enzyme.
Result
In our previous investigation into the intrinsic reaction mechanism of the LACS enzyme, we identified a crucial lysine residue that binds to the MG ion and DHA, making it one of the key reactive sites in the reaction.

In our study of LACS1, we identified the amino acid as LYS236, which we considered a mandatory docking condition due to its critical role in the reaction mechanism.
Docking Conformation Selection Criteria
When selecting the docking conformation, we paid close attention to the following criteria:
- Binding Affinity: We assessed whether the mutated protein exhibited improved binding affinity compared to the wild type.
- Change in Internal Energy: We compared the internal energy of the mutated protein to that of the wild type to ensure that the mutation did not introduce unfavorable energy changes.
- Conformational Stability: We analyzed whether the rotation energy of the mutated protein in complex with the ligand was similar to that of the wild type, ensuring its conformational stability.

In the end, we identified the mutant sites L633R, R305W, and H277M, which not only met the aforementioned criteria, but also were shown to be very common in clustering diagrams. This suggests that there is a high likelihood that the pathway proceeds in vivo in the same manner as predicted in the simulation, demonstrating the rationality of selecting this docking approach. Among them, H277M had the best binding energy, so we conducted further validation on it subsequently.
(Below is a detailed comparison table of the docking results for the wild type and the selected mutant type of LACS1.)

We have also optimized the proteins for LACS5 and LACS6 in parallel, following the same selection process as for LACS1. We will not provide a detailed introduction and analysis for them, but we will include the relevant selection data and mutation sequences in the table below for researchers interested in the optimization of LACS proteins.


Molecular Dynamics Simulation
Abstract:
Molecular docking typically predicts static binding modes and energies, while molecular dynamics (MD) simulations can assess the stability and dynamic behavior of protein-ligand complexes under simulated physiological conditions. Therefore, to further verify the accuracy of the selected mutations, it is essential to perform molecular dynamics simulations on the LACS1 mutant and wild-type forms, which can more detailedly display the docking results of proteins and ligands before and after mutation.
Method:
Due to the uniqueness of our pathway, to simulate the actual biological reaction process as closely as possible, we chose to perform molecular dynamics simulations on the LACS1, MG ion, ATP protein complex with the DHA ligand. Considering the time-consuming nature of learning this specialized simulation and to ensure the quality of each simulation, we opted for 100 ns MD simulations. Given the limited time later on, we selectively omitted the LACS1R305W mutation site, which showed a smaller degree of binding energy optimization compared to the other two in the docking data. Ultimately, we obtained results from MD simulations for RMSD, RMSF, and PCA analyses.


It can be seen that all RMSD values have converged, indicating that they are already in a more regular molecular motion state, which is suitable for analysis. The RMSD of the wild-type protein is higher than the normal value, which we believe is because the wild-type protein was predicted by AlphaFold2 without considering the rotation of the protein C-terminus during the reaction process, hence the slight increase in RMSD. The wild-type has been optimized by homology modeling, so the RMSD value is more reasonable.
It is worth mentioning that the RMSD values of both ATP and DHA are normal, which indicates that our docking results are reasonable, and the binding energy obtained from docking can be used as a basis for judging whether the mutation is effective.

Atom158-1580 is set as the range for PCA analysis, which is the enzymatic active site region we analyzed and obtained in Part 1. Additionally, we have set the time range for analysis from 70 ns to 100 ns. Within this time frame, through the RMSD plot, we can observe that the entire system has reached a regular motion state. In this PCA plot, it can be seen that the active site region has become stable in its movement, further detailing the validity of our docking criteria and the reliability of the results.
Results
We have identified potential mutation sites from mRNA, semi-rational, and rational perspectives. After triple validation with FoldX, docking, and molecular dynamics (MD) simulations, we successfully found the mutation site with the highest enzymatic activity after modeling, H277M. Subsequent experimental validation was conducted, and the results showed that the production of our target product, DHA-CoA, in the mutant type increased severalfold compared to the wild type. This indicates that the mutant LACS1 will be of great significance for the industrial production of large amounts of PC-DHA in the future.
PART 2: In Vivo-In Vitro Michaelis Constant Conversion
Overview
We have innovatively derived a formula that connects in vitro and in vivo Km values, facilitating our regression to the Schizochytrium chassis and more clearly demonstrating the advantage of the mutant sequences we identified earlier in producing the target product PC-DHA in Schizochytrium compared to the wild type.
2.1 Formula Derivation
In our project, we face a unique challenge: our goal is to engineer the production of DHA-PC in Schizochytrium, but laboratory conditions have limited our ability to culture it directly. To address this issue, we have devised an innovative approach by establishing a quantitative model to predict the yield of DHA-PC in Schizochytrium.
Our method begins with a derivation equation that transitions from in vitro to in vivo conditions, demonstrating the potential of protein optimization to enhance yields. We have constructed an equation to convert the in vitro measured Km values to in vivo values, based on the competitive inhibitor variant of the Michaelis-Menten equation. This equation requires key parameters: the Km value of the enzyme for the target substrate, the Km values for other significant competing substrates, and the in vivo concentrations of the substrates. We have experimentally determined the Km value for the enzyme with DHA, while other parameters were obtained through literature research.
We have collected data on different types of enzymes, their substrates, their respective Km values, and the in vivo substrate concentrations, which form the basis for constructing our equation and predicting yields. By substituting these parameters directly into our equation, we can predict the yield of DHA-PC in Schizochytrium.
This method is scientifically rigorous and highly practical, as it allows us to effectively predict and optimize the production process without direct experimental conditions.
Below is the specific derivation process of our equation:


2.2 Key Parameter Analysis
In section 2.1, it can be observed that α is quite significant. The Km measured in vitro, when multiplied by α, can be converted to the in vivo value. Below is our computational analysis of α for this reaction.
Substrate | Km Value | Concentration (Total Fatty Acid Concentration 6.5g/L) |
---|---|---|
C16:0 | 6.0μM | 14.9 ± 0.7% |
C18:0 | 3.0μM | 0.6 ± 0.0% |
C20:4(ARA) | 9.7μM | 1.1 ± 0.1% |
C22:5(DPAn6) | 10μM | 21.7 ± 0.2% |
C22:6(DHA) | NaN | 53.2 ± 0.3% |
\[ \alpha = 1 + \frac{\left[ I_1 \right]}{K_{mI_1}} + \frac{\left[ I_2 \right]}{K_{mI_2}} + \frac{\left[ I_3 \right]}{K_{mI_3}} + \cdots + \frac{\left[ I_n \right]}{K_{mI_n}} \]
\[ \alpha =1+\frac{\left [ 0.15*6.5g/L \right ] }{6.0\mu M} +\frac{\left [0.06*6.5g/L \right ] }{3.0\mu M} +\frac{\left [ 0.01*6.5g/L \right ] }{9.7\mu M}+\frac{\left [ 0.215*6.5g/L \right ] }{10\mu M} +...+\frac{\left [ I_{n} \right ] }{K_{mI_{n}}} \]
Given the limited article search, the Km values for potential competing substrates have not yet been fully identified. When there is sufficient experimental or literature data to support it, the accuracy of α will increase, and the estimated in vivo Km values will be more realistic.
Part 3: Genome-scale metabolic network modeling
Overview:
Predicting the metabolic processes and life activities of microorganisms is difficult in the absence of multi-omics data on microorganisms. Especially when there is a need to improve the production of target products, so we utilized genome-scale metabolic network modeling to assist us in solving this problem. We found a model describing Schizochytrium limacinum in the literature: iCY1170_DHA, which was used to perform optimization simulations and ultimately yielded a potential knockout target.
Method:
One of the methods in GEMM is Flux balance analysis (FBA), by adding the constraint of each reaction rate, FBA is solving the linear programming, and finds the answer in the solution space [1].

There forms an m*n stoichiometric matrix (S), and a reaction rate (v), where m represents the metabolites and n represents the reactions. Multiply S by v, we can know flow rates for individual reactions for each metabolite. After setting an objective function, we can get the optimal flux answer under the condition: S * v = 0.

In our project, for increasing the production of PC-DHA, we want to find the potential knockout target and increase the flux of PC-DHA. By using simulation in computer at the start, we can decrease the cost and provide a good direction to the experiment. We found a model of *Schizochytrium limacinum* called iCY1170_DHA and set the rate of substance uptake to a value consistent with the experiment [2].
Reaction Description | Equation | LBa1 | UBb2 |
---|---|---|---|
Exchange of D-glucose | D-glucose[e] < => | -1.4 | 1000 |
Exchange of Water | H2O[e] < => | -1000 | 1000 |
Exchange of Oxygen | O2[e] < => | -1000 | 1000 |
Exchange of Ammonia | NH3[e] < => | -10 | 1000 |
Exchange of Orthophosphate | Pi[e] < => | -1000 | 1000 |
Firstly, we're trying to find the best conditions for culturing Schizochytrium limacinum (SR21). We mainly focus on the carbon source and oxygen intake. From research, we know glucose is more fit for SR21 as a carbon source [2]. Then we performed an analysis based on these two variables:




From this analysis, we know that in an oxygen environment higher than 2 (mmol/gDW/h), the production of PC-DHA will decrease, and glucose should just be kept higher than the limit intake of glucose at 1.4 (mmol/gDW/h).
Optforce:
Secondly, we used Optforce to find the up-regulated and down-regulated targets. Optforce is an algorithm that identifies all possible metabolic interventions that lead to the overproduction of a biochemical of interest [3].

By using Optforce to predict the ability to produce PC-DHA, we found 11 MustLL and 4 MustL. Here are the 2 most important pathways:
Pathway | Reaction Formula | Metabolic Pathway |
---|---|---|
GDP-L-galactose phosphorylase | M_pi_c[c] + M_gdpgal_c[c] -> M_gdp_c[c] + M_gal1p_c[c] | Galactose metabolism |
S-malonyltransferase | M_h_c[c] + M_malcoa_c[c] + M_acp_c[c] -> M_coa_c[c] + M_malACP_c[c] | Lipid metabolism |
GDP-L-galactose phosphorylase is an important enzyme in the galactose metabolic pathway, which is critical for carbon source utilization, energy acquisition, and the synthesis of certain important metabolites. By regulating galactose metabolism, GDP-L-galactose phosphorylase can influence cell growth and adaptation, especially in the absence of major carbon sources such as glucose [4]. However, under the culture conditions of Schizochytrium limacinum—with no environmental stress and in glucose conditions—Schizochytrium has the highest growth rate [2], so it might be a potential knockout target.
The second most possible knockout target is the malonyl transfer pathway. In the fatty acid synthesis pathway, MAT is responsible for transferring malonyl coenzyme A to ACP to form malonyl-ACP. Malonyl coenzyme A plays an inhibitory role in fatty acid oxidation by binding to carnitine palmitoyl transferase 1 (CPT1) [5]. Knockdown of MAT may reduce the production of malonyl-coenzyme A, thereby reducing the inhibition of the fatty acid oxidation pathway and making more fatty acid precursors available for DHA-PC synthesis. By using the COBRA (Constraint-Based Reconstruction and Analysis) model, we found an interesting phenomenon: when the flux of the S-malonyl transferase reaction was limited to 0, although the cell growth rate slowed down to 60% of the original rate, the PC-DHA production increased to 250% of the original.

Results:
This finding suggests that S-malonyl transferase might be a potential knockout target. By knocking out this pathway, we can direct the metabolic flow more towards the synthesis of PC-DHA rather than its depletion through the metabolic pathways in which the enzyme is involved. This strategy is expected to increase the yield of PC-DHA, which is important for industrial production.

To achieve this metabolic engineering goal, we considered knocking out pathways involved in malonyl-coenzyme A synthesis, such as malonyl transferase (MAT1) or acetyl-coenzyme A carboxylase (ACC).
Prospect:
In fact, the construction of the GEMM model is not complete. To further validate the effectiveness of this strategy, we plan to perform knockdown experiments in the laboratory to see whether knockdown of S-malonyl transferase-related genes can indeed enhance PC-DHA production. Meanwhile, we will also explore other possible metabolic engineering strategies to further enhance the PC-DHA production capacity of Schizochytrium limacinum.
References:
- J. D. Orth, I. Thiele, and B. Ø. Palsson, Nat Biotechnol, 2010, 28, 245–248.
- C. Ye, W. Qiao, X. Yu, X. Ji, H. Huang, J. L. Collier, and L. Liu, BMC Genomics, 2015, 16, 799.
- S. Ranganathan, P. F. Suthers, and C. D. Maranas, PLOS Computational Biology, 2010, 6, e1000744.
- J. Tao, Z. Hao, and C. Huang, AoB PLANTS, 2020, 12, plaa055.
- G. Jogl, Y. Hsiao, and L. Tong, Annals of the New York Academy of Sciences, 2004, 1033, 17–29.
- Activity of the acyl-CoA synthetase ACSL6 isoforms: role of the fatty acid Gate-domains BMC Biochemistry Full Text, https://bmcbiochem.biomedcentral.com/articles/10.1186/1471-2091-11-18, (accessed October 2, 2024).
- Multiple erythroid isoforms of human long-chain acyl-CoA synthetases are produced by switch of the fatty acid gate domains BMC Molecular Biology Full Text, https://bmcmolbiol.biomedcentral.com/articles/10.1186/1471-2199-7-21, (accessed October 2, 2024).
- Drug screening with the Autodock Vina on a set of kinases without experimentally established structures IEEE Conference Publication IEEE Xplore, https://ieeexplore.ieee.org/document/9245440, (accessed October 2, 2024).