- Overview -
In today's digital age, secure transmission and storage of information are crucial. Our project
aims to achieve information encoding, transmission, and secure storage using genetically
engineered Escherichia coli through synthetic biology. By developing and refining mathematical
models, we simulated the entire process from information encoding to decoding and assessed the
stability and concealment of the information in various environments. These models not only help
optimize the transmission methods but also ensure the safety and reliability of information
during transmission, providing new insights and methods for the development of genetic
information storage technology.
- Information Encoding and Transmission Model -
Methods and Principles:
In our project, the information encoding and transmission model is based on the allocation of
codons corresponding to amino acids, utilizing the principles of the Wubi input method to encode
Chinese characters into DNA sequences. Here’s a detailed explanation of our encoding methods and
principles:
· Amino Acid Classification and Codon Allocation:
- Amino acids are categorized into two groups: those that easily form α-helices and those that easily form β-sheets. Codons from these groups are assigned to different letters according to the Wubi input method. This classification ensures effective information concealment and stability during the encoding process.
- Amino acids that easily form α-helices (e.g., Ala, Glu, Leu) and those that easily form β-sheets (e.g., Val, Ile, Thr) are associated with specific letters. For example, the letter "B" is represented by GCC (Ala, α-helix) and GTC (Val, β-sheet).
· Encoding Letters into DNA Sequences:
- Each Chinese character is broken down into radicals according to the Wubi input method and mapped to four letter keys. Each letter is assigned two alternative codons, one from the α-helix group and one from the β-sheet group; for example, the letter "L" can be written as CTG (α-helix) or ACT (β-sheet). One of the two codons is used for each letter, so a letter occupies a single codon (3 base pairs).
- With four letter keys per character, each Chinese character is ultimately encoded as a 12 base pair DNA sequence (4 codons), and the choice between the two codon groups maintains information concealment and diversity.
· Use of Placeholder Symbols:
If a Chinese character has fewer than four radicals, placeholder symbols, represented by the
character "$", are used to fill in the gaps. Each "$" is encoded by randomly selecting one of
several codons (such as CAA, CAG, GAT, and GAC) to prevent repetitive sequences and enhance
information diversity. These codons are chosen for their minimal impact on protein structure,
so that they do not interfere with the main encoded information.
· Optimization of Encoding and Mutation Resistance:
- Our encoding strategy is optimized to minimize repeated base sequences, which improves the stability of the encoded DNA. Mixing α-helix and β-sheet codons both hides the information and improves mutation resistance, so that the information can still be decoded and recovered accurately if minor mutations occur in the DNA sequence.
- Additionally, the encoding model considers the impact of different amino acids on protein structures to optimize the information concealment effect. By introducing redundancy and mismatch mechanisms in the encoding process, we enhance the reliability of the information under various environmental conditions.
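The encoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes one codon per letter, chosen from that letter's α-helix or β-sheet alternative, which matches the 4-codon (12 base pair) figure per character. Only the codons for "B", "L", and "$" come from the text; the function names, the random choice between alternatives, and the example key code "LB" are hypothetical.

```python
import random

# Codon alternatives per Wubi letter key: (alpha-helix codon, beta-sheet codon).
# Only "B" and "L" are taken from the text; a full table would cover every key.
LETTER_CODONS = {
    "B": ("GCC", "GTC"),
    "L": ("CTG", "ACT"),
}
# Placeholder codons named in the text for the "$" symbol.
PLACEHOLDER_CODONS = ("CAA", "CAG", "GAT", "GAC")

# Reverse lookup used for decoding.
CODON_TO_SYMBOL = {codon: "$" for codon in PLACEHOLDER_CODONS}
for letter, pair in LETTER_CODONS.items():
    for codon in pair:
        CODON_TO_SYMBOL[codon] = letter


def encode_character(wubi_code, rng=random):
    """Encode one character's Wubi key code (1 to 4 letters) as 4 codons."""
    if not 1 <= len(wubi_code) <= 4:
        raise ValueError("a Wubi code has between 1 and 4 letters")
    padded = wubi_code.upper().ljust(4, "$")  # pad short codes with "$"
    codons = []
    for symbol in padded:
        if symbol == "$":
            codons.append(rng.choice(PLACEHOLDER_CODONS))
        else:
            # Assumption: pick the helix or sheet alternative at random;
            # a real design might choose it to shape the protein structure.
            codons.append(rng.choice(LETTER_CODONS[symbol]))
    return "".join(codons)


def decode_character(dna):
    """Decode a 12-base block back to its Wubi letters, dropping padding."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    symbols = "".join(CODON_TO_SYMBOL[codon] for codon in codons)
    return symbols.rstrip("$")


dna = encode_character("LB", random.Random(0))
print(dna, decode_character(dna))
```

Because every codon maps back to exactly one symbol, decoding does not need to know which alternative was chosen during encoding.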
- Information Hiding and Security Model -
Model Objective: To hide information by manipulating protein structures,
ensuring that the information is difficult to detect and recognize within biological systems.
Simulation and Results:
We use protein structure prediction tools (like AlphaFold) to verify the effectiveness of our encoding strategy, ensuring the encoded sequences resemble naturally occurring proteins.
· Methods and Principles:
- Protein Structure Manipulation:
- We select codons for amino acids that tend to form α-helices and β-sheets to conceal information. These sequences are designed to resemble naturally occurring protein structures, increasing the hidden nature of the information.
- Relationship Between Encoding and Protein Folding:
- By utilizing the structural properties of different amino acids, information is concealed within what appears to be normal protein structures. We use tools like AlphaFold to predict the structures of the encoded sequences and check that they resemble genuine proteins, which strengthens the effect of information hiding.
- Optimization Strategies for Information Hiding:
- A strategy combining α-helices and β-sheets is employed to increase the diversity of the information, making the encoded sequences harder to detect. We also consider environmental factors that might affect protein structure to optimize information hiding, ensuring stability across various biological conditions.
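Before running a full structure prediction, a quick composition check can show how an encoded sequence distributes helix-forming and sheet-forming residues. The sketch below translates DNA with the standard genetic code and labels each residue using only the example amino acids named in this document (Ala, Glu, Leu for α-helices; Val, Ile, Thr for β-sheets). It is not a structure predictor and does not replace AlphaFold; the function names and the `H`/`E`/`-` labels are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

HELIX_FORMERS = set("AEL")  # Ala, Glu, Leu (examples given in the text)
SHEET_FORMERS = set("VIT")  # Val, Ile, Thr (examples given in the text)


def translate(dna):
    """Translate DNA to one-letter amino acids, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def structure_profile(dna):
    """Label each residue: H = helix former, E = sheet former, - = neither."""
    labels = []
    for aa in translate(dna):
        if aa in HELIX_FORMERS:
            labels.append("H")
        elif aa in SHEET_FORMERS:
            labels.append("E")
        else:
            labels.append("-")
    return "".join(labels)


dna = "CTGGTCCAAGAT"  # Leu codon, Val codon, then two placeholder codons
print(translate(dna))          # LVQD
print(structure_profile(dna))  # HE--
```

A profile like this only summarizes residue-level tendencies; whether the full sequence resembles a natural protein still has to be judged from a predicted structure.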
- Model of Mutation Impact on Information Readability -
Model Objective: To analyze how mutations in DNA sequences affect the
readability and integrity of encoded information.
Simulation and Results:
We simulated various mutation scenarios and analyzed their impact on decoding success rates. The results indicate that appropriate redundancy strategies significantly improve the mutation resistance of the encoded information, keeping it readable over simulated storage periods of decades.
· Methods and Principles:
1. Mutation Simulation and Information Decoding:
- We simulated mutations in DNA sequences to study their impact on the decoding process. Mutation types included point mutations, insertions, and deletions.
- Based on literature, the natural mutation rate in Escherichia coli is approximately 10^-9 to 10^-10 mutations per base pair per generation. Using this mutation rate, we simulated potential mutations over several decades.
2. Readability Analysis:
- Our encoding strategy incorporates redundancy mechanisms to maintain information integrity even when mutations occur. Our model shows that even with up to 10 mutations in a DNA sequence, the information remains highly readable.
- This mutation resistance suggests that information can be stably stored in E. coli for many years without losing readability. We used statistical models to analyze the impact of different types of mutations on information integrity and readability.
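To give a sense of scale for the mutation rates quoted above, the expected number of point mutations in a stored sequence is the per-base rate multiplied by the sequence length and the number of generations. The sketch below applies that arithmetic; it is an illustration, not the project's statistical model. The sequence length (100 characters at 12 base pairs each) and the generation count (continuous growth with a 20-minute doubling time for 30 years) are hypothetical worst-case assumptions, and the calculation follows a single lineage while ignoring selection, insertions, and deletions.

```python
import math


def expected_mutations(rate_per_bp_per_gen, length_bp, generations):
    """Expected number of point mutations accumulated along one lineage."""
    return rate_per_bp_per_gen * length_bp * generations


def prob_no_mutation(rate_per_bp_per_gen, length_bp, generations):
    """Probability that no base mutates: (1 - rate) ** (length * generations)."""
    return math.exp(length_bp * generations * math.log1p(-rate_per_bp_per_gen))


# Hypothetical scenario, chosen only to illustrate the calculation.
LENGTH_BP = 100 * 12                    # 100 characters, 12 bp each
GENERATIONS = 3 * 24 * 365 * 30         # 3 doublings per hour for 30 years

for rate in (1e-9, 1e-10):              # range quoted in the text
    mean = expected_mutations(rate, LENGTH_BP, GENERATIONS)
    p_clean = prob_no_mutation(rate, LENGTH_BP, GENERATIONS)
    print(f"rate={rate:.0e}  expected mutations={mean:.3f}  P(no mutation)={p_clean:.3f}")
```

Under these assumptions the expected count stays at or below about one mutation for the whole sequence, which is well within the tolerance of up to 10 mutations reported by the model.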
- Temperature-Sensitive Self-Destruction Mechanism Model -
Model Objective: To simulate a temperature-sensitive self-destruction mechanism
that ensures the degradation of bacterial DNA under specific conditions, thereby preventing
information leakage.
· Methods and Principles:
- Temperature-Sensitive Promoter Design:
- We utilized an existing temperature-sensitive promoter for the self-destruction mechanism. This promoter initiates gene expression at a normal incubation temperature of 37°C, while suppressing gene expression at 30°C or lower.
- This mechanism uses temperature as the trigger: the bacteria remain viable at the permissive temperature (30°C or lower) and respond quickly once the self-destruction condition (a shift up to 37°C) is met.
- Selection and Expression of the Self-Destruction Gene:
- To achieve the self-destruction mechanism, we used the DpnI gene, which encodes a methylation-dependent restriction endonuclease: DpnI cleaves GATC sites only when the adenine is methylated. Under the control of the temperature-sensitive promoter, the DpnI gene is activated at 37°C, leading to enzyme production that starts degrading bacterial DNA.
- This self-destruction design means that an unauthorized attempt to access or transmit the information by culturing the bacteria at the standard incubation temperature of 37°C results in the degradation of bacterial DNA, thereby protecting the security of the stored information.
- Experimental Validation:
- We conducted a series of temperature-controlled experiments to validate the effectiveness of the self-destruction mechanism. In these experiments, the engineered bacteria demonstrated high self-destruction efficiency at 37°C, with DNA rapidly degraded, while at 30°C or lower, the bacteria were able to grow and survive normally.
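The switching behaviour described above can be sketched as a smooth temperature-response curve. This is a toy model, not a fit to the experimental data: the two temperatures (30°C and 37°C) come from the text, while the logistic form, the midpoint halfway between them, the steepness value, and the 0.5 trigger threshold are all hypothetical assumptions.

```python
import math

PERMISSIVE_MAX_C = 30.0  # from the text: expression suppressed at or below 30 °C
INDUCING_C = 37.0        # from the text: expression initiated at 37 °C

# Hypothetical curve parameters, chosen only so the curve is near 0 at 30 °C
# and near 1 at 37 °C.
MIDPOINT_C = (PERMISSIVE_MAX_C + INDUCING_C) / 2
STEEPNESS = 1.5


def promoter_activity(temp_c, midpoint_c=MIDPOINT_C, steepness=STEEPNESS):
    """Relative promoter activity in (0, 1) as a logistic function of temperature."""
    return 1.0 / (1.0 + math.exp(-steepness * (temp_c - midpoint_c)))


def self_destruct_triggered(temp_c, threshold=0.5):
    """True when modelled DpnI expression exceeds the (hypothetical) threshold."""
    return promoter_activity(temp_c) >= threshold


for temp in (25, 30, 37):
    print(temp, round(promoter_activity(temp), 3), self_destruct_triggered(temp))
```

A measured induction curve would replace the assumed midpoint and steepness; the sketch only shows how the two reported temperatures map onto an on/off decision.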
- Conclusion and Outlook -
In this project, we developed mathematical models to simulate information encoding and
transmission, concealment, mutation impact on readability, and conditional growth with
self-destruction, supporting secure genetic information storage in Escherichia coli. The
Information Encoding and Transmission Model effectively encoded Chinese characters into DNA
sequences using Wubi input principles, ensuring efficient storage and transmission. The
Information Hiding and Security Model demonstrated successful information concealment through
codon allocation and protein folding strategies, enhancing stealth. The Mutation Impact on
Readability Model revealed how natural mutations affect information readability, offering
insights to improve encoding strategies for greater mutation resistance. Additionally, the
Temperature-Sensitive Self-Destruction Mechanism Model secured information through a
temperature-sensitive mechanism, preventing leakage.
Moving forward, we aim to enhance encoding methods to improve redundancy and mutation resistance, refine information hiding techniques by exploring advanced protein folding strategies, expand self-destruction mechanisms by introducing additional environmental triggers like chemicals and light, and apply these models to other biological systems such as yeast or mammalian cells to verify their effectiveness. These efforts will contribute to advancing genetic information storage, offering new tools and methods for synthetic biology and information technology.