- Overview -
In today’s highly digitalized world, keeping transmitted information secure and confidential
is a significant challenge. To address it, we propose encoding information into DNA sequences
using the Wubi input method and inserting these sequences into Escherichia coli for secure and
covert information transfer. Our design has three key components. First, we encode Chinese
characters into DNA sequences using the Wubi input method and introduce them into bacteria for
storage. Second, we design a conditional growth mechanism in which caffeine or xanthine is an
essential growth factor, so that the bacteria grow, and the information can be retrieved, only
in specific environments. Finally, we incorporate a self-destruct mechanism: at elevated
temperature the bacteria express a DNase that degrades their own DNA, destroying the encoded
information. Together, these components are intended to protect the confidentiality and
security of the transmitted information.
Module 1:
- Information Encoding and Transmission -
The central dogma of molecular biology holds that the base sequence of DNA determines the
amino acid sequence of the proteins it encodes. In our project, we leverage this principle by
encoding information into DNA sequences using the traditional Wubi input method, with the aim
of making these synthetic base sequences resemble the protein-coding regions of bacteria,
thereby disguising the encoded information.
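As a minimal sketch of the first step, the snippet below turns a Wubi key sequence into codons and back. The mapping is purely illustrative and is not the project's actual table: it assumes one codon per Wubi key (Wubi codes use the 25 keys a to y), obtained by writing the key's index as three base-4 digits over A, C, G, T. The function names are hypothetical, and the lookup from a Chinese character to its Wubi code is left out.

```python
import string

BASES = "ACGT"
# Wubi codes use the 25 keys a-y; z is not part of a standard code.
WUBI_KEYS = string.ascii_lowercase[:25]


def key_to_codon(key: str) -> str:
    """Map one Wubi key to a codon (hypothetical base-4 mapping)."""
    index = WUBI_KEYS.index(key)  # raises ValueError for anything outside a-y
    return BASES[index // 16] + BASES[(index // 4) % 4] + BASES[index % 4]


def codon_to_key(codon: str) -> str:
    """Invert key_to_codon for one codon."""
    index = (
        BASES.index(codon[0]) * 16
        + BASES.index(codon[1]) * 4
        + BASES.index(codon[2])
    )
    return WUBI_KEYS[index]  # raises IndexError for codons with no assigned key


def encode_wubi(code: str) -> str:
    """Encode a Wubi key sequence (e.g. the keys typed for one character) as DNA."""
    return "".join(key_to_codon(k) for k in code)


def decode_dna(dna: str) -> str:
    """Decode a DNA string produced by encode_wubi back into Wubi keys."""
    return "".join(codon_to_key(dna[i:i + 3]) for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = encode_wubi("abcd")  # "abcd" is an arbitrary key sequence
    print(dna, decode_dna(dna))
```

Under this particular mapping the 25 assigned codons all begin with A or C, so none of them is a stop codon (TAA, TAG, TGA). A real table would also have to take the structural categories described in this module into account.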
First, we classify codons (base triplets, each specifying one amino acid) into three categories: the first category represents amino acids that readily form alpha helices, the second represents amino acids that readily form beta sheets, and the third includes codons that can represent either structural element. For example, GCT (alanine) and GTT (valine) each correspond to a particular structural element.
For information encoded by the third category of codons, we use the Lorenz equations to decide between the two: when the computed value exceeds a chosen threshold, we use a codon representing an alpha helix; when it falls below, we use a codon representing a beta sheet. In this way, codons carrying the message are disguised as a sequence that appears to encode a protein, hiding the data within a biologically plausible context.
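The threshold rule can be sketched as follows. Everything concrete here is an assumption made for illustration: the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), forward-Euler integration, thresholding on the x variable at 0, and the use of GCT and GTT as stand-ins for the helix and sheet codons. The text does not specify which variable, threshold, or initial state the project uses.

```python
HELIX_CODON = "GCT"  # alanine, used here as the "alpha helix" stand-in
SHEET_CODON = "GTT"  # valine, used here as the "beta sheet" stand-in


def lorenz_x_values(n, x=1.0, y=1.0, z=1.0,
                    sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                    dt=0.01, steps_per_sample=50):
    """Yield n samples of x from a forward-Euler integration of the Lorenz system."""
    for _ in range(n):
        for _ in range(steps_per_sample):
            dx = sigma * (y - x)
            dy = x * (rho - z) - y
            dz = x * y - beta * z
            x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        yield x


def choose_codons(n, threshold=0.0, **lorenz_kwargs):
    """Pick a helix or sheet codon for each of n ambiguous positions."""
    return [
        HELIX_CODON if x > threshold else SHEET_CODON
        for x in lorenz_x_values(n, **lorenz_kwargs)
    ]


if __name__ == "__main__":
    print(choose_codons(10))
```

Because the integration is deterministic, anyone who knows the initial state and parameters can regenerate the same helix/sheet choices, which is presumably what would let a receiver undo the disguise.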
Although the probability is low, mutations can still occur and distort the stored information. To detect this, we append a hash value (in effect a checksum) to the end of the codon sequence, recording the number of A, T, C, and G bases within the sequence. If the actual base counts do not match the hash value, a mutation may have occurred, and the information should be re-extracted from another bacterium. This check helps preserve the integrity and reliability of the information encoded in the bacterial DNA.
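A minimal sketch of the base-count check described above. The function names are illustrative, and how the four counts are themselves written into the DNA is not covered here.

```python
from collections import Counter


def base_counts(dna: str) -> tuple:
    """Return the number of A, T, C and G bases, in that order."""
    counts = Counter(dna)
    return tuple(counts.get(base, 0) for base in "ATCG")


def verify(dna: str, expected_counts) -> bool:
    """True if the sequence still matches the counts recorded at encoding time."""
    return base_counts(dna) == tuple(expected_counts)


if __name__ == "__main__":
    original = "GCTGTT"
    checksum = base_counts(original)
    print(checksum)                    # counts for A, T, C, G
    print(verify("GCTGTA", checksum))  # a single substitution changes the counts
```

Any single-base substitution, insertion, or deletion changes at least one count and is caught. Changes that preserve the overall composition, such as two bases swapping positions, pass this check, so it detects most but not all corruption.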
Module 2:
- Conditional Growth and Caffeine Dependency -
· Knockout of guaB
In wild-type Escherichia coli, the guaB gene encodes IMP dehydrogenase, a key enzyme in the
guanine nucleotide synthesis pathway that converts IMP into XMP (xanthosine monophosphate).
XMP is the precursor of the guanine nucleotides needed to synthesize RNA and DNA, making guaB
crucial for bacterial survival. To make the bacteria dependent on a specific substance, we used
gene editing to knock out the guaB gene in E. coli. After the knockout, the bacteria can no
longer synthesize XMP themselves, and therefore cannot synthesize RNA or DNA, so they are
unable to grow unless xanthine is supplied externally (or caffeine, once the DeCaf pathway
described below is introduced). This design makes bacterial growth strictly dependent on
exogenous substances in specific environments, thereby enhancing the security and
controllability of information transmission.
· Introduction of the DeCaf Pathway
To restore the growth of the guaB-knockout bacteria, and to replace the uncommon xanthine
with widely available caffeine for greater stealth, we introduced the caffeine degradation
pathway (DeCaf Pathway) from Pseudomonas putida CBB5. We cloned the key genes of this pathway
into an expression vector and introduced it into the guaB-knockout E. coli strain (BW-ΔguaB),
resulting in the BW-ΔguaB-DeCaf strain. The pathway enables the bacteria to demethylate
caffeine into xanthine, which supplies the nucleotides they need. With the DeCaf pathway, the
bacteria can therefore use caffeine in place of xanthine to grow.
We also plan to screen common caffeine-containing beverages to determine whether they can support normal growth of the guaB-knockout bacteria carrying the DeCaf pathway. To do so, we will prepare solid media from these beverages and test the growth of BW-ΔguaB-DeCaf on them. This experiment is intended to confirm the caffeine dependency of the BW-ΔguaB-DeCaf strain and to characterize its growth at different caffeine levels. More importantly, it would show that BW-ΔguaB-DeCaf can grow on everyday beverages without relying on the uncommon xanthine, making the system more discreet for agents to use and the information easier to retrieve.
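The growth logic of the strains described in this module can be summarized in a small truth-table sketch. It is a deliberate simplification under stated assumptions: only xanthine and caffeine are modelled, antibiotics and all other nutrients are ignored, and the function and argument names are hypothetical.

```python
def can_grow(has_guaB: bool, has_decaf: bool, medium: set) -> bool:
    """Simplified growth rule for the strains in this module.

    has_guaB  -- True for the unmodified strain, False for BW-ΔguaB
    has_decaf -- True if the DeCaf pathway has been introduced
    medium    -- supplements present, e.g. {"caffeine"} or {"xanthine"}
    """
    if has_guaB:
        return True  # can make XMP itself
    if "xanthine" in medium:
        return True  # xanthine supplied directly
    # Without guaB, caffeine only helps if DeCaf can demethylate it to xanthine.
    return has_decaf and "caffeine" in medium


if __name__ == "__main__":
    for name, guab, decaf in [
        ("wild type", True, False),
        ("BW-ΔguaB", False, False),
        ("BW-ΔguaB-DeCaf", False, True),
    ]:
        row = {s: can_grow(guab, decaf, {s} if s else set())
               for s in ("", "xanthine", "caffeine")}
        print(name, row)
```

Only the BW-ΔguaB-DeCaf row shows growth on caffeine alone, which is the property the beverage experiment is meant to demonstrate.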
Module 3:
- Information Protection -
· Resistance Mechanisms
To ensure selective growth of the bacteria and further protect the transmitted information, we
designed multiple resistance markers for use under different conditions. First, when knocking
out the guaB gene, we replaced it in the genome with a kanamycin resistance gene, making the
modified E. coli resistant to kanamycin. This allows us to use kanamycin-selective media to
select successfully edited strains.
Additionally, the vector carrying the caffeine degradation pathway (DeCaf Pathway) includes a streptomycin resistance gene, adding a second selectable marker to the strain. The plasmid also includes an ampicillin resistance gene, enabling the strain to grow in the presence of ampicillin. Consequently, the final modified strain is resistant to ampicillin, streptomycin, and kanamycin, so that only the intended bacteria grow under these selective conditions, which protects the information they carry.
· Self-Destruct Mechanism
To prevent unauthorized access to the information, we designed a temperature-sensitive
self-destruct mechanism. At elevated temperature, the bacteria express a DNase that degrades
their own DNA, destroying the encoded information before it can be read.
We achieve this by placing a DNA endonuclease gene under the control of a transcription factor (the cI dimer), which represses the gene and protects the bacteria from self-destruction under normal conditions. When the temperature rises to 37°C or above, the dimer begins to dissociate and leave its binding site, so the endonuclease gene is expressed and the enzyme cuts the cell's DNA into numerous fragments.
We chose the gene encoding the DpnI endonuclease because the E. coli genome is rich in its recognition site, the sequence GATC (5’ to 3’), which recurs frequently throughout the genome. DpnI cuts this site only when its adenine is methylated, which is the case in standard Dam-methylating E. coli strains. When the temperature reaches 37°C, the DpnI gene is expressed, and the resulting enzyme cuts the bacteria’s DNA, killing the bacteria and destroying the information.
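The effect of the temperature switch and DpnI digestion can be illustrated with a small in-silico sketch. It assumes a linear sequence in which every GATC site is methylated and therefore cut, treats 37°C as a hard on/off threshold, and cuts between the A and the T of each site (DpnI leaves blunt ends). The function names and the example sequence are hypothetical.

```python
SITE = "GATC"
TRIGGER_TEMP_C = 37.0  # temperature at which the repressor is assumed to release


def dpni_fragments(seq: str) -> list:
    """Cut a linear sequence at every GATC site, between A and T (GA^TC)."""
    fragments, start = [], 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + 2                      # position between "GA" and "TC"
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


def self_destruct(seq: str, temp_c: float) -> list:
    """Return the intact sequence below the trigger temperature, fragments otherwise."""
    if temp_c < TRIGGER_TEMP_C:
        return [seq]
    return dpni_fragments(seq)


if __name__ == "__main__":
    message = "AAGATCTTGATCCC"
    print(self_destruct(message, 30.0))  # intact
    print(self_destruct(message, 37.0))  # cut at both GATC sites
```

In the cell the response would be gradual rather than a step function, and a real genome is circular, so this only illustrates why a frequent recognition site leads to many fragments.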
Through these designs, the modified bacteria grow selectively under specific conditions and automatically trigger the self-destruct mechanism when exposed to unfavorable environments, reducing the risk that the information leaks.
- Conclusion -
We have designed a method for covert information transmission using microorganisms. By
encoding information into DNA sequences through the Wubi input method, we store it within
Escherichia coli and use the bacteria to carry hidden data. Our approach has three essential
components. First, using codons that mimic protein-coding regions disguises the encoded
information. Second, knocking out the guaB gene and introducing the DeCaf pathway makes
bacterial growth dependent on specific substances such as caffeine, adding a layer of control
over the environment in which the information can be retrieved. Finally, the
temperature-sensitive DNase mechanism safeguards against unauthorized access by degrading the
encoded information under unfavorable conditions. Together, these elements form a
microbial system for discreet information transfer, intended to keep sensitive data secure and
accessible only in the intended environments.