Implementation
Contents
- Overview
- nuCloud Usage
- Potential end users
- nuCloud and Sustainability
- nuCloud and Data Durability
- Future Improvements
Overview
Enzymatic DNA synthesis is a promising new technology with the potential to become a greener and cheaper alternative to the current standard chemical DNA synthesis methods [1]. nuCloud is a DNA data storage pipeline that harnesses thermostable terminal deoxynucleotidyl transferase (TdT) to optimize the enzymatic DNA synthesis process.
Currently, slow DNA read/write speeds and sequencing errors prevent synthetic DNA from being used to store data that is accessed or updated regularly and must be available quickly. Based on discussions with multiple researchers in academia and industry, together with our literature review, we concluded that the most viable initial application for nuCloud is archival data storage, i.e., data that is unlikely to be accessed or updated once stored.
nuCloud Usage
To allow binary data to be archived as DNA, the nuCloud team has developed a software pipeline and a microfluidics DNA synthesis platform.
First, the user submits the binary data they want to archive to the software pipeline. The pipeline splits this data into chunks and encodes each chunk as a nucleotide sequence. It also generates primer sequences that are attached to the DNA-encoded data. These encoded sequences are then passed to the wet lab pipeline for DNA synthesis.
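The encoding step can be illustrated with a minimal Python sketch. It assumes the simplest possible mapping (two bits per nucleotide: 00→A, 01→C, 10→G, 11→T), a fixed chunk size and placeholder primer sequences; none of these are nuCloud's actual parameters, and the names `encode`, `decode`, `FORWARD_PRIMER` and `REVERSE_PRIMER` are hypothetical.

```python
from typing import List

# Hypothetical 2-bit mapping; nuCloud's actual encoding scheme may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Placeholder primers for illustration only (not designed or validated).
FORWARD_PRIMER = "ACGTTGCAACGTTGCAACGT"
REVERSE_PRIMER = "TGCAACGTTGCAACGTTGCA"


def encode(data: bytes, chunk_size: int = 4) -> List[str]:
    """Split data into chunks, map each 2-bit pair to a base, and flank with primers."""
    strands = []
    for start in range(0, len(data), chunk_size):
        bits = "".join(f"{byte:08b}" for byte in data[start:start + chunk_size])
        payload = "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))
        strands.append(FORWARD_PRIMER + payload + REVERSE_PRIMER)
    return strands


def decode(strands: List[str]) -> bytes:
    """Strip primers and reverse the mapping (assumes ordered, error-free strands)."""
    out = bytearray()
    for strand in strands:
        payload = strand[len(FORWARD_PRIMER):len(strand) - len(REVERSE_PRIMER)]
        bits = "".join(BASE_TO_BITS[base] for base in payload)
        out.extend(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes(out)
```

For example, `decode(encode(b"nuCloud!"))` returns `b"nuCloud!"`, and the single byte `0x1b` (`00011011`) becomes the payload `ACGT`. A practical encoder would also add an address to each strand (strands in a pool are unordered), avoid long homopolymer runs, and include error correction.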
The wet lab pipeline, which performs enzymatic DNA synthesis using thermostable TdT (ThTdT), has been incorporated into a microfluidics platform to automate the process. Instead of a person manually adding nucleotides one by one, nucleotides are pumped into the microfluidics chip and drained on a timed basis, standardizing and automating the protocol. The DNA produced by the chip can then be stored in a DNA data archive center.
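The timed pump-and-drain cycle can be sketched as a schedule generator that turns a target sequence into one cycle per nucleotide. This is a hypothetical illustration: the step names, the `build_schedule` function and the default pump, incubation and drain durations are placeholders rather than nuCloud's measured protocol, and a real protocol may need additional wash or deblocking steps that are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str        # "pump", "incubate" or "drain"
    reagent: str       # nucleotide triphosphate being handled, or "" when none
    start_s: float     # seconds since the start of the run
    duration_s: float  # how long this step lasts


def build_schedule(sequence: str, pump_s: float = 5.0,
                   incubate_s: float = 60.0, drain_s: float = 10.0) -> List[Step]:
    """Turn a target sequence into one pump/incubate/drain cycle per nucleotide.

    All durations are hypothetical placeholders, not measured ThTdT timings.
    """
    steps: List[Step] = []
    t = 0.0
    for base in sequence.upper():
        if base not in "ACGT":
            raise ValueError(f"invalid nucleotide: {base!r}")
        reagent = f"d{base}TP"  # dATP, dCTP, dGTP or dTTP
        steps.append(Step("pump", reagent, t, pump_s))
        t += pump_s
        # Hold the reagent on-chip while ThTdT adds the nucleotide.
        steps.append(Step("incubate", reagent, t, incubate_s))
        t += incubate_s
        steps.append(Step("drain", "", t, drain_s))
        t += drain_s
    return steps


for step in build_schedule("ACG"):
    print(f"{step.start_s:6.1f}s  {step.action:<8} {step.reagent}")
```

Generating the schedule from the sequence, rather than hard-coding it, keeps the chip controller independent of the data being written: total run time simply scales with sequence length.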
Potential end users
Using the nuCloud data storage pipeline for archival data storage has potential applications in both research laboratories and industry. Dr. Tafirout, a researcher at the TRIUMF particle accelerator at UBC, said that at its current stage a platform like nuCloud would be suitable for glacial storage, i.e., data that is rarely accessed. CERN, which runs the largest particle physics laboratory in the world, had already archived more than 300 petabytes of data on tape by 2018, and the amount of archived data will only continue to grow (CERN, 2022). This creates an opening for DNA data storage as a more sustainable alternative.
Microsoft, a major player in the tech industry, also has a research team actively working on data storage using synthetic DNA. Dr. Karin Strauss, a senior researcher at Microsoft, indicated that DNA data storage may play a role in Microsoft’s sustainability goals for its datacenters over the next 10 years if the technology is ready in that time.
nuCloud and Sustainability
Worldwide data generation is expected to grow 32% year over year, reaching 175 ZB by 2025 [2]. Of this, over 8 ZB will need to be stored remotely in the “cloud” [3]. Yet conventional archival data storage is environmentally costly, primarily because of the energy consumed during write and access operations, which amounts to over 80 MJ/(TB·year). These costs will continue to compound exponentially unless new methods of data storage are developed.
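To give a sense of scale, the two figures above can be combined in a rough back-of-the-envelope calculation. This sketch assumes, as a simplification not made in the cited sources, that the 80 MJ/(TB·year) figure applies uniformly to all 8 ZB of remotely stored data, and it uses decimal units (1 ZB = 10^9 TB). Because 80 MJ/(TB·year) is a lower bound, the result is a lower-bound estimate under that assumption.

```python
# Decimal units: 1 ZB = 10**21 bytes and 1 TB = 10**12 bytes, so 1 ZB = 10**9 TB.
TB_PER_ZB = 10**9
J_PER_MJ = 10**6
J_PER_TWH = 3.6e15  # 1 TWh = 10**12 Wh * 3600 J/Wh


def annual_energy_mj(stored_zb: float, mj_per_tb_year: float = 80.0) -> float:
    """Annual energy (MJ) to keep `stored_zb` zettabytes at a fixed per-TB rate."""
    return stored_zb * TB_PER_ZB * mj_per_tb_year


def mj_to_twh(energy_mj: float) -> float:
    """Convert megajoules to terawatt-hours."""
    return energy_mj * J_PER_MJ / J_PER_TWH


energy = annual_energy_mj(8)
print(f"{energy:.2e} MJ/year, about {mj_to_twh(energy):.0f} TWh/year")
```

Under these assumptions, 8 ZB corresponds to about 6.4 × 10^11 MJ (roughly 178 TWh) per year.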
DNA-based archival data storage is an effective way to reduce this environmental footprint. Life cycle analysis has shown that DNA data storage emits less greenhouse gas (GHG) and consumes less water and energy than comparable tape- or HDD-based storage [3]. Furthermore, nuCloud uses enzymatic DNA synthesis rather than the standard phosphoramidite method, which relies on organic reagents such as acetonitrile, a solvent synthesized from oil or from large amounts of ethanol. By contrast, enzymatic synthesis can be performed in an aqueous environment with fewer chemical inputs. As a result, enzymatic synthesis produces fewer GHG emissions than conventional synthesis while requiring less water and energy [3].
Challenges/Safety Considerations
- Reactions with thermostable TdT run at a higher temperature than those with the wild-type (wt) TdT used in previous enzymatic synthesis platforms, so the energy used during write operations will increase. However, DNA storage built on enzymatic synthesis still requires an order of magnitude less energy and water than disk-based storage.
nuCloud and Data Durability
Today, tape is the most widely used medium for archival data storage, as it offers the best durability of the currently available options. While tapes can last up to 30 years under optimal conditions, in our conversation with Dr. Tafirout we learned that TRIUMF refreshes its data onto new tape media every 10 years, which is an intensive operation.
Meanwhile, DNA has been shown to have a room-temperature half-life of more than 100 years [4], and it lasts even longer at cooler temperatures: 700,000-year-old DNA has been retrieved from permafrost. DNA-based storage therefore offers a multifold improvement in longevity over tape media, making it a very promising medium for durable data storage. As a DNA-based storage platform, nuCloud is thus poised to serve as an archival data storage method that requires data refreshes far less often than traditional tape media, saving the time and resources spent on this periodic process.
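The half-life figure can be turned into a rough estimate of how much DNA survives over a tape refresh cycle. The sketch below assumes simple first-order (exponential) decay, N(t)/N₀ = 2^(−t/t½), and uses the 100-year room-temperature half-life quoted above as a conservative lower bound; real decay depends on storage conditions and on how an "intact" molecule is defined. The function names are hypothetical.

```python
import math


def fraction_remaining(years: float, half_life_years: float = 100.0) -> float:
    """First-order decay model: fraction of intact DNA left after `years`."""
    return 2.0 ** (-years / half_life_years)


def years_until(fraction: float, half_life_years: float = 100.0) -> float:
    """Invert the model: years until only `fraction` of the DNA is intact."""
    return half_life_years * math.log2(1.0 / fraction)


for t in (10, 30, 100):
    print(f"after {t:>3} years: {fraction_remaining(t):.1%} intact")
```

With t½ = 100 years, about 93% of the DNA would remain intact after TRIUMF's 10-year refresh interval and about 81% after 30 years, the upper lifetime quoted for tape.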
Challenges/Safety Considerations
- While DNA can in theory last for several hundred years, it must be stored under proper conditions to prevent contamination that may cause degradation. Dr. Strauss said the basic protective steps are maintaining a dry, cool, sterile environment and keeping the DNA away from light. For DNA to be worth replacing current tape storage, containment measures need to be tested and optimized so that they reliably preserve DNA for more than 10 years on average, i.e., longer than TRIUMF's current tape refresh interval.
Future Improvements
- It will be important to quantify the stability of DNA synthesized with nuCloud's microfluidics pipeline and to optimize its storage conditions, so that it can be offered as an improved alternative to tape for archival storage.
- To correct the errors introduced by DNA sequencing when stored data is read back, it will be essential to develop a decoding pipeline with robust error-correcting codes.
- Significant increases in write speed (i.e., DNA synthesis speed), achieved through TdT optimization, are necessary to expand nuCloud’s applications beyond archival data.
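As a toy illustration of the error-correcting idea, the sketch below adds a single XOR parity chunk so that any one lost or unreadable chunk can be rebuilt. This is a deliberately simple erasure code chosen for clarity, not the code nuCloud would use; DNA storage systems typically rely on stronger schemes such as Reed-Solomon or fountain codes. The names `add_parity` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: List[bytes]) -> List[bytes]:
    """Append one parity chunk: the XOR of all equal-length data chunks."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("chunks must have equal length")
    return chunks + [reduce(xor_bytes, chunks)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (None) and return the data chunks."""
    missing = [i for i, c in enumerate(received) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    chunks = list(received)
    if missing:
        # XOR of every surviving chunk (data + parity) equals the lost chunk.
        chunks[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return chunks[:-1]  # drop the parity chunk
```

A strand that fails an integrity check during decoding could be marked as `None` and treated as an erasure; recovering from more than one loss, or from substitution errors that go undetected, requires a stronger code.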
References
[1] Barthel, S., Palluk, S., Hillson, N. J., Keasling, J. D., & Arlow, D. H. (2020). Enhancing Terminal Deoxynucleotidyl Transferase Activity on Substrates with 3’ Terminal Structures for Enzymatic De Novo DNA Synthesis. Genes, 11(1), 102. https://doi.org/10.3390/genes11010102
[2] IDC. (2013). Where in the World Is Storage: Byte Density Across the Globe. [Online]. Available: http://www.idc.com/downloads/where_is_storage_infographic_243338.pdf
[3] Nguyen, B. H., Sinistore, J., Smith, J. A., Arshi, P. S., Johnson, L. M., Kidman, T., Dicaprio, T. J., Carmean, D., & Strauss, K. (n.d.). Architecting Datacenters for Sustainability: Greener Data Storage using Synthetic DNA.
[4] Allentoft, M. E., Collins, M., Harker, D., Haile, J., Oskam, C. L., Hale, M. L., … Bunce, M. (2012). The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. B, 279, 4724–4733. http://doi.org/10.1098/rspb.2012.1745