
But is it safe?!

Introduction

Safety is one of the most important issues to address in any innovation. From the very start, Cap’siRNA has pursued specific viral targeting through the design and engineering of silencing RNA precursors (siRNA precursors). At 21 to 25 nucleotides long, siRNA are only active when they match the target site with 100% stringency [1]. This specificity makes them extremely efficient at neutralizing the desired mRNA [1,2], but it also raises the question of off-targeting. The chances of a non-targeted mRNA falling victim to a siRNA by sheer coincidence are slim, but not null given the number of living organisms on Earth. A key requirement for safely spraying siRNA in the environment is therefore to ensure that no living organism other than the targeted one(s) can suffer from it. In short, we need to ensure that our RNA interference-based pesticide does not repeat the mistakes of past pesticides (neonicotinoids, glyphosate, …). SafeRNA was built to ensure that our product is safe for the humans who work with it, but also for the too-often-forgotten soil microbiota, as well as pollinators and other insects.

Principle of SafeRNA

SafeRNA is a bio-informatic pipeline designed to be able to:
- Collect coding sequences from desired taxa
- Simulate targeting of siRNA toward the collected taxa
- Analyze results and provide insight on how to improve the siRNA sequences

SafeRNA has a dedicated tool for each of those tasks. All three are written in Python, and the only required library is Biopython. The sequence collector tool additionally requires the datasets.exe file from NCBI to be downloaded [3]. Each of those programs is designed to solve a specific problem:

siRNA sequences may change

The siRNA sequences produced will never be exactly the same, as the length at which they are cut by the RISC complex varies between 21 and 25 nucleotides. The program main.py uses an algorithm to predict every possible 21-nucleotide sequence that can be obtained from RISC activity, wherever it may cut the double-stranded siRNA precursor.

Collecting relevant omic information on taxa

From the literature, it is relatively easy to find the microbial and fungal taxa involved in the positive activity of the soil microbiome toward crops. However, collecting omic data for all of them can prove tedious and impractical as the number of species grows. The program get_data.py leverages the datasets.exe executable from the NCBI command-line tools to bulk-download the coding DNA sequences of large lists of taxa and make them ready for analysis.
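As a rough illustration of this collection step, the sketch below builds one NCBI Datasets command per taxon. It is not the code of get_data.py: the function name, the output file names and the two taxa are hypothetical placeholders, and the subcommand and flags (`download genome taxon`, `--include cds`, `--filename`, `--released-after`) are assumed to match the installed version of the tool, which is worth checking with `datasets download genome taxon --help`.

```python
from typing import List, Optional


def build_datasets_command(
    taxon: str,
    out_zip: str,
    released_after: Optional[str] = None,
    executable: str = "datasets",  # "datasets.exe" on Windows
) -> List[str]:
    """Build the argument list of an NCBI Datasets call fetching coding sequences for one taxon."""
    cmd = [
        executable, "download", "genome", "taxon", taxon,
        "--include", "cds",        # coding sequences only (assumed flag value)
        "--filename", out_zip,     # archive written by the tool
    ]
    if released_after is not None:
        # Optional date filter, e.g. to skip older, lower-quality assemblies
        cmd += ["--released-after", released_after]
    return cmd


# Placeholder taxa, NOT the list used to build the SafeRNA database
taxa = ["Pseudomonas", "Bacillus"]
commands = [
    build_datasets_command(t, f"{t}.zip", released_after="2010-01-01")
    for t in taxa
]
for cmd in commands:
    print(" ".join(cmd))
```

The commands are only printed here. Passing each list to `subprocess.run(cmd, check=True)` would launch the actual download, which can be very large for a whole taxon.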

Analyze siRNA safety toward the microbiome

siRNA can only work if their complementarity is perfect [1]. Thus, to check whether a siRNA sequence may be off-target, all possible siRNA sequences obtained from main.py are aligned against the database previously downloaded with get_data.py. If the alignment is successful, meaning that the 21-nucleotide sequence is found in one of the coding sequences of a genome in the database, the sequence is deemed a “hit”. A report then gives insight into the hits found in the database, such as their number, the genome file location and a direct link to the species/strain assembly.
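The definition of a “hit” can be illustrated with a naive exact-match scan. This is a stand-in written for illustration only: SafeRNA relies on an alignment tool rather than on this loop, the function name and the toy sequences are invented, and only the strand given in the coding sequence is searched.

```python
from typing import Dict, List, Tuple


def find_hits(
    sirnas: Dict[str, str],
    coding_sequences: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Return (siRNA id, CDS id, 0-based position) for every perfect match."""
    hits = []
    for sirna_id, sirna in sirnas.items():
        # Compare in the DNA alphabet so RNA queries match DNA coding sequences
        query = sirna.upper().replace("U", "T")
        for cds_id, cds in coding_sequences.items():
            subject = cds.upper().replace("U", "T")
            pos = subject.find(query)
            while pos != -1:
                hits.append((sirna_id, cds_id, pos))
                pos = subject.find(query, pos + 1)
    return hits


# Toy data (invented): one 21-nt siRNA and two short "coding sequences"
sirnas = {"siRNA_1": "ACGUACGUACGUACGUACGUA"}
coding_sequences = {
    "cds_A": "TTT" + "ACGTACGTACGTACGTACGTA" + "GGG",  # contains the 21-mer
    "cds_B": "GGGGCCCCGGGGCCCCGGGGCCCCGGGG",           # does not
}
report = find_hits(sirnas, coding_sequences)
print(f"{len(report)} hit(s): {report}")
```

A scan like this scales poorly to tens of thousands of genomes, which is one reason to use a dedicated alignment tool for the real database.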

A detailed user guide covering the practical use of SafeRNA can be found in the README on the iGEM software tool repository.

Methods

Genomic data collection is performed using the NCBI command-line tool, datasets.exe, and is based on publications studying the microbiota populations found in Beta vulgaris crops [5]. The list of bacterial and fungal taxa is gathered from the literature, and only the corresponding genomic sequences are downloaded from NCBI, for each entire taxon. We limit downloads to sequences from after 2010 in order to avoid filling our database with too many poor-quality or incomplete genomic sequences.
The taxa were selected based on the relative abundances reported in the 2019 study by Kusstatscher et al. [5], which describes the microbiome found in Beta vulgaris crops. All the coding sequences found in each taxon and sequenced between 2010 and 2024 were downloaded to build a database.
The siRNA sequences are obtained through a naive algorithm that gathers all possible 21-nucleotide sequences from a precursor by sliding a 21-nucleotide window along the precursor RNA sequence. Each siRNA is then assigned an ID and written to a file in FASTA format, ready to be used.

Figure 1: Scheme of the sliding window program generating all possible siRNA sequences from the target sequence.
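A minimal sketch of such a sliding window is given below. It is not the code of main.py: the function names, the ID scheme (`siRNA_1`, `siRNA_2`, …) and the 28-nucleotide precursor are assumptions made for the example, and it uses only the standard library rather than Biopython.

```python
from typing import List, Tuple

WINDOW = 21  # siRNA length considered in this document


def sliding_windows(precursor: str, width: int = WINDOW) -> List[Tuple[str, str]]:
    """Return (id, sequence) for every window of `width` nucleotides along the precursor."""
    seq = "".join(precursor.split()).upper()  # drop whitespace and line breaks
    return [
        (f"siRNA_{start + 1}", seq[start:start + width])
        for start in range(len(seq) - width + 1)
    ]


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Format the records as FASTA text, one header line and one sequence line each."""
    return "".join(f">{rec_id}\n{seq}\n" for rec_id, seq in records)


# Hypothetical 28-nt precursor, used only to show the output shape
precursor = "AUGGCUAGCUAGGAUCCGAUCGAUUAGC"
records = sliding_windows(precursor)
print(to_fasta(records))
```

Note that this enumeration visits every start position, so a precursor of length L yields L − 21 + 1 windows (8 for the 28-nucleotide example), one more than a count of L − 21. Which convention main.py follows is not confirmed here.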

A BLASTn (Basic Local Alignment Search Tool for nucleotides) alignment [4] is then conducted between the genomes and the siRNA files using the short-word parameter. Local BLAST automatically discards short sequences by default; this parameter allows them to be taken into account. BLAST aligns sequences to find similarity or complementarity between them. This means that both of the following alignments are considered “hits”, given the double-stranded nature of the RNA we are producing through synthetic biology.

Figure 2: Scheme of alignments counted as hits.
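One way to turn alignment output into hits is sketched below, assuming BLAST was run with the default tabular output (`-outfmt 6`, twelve tab-separated columns) and, plausibly, with `-task blastn-short` as the short-sequence setting; neither choice is confirmed as SafeRNA's actual configuration. The function name and the four example rows are invented. A row is kept only if the alignment covers all 21 nucleotides with no mismatch or gap, and the strand is read from the subject coordinates, so that both orientations are counted.

```python
from typing import Dict, Iterable, List

# Default column order of BLAST tabular output (-outfmt 6)
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def perfect_hits(tabular_lines: Iterable[str], sirna_length: int = 21) -> List[Dict[str, str]]:
    """Keep alignments spanning the whole siRNA with no mismatch and no gap."""
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        row = dict(zip(COLUMNS, line.split("\t")))
        full_length = int(row["length"]) == sirna_length
        no_edits = int(row["mismatch"]) == 0 and int(row["gapopen"]) == 0
        if full_length and no_edits:
            # For nucleotide searches, sstart > send marks a match on the minus strand
            strand = "plus" if int(row["sstart"]) <= int(row["send"]) else "minus"
            hits.append({"sirna": row["qseqid"], "subject": row["sseqid"], "strand": strand})
    return hits


# Invented rows for illustration, not real SafeRNA results
example = [
    "siRNA_12\tcds_A\t100.000\t21\t0\t0\t1\t21\t340\t360\t0.001\t42.1",
    "siRNA_12\tcds_B\t100.000\t21\t0\t0\t1\t21\t915\t895\t0.001\t42.1",
    "siRNA_40\tcds_C\t95.238\t21\t1\t0\t1\t21\t10\t30\t0.2\t34.2",
    "siRNA_77\tcds_D\t100.000\t18\t0\t0\t3\t20\t55\t72\t1.1\t36.2",
]
hits = perfect_hits(example)
for hit in hits:
    print(hit)
```

The third row is rejected because of its mismatch and the fourth because it covers only 18 of the 21 nucleotides.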

Results

The database gathers the coding sequences of more than 50,000 genomes from 17 different taxa of bacteria and fungi, resulting in a 200 Gb database.

Bar plot of taxa and sequence abundance

The general formula for the number of 21-nucleotide siRNA that can be obtained from a sequence is the following: $n_{siRNA} = L_{target} - 21$. We target the RNA-dependent RNA polymerase (RdRp) and the p21 sequences from the BYV, which are 1302 and 408 nucleotides long respectively. The number of different siRNA sequences that can be obtained from these targets is therefore 1281 and 387 respectively.

Aligning all the siRNA sequences against the 50,000 assemblies took 36 hours. 8 hits were found in the entire database, from 8 different bacterial strains, each time involving only one siRNA amongst all those possible. This means that the probability of an off-target siRNA being produced is $P = 1/n_{siRNA}$. The probability of one being generated is therefore very low.
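The arithmetic behind these figures can be checked with a few lines of Python. The sketch applies the formula and target lengths quoted above, and assumes that every possible siRNA is equally likely to be produced from a given target (an assumption not stated in the source); the function and variable names are illustrative.

```python
WINDOW = 21  # siRNA length used in the formula

# Target lengths in nucleotides, as quoted in the text
targets = {"RdRp": 1302, "p21": 408}


def n_sirna(target_length: int, window: int = WINDOW) -> int:
    """Number of siRNA per target, following the formula n = L - 21 given in the text."""
    return target_length - window


probabilities = {}
for name, length in targets.items():
    n = n_sirna(length)
    # One off-target siRNA among n equally likely candidates
    probabilities[name] = 1 / n
    print(f"{name}: n_siRNA = {n}, P = 1/{n} = {probabilities[name]:.5f}")
```

Under these assumptions the probability is on the order of 0.08% for RdRp and 0.26% for p21.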

Figure 3: Graph of total hits with respect to the number of sequences obtained

To take the analysis further, we examined the strains found as potential off-targets and their geographical location (a geographical filter is not currently included in the search and download program). None of the bacterial species and strains found as off-targets occur in Europe: all 8 of them are found in Asia (mainly Korea, Japan and China). Combined with the low probability of the off-target siRNA sequence being generated, this gives further reassurance of how specific RNA interference is toward the BYV.

Where hits are found, depending on their number and location, the corresponding sequences can simply be taken out of the targeted sequence (making it shorter), or the targeted window can be shifted, to avoid the creation of problematic sequences.
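One possible way to shorten a target around problematic sequences is sketched below. It is an illustration only, not a feature of SafeRNA: the function name and the example positions are invented. Given the 0-based start positions of the 21-nucleotide windows that produced hits, it returns the longest stretch of the target that contains none of those windows in full.

```python
from typing import Iterable, Tuple


def longest_safe_segment(
    target_length: int,
    hit_starts: Iterable[int],
    width: int = 21,
) -> Tuple[int, int]:
    """Return [start, end) of the longest stretch containing no complete hit window."""
    starts = sorted(set(hit_starts))
    # A safe stretch begins one base after the start of a hit window (or at 0)...
    left_bounds = [0] + [s + 1 for s in starts]
    # ...and ends one base before the end of the next hit window (or at the target end)
    right_bounds = [s + width - 1 for s in starts] + [target_length]
    best = (0, 0)
    for start, end in zip(left_bounds, right_bounds):
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


# Invented example: a 100-nt target with hit windows starting at positions 10 and 70
segment = longest_safe_segment(100, [10, 70])
print(segment)
```

Trimming the target to the returned interval removes the hit windows while keeping as many of the remaining candidate siRNA as possible in one contiguous piece.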

Figure 4: Corrected hits (errors, low quality, species not found in EU or US)

References

[1] Gavrilov K, Saltzman WM. Therapeutic siRNA: principles, challenges, and strategies. Yale J Biol Med. 2012 Jun;85(2):187-200. Epub 2012 Jun 25. PMID: 22737048; PMCID: PMC3375670.
[2] Hu B, Zhong L, Weng Y, Peng L, Huang Y, Zhao Y, Liang XJ. Therapeutic siRNA: state of the art. Signal Transduct Target Ther. 2020 Jun 19;5(1):101. doi: 10.1038/s41392-020-0207-x. PMID: 32561705; PMCID: PMC7305320.
[3] Assembly [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2012 – [cited 2024 August 09]. Available from: https://www.ncbi.nlm.nih.gov/assembly/
[4] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
[5] Kusstatscher P, Zachow C, Harms K, Maier J, Eigner H, Berg G, et al. Microbiome-driven identification of microbial indicators for postharvest diseases of sugar beets. Microbiome. 2019 Dec;7(1):1–12.