Trust us, it is totally safe!
Safety is one of the most important issues to address in any innovation. From the very start, Cap’siRNA has pursued specific viral targeting through the design and engineering of silencing RNA precursors (siRNA precursors). At 21 to 25 nucleotides long, siRNAs are only active when they match their target site with 100% stringency [1]. This specificity makes them extremely efficient at neutralizing the desired mRNA [1,2], but it also raises the question of off-targeting. The chances of a non-targeted mRNA falling victim to an siRNA by sheer coincidence are slim, but not zero given the number of living organisms on Earth. A key requirement for safely spraying siRNA in the environment is therefore to ensure that no living organism other than the targeted one(s) can be affected by it. In short, we need to ensure that our RNA interference-based pesticide does not repeat the mistakes of past pesticides (neonicotinoids, glyphosate, …). SafeRNA was built to ensure that our product is safe for the people who work with it, but also for the too-often-forgotten soil microbiota, as well as for pollinators and other insects.
SafeRNA is a bioinformatics pipeline designed to:
- Collect coding sequences from desired taxa
- Simulate targeting of siRNA toward the collected taxa
- Analyze results and provide insight on how to improve the siRNA sequences
SafeRNA has a dedicated tool for each of those tasks. All three are written in Python, and the only required library is Biopython. The sequence collector tool additionally requires the datasets.exe file from NCBI to be downloaded [3]. Each of those programs is designed to solve a specific problem:
The siRNA sequences produced are never exactly the same, because the
length at which they are cut by the RISC complex varies between 21 and 25
nucleotides. The program main.py uses an algorithm to predict all
possible 21-nucleotide sequences that can be obtained from RISC
activity, wherever it may cut the double-stranded siRNA precursor.
From the literature, it is relatively easy to identify the bacterial
and fungal taxa of the soil microbiome that benefit crops. However,
collecting omics data for all of them can prove tedious and impractical
as the number of species grows. The program get_data.py leverages the
datasets.exe executable from the NCBI command-line tools to download the
coding DNA sequences of large lists of taxa in bulk and make them ready
for analysis.
siRNAs can only work if their complementarity with the target is
perfect [1]. Thus, to check whether an siRNA sequence may be off-target,
all possible siRNA sequences obtained from main.py are aligned against
the database previously downloaded with get_data.py. If the alignment is
successful, meaning that the 21-nucleotide sequence is found in one of
the coding sequences of a genome in the database, the sequence is deemed
a “hit”. A report then gives insight into the hits found in the
database, such as their number, the genome file location and a direct
link to the species/strain assembly.
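As an illustration of what such a report involves, here is a minimal Python sketch. It assumes the alignment results are available as BLAST tabular output (`-outfmt 6` with its default columns), which is an assumption about the file format and not a description of SafeRNA's actual report code. The function name `summarize_hits` and the sample rows are hypothetical.

```python
from collections import defaultdict

SIRNA_LENGTH = 21  # length of the simulated siRNA sequences


def summarize_hits(tabular_lines, sirna_length=SIRNA_LENGTH):
    """Group perfect, full-length matches by siRNA ID.

    Assumes the default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = defaultdict(list)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        mismatches, gap_opens = int(fields[4]), int(fields[5])
        # Keep only alignments that cover the whole siRNA with no mismatch or gap.
        is_perfect = identity == 100.0 and mismatches == 0 and gap_opens == 0
        if is_perfect and length == sirna_length:
            hits[query_id].append(subject_id)
    return dict(hits)


# Hypothetical rows for illustration only (not real results).
example_rows = [
    "siRNA_12\tcds_A\t100.000\t21\t0\t0\t1\t21\t340\t360\t0.003\t42.1",
    "siRNA_12\tcds_B\t95.238\t21\t1\t0\t1\t21\t88\t108\t0.19\t34.2",
    "siRNA_40\tcds_C\t100.000\t18\t0\t0\t3\t20\t10\t27\t0.75\t36.2",
]
report = summarize_hits(example_rows)
print(f"{len(report)} siRNA(s) with at least one hit: {report}")
```

Only the first row is kept here: the second contains a mismatch and the third covers 18 of the 21 nucleotides, so neither satisfies the perfect-match criterion described above.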
A detailed user guide covering the practical use of SafeRNA can be found in the README of the iGEM software tool repository.
The genomic data collection is performed using the NCBI command-line
tool datasets.exe, and is based on publications studying the microbiota
populations found in Beta vulgaris crops [5]. The list of bacterial and
fungal taxa is gathered from the literature, and only the genomic
sequences corresponding to those taxa are downloaded from NCBI. We
restrict downloads to sequences dated after 2010 in order to keep
poor-quality DNA and incomplete genomic sequences out of our database.
The taxa were selected based on the relative abundances reported in the
2019 study by Peter Kusstatscher et al. [5], which describes the
microbiome found in Beta vulgaris crops. All the coding sequences found
in each taxon and sequenced between 2010 and 2024 were downloaded to
build a database.
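The collection step can be sketched as follows. This is a hedged illustration, not the code of get_data.py: it only builds the command lines for the NCBI `datasets` tool and does not run them unless asked to. The flags used (`--include cds`, `--released-after`, `--filename`) are taken from the NCBI Datasets CLI and should be checked against the installed version. The helper names, the output file naming and the two example taxa are hypothetical placeholders, not the list used for the SafeRNA database.

```python
import shlex
import subprocess


def build_download_command(taxon, output_zip, released_after="2010-01-01",
                           executable="datasets"):
    """Return the argument list for one taxon download (nothing is executed here)."""
    return [
        executable, "download", "genome", "taxon", taxon,
        "--include", "cds",                  # coding sequences only
        "--released-after", released_after,  # skip older assemblies
        "--filename", output_zip,
    ]


def download_taxa(taxa, run=False):
    """Build one command per taxon; execute them only when run=True."""
    commands = []
    for taxon in taxa:
        output_zip = taxon.replace(" ", "_") + "_cds.zip"
        command = build_download_command(taxon, output_zip)
        commands.append(command)
        if run:
            # Requires the datasets executable to be installed and on the PATH.
            subprocess.run(command, check=True)
    return commands


example_taxa = ["Pseudomonas", "Bacillus"]  # placeholder taxa for illustration
for command in download_taxa(example_taxa):
    print(shlex.join(command))
```

Keeping command construction separate from execution makes it possible to review the full list of downloads before starting a transfer of this size.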
The siRNA sequences are obtained through a naive algorithm that gathers
all possible 21-nucleotide sequences from a precursor by sliding a
21-nucleotide window along the precursor RNA sequence. Each siRNA is
then assigned an ID and finally written to a file in FASTA format, ready
to be used.
Figure 1: Scheme of the sliding window program generating all
possible siRNA sequences from the target sequence.
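The sliding window described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of main.py: the function names, the ID scheme (`siRNA_1`, `siRNA_2`, ...) and the toy precursor are all made up for the example.

```python
def generate_sirnas(precursor, window=21):
    """Return (ID, sequence) pairs for every window of `window` nucleotides."""
    sequence = precursor.strip().upper()
    return [
        (f"siRNA_{start + 1}", sequence[start:start + window])
        for start in range(len(sequence) - window + 1)
    ]


def to_fasta(records):
    """Format (ID, sequence) pairs as FASTA text."""
    return "".join(f">{record_id}\n{sequence}\n" for record_id, sequence in records)


toy_precursor = "AUGGCUAGCUAGGAUCCGAUUACGCU"  # made-up 26-nucleotide sequence
sirnas = generate_sirnas(toy_precursor)
print(to_fasta(sirnas))
```

This sketch keeps the last window, the one ending on the final nucleotide, so a precursor of length L gives L − 21 + 1 sequences. Whether main.py includes that last window is not stated here. A precursor shorter than 21 nucleotides simply yields an empty list.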
A BLASTn (Basic Local Alignment Search Tool for nucleotides) [4]
alignment is then conducted between the genomes and the siRNA files
using the short-word parameter. By default, local BLAST discards short
sequences; this parameter allows them to be taken into account. A BLAST
alignment uses the BLAST algorithm to align sequences and find
similarity or complementarity between them. This means that both of the
following alignments are considered “hits”, given the double-stranded
nature of the RNA we are producing through synthetic biology.
Figure 2: Scheme of alignment counted as hits.
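To make the two kinds of hits concrete, here is a minimal Python sketch that replaces BLAST with a plain exact-substring search, which is only practical for small examples. It assumes that "complementarity" corresponds to the reverse complement of the siRNA appearing in the coding sequence, and that coding sequences are stored as DNA while siRNAs may be written as RNA. The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(sequence):
    """Upper-case the sequence and write it in the DNA alphabet (U -> T)."""
    return sequence.upper().replace("U", "T")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA or RNA sequence, as DNA."""
    return normalize(sequence).translate(_COMPLEMENT)[::-1]


def find_hit(sirna, coding_sequence):
    """Return 'identical', 'complementary' or None for one siRNA/CDS pair."""
    target = normalize(coding_sequence)
    if normalize(sirna) in target:
        return "identical"
    if reverse_complement(sirna) in target:
        return "complementary"
    return None


sirna = "AUGGCUAGCUAGGAUCCGAUU"  # made-up 21-nucleotide siRNA
cds_same_strand = "CC" + normalize(sirna) + "GG"
cds_other_strand = "GG" + reverse_complement(sirna) + "TT"
cds_unrelated = "ACGT" * 10

for name, cds in [("same strand", cds_same_strand),
                  ("other strand", cds_other_strand),
                  ("unrelated", cds_unrelated)]:
    print(name, "->", find_hit(sirna, cds))
```

Both non-`None` outcomes count as hits, because either strand of the double-stranded RNA could in principle pair with an unintended transcript.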
The database gathers the coding sequences of more than 50,000 genomes
from 17 different taxa of bacteria and fungi, for a total size of 200 Gb.
Bar plot of the taxa and their sequence abundance
The general formula for the number of 21-nucleotide-long siRNAs that
can be obtained from a sequence is the following: $n_{siRNA} = L_{target} - 21$
We target the RNA-dependent RNA polymerase (RdRp) and p21 sequences of
the beet yellows virus (BYV), which are 1302 and 408 nucleotides long
respectively. The number of different siRNA sequences that can be
obtained from these targets is therefore 1281 and 387 respectively.
Aligning all the siRNA sequences against the 50,000 assemblies took 36 hours. 8 hits were found in the entire database, from 8 different bacterial strains, and each time for only one siRNA among all those possible. This means that the probability of an off-target siRNA being produced is $P = 1/n_{siRNA}$, which is very low.
Figure 3: Graph of total hits with respect to the number of
sequences obtained
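The arithmetic behind these figures can be reproduced with a short Python sketch. It applies the two formulas exactly as written in this section to the two target lengths given in the text; the function and variable names are hypothetical.

```python
SIRNA_LENGTH = 21

# Target lengths in nucleotides, as stated in the text.
target_lengths = {"RdRp": 1302, "p21": 408}


def sirna_count(target_length, sirna_length=SIRNA_LENGTH):
    """Number of siRNA sequences, following n_siRNA = L_target - 21."""
    return target_length - sirna_length


def off_target_probability(target_length):
    """Probability of producing one given siRNA, following P = 1 / n_siRNA."""
    return 1 / sirna_count(target_length)


for name, length in target_lengths.items():
    count = sirna_count(length)
    probability = off_target_probability(length)
    print(f"{name}: n_siRNA = {count}, P = 1/{count} = {probability:.2e}")
```

The probability is below 0.1% for RdRp and below 0.3% for p21, assuming every possible siRNA is equally likely to be produced from the precursor.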
To take the analysis further, we examined the strains found as
potential off-targets and their geographical location (a geographical
filter is not yet included in the search and download program). It
appeared that none of the bacterial species and strains found as
off-targets can be found in Europe: all 8 of them are found in Asia
(mainly Korea, Japan and China). Combined with the low probability of
the off-target siRNA sequence being generated, this gives further
reassurance of how specific RNA interference is toward BYV.
If hits are found, then depending on their number and location, the
corresponding positions can simply be removed from the targeted sequence
(making it shorter), or the targeted window can be shifted, so that the
problematic sequences are never created.
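One possible way to pick a shortened or shifted target is sketched below in Python. It is an illustration of the idea only, not a feature of SafeRNA as described here: given the 0-based start positions of the problematic 21-nucleotide windows, it returns the longest stretch of the precursor that contains none of them in full. The function name and the example positions are hypothetical.

```python
def longest_hit_free_region(precursor_length, hit_starts, window=21):
    """Return (start, end), end exclusive, of the longest region that does not
    fully contain any problematic window starting at a position in hit_starts."""
    starts = sorted(set(hit_starts))
    # A region avoids the hit at position s if it begins after s
    # or ends before the last nucleotide of that window (s + window - 1).
    lower_bounds = [0] + [s + 1 for s in starts]
    upper_bounds = [s + window - 1 for s in starts] + [precursor_length]
    best = (0, 0)
    for low, high in zip(lower_bounds, upper_bounds):
        if high - low > best[1] - best[0]:
            best = (low, high)
    return best


# Made-up example: one problematic siRNA starting at position 100 of a
# 408-nucleotide target.
start, end = longest_hit_free_region(408, [100])
print(f"keep nucleotides {start} to {end - 1} ({end - start} nt)")
```

With no hits, the whole precursor is returned unchanged. With several hits, the function compares the stretch before the first hit, the stretches between consecutive hits and the stretch after the last one.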
Figure 4: Corrected hits (errors, low quality, species not
found in EU or US)
[1] Gavrilov K, Saltzman WM. Therapeutic siRNA: principles, challenges, and strategies. Yale J Biol Med. 2012 Jun;85(2):187-200. Epub 2012 Jun 25. PMID: 22737048; PMCID: PMC3375670.
[2] Hu B, Zhong L, Weng Y, Peng L, Huang Y, Zhao Y, Liang XJ. Therapeutic siRNA: state of the art. Signal Transduct Target Ther. 2020 Jun 19;5(1):101. doi: 10.1038/s41392-020-0207-x. PMID: 32561705; PMCID: PMC7305320.
[3] Assembly [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2012 – [cited 2024 August 09]. Available from: https://www.ncbi.nlm.nih.gov/assembly/
[4] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
[5] Kusstatscher P, Zachow C, Harms K, Maier J, Eigner H, Berg G, et al. Microbiome-driven identification of microbial indicators for postharvest diseases of sugar beets. Microbiome. 2019 Dec;7(1):1–12.