Overview
To screen patients with cardiovascular diseases for their risk of cancer, our project aims to develop a LIRA-based system to target miRNA that are both highly expressed in CVD patients and capable of promoting cancer. LIRA is a designed RNA sequence with stem-loop structures, which could be activated by its target RNA to initiate the expression of reporter gene. To establish an ideal LIRA system, our model group mainly need to solve the following four questions.
Question 1:
Which miRNA to target?
Question 2:
How to design LIRA to target miRNA?
Question 3:
How to improve the design of LIRA?
Question 4:
How to tailor LIRA-mediated reaction for samples with miRNA at different concentrations?
To answer
Question 1
, we applied differential expression analysis in GEO database to identify miRNA that are highly expressed in patients post myocardial infarction (MI) or with heart failure. After differential analysis of miRNA in cancer database from TCGA, we identified miR-210-3p and miR-142-3p as ideal targets, as they are both highly expressed in multiple types of cancer and in post MI patients. Additionally, we also verified that these two miRNAs could promote migration and invasion of cancer cells with wet lab experiments.
To address
Question 2
and
Question 3
, we designed single-arm LIRAs to target miR-210-3p or miR-142-3p initially. Next, we designed double-arm LIRAs to target miR-210-3p and miR-142-3p, which could only be activated in the presence of both miRNAs. To optimize the design of LIRA, we analyzed multiple indicators of LIRA, and used Multinomial Logistic Regression to evaluate the potential contribution of these indicators to the expected performance to LIRA. To improve model performance, we introduced Random Forest to identify key indicators that could affect the stability and function of LIRA.
To answer
Question 4
, we conducted kinetic modeling to simulate the reaction of LIRA-mediated expression of reporter gene to obtain predicted time and concentration curves of the reaction. Furthermore, using 3D curves, we simulated the reaction with miR-210-3p and miR-142-3p at different concentrations, and estimated suitable concentration of LIRA for detecting miRNAs at a variety of concentration.
In summary, our model work laid the ground for our wet lab experiments that generates the LIRA to target miR-210-3p and miR-142-3p, and paved the way for the future application of the LIRA in the screening of cancer risk in post-MI patients.
ldentify target miRNA for cancer screening in CVD patients
Epidemiology studies have revealed that patients with CVDs, such as myocardial infarction (MI), heart failure (HF) and atherosclerosis, have increased risk of getting cancer[1-4]. For instance, MI patients exhibit increased prevalence of a variety of cancers and also elevated mortality that are caused by cancer[1]. HF patients also show higher cancer risk compared to healthy individuals[5]. Experiments using animal models reveal that MI enhances tumor progression through circulating factors and immune reprogramming[6,7]. Especially, MI leads to aberrant miRNA secretion from the inflicted heart into blood, which could circulate to distant organs and contribute to cancer progression at distant organs[8]. To find good targets for our cancer screening for CVD patients, we analyzed database of CVD, including MI and HF, and databases of multiple cancers, to identify miRNA that show increased expression in both CVD and cancer, and then verified the function of the identified miRNA on cancer progression.
Analysis of miRNAs in MI and cancer databases
The whole process of our analysis of MI and cancer database is shown in Figure 1. First, we searched Gene Expression Omnibus (GEO) database using myocardial infarction and miRNA as key words and collected 318 datasets. Since our aim is to identify highly expressed miRNA in the blood of MI patients, we manually excluded 305 irrelevant datasets based on the sample sources, species and disease descriptions. To the remaining 13 datasets, we calculated the log2FC value of the miRNAs based on the ratio of the mean values of the patient samples and control and applied logarithmic transformation to the ratio[9]. We used Student's t-test to obtain P-values for each miRNA. Furthermore, to reduce the occurrence of false positives, we adjusted the P-values by comparing these P-values with the threshold selected for an acceptable false discovery rate (FDR)[10], and used adj.P value to refer FDR. We used the log2FC > 1 and adj.P Value < 0.05 as our screening criteria[11], and got 9 datasets that contained significantly upregulated miRNA.
Figure 1. Flowchart of miRNA analysis in MI and cancer database
Second, we summarized multiple epidemiology studies and identified 33 types of cancers showing faster progression or higher risk in CVD[12-16]. We conducted differential expression analysis of miRNA expression of these 33 types of cancer using the Cancer Genome Atlas Program (TCGA) database and obtained 22 cancer datasets that show significantly upregulated expression of miRNA using our screening criteria (log2FC > 1 and adj.P Value < 0.05). The names of these 22 types of cancers are listed in Table 1.
Table 1. The 22 types of cancer that show significantly upregulated expression of miRNA
Abbreviation Disease Name Abbreviation Disease Name
BLCA Bladder Urothelial Carcinoma LUAD Lung Adenocarcinoma
BRCA Breast Invasive Carcinoma LUSC Lung Squamous Cell Carcinoma
CESC Cervical Squamous Cell Carcinoma & Endocervical Adenocarcinoma PAAD Pancreatic Adenocarcinoma
CHOL Cholangiocarcinoma PCPG Pheochromocytoma & Paraganglioma
COAD Colon Adenocarcinoma PRAD Prostate Adenocarcinoma
ESCA Esophageal Carcinoma READ Rectum Adenocarcinoma
HNSC Head And Neck Squamous Cell Carcinoma SKCM Skin Cutaneous Melanoma
KICH Kidney Chromophobe STAD Stomach Adenocarcinoma
KIRC Kidney Renal Clear Cell Carcinoma THCA Thyroid Carcinoma
KIRP Kidney Renal Papillary Cell Carcinoma THYM Thymoma
LIHC Liver Hepatocellular Carcinoma UCEC Uterine Corpus Endometrial Carcinoma
We compared the significantly upregulated miRNAs in MI and cancers to identify miRNAs that are significantly upregulated in both diseases and obtained 24 miRNAs. We think that two criteria are especially important for the potential effects on cancer promotion of these 24 miRNAs during CVD:
1. log2FC values in MI. A high log2FC value of the miRNA in MI demonstrates a large increased amount of this miRNA in the blood of MI patients, which indicates that a large amount of this miRNA is likely to reach distant organs through circulation system and exerts its potential cancer-promoting effects at distant organs.
2. Types of cancers that show significantly upregulated expression of these miRNAs. MiRNAs that are highly expressed in multiple types of cancers could potentially influence the progression of multiple types of cancers. We would like to design a screening kit to identify MI patients with increased risk of getting of a variety of cancers, instead of a certain type of cancer.
We summarized this two key information of the 24 miRNAs in Table 2.
Table 2. miRNAs significantly upregulated in both MI and cancers
miRNA ID log2FC Value in MI Types of Cancers miRNA ID log2FC Value in MI Types of Cancers
miR-210-3p 6.18 13 miR-320b 1.49 1
miR-142-3p 3.51 10 miR-199a-3p 1.48 3
miR-455-5p 3.09 1 miR-362-3p 1.39 1
miR-483-3p 2.79 1 miR-582-3p 1.31 1
miR-199b-5p 2.31 3 miR-146b-5p 1.26 9
miR-629-3p 2.19 1 miR-455-3p 1.23 5
miR-330-5p 2.14 4 miR-486-5p 1.17 2
miR-483-5p 2.02 1 miR-28-5p 1.15 1
miR-576-5p 1.95 4 miR-671-5p 1.06 3
miR-107 1.82 1 miR-628-5p 1.05 1
miR-450b-5p 1.58 3 miR-142-5p 1.03 4
miR-199a-5p 1.52 1 miR-127-5p 1.01 4
To further evaluate the log2FC in MI and types of cancers of these 24 miRNAs, we applied a scatter dots graph to visualize these data (Figure 2). MiR-210-3p and miR-142-3p distinguish from other miRNAs in this graph, as they show both much higher log2FC values in MI and much more types of cancers that showed significantly upregulated expression of them than other miRNAs.
Figure 2. Scatter dots graph of the 24 miRNA on log2FC in MI and Types of Cancers
As shown in Table 2, miR-210-3p ranks the highest in log2FC value in MI (6.18), which demonstrate 73 folds of increased expression in MI patients, and show significantly upregulated expression in 13 types of cancers, with log2FC values in cancer ranged from 1.02 to 5.04. The log2FC values of miR-210-3p in 13 types of cancers are shown in Table 3.
Table 3. Types of cancers with significantly upregulated expression of miR-210-3p
Types of Cancers log2FC Value in Cancer
Cervical squamous cell carcinoma
and endocervical adenocarcinoma
5.04
Bladder Urothelial Carcinoma 4.80
Lung Adenocarcinoma 4.54
Lung Squamous Cell Carcinoma 4.51
Uterine Corpus Endometrial Carcinoma 3.14
Breast Invasive Carcinoma 3.14
Kidney Renal Clear Cell Carcinoma 3.07
Head and Neck Squamous Cell Carcinoma 2.52
Kidney Renal Papillary Cell Carcinoma 1.82
Esophageal Carcinoma 1.58
Stomach Adenocarcinoma 1.47
Prostate Adenocarcinoma 1.30
Liver Hepatocellular Carcinoma 1.02
miR-142-3p ranks the second highest in log2FC value in MI (3.51), which demonstrated 11 folds of increased expression in MI patients, and show increased expression in 10 types of cancers, with log2FC values in cancers range from 1.04 to 6.61. The log2FC values of miR-142-3p in 10 types of cancers are shown in Table 4.
Table 4. Types of cancers with significantly upregulated expression of miR-142-3p
Types of Cancers log2FC Value in Cancer
Rectum Adenocarcinoma 6.61
Colon Adenocarcinoma 6.55
Cervical squamous cell carcinoma
and endocervical adenocarcinoma
3.58
Lung Adenocarcinoma 2.29
Kidney Renal Clear Cell Carcinoma 2.03
Breast Invasive Cancer 1.87
Uterine Corpus Endometrial Carcinoma 1.27
Bladder Urothelial Carcinoma 1.24
Lung Squamous Cell Carcinoma 1.09
Head and Neck Squamous Cell Carcinoma 1.04
To summarize, our analysis shows that miR-210-3p and miR-142-3p are the best target miRNAs to identify MI patients with high risk of getting cancers.
Analysis of miRNA in HF and cancer databases
HF is a condition resulting from the impairment of cardiac contractility and/or diastolic function[17].Current study also reveals that HF can promote cancer[6]. Hence we also analyzed miRNAs that are highly expressed in HF patients in GEO database. The analysis process is shown in the flowchart (Figure 3).
Figure 3. Flowchart of miRNA analysis in HF and cancer database
We collected 158 datasets from GEO database using heart failure and miRNA as key words. After we manually excluded 151 irrelevant datasets based on the sample sources, species and disease descriptions, we got 7 datasets to analyze miRNA expression. Using log2FC > 1 and adj.P Value < 0.05 as screening criteria, we got 6 datasets that contain significantly upregulated miRNAs in HF. We compared the significantly upregulated miRNAs in HF with those in cancers, which was described in the previous section, we got 35 miRNAs that show significantly upregulated expression in both diseases. We summarized the log2FC in HF and types of cancers that show significantly upregulated expression of these miRNAs in Table 5.
Table 5. miRNAs significantly upregulated in both HF and cancers
miRNA ID log2FC Value in HF Types of Cancers miRNA ID log2FC Value in HF Types of Cancers
miR-944 5.60 1 miR-331-5p 1.75 1
miR-223-3p 4.45 5 miR-203a-3p 1.70 6
let-7f-5p 4.26 2 miR-628-5p 1.51 2
miR-15a-5p 3.54 4 miR-143-5p 1.47 1
miR-16-5p 3.46 5 miR-18a-5p 1.45 5
let-7g-5p 3.25 2 miR-15b-5p 1.45 3
miR-126-3p 3.20 2 miR-301a-3p 1.41 7
miR-19b-3p 2.60 4 miR-106a-5p 1.30 8
miR-451a 2.56 4 miR-20b-5p 1.24 4
miR-26b-5p 2.52 2 miR-17-5p 1.23 13
miR-330-5p 2.50 4 miR-205-5p 1.19 6
miR-4746-5p 2.32 2 miR-484 1.16 1
miR-455-3p 2.19 5 miR-642a-5p 1.13 1
miR-9-5p 2.19 7 miR-944 1.12 1
miR-142-3p 2.13 10 miR-7-1-3p 1.10 4
miR-1976 2.11 1 miR-425-5p 1.08 7
miR-200c-3p 2.01 10 miR-4662a-5p 1.06 1
miR-125a-3p 1.94 1
To further evaluate the log2FC in HF and types of cancers of these 35 miRNAs, we applied a scatter dots graph to visualize these data (Figure 4). Unlike the results of analysis in MI, these miRNAs all clustered in the lower-left part of the graph, demonstrating that no miRNA showed superior performances on both log2FC values in HF and types of cancers that showed significantly upregulated expression.
Figure 4. Scatter dots graph of the 24 miRNAs on log2FC in HF and Types of Cancers
In summary, we did not identify ideal miRNAs as targets for cancer screening from our analysis of HF database.
Selection of miR-210-3p and miR-142-3p as target miRNAs for cancer screening in MI patients
From our analysis of MI database, we found that miR-210-3p and miR-142-3p are two outstanding candidates biomarkers to identify MI patients with high risk of cancer. We used an ROC curve to evaluate the diagnostic performance of miR-210-3p and miR-142-3p on our collected raw dataset samples from MI (Figure 5). We found that combined ROC curve of miR-210-3p and miR-142-3p (AUC=0.78) showed better results than ROC of miR-210-3p alone (AUC=0.66) and ROC of miR-142-3p alone (AUC=0.60), which demonstrates that using miR-210-3p and miR-142-3p as target miRNA simultaneously could decrease the probability of false discovery rate of detecting MI than using these 2 miRNAs as target miRNA separately.
Figure 5. ROC Curves with miR-210-3p and miR-142-3p in MI
We also revisited our HF screening dataset to specifically analyze miR-142-3p and miR-210-3p. MiR-142-3p was identified from our analysis of HF dataset with log2FC value (2.13), which demonstrated a 5 folds increased expression (Figure 4). MiR-210-3p failed to be selected from our analysis of HF dataset due to the log2FC value (0.67) did not pass our screening criteria.
From our analysis, we suggest using both miR-210-3p and miR-142-3p together as targets to screen MI patients to identify those with high risks of getting cancer.
Verification of miR-210-3p and miR-142-3p in cancer cells
Through bioinformatic analysis, we identified miR-210-3p and miR-142-3p as potential targets for screening cancer risk in MI patients. We did literature research of these two miRNA, and found that they are involved in the regulation of migration and invasion of cancer cells[18-21] .To verify whether miR-210-3p and miR-142-3p can promote migration and invasion of cancer cells, mimics and inhibitors of miR-210-3p or miR-142-3p were transfected into human renal clear cell carcinoma cell line 786-o cells, and the transfected cells were tested by Wound Healing and Transwell assays.
Ⅰ. miR-210-3p
To verify the function of miR-210-3p in cancer cells, we did Transwell and Wound Healing assays with 786-o cells transfected with mimics of miR-210-3p (mimics) and inhibitor of miR-210-3p (inhibitor). In Wound Healing assay, we evaluated the relative migration levels through measuring the wound area of 786-o cells transfected with control, mimics and inhibitor of miR-210-3p. Statistical analysis of the Wound Healing assays shows that 786-o cells transfected with mimics of miR-210-3p migrated faster than control cells and 786-o cells transfected with inhibitor of miR-210-3p migrated slower than control cells (Figure 6). Next, we evaluate the invasion levels through counting the amounts of 786-o cells that colonized in the lower well of transwell assays. Statistical analysis demonstrated that 786-o cells transfected with mimics of miR-210-3p showed higher invasion capability than control cells and 786-o cells transfected with inhibitor of miR-210-3p showed lower invasion ability than control cells (Figure 7). To summarize, 786-o cells with overexpression of miR-210-3p have increased migration and invasion capabilities, demonstrating that miR-210-3p could promote the migration and invasion of cancer cells.
Figure 6. Wound Healing assays of 786-o cells transfected with control, mimics-210-3p and inhibitor-210-3p. (A) Representative images of Wound Healing assays. (B) Statistical analysis of relative migration levels in the wound healing assays (n = 6). Data are presented as means ± SEM. * indicates p < 0.05, and ** indicates p < 0.01. Standard error and statistics are shown in graph.
Figure 7. Transwell assays of 786-o cells transfected with control, mimics-210-3p and inhibitor-210-3p. (A) Representative images of Transwell assays. (B) Statistical analysis of relative invasion levels in the Transwell assay (n=5). Data are presented as means ± SEM. *** indicates p < 0.001. Standard error and statistics are shown in graph.
Ⅱ. miR-142-3p
Similarly, to verify the function of miR-142-3p in cancer cells, we did Transwell and Wound Healing assays with 786-o cells transfected with mimics of miR-142-3p (mimics) and inhibitor of miR-142-3p (inhibitor). In Wound Healing assay, we evaluated the relative migration levels through measuring the wound area of 786-o cells transfected with control, mimics and inhibitor of miR-142-3p. Statistical analysis of the Wound Healing assays shows that 786-o cells transfected with mimics of miR-142-3p migrated faster than control cells and 786-o cells transfected with inhibitor of miR-142-3p migrated slower than control cells (Figure 8). Next, we evaluate the invasion levels through counting the amounts of 786-o cells that colonized in the lower well of Transwell assays. Statistical analysis demonstrated that 786-o cells transfected with mimics of miR-142-3p showed higher invasion capability than control cells and 786-o cells transfected with inhibitor of miR-142-3p showed lower invasion ability than control cells (Figure 9). In summary, 786-o cells with overexpression of miR-142-3p have increased migration and invasion capabilities, demonstrating that miR-142-3p could promote the migration and invasion of cancer cells.
Figure 8. Wound Healing assays of 786-o cells transfected with control, mimics-142-3p and inhibitor-142-3p. (A) Representative images of Wound Healing assays.(B) Statistical analysis of relative migration levels in the wound healing assays (n = 6). Data are presented as means ± SEM. * indicates p < 0.05. Standard error and statistics are shown in graph.
Figure 9. Transwell assays of 786-o cells transfected with control, mimics-142-3p and inhibitor-142-3p. (A) Representative images of Transwell assays. (B) Statistical analysis of relative invasion levels in the Transwell assay (n=5). Data are presented as means ± SEM. *** indicates p < 0.001. Standard error and statistics are shown in graph.
Together, the results of transwell and Wound Healing assays demonstrated that increased expression of miR-210-3p and miR-142-3p in MI patients might enhance the invasion and migration of cancer cells, which supports our idea of developing a screening kit that targets miR-210-3p and miR-142-3p to MI patients with high risk of cancer.
Design LIRA to target miR-210-3p and miR-142-3p
To design a screening approach that can target miR-210-3p and miR-142-3p, we adapted from an innovative approach named Loop-Initiated RNA Activator (LIRA) that was used to detect viral RNA[22]. As shown in Figure 10, LIRA forms a stem-loop structure that mainly contains a recognition region, a ribosome binding sequence (RBS) and a start codon AUG. Without interaction with target RNA, the RBS and start codon in LIRA are kept in a close status in the stem structure, making them impossible to initiate translation of downstream reporter gene. The recognition region is a 31 nucleotides (nt) sequence that is reverse complimentary to the target RNA of LIRA. While 21 nt of the reverse complimentary sequence is designed to locate in the loop structure of LIRA, which is named as a*, 10 nt of the reverse complimentary sequence is designed to locate in the stem structure of LIRA, which is named as b*. As a result, the target RNA of LIRA can bind to both a* and b*, and the binding of the target RNA to b* destabilize the stem structure that locks RBS and start codon, which makes them to be exposed in an open status for the translation initiation of downstream reporter gene. Due to the complexity of secondary structure and complicated hybridization process of RNA, we used NUPACK software to design and analyze LIRA sequences in the subsequent study.
Figure 10. Illustration of LIRA system[22]
Design LIRAs to target miR-210-3p and miR-142-3p separately
To design LIRA sequences that target miR-210-3p and miR-142-3p separately, we first determined the composition and structure of each part of LIRA, and replaced the recognition region with reverse complementary sequence of miR-210-3p and miR-142-3p. We used the design function of NUPACK to design the rest of LIRA sequences. Next, we used the analysis function of NUPACK to simulate the hybridization process of LIRA, and the nucleotide bases in the mismatch region of the LIRA with abnormal structure were modified to achieve our design expectation, which is to keep RBS and AUG at a closed status in a stem region without interactions with target miRNA and release RBS and AUG to an open status in a single strand RNA with interactions with target miRNA. The procedure of our LIRA designing is listed below.
Figure 11. Parameter settings for single-arm LIRA design. (A) Set Domian LIRA Toehold With 1 Segment of Sub-Sequences. (B) Domian miRNA settings. (C) Strands settings. (D) Complex and Target Structure settings.
During the process of designing, we found that our situation differed from the case reported in the literature in that the target RNA used in the literature was 31nt, while the target miRNAs used by our team were 22nt and 23nt respectively[18]. Due to the difference in the length of the target RNA, we were unable to use the way of LIRA design reported in the literature directly. To determine the distribution of the recognition region on our LIRA, we defined a concept of stem-loop ratio: the ratio of the number of nucleotides of recognition region in the stem structure to the number of nucleotides of recognition region in the loop structure. We tried to design LIRA structure using different stem-loop ratios. We found that when the number of nucleotides of the recognition region in the stem is too large, it will partially overlap with the complementary sequence of RBS, which will lead to the exposure of the RBS sequence. Therefore, we designed LIRA to miR-210-3p with the stem-loop ratios from 0/22 to 13/9 and LIRA to miR-142-3p from 0/23 to 13/10. The hybridization of LIRA to their target miRNAs was also analyzed with NUPACK. The pairing results and related data are shown in Table 6 and Table 7.
Table 6. Hybridization results of 14 LIRA structures of miR-210-3p and related data
Stem-loop ratio Single-arm sequence Conformation of LIRA alone Conformation of LIRA hybridized with miR-210-3p
0/22 CUUGCUGGUGCUCCCUUUCCUGGGGUUCGGUCAUCAGCCGCUGUCACACGCACAGGAUAGGAGGAGAGGAGAGGAGACAUGUGUGAG
1/21 GAGUCGAAUCCUAAUUUUUCUGAGUACGAGUAUCAGCCGCUGUCACACGCACAGAUACGCGUACAGAGGAGAGAGAGGAUGAGGUUC
2/20 GUCUAGGAUGUGGGGUCUCUUAGCCGCCGGCUCAGCCGCUGUCACACGCACAGAGAGUGGGUGGAGAGGAGAUUAAAUAUGAUAGAC
3/19 CUCUCAGAUUUGAGCUCUUUUAGGUGUAGGUCAGCCGCUGUCACACGCACAGAAUGGCGUACACAGAGGAGAGGGAAAAUGAGAGAG
4/18 GAGUCGAGUGCGACAUCUCCUGAUGUCCGUCAGCCGCUGUCACACGCACAGAAACUGAGGGGCAAGAGGAGAUGAAGCAUGAGGGAC
5/17 GUCCGGGGUCAUGUCUCUUCUAACUUGCUCAGCCGCUGUCACACGCACAGAAAUGGGGUGUAGGAGAGGAGAGGGUUGAUGACGGAC
6/16 CCUGCGGAUCGUUGCUCUCCUGAUGUUUCAGCCGCUGUCACACGCACAGAAUGAGGGGCGGGUAAGAGGAGAGUGGGGAUGGGGGGG
7/15 CGGUGGAAUUGAACGUCUUCUGAUGGUCAGCCGCUGUCACACGCACAGGAGCUCCGGCGGCGGAAGAGGAGAUGCGCGAUGACACUG
8/14 CCUCGAGAUCUUUCAUUUCCUAACGUCAGCCGCUGUCACACGCACAGAAAAUGAGUGGGUGGUGAGAGGAGAUGGGAGAUGGUGGGG
9/13 GUGUAAAGUUUAGCUUUUUCUAAUUCAGCCGCUGUCACACGCACAGAAUGGCUAGGCGACUGGAAGAGGAGAGGAGAGAUGAUGUAC
10/12 CUGUCGGAUAUCUCAUCUUUUACUCAGCCGCUGUCACACGCACAGAUAAAAAGUCAGUCGCUGAAGAGGAGAUGGUAUAUGAGAUAG
11/11 CUACCAGGUAACAAGUUUUCUGUCAGCCGCUGUCACACGCACAGGAUAACACACACAGGGGUUGAGAGGAGAUUAUUUAUGAGGUAG
12/10 GUAUGGGAUGAAACAUUUUCUUCAGCCGCUGUCACACGCACAGGAUUUAAAGAAGGCGACGGUUAGAGGAGAUGGAUUAUGGUAUAC
13/9 GGCGGGAAUACAUAUUUUUCUCAGCCGCUGUCACACGCACAGUAAAAAAUAAAUUGACGGGGGCAGAGGAGAGGGGGGAUGAUCGUU
Table 7. Hybridization results of 14 LIRA structures of miR-142-3p and related data
Stem-loop ratio Single-arm sequence Conformation of LIRA alone Conformation of LIRA hybridized with miR-142-3p
0/23 GCUGCUGAUGAAACAUCUCCUAACAGCGUCAGUCCAUAAAGUAGGAAACACUACAGAGUGGGGGAGAGGAGAUGGCAAAUGUGUAGC
1/22 CUAAGAAGUCUCUGUUCUUCUAAGCAGGACGUUCCAUAAAGUAGGAAACACUACAGCGCGGUGCAGAGGAGAACGGAGAUGACGGGG
2/21 CCUACGGAUUGGGCGUCUUCUAAUCUGUCGGUCCAUAAAGUAGGAAACACUACAGACCAGCGGGAGAGGAGACGGAAAAUGCGAAGG
3/20 GUGGCUAAUCUGACGUCUUCUGAGAUCGCGUCCAUAAAGUAGGAAACACUACACGGACACGGUGAGAGGAGAUGAAAGAUGGGAAAC
4/19 GUUCGUAAUUCUGACUCUCUUAGUUGCCAUCCAUAAAGUAGGAAACACUACACCUGGGUGGGGAAGAGGAGAGUGUGAAUGGGGGGC
5/18 GAUAAAGGUGGGGGUUCUUCUGGCCGUCUCCAUAAAGUAGGAAACACUACACGCGUGGCGGCGGAGAGGAGAAUGGGGAUGGUUAUC
6/17 GAAGGGAGUGGAAUUUUUCUUGGCAGGUCCAUAAAGUAGGAAACACUACACCGAGGGGGGGGUGAGAGGAGAGACGUUAUGAGGGGC
7/16 GCGCUGAAUUAGUGUUCUCUUGGUAUUCCAUAAAGUAGGAAACACUACAGGGGAGGAUAGAAUAAGAGGAGAGCUGUAAUGAAAUGC
8/15 GAAGCAAGUACUAGAUCUUCUAAGAUCCAUAAAGUAGGAAACACUACACGAAGCGGGAAGGGUUAGAGGAGAUUGGGUAUGAGGUUC
9/14 GUAGUGAGUUAUCCGUCUCCUGAAUCCAUAAAGUAGGAAACACUACACACCCCGGGGGCUGGAUAGAGGAGACGUUUAAUGAACUGC
10/13 GGUCGGGAUUUAGCUUCUUCUGAUCCAUAAAGUAGGAAACACUACAUAGGGACAACUUAGUGGAAGAGGAGAAGGGAAAUGGCGAUC
11/12 GUGUUAAAUGUAAUGUUUUCUGUCCAUAAAGUAGGAAACACUACAUACGACGAAUACUAGGGGGAGAGGAGAUGGAACAUGAGAUAC
12/11 CCACCGAAUGCCUAUUCUCCUUCCAUAAAGUAGGAAACACUACACUGGGCCACGCGGGGGUAGGAGAGGAGAAAAAGCAUGAGGUGG
13/10 GCGUGAGAUCACCGUUCUUCUCCAUAAAGUAGGAAACACUACAAGGAUGUGAUACGAAGUGUAUAGAGGAGAGUCGGGAUGAGGGGU
We think that the greater the free energy of a LIRA, the easier it is for LIRA structures to be paired with miRNAs for translation; the smaller the free energy of a LIRA, the more stable the structure of LIRA is. Therefore, we chose free energy as an important index to analyze the hybridization of LIRAs to their target miRNAs. We calculated the changes of free energies of the LIRA before and after hybridization and fitted the changes of free energy to stem-loop ratio. From the fitting curve, it seems that there is no regularity between the changes of free energy and stem-loop ratio (Figure 12 and Figure 13).
Figure 12. Fitting curves of the free energy of LIRA to miR-210-3p
Figure 13. Fitting curves of the free energy of LIRA to miR-142-3p
Since the LIRA sequence designed by NUPACK algorithm in the unrestricted region is random, which causes large variations among these designed LIRA. Therefore, we designed 10 sequences for each stem-loop ratio and averaged the free energy values of these LIRA. A new fitting was performed with the averaged free energy and stem-loop ratio (Figure 14 and Figure 15). The new fitting curve demonstrates that there is no linear relationship between LIRA free energy and stem-loop ratio, indicating that LIRA would not become more or less stable as stem-loop ratio increase or decrease.
Figure 14. New fitting curve of miR-210-3p
Figure 15. New fitting curve of miR-142-3p
Next, we calculated the changes of free energy as the differences of free energy of LIRA before and after hybridization with its target miRNA, and analyzed the relation between LIRA free energy and the changes of free energy. We observed that the data had a clear linear relationship, so we linearly fitted it. The fitting results are shown in Figure 16 and Figure 17. The slope of the fitted curve is approximately equal to -1, which indicates that the free energy of LIRA after hybridization with its target miRNA is stable at a constant value. The free energy of LIRA hybridized with miR-210-3p is around -54 kcal/mol, and the free energy of LIRA hybridized with miR-142-3p is around -44 kcal/mol. These results show that free energy of LIRA before hybridization might be a more important indicator to determine the quality of a LIRA.
Figure 16. Linear fitting results of miR-210-3p
Figure 17. Linear fitting results of miR-142-3p
Furthermore, we averaged 10 LIRA free energies at the same stem-loop ratio, which could represent the LIRA free energy and hybridization free energy at certain stem-loop ratio. Subsequently, we performed a clustering with these 28 points, and observed a linear relationship between LIRA free energy and the averaged changes of free energy (Figure 18 and Figure 19).
Figure 18. miR-210-3p clustering results
Figure 19. miR-142-3p clustering results
From our analysis of LIRA to miR-210-3p and miR-142-3p, we concluded that stem-loop ratio has little effect on the free energy of LIRA. For our subsequent wet lab experiments, we chose 3 LIRA sequences for miR-210-3p with stem-loop ratios of 2/20, 7/15, and 10/12, and 3 LIRA sequences for miR-142-3p with stem-loop ratios of 3/20, 8/15, and 11/12. We selected these 6 LIRA sequences to verify whether stem-loop ratio indeed had no effect on the performance of LIRA. Meanwhile, since the free energies of these 6 LIRAs are very different, the influence of the LIRA free energy on the functions of LIRA can also be investigated (Table 8). In this regard, we expect that a stable LIRA structure is often accompanied by a lower free energy.
Table 8. 6 LIRA data
miRNA Stem-loop ratio Free energy of LIRA before
hybridization(kcal/mol)
miR-210-3p 2/20 -66.61
miR-210-3p 7/15 -62.10
miR-210-3p 10/12 -56.54
miR-142-3p 3/20 -54.83
miR-142-3p 8/15 -47.89
miR-142-3p 11/12 -41.10
Design double-arm LIRA to target miR-210-3p and miR-142-3p simultaneously
To target miR-210-3p and miR-142-3p simultaneously, we designed LIRAs with double-arm structure that can target two target RNA[22]. As shown in Figure 20, double-arm LIRA has two recognition regions, A* and B*. While A* is located on the short arm of the double-arm structure, B* is located on the long arm of the double-arm structure, which is connected to RBS and AUG. We think that putting one miRNA reverse complimentary sequence in A* and the other miRNA reverse complimentary sequence in B* would make our LIRA targeting two miRNAs. Our expectation is that only when the LIRA binds to the target miRNA A and then bind to the target miRNA B, it would expose the RBS and AUG in an open status to initiate the translation of the downstream reporter gene, and otherwise the RBS and AUG would be locked to at a closed status in stem region.
Figure 20. Double-arm LIRA structure[22]
We identified the composition of each part of double-arm LIRA and replaced the recognition region with sequences that were reverse complimentary to miR-210-3p and miR-142-3p. Utilizing the design function of NUPACK, we designed a primary double-arm LIRA structure and obtained the sequence. The following is the design procedures of LIRA sequences to detect miR-210-3p and miR-142-3p in NUPACK.
Figure 21. Parameter settings for double-arm LIRA design. (A) Set Domian LIRA Toehold With 1 Segment of Sub-Sequences. (B) Domian miRNA settings. (C) Strands settings. (D) Complex and Target Structure settings.
During the process of simulation, we utilized NUPACK to generate 100 double-arm LIRAs. First, we hybridized each double-arm LIRA to both miR-210-3p and miR-142-3p , and found that 32 double-arm LIRAs could only hybridize to a single miRNA. We analyzed the results of the remaining 68 LIRA that could hybridize to both miRNAs, and 21 LIRAs did not form a correct base-pairing structure between RBS and AUG. We then modified the bases between RBS and AUG in these 21 LIRAs according to the hybridization results to ensure that RBS and AUG could be successfully exposed after hybridization. Finally, we analyzed the sequences at other regions of these 21 LIRAs, and found that 15 of them would destabilize AUG and RBS when LIRA interacts with miR-210-3p alone. We finally get 6 ideal double-arm LIRA that meet the design expectation, e.g., the RBS and AUG remain in a close status without interaction with miR-210-3p and miR-142-3p, and keep in an open status with interactions with miR-210-3p and miR-142-3p. The simulation results of these 6 LIRAs are shown in Table 9.
Table 9. Hybridization results of 6 double-arm LIRA sequences to detect 2 miRNAs
Number Sequence Conformation of LIRA alone Conformation of LIRA hybridized with
miR-210-3p and miR-142-3p
1 AGCCUCAGCGGUCGACCCGCGGUCCAUAAAGUAGGAAACACUACAUUUGUGGGGAGCGGGGUAUAUCAUCCUCUCCUAGUCAGCCGCUGUCACACGCACAGCGGCUGGAGAGGAGAAUAUGAUAUACGGGGCCGAGGGGGCU
2 GUCCACGUUGAGCGUCCAGUCCAUAAAGUAGGAAACACUACAUUUGUGGAAAGGAUGGGUGCCUGUCCGAUCUCAGCUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAUGAUAUGGGCGCCGACAACGCAGAU
3 CGAUUUUCUCUUGCAGGAAUCCAUAAAGUAGGAAACACUACAUUUAUGGACGCCUGCCGGUCCCGUACAUUCUCAACUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAAUACAUGGGGCUGUCGAGAACUUCG
4 GCACGACGCCUUUCGAGUUUCCAAAAAGUAGGAAACACUACAUUUGUGGACUUUCGACUUAGUCAUCAACUCUCAACUUCAGCCGCUGUCACACGCACAGUGGUAGAAGAGGAGAGAGAAUGACUAAGCUGGUGGCGUGC
5 CGGUUCCGCUAAGGCAUCAUCCAUAAAGUAGGAAACACUACAUUUAUGGAACAUGCCCAUGUUUAUCCGUUCUCGACUUCAGCCGCUGUCACACGCACAGUGGCUGAAGAGGAGAAUAUAUGGAUAUGAUGGUGGCCCCG
6 GUCCACGUUGAGCGUCCAGUCCAUAAAGUAGGAAACACUACAUUUGUGGAAAGGAUGGGUGCCUGUCCGAUCUCAGCUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAUAAAAUGGGCGCCGACAACGCAGAU
Optimization of double-arm LIRA to miR-210-3p and miR-142-3p
Setting indicators by the characteristic of LIRA
To generate an ideal double-arm LIRA sequence to miR-210-3p and miR-142-3p, we would like to optimize the LIRA sequences that we designed in the previous section. To facilitate the optimization process, we need to identify the indicators that could predict the performances of LIRAs. For our analysis of the indicators, we focus on the bases between RBS and AUG, as we think that the status of these bases determines the status of LIRA mostly. In the case that these bases are paired with other bases to form stem structure, LIRA should be in a close status. On the contrary, in the case that these bases are unpaired, LIRA should be in an open status to initiate transcription of reporter genes. From NUPACK, we could get the "unpaired probability" for any base, which calculates the probability that such base is unpaired at equilibrium, and the "paired probability" for any base, which calculates the probability that the base is paired with another base at equilibrium. We define indicator "unlock" for each product for example complexes of LIRA and its target miRNA, by averaging the "paired probability" of bases between RBS and AUG in the product and indicator "lock" for LIRA only by averaging the "unpaired probability" of bases between RBS and AUG in LIRA.
Figure 22. Diagram of "lock" and "unlock" calculation process
1. Unlock
2. Lock
NUPACK could simulate the LIRA-mediated reaction and calculate the amount of each product from such reaction. The reaction with LIRA, miR-210-3p, miR-142-3p could generate multiple complexes, including "LIRA/miR-210-3p/miR-142-3p/complex", "LIRA/miR-142-3p/miR-210-3p/complex", "LIRA/miR-142-3p/complex", and "LIRA/miR-210-3p/complex", which we label as product 1 to 4 accordingly and calculated their "unlock" values. Among all these products, LIRA/miR-210-3p/miR-142-3p/complex is by far the dominant one, so we treat it as the main product.
Subsequently, we define indicator "unlock divide lock" and "unlock minus lock" using the "unlock" value of the main product, i.e., the LIRA/miR-210-3p/miR-142-3p/complex.
3. Unlock divide lock
4. Unlock minus lock
"unlock level average" is the average of "unlock" values of all products. "unlock level weighted" is the weighted average of "unlock" value of all products, weighted by their corresponding concentrations given by NUPACK.
5. Unlock level average
6. Unlock level weighted
We define the following four metrics using subtraction and division, mean and weighted mean respectively.
7. Unlock minus lock average
8. Unlock minus lock weighted
9. Unlock divide lock average
10. Unlock divide lock weighted
It should be noted that some indicators differ in how many products are included. Take "unlock", "unlock average" and "unlock weighted" as an example: "unlock" includes the main product only, "unlock average" includes the average "unlock" value of all products, "unlock weighted" count the weighted average "unlock" value of all products.
Select better performance LIRA by indicators
To explore the influence of the above 10 indicators on the simulation results of LIRA from NUPACK, which shows that whether LIRA meets the design expectation (See part 2, design of double-arm LIRA), we analyzed a variety of double-arm LIRA. In addition to the 6 LIRAs designed in previous section that meet the design expectation (Table 10), we also included 6 LIRAs that did not meet the design expectation as well. Sequences and conformation of the 6 LIRA which did not meet expectations are shown in the table below.
Table 10. Sequences and conformation of 6 LIRA that did not meet the design expectation
Number Sequence Conformation of LIRA alone Conformation of LIRA hybridized with
miR-210-3p and miR-142-3p
7 AGGACACAAGUAGCGGCACUCCAUAAAGUAGGAAACACUACAUUUAUGGAAAGCCGCGAGCUCCGUCUCGUCUCGGCUUCAGCCGCUGUCACACGCACAGCGGCUGGAGAGGAGACGUCAUGGAGCUCCCCUUGUAACCU
8 GCAUUGAUUCCAGUACCCCUCCAUAAAGUAGGAAACACUACAUUUAUGGACAGGUACCAAUCGCAUAAUCUUUCGGCUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAGAACAUGCGAUUGACGAAUCUUUGC
9 UCGGAGCGUGAAGAGGGUAUCCAUAAAGUAGGAAACACUACAUUUGUGGAGCCCCUCGUCCCGCGUUGGGUCUCGACUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGACCAUAUGCGGGACAACGCGUAGCGA
10 UGCGAGGCGGUUACAGAAAUCCAUAAAGUAGGAAACACUACAUUUGUGGACCUUUGUCGGUCGCAUAGCAUCUCGACUUCAGCCGCUGUCACACGCACAGUGGUUGAAGAGGAGAUGGGAUGUGACUGCUCCGUUGAGCG
11 CCGAAGACAAAACUCUCAAUCCAUAAAGUAGGAAACACUACAUUUAUGGAACGAGAGGGAGGUCAUUUCCUUUUGGCUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAGGCCAUGACCUCCGCUUGUCGACGG
12 CCCACCUUCCUCGUGCGUGUCCAUAAAGUAGGAAACACUACAUUUGUGGAGUCGCACCGCCGGUAUCAGAUCUCGGUUUCAGCCGCUGUCACACGCACAGCGGCUGAAGAGGAGAUCCAAUGCCGGUGCUGGGAGCCGGG
We call results that meet expectations as 1, and fail to meet expectations as 0. Multivariate regression is applied to analyze the effects of these 10 indicators on the predicted performance of LIRA, i.e., meeting or not meeting the design expectation. The results of multiple regression are shown in Table 11.
Table 11. Multiple regression result
Chi-square Pseudo R2 Significance AIC BIC
16.636 0.75 0.119 24.000 29.819
The multiple regression model has a high R2 value as 0.75, which means that the model has a good fitting effect. However, the P-value is 0.119, which means that the multiple regression model does not pass the significance test. We think that the large p-value is most likely caused by strong collinearity. To verify such collinearity, we used the Pearson correlation coefficient matrix to analyze the correlation among 10 indicators, which demonstrates that the collinearity is indeed very large (Table 12).
Table 12. Pearson correlation coefficient matrix for 10 indicators
Note:* means p<0.05 and ** means p<0.01. We highlighted the correlation coefficient value >0.7 with colors.
We find that the correlation between the group of indicators "unlock", "unlock level average" and "unlock level weighted" is stronger than the correlation between these 3 and other indicators. The same strong correlation exists in the group of indicators "unlock minus lock", "unlock minus lock average", and "unlock minus lock weighted", and in the group of indicators "unlock divide lock", "unlock divide lock average", and "unlock divide lock weighted".
We explored the reason for the strong correlation within different groups. Take the group of indicators "unlock", "unlock level average" and "unlock level weighted" as an example, these 3 indicators basically measure the same thing "unlock", and only differ in the way on how to include "unlock" values from different products. Since the main product, i.e., the LIRA/miR-210-3p/miR-142-3p/complex, comprises the vast majority proportion of all the products, which results in almost equal values for the three indicators, it could explain why the independent variants in model have strong collinearity.
To overcome the problem of collinearity, we use random forest which is not sensitive to collinearity to train the 10 metrics and results associated with unlock and lock. We believe that the indicators with higher R2 value in testing datasets could better help us to evaluate the performance of LIRA. The ratio of training set to test set is 0.8:0.2. Results are shown in Table 13.
Table 13. Random Forest result
Dependent Variable Dataset R2 MAE MSE RMSE MAD MAPE EVS MSLE
unlock training datasets 0.804 0.067 0.006 0.078 0.075 0.014 0.819 0.003
testing datasets 0.758 0.022 0.001 0.025 0.017 0.001 0.939 0
lock training datasets 0.814 0.045 0.003 0.058 0.051 NULL 0.825 0.002
testing datasets -0.711 0.094 0.009 0.095 0.086 0.016 -0.392 0.006
unlock level average training datasets 0.827 0.050 0.004 0.061 0.046 0.018 0.843 0.002
testing datasets -0.983 0.028 0.001 0.037 0.022 0.002 -0.792 0.001
unlock level weighted training datasets 0.811 0.094 0.014 0.117 0.07 0.052 0.827 0.008
testing datasets 0.231 0.227 0.068 0.26 0.308 0.021 0.232 0.03
unlock divide lock training datasets 0.804 1.964 10.705 3.272 0.876 0.13 0.804 0.31
testing datasets -19.792 1.643 2.854 1.689 1.701 0.05 -0.119 0.343
unlock minus lock training datasets 0.817 0.103 0.016 0.128 0.097 0.136 0.835 NULL
testing datasets -0.192 0.079 0.009 0.097 0.08 0.007 0.466 0.005
unlock minus lock average training datasets 0.825 0.041 0.002 0.05 0.036 0.142 0.844 NULL
testing datasets -0.219 0.034 0.002 0.039 0.029 0.007 0.311 0.001
unlock minus lock weighted training datasets 0.803 0.072 0.008 0.088 0.062 0.309 0.815 NULL
testing datasets -0.146 0.12 0.032 0.179 0.041 0.011 0.111 0.019
unlock divide lock average training datasets 0.912 1.011 1.484 1.218 0.854 0.089 0.923 0.196
testing datasets 0.911 1.028 1.452 1.205 1.028 0.006 0.935 0.034
unlock divide lock weighted training datasets 0.779 0.458 0.317 0.563 0.566 NULL 0.781 0.075
testing datasets -68.13 1.56 2.61 1.616 1.361 0.299 -3.65 0.649
As can be seen from the table, on the training set, the fitting effect of the model is significant, and the R2 value is generally high (>0.8), indicating that the model has achieved good training effect on each index on the training set. For most indicators, MAE, MSE, RMSE, MAD and MAPE values are low, so it can be considered that the deviation between the predicted value and the actual value of the model is small.
On the test set, indicator "unlock", "unlock level weighted" and "unlock divide lock level weighted" have R2 above zero, which indicates that these indicators have a higher accuracy in predicting the result of open and close. Particularly, the R2 of "unlock divide lock weighted" reached 0.911, thus we can get the conclusion that "unlock divide lock weighted" performs best among all the indicators.
We get weights of indicators on the performance of LIRA on NUPACK from random forest analysis, and list them in Table 14 below.
Table 14 The weight of the indicator to the result
Indicators unlock lock lnlock divide lock unlock minus lock unlock level average
weights 0.050 0.110 0.140 0.080 0.090
Indicators unlock level weighted unlock minus lock average unlock minus lock weighted unlock divide lock average unlock divide lock weighted
weights 0.130 0.130 0.060 0.080 0.130
The weight of "unlock divide lock" is larger than that of other indicators, which is followed closely by "unlock level weighted" and "unlock divide lock weighted". To measure the expected performance of LIRA quantitively, we calculated "expectation" for all LIRA according to the formula shown as follows, and listed in Table 15.
Figure 23. Calculation method for indicators' weight for expectation
Table 15. The weight of the indicator to the result
numbers 1 2 3 4 5 6
expectation 1.47 1.71 1.07 1.54 1.89 1.64
numbers 7 8 9 10 11 12
expectation 0.58 0.44 0.13 0.23 0.59 0.47
Therefore, we sort all LIRAs according to "expectation" from high to low, and find that LIRAs 2 and 5 have highest ranking score. Thus, we conclude that LIRA sequences 2 and 5 might meet our design expectation well, and should have high expression and low leakage in the wet lab experiments.
Outlook
In the subsequent analysis, we attempt to explore the effects of some of the LIRA's own parameters, such as base content, on indicators and whether such parameters could be used to predict the performance of LIRA on NUPACK, i.e., meeting or not meeting design expectation.
Next, we investigate the weights of parameters to indicators. We collected the parameters of LIRA sequence itself from NUPACK software, including "A","C","G","U" base content, and "A-U", "C-G", "G-U" base-pair content. Then the random forest model is used for 10 indicators. The weights of different parameters on different indicators are obtained as shown in Table 16 and Table 17 below.
Table 16. The weight of bases to the indicators
A C G U
unlock 0.111 0.474 0.095 0.319
lock 0.095 0.569 0.109 0.227
unlock divide lock 0.098 0.304 0.493 0.104
unlock minus lock 0.116 0.511 0.078 0.295
unlock level average 0.154 0.530 0.124 0.193
unlock level weighted 0.096 0.297 0.167 0.440
unlock minus lock average 0.088 0.569 0.096 0.247
unlock minus lock weighted 0.128 0.475 0.126 0.270
unlock divide lock average 0.116 0.287 0.473 0.124
unlock divide lock weighted 0.107 0.231 0.536 0.126
Table 17. The weight of the base pairs to the indicators
A-U C-G G-U
unlock 0.342 0.368 0.289
lock 0.443 0.215 0.342
unlock divide lock 0.430 0.231 0.339
unlock minus lock 0.364 0.268 0.368
unlock level average 0.387 0.240 0.373
unlock level weighted 0.400 0.355 0.244
unlock minus lock average 0.431 0.205 0.364
unlock minus lock weighted 0.382 0.282 0.332
unlock divide lock average 0.497 0.187 0.316
unlock divide lock weighted 0.412 0.211 0.377
To judge the effect of different parameters on the result of open or close accurately, we calculate by the following formula, multiply the weights respectively, and obtain the weights of the parameters for the result of unlock and lock.
Figure 24. Calculation method for parameters' weight for indicators
Table 18. The weight of the parameter to the result of unlock and lock
Paraments A% C% G% U%
weights 0.108 0.419 0.248 0.226
Table 19. The weight of the parameter to the result of unlock and lock
Paraments A-U% C-G% G-U%
weights 0.302 0.393 0.305
C% has weight of 0.1976, which is obviously higher than other parameters, and in Table 19, C-G% has weight of 0.393, which also confirmed that C% matters most in influencing the result.
Analysis of LlRA-mediated reaction
Since the core of our project is to use LIRA to detect miR-210-3p and miR-142-3p, a comprehensive and detailed understanding of the LIRA-mediated reaction is essential for the success of our project. Through simulation of the process and analysis of the kinetics of the LIRA-mediated reaction, we can get crucial information of this reaction, such as the trend of changes of substrates and peak time of the expression of the reporter gene, which could provide critical information for designing and optimizing of our project.
We simulated reactions mediated by the single-arm and the double-arm LIRA under both intracellular and extracellular conditions, and our simulation includes the following steps:
a. Write down the chemical equations of LIRA system;
b. Transform chemical equations into ODEs;
c. Collect parameter values;
d. Solve the equations using MATLAB;
e. Analyze results.
Simulation and analysis of intracellular reactions mediated by LIRA
The interactions between LIRA and its target RNA could change the conformation of LIRA, and initiate translation of downstream reporter gene. Since the reaction between single-arm LIRA and miR-142-3p has the same reaction process and parameters as the reaction between single-arm LIRA and miR-210-3p, we think that these two reactions have the same concentration-time curve and peak time of expression of reporter gene. Without loss of generality, we take miR-210-3p as an example for kinetic simulation below.
Schematic presentation of the intracellular reaction mediated by single-arm LIRA is shown in Figure 25
Figure 25. Intracellular single-arm LIRA reaction process
Reaction process of single-arm LIRA is listed in Figure 26
Figure 26. Equations of intracellular single-arm LIRA reaction
Since the transition from 210-CSIL to OSIL is rapid, we treated the product of the combination of CSIL210 and miR210 as OSIL210. Therefore, we carried out a reasonable simplification of the simulation, and the simplified reaction is shown in Figure 27.
Figure 27. Simplified equations of intracellular single-arm LIRA reaction
Based on the chemical reaction described above, the equations of ordinary differential equations with specific parameter values are listed in Figure 28.
Figure 28. ODEs of intracellular single-arm LIRA reaction
In the simulation, we set the initial concentration of PLMg and PLM210 as 0.01mol/L, and other substances as 0. We use MATLAB and ode45 function to get the numerical solution. As shown in Figure 29, the concentration of EGFP reaches the maximum value at 83 minutes and 1/2 of the maximum value at 11.5 minutes. The concentration of OSIL210 increases rapidly and reaches the maximum value of 0.00054mol/L at 160s. Due to the low transcription rate of PLMg and PLM210 plasmids, and the high reaction rate of OSIL210 to EGFP, the concentration of OSIL210 gradually decreases after 160s.
Figure 29. simulation of intracellular single-arm LIRA reaction
Next, we simulated the intracellular reaction mediated by double-arm LIRA, and the schematic presentation of this simulated reaction is shown in Figure 30.
Figure 30. intracellular double-arm LIRA reaction process
For the reactions with double-arm LIRA, we assume that miR-210-3p cannot bind to CTIL when miR-142-3p is not present. According to these reaction processes, we can list the reaction equation of the double-arm LIRA, as shown in Figure 31.
Figure 31. equations of intracellular double-arm LIRA reaction
According to the chemical reaction described above, the equations of ordinary differential equations with specific parameter values are listed below in Figure 32.
Figure 32. equations of intracellular double-arm LIRA reaction
In the simulation, we set the initial concentration of PLMag, PLM210 and PLM142 as 0.01mol/L, and other substances as 0. Similarly, we use MATLAB and ode45 function to get the numerical solution. As shown in Figure 33, the concentration of EGFP reaches the maximum value of 0.00118mol/L at 3 minutes. The concentration of OTIL210 increases and reaches a maximum value of 0.00064mol/L at 95s. Due the low transcription rate of PLMg and PLM210 plasmids and the high reaction rate of OTIL210 to EGFP, the concentration of OTIL210 gradually decreases after 95s.
Figure 33. Simulation of intracellular double-arm LIRA reaction
Simulation and analysis of extracellular reactions mediated by LIRA
Compared to the intracellular kinetic simulation, the extracellular reaction process does not have the step of transcription of miR-142-3p and miR-210-3p from plasmids. Since the intracellular and extracellular conditions are different, the modeling parameters are altered accordingly. The reaction between single-arm LIRA and miR-142-3p has the same reaction process and parameters as the reaction between single-arm LIRA and miR-210-3p, these two reactions have the same concentration-time curve and the same peak time. Without loss of generality, we take miR-210-3p as an example for kinetic simulation below.
Schematic presentation of extracellular reaction mediated by single-arm LIRA is shown in Figure 34.
Figure 34. Extracellular single-arm LIRA reaction process
Reaction process of single-arm LIRA are listed in Figure 35
Figure 35. Equations of extracellular single-arm LIRA reaction
Since the transition from 210-CSIL to OSIL is rapid, we treat the product of the combination of CSIL210 and miR-210 as OSIL210. Therefore, we carry out a reasonable simplification of this simulation, and the simplified reaction is shown in Figure 36.
Figure 36. simplified equations of extracellular single-arm LIRA reaction
According to the chemical reaction described above, the equations of ordinary differential equations with specific parameter values are listed below.
Figure 37. ODEs of extracellular single-arm LIRA reaction
In the simulation, we set the initial concentration of PLMg, PLM210 and PLM142 as 0.01mol/L, and other substances as 0. Similarly, we use MATLAB and ode45 function to get the numerical solution. As shown in Figure 38, the concentration of fluorescent protein EGFP reaches the maximum value of 0.0083mol/L at 30 minutes and 1/2 of the maximum value at 9 minutes. The concentration of OSIL210 increases rapidly and reached a maximum value of 0.00054mol/L at 160s. The concentration of OSIL210 gradually decreases after 160s.
Figure 38. Simulation of extracellular single-arm LIRA reaction
Next, we simulated the extracellular reaction mediated by double-arm LIRA, and the schematic presentation of this simulated reaction is shown in Figure 39.
Figure 39. Extracellular double-arm LIRA reaction process
The process of reaction mediated by single-arm LIRA is listed in Figure 40
Figure 40. Equations of extracellular double-arm LIRA reaction
According to the chemical reaction expression above, the equations of ordinary differential equations with specific parameter values are listed below.
Figure 41. ODEs of extracellular double-arm LIRA reaction
In the simulation, we set the initial concentration of PLMag, miR-210-3p and miR-142-3p as 0.01mol/L, and other substances as 0. Similarly, we use MATLAB and ode45 function to get the numerical solution. As shown in Figure 42, the concentration of EGFP reaches the maximum value of 0.0083mol/L at 30 minutes and 1/2 of the maximum value at 9 minutes. The concentration of OTIL210 increases rapidly and reaches a maximum value of 0.00054mol/L at 160s. The concentration of OTIL210 gradually decreases after 160s.
Figure 42. Simulation of extracellular double-arm LIRA reaction
Table 20. Explanation of abbreviation
Abbreviation Description
PLMg gate plasmid of CSIL
PLMag AND gate plasmid of CTIL
PLM210 input plasmid of miR-210
PLM142 input plasmid of miR-142
CSIL RBS closed single-input LIRA
CTIL RBS closed two-input LIRA
OTIL RBS open two-input LIRA
210-CSIL miR-210 & CSIL complex
210-CTIL miR-210 & CTIL complex
142-CSIL miR-142 & CSIL complex
142-CTIL miR-142 & CTIL complex
OSIL210 RBS open single-input LIRA formed by 210-CSIL
OSIL142 RBS open single-input LIRA formed by 142-CSIL
EGFP enhanced green fluorescent protein
degradated substance
Table 21. Parameters used in single-arm LIRA reaction process
Abbreviation Description Value(unit) Reference
ktrg gate plasmid transcription rate constant 1.1×10-3 (s-1) Baabu et al., 2021
ktr210 miR-210 input plasmid transcription rate constant 1.1×10-3 (s-1) Baabu et al., 2021
ktr142 miR-142 input plasmid transcription rate constant 1.1×10-3 (s-1) Baabu et al., 2021
kcb210 combine rate constant of miR-210 and CSIL 1×105 (M-1 ·s-1) Baabu et al., 2021
kcb142 combine rate constant of miR-210 and CSIL 1×105 (M-1 ·s-1) Baabu et al., 2021
ktl210 OSIL210 translation rate constant 0.3361 (s-1) https://2023.igem.wiki/ugm-indonesia/
ktl142 OSIL142 translation rate constant 0.3361 (s-1) https://2023.igem.wiki/ugm-indonesia/
kde210 miR-210 degradation rate constant 3×10-4 (s-1) Baabu et al., 2021
kde142 miR-142 degradation rate constant 3×10-4 (s-1) Baabu et al., 2021
kdeCSIL CSIL degradation rate constant 3×10-4 (s-1) Baabu et al., 2021
kdeOSIL OSIL degradation rate constant 3×10-4 (s-1) Baabu et al., 2021
Table 22. Parameters used in double-arm LIRA reaction process
Symbol Description Value(unit) intracellular Value(unit)
cell-free
ktrag AND gate plasmid transcription rate constant 1.1×10-3 (s-1) 1.1×10-3 (s-1)
ktlOTIL OTIL translation rate constant 0.03361 (s-1) 0.0168 (s-1)
kcb1 combine rate constant of miR-210 and CTIL 1×105 (m-1·s-1) 1×105 (m-1·s-1)
kcb2 combine rate constant of miR-210 and 142-CTIL 1×105 (m-1·s-1) 1×105 (m-1·s-1)
kdeCTIL CTIL degradation rate constant 3×10-4 (s-1) 3×10-4 (s-1)
kdeOTIL OTIL degradation rate constant 1.3×10-5 (s-1) 3×10-4 (s-1)
kde210CTIL 210-CTIL degradation rate constant 3×10-4 (s-1) 3×10-4 (s-1)
kde142CTIL 142-CTIL degradation rate constant 3×10-4 (s-1) 3×10-4 (s-1)
kdeEGFP EGFP degradation rate constant 0 (s-1)* 0 (s-1)*
3-Dimensional simulation graph
After conducting kinetic analysis of the reactions mediated by our double-arm LIRA, we started to think how to apply LIRA as a testing kit in practical situations. We encountered an issue which is that the concentrations of the two highly expressed target miRNAs in patient blood often vary, whereas in our kinetic simulation, we assumed the concentrations for both target miRNAs are equal. Consequently, it is questionable to simply take the kinetic simulation results as our reference for LIRA's application in real-world situation. To address this issue, we aim to introduce different concentrations of our target miRNAs in our LIRA reaction system to observe the corresponding dynamic changes in EGFP expression.
Firstly, we set the concentrations of miR-210-3p and miR-142-3p. We selected concentrations of 0.0025, 0.005, 0.01, 0.02, and 0.04 mol/L for our 2 target miRNAs, while keeping the LIRA concentration at 0.01 mol/L. Subsequently, we simulate the reactions between 2 miRNA and LIRA based on our extracellular kinetic model of reaction mediated by double-arm LIRA. From our simulation of extracellular reaction in Figure 40, we can see that the value of EGFP reaches its maximum in our reaction at approximately 30 minutes. Hence, we collected the EGFP values at 42 minutes in our reaction after changing the concentrations of our target miRNA and presented the results as scatter points in three-dimensional space, as shown in Figure 43.
Figure 43. The influence of the concentrations of miR-210-3p and miR-142-3p on EGFP expression in LIRA system
From Figure 43, we can see that the EGFP expression increases to high values as the concentrations of miR-210-3p and miR-142-3p increase. Additionally, the scatter points in the graph tend to cluster more densely at lower concentrations of miR-210-3p and miR-142-3p. To better understand the relationship between the concentrations of miR-210-3p and miR-142-3p and their corresponding EGFP values, we attempted to fit these scatter points.
A Quadratic Regression is commonly used for surface fitting models[23]. Therefore, we employed the fitting function as Z = a + bX + cY + dX² + eY² + fXY, where Z represents the dependent variable (EGFP), and X and Y are the independent variables (X: [miR-210-3p], Y: [miR-142-3p]). By substituting the data into this equation, we can obtain the values of each parameter. The specific parameter interpretations and results are presented in Table 23 below.
Table 23. Explanations and Values of Second-Order Polynomial Regression Parameters
Parameter Description Value
a constant term -0.000118
b coefficient of the linear term for X 0.3505
c coefficient of the linear term for Y 0.3288
d coefficient of the quadratic term for X - 7.42
e coefficient of the quadratic term for Y - 6.84
f Interaction coefficient between X and Y 3.76
Based on the calculations above, we obtained the regression equation: Z = -0.000118 + 0.3505X + 0.3288Y + 3.76XY - 7.42X² - 6.84Y². To understand the performance of our fitted equation, we conducted an Analysis of Variance (ANOVA) at a 95% confidence level (α = 0.05) for the terms in our regression. This analysis could assess the significance of our model terms and determine the terms that have statistically significant impacts on EGFP value. Tables 25 and 26 present the results of this analysis.
Table 24. ANOVA Results Summary for Regression Model Terms
Source DF Adj SS Adj MS F-Value P-Value
Model 5 0.000137 0.000027 12.78 0.000
Linear 2 0.000094 0.000047 22.01 0.000
[miR-210-3p] 1 0.000053 0.000053 24.64 0.000
[miR-142-3p] 1 0.000056 0.000056 25.99 0.000
Square 2 0.000047 0.000024 11.04 0.001
[miR-210-3p]*[ miR-210-3p] 1 0.000026 0.000026 11.95 0.003
[miR-142-3p]*[ miR-142-3p] 1 0.000022 0.000022 10.13 0.005
2-Way Interaction 1 0.000013 0.000013 5.88 0.025
[miR-210-3p]*[ miR-142-3p] 1 0.000013 0.000013 5.88 0.025
Error 19 0.000041 0.000002
Total 24 0.000177
Table 25. Regression model fit summary
Standard deviation R2 R2(adj)
0.0007532 96.44% 91.99%
Based on the results of the ANOVA, we can observe that the R² value is high, indicating a good fit of the model. Additionally, the coefficients for the constant term, the interaction term ([miR-210-3p] * [142-3p]), the two linear terms ([miR-210-3p] and [miR-142-3p]), and the two quadratic terms ([miR-210-3p] * [miR-210-3p] and [miR-142-3p] * [miR-142-3p]) are all statistically significant. This suggests, in our quadratic regression equation, the terms are not redundant or causing interference in the regression results. Finally, we simulate the surface fitting based on our equation and the result is visualized in Figure 44.
Figure 44. Surface Plot of EGFP expression
From Figure 44 we can see that, as the concentrations of the two miRNAs continue to increase, the reaction rate accelerates, and the observed EGFP levels also rise continuously. When the concentration of one miRNA remains constant, as the concentration of the other miRNA increases, the EGFP value first rise and then decline. To have a more detailed observation of the changes of EGFP value in specific concentration ranges, we have plotted a contour map as shown in Figure 45.
Figure 45. Contour Plot of EGFP expression
Figure 45 is a contour plot of EGFP levels at 30 minutes for the reaction between LIRA and two target miRNAs. The color intensity in the figure represents different numerical ranges of EGFP. Specifically, the gradient from light green to dark green indicates an increasing trend of the EGFP variable from low to high. It can be observed that there is a distinct high-value region in the upper right corner of the plot. In contrast, as we move towards the lower left corner, the EGFP levels continuously decrease, and a low-value region appears. This may be attributed to the incomplete reaction of LIRA due to lower concentrations of miRNAs, resulting in lower EGFP value. Furthermore, we can observe in the plot that the EGFP value lies within the range of 0.008-0.01 when miR-210-3p is within 0.025-0.04 mol/L and miR-142-3p is within 0.0275-0.04 mol/L.
We can apply this conclusion into practical scenarios. Assuming the LIRA testing kit we use has a LIRA concentration of 0.01 mol/L, and only as the EGFP value reaches between 0.008 and 0.01 will it be considered as positive and identified by colors. We can simulate two scenarios:
First, we can use LIRA to detect unknown concentrations of miRNA in blood samples. When the concentrations of miR-210-3p and miR-142-3p in the blood sample we are testing fall within the ranges of 0.025-0.04 mol/Land 0.0275-0.04 mol/L respectively, the LIRA testing kit will successfully show colors and indicating after 30 minutes.
Second, we can use miRNA with known concentrations to validate the quality of the LIRA testing kit. Suppose we have miR-210-3p and miR-142-3p with concentrations as a mol/L and b mol/L, respectively, and both a and b are within the concentration range of 0-0.04 mol/L. Given the regression equation above, we can get Z = -0.000118 + 0.3505X + 0.3288Y + 3.76XY - 7.42X² - 6.84Y².When we add both a and b to the LIRA testing kit and allow the reaction to proceed for 30 minutes, the EGFP value that are detected should be close to the value of -0.000118 + 0.3505a + 0.3288b + 3.76ab - 7.42a² - 6.84b².With the premises of disregarding error interference, If there is a significant deviation between the two results, it indicates that there may be an quality issue with the LIRA testing kit.
In summary, in the future application of LIRA detection, it is necessary to take whether the concentration of LIRA and the concentrations of miRNA are falling in matching ranges as considerations and adjusting the LIRA concentration based on the concentration of miRNA in actual blood samples. Due to the limitation of lacking the information on the concentration of miRNA in blood samples, we can not adjust the concentration of our LIRA in our test kit. However, we believe that along with the advancement of miRNA detection technology and through continuous adjustment and optimization of our LIRA, we can finally obtain an ideal screening kit to detect expression of miRNA in real samples.
Reference
Han H, Guo W, Shi W, et al. Hypertension and breast cancer risk: a systematic review and meta analysis[J]. 2017, 7: 44877.
Rasmussen Torvik LJ, Shay CM, Abramson JG, et al. Ideal cardiovascular health is inversely associated with incident cancer: the Atherosclerosis Risk In Communities study [J]. Circulation, 2013, 127 (12): 1270 1275.
Solomon SD, Wang D, Finn P, et al. Effect of candesartan on cause specific mortality in heart failure patients: the Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity (CHARM) program [J]. Circulation, 2004, 110 (15): 2180 2183.
Malmborg M, Christiansen CB, Schmiegelow MD, et al. Incidence of new onset cancer in patients with a myocardial infarction a nationwide cohort study [J]. BMC Cardiovasc Disord, 2018, 18 (1): 198.
Hasin T, Iakobishvili Z, Weisz G. Associated risk of malignancy in patients with cardiovascular disease: evidence and possible mechanism [J]. Am J Med, 2017, 130 (7): 780 785.
Meijers WC, Maglione M, Bakker SJL, et al. Heart failure stimulates tumor growth by circulating factors [J]. Circulation, 2018, 138 (7): 678 691.
Yang M, Li T, Guo S, et al. CVD phenotyping in oncologic disorders: cardio-miRNAs as a potential target to improve individual outcomes in revers cardio-oncology. J Transl Med. 2024;22(1):50. Published 2024 Jan 12.
Yuan Y, Mei Z, Qu Z, et al. Exosomes secreted from cardiomyocytes suppress the sensitivity of tumor ferroptosis in ischemic heart failure [J]. Signal Transduct Target Ther,2023,8(1):121.
Dembélé D, Kastner P. Fold change rank ordering statistics: a new method for detecting differentially expressed genes. BMC Bioinformatics. 2014;15:14. Published 2014 Jan 15.
Dai R, Zheng C. False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach. Biometrics. 2023;79(4):3497-3509.
Yang G, Yang L, Wang W, Wang J, Wang J, Xu Z. Discovery and validation of extracellular/circulating microRNAs during idiopathic pulmonary fibrosis disease progression. Gene. 2015;562(1):138-144.
Stocks T, Van Hemelrijck M, Manjer J, Bjorge T, Ulmer H, Hallmans G, Lindkvist B, Selmer R, Nagel G, Tretli S, Concin H, Engeland A, Jonsson H, Stattin P. Blood pressure and risk of cancer incidence and mortality in the Metabolic Syndrome and Cancer Project. Hypertension. 2012;59:802–810
Vinter N, Christesen AMS, Fenger-Gron M, Tjonneland A, Frost L. Atrial fibrillation and risk of cancer: a Danish population-based cohort study. J Am Heart Assoc. 2018;7:e009543
Conen D, Wong JA, Sandhu RK, Cook NR, Lee IM, Buring JE, Albert CM. Risk of malignant cancer among women with new-onset atrial fibrillation. JAMA Cardiol. 2016;1:389–396.
Ridker PM, Cook NR, Lee IM, Gordon D, Gaziano JM, Manson JE, Hennekens CH, Buring JE. A randomized trial of low-dose aspirin in the primary prevention of cardiovascular disease in women. N Engl J Med. 2005;352:1293–1304.
Rasmussen-Torvik LJ, Shay CM, Abramson JG, Friedrich CA, Nettleton JA, Prizment AE, Folsom AR. Ideal cardiovascular health is inversely associated with incident cancer: the Atherosclerosis Risk In Communities study. Circulation. 2013;127:1270–1275.
Banke A, Schou M, Videbaek L, et al. Incidence of cancer in patients with chronic heart failure: a long-term follow-up study. Eur J Heart Fail. 2016;18(3):260-266.
Zheng J, Cheng C, Xu J, Gao P, Wang J, Chen L. miR-142-3p Regulates Tumor Cell Autophagy and Promotes Colon Cancer Progression by Targeting TP53INP2. Chemotherapy. 2022;67(2):57-66.
Zhang Y, Ma S, Zhang J, et al. MicroRNA-142-3p promotes renal cell carcinoma progression by targeting RhoBTB3 to regulate HIF-1 signaling and GGT/GSH pathways. Sci Rep. 2023;13(1):5935. Published 2023 Apr 12.
Zhang X, Sai B, Wang F, et al. Hypoxic BMSC-derived exosomal miRNAs promote metastasis of lung cancer cells via STAT3-induced EMT. Mol Cancer. 2019;18(1):40. Published 2019 Mar 13.
Ren D, Yang Q, Dai Y, et al. Oncogenic miR-210-3p promotes prostate cancer cell EMT and bone metastasis via NF-κB signaling pathway. Mol Cancer. 2017;16(1):117. Published 2017 Jul 10.
Ma D, Li Y, Wu K, et al. Multi-arm RNA junctions encoding molecular logic unconstrained by input sequence for versatile cell-free diagnostics. Nat Biomed Eng. 2022;6(3):298-309.
Yong Z, Siyu Ye, Xianqi C, et al. Polynomial Response Surface based on basis function selection by multitask optimization and ensemble modeling[J].Complex & Intelligent Systems,2022,Vol.8(2): 1015-1034