This section demonstrates how to run single-cell RNA sequencing (scRNA-seq) workflows on existing h5ad files in our platform, Mo-BASE, covering batch correction, data processing, analysis, and visualization. The workflow is designed to help users process and interpret single-cell data efficiently.
Data Import
Import scRNA-seq data from h5ad files. The data can be loaded with the `anndata.read_h5ad` function, which returns an `AnnData` object.
Upload Data
Quality Control (QC)
Use quality control metrics (such as the number of genes detected per cell and the fraction of counts from mitochondrial genes) to filter out low-quality cells. We plot the most highly expressed genes and draw violin plots of the QC metrics, which help identify expression characteristics and potential issues in the samples. QC metrics can be calculated using `scanpy.pp.calculate_qc_metrics`.
Highly Expressed Genes Plot
Violin Plot
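To make the QC metrics concrete, the following dependency-free sketch computes, for each cell, the number of detected genes, the total counts, and the percentage of counts from mitochondrial genes, and then filters on them. The toy matrix, the gene names, and the thresholds (`min_genes`, `max_pct_mt`) are illustrative assumptions, not Mo-BASE defaults; on real data, `scanpy.pp.calculate_qc_metrics` reports comparable per-cell quantities.

```python
# Toy count matrix: rows are cells, columns are genes (hypothetical values).
genes = ["CST3", "NKG7", "MT-CO1", "MT-ND1"]
counts = [
    [10, 5, 1, 0],  # healthy-looking cell
    [0, 1, 8, 9],   # dominated by mitochondrial counts
    [3, 0, 0, 0],   # only one gene detected
]


def qc_metrics(row, gene_names):
    """Return per-cell QC metrics for one row of raw counts."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    # Assumption: mitochondrial genes are recognized by the "MT-" prefix.
    mt = sum(c for c, g in zip(row, gene_names) if g.startswith("MT-"))
    pct_mt = 100.0 * mt / total if total else 0.0
    return {"n_genes": n_genes, "total_counts": total, "pct_mt": pct_mt}


def passes_qc(m, min_genes=2, max_pct_mt=20.0):
    """Hypothetical thresholds; tune them per dataset."""
    return m["n_genes"] >= min_genes and m["pct_mt"] <= max_pct_mt


metrics = [qc_metrics(row, genes) for row in counts]
kept_cells = [i for i, m in enumerate(metrics) if passes_qc(m)]
```

In this toy example only the first cell survives: the second is rejected for its high mitochondrial fraction and the third for having too few detected genes.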
Data Normalization
After quality control, we normalize the data to adjust for differences in sequencing depth and make expression values comparable across cells. This step helps ensure that the expression values reflect biological variation rather than technical artifacts. We typically use methods like `scanpy.pp.normalize_total` to bring the data to a common scale.
Comparison before and after normalization
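The idea behind total-count normalization can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the scanpy implementation: the helper names `scale_to_target` and `log1p_transform` and the `target_sum` value are assumptions chosen for the example.

```python
import math


def scale_to_target(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normalized.append([c * scale for c in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) element-wise, a common follow-up to scaling."""
    return [[math.log1p(v) for v in row] for row in matrix]


# Two cells with the same composition but a 10x difference in depth.
counts = [[1, 3], [10, 30]]
norm = scale_to_target(counts, target_sum=100)
logged = log1p_transform(norm)
```

After scaling, the two cells have identical profiles, which is the point of the step: the difference in sequencing depth no longer shows up as a difference in expression.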
Dimensionality Reduction
Perform dimensionality reduction using PCA, then use UMAP or t-SNE to visualize the distribution of cells and cell populations.
PCA (colored with CST3)
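For readers who want to see what PCA computes, here is a minimal sketch that finds only the first principal component by power iteration on the covariance matrix. It is a teaching aid under simplifying assumptions (tiny dense data, a fixed starting vector, no handling of constant data), not the solver used by scanpy or Mo-BASE.

```python
import math


def first_principal_component(data, n_iter=200):
    """Return (direction, scores) of the first PC via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # fixed start; fails if orthogonal to the leading direction
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered cells onto the leading direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical cells lying exactly on the line y = 2x.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(data)
```

Because the toy cells lie on a single line, the recovered direction is proportional to (1, 2) and the scores order the cells along that line. UMAP and t-SNE are then typically run on the leading PCA coordinates rather than on the raw expression matrix.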
Clustering Analysis
Apply a clustering algorithm such as Leiden or Louvain to identify cell subpopulations. The resulting clusters are the basis for comparing functional differences among cell types.
Leiden
Louvain
Clustering with Leiden or Louvain
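Louvain and Leiden both search for a partition of the cell neighborhood graph that scores well on a quality function such as modularity. The sketch below only evaluates modularity for a given partition of a small hypothetical graph; it does not implement either search algorithm, and the exact quality function and resolution used in Mo-BASE are not assumed here.

```python
def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    internal = {}  # number of edges with both ends in the same cluster
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    degree_sum = {}  # total degree per cluster
    for node, deg in degree.items():
        c = membership[node]
        degree_sum[c] = degree_sum.get(c, 0) + deg
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles of cells joined by a single edge (hypothetical graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph into its two triangles scores higher than lumping all cells together, which is the kind of partition these algorithms are designed to find.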
Batch Correction
If your data comes from multiple experimental batches, use a method such as BBKNN, ComBat, or Harmony to reduce batch effects before clustering.
Clustering without batch correction
Clustering with BBKNN
Clustering with ComBat
Clustering with Harmony
Batch Correction
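To illustrate what a batch effect is and what a correction does, the following sketch removes a per-batch shift in the mean of a single gene. It is a deliberately simplified stand-in and is not ComBat, BBKNN, or Harmony: ComBat, for instance, also models per-batch scale and uses empirical Bayes shrinkage. The function name and the toy values are assumptions for the example.

```python
def center_batches(values, batches):
    """Shift each batch so that its mean matches the overall mean (one gene)."""
    overall = sum(values) / len(values)
    sums, sizes = {}, {}
    for v, b in zip(values, batches):
        sums[b] = sums.get(b, 0.0) + v
        sizes[b] = sizes.get(b, 0) + 1
    # Offset of each batch mean from the overall mean.
    offsets = {b: sums[b] / sizes[b] - overall for b in sums}
    return [v - offsets[b] for v, b in zip(values, batches)]


# One gene measured in six cells; batch B is shifted up by 10 (hypothetical).
expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_batches(expr, batch)
```

Note that such a naive shift would also erase genuine biological differences if they happen to coincide with batch, which is one reason the dedicated methods above are preferred in practice.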
Differential Expression Analysis
After identifying cell clusters, we perform differential expression analysis to determine which genes are differentially expressed between cell types or conditions. This helps in understanding the functional roles of different cell populations. We typically use methods like `scanpy.tl.rank_genes_groups` to identify marker genes for each cluster.
Rank Genes Groups with Wilcoxon
Rank Genes Groups with T-test
Rank Genes Groups with Logreg
Differential Expression Analysis with different methods
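As a rough illustration of the statistic behind the Wilcoxon option, the sketch below computes the rank-sum (Mann-Whitney) U statistic for one gene, comparing cells in one cluster against the rest, together with a z-score from the normal approximation. It is a simplified sketch: the z-score omits the tie correction, and p-value adjustment and fold changes are not reproduced. The function name and input values are hypothetical.

```python
import math


def rank_sum_test(group, rest):
    """Return (U, z) for group vs rest; z uses the normal approximation."""
    values = list(group) + list(rest)
    pooled = sorted((v, i) for i, v in enumerate(values))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        # Extend the run while neighboring values are tied.
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average of the 1-based ranks
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        start = end + 1
    n1, n2 = len(group), len(rest)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return u, (u - mean_u) / sd_u


# Expression of one gene in a cluster versus all other cells (toy values).
u_stat, z_score = rank_sum_test([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

A large positive z-score indicates that the gene tends to be more highly expressed in the cluster than elsewhere, which is what makes it a marker candidate.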
Enrichment Analysis
To understand the biological significance of the differentially expressed genes, we perform enrichment analysis. This process identifies biological functions, pathways, or gene sets that are over-represented in the gene list, using resources such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Enrichment analysis places the findings in their functional context.
Enrichment Barplot
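Over-representation is commonly assessed with a hypergeometric test. The sketch below computes that p-value for a single gene set using only the standard library; the function and parameter names are assumptions for illustration, no multiple-testing correction is applied, and the enrichment tool used by Mo-BASE may compute its statistics differently.

```python
from math import comb


def enrichment_pvalue(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_background: genes in the background
    n_in_set:     background genes annotated to the gene set
    n_selected:   genes in our list (e.g. marker genes)
    n_overlap:    genes in our list that belong to the gene set
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Tiny worked case: 4 background genes, 2 in the set, 2 selected, both overlap.
p_small = enrichment_pvalue(4, 2, 2, 2)
```

In the worked case there are six ways to pick two genes out of four and only one of them picks both genes in the set, so the p-value is 1/6.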
Gene Annotation
Once we have identified marker genes, we perform gene annotation to assign biological functions and pathways to these genes. This can be done using public databases such as Ensembl or the Gene Ontology. Annotating genes allows us to interpret their roles in biological processes and diseases.
Gene Annotation Dotplot
Gene Annotation Groups Dotplot
Gene Marker
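Dot plots of this kind typically encode two numbers per gene and group: the fraction of cells in the group that express the gene (dot size) and the mean expression in the group (dot color). A minimal sketch of those two quantities, with hypothetical expression values and cluster labels, could look like this:

```python
def dotplot_stats(expr, groups):
    """Per group: (fraction of cells with expr > 0, mean expression)."""
    by_group = {}
    for value, g in zip(expr, groups):
        by_group.setdefault(g, []).append(value)
    return {
        g: (sum(1 for v in vals if v > 0) / len(vals), sum(vals) / len(vals))
        for g, vals in by_group.items()
    }


# Normalized expression of one marker gene in six cells (toy values).
cst3 = [2.0, 0.0, 4.0, 0.0, 0.0, 1.0]
clusters = ["0", "0", "0", "1", "1", "1"]
stats = dotplot_stats(cst3, clusters)
```

Here the gene is both more frequently and more strongly expressed in cluster "0", so it would appear there as a larger, darker dot.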
Trajectory Inference
Finally, we apply trajectory inference methods to study the dynamic processes of cell differentiation. This approach helps in understanding how cells transition between different states and can reveal developmental pathways. Popular methods for trajectory inference include Monocle and Slingshot, which help to visualize these transitions in a biological context.
Gene Annotation Trackplot
Gene Annotation Groups Trackplot
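One ingredient of cluster-based trajectory methods is a tree that connects cluster centers; Slingshot, for example, starts from a minimum spanning tree over clusters before fitting smooth curves. The sketch below shows only that single step with Prim's algorithm. The centroid coordinates and cluster names are hypothetical, and this is not a reimplementation of Monocle or Slingshot.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns a list of (a, b) edges."""
    names = list(points)
    in_tree = {names[0]}  # start from the first cluster
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to a cluster outside it.
        _, a, b = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b))
        in_tree.add(b)
    return edges


# Cluster centroids in a 2-D embedding (hypothetical coordinates).
centroids = {
    "stem": (0.0, 0.0),
    "progenitor": (1.0, 0.0),
    "mature": (3.0, 0.5),
}
tree = minimum_spanning_tree(centroids)
```

The resulting tree links "stem" to "progenitor" and "progenitor" to "mature", which can be read as a candidate ordering of cell states once a root cluster is chosen.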