Experiments | USTC-Software

This section demonstrates how to run a single-cell RNA sequencing (scRNA-seq) workflow on existing h5ad files in our platform, Mo-BASE, covering data processing, batch correction, analysis, and visualization. The workflow aims to help users process and interpret single-cell data efficiently.

Data Import

Import single-cell RNA sequencing data from h5ad files. You can easily load the data using the `anndata.read_h5ad` function.
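As a minimal sketch of this step, the helper below wraps `anndata.read_h5ad` with a basic check on the file extension. The helper name `load_h5ad` and the file name in the usage note are hypothetical and not part of Mo-BASE; the import is deferred so the check runs even where `anndata` is not installed.

```python
from pathlib import Path


def load_h5ad(path):
    """Load an .h5ad file into an AnnData object (hypothetical helper)."""
    path = Path(path)
    if path.suffix != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got: {path.name}")
    # Deferred import: anndata is only needed once the path looks valid.
    import anndata

    return anndata.read_h5ad(path)
```

A call such as `adata = load_h5ad("pbmc.h5ad")` would return the loaded object, with cells in `adata.obs` and genes in `adata.var`.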

Upload Data

Quality Control (QC)

Use quality control metrics (such as the number of genes detected per cell and the fraction of counts from mitochondrial genes) to filter cells and ensure data quality. We inspect the data by plotting the most highly expressed genes and violin plots of the QC metrics, which help identify expression characteristics and potential issues in the samples. QC metrics can be calculated using `scanpy.pp.calculate_qc_metrics`.
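To make the metrics concrete, here is a dependency-free sketch of the per-cell arithmetic on a toy count matrix. The counts, the `MT-` prefix convention for mitochondrial genes, and the thresholds are illustrative assumptions, not values used by Mo-BASE; in practice `scanpy.pp.calculate_qc_metrics` computes comparable quantities on an AnnData object.

```python
# Toy count matrix: rows are cells, columns are genes (hypothetical values).
genes = ["MT-CO1", "MT-ND1", "CST3", "NKG7", "LYZ"]
counts = [
    [2, 1, 10, 0, 7],    # typical cell
    [30, 25, 3, 1, 1],   # dominated by mitochondrial counts
    [0, 0, 1, 0, 0],     # nearly empty
]


def qc_metrics(row, gene_names, mito_prefix="MT-"):
    """Return (genes detected, total counts, percent mitochondrial counts)."""
    n_genes = sum(1 for c in row if c > 0)
    total = sum(row)
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return n_genes, total, pct_mito


def passes_qc(metrics, min_genes=3, max_pct_mito=20.0):
    """Apply illustrative thresholds; real cutoffs depend on the dataset."""
    n_genes, _, pct_mito = metrics
    return n_genes >= min_genes and pct_mito <= max_pct_mito


metrics = [qc_metrics(row, genes) for row in counts]
kept = [i for i, m in enumerate(metrics) if passes_qc(m)]
```

With these toy values only the first cell survives filtering: the second is removed for its high mitochondrial fraction and the third for having too few detected genes.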

Highly Expressed Genes Plot

Violin Plot

Data Normalization

After quality control, we normalize the data to adjust for differences in sequencing depth and make expression values comparable across cells. This step helps ensure that the values reflect biological variation rather than technical artifacts. We typically use methods like `scanpy.pp.normalize_total`, which scales the counts of each cell to a common total.
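The idea behind total-count normalization can be sketched in a few lines of plain Python. This illustrates the arithmetic only and is not the scanpy implementation; the function below is a stand-in written for this example, and the `target_sum` value is chosen for readability.

```python
def normalize_total(counts, target_sum=10_000):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normalized.append([c * scale for c in row])
    return normalized


# Two cells with the same composition but a tenfold difference in depth.
counts = [
    [10, 30, 60],
    [1, 3, 6],
]
norm = normalize_total(counts, target_sum=100)
```

After scaling, both cells have the same profile, so the difference in sequencing depth no longer masks their identical composition.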

Comparison before and after normalization

Dimensionality Reduction

Perform dimensionality reduction using PCA, then use UMAP or t-SNE to visualize the distribution and populations of cells.

PCA (colored with CST3)

Clustering Analysis

Apply clustering algorithms such as Leiden or Louvain to identify cell subpopulations. This helps to understand functional differences among different cell types.
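Both Louvain and Leiden search for a partition of the cell neighborhood graph that scores well on a quality function, commonly modularity. The sketch below computes modularity for a given partition of a toy graph so the objective is easier to picture. The graph and labels are made up for illustration, and this is not the optimization procedure itself.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same cluster
    degree_sum = defaultdict(int)  # summed node degrees per cluster
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles of cells joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_split = modularity(edges, two_clusters)
q_merged = modularity(edges, one_cluster)
```

Splitting the graph into its two triangles scores higher than lumping all cells together, which is the kind of partition these algorithms aim to find.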

Leiden

Louvain

Clustering with Leiden or Louvain

Batch Correction

If your data come from multiple experimental batches, use methods like BBKNN, ComBat, or Harmony to correct for batch effects.

Clustering without batch correction

Clustering with BBKNN

Clustering with ComBat

Clustering with Harmony

Batch Correction

Differential Expression Analysis

After identifying cell clusters, we perform differential expression analysis to determine which genes are differentially expressed between cell types or conditions. This helps in understanding the functional roles of different cell populations. We typically use methods like `scanpy.tl.rank_genes_groups` to identify marker genes for each cluster.
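To give a feel for what a rank-based test measures, here is a dependency-free sketch of a Wilcoxon rank-sum z-score for one gene, comparing one cluster against the remaining cells. It uses the normal approximation without tie correction, and the expression values are invented. It illustrates the statistic and is not the scanpy code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group, rest):
    """z-score of the rank sum of `group` against `rest` (normal approximation)."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


# Expression of one gene in a cluster versus all other cells (toy values).
cluster = [5.0, 6.0, 7.0]
others = [1.0, 2.0, 3.0, 4.0]
z = rank_sum_z(cluster, others)
```

A large positive z-score means the gene tends to rank higher in the cluster than elsewhere, which is what makes it a candidate marker gene.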

Rank Genes Groups with Wilcoxon

Rank Genes Groups with t-test

Rank Genes Groups with Logreg

Differential Expression Analysis with different methods

Enrichment Analysis

To understand the biological significance of our differentially expressed genes, we perform enrichment analysis. This process identifies over-represented biological functions, pathways, or gene sets in our gene list using resources such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Enrichment analysis provides insight into the functional context of our findings.
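Over-representation is often assessed with a hypergeometric test. The text above does not say which test Mo-BASE uses, so the sketch below is an assumption meant only to show how "over-represented" can be quantified; the gene counts in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_background: genes in the background
    n_in_set:     background genes annotated to the pathway or term
    n_selected:   genes in the differentially expressed list
    n_overlap:    selected genes that are annotated to the pathway or term
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 20 background genes, 5 in the pathway, 4 selected, 3 of them in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

A small p-value indicates that the overlap is larger than expected by chance. Real analyses test many gene sets at once and therefore also need a multiple-testing correction.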

Enrichment Barplot

Gene Annotation

Once we have identified marker genes, we perform gene annotation to assign biological functions and pathways to these genes. This can be done using public databases such as Ensembl or Gene Ontology resources. Annotating genes allows us to interpret their roles in biological processes and diseases.

Gene Annotation Dotplot

Gene Annotation Groups Dotplot

Gene Marker

Trajectory Inference

Finally, we apply trajectory inference methods to study the dynamic processes of cell differentiation. This approach helps in understanding how cells transition between different states and can reveal developmental pathways. Popular methods for trajectory inference include Monocle and Slingshot, which help to visualize these transitions in a biological context.

Gene Annotation Trackplot

Gene Annotation Groups Trackplot