scRNA-seq Results and Analysis

1. First, we imported the raw scRNA-seq data from the uploaded .h5ad file into our analysis pipeline.

Data Import Demonstration

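To make the import step concrete, here is a minimal sketch using Scanpy; the filename uploaded_data.h5ad is a placeholder for the user's uploaded file, not a fixed path in our pipeline.

```python
import scanpy as sc

# Read the uploaded AnnData file (the filename is a placeholder).
adata = sc.read_h5ad("uploaded_data.h5ad")

# Quick summary: number of cells (n_obs), number of genes (n_vars),
# and the annotations stored alongside the expression matrix.
print(adata)
```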

2. Quality control metrics were calculated to assess data quality. We evaluated metrics such as the number of genes detected per cell and the percentage of counts from mitochondrial genes.

Quality Control Metrics
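The sketch below shows how such metrics can be computed with Scanpy. The "MT-" prefix assumes human gene symbols, and the filtering thresholds (200 genes per cell, 3 cells per gene, 20% mitochondrial counts) are common defaults used here for illustration rather than our exact settings.

```python
import scanpy as sc

# Flag mitochondrial genes ("MT-" prefix assumes human gene symbols).
adata.var["mt"] = adata.var_names.str.startswith("MT-")

# Per-cell QC metrics: genes detected, total counts, % mitochondrial counts.
sc.pp.calculate_qc_metrics(
    adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)

# Remove low-quality cells and rarely detected genes (illustrative thresholds).
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata = adata[adata.obs["pct_counts_mt"] < 20].copy()
```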

3. After filtering out low-quality cells, we normalized the data so that expression values are comparable across cells and samples.

Data Normalization
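As a sketch, a widely used Scanpy recipe normalizes each cell to the same total count, log-transforms, and restricts the matrix to highly variable genes; the target sum of 10,000 and the 2,000-gene cutoff are conventional defaults, not necessarily the platform's exact parameters.

```python
import scanpy as sc

# Library-size normalization: scale each cell to 10,000 total counts,
# then log-transform (log(1 + x)).
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Keep the normalized values for later plotting, then restrict the
# working matrix to the most variable genes.
adata.raw = adata
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
adata = adata[:, adata.var["highly_variable"]].copy()
```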

4. We conducted dimensionality reduction using PCA, followed by UMAP, to visualize the distribution of cells in the reduced space.

Dimensionality Reduction
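A sketch of this step in Scanpy; the number of neighbors and principal components are illustrative choices.

```python
import scanpy as sc

# Scale genes to unit variance (clipped at 10) before PCA.
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, svd_solver="arpack")

# Build a k-nearest-neighbor graph in PCA space, then compute the
# two-dimensional UMAP embedding from that graph.
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=30)
sc.tl.umap(adata)
sc.pl.umap(adata)
```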

5. Clustering analysis was performed with the Leiden algorithm on the nearest-neighbor graph built in PCA space, and the resulting clusters were visualized on the UMAP embedding, revealing distinct cell populations in the dataset.

Clustering Results
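A minimal sketch of the clustering step; the resolution value is illustrative, with larger values yielding more, smaller clusters.

```python
import scanpy as sc

# Leiden community detection on the neighbor graph computed earlier;
# resolution controls granularity (1.0 is a common starting point).
sc.tl.leiden(adata, resolution=1.0, key_added="leiden")

# Color the UMAP embedding by cluster assignment.
sc.pl.umap(adata, color="leiden")
```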

6. Differential expression analysis was conducted to identify marker genes for each cluster, which can provide insights into the biological characteristics of the identified cell types.

Differential Expression Analysis
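The sketch below uses the Wilcoxon rank-sum test, one common choice for this comparison; other tests (t-test, logistic regression) could be substituted.

```python
import pandas as pd
import scanpy as sc

# For each cluster, rank genes against all remaining cells.
sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")

# Collect the top 5 candidate marker genes per cluster into a table.
markers = pd.DataFrame(adata.uns["rank_genes_groups"]["names"]).head(5)
print(markers)
```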

7. Finally, we visualized the expression of key marker genes across the identified clusters using violin plots and heatmaps.

Marker Gene Expression Visualization
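A sketch of both plot types; the gene names here are hypothetical placeholders and would in practice be the top-ranked genes from the previous step.

```python
import scanpy as sc

# Hypothetical marker genes used purely for illustration.
marker_genes = ["CD3E", "CD14", "MS4A1"]

# Violin plots: per-cluster expression distributions for each gene.
sc.pl.violin(adata, marker_genes, groupby="leiden")

# Heatmap: expression of the same genes across cells, grouped by cluster.
sc.pl.heatmap(adata, marker_genes, groupby="leiden", swap_axes=True)
```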

8. Full results, including detailed metrics and analysis outputs, are available for review and further investigation.

Statistical Analysis of User Feedback

To better understand users' real needs and make our platform more practical and reliable, we conducted a trial activity at three universities: USTC, HFUT, and AHU. After trying out the platform, users could fill out our questionnaire, provide feedback, and suggest modifications. Between September 20th and September 24th, we collected a total of 154 questionnaires. Here is a statistical analysis of the questionnaire results:

In terms of personnel structure, 140 of the 154 participants were from USTC, and the remaining 14 were from HFUT and AHU. Most were first- and second-year students, mainly from biology-related majors.


Personal Information of Participants

Regarding the trial experience, the majority of users found our interface design and feature guidance to be excellent, with average scores of 4.64 and 4.68, respectively.


Evaluation of Interface Design and Guidance

In terms of specific functionality, most users believed that our platform could help them learn synthetic biology and RNA-seq analysis to some extent. However, many also felt that there is room for improvement in our platform.


Benefit Evaluation

Of course, we understand that a simple online trial cannot capture more professional, in-depth feedback. We therefore also reached out to several peers who conducted thorough trials and evaluations of our platform. For more details, please follow the link: Website Usage Experience

Challenges in Developing scRNA-seq Workflow

Initially, our goal was to create a comprehensive scRNA-seq analysis platform to streamline the workflow from data preprocessing to downstream analysis. We began by focusing on dimensionality reduction and clustering algorithms, such as PCA, UMAP, and Leiden clustering. However, we quickly encountered challenges when dealing with high-dimensional datasets and the integration of batch effect correction methods, which required complex computational pipelines and substantial computational resources.

Despite successfully implementing basic functions like quality control and dimensionality reduction, we faced difficulties when integrating batch correction methods like BBKNN and Harmony into the platform; a sketch of both approaches is given below. These methods are computationally intensive, especially when handling large datasets, resulting in long processing times and system instability. To improve user experience and processing efficiency, we had to simplify the workflow and provide users with preset parameters for the most common tasks, reducing the burden on computational resources.
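For reference, here is a minimal sketch of how the two methods plug into a Scanpy workflow. The batch column name is an assumed field in adata.obs; BBKNN additionally requires the bbknn package and Harmony the harmonypy package, and in practice only one of the two options would be applied.

```python
import scanpy as sc

# Both approaches start from a PCA embedding and assume that
# adata.obs["batch"] (an assumed column name) labels the batches.
sc.tl.pca(adata)

# Option 1: BBKNN replaces the standard neighbor graph with a
# batch-balanced one (requires the bbknn package).
sc.external.pp.bbknn(adata, batch_key="batch")

# Option 2: Harmony corrects the PCA embedding itself (requires
# harmonypy); the neighbor graph is then rebuilt on the corrected
# representation stored in adata.obsm["X_pca_harmony"].
sc.external.pp.harmony_integrate(adata, key="batch")
sc.pp.neighbors(adata, use_rep="X_pca_harmony")

sc.tl.umap(adata)
```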

Furthermore, one of the major hurdles was the lack of consistent datasets for validating our models. While several publicly available datasets provided good benchmarks, many of the specific cases we wanted to target required proprietary or unpublished datasets from collaborating labs. Unfortunately, due to the asynchronous nature of academic collaboration and access restrictions on certain datasets, we were unable to carry out the in-depth validation that the project required. This led us to reconsider certain modules and to focus instead on making the platform accessible and user-friendly for a broader range of users.