

Uncollated Databases Struggle to Handle Large Annual Data Volumes
Every year in the iGEM competition, hundreds of teams from around the world contribute numerous parts to the iGEM Parts Registry, which plays a key role in supporting synthetic biology research. However, the registry lacks professional maintenance, which leads to inconsistent and often low data quality. In addition, iGEM teams in China have few activities that facilitate inter-team communication.
Challenges Faced by iGEMers in Querying Parts
As noted above, the registry is somewhat disorganized, and some parts have only fragmentary information, for example missing experimental parameters or quality verification data. This makes gene component design and construction harder for participating teams, who must spend considerable time on inefficient parts retrieval. In addition, many laypeople interested in the field struggle with limited background knowledge and poor access to industry information, which hinders their professional growth and business opportunities. Beyond these problems, participants in China have few opportunities to interact outside of a handful of offline events such as the annual CCiC, making it difficult for teams to collaborate and for newcomers to integrate into this specialized field.

of respondents stated that they habitually use AI to solve problems
of respondents cited the ease of finding and integrating information as their reason for using AI
of respondents expressed interest in trying an AI model designed specifically for synthetic biology after we introduced our project
Chat Parts AI Agent
We developed a Q&A AI agent that connects a large language model (LLM) to a RAG database through the LangChain framework. Its goal is to simplify parts information retrieval by replacing traditional, time-consuming searches with a question-and-answer interface. RAG (retrieval-augmented generation) combines information retrieval with a generative model: uploaded documents are split into chunks, and a vector representation of each chunk is stored in a VectorStore. When a user enters a query, the query is vectorized and matched against the stored chunks, and the most relevant chunks are inserted into the prompt sent to the LLM. Grounding the answer in retrieved text makes the generated content more accurate and relevant.
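The retrieval half of this pipeline (chunk, vectorize, store, match, build the prompt) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's implementation: the real agent uses LangChain with a learned embedding model and an LLM, whereas this sketch substitutes a toy term-frequency vector with cosine similarity for the embedding model and stops at prompt construction instead of calling an LLM. All function and class names here (`split_into_chunks`, `embed`, `ToyVectorStore`, `build_prompt`) are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (toy stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): term frequencies of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory vector store: keeps (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, documents: list[str], chunk_size: int = 40, overlap: int = 10) -> None:
        # Chunk every document and store each chunk alongside its vector.
        for doc in documents:
            for chunk in split_into_chunks(doc, chunk_size, overlap):
                self.entries.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Vectorize the query and return the k most similar chunks.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Insert retrieved chunks into the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "BBa_B0034 is a ribosome binding site commonly used in E. coli expression constructs.",
        "BBa_E0040 encodes green fluorescent protein GFP used as a reporter.",
        "BBa_R0010 is a LacI regulated promoter induced by IPTG.",
    ])
    question = "Which part is a promoter induced by IPTG?"
    print(build_prompt(question, store.search(question, k=1)))
```

In the real system, swapping `embed` for a learned embedding model is what lets the agent match paraphrased questions ("inducible lac promoter") that share few exact words with the stored text; the term-frequency version only rewards literal word overlap.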
To address the communication gap, we built the Synthetic Biology Forum, an online community for knowledge sharing, collaboration, and innovation in synthetic biology. It supports continuous learning, encourages cross-disciplinary collaboration that can speed up innovation, and gives researchers and students access to shared resources and networking opportunities.


Safety
Throughout the project, we prioritized data and software security to protect sensitive information and ensure reliable operation. As a follow-up, we also began drafting a biosafety white paper that provides additional ethical guidelines and best practices to support the safe and responsible application of synthetic biology technologies.


Human Practices
The Human Practices work of this project focuses on investigating the real experiences of iGEM participants to identify and confirm key issues, such as inconsistent data quality and the lack of collaboration platforms. Through extensive research and expert consultations, we found that these issues do have a significant impact on participants, which motivated us to work toward resolving them. We also organized public education initiatives to raise awareness of synthetic biology, so that the project not only addresses existing challenges but also contributes to broader public understanding of the field.