Software

SynBio Online Forum

An online forum has been created where users can register and post questions in designated sections. Others can access the RAG system's dialogue interface through the forum's search bar for information queries. Having a communication platform has obvious benefits for teams and even laypeople.

Benefits for having a forum

Enhance information retrieval efficiency:

By developing an AI agent information query system, we are able to enhance the efficiency of information retrieval for developers, researchers, and investors. This system allows iGEM teams to quickly search the forum to identify if other teams are working on similar ideas or if a problem they’re facing has already been solved. It helps reduce redundancy, promotes collaboration, and enables teams to build on existing solutions, accelerating innovation within the iGEM community.

Foster inter-team communication:

Currently, the exchange of project designs and experimental materials in the iGEM community mostly takes place through private contacts, which creates barriers to wider collaboration and knowledge sharing. Without a public and efficient communication channel, participants often struggle to connect with the right individuals or teams for support and cooperation. The introduction of a forum addresses this gap by offering a clear and organized platform for communication. It not only facilitates more precise and streamlined exchanges but also helps build a high-quality technical community where iGEMers can easily collaborate. This enhanced connectivity fosters better partnerships, allowing teams to benefit from shared expertise and resources. Additionally, if investors are interested in synthetic biology projects, they can use this platform to search for information and directly contact team members, greatly reducing communication costs.

Click here to enter the forum: ChatParts Community

Figure 1iGEM ChatParts Community


Offline Software Package

In addition to the online large language model, we have also developed an offline package to meet the additional needs of researchers.

Underlying models used

We trained a total of two models, the underlying models are Llama 3.1-8B and qwen 2.5-14B respectively.

ChatParts-llama3.1-8B and ChatParts-qwen2.5-14B are dialogue models specifically fine-tuned from Meta-Llama-3.1-8B and qwen 2.5-14B. The models are designed for the field of synthetic biology, with the goal of helping competition participants and researchers efficiently gather and organize pertinent information.The model is trained on an extensive dataset which includes more than 200,000 question-answer pairs, carefully compiled to encompass a broad range of topics in synthetic biology.

Sourced from various authoritative references including:

  1. iGEM Wiki Pages (2004-2023)

  2. Synthetic Biology Review Papers

  3. iGEM Parts Registry Documentation

Model & Dataset links

ChatParts-llama3.1-8B: Huggingface or ModelScope

ChatParts-qwen2.5-14B: Huggingface or ModelScope

ChatParts Dataset: Huggingface or ModelScope

Our local sofaware data package for chroma is also available here!


Advantages of offline software

1. Maintain the confidentiality of data during the research process

Researchers encounter major challenges related to data privacy violations. There are widespread worries about sensitive information being exposed or accessed without authorization, particularly when shared over public networks. To enhance control and reduce the risk of data leaks, many research institutions utilize intranets or secure, closed networks. Due to fears of compromising confidentiality, researchers frequently refrain from using platforms like GPT, as they want to avoid uploading private documents to external servers.

Our local software package successfully addresses this issue, allowing researchers to work offline without the need to upload their private data to external websites, thereby effectively reducing the risk of data leakage.


2. Provide opportunities for custom RAG repository

Researchers frequently encounter the challenge that, despite large models demonstrating robust generalization capabilities, they often fall short in providing the depth of domain-specific knowledge necessary for specialized tasks. While these models may offer general insights, their responses can tend to be overly simplistic or inaccurate when tackling highly technical or niche subjects. This limitation hinders their effectiveness in research, where a profound understanding and precise, detailed information are critical.

We have also taken this issue into account. In the local software package, users can upload files to the RAG repository, and subsequent inquiries to the language model will search for relevant information from this repository. For instance, if a user wants to specialize in environmental management within synthetic biology and feels that our trained model's performance is insufficient to address related issues, they can upload documents specific to their field, thus shaping the knowledge base in the direction they desire.


Source code repository: Github or Gitlab

Online demo link: demo

Figure 2Online Demo Presentation

Online tiral version only opens during Gemboree Days since the high calculation server fee.

User and coder guidence of offline models: Usage Document

The documentation provides users with basic introduction of our iGEM project and detailed deployment guide. In addition, we have introduced the functions of important code files in the sidebar, hoping to encourage communication between teams and future secondary development.