Integrated Human Practices

Overview

Project Background and Challenges

Currently, the Parts database, which is officially managed by iGEM (the International Genetically Engineered Machine competition), serves as a crucial resource for teams designing and constructing genetic components. However, because no dedicated personnel organize and maintain the database, the quality of its data is inconsistent, and some information is missing entirely. Participants who rely on the Parts database for project design therefore often struggle to obtain complete and accurate data, which slows their research progress and degrades their outcomes.

While studying how past participants have reorganized Parts data, we also discovered that iGEM participants in China face another significant challenge: the lack of a unified platform for communication and collaboration. Apart from a few offline events such as the annual CCIC (Conference of China iGEMers Community), opportunities for participants to communicate and collaborate are very limited. This scattered mode of interaction not only restricts the sharing of experience and the transfer of knowledge between teams but also makes it difficult for newcomers to integrate quickly into this specialized field.

Market Research and Target User Groups

To further expand our target user base, we conducted market research that involved visiting several relevant companies and holding in-depth discussions with professionals interested in synthetic biology. Our findings revealed that, in addition to iGEM participants, many researchers and professionals with an interest in synthetic biology face two challenges of their own:

They lack a systematic understanding of the basic knowledge and application scenarios of synthetic biology, which creates significant obstacles in their exploration of the field.
They find it difficult to access integrated information about companies and industry research within the field of synthetic biology, which, to some extent, hinders their professional development and the discovery of business opportunities.

Solution: AI model with RAG system

Before settling on a technical solution, we researched the modern techniques commonly used for search-related tasks and summarized their advantages and disadvantages in the table below. We found that traditional search techniques, general large models, and the currently popular RAG technology each have their own strengths and weaknesses. However, for designing a user-friendly query system and a versatile AI assistant, the RAG system better meets the practical needs identified in our research.

Comparison Dimensions	RAG System	General Large Models	Traditional Information Retrieval Systems
Comprehensiveness and Relevance of Information	🌐 Comprehensive information with contextual relevance 📈 Extracts information from numerous reliable data sources, ensuring high relevance	🤖 Usually comprehensive but may lack relevance	📄 Provides links, requiring users to filter information themselves
Generation of Natural Language Responses	🗣️ Coherent responses 🌟 Combines retrieval and generation to provide more precise answers	🗣️ Coherent responses	🔗 Document links, requiring user processing
Dynamic Updates and Adaptability	🔄 Dynamic updates, low-cost rapid adaptation to new data 🚀 Faster response times	🔄 Updates anytime but require substantial data and separate training time	⏳ Updates have latency
User Experience	😊 High interactive experience, reduces filtering time 🔧 Can automatically integrate and process information in specific fields	😊 High interactive experience but may overgeneralize	😓 Requires users to process and filter information themselves
Information Accuracy and Reliability	✅ Reduces hallucination generation, ensures information reliability through retrieval 🎯 Increases accuracy	⚠️ May produce errors or hallucinations	✅ Reliable and accurate information. However, users need to filter and understand search results themselves
Handling Complex Queries and Professional Domains	🧐 Better at handling professional domain information 📊 Can extract relevant literature in specific fields	🧐 Model training data is more generalized, limited in handling niche professional information	📚 Can generally meet complex and professional needs. However, comprehensive issues still require human organization and summarization of search results, resulting in lower query efficiency
Computational Resources and Costs	💰 Relatively low computational resources and costs ⚖️ Combines retrieval to reduce computational resources required for pure generation	💰 High computational resources and training costs	💵 Lower cost
Information Transparency	🔍 Low transparency, difficult to trace information sources 🔗 However, can provide the original information sources retrieved, adding some transparency	🔍 Low transparency, difficult to trace AI-generated information sources	🔗 Highest transparency, each result comes from an actual database

Therefore, we have developed an AI-based solution specifically designed for synthetic biology. This solution is built on a Retrieval-Augmented Generation (RAG) system and incorporates model fine-tuning techniques, aiming to provide users with efficient and accurate knowledge support in the field of synthetic biology. To broaden the use of our AI model and address the communication difficulties between teams described above, we have also set up an online communication platform. It facilitates technical exchanges between teams and, as a shared space, allows more people to engage with and learn about the AI model.

This innovative platform not only meets the needs of iGEM participants but also offers a new channel for knowledge acquisition and exchange to a broader audience of synthetic biology enthusiasts and professionals. By integrating various resources and data, our platform is poised to become a powerful tool for advancing the development and collaboration within the synthetic biology field.
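The retrieve-then-generate idea behind a RAG system can be illustrated with a minimal sketch. This is not our production pipeline: it assumes a toy word-count cosine similarity in place of learned embeddings, a three-entry hypothetical knowledge base (the `doc-…` ids and texts are invented for illustration), and it stops at building the prompt, leaving the call to a language model out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; ids and texts are illustrative only,
# not real Parts database records.
DOCUMENTS = {
    "doc-promoter": "A promoter is a DNA sequence where transcription of a gene begins.",
    "doc-rbs": "A ribosome binding site controls how efficiently translation starts.",
    "doc-terminator": "A terminator is a DNA sequence that signals the end of transcription.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties are broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=2):
    """Combine the retrieved passages and the question into one prompt.

    A real system would send this prompt to a language model; that step
    is deliberately left out of this sketch.
    """
    doc_ids = retrieve(query, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What does a promoter do?", DOCUMENTS, top_k=1))
```

Because the retrieved passages are placed in the prompt together with their ids, the answer can point back to its sources, which is the transparency benefit noted for RAG in the comparison table. Adding a new entry to `DOCUMENTS` makes it retrievable immediately, without retraining any model.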

Principles of Integrated Human Practices

1. Understanding Local Issues

AZENTA: Current Status and Challenges of Plasmid Synthesis

We first visited AZENTA, a renowned life sciences company, to understand the current landscape of plasmid synthesis. During our in-depth discussions with their team, we learned that the field of plasmid synthesis is currently facing several key challenges. AZENTA's researchers highlighted that the existing plasmid synthesis process heavily relies on the manual search and comparison of offline databases. This process is not only time-consuming but also inefficient, often becoming a bottleneck in the workflow of researchers. As the pace of database updates fails to keep up with research advancements, researchers frequently spend a significant amount of time searching for and comparing data, which adversely affects their research efficiency.

AZENTA conducted a detailed analysis of the limitations of current solutions and engaged with us on how technological advancements could improve this process to enhance work efficiency. This insight prompted our research team to reflect on how modern technologies, particularly artificial intelligence and data integration techniques, could be utilized to optimize the data processing aspect of plasmid synthesis, ultimately better serving biological research.

Figure 1. XJTLU-Software in a learning discussion with AZENTA

Figure 2. Photo of AZENTA and XJTLU-Software

Professor Dechang Xu: Data Quality Issues in the iGEM Competition

During the research phase of our project, we also held discussions with Professor Dechang Xu. In our conversations, Professor Xu specifically pointed out the inconsistent quality of data in the Parts database used in past iGEM competitions. He noted that these data quality issues not only caused significant inconvenience for participants but also directly impacted the overall performance of their projects. Many participants have found that the Parts information they rely on is often inaccurate, forcing them to spend additional time and effort on data cleaning and verification.

Professor Xu suggested that in our project design, we should consider establishing an integrated and user-friendly database to address these data quality issues. His recommendation provided crucial guidance for our project, helping us realize that creating a high-quality, easily accessible, and user-friendly database is key to solving the existing problems.

Figure 3. A team member consulting Professor Xu for project suggestions

2. Define a Good Solution

After identifying the key issues that the project needed to address, we began to define a solution capable of effectively tackling these challenges. To this end, we drew on expert recommendations and conducted extensive user research to ensure that our solution would not only incorporate advanced technology but also genuinely meet the needs of its users.

Professor Xin Chen: Insights from Large Model Technology

During a lecture on the application of large model technology in the field of chemical materials, we had the privilege of listening to an inspiring presentation by Professor Xin Chen. Professor Chen elaborated on the immense potential of large model technology in addressing complex problems, particularly through its applications in the chemical materials domain. This lecture profoundly influenced our understanding, highlighting that large model technology is not only capable of handling intricate data processing tasks but also offers precise predictions and analyses through its advanced learning capabilities.

Professor Chen’s insights opened up new avenues of thought for our project, sparking the idea of incorporating large model technology into our solution. We realized that this technology could potentially help us better integrate and process data within the field of synthetic biology, thereby enhancing the overall performance of our project.

Figure 4. Participating in Professor Xin Chen's lecture

Survey: Clarifying User Needs

To ensure that our solution would meet the needs of our target users, we conducted a survey of individuals who have participated in the iGEM competition and students in the field of synthetic biology. The survey results revealed that over 70% of respondents were accustomed to using AI tools to assist in their research and project development. However, these respondents also highlighted that existing general-purpose AI tools often lack precision in specialized fields and fail to fully meet their needs. After we introduced our project, 80% of the respondents expressed their intention to try an AI model specifically focused on synthetic biology.

This feedback reinforced our determination to develop an AI model specifically focused on the field of synthetic biology. We recognized that only a targeted AI model could offer the level of accuracy and expertise required to effectively address the pain points that users encounter in their research.

Figure 5. Partial results of the questionnaire

Figure 6. Respondents' intention to use our AI model

Dr. Chunxiao Li: Agreement on the solution

To further optimize our technical approach, we engaged in a discussion with Dr. Chunxiao Li from the University of Science and Technology of China. Dr. Li, with her extensive experience in AI and data processing, provided valuable guidance that greatly benefited our project. She not only endorsed our idea of using large model technology but also offered crucial advice on model selection.

In particular, Dr. Li’s insights into data preprocessing and model fine-tuning helped us avoid many potential pitfalls in the early stages of the project, thereby clarifying our technical strategy. This exchange solidified our technical foundation, ensuring that the project's technical implementation could proceed smoothly and successfully.

Figure 7. XJTLU-Software exchanging views with Dr. Li

3. Realization of the solution

After defining the overall direction and technical pathway for the project, we began to delve into the specifics of our solution. During this process, we not only focused on the feasibility of the technical implementation but also paid close attention to details related to data processing and legal compliance. To ensure that our solution was both comprehensive and executable, we engaged in close collaboration with industry experts and relevant companies, seeking their professional feedback and advice. This thorough approach helped us refine our solution, ensuring it would meet both technical and regulatory standards while effectively addressing the needs of our users.

Shengran Weian: Key Feedback on Data Processing and Legal Compliance

During our interactions with Shengran Weian, a company specializing in intellectual property within the biological field, we received invaluable advice. Shengran Weian, with its extensive industry experience and deep legal knowledge, provided critical insights into data processing and model selection for our project. They emphasized that in biological research, the collection and organization of data are crucial steps. Accurate and comprehensive data not only form the foundation for model training but are also key to ensuring the reliability of the results.

Furthermore, Shengran Weian highlighted the importance of strict adherence to relevant laws and regulations when handling this data, particularly concerning data ownership. Because synthetic biology involves a large number of patents and other intellectual property protections, unauthorized use of data could lead to legal disputes and harm the project's future development. They strongly recommended that we prioritize legal compliance throughout data collection, organization, and usage, and suggested collaborating with a professional legal team to ensure that every step of the project aligns with relevant regulations.

The feedback from Shengran Weian has played a crucial role in guiding the subsequent development of our project. We recognized that in designing our solution, it is essential not only to focus on the technical implementation but also to ensure the legality and compliance of the data used. Based on their advice, we decided to strengthen data management within the project and establish strict data usage policies to ensure that all data sources are legal and fully authorized. Additionally, we will incorporate legal reviews at every stage of model development to ensure that our solution operates safely and effectively within the legal framework.

Figure 8. XJTLU-Software in a learning discussion with Shengran Weian

Figure 9. Photo of Shengran Weian and XJTLU-Software

4. Implementation & Evaluation

After establishing the basic framework of our solution, we moved into the implementation and evaluation phase. This stage is crucial not only for translating our concepts into tangible results but also for continuously optimizing and refining the solution. Throughout this phase, we engaged in deep collaboration with leading teams and companies in the industry, regularly reflecting on and improving our approach. This iterative process was essential in ensuring that the project would achieve success in practice, allowing us to address any challenges and adapt our solution to meet real-world demands effectively.

CCIC Conference: Exchange and Optimization

During the CCIC (Conference of China iGEMers Community), we had the opportunity to engage in extensive exchanges with iGEM teams from across the country. These interactions allowed us to share the progress of our project and gain valuable insights from the experiences and innovations of other teams. Particularly in discussions with the judges, we received numerous constructive suggestions that helped us further clarify our project’s direction and identify areas for improvement.

One standout moment was our encounter with the team from Nanjing University, who introduced a novel approach to data processing. Their innovative thinking provided us with a fresh perspective, making us realize that there was still significant room for optimization in our data processing workflow. Inspired by their methods, we decided to integrate these techniques into our own project, which led to noticeable improvements in our practical implementation.

This conference not only boosted our confidence in the project but also motivated us to refine the technical details with greater precision, ultimately enhancing the overall quality of our work.

Figure 11. Photo with NJU-China and the judges after the CCIC presentation

Yanyin Tech. Company: Reevaluating Model Simulation

After the CCIC conference, we made a special visit to Yanyin Tech. Company to seek their professional opinions on model simulation. As a well-known bioinformatics software company with extensive experience in model development and application, Yanyin Tech. provided us with valuable insights into the strengths and weaknesses of existing model simulation tools in the market. During our discussions, they pointed out that there are already several well-established simulation tools available, prompting us to rethink the future direction of our project.

Their feedback led us to reevaluate our project's core competencies and future development path. We realized that directly competing with existing tools in the model simulation space might not be the best approach. Instead, we decided to focus on enhancing the professionalism and specificity of our model, particularly in its applications within synthetic biology. By integrating and innovating upon existing tools, we aim to create a more customized solution that better meets the specific needs of our users.

Figure 12. XJTLU-Software in a learning discussion with Yanyin Tech.

Figure 13. Photo of Yanyin Tech. and XJTLU-Software

Closed-Loop Thinking: The 3R Principle

Throughout the development and promotion of our project, we adhered to the 3R principle proposed by the iGEM community: Responsiveness, Reflection, and Responsibility. This closed-loop thinking provided us with a comprehensive logical framework, guiding us to maintain effective communication, engage in continuous reflection, and uphold our social responsibilities at every stage of the project.

Following this principle, we focused not only on the technical aspects of our project but also actively interacted with society, reflecting on the feedback we received, and striving to give back to the community. In the later stages of the project, we extended our efforts beyond the campus, engaging in deep discussions with professionals from various sectors to ensure that our project not only stands out in terms of scientific and technological innovation but also withstands scrutiny on social and ethical grounds.

Roundtable on AI, Biosecurity, and Bioethics

Upon completing our project, we became acutely aware of the potential biosafety and ethical issues that could arise from the application of AI technology in the biological field. To address these concerns, we collaborated with Jilin University, Nanjing University, Shanghai Jiao Tong University, and the University of Science and Technology of China to co-host the “Roundtable on AI, Biosecurity, and Bioethics.” This forum brought together experts and scholars from various fields to discuss the role of AI technology in areas such as biological data, agriculture, and public health, as well as the potential safety and ethical challenges these technologies may pose.

During the forum, experts shared their research and perspectives, focusing on how to ensure the safety and ethical standards of these technologies while advancing technological progress. Following the event, we organized the drafting of the "White Paper on AI, Biosafety, and Bioethics," which comprehensively summarized the forum's discussions and proposed ethical guidelines and safety considerations for the future application of these technologies. This white paper established a crucial ethical foundation for the responsible application of AI in the biological field, demonstrating our commitment to advancing technology while actively upholding our social responsibilities.

Figure 14. Photo of the online roundtable discussion with the NJU-China, SJTU-Software, USTC-Software, and JLU-CP teams

To learn more, download the white paper, "Biology Safety with Artificial Intelligence" (PDF).

Education and Outreach in Synthetic Biology

In addition to focusing on the technical aspects of our project, we actively engaged in the education and dissemination of synthetic biology. Recognizing the general public's limited awareness of synthetic biology, we collaborated with the XJTLU-CHINA team to organize a science outreach event at the Suzhou Customs Biosafety Museum, targeting primary and secondary school students. The event covered topics such as the management and prevention of invasive species and the protection of biodiversity. Through engaging explanations and interactive activities, we aimed to spark the students' interest in synthetic biology. These activities not only enhanced the youth's understanding of biosafety but also inspired their curiosity and enthusiasm for science.

Furthermore, we showcased our project’s achievements at the CCIC conference, where we used technical explanations and interactive demonstrations to further promote synthetic biology knowledge. Our poster presentations and live demonstrations attracted significant attention from the attendees, receiving widespread praise. Through these efforts, we hope to increase public understanding and appreciation of synthetic biology, ultimately contributing to the advancement of this field.

Figure 15. Science outreach at the Suzhou National Customs Biosafety Science Hall

Figure 16. Introducing our project to other teams at CCIC