1. Overview of Integrated Human Practices

In this project, HP was involved in every step of constructing the Prometheus model, from idea to implementation. It played a major role in linking the model's development to social needs, attracting more and more support for the model and sustaining its continuous improvement. Meanwhile, HP took on the task of promoting and publicizing synthetic biology, using a variety of approaches to help participants understand and appreciate it. Here are the highlights of our work:

1. We invited 50+ iGEMers to conduct a pre-survey to further investigate the problems existing in designing biological parts and use of the database.

2. We interviewed experts in computer science, biology and medicine, collected their professional opinions and guidance, and improved our model accordingly, achieving precise matching of functional requirements and part descriptions through the large language model.

3. We provide new solutions to promote the integration of synthetic biology and AI and maintain AI-related biosafety by holding and participating in multiple biology-related conferences.

4. We created the "North-South iGEM League" to gather inspiration for the fusion of artificial intelligence and synthetic biology and create a guide to promote the application of AI models to synthetic biology.

5. We visited research institutes and companies related to bioinformatics and gained opportunities to cooperate with research institutes and the government.

6. We attended the establishment ceremony of Changzhou Zhonglou Synthetic Biology and AI Institute. Prometheus, as a typical case of the combination of artificial intelligence technology and synthetic biology, attracted the attention of government leaders.

7. We carried out several exchanges and collaborations in Jiangsu province, collected feedback from different stakeholders, and used it to help complete the model's scoring function.

8. We encouraged students to pursue careers in synthetic biology by holding and participating in seminars, and brought together 30+ students to use, grade and give feedback on the Prometheus database.

2. Pre Investigation - Why do we start this project?

With the 20th anniversary of iGEM in 2024, we hope to summarize the achievements of iGEM over the past two decades and to create a project that helps future iGEMers. In the previous year, NJU-China 2023 noted that the iGEM parts database is not convenient to use, especially when searching for target parts. To verify whether this problem is common among iGEMers, NJU-China 2024 conducted a questionnaire survey of 55 iGEMers in the iGEM China community before we started our project.

Here are our questions and results.

"1. Have you used the iGEM parts database?"

"2. Have you used other parts/plasmid related databases or search engines?"

"3. If you chose "Yes" to the second question, what do you think are the advantages or disadvantages of the databases or search engines you have used?"

The following answers were written by users who selected "Yes" for question 2.

Number 3: "It is relatively complete, but many databases rely on designer uploads and cannot obtain the full sequence of some plasmids that are only mentioned in the literature."

Number 4: "It's very integrated."

Number 5: "The iGEM parts database only has the functions of the components in a particular project, and cannot access different functions in other projects."

Number 11: "Its data is incomplete."

Number 16: "The search results will be inaccurate. It's too broad."

Number 28: "The goal of synthetic biology is to standardize components like components in engineering. However, the current database does not have a unified standard for the 'indicators' of components, such as what data should be measured to measure a certain type of component/plasmid, expression intensity, size, plasmid copy number, the number of enzyme cleavage sites, labels, etc. What else are there? In addition, it seems that existing databases do not have very convenient filtering functions, such as filtering desired promoters according to "promoter strength". This depends on the sorting, entry and indexing of the aforementioned indicators."

Number 35: "It's not easy to find what you want."

Number 36: "It is not easy to find the interface and applicable version of the specific component."

Number 39: "Addgene lacks tag, generally has no marking element, and the element is generally not standardized."

"4. When designing and searching for parts, do you feel that steps are cumbersome?"

"5. When designing parts, can you be sure that the components you have found are the optimal solution?"

"6. If our model can save you the time to retrieve plasmids and improve efficiency, will you use it?"

About half of the iGEMers in our survey had used the iGEM parts database, and most of them reported problems with missing data and inaccurate results. This confirms our hypothesis. More than 60 percent said the steps are cumbersome when they design and search for parts, and 80% are not sure that the component they find is the optimal solution. We are glad to see that 85.45% of respondents (47 of 55) would use our plasmid model if it can save time in retrieving components, which motivates us to push our project forward.

In addition to the students, we also wanted to hear what teachers think. To review the 20-year journey of iGEM, we interviewed Professor Weijuan Zheng, one of the earliest PIs to serve as an iGEM instructor at Nanjing University, to learn about the history of iGEM at Nanjing University ten years ago and what the participants of that year are doing now. We hope that this interview will be both a review of iGEM and an indication of the future of synthetic biology. Professor Zheng also has great hopes for our project.

Q1. When did you first become an iGEM instructor? Do you remember what the project was about?

I was the instructor of Nanjing-China 2013. As I recall, the project was to build an engineered bacterium that can recognize, enrich and degrade a toxic compound. As an academic advisor, I took part in the 2013 Asian Competition in Hong Kong, where we won the gold medal. Later, I led the team to the Global Competition held at MIT in Boston, where we won the silver medal.

Q2. How did participating in the iGEM competition affect you and the student team you led? Have any students chosen a career in the field of synthetic biology?

I sometimes talk to Mao Yafei, who was a member of Nanjing-China at the time. After studying for a doctorate in the United States and Japan, he is now teaching at Shanghai JiaoTong University. He has a wide range of interests and has tried many fields, mainly in the direction of biological evolution. We mainly communicate through WeChat and email.

Q3. We are very happy to see that the seniors who participated in iGEM have achieved such high achievements. How do you think the spread of iGEM and synthetic biology will affect society? In what ways would the general public benefit from learning about synthetic biology?

I think any promotion of scientific knowledge is beneficial, and the promotion of synthetic biology can improve people's scientific literacy.

Q4. In the tide of artificial intelligence, do you think synthetic biology needs the help of artificial intelligence?

With the rapid development of artificial intelligence today, every discipline will intersect with artificial intelligence, which is a major trend. Proper and reasonable use will certainly be helpful.

Q5. Our team plans to create a large language model similar to a search engine this year. Just input your functional requirements for components, and it can help you obtain plasmids that can achieve corresponding functions. What advantages or disadvantages do you think it might have?

I think it does help; it makes research more efficient.

3. Feedback - How does HP connect with dry lab and wet lab?

While building our model, we encountered some problems. For example, the model's internal attributes were chaotic and could not precisely describe the functions and features of the components, and the structure of the relationship network was too simple, which made global indexing and inference over related parts inconvenient. To improve our AI model, on May 8th HP interviewed Zhen Wu, an associate research fellow at the School of Artificial Intelligence at Nanjing University and an expert in large language models. We got a lot of useful advice from this interview; here is our conversation:

NJU-China : We are the iGEM team NJU-China of the School of Life Sciences. We are looking forward to receiving professional guidance and suggestions from you to optimize our project. At present, we can abstract the task as a one-to-many best-match query between texts based on text similarity, with preprocessing allowed. There are multiple biological units to choose from, and the function of each biological unit is given in natural language. For a given user input, we need to query the database for similar functions and return the biological unit whose function is most similar to the one the user describes.

Wu Zhen

What are the difficulties you are facing now?

NJU-China : Our matching algorithm is not very good.

Wu Zhen

What is the form of your data?

NJU-China : Our data is all in text format, which has been classified.

Wu Zhen

You can use word vector technology: encode your functions into word vectors, encode the entries in the database into word vectors, and do a similarity search. Isn't that possible?

NJU-China : OK. We have about 10,000 strings to be matched, and we need to match against all 10,000 strings every time a requirement is submitted, which is relatively inefficient. At present, we use ChatGLM.

Wu Zhen

It must be slow with large language models. Have you thought about doing a simple version first, using some of the older models instead of large language models? Sort roughly first as a pre-filter, then sort precisely.

NJU-China : Is there any suitable method to support preprocessing or pre-screening?

Wu Zhen

Yes. For example, with word vectors: split the text into words, then average their vectors to get a representation of the text. That should be able to pre-filter some candidates out. If that doesn't work, you can train a classic model such as BERT on your data. Do you have a data set?

NJU-China : There are about a thousand at the moment.

Wu Zhen

A thousand should be OK. If you don't do that training, I suggest you try the faster word vector methods. There are also more open-source options, such as paragraph-level methods, which match on the vectors of a whole paragraph. Try a similarity calculation and see whether the results are the ones you want. That would be very efficient, because once you have calculated the word vectors of the 10,000 functions in advance and turned each into a 100-dimensional or 200-dimensional vector, all that is left is this: every time a requirement comes in, you turn it into a vector and do a similarity calculation with the 10,000 precomputed vectors, which will be much faster.

NJU-China : OK, thanks teacher.
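The pre-filtering approach suggested in this conversation can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Prometheus implementation: the part names and descriptions are hypothetical, and `word_vector` is a toy stand-in (deterministic pseudo-random vectors derived from a hash) for trained word vectors such as Word2Vec. The sketch averages word vectors into one vector per description, precomputes those vectors once, and then ranks parts by cosine similarity for each incoming requirement.

```python
import hashlib
import math
from typing import Dict, List, Tuple

DIM = 100  # the interview mentions 100- or 200-dimensional vectors


def word_vector(word: str) -> List[float]:
    """Toy stand-in for a trained word vector (assumption, not a real model):
    deterministic pseudo-random values in [-1, 1] derived from a hash."""
    values: List[float] = []
    counter = 0
    while len(values) < DIM:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    return values[:DIM]


def embed(text: str) -> List[float]:
    """Split the text into words, average their vectors, and normalize."""
    words = text.lower().split()
    if not words:
        return [0.0] * DIM
    vectors = [word_vector(word) for word in words]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def build_index(descriptions: Dict[str, str]) -> List[Tuple[str, List[float]]]:
    """Precompute one vector per part description (done once, in advance)."""
    return [(part_id, embed(text)) for part_id, text in descriptions.items()]


def search(query: str, index: List[Tuple[str, List[float]]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Embed the query and return the top_k parts by cosine similarity.
    Vectors are unit length, so the dot product is the cosine similarity."""
    query_vector = embed(query)
    scored = [(sum(a * b for a, b in zip(query_vector, vector)), part_id)
              for part_id, vector in index]
    scored.sort(reverse=True)
    return [(part_id, score) for score, part_id in scored[:top_k]]


# Hypothetical part descriptions, for illustration only.
PARTS = {
    "part_A": "strong constitutive promoter for yeast",
    "part_B": "green fluorescent protein reporter gene",
    "part_C": "transcription terminator for bacteria",
}

if __name__ == "__main__":
    index = build_index(PARTS)
    for part_id, score in search("constitutive promoter yeast", index, top_k=2):
        print(part_id, round(score, 3))
```

In a coarse-to-fine setup, only the few candidates returned by `search` would then be passed to a slower, more precise model for re-ranking, so the expensive model never has to score all 10,000 descriptions.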

We also encountered some problems with semantic matching search, especially in deciding which model to use. To address these problems, HP interviewed Xinyu Dai, a professor in the Department of Computer Science and Technology at Nanjing University, on May 11th.

NJU-China : Hello, Professor Dai. We are the NJU-China team participating in the 2024 iGEM competition. Currently, we are facing some challenges regarding semantic matching search in our project, and we would like to seek your advice. Here is the background: we have a user-input query, and we aim to conduct a semantic matching search in a database to retrieve content that fits the query. For example, when a technical term is inputted, we want to search for similar terms with the same meaning, a sentence that explains the term, or other terms that are conceptually encompassed by this term. All of this pertains to natural language processing in the context of synthetic biology. What we need is an efficient and accurate method to perform this semantic matching and search.

Xinyu Dai

There are indeed many tools available for this type of task. For example, commonly used models like Sent2Vec and Word2Vec. You might want to try some models that have been fine-tuned or trained on biological data.

NJU-China : Yes, we are aware that these models are quite mature. We also wanted to ask if you could recommend any newer models that would be suitable for our task. We were considering using some pre-trained large language models (LLMs), such as Llama2, Llama3, or ChatGPT, and using them to perform semantic matching searches on our database. However, would this approach be too slow?

Xinyu Dai

That approach would indeed be less efficient. Additionally, these large models may not necessarily have specialized knowledge in the field of synthetic biology, particularly for technical terms. If you're considering training a model yourself, you could try some BERT-based or GPT-based models. These can improve accuracy, but they may be challenging to implement. Another option is to explore knowledge representation models—there were some earlier models in this area.

NJU-China : Thank you for the suggestion, Professor. We also have a more directional question. Suppose the data in our database consists of components, and we have prior knowledge of the relationships between these components. For instance, we have a knowledge graph that stores these relationships. How can we make use of this knowledge?

Xinyu Dai

You could look into knowledge-fusion models. There are many large models now that can integrate external knowledge, such as using the relationship data from your knowledge graph. Specifically, you could explore retrieval-augmented methods to incorporate external knowledge into large models. Besides, there are a range of other approaches you could consider. In general, you could fine-tune the large model itself, enhance retrieval, or explore hierarchical relationship mining. All of these directions are quite feasible.

NJU-China : Thank you very much for the guidance, Professor. We will study it further.
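One way to picture the retrieval-augmented direction mentioned above is the following minimal sketch. It is an illustration under assumptions, not the method Prometheus uses: the triples, part names and relation names are hypothetical, and the resulting prompt would still have to be sent to a language model of your choice. The idea is to look up the relations of each retrieved part in the knowledge graph and place them in the prompt as external knowledge.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical knowledge graph stored as (head, relation, tail) triples.
TRIPLES = [
    ("promoter_X", "drives", "reporter_Y"),
    ("reporter_Y", "followed_by", "terminator_Z"),
    ("promoter_X", "works_in", "yeast"),
]


def build_adjacency(triples: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Index the triples by head node (edges are treated as directed)."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def related_facts(start: str, adjacency: Dict[str, List[Tuple[str, str]]],
                  max_hops: int = 1) -> List[str]:
    """Breadth-first walk collecting facts within max_hops of the start node."""
    facts: List[str] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in adjacency.get(node, []):
            facts.append(f"{node} {relation} {neighbor}")
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts


def build_prompt(question: str, retrieved_parts: List[str],
                 adjacency: Dict[str, List[Tuple[str, str]]],
                 max_hops: int = 1) -> str:
    """Prepend graph facts about the retrieved parts to the user's question."""
    lines: List[str] = []
    for part in retrieved_parts:
        lines.extend(related_facts(part, adjacency, max_hops))
    context = "\n".join(f"- {line}" for line in lines) or "- (no known relations)"
    return f"Known relations:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    graph = build_adjacency(TRIPLES)
    print(build_prompt("Which parts pair well with promoter_X?",
                       ["promoter_X"], graph, max_hops=2))
```

Raising `max_hops` pulls in more distant relations at the cost of a longer prompt, so a small value is a reasonable starting point.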

NJU-China created the "North-South iGEM League", which is made up of iGEM teams interested in artificial intelligence. On August 4, NJU-China organized a seminar on synthetic biology and artificial intelligence at the School of Life Sciences of Nanjing University. Six teams from Peking University Health Science Center, Nanjing Tech University, China Pharmaceutical University and Nanjing University attended this seminar. Teams actively discussed the application of AI models in synthetic biology.

Mingyue Deng from NJU-China also presented two component search models: the Prometheus model developed by our team and the Kernel model developed by iGEM sponsor ASIMOV. We also guided each team in using the model. Each team could describe the components it needed and obtain target components, which gave them an opportunity to find better components and improve their wet lab results. We collected feedback from the teams on their use of Prometheus and further improved the model based on their suggestions.

After the seminar, NJU-China led the production of a guide to applying AI models in synthetic biology. The tri-fold below was designed by Zhen Tang from NJU-China. Based on each team's project, the guide describes how artificial intelligence models are applied in synthetic biology or how they support wet lab work. With this guide, we hope not only to publicize our own project but also to let more people know about the integration of artificial intelligence and synthetic biology, and to collect more feedback to improve our experiments.

In order to improve the ranking function of our model, we developed a "scoring plan" to collect professional evaluations of some parts. We are carrying forward this plan within the North-South iGEM League and the School of Life Sciences at Nanjing University. If you would like to help us with our project, you can scan the QR code of the poster below. We will continuously improve our model based on your contributions.

4. Stakeholders - How do we incorporate their views?

To make our project more practical, we visited stakeholders in different fields. We collected their opinions and suggestions on our project and, through their views, saw opportunities to translate the project into practice. On August 24th, we went to the Institute of Bioinformatics and Medical Engineering of Jiangsu University of Technology and visited Shan Chang, the director of the institute.

Yourong Shi started with Prometheus, introducing the concept and goals of NJU-China and the volunteer teaching activities we had done before. Shi then introduced the principle and working logic of our AI model, and finally explained the importance of feedback, using wet lab experiments as an example. This gave the interviewees a preliminary understanding of our product and also conveyed our ambitious vision of making synthetic biology accessible to everyone.

Zixuan Zhao gave the interviewee a detailed demonstration of how to use Prometheus and introduced the iGEM database. Zhao then showed the structure of our code and how information passes from the front end to the back end. He also described how we encode large amounts of data so that the model can understand and learn from it. Here are the questions and suggestions from the interviewee.

Questions raised by the interviewee:

Q1: I noticed that you did a two-step fine-tuning. And if there is no fine-tuning, are both steps necessary?

A1: That was our mistake. We did not use the second fine-tuning step in the original model, and the results were terrible: it extracted a lot of useless answers. So the second step is necessary. We did the first fine-tuning step directly, because it is basically an academic convention.

Q2: What is the configuration of the server you are using?

A2: Our server is configured with four A6000 cards. To run the model outside the research group's server, a single 4090 is in fact enough.

Q3: Can people who haven't studied biology use your model?

A3: We ran some examples last night. The user can use everyday expressions directly and does not need to know much of the professional vocabulary of biology.

Shan Chang suggested that we present data to give people a more intuitive impression. Although our model may be useful for beginners, it has limited value for professionals and needs further refinement in downstream design.

Questions raised by NJU-China:

Q1: Do you currently have any projects similar to ours?

A1: No, we don't really do that. What we are more concerned about is that when a protein is expressed, its codons may need to be optimized. So we have some software that does that codon optimization.

Q2: About the combination of synthetic biology and bioinformatics, what is your opinion or inspiration on this?

A2: That's for sure. In fact, I have felt this way from very early on. When I first heard the concept of synthetic biology, I thought it should be closely integrated with bioinformatics, because synthetic biology is a discipline that wants to make biology more engineered, or more scientifically controllable. I think synthetic biology should be "accurate". But accuracy is often difficult in biology; traditional biology and bioengineering are not very accurate. So if you are going to turn it into a controlled, engineered discipline like synthetic biology, it has to be accurate. Only with artificial intelligence can that be done accurately; otherwise it remains traditional life science, which is not on the same level as the synthetic biology we are talking about. I think that is one of the key reasons why synthetic biology has become a term we talk about so much now: current developments in artificial intelligence and bioinformatics, including computational biology, allow us to make it precise.

Q3: Which databases or search tools have you used?

A3: I mainly used PDB, BLAST, and AlphaFold.

Shan Chang presented his views on the combination of synthetic biology and bioinformatics. He believes that the addition of artificial intelligence and bioinformatics will make synthetic biology more precise, thus making biology more engineered and controlled.

He also mentioned that the mayor of Changzhou will be working with the Institute to establish the Changzhou Institute of Artificial Intelligence and Synthetic Biology, and he wants us to make a demo video of Prometheus so people can understand how our model works. If there is industrialization cooperation with NJU-China in the future, he will provide us with a site and equipment. We look forward to further cooperation with the Changzhou Institute of Artificial Intelligence and Synthetic Biology.

On September 29, to promote the synergistic development of AI technology and synthetic biology, the Changzhou Zhonglou Synthetic Biology AI Institute was established in Changzhou with the support of the Changzhou government, and a launch event was held. At the invitation of Shan Chang, NJU-China team's Prometheus participated in the event as a typical example of the combination of AI technology and synthetic biology, and received widespread acclaim. Vice Mayor Pengju Jiang and Secretary Jintu Chen of the Changzhou Municipal Party Committee also attended the event.

"Prometheus is a very novel product, but some attendees said they found it difficult to understand. We hope you can add a Chinese version and further democratize it," said the director of the Changzhou Zhonglou Synthetic Biology AI Institute. The director hopes we can further improve Prometheus and that it can drive the development of the synthetic biology industry in Changzhou in the future.

In addition to research institutions, we hope to explore applications of our model in the medical field. On August 25, we interviewed Junfeng Shi, founder of Luoxi (Shanghai) Medical Technology Co., Ltd., who is also a doctor at Fudan Ophthalmology and Otolaryngology Hospital.

Yourong Shi introduced the tasks of NJU-China, especially the subjects and importance of the HP group's interviews. Xinjie Shen first introduced the concept and goals of NJU-China through Prometheus, then introduced the principle and working logic of our AI model, and finally explained the importance of feedback, taking the wet lab as an example. This gave the interviewee a preliminary understanding of our product and conveyed our ambitious vision of making synthetic biology accessible to everyone. At the end of the interview, Shen also demonstrated the Prometheus Expert version.

Dr. Shi : How does the Prometheus system work for wet lab?

Xinjie Shen : We just used this model to find promoters for us.

Dr. Shi : The database you used is the iGEM database, so are the results you screened necessarily true and valid?

Xinjie Shen : Through this database, we selected a more suitable promoter for our yeast project. We want to rely on this model to help us design an experiment, and a very important part of that experiment is finding the promoter. But the results are not necessarily the best and need to be improved.

Dr. Shi : How exactly do we provide the correct description of the promoter?

Xinjie Shen : First of all, we need to identify which gene it is, then which region of which chromosome, and the specific sequence. But our terminal description doesn't have that information. Therefore, we need to go to websites such as NCBI or UniProt to find it; at least these websites have more widely recognized and accurate promoter information.

Dr. Shi provided some suggestions for our project and the iGEM database:

With the development of sequencing technology, the location and name of a promoter sequence identified in the past may differ from those identified now. The iGEM database therefore needs to be normalized, or updated, against currently known information. This could also be one of your functions.

The literature search step can also be replaced by AI. Replace the search box on the Prometheus site with a chat model, so that users can talk to it over multiple rounds, and give it the ability to browse other sites, such as literature sources.

It is recommended that the logo used for the promoter on the website be rotated 90 degrees, otherwise it might be difficult to understand its meaning.

On September 29, we went to Changzhou Sungod Biotechnology Engineering Equipment Co., Ltd. to introduce the concept and work objectives of NJU-China to the general manager and technical team. To help users try our model, NJU-China produced a Prometheus demo video and guided them through Prometheus, introducing the application of AI models in synthetic biology and how they work. The technicians said Prometheus is a very novel model with impressive features, and they hope it can be commercialized in the future.

5. Collaboration and Promotion - How can we spread synthetic biology to create a better world?

1. World Environment Day - Artificial intelligence gives synthetic biology more possibilities

On June 5, NJU-China, together with the Nature Association of Nanjing University and Nanjing-China, held a publicity activity for World Environment Day. NJU-China ran a booth with the theme of "Synthetic Biology and Environment", introducing the important applications of synthetic biology in environmental protection and showcasing the possibilities that large models open up for synthetic biology.

Through this activity, we introduced teachers and students of Nanjing University to application scenarios of synthetic biology in environmental protection, such as renewable fuel from straw, green degradable fiber, green forest fire retardant foam, and microplastic degradation enzyme systems, so that more people can understand the range of applications of synthetic biology.

2. Cooperation with XJTLU - How can our models be used to promote synthetic biology?

At the invitation of XJTLU, we participated in a roundtable forum on AI-related biological safety on July 22nd. Zixuan Zhao from our dry lab gave a speech titled "LLM Safety in AI Based Biomedical Researches" and discussed safety risks of general LLMs with other teams. At this conference, we not only contributed to the white paper on AI-related biological safety, but also gained insights into improving prompt design, such as paying extra attention to how the environment is expressed when extracting descriptions and introducing safety and ethical norms.

On September 24, 2024, XJTLU-China iGEM team organized a lecture aimed at promoting the iGEM competition to a broader audience and sharing the team's scientific research experience. During this lecture, the team specifically showcased the Prometheus model developed by the Nanjing University iGEM team (NJU-China), which holds significant application value in the field of synthetic biology.

Event Highlights:

Prometheus Model Demonstration: The XJTLU-China iGEM team demonstrated the powerful capabilities of the Prometheus model through practical operations. The model showcased its efficiency and precision, particularly in providing suitable synthetic biology components for research.

Positive Feedback: Teachers, students, and industry experts who attended the lecture spoke highly of the Prometheus model. Feedback indicated that participants widely recognized its broad application prospects in teaching and scientific research, effectively supporting the experimental design and data analysis of synthetic biology.

Promoting iGEM Competition and Synthetic Biology: The lecture not only showcased the Prometheus model but also introduced the background, significance, and participation experience of the iGEM competition. It helped promote synthetic biology and our team's product, making our product known to more people.

The lecture by the XJTLU-China iGEM team was a complete success. Through this event, not only was the iGEM competition successfully promoted, but the power of synthetic biology tools was also demonstrated, especially the important role of the Prometheus model in scientific research. The team looks forward to hosting more such events in the future to further promote the popularization and application of synthetic biology.

3. CCiC - Collaborating with more teams and promoting the application of AI in synthetic biology

At the 11th Conference of China iGEMer Community (CCiC) held from July 11th to July 14th, 2024, members of the NJU-China team, including Shen Xinjie, Guan Zecheng, Deng Mingyue, and He Feng, actively participated in this grand event. Our goal was to showcase our iGEM project, Prometheus, and engage in in-depth exchanges and learning with iGEM teams and synthetic biology enthusiasts from all over the country.

Before heading to Xi'an Jiaotong-Liverpool University, the NJU-China team made meticulous preparations. We designed a carefully crafted poster that was both aesthetically pleasing and rich in content, detailing our Prometheus project. The poster particularly emphasized the model we developed, Prometheus, which is the core of our project.

At the CCiC venue, our poster attracted the attention of many iGEMers. They showed strong interest in the Prometheus model, giving us an opportunity to showcase and promote our product. Our team members warmly invited attendees to try out the model and search for the biological parts their own teams needed.

After the trial, we asked attendees for feedback and suggestions for improvement, listened carefully to every participant, and recorded their valuable suggestions.

During the presentation session in the conference hall, we presented our project to the audience. After the presentation, three professors from XJTLU, China Agricultural University, and the Tianjin Institute of Industrial Biotechnology at the Chinese Academy of Sciences each offered their professional advice.

After the meeting, our team members organized the guidance received from various sources and summarized it into the following three main suggestions:

User-friendliness

Database completeness

Optimization of output content

After receiving these constructive suggestions, the HP team members promptly conveyed the feedback to the dry lab group. The dry lab members responded swiftly and made improvements based on the feedback, creating a front-end web page for users.

After the improvements, our Prometheus model became more powerful and practical, providing the iGEM community with a more efficient and reliable tool. The experience at CCiC not only gave the team valuable feedback but also advanced the development of our project and helped further promote our Prometheus model.