Synthetic biology is growing thanks to new design methods that let researchers build customized systems more efficiently from existing parts. Standardized biological parts are vital to the field, and the iGEM community has built a large collection of them, making these parts easy to access and reuse. However, the sheer size of the BioBrick database can be overwhelming: there are simply too many options to browse.
Recently, large language models have advanced rapidly, particularly at semantic querying, that is, finding the right term from a description of its meaning. We decided to apply these models to BioBrick search. Last year, we developed Ask NOX, a platform that lets users describe the part they need in plain language, making it easier to find the right components. However, that model had several shortcomings.
This year, we've developed a better model called Ask Lantern, trained with more complete data, and added a feature to collect user feedback, helping us continuously improve the tool.
Welcome to our online platform! You can start experiencing our service directly via the webpage we provide.
To search for a BioBrick, enter a plain-language description of the part you need and submit your query. The search results are presented line by line, each containing the name of a potentially matching BioBrick and a brief description. You can click a name link to view detailed information about that BioBrick. If any result satisfies your need, please click the thumbs-up icon next to it; your feedback will help us continuously improve our model's performance.
All of our source code has been open-sourced on GitLab. If you wish to deploy the platform yourself, please refer to the README file in the repository for instructions. If you want to utilize the API for quick development, you can click on the button at the bottom of the webpage to access detailed information about the API.
A reverse dictionary identifies the appropriate target term from a description of its meaning, relying on semantic matching. Research in this area has grown substantially in recent years. The development of large-scale pre-trained models, such as BERT (Devlin et al., 2019), which stacks multiple Transformer encoder layers, has been pivotal: these models have been adapted to reverse-dictionary tasks, enabling effective reverse lookups and supporting cross-lingual queries.
We used BioBrick data from parts.igem.org, collected by Team Fudan 2022 in their repository. By fine-tuning a reverse-dictionary model on this data, we created a semantic space for BioBricks. When a user inputs a query, BERT encodes it and maps it into this space; the model then finds and returns the most relevant BioBricks based on their proximity in that space.
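The retrieval step can be sketched as follows. This is a minimal illustration, not our training code: it assumes the BioBrick embeddings have already been computed by the fine-tuned encoder (toy random vectors stand in for them here), and it ranks candidates by cosine similarity to the encoded query, the same nearest-neighbour principle used in the learned semantic space.

```python
import numpy as np

def rank_biobricks(query_vec, part_vecs, part_names, top_k=10):
    """Rank candidate BioBricks by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = part_vecs / np.linalg.norm(part_vecs, axis=1, keepdims=True)
    scores = p @ q                       # cosine similarity per part
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(part_names[i], float(scores[i])) for i in order]

# Toy example with hypothetical 4-dimensional embeddings.
rng = np.random.default_rng(0)
names = ["BBa_K1021005", "BBa_K3963005", "BBa_E0040"]
parts = rng.normal(size=(3, 4))
query = parts[1] + 0.05 * rng.normal(size=4)  # a query close to the 2nd part
print(rank_biobricks(query, parts, names, top_k=3))
```

In the real system the query embedding comes from the fine-tuned BERT encoder rather than toy vectors; the ranking logic is the same.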
To evaluate the model's accuracy, we split the data into two groups: "test_seen" (descriptions used during training) and "test_unseen" (descriptions held out from training). The model was trained on the "test_seen" group and then evaluated on both groups. Each test prompt corresponds to one BioBrick as its label. We sorted the output BioBricks by relevance and measured accuracy by the rank of the correct BioBrick (e.g., top 1, top 10, top 20). Our wet lab team also wrote queries based on their own needs (the "test_human" group); in most cases they found a suitable BioBrick within the first ten results. The results of our reverse dictionary model for BioBricks are as follows:
| test data   | top-1 hit rate | top-10 hit rate | top-20 hit rate |
|-------------|----------------|-----------------|-----------------|
| test_seen   | 0.971          | 0.996           | 0.998           |
| test_unseen | 0.047          | 0.338           | 0.433           |
| seen+unseen | 0.509          | 0.667           | 0.716           |
| test_human  | 0.476          | 0.634           | 0.794           |
The top-10 hit rate is the probability that the BioBrick you want appears among the first ten results returned by the model; the other hit rates are defined analogously.
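The hit-rate metric can be computed with a short helper. A sketch under the assumption that each test query has exactly one correct BioBrick label and the model returns a ranked list of candidates:

```python
def hit_rate_at_k(ranked_lists, labels, k):
    """Fraction of queries whose correct BioBrick appears in the top-k results."""
    hits = sum(1 for ranked, label in zip(ranked_lists, labels)
               if label in ranked[:k])
    return hits / len(labels)

# Toy example: three queries, each with the same gold label "BBa_A".
ranked = [
    ["BBa_A", "BBa_B", "BBa_C"],  # gold ranked 1st
    ["BBa_B", "BBa_A", "BBa_C"],  # gold ranked 2nd
    ["BBa_C", "BBa_B", "BBa_A"],  # gold ranked 3rd
]
gold = ["BBa_A", "BBa_A", "BBa_A"]
print(hit_rate_at_k(ranked, gold, 1))  # only the first query hits at k=1
print(hit_rate_at_k(ranked, gold, 3))  # all three hit at k=3
```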
After training the model, our wet lab teammates incorporated it into their work. It helps them quickly locate the BioBricks they need when designing their experimental setups, such as BBa_K1021005 and BBa_K3963005, saving them time.
Ask Lantern is designed to be user-friendly, enabling researchers without a computer science background to search BioBricks easily. The front end uses Gradio to provide an intuitive web user interface and APIs. Our software is compatible with most modern browsers, including Chrome, Firefox, Edge, and Safari. The source code is available on GitLab, allowing users to deploy and modify it according to their needs.
In the future, we aim to apply our model and workflow to other databases, such as UniProt, to assist researchers in sequence searches. Additionally, we plan to develop a publicly accessible model for the iGEM community, which would facilitate experimental design and enhance model iteration.
We have also noted that many teams, such as the DiKST project (Leiden, 2021) and the PartHub project (Fudan, 2022), are actively developing software for searching and visualizing BioBrick relationships. We hope to collaborate with these projects to provide more user-friendly software to the iGEM community. By working together, we can integrate our various functionalities into a comprehensive toolset that enhances the experience of users engaged in synthetic biology projects.