Plasmid.AI is the largest open-source toolkit for developing plasmid foundation models. Created by the iGEM Toronto team, this project aims to revolutionize the field of synthetic biology by leveraging machine learning to generate novel plasmids.
Plasmid.AI provides a comprehensive set of tools and models for the analysis, design, and generation of plasmids. By utilizing state-of-the-art machine learning techniques, this project enables researchers and synthetic biologists to explore new possibilities in plasmid engineering and design.
To install Plasmid.AI, you can either use pip or clone the repository for development access.
From pip:
pip install plasmidai
From source. For development, or to access the latest features, clone the repository and install it in editable mode:
git clone https://github.com/igem-toronto/plasmidai.git
cd plasmidai
pip install -e .
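After either install route, a quick check can confirm that the package is visible to the current Python environment. The sketch below uses only the standard library and assumes the installed distribution is named `plasmidai`, matching the pip command above. The helper name `installed_version` is ours for illustration and is not part of the toolkit.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "plasmidai" is the distribution name used in `pip install plasmidai`.
    version = installed_version("plasmidai")
    if version is None:
        print("plasmidai is not installed in this environment")
    else:
        print(f"plasmidai {version} is installed")
```

An editable install from source should be reported the same way, since pip registers its metadata as well.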
The Plasmid.AI project is organized into several key components, as outlined below:
data/: Contains datasets and scripts for data processing
scripts/: Helper scripts for data manipulation
tokenizers/: Custom tokenizers for plasmid sequences
datasets/: Modules for loading and preprocessing plasmid datasets
experimental/: Cutting-edge features and models in development
callbacks.py: Custom callbacks for model training
lit.py: Lightning modules for PyTorch Lightning integration
optimizers.py: Custom optimizers for training plasmid models
sample.py: Functions for sampling from trained models
train.py: Training pipelines for plasmid models
utils.py: Utility functions used across the project
paths.py: Path configurations for the project
This layout follows the development pipeline, from data processing through training to the refinement of plasmid models.
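To give a sense of what a tokenizer for plasmid sequences has to do, here is a minimal character-level sketch. It is not the tokenizer shipped in tokenizers/. The vocabulary, the special tokens, and the function names `encode` and `decode` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical vocabulary: two special tokens plus the four bases and N (unknown base).
VOCAB: List[str] = ["<pad>", "<unk>", "A", "C", "G", "T", "N"]
TOKEN_TO_ID: Dict[str, int] = {token: index for index, token in enumerate(VOCAB)}
SPECIAL_TOKENS = {"<pad>", "<unk>"}


def encode(sequence: str) -> List[int]:
    """Map each nucleotide to an integer id; unrecognized characters become <unk>."""
    unk_id = TOKEN_TO_ID["<unk>"]
    return [TOKEN_TO_ID.get(base, unk_id) for base in sequence.upper()]


def decode(ids: Iterable[int]) -> str:
    """Map ids back to nucleotides, dropping the special tokens."""
    return "".join(VOCAB[i] for i in ids if VOCAB[i] not in SPECIAL_TOKENS)


if __name__ == "__main__":
    ids = encode("acgtn")
    print(ids)          # [2, 3, 4, 5, 6]
    print(decode(ids))  # ACGTN
```

A character-level scheme keeps the vocabulary tiny at the cost of long token sequences. The project's custom tokenizers may make a different trade-off.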
The Plasmid.AI repository is available on GitHub and, for the purposes of our submission, on the iGEM GitLab.
To clone the repository:
git clone https://github.com/igem-toronto/plasmidai.git
To set up the environment, install dependencies with:
conda env create -f environment.yml
The plasmid sequences used for training can be downloaded with:
cd data
gdown "1iIsat00ST5vK-06BUstuTbJkfWKpV2lE"
gzip -d 240212_plasmid_seq_54646.fasta.gz
mv 240212_plasmid_seq_54646.fasta plasmids.fasta
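Once the download finishes, it can be worth confirming that the file parses as FASTA before starting a training run. The following is a small standard-library sketch, not part of the toolkit. The helper names `read_fasta` and `summarize` are ours, and the path assumes the script is run from the repository root after the commands above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def read_fasta(path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record begins; emit the previous one if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path) -> Tuple[int, int]:
    """Return (number of records, total number of bases) for a FASTA file."""
    count = 0
    total = 0
    for _, sequence in read_fasta(path):
        count += 1
        total += len(sequence)
    return count, total


if __name__ == "__main__":
    fasta = Path("data/plasmids.fasta")  # location after the rename step above
    if fasta.exists():
        records, bases = summarize(fasta)
        print(f"{records} sequences, {bases} bases in total")
    else:
        print(f"{fasta} not found; run the download commands first")
```

The reader streams one record at a time, so it does not need to hold the whole dataset in memory.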
We welcome contributions from the community! If you would like to contribute, please check out the repository on GitHub and follow the contributing guidelines.
Plasmid.AI is licensed under the MIT License.