Software



Plasmid.AI is the largest open-source toolkit for developing plasmid foundation models. Created by the iGEM Toronto team, the project applies machine learning to the generation of novel plasmids, with the aim of advancing synthetic biology.

Overview

Plasmid.AI provides tools and models for the analysis, design, and generation of plasmids. It gives researchers and synthetic biologists machine learning methods for exploring new approaches to plasmid engineering and design.

Installation

To install Plasmid.AI, you can either use pip or clone the repository for development access.

From pip.

pip install plasmidai

From source. For development or to access the latest features, you can clone the repository:

git clone https://github.com/igem-toronto/plasmidai.git
cd plasmidai
pip install -e .
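
After either install route, it can be useful to confirm that the package is visible in the active environment. The following is a minimal sketch using only the Python standard library. It assumes the installed distribution is named plasmidai, matching the pip command above; the helper name installed_version is hypothetical and is not part of Plasmid.AI.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "plasmidai") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name "plasmidai" is an assumption taken from the pip command.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("plasmidai is not installed in this environment")
    else:
        print(f"plasmidai {version} is installed")
```

An editable install (pip install -e .) should also be reported here, since it registers package metadata in the same way as a regular install.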

Repository Structure

The Plasmid.AI project is organized into several key components, as outlined below:

data/               Contains datasets and scripts for data processing
    scripts/        Helper scripts for data manipulation
    tokenizers/     Custom tokenizers for plasmid sequences
datasets/           Modules for loading and preprocessing plasmid datasets
experimental/       Cutting-edge features and models in development
    callbacks.py    Custom callbacks for model training
    lit.py          Lightning modules for PyTorch Lightning integration
    optimizers.py   Custom optimizers for training plasmid models
    sample.py       Functions for sampling from trained models
    train.py        Training pipelines for plasmid models
utils.py            Utility functions used across the project
paths.py            Path configurations for the project

This layout mirrors the development pipeline, from data processing through training and sampling of plasmid models.

Repository Setup

The Plasmid.AI repository is hosted on GitHub and, for the purposes of our submission, on the iGEM GitLab.

To clone the repository:

git clone https://github.com/igem-toronto/plasmidai.git

To set up the environment, install dependencies with:

conda env create -f environment.yml

The plasmid sequences used for training can be downloaded with:

cd data
gdown "1iIsat00ST5vK-06BUstuTbJkfWKpV2lE"
gzip -d 240212_plasmid_seq_54646.fasta.gz
mv 240212_plasmid_seq_54646.fasta plasmids.fasta
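
Once the download and rename steps have finished, a quick check of the resulting file can catch a truncated or failed download. The sketch below is illustrative and uses only the standard library; it is not the loader used by Plasmid.AI. The function names read_fasta and summarize are hypothetical, and the path data/plasmids.fasta assumes the script is run from the repository root.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    with open(path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path: Path) -> Dict[str, int]:
    """Count records and bases so the download can be sanity-checked."""
    records = 0
    total_bases = 0
    longest = 0
    for _, sequence in read_fasta(path):
        records += 1
        total_bases += len(sequence)
        longest = max(longest, len(sequence))
    return {"records": records, "total_bases": total_bases, "longest": longest}


if __name__ == "__main__":
    # Assumed location, following the commands above.
    fasta_path = Path("data") / "plasmids.fasta"
    if fasta_path.exists():
        print(summarize(fasta_path))
    else:
        print(f"{fasta_path} not found; run the download commands first")
```

The original file name suggests roughly 54,646 sequences, but that is an inference from the name rather than a documented figure, so the sketch only reports what it finds.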

Contributing

We welcome contributions from the community! If you would like to contribute, please check out the repository on GitHub and follow the contributing guidelines.

License

Plasmid.AI is licensed under the MIT License.