Plasmid.AI
An intelligent solution to antibiotic resistance

Motivation

The rise of antibiotic resistance

Antibiotics are crucial in treating bacterial infections, and have prevented millions of deaths since the discovery of penicillin in 1928 [1]. However, misuse of these drugs in medicine and agriculture, combined with a dry antibiotic pipeline [2], has led to the generation of superbugs – namely, bacteria that can withstand antibiotic treatment [3]. This, coupled with broader drug resistance from other microbes (viruses, protists, etc.), constitutes a phenomenon called antimicrobial resistance (AMR), which represents one of the most pressing global health challenges humanity currently faces. According to the World Health Organization (WHO), AMR is projected to cause 10 million deaths per year by 2050, cost more than $100 trillion in lost economic output, and drastically increase poverty rates [4].

Many organizations have developed watchlists for the most concerning superbug bacteria, notably the ESKAPE pathogens [5] and the WHO’s critical priority pathogens list [6]. In particular, pathogens like carbapenem-resistant Enterobacteriaceae and extensively drug-resistant tuberculosis are at the forefront of antibacterial research.

Current solutions fall short

Multiple treatment methods currently exist to tackle the pathogens above, but all with limited efficacy:

Treatment and its limitations
Stronger antibiotics
  • Bacteria become resistant to antibiotics faster than we can produce them
  • Antibiotics that are too strong are harmful for patients
Nanoparticles
  • Limited success demonstrated in vivo
Antimicrobial peptides
  • Hard to design and formulate
  • Often degraded in vivo
Phage therapy
  • Hard to find suitable phages for patient infections
  • Very complex infection dynamics

At iGEM Toronto, we are rethinking how drug-resistant pathogens are treated. Instead of directly attacking superbugs, we are leveraging generative artificial intelligence (AI) to make resistance-nullifying plasmids that healthy bacteria can transfer to them. This approach is both highly targeted to a single patient's bacterial strains and highly generalizable to all types of resistance mechanisms.

To achieve this, we aim to create a proof of concept through a two-part project:

  1. Develop the generative model and lab automation to create novel, working plasmids

  2. Fine-tune plasmid functionality to enhance plasmid stability and to target antibiotic resistance pathways

Making this pipeline a reality requires an integrated effort from all of our subteams.

Dry Lab

Dry lab generates whole plasmids using machine learning (ML). Through an in silico validation pipeline, we then filter the sampled sequences by predicted viability and deliver promising plasmid components to the wet lab.

Generating plasmids

We aim to explore different sequence modelling approaches and, to scale to entire plasmid sequences, we also pull from the recent literature on long-range ML architectures, such as Mamba [7] and Hyena [8]. Our preliminary experiments pursued diffusion-based approaches, which iteratively refine random noise into coherent outputs [9,10,11]. We thought that they could naturally extend into geometric ML algorithms that respect the circular nature of plasmids. Unfortunately, we found it too challenging to train diffusion models, which we attribute to the noisier objective function, long sequences, and limited compute.

Next, we investigated a nucleotide-level language modelling approach, in which plasmids are generated one nucleotide at a time in linear order. We train on plasmids shorter than 10 kbp curated from the PLSDB database [12]. Batch-wide analysis of metrics such as size distribution, GC content, and repeat types identified one promising batch, which we have moved into our in silico validation pipeline. Our future efforts will focus on implementing hybrid architectures and a custom tokenizer that compresses nucleotide inputs for model training.
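The batch-wide metrics mentioned above can be illustrated with a minimal sketch. The function names, the 10 kbp default, and the use of the longest homopolymer run as a stand-in for "repeat types" are assumptions made for this example and are not taken from the team's actual analysis code.

```python
from collections import Counter
from statistics import mean, median


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T characters in seq."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (a crude repeat proxy)."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def summarize_batch(seqs: list[str], max_len: int = 10_000) -> dict:
    """Summarize a non-empty batch of generated sequences.

    max_len mirrors the 10 kbp training cutoff; treating it as a
    sanity bound on generated sequences is an assumption.
    """
    lengths = [len(s) for s in seqs]
    return {
        "n": len(seqs),
        "median_length": median(lengths),
        "mean_gc": mean(gc_content(s) for s in seqs),
        "max_homopolymer": max(longest_homopolymer(s) for s in seqs),
        "n_over_max_len": sum(length > max_len for length in lengths),
    }
```

Comparing such a summary for a generated batch against the same summary for the training plasmids gives a quick first check before any sequence enters the validation pipeline.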

In silico validation

We aim to assess in silico the validity of important plasmid components, such as the ori, toxin-antitoxin pairs, and antibiotic resistance genes, so that initial batches of sequences can be filtered down to a few viable candidates ready for wet lab testing. So far we have constructed a pipeline for assessing oriV viability, with pipelines for other components to follow.

Our oriV verification pipeline consists of the following:

  1. Alignment: The most broadly used ori detection tool, Ori-Finder [13], was insufficient for our project: it lacks open source code and limits upload size, which makes it unscalable. We therefore aligned 10,000 generated sequences against DoriC [14] (the largest existing ori database) using MMseqs2 [15] and filtered candidate oris by alignment score.
  2. Annotation: We curated a motif database of important ori components and motifs from the literature and DoriC. We searched generated oris for DnaA boxes, AT-rich regions, iterons, and accessory motifs, as the literature shows that these components are important to ori viability [16].
  3. Replication machinery: We searched for appropriate replication machinery to integrate our ori into the appropriate plasmid backbone.
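The annotation step can be sketched as a simple motif scan. This is an illustration only: the DnaA box consensus used here (the E. coli-type 9-mer TTATCCACA), the mismatch tolerance, and the AT-rich window size and threshold are assumptions for the example, not the contents of the team's motif database. Both scans wrap around the sequence end to respect the circular nature of plasmids.

```python
DNAA_BOX = "TTATCCACA"  # assumed consensus (E. coli-type DnaA box)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_circular(seq: str, motif: str, max_mismatches: int = 1):
    """Return sorted (start, strand) hits of motif on a circular sequence."""
    seq = seq.upper()
    n, k = len(seq), len(motif)
    if n < k:
        return []
    wrapped = seq + seq[: k - 1]  # lets windows run across the origin
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for start in range(n):
            window = wrapped[start : start + k]
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((start, strand))
    return sorted(hits)


def at_rich_windows(seq: str, window: int = 20, min_at: float = 0.75):
    """Start positions of circular windows whose A/T fraction is >= min_at."""
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return []
    wrapped = seq + seq[: window - 1]
    starts = []
    for start in range(n):
        chunk = wrapped[start : start + window]
        at_count = sum(base in "AT" for base in chunk)
        if at_count / window >= min_at:
            starts.append(start)
    return starts
```

A candidate ori could then be kept only if it contains at least one DnaA box hit and one AT-rich window; the exact acceptance rule is a design choice that this sketch leaves open.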

We are awaiting wet lab results to verify the oris in vitro.

Wet Lab

Artificial plasmids generated by machine learning models hold great potential for addressing antibiotic resistance. The wet lab team is tasked with validating these AI-generated sequences in vitro.

Technology rationale

Our primary strategy to counter superbugs involves creating AI-generated plasmids that can outcompete the natural plasmids carrying antibiotic resistance genes. While all plasmids impose a metabolic burden on bacteria, those harbouring genes conferring antibiotic resistance are strongly selected for due to their survival advantage in the presence of antibiotics. Our goal is to design AI-generated plasmids that provide an even stronger selective advantage, causing bacteria to replace the resistance-carrying plasmids with our engineered versions.

When antibiotic pressure is removed, bacteria no longer have a selective reason to retain plasmids carrying resistance genes, making them once again susceptible to antibiotics. Our AI-generated plasmids function to aid the bacteria in accelerating this process of discarding the plasmids carrying antibiotic resistance by providing an even more favourable alternative.

More favourable plasmids are less metabolically burdensome, easier to replicate, and capable of directly competing with existing resistance plasmids. Our team will tackle all three facets by designing plasmids with replication machinery that is not only more efficient but also fundamentally incompatible with the target plasmid. Incompatibility creates competition for host replication machinery, leaving the target plasmid with insufficient replication resources to sustain itself and effectively neutralizing it.
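The intuition that a less burdensome plasmid displaces a costlier one can be illustrated with a deliberately simple toy model. It assumes that each cell carries exactly one of the two incompatible plasmids, that no antibiotic is present, and that each plasmid reduces growth by a fixed fractional cost; the function name and all parameter values are hypothetical and are not measurements or predictions from the project.

```python
def simulate_displacement(p0: float, cost_engineered: float,
                          cost_resistance: float, generations: int) -> list[float]:
    """Fraction of cells carrying the engineered plasmid over time.

    Toy replicator dynamics: relative fitness is 1 - cost, and the
    population is renormalized after every generation.
    """
    w_eng = 1.0 - cost_engineered
    w_res = 1.0 - cost_resistance
    p = p0
    trajectory = [p]
    for _ in range(generations):
        mean_fitness = p * w_eng + (1.0 - p) * w_res
        p = p * w_eng / mean_fitness
        trajectory.append(p)
    return trajectory


# Hypothetical costs: engineered plasmid 2 %, resistance plasmid 10 %.
trajectory = simulate_displacement(p0=0.01, cost_engineered=0.02,
                                   cost_resistance=0.10, generations=200)
```

In this model the engineered plasmid's share rises every generation whenever its cost is the lower of the two, and stays constant when the costs are equal. Real displacement dynamics also depend on conjugation rates, copy-number control, and partitioning, none of which are modelled here.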

Plasmid design, construction, and validation

To validate the efficacy of AI-generated plasmids, the wet lab team will synthesize these plasmids and test them in bacterial cultures. This process starts by confirming the functionality of the generated plasmids. Each generated plasmid's origin of replication (oriV) is isolated, synthesized, cloned into a testing backbone, and transformed for characterization.

To facilitate high-throughput testing, the backbone is engineered with Type IIS restriction enzyme cut sites for Golden Gate insertion of the transgene. The functionality of each AI-generated oriV will be assessed through growth assays, in which the number of colony-forming units (CFUs) in experimental populations is compared with that in negative control populations. A significantly higher CFU count in the experimental group would indicate successful plasmid replication and functionality.
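One way to make "significantly higher" concrete is an exact one-sided permutation test on the CFU counts, sketched below with the standard library only. The choice of test, the function name, and the example counts are assumptions for illustration; the source does not specify which statistical test the team will use.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(experimental: list[float], control: list[float]) -> float:
    """Exact one-sided p-value for mean(experimental) > mean(control).

    Enumerates every way of relabelling the pooled plates and counts how
    often the relabelled difference in means is at least the observed one.
    """
    pooled = list(experimental) + list(control)
    n, k = len(pooled), len(experimental)
    total = sum(pooled)
    observed = mean(experimental) - mean(control)
    at_least_as_extreme = 0
    n_splits = 0
    for idx in combinations(range(n), k):
        group_sum = sum(pooled[i] for i in idx)
        diff = group_sum / k - (total - group_sum) / (n - k)
        if diff >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
        n_splits += 1
    return at_least_as_extreme / n_splits


# Hypothetical CFU counts per plate, for illustration only.
p = permutation_p_value(experimental=[10, 11, 12], control=[1, 2, 3])
```

With three plates per group there are only 20 possible relabellings, so the smallest achievable p-value is 0.05; more replicates per group are needed to resolve smaller p-values.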

Plasmid viability will be further validated by extracting plasmids from transformed bacteria and sequencing them, thereby confirming the propagation of the oriV and other essential components such as the toxin-antitoxin system.

Hardware

Our dry lab-wet lab pipeline consists of exhaustively generating candidate AI-generated sequences and testing them for purpose and performance. Because numerous and diverse plasmid components are being generated and evaluated at any time, high-throughput experimentation workflows, tools, and best practices must be developed to support team operations. This has the added benefit of making our dry lab-wet lab pipeline easier for other researchers and industry to adopt. The hardware team's mandate is to develop such workflows and tools.

High throughput operations

Sample storage and management is a perennial problem across all kinds of labs. Laboratories at the University of Toronto typically use HECHMET, a barcode-based check-in/check-out system, for laboratory inventory management [17]. Although this solution is optimised for generic chemicals, it creates high overhead for small samples with short life cycles. To enable high-throughput operations, we are developing a solution for managing large numbers of samples, potentially across different locations.

The sample manager product system will consist of one or more holders for samples with sensors and wireless communication capabilities, a server for data retention and information transmission, and an application endpoint for data display and check-in/check-out functionality. Wet lab team input and solutions used in the manufacturing industry were incorporated. Cost and ease of use were considered as part of the design process.

We based the product system on the self-driving lab architecture described in [18]. Although that architecture was initially developed for a different purpose, we are adapting it into a readily extendable internet of things framework for future networked laboratory automation projects. Furthermore, usage data collected by the system may be fed into Lean and Six Sigma process design strategies, increasing wet lab team parallelism while improving quality and decreasing variability.

High throughput tools

Most assays done by the wet lab team will require some form of growth quantification. Colony counting and OD600 assays are two such approaches. The former requires significant repetitive effort and time, while the latter requires specialised equipment in the form of a spectrophotometer or nephelometer [19]. Advancements in computer vision, however, have enabled object detection models to succeed at colony counting [20].

We intend to implement and fine-tune a Vision Transformer model to detect colonies on Petri dish plates. We then plan to deploy the model to a single-chip computer housed on a benchtop stand with a camera and related accessories, with the whole assembly accessible as an internet of things device. Considerations made by the team include cost, throughput, ease of construction, and eventual open source availability.

Human Practices

Our objective is to comprehensively educate the public about antibiotic resistance through a holistic approach that addresses all relevant sectors. To achieve this, we are implementing a series of strategic actions. First, we are collaborating with schools to educate children on this critical issue, developing age-appropriate programs to instill awareness from an early age. In addition, we are conducting preliminary risk assessments to understand the scope and impact of antibiotic resistance, guided by established frameworks such as the TUCKER framework, the NASEM framework, and the FBI UNICRI framework.

Engaging with stakeholders, including both supporters and opponents, is another key step in our strategy. We aim to discuss potential countermeasures and gather valuable feedback and insights to enhance and refine our project. To ensure our approach is well-rounded, we are also collecting detailed information about antibiotic resistance in various sectors, such as healthcare, agriculture, and the environment. This information will help us tailor our strategies and interventions more effectively.

To reach different demographics, we aim to create a variety of educational materials, including newspaper articles, blogs, and social media content. These resources will be designed to be accessible and engaging, maximising public outreach and impact. Furthermore, we are committed to continuously improving our initiatives by actively seeking and incorporating feedback from stakeholders and the public. By following this structured approach, we aim to comprehensively address antibiotic resistance and effectively educate the public, leveraging collaboration, informed strategies, and continuous engagement.

Entrepreneurship

Our objective is to create a go-to-market-stage company for our AI-powered discovery engine, which integrates our proprietary large language model (dry lab) with rapid in vitro screening and clinical validation (wet lab). Our company will identify new mechanisms for killing antibiotic-resistant strains and leverage existing antibiotics to create therapies faster and more cost-effectively than competitors.

We’re co-incubating with two independent incubators located in Canada and the UK. These incubators have allowed us to refine our problem statement and understand our value proposition. We’ve developed a rapid-fire 3-minute pitch, a 6-minute investor-ready pitch, a robust business plan, and other deliverables necessary to raise seed-stage capital and beyond.

References

  1. Adedeji, W. A. The treasure called antibiotics. Annals of Ibadan Postgraduate Medicine 14, 56–57 (2016).
  2. Gupta, S. K. & Nayak, R. P. Dry antibiotic pipeline: Regulatory bottlenecks and regulatory reforms. Journal of Pharmacology and Pharmacotherapeutics 5, 4–7 (2014).
  3. Strathdee, S. A., Hatfull, G. F., Mutalik, V. K. & Schooley, R. T. Phage therapy: From biological mechanisms to future directions. Cell 186, 17–31 (2023).
  4. World Health Organization. New report calls for urgent action to avert antimicrobial resistance crisis. (2019).
  5. Santaniello, A., Sansone, M., Fioretti, A. & Menna, L. F. Systematic review and meta-analysis of the occurrence of ESKAPE bacteria group in dogs, and the related zoonotic risk in animal-assisted therapy, and in animal-assisted activity in the health context. International Journal of Environmental Research and Public Health 17, 3278 (2020).
  6. World Health Organization. WHO publishes list of bacteria for which new antibiotics are urgently needed. (2017).
  7. Gu, A. & Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023).
  8. Poli, M. et al. Hyena hierarchy: Towards larger convolutional language models. arXiv preprint arXiv:2302.10866 (2023).
  9. Austin, J., Johnson, D. D., Ho, J., Tarlow, D. & Berg, R. van den. Structured denoising diffusion models in discrete state-spaces. in Advances in neural information processing systems (eds. Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S. & Vaughan, J. W.) vol. 34 17981–17993 (Curran Associates, Inc., 2021).
  10. Hoogeboom, E. et al. Autoregressive diffusion models. in International conference on learning representations (2022).
  11. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. in Advances in neural information processing systems (eds. Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H.) vol. 33 6840–6851 (Curran Associates, Inc., 2020).
  12. Schmartz, G. P. et al. PLSDB: Advancing a comprehensive database of bacterial plasmids. Nucleic Acids Research 50, D273–D278 (2021).
  13. Dong, M.-J., Luo, H. & Gao, F. Ori-Finder 2022: A comprehensive web server for prediction and analysis of bacterial replication origins. Genomics, Proteomics & Bioinformatics 20, 1207–1213 (2022).
  14. Dong, M.-J., Luo, H. & Gao, F. DoriC 12.0: An updated database of replication origins in both complete and draft prokaryotic genomes. Nucleic Acids Research 51, D117–D120 (2022).
  15. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35, 1026–1028 (2017).
  16. Solar, G. del, Giraldo, R., Ruiz-Echevarría, M. J., Espinosa, M. & Díaz-Orejas, R. Replication and control of circular bacterial plasmids. Microbiology and Molecular Biology Reviews 62, 434–464 (1998).
  17. University of Toronto. Central chemical inventory management system. (2019).
  18. Baird, S. G. & Sparks, T. D. Building a ‘hello world’ for self-driving labs: The closed-loop spectroscopy lab light-mixing demo. STAR Protocols 4, 102329 (2023).
  19. Implen. OD600 (cell density, bacterial growth, yeast growth). (2024).
  20. Majchrowska, S. et al. AGAR a microbial colony dataset for deep learning detection. arXiv preprint arXiv:2108.01234 (2021).