Engineering Success

Summary of Engineering Success


Throughout the 2024 iGEM season, we applied the engineering cycle to every aspect of our project: the discovery of novel satellites, both bioinformatically and physically, and our three applications, namely gene delivery using the P4 cosmid, using novel mycobacterial satellites to overcome prophage immunity, and host range expansion through tail fiber engineering with novel mycobacteriophage satellites. In achieving engineering success in each of these facets, we used mathematical modeling as a framework to inform the engineering process at every step, and the engineering results in turn supplied parameters to refine the models.


For detailed information about our experiments and results, please visit our experiments page and results page.


Bioinformatic Discovery of Novel Satellites


For more information about our software, please visit our software page.


Final SaPhARI Pipeline: 

The SaPhARI pipeline was developed to provide a comprehensive, automated solution for the identification and classification of satellite families in bacteriophages. Leveraging a range of powerful bioinformatics tools, including Prodigal, DIAMOND BLAST, Aragorn, Barrnap, tRNAscan-SE, and custom-built Python and shell scripts, the pipeline offers researchers a high degree of flexibility and precision in annotating and clustering protein sequences. The final version allows users to seamlessly customize database searches, apply functional filters, and group proteins based on their target families, streamlining the discovery and characterization of satellite prophages.


Early Development and Tool Integration:

The initial phase of SaPhARI's development focused on integrating key tools for accurate protein annotation and functional grouping. We aimed to build a robust pipeline comparable to PHASTEST, capable of clustering proteins and offering detailed functional insights. The workflow was initially designed to process nucleotide sequences, filter out prophage regions, and pass the data through a sequence of specialized tools. The ultimate goal was to deliver high-quality gene annotations alongside functional protein clusters to facilitate downstream analysis.


The core tools integrated in this early version included:
  • Prodigal: A gene prediction tool used for identifying protein-coding regions from nucleotide sequences.
  • DEPhT: Initially incorporated for prophage detection but later removed from the pipeline to streamline processing.
  • DIAMOND BLAST: A high-speed protein alignment tool used for comparing sequences against large databases.
  • BLASTn: Facilitates nucleotide sequence alignments for DNA region comparisons.
  • Aragorn, tRNAscan-SE, and Barrnap: These specialized tools were incorporated for the detection of non-coding RNA elements such as tmRNA, tRNA, and rRNA genes, ensuring thorough gene annotation.

To manage the command-line interface (CLI) tools and optimize workflow efficiency, we utilized Nextflow as the core pipeline orchestration tool, enabling seamless integration and scalability.


Design-Build-Test Cycle: From Initial to Final Pipeline

The development of the SaPhARI pipeline was guided by an iterative Design-Build-Test cycle. Each component of the pipeline was tested and improved to achieve a robust system for bacteriophage satellite discovery. The following sections outline the design rationale, construction, and iterative testing of the pipeline's key tools and processes.


Design: Prodigal was chosen for its accuracy in predicting open reading frames (ORFs), making it a critical first step in identifying potential satellite prophage regions. The objective was to predict ORFs from nucleotide FASTA files of experimentally verified prophages, such as EcCIEDL933 and EcCICFT073 in our software experiments, while capturing essential annotations needed for downstream analysis.

Build: Prodigal was configured to accept nucleotide sequences and output either nucleotide or amino acid ORFs with comprehensive notations, including ORF number, strand direction, start and stop positions, and GC content. Additionally, metadata such as partial gene status and start codon types were retained to provide full context for each ORF.

Test: Early tests revealed that Prodigal’s raw output could not be used directly by downstream tools such as BLAST+ because of inconsistent header formatting. The outputs lacked essential notations, making it impossible to trace ORFs back to their genomic locations.

Learn: We introduced a Nextflow process (formateHeaders) to clean Prodigal’s output, specifically reformatting headers to preserve ORF annotations. This ensured that all downstream tools could correctly parse ORF data and maintain accurate linkages between ORF predictions and genomic coordinates.
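
As an illustration of the header clean-up described above, here is a minimal Python sketch. It is not the team's formateHeaders process. It assumes the standard Prodigal FASTA header layout (ORF ID, start, stop, strand, and a semicolon-separated attribute string, joined by " # "), and the function name format_header and the pipe-delimited output layout are hypothetical choices made for this example.

```python
def format_header(header: str) -> str:
    """Collapse a Prodigal FASTA header into one whitespace-free identifier.

    Assumed input layout (standard Prodigal output):
    >contig_1 # 2 # 181 # -1 # ID=1_1;partial=10;start_type=Edge;...;gc_cont=0.533
    The output layout is a hypothetical choice for this sketch.
    """
    fields = [f.strip() for f in header.lstrip(">").split(" # ")]
    if len(fields) != 5:
        raise ValueError(f"unexpected Prodigal header: {header!r}")
    orf_id, start, stop, strand_code, attrs = fields

    # Prodigal encodes the strand as 1 (forward) or -1 (reverse).
    strand = "+" if strand_code == "1" else "-"

    # Parse the "key=value;" attribute pairs into a dictionary.
    meta = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)

    # Keep the coordinates and key metadata so the ORF stays traceable.
    return (
        f">{orf_id}|start={start}|stop={stop}|strand={strand}"
        f"|partial={meta.get('partial', 'NA')}"
        f"|start_type={meta.get('start_type', 'NA')}"
        f"|gc={meta.get('gc_cont', 'NA')}"
    )
```

Because the reformatted header contains no spaces, alignment tools that truncate sequence IDs at the first whitespace still carry the coordinates through to their output.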


Design: BLASTn and DIAMOND BLAST were selected for their high-speed sequence alignment capabilities against curated databases (e.g., PHASTEST). The goal was to enable comprehensive functional annotation of predicted ORFs by identifying homologous proteins across viral and bacterial datasets, critical for satellite phage analysis.

Build: We configured BLAST+ with a customizable output format (outfmt 6), presenting key alignment details in a tabular structure. This format was selected to streamline downstream analysis, providing essential information such as alignment length, bit scores, e-values, and taxonomic classification.

Test: During initial runs, BLAST produced large, complex outputs that were difficult to parse for biologically meaningful insights. The volume of data slowed downstream processes and overwhelmed manual interpretation efforts, particularly when dealing with large datasets.

Learn: To optimize the output, we introduced user-configurable filters so that previously static parameters (e-value, percent identity, percent coverage, and the number of matches) can be adjusted. This enhancement allowed users to focus on the top hits and tailor the alignment depth, ensuring that only the most relevant homologs were included in the results. This reduced processing time and made the outputs more manageable and meaningful for further analysis.
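
The kind of filtering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not SaPhARI's actual code: it assumes tabular output with the twelve standard outfmt 6 columns plus the query length (qlen) appended as a thirteenth column, and the function name filter_hits and its default thresholds are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed column layout: the 12 standard outfmt 6 fields plus qlen.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qlen"]


def filter_hits(handle, max_evalue=1e-5, min_identity=30.0,
                min_coverage=50.0, max_hits=5):
    """Return {query: [hits]} keeping only hits that pass every threshold.

    Hits are sorted by bit score (best first) and capped at max_hits.
    All default thresholds are placeholders, not SaPhARI's defaults.
    """
    kept = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        # Percent of the query covered by the aligned region.
        aligned = abs(int(hit["qend"]) - int(hit["qstart"])) + 1
        coverage = 100.0 * aligned / int(hit["qlen"])
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["pident"]) >= min_identity
                and coverage >= min_coverage):
            kept[hit["qseqid"]].append(hit)
    return {
        query: sorted(hits, key=lambda h: float(h["bitscore"]),
                      reverse=True)[:max_hits]
        for query, hits in kept.items()
    }


# Usage with a small in-memory table (values invented for illustration).
example = io.StringIO(
    "orf1\tA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\t100\n"
    "orf1\tB\t25.0\t100\t75\t0\t1\t100\t1\t100\t1e-3\t40\t100\n"
)
top_hits = filter_hits(example)
```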


Design: Functional annotation of proteins is a critical step in satellite prophage identification. Our objective was to automate the assignment of protein functions based on BLAST results, while minimizing the inclusion of hypothetical proteins, which contextually contribute limited biological insight.

Build: A custom Python script (extract.py) was developed to parse BLAST outputs and assign functions to each ORF. The script works by selecting the majority function from the top BLAST hits, excluding hypothetical proteins where possible. If a consensus function is found, it is assigned to the ORF. If all matches are hypothetical, the ORF is flagged accordingly.

Test: Initial tests revealed that the script struggled with ambiguous BLAST results, especially when low-quality or irrelevant matches dominated the output. In cases where no clear majority function could be determined, the script either failed to assign a function or assigned inconsistent annotations.

Learn: To improve accuracy, we introduced a scoring system that ranks protein functions based on the order of top BLAST hits. If no clear majority function exists, the highest-scoring match is selected. In cases where all hits are hypothetical, the function is explicitly labeled as hypothetical, distinguishing these proteins from poorly characterized ones. This refinement significantly improved the consistency and completeness of functional annotations, making the results more reliable for downstream analysis.
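
A minimal sketch of this majority-then-score logic is shown below. It is not the code in extract.py: the function name assign_function, the strict-majority rule, and the rank weighting (best hit weighted highest) are assumptions chosen to illustrate the idea.

```python
from collections import Counter


def assign_function(hit_descriptions):
    """Pick one function label from BLAST hit titles ordered best hit first.

    1. Ignore hits described as hypothetical.
    2. If one function is a strict majority of the remaining hits, use it.
    3. Otherwise use a rank-weighted score, where earlier hits count more.
    4. If every hit is hypothetical, label the ORF as hypothetical.
    """
    informative = [
        (rank, desc.strip())
        for rank, desc in enumerate(hit_descriptions)
        if "hypothetical" not in desc.lower()
    ]
    if not informative:
        return "hypothetical protein"

    counts = Counter(desc for _, desc in informative)
    top_function, top_count = counts.most_common(1)[0]
    if top_count > len(informative) / 2:
        return top_function

    # No clear majority: weight each function by the rank of its hits.
    total = len(hit_descriptions)
    scores = Counter()
    for rank, desc in informative:
        scores[desc] += total - rank
    # Ties resolve to the function seen first, i.e. the better-ranked hit.
    return max(scores, key=scores.get)
```

With this weighting, a single strong top hit can outrank several weaker, mutually inconsistent hits, which is one way to avoid leaving an ORF unassigned when no majority exists.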


Design: In designing our pipeline, we aimed to leverage existing public databases for protein detection, expecting that PHASTEST’s focus on viral and bacterial proteins would efficiently identify relevant homologs, while the comprehensive scope of NCBI’s Non-Redundant Protein Database (NR) would ensure broader protein coverage. Our goal was to enable accurate detection of satellite prophage proteins without prematurely building a custom database.

Build: We implemented PHASTEST and NR as our primary databases for protein detection, integrating them into our pipeline for sequence alignment and functional annotation using BLASTn and DIAMOND BLAST. PHASTEST was chosen for its specificity, while NR provided extensive coverage of non-redundant proteins.

Test: During the initial tests with satellites EcCIEDL933 and EcCICFT073, we evaluated PHASTEST and NR separately. PHASTEST demonstrated faster alignment but failed to capture key satellite proteins such as AlpA, leading to detection gaps. In contrast, NR provided better protein coverage but proved to be computationally expensive, significantly slowing down the pipeline due to its large size.

Learn: To address these issues, we developed a custom database incorporating the non-redundant bacterial, viral, and archaeal protein sequences from RefSeq, as well as a curated set of satellite-specific proteins from PLEs. Additionally, we refined our analysis by shifting the focus to benchmarking with DIAMOND BLAST, which provides significantly faster alignment speeds while maintaining high accuracy. BLASTn is still available as an option in SaPhARI, but we prioritize DIAMOND for its efficiency. This shift, combined with our custom database, resulted in reduced processing times and improved detection accuracy for key satellite proteins, ultimately enhancing both performance and coverage.


Design: DEPhT, a prophage detection tool, was integrated into the pipeline to screen genomes for potential prophage elements as a pre-screening step before satellite detection to prevent false positive detection. The objective was to assess DEPhT's effectiveness in detecting wide host-range prophages.

Build: A process was developed to input genome sequences into DEPhT, enabling the collection of predictions regarding prophage regions. These predictions were then integrated into the broader SaPhARI workflow for subsequent satellite analysis.

Test: Although DEPhT yielded reliable results for certain prophages, challenges arose in managing and training the tool on diverse bacterial strains, leading to inconsistent performance.

Learn: Given its performance limitations, DEPhT was ultimately removed from the pipeline in favor of direct satellite protein clustering utilizing BLAST and Prodigal outputs. This decision was made to allow researchers greater flexibility in identifying satellite prophages without the constraints imposed by a pre-screening process in the pipeline, which primarily focused on excluding prophage sequences rather than aiding in their identification.


Design: Non-coding RNA elements, including tRNAs, rRNAs, and tmRNAs, play crucial roles in prophage biology and serve as important indicators of satellite elements. The objective was to integrate specialized tools—Aragorn for tmRNA detection, Barrnap for rRNA, and tRNAscan-SE for highly accurate tRNA prediction—to effectively annotate these non-coding elements.

Build: Separate Nextflow processes were developed for Aragorn, Barrnap, and tRNAscan-SE, allowing adjustable parameters tailored to specific detection needs. Score thresholds were established for tRNAscan-SE, while e-value, rejection, and length cutoffs were defined for Barrnap. 

Test: Initial tests demonstrated successful detection of non-coding RNA elements, with parameters adjustable by users to optimize detection based on specific research needs. This flexibility allowed for enhanced accuracy in identifying non-coding elements within the genomic context.


Design: To classify satellite prophages based on genomic protein clusters, we developed a Python class titled Satellite. This class allows for flexible grouping of proteins, configurable aliases (e.g., treating 'major capsid' and 'major head' as synonyms), and customizable thresholds for the number of proteins required to elicit a match. Protein searching is performed through string-based matching of the protein titles from the extract.py outfile, which we opted for as it is easier for users to work with. By treating the protein titles as strings, the system can easily identify clusters based on similar or synonymous protein names. This approach also helps account for convergent evolution in proteins, where similar functions may arise independently, allowing for more flexible classification. The goal was to create a user-friendly framework for classifying satellites into distinct families.

Build: The class was built to manage unordered protein sets, with customization through aliases. Key parameters, which include minimum protein thresholds for family assignment, exclusion of forbidden proteins, inter-protein distance limits, and maximum satellite length, were incorporated to boost flexibility.

Test: Initial implementations encountered issues where multiple sets containing the same proteins were detected, leading to inconsistent outputs based solely on variations in size.

Learn: To resolve this, the class was restructured to enhance alias matching and ensure the identification of the largest unique protein cluster. Extensive unit tests were implemented to validate the logic for grouping proteins and ensure accurate family classification across various input conditions. To further verify our approach, we used nucleotide FASTA genomes from individual satellite families in our novel database to test whether SaPhARI could successfully identify satellites. Additionally, we conducted negative control tests to ensure SaPhARI did not falsely identify satellites in non-satellite genomes, further confirming its accuracy.

Performance Evaluation: The find_it() method in the Satellite Python class is designed to identify and extract specific protein regions from an annotated SaPhARI file based on parameters defined for a satellite family. The method processes the file by reading all lines and extracting protein names along with their respective positions. A nested function checks for forbidden proteins while iterating through the lines to collect potential protein regions within the specified maximum length. The algorithm keeps track of distinct proteins found and ensures uniqueness of regions by avoiding subsets of previously identified regions. Additionally, it incorporates functionality to include flanking genes around each identified region, thus providing a comprehensive output. The overall time complexity of this approach is O(n²) in the worst case, attributed to the nested loops iterating through the lines; however, practical performance may vary depending on the content of the input genome.
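
The region search described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the team's Satellite class or its find_it() method: the names SatelliteFamily and find_regions are hypothetical, genes are assumed to arrive as (title, start, end) tuples sorted by position, and inter-protein distance limits are left out for brevity.

```python
class SatelliteFamily:
    """Hypothetical sketch of string-based satellite family matching."""

    def __init__(self, core, aliases=None, forbidden=(),
                 min_proteins=2, max_length=20000):
        # Longest names first so the most specific core name wins.
        self.core = sorted({c.lower() for c in core},
                           key=lambda c: (-len(c), c))
        self.aliases = {a.lower(): c.lower()
                        for a, c in (aliases or {}).items()}
        self.forbidden = [f.lower() for f in forbidden]
        self.min_proteins = min_proteins
        self.max_length = max_length

    def canonical(self, title):
        """Map a protein title to a core protein name, or None."""
        title = title.lower()
        for alias, target in self.aliases.items():
            if alias in title:
                return target
        for name in self.core:
            if name in title:
                return name
        return None

    def is_forbidden(self, title):
        return any(f in title.lower() for f in self.forbidden)

    def find_regions(self, genes):
        """Return inclusive (first, last) index pairs of matching regions.

        Nested loops give O(n^2) time in the worst case.
        """
        regions = []
        for i in range(len(genes)):
            if self.canonical(genes[i][0]) is None:
                continue  # a region must start at a core protein
            found, last = set(), None
            for j in range(i, len(genes)):
                title, _, end = genes[j]
                too_long = end - genes[i][1] > self.max_length
                if too_long or self.is_forbidden(title):
                    break
                hit = self.canonical(title)
                if hit is not None:
                    found.add(hit)
                    if len(found) >= self.min_proteins:
                        last = j  # extend to the latest core protein
            # Skip regions contained in one that was already reported.
            if last is not None and not any(
                    a <= i and last <= b for a, b in regions):
                regions.append((i, last))
        return regions
```

Dropping any region that sits inside a previously reported one is what keeps only the largest unique cluster when several overlapping sets contain the same proteins.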


Design: Simplicity in deployment was a top priority, aiming to allow users to run the entire SaPhARI pipeline with minimal configuration, while still providing options for customization.

Build: A Python script was developed as the primary interface for SaPhARI. This script captures user input for essential parameters, such as database selection, e-value thresholds, and output location, and passes them to shell scripts for execution either locally or on an HPC system. The Python script automates the setup and execution of the pipeline, enabling users to customize parameters without needing to modify the underlying shell scripts directly.

Test: Early feedback from users indicated that while the pipeline functioned as intended, new users often found the setup process overwhelming, particularly those unfamiliar with Python environments or manual configuration.

Learn: To enhance usability, we introduced a configuration template that walks users through setting key parameters and choosing workflows. The Python script was also updated with pre-configured defaults for common use cases, allowing novice users to run the pipeline with minimal setup. Advanced users still have the flexibility to fine-tune parameters directly within the template as needed.
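
One simple way to combine pre-configured defaults with user overrides is sketched below. The setting names and default values are hypothetical placeholders and do not reflect SaPhARI's actual configuration template.

```python
# Hypothetical defaults; the real template may use different keys and values.
DEFAULTS = {
    "database": "custom",
    "evalue": 1e-5,
    "outdir": "results",
    "executor": "local",  # e.g. "local" or an HPC scheduler
}


def resolve_config(user_settings):
    """Overlay user settings on the defaults, rejecting unknown keys."""
    unknown = set(user_settings) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    # Later entries win, so user values replace defaults.
    return {**DEFAULTS, **user_settings}
```

A novice user can call resolve_config({}) and run with defaults, while an advanced user overrides only the values they care about. Rejecting unknown keys catches typos early instead of silently ignoring them.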


Design: Initially, the pipeline was designed to output only the identified regions containing core proteins, with the goal of providing users with a concise summary of detected satellite families.

Build: The output structure focused on providing clear results for each detected satellite region. Users could define core proteins and set thresholds for family classification. The output was generated in .txt format, summarizing the identified regions containing the specified core proteins.

Test: During testing, it became evident that focusing solely on the identified regions provided insufficient genomic context. Users found it challenging to interpret how the detected satellite regions fit into the broader genomic landscape, especially when analyzing large genomes with complex prophage structures.

Learn: To address this, we enhanced the Satellite class output by including the five flanking genes both upstream and downstream of each identified region. This provided valuable genomic context around the core protein regions, making it easier to understand the surrounding genomic structure. The refined output remains in .txt format, but now includes detailed context to offer a more complete view of each detected satellite region.
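
Adding flanking genes amounts to widening the reported index range while staying inside the genome. A minimal sketch, assuming genes are held in a position-sorted list and a region is given by inclusive first and last indices (the function name with_flanks is hypothetical):

```python
def with_flanks(genes, first, last, n_flank=5):
    """Return the region genes[first..last] plus up to n_flank genes per side.

    Indices are clamped so regions near a contig edge simply get
    fewer flanking genes instead of raising an error.
    """
    start = max(0, first - n_flank)
    stop = min(len(genes), last + 1 + n_flank)
    return genes[start:stop]
```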


Analysis

The iterative Design-Build-Test cycle led to the evolution of SaPhARI into a powerful and flexible tool for the discovery and classification of satellite prophages. By integrating tools such as Prodigal and BLAST, and refining the pipeline through continuous feedback and testing, SaPhARI has become a robust system that balances precision and performance. Each step, from open reading frame prediction to functional annotation and satellite classification, was meticulously designed to address the unique challenges of large-scale genomic analysis.

The incorporation of a custom database, dynamic filtering, and enhanced context through flanking gene inclusion ensures that researchers are provided with biologically relevant insights in a manageable format. With its streamlined deployment and high degree of customization, SaPhARI is now well-equipped to handle complex datasets, offering valuable genomic context and improving the accuracy of satellite prophage detection.

Ultimately, the final version of the SaPhARI pipeline stands as a versatile, scalable, and user-friendly platform, empowering researchers to uncover novel satellite prophage elements and deepen their understanding of bacteriophage satellite biology in diverse bacterial hosts and metagenomics.

Applications



The traditional (1) Design - (2) Build - (3) Test - (4) Learn model that iGEM teams and SynBio researchers typically follow involves testing engineered circuits under laboratory conditions, where bacteria are provided with an environment conducive to growth. While this is extremely valuable and a necessary first step, the engineering cycle should also progress to another level of testing for constructs that will eventually be used in real-world environments. This level entails testing, and understanding, how these circuits behave in environments that mimic those in which the devices will be deployed. For our experiments, this meant testing in complex and less predictable environments such as the colon and soil microbiomes, which is vital to expanding SynBio beyond the lab setting. We therefore adopted a two-cycle approach for our 2024 project: (1) Design - (2) Build - (3) Test in vitro - (4) Learn - (5) Design - (6) Build - (7) Test in situ - (8) Learn. Under this approach, once constructs are shown to work in the lab setting, researchers progress from the in vitro level to the in situ level (real-world mimics), which for our experiments were a model colon and soil microcosm systems.

Gene Delivery

Summary

The P4 cosmid system was originally designed by Dr. Fa-arun at the University of Edinburgh. Its original purpose was to induce CRISPR-Cas9-mediated cell death in disease-causing bacteria such as E. coli O157:H7 and Shigella flexneri. Aspects of the P4 packaging mechanism were used to create transducing units carrying a CRISPR cassette. To create a versatile tool that could be used for both E. coli and Shigella, Dr. Fa-arun also created chimeric tail fibers to expand the host range of these transducing units. The Cas9 used in the original study targeted Shiga toxin genes in the bacterial genomes, causing a double-stranded break in the genome and thus killing the bacteria.

Design: So that the P2 lysogen can package the P4 cosmid with a wide selection of tail fibers, the genes encoding the tail fibers were removed from its genome and are instead supplied on a plasmid. The P2 wild type tail fibers are compatible with the indicator strain E. coli EMG2-K12, so the plasmid carrying the P2 tail fibers is necessary for packaging the P4 cosmid in our experiments. For a higher level of control, the system was also designed so that the P2 lysogen only packages the P4 cosmid and lyses in the presence of the sugar L-rhamnose. The Cas9 on this cosmid lacks a crRNA target, so the transducing units have no killing effect. However, the cosmid does carry a chloramphenicol selection marker, so the indicator strain acquires chloramphenicol resistance after being infected with transducing units.

Build: Initially, the bacteria carrying the P2 prophage were made chemically competent and then transformed with both the P4 cosmid carrying the non-targeting CRISPR cassette and the P2 wild type tail fiber plasmid. The cells were then induced to lyse with L-rhamnose, and the lysate was collected.

Test: The indicator strain, E. coli EMG2-K12, was infected with the P4 transducing units and selected for chloramphenicol resistance to estimate the titer, quantifying the transducing units. 

Learn: During the initial trial, the lysate degraded within a week, with the titer decreasing 10-fold per day. In a later replicate, however, the titer remained stable over a two-week period at about 3.4 × 10^9, which is higher than that of a standard phage. This supports the argument that satellite phages reach a higher titer than other phages and thus have a higher efficiency.



The P4 cosmid system’s engineerable host range was initially very intriguing to us. We reasoned that it would be much easier for future users of this system to assay the potency of transducing agents with modified host range if the cosmid contained a reporter device that gives researchers a positive visual cue for transduction. We elected to replace the Cas9 cassette in the cosmid with a reporter device, rather than simply inserting one, to conform to the size limitations of the P4 cosmid system. After an unsatisfactory first version, we re-examined the promoter we used in the RFP cosmid and made changes so that the sequence ahead of the promoter conformed to the consensus promoter UP-element (BBa_J428201) in the parts registry. Ultimately, we were able to successfully transduce the second version of the RFP cosmid both in vitro and when deployed in soil microcosms. RFP cosmid transductants show potent fluorescence visible to the naked eye after in vitro transduction, and after transduction in the biologically stressful, heterogeneous, and dynamic conditions of our soil environments.

Design: After deciding that a version of the cosmid with a reporter device would be desirable, we evaluated which reporter to use. We decided on the fluorescent protein mRFP1, as it has been shown to be a reliable reporter and was readily available. The P4 cosmid, as with all transducing vectors, has a limited capacity for delivery of genetic circuits, and simply inserting a fluorescent protein into the cosmid would have caused it to exceed the size of the wild type P4 genome, which we were advised against during our first IHP meeting with Dr. Fa-arun. Ultimately, we decided on a design in which we replaced the Cas9 cassette on the original cosmid with an mRFP reporter device.

Build: To assemble this version of the cosmid, we amplified fragments of the P4 cosmid and the 2022 Interlab construct, Test Device 2 (TD2), via PCR. We then joined these fragments via Gibson assembly and transformed the product into DH5 alpha cells.

Test: The first version of the RFP cosmid did not show a level of fluorescence visible to the naked eye when transformed into DH5 alpha. This vexed us, as colonies transformed with Test Device 2 showed visible fluorescence under the same conditions. We verified the sequence of the first version of the RFP cosmid and performed a number of small experiments to rule out possible confounding factors which could produce the drastic difference in fluorescence we observed between the two constructs.

Learn: After ruling out confounding factors, analysis of our sequence-confirmed construct showed us that our assembly excluded a small part of the consensus UP element ahead of the J23101 constitutive promoter. Though UP elements are not generally required for E. coli promoters to successfully initiate transcription, in this case exclusion of the complete UP element abolished transcription from J23101 to a level where fluorescence could not be observed with the naked eye, nor in a plate reader when compared to untransformed E. coli DH5 alpha.

Redesign: We redesigned our construct to include the consensus UP element ahead of the J23101 constitutive promoter. 

Build: To assemble this version of the cosmid, we amplified fragments of the P4 cosmid and TD2 via PCR. We then joined these fragments via Gibson assembly and transformed the product into DH5 alpha cells. The transformed DH5 alpha cells had visible fluorescence after incubation for 48 hours. To ensure that fluorescence was not solely due to the TD2 plasmid without the P4 cosmid backbone, we confirmed our results by sequencing each fluorescent colony. The results came back with 100% alignment to our predicted engineered product. Following sequencing, we transformed both the cosmid and the chimeric P2 tail fiber plasmid, created by Dr. Fa-arun, into P2 lysogen cells. This resulted in fluorescent colonies visible to the naked eye. We grew up, concentrated, and resuspended these co-transformed cells in media supplemented with L-rhamnose. Induction with L-rhamnose starts a positive feedback loop of late gene expression in the co-transformed cells, resulting in cell lysis and the production of viable transducing agents. We then filtered that lysate with 0.22 µm filters to produce transducing agent filtrate suitable for use in our experiments.

Test: We performed spot titer assays of our transducing units to measure the potency of these vectors in E. coli HL 713, the indicator strain used in our soil microcosms. This resulted in an average titer of 8.37 × 10^6, significantly lower than the 3.4 × 10^9 titer of the original P4 cosmid. This assay also served as a test of function, as visible fluorescence could be seen in the infected HL 713. We additionally tested the P4/RFP cosmid system in our soil environments. Ten samples were collected from distinct colonies from soil microcosm plating on LB with kanamycin and chloramphenicol.

Sample Image

A1: Negative control - uninfected E. coli HL 713, A2: Positive control - E. coli HL 713 infected with RFP, A3: Negative control - water. A4: Sample 1, A5: Sample 2, A6: Sample 3, A7: Sample 4, A8: Sample 5, A9: Sample 6, A10: Sample 7, A11: Sample 8, A12: Sample 9, B1-11: Empty wells, B12: Sample 10. 

The table above reports red fluorescent protein measurements for the 10 colonies from soil samples. Sample 6 showed a significantly higher measurement than the negative control, suggesting that this sample was fluorescing.

Learn: We learned that the engineered P4 cosmid can serve as a reporter system for the presence of bacteria within a compatible host range. We predict that the lower titer is due to the size difference between the original cosmid (11544 bp) and our engineered cosmid (7742 bp), resulting in less efficient packaging.
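
To put the two comparisons in this section on a common scale, the short calculation below works out the fold difference in titer and the size difference between the cosmids, using only the figures reported above. The variable names are chosen for this example.

```python
import math

# Figures reported in this section.
original_titer = 3.4e9     # original P4 cosmid
rfp_titer = 8.37e6         # RFP cosmid, average spot titer on HL 713
original_size_bp = 11544   # original cosmid
rfp_size_bp = 7742         # engineered RFP cosmid

fold_drop = original_titer / rfp_titer        # about 406-fold
log_drop = math.log10(fold_drop)              # about 2.6 log units
size_difference_bp = original_size_bp - rfp_size_bp    # 3802 bp
relative_size = rfp_size_bp / original_size_bp         # about 0.67

print(f"Titer drop: {fold_drop:.0f}-fold ({log_drop:.1f} log10)")
print(f"Size difference: {size_difference_bp} bp "
      f"({relative_size:.0%} of the original cosmid)")
```

The RFP cosmid is therefore roughly two thirds the length of the original and shows a titer about 2.6 orders of magnitude lower. The calculation does not by itself establish that size is the cause.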


An argument for using phage satellites instead of standard phages is that they tend to have a higher titer, and thus would be more efficient in real world environments. They also provide a certain level of safety, since they are unable to replicate without a helper phage. However, this could prove to be undesirable in certain applications.

Design: The P4 cosmid system relies on the P2 prophage to package the transducing units, but the tail fiber genes have been removed from this prophage. To produce transducing units that can infect the target strain of bacteria, the bacteria carrying the P2 prophage must therefore be transformed with a plasmid supplying the P2 tail fibers. Infecting these transformed P2 lysogens with transducing units and then forcing the cells to lyse would create more transducing units. The system was designed to lyse only in the presence of the sugar L-rhamnose, which increases the level of control, since no further transducing units can be packaged once L-rhamnose is no longer supplied. If the effects of the P4 transducing units do not reach the desired extent, the system could be amplified further in vitro by adding the P2 lysogen transformed with the tail fiber plasmid together with L-rhamnose.

Build: The P2 lysogenic bacteria and the P2 wild type tail fiber plasmid were designed to accompany the P4 cosmid. The bacteria carrying the P2 lysogen were made chemically competent so that they could be transformed with the plasmid. The tail fiber plasmid created for this system has several variations: one with wild type P2 tail fibers and two with chimeric tail fibers that alter the host range of the transducing units. The tail fiber compatible with our indicator strain was the wild type P2. Therefore, to create a replicative system, we transformed competent P2 lysogens with the P2 wild type tail fiber plasmid and infected them with transducing units.

Test: The titer of the transducing units was quantified, and the transducing units were then used to infect the transformed cells. Those cells were then lysed, and the titer of the resulting lysate was quantified.

Learn: The titer increased exponentially, meaning that the replicative system is efficient even after only one round of infection. The ability of satellites to be repackaged when supplemented with suitable bacteria could be a useful tool when employing them in real world environments.


In order to put the P4 cosmid to the test in the environments it was designed for, we needed an experimental design that would allow us to easily assay its sequence-specific killing effect in our simulated colon model. The Kanamycin Resistance Targeting P4 cosmid was born out of this need. It uses a gRNA targeting neomycin phosphotransferase to induce killing only in an E. coli variant engineered to carry a genome-integrated kanamycin resistance gene (HL 713). We chose KanR as a target not because we are specifically interested in killing bacteria with this gene, but because KanR works as a useful proxy for an arbitrary toxin gene or virulence factor of clinical or scientific interest. This choice enhances the biosafety of our system, as it only induces a killing effect in the specially engineered bacterial target used in our experiment and not in wild type bacteria. The kanamycin resistance gene in HL 713 also doubles as a screening marker, which allows us to differentiate our target bacteria from others in our simulated colon or soil environments.

We chose a target site within neomycin phosphotransferase with the help of CHOPCHOP, a web tool for identifying suitable Cas9 target sites in a given DNA sequence (Labun et al., 2019). After choosing a target site, we designed a crRNA spacer insert with flanking BsaI cut sites for insertion into the P4 cosmid via a Golden Gate reaction. We cloned the kanamycin-targeting spacer insert into the P4 cosmid and measured its killing effect and transduction efficiency on HL 713 and a standard lab strain of E. coli in vitro. These experiments showed us that the kanamycin-targeting cosmid was effective in causing sequence-specific killing in HL 713. Additional measurements confirmed that the observed killing effect on HL 713 resulted directly from the cosmid and was not simply a product of knocking out KanR and screening on kanamycin. These results were promising, so we pushed the P4 cosmid system even further by scaling it up and testing it in a simulated human colon. Though smaller in magnitude than in our in vitro experiments, we observed a significant killing effect by the KanR-targeting cosmid in our simulated colon, indicating that this system shows promise in its intended role of sequence-specific killing of pathogenic bacteria in vivo. This experiment was also the first to use the P4 cosmid system to target specifically engineered bacteria. Its success in this role indicated to us that it, or a similar system, could one day be used to help remediate bioengineered bacteria that have breached containment while leaving wild-type bacteria unharmed.

Design: Choosing a gene target and target site was not trivial. We needed a gene that we could be certain is present in one indicator strain and absent from another. Choosing a target present in all or most strains of E. coli would prevent us from accurately measuring transducing unit titer, because any killing effect would artificially deflate the measured titer of a functioning cosmid. We eventually decided to use an engineered strain, E. coli HL 713, and chose its genome-integrated kanamycin resistance gene as our target. This would decrease the likelihood of off-target killing effects when measuring transducing agent titer, and allow us to screen for our target bacteria when used in our simulated colon and soil environments.
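As a rough illustration of what a target-site search involves, the sketch below scans a DNA sequence on both strands for 20-nt protospacers followed by an NGG PAM. This is not CHOPCHOP's algorithm and it does no off-target or efficiency scoring; the NGG PAM (as used by S. pyogenes Cas9), the 20-nt spacer length, and the function names are assumptions made for the example.

```python
import re

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_protospacers(seq: str, spacer_len: int = 20):
    """List candidate target sites as (strand, start, protospacer, pam).

    Assumption: an NGG PAM sits immediately 3' of the protospacer.
    `start` is the 0-based index on the strand that was scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits
```

Calling `find_protospacers` on the coding sequence of the target gene returns every candidate site; a dedicated tool is still needed to rank them.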

Build: We deviated from standard protocols for cloning gRNAs into this CRISPR cassette so that we didn’t need to use a polynucleotide kinase, which we did not have on hand. Instead we designed and ordered synthetic DNA oligos, which, when annealed, could be cut into the proper insert by BsaI during a golden gate reaction. After cloning the new gRNA into the cosmid via a golden gate reaction we transformed the new cosmid into E. coli, harvested it with a miniprep, and confirmed its sequence with nanopore sequencing.

We then co-transformed the KanR targeting cosmid with a plasmid containing the wild type P2 tail fiber into the P2 lysogen created by Dr. Fa-Arun, E. coli EMG C5545 ∆cosσε ∆HG. We grew up, concentrated, and resuspended these co-transformed cells in media supplemented with L-Rhamnose. Induction with L-Rhamnose starts a positive feedback loop of late gene expression in the co-transformed cells, resulting in cell lysis and the production of viable transducing agents. We then passed that lysate through 0.22 µm filters to produce transducing agent filtrate suitable for use in our experiments. We repeated this protocol as needed for each experiment.

Test: Experiments in vitro showed a significant and large killing effect, when compared with the “empty” P4 cosmid, specifically on HL 713, even in the absence of kanamycin. When evaluated in our synthetic colon, treatment with KanR Cosmid filtrate significantly lowered the population of HL 713 in the colon when compared to pretreatment measurements.
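One simple way to express a killing effect relative to the “empty” cosmid control is as the fraction of transductants lost. The sketch below assumes colony counts from plates treated with each filtrate at matched dilutions; the function name and the formula are illustrative and are not taken from our measurement protocol.

```python
def killing_efficiency(cfu_targeting: float, cfu_empty: float) -> float:
    """Fraction of cells killed relative to the empty-cosmid control.

    Assumes both counts come from the same dilution and plated volume.
    """
    if cfu_empty <= 0:
        raise ValueError("control count must be positive")
    return 1.0 - cfu_targeting / cfu_empty
```

For example, with hypothetical counts of 5 colonies for the targeting cosmid and 1000 for the empty cosmid, the function returns a value of about 0.995.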

Learn: The P4 cosmid performed well in its role of inducing sequence-specific killing in vitro and in our simulated colon. During our experiments we determined that the cosmid's killing effect was very close to, but not quite, 100 percent efficient: a small number of HL 713 transductants remained viable after transduction, albeit with diminished fitness. After consulting with Dr. Fa-Arun, the creator of the original P4 cosmid, we reasoned that this was due to flaws in the transducing agent production phase which naturally occur at a low level. Alternatively, some of these colonies could have survived due to a silent mutation in the protospacer adjacent motif of their KanR target site. Either way, this result led us to wonder whether the P4 cosmid had hidden potential to perform genetic manipulation beyond sequence-specific killing, and to iterate on it further.


The dCas9 P4 Cosmid was the first construct born out of our curiosity to see if the P4 cosmid could be used in a wider variety of genetic manipulations while still retaining its desirable characteristics. By replacing its Cas9 cassette with a catalytically dead variant (dCas9), we have altered the P4 cosmid’s function from inducing sequence-specific killing of its transductants to causing targeted silencing of gene expression.

Design: The dCas9 variant of the P4 cosmid employs a Cas9 variant with 2 amino acid substitutions which renders it catalytically dead and unable to cleave its target site (Bikard, 2013). Its targeted DNA binding activity is retained however, allowing it to silence genes by blocking transcription. We elected to use a dCas9 cassette as similar as possible to the one in the original cosmid, both to make cloning easier and because the cosmid was already at an ideal size for packing into P4 transducing agents and we didn’t want to alter it too dramatically and risk making an inferior transducing vector. 

Build: Rather than using site-directed mutagenesis we elected to use fragments of an existing dCas9 containing plasmid. We PCR amplified the essential region of the P4 cosmid and left it in two homologous arms at the beginning and end of the CRISPR cassette. We also amplified part of our new dCas9 gene, including the 2 amino acid substituted regions. We joined these fragments via Gibson Assembly and transformed them into DH5 alpha before harvesting and sequence confirming the cosmid. We then co-transformed the dCas9 cosmid into EMG C5545 ∆cosσε ∆HG with a plasmid containing the wild type P2 tail fiber and followed the same protocol used for the other cosmid variants to produce transducing agents with the modified cosmid.

Test: We reasoned that we could use the same crRNA spacer insert used to create the KanR targeting P4 cosmid to characterize the silencing ability of the dCas9 P4 cosmid. If the dCas9 cosmid was working as intended, HL 713 transduced with it should exhibit reduced resistance to kanamycin, as dCas9 would inhibit transcription elongation past its binding site, thus reducing expression of neomycin phosphotransferase.

We inserted the neomycin phosphotransferase crRNA into the dCas9 cosmid via a Type IIS assembly reaction with BsaI, transformed it into E. coli DH5-alpha, harvested the construct, and confirmed its sequence with nanopore sequencing. We then followed the same procedure as in previous experiments to produce transducing agent filtrate containing the dCas9 P4 cosmid (with added KanR targeting crRNA), as well as original P4 cosmid filtrate as a negative control.

Treatment with dCas9 Cosmid (with added KanR targeting crRNA) transducing agent filtrate produced fewer kanamycin resistant cosmid transductants than a control group treated with original P4 cosmid transducing agent filtrate. Overall, the experiment was not sufficiently powered to draw definitive conclusions about the magnitude of the dCas9 cosmid’s silencing effect, but the result is intriguing enough to motivate further characterization of this part.

Learn: For future characterization of this part we aim to use crRNAs which specify a target sequence in the promoter region of a gene target for silencing rather than using an in-frame target.


The rationale for the dCas9-ω Cosmid (pronounced “dCas9 Omega”) followed much the same reasoning as for the dCas9 cosmid. It is designed to induce targeted upregulation by promoting assembly of RNA polymerase near its target site upstream of a promoter region. It accomplishes this by coding for a dCas9 variant fused at its C terminus to the omega subunit of RNA polymerase (Bikard, 2013).

Design: We elected to use a C terminal RNAP omega fusion rather than an N terminal fusion, as it was more expedient to create via Gibson assembly. We also took care to minimize the size difference between the dCas9-ω cosmid and the original dCas9 cosmid so as not to interfere with P4’s packaging mechanism and risk lowering the titer of transducing agents produced with this cosmid. In the end, the dCas9-ω cosmid ended up being slightly closer in size to the wild type P4 genome than the original cosmid.

Build: To create the dCas9-RNAP omega cosmid, we PCR amplified the backbone of the P4 cosmid as well as a fragment from an RNAP omega fused dCas9 variant from pWJ66 (Addgene Plasmid #46570). We assembled the fragments via Gibson assembly.

Test: Following the same procedure as we did with the dCas9 cosmid, we transformed the assembly into E. coli DH5 alpha, harvested the assembled cosmid, and confirmed its sequence via nanopore sequencing.

Learn: Though we used a C terminal fused RNAP omega subunit for the dCas9-omega cosmid, an N terminal fused dCas9 may also be useful for upregulating certain targets if they do not respond well to the C terminal fused variant. The success of this assembly also inspired us to consider other, smaller Cas proteins as potentially interesting payloads for future uses of the P4 cosmid, as the dCas9-omega cosmid is nearing what we believe to be the maximum size for a P4 cosmid variant.

Overcoming Prophage Immunity



“Phagelets,” phage satellites found in a species of bacteria called Mycobacterium aichiense, were discovered years ago at William & Mary. M. aichiense is known to carry strong immunity to other phages, and to our knowledge these phage satellites were the only phage ever able to infect it. Due to their fragile nature, they degraded and were thought to be lost for six years. Our team “resurrected” these “phagelets,” grew them out to a high titer, and then employed them in the soil microcosm to fully test their effects.

Design: In order to search for at least one viable mycobacteria satellite in degraded lysate, large quantities need to be screened. We used a procedure similar to screening for satellites in the soil, in order to find lysate that is not degraded. 

Build/Test: We plated several milliliters of 20 different degraded lysates each time.

Learn: Using our protocols, our team managed to grow out four “phagelets,” two of which had not been sequenced or characterized before. We found that one was in fact not a phage satellite but a standard phage that shares large similarities with HerbertWM, the prophage in M. aichiense. While the satellites worked to excise HerbertWM, there was no evidence of HerbertWM when the phage lysed the cell.

Originally, these phage satellites showed one or two plaques after the initial plating of large quantities of old lysate, but were grown out to a titer of 10^7 to 10^8. The non-satellite phage reached a titer of 10^9.
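For readers unfamiliar with how a titer is derived from a plate count, the standard calculation divides the number of plaques by the product of the dilution factor and the volume plated. The sketch below assumes the titer is expressed in plaque-forming units per mL; the function name and the example numbers are hypothetical.

```python
def titer_pfu_per_ml(plaques: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Titer (PFU/mL) = plaques / (dilution factor x volume plated in mL).

    dilution_factor is the fraction of the original lysate in the plated
    sample, e.g. 1e-5 for a 10^-5 dilution.
    """
    if dilution_factor <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution factor and volume must be positive")
    return plaques / (dilution_factor * volume_plated_ml)
```

For instance, 50 plaques from 0.1 mL of a 10^-5 dilution corresponds to roughly 5 x 10^7 PFU/mL.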

Some of the soil microcosms were inoculated with these “phagelets” while others contained the phage that was previously thought to be a satellite for comparison. In these microcosms, we tested for the mycobacteria satellite phages’ and the phage’s killing effect on M. aichiense, and continued to test the presence of phage as well.


“Phagelets” have historically been extremely fragile and degraded quickly. After achieving the desired titer, we began plating them less frequently. After taking several days between collecting the lysate and plating, they stopped growing. 

Design: The procedure we have developed for reviving phage satellites was used to continue to grow them at a slower rate due to an ideal titer being achieved. However, they stopped forming plaques. 

Build/Test: After plating as described above and the plates forming perfect lawns and no plaques in the incubator at 37℃, they were left to sit untouched at room temperature for 2 days on the lab bench. 

Learn: After 2 days at room temperature, webbed lysis plaques formed on each plate except for the negative control, which had no “phagelets” added to the M. aichiense. The reason for this could be temperature sensitivity, although the “phagelets” previously formed plaques in the incubator at 37ºC. What we learned is that these phage satellites may remain viable even when they fail to form plaques under some conditions, so an absence of plaques should not be taken to mean the lysate has degraded.

Host Range Expansion



We have engineered chimeric tail proteins that we believe may expand the host range of HerbertWM and its “phagelet” to include M. smeg. To test these chimeric tail genes, we proposed two potential methods for introducing them to HerbertWM. Rather than attempting to directly engineer the genome of the helper phage, we want the new tail proteins to be expressed nearby during the packaging of the Herbert-“phagelet” systems, allowing them to potentially pick up the new protein during the last stages of tail assembly. The first component of development for this approach was a comprehensive literature review.

Gene Product 6 (GP6) has been identified among an array of minor tail proteins in the A2 phage L5 (Hatfull, 1993). BLAST analysis shows that minor tail genes with high sequence similarity and similar genomic location to those listed in the paper exist in virtually all A2 phage. HerbertWM’s corresponding minor tail protein coding genes show similarly high homology, with the exception of gene product 3, which corresponds to GP6 in L5. Gp3 shows remarkably low sequence similarity to its counterparts in all other A2 phages, which are also all known to infect M. smegmatis. GP6 has also been identified as potentially involved in surface binding of M. smegmatis; GP6 immobilized on gold surfaces captured exclusively M. smeg cells, as compared to other Mycobacterium species, after incubation (Arutyunov, 2014), suggesting GP6 as a strong candidate for M. smeg-specific detection.

The inability of HerbertWM to infect Mycobacterium smegmatis is a defining characteristic of the lysogenic phage, which inspired us to look further into this specific tail protein. Below, we describe the approach and analysis that informed our designs.

Sample Image

Design: We began our design process via sequence analysis. In the lab, we isolated a novel bacteriophage distinct from HerbertWM, with the same observed host range of exclusively M. aichiense. Sequencing analysis of gene product 3 in this phage showed a 92% identity to that of HerbertWM, further suggesting that this gene plays a role in determination of infection range. Using the standard phage analysis software DNAMaster, we identified the ribosome binding sites associated with corresponding genes Gp3 and Gp6 in HerbertWM and L5, respectively. We also identified the region of the HerbertWM Gp3 gene with the highest sequence homology to the corresponding protein in other A2 phages, which we ended up switching in when designing the chimeric tail proteins.
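To make the identity figure concrete, the sketch below computes percent identity from two sequences that have already been aligned to equal length (gaps written as "-"). It is a simplified stand-in for what alignment tools report: definitions of the denominator vary between tools, and here we assume identity is counted over all aligned columns, including gapped ones. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch (an assumption; tools differ on this).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```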

Build: We constructed these genes, pictured above, in the lab using standard assembly methods. First, we amplified specific regions of both the HerbertWM Gp3 and L5 Gp6 phage genes via PCR. We then added ‘overhangs’ to the ends of each fragment using PCR, creating overlaps that were then used for final Gibson assembly of the complete constructs.


Design: Our first approach to testing these novel proteins involved insertion of the desired gene directly into the “phagelet” genome. This meant ligating the insert into a linearized region of the “phagelet” genome, in hopes that the protein will be expressed and available during packaging.

Build/Test/Learn: We amplified the entirety of the “phagelet” genome via PCR, then performed overlap PCR to extend the ends of the “phagelet” fragments, forming homologous arms with the tail fiber inserts. We attempted Gibson assembly repeatedly, varying standard protocol parameters including incubation time and vector:insert ratio. We performed a diagnostic PCR each time to amplify any potential product; none showed a successful assembly.

From this, we learned that “phagelet” genomes cannot be directly modified via standard assembly methods due to a variety of unique characteristics, including unusually high GC content and repeat-heavy sequences.

Sample Image

Redesign: Our next approach involved cloning of the tail protein inserts into the pKW08 plasmid. This is a construct designed for reliable gene expression in vitro. The pre-designed pKW08 plasmid includes a promoter, terminator, and hygromycin resistance gene. Our intention is to ultimately use this plasmid alongside traditional “phagelet” infection of M. aichiense.

Build: We amplified and added restriction cut sites to the ends of the plasmid backbone and tail protein inserts. This allows for a double digest followed by ligation of cohesive sticky ends and transformation into competent E. coli for replication, which we are currently in the process of performing in the lab. To test this construct, we plan to transform the plasmid into M. aichiense and confirm its presence via hygromycin resistance. We will then use our protocols for phage plating of the transformed cells with “phagelet” lysate on hygromycin.


While the new and exciting field of machine-learning has drawn interest in novel approaches to biological problems, there have only been a few publications that explore the specific application of phage “host range” (the selection of bacteria that a given bacteriophage is able to infect), working off training sets pulled from public databases en masse. We have compiled a novel, nearly exhaustive ongoing database of confirmed interaction data and complete genomes for phages infecting both E. coli strains and mycobacteria species. We use this data to train and evaluate current machine-learning models from the literature, identifying and addressing their weaknesses and building a basis for future improvement to this approach.

Design: Rather than build a new program completely, we utilized various open-source code from the literature (Ataee, 2020; Leite, 2018), in which evaluation of models and parameters for processing phage-bacteria interaction data has been done through mass-scale optimization.

The model processes the provided phage and bacteria genomes through one-hot encoding (similar to binary) using representative vectors for individual base pairs, allowing the interaction category and genomic information to pass through a series of complex neural networks to train the model.
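As a minimal sketch of the one-hot step, each base can be mapped to a four-element vector with a single 1 marking which base it is. The base ordering and the handling of ambiguous bases (left as all zeros) are assumptions made for this example; the models from the literature may differ in both.

```python
from typing import List

BASES = "ACGT"  # assumed column order for the encoding

def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as one four-element 0/1 vector per base."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(BASES)
        if base in index:  # ambiguous bases such as N stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded
```

Under these assumptions, the sequence "ACG" becomes [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]].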

Build: The key to having a good model is a good training set - here we go through some of the key considerations we made when building our datasets that set our model apart from others like it.

The following are limitations set by previous models in the literature, and how we addressed them.

1) Generation of “putative non-interacting pairs” (Ataee, 2020) due to the absence of information on negative interactions between phage-bacteria pairs in public databases.

As can be seen in the database we have compiled, many papers testing the host range of novel phages often found significant numbers of positive interactions with bacteria outside of their species of origin, and even outside their genus. This means the accuracy of this ‘negative’ interaction dataset is likely questionable, and not fit to be used in training data as none of it is experimentally confirmed.

To address this, we performed a near-exhaustive manual literature review of experimentally confirmed phage-bacteria interaction data, including both positive and negative interactions; approximately 52% of the documented interactions in our dataset are negative.

2) Database consisting of interactions between phage and solely the host they were isolated from.

In real-life applications like phage therapy, hundreds of phage may end up being screened for effective killing of a treatment-resistant host (LeMieux, 2020). A dataset with the limitation of only the host bacteria of initial isolation is unlikely to capture less ‘obvious’ interactions between phage and completely different hosts with distinct serotypes.

Many of the interactions documented in our database are taken from papers characterizing novel phage, where host range evaluations spanned a wide range of bacterial genus and species. 

3) Over-representation of certain bacterial host species (Leite, 2018).

While the authors chose to oversample the less represented hosts to prevent the model from being reliant on the presence or absence of M. smeg in a pair, this resulted in 86% of the public dataset being represented by only 149 interactions from 94 bacteria. 

We compiled our training set of Mycobacteria interactions in a similar way, while compiling our E. coli interaction database with a more evenly spread range of pairs. The difference in model performance we observed indicates the importance of representative and robust training sets.

4) Choosing to build a database consisting of phage with less diversity (i.e. Pseudomonas phage) than that seen in other families.

Phage are incredibly diverse, and failing to capture this within a training set also likely fails to capture various interactions that may lay the groundwork for new discoveries. A more predictable family of phage will have more predictable interactions in any model.

Our E. coli interaction database captures a much more diverse span of pairs, with 160+ strains and serotypes and over 130 unique phages, including more well-characterized phages like T7 or P2 as well as novel genetic elements from recent literature.

Test/Learn: We trained the model on a random section of our dataset each time, then tested its ability to predict whether a phage-bacterium pair can interact with the remaining data. The accuracy score each time was recorded and averaged, which we found to be around 68% with p < 0.001. For a broader picture: out of the top 20 highest probability interactions chosen by the model each time, 14 were confirmed interactions.
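The evaluation loop described above can be sketched as repeated random train/test splits with the accuracy averaged across repeats, plus a "top k" check on the highest-probability predictions. The number of repeats, the test fraction, and the `fit` and `predict` callables are placeholders chosen for this example, not the settings or code we used.

```python
import random
from statistics import mean

def repeated_holdout_accuracy(pairs, labels, fit, predict,
                              n_repeats=10, test_fraction=0.2, seed=0):
    """Average accuracy over repeated random train/test splits.

    fit(train_pairs, train_labels) returns a model;
    predict(model, pair) returns a predicted label.
    """
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random split on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([pairs[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, pairs[i]) == labels[i] for i in test_idx)
        scores.append(correct / n_test)
    return mean(scores)

def precision_at_k(scored_pairs, k):
    """Fraction of the k highest-scoring pairs that are confirmed interactions.

    scored_pairs is an iterable of (score, is_confirmed) tuples.
    """
    top = sorted(scored_pairs, key=lambda item: item[0], reverse=True)[:k]
    return sum(1 for _, confirmed in top if confirmed) / len(top)
```

In these terms, 14 confirmed interactions among the top 20 predictions corresponds to a precision at 20 of 0.70.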

Sample Image

This result indicates some ability to pick up on patterns in the available interactions and genomes. However, the more than 10^31 phage on Earth remain largely unexplored, which is the main reason for the current prediction accuracy; with the more robust training sets that will come with future work on phage, this approach will increase in relevance and usability.

This serves as inspiration for further characterization and discovery of phage systems; physical isolation, discovery of phage-like elements, and characterization of host range are core pillars of our project, and we show in our handbook that this is a very accessible way to contribute to the field.

Additionally, our prediction model and database were both used in selecting helper phage gene sources for the engineering of chimeric tail proteins in our host expansion experiments.

Model Colon and Soil Microcosms



For more information about our hardware and modeling, please visit our hardware page and modeling page.


We designed and built a model colon to test the efficacy of the non-replicative kanamycin targeting CRISPR Cas9 system and the non-replicative non-targeting CRISPR system in a realistic model of the human colon environment. We designed this to approximate the fluid dynamics, temperature, and motion of a human colon. We also tested it with a probiotic to simulate a multimicrobial system. 

Design:  We designed a model colon based on previous models of the human gastrointestinal tract, particularly the SHIME (Simulator of Human Intestinal Microbial Ecosystem). Our original design consisted of a five chamber system for the stomach, small intestine, ascending colon, transverse colon, and descending colon. Each chamber is an enteral feeding bag, a flexible and cost effective alternative to the glass containers typically used in model colons. We wanted a flexible chamber to better represent the method of mixing in the human digestive system. Each chamber would be pumped to the next segment of the colon by peristaltic pumps and have ports to allow for sampling. This entire system of bags is in a heated fish tank to allow for motion of the bags while efficiently maintaining a temperature of 37ºC. We designed the model colon to allow for nitrogen purging to keep each chamber anaerobic to better represent the conditions of the human colon. 

Sample Image

Build:  We assembled a preliminary system of the pumps and enteral feeding bags. 

Sample Image

Test: We tested and calibrated the pumps to the appropriate flow rates.

Build: We finished building the model colon. This includes coating the enteral feeding bags in mucin agar to increase the adhesion and colonization to the chambers of the colon model. This system was heated in the fish tank. 

Sample Image

Design: In order to test more systems in the model colon, we re-designed the model colon to have two three chamber models of the ascending colon, transverse colon, and descending colon as well as a “mini colon” which is a single chamber model. 

Build: We built the additional colon chambers and added three additional pumps and one additional enteral feeding bag to accommodate the changes to the design. 

Test: We inoculated the colon with the indicator strain HL 713 for the E. coli systems. We took measurements daily to monitor the growth and persistence of the indicator strain. 


Sample Image

Learn: This initial inoculation was unsuccessful, yielding no indicator strain surviving in the model colon. We concluded that although E. coli HL 713 can in theory survive anaerobically, it was not able to colonize or persist in the model colon with regular nitrogen purges. 

Design/Build: We stopped regular nitrogen purges and allowed small amounts of oxygen into the model colon. The E. coli should consume this oxygen, leaving a semi-anaerobic environment in the colon.

Test: We re-inoculated the model colon with HL 713 allowing small amounts of oxygen. We continued daily measurements to determine the growth and persistence of the indicator strain. The E. coli indicator strain successfully colonized and persisted in the model colon. However, the mucin agar stopped adhering to the walls of the enteral feeding bags resulting in clogging of the tubing. 

Learn: The small amounts of oxygen are necessary for the model colon to be colonized. Mucin did not perform as expected and will not be used in future experiments with the model colon. 

Design/Build: We restarted the model colon without mucin agar and re-inoculated with E. coli HL 713 still including a small amount of oxygen to promote colonization and persistence. 

Test: We took regular measurements of the population of E. coli HL 713 to determine colonization and persistence. The model colon was later inoculated with transducing units for the kanamycin targeting Cas9 CRISPR system and empty CRISPR system to test the efficacy of these systems in the colon environment. 

Learn: The E. coli HL 713 colonized successfully without the inclusion of mucin agar. The kanamycin targeting Cas9 CRISPR system and empty CRISPR system showed an effect. The kanamycin targeting Cas9 CRISPR system exhibited a significant killing effect. The empty CRISPR system conveyed chloramphenicol resistance. However, this was only measurable in the ascending colon chamber suggesting that the non-replicative empty CRISPR system did not have enough of an effect size to disperse throughout the entire colon. 

Test: We re-inoculated the model colon with E. coli HL 713 and the Seed Health DS-01 Daily Probiotic to test our phage systems in a multimicrobial environment. 


The soil microcosms were designed to demonstrate horizontal movement of the bacteria and satellite phages, allow for watering, and included plants to increase the survival of E. coli and make the soil microcosms more representative of potential agricultural applications. 

Design: The soil microcosms are designed according to three key features of our experimental design: measurement of horizontal movement of bacteria and satellite phages, rainfall simulation, and inclusion of plants. Therefore we built our soil microcosms to be roughly 12” x 18” x 6” with layers of soil, mesh, and gravel to allow appropriate drainage when watering. We also planted bush bean and lettuce seeds, both fast growing plants, to provide a sufficient root system which is important for the growth of E. coli in the soil. This also is a better representation of potential agricultural applications. The initial design of the rainfall apparatus used the same aluminum drip pans which the plastic soil containers were placed in. 

Build: We assembled the microcosms according to our design. The rainfall apparatus was made by poking 1/16” holes in every indent on the bottom of the drip pans. A stand for the watering pan was made out of scrap wood to rest on the containers’ edges.


Sample Image

Test: The drip pans were ineffective at evenly spacing water droplets across the microcosms, as the water would weigh down the middle of the pan and most of the ‘rain’ would fall into the center of the microcosms. The stand also did not provide enough height above the microcosms, which was important because we planned on planting bean and lettuce seeds in each microcosm, which grow fast and tall. The holes were also too small, which caused water to pool in the pans without dropping out.

Learn: The initial rainfall design had several issues which informed the next design iteration. The pans were not strong enough to support the weight of the water, the small holes didn’t allow water to drop out, and the rainmaker needed to be higher above the microcosms. 

Design: The next rainmaker was designed with stronger materials, and included a stand which would rest on the same ground as the microcosm rather than the microcosm container. 

Build: A new rainmaker was built from a sheet of ¼” thick plexiglass. The sheet was cut to create a small container, with a waterproof glue sealing the pieces together. The holes of 5/64” diameter were drilled into the bottom, and a stand was made from spare gantry system pieces to sufficiently elevate the rainmaker above the microcosm. 

Test: The new rainmaker was much sturdier and more stable; the larger holes allowed water to drip smoothly out of the apparatus to mimic rainfall. We inoculated microcosms 1-8 with HL 713, the indicator strain for our P2 P4 cosmid systems, and inoculated microcosms 9-16 with M. aichiense. Microcosms 1-4 were inoculated with the non-targeting CRISPR system, 5-8 with the RFP system, 9-12 with a mycobacteria phage, and 13-16 with the novel mycobacteria satellite phage.


Sample Image


We wrote several PDE models of each of our systems for the model colon and soil microcosms to predict results and inform experimental design. We then adjusted the models to preliminary experimental data to improve accuracy.

Design: We have written several partial differential equations (PDEs) which model the growth, fluid flow, and infection dynamics of satellites. For both the soil microcosm and model colon experiments, we wrote a PDE model for each of our systems, both replicative and non-replicative: the kanamycin targeting CRISPR system, the non-targeting CRISPR system, and RFP.

Build: These are based on previous modeling of satellite systems, fluid dynamics, and population dynamics. This includes reaction-diffusion dynamics:


Sample Image

This models the flow rate of water and diffusion. These models use Monod growth kinetics which limits the growth by a growth limiting substrate, in this case carbon. 
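To illustrate the Monod term on its own, the sketch below implements the standard rate law mu = mu_max * S / (K_s + S) and integrates well-mixed growth with a forward-Euler step. It deliberately omits the flow and diffusion terms of our PDE models, and the yield coefficient, step size, and all parameter values are assumptions for the example rather than fitted parameters.

```python
def monod_growth_rate(substrate: float, mu_max: float, k_s: float) -> float:
    """Specific growth rate mu = mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)

def simulate_batch(biomass, substrate, mu_max, k_s, yield_coeff, dt, steps):
    """Forward-Euler integration of well-mixed Monod growth (no flow, no diffusion).

    yield_coeff is biomass formed per unit of substrate consumed (assumed constant).
    Returns the final (biomass, substrate).
    """
    for _ in range(steps):
        mu = monod_growth_rate(substrate, mu_max, k_s)
        growth = mu * biomass * dt
        # Never consume more substrate than is left in this step.
        growth = min(growth, substrate * yield_coeff)
        biomass += growth
        substrate -= growth / yield_coeff
    return biomass, substrate
```

When the substrate concentration equals K_s, the growth rate is half of mu_max, which is a quick sanity check on any implementation of this term.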


Sample Image

Test: An important early result that we found from a preliminary model was that given the titer and efficiency of the transducing units, the non-replicative systems would not have as significant of an effect as the replicative system. Therefore, we tested both the non-replicative and replicative systems to confirm this result and show a more significant effect. 


Sample Image

Figure. Early result of the PDE model for the non-replicative kanamycin targeting CRISPR system in the model colon. 

Learn: Results from the preliminary run of the soil models were compared to preliminary results from the soil microcosms and informed new parameters to improve the accuracy of the model.


References