Software


The immune systems of L. fusiformis and of various cyanobacteria are often poorly understood and strain-specific. This poses a problem when integrating gene inserts, because interference from native systems is hard to predict. A common immune response found in L. fusiformis and UTEX 3154 is the Restriction Modification System (RMS), which targets specific DNA sequences for cutting. The RMS likely plays a role in degrading genetic inserts, which inhibits transformation. To deal with this, we needed a way to bypass the RMS.

Our PI, David Bernick, wrote the Stealth algorithm,[1] a program that identifies underrepresented K-mer sequences in a given genome. We used the Stealth algorithm as a starting point for solving this problem.


Because we cannot disable the immune systems of LiFT's target and model organisms, we had to work around the RMS of L. fusiformis and UTEX 3154. Our working hypothesis is that RMS cut sites in a genome become unrecognizable over time, as repeated cutting and repair introduce random mutations at those sites; the apparent lack of a sequence in a genome could therefore hint that it is a cut site. To aid LiFT in the integration of accelerators, we wanted to remove cut sites from our inserts and additionally optimize the inserts for UTEX 3154. To this end, we designed BLACKBIRD [2], a Stealth-based pipeline that optimizes inserts for cyanobacterial transformations in non-model organisms.


The BLACKBIRD Software

LiFT used multiple methods to maximize the efficiency of transformation and integration in cyanobacteria. BLACKBIRD's contribution is to use the Stealth program to optimize our gene inserts so that they transform as efficiently as possible. It automates the process of altering gene inserts around RMS cut sites, beginning with the processing of the host and target genomes.

BLACKBIRD uses the genomes of the host organism (the organism from which the gene insert is derived) and the target organism to optimize the gene insert. It first calculates a codon usage table for each genome. Each table is built by an Open Reading Frame (ORF) finder coded into the program, which generates usage statistics from the frames it finds. The two tables are then compared to generate a ranking table between the host and target organisms, matching the codons for each amino acid by usage rank, so that during codon optimization each codon is replaced by one of the same relative abundance. This preserves slow-translating regions of proteins, which have been shown to increase folding efficiency because the time taken for a rarer tRNA to bind promotes proper protein folding [3][4]. Once the tables have been compared, the gene insert can be altered.
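The rank matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not BLACKBIRD's actual implementation: the function names (rank_codons, build_rank_map), the truncated genetic code, and the usage values are all made up for the example.

```python
from collections import defaultdict

# Hypothetical, truncated genetic code (alanine and lysine only) to keep this short.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}


def rank_codons(usage):
    """Group codons by amino acid, ordered from most to least used."""
    by_aa = defaultdict(list)
    for codon, freq in usage.items():
        by_aa[CODON_TO_AA[codon]].append((freq, codon))
    return {
        aa: [codon for _, codon in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for aa, pairs in by_aa.items()
    }


def build_rank_map(host_usage, target_usage):
    """Map each host codon to the target codon holding the same usage rank."""
    host_ranks = rank_codons(host_usage)
    target_ranks = rank_codons(target_usage)
    mapping = {}
    for aa, host_codons in host_ranks.items():
        # Assumes both tables list the same codons for every amino acid.
        for rank, codon in enumerate(host_codons):
            mapping[codon] = target_ranks[aa][rank]
    return mapping


# Made-up usage values, for illustration only.
host = {"GCT": 30.0, "GCA": 20.0, "GCC": 10.0, "GCG": 5.0, "AAA": 40.0, "AAG": 15.0}
target = {"GCC": 35.0, "GCG": 25.0, "GCA": 12.0, "GCT": 8.0, "AAG": 30.0, "AAA": 10.0}
rank_map = build_rank_map(host, target)
print(rank_map)
```

With these values, the host's rarest alanine codon (GCG) maps to the target's rarest alanine codon (GCT), so a slow-translating position in the host stays slow in the target.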

Adapting the gene insert is by far the most time-intensive step, because each edit can generate new RMS cut sites. Our solution uses an 'editing window' that takes the preceding and following sequence into account to detect whether a change has generated a new cut site. Using the targeted organism’s genome, BLACKBIRD runs the Stealth program, which returns the underrepresented sequences (theoretical RMS cut sites) to be checked against the insert. An editing window is created at every RMS cut site found in the gene insert, and the adapting process begins. The window is centered on the first codon containing the RMS cut site and extends 2 codons before and 2 codons after it. This size was chosen because the K-mers are at least 4 nucleotides long, and a 5-codon window is well suited to catching any changes. With each change, BLACKBIRD rechecks the window for newly generated cut sites until none are found. If the RMS cut site cannot be removed by changing the current codon, the window shifts forward to the next codon and attempts changes there.
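As a rough illustration of the editing window, the sketch below computes the 5-codon window around a hit and tries synonymous substitutions until the window is free of cut sites. It is a simplified sketch, not the code in BLACKBIRD: the names window_bounds, has_cut_site and remove_site, the synonym table, and the example sequence are hypothetical, and the insert is assumed to be an in-frame coding sequence.

```python
def window_bounds(seq_len, hit_pos, flank=2):
    """Return (start, end) of the window: the hit's first codon plus `flank` codons each side."""
    codon_start = (hit_pos // 3) * 3
    start = max(0, codon_start - 3 * flank)
    end = min(seq_len, codon_start + 3 * (flank + 1))
    return start, end


def has_cut_site(window, kmers):
    """True if any K-mer from the Stealth list occurs in the window."""
    return any(kmer in window for kmer in kmers)


def remove_site(seq, hit_pos, kmers, synonyms):
    """Try synonymous codons until the window around hit_pos contains no cut site."""
    start, end = window_bounds(len(seq), hit_pos)
    codon_start = (hit_pos // 3) * 3
    # Start at the codon containing the site, then shift forward codon by codon.
    for pos in range(codon_start, end, 3):
        codon = seq[pos:pos + 3]
        for alt in synonyms.get(codon, []):
            candidate = seq[:pos] + alt + seq[pos + 3:]
            # Recheck the whole window so an edit cannot introduce a new site unnoticed.
            if not has_cut_site(candidate[start:end], kmers):
                return candidate
    return seq  # no clean alternative found; the sequence is left unchanged


# Hypothetical inputs: GGCC starts at position 6, inside the glycine codon GGC.
synonyms = {"GGC": ["GGT", "GGA", "GGG"]}
fixed = remove_site("ATGAAAGGCCTGTAA", 6, ["GGCC"], synonyms)
print(fixed)
```

Here the glycine codon GGC is swapped for GGT, which removes GGCC without changing the encoded protein. A fuller version would order the alternatives by the codon ranking rather than taking them in list order.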

README

For Unix/macOS:

First, check whether your system can run Python and the pip installer. Python packages that are not already on your system need to be retrieved by an installer like pip. Use the following commands to check:

usr:~$ python --version
Python 3.x.x
#OR
usr:~$ python3 --version
Python 3.x.x
#OR
usr:~$ python -m pip --version
pip X.Y.Z from /<path>/<to>/<your>/pip (python 3.x.x)

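If the shell reports that the command cannot be found (in bash, a message such as "bash: python3: command not found"), proceed to install Python or pip on your system. If you instead see the following traceback, Python is already installed and the command was typed inside the Python interpreter rather than the shell; leave the interpreter with exit() and run the command again:

>>> python3 --version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'python3' is not defined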

Installing Python / pip

If you do not have Python, get started by installing Python 3.10 or above from python.org or through a distribution such as Anaconda. If you do not have a working pip installer, follow the steps found here.

Installing BlackbirdCoOp

With a working pip installer, install the blackbirdCoOp package from the Python Package Index (PyPI) by running one of the following commands in your terminal:

usr:~$ pip install blackbirdCoOp
#OR
usr:~$ python -m pip install blackbirdCoOp

For Windows:

On Windows, Python and its 'Scripts' folder must be listed in the 'Path' environment variable for the CLI commands to run. First, confirm that you have compatible Python and pip installations on your system:

usr:~$ py --version
Python 3.x.x
#OR
usr:~$ py -m pip --version
pip X.Y.Z from /<path>/<to>/<your>/pip (python 3.x.x)

If either is missing, download Python and pip using the links above and follow the instructions for a Windows install.

Installing BlackbirdCoOp on Windows

With a working pip installer, install the blackbirdCoOp package from the Python Package Index (PyPI) by running the following command in your terminal:

usr:~$ python -m pip install blackbirdCoOp

If you are unable to run the accompanying CLI commands, you need to add both the Python folder and its 'Scripts' folder to the 'Path' environment variable. First, find the paths to both folders. Use the following command to find the location of BLACKBIRD:

usr:~$ pip show blackbirdCoOp

Follow these steps to add both folders to 'Path' via the GUI:

  1. 'Windows + X' -> 'System Properties' -> 'Advanced system settings' -> 'Environment Variables' -> 'System variables'
  2. In this section, find 'Path' and click 'Edit'.
  3. Now, click 'New' and add each of the following paths as its own entry:
     C:\Users\YourUsername\AppData\Local\Programs\Python\PythonXX\
     C:\Users\YourUsername\AppData\Local\Programs\Python\PythonXX\Scripts\

BLACKBIRD CLI

Once installed, the main function can be run with the blackbird command:

blackbird --insert (-n) <insert infile> --stealth (-s) <stealth infile> --hostT (-ht) <host codon usage table infile> --target (-t) <target genome infile> --outfile (-o) [outfile | default: stdout]

The blackbird command takes 4 required arguments and 1 optional argument:

  • --insert (-n): the insert sequence of interest in FASTA format (.fa/.fasta)
  • --stealth (-s): the list of K-mers output by Stealth, in a text file (.txt/.stealth)
  • --hostT (-ht): the host organism's codon usage table in TSV format (.tsv)
  • --target (-t): the target organism's complete genome in FASTA format (.fa/.fasta)
  • --outfile (-o): optional; the file to write the result to (default: stdout)
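For readers who want to wrap or script the command, the flags listed above can be mirrored with Python's argparse. This is a hypothetical reconstruction based only on the usage line, not the parser that ships with blackbirdCoOp; the help strings and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented blackbird flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="blackbird")
    parser.add_argument("--insert", "-n", required=True, help="insert sequence (FASTA)")
    parser.add_argument("--stealth", "-s", required=True, help="Stealth K-mer list (text)")
    parser.add_argument("--hostT", "-ht", required=True, help="host codon usage table (TSV)")
    parser.add_argument("--target", "-t", required=True, help="target genome (FASTA)")
    # None stands in for "write to stdout".
    parser.add_argument("--outfile", "-o", default=None, help="output file (default: stdout)")
    return parser


# Hypothetical file names, for illustration only.
args = build_parser().parse_args(
    ["-n", "insert.fa", "-s", "hits.txt", "-ht", "host.tsv", "-t", "target.fa"]
)
print(args.insert, args.stealth, args.hostT, args.target, args.outfile)
```

Omitting any of the four required flags makes argparse exit with a usage error, which matches the "4 required arguments" described above.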

Input File Examples

Insert Sequence (FASTA format):

>pET28:EGFP CDS
              ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA

Stealth Input File:

N = 3081514
CGCG	[100]	RC Palindrome
GCGC	[98]	RC Palindrome
GGCC	[100]	RC Palindrome
AATAG	[92]
AATCG	[100]
...
GAAGAC
GTCTTC
GGTCTC
GAGACC

Codon Usage Table (TSV format):

TTT	22.31	-2414783
TTC	16.54	-1789835
TTA	13.76	-1489606
TTG	13.65	-1477363
...
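A table in this layout can be loaded into a dictionary with the standard csv module. The sketch below assumes the first column is the codon and the second is its usage value; the third column is ignored because its meaning is not stated here. The function name read_codon_table is hypothetical.

```python
import csv
import io


def read_codon_table(handle):
    """Read a tab-separated codon table into {codon: usage}."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            table[row[0].strip().upper()] = float(row[1])
        except ValueError:
            continue  # skip rows whose second column is not a number
    return table


# The first two rows of the example table above, read from an in-memory file.
example = io.StringIO("TTT\t22.31\t-2414783\nTTC\t16.54\t-1789835\n")
usage = read_codon_table(example)
print(usage)
```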

Output File Format (FASTA):

>pET28:EGFP CDS output [8]
ATGTCAATATATCAA...

The number in brackets is the number of Stealth hits remaining in the output insert sequence.
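If the output is processed further, the remaining hit count can be pulled out of the header line. This is a small sketch that assumes the header always ends with the bracketed count, as in the example above; the function name parse_output_header is hypothetical.

```python
import re

# ">" + name + optional spaces + "[count]" at the end of the line.
HEADER_RE = re.compile(r"^>(?P<name>.*?)\s*\[(?P<hits>\d+)\]\s*$")


def parse_output_header(line):
    """Split an output header into (name, remaining Stealth hits)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected header format: {line!r}")
    return match.group("name"), int(match.group("hits"))


name, hits = parse_output_header(">pET28:EGFP CDS output [8]")
print(name, hits)
```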

LiFT, the 2024 UCSC iGEM team, consents to receiving any and all contributions offered.
This software is published under the MIT license. Feel free to use any and all code provided by the project in any way and for any purpose.

Contributors

BLACKBIRD was written and contributed to by:

Special Thanks

  • David L. Bernick (email: dbernick@soe.ucsc.edu), our PI, for allowing the further application of Stealth and for all his support and contributions throughout.
  • Robin Rounthwaite (email: rrounthw@ucsc.edu) for consultation on software architecture and Git repository management.
  • TABI, the 2023 UCSC iGEM team (github), for support regarding Git repository management and project packaging.
  • Reto Stamm (email: rstamm@ucsc.edu | github) for guidance in developing and publishing a package to the Python Package Index.

The BLACKBIRD Modules Documentation

FastAreader

A class created to open and read FASTA files. Operates by taking a file name as an argument.

def __init__(self, fname='')

Initializes the class by saving the name of the file to be read.

def doOpen(self)

Opens the file provided; if none is provided, uses STDIN.

def readFasta(self)

Creates a generator that reads the FASTA file line by line, yielding the header and sequence of each entry.
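The generator pattern described here can be sketched as below. This is an illustrative stand-in rather than the FastAreader source: it takes an open file handle instead of a file name, and the function name read_fasta is an assumption.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) for each FASTA entry, reading line by line."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous entry
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)  # emit the last entry


# Two made-up entries; the first sequence is split across lines and indented.
example = io.StringIO(">insert A\n   ATGGTG\nAGCAAG\n>insert B\nATGTAA\n")
entries = list(read_fasta(example))
print(entries)
```

Because each line is stripped before use, sequence lines that carry leading whitespace are still read correctly.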

StealthParser

A class that reads the output of Stealth and converts its contents into a list. Expands IUPAC ambiguity codes into the multiple concrete sequences they represent.

def replace_letters(input_str, replacements_dict)

Finds IUPAC codes and saves their positions on a string.

def gen_combos(input_str, positions, index=0)

Generates all possible combinations of the IUPAC codes found in the input string.

def return_seq(input_str)

Returns the sequences generated by an IUPAC nucleotide.

def textReader(filename)

Reads a text file with Stealth hits and deciphers the IUPAC conventions.
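The IUPAC expansion can be illustrated with itertools.product. This sketch does not reproduce StealthParser: the names expand_iupac and read_stealth_kmers are hypothetical, the parser assumes the K-mer is the first whitespace-separated field on each line and that header lines contain "=", and the GANTC line in the example is made up to show an ambiguity code.

```python
import io
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(kmer):
    """Return every concrete sequence matched by a K-mer with IUPAC codes."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in kmer.upper()))]


def read_stealth_kmers(handle):
    """Collect expanded K-mers from a Stealth-style text file."""
    kmers = []
    for line in handle:
        fields = line.split()
        if not fields or "=" in line:
            continue  # blank line or a header such as "N = 3081514"
        token = fields[0].upper()
        if all(base in IUPAC for base in token):
            kmers.extend(expand_iupac(token))
    return kmers


example = io.StringIO("N = 3081514\nCGCG\t[100]\tRC Palindrome\nGANTC\t[95]\n...\nGAAGAC\n")
kmers = read_stealth_kmers(example)
print(kmers)
```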

CodonChoice

A class that creates a sequence with the optimal codon choices. It is called recursively to edit a target site until it exits the site.

def __init__(self, hostIn, targetIn)

Initializes the class by saving the host and target codon usage tables.

def hostUsage(self)

Reads the host codon usage table and returns it as a dictionary.

def host(codon)

Returns the frequency of a codon in the host organism.

def targetUsage(self)

Reads the target codon usage table and returns it as a dictionary.

def target(codon)

Returns the frequency of a codon in the target organism.

def if_same(new_index)

Checks if the codon is the same in both organisms.

def replaceCodon(codon)

Replaces the codon with the optimal codon choice. Iterates through every possible choice.

def altSeqMaker(self, input_dna, stealthHits, start=None, altList=None)

Alters a given subsequence and confirms the absence of Stealth hits in it. Operates the 5-codon window.

CodonOp

The main functions that optimize inserts based on host and target codon biases, together with the classes they rely on.

def __init__(self, insert_In, stealth_In, host_In, target_In, outFile)

Initializes the class by saving the insert, Stealth, host, and target files, and the output file.

def find_tempStart(start,pos)

Sets a temporary 'start' position for a chosen subsequence. Part of preparing the 5-codon window.

def find_tempEnd(length,end,pos)

Sets a temporary 'end' position for a chosen subsequence. Part of preparing the 5-codon window.

def what_changes(seq, hits)

Builds and returns the subsequences or windows to be optimized.

def replaced(seq, replaced)

Formats the final sequence with all the altered sequences from CodonChoice.altSeqMaker().

def find_stealth_hits(self,iSeq)

Finds the Stealth hits in the insert sequence and returns them as a list.
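Locating hits amounts to searching the insert for every K-mer, including overlapping occurrences. The sketch below is one simple way to do that and is not taken from find_stealth_hits; the function name find_hits and the example sequence are assumptions, and only the given strand is searched.

```python
def find_hits(seq, kmers):
    """Return sorted (position, kmer) pairs for every occurrence of each K-mer."""
    hits = []
    for kmer in set(kmers):
        pos = seq.find(kmer)
        while pos != -1:
            hits.append((pos, kmer))
            pos = seq.find(kmer, pos + 1)  # advance by one to catch overlapping hits
    return sorted(hits)


# Made-up insert containing GGCC twice and CGCG not at all.
hits = find_hits("ATGGCCGGCCTAA", ["GGCC", "CGCG"])
print(hits)
```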

def run_all(self)

Runs the entire optimization process.

def main()

Entry point that runs the command-line interface and executes the program.

CodonUsage

Class that builds the codon usage tables used throughout the program.

def __init__(self, genome)

Initializes the class by saving the genome file.

def find_start(codon)

Finds the start codon in the sequence.

def find_end(genome, length, start_pos)

Finds the stop codon that ends an ORF in the sequence.

def find_orf(self, genome)

Finds the Open Reading Frames in the genome.

def tally(self, all_orfs)

Counts the codons in each ORF.

def percentage(self, total_codons)

Calculates the percentage usage of each codon.

def returnDict(self)

Creates and formats the dictionary.
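Taken together, the methods above find ORFs, tally their codons, and convert the counts to percentages. The sketch below shows that flow in a simplified form and is not the CodonUsage source: the names find_orfs and codon_usage are hypothetical, only the forward strand is scanned, and an ORF is assumed to run from ATG to the first in-frame stop codon.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(genome, min_codons=2):
    """Yield forward-strand ORFs (ATG through the first in-frame stop) in all three frames."""
    genome = genome.upper()
    for frame in range(3):
        start = None
        for pos in range(frame, len(genome) - 2, 3):
            codon = genome[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_codons:
                    yield genome[start:pos + 3]
                start = None


def codon_usage(orfs):
    """Tally codons across ORFs and return each codon's share as a percentage."""
    counts = Counter()
    for orf in orfs:
        counts.update(orf[i:i + 3] for i in range(0, len(orf), 3))
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()} if total else {}


# Tiny made-up genome with a single ORF: ATG AAA GGC TAA.
orfs = list(find_orfs("CCATGAAAGGCTAACC"))
usage = codon_usage(orfs)
print(orfs, usage)
```

A fuller ORF finder would also scan the reverse complement and apply a realistic minimum length, which the min_codons parameter only hints at.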

References

[1] S. Hu, "Altering under-represented DNA sequences elevates bacterial transformation efficiency," mBio, Oct. 31, 2023. https://doi.org/10.1128/mbio.02105-23 (accessed Sep. 24, 2024).

[2] V. Nandakumar and A. Mahesh, "BLACKBIRD," GitLab, iGEM UCSC, 2024.

[3] G. Zhang, "Transient ribosomal attenuation coordinates protein synthesis and co-translational folding," Nat Struct Mol Biol, Jul. 13, 2008. https://www.nature.com/articles/nsmb.1554 (accessed Sep. 26, 2024).

[4] G. L. Rosano, "Rare codon content affects the solubility of recombinant proteins in a codon bias-adjusted Escherichia coli strain," Microb Cell Fact, Jul. 24, 2009. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2723077/ (accessed Sep. 26, 2024).