Bioinformatics

Bioinformatics analysis

To identify new antimicrobial peptides (AMPs) encoded by our thermophilic bacterium (Brevibacillus borstelensis AK1, APBN01000001.1), we aligned the genes encoding proteins 30-75 amino acids in length against all antimicrobial peptides available in the Antimicrobial Peptide Database (APD; https://aps.unmc.edu/).
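The length filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline we ran: the function names (`read_fasta`, `filter_by_length`) and the two `toy_` records are hypothetical, and only the 30-75 residue window comes from the text.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_len: int = 30, max_len: int = 75
) -> Dict[str, str]:
    """Keep only the proteins whose length lies within [min_len, max_len]."""
    return {name: seq for name, seq in records if min_len <= len(seq) <= max_len}


# One peptide from this page plus two made-up records (hypothetical data).
example_fasta = "\n".join([
    ">ORF278",
    "MGRLPGVERISFYFIGVMQCQLPRVGNGKQSRFVVNFA",
    ">toy_short",
    "MKTAYIAK",
    ">toy_long",
    "M" + "A" * 80,
])

kept = filter_by_length(read_fasta(example_fasta))
print(sorted(kept))
```

Only the records that pass the filter would then be submitted to the alignment step.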

APD contains 3940 peptides, including 3146 natural AMPs from six life kingdoms (383 bacteriocins/peptide antibiotics from bacteria, 5 from archaea, 8 from protists, 29 from fungi, 250 from plants, and 2463 from animals), 190 predicted and 314 synthetic AMPs. Genome analysis using NCBI BLAST revealed that the genomic DNA of our thermophilic bacterium contains potential matches to sequences in APD. Six of our top hits were selected for further analysis.

1. lcl|ORF278_APBN01000001.1:15207:15323
2. lcl|ORF316_APBN01000001.1:29763:29900
3. lcl|ORF330_APBN01000001.1:37251:37376
4. lcl|ORF393_APBN01000001.1:28499:28422
5. lcl|ORF396_APBN01000001.1:27719:27651
6. lcl|ORF587_APBN01000001.1:41121:41053
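The identifiers above pack the ORF name, the contig accession and two coordinates into one string. The sketch below shows one way to unpack them. It rests on assumptions that are not stated on this page: that the coordinates are 1-based and inclusive, that the span includes the stop codon, and that a start greater than the end marks an ORF on the reverse strand. The names `OrfLocation` and `parse_orf_header` are hypothetical.

```python
import re
from typing import NamedTuple


class OrfLocation(NamedTuple):
    orf_id: str
    contig: str
    start: int
    end: int
    strand: str
    nt_length: int
    aa_length: int


# Matches identifiers of the form lcl|ORF<number>_<contig>:<start>:<end>
HEADER = re.compile(r"^lcl\|(ORF\d+)_(.+):(\d+):(\d+)$")


def parse_orf_header(header: str) -> OrfLocation:
    """Split an ORF identifier into its parts and derive strand and lengths."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised ORF identifier: {header!r}")
    orf_id, contig = match.group(1), match.group(2)
    start, end = int(match.group(3)), int(match.group(4))
    # Assumption: start > end means the ORF lies on the reverse strand.
    strand = "+" if start <= end else "-"
    # Assumption: coordinates are 1-based and inclusive.
    nt_length = abs(end - start) + 1
    # Assumption: the span includes the stop codon, which encodes no residue.
    aa_length = nt_length // 3 - 1
    return OrfLocation(orf_id, contig, start, end, strand, nt_length, aa_length)


for header in (
    "lcl|ORF278_APBN01000001.1:15207:15323",
    "lcl|ORF393_APBN01000001.1:28499:28422",
):
    print(parse_orf_header(header))
```

Under these assumptions ORF278 spans 117 nucleotides (38 residues) on the forward strand and ORF393 spans 78 nucleotides (25 residues) on the reverse strand.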

Because bacterial AMP genes typically occur in clusters, i.e. they are produced together with other proteins needed for their transport, modification, synthesis, secretion and so on, we used servers such as InterPro (https://www.ebi.ac.uk/interpro/), SignalP-5.0 (https://services.healthtech.dtu.dk/services/SignalP-5.0/), UniProt (https://www.uniprot.org/align) and antiSMASH to look for these genes in both our predicted sequences and the genome sequence of our thermophile.

AMP1

DNA Sequence

ATGGGGCGCTTACCTGGTGTGGAGAGAATTAGTTTCTACTTCATAGGCGTAATGCAATGTCAGCTTCCACGCGTCGGGAACGGTAAACAGTCGCGATTCGTGGTTAACTTTGCCTGA

Protein sequence

MGRLPGVERISFYFIGVMQCQLPRVGNGKQSRFVVNFA

AMP2

DNA Sequence

ATGTTTCTCGCCTTGATCAAGCTTTCCGTCCCGCCGACATTCAGCGCCATGCGCTTGGAATATTCCAAAATCGCGTCAATCAGCTCCTGGAAATCGCGGCATACAAAAAGCTGCGGCTGCGGTTTTGTGATGTCATAA

Protein sequence

MFLALIKLSVPPTFSAMRLEYSKIASISSWKSRHTKSCGCGFVMS

AMP3

DNA Sequence

ATGAGAATCATATCGTTCCTTCAACGCAAATCAGATTGGAATTGCATACGTTTTGACTGGATAAGCATAAACCTCTAA

Protein sequence

MEGILSEITSTTAPGAIMPGAFLVSLCEKILSFLVGLSNIM

AMP4

DNA Sequence

ATGTTCTTAGCGTTAATAAAATTGAGTGTGCCACCAACCTTTAGTGCGATGCGTCTGGAGTACTCGAAGATTGCCTCTATATCCTCTTGGAAGTCACGTCACACTAAGTCATGTGGTTGTGGATTCGTCATGTCGTGA

Protein sequence

MRIISFLQRKSDWNCIRFDWISINL

AMP5

DNA Sequence

ATGGAAGGCATCCTGAGTGAGATCACGAGTACAACGGCTCCAGGCGCCATTATGCCAGGAGCATTCCTTGTGTCACTGTGCGAAAAGATCTTGTCCTTCCTGGTGGGGCTTTCAAACATTATGTAA

Protein sequence

MRFSASFVCSSQSPTDGGFDWS

AMP6

DNA Sequence

ATGAATAACACAGAACGTATTAATTGTGTTTGGATGCCGTGTAACCCAGGGATACAACGTACCTCGTAA

Protein sequence

MNNTERINCVWMPCNPGIQRTS

Note: the nucleotide sequences provided here have been optimized for expression in E. coli.
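Because the listed DNA has been optimized, it is worth confirming that each nucleotide sequence still encodes the peptide given with it. The sketch below translates a coding sequence with the standard genetic code and compares the result with the listed peptide, using the AMP1 pair from this page. The helper names `translate` and `encodes` are hypothetical, and this is an illustrative check rather than part of our documented workflow.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first codon up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"unexpected codon {codon!r} at position {i}")
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


def encodes(dna: str, peptide: str) -> bool:
    """Return True when the DNA translates exactly to the given peptide."""
    return translate(dna) == peptide.upper()


# AMP1 DNA and peptide, copied from this page.
amp1_dna = (
    "ATGGGGCGCTTACCTGGTGTGGAGAGAATTAGTTTCTACTTCATAGGCGTAATGCAATGT"
    "CAGCTTCCACGCGTCGGGAACGGTAAACAGTCGCGATTCGTGGTTAACTTTGCCTGA"
)
amp1_peptide = "MGRLPGVERISFYFIGVMQCQLPRVGNGKQSRFVVNFA"

print(encodes(amp1_dna, amp1_peptide))
```

Running the same comparison on every DNA and peptide pair before ordering synthesis would flag any sequence that was pasted under the wrong heading.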

The SignalP and InterPro servers indicated that our predicted AMP1 sequence carries N-terminal signal peptides (Sec/SPI, Tat/SPI and Sec/SPII types), which direct proteins across the bacterial membrane. Analysis of the sequences on UniProt suggested that our predicted peptides could be involved in essential cellular functions, notably transport and signaling.

Results obtained from antiSMASH showed the presence of the AfsR gene, a pleiotropic regulatory gene that controls the production of antimicrobial compounds.

AfsR gene: GCGTTCATTTAT

To validate these sequences experimentally, we cloned them into an expression vector to express them in Escherichia coli. Please refer to our engineering section for more details.