At NTU Dry Lab, we envisioned a project to alter the wild-type T7 RNA Polymerase (RNAP) sequence and design alternative forms of the protein.
We recognized early on that our goal was bold and ambitious, and far too large to accomplish in a single program. To have any hope of success, we had to break the project down into simpler parts, so we structured our approach in three key components:
We rigorously evaluated this model's performance using HuberLoss, reaching a minimum loss of 0.1542. Although this is a strong result for an alpha version, it falls short of our target of 0.1000. To extend our assessment, we adapted our code to generate alternative RNAP sequences, enabling comparative analyses against the ancestral data.
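For readers unfamiliar with the metric, the sketch below shows how a Huber loss is computed. It is quadratic for small residuals and linear for large ones, so outliers weigh less than they would under mean squared error. We assume here the PyTorch-style definition used by `torch.nn.HuberLoss` (threshold `delta = 1.0`, mean over samples); the source does not state which `delta` or reduction was actually used, and `huber_loss` is a hypothetical helper, not our project code.

```python
def huber_loss(predictions, targets, delta=1.0):
    """Mean Huber loss, assuming the PyTorch-style definition (delta=1.0, mean reduction)."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(predictions, targets):
        r = abs(p - t)
        if r <= delta:
            # Quadratic region: behaves like (half) squared error for small residuals.
            total += 0.5 * r * r
        else:
            # Linear region: large residuals (outliers) grow only linearly.
            total += delta * (r - 0.5 * delta)
    return total / len(predictions)


# Usage: one small residual (0.5) and one large residual (3.0).
print(huber_loss([0.5, 3.0], [0.0, 0.0]))
```

With these inputs the small residual contributes 0.125 and the large one 2.5, giving a mean of 1.3125; under squared error the large residual alone would contribute 9.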
1. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIA
SGKTTWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV
EEQLRLIKEHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELA
PEYAEAIATRAESLLDISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAW
KINKKVLAVANVITKWKHSSFKAIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANH
KAIWFPYNMDWRGRVYAVSMFNPDAPKTTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC
AKSPLENTWWAEQDSPFCFLAFCFEYAGVQSFVKSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGI
VAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKKLLVKLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI
QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWARRRGKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV
WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQIIEKSRKTVVWAHEKYGIESFALIHDSFGTIPA
DAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDSEAVE
2. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIE
AGKTPWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV
EEQLAKLEKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL
APEYAEAIATRAADVLAISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTA
WKINKKVLAVANVITKWKHKEGLSIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFAN
HKAIWFPYNMDWRGRVYAVSMFNPSELKETKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC
AKSPLENTWWAEQDSPFCFLAFCFEYAGVQLVKKGYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGI
VAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKEVAVDLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI
QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWRTAAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV
WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQSSSGSRKTVVWAHEKYGIESFALIHDSFGTIP
ADAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDSGDLY
3. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIR
RLEDGWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV
EEQLKKRLKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL
APEYAEAIATRAASLVRISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTA
WKINKKVLAVANVITKWKHGEKKTIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFAN
HKAIWFPYNMDWRGRVYAVSMFNPDSPATTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC
AKSPLENTWWAEQDSPFCFLAFCFEYAGVQFGASGYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYG
IVAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKELIDKLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI
QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWSSLRAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV
WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQLVALARKTVVWAHEKYGIESFALIHDSFGTIP
ADAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDVSEVV
4. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMID
AGIVEWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVE
EQLKEAQKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELA
PEYAEAIATRALLLRAISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWK
INKKVLAVANVITKWKHGLLKGIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHK
AIWFPYNMDWRGRVYAVSMFNPRTREFTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAK
SPLENTWWAEQDSPFCFLAFCFEYAGVQIPIPKYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAK
KVNEILQADAINGTDNEVVTVTDENTGEISEKVKEAKDALAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPA
IDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWRKSKAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQE
YKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQAAALARKTVVWAHEKYGIESFALIHDSFGTIPADAA
NLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDGADGL
5. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIK
ELSKKWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV
EEQLRALGAHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL
APEYAEAIATRAAAIVRISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAW
KINKKVLAVANVITKWKHLSPLIIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHK
AIWFPYNMDWRGRVYAVSMFNPSPELRTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAK
SPLENTWWAEQDSPFCFLAFCFEYAGVQGALKAYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVA
KKVNEILQADAINGTDNEVVTVTDENTGEISEKVKDADIILAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPA
IDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWLSLRSKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQE
YKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQSLSSARKTVVWAHEKYGIESFALIHDSFGTIPADAA
NLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDDELAA
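Because the sequences above are wrapped across lines, comparing them first requires joining each numbered entry back into one string. The sketch below does that and then lists position-by-position substitutions and percent identity between two variants. It assumes, as the wrapped fragments suggest, that the variants being compared have the same length. For sequences with insertions or deletions, a real pairwise aligner (for example Biopython's `PairwiseAligner`) would be needed instead. The helper names are hypothetical, and the example uses short fragments from sequences 1 and 2, so the positions are relative to the fragment rather than to full-length T7 RNAP.

```python
import re


def parse_numbered_sequences(text):
    """Join a numbered list of wrapped sequences ("1. MNT..." plus continuation lines)."""
    seqs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        start = re.fullmatch(r"(\d+)\.\s*([A-Z]+)", line)
        if start:
            current = int(start.group(1))
            seqs[current] = start.group(2)
        elif current is not None and re.fullmatch(r"[A-Z]+", line):
            seqs[current] += line  # continuation of the current entry
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return seqs


def mutations(ref, variant):
    """Substitutions in standard notation (ref residue, 1-based position, new residue)."""
    if len(ref) != len(variant):
        raise ValueError("sequences differ in length; use a proper alignment instead")
    return [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(ref, variant), start=1) if a != b]


def percent_identity(ref, variant):
    """Share of identical positions, assuming equal-length, already-aligned sequences."""
    if len(ref) != len(variant) or not ref:
        raise ValueError("sequences must be non-empty and equal in length")
    same = sum(a == b for a, b in zip(ref, variant))
    return 100.0 * same / len(ref)


# Usage with fragments taken from the ends of entries 1 and 2 above.
text = """1. TTLLPKMIA
SGKTTWFEEV
2. TTLLPKMIE
AGKTPWFEEV"""
seqs = parse_numbered_sequences(text)
print(mutations(seqs[1], seqs[2]))
print(round(percent_identity(seqs[1], seqs[2]), 2))
```

For these fragments the substitutions are A9E, S10A and T14P, giving about 84.21% identity.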
With these alternative sequences in hand, we could activate the final component of our plan: a program that takes the generated sequences and predicts their protein structures. The same code can also run a docking process and a molecular dynamics (MD) simulation, modelling 5 ns of dynamic interaction between the protein and the targeted DNA sequence in a 1 nm water cube.
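The arithmetic behind an MD run of this kind can be sketched as follows. The source does not name the MD engine, the integration timestep, or exactly how the "1 nm water cube" is defined. We therefore assume a 2 fs timestep, which is common when bonds to hydrogen are constrained. We also read the water cube as a cubic box with at least 1 nm of water between the solute and each face, similar in spirit to GROMACS `gmx editconf -bt cubic -d 1.0`. Both are assumptions, and `md_steps` and `cubic_box_edge` are hypothetical helpers.

```python
def md_steps(sim_time_ns, timestep_fs=2.0):
    """Integration steps needed to cover sim_time_ns (assumed 2 fs timestep by default)."""
    total_fs = sim_time_ns * 1_000_000  # 1 ns = 10^6 fs
    return round(total_fs / timestep_fs)


def cubic_box_edge(coords, padding_nm=1.0):
    """Edge length (nm) of a cubic box leaving padding_nm of solvent on every side.

    Simple approximation: largest x/y/z extent of the solute plus padding on both sides.
    """
    pts = list(coords)
    if not pts:
        raise ValueError("need at least one coordinate")
    extents = [max(p[axis] for p in pts) - min(p[axis] for p in pts) for axis in range(3)]
    return max(extents) + 2 * padding_nm


# Usage: a 5 ns run, and a toy "solute" spanning 3 x 1 x 2 nm.
print(md_steps(5))
print(cubic_box_edge([(0.0, 0.0, 0.0), (3.0, 1.0, 2.0)]))
```

Under these assumptions a 5 ns simulation needs 2,500,000 steps. Halving the timestep to 1 fs would double that, which is why the timestep choice dominates the cost of the run.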