At NTU Dry Lab, we envisioned a project to alter the wild-type T7 RNA polymerase (RNAP) sequence and design alternate forms of the protein.
We recognized early on that our goal was ambitious and far too large to accomplish in a single monolithic program. To have any hope of success, we had to break the project down into simpler parts. So, we structured our approach into three key components:
We evaluated this model's performance using HuberLoss, reaching a minimum loss of 0.1542. While this value is respectable for an alpha-stage program, it falls short of our target of 0.1000. To extend our assessment, we adapted our code to generate alternate RNAP sequences, enabling comparative analyses against the ancestral data.
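For readers unfamiliar with the metric, the following is a minimal sketch of how a mean Huber loss is computed. It is not our training code: the function name `huber_loss` is hypothetical, and the threshold `delta=1.0` is an assumption (it matches the default of PyTorch's `torch.nn.HuberLoss`, but the value used in our runs is not stated here).

```python
def huber_loss(predictions, targets, delta=1.0):
    """Mean Huber loss over paired predictions and targets.

    Errors up to `delta` are penalised quadratically, larger errors
    linearly, which makes the loss less sensitive to outliers than MSE.
    `delta=1.0` is an assumed value, not one taken from the project.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    total = 0.0
    for predicted, target in zip(predictions, targets):
        error = abs(predicted - target)
        if error <= delta:
            total += 0.5 * error * error             # quadratic region
        else:
            total += delta * (error - 0.5 * delta)   # linear region
    return total / len(predictions)


if __name__ == "__main__":
    # One small error (0.5) and one large error (3.0).
    print(huber_loss([0.0, 0.0], [0.5, 3.0]))  # prints 1.3125
```

Because the large error contributes linearly rather than quadratically, a single badly predicted sample does not dominate the average.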
1. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIA SGKTTWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV EEQLRLIKEHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELA PEYAEAIATRAESLLDISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAW KINKKVLAVANVITKWKHSSFKAIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANH KAIWFPYNMDWRGRVYAVSMFNPDAPKTTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC AKSPLENTWWAEQDSPFCFLAFCFEYAGVQSFVKSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGI VAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKKLLVKLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWARRRGKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQIIEKSRKTVVWAHEKYGIESFALIHDSFGTIPA DAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDSEAVE
2. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIE AGKTPWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV EEQLAKLEKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL APEYAEAIATRAADVLAISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTA WKINKKVLAVANVITKWKHKEGLSIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFAN HKAIWFPYNMDWRGRVYAVSMFNPSELKETKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC AKSPLENTWWAEQDSPFCFLAFCFEYAGVQLVKKGYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGI VAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKEVAVDLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWRTAAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQSSSGSRKTVVWAHEKYGIESFALIHDSFGTIP ADAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDSGDLY
3. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIR RLEDGWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV EEQLKKRLKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL APEYAEAIATRAASLVRISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTA WKINKKVLAVANVITKWKHGEKKTIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFAN HKAIWFPYNMDWRGRVYAVSMFNPDSPATTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMAC AKSPLENTWWAEQDSPFCFLAFCFEYAGVQFGASGYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYG IVAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKELIDKLAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTI QPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWSSLRAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPV WQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQLVALARKTVVWAHEKYGIESFALIHDSFGTIP ADAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDVSEVV
4. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMID AGIVEWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVE EQLKEAQKHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELA PEYAEAIATRALLLRAISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWK INKKVLAVANVITKWKHGLLKGIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHK AIWFPYNMDWRGRVYAVSMFNPRTREFTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAK SPLENTWWAEQDSPFCFLAFCFEYAGVQIPIPKYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAK KVNEILQADAINGTDNEVVTVTDENTGEISEKVKEAKDALAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPA IDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWRKSKAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQE YKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQAAALARKTVVWAHEKYGIESFALIHDSFGTIPADAA NLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDGADGL
5. MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIK ELSKKWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNV EEQLRALGAHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIEL APEYAEAIATRAAAIVRISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAW KINKKVLAVANVITKWKHLSPLIIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHK AIWFPYNMDWRGRVYAVSMFNPSPELRTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAK SPLENTWWAEQDSPFCFLAFCFEYAGVQGALKAYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVA KKVNEILQADAINGTDNEVVTVTDENTGEISEKVKDADIILAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPA IDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWLSLRSKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQE YKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQSLSSARKTVVWAHEKYGIESFALIHDSFGTIPADAA NLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDDELAA
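One simple way to compare generated sequences against a reference is to list the positions where the residues differ. The sketch below illustrates that idea only. The helper names `substitutions` and `percent_identity` are hypothetical, the example strings are short toy fragments rather than full RNAP sequences, and it assumes both sequences have the same length once whitespace is removed, so no alignment is performed.

```python
def _clean(sequence):
    """Remove spaces and line breaks so wrapped sequences can be pasted in."""
    return "".join(sequence.split())


def substitutions(reference, variant):
    """Return (position, reference_residue, variant_residue) per mismatch.

    Positions are 1-based. Assumes equal-length sequences (substitutions
    only); insertions or deletions would require a proper alignment.
    """
    ref, var = _clean(reference), _clean(variant)
    if len(ref) != len(var):
        raise ValueError("sequences must have equal length after cleaning")
    return [
        (position, r, v)
        for position, (r, v) in enumerate(zip(ref, var), start=1)
        if r != v
    ]


def percent_identity(reference, variant):
    """Percentage of positions at which the two sequences agree."""
    ref = _clean(reference)
    if not ref:
        raise ValueError("reference sequence is empty")
    mismatches = len(substitutions(reference, variant))
    return 100.0 * (len(ref) - mismatches) / len(ref)


if __name__ == "__main__":
    toy_reference = "MNTINIAKNDF"   # toy fragment, for illustration only
    toy_variant = "MNTIN IAKQDF"    # same fragment with a space and one change
    print(substitutions(toy_reference, toy_variant))  # [(9, 'N', 'Q')]
```

Applied pairwise to full-length sequences, the same function would report which regions a generator has rewritten and which it has left untouched.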
With these alternate sequences in hand, we could move on to the final component of our plan. The program we built takes the generated sequences and predicts the corresponding protein structures. It can also run a docking step and a molecular dynamics (MD) simulation that models a 5 ns dynamic interaction between the protein and the targeted DNA sequence in a 1-nm water cube.
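To give a sense of scale for the MD run, the sketch below converts a simulation length into a number of integration steps. The function `md_step_count` is hypothetical, and the 2 fs timestep is an assumption (a common choice when bonds to hydrogen are constrained); the timestep actually used in our simulations is not stated here.

```python
def md_step_count(duration_ns, timestep_fs=2.0):
    """Number of integration steps needed to cover `duration_ns`.

    `timestep_fs=2.0` is an assumed value, not taken from the project.
    """
    if duration_ns <= 0 or timestep_fs <= 0:
        raise ValueError("duration and timestep must be positive")
    duration_fs = duration_ns * 1_000_000  # 1 ns = 1,000,000 fs
    return round(duration_fs / timestep_fs)


if __name__ == "__main__":
    print(md_step_count(5))  # prints 2500000
```

Under that assumption, a 5 ns trajectory corresponds to 2.5 million steps, and halving the timestep to 1 fs doubles the count.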