MAGA: Make Aptamer Generally Applied

Accurate and timely detection of disease biomarkers is crucial for effective diagnosis and treatment across many medical conditions. Traditional screening methods such as SELEX (Systematic Evolution of Ligands by EXponential enrichment), which select Aptamers that bind to specific biomarkers, are often slow and costly, and they lack the precision needed for diverse applications. To overcome these limitations, we introduce MAGA (Make Aptamer Generally Applied), a universal machine learning-based platform designed to predict Aptamer sequences that target a wide range of disease biomarkers with high specificity and affinity.

Methodology:

  1. Pretrained Feature Extractor:
    • We pretrained protein and nucleic acid feature extractors (encoders) using extensive datasets from publicly available sources such as the Protein Data Bank (PDB) and GenBank. These encoders are designed to capture the complex structural and sequence-based features of proteins and nucleic acids, forming the foundation of our predictive modeling.
  2. Affinity Prediction Model:
    • The second component of MAGA is a predictive model trained to estimate the binding affinity between a protein and a candidate Aptamer. Using the features extracted by the pretrained encoders, the model predicts the strength and specificity of each Aptamer-protein interaction, indicating how suitable a candidate Aptamer is as a probe for the target biomarker.
  3. Monte Carlo Search Optimization:
    • To identify the optimal Aptamer sequences for a specific protein, we employed a Monte Carlo search strategy. This approach samples the sequence space stochastically, guided by the predicted affinities, to find the most suitable Aptamers. As a result, the sequences we report are not merely scored by the model but optimized against it, which makes them better suited to practical use in diagnostic applications.
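
The search step above can be illustrated with a minimal sketch. The source does not state which Monte Carlo variant MAGA uses, so this example assumes a Metropolis-style random walk over point mutations. The scoring function `toy_affinity` is a hypothetical placeholder for the trained affinity model, and the function names, sequence length, step count, and temperature are all illustrative assumptions rather than MAGA's actual settings.

```python
import math
import random

ALPHABET = "ACGT"  # DNA alphabet; an RNA Aptamer search would use "ACGU"


def toy_affinity(seq: str) -> float:
    """Hypothetical stand-in for the affinity model.

    Returns the fraction of adjacent 'GG' pairs, a value in [0, 1].
    It has no biological meaning; it only gives the search a signal.
    Requires len(seq) >= 2.
    """
    pairs = len(seq) - 1
    return sum(seq[i:i + 2] == "GG" for i in range(pairs)) / pairs


def mutate(seq: str, rng: random.Random) -> str:
    """Return a copy of seq with exactly one base changed."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def monte_carlo_search(score_fn, length=15, steps=2000,
                       temperature=0.05, seed=0):
    """Metropolis-style search for a high-scoring sequence.

    Improvements are always accepted; a worse candidate is accepted
    with probability exp(delta / temperature), which lets the walk
    escape local optima. The best sequence seen is tracked separately.
    """
    rng = random.Random(seed)
    current = "".join(rng.choice(ALPHABET) for _ in range(length))
    current_score = score_fn(current)
    best, best_score = current, current_score

    for _ in range(steps):
        candidate = mutate(current, rng)
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score

    return best, best_score


if __name__ == "__main__":
    best_seq, best_score = monte_carlo_search(toy_affinity)
    print(best_seq, round(best_score, 3))
```

In a real pipeline, `score_fn` would wrap the affinity prediction model applied to the encoded target protein and the candidate sequence. The temperature controls the trade-off between exploring new sequences and exploiting the current one.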

Results:

The MAGA platform’s capabilities were tested using AlphaFold3, a state-of-the-art tool for predicting the structures of proteins and their complexes with other biomolecules, including nucleic acids. By combining our machine learning-driven Aptamer prediction model with AlphaFold3, we were able to simulate the binding interactions between the predicted Aptamer sequences and their target proteins. The results of these predictions were on par with those obtained from controlled laboratory experiments, supporting the accuracy and reliability of our system.

As a specific example, we evaluated our approach on thrombin, a critical enzyme in the blood coagulation process. Thrombin was chosen for its well-characterized binding interactions and its clinical relevance. Our predictive model identified Aptamer sequences with high predicted binding affinity to thrombin, and the structural predictions made using AlphaFold3 were in excellent agreement with experimental data. This agreement between predicted and observed results demonstrates the potential of the MAGA platform to deliver predictions whose accuracy is comparable to, if not better than, that of traditional laboratory-based methods.

We also used an Electrophoretic Mobility Shift Assay (EMSA) to validate that the predicted Aptamer does bind to the target protein. EMSA is a classic method for studying protein-nucleic acid interactions. A protein-bound Aptamer migrates more slowly through the gel than a free one, so comparing electrophoretic mobility before and after the Aptamer is incubated with the target protein lets us detect formation of the complex directly.

The successful application of the MAGA system to thrombin not only validates our approach but also sets the stage for its expansion to other disease biomarkers. This positions MAGA as a powerful, universal tool for biomarker detection, offering a blend of computational efficiency and experimental precision that could revolutionize the field of diagnostic medicine.

Conclusion:

MAGA represents a novel approach to Aptamer-based biomarker detection, combining advanced machine learning techniques with rigorous optimization strategies. This universal system has the potential to revolutionize how we detect and diagnose a wide range of diseases, offering a scalable, cost-effective, and highly accurate alternative to traditional methods. Future developments will focus on expanding MAGA’s capabilities to cover even more biomarkers and refining its predictive accuracy.

Acknowledgments:

We would like to thank Peking University for its support, as well as our collaborators at the Institute for Artificial-Intelligence. Special thanks to our mentor, Long Qian, for guidance and expertise.

Contact Information:

Email: juntingzhou@stu.pku.edu.cn

Web: zjtpku.github.io