Investigating thermostability of gshF enzyme using computational methods
1. Calculate B-Factors
The calculate_B_factors.py script is a Python script that computes the average B-factor of each protein loop from structural data in Protein Data Bank (PDB) files. B-factors (also called temperature factors) quantify the displacement of each atom around its modeled position and are commonly used as an indicator of local flexibility. By focusing on specific loops, researchers can see which regions of the protein are flexible and which are rigid, information that is relevant to both functional and stability studies.
This tool automates the process of parsing PDB files, extracting residue-level B-factors, and calculating the average B-factor for protein loops defined in a CSV file. It then outputs the average B-factors for each loop in another CSV file, which can be used for further analysis, visualization, or integration with other structural bioinformatics tools.
Features
- PDB File Parsing: Reads standard PDB files and extracts the atom records of the protein structure.
- Residue-Level Analysis: Extracts the B-factor of each residue from its atom records.
- Loop Definition from CSV: Reads the start and end residues of each loop from a CSV file, so users can choose which loops to analyze.
- Loop B-Factor Calculation: Averages the residue B-factors over each loop's residue range.
- CSV Output: Saves the computed loop B-factors into a well-structured CSV file, suitable for further analysis, statistical evaluation, or graphical representation.
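The feature list above can be sketched as follows. This is a minimal illustration, not the repository's actual code. It rests on several assumptions. The loop CSV is assumed to have hypothetical columns chain, start and end. A residue's B-factor is taken as the mean over all of its atoms (rather than, say, Cα only). The standard fixed PDB columns are read: chain ID in column 22, residue number in columns 23-26, and B-factor in columns 61-66.

```python
import csv
from collections import defaultdict
from statistics import mean


def residue_b_factors(pdb_lines):
    """Average B-factor per residue, keyed by (chain ID, residue number)."""
    per_residue = defaultdict(list)
    for line in pdb_lines:
        # Protein residues only; HETATM records (ligands, waters) are skipped in this sketch.
        if line.startswith("ATOM"):
            chain = line[21]                 # column 22
            resnum = int(line[22:26])        # columns 23-26
            per_residue[(chain, resnum)].append(float(line[60:66]))  # columns 61-66
    return {key: mean(values) for key, values in per_residue.items()}


def loop_b_factors(res_b, loop_rows):
    """Average residue B-factors over each loop; loop_rows are dicts with hypothetical chain/start/end keys."""
    results = []
    for row in loop_rows:
        chain, start, end = row["chain"], int(row["start"]), int(row["end"])
        values = [res_b[(chain, r)] for r in range(start, end + 1) if (chain, r) in res_b]
        results.append({"chain": chain, "start": start, "end": end,
                        "avg_b_factor": mean(values) if values else float("nan")})
    return results


def run(pdb_path, loops_csv, out_csv):
    """Read the PDB file and the loop CSV, then write one row per loop to out_csv."""
    with open(pdb_path) as fh:
        res_b = residue_b_factors(fh)
    with open(loops_csv, newline="") as fh:
        results = loop_b_factors(res_b, csv.DictReader(fh))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["chain", "start", "end", "avg_b_factor"])
        writer.writeheader()
        writer.writerows(results)
```

For example, run("gshF.pdb", "loops.csv", "loop_b_factors.csv") uses hypothetical file names. Averaging per residue first gives every residue equal weight in the loop average, regardless of how many atoms its side chain has.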
Dependencies
To run this code, you will need:
- Python 3.x
- The PDB file and the loop-definition CSV file, placed in the same folder as the script.
How to Run
- Clone the repository or download the code.
- Run calculate_B_factors.py, for example from an IDE.
2. Calculate Loop Depth
The calculate_loopdepth.py script is a Python script that estimates the depth of protein loops relative to the protein's overall center of mass, using structural data from Protein Data Bank (PDB) files. Loop depth can offer insight into a loop's potential role in protein dynamics, function, and interactions with other molecules. By locating each loop relative to the center of mass, the script provides useful information for structural biologists and bioinformaticians.
This tool automates the process of parsing PDB files to extract atomic coordinates, computing the protein's center of mass, and calculating the depth of specified loops. The results are output into a CSV file for easy interpretation and further analysis.
Features
- PDB File Parsing: Extracts atomic coordinates from standard PDB files, handling both ATOM and HETATM records.
- Center of Mass Calculation: Computes the center of mass of the protein using the atomic coordinates.
- Loop Definition from CSV: Reads loop information (start and end residues) from a CSV file, allowing users to define which loops to analyze.
- Loop Depth Calculation: Estimates the depth of each loop by calculating the Euclidean distance between the loop's center and the overall center of mass.
- CSV Output: Saves the computed loop depths into a CSV file for further analysis, plotting, or integration with other tools.
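The following is a minimal NumPy sketch of the center-of-mass and depth calculation. It illustrates the idea under stated assumptions and is not the script itself. The center of mass is mass-weighted using a small hypothetical table of atomic masses, and unlisted elements count as carbon. A loop's center is the unweighted mean of its atom coordinates. Element symbols are read from columns 77-78, falling back to the atom name.

```python
import numpy as np

# Approximate atomic masses (u); elements not listed fall back to carbon (a simplifying assumption).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_atoms(pdb_lines):
    """Return (chain, residue number, element, xyz) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        xyz = np.array([float(line[30:38]), float(line[38:46]), float(line[46:54])])
        # Element symbol from columns 77-78; fall back to the first letter of the atom name.
        element = line[76:78].strip() or line[12:16].strip().lstrip("0123456789")[:1]
        atoms.append((line[21], int(line[22:26]), element.upper(), xyz))
    return atoms


def center_of_mass(atoms):
    """Mass-weighted mean of all atomic coordinates."""
    coords = np.array([a[3] for a in atoms])
    masses = np.array([ATOMIC_MASS.get(a[2], ATOMIC_MASS["C"]) for a in atoms])
    return np.average(coords, axis=0, weights=masses)


def loop_depth(atoms, chain, start, end, com):
    """Euclidean distance from the loop's geometric center to the protein's center of mass."""
    loop_xyz = [a[3] for a in atoms if a[0] == chain and start <= a[1] <= end]
    if not loop_xyz:
        return float("nan")
    return float(np.linalg.norm(np.mean(loop_xyz, axis=0) - com))
```

The depth comes out in the same units as the PDB coordinates (Å). Under this simple measure, a smaller value means the loop's center lies closer to the protein's center of mass.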
Dependencies
To run this code, you will need:
- Python 3.x
- NumPy library for numerical operations
How to Run
- Clone the repository or download the code.
- Run calculate_loopdepth.py, for example from an IDE.
3. Identify Loops
The identify_loops.py script uses the output of the STRIDE program, which assigns secondary structure to the residues of the gshF enzyme, to locate the loop regions between secondary structure elements.
Features
- Reading the STRIDE File: The script reads a specified STRIDE output file that contains lines of text representing different secondary structure assignments. Each line includes details such as the type of secondary structure, the residue range, and the associated chain.
- Storing Secondary Structure Ranges: It extracts and stores the start and end residues of each secondary structure element for each protein chain in a dictionary. This enables easy access and manipulation of the structure data.
- Identifying Loop Regions: The script identifies loop regions by analyzing gaps between the end of one secondary structure element and the start of the next. For each chain, it checks the sorted ranges of secondary structures and determines if there are any gaps, which are defined as potential loop regions.
- Handling Open-ended Loops: In addition to identifying closed loops (those with defined start and end residues), the script also accounts for open-ended loops, which may extend to the end of the protein chain.
- Output: Finally, the script prints the identified loop regions to the console, providing a clear overview of where these loops occur within the protein structure.
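The gap-finding logic described above can be sketched in a few lines. This sketch makes two assumptions that should be checked against the actual STRIDE file. First, it assumes STRIDE's whitespace-separated LOC records list the element type, then the first residue's name, number, and chain, then the last residue's name, number, and chain. Second, it counts only the types in the hypothetical SSE_TYPES set as secondary structure. Open-ended loops are returned with an end of None. The sketch tracks the furthest residue covered so far, so nested or overlapping elements do not produce spurious loops.

```python
from collections import defaultdict

# Assumption: element types treated as secondary structure (turns and bridges are ignored).
SSE_TYPES = {"AlphaHelix", "310Helix", "PiHelix", "Strand"}


def read_sse_ranges(stride_lines, sse_types=SSE_TYPES):
    """Collect (start, end) residue ranges per chain from STRIDE LOC records.

    Assumed layout: LOC <Type> <ResName> <Start> <Chain> <ResName> <End> <Chain>
    """
    ranges = defaultdict(list)
    for line in stride_lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "LOC" or fields[1] not in sse_types:
            continue
        start, chain, end = int(fields[3]), fields[4], int(fields[6])
        ranges[chain].append((start, end))
    return ranges


def find_loops(ranges):
    """Return loops per chain as (start, end); end=None marks a loop running to the chain end."""
    loops = {}
    for chain, segments in ranges.items():
        segments = sorted(segments)
        chain_loops = []
        covered_end = segments[0][1]          # last residue covered by any element so far
        for start, end in segments[1:]:
            if start > covered_end + 1:       # at least one uncovered residue: a loop
                chain_loops.append((covered_end + 1, start - 1))
            covered_end = max(covered_end, end)
        chain_loops.append((covered_end + 1, None))  # open-ended loop after the last element
        loops[chain] = chain_loops
    return loops
```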
Dependencies
To run this code, you will need:
- Python 3.x
- The STRIDE output file, placed in the same folder as the script.
How to Run
- Clone the repository or download the code.
- Run identify_loops.py, for example from an IDE.
4. Predict DDG
The predict_DDG.py script computes the ΔΔG (the change in folding free energy caused by a mutation) for all possible single-point mutations in a given protein structure. The script uses PyRosetta, a Python-based interface to the Rosetta molecular modeling suite, to perform energy calculations before and after introducing mutations. The results are saved in an Excel file, providing a convenient way to analyze the effects of mutations on protein stability.
Unfortunately, due to limited access to PyRosetta, we could not run predict_DDG.py to generate real data.
Features
- Automated ΔΔG Calculation: The script first calculates the energy of the wild-type protein using PyRosetta's full-atom scoring function.
- Mutation Introduction: At every residue position, the script mutates the residue to each of the 19 standard amino acids that differ from the wild type.
- Mutant Energy Calculation: After mutating the residue, the script calculates the energy of the mutant protein structure.
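- ΔΔG Calculation: The difference in energy between the mutant and the wild-type structure is computed (ΔΔG = Mutant Energy - Wild-Type Energy). Because lower Rosetta scores correspond to more favorable structures, a positive ΔΔG indicates a destabilizing mutation.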
- Results Storage: The ΔΔG values, along with the mutation details, are stored in a Pandas DataFrame and then saved into an Excel file for further analysis.
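PyRosetta could not be run here, so the sketch below keeps the mutation bookkeeping and the ΔΔG arithmetic in plain Python and confines the PyRosetta calls to a single function. This lets the scan logic be checked without a Rosetta installation.

The PyRosetta calls shown (pyrosetta.init, pyrosetta.pose_from_pdb, pyrosetta.get_fa_scorefxn, and mutate_residue from pyrosetta.toolbox) are our assumption of how predict_DDG.py might be structured, not its actual code. The 8 Å repacking radius and the result column names are likewise assumptions. Signatures should be checked against the installed PyRosetta version. Sequence letters outside the 20 standard amino acids (for example, ligands) are skipped.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def single_point_mutations(sequence):
    """Yield (1-based position, wild type, mutant) for the 19 non-wild-type residues at each position."""
    for pos, wt in enumerate(sequence, start=1):
        if wt not in AMINO_ACIDS:  # skip ligands / non-canonical residues
            continue
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield pos, wt, mut


def ddg_scan(sequence, wt_energy, mutant_energy):
    """ddG = E(mutant) - E(wild type); mutant_energy(pos, aa) returns the mutant's total score."""
    return [
        {"Position": pos, "WT": wt, "Mutant": mut, "DDG": mutant_energy(pos, mut) - wt_energy}
        for pos, wt, mut in single_point_mutations(sequence)
    ]


def rosetta_ddg_scan(pdb_path):
    """PyRosetta side of the scan (assumed API; requires a working PyRosetta install)."""
    import pyrosetta
    from pyrosetta.toolbox import mutate_residue

    pyrosetta.init()
    pose = pyrosetta.pose_from_pdb(pdb_path)
    scorefxn = pyrosetta.get_fa_scorefxn()  # full-atom scoring function
    wt_energy = scorefxn(pose)

    def mutant_energy(pos, aa):
        mutant = pose.clone()  # leave the wild-type pose untouched
        mutate_residue(mutant, pos, aa, pack_radius=8.0, pack_scorefxn=scorefxn)
        return scorefxn(mutant)

    return ddg_scan(pose.sequence(), wt_energy, mutant_energy)


def save_results(rows, xlsx_path):
    """Store the results in a pandas DataFrame and write them to Excel (.xlsx uses openpyxl)."""
    import pandas as pd
    pd.DataFrame(rows).to_excel(xlsx_path, index=False)
```

For a protein with N standard residues, this scores 19 × N mutant structures, so a full scan is computationally expensive.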
Dependencies
To run this code, you will need:
- Python 3.x
- pandas library
- openpyxl library
- PyRosetta
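Install pandas and openpyxl using pip:
pip install pandas openpyxl
PyRosetta is distributed separately by RosettaCommons under its own license; follow the installation instructions on the PyRosetta website.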
How to Run
- Clone the repository or download the code.
- Run predict_DDG.py, for example from an IDE.