IMPROViSeD Pipeline

Pseudocode for IMPROViSeD. Modelling three-dimensional structures of protein complexes from crosslinking data is broken down into two steps: localization and registration. The subunits are oriented over multiple iterations. In each iteration, a subset of the crosslinks is selected, followed by localization and registration. Structures that have few crosslink violations and no clashes are kept as the final result.
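The final selection step described above can be illustrated with a minimal sketch. This is not the IMPROViSeD implementation: the function names, the dictionary-based coordinate layout, and both distance thresholds (MAX_CROSSLINK_DIST and MIN_CLASH_DIST) are assumptions made for illustration, and suitable cutoffs would depend on the crosslinker and on the atom representation used.

```python
from math import dist

# Hypothetical thresholds, chosen only for illustration.
MAX_CROSSLINK_DIST = 30.0  # assumed longest distance (in Å) a crosslink may span
MIN_CLASH_DIST = 3.0       # assumed closest allowed approach (in Å) between subunits


def count_violations(coords_a, coords_b, crosslinks, max_dist=MAX_CROSSLINK_DIST):
    """Count crosslinks (i, j) whose residues lie farther apart than max_dist.

    coords_a and coords_b map residue numbers to (x, y, z) tuples for the
    two subunits; each crosslink pairs residue i of A with residue j of B.
    """
    return sum(1 for i, j in crosslinks if dist(coords_a[i], coords_b[j]) > max_dist)


def has_clash(coords_a, coords_b, min_dist=MIN_CLASH_DIST):
    """Return True if any point of subunit A is closer than min_dist to subunit B."""
    return any(dist(p, q) < min_dist
               for p in coords_a.values()
               for q in coords_b.values())


def select_models(models, crosslinks, max_violations=1):
    """Keep clash-free models with at most max_violations violated crosslinks.

    models is a list of (coords_a, coords_b) pairs, one per iteration.
    The result is sorted so that the models with the fewest violations come first.
    """
    kept = []
    for coords_a, coords_b in models:
        violations = count_violations(coords_a, coords_b, crosslinks)
        if violations <= max_violations and not has_clash(coords_a, coords_b):
            kept.append((violations, coords_a, coords_b))
    return sorted(kept, key=lambda item: item[0])
```

Each surviving entry carries its violation count, so the same value can be reported next to the model as violations/total.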

Flip Correction Refinement

Recall from the "Engineering cycles" section that geometric mirror images have the same inter-atomic distances, so both are valid solutions to the registration problem. However, only one of the two occurs in nature. We therefore need to check which one is correct and discard the other orientation.

This was achieved by the flip correction algorithm. The operation was encoded as a reflection across the origin in the Ramachandran map, as described in the "Engineering cycles" section. As a result, we were able to obtain the correct orientation for the subunits in the protein complex. More generally, this step can be seen as an integral component of modelling protein structures from experimentally obtained distance constraints and the covalent geometry of the protein. Below is a visual representation of the flip correction refinement for Ribonuclease Inhibitor complexed with Ribonuclease A (PDB ID 1DFJ). IMPROViSeD was run starting with just 6 of the 12 crosslinks. Note that, after the flip correction, the three-dimensional structure is recovered even when starting with just half of the available crosslinks. This also demonstrates the robustness of the IMPROViSeD method.

Registered Structure without flip correction.

Registered Structure with flip correction.
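The reason a flip correction is needed at all can be checked numerically. The sketch below is an illustration, not the flip correction algorithm itself: the helper names are hypothetical, and mirroring is done by negating the z coordinate as one arbitrary choice of mirror plane. It shows that a mirror image keeps every pairwise distance unchanged while the sign of each dihedral angle is reversed, which is why distance constraints alone cannot distinguish the two solutions and why the correction can be expressed as the map (phi, psi) to (-phi, -psi) on the Ramachandran map.

```python
from itertools import combinations
from math import atan2, degrees, dist, sqrt


def sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive points."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = sqrt(dot(b1, b1)) * dot(b0, cross(b1, b2))
    x = dot(cross(b0, b1), cross(b1, b2))
    return degrees(atan2(y, x))


def mirror(points):
    """Reflect points through the xy-plane (an arbitrary choice of mirror)."""
    return [(x, y, -z) for x, y, z in points]


def pairwise_distances(points):
    """All inter-point distances, in a fixed order."""
    return [dist(p, q) for p, q in combinations(points, 2)]
```

For any four non-collinear points, pairwise_distances returns the same values before and after mirror, while dihedral returns angles of equal magnitude and opposite sign.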

Result for Ribonuclease Inhibitor Complexed With Ribonuclease A (PDB 1DFJ)

We first present the results obtained for the complex with PDB ID 1DFJ. The experimental data were obtained from link. Following localization, and after performing the flip correction where required, the final results for the complex are shown below. The results of multiple iterations are displayed. The backbone RMSD after aligning each result with the structure available in the PDB is shown underneath each figure. The number beside it is the number of violated crosslinks out of the total number of crosslinks.

Note that we were able to obtain multiple orientations that are distinct from the original structure in the PDB but still satisfy most of the crosslinks. This not only demonstrates the robustness of the IMPROViSeD method, but also highlights its ability to model new interfaces.

Structures of the complex with PDB ID 1DFJ modelled using subsets of crosslinks. The backbone RMSD with respect to the structure in the PDB and the crosslink violations (denoted as number of violations/total number of crosslinks) are shown.

More new orientations that satisfy most of the crosslinks. The top left image is the original structure in the PDB. The others were produced by the IMPROViSeD method. All the structures are shown in a similar orientation.
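The two numbers reported with each figure above can be computed along the following lines. This is a minimal sketch under stated assumptions: the model backbone is assumed to have already been optimally superposed onto the PDB structure (the superposition itself, for example by the Kabsch algorithm, is not shown), both coordinate lists are assumed to contain the same atoms in the same order, and the function names and label format are hypothetical.

```python
from math import sqrt


def rmsd(model, reference):
    """Root-mean-square deviation between two already superposed coordinate lists.

    model and reference are equal-length lists of (x, y, z) tuples holding
    the same backbone atoms in the same order.
    """
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, reference)
                  for a, b in zip(p, q))
    return sqrt(squared / len(model))


def figure_label(rmsd_value, violated, total):
    """Format the RMSD and the violations/total count as one label (hypothetical format)."""
    return f"RMSD {rmsd_value:.2f} Å, {violated}/{total}"
```

Without the prior superposition, the value returned by rmsd would also include the rigid-body offset between the two structures, so it would overestimate the structural difference.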

Result for Human Neutrophil Gelatinase-associated Lipocalin (HNGAL) (PDB ID LCN2) - Hydrolase/Hydrolase Inhibitor Complex (PDB ID MMP9)

Next, we present the results obtained for the LCN2-MMP9 complex. The crosslinking data were generated artificially, as described in the Experiments section. As in the previous case, we performed the flip correction where required, and the final results for the LCN2-MMP9 complex are shown below. The results of multiple iterations are displayed.

The backbone RMSD after aligning each result with the structure available in the PDB is shown underneath each figure. The number beside it is the number of violated crosslinks out of the total number of crosslinks. As before, the multiple results from IMPROViSeD demonstrate its ability to model orientations that differ from those available in the public repository (PDB). This is a significant advantage, as it allows us to model new interfaces.

Structures of the LCN2-MMP9 complex modelled using subsets of crosslinks. The backbone RMSD with respect to the structure in the PDB and the crosslink violations (denoted as number of violations/total number of crosslinks) are shown.

Comparison of the modelled orientations of the LCN2-MMP9 complex with the structure in the PDB. The structure downloaded from the PDB is shown at the top left. The structures modelled by IMPROViSeD are shown in a similar orientation.

Analysis of Results

IMPROViSeD was able to model the structure of the protein complex from both experimental and artificial data, as shown here for two cases.
The time taken for execution is less than 1.5 minutes. This is a significant improvement over existing methods, which take hours. The user can still choose to run molecular dynamics simulations to refine the structure further.
The method is robust and can model multiple orientations. This is a distinct advantage, as it allows an end user to model new interfaces. It has the potential to offer insights into the physiological functions of protein complexes.