We demonstrate the results of our pipeline in two cases: one using experimental crosslinking data and the other using synthetic crosslinking data. The crosslinking distance varies with the crosslinker used; in the cases discussed below, it is taken to be 30 Angstroms. In the first case, both protein structures in the complex consist of a single chain. In contrast, the second case involves a dimer and a monomer forming a complex. The data for the second case were synthetically generated, as detailed in the "LCN2-MMP9 Case" section.
We work with the protein complex identified by PDB ID 1DFJ, which consists of two chains, as illustrated below. Experimental crosslinking data for this complex were obtained from IMP.
MMP9 has two chains and LCN2 has one chain. We compare the results obtained from IMPROViSeD with the structures of the complex downloaded from the PDB.
Experimental crosslink data are not available for this case. Hence, we generated synthetic crosslink data using the following algorithm.
We ran our pipeline on random subsets of the crosslinks, since we are solving a localization problem to form the supporting framework for the two bodies. Because this problem is inherently non-convex, the solution is not unique and depends on the random seed. We use this to our advantage: starting from different random seeds yields multiple structures for the supporting framework. The execution time is less than a minute.
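The subset selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the IMPROViSeD implementation: the function name `sample_crosslink_subsets`, the subset size, the number of subsets, and the representation of a crosslink as a pair of residue indices are all hypothetical.

```python
import random

def sample_crosslink_subsets(crosslinks, subset_size, n_subsets, base_seed=0):
    """Draw n_subsets random subsets of the crosslinks, one seed per subset.

    Each subset uses its own seeded generator, so a given structure of the
    supporting framework can be regenerated from its seed alone.
    """
    subsets = []
    for k in range(n_subsets):
        rng = random.Random(base_seed + k)  # independent, reproducible stream
        subsets.append(rng.sample(crosslinks, subset_size))
    return subsets

# Hypothetical crosslinks: (residue index in body A, residue index in body B).
crosslinks = [(10, 5), (22, 40), (35, 12), (48, 60), (51, 7), (63, 33)]
subsets = sample_crosslink_subsets(crosslinks, subset_size=3, n_subsets=4)
for seed, subset in enumerate(subsets):
    print(seed, subset)
```

Seeding each subset separately is a design choice made here for reproducibility; each subset would then be passed to the localization step to obtain one candidate framework.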
We also tried using all crosslinks, but that resulted in more clashes. The reason is that the crosslinking distance is not an absolute value: it depends on the flexibility of the sidechain and the backbone of the protein, so the nominal distance of 30 Angstroms varies. Additionally, the presence of alternately organised complexes cannot be ruled out. Moreover, the crosslinks denote the distance between the residues, while IMPROViSeD uses the distance between the C\(^\alpha\) atoms. We thus add a tolerance of 5 Angstroms to the crosslink distance when evaluating violations.
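The violation check with tolerance can be illustrated by a minimal sketch. It assumes that a crosslink counts as violated when the distance between the two C\(^\alpha\) atoms exceeds the crosslinking distance plus the tolerance (30 + 5 = 35 Angstroms); the function name `violated_crosslinks`, the coordinate dictionaries, and the example coordinates are hypothetical and not taken from IMPROViSeD.

```python
import math

CROSSLINK_DISTANCE = 30.0  # Angstroms, value used in the cases discussed
TOLERANCE = 5.0            # Angstroms, added when evaluating violations

def violated_crosslinks(ca_a, ca_b, crosslinks,
                        cutoff=CROSSLINK_DISTANCE, tol=TOLERANCE):
    """Return (i, j, distance) for each crosslink whose C-alpha distance
    exceeds cutoff + tol.

    ca_a, ca_b: dicts mapping residue index -> (x, y, z) of the C-alpha atom
    crosslinks: iterable of (residue in body A, residue in body B)
    """
    violations = []
    for i, j in crosslinks:
        d = math.dist(ca_a[i], ca_b[j])  # Euclidean distance (Python 3.8+)
        if d > cutoff + tol:
            violations.append((i, j, d))
    return violations

# Hypothetical coordinates: the first pair is 34 A apart, the second 36 A.
ca_a = {1: (0.0, 0.0, 0.0), 2: (0.0, 0.0, 0.0)}
ca_b = {1: (34.0, 0.0, 0.0), 2: (36.0, 0.0, 0.0)}
violations = violated_crosslinks(ca_a, ca_b, [(1, 1), (2, 2)])
print(violations)  # [(2, 2, 36.0)]
```

In this example the 34 Angstrom pair exceeds the nominal 30 Angstroms but falls within the tolerance, so only the 36 Angstrom pair is reported.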
The results obtained by running IMPROViSeD are tested for: