Modelling missing protein fragments

If one visits the Protein Data Bank (PDB) at www.rcsb.org and picks a structure file at random, the odds are high that the structure is missing residues. This project attempts to solve this problem computationally.

Missing residues most often occur at the beginning (N-terminus) or the end (C-terminus) of a chain, but stretches in the middle of the sequence are not spared either. For instance, pigeon Cryptochrome 4 (6PU0) is missing residues 228-244 and 498-527, cattle rhodopsin (6OY9) lacks its last 26 amino acids, and zebrafish RNase5 (3LJE) lacks its last eight residues. A more extreme example is the eukaryotic origin recognition complex of the fruit fly (4XGC), which is missing a total of 399 residues spread across the whole structure. These residues are most likely missing because they are intrinsically disordered or highly flexible within the protein. This mobility results in poor local resolution in experimental structure determination, which prevents their positions from being assigned.
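Gaps like these can be located by comparing the residue numbers that actually carry coordinates with the full numbered span of the chain. The minimal Python sketch below assumes the residue numbers have already been read from the structure file's coordinate records and that the full span (`first`, `last`) is known, for example from the deposited sequence. `missing_ranges` is a hypothetical helper, and insertion codes and non-sequential numbering are ignored.

```python
from typing import Iterable, List, Optional, Tuple


def missing_ranges(observed: Iterable[int], first: int, last: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of residue numbers in [first, last]
    that do not appear in `observed`.

    Assumes plain integer residue numbering (no insertion codes)."""
    present = set(observed)
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for resnum in range(first, last + 1):
        if resnum not in present:
            if start is None:
                start = resnum  # open a new gap
        elif start is not None:
            ranges.append((start, resnum - 1))  # close the current gap
            start = None
    if start is not None:
        ranges.append((start, last))  # gap runs to the C-terminal end
    return ranges


if __name__ == "__main__":
    # Hypothetical 12-residue chain with an internal gap and a truncated C-terminus.
    observed = [1, 2, 3, 7, 8, 9, 10]
    print(missing_ranges(observed, 1, 12))  # [(4, 6), (11, 12)]
```

Internal gaps and missing termini come out of the same scan; a range touching `first` or `last` marks a truncated N- or C-terminus.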

C-terminus remodeled
The missing C-terminus of European Robin Cryptochrome 4 has been remodeled.

Unfortunately, it is precisely these flexible regions that are the most promising candidates for involvement in signaling pathways or conformational changes. It is therefore highly desirable to model them explicitly.

To start tackling this problem, we wrote a program called Pep McConst, which uses a Monte Carlo approach to place amino acid residues randomly, either to extend a given structure or to build a free-floating polypeptide. Pep McConst yields a multitude of physically possible structures but does not rank any of them as more realistic than another. Details on Pep McConst are published in the accompanying paper, and the program has been implemented in VIKING for convenient use. In the paper, we presented case studies showing that a truncated end of ubiquitin was successfully reconstructed using Pep McConst together with all-atom MD simulations. We then went on to estimate the C-terminus of European Robin Cryptochrome 4; finding the best-fitting, most realistic C-terminal conformation for further studies remains an open question.
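To make the idea of Monte Carlo residue placement concrete, here is a minimal Python sketch of random chain growth with clash rejection. It is not the Pep McConst implementation: it assumes a coarse-grained model in which each residue is a single C-alpha atom, consecutive C-alpha atoms sit 3.8 Å apart, and a hypothetical clash cutoff of 3.0 Å applies to all non-bonded pairs. The names `grow_chain`, `CA_CA_DISTANCE`, and `CLASH_DISTANCE` are illustrative.

```python
import math
import random
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]

CA_CA_DISTANCE = 3.8  # assumed distance between consecutive C-alpha atoms (Å)
CLASH_DISTANCE = 3.0  # hypothetical minimum distance between non-bonded atoms (Å)


def random_unit_vector(rng: random.Random) -> Vec:
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(anchor: List[Vec], n_new: int, rng: random.Random,
               max_tries: int = 1000) -> Optional[List[Vec]]:
    """Append n_new C-alpha positions to the end of a non-empty `anchor`.

    Each trial position lies one bond length from the current end in a random
    direction and is rejected if it clashes with any earlier atom.
    Returns None if some residue cannot be placed within max_tries."""
    chain = list(anchor)
    for _ in range(n_new):
        end = chain[-1]
        for _ in range(max_tries):
            d = random_unit_vector(rng)
            trial = (end[0] + CA_CA_DISTANCE * d[0],
                     end[1] + CA_CA_DISTANCE * d[1],
                     end[2] + CA_CA_DISTANCE * d[2])
            # The bonded predecessor (chain[-1]) is excluded from the clash check.
            if all(math.dist(trial, atom) >= CLASH_DISTANCE for atom in chain[:-1]):
                chain.append(trial)
                break
        else:
            return None  # no clash-free position found for this residue
    return chain


if __name__ == "__main__":
    rng = random.Random(42)
    anchor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]  # hypothetical two-residue chain end
    # Five independent models; the sampler itself does not rank them.
    models = [grow_chain(anchor, 10, rng) for _ in range(5)]
    print(sum(m is not None for m in models), "models built")
```

Calling `grow_chain` repeatedly yields an ensemble of clash-free conformations, none preferred over another, which matches the behaviour described above; judging which model is most realistic (for example with MD simulations) is a separate step. A free polypeptide can be grown from a single-atom anchor such as `[(0.0, 0.0, 0.0)]`.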

Pep McBridge, a second program based on the same algorithm as Pep McConst, is designed to fill gaps or to replace a stretch of residues within a protein structure. It is currently in testing and will be made available through VIKING as soon as possible.
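Filling a gap adds a constraint that extending a terminus lacks: the new segment must end exactly one bond length from the residue on the far side of the gap. The minimal Python sketch below is not the Pep McBridge algorithm. It uses the same hypothetical C-alpha-only assumptions (3.8 Å bonds, 3.0 Å clash cutoff), rejects any step from which the far anchor can no longer be reached, and places the last gap residue on the circle where two 3.8 Å spheres intersect. The names `bridge_gap` and `closing_point` are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

BOND = 3.8   # assumed distance between consecutive C-alpha atoms (Å)
CLASH = 3.0  # hypothetical minimum distance between non-bonded atoms (Å)


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    return scale(a, 1.0 / math.hypot(*a))


def random_direction(rng: random.Random) -> Vec:
    """Uniformly distributed unit vector."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def clash_free(p: Vec, others: Sequence[Vec]) -> bool:
    return all(math.dist(p, q) >= CLASH for q in others)


def closing_point(prev: Vec, target: Vec, rng: random.Random) -> Vec:
    """Random point exactly BOND away from both prev and target, i.e. on the
    circle where the two spheres intersect (requires 0 < distance <= 2 * BOND)."""
    d = math.dist(prev, target)
    axis = unit(sub(target, prev))
    mid = scale(add(prev, target), 0.5)
    h = math.sqrt(max(BOND ** 2 - (d / 2) ** 2, 0.0))  # radius of the circle
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = unit(cross(axis, helper))  # e1, e2: orthonormal basis of the circle's plane
    e2 = cross(axis, e1)
    t = rng.uniform(0.0, 2.0 * math.pi)
    return add(mid, add(scale(e1, h * math.cos(t)), scale(e2, h * math.sin(t))))


def bridge_gap(before: Vec, after: Vec, n_gap: int, environment: Sequence[Vec],
               rng: random.Random, max_restarts: int = 1000) -> Optional[List[Vec]]:
    """Place n_gap C-alpha positions between the two residues flanking a gap.
    `environment` holds other atoms of the structure to avoid clashing with."""
    if n_gap < 1 or not 0.0 < math.dist(before, after) <= (n_gap + 1) * BOND:
        return None  # the gap cannot be spanned by n_gap residues
    env = list(environment)
    for _ in range(max_restarts):
        gap: List[Vec] = []
        for k in range(1, n_gap + 1):
            prev = gap[-1] if gap else before
            last = k == n_gap
            # Check against every atom except the new residue's bonded neighbours.
            others = env + ([before] if gap else []) + gap[:-1] + ([] if last else [after])
            for _ in range(100):
                if last:
                    trial = closing_point(prev, after, rng)
                else:
                    trial = add(prev, scale(random_direction(rng), BOND))
                    # Reachability: the remaining n_gap - k + 1 bonds must still reach `after`.
                    if math.dist(trial, after) > (n_gap - k + 1) * BOND:
                        continue
                if clash_free(trial, others):
                    gap.append(trial)
                    break
            else:
                break  # residue k could not be placed; restart the whole bridge
        if len(gap) == n_gap:
            return gap
    return None


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical gap: flanking residues 10 Å apart with four residues missing.
    print(bridge_gap((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 4, [], rng))
```

The reachability test guarantees that, when the last residue is due, the two spheres around its neighbours still intersect, so closure never requires a lucky random hit.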