PROJECT DESCRIPTION
Computational methods for predicting protein tertiary structure play an important role in protein structure elucidation, complementing experimental methods that are more accurate but much slower and much more expensive. Protein threading, one of the most widely used methods, predicts the structure of a query protein sequence by threading it through the available structure models to identify the best fit. Because a large number of proteins fold into a much smaller number of structures, protein threading is viable and can potentially keep up with the rapid pace at which protein sequences are being identified. Protein structure prediction via threading is the key strategy employed by several worldwide Structural Genomics efforts.
However, existing threading programs have yet to deliver the performance needed for accurate, high-throughput protein structure prediction. Fold recognition remains challenging for remote homologs and structural analogs, which may account for about 40% of all proteins encoded in a typical genome. Even when a fold is correctly recognized, the alignment accuracy between the query protein and the model may be as low as 60%. This unsatisfactory performance is rooted in the intractability of threading a query through structure models to find the best fit, a task that is too difficult to compute with comprehensive models. Existing prediction techniques avoid this formidable computational complexity by adopting simpler models or suboptimal heuristic computation, often seriously compromising prediction accuracy.
To address these challenges in threading accuracy, this research project introduces a novel graph-theoretic framework for tertiary structure modeling. Preliminary results demonstrate that such graph models have small tree width (i.e., they are tree-decomposable), which permits very efficient (e.g., linear-time) threading computation. This efficiency will make it feasible to incorporate comprehensive profiles for spatial interactions of amino acids that would otherwise be too sophisticated to compute with other threading techniques. It will also facilitate simultaneous fold recognition and atomic structure packing, a task whose computational cost was previously prohibitive. In addition, the proposed modeling method will allow feasible generation of all tree-decomposable structure models, enabling a "threading" prediction for proteins with novel structures, which account for 15-20% of all proteins and for which existing threading techniques have failed. We expect these new techniques, enabled by the tree-decomposable graph model, to together deliver highly accurate threading programs.
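As a rough illustration of what "small tree width" means for a structure graph, the following minimal sketch bounds the tree width of a toy graph using the standard min-degree elimination heuristic. It is not the project's modeling or threading method. The function name min_degree_width, the integer vertex labels, and the toy contact graph are all hypothetical, chosen only for this example. The heuristic returns an upper bound on the tree width, which need not be the exact value on every graph.

```python
from itertools import combinations


def min_degree_width(edges):
    """Upper-bound the tree width of an undirected graph.

    Uses the min-degree elimination heuristic: repeatedly remove the
    vertex with the fewest neighbors, record its bag (the vertex plus
    its current neighbors), and connect those neighbors pairwise.
    Returns (width, bags), where width is the largest bag size minus one.
    Vertices are assumed to be integers (an assumption of this sketch).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    width = 0
    bags = []
    while adj:
        # Pick the vertex of minimum degree; break ties by label.
        v = min(adj, key=lambda x: (len(adj[x]), x))
        nbrs = adj.pop(v)
        bags.append({v} | nbrs)
        width = max(width, len(nbrs))
        # Make the neighbors of v a clique (fill-in edges).
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        # Remove v from the adjacency sets of its neighbors.
        for n in nbrs:
            adj[n].discard(v)
    return width, bags


if __name__ == "__main__":
    # Hypothetical toy graph: six structural cores joined along the
    # backbone (i, i+1), plus two made-up spatial contacts.
    backbone = [(i, i + 1) for i in range(5)]
    contacts = [(0, 2), (3, 5)]
    width, bags = min_degree_width(backbone + contacts)
    print("width bound:", width)
    print("bags:", bags)
```

When the width stays small, dynamic programming over the bags only has to enumerate assignments for a few vertices at a time, which is the general reason tree-decomposable graphs admit efficient computation.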
PEOPLE
Principal Investigator:
Participants:
PUBLICATIONS (supported in part by the grant)
RNA-Informatics Lab © 2011.