NSF CCF Project (2010-2014)

NSF CCF:AF:Small – A Unified Computational Framework to Enhance the Ab-Initio Sampling of Native-Like Protein Conformations


The research involves the design and analysis of a framework to compute the spatial arrangements, also known as conformations, in which a protein chain of amino acids is biologically active (its native state). This is an important step toward understanding protein function. While proteins are central to many biochemical processes, little is known about the millions of protein sequences obtained from organismal genomes.

Intellectual Merit: The intellectual merit of this work lies in the development of a novel computational framework that combines probabilistic exploration with the theory of statistical mechanics to efficiently enhance the sampling of the conformational space near the native state. Low-dimensional projections guide the exploration towards low-energy and geometrically diverse conformations. Additional intellectual merit lies in the incorporation of knowledge and observations emerging from biophysical theory and experiment, such as the use of coarse graining, the relation between energy barrier height and temperature, and the hierarchical organization of tertiary structure. Algorithmic components of the framework will be systematically evaluated for efficiency, accuracy, and how well they enhance the sampling of the conformational space near the native state.

Broader Impact: The broader impact of this research will be the creation of a filter that efficiently computes diverse coarse-grained conformations relevant to the protein native state; these conformations can then be further refined through detailed biophysical studies. The work lies at the interface between computer science and protein biophysics and can benefit both communities. On the computational side, the work will lead to new algorithms for modeling articulated chains characterized by continuous high-dimensional search spaces and complex energy surfaces. On the biophysical side, the framework will elucidate which aspects of our understanding of proteins allow efficient and accurate modeling. The work will impact both undergraduate and graduate students. New courses are proposed by the investigator as part of efforts to introduce computational biology in the computer science curriculum at George Mason University. The work will be employed as a pedagogical device in courses and educational outreach venues to spawn and maintain interest in computer science, with a particular focus on women and minorities.

This project included three graduate students and several undergraduate and high-school students. Contributions included:

  • Executable for Linux.
  • Active education of involved communities through workshops, tutorials, and software demos at widely-attended conferences and society meetings.
  • 11 peer-reviewed publications, 2 M.S. theses, and 1 Ph.D. thesis.

Protein energy surfaces are nonlinear and multimodal, which makes them suitable systems to study with evolutionary search/optimization algorithms. We are currently exploring such algorithms to effectively sample local minima in the protein energy surface. These minima are relevant when studying the thermodynamically stable and semi-stable structural states that a native protein uses for its biological function or that a variant employs for loss of function. Our focus is on equipping the basic algorithmic frameworks with domain-specific (biophysical) knowledge on proteins and then pursuing adaptations of the basic frameworks for an enhanced exploration capability. The larger objective is to employ these algorithms to obtain a detailed characterization of the structure space and model the structure-function relationship in protein and protein-like systems.
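
As an illustration of what sampling local minima on a multimodal surface can look like, the following is a minimal sketch in the style of Basin Hopping: perturb the current minimum, minimize locally, and accept or reject the new minimum with a Metropolis criterion. The one-dimensional `toy_energy` function, the crude `local_minimize` routine, and all parameter values are hypothetical stand-ins chosen for readability; they are not the project's energy functions, move sets, or software.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D multimodal 'energy' with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(energy, x, step=0.1, tol=1e-6):
    """Crude derivative-free descent: move to a lower neighbor if one
    exists, otherwise halve the step until it falls below tol."""
    e = energy(x)
    while step > tol:
        moved = False
        for candidate in (x - step, x + step):
            e_c = energy(candidate)
            if e_c < e:
                x, e, moved = candidate, e_c, True
                break
        if not moved:
            step *= 0.5
    return x, e


def basin_hopping(energy, x0, n_iters=200, jump=1.5, temperature=0.5, seed=0):
    """Return (best_x, best_e, visited), where visited lists the accepted
    local minima in the order they were reached."""
    rng = random.Random(seed)
    x, e = local_minimize(energy, x0)
    best_x, best_e = x, e
    visited = [(x, e)]
    for _ in range(n_iters):
        # Perturbation ("global move") followed by a local minimization.
        x_new, e_new = local_minimize(energy, x + rng.uniform(-jump, jump))
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            visited.append((x, e))
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e, visited


if __name__ == "__main__":
    best_x, best_e, visited = basin_hopping(toy_energy, x0=4.0)
    print(f"lowest minimum found: x={best_x:.3f}, E={best_e:.3f}")
    print(f"accepted minima: {len(visited)}")
```

In this sketch a higher `temperature` makes uphill moves between minima more likely to be accepted, which trades depth for diversity in the sampled minima.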

Our work has investigated the basic Basin Hopping framework, more powerful hybrid population-based frameworks, implementation of various global and local moves in evolutionary search algorithms, and the incorporation of multi-objective optimization through Pareto-based metrics to attenuate the reliance on noisy energy functions and obtain a more diverse conformational ensemble. Details can be found in the related pages and publications.
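
To make the Pareto-based idea concrete, here is a minimal sketch of Pareto dominance over conformations that are scored by several objectives instead of one summed energy. Treating individual energy terms as separate objectives is an assumption made for this illustration, and the function names and score values are hypothetical; the specific metrics used in the project are described in the publications.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b, with every objective
    minimized: a is no worse in all objectives and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def pareto_count(scores):
    """For each score vector, the number of other vectors it dominates."""
    return [sum(dominates(s, t) for t in scores) for s in scores]


if __name__ == "__main__":
    # Hypothetical (term_1, term_2) energy scores for five conformations.
    scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.0, 6.0)]
    print("non-dominated conformations:", pareto_front(scores))
    print("dominated-by-me counts:", pareto_count(scores))
```

Keeping every non-dominated conformation, rather than only the one with the lowest total energy, is one way such a criterion can retain a more diverse ensemble when the individual energy terms are noisy.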

On this Project:
Brian Olson
Sameh Saleh (Undergraduate Student)
Irina Hashmi
Kenneth De Jong
Amarda Shehu

This material is based upon work supported by the National Science Foundation under Grant No. 1016995 and IIS CAREER Award No. 1144106. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.