NSF CAREER (2012-2017)

NSF:IIS:CAREER – Probabilistic Methods for Addressing Complexity and Constraints in Protein Systems.

The proposed activity involves a research environment and educational curriculum dedicated to dealing efficiently with the complexity and constraints that protein molecules pose to computational studies. The emphasis is on elucidating the motions that proteins employ for biological function. This is a fundamental issue in the understanding of proteins and biology due to the central role of proteins in cellular processes.

The research addresses fundamental issues in protein modeling. Understanding proteins in silico involves searching a vast high-dimensional conformational space of inherently flexible systems with numerous inter-related degrees of freedom, complex geometry, physical constraints, and continuous motion. Three core research directions are identified. (1) Geometric constraints underlying protein motion are not trivial to identify or address. The proposed research exploits mechanistic analogies between proteins and robot kinematic linkages and investigates inverse kinematics techniques to efficiently formulate and address complex geometric constraints arising in diverse protein studies. (2) The funnel-like protein energy landscape exposes physics-based energetic constraints that are often demanding to address in silico. The proposed research pursues a multiscale treatment of energetic constraints in the context of probabilistic search, supporting coarse- and fine-grained levels of protein representational detail and converting between them with information gathered during exploration. (3) The conformational ensemble view of the protein state relevant for function necessitates search algorithms capable of exploring the high-dimensional conformational space and its rugged energy landscape. A novel probabilistic search framework is proposed that gathers information about the space it explores and employs this information to advance towards promising unexplored regions of the space. Taken together, these research directions address complexity in proteins in two ways: by formulating and exploiting geometric and energetic constraints, which narrows the search space of interest to regions where the constraints are satisfied, and by employing a novel probabilistic framework whose enhanced sampling capability makes searching the relevant regions of the space feasible.
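As a purely illustrative aside, the idea in direction (3) of a search that records where it has been and uses that record to favor promising, under-explored regions can be sketched on a toy problem. Everything below is an assumption made for illustration and is not the project's actual framework: the 2D unit square stands in for a conformational space, `energy` is an invented rugged funnel, and the grid bookkeeping, the selection weights, and the Metropolis acceptance rule are hypothetical choices.

```python
import math
import random
from collections import defaultdict


def energy(x, y):
    """Hypothetical rugged landscape: a broad funnel toward (0.7, 0.3) plus ripples."""
    funnel = (x - 0.7) ** 2 + (y - 0.3) ** 2
    ripples = 0.05 * math.sin(20 * x) * math.sin(20 * y)
    return funnel + ripples


def cell_of(point, cells_per_axis):
    """Map a point in the unit square to the grid cell that contains it."""
    return tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in point)


def explore(n_iters=2000, cells_per_axis=10, step=0.05, temperature=0.05, seed=0):
    """Grow a set of samples, biasing expansion toward low-energy, rarely selected cells."""
    rng = random.Random(seed)
    start = (0.1, 0.9)
    cells = defaultdict(list)       # grid cell -> list of (point, energy) samples
    selections = defaultdict(int)   # grid cell -> number of times it was expanded
    cells[cell_of(start, cells_per_axis)].append((start, energy(*start)))

    for _ in range(n_iters):
        keys = list(cells)
        # Information gathered so far drives selection: a cell scores higher
        # when its best sample has low energy and it has been expanded rarely.
        weights = []
        for key in keys:
            best = min(e for _, e in cells[key])
            weights.append(math.exp(-best / temperature) / (1 + selections[key]))
        chosen = rng.choices(keys, weights=weights, k=1)[0]
        selections[chosen] += 1

        # Expand a random sample from the chosen cell with a small Gaussian move,
        # clamped so the child stays inside the unit square.
        parent, parent_e = rng.choice(cells[chosen])
        child = tuple(min(1.0, max(0.0, c + rng.gauss(0, step))) for c in parent)
        child_e = energy(*child)

        # Metropolis criterion: always keep downhill moves, sometimes keep uphill ones.
        delta = child_e - parent_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cells[cell_of(child, cells_per_axis)].append((child, child_e))

    return cells


if __name__ == "__main__":
    result = explore()
    samples = [s for bucket in result.values() for s in bucket]
    best_point, best_energy = min(samples, key=lambda s: s[1])
    print(len(result), "cells visited; best sample", best_point, best_energy)
```

Dividing each cell's weight by its selection count is one simple way to keep the search from repeatedly expanding the same region; a real protein application would need a far higher-dimensional representation and a physically meaningful energy function.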

The proposed activity promises to advance discovery and understanding in both the computer science and protein biophysics communities. Since most problems of practical interest are high-dimensional and often exhibit complex non-linear spaces, the proposed research cuts across multiple areas in computer science, such as robot motion planning, optimization in complex non-linear spaces, and modeling and simulation of complex physics-based systems. In particular, the research will reveal effective probabilistic search strategies for continuous high-dimensional search spaces. Analogies with articulated mechanisms will offer insight into how to generate valid robot configurations in the presence of constraints. On the biophysical side, the research promises to advance protein modeling and understanding across diverse applications. The proposed activity involves interdisciplinary collaborations with computer scientists, biophysicists, and chemists. Findings and data will be disseminated broadly to enhance scientific understanding across diverse communities. Specific educational objectives focusing on curriculum design and outreach activities are formulated to employ the proposed research for broadening the participation of college and pre-college students, with a particular emphasis on underrepresented groups.

This project has involved four graduate students and several undergraduate and high-school students. Contributions include:

  • Executable for Linux.
  • Active education of involved communities through workshops, tutorials, and software demos at widely attended conferences and society meetings.
  • Eight peer-reviewed publications.