Projects Before 2009 – SHEHU – Computational Biology Laboratory

Computing Constrained Motions of a Protein Fragment

Consider a fragment of a protein chain from amino acid a to b, for example, a loop. Modeling fluctuations of the fragment under physiological conditions requires finding conformations in which the terminal amino acids of the fragment remain connected to the rest of the protein. This problem, often referred to as loop modeling or loop closure, requires computing geometrically-constrained conformations of the fragment that result in low-energy conformations of the entire protein chain. This is illustrated below for the 12-aa loop of CI2.

Left: The loop fragment of 12 amino acids (in grey) is constrained to connect to the rest of the CI2 protein structure (in blue). Center: The main steps of FEM. Right: FEM computes low-energy, geometrically-constrained conformations of the loop fragment in CI2.

The Fragment Ensemble Method (FEM) was developed to obtain an ensemble of physical conformations for a protein fragment. FEM first strips the fragment of its side chains and models the fragment backbone as an open kinematic chain. This analogy is exploited to sample backbone conformations in the same way that configurations of a kinematic chain are sampled. An optimization-based inverse kinematics method, Cyclic Coordinate Descent (CCD), is then applied to each sampled conformation to obtain closed conformations, that is, fragment conformations that satisfy the termini constraints.
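The closure step can be illustrated with a minimal Cyclic Coordinate Descent sketch. This is a simplified planar analog, not FEM's implementation: it treats the fragment as a chain of rigid 2D links with relative joint angles and closes the chain onto a single target point, whereas FEM rotates backbone dihedral angles in 3D and must match several terminal atoms. The function names, link lengths, and tolerance are hypothetical.

```python
import math

def forward_kinematics(lengths, angles, base=(0.0, 0.0)):
    """Return joint positions of a planar chain; angles are relative joint angles (radians)."""
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle  # each joint angle rotates everything downstream of it
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

def ccd(lengths, angles, target, tol=1e-6, max_iters=500):
    """Cyclic Coordinate Descent: rotate one joint at a time so the chain end approaches target."""
    angles = list(angles)
    for _ in range(max_iters):
        # Sweep from the last joint back to the first.
        for j in reversed(range(len(angles))):
            points = forward_kinematics(lengths, angles)
            jx, jy = points[j]
            ex, ey = points[-1]
            # Rotate joint j so that the direction joint->end matches joint->target.
            a_end = math.atan2(ey - jy, ex - jx)
            a_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[j] += a_target - a_end
        ex, ey = forward_kinematics(lengths, angles)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break  # chain end satisfies the closure constraint
    return angles

# Usage: close a four-link chain onto a reachable target point.
links = [1.0, 1.0, 1.0, 1.0]
closed = ccd(links, [0.3, 0.3, 0.3, 0.3], target=(2.0, 1.5))
```

Each CCD update rotates one joint so that the chain end points toward the target, which cannot increase the distance to the target; sweeping repeatedly from the last joint to the first usually closes a reachable target within a few sweeps.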

Low-energy side-chain configurations are then placed on each backbone conformation. Energetic refinement is conducted to reduce unfavorable inter-atomic interactions. The refinement focuses mostly on the fragment while allowing small fluctuations in the rest of the protein structure. Following a statistical mechanics framework, each resulting conformation is weighted by its Boltzmann probability, which provides a quantitative measure of feasibility under physiological conditions.
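The Boltzmann weighting can be written out in a few lines. This sketch assumes energies in kcal/mol and a temperature of 300 K (both assumptions, as are the function names); `boltzmann_average` shows how a per-conformation observable, such as a disorder score or a B factor, can be turned into an ensemble average.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann probabilities p_i = exp(-E_i / kT) / Z."""
    kt = K_B * temperature
    e_min = min(energies)  # shifting by the minimum avoids overflow; the shift cancels in Z
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(factors)  # partition function over the sampled conformations
    return [f / z for f in factors]

def boltzmann_average(values, energies, temperature=300.0):
    """Ensemble average <A> = sum_i p_i * A_i of a per-conformation observable A."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * a for w, a in zip(weights, values))
```

Because the weights are normalized over the sampled conformations only, the resulting probabilities are relative to the generated ensemble rather than to the full conformational space.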

FEM has been applied to characterize the flexibility of mobile protein fragments such as loops. In addition to reproducing the preference of highly stable loop fragments for a single average native conformation, the method can also model missing loops. It has been applied to rigid and flexible loops of diverse lengths, from 12 to 31 amino acids, that reside on the surface or in the interior of protein structures.

Left: Disorder scores measured as Boltzmann averages over the generated VlsE loop ensemble agree well with scores predicted from the loop sequence. Right: B factors measured as Boltzmann averages over the generated ensemble for the CI2 loop agree well with B factors available from X-ray experiments.

This work appears in: 1) Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki “Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins” Algorithmica, 2007, 48(4):303-327; and 2) Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki “Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations” Proteins: Structure, Function, and Bioinformatics, 2006, 65(1):164-179.

On this Project:
Amarda Shehu
Cecilia Clementi
Lydia Kavraki
This project is completed.

Structure-guided Search for Native-like Protein Conformations

The high dimensionality of the conformational space of a protein chain poses a direct challenge to search methods: many parameters determine the positions of the atoms in a protein conformation. Moreover, energetic interactions give rise to a multitude of local minima in the energy surface associated with the protein conformational space. Native conformations are associated with the lowest-energy basin(s) of the protein energy surface, so how can a search algorithm efficiently locate these basins?

The Protein Ensemble Method (PEM) was developed to compute low-energy conformations around an experimentally-available (average) protein structure. The available structure serves as a reference that marks the location of the global energy minimum and focuses the search. Essentially, PEM divides the problem of computing structural fluctuations around the reference structure into independent (parallelizable) subproblems, each computing the fluctuations of one of a series of consecutive, overlapping fragments of the protein chain. Probabilistic exploration then samples fragment conformations using analogies with geometrically-constrained kinematic chains.

The protein chain is divided into consecutive fragments with significant overlap. This is illustrated below on the 123-aa chain of alpha-Lac. Sliding a window of 30 amino acids over the alpha-Lac chain defines 19 fragments, where neighboring fragments overlap by 25 amino acids. For each fragment, FEM is applied to obtain an ensemble of low-energy fragment conformations, which are illustrated pictorially by the ensembles inside each window. The final step of PEM combines fluctuations measured over the conformational ensembles of neighboring fragments to obtain a statistical picture of the equilibrium fluctuations of the entire protein chain.
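The fragment count for alpha-Lac follows from simple arithmetic: a 30-residue window with a 25-residue overlap advances by 5 residues, so 1-based starting positions 1, 6, ..., 91 give 19 windows. A minimal sketch (the function name is hypothetical, and indices are 0-based with exclusive ends):

```python
def sliding_windows(chain_length, window=30, overlap=25):
    """Return (start, end) index pairs of overlapping fragments (0-based, end exclusive)."""
    stride = window - overlap  # 5 residues between consecutive window starts
    return [(s, s + window) for s in range(0, chain_length - window + 1, stride)]

fragments = sliding_windows(123)
print(len(fragments))  # 19
print(fragments[-1])   # (90, 120)
```

In this naive tiling the last window ends at residue 120, so residues 121-123 fall outside every window; the description of PEM does not say how the chain ends are treated, so padding or a shorter final window are both plausible choices.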

Left: An overview of PEM. Right: The end result is illustrated through root-mean-square deviations (RMSD) measured for each amino acid of the chain. RMSD measurements obtained over different fragment ensembles are color-coded. Measurements from overlapping fragments are combined in a statistical mechanics framework to characterize the flexibility of the entire alpha-Lac chain.
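Combining per-residue measurements from overlapping fragments can be sketched, under a strong simplifying assumption, as an unweighted mean over all fragments that contain a residue. PEM performs this combination in a statistical mechanics framework that is not detailed here, so treat `combine_per_residue` (a hypothetical name) as a placeholder for that step rather than PEM's formula.

```python
from collections import defaultdict

def combine_per_residue(fragment_profiles):
    """Average per-residue fluctuations (e.g. RMSD) over every fragment covering each residue.

    fragment_profiles: list of (start, values), where values[k] is the fluctuation
    measured for residue start + k in that fragment's ensemble.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start, values in fragment_profiles:
        for offset, value in enumerate(values):
            sums[start + offset] += value
            counts[start + offset] += 1
    return {res: sums[res] / counts[res] for res in sorted(sums)}

# Two overlapping toy fragments: residues 1 and 2 are covered twice.
profile = combine_per_residue([(0, [1.0, 2.0, 3.0]), (1, [4.0, 5.0, 6.0])])
print(profile)  # {0: 1.0, 1: 3.0, 2: 4.0, 3: 6.0}
```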

PEM exploits locality: in proteins with non-concerted motions, global information can be obtained by combining local information. This local-to-global strategy, known in biophysics as a first-order approximation, limits applications to proteins with non-concerted motions under native conditions, but it makes computing atomic fluctuations in silico feasible. Applications of PEM to proteins of diverse lengths and native folds reproduce wet-lab data spanning broad (nanosecond to microsecond) time scales, as illustrated here for proteins such as ubiquitin, protein G, and PAB.

Conformations obtained for ubiquitin (transparent) are superimposed on the native structure (opaque). Amide and methyl order parameters (middle) and residual dipolar couplings (right) measured over PEM-obtained conformations agree very well with the corresponding NMR data. This is significant, considering that nanosecond-long MD simulations in explicit water reproduce NMR data poorly.
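Order parameters of the kind compared with NMR data are commonly estimated from an ensemble of superimposed conformations with the Lipari-Szabo generalized order parameter, S^2 = 1.5 * sum over a, b of <u_a u_b>^2 - 0.5, where u is the unit bond vector (for example, the amide N-H bond) and a, b run over x, y, z. This sketch weights all conformations equally and assumes overall rotation was already removed by superposition; with a Boltzmann-weighted ensemble the averages would use those weights instead. The function name is hypothetical.

```python
import math

def order_parameter(vectors):
    """Generalized order parameter S^2 of one bond from an ensemble of superimposed conformations.

    vectors: list of (x, y, z) bond vectors, one per conformation (need not be normalized).
    S^2 = 1.5 * sum_{a,b} <u_a u_b>^2 - 0.5, with u the unit bond vector.
    """
    units = []
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))
    n = len(units)
    total = 0.0
    for a in range(3):
        for b in range(3):
            mean_ab = sum(u[a] * u[b] for u in units) / n  # ensemble average <u_a u_b>
            total += mean_ab ** 2
    return 1.5 * total - 0.5
```

S^2 equals 1 for a bond that never moves and approaches 0 for a bond that samples all orientations isotropically.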

This work appears in: 1) Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki “Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins” Algorithmica, 2007, 48(4):303-327; 2) Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi “On the Characterization of Protein Native State Ensembles” Biophysical Journal, 2007, 92(5):1503-1511; and 3) Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki “Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations” Proteins: Structure, Function, and Bioinformatics, 2006, 65(1):164-179.

On this Project:
Amarda Shehu
Cecilia Clementi
Lydia Kavraki
This project is completed.

From Sequence and Cyclization to Native Conformational Ensembles of Cyclic Cysteine-rich Peptides

Computing native-like conformations is more challenging when no average structure is available or when a single structure does not describe the protein native state. Addressing this problem, however, is becoming increasingly necessary due to the rapidly growing number of genomic sequences and the lagging structural classification of sequence data. Extracting in silico the ensemble of functionally-relevant structures from protein sequences is a central challenge in computational molecular biology.

The Native state characterization of Cyclic Peptides (NcCYP) method was developed to address this problem for short protein sequences (10-31 amino acids long) with a characteristic geometric constraint: cyclization. The method limits a priori information to (i) the amino-acid sequence and (ii) the geometric constraint that results from cyclization in the native state of cyclic cysteine-rich peptides. The focus on cyclic cysteine-rich peptides is well motivated: these peptides are extremely robust and stable, and they exhibit a rich array of diverse therapeutic properties.

NcCYP exploits cyclization as a geometric constraint that lowers the dimensionality of the conformational space relevant for the native state. The search for native conformations proceeds in two stages. First, a broad view of cyclic low-energy conformations is obtained. Second, conformations representative of emerging energy minima are iteratively used as references to compute more conformations, enriching the explored space with more energy minima until no lower-energy minima are found.
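The two-stage search can be illustrated on a toy problem. In this sketch a "conformation" is a single number, `toy_energy_1d` is a made-up rugged energy surface, and the lowest-energy samples stand in for representatives of energy minima; NcCYP's sampling of cyclic backbones, its force field, and its way of identifying minima are not reproduced. All names and parameters are hypothetical.

```python
import math
import random

def toy_energy_1d(x):
    """Hypothetical rugged 1-D energy surface standing in for a real force field."""
    return math.sin(3.0 * x) + 0.1 * (x - 2.0) ** 2

def iterative_enrichment(energy, broad_samples, n_refs=3, per_ref=50,
                         spread=0.3, max_rounds=100, seed=0):
    """Resample around the lowest-energy conformations until no lower energy is found."""
    rng = random.Random(seed)
    explored = list(broad_samples)                    # stage 1 output: the broad view
    refs = sorted(explored, key=energy)[:n_refs]      # current references
    best = energy(refs[0])
    for _ in range(max_rounds):
        # Stage 2: perturb each reference to explore its neighborhood.
        new = [r + rng.gauss(0.0, spread) for r in refs for k in range(per_ref)]
        explored.extend(new)
        refs = sorted(explored, key=energy)[:n_refs]  # lowest-energy samples become references
        if energy(refs[0]) >= best:                   # nothing lower was found: stop
            break
        best = energy(refs[0])
    return refs, explored

# Usage: uniform random samples stand in for the coarse-grained broad view.
rng = random.Random(1)
broad_view = [rng.uniform(-5.0, 10.0) for _ in range(20)]
refs, explored = iterative_enrichment(toy_energy_1d, broad_view)
```

Because the explored set only grows, the best energy never gets worse, and the loop stops at the first round that fails to find anything lower. A fuller version would cluster the explored samples so that the references represent distinct minima rather than near-duplicates of the same one.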

The search is conducted at multiple resolutions in order to extract native conformations in all-atom detail efficiently. The broad view is obtained in a coarse-grained conformational space, where only the peptide backbone is explicitly modeled. The iterative enrichment of the conformational space and the exploration of emerging low-energy minima are conducted in an all-atom conformational space. In addition, since these peptides are rich in cysteines and disulfide bonds, a novel heuristic is proposed to compute an optimal arrangement of cysteines into disulfide bonds in the generated conformations.
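The NcCYP disulfide heuristic is not described here, so the following only illustrates the underlying combinatorial problem and is not that heuristic: it exhaustively enumerates every way to pair the cysteines and keeps the pairing whose sulfur-sulfur distances deviate least from a typical S-S bond length of about 2.05 Å. The cost function, the constant, and the names are assumptions.

```python
import math

SS_BOND = 2.05  # approximate S-S disulfide bond length in angstroms (assumed target)

def best_disulfide_pairing(sg_coords):
    """Exhaustively choose the cysteine pairing closest to ideal S-S geometry.

    sg_coords: list of (x, y, z) sulfur (SG) positions; assumes an even number of cysteines.
    Returns (pairs, cost), with pairs as tuples of cysteine indices.
    """
    def cost(i, j):
        return abs(math.dist(sg_coords[i], sg_coords[j]) - SS_BOND)

    def search(remaining):
        if not remaining:
            return [], 0.0
        first, rest = remaining[0], remaining[1:]
        best_pairs, best_cost = None, math.inf
        for k, partner in enumerate(rest):
            # Pair the first cysteine with each candidate, then solve the rest recursively.
            pairs, c = search(rest[:k] + rest[k + 1:])
            c += cost(first, partner)
            if c < best_cost:
                best_pairs, best_cost = [(first, partner)] + pairs, c
        return best_pairs, best_cost

    return search(list(range(len(sg_coords))))
```

Exhaustive enumeration is feasible for the handful of cysteines in these peptides (15 pairings for 6 cysteines, 105 for 8) but grows as (n-1)!!, which is one reason a heuristic is attractive for larger cases.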

Applications of NcCYP to both naturally-occurring and engineered cyclic cysteine-rich peptides 20-30 amino acids long show that the method can obtain a comprehensive view of the conformational space relevant for the native state. The following figure shows a lower-dimensional embedding of the conformational space associated with low-energy conformations computed by the method from the RTD-1 sequence. The embedding is obtained through a non-linear dimensionality reduction technique known as SciMAP.

Left: In the two-dimensional embedding, color-coded by free-energy values, two local minima emerge (deep and light blue). Right: The one-dimensional embedding shows a separation of 10 RT units in free energy between the minima. The separation shows that only the deep blue minimum is significantly populated under native conditions.

Color-coding the embedding with energy values highlights the conformational states associated with the minima. Two minima are obtained for RTD-1. The energy difference between the minima is large enough to predict that only one of them, the global minimum, is relevant under native conditions. The conformational ensemble associated with the local minimum is strikingly homogeneous, even though conformations in NcCYP are sampled independently of one another. The conformational ensemble associated with the global minimum is strikingly similar to the NMR ensemble available for RTD-1. The method also obtains the correct disulfide-bond arrangement in the native state.
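To see why a 10 RT gap matters, assume a simple two-state picture in which the population of each minimum is proportional to its Boltzmann factor (an assumption made for illustration). The population ratio of the light-blue to the deep-blue minimum is then:

```latex
\frac{p_{\text{light}}}{p_{\text{deep}}}
  = \exp\!\left(-\frac{\Delta G}{RT}\right)
  = e^{-10}
  \approx 4.5 \times 10^{-5}
```

Under this assumption the higher minimum would hold only about 5 in every 100,000 conformations.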

This work appears in: 1) Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi “Unfolding the Fold of Cyclic Cysteine-rich Peptides” Protein Science, 2008, 17(3):482-493.

On this Project:
Amarda Shehu
Lydia Kavraki
Cecilia Clementi
This project is completed.

A Multiscale Ab-initio Exploration to Compute Diverse Conformational Ensembles of a Protein Chain

Characterizing functionally-relevant conformations in silico is particularly challenging when no structural or geometric information is used to localize the exploration. Yet extracting, from the amino-acid sequence alone, the native conformational subensembles of proteins with potentially multiple functional states is crucial to decoding the sequence-structure-function relationship in proteins.

The Multiscale Space Exploration (MuSE) method was recently proposed to efficiently explore the vast, high-dimensional conformational space of a protein chain using only knowledge of the protein’s amino-acid sequence. MuSE is a multiscale method that proceeds in two stages. It first obtains a broad view of the entire conformational space at a coarse-grained level of detail. In the second stage, the exploration focuses on a few selected low-energy regions of the space.

In its first stage, the method searches a coarse-grained conformational space, using structural databases to assemble low-resolution structures. The method adopts fragment-based assembly of protein conformations, which is currently the most successful ab-initio approach in protein structure prediction. However, the method focuses on computing not just one structure but ensembles of native-like conformations that may be diverse.

Fragment-based assembly is employed in the context of a simulated annealing exploration, which uses a coarse-grained force field to guide the assembly process. Most importantly, during the first stage of the exploration MuSE adds atomic detail on the fly to detect emerging energy minima that may be relevant in an all-atom view of the conformational space; this detail is then stripped off to continue exploring the coarse-grained space. Atomistic refinement and further analysis of the explored conformational space are conducted in the second stage, after MuSE obtains a broad view of the coarse-grained conformational space relevant for the native state. Low-dimensional embedding highlights energy minima that the method then further populates in all-atom detail.
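The combination of fragment assembly and simulated annealing can be sketched as a Metropolis search whose moves overwrite a window of backbone (phi, psi) angles with a fragment drawn from a library. This is a generic sketch under stated assumptions, not MuSE: the fragment library, the helix-favoring `toy_backbone_score`, and the geometric cooling schedule are hypothetical, and MuSE's on-the-fly addition of atomic detail is omitted.

```python
import math
import random

def anneal_fragment_assembly(initial, fragments, energy, steps=5000,
                             t_start=2.0, t_end=0.01, seed=0):
    """Metropolis simulated annealing whose moves insert library fragments.

    initial: list of (phi, psi) pairs, one per residue.
    fragments: list of fragments, each a list of (phi, psi) pairs of equal length.
    energy: callable scoring a conformation (lower is better).
    """
    rng = random.Random(seed)
    current, e_current = list(initial), energy(initial)
    best, e_best = list(current), e_current
    frag_len = len(fragments[0])
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current) - frag_len + 1)
        trial = current[:pos] + list(rng.choice(fragments)) + current[pos + frag_len:]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, accept uphill with prob exp(-dE/t).
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_backbone_score(conf):
    """Hypothetical score: scaled squared distance from ideal alpha-helical (-60, -45) angles."""
    return sum(((phi + 60.0) / 180.0) ** 2 + ((psi + 45.0) / 180.0) ** 2 for phi, psi in conf)

HELIX = [(-60.0, -45.0)] * 3
STRAND = [(-120.0, 130.0)] * 3
TURN = [(-65.0, -40.0), (-60.0, -45.0), (-70.0, -35.0)]

start = [(-120.0, 130.0)] * 10  # extended 10-residue starting chain
best, e_best = anneal_fragment_assembly(start, [HELIX, STRAND, TURN], toy_backbone_score)
```

Early on, the high temperature lets the search accept uphill fragment insertions and escape local minima; as the temperature drops, the search settles into low-energy assemblies. Keeping the best conformation seen, rather than only the final one, is a common choice in such searches.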

Embeddings of the energy surfaces explored for the calbindin (left) and calmodulin (right) sequences reveal low-energy minima relevant for the native state. The conformational ensembles associated with the minima capture well the diverse functional states of each protein.

Applications of the method to different protein sequences show that the lowest-energy all-atom conformational ensembles obtained capture well the diverse functional states populated by the proteins under native conditions. These applications suggest that MuSE can predict functional motions for further testing and refinement in wet labs. Currently, adaptations of the method are being tested to enhance and improve protein structure prediction in CASP.

This work appears in: 1) Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi “Multiscale Characterization of Protein Conformational Ensembles” Proteins: Structure, Function, and Bioinformatics, 2009, 76(4):837-851.

On this Project:
Amarda Shehu
Lydia Kavraki
Cecilia Clementi
This project is completed.