Journal Papers

Articles are listed in reverse chronological order. Impact factors (IFs) at the time of publication are provided; when these are not available, the 5-year average or most recent IF is given instead. Links to publishers are provided for each article. Local copies are made available with the caveat that articles are provided under copyright permission for noncommercial dissemination of academic work. The per-paper citation counts below are slightly outdated. As of January 12, 2018, the total citation count per Google Scholar is 1263, the h-index is 21, and the i10-index is 43.

Shehu’s advisees are marked as undergraduate (u), graduate (g), or postdoctoral (p) students. Corresponding authors are indicated by (*).

J45: David Morris (g), Tatiana Maximova (p), Erion Plaku, and Amarda Shehu*. Attenuating Dependence on Structural Data in Computing Protein Energy Landscapes. BMC Bioinformatics 2018, under review.

IF: 2.448 

J44: Wanli Qiao, Nasrin Akhter (g), Xiaowen Fang (u), Tatiana Maximova (p), Erion Plaku, and Amarda Shehu*. From Mutations to Mechanisms and Dysfunction via Computation and Mining of Protein Energy Landscapes. BMC Genomics 2018.

IF: 3.729 

J43: Nasrin Akhter (g), Wanli Qiao, and Amarda Shehu*. An Energy Landscape Treatment of Decoy Selection in Template-free Protein Structure Prediction. Computation 6(2), 39, 2018 (doi: 10.3390/computation6020039; invited to the special issue on “Computation in Molecular Modeling”).

Abstract
The energy landscape, which organizes microstates by energies, has shed light on many cellular processes governed by dynamic biological macromolecules leveraging their structural dynamics to regulate interactions with molecular partners. In particular, the protein energy landscape has been central to understanding the relationship between protein structure, dynamics, and function. The landscape view, however, remains underutilized in an important problem in protein modeling, decoy selection in template-free protein structure prediction. Given the amino-acid sequence of a protein, template-free methods compute thousands of structures, known as decoys, as part of an optimization process that seeks minima of an energy function. Selecting biologically-active/native structures from the computed decoys remains challenging. Research has shown that energy is an unreliable indicator of nativeness. In this paper, we advocate that, while comparison of energies is not informative for structures that already populate minima of an energy function, the landscape view exposes the overall organization of generated decoys. As we demonstrate, such organization highlights macrostates that contain native decoys. We present two different computational approaches to extracting such organization and demonstrate through the presented findings that a landscape-driven treatment is promising in furthering research on decoy selection.
Bibliography

@Article{computation6020039,
AUTHOR = {Akhter, Nasrin and Qiao, Wanli and Shehu, Amarda},
TITLE = {An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction},
JOURNAL = {Computation},
VOLUME = {6},
YEAR = {2018},
NUMBER = {2},
ARTICLE-NUMBER = {39},
URL = {http://www.mdpi.com/2079-3197/6/2/39},
ISSN = {2079-3197}
}

IF: 1.821

J42: Daniel Veltri, Uday Kamath, and Amarda Shehu*. Deep Learning Improves Antimicrobial Peptide Recognition. Bioinformatics 2018.

Abstract
Motivation: Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates. Results: In this work, we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition. Results show that the proposed model outperforms state-of-the-art classification models on a comprehensive dataset. By utilizing the embedding weights, we also present a reduced-alphabet representation and show that reasonable AMP recognition can be maintained using nine amino acid types. Availability and implementation: Models and datasets are made freely available through the Antimicrobial Peptide Scanner vr.2 web server at www.ampscanner.com. Contact: amarda@gmu.edu (for general inquiries) or dan.veltri@gmail.com (for web server information) Supplementary information: Supplementary data are available at Bioinformatics online.
Bibliography

@article{veltri2018deep,
title={Deep learning improves antimicrobial peptide recognition},
author={Veltri, Daniel and Kamath, Uday and Shehu, Amarda},
journal={Bioinformatics},
volume={1},
pages={8},
year={2018},
publisher={Oxford University Press}
}

J41: Nasrin Akhter (g) and Amarda Shehu*. From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-free Protein Structure Prediction. Molecules 2018, 23(1), 216.

Abstract
Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.
Bibliography

@article{AkhterShehuMolecules2018,
title={From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction},
author={Akhter, Nasrin and Shehu, Amarda},
journal={Molecules},
volume={23},
number={1},
pages={216},
year={2018},
publisher={Multidisciplinary Digital Publishing Institute}
}

J40: Tatiana Maximova (p), Zijing Zhang, Daniel B Carr, Erion Plaku, and Amarda Shehu*. Sample-based Models of Protein Energy Landscapes and Slow Structural Rearrangements. J Comput Biol (JCB) 25(1): 33-50, 2018.

Abstract
Proteins often undergo slow structural rearrangements that involve several angstroms and surpass the nanosecond timescale. These spatiotemporal scales challenge physics-based simulations and open the way to sample-based models of structural dynamics. This article improves an understanding of current capabilities and limitations of sample-based models of dynamics. Borrowing from widely used concepts in evolutionary computation, this article introduces two conflicting aspects of sampling capability and quantifies them via statistical (and graphical) analysis tools. This allows not only conducting a principled comparison of different sample based algorithms but also understanding which algorithmic ingredients to use as knobs via which to control sampling and, in turn, the accuracy and detail of modeled structural rearrangements. We demonstrate the latter by proposing two powerful variants of a recently published sample-based algorithm. We believe that this work will advance the adoption of sample-based models as reliable tools for modeling slow protein structural rearrangements.
Bibliography

@article{MaximovaShehuJCB2018,
title={Sample-based models of protein energy landscapes and slow structural rearrangements},
author={Maximova, Tatiana and Zhang, Zijing and Carr, Daniel B and Plaku, Erion and Shehu, Amarda},
journal={Journal of Computational Biology},
volume={25},
number={1},
pages={33--50},
year={2018},
publisher={Mary Ann Liebert, Inc. 140 Huguenot Street, 3rd Floor New Rochelle, NY 10801 USA}
}

J39: Emmanuel Sapin (p), Kenneth De Jong*, and Amarda Shehu*. From Optimization to Mapping: An Evolutionary Algorithm for Protein Energy Landscapes. IEEE/ACM Trans Comp Biol and Bioinf (TCBB) 2017 (doi: 10.1109/TCBB.2016.2628745).

Abstract
Stochastic search is often the only viable option to address complex optimization problems. Recently, evolutionary algorithms have been shown to handle challenging continuous optimization problems related to protein structure modeling. Building on recent work in our laboratories, we propose an evolutionary algorithm for efficiently mapping the multi-basin energy landscapes of dynamic proteins that switch between thermodynamically stable or semi-stable structural states to regulate their biological activity in the cell. The proposed algorithm balances computational resources between exploration and exploitation of the nonlinear, multimodal landscapes that characterize multi-state proteins via a novel combination of global and local search to generate a dynamically-updated, information-rich map of a protein’s energy landscape. This new mapping-oriented EA is applied to several dynamic proteins and their disease-implicated variants to illustrate its ability to map complex energy landscapes in a computationally feasible manner. We further show that, given the availability of such maps, comparison between the maps of wildtype and variants of a protein allows for the formulation of a structural and thermodynamic basis for the impact of sequence mutations on dysfunction that may prove useful in guiding further wet-laboratory investigations of dysfunction and molecular interventions.
Bibliography

@article{SapinShehuTCBB2017,
title={From Optimization to Mapping: An Evolutionary Algorithm for Protein Energy Landscapes},
author={Sapin, Emmanuel and De Jong, Kenneth A and Shehu, Amarda},
journal={IEEE/ACM transactions on computational biology and bioinformatics},
year={2016},
publisher={IEEE}
}

J38: Tatiana Maximova (p), Erion Plaku*, and Amarda Shehu*. Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm. IEEE/ACM Trans Comp Biol and Bioinf (TCBB) 2016 (doi: 10.1109/TCBB.2016.2586044).

Abstract
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to address the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new venues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
Bibliography

@article{MaximovaShehuTCBB2016,
title={Structure-guided protein transition modeling with a probabilistic roadmap algorithm},
author={Maximova, Tatiana and Plaku, Erion and Shehu, Amarda},
journal={IEEE/ACM transactions on computational biology and bioinformatics},
year={2016},
publisher={IEEE} }

J37: Daniel Veltri (g), Uday Kamath, and Amarda Shehu*. Improving Recognition of Antimicrobial Peptides and Target Selectivity through Machine Learning and Genetic Programming. IEEE/ACM Trans Comp Biol and Bioinf (TCBB), 14(2): 300-313, 2017.

Abstract
Growing bacterial resistance to antibiotics is spurring research on utilizing naturally-occurring antimicrobial peptides (AMPs) as templates for novel drug design. While experimentalists mainly focus on systematic point mutations to measure the effect on antibacterial activity, the computational community seeks to understand what determines such activity in a machine learning setting. The latter seeks to identify the biological signals or features that govern activity. In this paper, we advance research in this direction through a novel method that constructs and selects complex sequence-based features which capture information about distal patterns within a peptide. Comparative analysis with state-of-the-art methods in AMP recognition reveals our method is not only among the top performers, but it also provides transparent summarizations of antibacterial activity at the sequence level. Moreover, this paper demonstrates for the first time the capability not only to recognize that a peptide is an AMP or not but also to predict its target selectivity based on models of activity against only Gram-positive, only Gram-negative, or both types of bacteria. The work described in this paper is a step forward in computational research seeking to facilitate AMP design or modification in the wet laboratory.
Bibliography

@article{VeltriKamathShehuTCBB15,
author = {Veltri, D. AND Kamath, U. AND Shehu, A.},
journal = {IEEE/ACM Trans Comput Biol and Bioinf},
title = {Improving Recognition of Antimicrobial Peptides and Target Selectivity through Machine Learning and Genetic Programming},
year = 2017,
volume = {14},
number = {2},
pages = {300-313}
}

J36: Amarda Shehu* and Erion Plaku*. A Survey of Computational Treatments of Biomolecules by Robotics-inspired Methods Modeling Equilibrium Structure and Dynamics. J Artif Intel Res (JAIR) 57:509-572, 2016.

Abstract
More than fifty years of research in molecular biology have demonstrated that the ability of small and large molecules to interact with one another and propagate the cellular processes in the living cell lies in the ability of these molecules to assume and switch between specific structures under physiological conditions. Elucidating biomolecular structure and dynamics at equilibrium is therefore fundamental to furthering our understanding of biological function, molecular mechanisms in the cell, our own biology, disease, and disease treatments. By now, there is a wealth of methods designed to elucidate biomolecular structure and dynamics contributed from diverse scientific communities. In this survey, we focus on recent methods contributed from the Robotics community that promise to address outstanding challenges regarding the disparate length and time scales that characterize dynamic molecular processes in the cell. In particular, we survey robotics-inspired methods designed to obtain efficient representations of structure spaces of molecules in isolation or in assemblies for the purpose of characterizing equilibrium structure and dynamics. While an exhaustive review is an impossible endeavor, this survey balances the description of important algorithmic contributions with a critical discussion of outstanding computational challenges. The objective is to spur further research to address outstanding challenges in modeling equilibrium biomolecular structure and dynamics.
Bibliography

@article{ShehuPlakuJAIR16,
author = {Shehu, A. AND Plaku, E.},
journal = {J Artif Intel Res},
title = {A Survey of Computational Treatments of Biomolecules by Robotics-Inspired Methods Modeling Equilibrium Structure and Dynamics},
year = 2016,
volume = {57},
pages = {509-572}
}

J35: Emmanuel Sapin (p), Daniel B Carr, Kenneth A De Jong*, and Amarda Shehu*. Computing energy landscape maps and structural excursions of proteins. BMC Genomics 17(Suppl 4):546, 2016.

Abstract

Background

Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics.

Methods

We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions.

Results

Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship.

Bibliography

@article{SapinShehuBMCGeonmics16,
author = {Sapin, E. AND Carr, D. AND {De Jong}, K. A. AND Shehu, A.},
journal = {BMC Genomics},
title = {Computing energy landscape maps and structural excursions of proteins},
year = 2016,
volume = {17},
number = {Suppl 4},
pages = {546}
}

J34: Kevin Molloy (g), Rudy Clausen (g), and Amarda Shehu*. A Stochastic Roadmap Method to Model Protein Structural Transitions. Robotica 34(08):1705-1733, 2016 (featured on issue cover).

Abstract
Evidence is emerging that the role of protein structure in disease needs to be rethought. Sequence mutations in proteins are often found to affect the rate at which a protein switches between structures. Modeling structural transitions in wildtype and variant proteins is central to understanding the molecular basis of disease. This paper investigates an efficient algorithmic realization of the stochastic roadmap simulation framework to model structural transitions in wildtype and variants of proteins implicated in human disorders. Our results indicate that the algorithm is able to extract useful information on the impact of mutations on protein structure and function.
Bibliography

@article{MolloyShehuRobotica16,
author = {Molloy, K. AND Clausen, R. AND Shehu, A.},
journal = {Robotica},
title = {A stochastic roadmap method to model protein structural transitions},
year = 2016,
volume = {34},
number = {08},
pages = {1705-1733}
}

J33: Kevin Molloy (g) and Amarda Shehu*. A General, Adaptive, Roadmap-based Algorithm for Protein Motion Computation. IEEE Trans NanoBioScience (TNB) 15(2): 158-165, 2016.

Abstract
Precious information on protein function can be extracted from a detailed characterization of protein equilibrium dynamics. This remains elusive in wet and dry laboratories, as function-modulating transitions of a protein between functionally-relevant, thermodynamically-stable and meta-stable structural states often span disparate time scales. In this paper we propose a novel, robotics-inspired algorithm that circumvents time-scale challenges by drawing analogies between protein motion and robot motion. The algorithm adapts the popular roadmap-based framework in robot motion computation to handle the more complex protein conformation space and its underlying rugged energy surface. Given known structures representing stable and meta-stable states of a protein, the algorithm yields a time- and energy-prioritized list of transition paths between the structures, with each path represented as a series of conformations. The algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. Promising results are presented on a variety of proteins that demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
Bibliography

@article{MolloyShehuTNB16,
author = {Molloy, K. AND Shehu, A.},
journal = {IEEE Trans NanoBioScience},
volume = {15},
number = {2},
pages = {158-165},
title = {A General, Adaptive, Roadmap-based Algorithm for Protein Motion Computation},
year = 2016 }

J32: Tatiana Maximova (p), Ryan Moffatt (g), Buyong Ma, Ruth Nussinov*, and Amarda Shehu*. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comp Biol 12(4): e1004619, 2016 (among the top 50 most downloaded articles in 2016; featured on the April issue front cover and in the PLoS Comp Biol blog).

Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Bibliography

@article{MaximovaNussinovShehu15,
author = {Maximova, T. AND Moffatt, R. AND Ma, B. AND Nussinov, R. AND Shehu, A.},
journal = {PLoS Comput Biol},
title = {Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics},
year = 2016,
volume = {12},
number = {4},
pages = {e1004619}
}

J31: Amarda Shehu* and Ruth Nussinov*. Computational Methods for Exploration and Analysis of Macromolecular Structure and Dynamics. PLoS Comput Biol (PCB) 11(10): e1004585, 2015 (editorial).

Abstract
All processes that maintain and replicate a living cell involve fluctuating biological macromolecules. As computational biologists, our aim is to discern the behavior of macromolecules in a way that experimental biology is not able to achieve. No single technique—experimental or computational—can capture all the relevant scales of cellular functional behavior. In principle, computations are the tools that can integrate different kinds of experimental and computational characterizations at different resolutions to obtain a more complete description of the processes of life. Computer simulations can act as a bridge between the microscopic length and time scales, and the macroscopic world of the laboratory. They can start from a macroscopic experiment-based guess of interactions between molecules, and obtain “exact” predictions of bulk and detailed properties subject to limitations. They are able to test a theory by constructing and simulating the model, and comparing the results with experimental measurements; and they are able to provide models that experiments can test. Computations can provide leads by processing large sets of data, predicting molecular behaviors, and supplying the mechanistic underpinning that experiments alone may not be able to achieve.
Bibliography

@article{ShehuNussinovPCB2015,
title={Computational methods for exploration and analysis of macromolecular structure and dynamics},
author={Shehu, Amarda and Nussinov, Ruth},
journal={PLoS computational biology},
volume={11},
number={10},
pages={e1004585},
year={2015},
publisher={Public Library of Science}
}

J30: Didier Devaurs, Kevin Molloy, Marc Vaisset, Amarda Shehu, Thierry Simeon, and Juan Cortes*. Characterizing Energy Landscapes of Peptides using a Combination of Stochastic Algorithms. IEEE Trans NanoBioScience (TNB), 14(5): 545-552, 2015.

Abstract
Obtaining accurate representations of energy landscapes of biomolecules such as proteins and peptides is central to the study of their physicochemical properties and biological functions. Peptides are particularly interesting, as they exploit structural flexibility to modulate their biological function. Despite their small size, peptide modeling remains challenging due to the complexity of the energy landscape of such highly-flexible dynamic systems. Currently, only stochastic sampling-based methods can efficiently explore the conformational space of a peptide. In this paper, we suggest to combine two such methods to obtain a full characterization of energy landscapes of small yet flexible peptides. First, we propose a simplified version of the classical Basin Hopping algorithm to reveal low-energy regions in the landscape, and thus to identify the corresponding meta-stable structural states of a peptide. Then, we present several variants of a robotics-inspired algorithm, the Transition-based Rapidly-exploring Random Tree, to quickly determine transition path ensembles, as well as transition probabilities between meta-stable states. We demonstrate this combined approach on met-enkephalin.
Bibliography

@article{DevaursCortes15,
author = {Devaurs, D. AND Molloy, K. AND Vaisset, M. AND Shehu, A. AND Simeon, T. AND Cortes, J.},
journal = {IEEE Trans NanoBioScience},
title = {Characterizing Energy Landscapes of Peptides Using a Combination of Stochastic Algorithms},
year = 2015,
volume = {14},
number = {5},
pages = {545--552}
}

J29: Irina Hashmi (g) and Amarda Shehu*. idDock+: Integrating Machine Learning in Probabilistic Search for Protein-protein Docking. J Computational Biology (JCB), 22(9):806-822, 2015.

Abstract
Predicting the three-dimensional native structures of protein dimers, a problem known as protein-protein docking, is key to understanding molecular interactions. Docking is a computationally challenging problem due to the diversity of interactions and the high dimensionality of the configuration space. Existing methods draw configurations systematically or at random from the configuration space. The inaccuracy of scoring functions used to evaluate drawn configurations presents additional challenges. Evidence is growing that optimization of a scoring function is an effective technique only once the drawn configuration is sufficiently similar to the native structure. Therefore, in this article we present a method that employs optimization of a sophisticated energy function, FoldX, only to locally improve a promising configuration. The main question of how promising configurations are identified is addressed through a machine learning method trained a priori on an extensive dataset of functionally diverse protein dimers. To deal with the vast configuration space, a probabilistic search algorithm operates on top of the learner, feeding to it configurations drawn at random. We refer to our method as idDock+, for informatics-driven Docking. idDock+ is tested on 15 dimers of different sizes and functional classes. Analysis shows that on all systems idDock+ finds a near-native structure and is comparable in accuracy to other state-of-the-art methods. idDock+ represents one of the first highly efficient hybrid methods that combines fast machine learning models with demanding optimization of sophisticated energy scoring functions. Our results indicate that this is a promising direction to improve both efficiency and accuracy in docking.
Bibliography

@article{HashmiShehuJCB15,
author = {Hashmi, I. AND Shehu, A.},
journal = {J Comput Biol},
title = {idDock+: Integrating Machine Learning in Probabilistic Search for Protein-protein Docking},
year = 2015,
volume = {22},
number = {9},
pages = {806-822}
}

J28: Rudy Clausen (g) and Amarda Shehu*. A Data-driven Evolutionary Algorithm for Mapping Multi-basin Protein Energy Landscapes. J Computational Biology (JCB), 22(9): 844-860, 2015.

Abstract
Evidence is emerging that many proteins involved in proteinopathies are dynamic molecules switching between stable and semistable structures to modulate their function. A detailed understanding of the relationship between structure and function in such molecules demands a comprehensive characterization of their conformation space. Currently, only stochastic optimization methods are capable of exploring conformation spaces to obtain sample-based representations of associated energy surfaces. These methods have to address the fundamental but challenging issue of balancing computational resources between exploration (obtaining a broad view of the space) and exploitation (going deep in the energy surface). We propose a novel algorithm that strikes an effective balance by employing concepts from evolutionary computation. The algorithm leverages deposited crystal structures of wildtype and variant sequences of a protein to define a reduced, low-dimensional search space from where to rapidly draw samples. A multiscale technique maps samples to local minima of the all-atom energy surface of a protein under investigation. Several novel algorithmic strategies are employed to avoid premature convergence to particular minima and obtain a broad view of a possibly multibasin energy surface. Analysis of applications on different proteins demonstrates the broad utility of the algorithm to map multibasin energy landscapes and advance modeling of multibasin proteins. In particular, applications on wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm makes an important first step toward understanding the impact of sequence mutations on misfunction by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
Bibliography

@article{ClausenShehuJCB15,
author = {Clausen, R. AND Shehu, A.},
journal = {J Comput Biol},
title = {A Data-driven Evolutionary Algorithm for Mapping Multi-basin Protein Energy Landscapes},
year = 2015,
volume = {22},
number = {9},
pages = {844-860}
}

J27: Rudy Clausen (g), Buyong Ma, Ruth Nussinov, and Amarda Shehu*. Mapping the Conformation Space of Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale Evolutionary Algorithm. PLoS Computational Biology (PCB) 11(9): e1004470, 2015.

Abstract
An important goal in molecular biology is to understand functional changes upon single-point mutations in proteins. Doing so through a detailed characterization of structure spaces and underlying energy landscapes is desirable but continues to challenge methods based on Molecular Dynamics. In this paper we propose a novel algorithm, SIfTER, which is based instead on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein’s structure space. SIfTER is a data-driven evolutionary algorithm, leveraging experimentally-available structures of wildtype and variant sequences of a protein to define a reduced search space from where to efficiently draw samples corresponding to novel structures not directly observed in the wet laboratory. The main advantage of SIfTER is its ability to rapidly generate conformational ensembles, thus allowing mapping and juxtaposing landscapes of variant sequences and relating observed differences to functional changes. We apply SIfTER to variant sequences of the H-Ras catalytic domain, due to the prominent role of the Ras protein in signaling pathways that control cell proliferation, its well-studied conformational switching, and abundance of documented mutations in several human tumors. Many Ras mutations are oncogenic, but detailed energy landscapes have not been reported until now. Analysis of SIfTER-computed energy landscapes for the wildtype and two oncogenic variants, G12V and Q61L, suggests that these mutations cause constitutive activation through two different mechanisms. G12V directly affects binding specificity while leaving the energy landscape largely unchanged, whereas Q61L has pronounced, starker effects on the landscape. An implementation of SIfTER is made available at http://www.cs.gmu.edu/~ashehu/?q=OurTools. 
We believe SIfTER is useful to the community to answer the question of how sequence mutations affect the function of a protein, when there is an abundance of experimental structures that can be exploited to reconstruct an energy landscape that would be computationally impractical to do via Molecular Dynamics.
Bibliography

@article{ClausenShehuPLoSCB15,
author = {Clausen, R. AND Ma, B. AND Nussinov, R. AND Shehu, A.},
journal = {PLoS Comput Biol},
title = {Mapping the Conformation Space of Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale Evolutionary Algorithm},
year = 2015,
volume = 11,
number = 9,
pages = {e1004470}
}

J26: Uday Kamath(g), Kenneth A De Jong*, and Amarda Shehu*. Effective Automated Feature Construction and Selection for Classification of Biological Sequences. PLoS One, 9(7): e99982, 2014.

Abstract

Background

Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.

Methodology

We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.

Results

To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.

Bibliography

@article{KamathDeJongShehuPLoS14,
author = {Kamath, U. AND {De Jong}, K. A. AND Shehu, A.},
journal = {PLoS {ONE}},
title = {Effective Automated Feature Construction and Selection for Classification of Biological Sequences},
year = 2014,
volume = 9,
number = 7,
pages = {e99982}
}

J25: Kevin Molloy(g), M. Jennifer Van(u), Daniel Barbara*, and Amarda Shehu*. Exploring Representations of Protein Structure for Automated Remote Homology Detection and Mapping of Protein Structure Space. BMC Bioinformatics 15 (Suppl 8):S4, 2014.

Abstract

Background

Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported by analysis in this paper, that such maps preserve functional co-localization of the protein structure space.

Methods

Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain.

Results

We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.

Conclusions

This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.

Bibliography

@article{MolloyBarbaraShehuBMCBioinf14,
author = {Molloy, K. AND Van, M. J. AND Barbara, D. AND Shehu, A.},
journal = {BMC Bioinf},
title = {Exploring Representations of Protein Structure for Automated Remote Homology Detection and Mapping of Protein Structure Space},
volume = 15,
number = {Suppl 8},
pages = {S4},
year = 2014}

J24: Nadine Kabbani*, Jacob C. Nordman, Brian Corgiat, Daniel Veltri(g), Amarda Shehu, and David J. Adams. Are Nicotinic Receptors Coupled to G Proteins? BioEssays 35(12): 1025–1034, 2013 (selected for journal front cover video display. Read the highlight written on our article in the same issue by Edward Howrot.)

Abstract
It was, until recently, accepted that the two classes of acetylcholine (ACh) receptors are distinct in an important sense: muscarinic ACh receptors signal via heterotrimeric GTP binding proteins (G proteins), whereas nicotinic ACh receptors (nAChRs) open to allow flux of Na+, Ca2+, and K+ ions into the cell after activation. Here we present evidence of direct coupling between G proteins and nAChRs in neurons. Based on proteomic, biophysical, and functional evidence, we hypothesize that binding to G proteins modulates the activity and signaling of nAChRs in cells. It is important to note that while this hypothesis is new for the nAChR, it is consistent with known interactions between G proteins and structurally related ligand-gated ion channels. Therefore, it underscores an evolutionarily conserved metabotropic mechanism of G protein signaling via nAChR channels.
Bibliography

@article{KabbaniShehuAdams,
author = {Kabbani, N. AND Nordman, J. C. AND Corgiat, B. AND Veltri, D. AND Shehu, A. AND Adams, D. J.},
journal = {BioEssays},
volume = {35},
title = {Are nicotinic receptors coupled to G Proteins?},
number = {12},
pages = {1025-1034},
year = 2013 }

J23: Abrar Ashoor, Jacob C. Nordman, Daniel Veltri(g), Keun-Hang Susan Yang, Lina Al Kury, Yaroslav Shuba, Mohamed Mahgoub, Frank C. Howarth, Carl Lupica, Amarda Shehu, Nadine Kabbani, and Murat Oz*. Menthol Inhibits 5-HT3 Receptor-mediated Currents. J of Pharmacology and Experimental Therapeutics (JPET) 347(2):398-409, 2013 (selected for issue front cover).

Abstract
The effects of alcohol monoterpene menthol, a major active ingredient of the peppermint plant, were tested on the function of human 5-hydroxytryptamine type 3 (5-HT3) receptors expressed in Xenopus laevis oocytes. 5-HT (1 μM)-evoked currents recorded by two-electrode voltage-clamp technique were reversibly inhibited by menthol in a concentration-dependent (IC50 = 163 μM) manner. The effects of menthol developed gradually, reaching a steady-state level within 10–15 minutes and did not involve G-proteins, since GTPγS activity remained unaltered and the effect of menthol was not sensitive to pertussis toxin pretreatment. The actions of menthol were not stereoselective as (−), (+), and racemic menthol inhibited 5-HT3 receptor–mediated currents to the same extent. Menthol inhibition was not altered by intracellular 1,2-bis(o-aminophenoxy)ethane-N,N,N′,N′-tetraacetic acid injections and transmembrane potential changes. The maximum inhibition observed for menthol was not reversed by increasing concentrations of 5-HT. Furthermore, specific binding of the 5-HT3 antagonist [3H]GR65630 was not altered in the presence of menthol (up to 1 mM), indicating that menthol acts as a noncompetitive antagonist of the 5-HT3 receptor. Finally, 5-HT3 receptor–mediated currents in acutely dissociated nodose ganglion neurons were also inhibited by menthol (100 μM). These data demonstrate that menthol, at pharmacologically relevant concentrations, is an allosteric inhibitor of 5-HT3 receptors.
Bibliography

@article{MuratJPET13,
author = {Ashoor, A. AND Nordman, J. C. AND Veltri, D. AND Yang, K.-H. S. AND {Al Kury}, L. AND Shuba, Y. AND Mahgoub, M. AND Howarth, F. C. AND Lupica, C. AND Shehu, A. AND Kabbani, N. AND Oz, M.},
journal = {J of Pharmacology and Experimental Therapeutics (JPET)},
volume = {347},
title = {Menthol Inhibits 5-HT3 Receptor-mediated Currents},
number = {2},
pages = {398-409},
year = 2013}

J22: Abrar Ashoor, Jacob C. Nordman, Daniel Veltri(g), Keun-Hang Susan Yang, Lina Al Kury, Yaroslav Shuba, Mohamed Mahgoub, Frank C. Howarth, Bassem Sadek, Amarda Shehu, Nadine Kabbani, and Murat Oz*. Menthol Binding and Inhibition of Alpha7-nicotinic Acetylcholine Receptors. PLoS One 8(7):e67674, 2013.

Abstract
Menthol is a common compound in pharmaceutical and commercial products and a popular additive to cigarettes. The molecular targets of menthol remain poorly defined. In this study we show an effect of menthol on the α7 subunit of the nicotinic acetylcholine (nACh) receptor function. Using a two-electrode voltage-clamp technique, menthol was found to reversibly inhibit α7-nACh receptors heterologously expressed in Xenopus oocytes. Inhibition by menthol was not dependent on the membrane potential and did not involve endogenous Ca2+-dependent Cl− channels, since menthol inhibition remained unchanged by intracellular injection of the Ca2+ chelator BAPTA and perfusion with Ca2+-free bathing solution containing Ba2+. Furthermore, increasing ACh concentrations did not reverse menthol inhibition and the specific binding of [125I] α-bungarotoxin was not attenuated by menthol. Studies of α7-nACh receptors endogenously expressed in neural cells demonstrate that menthol attenuates α7-mediated Ca2+ transients in the cell body and neurite. In conclusion, our results suggest that menthol inhibits α7-nACh receptors in a noncompetitive manner.
Bibliography

@article{MuratPLOSONE13,
author = {Ashoor, A. AND Nordman, J. C. AND Veltri, D. AND Yang, K.-H. S. AND {Al Kury}, L. AND Shuba, Y. AND Mahgoub, M. AND Howarth, F. C. AND Sadek, B. AND Shehu, A. AND Kabbani, N. AND Oz, M.},
journal = {{PLoS} One},
volume = {8},
number = {7},
title = {Menthol Binding and Inhibition of Alpha7-nicotinic Acetylcholine Receptors},
pages = {e67674},
year = 2013}

J21: Kevin Molloy(g), Sameh Saleh(u), and Amarda Shehu*. Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction. IEEE/ACM Trans Comp Biol and Bioinf 10(5):1162-1175, 2013.

Abstract
Adequate sampling of the conformational space is a central challenge in ab initio protein structure prediction. In the absence of a template structure, a conformational search procedure guided by an energy function explores the conformational space, gathering an ensemble of low-energy decoy conformations. If the sampling is inadequate, the native structure may be missed altogether. Even if reproduced, a subsequent stage that selects a subset of decoys for further structural detail and energetic refinement may discard near-native decoys if they are high energy or insufficiently represented in the ensemble. Sampling should produce a decoy ensemble that facilitates the subsequent selection of near-native decoys. In this paper, we investigate a robotics-inspired framework that allows directly measuring the role of energy in guiding sampling. Testing demonstrates that a soft energy bias steers sampling toward a diverse decoy ensemble less prone to exploiting energetic artifacts and thus more likely to facilitate retainment of near-native conformations by selection techniques. We employ two different energy functions, the associative memory Hamiltonian with water and Rosetta. Results show that enhanced sampling provides a rigorous testing of energy functions and exposes different deficiencies in them, thus promising to guide development of more accurate representations and energy functions.
Bibliography

@article{MolloyShehuTCBB13,
author = {Molloy, K. AND Saleh, S. AND Shehu, A.},
journal = {IEEE/ACM Trans Comp Biol and Bioinf},
volume = {10},
title = {Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction},
number = {5},
pages = {1162-1175},
year = 2013}

J20: Irina Hashmi(g) and Amarda Shehu*. HopDock: A Probabilistic Search Algorithm for Decoy Sampling in Protein-protein Docking. Proteome Sci 11(Suppl1):S6, 2013.

Abstract

Background

Elucidating the three-dimensional structure of a higher-order molecular assembly formed by interacting molecular units, a problem commonly known as docking, is central to unraveling the molecular basis of cellular activities. Though protein assemblies are ubiquitous in the cell, it is currently challenging to predict the native structure of a protein assembly in silico.

Methods

This work proposes HopDock, a novel search algorithm for protein-protein docking. HopDock efficiently obtains an ensemble of low-energy dimeric configurations, also known as decoys, that can be effectively used by ab-initio docking protocols. HopDock is based on the Basin Hopping (BH) framework which perturbs the structure of a dimeric configuration and then follows it up with an energy minimization to explicitly sample a local minimum of a chosen energy function. This process is repeated in order to sample consecutive energy minima in a trajectory-like fashion. HopDock employs both geometry and evolutionary conservation analysis to narrow down the interaction search space of interest for the purpose of efficiently obtaining a diverse decoy ensemble.

Results and conclusions

A detailed analysis and a comparative study on seventeen different dimers show that HopDock obtains a broad view of the energy surface near the native dimeric structure and samples many near-native configurations. The results show that HopDock has high sampling capability and can be employed to effectively obtain a large and diverse ensemble of decoy configurations that can then be further refined in greater structural detail in ab-initio docking protocols.

Bibliography

@article{HashmiShehuProteomeSci13,
author = {Hashmi, I. AND Shehu, A.},
journal = {Proteome Sci},
volume = {11},
title = {HopDock: A Probabilistic Search Algorithm for Decoy Sampling in Protein-protein Docking},
number = {Suppl1},
pages = {S6},
year = 2013}

J19: Sameh Saleh(u), Brian Olson(g), and Amarda Shehu*. A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction. BMC Structural Biology J 13(Suppl1):S4, 2013.

Abstract

Background

Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured.

Methods

We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima.
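The memetic loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy `energy` function, the Gaussian perturbation standing in for molecular fragment replacement, the binary tournament used for energy-biased parent selection, and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def energy(x):
    # Hypothetical rugged 1-D surface; a stand-in for a protein energy function.
    return 0.1 * x * x + math.sin(3.0 * x)

def to_local_minimum(x, step=0.01, max_iters=10_000):
    # Greedy descent: step downhill until neither neighbor has lower energy.
    for _ in range(max_iters):
        best = min((x - step, x, x + step), key=energy)
        if best == x:
            break
        x = best
    return x

def memetic_search(pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    # Every individual is mapped to a local minimum before entering the population.
    population = [to_local_minimum(rng.uniform(-10.0, 10.0)) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Energy-biased parent selection (binary tournament, an assumption).
            a, b = rng.sample(population, 2)
            parent = a if energy(a) < energy(b) else b
            # Perturb the parent, then minimize the child (the memetic step).
            children.append(to_local_minimum(parent + rng.gauss(0.0, 1.0)))
        # Energy decides which parents and children are retained.
        population = sorted(population + children, key=energy)[:pop_size]
    return population
```

Because survivors are chosen from parents and children together, the lowest energy in the population never increases from one generation to the next in this sketch.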

Results and conclusions

Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions.

Bibliography

@article{SalehShehuBMCStructBiol13,
author = {Saleh, S. AND Olson, B. AND Shehu, A.},
journal = {BMC Struct Biol},
volume = {13},
title = {A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction},
number = {Suppl1},
pages = {S4},
year = 2013}

J18: Brian Olson(g) and Amarda Shehu*. Rapid Sampling of Local Minima in Protein Energy Surface and Effective Reduction through a Multi-objective Filter. Proteome Sci 11(Suppl1):S12, 2013.

Abstract

Background

Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles.

Methods

This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima.

Results and conclusions

We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface.
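As an illustration of ensemble reduction with a Pareto front, the sketch below keeps only non-dominated entries. The abstract does not name the objectives, so the two-score tuples here are hypothetical; lower is assumed to be better for every score.

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (all objectives are minimized).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (score_1, score_2) pairs for five sampled minima.
scores = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
print(pareto_front(scores))  # [(1, 5), (2, 2), (5, 1)]
```

The quadratic scan is adequate for small ensembles; a larger ensemble would likely call for a sort-based front computation.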

Bibliography

@article{OlsonShehuProteomSci13,
author = {Olson, B. AND Shehu, A.},
journal = {Proteome Sci},
volume = {11},
title = {Rapid Sampling of Local Minima in Protein Energy Surface and Effective Reduction through a Multi-objective Filter},
number = {Suppl1},
pages = {S12},
year = 2013}

J17: Kevin Molloy(g) and Amarda Shehu*. Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method. BMC Structural Biology J 13(Suppl1):S8, 2013.

Abstract

Background

Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories at great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space.

Methods

We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers.

Results and conclusions

Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.

Bibliography

@article{MolloyShehuBMCStructBiol13,
author = {Molloy, K. AND Shehu, A.},
journal = {BMC Struct Biol},
volume = {13},
title = {Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method},
number = {Suppl 1},
pages = {S8},
year = 2013}

J16: Brian Olson(g), Irina Hashmi(g), Kevin Molloy(g), and Amarda Shehu*. Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules. Advances in Artificial Intelligence J 2012, 674832 (special issue on Artificial Intelligence Applications in Biomedicine).

Abstract
Since its introduction, the basin hopping (BH) framework has proven useful for hard nonlinear optimization problems with multiple variables and modalities. Applications span a wide range, from packing problems in geometry to characterization of molecular states in statistical physics. BH is seeing a reemergence in computational structural biology due to its ability to obtain a coarse-grained representation of the protein energy surface in terms of local minima. In this paper, we show that the BH framework is general and versatile, allowing one to address problems related to the characterization of protein structure, assembly, and motion due to its fundamental ability to sample minima in a high-dimensional variable space. We show how specific implementations of the main components in BH yield algorithmic realizations that attain state-of-the-art results in the context of ab initio protein structure prediction and rigid protein-protein docking. We also show that BH can map intermediate minima related to motions connecting diverse stable functionally relevant states in a protein molecule, thus serving as a first step towards the characterization of transition trajectories connecting these states.
Bibliography

@article{OlsonShehuAdvAI12,
author = {Olson, B. AND Hashmi, I. AND Molloy, K. AND Shehu, A.},
journal = {Advances in AI J},
number = {674832},
title = {Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules},
volume = {2012},
year = 2012}

J15: Brian Olson(g) and Amarda Shehu*. Evolutionary-inspired Probabilistic Search for Enhancing Sampling of Local Minima in the Protein Energy Surface. Proteome Science 2012, 10(Suppl1): S5.

Abstract

Background

Despite computational challenges, elucidating conformations that a protein system assumes under physiologic conditions for the purpose of biological activity is a central problem in computational structural biology. While these conformations are associated with low energies in the energy surface that underlies the protein conformational space, few existing conformational search algorithms focus on explicitly sampling low-energy local minima in the protein energy surface.

Methods

This work proposes a novel probabilistic search framework, PLOW, that explicitly samples low-energy local minima in the protein energy surface. The framework combines algorithmic ingredients from evolutionary computation and computational structural biology to effectively explore the subspace of local minima. A greedy local search maps a conformation sampled in conformational space to a nearby local minimum. A perturbation move jumps out of a local minimum to obtain a new starting conformation for the greedy local search. The process repeats in an iterative fashion, resulting in a trajectory-based exploration of the subspace of local minima.
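The perturb-then-minimize loop described above can be sketched on a toy problem. This is only an illustration under stated assumptions, not PLOW itself: the one-dimensional `energy` function, the uniform random jump used as the perturbation move, and the step sizes are all hypothetical, and every new minimum is accepted because the abstract does not describe an acceptance rule.

```python
import math
import random

def energy(x):
    # Hypothetical rugged 1-D surface; a stand-in for a protein energy function.
    return 0.1 * x * x + math.sin(3.0 * x)

def greedy_local_search(x, step=0.01, max_iters=10_000):
    # Move downhill in small steps until neither neighbor improves.
    e = energy(x)
    for _ in range(max_iters):
        best_x, best_e = x, e
        for cand in (x - step, x + step):
            cand_e = energy(cand)
            if cand_e < best_e:
                best_x, best_e = cand, cand_e
        if best_x == x:
            break
        x, e = best_x, best_e
    return x, e

def local_minima_trajectory(x0, n_hops=50, jump=1.5, seed=0):
    rng = random.Random(seed)
    x, e = greedy_local_search(x0)
    trajectory = [(x, e)]
    for _ in range(n_hops):
        # Perturbation move: jump out of the current minimum (assumed uniform).
        start = x + rng.uniform(-jump, jump)
        # Greedy local search maps the new start to a nearby local minimum.
        x, e = greedy_local_search(start)
        trajectory.append((x, e))
    return trajectory
```

Calling `local_minima_trajectory(0.0)` returns a list of `(position, energy)` pairs, one per visited minimum, which is the trajectory over the subspace of local minima in this toy setting.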

Results and conclusions

The analysis of PLOW’s performance shows that, by navigating only the subspace of local minima, PLOW is able to sample conformations near a protein’s native structure, either more effectively or as well as state-of-the-art methods that focus on reproducing the native structure for a protein system. Analysis of the actual subspace of local minima shows that PLOW samples this subspace more effectively than a naive sampling approach. Additional theoretical analysis reveals that the perturbation function employed by PLOW is key to its ability to sample a diverse set of low-energy conformations. This analysis also suggests directions for further research and novel applications for the proposed framework.

Bibliography

@article{OlsonShehuProtSci12,
author = {Olson, B. AND Shehu, A.},
journal = {Proteome Sci},
number = {Suppl1},
pages = {S5},
title = {Evolutionary-inspired probabilistic search for enhancing sampling of local minima in the protein energy surface},
volume = {10},
year = 2012}

J14: Irina Hashmi(g), Bahar Akbal-Delibas, Nurit Haspel, and Amarda Shehu*. Guiding Protein Docking with Geometric and Evolutionary Information. J Bioinf and Comp Biol 2012, 10(3): 1242002.

Abstract
Structural modeling of molecular assemblies promises to improve our understanding of molecular interactions and biological function. Even when focusing on modeling structures of protein dimers from knowledge of monomeric native structure, docking two rigid structures onto one another entails exploring a large configurational space. This paper presents a novel approach for docking protein molecules and elucidating native-like configurations of protein dimers. The approach makes use of geometric hashing to focus the docking of monomeric units on geometrically complementary regions through rigid-body transformations. This geometry-based approach improves the feasibility of searching the combined configurational space. The search space is narrowed even further by focusing the sought rigid-body transformations around molecular surface regions composed of amino acids with high evolutionary conservation. This condition is based on recent findings, where analysis of protein assemblies reveals that many functional interfaces are significantly conserved throughout evolution. Different search procedures are employed in this work to search the resulting narrowed configurational space. A proof-of-concept energy-guided probabilistic search procedure is also presented. Results are shown on a broad list of 18 protein dimers and additionally compared with data reported by other labs. Our analysis shows that focusing the search around evolutionary-conserved interfaces results in lower lRMSDs.
Bibliography

@article{HashmiShehu12,
author = {Hashmi, I. AND Akbal-Delibas, B. AND Haspel, N. AND Shehu, A.},
journal = {J Bioinf and Comp Biol},
number = {3},
pages = {1242002},
title = {Guiding Protein Docking with Geometric and Evolutionary Information},
volume = {10},
year = 2012}

J13: Bahar Akbal-Delibas, Irina Hashmig, Amarda Shehu, and Nurit Haspel*. An Evolutionary Conservation Based Method for Refining and Reranking Protein Complex Structures. J Bioinf and Comp Biol 2012, 10(3):1242008.

Abstract
Detection of protein complexes and their structures is crucial for understanding their role in the basic biology of organisms. Computational docking methods can provide researchers with a good starting point for the analysis of protein complexes. However, these methods are often not accurate and their results need to be further refined to improve interface packing. In this paper, we introduce a refinement method that incorporates evolutionary information into a novel scoring function by employing Evolutionary Trace (ET)-based scores. Our method also takes van der Waals interactions into account to avoid atomic clashes in refined structures. We tested our method on docked candidates of eight protein complexes and the results suggest that the proposed scoring function helps bias the search toward complexes with native interactions. We show a strong correlation between evolutionary-conserved residues and correct interface packing. Our refinement method is able to produce structures with better lRMSD (least RMSD) with respect to the known complexes and lower energies than initial docked structures. It also helps to filter out false-positive complexes generated by docking methods, by detecting few or no conserved residues on false interfaces. We believe this method is a step toward better ranking and prediction of protein complexes.
Bibliography

@article{AkbalHaspel12,
author = {Akbal-Delibas, B. AND Hashmi, I. AND Shehu, A. AND Haspel, N.},
journal = {J Bioinf and Comp Biol},
number = {3},
pages = {1242008},
title = {An Evolutionary Conservation Based Method for Refining and Reranking Protein Complex Structures},
volume = {10},
year = 2012}

J12: Brian Olsong, Kevin Molloyg, S.-Farid Hendig, and Amarda Shehu*. Guiding Search in the Protein Conformational Space with Structural Profiles. J Bioinf and Comp Biol 2012, 10(3):1242005.

Abstract
The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.
Bibliography

@article{OlsonMolloyShehu12,
author = {Olson, B. S. AND Molloy, K. AND Hendi, S.-F. AND Shehu, A.},
journal = {J Bioinf and Comp Biol},
number = {3},
pages = {1242005},
title = {Guiding Search in the Protein Conformational Space with Structural Profiles},
volume = {10},
year = 2012}

J11: Amarda Shehu* and Lydia Kavraki*. Modeling Structures and Motions of Loops in Protein Molecules. Entropy 2012, 14(2):252-290 (invited review article, IF 2011: 1.109).

Abstract
Unlike the secondary structure elements that they connect in protein structures, loop fragments in protein chains are often highly mobile even in generally stable proteins. The structural variability of loops is often at the center of a protein’s stability, folding, and even biological function. Loops are found to mediate important biological processes, such as signaling, protein-ligand binding, and protein-protein interactions. Modeling conformations of a loop under physiological conditions remains an open problem in computational biology. This article reviews computational research in loop modeling, highlighting progress and challenges. Important insight is obtained on potential directions for future research.
Bibliography

@article{ShehuKavrakiEntropy12,
author = {Shehu, A. AND Kavraki, L. E.},
journal = {Entropy J},
number = {2},
pages = {252-290},
title = {Modeling Structures and Motions of Loops in Protein Molecules},
volume = {14},
year = 2012}

J10: Uday Kamathg, Jack Comptonu, Rezarta Islamaj Dogan, Kenneth A. De Jong*, and Amarda Shehu*. An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and its Application to DNA Splice-Site Prediction. IEEE Trans Comp Biol and Bioinf 2012, 9(5):1387-1398 (IF 2011: 2.25).

Abstract
Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
Bibliography

@article{KamathShehuTCBB12,
author = {Kamath, U. AND Compton, J. AND Islamaj Dogan, R. AND De Jong, K. A. AND Shehu, A.},
journal = {IEEE Trans Comp Biol and Bioinf},
number = {5},
pages = {1387-1398},
title = {An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and its Application to DNA Splice-Site Prediction},
volume = {9},
year = 2012}

J9: Uday Kamathg, Amarda Shehu*, and Kenneth A. De Jong*. A Two-Stage Evolutionary Approach for Effective Classification of Hypersensitive DNA Sequences. J Bioinf and Comp Biol 2011, 9(3): 399-413.

Abstract
Hypersensitive (HS) sites in genomic sequences are reliable markers of DNA regulatory regions that control gene expression. Annotation of regulatory regions is important in understanding phenotypical differences among cells and diseases linked to pathologies in protein expression. Several computational techniques are devoted to mapping out regulatory regions in DNA by initially identifying HS sequences. Statistical learning techniques like Support Vector Machines (SVM), for instance, are employed to classify DNA sequences as HS or non-HS. This paper proposes a method to automate the basic steps in designing an SVM that improves the accuracy of such classification. The method proceeds in two stages and makes use of evolutionary algorithms. An evolutionary algorithm first designs optimal sequence motifs to associate explicit discriminating feature vectors with input DNA sequences. A second evolutionary algorithm then designs SVM kernel functions and parameters that optimally separate the HS and non-HS classes. Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful in automating the analysis of biological sequences, and we post its source code on our website.
Bibliography

@article{KamathShehuDeJongJBCB11,
author = {Kamath, U. AND Shehu, A. AND De Jong, K.},
journal = {J. Bioinf. and Comp. Biol.},
title = {A Two-Stage Evolutionary Approach for Effective Classification of Hypersensitive DNA Sequences},
number = {3},
pages = {399-413},
volume = {9},
year = 2011 }

J8: Brian Olsong, Kevin Molloyg, and Amarda Shehu*. In Search of the Protein Native State with a Probabilistic Sampling Approach. J Bioinf and Comp Biol 2011, 9(3):383-398.

Abstract
The three-dimensional structure of a protein is a key determinant of its biological function. Given the cost and time required to acquire this structure through experimental means, computational models are necessary to complement wet-lab efforts. Many computational techniques exist for navigating the high-dimensional protein conformational search space, which is explored for low-energy conformations that comprise a protein’s native states. This work proposes two strategies to enhance the sampling of conformations near the native state. An enhanced fragment library with greater structural diversity is used to expand the search space in the context of fragment-based assembly. To manage the increased complexity of the search space, only a representative subset of the sampled conformations is retained to further guide the search towards the native state. Our results make the case that these two strategies greatly enhance the sampling of the conformational space near the native state. A detailed comparative analysis shows that our approach performs as well as state-of-the-art ab initio structure prediction protocols.
Bibliography

@article{OlsonMolloyShehuJBCB11,
author = {Olson, B. AND Molloy, K. AND Shehu, A.},
journal = {J. Bioinf. and Comp. Biol.},
title = {In Search of the Protein Native State with a Probabilistic Sampling Approach},
number = {3},
pages = {383-398},
volume = {9},
year = 2011 }

J7: Amarda Shehu* and Brian Olsong. Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration. Intl J of Robot Res 2010, 29(8):1106-1127.

Abstract
In this paper we propose a robotics-inspired method to enhance sampling of native-like conformations when employing only amino-acid sequence information for a protein at hand. Computing such conformations, essential to associating structural and functional information with gene sequences, is challenging due to the high-dimensionality and the rugged energy surface of the protein conformational space. The contribution of this paper is a novel two-layered method to enhance the sampling of geometrically distinct low-energy conformations at a coarse-grained level of detail. The method grows a tree in conformational space reconciling two goals: (i) guiding the tree towards lower energies; and (ii) not oversampling geometrically similar conformations. Discretizations of the energy surface and a low-dimensional projection space are employed to select more often for expansion low-energy conformations in under-explored regions of the conformational space. The tree is expanded with low-energy conformations through a Metropolis Monte Carlo framework that uses a move set of physical fragment configurations. Testing on sequences of eight small-to-medium structurally diverse proteins shows that the method rapidly samples native-like conformations in a few hours on a single CPU. Analysis shows that computed conformations are good candidates for further detailed energetic refinements by larger studies in protein engineering and design.
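For readers unfamiliar with the Metropolis Monte Carlo acceptance rule mentioned in the abstract above, the following is a minimal generic sketch of the standard criterion. It is not the paper's implementation: the function name `metropolis_accept`, the parameter names, and the injectable `rng` argument are assumptions made here for illustration, and the fragment-based move set and tree expansion are not modeled.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random.random):
    """Standard Metropolis rule for a move that changes the energy by delta_e.

    Hypothetical helper for illustration only; kT is the temperature
    expressed in the same units as the energy.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    # Moves that lower the energy (or leave it unchanged) are always accepted.
    if delta_e <= 0:
        return True
    # Uphill moves are accepted with probability exp(-delta_e / kT).
    return rng() < math.exp(-delta_e / kT)
```

A search loop would call this once per proposed move and keep the new conformation only when it returns `True`; passing a custom `rng` makes the decision reproducible.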
Bibliography

@article{ShehuOlsonIJRR10,
author = {Shehu, A. AND Olson, B.},
journal = {Intl. J. Robot. Res.},
title = {Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration},
number = {8},
pages = {1106-1127},
volume = {29},
year = 2010 }

J6: Joseph A. Hegler, Joachim Laetzer, Amarda Shehu, Cecilia Clementi, and Peter G. Wolynes*. Restriction vs. Guidance: Fragment Assembly and Associative Memory Hamiltonians for Protein Structure Prediction. Proc. Nat. Acad. Sci. USA 2009, 106(36):15302-15307.

Abstract
Conformational restriction by fragment assembly and guidance in molecular dynamics are alternate conformational search strategies in protein structure prediction. We examine both approaches using a version of the associative memory Hamiltonian that incorporates the influence of water-mediated interactions (AMW). For short proteins (<70 residues), fragment assembly, while searching a restricted space, compares well to molecular dynamics and is often sufficient to fold such proteins to near-native conformations (4Å) via simulated annealing. Longer proteins encounter kinetic sampling limitations in fragment assembly not seen in molecular dynamics which generally samples more native-like conformations. We also present a fragment enriched version of the standard AMW energy function, AMW-FME, which incorporates the local sequence alignment derived fragment libraries from fragment assembly directly into the energy function. This energy function, in which fragment information acts as a guide not a restriction, is found by molecular dynamics to improve on both previous approaches.
Bibliography

@article{HeglerWolynesPNAS09,
author = {Hegler, J. A. AND Laetzer, J. AND Shehu, A. AND Clementi, C. AND Wolynes, P. G.},
journal = {Proc. Nat. Acad. Sci. USA},
title = {Restriction vs. Guidance: Fragment Assembly and Associative Memory Hamiltonians for Protein Structure Prediction},
number = {36},
pages = {15302-15307},
volume = {106},
year = 2009, }

J5: Amarda Shehu, Lydia E. Kavraki*, and Cecilia Clementi*. Multiscale Characterization of Protein Conformational Ensembles. Proteins: Structure, Function, and Bioinformatics, 2009, 76(4):837-851.

Abstract
We propose a multiscale exploration method to characterize the conformational space populated by a protein at equilibrium. The method efficiently obtains a large set of equilibrium conformations in two stages: first exploring the entire space at a coarse-grained level of detail, then narrowing a refined exploration to selected low-energy regions. The coarse-grained exploration periodically adds all-atom detail to selected conformations to ensure that the search leads to regions which maintain low energies in all-atom detail. The second stage reconstructs selected low-energy coarse-grained conformations in all-atom detail. A low-dimensional energy landscape associated with all-atom conformations allows focusing the exploration to energy minima and their conformational ensembles. The lowest energy ensembles are enriched with additional all-atom conformations through further multiscale exploration. The lowest energy ensembles obtained from the application of the method to three different proteins correctly capture the known functional states of the considered systems.
Bibliography

@article{ShehuKavrakiClementiProteins09,
author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.},
journal = {Proteins: Struct, Funct, and Bioinf},
title = {Multiscale Characterization of Protein Conformational Ensembles},
number = {4},
pages = {837-851},
volume = {76},
year = 2009, }

J4: Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi*. Unfolding the Fold of Cyclic Cysteine-rich Peptides. Protein Science, 2008, 17(3):482-493.

Abstract
We propose a method to extensively characterize the native state ensemble of cyclic cysteine-rich peptides. The method uses minimal information, namely, amino acid sequence and cyclization, as a topological feature that characterizes the native state. The method does not assume a specific disulfide bond pairing for cysteines and allows the possibility of unpaired cysteines. A detailed view of the conformational space relevant for the native state is obtained through a hierarchic multi-resolution exploration. A crucial feature of the exploration is a geometric approach that efficiently generates a large number of distinct cyclic conformations independently of one another. A spatial and energetic analysis of the generated conformations associates a free-energy landscape to the explored conformational space. Application to three long cyclic peptides of different folds shows that the conformational ensembles and cysteine arrangements associated with free energy minima are fully consistent with available experimental data. The results provide a detailed analysis of the native state features of cyclic peptides that can be further tested in experiment.
Bibliography

@article{ShehuKavrakiClementiProtSci08,
author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.},
journal = {Protein Sci},
number = {3},
pages = {482-493},
title = {Unfolding the Fold of Cyclic Cysteine-rich Peptides},
volume = {17},
year = 2008}

J3: Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki*. Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins. Algorithmica, 2007, 48(4):303-327.

Abstract
This paper proposes the Protein Ensemble Method (PEM) to model equilibrium fluctuations in proteins where fragments of the protein polypeptide chain can move independently of one another. PEM models global equilibrium fluctuations of a polypeptide chain by combining local fluctuations of consecutive overlapping fragments of the chain. Local fluctuations are computed by a probabilistic exploration that exploits analogies between proteins and robots. All generated conformations are subjected to energy minimization and then are weighted according to a Boltzmann distribution. Using the theory of statistical mechanics, the Boltzmann-weighted fluctuations corresponding to each fragment are combined to obtain fluctuations for the entire protein. The agreement obtained among PEM-modeled fluctuations, wet-lab experiments, and guided simulation measurements indicates that PEM is able to reproduce with high accuracy protein equilibrium fluctuations that occur over a broad range of timescales.
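The Boltzmann weighting step mentioned in the abstract above can be illustrated with a short generic sketch. This is textbook statistical mechanics rather than the PEM code: the function names `boltzmann_weights` and `ensemble_average` are hypothetical, energies and `kT` are assumed to share units, and the combination of per-fragment fluctuations described in the abstract is not modeled.

```python
import math


def boltzmann_weights(energies, kT):
    """Normalized Boltzmann weights w_i = exp(-E_i/kT) / sum_j exp(-E_j/kT).

    Hypothetical helper for illustration only.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    if not energies:
        return []
    # Shift by the minimum energy so the exponentials cannot overflow;
    # the shift cancels in the normalization.
    e_min = min(energies)
    unnormalized = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def ensemble_average(values, energies, kT):
    """Boltzmann-weighted average of one observable over an ensemble."""
    weights = boltzmann_weights(energies, kT)
    return sum(w * v for w, v in zip(weights, values))
```

Here `values` holds one observable per conformation (for example, a per-residue fluctuation measure) and `energies` the corresponding minimized energies; lower-energy conformations contribute more to the average.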
Bibliography

@article{ShehuClementiKavrakiAlgo07,
author = {Shehu, A. AND Clementi, C. AND Kavraki, L. E.},
journal = {Algorithmica},
number = {4},
pages = {303-327},
title = {Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins},
volume = {48},
year = 2007}

J2: Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi*. On the Characterization of Protein Native State Ensembles. Biophysical Journal, 2007, 92(5):1503-1511.

Abstract
Describing and understanding the biological function of a protein requires a detailed structural and thermodynamic description of the protein’s native state ensemble. Obtaining such a description often involves characterizing equilibrium fluctuations that occur beyond the nanosecond timescale. Capturing such fluctuations remains nontrivial even for very long molecular dynamics and Monte Carlo simulations. We propose a novel multiscale computational method to exhaustively characterize, in atomistic detail, the protein conformations constituting the native state with no inherent timescale limitations. Applications of this method to proteins of various folds and sizes show that thermodynamic observables measured as averages over the native state ensembles obtained by the method agree remarkably well with nuclear magnetic resonance data that span multiple timescales. By characterizing equilibrium fluctuations at atomistic detail over a broad range of timescales, from picoseconds to milliseconds, our method offers to complement current simulation techniques and wet-lab experiments and can impact our understanding and description of the relationship between protein flexibility and function.
Bibliography

@article{ShehuKavrakiClementiBiophysJ07,
author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.},
journal = {Biophys J},
number = {5},
pages = {1503-1511},
title = {On the Characterization of Protein Native State Ensembles},
volume = {92},
year = 2007}

J1: Amarda Shehu, Cecilia Clementi*, and Lydia E. Kavraki*. Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations. Proteins: Structure, Function, and Bioinformatics 2006, 65(1):164-179.

Abstract
Characterizing protein flexibility is an important goal for understanding the physical–chemical principles governing biological function. This paper presents a Fragment Ensemble Method to capture the mobility of a protein fragment such as a missing loop and its extension into a Protein Ensemble Method to characterize the mobility of an entire protein at equilibrium. The underlying approach in both methods is to combine a geometric exploration of conformational space with a statistical mechanics formulation to generate an ensemble of physical conformations on which thermodynamic quantities can be measured as ensemble averages. The Fragment Ensemble Method is validated by applying it to characterize loop mobility in both instances of strongly stable and disordered loop fragments. In each instance, fluctuations measured over generated ensembles are consistent with data from experiment and simulation. The Protein Ensemble Method captures the mobility of an entire protein by generating and combining ensembles of conformations for consecutive overlapping fragments defined over the protein sequence. This method is validated by applying it to characterize flexibility in ubiquitin and protein G. Thermodynamic quantities measured over the ensembles generated for both proteins are fully consistent with available experimental data. On these proteins, the method recovers nontrivial data such as order parameters, residual dipolar couplings, and scalar couplings. Results presented in this work suggest that the proposed methods can provide insight into the interplay between protein flexibility and function.
Bibliography

@article{ShehuClementiKavrakiProt06,
author = {Shehu, A. AND Clementi, C. AND Kavraki, L. E.},
journal = {Proteins: Struct, Funct, and Bioinf},
number = {1},
pages = {164-179},
title = {Modeling Protein Conformational Ensembles: {F}rom Missing Loops to Equilibrium Fluctuations},
volume = {65},
year = 2006}