Conferences and Workshops

Articles are listed in reverse chronological order. Acceptance rates (ARs) are provided where available. Links to publishers are provided for each article. Local copies are also available; they are provided under copyright permission for noncommercial dissemination of academic work.

Shehu’s advisees are marked as undergraduate (u), graduate (g), or postdoctoral (p) students. Corresponding authors are indicated by (*).

C55: Nasrin Akhter(g), Gopinath Chennupati, Hristo Djidjev*, and Amarda Shehu*. ML-Select: Improved Decoy Selection via Machine Learning and Ranking. IEEE Intl Conf on Comput Adv in Bio and Medical Sciences (ICCABS), Las Vegas, Nevada 2018 (accepted).

C54: Ahmed Bin Zaman(g) and Amarda Shehu*. A Multi-objective, Non-dominated Sorting Evolutionary Algorithm for Template-free Protein Structure Prediction. IEEE Intl Conf on Bioinf and Biomed (BIBM), Madrid, Spain 2018 (under review).

C53: Nasrin Akhter(g), Jing Lei(g), Wanli Qiao, and Amarda Shehu*. Reconstructing and Decomposing Protein Energy Landscapes to Organize Structure Spaces and Reveal Biologically-active States. IEEE Intl Conf on Bioinf and Biomed (BIBM), Madrid, Spain 2018 (under review).

C52: Liban Hassan(p), Zahra Rajabi(p), Nasrin Akhter(p), and Amarda Shehu*. Community Detection for Decoy Selection in Template-free Protein Structure Prediction. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Washington, D.C. 2018, pg. 621-625.

C51: Fahad Almsned(p), Gideon Gogovi(p), Nicole Bracci(p), Kylene Kehn-Hall, Estela Blaisten-Barojas, and Amarda Shehu*. Modeling the Tertiary Structure of a Multi-domain Protein. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Washington, D.C. 2018, pg. 615-620.

C50: Nasrin Akhter(g) and Amarda Shehu*. Analysis of Energy Landscapes for Improved Decoy Selection in Template-free Protein Structure Prediction. Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV 2018, pg. 111-116 (finalist for best paper award).

Abstract
Decoy selection is the task of automatically extracting near-native structures from an ensemble of low-energy structures generated in silico by a template-free method. Current research shows that discriminating by energy misses near-native structures and allows the inclusion of too many non-native structures. The predominant strategy is to ignore energy and cluster structures by their similarity, offering the top-populated clusters as prediction. In this paper we show that energy can improve accuracy in decoy selection when its inclusion is carried out under the energy landscape view. Specifically, we identify basins in the energy landscape and demonstrate basin selection schemes to outperform clustering. The results are promising and point to further directions of research for improving decoy selection and decoy generation.
Bibliography

@inproceedings{AkhterShehuBICOB2018,
title={Analysis of Energy Landscapes for Improved Decoy Selection in Template-free Protein Structure Prediction},
author={Akhter, Nasrin and Shehu, Amarda},
booktitle={Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV},
year={2018}
}

C49: Wanli Qiao, Tatiana Maximova(p), Xiaowen Fang(u), Erion Plaku, and Amarda Shehu*. Reconstructing and Mining Protein Energy Landscape to Understand Disease. IEEE Intl Conf on Bioinf and Biomed (BIBM), Kansas City, MO 2017, pg. 22-27.

Abstract
Many pathogenic mutations percolate to protein dysfunction by altering dynamics. Reconstructing protein energy landscapes promises to relate dynamics to function but is generally infeasible due to the disparate spatio-temporal scales involved. Recent algorithmic innovation allows reconstructing energy landscapes of medium-size proteins in the presence of sufficient prior wet-laboratory structure data. The ability to do so on healthy and pathogenic variants of a protein is renewing the need for landscape analysis and comparison. Here we describe a novel landscape analysis method that detects altered landscape features in response to mutations and allows formulating hypotheses on the impact of mutations on (dys)function. This work opens up interesting avenues into automated analysis and summarization of landscapes.
Bibliography

@inproceedings{QiaoShehu17,
author = {Qiao, W. AND Maximova, T. AND Fang, X. AND Plaku, E. AND Shehu, A.},
title = {Reconstructing and Mining Protein Energy Landscape to Understand Disease},
booktitle = {IEEE Intl Conf on Bioinf and Biomed (BIBM)},
year = {2017},
publisher = {IEEE},
pages = {22-27},
location = {Kansas City, MO}
}

C48: David Morris(g), Tatiana Maximova(p), Erion Plaku, and Amarda Shehu*. Out of One, Many: Exploiting Intrinsic Motions to Explore Protein Structure Spaces. IEEE Intl Conf on Comput Adv in Bio and Medical Sciences (ICCABS), Orlando, FL 2017.

Abstract
Reconstructing the energy landscape of a protein holds the key to characterizing its structural dynamics and function [1]. While the disparate spatio-temporal scales spanned by the slow dynamics challenge reconstruction in wet and dry laboratories, computational efforts have had recent success on proteins where a wealth of experimentally-known structures can be exploited to extract modes of motion. In [2], the authors propose the SoPriM method that extracts principal components (PCs) and utilizes them as variables of the structure space of interest. Stochastic optimization is employed to sample the structure space and its associated energy landscape in the defined variable space. We refer to this algorithm as SoPriM-PCA and compare it here to SoPriM-NMA, which investigates whether the landscape can be reconstructed with knowledge of modes of motion (normal modes) extracted from a single known structure. Some representative results are shown in Figure 1, where structures obtained by SoPriM-PCA and those obtained by SoPriM-NMA for the H-Ras enzyme are compared via color-coded projections onto the top two variables utilized by each algorithm. The results show that precious information can be obtained on the energy landscape even when only one structural model is available. The presented work opens up interesting avenues of research on structure-based inference of dynamics.
Bibliography

@inproceedings{MorrisMaximovaShehuICCABS17,
author = {Morris, D. AND Maximova, T. AND Plaku, E. AND Shehu, A.},
title = {Out of One, Many: Exploiting Intrinsic Motions to Explore Protein Structure Spaces},
booktitle = {Intl Conf on Comput Adv in Bio and Medical Sciences (ICCABS)},
year = {2017},
publisher = {IEEE},
pages = {1-6},
location = {Orlando, FL}
}

C47: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Evolving Conformation Paths to Model Protein Structural Transitions. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Boston, MA 2017, pg. 673-678.

Abstract
Proteins are dynamic biomolecules. A structure-by-structure characterization of a protein’s transition between two different functional structures is central to elucidating the role of dynamics in modulating protein function and designing therapeutic drugs. Characterizing transitions challenges both dry and wet laboratories. Some computational methods compute discrete representations of the energy landscape that organizes structures of a protein by their potential energies. The representations support queries for paths (series of structures) connecting start and goal structures of interest. Here we address the problem of modeling protein structural transitions under the umbrella of stochastic optimization and propose a novel evolutionary algorithm (EA). The EA evolves paths without reconstructing the energy landscape, addressing two competing optimization objectives, energetic cost and structural resolution. Rather than seek one path, the EA yields an ensemble of paths to represent a transition. Preliminary applications suggest the EA is effective while operating under a reasonable computational budget.
Bibliography

@inproceedings{SapinDeJongShehuCSBW17,
author = {Sapin, E. AND {De Jong}, K. A. AND Shehu, A.},
title = {Evolving Conformation Paths to Model Protein Structural Transitions},
booktitle = {ACM BCB Workshops},
year = {2017},
publisher = {ACM},
pages = {673-678},
location = {Boston, MA}
}

C46: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Modeling Protein Structural Transitions as a Multiobjective Optimization Problem. IEEE Intl Conf on Comput Intel in Bioinf and Comput Biol (CIBCB), Manchester, UK 2017, pg. 1-8.

Abstract
Proteins of importance to human biology can populate significantly different three-dimensional (3d) structures at equilibrium. By doing so, a protein is able to interface with different molecules in the cell and so modulate its function. A structure-by-structure characterization of a protein’s transition between two structures is central to elucidating the role of structural dynamics in regulating molecular interactions, understanding the impact of sequence mutations on function, and designing molecular therapeutics. Much wet- and dry-laboratory research is devoted to characterizing structural transitions. Computational approaches rely on constructing a full or partial, structured representation of the energy landscape that organizes structures by potential energy. The representation readily yields one or more paths that consist of series of structures connecting start and goal structures of interest. In this paper, we propose instead to cast the problem of computing transition paths as a multiobjective optimization one. We identify two desired characteristics of computed paths, energetic cost and structural resolution, and propose a novel evolutionary algorithm (EA) to compute low-cost and high-resolution paths. The EA evolves paths representing a specific structural excursion without a priori constructing the energy landscape. Preliminary applications suggest the EA is effective while operating under a reasonable computational budget.
Bibliography

@inproceedings{SapinDeJongShehuCIBCB7,
author = {Sapin, E. AND {De Jong}, K. A. AND Shehu, A.},
title = {Modeling Protein Structural Transitions as a Multiobjective Optimization Problem},
booktitle = {IEEE Intl Conf on Comput Intel in Bioinf and Comput Biol (CIBCB)},
year = {2017},
publisher = {IEEE},
pages = {1-8},
location = {Manchester, UK},
doi = {10.1109/CIBCB.2017.8058536}
}

C45: Wanli Qiao, Tatiana Maximova(p), Erion Plaku, and Amarda Shehu*. Statistical Analysis of Computed Energy Landscapes to Understand Dysfunction in Pathogenic Protein Variants. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Boston, MA 2017.

Abstract
The energy landscape underscores the inherent nature of proteins as dynamic systems interconverting between structures with varying energies. The protein energy landscape contains much of the information needed to characterize protein equilibrium dynamics and relate it to function. It is now possible to reconstruct energy landscapes of medium-size proteins with sufficient prior structure data. These developments turn the focus to tools for analysis and comparison of energy landscapes as a means of formulating hypotheses on the impact of sequence mutations on (dys)function via altered landscape features. We present such a method here and provide a detailed evaluation of its capabilities on an enzyme central to human biology. The work presented here opens up an interesting avenue into automated analysis and summarization of landscapes that lends itself to machine learning approaches at the energy landscape level.
Bibliography

@inproceedings{QiaoMaximovaPlakuShehuCSBW7,
author = {Qiao, W. AND Maximova, T. AND Plaku, E. AND Shehu, A.},
title = {Statistical Analysis of Computed Energy Landscapes to Understand Dysfunction in Pathogenic Protein Variants},
booktitle = {ACM BCB Workshops},
year = {2017},
publisher = {ACM},
pages = {1-6},
location = {Boston, MA}
}

C44: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. An Evolutionary Algorithm to Model Structural Excursions of a Protein. ACM GECCO Workshop, Berlin, Germany 2017, pg. 1669-1673.

Abstract
Excursions of a protein between different structures at equilibrium are key to its ability to modulate its biological function. The energy landscape, which organizes structures available to a protein by their energetics, contains all the information needed to characterize and simulate structural excursions. Computational research aims to uncover such excursions to complement wet-laboratory studies in characterizing protein equilibrium dynamics. Popular strategies adapt the robot motion planning framework and construct full or partial, structured representations of the energy landscape. In this paper, we present a novel, complementary approach based on evolutionary computation. We propose an evolutionary algorithm that evolves path representations of a specific structural excursion without a priori construction of the energy landscape. Preliminary applications on healthy and pathogenic variants of a protein central to human health are promising and warrant further investigation of evolutionary search techniques for modeling protein structural excursions.
Bibliography

@inproceedings{SapinDeJongShehuGECCOW17,
author = {Sapin, E. AND {De Jong}, K. A. AND Shehu, A.},
title = {An Evolutionary Algorithm to Model Structural Excursions of a Protein},
booktitle = {ACM Conf on Genetic and Evolutionary Computation (GECCO) Workshop},
year = {2017},
publisher = {ACM},
pages = {1669-1673},
location = {Berlin, Germany}
}

C43: Tatiana Maximova(p), Daniel Carr, Erion Plaku, and Amarda Shehu*. Sample-based Models of Protein Structural Transitions. ACM Conf on Bioinf and Comp Biol (BCB), Seattle, Washington 2016, pg. 128-137.

Abstract
Modeling structural transitions of a protein at equilibrium is central to understanding function modulation but challenging due to the disparate spatio-temporal scales involved. Of particular interest are sampling-based methods that embed sampled structures in discrete, graph-based models of dynamics to answer path queries. These methods have to balance between further exploiting low-energy regions and exploring unpopulated, possibly high-energy regions needed for a transition. We recently presented a strategy that leverages experimentally-known structures to improve sampling. Here we demonstrate how such structures can further be leveraged to improve both exploitation and exploration and obtain paths of very high granularity. We show that such improvement is key to accurate sample-based modeling of structural transitions. We further demonstrate that ranking methods by the best transition cost obtained can be deceptive, as denser sampling, which follows a rugged landscape more faithfully, may result in higher costs. The work presented here improves understanding of the current capabilities and limitations of sampling-based methods. Proposing strategies to address some of these limitations in this paper is a first step towards sampling-based methods becoming reliable tools for modeling protein structural transitions.
Bibliography

@inproceedings{MaximovaShehuBCB16,
author = {Maximova, T. AND Carr, D. AND Plaku, E. AND Shehu, A.},
title = {Sample-based Models of Protein Structural Transitions},
booktitle = {ACM Conf Bioinf and Comput Biol (BCB)},
year = {2016},
pages = {128-137},
publisher = {ACM},
location = {Seattle, WA, USA}
}

C42: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Path-based Guidance of an Evolutionary Algorithm in Mapping a Fitness Landscape and its Connectivity. ACM GECCO Workshop, Denver, Colorado 2016, pg. 1293-1298.

Abstract
Understanding function regulation in proteins that switch between different structural states at equilibrium requires both finding the basins that correspond to such states and computing the sequence of intermediate structures employed (i.e., the path taken) in basin-to-basin switching. Recent work suggests that evolutionary strategies can be used to map protein energy landscapes effectively. Further work has shown that the constructed maps can be additionally equipped with connectivity information to help identify basin-switching paths. Here we highlight a potential issue when the problems of mapping and path finding are considered separately. We conduct a simple, proof-of-principle study that demonstrates the ability of an EA to allow extracting better paths from an EA-built map when the EA is supplied with the right information. The study is conducted on two key, multi-state proteins of importance to human biology and disease. The results presented here suggest that further research efforts to guide an EA with path-based information are warranted and feasible.
Bibliography

@inproceedings{SapinDeJongShehuGECCOW16,
author = {Sapin, E. AND {De Jong}, K. A. AND Shehu, A.},
title = {Path-based Guidance of an Evolutionary Algorithm in Mapping a Fitness Landscape and its Connectivity},
booktitle = {ACM Conf on Genetic and Evolutionary Computation (GECCO) Workshop},
year = {2016},
publisher = {ACM},
pages = {1293-1298},
location = {Denver, Colorado, USA}
}

C41: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. A Novel EA-based Memetic Approach for Efficiently Mapping Complex Fitness Landscapes. GECCO, Denver, Colorado 2016, pg. 85-92.

Abstract
Recent work in computational structural biology focuses on modeling intrinsically dynamic proteins important to human biology and health. The energy landscapes of these proteins are rich in minima that correspond to alternative structures with which a dynamic protein binds to molecular partners in the cell. On such landscapes, evolutionary algorithms that switch their objective from classic optimization to mapping are more informative of protein structure-function relationships. While techniques for mapping energy landscapes have been developed in computational chemistry and physics, protein landscapes are more difficult for mapping due to their high dimensionality and multimodality. In this paper, we describe a memetic evolutionary algorithm that is capable of efficiently mapping complex landscapes. In conjunction with a hall of fame mechanism, the algorithm makes use of a novel, lineage- and neighborhood-aware local search procedure for better exploration and mapping of complex landscapes. We evaluate the algorithm on several benchmark problems and demonstrate the superiority of the novel local search mechanism. In addition, we illustrate its effectiveness in mapping the complex multimodal landscape of an intrinsically dynamic protein important to human health.
Bibliography

@inproceedings{SapinDeJongShehuGECCO16,
author = {Sapin, E. AND {De Jong}, K. A. AND Shehu, A.},
title = {A Novel EA-based Memetic Approach for Efficiently Mapping Complex Fitness Landscapes},
booktitle = {ACM Conf on Genetic and Evolutionary Computation (GECCO)},
year = {2016},
pages = {85-92},
publisher = {ACM},
location = {Denver, Colorado, USA}
}

C40: Rohan Pandith and Amarda Shehu*. A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data. Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV, 2016, pg. 43-48.

Abstract
In this paper we investigate the utility of dimensionality reduction as a tool to analyze and simplify the structure space probed by de novo protein structure prediction methods. We conduct a principled comparative analysis in order to identify which techniques are effective and can be further used in decoy selection. The analysis allows drawing several interesting observations. For instance, many of the reportedly state-of-the-art non-linear dimensionality reduction techniques fare poorly and are outperformed by linear techniques that tend to have consistent performance across various protein structure data sets. The analysis in this paper is likely to open the way to new techniques that make use of the reduced dimensions to organize protein structure data so as to automatically detect the elusive native structure of a protein. We show some preliminary results in this direction.
Bibliography

@INPROCEEDINGS{PanditShehuBICOB16,
AUTHOR = {R. Pandit AND A. Shehu},
TITLE = {A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data},
BOOKTITLE = {Intl Conf on Bioinf and Comput Biol},
EDITOR = {Ioerger, T. AND Haspel, N.},
YEAR = {2016},
PAGES = {43-48},
PUBLISHER = {ISCA},
LOCATION = {Las Vegas, NV}
}

C39: Tatiana Maximova(p), Erion Plaku*, and Amarda Shehu*. Computing Transition Paths in Multiple-Basin Proteins with a Probabilistic Roadmap Algorithm Guided by Structure Data. IEEE Intl Conf on Bioinf and BioMed (BIBM), Washington, D.C. 2015, pg. 35-42.

Abstract
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to addressing the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new avenues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff12SB force field are used to obtain energetically-credible paths at atomistic detail.
Bibliography

@inproceedings{maximova2015computing,
title={Computing transition paths in multiple-basin proteins with a probabilistic roadmap algorithm guided by structure data},
author={Maximova, Tatiana and Plaku, Erion and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on},
pages={35--42},
year={2015},
organization={IEEE}
}

C38: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Evolutionary Search Strategies for Efficient Sample-based Representations of Multiple-basin Protein Energy Landscapes. IEEE Intl Conf on Bioinf and BioMed (BIBM), Washington, D.C. 2015, pg. 13-20.

Abstract
Protein function is the result of a complex yet precise relationship between protein structure and dynamics. The ability of a protein to assume different structural states is key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to complement experimental techniques in obtaining a comprehensive and detailed characterization of protein equilibrium dynamics. This is a non-trivial task, as it requires mapping the structure space (and underlying energy landscape) available to a protein under physiological conditions. Existing algorithms invariably adopt a stochastic optimization approach to explore the non-linear and multimodal protein energy landscapes. At present, such algorithms suffer from limited sampling, particularly in high-dimensional and non-linear variable spaces rich in local minima. In this paper, we equip a recently published evolutionary algorithm with novel evolutionary search strategies to enhance the sampling capability for mapping multi-basin protein energy landscapes. We investigate initialization strategies to delay premature convergence and techniques to maintain and update on-the-fly a sample-based representation that serves as a map of the energy landscape. Applications on three proteins central to human disease show that the novel strategies are effective at locating basins in complex energy landscapes with a practical computational budget.
Bibliography

@inproceedings{sapin2015evolutionary,
title={Evolutionary search strategies for efficient sample-based representations of multiple-basin protein energy landscapes},
author={Sapin, Emmanuel and De Jong, Kenneth A and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on},
pages={13--20},
year={2015},
organization={IEEE}
}

C37: Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Mapping Multiple Minima in Protein Energy Landscapes with Evolutionary Algorithms. ACM GECCO Workshop, Madrid, Spain, 2015, pg. 923-927.

Abstract
Many proteins involved in human proteinopathies exhibit complex energy landscapes with multiple thermodynamically-stable and semi-stable structural states. Landscape reconstruction is crucial to understanding functional modulations, but one is confronted with the multiple minima problem. While traditionally the objective for evolutionary algorithms (EAs) is to find the global minimum, here we present work on an EA that maps the various minima in a protein’s energy landscape. Specifically, we investigate how initialization of the population affects the rate of convergence and solution diversity. Results are presented on two key proteins, H-Ras and SOD1, related to human cancers and familial amyotrophic lateral sclerosis (ALS).
Bibliography

@inproceedings{sapin2015mapping,
title={Mapping multiple minima in protein energy landscapes with evolutionary algorithms},
author={Sapin, Emmanuel and De Jong, Kenneth and Shehu, Amarda},
booktitle={Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation},
pages={923--927},
year={2015},
organization={ACM}
}

C36: Kevin Molloy(g) and Amarda Shehu*. Interleaving Global and Local Search for Protein Motion Computation. LNCS: Bioinformatics Research and Applications, vol. 9096, pg. 175-186 (Proc. of 11th International Symposium on Bioinformatics Research and Applications, ISBRA), Norfolk, VA, 2015.

Abstract
We propose a novel robotics-inspired algorithm to compute physically-realistic motions connecting thermodynamically-stable and semi-stable structural states in protein molecules. Protein motion computation is a challenging problem due to the high-dimensionality of the search space involved and ruggedness of the potential energy surface underlying the space. To handle the multiple local minima issue, we propose a novel algorithm that is not based on the traditional Molecular Dynamics or Monte Carlo frameworks but instead adapts ideas from robot motion planning. In particular, the algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. We present here promising results on a variety of proteins and demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
Bibliography

@INPROCEEDINGS{MolloyShehuISBRA15,
AUTHOR = {K. Molloy AND A. Shehu},
TITLE = {Interleaving Global and Local Search for Protein Motion Computation},
BOOKTITLE = {LNCS: Bioinformatics Research and Applications},
EDITOR = {R. Harrison AND Y. Li AND I. Mandoiu},
YEAR = {2015},
VOLUME = {9096},
PAGES = {175-186},
PUBLISHER = {Springer International Publishing},
ADDRESS = {Norfolk, VA}
}

C35: Rudy Clausen(g), Emmanuel Sapin(p), Kenneth A De Jong, and Amarda Shehu*. Evolution Strategies for Exploring Protein Energy Landscapes. GECCO, Madrid, Spain, 2015, pg. 217-224.

Abstract
The focus on important diseases of our time has prompted many experimental labs to resolve and deposit functional structures of disease-causing or disease-participating proteins. At this point, many functional structures of wildtype and disease-involved variants of a protein exist in structural databases. The objective for computational approaches is to employ such information to discover features of the underlying energy landscape on which functional structures reside. Important questions about which subset of structures are most thermodynamically-stable remain unanswered. The challenge is how to transform an essentially discrete problem into one where continuous optimization is suitable and effective. In this paper, we present such a transformation, which allows adapting and applying evolution strategies to explore an underlying continuous variable space and locate the global optimum of a multimodal fitness landscape. The paper presents results on wildtype and mutant sequences of proteins implicated in human disorders, such as cancer and Amyotrophic lateral sclerosis. More generally, the paper offers a methodology for transforming a discrete problem into a continuous optimization one as a way to possibly address outstanding discrete problems in the evolutionary computation community.
Bibliography

@inproceedings{clausen2015evolution,
title={Evolution strategies for exploring protein energy landscapes},
author={Clausen, Rudy and Sapin, Emmanuel and De Jong, Kenneth A and Shehu, Amarda},
booktitle={Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation},
pages={217--224},
year={2015},
organization={ACM}
}

C34: Didier Devaurs, Amarda Shehu, Thierry Simeon, and Juan Cortes*. Sampling-based Methods for a Full Characterization of Energy Landscapes of Small Peptides. IEEE Intl Conf on Bioinf and Biomed (BIBM), Belfast, UK, 2014, pg. 37-44.

Abstract
Obtaining accurate representations of energy landscapes of biomolecules such as proteins and peptides is central to structure-function studies. Peptides are particularly interesting, as they exploit structural flexibility to modulate their biological function. Despite their small size, peptide modeling remains challenging due to the complexity of the energy landscape of such highly-flexible dynamic systems. Currently, only sampling-based methods can efficiently explore the conformational space of a peptide. In this paper, we propose combining two such methods to obtain a full characterization of energy landscapes of small yet flexible peptides. First, we propose a simplified version of the classical Basin Hopping algorithm to quickly reveal the meta-stable structural states of a peptide and the corresponding low-energy basins in the landscape. Then, we present several variants of a robotics-inspired algorithm, the Transition-based Rapidly-exploring Random Tree, to quickly determine transition state and transition path ensembles, as well as transition probabilities between meta-stable states. We demonstrate this combined approach on the terminally-blocked alanine.
Bibliography

@INPROCEEDINGS{DevaursCortesBIBM14,
AUTHOR = {D. Devaurs AND A. Shehu AND T. Simeon AND J. Cortes},
TITLE = {Sampling-based Methods for a Full Characterization of Energy Landscapes of Small Peptides},
BOOKTITLE = {IEEE Intl Conf on Bioinformatics and Biomedicine (BIBM)},
YEAR = {2014},
PAGES = {37-44},
ADDRESS = {Belfast, UK}
}

C33: Daniel Veltri(g), Uday Kamath, and Amarda Shehu*. A Novel Method to Improve Recognition of Antimicrobial Peptides through Distal Sequence-based Features. IEEE Intl Conf on Bioinf and Biomed (BIBM), Belfast, UK, 2014, pg. 371-378 (Best Student Paper Award).

Abstract
Growing bacterial resistance to antibiotics is urging the development of new lines of treatment. The discovery of naturally-occurring antimicrobial peptides (AMPs) is motivating many experimental and computational researchers to pursue AMPs as possible templates. In the experimental community, the focus is generally on systematic point mutation studies to measure the effect on antibacterial activity. In the computational community, the goal is to understand what determines such activity in a machine learning setting. In the latter, it is essential to identify biological signals or features in AMPs that are predictive of antibacterial activity. Construction of effective features has proven challenging. In this paper, we advance research in this direction. We propose a novel method to construct and select complex sequence-based features able to capture information about distal patterns within a peptide. Thorough comparative analysis in this paper indicates that such features compete with the state-of-the-art in AMP recognition while providing transparent summarizations of antibacterial activity at the sequence level. We demonstrate that these features can be combined with additional physicochemical features of interest to a biological researcher to facilitate specific AMP design or modification in the wet laboratory. Code, data, results, and analysis accompanying this paper are publicly available online.
Bibliography

@INPROCEEDINGS{VeltriShehuBIBM14,
AUTHOR = {D. Veltri AND U. Kamath AND A. Shehu},
TITLE = {A Novel Method to Improve Recognition of Antimicrobial Peptides through Distal Sequence-based Features},
BOOKTITLE = {IEEE Intl Conf on Bioinformatics and Biomedicine (BIBM)},
YEAR = {2014},
PAGES = {371-378},
ADDRESS = {Belfast, UK}
}

C32: Rudy Clauseng and Amarda Shehu*. A Multiscale Hybrid Evolutionary Algorithm to Obtain Sample-based Representations of Multi-basin Protein Energy Landscapes. ACM Conf on Bioinf and Comp Biol (BCB), Newport Beach, CA, 2014, pg. 269-278.

Abstract
The emerging picture of proteins as dynamic systems switching between structures to modulate function demands a comprehensive structural characterization only possible through an energy landscape treatment. Only sample-based representations of a protein energy landscape are viable in silico, and sampling-based exploration algorithms have to address the fundamental but challenging issue of balancing between exploration (broad view) and exploitation (going deep). We propose here a novel algorithm that achieves this balance by combining concepts from evolutionary computation and protein modeling research. The algorithm draws samples from a reduced space obtained via principal component analysis of known experimental structures. Samples are lifted from the reduced to an all-atom structure space where they are then mapped to nearby local minima in the all-atom energy landscape. From an algorithmic point of view, this paper makes several contributions, including the design of a local selection operator that is crucial to avoiding premature convergence. From an application point of view, this paper demonstrates the utility of the proposed evolutionary algorithm to advance understanding of multi-basin proteins. In particular, the proposed algorithm makes the first steps to answering the question of how sequence mutations affect function in proteins at the center of proteinopathies by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
Bibliography

@inproceedings{clausen2014multiscale,
title={A multiscale hybrid evolutionary algorithm to obtain sample-based representations of multi-basin protein energy landscapes},
author={Clausen, Rudy and Shehu, Amarda},
booktitle={Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics},
pages={269--278},
year={2014},
organization={ACM}
}

C31: Irina Hashmig, Daniel Veltrig, Nadine Kabbani, and Amarda Shehu*. Knowledge-based Search and Multiobjective Filters: Proposed Structural Models of GPCR Dimerization. ACM Conf on Bioinf and Comp Biol (BCB), Newport Beach, CA, 2014, pg. 279-288.

Abstract
Many experimental studies point to the ubiquitous role of protein complexation in the cell while lamenting the lack of structural models to permit structure-function studies. This scarcity is due to persisting challenges in protein-protein docking. Methods based on energetic optimization have to handle vast and high-dimensional configuration spaces and inaccurate energy functions only to arrive at the wrong interface. Methods that employ learned models to replace or precede energetic evaluations are limited by the generality of these models. Computational approaches designed to be general often fail to provide realistic models on protein classes of interest in the wet laboratory. One such class is G protein-coupled receptors, which wet-lab studies suggest undergo complexation, possibly affecting drug efficacy. In this paper, we propose a computational protocol to address the unique challenges posed by these receptors. To deal with challenges, such as receptor size and inaccuracy of energy functions, the protocol takes a geometry-driven approach and integrates in the search geometric constraints posed by the environment where the receptors operate. Various filters are designed to handle the computational cost of energetic evaluation, and analysis techniques based on new scoring strategies, including multi-objective analysis, are employed to reduce the sampled ensemble to a few credible structural models. We demonstrate that dimeric models of the Dopamine D2 receptor targeted to treat psychotic disorders reproduce macroscopic knowledge extracted in the wet laboratory and can be employed to further spur detailed structure-function studies.
Bibliography

@inproceedings{hashmi2014knowledge,
title={Knowledge-based search and multi-objective filters: proposed structural models of GPCR dimerization},
author={Hashmi, Irina and Veltri, Daniel and Kabbani, Nadine and Shehu, Amarda},
booktitle={Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics},
pages={279--288},
year={2014},
organization={ACM}
}

C30: Kevin Molloyg, Rudy Clauseng, and Amarda Shehu*. On the Stochastic Roadmap to Model Functionally-related Structural Transitions in Wildtype and Variant Proteins. Workshop on Robotics Methods for Structural and Dynamic Modeling of Molecular Systems – Robotics: Science and Systems (RSS) Workshops, Berkeley, CA, 2014, pg. 1-6.

Abstract
Evidence is emerging that the role of protein structure in disease needs to be rethought. While many proteinopathies are caused by sequence mutations removing the ability of a protein to assume a specific structure, some of the most complex human diseases are not so easily explained. Mutations may not invalidate structures populated by the wildtype protein but instead affect the rate at which the protein switches between structures. Modeling structural transitions and estimating transition rates in wildtype and variants is central to a better understanding of the molecular basis of disease. Building on seminal work on the stochastic roadmap simulation framework, this paper investigates an efficient algorithmic realization of this framework to model structural transitions in wildtype and variants of an oncogene. Our results indicate that the algorithm is able to extract useful kinetic information and elucidates the role of structure in how sequence mutations affect protein function.
Bibliography

@inproceedings{molloystochastic,
title={On the stochastic roadmap to model functionally-related structural transitions in wildtype and variant proteins},
author={Molloy, Kevin and Clausen, Rudy and Shehu, Amarda},
booktitle={Robotics: Science and Systems (RSS) Workshop},
pages={1--6},
year={2014}
}

C29: Amarda Shehu* and Kenneth A De Jong. Multi-Objective, Off-Lattice, and Multiscale Evolutionary Algorithms for De-novo and Guided Protein Structure Modeling. Workshop on Natural Computing for Protein Structure Prediction – Intl Conf on Parallel Problem Solving from Nature (PPSN) Workshops, Ljubljana, Slovenia, 2014.

Abstract

The goal of mapping out the biologically-active structural states of a protein is central to understanding the healthy and diseased cell, but it encompasses many challenging problems for an in-silico treatment. One of these is de novo protein structure prediction (PSP), where a single structure assumed to be representative of the active state (valid only in single-basin proteins) is sought for a given amino-acid sequence. Until recently, evolutionary algorithms (EAs) for PSP were outperformed by Monte Carlo-based platforms, such as Rosetta and Quark.

Bibliography

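@inproceedings{ShehuDeJongPPSN2014,
title={Memetic, Multi-Objective, Off-Lattice, and Multiscale Evolutionary Algorithms for De novo and Guided Protein Structure Modeling},
author={Shehu, Amarda and De Jong, Kenneth A},
booktitle={Workshop on Natural Computing for Protein Structure Prediction, Intl Conf on Parallel Problem Solving from Nature (PPSN) Workshops, Ljubljana, Slovenia},
year={2014}
}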

C28: Brian Olsong and Amarda Shehu*. Multi-Objective Optimization Techniques for Conformational Sampling in Template-Free Protein Structure Prediction. Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV, 2014, pg. 143-148.

Abstract
Template-free protein structure prediction continues to be a challenging problem in computational biology. State-of-the-art protocols are Monte Carlo-based and pay special emphasis on the set of moves and energy guidance. We report here on a complementary platform for decoy sampling that makes use of evolutionary search strategies. We propose that Evolutionary Algorithms (EAs) are effective platforms for the structure prediction problem as an optimization problem, outperforming the Monte Carlo-based sampling of protein conformational space. Moreover, these platforms allow casting the problem as a multi-objective optimization one to deal with the known imperfections in protein energy functions. We compare here different EAs to decoy sampling in the popular Rosetta protocol and show that multi-objective EAs have higher exploration capability and warrant further investigation.
Bibliography

@inproceedings{OlsonShehuBICOB2014,
title={Multi-objective optimization techniques for conformational sampling in template-free protein structure prediction},
author={Olson, Brian and Shehu, Amarda},
booktitle={Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV},
pages={143--148},
year={2014}
}

C27: Kevin Molloyg and Amarda Shehu*. Probabilistic Roadmap-based Method to Model Conformational Switching of a Protein Among Many Functionally-relevant Structures. Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV, 2014, pg. 137-142 (finalist for best paper award).

Abstract
Obtaining a detailed microscopic view of protein transitions among key structural states is central to obtaining a deeper understanding of the relationship between protein dynamics and function. Doing so in the wet laboratory is currently not possible. It is also infeasible to model conformational switching through computational treatments based on Molecular Dynamics, particularly when the objective is expanded to model switching of medium-sized proteins among an arbitrary number of given states. In this paper, we consider this expanded objective and propose a novel probabilistic method to sample conformational paths connecting functionally-relevant structures of a protein. The method achieves this without launching expensive simulations but instead by mapping the connectivity of the conformational space around given thermodynamically-stable and semi-stable structural states. This is achieved through an adaptation of the probabilistic roadmap framework that has been shown successful at planning motions of articulated mechanisms in robotics. Preliminary analysis shows the method is promising and efficient in modeling motions among various states for medium-size proteins.
Bibliography

@inproceedings{MolloyShehuBICOB2014,
title={A probabilistic roadmap-based method to model conformational switching of a protein among many functionally-relevant structures},
author={Molloy, Kevin and Shehu, Amarda},
booktitle={Intl Conf on Bioinf and Comp Biol (BICoB), Las Vegas, NV},
pages={137--142},
year={2014}
}

C26: Eleni Randou, Daniel Veltrig, and Amarda Shehu*. Binary Response Models for Recognition of Antimicrobial Peptides. ACM Conf on Bioinf and Comp Biol (BCB), Washington, DC, 2013, pg. 76-85.

Abstract
There is now great urgency in developing new antibiotics to combat bacterial resistance. Recent attention has turned to naturally-occurring antimicrobial peptides (AMPs) that can serve as templates for antibacterial drug research. As natural AMPs have a wide range of activity against various bacteria, current research is focusing on modifying existing peptides or designing new ones to increase potency. This paper presents a computational approach to further our understanding of what physicochemical properties or features confer to a peptide antimicrobial activity. One of the contributions of this paper is the ability to rigorously test the relevance of features obtained by biological or computational researchers in the context of AMP recognition. A second contribution is the construction of a predictive model that employs relevant features and their combinations to associate with a novel peptide sequence a probability to have antimicrobial activity. Taken together, the work in this paper seeks to help researchers elucidate features of importance for antimicrobial activity. This is an important first step towards modification or design of novel AMPs for treatment. With this goal in mind, we provide access to the proposed methodology through a web server, which allows users to replicate the findings here or evaluate their own feature set.
Bibliography

@inproceedings{RandouVeltriShehuBCB2013,
title={Binary response models for recognition of antimicrobial peptides},
author={Randou, Elena G and Veltri, Daniel and Shehu, Amarda},
booktitle={Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics},
pages={76--85},
year={2013},
organization={ACM}
}

C25: Brian Olsong and Amarda Shehu*. Multi-Objective Stochastic Search for Sampling Local Minima in the Protein Energy Surface. ACM Conf on Bioinf and Comp Biol (BCB), Washington, DC, 2013, pg. 430-439.

Abstract
We present an evolutionary stochastic search algorithm to obtain a discrete representation of the protein energy surface in terms of an ensemble of conformations representing local minima. This objective is of primary importance in protein structure modeling, whether the goal is to obtain a broad view of potentially different structural states thermodynamically available to a protein system or to predict a single representative structure of a unique functional native state. In this paper, we focus on the latter setting, and show how approaches from evolutionary computation for effective stochastic search and multi-objective analysis can be combined to result in protein conformational search algorithms with high exploration capability. From a broad computational perspective, the contributions of this paper are on how to balance global and local search of some high-dimensional search space and how to guide the search in the presence of a noisy, inaccurate scoring function. From an application point of view, the contributions are demonstrated in the domain of template-free protein structure prediction on the primary subtask of sampling diverse low-energy decoy conformations of an amino-acid sequence. Comparison with the approach used for decoy sampling in the popular Rosetta protocol on 20 diverse protein sequences shows that the evolutionary algorithm proposed in this paper is able to access lower-energy regions with similar or better proximity to the known native structure.
Bibliography

@inproceedings{OlsonShehuBCB2013,
title={Multi-objective stochastic search for sampling local minima in the protein energy surface},
author={Olson, Brian and Shehu, Amarda},
booktitle={Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics},
pages={430--439},
year={2013},
organization={ACM}
}

C24: Rudy Clauseng and Amarda Shehu*. Exploring the Structure Space of Wildtype Ras Guided by Experimental Data. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Washington, DC, 2013, pg. 757-764.

Abstract
The Ras enzyme mediates critical signaling pathways in cell proliferation and development by transitioning between GTP- (active) and GDP-bound (inactive) states. Many cancers are linked to specific Ras mutations affecting its conformational switching between active and inactive states. A detailed understanding of the sequence-structure-function space in Ras is missing. In this paper, we provide the first steps towards such an understanding. We conduct a detailed analysis of X-ray structures of wildtype and mutant variants of Ras. We embed the structures onto a low-dimensional structure space by means of Principal Component Analysis (PCA) and show that these structures are energetically feasible for wildtype Ras. We then propose a probabilistic conformational search algorithm to further populate the structure space of wildtype Ras. The algorithm explores a low-dimensional map as guided by the principal components obtained through PCA. Generated conformations are rebuilt in all-atom detail and energetically refined through Rosetta in order to further populate the structure space of wildtype Ras with energetically-feasible structures. Results show that a variety of novel structures are revealed, some of which reproduce experimental structures not subjected to the PCA but withheld for the purpose of validation. This work is a first step towards a comprehensive characterization of the sequence-structure space in Ras, which promises to reveal novel structures not probed in the wet laboratory, suggest new mutations, propose new binding sites, and even elucidate unknown interacting partners of Ras.
Bibliography

@inproceedings{ClausenShehuACMCSBW2013,
title={Exploring the structure space of wildtype ras guided by experimental data},
author={Clausen, Rudy and Shehu, Amarda},
booktitle={Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics},
pages={756},
year={2013},
organization={ACM}
}

C23: Irina Hashmig and Amarda Shehu*. Informatics-driven Protein-protein Docking. Comput Struct Biol Workshop (CSBW) – ACM BCB Workshops, Washington, DC, 2013, pg. 772-779.

Abstract
Predicting the structure of protein assemblies is fundamental to our ability to understand the molecular basis of biological function. The basic protein-protein docking problem involving two protein units docking onto each other remains challenging. One direction of research is exploring probabilistic search algorithms with high exploration capability, but these algorithms are limited by errors in current energy functions. A complementary direction is choosing to understand what constitutes true interaction interfaces. In this paper we present a method that combines the two directions and advances research into computationally-efficient yet high-accuracy docking. We present an informatics-driven probabilistic search algorithm for rigid protein-protein docking. The algorithm builds upon the powerful basin hopping framework, which we have shown in many settings in molecular modeling to have high exploration capability. Rather than operate de novo, the algorithm employs information on what constitutes a native interaction interface. A predictive machine learning model is built and trained a priori on known dimeric structures to learn features correlated with a true interface. The model is fast, accurate, and replaces expensive physics-based energy functions in scoring sampled configurations. A sophisticated energy function is used to refine only high-scoring configurations. The result is an ensemble of high-quality decoy configurations that we show here to approach the known native dimeric structure better than other state-of-the-art docking methods. We believe the proposed method advances computationally-efficient high-accuracy docking.
Bibliography

@inproceedings{HashmiShehuACMCSBW2013,
title={Informatics-driven protein-protein docking},
author={Hashmi, Irina and Shehu, Amarda},
booktitle={Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics},
pages={771},
year={2013},
organization={ACM}
}

C22: Brian Olsong and Amarda Shehu*. An Evolutionary Search Algorithm to Guide Stochastic Search for Near-native Protein Conformations with Multiobjective Analysis. Workshop on Artificial Intelligence and Robotics Methods in Computational Biology – Intl Conf of Association for Advancement of Artificial Intelligence (AAAI) Workshop, Bellevue, WA, 2013.

Abstract
Predicting native conformations of a protein sequence is known as de novo structure prediction and is a central challenge in computational biology. Most computational protocols employ Monte Carlo sampling. Evolutionary search algorithms have also been proposed to enhance sampling of near-native conformations. These approaches bias stochastic search by an energy function, even though current energy functions are known to be inaccurate and drive sampling to non-native energy minima. This paper proposes a multiobjective approach which employs Pareto dominance, rather than total energy, to evaluate a conformation. This multiobjective approach accounts for the fact that terms in an energy function are conflicting optimization criteria. Our analysis is conducted on a diverse set of 20 proteins. Results show that employing Pareto dominance, rather than total energy, to guide stochastic search is more effective at sampling conformations which are both lower in energy and near the protein native structure.
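The Pareto-dominance test mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each conformation is scored by a tuple of energy terms that are all minimized, and the function names and toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all terms minimized).

    a dominates b when a is no worse than b in every term and strictly
    better in at least one term.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the score vectors that no other vector dominates."""
    return [p for p in scores if not any(dominates(q, p) for q in scores)]


# Toy two-term scores (hypothetical values, e.g. two conflicting energy terms).
scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(scores)
```

In this toy set, three of the four vectors share the same total (6.0), so a sum of terms cannot separate them, whereas the dominance test removes only (3.0, 3.0), which is worse than (2.0, 2.0) in both terms.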
Bibliography

@inproceedings{OlsonShehuAAAI2013,
title={An evolutionary-inspired algorithm to guide stochastic search for near-native protein conformations with multiobjective analysis},
author={Olson, Brian and Shehu, Amarda},
booktitle={Association for Advancement of Artificial Intelligence Workshops (AAAIW)},
year={2013},
address={Bellevue, WA}
}

C21: Eleni Randou, Daniel Veltrig, and Amarda Shehu*. Systematic Analysis of Global Features and Model Building for Recognition of Antimicrobial Peptides. IEEE Intl Conf on Comput Adv in Bio and Medical Sciences (ICCABS), New Orleans, LA, 2013.

Abstract
With growing bacterial resistance to antibiotics, it is becoming paramount to seek out new antibacterials. Antimicrobial peptides (AMPs) provide interesting templates for antibacterial drug research. Our understanding of what it is that confers to these peptides their antimicrobial activity is currently poor. Yet, such understanding is the first step towards modification or design of novel AMPs for treatment. Research in machine learning is beginning to focus on recognition of AMPs from non-AMPs as a means of understanding what features confer to an AMP its activity. Methods either seek new features and test them in the context of classification or measure the classification power of features provided by biologists. In this paper, we provide a rigorous evaluation of features provided by a biologist or resulting from a combination of experimental and computational research. We present a statistics-based approach to carefully measure the significance of each feature and use this knowledge to construct predictive models. We present here logistic regression models, which are capable of associating probabilities on whether a peptide is antimicrobial or not with the feature values of the peptide. We provide access to the proposed methodology through a web server. The server allows users to replicate the findings in this paper or evaluate their own features. We believe research in this direction will allow the community to make further progress and elucidate features that capture antimicrobial activity. This is an important first step towards assisting modification and/or de novo design of AMPs in the wet laboratory.
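How a logistic regression model turns a peptide's feature values into a probability can be sketched as below. This is only an illustration under stated assumptions: the function name, the feature values, the weights, and the bias are hypothetical placeholders, not the fitted model or features from the paper.

```python
import math
from typing import Sequence


def amp_probability(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Logistic model: probability that a peptide is antimicrobial.

    Computes sigmoid(bias + sum_i weights[i] * features[i]) using a
    numerically stable form of the sigmoid.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical example: two made-up feature values and made-up coefficients.
p = amp_probability(features=[0.8, -0.3], weights=[1.5, 2.0], bias=-0.2)
```

A positive weight raises the predicted probability as its feature grows, which is what makes the sign and significance of each fitted coefficient interpretable.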
Bibliography

@inproceedings{RandouVeltriShehuICCABS2013,
title={Systematic analysis of global features and model building for recognition of antimicrobial peptides},
author={Randou, Elena G and Veltri, Daniel and Shehu, Amarda},
booktitle={Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on},
pages={1--6},
year={2013},
organization={IEEE} }

C20: Kevin Molloyg, Jennifer Minh Vanu, Daniel Barbara, and Amarda Shehu*. Higher-order Representations for Automated Organization of Protein Structure Space. IEEE Intl Conf on Comput Adv in Bio and Medical Sciences (ICCABS), New Orleans, LA, 2013.

Abstract
Fragment-based representations of protein structure have recently been proposed to identify remote homologs with reasonable accuracy. The representations have also been shown through PCA to elucidate low-dimensional maps of protein structure space. In this work we conduct further analysis of these representations, showing that the low-dimensional maps preserve functional co-localization. Moreover, we employ Latent Dirichlet Allocation to investigate a new, topic-based representation. We show through various techniques adapted from text mining that the topics have unique signatures over structural classes and allow a complementary yet informative organization of protein structure space.
Bibliography

@inproceedings{MolloyShehuICCABS2013,
title={Higher-order representations of protein structure space},
author={Molloy, Kevin and Van, M Jennifer and Barbara, Daniel and Shehu, Amarda},
booktitle={Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on},
pages={1--2},
year={2013},
organization={IEEE} }

C19: Brian Olsong, Kenneth A De Jong, and Amarda Shehu*. Off-Lattice Protein Structure Prediction with Homologous Crossover. Genet and Evol Comp Conf (GECCO), Amsterdam, Netherlands, 2013, pg. 287-294.

Abstract
Ab-initio structure prediction refers to the problem of using only knowledge of the sequence of amino acids in a protein molecule to find spatial arrangements, or conformations, of the amino-acid chain capturing the protein in its biologically-active or native state. This problem is a central challenge in computational biology. It can be posed as an optimization problem, but current top ab-initio protocols employ Monte Carlo sampling rather than evolutionary algorithms (EAs) for conformational search. This paper presents a hybrid EA that incorporates successful strategies used in state-of-the-art ab-initio protocols. Comparison to a top Monte-Carlo-based sampling method shows that the domain-specific enhancements make the proposed hybrid EA competitive. A detailed analysis on the role of crossover operators and a novel implementation of homologous 1-point crossover shows that the use of crossover with mutation is more effective than mutation alone in navigating the protein energy surface.
Bibliography

@inproceedings{OlsonDeJongShehuGECCO2013,
title={Off-lattice protein structure prediction with homologous crossover},
author={Olson, Brian and De Jong, Kenneth and Shehu, Amarda},
booktitle={Proceedings of the 15th annual conference on Genetic and evolutionary computation},
pages={287--294},
year={2013},
organization={ACM} }

C18: Daniel Veltrig and Amarda Shehu*. Physicochemical Determinants of Antimicrobial Activity. Intl Conf on Bioinf and Comput Biol (BICoB), Hawaii, 2013.

Abstract
Antimicrobial peptides (AMPs) are lately receiving significant attention as targets for antibacterial drug research. While many machine learning techniques are shown effective for AMP recognition, their utility for the rational design of novel AMP-based drugs in the wet laboratory is questionable. In this paper we seek to elucidate determinants of antimicrobial activity in a well-studied class of AMPs, cathelicidins. We do so by considering an extensive set of physicochemical properties at the residue level as features in the context of SVM-based classification, employing a carefully-constructed decoy dataset. A detailed statistical analysis of feature profiles reveals interesting physicochemical properties to preserve when modifying or designing novel AMPs in the wet laboratory. The method presented here is a first step towards assisting de novo design of AMPs in the wet laboratory.
Bibliography

@inproceedings{VeltriShehuBICOB2013,
title={Physicochemical determinants of antimicrobial activity},
author={Veltri, Daniel and Shehu, Amarda},
booktitle={Intl. Conf. on Bioinf. and Comp. Biol.(BICoB)},
pages={1--6},
year={2013}
}

C17: Kevin Molloyg and Amarda Shehu*. Biased Decoy Sampling to Aid the Selection of Near-Native Protein Conformations. ACM Bioinf and Comp Biol (BCB), Orlando, FL, 2012, pg. 131-138.

Abstract
A central challenge in ab-initio protein structure prediction is the selection of low-resolution decoy conformations whose subsequent refinement leads to high-resolution near-native conformations. Successful selection strategies are tightly coupled with the exploration method employed to obtain decoys. Density-based clustering is often used to identify regions of the energy surface that are highly sampled by exploration trajectories. The trajectories are often numerous and long, because the goal is to obtain both a broad view of the energy surface and to converge to regions that are promising for further refinement. In this paper we separate this into two subgoals. We first investigate a robotics-inspired exploration framework and demonstrate its ability to steer sampling towards diverse decoy conformations. Once a broad view of the energy surface is obtained, Metropolis Monte Carlo trajectories continue the exploration from selected decoys. Density-based clustering then identifies regions where trajectories converge. The two exploration stages both employ molecular fragment replacement but gradually add more detail through different fragment lengths. Results on a diverse list of proteins show that highly-sampled regions contain near-native conformations that are worthy of further refinement for use in a blind prediction setting.
Bibliography

@inproceedings{MolloyShehuBCB2012,
title={Biased decoy sampling to aid the selection of near-native protein conformations},
author={Molloy, Kevin and Shehu, Amarda},
booktitle={Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine},
pages={131--138},
year={2012},
organization={ACM} }

C16: Brian Olsong and Amarda Shehu*. Efficient Basin Hopping in the Protein Energy Surface. IEEE Intl Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia, PA, 2012, pg. 119-124.

Abstract
The vast and rugged protein energy surface can be effectively represented in terms of local minima. The basin-hopping framework, where a structural perturbation is followed by an energy minimization, is particularly suited to obtaining this coarse-grained representation. Basin hopping is effective for small systems both in locating lower-energy minima and obtaining conformations near the native structure. The efficiency decreases for large systems. Our recent work improves efficiency on large systems through molecular fragment replacement. In this paper, we conduct a detailed investigation of two components in basin hopping, perturbation and minimization, and how they work in concert to affect the sampling of near-native local minima. We show that controlling the magnitude of perturbation jumps is related to the ability to effectively steer the exploration towards conformations near the protein native state. In minimization, we show that a simple greedy search is just as effective as Metropolis Monte Carlo-based minimization. Finally, we show that an evolutionary-inspired approach based on the Pareto front is particularly effective in reducing the ensemble of sampled local minima and obtains a simpler representation of the probed energy surface.
Bibliography

@inproceedings{OlsonShehuBIBM2012,
title={Efficient basin hopping in the protein energy surface},
author={Olson, Brian and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on},
pages={1--6},
year={2012},
organization={IEEE}
}

C15: Irina Hashmig and Amarda Shehu*. A Basin Hopping Algorithm for Protein-Protein Docking. IEEE Intl Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia, PA, 2012, pg. 466-469.

Abstract
We present a novel probabilistic search algorithm to efficiently search the structure space of protein dimers. The algorithm is based on the basin hopping framework that repeatedly follows up structural perturbation with energy minimization to obtain a coarse-grained view of the dimeric energy surface in terms of its local minima. A Metropolis criterion biases the search towards lower-energy minima over time. Extensive analysis highlights efficient and effective implementations for the perturbation and minimization components. Testing on a broad list of dimers shows the algorithm recovers the native dimeric configuration with great accuracy and produces many minima near the native configuration. The algorithm can be employed to efficiently produce relevant decoys that can be further refined at greater detail to predict the native configuration.
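The basin hopping loop described in this abstract (perturb, minimize, then accept or reject the new minimum with a Metropolis criterion) can be sketched on a toy problem. This is a minimal sketch under loud assumptions: the 1-D `energy` function, the greedy `local_minimize` routine, and all parameter values are hypothetical stand-ins, not the dimer energy function or the perturbation and minimization components used in the paper.

```python
import math
import random


def energy(x: float) -> float:
    # Hypothetical rugged 1-D landscape standing in for a real energy function.
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x: float, step: float = 0.01, max_iters: int = 10_000):
    """Greedy descent: step left or right while doing so lowers the energy."""
    e = energy(x)
    for _ in range(max_iters):
        for cand in (x - step, x + step):
            ec = energy(cand)
            if ec < e:
                x, e = cand, ec
                break
        else:
            break  # neither neighbor is lower: x is a local minimum
    return x, e


def basin_hopping(x0: float, n_hops: int = 200, jump: float = 1.0,
                  temperature: float = 0.5, seed: int = 0):
    """Return the best minimum found and the list of accepted minima."""
    rng = random.Random(seed)
    x, e = local_minimize(x0)
    accepted = [(x, e)]
    best = (x, e)
    for _ in range(n_hops):
        # Perturbation followed by minimization yields a candidate minimum.
        x_new, e_new = local_minimize(x + rng.uniform(-jump, jump))
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            accepted.append((x, e))
            if e < best[1]:
                best = (x, e)
    return best, accepted


best, accepted = basin_hopping(x0=4.0)
```

The `jump` parameter plays the role of the perturbation magnitude, and `temperature` controls how readily the search moves to higher-energy minima; both would need tuning for any real system.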
Bibliography

@inproceedings{HashmiShehuBIBM2012,
title={A basin hopping algorithm for protein-protein docking},
author={Hashmi, Irina and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on},
pages={1–4},
year={2012},
organization={IEEE}
}
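
As a rough illustration of the basin hopping loop described in the C15 abstract above (a perturbation, followed by an energy minimization, followed by a Metropolis acceptance test), the following Python sketch runs the same loop on a hypothetical one-dimensional energy function. The function `toy_energy`, the greedy minimizer, the step sizes, and the temperature are all assumptions made for illustration; none of them is taken from the paper, which operates on dimeric protein structures.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D energy: a quadratic funnel with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(5.0 * x)


def greedy_minimize(x, energy, step=0.01, max_iters=10_000):
    """Greedy descent: move to the lower-energy neighbor until neither neighbor improves."""
    e = energy(x)
    for _ in range(max_iters):
        best_x, best_e = x, e
        for cand in (x - step, x + step):
            cand_e = energy(cand)
            if cand_e < best_e:
                best_x, best_e = cand, cand_e
        if best_x == x:  # no neighbor is lower: x is a local minimum on this grid
            break
        x, e = best_x, best_e
    return x, e


def basin_hopping(energy, x0, n_hops=200, jump=1.0, temperature=0.5, seed=0):
    """Return every local minimum visited, as a list of (x, energy) pairs."""
    rng = random.Random(seed)
    current_x, current_e = greedy_minimize(x0, energy)
    minima = [(current_x, current_e)]
    for _ in range(n_hops):
        # Perturbation: jump away from the current minimum.
        perturbed = current_x + rng.uniform(-jump, jump)
        # Minimization: map the perturbed point to a nearby local minimum.
        new_x, new_e = greedy_minimize(perturbed, energy)
        minima.append((new_x, new_e))
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        delta = new_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_x, current_e = new_x, new_e
    return minima


if __name__ == "__main__":
    visited = basin_hopping(toy_energy, x0=4.0)
    best_x, best_e = min(visited, key=lambda m: m[1])
    print(f"lowest minimum found: x={best_x:.3f}, E={best_e:.3f}")
```

The `jump` parameter plays the role of the perturbation magnitude, and `temperature` controls how readily the search accepts a higher-energy minimum.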

C14: Kevin Molloyg and Amarda Shehu*. A Robotics-inspired Method to Sample Conformational Paths Connecting Known Functionally-relevant Structures in Protein Systems. Comput Struct Biol Workshop (CSBW) – IEEE BIBM Workshops, Philadelphia, PA, 2012, pg. 56-63.

Abstract
Characterization of transition trajectories that take a protein between different functional states is an important yet challenging problem in computational biology. Approaches based on Molecular Dynamics can obtain the most detailed and accurate information but at considerable computational cost. To address the cost, sampling-based path planning methods adapted from robotics forego protein dynamics and seek instead conformational paths, operating under the assumption that dynamics can be incorporated later to transform paths to transition trajectories. Existing methods focus either on short peptides or large proteins; on the latter, coarse representations simplify the search space. Here we present a robotics-inspired tree-based method to sample conformational paths that connect known structural states of small- to medium-size proteins. We address the dimensionality of the search space using molecular fragment replacement to efficiently obtain physically-realistic conformations. The method grows a tree in conformational space rooted at a given conformation and biases the growth of the tree to steer it to a given goal conformation. Different bias schemes are investigated for their efficacy. Experiments on proteins up to 214 amino acids long with known functionally-relevant states more than 13Å apart show that the method effectively obtains conformational paths connecting significantly different structural states.
Bibliography

@inproceedings{MolloyShehuCSBW2012,
title={A robotics-inspired method to sample conformational paths connecting known functionally-relevant structures in protein systems},
author={Molloy, Kevin and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on},
pages={56–63},
year={2012},
organization={IEEE}
}

C13: Sameh Salehu, Brian Olsong, and Amarda Shehu*. A Population-based Evolutionary Algorithm for Sampling Minima in the Protein Energy Surface. Comput Struct Biol Workshop (CSBW) – IEEE BIBM Workshops, Philadelphia, PA, 2012, pg. 48-55.

Abstract
Obtaining a structural characterization of the biologically active (native) state of a protein is a long-standing problem in computational biology. The high dimensionality of the conformational space and ruggedness of the associated energy surface are key challenges to algorithms in search of an ensemble of low-energy decoy conformations relevant for the native state. As the native structure does not often correspond to the global minimum energy, diversity is key. We present a memetic evolutionary algorithm to sample a diverse ensemble of conformations that represent low-energy local minima in the protein energy surface. Conformations in the algorithm are members of an evolving population. The molecular fragment replacement technique is employed to obtain children from parent conformations. A greedy search maps a child conformation to its nearest local minimum. Resulting minima and parent conformations are merged and truncated back to the initial population size based on potential energies. Results show that the additional minimization is key to obtaining a diverse ensemble of decoys, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with lRMSDs comparable to state-of-the-art decoy sampling methods.
Bibliography

@inproceedings{SalehShehuCSBW2012,
title={A population-based evolutionary algorithm for sampling minima in the protein energy surface},
author={Saleh, Sameh and Olson, Brian and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on},
pages={64–71},
year={2012},
organization={IEEE}
}
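
The generational loop described in the C13 abstract above (children obtained from parents, each child mapped to a local minimum by greedy search, then parents and minima merged and truncated by energy) can be sketched in Python on a toy problem. This is a minimal sketch under stated assumptions: the one-dimensional `toy_energy`, the random displacement that stands in for molecular fragment replacement, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D energy: a quadratic funnel with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(5.0 * x)


def greedy_minimize(x, energy, step=0.01, max_iters=10_000):
    """Greedy descent: move to the lower-energy neighbor until neither neighbor improves."""
    e = energy(x)
    for _ in range(max_iters):
        best_x, best_e = x, e
        for cand in (x - step, x + step):
            cand_e = energy(cand)
            if cand_e < best_e:
                best_x, best_e = cand, cand_e
        if best_x == x:
            break
        x, e = best_x, best_e
    return x, e


def memetic_ea(energy, pop_size=20, generations=30, jump=1.0, seed=0):
    """Return the final population as a list of (x, energy) pairs."""
    rng = random.Random(seed)
    # Initial population: random points, not yet minimized.
    population = []
    for _ in range(pop_size):
        x = rng.uniform(-5.0, 5.0)
        population.append((x, energy(x)))
    for _ in range(generations):
        children = []
        for parent_x, _ in population:
            # Variation: a random displacement stands in for fragment replacement.
            child_x = parent_x + rng.uniform(-jump, jump)
            # Memetic step: map the child to a nearby local minimum.
            children.append(greedy_minimize(child_x, energy))
        # Merge parents with minimized children and keep the lowest-energy members.
        merged = sorted(population + children, key=lambda m: m[1])
        population = merged[:pop_size]
    return population


if __name__ == "__main__":
    final = memetic_ea(toy_energy)
    print(f"lowest energy in final population: {final[0][1]:.3f}")
```

Because truncation keeps the lowest-energy members of the merged set, the best energy in the population never gets worse from one generation to the next. The sketch has no explicit diversity mechanism, so on this toy problem the population may collapse onto a few minima.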

C12: Uday Kamathg, Johan Kaers, Kenneth A De Jong, and Amarda Shehu*. A Spatial EA Framework for Parallelizing Machine Learning Methods. Intl Conf on Parallel Problem Solving From Nature (PPSN), Taormina, Italy, 2012, LNCS vol. 7491, pg. 206-215.

Abstract
The scalability of machine learning (ML) algorithms has become increasingly important due to the ever increasing size of datasets and increasing complexity of the models induced. Standard approaches for dealing with this issue generally involve developing parallel and distributed versions of the ML algorithms and/or reducing the dataset sizes via sampling techniques. In this paper we describe an alternative approach that combines features of spatially-structured evolutionary algorithms (SSEAs) with the well-known machine learning techniques of ensemble learning and boosting. The result is a powerful and robust framework for parallelizing ML methods in a way that does not require changes to the ML methods. We first describe the framework and illustrate its behavior on a simple synthetic problem, and then evaluate its scalability and robustness using several different ML methods on a set of benchmark problems from the UC Irvine ML database.
Bibliography

@inproceedings{KamathShehuPPSN2012,
title={A spatial EA framework for parallelizing machine learning methods},
author={Kamath, Uday and Kaers, Johan and Shehu, Amarda and De Jong, Kenneth A},
booktitle={International Conference on Parallel Problem Solving from Nature},
pages={206–215},
year={2012},
organization={Springer}
}

C11: Brian Olsong and Amarda Shehu*. Populating Local Minima in the Protein Conformational Space. IEEE Intl Conference on Bioinformatics and Biomedicine (BIBM), Atlanta, GA, 2011, pg. 114-117.

Abstract
Protein modeling conceptualizes the protein energy landscape as a funnel with the native structure at the low-energy minimum. Current protein structure prediction algorithms seek the global minimum by searching for low-energy conformations in the hope that some of these reside in local minima near the native structure. The search techniques employed, however, fail to explicitly model these local minima. This work proposes a memetic algorithm which combines methods from evolutionary computation with cutting-edge structure prediction protocols. The Protein Local Optima Walk (PLOW) algorithm proposed here explores the space of local minima by explicitly projecting each move in the conformation space to a nearby local minimum. This allows PLOW to jump over local energy barriers and more effectively sample near-native conformations. Analysis across a broad range of proteins shows that PLOW outperforms an MMC-based method and compares favorably against other published ab-initio structure prediction algorithms.
Bibliography

@inproceedings{OlsonShehuBIBM2011,
title={Populating local minima in the protein conformational space},
author={Olson, Brian and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on},
pages={114–117},
year={2011},
organization={IEEE}
}

C10: Brian Olsong, Seyed-Farid Hendig, and Amarda Shehu*. Protein Conformational Search with Geometric Projections. Comput Struct Biol Workshop (CSBW) – IEEE BIBM Workshops, Atlanta, GA, 2011, pg. 366-373.

Abstract
Protein structure prediction remains a central challenge in computational structural biology. Even at the coarse-grained level of detail, the protein conformational space is vast, and available energy functions contain many false local minima. In order to effectively characterize this space, a conformational search must sample a geometrically-diverse set of low-energy conformations. Our recently published FeLTr framework achieves this goal by employing a low-dimensional geometric projection layer to bias conformational sampling towards unexplored regions of the search space. In this work we present a new geometric projection layer based on the effective connectivity measure, which encapsulates interatomic distances within a conformation. Extensive analysis indicates that effective connectivity allows equipping the high-dimensional conformational search with an effective projection layer. On several target proteins, this layer improves significantly over our previous work, resulting in sampling of conformations with significantly lower lRMSDs to the known native structure.
Bibliography

@inproceedings{OlsonShehuCSBW2011,
title={Protein conformational search with geometric projections},
author={Olson, Brian and Hendi, S Farid and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on},
pages={366–373},
year={2011},
organization={IEEE}
}

C9: Bahar Akbal, Irina Hashmig, Amarda Shehu, and Nurit Haspel*. Refinement of Docked Protein Complex Structures Using Evolutionary Traces. Comput Struct Biol Workshop (CSBW) – IEEE BIBM Workshops, Atlanta, GA, 2011, pg. 400-404.

Abstract
Detection of protein complexes and their structures is crucial for understanding the role of protein complexes in the basic biology of organisms. Computational methods can provide researchers with a good starting point for the analysis of protein complexes. However, computational docking methods are often not accurate and their results need to be further refined to improve interface packing. In this paper, we introduce a novel refinement method that incorporates evolutionary information by employing an energy function containing an Evolutionary Trace (ET)-based scoring function, which also takes shape complementarity, electrostatic and Van der Waals interactions into account. We tested our method on docked candidates of three protein complexes produced by a separate docking method. Our results suggest that the energy function can help bias the results towards complexes with native interactions, filtering out false results. Our refinement method is able to produce structures with better RMSDs with respect to the known complexes and lower energies than the initial docked structures.
Bibliography

@inproceedings{AkbalHashmiShehuHaspelCSBW2011,
title={Refinement of docked protein complex structures using evolutionary traces},
author={Akbal-Delibas, Bahar and Hashmi, Irina and Shehu, Amarda and Haspel, Nurit},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on},
pages={400–404},
year={2011},
organization={IEEE}
}

C8: Irina Hashmig, Bahar Akbal, Nurit Haspel, and Amarda Shehu*. Protein Docking with Information on Evolutionary Conserved Interfaces. Comput Struct Biol Workshop (CSBW) – IEEE BIBM Workshops, Atlanta, GA, 2011, pg. 358-365.

Abstract
Structural modeling of molecular assemblies lies at the heart of understanding molecular interactions and biological function. We present a method for docking protein molecules and elucidating native-like structures of protein dimers. Our method is based on geometric hashing to ensure the feasibility of searching the combined conformational space of dimeric structures. The search space is narrowed by focusing the sought rigid-body transformations around surface areas with evolutionary-conserved amino-acids. Recent analysis of protein assemblies reveals that many functional interfaces are significantly conserved throughout evolution. We test our method on a broad list of sixteen diverse protein dimers and compare the structures found to have lowest lRMSD to the known native dimeric structures to those reported by other groups. Our results show that focusing the search around evolutionary-conserved interfaces results in lower lRMSDs.
Bibliography

@inproceedings{HashmiShehuCSBW2011,
title={Protein docking with information on evolutionary conserved interfaces},
author={Hashmi, Irina and Akbal-Delibas, Bahar and Haspel, Nurit and Shehu, Amarda},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on},
pages={358–365},
year={2011},
organization={IEEE}
}

C7: Uday Kamathg, Kenneth A De Jong, and Amarda Shehu*. An Evolutionary-based Approach for Feature Generation: Eukaryotic Promoter Recognition. IEEE Congress on Evolutionary Computation (CEC), New Orleans, LA, 2011, pg. 277-284.

Abstract
Prediction of promoter regions continues to be a challenging subproblem in mapping out eukaryotic DNA. While this task is key to understanding the regulation of differential transcription, the gene-specific architecture of promoter sequences does not readily lend itself to general strategies. To date, the best approaches are based on Support Vector Machines (SVMs) that employ standard “spectrum” features and achieve promoter region classification accuracies from a low of 84% to a high of 94% depending on the particular species involved. In this paper, we propose a general and powerful methodology that uses Genetic Programming (GP) techniques to generate more complex and more gene-specific features to be used with a standard SVM for promoter region identification. We evaluate our methodology on three data sets from different species and observe consistent classification accuracies in the 94-95% range. In addition, because the GP-generated features are gene-specific, they can be used by biologists to advance their understanding of the architecture of eukaryotic promoter regions.
Bibliography

@inproceedings{KamathDeJongShehuCEC2011,
title={An evolutionary-based approach for feature generation: Eukaryotic promoter recognition},
author={Kamath, Uday and De Jong, Kenneth A and Shehu, Amarda},
booktitle={Evolutionary Computation (CEC), 2011 IEEE Congress on},
pages={277–284},
year={2011},
organization={IEEE}
}
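
The C7 abstract above refers to standard "spectrum" features as the baseline SVM input. Assuming this means the usual k-mer spectrum representation (a vector counting how often each length-k substring occurs in a sequence), a minimal Python sketch looks as follows. The function name `spectrum_features`, the default `k`, and the DNA alphabet are illustrative choices, not details from the paper, and the sketch covers only the baseline features, not the GP-generated ones.

```python
from collections import Counter
from itertools import product


def spectrum_features(sequence, k=3, alphabet="ACGT"):
    """Count every length-k substring of `sequence`, in a fixed k-mer order."""
    # Enumerate all len(alphabet)**k possible k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k over the sequence and tally what it sees.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    # With k=2 the vector has 16 entries, ordered AA, AC, AG, AT, CA, ...
    print(spectrum_features("ACGTACGT", k=2))
```

The resulting fixed-length vectors can be passed to any standard SVM implementation; the vector length grows as 4^k for DNA, which is one reason small values of k are typical.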

C6: Brian Olsong, Kevin Molloyg, and Amarda Shehu*. Enhancing Sampling of the Conformational Space Near the Protein Native State. Intl Conference on Bio-inspired Models of Network, Information, and Computing Systems (BIONETICS), Boston, MA, 2010, LNICST (Springer), vol. 87, pg. 249-263 (best student paper award).

Abstract
A protein molecule assumes specific conformations under native conditions to fit and interact with other molecules. Due to the role that three-dimensional structure plays in protein function, significant efforts are devoted to elucidating native conformations. Many search algorithms are proposed to navigate the high-dimensional protein conformational space and its underlying energy surface in search of low-energy conformations that comprise the native state. In this work, we identify two strategies to enhance the sampling of native conformations. We show that employing an enhanced fragment library with greater structural diversity to assemble low-energy conformations allows sampling more native conformations. To efficiently handle the ensuing vast conformational space, only a representative subset of the sampled conformations is maintained and employed to further guide the search for native conformations. Our results show that these two strategies greatly enhance the sampling of the conformational space near the native state.
Bibliography

@inproceedings{OlsonMolloyShehuBIONETICS2010,
title={Enhancing sampling of the conformational space near the protein native state},
author={Olson, Brian and Molloy, Kevin and Shehu, Amarda},
booktitle={International Conference on Bio-Inspired Models of Network, Information, and Computing Systems},
pages={249–263},
year={2010},
organization={Springer}
}

C5: Uday Kamathg, Amarda Shehu*, and Kenneth A De Jong*. Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences. Intl Conference on Bio-inspired Models of Network, Information, and Computing Systems (BIONETICS), Boston, MA, 2010, LNICST (Springer), vol. 87, pg. 213-238.

Abstract
The annotation of DNA regions that regulate gene transcription is the first step towards understanding phenotypical differences among cells and many diseases. Hypersensitive (HS) sites are reliable markers of regulatory regions. Mapping HS sites is the focus of many statistical learning techniques that employ Support Vector Machines (SVM) to classify a DNA sequence as HS or non-HS. The contribution of this paper is a novel methodology inspired by biological evolution to automate the basic steps in SVM and improve classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs used to associate feature vectors with the input sequences. Second, a genetic programming algorithm designs optimal kernel functions that map the feature vectors into a high-dimensional space where the vectors can be optimally separated into the HS and non-HS classes. Results show that the employment of evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences.
Bibliography

@inproceedings{KamathShehuDeJongBIONETICS2010,
title={Feature and kernel evolution for recognition of hypersensitive sites in DNA sequences},
author={Kamath, Uday and Shehu, Amarda and De Jong, Kenneth A},
booktitle={International Conference on Bio-Inspired Models of Network, Information, and Computing Systems},
pages={213–228},
year={2010},
organization={Springer}
}

C4: Uday Kamathg, Amarda Shehu*, and Kenneth A De Jong*. Using Evolutionary Computation to Improve SVM Classification. IEEE World Congress on Computational Intelligence (WCCI), Barcelona, Spain, 2010.

Abstract
Support vector machines (SVMs) are now one of the most popular machine learning techniques for solving difficult classification problems. Their effectiveness depends on two critical design decisions: 1) mapping a decision problem into an n-dimensional feature space, and 2) choosing a kernel function that maps the n-dimensional feature space into a higher dimensional and more effective classification space. The choice of kernel functions is generally limited to a small set of well-studied candidates. However, the choice of a feature set is much more open-ended without much design guidance. In fact, many SVMs are designed with standard generic feature space mappings embedded a priori. In this paper we describe a procedure for using an evolutionary algorithm to design more compact non-standard feature mappings that, for a fixed kernel function, significantly improve the classification accuracy of the constructed SVM.
Bibliography

@inproceedings{KamathShehuDeJongWCCI2010,
title={Using evolutionary computation to improve {SVM} classification},
author={Kamath, Uday and Shehu, Amarda and De Jong, Kenneth},
booktitle={Evolutionary Computation (CEC), 2010 IEEE Congress on},
pages={1–8},
year={2010},
organization={IEEE}
}

C3: Uday Kamathg, Kenneth A De Jong*, and Amarda Shehu*. Selecting Predictive Features for Recognition of Hypersensitive Sites of Regulatory Genomic Sequences with an Evolutionary Algorithm. Genet and Evol Comp Conf (GECCO), Portland, Oregon, 2010, pg. 179-186.

Abstract
This paper proposes a method to improve the recognition of regulatory genomic sequences. Annotating sequences that regulate gene transcription is an emerging challenge in genomics research. Identifying regulatory sequences promises to reveal underlying reasons for phenotypic differences among cells and for diseases associated with pathologies in protein expression. Computational approaches have been limited by the scarcity of experimentally-known features specific to regulatory sequences. High-throughput experimental technology is finally revealing a wealth of hypersensitive (HS) sequences that are reliable markers of regulatory sequences and currently the focus of classification methods. The contribution of this paper is a novel method that combines evolutionary computation and SVM classification to improve the recognition of HS sequences. Based on experimental evidence that HS regions employ sequence features to interact with enzymes, the method seeks motifs to discriminate between HS and non-HS sequences. An evolutionary algorithm (EA) searches the space of sequences of different lengths to obtain such motifs. Experiments reveal that these motifs improve recognition of HS sequences by more than 10% compared to state-of-the-art classification methods. Analysis of these motifs reveals interesting insight into features employed by regulatory sequences to interact with DNA-binding enzymes.
Bibliography

@inproceedings{KamathDeJongShehuGECCO2010,
title={Selecting predictive features for recognition of hypersensitive sites of regulatory genomic sequences with an evolutionary algorithm},
author={Kamath, Uday and De Jong, Kenneth A and Shehu, Amarda},
booktitle={Proceedings of the 12th annual conference on Genetic and evolutionary computation},
pages={179–186},
year={2010},
organization={ACM}
}

C2: SM Richardson, Brian Olsong, JS Dymond, R Burns, S Chandrasegaran, Jef D Boeke, Amarda Shehu, and Joel S Bader*. Automated Design of Assemblable, Modular, Synthetic Chromosomes. Lecture Notes in Computer Science, Parallel Processing and Applied Mathematics (PPAM), Poland, 2009, vol. 6068, pg. 280-289.

Abstract
The goal of the Saccharomyces cerevisiae v2.0 project is the complete synthesis of a re-designed genome for baker’s yeast. The resulting organism will permit systematic studies of eukaryotic chromosome structure that have been impossible to explore with traditional gene-at-a-time experiments. The efficiency of chemical synthesis of DNA does not yet permit direct synthesis of an entire chromosome, although it is now feasible to synthesize multi-kilobase pieces of DNA that can be combined into larger molecules. Designing a chromosome-sized sequence that can be assembled from smaller pieces has to date been accomplished by biological experts in a laborious and error-prone fashion. Here we pose DNA design as an optimization problem and obtain optimal solutions with a parallelizable dynamic programming algorithm.
Bibliography

@inproceedings{RichardsonShehuBaderPPAM2009,
title={Automated design of assemblable, modular, synthetic chromosomes},
author={Richardson, Sarah M and Olson, Brian S and Dymond, Jessica S and Burns, Randal and Chandrasegaran, Srinivasan and Boeke, Jef D and Shehu, Amarda and Bader, Joel S},
booktitle={International Conference on Parallel Processing and Applied Mathematics},
pages={280–289},
year={2009},
organization={Springer}
}

C1: Amarda Shehu*. An Ab-initio Tree-based Exploration to Enhance Sampling of Low-energy Protein Conformations. Robotics: Science and Systems (RSS), Seattle, WA, 2009, pg. 31-39.

Abstract
This paper proposes a robotics-inspired method to enhance sampling of native-like protein conformations when employing only amino-acid sequence. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and the rugged energy surface of the protein conformational space. The contribution of this work is a novel two-layered method to enhance the sampling of geometrically-distinct low-energy conformations at a coarse-grained level of detail. The method grows a tree in conformational space reconciling two goals: (i) guiding the tree towards lower energies and (ii) not over-sampling geometrically-similar conformations. Discretizations of the energy surface and a low-dimensional projection space are employed to select more often for expansion low-energy conformations in under-explored regions of the conformational space. The tree is expanded with low-energy conformations through a Metropolis Monte Carlo framework that uses a move set of physical fragment configurations. Testing on sequences of seven small- to medium-size, structurally-diverse proteins shows that the method rapidly samples native-like conformations in a few hours on a single CPU. Analysis shows that computed conformations are good candidates for further detailed energetic refinements by larger studies in protein engineering and design.
Bibliography

@inproceedings{ShehuRSS2009,
title={An Ab-initio tree-based exploration to enhance sampling of low-energy protein conformations},
author={Shehu, Amarda},
booktitle={Robotics: Science and Systems},
pages={241–248},
year={2009},
organization={Seattle, Wash, USA}
}