Participants: Alexander, Allen, Cho, Furdui, John, L. Miller, Muday, Norris, Pease, Poole, Salsbury, Turkett, Zhang
Questions: This group brings together biologists and biochemists who generate original data sets with computer scientists and mathematicians who build models to find patterns and relationships in those data sets. The group's current foci are (a) computational modeling of the networks of molecules that facilitate communication within and between cells and (b) motif analysis of DNA and protein sequences. All of this work involves both designing algorithms for biological data sets and applying those algorithms to the data.
Work in network modeling (Jim Norris, David John, Gloria Muday) focuses on building gene and protein interaction models from transcript abundance data by combining Bayesian statistics with algorithmic techniques to produce specialized machine learning algorithms. These algorithms are based on examining and manipulating directed acyclic graphs (DAGs). The Norris-Patton likelihood computes the likelihood of one or more transcript abundance data sets, R1, …, Rk, given a DAG: f(R1, …, Rk | DAG). This likelihood can be applied in several settings (hierarchical and independent) and under several analysis paradigms (cotemporal, next-state one-step, and next-state one-and-two-step). The team has incorporated the Norris-Patton likelihood into specialized forms of the Metropolis-Hastings algorithm and into genetic algorithms. Ongoing research aims to improve both the accuracy of the resulting models and their ability to scale to larger numbers of genes.
Work in motif analysis (William Turkett, Leslie Poole, Gloria Muday, Kim Nelson) focuses on algorithms that find sequence motifs accurately and efficiently. This work includes an efficient enumerative algorithm for searching the short sequence regions around post-translational modification sites, as well as k-mer-based algorithms for discovering motifs that distinguish between subgroups within a protein family.
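The Metropolis-Hastings search over DAGs mentioned in the network-modeling work can be illustrated with a minimal Python sketch. It is not the group's implementation: the Norris-Patton likelihood is not reproduced here, so the hypothetical `log_score` argument stands in for log f(R1, …, Rk | DAG), and the proposal move (toggle one directed edge, reject any move that creates a cycle) is an assumed, deliberately simple choice. Because that toggle proposal is symmetric, the acceptance probability reduces to min(1, exp(new score - old score)). The function names and the toy score in the demo are illustrative only.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple

Edge = Tuple[int, int]      # (parent, child)
Graph = FrozenSet[Edge]     # a directed graph on nodes 0..n-1 as an edge set


def is_acyclic(n: int, edges: Graph) -> bool:
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [v for v in range(n) if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == n  # every node removed means no cycle remained


def metropolis_hastings_dag(
    n: int,
    log_score: Callable[[Graph], float],
    iterations: int,
    seed: int = 0,
) -> Tuple[Graph, float]:
    """Run a Metropolis-Hastings chain over DAGs targeting exp(log_score(G)).

    log_score is a hypothetical stand-in for a log-likelihood such as
    log f(R1, ..., Rk | DAG). Returns the highest-scoring DAG visited.
    Requires n >= 2.
    """
    rng = random.Random(seed)
    current: Graph = frozenset()  # start from the empty graph
    current_score = log_score(current)
    best, best_score = current, current_score
    for _ in range(iterations):
        # Symmetric proposal: pick an ordered pair (i, j), i != j, and toggle it.
        i, j = rng.sample(range(n), 2)
        edge = (i, j)
        proposal = current - {edge} if edge in current else current | {edge}
        if not is_acyclic(n, proposal):
            continue  # cyclic graphs have zero target mass: reject, stay put
        proposal_score = log_score(proposal)
        delta = proposal_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    # Toy score (illustration only): penalize distance from a known 3-gene DAG.
    target = frozenset({(0, 1), (1, 2)})
    toy_log_score = lambda g: -10.0 * len(g ^ target)
    best, _ = metropolis_hastings_dag(3, toy_log_score, iterations=2000, seed=1)
    print(sorted(best))  # prints [(0, 1), (1, 2)]
```

In practice the chain's visited graphs (not just the best one) would be collected to estimate posterior edge probabilities; richer moves such as edge reversal need a Hastings correction if they make the proposal asymmetric.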
Earlier work from this group includes methods for robust clustering of replicate microarray data sets and methods for searching known network models for gene/protein subnetworks that show significant changes in gene expression.
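One common way to frame a subnetwork search of this kind is sketched below as an assumption, not as the group's published method: each gene carries a z-score for its expression change, and a connected subnetwork is grown greedily from a seed gene, adding whichever neighbor most increases an aggregate score (sum of member z-scores divided by the square root of the subnetwork size). The adjacency-list network, the z-scores, and the names `aggregate_score` and `greedy_subnetwork` are hypothetical.

```python
import math
from typing import Dict, List, Set


def aggregate_score(genes: Set[str], z: Dict[str, float]) -> float:
    """Sum of member z-scores divided by sqrt(size), rewarding coherent change."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))


def greedy_subnetwork(
    network: Dict[str, List[str]], z: Dict[str, float], seed_gene: str
) -> Set[str]:
    """Grow a connected subnetwork from seed_gene, adding the neighbor that most
    improves aggregate_score, until no neighbor improves it.

    Assumes every gene reachable in `network` has an entry in `z`.
    """
    members = {seed_gene}
    best = aggregate_score(members, z)
    while True:
        # Candidate genes adjacent to the current subnetwork.
        frontier = {nb for g in members for nb in network.get(g, []) if nb not in members}
        candidate, candidate_score = None, best
        for nb in sorted(frontier):  # sorted for deterministic tie-breaking
            score = aggregate_score(members | {nb}, z)
            if score > candidate_score:
                candidate, candidate_score = nb, score
        if candidate is None:
            return members  # no neighbor improves the score
        members.add(candidate)
        best = candidate_score


if __name__ == "__main__":
    # Hypothetical undirected network (adjacency lists) and per-gene z-scores.
    network = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
    z = {"A": 2.0, "B": 1.5, "C": 2.5, "D": -1.0}
    print(sorted(greedy_subnetwork(network, z, "A")))  # prints ['A', 'B', 'C']
```

Greedy growth is fast but can stop at a local optimum; stochastic searches such as simulated annealing are a common alternative for the same scoring function.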
Technology: A number of software tools have been developed, many as the result of master's thesis work by graduate students in the Departments of Mathematics and Computer Science. Algorithms are developed and run on high-performance computing resources provided by the Department of Mathematics, the Department of Computer Science, and the Wake Forest University DEAC cluster.
Emphasis Group Activities: This group evolved from a long-running series of weekly Computational Modeling of Signaling Networks meetings, which proved a fruitful forum for collaboration and communication, and it now hosts bi-weekly meetings. Activities at these meetings include research updates from group members, readings of relevant papers, and peer review of preliminary ideas and data. In addition, the group promotes educational activities in bioinformatics and computational biology at the University and helps bring guest speakers to campus through departmental seminar series. Many members of this group also participate in the monthly Structural and Computational Biophysics seminars. Activities of this subgroup are currently organized by William Turkett.
Implications: Understanding cell signaling networks has been described as one of the top unanswered questions in modern science: high-throughput data are being generated, but proven techniques for organizing these data remain limited. Modeling algorithms can suggest pathways that may eventually provide targets for perturbation and intervention, with potential outcomes such as better crop yields and improved human health. The results of motif analysis will improve our understanding of the protein functional landscape.