bionet.biology.computational
newsgroup: Estimating Large Scale Phylogenies: Biological, Statistical, and Algorithmic Problems
SPONSORS: DIMACS and University of Pennsylvania Program in Computational Biology
PAPER SUBMISSION DEADLINE:
April 15, 1998. Please submit papers by mail
or email (
Junhyong Kim
Dept. of Biology
Yale University
165 Prospect St.
New Haven, CT 06511
(203)-432-9917
(203)-432-3854 (fax)
[email]
Co-organizers: Junhyong Kim, Tandy Warnow, and Ken Rice
Biological organization is fundamentally based on an evolutionary history of bifurcating descent-with-modification. Phylogenetic estimation is the inference of this genealogical history from present-day data. Phylogenetic trees, the graph representation of the genealogical history, play a central role in evolutionary biology, and phylogenetic estimation techniques are being applied to a wide variety of computational biology problems.
The size of a phylogenetic estimation problem is measured by the number of taxa and the number of characters. Until recently, computational and data limitations kept most phylogenetic estimation problems to small numbers of taxa. However, the availability of computational resources and the influx of large molecular data sets are enabling researchers to tackle increasingly large problems, and the analysis of large-scale data sets is rapidly becoming a central problem in phylogenetic biology.
Recent experimental evidence has established the existence of large trees that can be estimated accurately, as well as large trees that are difficult to estimate accurately with reasonable numbers of characters. Some of these examples have suggested that taxon sampling (increasing the size of the estimation problem through the addition of taxa rather than characters) might lead to more easily estimated trees. Conversely, it has been argued that big trees are hard to infer for a variety of reasons: NP-hardness of the optimization problems, properties of the search space, inadequacy of the heuristics, and even possible inadequacy of the optimization criteria. Unfortunately, very little actual evidence is available to support any conjectures about how the performance of estimators scales with the size of the phylogenetic problem. In addition, the question of scaling is itself confused by poorly delineated notions. For example, the size of the tree involves the maximum amount of divergence as well as the number of taxa and characters, and there are no agreed-upon standard measures of estimator performance.
The goal of this symposium is to identify precisely the key problems concerning how the performance of phylogenetic estimators scales with the size of the problem, and to gather experimental and theoretical results addressing these problems.
The symposium will consist of four topic sessions, each with paper presentations followed by a panel discussion by invited experts. The four topics and some of the questions to be addressed in each session are: