[MolBio]
From the bionet.biology.computational newsgroup:

Symposium on Evolutionary Trees

SYMPOSIUM ANNOUNCEMENT AND CALL FOR PAPERS

Estimating Large Scale Phylogenies: Biological, Statistical, and Algorithmic Problems

SPONSORS: DIMACS and University of Pennsilvania Program in Computational Biology
LOCATION: Princeton University
DATE: June 26-28, 1998
FORMAT: Paper presentations and posters. All papers for oral presentation must be submitted in full and they will be peer reviewed.

PAPER SUBMISSION DEADLINE: April 15, 1998. Please submit papers by mail or email (ps file/MS Word file only) to:

Junhyong Kim
Dept. of Biology
Yale University
165 Prospect st.
New Haven, CT 06511
(203)-432-9917
(203)-432-3854 (fax)
[email]

Co-organizers: Junhyong Kim, Tandy Warnow, and Ken Rice

INTRODUCTION

Biological organization is fundamentally based on an evolutionary history of bifurcating descent-with-modification. Phylogenetic estimation is the inference of this genealogical history from present day data. Phylogenetic trees, the graph representation of the genealogical history, play a central role in evolutionary biology and phylogenetic estimation techniques are being applied to a wide variety of computational biology problems.

The size of a phylogenetic estimation problem is measured by the number of taxa and the number of characters. Until recently, computational and data limitations kept most phylogenetic estimation problems to small numbers of taxa. But, the availability of computational resources and the influx of large molecular data sets are enabling researchers to tackle increasingly larger problems, and the analysis of large-scale data sets is rapidly becoming a central problem in phylogenetic biology.

Recent experimental evidence has established the existence of large trees that can be estimated accurately as well as those that are difficult to accurately estimate with reasonable numbers of characters. Some of these examples have suggested that taxon sampling (increasing the size of the estimation problem through the addition of taxa rather than characters) might lead to more easily estimated trees. Conversely, it has been argued that big trees are hard to infer for a variety of reasons: NP-hardness of the optimization problems, properties of the search space, inadequacy of the heuristics, and even possible inadequacy of the optimization criteria. Unfortunately, very little actual evidence is available to support any conjectures about how the performance of estimators scale with respect to the size of the phylogenetic problem. In addition, the question of scaling is itself confused by poorly delineated notions. For example, the size of the tree also involves the maximum amount of divergence (not only the number of taxa and characters) and measures of estimator performance have also not been standardly agreed upon.

The goal of this symposium is to precisely identify the key problems with respect to how the performance of phylogenetic estimators scale as with the size of the problem, and gather experimental and theoretical results addressing this problem.

FORMAT

The symposium will consist of four topic sessions with paper presentations followed by a panel discussion of invited experts. The four topics and some of the questions to be addressed in each session are:

Biological problems

1. What are the limits to sampling characters and taxa?
2. What are examples of very difficult problems?
3. What are the reasonable models of character evolution and tree shape?
4. What are the most important problems in systematics?
5. What can we say about evolutionary history from data other than rows and columns of homologous characters?

Empirical results

1. What do simulation studies tell us about performance of different methods and how they scale with the size of the problem?
2. What properties of the tree models affect accuracy and how do those properties scale?
3. Are there any methodological biases?
4. What can we say about performance under more realistic models of sequence evolution from the existing studies?
5. Is there a need to standardize experimental studies, perhaps through the establishment of a testbed of different model trees, methods, etc?

Algorithmic problems

1. What is the relationship between standard optimization problems (distance-based criteria, parsimony, etc) and estimating the topology of evolutionary trees? Which of the standard optimization criteria are best suited to obtaining highly accurate topology estimations, given bounds on the available sequence length?
2. How much of the difficulty is due to inadequate solution to the right NP-hard optimization problems?
3. Are there new optimization problems or approaches (not necessarily linked to optimization criteria) that are promising?
4. How good are the existing heuristics for solving the relevent optimization problems, and what new approaches might give better results on important optimization problems?
5. How should we evaluate performance of algorithms?
6. Are there ``algorithms engineering" issues which will make these methods less powerful, and how do we handle them?
7. Is it possible to design methods which can efficiently characterize all optimal and near-optimal trees, rather than just a single optima?

Statistical problems

1. What bounds can we obtain on the convergence rate of different methods?
2. How do various statistical properties of different methods scale with the size of the problem?
3. What is the relationship between estimating the whole tree versus some subset of the tree?
4. What is the distribution of specific tree characteristics such as smallest edge length, smallest diameter for quartet covering, steminess, etc. with respect to tree model sampling distribution?
5. Can we obtain accuracy bounded estimates (sacrificing resolution)?