Gene Finding With Hidden Markov Models
Seminar by Marina Alexandersson

Notes taken for the Monash CSSE Bioinformatics group by L.A. M.A. didn't give any actual % success rates etc., and did not seem to use any numerical measure of model complexity. HMM example based on two dice (which die is in use being the hidden variable). Intro' to genes, exons, introns [here].
Splice Site Prediction (for intron editing out)
Positions considered independent, i.e. a "profile" or "block".

Exon Length:
You might use mixtures of geometric distributions to flatten out a distribution, but they just don't do peaked distributions. Pity, because geometric distributions give a cost (negative log probability) that is linear in the length, which has some algorithmic advantages in dynamic programming algorithms (DPAs).
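The remark about geometric distributions can be checked numerically. The following is a minimal sketch, not anything shown in the talk; the function names and the parameter values (0.01, 0.2, 0.005) are made up for illustration. It shows that the cost of a geometric length grows by a constant per extra position, and that a mixture of geometrics never rises, so its mode stays at length 1.

```python
import math

def geometric_pmf(length, p):
    """P(L = length) for a geometric distribution on 1, 2, 3, ... with stop probability p."""
    return (1.0 - p) ** (length - 1) * p

def cost(length, p):
    """Negative log probability (in nats) of a given length."""
    return -math.log(geometric_pmf(length, p))

def mixture_pmf(length, weights, ps):
    """Mixture of geometric distributions; weights are assumed to sum to 1."""
    return sum(w * geometric_pmf(length, p) for w, p in zip(weights, ps))

# The cost grows by the same amount, -log(1 - p), for every extra position.
p = 0.01  # illustrative value only
increments = [cost(n + 1, p) - cost(n, p) for n in range(1, 6)]

# A mixture is flatter than a single geometric, but every component is
# decreasing, so the mixture is decreasing too: no peak away from length 1.
mix = [mixture_pmf(n, [0.5, 0.5], [0.2, 0.005]) for n in range(1, 200)]
never_rises = all(a >= b for a, b in zip(mix, mix[1:]))
```

The constant increment is what lets a DPA charge a fixed cost per step for staying in a state, without remembering how long it has been there.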
Generalised Hidden Markov Models (GHMM)

I'm not sure why this was called "generalised". Interesting to compare architecture with Glimmer / GlimmerM etc. 
Pair Hidden Markov Models (PHMM) i.e. alignment
[Diagram: states begin, M, X, Y and end, where M = match.]
Of course, 3 states for linear gap costs.
It looked better in powerpoint than ascii art,
but was topologically similar
to the 3-state mutation and generation machines in L.Allison, C.S.Wallace & C.N.Yee, Finite-State Models in the Alignment of Macro-Molecules, J.Molec.Evol. 35(1) 77-89, 1992.
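To make the link between three states and linear gap costs concrete, here is a minimal sketch of a three-state alignment DPA. It is an illustration under assumptions, not the speaker's model or the machine from the cited paper: "linear" is taken to mean a gap of length k costs gap_open + gap_extend * k, the roles given to X and Y (a character from one sequence only) are assumed, and the cost values are arbitrary.

```python
INF = float("inf")

def align_cost(s, t, mismatch=1.0, gap_open=2.0, gap_extend=0.5):
    """Minimum cost of a global alignment of s and t using three states.

    M: the last column pairs a character of s with a character of t.
    X: the last column consumes a character of s only (assumed role).
    Y: the last column consumes a character of t only (assumed role).
    A gap of length k costs gap_open + gap_extend * k.
    """
    n, m = len(s), len(t)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = 0.0 if s[i - 1] == t[j - 1] else mismatch
                M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            if i > 0:
                # Staying in X only pays the extension; entering X also pays gap_open.
                X[i][j] = min(M[i - 1][j] + gap_open, X[i - 1][j],
                              Y[i - 1][j] + gap_open) + gap_extend
            if j > 0:
                Y[i][j] = min(M[i][j - 1] + gap_open, Y[i][j - 1],
                              X[i][j - 1] + gap_open) + gap_extend
    return min(M[n][m], X[n][m], Y[n][m])
```

With a single state there is nowhere to record "already inside a gap", so each gap character would have to cost the same; the separate X and Y states are what allow an opening cost that is paid once.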
Algorithms
 
Viterbi on HalfPhat
Showed the lattice of states and finding the most probable path.
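As a small illustration of the lattice idea, here is a sketch of Viterbi on a toy two-dice HMM like the talk's introductory example, not on HalfPhat itself. The state names and all probabilities below are invented for the example, and the roll sequence is assumed to be non-empty.

```python
import math

STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
EMIT = {"fair": {face: 1 / 6 for face in range(1, 7)},
        "loaded": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5}}

def viterbi(rolls):
    """Return the most probable state path for a non-empty sequence of die rolls."""
    # lattice[t][s] = best log probability of any path ending in state s at time t
    lattice = [{s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for roll in rolls[1:]:
        prev = lattice[-1]
        column, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            column[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][roll])
            pointers[s] = best
        lattice.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: lattice[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Each of the T lattice columns looks at every pair of states, which is where an N^{2}.T running time for a plain HMM comes from.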
Generalised Pair Hidden Markov Models (GPHMM)
? ~ product of Half_Phat x alignment model, too big to draw.
DoublePhat: two-sequence alignment under a gene model.

Model   Time               Space
HMM     N^{2}.T            N.T
PHMM    N^{2}.T.U          N.T.U   (2 sequences)
GHMM    D^{2}.N^{2}.T      N.T
GPHMM   D^{4}.N^{2}.T.U    N.T.U   (2 sequences)

where N = # of states, D = max exon length, T = length of seq1, U = length of seq2.
Speed up for GPHMM: get a quick alignment, put a window around it, work in that area.
Seemed to be related to direct product of
the sequence (gene) machine (model) and
the alignment machine (model).
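To get a feel for the time and space terms quoted above, the following sketch just evaluates them for given sizes, ignoring constant factors. It assumes T and U are the lengths of the two sequences. The windowed estimate is a guess at what the speed-up buys: it simply replaces U by a hypothetical window width W, which the talk did not quantify.

```python
def costs(n_states, max_exon, t_len, u_len):
    """Leading-order time and space terms for each model, constants ignored."""
    N, D, T, U = n_states, max_exon, t_len, u_len
    return {
        "HMM":   {"time": N**2 * T,            "space": N * T},
        "PHMM":  {"time": N**2 * T * U,        "space": N * T * U},
        "GHMM":  {"time": D**2 * N**2 * T,     "space": N * T},
        "GPHMM": {"time": D**4 * N**2 * T * U, "space": N * T * U},
    }

def windowed_gphmm_time(n_states, max_exon, t_len, window):
    """Assumed effect of the window: only `window` cells per position of seq1."""
    return max_exon**4 * n_states**2 * t_len * window

# Small illustrative sizes, not figures from the talk.
example = costs(n_states=2, max_exon=3, t_len=10, u_len=10)
```

The D^{4} factor dominates quickly, which suggests why restricting the work to a window around a quick alignment matters for GPHMMs.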
Interesting to c.f. with: 
Could enrich the upstream model, i.e. do something with promoters.
Acknowledged: Simon Cawley, Lior Pachter, Terry Speed 
Nice talk. 