MML, MDL, Minimum Encoding Length Inference

  • Scholarship available: New 2012 PhD scholarship on MML (computer science, machine learning, statistics): ``Rating and ranking sports players and teams using Minimum Message Length'' (and here) [and here, and here, and here], supervised by David Dowe.

    Welcome to David Dowe's minimum encoding length inference page. This page discusses primarily the information-theoretic inference methods of: Minimum Description Length (MDL) (Rissanen, 1978; Rissanen, 1989; Rissanen, 1999) and the earlier Minimum Message Length (MML) (Wallace and Boulton, Comp. J. 1968; Wallace and Freeman, 1987; Wallace and Dowe, 1999a; Wallace, 2005). Related earlier works were Solomonoff, 1964; Kolmogorov, 1965; and Chaitin, 1966.

    Minimum Message Length (MML) is an information-theoretic Bayesian principle of inductive inference. It was first published in C. S. Wallace and D.M. Boulton, Comp. J., 1968.
    The first MML papers were C. S. Wallace and D.M. Boulton, Comp. J., 1968; D.M. Boulton and C. S. Wallace, 1969; D.M. Boulton and C. S. Wallace, 1970; D.M. Boulton and C. S. Wallace, 1973; D.M. Boulton and C. S. Wallace, 1975; C. S. Wallace and D.M. Boulton, 1975; and David M. Boulton's 1975 Ph.D. thesis.


    [See also Ray Solomonoff (1926-2009) 85th memorial conference (Wedn 30 Nov - Fri 2 Dec 2011), 3rd Call for Papers, 2nd Call for Papers, 1st Call for Papers, Invited speakers, accepted papers.]


    Pieces of literature I particularly recommend include:
  • Wallace, C.S. (2005) [posthumous], Statistical and Inductive Inference by Minimum Message Length, Springer (Series: Information Science and Statistics), 2005, XVI, 432 pp., 22 illus., Hardcover, ISBN: 0-387-23795-X [table of contents, chapter headings and more]; and
  • D. L. Dowe, "Foreword re C. S. Wallace", Computer Journal, Vol. 51, No. 5 (Sept. 2008) [Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560; and
  • C.S. Wallace and D. L. Dowe, "Minimum Message Length and Kolmogorov complexity", Comp. J., Vol. 42, No. 4 (1999a), pp270-283 [which is also Chris Wallace's most cited work which is co-authored by a still active MML researcher] and also other articles in that special issue of the Comp. J. on Kolmogorov complexity; and
  • Comley, Joshua W. and D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P. Gru:nwald, I. J. Myung and M. A. Pitt (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press (MIT Press), April 2005, ISBN 0-262-07262-9 [Final camera ready copy was submitted in October 2003] [pp265-284 and pp285-294; p265, p266, p268, p268, p269, p270, p271, p272, p273, p274, p275, p276, p278, p278, p279, p280, p281, p282, p283, p284, p285, p286, p288, p288, p289, p290, p291, p292, p293, p294] (and other chapters in that book); and
  • Dowe, D.L., S. Gardner and G. Oppy (December 2007). "Bayes Not Bust! Why Simplicity is no problem for Bayesians", British Journal for the Philosophy of Science (BJPS): Abstract, full text, .pdf; doi:10.1093/bjps/axm033 . [accepted, 2006, earlier, near-final version of the paper. See also here.]; and
  • D. L. Dowe (2011 [was 2010]), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science - (HPS Volume 7) Philosophy of Statistics, Elsevier [ISBN: 978-0-444-51862-0 {ISBN 10: 0-444-51542-9 / ISBN 13: 978-0-444-51862-0}], pp901-982.

    Below are some frequently asked questions (FAQs):
    What is the difference between MML and MDL?
    What are the differences between MML and MDL?
    What is the similarity between MML and MDL?
    What are the similarities between MML and MDL?
    Can you compare MML and MDL, and give a comparison between MML and MDL?
    ``Answers'': Have a look at \cite[sec. 10.2]{Wallace2005} and
    \cite[sec. 11.4.3, pp272-273]{ComleyDowe2005} and
    \cite[sec. 6.7]{Dowe2010} and the references within these, and
    also have a look at Comp. J., Vol 42, No. 4, Wallace and Dowe (1999a), and anything else you can find.

    There are also links below to related topics and to people doing related research. Please e-mail me if you'd like to be included.


    MDL, MML and algorithmic information theory links
    Computer Journal special issue on Kolmogorov complexity and algorithmic information theory, Volume 42, Issue 4: 1999. This issue features contributions by many authors - including articles by Rissanen, by Solomonoff, by Wallace and Dowe, and by others; and responses and rejoinders on MDL and MML by Rissanen and by Wallace and Dowe respectively.

    Calendar of Machine Learning, RUUG, CSSE, Monash Univ. MML talks and
    CSE455 Minimum Message Length (formerly Learning and Prediction II: MML Data Mining [formerly before that CSC423 Learning and Prediction course]), CSSE, Monash Univ.
    Information, Statistics and Induction in Science (ISIS) conference, Aug. 1996.
    "Minimum Message Length and Kolmogorov complexity", Comp. J., Vol 42, No. 4 (1999a), pp270-283, by C. Wallace and D. Dowe [which is the Computer Journal's most downloaded ``full text as .pdf'' article - see, e.g., here].
    "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", by J. W. Comley and D.L. Dowe; Chapter 11 (pp265-294) in P. Grunwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9. {This is about Generalised Bayesian nets (or even the special case of hybrid Bayesian nets), generalising MML Bayesian nets or MML Bayesian networks or MML Bayes nets; and it deals with a mix of both continuous and discrete variables. (See also Comley and Dowe (2003), .pdf.)}


    Related links and, below, People
    Artificial Intelligence (AI) on the Web: Machine Learning and Artificial Intelligence Resources.
    Bayesian Net Repository - see also MML Bayes Nets: Comley & Dowe (2005) and (2003, .pdf).
    Bayesian Statisticians worldwide (was here and was previously here), a link re Rev. Thomas Bayes, a link to a small drawing of Rev. Thomas Bayes and (his posthumously published) "An Essay towards solving a Problem in the Doctrine of Chances". MML uses Bayesian information theory.
    Bayesianiam, compression and intelligence [Dowe and Hajek (1997, 1998)].
    Boosting Research Site: boosting.org.
    Clustering, mixture modelling (or finite mixture modeling) and/or unsupervised learning using MML: Snob - see, e.g., Wallace and Dowe (2000) (was here).
    Compression and intelligence [Dowe and Hajek (1997, 1998)], and other compression pointers.
    Computational complexity links.
    Computational Learning Theory (CoLT) page: theorists and resources, and Thomas Zeugmann's COLTBIB.
    CSE455, 4th Year Hons course entitled Minimum Message Length, Monash University.
    Data collections and Vlad's KD and data mining page.
    Kernel machines WWW site: www.kernel-machines.org; an Alex Smola SVM talk; and Support Vector Machine mailing list.
    Machine Learning and CBR people and Mach. Learning Resources (maintained by David Aha). Online Mach. Learn. Resources.
    Nit - a nit is a unit of information (1 nit = log2(e) bits, 1 bit = ln(2) nits) going back at least as far as Boulton and Wallace (1970). See sec. 11.4.1, p271 of Comley and Dowe (2005) (MIT Press, pp265-294) for a very concise history.
    Occam's razor links: Minimum Encoding Length Inference is an operational form of Ockham's razor.
    Probabilistic prediction, probabilistic prediction competition and Gaussian prediction competition for Australian football, and history.
    Statistics: University of Florida Department of Statistics's Statistics page; and AskDrMath's Prob/Stat and Statistics.
    Support Vector Machine mailing list; and Kernel machines WWW site, www.kernel-machines.org. David Albrecht's SVM site, another, and another; and Tan and Dowe (2004, .pdf) on MML SVMs.

    People interested in MDL, MML, algorithmic probability and algorithmic information theory
    [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
    A
    Yudi Agusta.
    Akaike Information Criterion (AIC): To compare MML and AIC, see, e.g., here.
    David Albrecht (and his SVM site).
    Lloyd Allison's MDL (stochastic complexity) and MML pages.
    Christoph Arndt's information theory notes and "Information measures" (in English) book.
    Azat Arslanov
    B
    Andrew Barron
    Rohan Baxter
    Rev. Thomas Bayes (dec. 1761), a small drawing of Rev. Thomas Bayes, a German article on him; and Bayesian Statisticians worldwide. MML uses Bayesian information theory.
    Bayesian Nets using Minimum message length (MML).
    Adrian Bickerstaffe.
    Olivier Bousquet's Kolmogorov complexity resources, Kolmogorov complexity mail list and list archive.
    David M. Boulton
    Matthew Brand
    Peter G. Bryant
    C
    Cristian S. Calude
    CDMS.
    Greg Chaitin
    Bertrand Clarke
    John G. Cleary
    Clustering, mixture modelling (or finite mixture modeling) and/or unsupervised learning using MML: Snob - see, e.g., Wallace and Dowe (2000).
    Josh Comley and (with D. L. Dowe) "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", Chapter 11 (pp265-294) in P. Grunwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9. {This is about Generalised Bayesian nets, generalising MML Bayesian nets or MML Bayesian networks or MML Bayes nets (or generalised directed graphical models, or generalised MML directed graphical models); and it deals with a mix of both continuous and discrete variables. (See also Comley and Dowe (2003), .pdf.)}
    Consistency: The Wallace-Freeman (1987) version of MML has shown to be statistically consistent on the Neyman-Scott problem - see Dowe & Wallace (1997). We have conjectured (or asked the question) [Dowe, Baxter, Oliver & Wallace (1998, sec. 6, p93), Edwards & Dowe(1998, sec. 5.3, p104), Wallace & Dowe (1999a, p282), Wallace & Dowe (2000, sec. 5, p78), Comley & Dowe (2005, sec. 11.3.1, p269), and DoweGardnerOppy2007 (sec. 8 (Conclusion), currently(?) p52)] whether or not only MML and closely related Bayesian methods can, in general, infer fully-specified models with both statistical consistency and invariance.
    Thomas M. Cover
    D
    Ian Davidson.
    A. Phil Dawid
    Ivanoe De Falco (and colleagues)'s papers and Genetic Programming Approach to Kolmogorov Complexity page.
    Trevor Dix
    Byron Dom's publications.
    David Dowe and (with C. S. Wallace): "Minimum Message Length and Kolmogorov complexity" Comp. J., Vol 42, No. 4 (1999a), pp270-283 [this article is the Computer Journal's most downloaded ``full text as .pdf'' - see, e.g., here]; and (with J. W. Comley) "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", Chapter 11 (pp265-294) in P. Grunwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9; and D. Dowe publications (or other D. Dowe publications).
    Mike Dowman.
    E
    Bruce Edmonds's publications and Complexity Related Links.
    Russell Edwards, Astronomical Institute "Anton Pannekoek", Faculty of Mathematics, Computer Science, Physics and Astronomy (WINS), University of Amsterdam.
    Mark Ellison (was here).
    Vladimir Estivill-Castro
    F
    Graham Farr's publications.
    Ronald Aylmer Fisher (1890-1962): While R. A. Fisher is generally considered far from Bayesian, both Rissanen's MDL work (e.g., 1996) and Wallace's (Bayesian) MML work (e.g., with Freeman, 1987) use the notion of Fisher information. Image of R. A. Fisher (was here).
    Leigh Fitzgibbon's publications.
    Simon Fox
    Roberto Fraile (and here).
    Pasi Fra:nti.
    Norio Fukuda
    G
    Peter Gacs
    Alex Gammerman
    Andres Garcia
    Russell Greiner
    Peter Grünwald (Peter Grunwald, Peter Gru:nwald) and here; and (MDL source book) ``Advances in Minimum Description Length'', April 2005, MIT Press.
    H
    Alan Hajek
    Brian Hanlon
    Hongxing He
    José Hernàndez i Orallo (and in English, Jose Hernandez-Orallo).
    Phil Hingston
    Marcus Hutter
    I
    Information theory: Min. Enc. Length Inference, MDL and MML are based in information theory - as is also AIC. See also quantum info. theory.
    Invariance: Strict MML and many of its approximations - such as Wallace & Freeman (JRSS, 1987) and MMLD (or I1D) - are statistically invariant. See, e.g., Dowe, Gardner & Oppy2007 (page ?7? and elsewhere) and Comley & Dowe (2005) (sec. 11.3 [pp268, 269 and 270]). See also invariance and consistency conjecture.
    Laurence Irlicht
    J
    Bruno Janvier.
    Tao Jiang
    Murray Jorgensen
    K
    Paul Kabaila
    Kenichi Kanatani's recent publications.
    Aram Karalic
    Gary Warren King
    Andrei N. Kolmogorov (1903-1987) sites: http://www.Kolmogorov.com/Kolmogorov.html, a c.v. (in Russian), http://www.pms.ru/kolmogorov (in Russian), exploratorium.edu, dcs.st-and.ac.uk and (by Paul Vitanyi) a biography.
    Petri Kontkanen
    L
    Aaron Lanterman (was here).
    Thomas Lee's publications and technical reports.
    Shane Legg
    Leonid Levin
    Mengxiang Li
    Ming Li
    Xiaodong Li
    M
    Dean McKenzie's publications.
    Enes Makalic.
    Steve Maybank (and here).
    www.MDL-research.org, people, reading, demonstrations and related topics; and (MDL source book) ``Advances in Minimum Description Length'', April 2005, MIT Press.
    Paul Ming's Virtual Turing Machine (VTM) and Virtual Turing Machine 2 (VTM2).
    Minimum Description Length (MDL) and comparisons with MML (on pp270-283 and elsewhere) in Comp. J., Vol 42, No. 4, 1999.
    Mixture modelling (or finite mixture modeling), clustering and/or unsupervised learning using MML: Snob - see, e.g., Wallace and Dowe (2000).
    MML and the "Turing test".
    Suzie Molloy.
    Robert J. Munro
    Petri Myllymäki
    N
    Julian R. Neil
    Doug Newlands
    O
    Ockham, William of (circa. 1280 or 1285 till circa. 1347 or 1349, apparently 10th April 1349), Ockham's razor and town of Ockham.
    R. D. Ogden, Computer Science Department, Southwest Texas State University, San Marcos, TX 78666, USA; ro01@swt.edu.
    Jon Oliver
    Graham Oppy
    P
    Jon Patrick.
    Philosophy, Stanford Encyclopaedia of.
    Q
    Guoqi Qian
    Quantum information theory - Quantum Computer Technology (U Melb, U NSW, U Qld).
    R
    Anand Venkataraman (was Anand (Venkt) Raman).
    Fengrong Ren
    Jorma Rissanen (and publications, although links seems dead) and some sample publications.
    Ricardo Rocha
    Teemu Roos
    Juho Rousu.
    S
    Pritika Sanghi.
    J. G. Sanjayan
    Ju:rgen Schmidhuber
    Stanley L. Sclove's journal publications and working papers.
    Claude Shannon ("father of information theory")'s obituary (and another obituary), 1916-2001.
    Alexander Shen
    Tomi Silander
    Jatinder Singh.
    Snob (software): mixture modelling (to infer a finite mixture model) using MML.
    Ray Solomonoff's biography, publications and obituary (New York Times, Wedn 13/Jan/2010): online and scanned - and Solomonoff 85th memorial conference (Wedn 30 Nov - Fri 2 Dec 2011) : 3rd Call for Papers, 2nd Call for Papers, 1st Call for Papers, Invited speakers, accepted papers.
    David Suter.
    T
    Peter Jing Tan; and P. J. Tan and D. L. Dowe (2003), "MML Inference of Decision Graphs with Multi-Way Joins and Dynamic Attributes", Proc. 16th Australian Joint Conference on Artificial Intelligence (AI'03), Perth, Australia, 3-5 Dec. 2003, Published in Lecture Notes in Artificial Intelligence (LNAI) 2903, Springer-Verlag, pp269-281.
    Hiroshi Tanaka
    Kai Ming Ting
    Henry Tirri
    Peter Tischer
    Phil H. S. Torr (was here).
    John Tromp's Home Page (was here).
    Alan Turing (1912-1954), developer of (Universal) Turing Machines, among many other things. Sites maintained by Andrew Hodges and dcs.st-and.ac.uk; The Turing archive for the history of computing (maintained by Jack Copeland and Gordon Aston); and image of Alan Turing.
    A.M. Turing's (1950) "Computing Machinery and Intelligence", Mind, 59, pp433-460 - which is the paper which introduced the imitation game or "Turing Test".
    Turing Machine simulator: http://wap03.informatik.fh-wiesbaden.de/weber1/turing/tm.html and documentation.
    Virtual Turing Machine (VTM) and Virtual Turing Machine 2 (VTM2), by Paul Ming.
    The "Turing test" and MML.
    Charles Twardy and Bayesian Models for Search & Rescue.
    U
    Unsupervised learning, mixture modelling (or finite mixture modeling) and/or clustering using MML: Snob - see, e.g., Wallace and Dowe (2000).
    Aleksey M. Urmanov.
    William Uther (was here).
    V
    Farshid Vahid
    Manuela Veloso.
    Brani Vidakovic.
    Murli Viswananthan (and here)'s publications.
    Paul Vitanyi's Publications & Areas of Interest, Kolmogorov complexity publications, selected recent papers and publications.
    Volodya Vovk
    V. V. V'yugin
    W
    Chris Wallace's publications, including "Minimum Message Length and Kolmogorov complexity" (with D. L. Dowe), Comp. J., Vol 42, No. 4 (1999a), pp270-283 [this article is the Computer Journal's most downloaded ``full text as .pdf'' - see, e.g., here]; and including his (posthumous) book: Wallace, C.S. (2005) [posthumous], Statistical and Inductive Inference by Minimum Message Length, Springer (Series: Information Science and Statistics), 2005, XVI, 432 pp., 22 illus., Hardcover, ISBN: 0-387-23795-X [table of contents, chapter headings (was here) and more];
    and also including
    Chris Wallace MML publications, 1990- and also Chris Wallace MML publications, 1968-1991. See also D. L. Dowe, "Foreword re C. S. Wallace", Computer Journal, Vol. 51, No. 5 (Sept. 2008) [Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560 (and here).
    Ian H. Witten
    Y
    Kenji Yamanishi
    Bin Yu's publications.
    Z
    Alexander K. Zvonkin


    Miscellaneous, other, links
    Chris Wallace (1933-2004) (developer of MML in 1968) and the Computer J's Christopher Stewart WALLACE (1933-2004) memorial special issue [Vol. 51, No. 5 (Sept. 2008)],
    Bayesian Nets using Minimum message length (MML),
    clustering and mixture modelling,
    data repositories,
    decision trees using Minimum message length (MML),
    Occam's razor (Ockham's razor),
    Snob (program for MML clustering and mixture modelling) to infer finite mixture models,
    (econometric) time series using MML,
    medical research,
    a probabilistic sports prediction competition (and further reading on probabilistic scoring),
    chess and game theory research;
    Feeding the world (TheHungerSite), TheRainforestSite, "do-goody"/"do-goody stuff, improving the world and saving the planet".

  • Please e-mail me if you would like to know more.
  • This page is http://www.csse.monash.edu.au/~dld/MELI.html.
    Copyright David L. Dowe, Monash University, Australia, 31 Jan. 2000, etc.
    Copying is not permitted without expressed permission from David L. Dowe.