## Bayesian Treatment of Incomplete Discrete Data applied to Mutual Information and Feature Selection

Authors: Marcus Hutter and Marco Zaffalon (2003)
Comments: 11 pages
Subj-class: Artificial Intelligence; Learning
ACM-class: G.3; G.1.2
Reference: Proceedings of the 26th German Conference on Artificial Intelligence (KI-2003), pages 396-406
Report-no: IDSIA-15-03 and cs.LG/0306126

Keywords: incomplete data, Bayesian statistics, expectation maximization, global optimization, mutual information, cross entropy, Dirichlet distribution, second-order distribution, credible intervals, expectation and variance of mutual information, missing data, robust feature selection, filter approach, naive Bayes classifier.

Abstract: Given the joint chances of a pair of random variables, one can compute quantities of interest, such as the mutual information. The Bayesian treatment of unknown chances involves computing, from a second-order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two methods above are well established but typically kept separate. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode, and (co)variance of the chances and of the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property that EM converges to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
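To make the quantities in the abstract concrete, the following minimal Python sketch computes the leading-order posterior mean and variance of the mutual information under a symmetric Dirichlet posterior over the joint chances (the probabilities pi_ij), assuming complete data. The O(1/n) bias and variance terms follow the standard expansions for Dirichlet posteriors; the function names `mi_mean_var` and `keep_feature`, the choice of prior, and the z-score filter rule are illustrative assumptions, and the paper's EM treatment of incomplete data and its exact higher-order approximations are not reproduced here.

```python
import numpy as np

def mi_mean_var(counts, prior=1.0):
    """Leading-order mean and variance of the mutual information I(X;Y)
    under a Dirichlet posterior over the joint chances pi_ij.

    counts : (r, s) array of co-occurrence counts n_ij (complete data).
    prior  : symmetric Dirichlet parameter added to every cell (assumed).

    Returns (mean, var); both approximations carry O(1/n^2) error.
    """
    n_ij = np.asarray(counts, dtype=float) + prior
    n = n_ij.sum()
    pi = n_ij / n                                   # posterior-mean chances
    pi_i = pi.sum(axis=1, keepdims=True)            # marginal chances of X
    pi_j = pi.sum(axis=0, keepdims=True)            # marginal chances of Y

    # log(pi_ij / (pi_i * pi_j)), with empty cells contributing zero
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(pi > 0, np.log(pi / (pi_i * pi_j)), 0.0)

    i_hat = float((pi * log_term).sum())            # point-estimate MI (nats)
    r, s = pi.shape
    mean = i_hat + (r - 1) * (s - 1) / (2.0 * n)    # first-order bias term
    var = (float((pi * log_term**2).sum()) - i_hat**2) / n
    return mean, max(var, 0.0)

def keep_feature(counts, z=2.0):
    """Hypothetical filter rule: keep a feature when the posterior mean of
    I minus z standard deviations is still positive."""
    mean, var = mi_mean_var(counts)
    return mean - z * np.sqrt(var) > 0.0

if __name__ == "__main__":
    table = np.array([[30, 5], [4, 25]])   # toy 2x2 co-occurrence counts
    print(mi_mean_var(table))              # posterior mean and variance of I
    print(keep_feature(table))             # True: MI credibly above zero
```

A filter of this kind, using the distribution of the mutual information rather than its empirical point estimate alone, is the flavor of robust feature selection the abstract refers to; the specific decision rule above is only one plausible instantiation.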


@InProceedings{Hutter:03mimiss,
  author    = "Marcus Hutter and Marco Zaffalon",
  title     = "Bayesian Treatment of Incomplete Discrete Data applied to Mutual Information and Feature Selection",
  booktitle = "Proceedings of the 26th German Conference on Artificial Intelligence (KI-2003)",
  editor    = "A. G{\"u}nter and R. Kruse and B. Neumann",
  series    = "Lecture Notes in Computer Science",
  volume    = "2821",
  pages     = "396--406",
  publisher = "Springer",
  address   = "Heidelberg",
  year      = "2003",
  http      = "http://www.hutter1.net/ai/mimiss.htm",
  url       = "http://arxiv.org/abs/cs.LG/0306126",
  ftp       = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-15-03.ps.gz",
  keywords  = "incomplete data, Bayesian statistics, expectation maximization, global optimization, mutual information, cross entropy, Dirichlet distribution, second-order distribution, credible intervals, expectation and variance of mutual information, missing data, robust feature selection, filter approach, naive Bayes classifier",
  abstract  = "Given the joint chances of a pair of random variables, one can compute quantities of interest, such as the mutual information. The Bayesian treatment of unknown chances involves computing, from a second-order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two methods above are well established but typically kept separate. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode, and (co)variance of the chances and of the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property that EM converges to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.",
}
