Robust Feature Selection using Distributions of Mutual Information
Keywords: Robust feature selection, naive Bayes classifier, mutual information,
cross entropy, Dirichlet distribution, second-order distribution,
expectation and variance of mutual information.
Abstract: Mutual information is widely used in artificial intelligence, in a
descriptive way, to measure the stochastic dependence of discrete random
variables. To address questions such as the reliability of the
empirical value, one must consider sample-to-population inferential
approaches. This paper deals with the distribution of mutual information, as
obtained in a Bayesian framework by a second-order Dirichlet prior
distribution. The exact analytical expression for the mean and an
analytical approximation of the variance are reported. Asymptotic
approximations of the distribution are proposed. The results are applied to
the problem of selecting features for incremental learning and
classification with the naive Bayes classifier. A fast, newly defined method
is shown to outperform the traditional approach based on empirical mutual
information on a number of real data sets. Finally, a theoretical
development is reported that allows the above methods to be extended to
incomplete samples in an easy and effective way.
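For orientation, the mean and variance mentioned in the abstract have compact
closed forms. The following is a sketch based on the companion derivation of
the distribution of mutual information, assuming cell counts n_{ij} that
already include the Dirichlet prior's pseudo-counts, with row sums n_{i+},
column sums n_{+j}, total n, and digamma function \psi:

\[ E[I] \;=\; \sum_{ij}\frac{n_{ij}}{n}\Big[\psi(n_{ij}+1)-\psi(n_{i+}+1)-\psi(n_{+j}+1)+\psi(n+1)\Big], \]
\[ \mathrm{Var}[I] \;\approx\; \frac{K-J^{2}}{n}, \qquad
   J=\sum_{ij}\frac{n_{ij}}{n}\log\frac{n_{ij}\,n}{n_{i+}\,n_{+j}}, \qquad
   K=\sum_{ij}\frac{n_{ij}}{n}\Big(\log\frac{n_{ij}\,n}{n_{i+}\,n_{+j}}\Big)^{2}. \]

A runnable numerical sketch of these quantities follows the table of
contents below.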
Table of Contents
- Introduction
- Distribution of Mutual Information
- Feature Selection
- Experimental Analysis
- Extensions
- Conclusions
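To make the quantities above concrete, here is a minimal Python sketch (not
the authors' implementation) that computes the posterior mean and
leading-order variance of mutual information from a class/feature
contingency table, then applies a Gaussian-approximation inclusion rule in
the spirit of the paper's credible-interval filters. The function names, the
uniform prior pseudo-count, and the threshold/confidence values are
illustrative assumptions.

import numpy as np
from scipy.special import psi  # digamma function
from scipy.stats import norm

def mi_posterior_mean_var(counts, prior=1.0):
    """Posterior mean and leading-order variance of mutual information
    under a Dirichlet prior over the joint cell probabilities.
    `counts` is an r x s contingency table of observed counts; `prior`
    is the pseudo-count added to every cell (an illustrative choice)."""
    n_ij = np.asarray(counts, dtype=float) + prior
    n = n_ij.sum()
    n_i = n_ij.sum(axis=1, keepdims=True)   # row sums n_{i+}
    n_j = n_ij.sum(axis=0, keepdims=True)   # column sums n_{+j}

    # Exact posterior mean:
    # sum_ij (n_ij/n) [psi(n_ij+1) - psi(n_i+1) - psi(n_j+1) + psi(n+1)]
    mean = np.sum(n_ij / n * (psi(n_ij + 1) - psi(n_i + 1)
                              - psi(n_j + 1) + psi(n + 1)))

    # Leading-order variance (K - J^2)/n, with J and K the first two
    # moments of log(n_ij * n / (n_i * n_j)) under the cell frequencies.
    log_term = np.log(n_ij * n / (n_i * n_j))
    J = np.sum(n_ij / n * log_term)
    K = np.sum(n_ij / n * log_term ** 2)
    var = (K - J ** 2) / n
    return mean, max(var, 0.0)  # guard against tiny negative rounding

def include_feature(counts, eps=0.003, confidence=0.95):
    """Keep a feature if, under a Gaussian approximation to the MI
    posterior, P(I > eps) >= confidence. This rule is a hypothetical
    stand-in for the paper's filters; eps and confidence are arbitrary."""
    mean, var = mi_posterior_mean_var(counts)
    if var == 0.0:
        return mean > eps
    return norm.sf(eps, loc=mean, scale=np.sqrt(var)) >= confidence

# Toy usage: a 2x2 class/feature table of counts.
table = [[30, 10],
         [12, 28]]
print(mi_posterior_mean_var(table))
print(include_feature(table))

With few samples the posterior variance is large, so a feature must show
clearly non-zero mutual information to pass; as n grows the variance shrinks
as 1/n and the rule approaches the traditional empirical-MI threshold test.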
BibTeX Entry
@InProceedings{Hutter:02feature,
author = "Marco Zaffalon and Marcus Hutter",
title = "Robust Feature Selection using Distributions of Mutual Information",
year = "2002",
pages = "577--584",
booktitle = "Proceedings of the 18th International Conference on
Uncertainty in Artificial Intelligence (UAI-2002)",
editor = "A. Darwiche and N. Friedman",
publisher = "Morgan Kaufmann",
address = "San Francisco, CA.",
report = "IDSIA-08-02 and cs.AI/0206006",
url = "http://www.hutter1.net/ai/feature.htm",
url2 = "http://arxiv.org/abs/cs.AI/0206006",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-08-02.ps.gz",
categories = "I.2. [Artificial Intelligence]",
keywords = "Robust feature selection, naive Bayes classifier,
mutual information, cross entropy, Dirichlet distribution, second-order
distribution, expectation and variance of mutual information.",
abstract = "Mutual information is widely used in artificial intelligence, in a
descriptive way, to measure the stochastic dependence of discrete random
variables. To address questions such as the reliability of the
empirical value, one must consider sample-to-population inferential
approaches. This paper deals with the distribution of mutual information, as
obtained in a Bayesian framework by a second-order Dirichlet prior
distribution. The exact analytical expression for the mean and an
analytical approximation of the variance are reported. Asymptotic
approximations of the distribution are proposed. The results are applied to
the problem of selecting features for incremental learning and
classification with the naive Bayes classifier. A fast, newly defined method
is shown to outperform the traditional approach based on empirical mutual
information on a number of real data sets. Finally, a theoretical
development is reported that allows the above methods to be extended to
incomplete samples in an easy and effective way.",
}