
## Optimality of Universal Bayesian Sequence Prediction

• Author: Marcus Hutter (2001)
• Comments: 15 LaTeX pages
• Subj-class: Artificial Intelligence; Learning
• ACM-class: I.2; I.2.6; I.2.8; F.1.3; F.2
• Paper: PostScript (420kb) - PDF (293kb) - Html/Gif
• Slides: PostScript - PDF
• Remark: Technical Report

Keywords: Bayesian sequence prediction; mixture distributions; Solomonoff induction; Kolmogorov complexity; learning; universal probability; tight loss and error bounds; Pareto-optimality.

Abstract: Various optimality properties of universal sequence prediction based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, are studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a class $\M$, one can base one's prediction on the Bayes-mix $\xi$ defined as a $w_\nu$-weighted sum of distributions $\nu\in\M$. The number of prediction errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$ exceeds the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$ by at most $O(\sqrt{E_\mu})$. We show that this bound is tight and that no other predictor can lead to smaller bounds. Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs at least as well in all environments $\nu\in\M$ and strictly better in at least one. So, optimal predictors can (w.r.t.\ most performance measures in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$.
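The mixture predictor described in the abstract can be sketched in a few lines of code. The following is a minimal, illustrative simulation, not the paper's construction: it uses a finite class of Bernoulli environments with uniform prior weights (both are assumptions chosen for concreteness; the paper treats general classes $\M$ and weights $w_\nu$), and compares the errors of the mixture-based predictor $\xi$ against the informed predictor that knows the true $\mu$.

```python
import random

# Assumed hypothesis class M: Bernoulli(theta) over a grid of biases,
# with uniform prior weights w_nu (both chosen only for illustration).
thetas = [i / 10 for i in range(1, 10)]
weights = {th: 1 / len(thetas) for th in thetas}

# The true environment mu is Bernoulli(0.7), a member of M.
random.seed(0)
true_theta = 0.7
seq = [1 if random.random() < true_theta else 0 for _ in range(1000)]

errors_xi = errors_mu = 0
for x in seq:
    # Bayes-mixture predictive probability xi(1 | x_<t): the
    # posterior-weighted average of the members' predictions.
    p1_xi = sum(w * th for th, w in weights.items())
    # Informed predictive probability mu(1 | x_<t).
    p1_mu = true_theta
    # Both schemes predict the more probable next symbol.
    errors_xi += (1 if p1_xi > 0.5 else 0) != x
    errors_mu += (1 if p1_mu > 0.5 else 0) != x
    # Bayes' rule: reweight each nu by its likelihood for x_t.
    z = 0.0
    for th in thetas:
        weights[th] *= th if x == 1 else 1 - th
        z += weights[th]
    for th in thetas:
        weights[th] /= z

print(errors_xi, errors_mu)
```

In runs like this the mixture predictor incurs only a handful of extra errors over the informed one, consistent in spirit with the $E_\xi - E_\mu = O(\sqrt{E_\mu})$ bound; the paper proves the bound in expectation for general classes, which this toy simulation does not.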


• Introduction
• Setup and Convergence
• Error & Loss Bounds
• Lower Error Bound
• Pareto Optimality of ξ
• On the Optimal Choice of Weights
• Conclusions

### BibTeX Entry

@TechReport{Hutter:02splower,
author =       "Marcus Hutter",
number =       "IDSIA-09-01",
institution =  "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
title =        "Optimality of Universal {Bayesian} Sequence Prediction",
year =         "2002",
pages =        "15",
abstract =     "Various optimality properties of universal sequence prediction
based on Bayes-mixtures in general, and Solomonoff's prediction
scheme in particular, are studied. The probability of observing
$x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be
computed with Bayes' rule if the true generating distribution
$\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is
unknown, but known to belong to a class $\M$, one can base one's
prediction on the Bayes-mix $\xi$ defined as a $w_\nu$-weighted
sum of distributions $\nu\in\M$. The number of prediction errors
$E_\xi$ made by the optimal universal prediction scheme based on
$\xi$ exceeds the number of errors $E_\mu$ of the optimal informed
prediction scheme based on $\mu$ by at most $O(\sqrt{E_\mu})$. We
show that this bound is tight and that no other predictor can lead
to smaller bounds. Furthermore, for various performance measures
we show Pareto-optimality of $\xi$ in the sense that there is no
other predictor which performs at least as well in all
environments $\nu\in\M$ and strictly better in at least one. So,
optimal predictors can (w.r.t.\ most performance measures in
expectation) be based on the mixture $\xi$. Finally, we give an
Occam's razor argument that Solomonoff's choice
$w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$
is the length of the shortest program describing $\nu$.",
url  =         "http://www.hutter1.net/ai/splower.htm",
}
