Unlike most statistical methods, which are based on assumptions about a ``true'' underlying probability distribution, Minimum Description Length (MDL) methods are designed to optimize an information-theoretic criterion. Although the two design criteria are known to lead to similar statistical performance in most settings, there are cases where they disagree. In my thesis, I analyze two such cases.
In the first case, it is found that a standard MDL method can be improved, from both an information-theoretic and a probabilistic point of view, after which the two criteria turn out to agree after all. In the second case, the disagreement turns out to be fundamental.

| Chapter | Description |
|---|---|
| 1 | General introduction to the Minimum Description Length principle |
| 2 | The catch-up phenomenon in Bayesian model selection |
| 3 & 4 | Switching between prediction strategies (online learning, related to the catch-up phenomenon) |
| 5 | Convergence results for MDL parameter estimation |
| 6 | Overview of the basic properties of Rényi divergence |