Introduction

I taught myself Dirichlet processes and hierarchical Dirichlet processes (HDPs) in the spring of 2015 in order to understand nonparametric Bayesian models and their inference algorithms. Along the way, I wrote a bunch of code and took a bunch of notes. I have collected those notes here for the benefit of others trying to learn this material.



Code

I implemented the HDP-LDA component of the datamicroscopes project. You can install it with conda:

$ conda install -c datamicroscopes -c distributions microscopes-lda

A Note on the Term "Dirichlet Process"

Part of the impetus for compiling these notes was how carelessly the term "Dirichlet process" seemed to be used in the literature on nonparametric Bayesian models.

Although I thought I had arrived at the correct understanding (which is presented here), Dan Roy helpfully pointed out that I probably got it wrong, given how Ferguson (1973) defines the Dirichlet process. Ferguson's use of the term does not make the Dirichlet process a "distribution over distributions," as Neal, Teh, Jordan, and Blei call it. At best, I believe the term "Dirichlet process" is used equivocally in the nonparametric Bayesian (NPB) literature. At worst, there is widespread confusion about what a Dirichlet process is!

At some point, I intend to write a post explaining the subtleties of this discussion. In the meantime, I believe these posts are still valuable for understanding the literature on nonparametric Bayes, even if they won't get you a Ph.D. in measure theory.

Other Resources

HDP-LDA Implementations