Introduction
I taught myself Dirichlet processes and Hierarchical DPs in the spring of 2015 in order to understand nonparametric Bayesian models and related inference algorithms. In the process, I wrote a bunch of code and took a bunch of notes. I preserved those notes here for the benefit of others trying to learn this material.
Table of Contents
- Dirichlet Distribution and Dirichlet Processes: A quick review of the Dirichlet Distribution and an introduction to the Dirichlet Process by analogy with the Dirichlet Distribution.
- Sampling from a Hierarchical Dirichlet Process: Code demonstrating how you can sample from a Hierarchical Dirichlet Process without generating an infinite number of parameters first.
- Nonparametric Latent Dirichlet Allocation: An alternative view of latent Dirichlet allocation using a Dirichlet process, and a demonstration of how it can be easily extended to a nonparametric model (where the number of topics becomes a random variable fit by the inference algorithm) using a hierarchical Dirichlet process.
- Fitting a Mixture Model with Gibbs Sampling: Derivation of a full Gibbs sampler for a finite mixture model with a uniform Dirichlet prior. This is a step on the way to deriving a Gibbs sampler for the Dirichlet Process Mixture Model.
- Collapsed Gibbs Sampling for Bayesian Mixture Models (with a Nonparametric Extension): Derivation of a collapsed Gibbs sampler for a finite mixture model with a uniform Dirichlet prior. Extension (without derivation) of this Gibbs sampler to the Dirichlet Process Mixture Model.
- Notes on Gibbs Sampling in Hierarchical Dirichlet Process Models: Notes on applying the equations given in the Hierarchical Dirichlet Process paper to nonparametric Latent Dirichlet Allocation.
- Sample from Antoniak Distribution with Python: Code for drawing samples from the distribution of tables created by a Chinese restaurant process after n patrons are seated.
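To give a flavor of the last entry, here is a minimal sketch of one way to draw the number of occupied tables in a Chinese restaurant process by direct simulation. It is not the code from the linked post; the names `sample_num_tables` and `expected_num_tables` and the concentration parameter `alpha` are assumptions made for illustration. It relies only on the standard fact that the (i+1)-th patron opens a new table with probability alpha / (alpha + i).

```python
import random


def sample_num_tables(n, alpha, rng=None):
    """Draw the number of occupied tables after n patrons are seated
    in a Chinese restaurant process with concentration alpha > 0.

    Hypothetical helper for illustration: it simulates the seating
    process rather than evaluating the distribution in closed form.
    """
    rng = rng or random.Random()
    tables = 0
    for i in range(n):
        # With i patrons already seated, the next patron opens a new
        # table with probability alpha / (alpha + i). For i == 0 this
        # probability is 1, so the first patron always opens a table.
        if rng.random() < alpha / (alpha + i):
            tables += 1
    return tables


def expected_num_tables(n, alpha):
    """Expected table count: the sum of the new-table probabilities."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_num_tables(100, 1.0, rng) for _ in range(2000)]
    print("empirical mean:", sum(draws) / len(draws))
    print("expected mean: ", expected_num_tables(100, 1.0))
```

Simulating the seating process costs O(n) per draw, which is fine for small n; the empirical mean of many draws should land close to the value returned by `expected_num_tables`.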
Code
I implemented the HDP-LDA component of the data microscopes project. You can install it with
$ conda install -c datamicroscopes -c distributions microscopes-lda
A Note on the term "Dirichlet Process"
Part of the impetus for compiling these notes was how carelessly the term "Dirichlet process" seemed to be used in the literature on nonparametric Bayesian models.
Although I thought I had come to the correct understanding (which is presented here), Dan Roy helpfully pointed out that I probably got it wrong given how the Dirichlet process is defined by Ferguson (1973). Ferguson's use of "Dirichlet process" does not make it a "distribution over distributions" as Neal, Teh, Jordan, and Blei call it. At best, I believe there is equivocation on the term "Dirichlet process" in the NPB literature. At worst, there is widespread confusion about what a Dirichlet process is!
At some point, I intend to write a post trying to explain the subtleties of this discussion. In the meantime, I would suggest that my posts are still valuable for understanding the literature on nonparametric Bayes, even if they won't get you a Ph.D. in measure theory.
Other Resources
HDP-LDA Implementations
- Gregor Heinrich's ILDA: A Java-based implementation of the "Posterior Assignment by Direct Sampling" MCMC algorithm from Teh et al. (2005). Includes hyperparameter sampling.
- Shuyo's Implementation: Pure Python implementation of the "Posterior sampling in the Chinese restaurant franchise" MCMC algorithm. Doesn't include hyperparameter sampling.
- Teh's Original Implementations: Matlab and C code for MCMC accompanying the original paper. I found it impenetrable.
- HCA: C implementation.
- HDP-Faster: C++ implementation by Chong Wang using a split-merge algorithm.
- Gensim: Python-based variational inference (following Chong Wang et al. (2011)).
- bnpy: Python implementation of variational inference.