Accounting for Burstiness in Topic Models (2009)

Authors

Doyle, Gabriel and Elkan, Charles

Abstract

Topic models are used in a variety of tasks to great success. However, even state-of-the-art topic models suffer from an important flaw. They do not capture the tendency of words to appear in bursts: if a word appears once in a document, it is more likely to appear again. We introduce a topic model that uses Dirichlet compound multinomials to account for this burstiness. This model outperforms the standard LDA model at modeling text and non-text data, and can be incorporated into more complex topic models as well.
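
To make the notion of burstiness concrete, the following is a minimal sketch of the Dirichlet compound multinomial (DCM) on its own, not the topic model introduced in the paper. It uses the standard Polya urn form of the DCM. The four-word vocabulary, the parameter values in `alpha`, and the function names are hypothetical choices made only for illustration.

```python
from math import exp, lgamma, log


def dcm_log_prob(counts, alpha):
    """Log probability of one specific word sequence with the given
    per-word counts under a DCM with parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    out = lgamma(a) - lgamma(a + n)
    for x, al in zip(counts, alpha):
        out += lgamma(al + x) - lgamma(al)
    return out


def dcm_next_word_prob(word, counts, alpha):
    """Probability that the next word is `word`, given the counts seen
    so far in the document (Polya urn predictive rule)."""
    return (alpha[word] + counts[word]) / (sum(alpha) + sum(counts))


def multinomial_log_prob(counts, theta):
    """Log probability of one specific word sequence under a fixed
    multinomial with word probabilities theta."""
    return sum(x * log(t) for x, t in zip(counts, theta) if x > 0)


# Hypothetical parameters: small alpha values give strong burstiness.
alpha = [0.1, 0.1, 0.1, 0.1]
# Multinomial with the same expected word frequencies, for comparison.
theta = [a / sum(alpha) for a in alpha]

# Chance of word 0 before and after it has appeared once.
p_first = dcm_next_word_prob(0, [0, 0, 0, 0], alpha)
p_again = dcm_next_word_prob(0, [1, 0, 0, 0], alpha)

# Probability of a two-word document that repeats word 0.
p_repeat_dcm = exp(dcm_log_prob([2, 0, 0, 0], alpha))
p_repeat_multinomial = exp(multinomial_log_prob([2, 0, 0, 0], theta))

print(p_first, p_again)
print(p_repeat_dcm, p_repeat_multinomial)
```

Under these assumed parameters, seeing a word once raises the DCM's probability of seeing it again, while the multinomial assigns the same probability to the word at every position. This is the behavior the abstract describes as burstiness.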