Learning Diverse Rankings with Multi-Armed Bandits (2008)

Authors

Abstract

Algorithms for learning to rank Web documents usually assume a document's relevance is independent of other documents. This leads to learned ranking functions that produce rankings with redundant results. In contrast, user studies have shown that diversity at high ranks is often preferred. We present two new learning algorithms that directly learn a diverse ranking of documents based on users' clicking behavior. We show that these algorithms minimize abandonment, or alternatively, maximize the probability that a relevant document is found in the top k positions of a ranking. We show that one of our algorithms asymptotically achieves the best possible payoff obtainable in polynomial time even as user's interests change. The other performs better empirically when user interests are static, and is still theoretically near-optimal in that case.

Discussion

John Langford, 2008/07/09 00:07

Kleinberg and I had a discussion after the talk about extending this to a setting with generalization. I believe it is pretty straightforward, under the model of user interests in this paper. Essentially, you use an existing technique for contextual bandits (such as exp4 or epoch-greedy) to learn to predict the top element, then conditioned on the learned predictor, you learn a new one for the second position, third, etc… It should be possible to prove a learning-reduction style regret bound for this kind of process, and I would expect it to work in practice.

Enter your comment (wiki syntax is allowed):
DJLKL
 
paper/2008/264.txt · Last modified: 2009/05/24 17:48 (external edit)
 
Driven by DokuWiki