Robust Bounds for Classification via Selective Sampling (2009)

Authors

Abstract

We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as the base classifier, and thus can be efficiently run in any RKHS. Unlike previous margin-based semi-supervised algorithms, our sampling condition hinges on a simultaneous upper bound on bias and variance of the RLS estimate working in a simple linear label noise model. This fact allows us to prove performance bounds that hold for an arbitrary sequence of instances. In particular, we show that our sampling strategy allows to approximate the margin of the Bayes optimal classifier to any desired accuracy $\ve$ by asking $\widetilde{\scO}\bigl(d/\ve^2\bigr)$ queries (in the RKHS case $d$ is replaced by a suitable spectral quantity). While these are the standard fully-supervised rates for the i.i.d.\ case, the best previously known result in our harder setting was $\widetilde{\scO}\bigl(d^3/\ve^4\bigr)$. Preliminary experiments show that some of our algorithms also enjoy a good practical performance.

Discussion

Enter your comment (wiki syntax is allowed):
CBLDL
 
paper/2009/289.txt · Last modified: 2009/05/24 18:42 (external edit)
 
Driven by DokuWiki