Prototype Vector Machine for Large Scale Semi-supervised Learning (2009)

Authors

Abstract

Practical data analysis and mining rarely fall exactly into the supervised learning scenario. Rather, the growing amount of unlabelled data from various scientific domains poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational cost of graph-based SSL arises largely from the manifold or graph regularization, which may in turn lead to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform an effective low-rank approximation of the kernel matrix, but also span a model that suffers the minimum information loss compared with the complete model. These criteria lead to a consistent prototype selection scheme, allowing us to design a unified algorithm (PVM) that demonstrates encouraging performance while at the same time possessing appealing scaling properties (empirically linear in the sample size).
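
As a rough illustration of the first criterion (a low-rank approximation of the kernel matrix built from a small set of prototypes), the sketch below uses a Nyström-style factorization K ≈ E W⁺ Eᵀ, where E holds kernel values between the samples and the prototypes and W holds kernel values among the prototypes. The abstract does not spell out this construction, so everything here is an assumption made for illustration: the Gaussian kernel, the use of k-means centers as prototypes, and the hypothetical function names `rbf_kernel`, `kmeans_prototypes`, and `nystrom_factors`. It is not the PVM algorithm itself, and it does not cover the second criterion (model representation).

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def kmeans_prototypes(X, m, n_iter=20, seed=0):
    """Pick m prototypes as k-means centers (an assumed selection rule)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Move each center to the mean of its members; keep empty ones fixed.
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(0)
    return centers


def nystrom_factors(X, prototypes, gamma=1.0):
    """Return E (n x m) and pinv(W) (m x m) such that K is approx. E @ pinv(W) @ E.T."""
    E = rbf_kernel(X, prototypes, gamma)           # samples vs. prototypes
    W = rbf_kernel(prototypes, prototypes, gamma)  # prototypes vs. prototypes
    return E, np.linalg.pinv(W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))
    P = kmeans_prototypes(X, m=20)
    E, W_inv = nystrom_factors(X, P, gamma=0.5)
    # The full kernel matrix is formed here only to measure the error.
    K = rbf_kernel(X, X, gamma=0.5)
    err = np.linalg.norm(K - E @ W_inv @ E.T) / np.linalg.norm(K)
    print(f"relative Frobenius error with 20 prototypes: {err:.4f}")
```

The point of the factorization is storage and compute: with n samples and m prototypes, only the n x m and m x m factors need to be kept, rather than the full n x n kernel matrix.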

paper/2009/198.txt · Last modified: 2009/05/24 18:42 (external edit)
 