Image and Informatics Group, LBNL : Home

Prototype Vector Machine for Large Scale Semi-Supervised Learning

Int. Conf. on Machine Learning, 2009

    K. Zhang
    J. Kwok
    B. Parvin


    Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded in two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering minimal information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
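
The low-rank idea mentioned in the abstract can be illustrated with a small sketch. It is not the PVM algorithm itself, only a Nyström-style approximation K ≈ E W⁻¹ Eᵀ built from a handful of prototype vectors, where E holds kernel values between the data and the prototypes and W holds kernel values among the prototypes. Several things here are assumptions made for illustration and are not taken from the abstract: the Gaussian kernel and its width `gamma`, the use of plain k-means centers as prototypes, the synthetic data, the small `jitter` added for numerical stability, and every function name.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kmeans_prototypes(X, m, n_iter=20, seed=0):
    """Pick m prototypes as k-means centers (an assumed, illustrative choice)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty center to the mean of its members.
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def low_rank_kernel(X, prototypes, gamma=0.5, jitter=1e-6):
    """Approximate the n x n kernel matrix as E (W + jitter*I)^-1 E^T."""
    E = rbf_kernel(X, prototypes, gamma)           # n x m
    W = rbf_kernel(prototypes, prototypes, gamma)  # m x m
    W_reg = W + jitter * np.eye(len(prototypes))   # stabilise the solve
    return E @ np.linalg.solve(W_reg, E.T)


# Synthetic data: three Gaussian blobs in the plane, 300 points in total.
rng = np.random.default_rng(0)
X = np.vstack(
    [rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in (-2.0, 0.0, 2.0)]
)

K_full = rbf_kernel(X, X)
K_approx = low_rank_kernel(X, kmeans_prototypes(X, m=20))

rel_error = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(f"relative Frobenius error with 20 prototypes: {rel_error:.4f}")
```

In this sketch only the 300 x 20 matrix E and the 20 x 20 matrix W have to be computed to form the approximation; the full 300 x 300 matrix is built solely to measure the error. That saving is what makes a prototype-based representation attractive at large scale.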
