Self-taught Clustering (2008)

Authors

Abstract

This paper focuses on a new clustering task, called \emph{self-taught clustering}. Self-taught clustering is an instance of \emph{unsupervised transfer learning}, which aims at clustering a small collection of target unlabeled data with the help of a large amount of \emph{auxiliary} unlabeled data. The target and auxiliary data can be different in topic distribution. We show that even when the target data are not sufficient to allow effective learning of a high quality feature representation, it is possible to learn the useful features with the help of the auxiliary data on which the target data can be clustered effectively. We propose a co-clustering based self-taught clustering algorithm to tackle this problem, by clustering the target and auxiliary data simultaneously to allow the feature representation from the auxiliary data to influence the target data through a common set of features. Under the new data representation, clustering on the target data can be improved. Our experiments on image clustering show that our algorithm can greatly outperform several state-of-the-art clustering methods when utilizing irrelevant unlabeled auxiliary data.

Discussion

Tie-Yan Liu, 2008/07/07 07:15

Your paper reminds me of an ICDM 2006 paper named “Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory”, which used exactly the same mathematical formulation and algorithm, although a different application is investigated. Have you noticed and ever compared with that work?

Wenyuan Dai, 2008/07/08 04:24

Thank Tie-Yan.

I'm sorry that we haven't noticed your paper “Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory”.

I have just read your paper. It seems your paper focuses on an application on heterogeneous data co-clustering, while our problem is transfer learning for clustering.

Both your algorithm and ours are derived from the information theoretic co-clustering (Dhillon et al., KDD 2003). Therefore, it seems that we use very similar techniques to solve two different problems, but i'm afraid the algorithm in your paper is NOT EXACTLY THE SAME as ours. I think you may find some differences between the two algorithms, e.g. Steps 4 & 9 in your algorithm.

I think your algorithm can be applied to our problem, but unfortunately, we were not aware of your paper, and thus haven't compared our algorithm with yours. We are sorry about that.

Furthermore, I'm also not quite sure about the convergence of the algorithm in your paper, as I'm not sure the Steps 4 & 9 can guarantee to reduce the value of objective function. You omitted the proofs about these issues in your paper, however, to my knowledge, selecting Y clusters in the ways of Steps 4 & 9 will probably increase the value of objective function. Do I misunderstand your algorithm?

Janaka, 2012/07/11 04:08

Hi, I wonder whether I can find the source code or binaries. I work on Unsupervised Transfer Learning, and your contribution is one of the main. It would be very helpful if you can make it available.

Thank you.

Enter your comment (wiki syntax is allowed):
BTGRG
 
paper/2008/432.txt · Last modified: 2009/05/24 18:48 (external edit)
 
Driven by DokuWiki