Publication Date: 9/10/2018
Event: ECML 2018
Reference: pp. 1-16, 2018
Authors: Dinghan Shen, Duke University; Martin Renqiang Min, NEC Laboratories America, Inc.; Hongyu Guo, National Research Council Canada
Abstract: Parametric embedding methods such as parametric t-distributed Stochastic Neighbor Embedding (pt-SNE) enable out-of-sample data visualization without further computationally expensive optimization or approximation. However, pt-SNE favors small mini-batches to train a deep neural network but large mini-batches to approximate its cost function, which involves all pairwise data point comparisons, and thus has difficulty finding a balance. To resolve this conflict, we present parametric t-distributed stochastic exemplar-centered embedding. Our strategy learns embedding parameters by comparing training data only with precomputed exemplars to indirectly preserve local neighborhoods, resulting in a cost function with significantly reduced computational and memory complexity. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune yet performs comparably to the deep feedforward neural network employed by pt-SNE. We empirically demonstrate, using several benchmark datasets, that our proposed method significantly outperforms pt-SNE in terms of robustness, visual effects, and quantitative evaluations.
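The exemplar-centered idea in the abstract can be illustrated with a minimal sketch. This is a simplified illustration under stated assumptions, not the authors' exact objective: the Gaussian bandwidth `sigma`, the joint normalization over all batch-exemplar pairs, and the function names (`exemplar_affinities`, `student_t_affinities`, `exemplar_kl_cost`) are all hypothetical. Each of the b mini-batch points is compared only with k precomputed exemplars, so the cost touches b*k pairs instead of the b*(b-1) pairs a full pairwise comparison needs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exemplar_affinities(batch: List[Vector], exemplars: List[Vector],
                        sigma: float = 1.0) -> List[List[float]]:
    """High-dimensional affinities p_ij between batch point i and exemplar j.

    Assumption: a Gaussian kernel with a single bandwidth `sigma`,
    normalized jointly over all b*k (point, exemplar) pairs.
    """
    raw = [[math.exp(-sq_dist(x, e) / (2 * sigma ** 2)) for e in exemplars]
           for x in batch]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def student_t_affinities(emb: List[Vector],
                         emb_exemplars: List[Vector]) -> List[List[float]]:
    """Low-dimensional affinities q_ij using a heavy-tailed Student-t kernel
    (one degree of freedom), normalized over the same b*k pairs."""
    raw = [[1.0 / (1.0 + sq_dist(y, z)) for z in emb_exemplars] for y in emb]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def exemplar_kl_cost(P: List[List[float]], Q: List[List[float]],
                     eps: float = 1e-12) -> float:
    """KL(P || Q) summed over batch-exemplar pairs only (no point-point pairs)."""
    return sum(p * math.log((p + eps) / (q + eps))
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))


if __name__ == "__main__":
    # Toy data: 4 points in 2-D, 2 exemplars, so only 4 * 2 = 8 pairs are scored.
    batch = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    exemplars = [[0.0, 0.0], [5.0, 5.0]]
    # Hypothetical low-dimensional outputs of an embedding network.
    emb = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [3.0, 3.0]]
    emb_exemplars = [[0.0, 0.0], [3.0, 3.0]]
    P = exemplar_affinities(batch, exemplars)
    Q = student_t_affinities(emb, emb_exemplars)
    print("pairs scored:", len(P) * len(P[0]))
    print("KL cost:", exemplar_kl_cost(P, Q))
```

In a real parametric setting, the cost would be minimized with respect to the embedding network's parameters; how exemplars are chosen (for example, by clustering the training data) is also an assumption here. Because the number of exemplars k is fixed, memory and compute per mini-batch grow linearly in b rather than quadratically.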
Publication Link: https://link.springer.com/chapter/10.1007/978-3-030-10925-7_29