Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

Publication Date: 7/15/2018

Event: ACL 2018

Reference: pp. 440-450, 2018

Authors: Dinghan Shen, Duke University; Guoyin Wang, Duke University; Wenlin Wang, Duke University; Martin Renqiang Min, NEC Laboratories America, Inc.; Qinliang Su, Sun Yat-sen University; Yizhe Zhang, Microsoft Research; Chunyuan Li, Duke University; Ricardo Henao, Duke University; Lawrence Carin, Duke University

Abstract: Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
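
Illustrative sketch (not part of the paper): the parameter-free pooling operations named in the abstract can be pictured roughly as follows. The function names, the toy embedding values, and the window size are all hypothetical. The hierarchical variant shown here assumes one common reading of the idea: average the word vectors inside each sliding window of n consecutive words, then take an element-wise maximum across the windows. Treat this as a sketch under those assumptions rather than the authors' implementation.

```python
from typing import List

Vector = List[float]


def average_pool(embeddings: List[Vector]) -> Vector:
    """Element-wise mean over all word vectors in the sequence."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def max_pool(embeddings: List[Vector]) -> Vector:
    """Element-wise maximum over all word vectors in the sequence."""
    return [max(dim) for dim in zip(*embeddings)]


def hierarchical_pool(embeddings: List[Vector], window: int) -> Vector:
    """Assumed scheme: average inside each sliding window, then max across windows."""
    if len(embeddings) <= window:
        # Sequence shorter than one window: fall back to a plain average.
        return average_pool(embeddings)
    local = [
        average_pool(embeddings[i:i + window])
        for i in range(len(embeddings) - window + 1)
    ]
    return max_pool(local)


# Toy 2-dimensional embeddings for a 4-word sequence (made-up values).
sentence = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]

avg = average_pool(sentence)                   # [1.5, 2.0]
mx = max_pool(sentence)                        # [3.0, 4.0]
hier = hierarchical_pool(sentence, window=2)   # [2.0, 3.0]
```

None of these functions has trainable parameters; only the word embeddings themselves would be learned. Average and max pooling ignore word order entirely, whereas the windowed variant depends on which words are adjacent, which is how it can retain local (n-gram) information.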

Publication Link: