Christopher Malon NEC Labs America

Christopher Malon

Senior Researcher

Machine Learning

Posts

Overcoming Poor Word Embeddings with Word Definitions

Modern natural language understanding models depend on pretrained subword embeddings, but applications may need to reason about words that were never or rarely seen during pretraining. We show that examples that depend critically on a rarer word are more challenging for natural language inference models. Then we explore how a model could learn to use definitions, provided in natural text, to overcome this handicap. Our model’s understanding of a definition is usually weaker than a well-modeled word embedding, but it recovers most of the performance gap from using a completely untrained word.

Improving neural network robustness through neighborhood preserving layers

One major source of vulnerability of neural nets in classification tasks is from overparameterized fully connected layers near the end of the network. In this paper, we propose a new neighborhood preserving layer which can replace these fully connected layers to improve the network robustness. Networks including these neighborhood preserving layers can be trained efficiently. We theoretically prove that our proposed layers are more robust against distortion because they effectively control the magnitude of gradients. Finally, we empirically show that networks with our proposed layers are more robust against state-of-the-art gradient descent-based attacks, such as a PGD attack on the benchmark image classification datasets MNIST and CIFAR10.

Improving Disentangled Text Representation Learning with Information Theoretical Guidance

Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., the manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

Generating Followup Questions for Interpretable Multi hop Question Answering

We propose a framework for answering open domain multi hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single hop answer extractor. This framework makes each hop interpretable, and makes the retrieval associated with later hops as flexible and specific as for the first hop. As a first instantiation of this framework, we train a pointer generator network to predict followup questions based on the question and partial information. This provides a novel application of a neural question generation network, which is applied to give weak ground truth single hop followup questions based on the final answers and their supporting facts. Learning to generate followup questions that select the relevant answer spans against downstream supporting facts, while avoiding distracting premises, poses an exciting semantic challenge for text generation. We present an evaluation using the two hop bridge questions of HotpotQA

Team Papelo: Transformer Networks at FEVER

We develop a system for the FEVER fact extraction and verification challenge that uses a high precision entailment classifier based on transformer networks pretrained with language modeling, to classify a broad set of potential evidence. The precision of the entailment classifier allows us to enhance recall by considering every statement from several articles to decide upon each claim. We include not only the articles best matching the claim text by TFIDF score, but read additional articles whose titles match named entities and capitalized expressions occurring in the claim text. The entailment module evaluates potential evidence one statement at a time, together with the title of the page the evidence came from (providing a hint about possible pronoun antecedents). In preliminary evaluation, the system achieves .5736 FEVER score, .6108 label accuracy, and .6485 evidence F1 on the FEVER shared task test set.

Teaching Syntax by Adversarial Distraction

Existing entailment datasets mainly pose problems which can be answered without attention to grammar or word order. Learning syntax requires comparing examples where different grammar and word order change the desired classification. We introduce several datasets based on synthetic transformations of natural entailment examples in SNLI or FEVER, to teach aspects of grammar and word order. We show that without retraining, popular entailment models are unaware that these syntactic differences change meaning. With retraining, some but not all popular entailment models can learn to compare the syntax properly.