Christopher Malon NEC Labs America

Senior Researcher

Machine Learning

Posts

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models

Knowledge Graphs (KGs) store information in the form of (head, predicate, tail)-triples. To augment KGs with new knowledge, researchers have proposed models for KG Completion (KGC) tasks such as link prediction, i.e., answering (h, p, ?) or (?, p, t) queries. Such models are usually evaluated with averaged metrics on a held-out test set. While useful for tracking progress, averaged single-score metrics cannot reveal what exactly a model has learned, or failed to learn. To address this issue, we propose KGxBoard: an interactive framework for performing fine-grained evaluation on meaningful subsets of the data, each of which tests individual and interpretable capabilities of a KGC model. In our experiments, we highlight the findings that we discovered with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
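To illustrate why an averaged score can hide weaknesses, here is a minimal sketch that computes mean reciprocal rank (MRR) both overall and per subset. It is not KGxBoard's code: the function name `mrr_by_group`, the subset labels, and the ranks are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mrr_by_group(records):
    """Compute overall and per-group mean reciprocal rank.

    records: iterable of (group, rank) pairs, where rank is the 1-based
    position of the correct entity in the model's ranked predictions.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, rank in records:
        totals[group] += 1.0 / rank
        counts[group] += 1
    per_group = {g: totals[g] / counts[g] for g in totals}
    overall = sum(totals.values()) / sum(counts.values())
    return overall, per_group


# Hypothetical link-prediction results, grouped by an assumed relation type.
records = [
    ("symmetric", 1), ("symmetric", 1), ("symmetric", 2),
    ("one-to-many", 10), ("one-to-many", 20),
]
overall, per_group = mrr_by_group(records)
print(f"overall MRR: {overall:.3f}")
for group, score in sorted(per_group.items()):
    print(f"  {group}: {score:.3f}")
```

In this made-up data the overall score looks moderate, while the per-subset breakdown shows that one group is handled well and the other poorly, which is the kind of difference a single averaged metric cannot expose.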

Analyzing Coreference and Bridging in Product Reviews

Product reviews may have complex discourse including coreference and bridging relations to a main product, competing products, and interacting products. Current approaches to aspect-based sentiment analysis (ABSA) and opinion summarization largely ignore this complexity. On the other hand, existing systems for coreference and bridging were trained in a different domain. We collect mention type annotations relevant to coreference and bridging for 498 product reviews. Using these annotations, we show that a state-of-the-art factuality score fails to catch coreference errors in product reviews, and that a state-of-the-art coreference system trained on OntoNotes does not perform nearly as well on product mentions. As our dataset grows, we expect it to help ABSA and opinion summarization systems to avoid entity reference errors.

Fast Few-shot Debugging for NLU Test Suites

We study few-shot debugging of transformer-based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem. Given a few debugging examples of a certain phenomenon, and a held-out test set of the same phenomenon, we aim to maximize accuracy on the phenomenon at a minimal cost of accuracy on the original test set. We examine several methods that are faster than full epoch retraining. We introduce a new fast method, which samples a few in-danger examples from the original training set. Compared to fast methods using parameter distance constraints or Kullback-Leibler divergence, we achieve superior original accuracy for comparable debugging accuracy.
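For readers unfamiliar with the Kullback-Leibler baseline mentioned above, the following sketch computes the KL divergence between two discrete class distributions. How the compared methods actually use it is not stated here; the sketch assumes it measures how far a debugged model's predictions drift from the original model's on the same input, and the probability values are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to
    avoid division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical class probabilities on one original training example.
original_model = [0.7, 0.2, 0.1]   # before debugging
debugged_model = [0.4, 0.5, 0.1]   # after fine-tuning on debugging examples

drift = kl_divergence(original_model, debugged_model)
print(f"KL(original || debugged) = {drift:.4f}")
```

A penalty of this kind is zero when the two models agree exactly and grows as their predictions diverge, so keeping it small is one way to limit the loss of accuracy on the original test set.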

Fast Few-shot Debugging for NLU Test Suites (arXiv)

Read Fast Few-shot Debugging for NLU Test Suites (arXiv) from our Machine Learning Department. We study few-shot debugging of transformer-based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem. Given a few debugging examples of a certain phenomenon, and a held-out test set of the same phenomenon, we aim to maximize accuracy on the phenomenon at a minimal cost of accuracy on the original test set. We examine several methods that are faster than full epoch retraining. We introduce a new fast method, which samples a few in-danger examples from the original training set. Compared to fast methods using parameter distance constraints or Kullback-Leibler divergence, we achieve superior original accuracy for comparable debugging accuracy.

Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

Image captioning systems are expected to combine individual concepts when describing scenes with concept combinations that were not observed during training. In spite of significant progress in image captioning with the help of the autoregressive generation framework, current approaches fail to generalize well to novel concept combinations. We propose a new framework that revolves around probing several similar image-caption training instances (retrieval), performing analogical reasoning over relevant entities in retrieved prototypes (analogy), and enhancing the generation process with reasoning outcomes (composition). Our method augments the generation model by referring to the neighboring instances in the training set to produce novel concept combinations in generated captions. We perform experiments on widely used image captioning benchmarks. The proposed models achieve substantial improvement over the compared baselines on both composition-related evaluation metrics and conventional image captioning metrics.

Team Papelo at FEVEROUS: Multi-hop Evidence Pursuit

We develop a system for the FEVEROUS fact extraction and verification task that ranks an initial set of potential evidence and then pursues missing evidence in subsequent hops by trying to generate it, with a “next hop prediction module” whose output is matched against page elements in a predicted article. Seeking evidence with the next hop prediction module continues to improve FEVEROUS score for up to seven hops. Label classification is trained on possibly incomplete extracted evidence chains, utilizing hints that facilitate numerical comparison. The system achieves .281 FEVEROUS score and .658 label accuracy on the development set, and finishes in second place with .259 FEVEROUS score and .576 label accuracy on the test set.

Overcoming Poor Word Embeddings with Word Definitions

Modern natural language understanding models depend on pretrained subword embeddings, but applications may need to reason about words that were never or rarely seen during pretraining. We show that examples that depend critically on a rarer word are more challenging for natural language inference models. Then we explore how a model could learn to use definitions, provided in natural text, to overcome this handicap. Our model’s understanding of a definition is usually weaker than a well-modeled word embedding, but it recovers most of the performance gap from using a completely untrained word.

Improving neural network robustness through neighborhood preserving layers

One major source of vulnerability of neural nets in classification tasks is the overparameterized fully connected layers near the end of the network. In this paper, we propose a new neighborhood preserving layer which can replace these fully connected layers to improve the network robustness. Networks including these neighborhood preserving layers can be trained efficiently. We theoretically prove that our proposed layers are more robust against distortion because they effectively control the magnitude of gradients. Finally, we empirically show that networks with our proposed layers are more robust against state-of-the-art gradient descent-based attacks, such as a PGD attack, on the benchmark image classification datasets MNIST and CIFAR10.

Improving Disentangled Text Representation Learning with Information Theoretical Guidance

Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, and personalized dialogue systems. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., the manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

Generating Followup Questions for Interpretable Multi-hop Question Answering

We propose a framework for answering open-domain multi-hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single-hop answer extractor. This framework makes each hop interpretable, and makes the retrieval associated with later hops as flexible and specific as for the first hop. As a first instantiation of this framework, we train a pointer-generator network to predict followup questions based on the question and partial information. This provides a novel application of a neural question generation network, which is applied to give weak ground truth single-hop followup questions based on the final answers and their supporting facts. Learning to generate followup questions that select the relevant answer spans against downstream supporting facts, while avoiding distracting premises, poses an exciting semantic challenge for text generation. We present an evaluation using the two-hop bridge questions of HotpotQA.