Yanchi Liu NEC Labs America

Yanchi Liu

Senior Researcher

Data Science and System Security

Posts

Multi-Faceted Knowledge-Driven Pre-training for Product Representation Learning

As a key component of e-commerce computing, product representation learning (PRL) provides benefits for a variety of applications, including product matching, search, and categorization. The existing PRL approaches have poor language understanding ability due to their inability to capture contextualized semantics. In addition, the learned representations by existing methods are not easily transferable to new products. Inspired by the recent advance of pre-trained language models (PLMs), we make the attempt to adapt PLMs for PRL to mitigate the above issues. In this article, we develop KINDLE, a Knowledge-drIven pre-trainiNg framework for proDuct representation LEarning, which can preserve the contextual semantics and multi-faceted product knowledge robustly and flexibly. Specifically, we first extend traditional one-stage pre-training to a two-stage pre-training framework and exploit a deliberate knowledge encoder to ensure a smooth knowledge fusion into PLM. In addition, we propose a multi-objective heterogeneous embedding method to represent thousands of knowledge elements. This helps KINDLE calibrate knowledge noise and sparsity automatically by replacing isolated classes as training targets in knowledge acquisition tasks. Furthermore, an input-aware gating network is proposed to select the most relevant knowledge for different downstream tasks. Finally, extensive experiments have demonstrated the advantages of KINDLE over the state-of-the-art baselines across three downstream tasks.

CAT: Beyond Efficient Transformer for Content-Aware Anomaly Detection in Event Sequences

It is critical and important to detect anomalies in event sequences, which becomes widely available in many application domains. Indeed, various efforts have been made to capture abnormal patterns from event sequences through sequential pattern analysis or event representation learning. However, existing approaches usually ignore the semantic information of event content. To this end, in this paper, we propose a self-attentive encoder-decoder transformer framework, Content-Aware Transformer CAT, for anomaly detection in event sequences. In CAT, the encoder learns preamble event sequence representations with content awareness, and the decoder embeds sequences under detection into a latent space, where anomalies are distinguishable. Specifically, the event content is first fed to a content-awareness layer, generating representations of each event. The encoder accepts preamble event representation sequence, generating feature maps. In the decoder, an additional token is added at the beginning of the sequence under detection, denoting the sequence status. A one-class objective together with sequence reconstruction loss is collectively applied to train our framework under the label efficiency scheme. Furthermore, CAT is optimized under a scalable and efficient setting. Finally, extensive experiments on three real-world datasets demonstrate the superiority of CAT.

Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-Sentence Dependency Graph

We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder that is only trained on English is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on-average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path. Our code is available at https://github.com/lxucs/multilingual-mrc-isdg.

Interpreting Convolutional Sequence Model by Learning Local Prototypes with Adaptation Regularization

In many high-stakes applications of machine learning models, outputting only predictions or providing statistical confidence is usually insufficient to gain trust from end users, who often prefer a transparent reasoning paradigm. Despite the recent encouraging developments on deep networks for sequential data modeling, due to the highly recursive functions, the underlying rationales of their predictions are difficult to explain. Thus, in this paper, we aim to develop a sequence modeling approach that explains its own predictions by breaking input sequences down into evidencing segments (i.e., sub-sequences) in its reasoning. To this end, we build our model upon convolutional neural networks, which, in their vanilla forms, associates local receptive fields with outputs in an obscure manner. To unveil it, we resort to case-based reasoning, and design prototype modules whose units (i.e., prototypes) resemble exemplar segments in the problem domain. Each prediction is obtained by combining the comparisons between the prototypes and the segments of an input. To enhance interpretability, we propose a training objective that delicately adapts the distribution of prototypes to the data distribution in latent spaces, and design an algorithm to map prototypes to human-understandable segments. Through extensive experiments in a variety of domains, we demonstrate that our model can achieve high interpretability generally, together with a competitive accuracy to the state-of-the-art approaches.

Domain oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment

Motivated by the success of pre-trained language models such as BERT in a broad range of natural language processing (NLP) tasks, recent research efforts have been made for adapting these models for different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and have a straightforward use of the domain corpus. However, domain-oriented tasks usually require accurate understanding of domain phrases, and such fine-grained phrase-level knowledge is hard to be captured by existing pre-training scheme. Also, the word co-occurrences guided semantic learning of pre-training models can be largely augmented by entity-level association knowledge. But meanwhile, there is a risk of introducing noise due to the lack of ground truth word-level alignment. To address the issues, we provide a generalized domain-oriented approach, which leverages auxiliary domain knowledge to improve the existing pre-training framework from two aspects. First, to preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary knowledge, meanwhile we introduce Adaptive Hybrid Masked Model to incorporate such knowledge. It integrates two learning modes, word learning and phrase learning, and allows them to switch between each other. Second, we introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models. To alleviate the potential noise in this process, we introduce an interpretable Optimal Transport based approach to guide alignment learning. Experiments on four domain-oriented tasks demonstrate the superiority of our framework.

Unsupervised Concept Representation Learning for Length-Varying Text Similarity

Measuring document similarity plays an important role in natural language processing tasks. Most existing document similarity approaches suffer from the information gap caused by context and vocabulary mismatches when comparing varying-length texts. In this paper, we propose an unsupervised concept representation learning approach to address the above issues. Specifically, we propose a novel Concept Generation Network (CGNet) to learn concept representations from the perspective of the entire text corpus. Moreover, a concept-based document matching method is proposed to leverage advances in the recognition of local phrase features and corpus-level concept features. Extensive experiments on real-world data sets demonstrate that new method can achieve a considerable improvement in comparing length-varying texts. In particular, our model achieved 6.5% better F1 Score compared to the best of the baseline models for a concept-project benchmark dataset.

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series

Forecasting on Sparse Multivariate Time Series Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS’s individually, and do not leverage the dynamic distributions underlying the MTS’s, leading to sub-optimal results when the sparsity is high. To address this challenge, we propose a novel generative model, which tracks the transition of latent clusters, instead of isolated feature representations, to achieve robust modeling. It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures, and is used for emitting time series. The generative model is parameterized by neural networks. A structured inference network is also designed for enabling inductive analysis. A gating mechanism is further introduced to dynamically tune the Gaussian mixture distributions. Extensive experimental results on a variety of real-life datasets demonstrate the effectiveness of our method.

T2-Net: A Semi-supervised Deep Model for Turbulence Forecasting

Accurate air turbulence forecasting can help airlines avoid hazardous turbulence, guide the routes that keep passengers safe, maximize efficiency, and reduce costs. Traditional turbulence forecasting approaches heavily rely on painstakingly customized turbulence indexes, which are less effective in dynamic and complex weather conditions. The recent availability of high-resolution weather data and turbulence records allows more accurate forecasting of the turbulence in a data-driven way. However, it is a non-trivial task for developing a machine learning based turbulence forecasting system due to two challenges: (1) Complex spatio-temporal correlations, turbulence is caused by air movement with complex spatio-temporal patterns, (2) Label scarcity, very limited turbulence labels can be obtained. To this end, in this paper, we develop a unified semi-supervised framework, T2-Net, to address the above challenges. Specifically, we first build an encoder-decoder paradigm based on the convolutional LSTM to model the spatio-temporal correlations. Then, to tackle the label scarcity problem, we propose a novel Dual Label Guessing method to take advantage of massive unlabeled turbulence data. It integrates complementary signals from the main Turbulence Forecasting task and the auxiliary Turbulence Detection task to generate pseudo-labels, which are dynamically utilized as additional training data. Finally, extensive experimental results on a real-world turbulence dataset validate the superiority of our method on turbulence forecasting.

Inductive and Unsupervised Representation Learning on Graph Structured Objects

Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is also challenging to make graph learning inductive and unsupervised at the same time, as learning processes guided by reconstruction error based loss functions inevitably demand graph similarity evaluation that is usually computationally intractable. In this paper, we propose a general framework SEED (Sampling, Encoding, and Embedding Distributions) for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, given an input graph, the SEED framework samples a number of subgraphs whose reconstruction errors could be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. By theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests the proposed SEED framework is able to achieve up to 10% improvement, compared with competitive baseline methods.

Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series

The problem of learning and forecasting underlying trends in time series data arises in a variety of applications, such as traffic management, energy optimization, etc. In literature, a trend in time series is characterized by the slope and duration, and its prediction is then to forecast the two values of the subsequent trend given historical data of the time series. For this problem, existing approaches mainly deal with the case in univariate time series. However, in many real-world applications, there are multiple variables at play, and handling all of them at the same time is crucial for an accurate prediction. A natural way is to employ multi-task learning (MTL) techniques in which the trend learning of each time series is treated as a task. The key point of MTL is to learn task relatedness to achieve better parameter sharing, which however is challenging in trend prediction task. First, effectively modeling the complex temporal patterns in different tasks is hard as the temporal and spatial dimensions are entangled. Second, the relatedness among tasks may change over time. In this paper, we propose a neural network, DeepTrends, for multivariate time series trend prediction. The core module of DeepTrends is a tensorized LSTM with adaptive shared memory (TLASM). TLASM employs the tensorized LSTM to model the temporal patterns of long-term trend sequences in an MTL setting. With an adaptive shared memory, TLASM is able to learn the relatedness among tasks adaptively, based upon which it can dynamically vary degrees of parameter sharing among tasks. To further consider short-term patterns, DeepTrends utilizes a multi-task 1dCNN to learn the local time series features, and employs a task-specific sub-network to learn a mixture of long-term and short-term patterns for trend prediction. Extensive experiments on real datasets demonstrate the effectiveness of the proposed model.