Wei Cheng

Senior Researcher

Data Science and System Security, NEC Labs America

Posts

Deep Multi-Instance Contrastive Learning with Dual Attention for Anomaly Precursor Detection

Prognostics, or the early detection of incipient faults by leveraging monitoring time series data in complex systems, is valuable for automatic system management and predictive maintenance. However, this task is challenging. First, it is hard to learn from multi-dimensional, heterogeneous time series data with various anomaly types. Second, precise annotations of the incipient periods of anomalies are lacking. Third, interpretable tools for diagnosing precursor symptoms are lacking. Despite some recent progress, few existing approaches can jointly resolve these challenges. In this paper, we propose MCDA, a deep multi-instance contrastive learning approach with dual attention, to detect anomaly precursors. MCDA utilizes multi-instance learning to model the uncertainty of the precursor period, and employs a recurrent neural network with tensorized hidden states to extract precursor features encoded in the temporal dynamics as well as in the correlations between different pairs of time series. A dual attention mechanism over both time steps and time series variables is developed to pinpoint the time periods and the sensors in which the precursor symptoms are involved. A contrastive loss is designed to address the scarcity of annotated anomalies. To the best of our knowledge, MCDA is the first method to study the ‘when’ and ‘where’ of anomaly precursor detection simultaneously. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of MCDA.
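The abstract only sketches the dual attention mechanism, so here is a minimal PyTorch illustration of the general idea, with all module names, dimensions, and the max-pooling bag aggregation being my own assumptions rather than details from the paper: attention weights over time steps answer ‘when’, and attention weights over the input variables answer ‘where’.

```python
import torch
import torch.nn as nn


class DualAttentionScorer(nn.Module):
    """Toy dual attention over RNN hidden states: one attention across
    time steps ('when') and one across input variables ('where')."""

    def __init__(self, num_vars, hidden_dim):
        super().__init__()
        self.rnn = nn.GRU(num_vars, hidden_dim, batch_first=True)
        self.temporal_attn = nn.Linear(hidden_dim, 1)         # scores each time step
        self.variable_attn = nn.Linear(hidden_dim, num_vars)  # scores each variable
        self.out = nn.Linear(hidden_dim + num_vars, 1)

    def forward(self, x):                        # x: (batch, time, num_vars)
        h, _ = self.rnn(x)                       # (batch, time, hidden_dim)
        a_t = torch.softmax(self.temporal_attn(h), dim=1)          # (batch, time, 1)
        context = (a_t * h).sum(dim=1)           # attention-weighted summary over time
        a_v = torch.softmax(self.variable_attn(context), dim=-1)   # (batch, num_vars)
        weighted_vars = a_v * x.mean(dim=1)      # variable importance applied to inputs
        score = self.out(torch.cat([context, weighted_vars], dim=-1))
        return score.squeeze(-1), a_t.squeeze(-1), a_v  # instance score + attentions


# In multi-instance learning, a bag-level score can be pooled from instances:
scorer = DualAttentionScorer(num_vars=8, hidden_dim=32)
bag = torch.randn(5, 20, 8)                      # 5 instances, 20 steps, 8 sensors
scores, when, where = scorer(bag)
bag_score = scores.max()                         # max-pooling over instances
```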

Learning to Drop: Robust Graph Neural Network via Topological Denoising

Graph Neural Networks (GNNs) have been shown to be powerful tools for graph analytics. The key idea is to recursively propagate and aggregate information along the edges of the given graph. Despite their success, however, existing GNNs are usually sensitive to the quality of the input graph. Real-world graphs are often noisy and contain task-irrelevant edges, which may lead to suboptimal generalization performance in the learned GNN models. In this paper, we propose PTDNet, a parameterized topological denoising network, to improve the robustness and generalization performance of GNNs by learning to drop task-irrelevant edges. PTDNet prunes task-irrelevant edges by penalizing the number of edges in the sparsified graph with parameterized networks. To take the topology of the entire graph into consideration, nuclear norm regularization is applied to impose a low-rank constraint on the resulting sparsified graph for better generalization. PTDNet can be used as a key component in GNN models to improve their performance on various tasks, such as node classification and link prediction. Experimental studies on both synthetic and benchmark datasets show that PTDNet can improve the performance of GNNs significantly, and that the performance gain becomes larger on noisier datasets.
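To make the edge-dropping idea concrete, here is a minimal sketch, assuming a dense adjacency matrix, an MLP edge scorer, and made-up regularization weights (all assumptions of mine, not the paper’s architecture): each edge gets a learned keep-probability, and the loss penalizes both the expected edge count and the nuclear norm of the sparsified graph.

```python
import torch
import torch.nn as nn


class EdgeDropper(nn.Module):
    """Toy parameterized edge scorer: keeps an edge with a learned probability."""

    def __init__(self, feat_dim, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, adj):                   # x: (n, d), adj: dense (n, n)
        src, dst = adj.nonzero(as_tuple=True)
        logits = self.mlp(torch.cat([x[src], x[dst]], dim=-1)).squeeze(-1)
        keep_prob = torch.sigmoid(logits)        # soft keep/drop decision per edge
        sparse_adj = torch.zeros_like(adj)
        sparse_adj[src, dst] = keep_prob
        # Penalize the expected edge count and the rank of the sparsified graph.
        edge_penalty = keep_prob.sum()
        low_rank_penalty = torch.linalg.matrix_norm(sparse_adj, ord='nuc')
        return sparse_adj, edge_penalty, low_rank_penalty


x = torch.randn(6, 4)
adj = (torch.rand(6, 6) > 0.5).float()
sparse_adj, n_edges, nuc = EdgeDropper(feat_dim=4)(x, adj)
total_loss = 0.01 * n_edges + 0.001 * nuc        # add the downstream GNN task loss here
```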

Multi-Task Recurrent Modular Networks

We consider deep multi-task learning models with recurrent architectures that exploit regularities across tasks to improve the performance of multiple sequence processing tasks jointly. Most existing architectures are painstakingly customized to learn task relationships for a specific problem, which is not flexible enough to model dynamic task relationships and lacks the ability to generalize to novel test-time scenarios. We propose multi-task recurrent modular networks (MT-RMN), which can be incorporated into any multi-task recurrent model to address the above drawbacks. MT-RMN consists of a shared encoder and multiple task-specific decoders, and operates recurrently over time. For better flexibility, it modularizes the encoder into multiple layers of sub-networks and dynamically controls the connections between these sub-networks and the decoders at different time steps, which provides the recurrent networks with varying degrees of parameter sharing for tasks with dynamic relatedness. For generalization ability, MT-RMN aims to discover a set of generalizable sub-networks in the encoder that can be assembled in different ways for different tasks. Policy networks augmented with differentiable routers are utilized to make the binary connection decisions between the sub-networks. Experimental results on three multi-task sequence processing datasets consistently demonstrate the effectiveness of MT-RMN.
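As a rough illustration of the modular routing idea (not the authors’ implementation; the Gumbel-softmax router, the layer shapes, and the omission of per-task conditioning are all my own simplifications), a single modular layer could let a learned router make approximately binary keep/skip decisions over sub-networks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModularLayer(nn.Module):
    """Toy modular layer: a router decides, per input, which sub-networks
    (modules) contribute to the next layer, via hard Gumbel-softmax gates."""

    def __init__(self, dim, num_modules):
        super().__init__()
        self.subnets = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_modules)])
        # One binary (keep/skip) logit pair per module.
        self.router = nn.Linear(dim, num_modules * 2)
        self.num_modules = num_modules

    def forward(self, h, tau=1.0):               # h: (batch, dim)
        logits = self.router(h).view(-1, self.num_modules, 2)
        # hard=True gives binary decisions with a differentiable surrogate.
        gates = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 0]   # (batch, m)
        outs = torch.stack([m(h) for m in self.subnets], dim=1)        # (batch, m, dim)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)   # assemble chosen modules


layer = ModularLayer(dim=16, num_modules=4)
h = torch.randn(8, 16)
h_next = layer(h)                                # differentiable binary routing
```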

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series

Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process each MTS individually and do not leverage the dynamic distributions underlying the MTSs, leading to sub-optimal results when the sparsity is high. To address this challenge, we propose a novel generative model that tracks the transition of latent clusters, instead of isolated feature representations, to achieve robust modeling. It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures and is used for emitting time series. The generative model is parameterized by neural networks. A structured inference network is also designed to enable inductive analysis. A gating mechanism is further introduced to dynamically tune the Gaussian mixture distributions. Extensive experimental results on a variety of real-life datasets demonstrate the effectiveness of our method.
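For intuition, below is a toy sketch of the generative (emission) side of such a model, assuming a simple linear cluster-transition network and per-cluster diagonal Gaussians; the actual model, its structured inference network, and the gating mechanism are considerably more involved, so treat every choice here as a stand-in.

```python
import torch
import torch.nn as nn


class DynamicGMMEmitter(nn.Module):
    """Toy dynamic Gaussian mixture: cluster weights evolve over time and
    each observation is emitted from the resulting mixture."""

    def __init__(self, num_clusters, obs_dim):
        super().__init__()
        self.transition = nn.Linear(num_clusters, num_clusters)  # cluster dynamics
        self.means = nn.Parameter(torch.randn(num_clusters, obs_dim))
        self.log_std = nn.Parameter(torch.zeros(num_clusters, obs_dim))

    def forward(self, pi, steps):
        """pi: (batch, num_clusters) initial cluster weights."""
        samples = []
        for _ in range(steps):
            pi = torch.softmax(self.transition(pi), dim=-1)   # evolve mixture weights
            k = torch.multinomial(pi, 1).squeeze(-1)          # sample a cluster
            eps = torch.randn_like(self.means[k])
            samples.append(self.means[k] + self.log_std[k].exp() * eps)
        return torch.stack(samples, dim=1)       # (batch, steps, obs_dim)


emitter = DynamicGMMEmitter(num_clusters=3, obs_dim=5)
pi0 = torch.full((4, 3), 1.0 / 3)
x = emitter(pi0, steps=10)                       # sampled MTS-like sequences
```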

Parameterized Explainer for Graph Neural Network

Despite recent progress in Graph Neural Networks (GNNs), explaining the predictions made by GNNs remains a challenging open problem. The leading method addresses local explanations (i.e., important subgraph structures and node features) independently, to interpret why a GNN model makes its prediction for a single instance, e.g., a node or a graph. As a result, the generated explanation is painstakingly customized for each instance. Explanations that interpret each instance independently are not sufficient to provide a global understanding of the learned GNN model, which limits generalizability and hinders use in the inductive setting. Besides, as it is designed for explaining a single instance, such a method cannot naturally explain a set of instances (e.g., graphs of a given class). In this study, we address these key challenges and propose PGExplainer, a parameterized explainer for GNNs. PGExplainer adopts a deep neural network to parameterize the generation process of explanations, which enables PGExplainer to explain multiple instances collectively in a natural way. Compared to existing work, PGExplainer has better generalization ability and can easily be utilized in an inductive setting. Experiments on both synthetic and real-life datasets show highly competitive performance, with up to a 24.7% relative improvement in AUC on explaining graph classification over the leading baseline.
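A minimal sketch of the parameterization idea, under my own assumptions about shapes and the training objective (fidelity to the original prediction plus a sparsity penalty), looks like this: one shared MLP maps edge embeddings to mask weights, so the same explainer parameters transfer across instances.

```python
import torch
import torch.nn as nn


class ParameterizedExplainer(nn.Module):
    """Toy shared explainer: one MLP maps edge embeddings to mask weights,
    so the same parameters explain every instance (inductive by design)."""

    def __init__(self, emb_dim, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, node_emb, edge_index):     # edge_index: (2, num_edges)
        src, dst = edge_index
        edge_emb = torch.cat([node_emb[src], node_emb[dst]], dim=-1)
        return torch.sigmoid(self.mlp(edge_emb)).squeeze(-1)   # mask in (0, 1)


# Training idea: the masked prediction should match the GNN's original one,
# while the explanation (edge mask) stays small.
def explainer_loss(original_logits, masked_logits, mask, size_coef=0.01):
    pred = original_logits.argmax(dim=-1)
    fidelity = nn.functional.cross_entropy(masked_logits, pred)
    sparsity = size_coef * mask.sum()            # prefer small explanations
    return fidelity + sparsity


emb = torch.randn(10, 8)                         # node embeddings from a trained GNN
edges = torch.randint(0, 10, (2, 30))
mask = ParameterizedExplainer(emb_dim=8)(emb, edges)   # one weight per edge
```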

T2-Net: A Semi-supervised Deep Model for Turbulence Forecasting

Accurate air turbulence forecasting can help airlines avoid hazardous turbulence, guide routes that keep passengers safe, maximize efficiency, and reduce costs. Traditional turbulence forecasting approaches rely heavily on painstakingly customized turbulence indexes, which are less effective in dynamic and complex weather conditions. The recent availability of high-resolution weather data and turbulence records allows more accurate forecasting of turbulence in a data-driven way. However, developing a machine learning based turbulence forecasting system is a non-trivial task due to two challenges: (1) complex spatio-temporal correlations: turbulence is caused by air movement with complex spatio-temporal patterns; (2) label scarcity: very limited turbulence labels can be obtained. To this end, in this paper we develop a unified semi-supervised framework, T2-Net, to address the above challenges. Specifically, we first build an encoder-decoder paradigm based on the convolutional LSTM to model the spatio-temporal correlations. Then, to tackle the label scarcity problem, we propose a novel Dual Label Guessing method that takes advantage of massive unlabeled turbulence data. It integrates complementary signals from the main Turbulence Forecasting task and the auxiliary Turbulence Detection task to generate pseudo-labels, which are dynamically utilized as additional training data. Finally, extensive experimental results on a real-world turbulence dataset validate the superiority of our method on turbulence forecasting.
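To illustrate the pseudo-labeling idea, here is a minimal sketch, assuming both heads emit class probabilities and using a simple convex blend with temperature sharpening; the blend weight, the temperature, and the three-class setup are hypothetical, not values from the paper.

```python
import torch


def dual_label_guess(forecast_probs, detect_probs, alpha=0.5, temperature=0.5):
    """Toy dual label guessing: blend class probabilities from the main
    forecasting head and the auxiliary detection head, then sharpen the
    result into a pseudo-label distribution for unlabeled samples."""
    blended = alpha * forecast_probs + (1 - alpha) * detect_probs
    sharpened = blended ** (1.0 / temperature)   # push mass toward the argmax
    return sharpened / sharpened.sum(dim=-1, keepdim=True)


f = torch.softmax(torch.randn(4, 3), dim=-1)     # forecasting head, 3 severity classes
d = torch.softmax(torch.randn(4, 3), dim=-1)     # detection head
pseudo = dual_label_guess(f, d)                  # used as extra training targets
```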

Node Classification in Temporal Graphs through Stochastic Sparsification and Temporal Structural Convolution

Node classification in temporal graphs aims to predict node labels based on historical observations. In real-world applications, temporal graphs are complex, with both graph topology and node attributes evolving rapidly, which poses a high overfitting risk to existing graph learning approaches. In this paper, we propose a novel Temporal Structural Network (TSNet) model, which jointly learns temporal and structural features for node classification from sparsified temporal graphs. We show that the proposed TSNet learns how to sparsify temporal graphs so as to favor the subsequent classification tasks and prevent overfitting to complex neighborhood structures. Effective local features are then extracted by simultaneous convolutions in the temporal and spatial domains. Using standard stochastic gradient descent and backpropagation techniques, TSNet iteratively optimizes sparsification and node representations for the subsequent classification tasks. An experimental study on public benchmark datasets demonstrates the competitive performance of the proposed model in node classification. Besides, TSNet has the potential to help domain experts interpret and visualize the learned models.
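As a rough sketch of what convolutions in both temporal and spatial domains could look like (the layer below is an assumption of mine, not TSNet’s actual architecture): a shared graph convolution applied per snapshot, followed by a 1-D convolution along the time axis for each node.

```python
import torch
import torch.nn as nn


class TemporalStructuralConv(nn.Module):
    """Toy temporal-structural convolution: a shared graph convolution per
    snapshot, then a 1-D convolution over the time axis for each node."""

    def __init__(self, in_dim, out_dim, kernel_size=3):
        super().__init__()
        self.gcn_weight = nn.Linear(in_dim, out_dim)
        self.temporal = nn.Conv1d(out_dim, out_dim, kernel_size,
                                  padding=kernel_size // 2)

    def forward(self, x, adjs):                  # x: (time, n, in_dim); adjs: (time, n, n)
        spatial = torch.stack(
            [torch.relu(adjs[t] @ self.gcn_weight(x[t])) for t in range(x.size(0))])
        # Rearrange to (n, out_dim, time) so Conv1d slides along time.
        out = self.temporal(spatial.permute(1, 2, 0))
        return out.permute(2, 0, 1)              # back to (time, n, out_dim)


layer = TemporalStructuralConv(in_dim=4, out_dim=8)
x = torch.randn(5, 10, 4)                        # 5 snapshots, 10 nodes
adjs = torch.rand(5, 10, 10)                     # one (possibly sparsified) graph each
h = layer(x, adjs)
```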

Robust Graph Representation Learning via Neural Sparsification

Graph representation learning serves as the core of important prediction tasks, ranging from product recommendation to fraud detection. Real-life graphs usually have complex information in the local neighborhood, where each node is described by a rich set of features and connects to dozens or even hundreds of neighbors. Despite the success of neighborhood aggregation in graph neural networks, task-irrelevant information is mixed into nodes’ neighborhoods, making learned models suffer from sub-optimal generalization performance. In this paper, we present NeuralSparse, a supervised graph sparsification technique that improves generalization power by learning to remove potentially task-irrelevant edges from input graphs. Our method takes both structural and non-structural information as input, utilizes deep neural networks to parameterize the sparsification process, and optimizes the parameters with feedback signals from downstream tasks. Under the NeuralSparse framework, supervised graph sparsification can seamlessly connect with existing graph neural networks for more robust performance. Experimental results on both benchmark and private datasets show that NeuralSparse can yield up to a 7.2% improvement in testing accuracy when working with existing graph neural networks on node classification tasks.
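A hedged sketch of one plausible differentiable sparsifier (sampling k neighbors per node with Gumbel-softmax; the exact mechanism in the paper may differ) shows how edge selection can stay trainable from downstream feedback:

```python
import torch
import torch.nn.functional as F


def sparsify_k_neighbors(edge_logits, adj_mask, k=3, tau=0.5):
    """Toy differentiable sparsifier: for each node, draw k approximately
    one-hot samples over its neighbors (Gumbel-softmax) and union them
    into a soft sparsified adjacency row.

    edge_logits, adj_mask: (n, n); adj_mask is 1 where an edge exists."""
    masked = edge_logits.masked_fill(adj_mask == 0, float('-inf'))
    draws = [F.gumbel_softmax(masked, tau=tau, hard=True) for _ in range(k)]
    return torch.clamp(torch.stack(draws).sum(dim=0), max=1.0)   # (n, n)


logits = torch.randn(6, 6)                       # learned from node/edge features
adj = (torch.rand(6, 6) > 0.4).float()
adj.fill_diagonal_(1.0)                          # every node keeps at least itself
sparse_adj = sparsify_k_neighbors(logits, adj)   # feed to any downstream GNN
```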

Inductive and Unsupervised Representation Learning on Graph Structured Objects

Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is challenging to make graph learning both inductive and unsupervised, as learning processes guided by reconstruction-error-based loss functions inevitably demand graph similarity evaluation, which is usually computationally intractable. In this paper, we propose a general framework, SEED (Sampling, Encoding, and Embedding Distributions), for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, given an input graph the SEED framework samples a number of subgraphs whose reconstruction errors can be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. Through theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests that the proposed SEED framework achieves up to a 10% improvement over competitive baseline methods.
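To make the three stages concrete, here is a deliberately simplified sketch: random walks for Sampling, mean node features for Encoding, and a first-moment average for Embedding Distributions. The real framework uses learned subgraph encoders and richer distribution embeddings, so treat every choice below as a stand-in.

```python
import random
import torch


def sample_subgraph(neighbors, start, walk_len=6):
    """Sampling: one random-walk subgraph, returned as a list of nodes."""
    node, walk = start, [start]
    for _ in range(walk_len):
        if not neighbors[node]:
            break
        node = random.choice(neighbors[node])
        walk.append(node)
    return walk


def seed_embedding(neighbors, node_feats, num_samples=32):
    """Encoding + Embedding Distributions: encode each sampled subgraph as
    the mean of its node features, then embed the sample distribution by
    averaging the subgraph vectors (a simple moment-based embedding)."""
    nodes = list(neighbors)
    vecs = []
    for _ in range(num_samples):
        walk = sample_subgraph(neighbors, random.choice(nodes))
        vecs.append(node_feats[walk].mean(dim=0))   # subgraph vector
    return torch.stack(vecs).mean(dim=0)            # graph representation


neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feats = torch.randn(4, 8)
graph_vec = seed_embedding(neighbors, feats)        # compare graphs by distance
```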

Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover the doctor’s treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or need clinical guidance; 3) only positive trajectories (e.g., survived patients) are considered in current imitation learning models, while negative trajectories (e.g., deceased patients) are largely ignored, even though they are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, in this paper we propose an adversarial cooperative imitation learning model, ACIL, to deduce optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancies between the trajectories generated from the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the discriminators are utilized to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
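The two-discriminator reward can be sketched in a few lines, assuming GAIL-style log rewards and simple state-action MLP discriminators; both are my assumptions, not the paper’s exact formulation.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Tiny state-action classifier used by both discriminators."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a):
        return torch.sigmoid(self.net(torch.cat([s, a], dim=-1))).squeeze(-1)


def acil_reward(d_adv, d_coop, s, a, eps=1e-8):
    """Toy combined reward: look like the positive (survived) trajectories
    to the adversarial discriminator AND unlike the negative (deceased)
    trajectories to the cooperative one (GAIL-style log rewards)."""
    r_pos = torch.log(d_adv(s, a) + eps)         # high if close to positive data
    r_neg = torch.log(1.0 - d_coop(s, a) + eps)  # high if far from negative data
    return r_pos + r_neg


d_adv = Discriminator(state_dim=10, action_dim=4)
d_coop = Discriminator(state_dim=10, action_dim=4)
s, a = torch.randn(32, 10), torch.randn(32, 4)
r = acil_reward(d_adv, d_coop, s, a)             # reward used to refine the policy
```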