Wei Cheng NEC Labs America

Wei Cheng

Senior Researcher

Data Science and System Security


Robust Graph Representation Learning via Neural Sparsification

Robust Graph Representation Learning via Neural Sparsification Graph representation learning serves as the core of important prediction tasks, ranging from product recommendation to fraud detection. Reallife graphs usually have complex information in the local neighborhood, where each node is described by a rich set of features and connects to dozens or even hundreds of neighbors. Despite the success of neighborhood aggregation in graph neural networks, task-irrelevant information is mixed into nodes’ neighborhood, making learned models suffer from sub-optimal generalization performance. In this paper, we present NeuralSparse, a supervised graph sparsification technique that improves generalization power by learning to remove potentially task-irrelevant edges from input graphs. Our method takes both structural and nonstructural information as input, utilizes deep neural networks to parameterize sparsification processes, and optimizes the parameters by feedback signals from downstream tasks. Under the NeuralSparse framework, supervised graph sparsification could seamlessly connect with existing graph neural networks for more robust performance. Experimental results on both benchmark and private datasets show that NeuralSparse can yield up to 7.2% improvement in testing accuracy when working with existing graph neural networks on node classification tasks.

Inductive and Unsupervised Representation Learning on Graph Structured Objects

Inductive and Unsupervised Representation Learning on Graph Structured Objects Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is also challenging to make graph learning inductive and unsupervised at the same time, as learning processes guided by reconstruction error based loss functions inevitably demand graph similarity evaluation that is usually computationally intractable. In this paper, we propose a general framework SEED (Sampling, Encoding, and Embedding Distributions) for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, given an input graph, the SEED framework samples a number of subgraphs whose reconstruction errors could be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. By theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests the proposed SEED framework is able to achieve up to 10% improvement, compared with competitive baseline methods.

Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL) which are used to recover the doctor’s treatment policies. However, existing DRL-based methods expose the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors, 2) the self-defined reward signals in reinforcement learning models are either too sparse or need clinical guidance, 3) only positive trajectories (e.g. survived patients) are considered in current imitation learning models, with negative trajectories (e.g. deceased patients) been largely ignored, which are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, in this paper, we propose the adversarial cooperative imitation learning model, ACIL, to deduce the optimal dynamic treatment regimes that mimics the positive trajectories while differs from the negative trajectories. Specifically, two discriminators are used to help achieve this goal: an adversarial discriminator is designed to minimize the discrepancies between the trajectories generated from the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the discriminators are utilized to refine the policy for dynamic treatment regimes. Experiments on the publicly real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes with the exploitation of information from both positive and negative trajectories.

You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis

You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis o subvert recent advances in perimeter and host security, the attacker community has developed and employed various attack vectors to make a malware much more stealthy than before to penetrate the target system and prolong its presence. The advanced malware, or stealthy malware, impersonates or abuses benign applications and legitimate system tools to minimize its footprints in the target system. One example of such stealthy malware is fileless malware, which resides its malicious logic mostly in the memory of well-trusted processes. It is difficult for traditional detection tools, such as malware scanners, to detect it, as the malware normally does not expose its malicious payload in a file and hides its malicious behaviors among the benign behaviors of the processes.In this paper, we present PROVDETECTOR, a provenance-based approach for detecting stealthy malware. The intuition behind PROVDETECTOR is that although a stealthy malware may impersonate or abuse a benign process, it still exposes its malicious behaviors in the OS (operating system) level provenance. Based on this intuition, PROVDETECTOR first employs a novel selection algorithm to identify possibly malicious parts in the OS level provenance data of a process. Then, it applies a neural embedding and machine learning pipeline to automatically detect any behavior that deviates significantly from normal behaviors. We evaluate our approach on a large provenance dataset from an enterprise network and demonstrate that it achieves very high detection performance (an average F1 score of 0.974) of stealthy malware. Further, we conduct thorough interpretability studies to understand the internals of the learned machine learning models.

Asymmetrically Hierarchical Networks with Attentive Interactions for Interpretable Review-based Recommendation

Asymmetrically Hierarchical Networks with Attentive Interactions for Interpretable Review-based Recommendation Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user (item) into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users’ reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item’s reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series

Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series The problem of learning and forecasting underlying trends in time series data arises in a variety of applications, such as traffic management, energy optimization, etc. In literature, a trend in time series is characterized by the slope and duration, and its prediction is then to forecast the two values of the subsequent trend given historical data of the time series. For this problem, existing approaches mainly deal with the case in univariate time series. However, in many real-world applications, there are multiple variables at play, and handling all of them at the same time is crucial for an accurate prediction. A natural way is to employ multi-task learning (MTL) techniques in which the trend learning of each time series is treated as a task. The key point of MTL is to learn task relatedness to achieve better parameter sharing, which however is challenging in trend prediction task. First, effectively modeling the complex temporal patterns in different tasks is hard as the temporal and spatial dimensions are entangled. Second, the relatedness among tasks may change over time. In this paper, we propose a neural network, DeepTrends, for multivariate time series trend prediction. The core module of DeepTrends is a tensorized LSTM with adaptive shared memory (TLASM). TLASM employs the tensorized LSTM to model the temporal patterns of long-term trend sequences in an MTL setting. With an adaptive shared memory, TLASM is able to learn the relatedness among tasks adaptively, based upon which it can dynamically vary degrees of parameter sharing among tasks. To further consider short-term patterns, DeepTrends utilizes a multi-task 1dCNN to learn the local time series features, and employs a task-specific sub-network to learn a mixture of long-term and short-term patterns for trend prediction. Extensive experiments on real datasets demonstrate the effectiveness of the proposed model.

Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval

Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval Multivariate time series data are becoming increasingly ubiquitous in varies real-world applications such as smart city, power plant monitoring, wearable devices, etc. Given the current time series segment, how to retrieve similar segments within the historical data in an efficient and effective manner is becoming increasingly important. As it can facilitate underlying applications such as system status identification, anomaly detection, etc. Despite the fact that various binary coding techniques can be applied to this task, few of them are specially designed for multivariate time series data in an unsupervised setting. To this end, we present Deep Unsupervised Binary Coding Networks (DUBCNs) to perform multivariate time series retrieval. DUBCNs employ the Long Short-Term Memory (LSTM) encoder-decoder framework to capture the temporal dynamics within the input segment and consist of three key components, i.e., a temporal encoding mechanism to capture the temporal order of different segments within a mini-batch, a clustering loss on the hidden feature space to capture the hidden feature structure, and an adversarial loss based upon Generative Adversarial Networks (GANs) to enhance the generalization capability of the generated binary codes. Thoroughly empirical studies on three public datasets demonstrated that the proposed DUBCNs can outperform state-of-the-art unsupervised binary coding techniques.

Temporal Context-aware Representation Learning for Question Routing

Temporal Context-aware Representation Learning for Question Routing Question routing (QR) aims at recommending newly posted questions to the potential answerers who are most likely to answer the questions. The existing approaches that learn users’ expertise from their past question-answering activities usually suffer from challenges in two aspects: 1) multi-faceted expertise and 2) temporal dynamics in the answering behavior. This paper proposes a novel temporal context-aware model in multiple granularities of temporal dynamics that concurrently address the above challenges. Specifically, the temporal context-aware attention characterizes the answerer’s multi-faceted expertise in terms of the questions’ semantic and temporal information simultaneously. Moreover, the design of the multi-shift and multi-resolution module enables our model to handle temporal impact on different time granularities. Extensive experiments on six datasets from different domains demonstrate that the proposed model significantly outperforms competitive baseline models.

Interpretable Click-Through Rate Prediction through Hierarchical Attention

Interpretable Click-Through Rate Prediction through Hierarchical Attention Click-through rate (CTR) prediction is a critical task in online advertising and marketing. For this problem, existing approaches, with shallow or deep architectures, have three major drawbacks. First, they typically lack persuasive rationales to explain the outcomes of the models. Unexplainable predictions and recommendations may be difficult to validate and thus unreliable and untrustworthy. In many applications, inappropriate suggestions may even bring severe consequences. Second, existing approaches have poor efficiency in analyzing high-order feature interactions. Third, the polysemy of feature interactions in different semantic subspaces is largely ignored. In this paper, we propose InterHAt that employs a Transformer with multi-head self-attention for feature learning. On top of that, hierarchical attention layers are utilized for predicting CTR while simultaneously providing interpretable insights of the prediction results. InterHAt captures high-order feature interactions by an efficient attentional aggregation strategy with low computational complexity. Extensive experiments on four public real datasets and one synthetic dataset demonstrate the effectiveness and efficiency of InterHAt.

Adaptive Neural Network for Node Classification in Dynamic Networks

Adaptive Neural Network for Node Classification in Dynamic Networks Given a network with the labels for a subset of nodes, transductive node classification targets to predict the labels for the remaining nodes in the network. This technique has been used in a variety of applications such as voxel functionality detection in brain network and group label prediction in social network. Most existing node classification approaches are performed in static networks. However, many real-world networks are dynamic and evolve over time. The dynamics of both node attributes and network topology jointly determine the node labels. In this paper, we study the problem of classifying the nodes in dynamic networks. The task is challenging for three reasons. First, it is hard to effectively learn the spatial and temporal information simultaneously. Second, the network evolution is complex. The evolving patterns lie in both node attributes and network topology. Third, for different networks or even different nodes in the same network, the node attributes, the neighborhood node representations and the network topology usually affect the node labels differently, it is desirable to assess the relative importance of different factors over evolutionary time scales. To address the challenges, we propose AdaNN, an adaptive neural network for transductive node classification. AdaNN learns node attribute information by aggregating the node and its neighbors, and extracts network topology information with a random walk strategy. The attribute information and topology information are further fed into two connected gated recurrent units to learn the spatio-temporal contextual information. Additionally, a triple attention module is designed to automatically model the different factors that influence the node representations. AdaNN is the first node classification model that is adaptive to different kinds of dynamic networks. Extensive experiments on real datasets demonstrate the effectiveness of AdaNN.