haifeng chen Archives | Page 7 of 13

Haifeng Chen is the Department Head of the Data Science and System Security Department at NEC Laboratories America. He received his PhD in Computer Engineering from Rutgers University. His research focuses on data mining, system security, and industrial AI. He leads NEC’s work on secure systems, anomaly detection, and AI-driven automation solutions. Based in Princeton, Dr. Chen brings deep expertise in machine learning, anomaly detection, and system health monitoring, with a particular focus on building trustworthy and scalable AI-driven platforms. He has spearheaded numerous high-impact projects, including AI for spacecraft systems, root-cause analysis in cloud environments, and dynamic graph analysis for network security.

His leadership has helped shape the department’s role as a key contributor to NEC’s innovations in fields such as enterprise systems, national defense, and space technology. Dr. Chen holds more than 80 patents and has published over 100 peer-reviewed papers in top-tier venues, earning multiple best paper awards. His contributions extend beyond technical leadership; he serves on program committees for major AI and data science conferences such as SIGKDD and AAAI and has been a panelist for NSF grant reviews. Recognized with NEC’s highest corporate honor, the Contributor of the Year award, Haifeng Chen continues to drive the lab’s efforts in developing real-world, high-impact solutions that merge cutting-edge research with scalable applications across industries.

Posts

Towards Robust Graph Neural Networks via Adversarial Contrastive Learning

December 20, 2022/in Publications/by NEC Labs America

Graph Neural Network (GNN), as a powerful representation learning model on graph data, attracts much attention across various disciplines. However, recent studies show that GNN is vulnerable to adversarial attacks. How to make GNN more robust? What are the key vulnerabilities in GNN? How to address the vulnerabilities and defend GNN against the adversarial attacks? Adversarial training has shown to be effective in improving the robustness of traditional Deep Neural Networks (DNNs). However, existing adversarial training works mainly focus on the image data, which consists of continuous features, while the features and structures of graph data are often discrete. Moreover, rather than assuming each sample is independent and identically distributed as in DNN, GNN leverages the contextual information across the graph (e.g., neighborhoods of a node). Thus, existing adversarial training techniques cannot be directly applied to defend GNN. In this paper, we propose ContrastNet, an effective adversarial defense framework for GNN. In particular, we propose an adversarial contrastive learning method to train the GNN over the adversarial space. To further improve the robustness of GNN, we investigate the latent vulnerabilities in every component of a GNN encoder and propose corresponding refining strategies. Extensive experiments on three public datasets demonstrate the effectiveness of ContrastNet in improving the robustness of popular GNN variants, such as Graph Convolutional Network and GraphSage, under various types of adversarial attacks.

DeepGAR: Deep Graph Learning for Analogical Reasoning

December 3, 2022/in Publications/by NEC Labs America

Analogical reasoning is the process of discovering and mapping correspondences from a target subject to a base subject. As the most well-known computational method of analogical reasoning, Structure-Mapping Theory (SMT) abstracts both target and base subjects into relational graphs and forms the cognitive process of analogical reasoning by finding a corresponding subgraph (i.e., correspondence) in the target graph that is aligned with the base graph. However, incorporating deep learning for SMT is still under-explored due to several obstacles: 1) the combinatorial complexity of searching for the correspondence in the target graph, 2) the correspondence mining is restricted by various cognitive theory-driven constraints. To address both challenges, we propose a novel framework for Analogical Reasoning (DeepGAR) that identifies the correspondence between source and target domains by assuring cognitive theory-driven constraints. Specifically, we design a geometric constraint embedding space to induce subgraph relation from node embeddings for efficient subgraph search. Furthermore, we develop novel learning and optimization strategies that could end-to-end identify correspondences that are strictly consistent with constraints driven by the cognitive theory. Extensive experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of the proposed DeepGAR over existing methods. The code and data are available at: https://github.com/triplej0079/DeepGAR.

Personalized Federated Learning via Heterogeneous Modular Networks

December 3, 2022/in Publications/by NEC Labs America

Personalized Federated Learning (PFL) which collaboratively trains a federated model while considering local clients under privacy constraints has attracted much attention. Despite its popularity, it has been observed that existing PFL approaches result in sub-optimal solutions when the joint distribution among local clients diverges. To address this issue, we present Federated Modular Network (FedMN), a novel PFL approach that adaptively selects sub-modules from a module pool to assemble heterogeneous neural architectures for different clients. FedMN adopts a light-weighted routing hypernetwork to model the joint distribution on each client and produce the personalized selection of the module blocks for each client. To reduce the communication burden in existing FL, we develop an efficient way to interact between the clients and the server. We conduct extensive experiments on the real-world test beds and the results show both effectiveness and efficiency of the proposed FedMN over the baselines.

Using AI To Safely Put The First Woman On The Moon

October 7, 2022/in News/by NEC Labs America

We are helping to safely bring the first woman astronaut to the moon as part of NASA – National Aeronautics and Space Administration’s Artemis Project with our System Invariant Analysis Technology (SIAT). With Lockheed Martin Space’s T-Tauri AI platform, our SIAT analytics engine takes the data from the 150,000 sensors and creates a model incorporating over 22 billion data relationships. The AI model is then analyzed to find any irregularities which could lead to a possible malfunction of any of the spacecraft’s systems.

Multi-Faceted Knowledge-Driven Pre-training for Product Representation Learning

September 28, 2022/in Publications/by NEC Labs America

As a key component of e-commerce computing, product representation learning (PRL) provides benefits for a variety of applications, including product matching, search, and categorization. The existing PRL approaches have poor language understanding ability due to their inability to capture contextualized semantics. In addition, the learned representations by existing methods are not easily transferable to new products. Inspired by the recent advance of pre-trained language models (PLMs), we make the attempt to adapt PLMs for PRL to mitigate the above issues. In this article, we develop KINDLE, a Knowledge-drIven pre-trainiNg framework for proDuct representation LEarning, which can preserve the contextual semantics and multi-faceted product knowledge robustly and flexibly. Specifically, we first extend traditional one-stage pre-training to a two-stage pre-training framework and exploit a deliberate knowledge encoder to ensure a smooth knowledge fusion into PLM. In addition, we propose a multi-objective heterogeneous embedding method to represent thousands of knowledge elements. This helps KINDLE calibrate knowledge noise and sparsity automatically by replacing isolated classes as training targets in knowledge acquisition tasks. Furthermore, an input-aware gating network is proposed to select the most relevant knowledge for different downstream tasks. Finally, extensive experiments have demonstrated the advantages of KINDLE over the state-of-the-art baselines across three downstream tasks.

Explainable Anomaly Detection System for Categorical Sensor Data in Internet of Things

September 23, 2022/in Publications/by NEC Labs America

Internet of things (IoT) applications deploy massive number of sensors to monitor the system and environment. Anomaly detection on streaming sensor data is an important task for IoT maintenance and operation. However, there are two major challenges for anomaly detection in real IoT applications: (1) many sensors report categorical values rather than numerical readings, (2) the end users may not understand the detection results, they require additional knowledge and explanations to make decision and take action. Unfortunately, most existing solutions cannot satisfy such requirements. To bridge the gap, we design and develop an eXplainable Anomaly Detection System (XADS) for categorical sensor data. XADS trains models from historical normal data and conducts online monitoring. XADS detects the anomalies in an explainable way: the system not only reports anomalies’ time periods, types, and detailed information, but also provides explanations on why they are abnormal, and what the normal data look like. Such information significantly helps the decision making for users. Moreover, XADS requires limited parameter setting in advance, yields high accuracy on detection results and comes with a user-friendly interface, making it an efficient and effective tool to monitor a wide variety of IoT applications.

3D Histogram-Based Anomaly Detection for Categorical Sensor Data in Internet of Things

September 9, 2022/in Publications/by NEC Labs America

The applications of Internet-of-things (IoT) deploy a massive number of sensors to monitor the system and environment. Anomaly detection on streaming sensor data is an important task for IoT maintenance and operation. In real IoT applications, many sensors report categorical values rather than numerical readings. Unfortunately, most existing anomaly detection methods are designed only for numerical sensor data. They cannot be used to monitor the categorical sensor data. In this study, we design and develop a 3D Histogram-based Categorical Anomaly Detection (HCAD) solution to monitor categorical sensor data in IoT. HCAD constructs the histogram model by three dimensions: categorical value, event duration, and frequency. The histogram models are used to profile normal working states of IoT devices. HCAD automatically determines the range of normal data and anomaly threshold. It only requires very limited parameter setting and can be applied to a wide variety of different IoT devices. We implement HCAD and integrate it into an online monitoring system. We test the proposed solution on real IoT datasets such as telemetry data from satellite sensors, air quality data from chemical sensors, and transportation data from traffic sensors. The results of extensive experiments show that HCAD achieves higher detecting accuracy and efficiency than state-of-the-art methods.

CAT: Beyond Efficient Transformer for Content-Aware Anomaly Detection in Event Sequences

August 18, 2022/in Publications/by NEC Labs America

It is critical and important to detect anomalies in event sequences, which becomes widely available in many application domains. Indeed, various efforts have been made to capture abnormal patterns from event sequences through sequential pattern analysis or event representation learning. However, existing approaches usually ignore the semantic information of event content. To this end, in this paper, we propose a self-attentive encoder-decoder transformer framework, Content-Aware Transformer CAT, for anomaly detection in event sequences. In CAT, the encoder learns preamble event sequence representations with content awareness, and the decoder embeds sequences under detection into a latent space, where anomalies are distinguishable. Specifically, the event content is first fed to a content-awareness layer, generating representations of each event. The encoder accepts preamble event representation sequence, generating feature maps. In the decoder, an additional token is added at the beginning of the sequence under detection, denoting the sequence status. A one-class objective together with sequence reconstruction loss is collectively applied to train our framework under the label efficiency scheme. Furthermore, CAT is optimized under a scalable and efficient setting. Finally, extensive experiments on three real-world datasets demonstrate the superiority of CAT.

Towards Learning Disentangled Representations for Time Series

August 18, 2022/in Publications/by NEC Labs America

Promising progress has been made toward learning efficient time series representations in recent years, but the learned representations often lack interpretability and do not encode semantic meanings by the complex interactions of many latent factors. Learning representations that disentangle these latent factors can bring semantic-rich representations of time series and further enhance interpretability. However, directly adopting the sequential models, such as Long Short-Term Memory Variational AutoEncoder (LSTM-VAE), would encounter a Kullback?Leibler (KL) vanishing problem: the LSTM decoder often generates sequential data without efficiently using latent representations, and the latent spaces sometimes could even be independent of the observation space. And traditional disentanglement methods may intensify the trend of KL vanishing along with the disentanglement process, because they tend to penalize the mutual information between the latent space and the observations. In this paper, we propose Disentangle Time-Series, a novel disentanglement enhancement framework for time series data. Our framework achieves multi-level disentanglement by covering both individual latent factors and group semantic segments. We propose augmenting the original VAE objective by decomposing the evidence lower-bound and extracting evidence linking factorial representations to disentanglement. Additionally, we introduce a mutual information maximization term between the observation space to the latent space to alleviate the KL vanishing problem while preserving the disentanglement property. Experimental results on five real-world IoT datasets demonstrate that the representations learned by DTS achieve superior performance in various tasks with better interpretability.

SEED: Sound Event Early Detection via Evidential Uncertainty

May 27, 2022/in Publications/by NEC Labs America

Sound Event Early Detection (SEED) is an essential task in recognizing the acoustic environments and soundscapes. However, most of the existing methods focus on the offline sound event detection, which suffers from the over-confidence issue of early-stage event detection and usually yield unreliable results. To solve the problem, we propose a novel Polyphonic Evidential Neural Network (PENet) to model the evidential uncertainty of the class probability with Beta distribution. Specifically, we use a Beta distribution to model the distribution of class probabilities, and the evidential uncertainty enriches uncertainty representation with evidence information, which plays a central role in reliable prediction. To further improve the event detection performance, we design the backtrack inference method that utilizes both the forward and backward audio features of an ongoing event. Experiments on the DESED database show that the proposed method can simultaneously improve 13.0% and 3.8% in time delay and detection F1 score compared to the state-of-the-art methods.