Wei Cheng NEC Labs America

Wei Cheng

Senior Researcher

Data Science and System Security

Posts

AutoTCL: Automated Time Series Contrastive Learning with Adaptive Augmentations

AutoTCL: Automated Time Series Contrastive Learning with Adaptive Augmentations Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Due to patterns that are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. Thus, although prevalent, contrastive learning with data augmentation has been less studied in the time series domain. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a parameterized augmentation method, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on benchmark datasets demonstrate the highly competitive results of our method, with an average 10.3% reduction in MSE and 7.0% in MAE over the leading baselines.

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations that are difficult and expensive to collect. Usually, a trade-off needs to be made between demonstration quality and quantity in practice. Targeting this problem, in this work we consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set. Some pioneering works have been proposed, but they suffer from many limitations, e.g., assuming a demonstration to be of the same optimality throughout time steps and failing to provide any interpretation w.r.t knowledge learned from the noisy set. Addressing these problems, we propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills. Concretely, SDIL consists of a high-level controller to discover skills and a skill-conditioned module to capture action-taking policies and is trained following a two-phase pipeline by first discovering skills with all demonstrations and then adapting the controller to only the clean set. A mutual-information-based regularization and a dynamic sub-demonstration optimality estimator are designed to promote disentanglement in the skill space. Extensive experiments are conducted over two gym environments and a real-world healthcare dataset to demonstrate the superiority of SDIL in learning from sub-optimal demonstrations and its improved interpretability by examining learned skills.

FedSkill: Privacy Preserved Interpretable Skill Learning via Imitation

FedSkill: Privacy Preserved Interpretable Skill Learning via Imitation Imitation learning that replicates experts’ skills via their demonstrations has shown significant success in various decision-making tasks. However, two critical challenges still hinder the deployment of imitation learning techniques in real-world application scenarios. First, existing methods lack the intrinsic interpretability to explicitly explain the underlying rationale of the learned skill and thus making learned policy untrustworthy. Second, due to the scarcity of expert demonstrations from each end user (client), learning a policy based on different data silos is necessary but challenging in privacy-sensitive applications such as finance and healthcare. To this end, we present a privacy-preserved interpretable skill learning framework (FedSkill) that enables global policy learning to incorporate data from different sources and provides explainable interpretations to each local user without violating privacy and data sovereignty. Specifically, our proposed interpretable skill learning model can capture the varying patterns in the trajectories of expert demonstrations, and extract prototypical information as skills that provide implicit guidance for policy learning and explicit explanations in the reasoning process. Moreover, we design a novel aggregation mechanism coupled with the based skill learning model to preserve global information utilization and maintain local interpretability under the federated framework. Thoroughly experiments on three datasets and empirical studies demonstrate that our proposed FedSkill framework not only outperforms state-of-the-art imitation learning methods but also exhibits good interpretability under a federated setting. Our proposed FedSkill framework is the first attempt to bridge the gaps among federated learning, interpretable machine learning, and imitation learning.

Personalized Federated Learning under Mixture Distributions

Personalized Federated Learning under Mixture Distributions The recent trend towards Personalized Federated Learning (PFL) has garnered significant attention as it allows for the training of models that are tailored to each client while maintaining data privacy. However, current PFL techniques primarily focus on modeling the conditional distribution heterogeneity (i.e. concept shift), which can result in suboptimal performance when the distribution of input data across clients diverges (i.e. covariate shift). Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients. The model parameters are estimated by maximum likelihood estimation utilizing a federated Expectation-Maximization algorithm, which is solved in closed form and does not assume gradient similarity. Furthermore, FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.

Unsupervised Anomaly Detection Under A Multiple Modeling Strategy Via Model Set Optimization Through Transfer Learning

Unsupervised anomaly detection under a multiple modeling strategy via model set optimization through transfer learning Unsupervised anomaly detection approaches have been widely accepted in applications for industrial systems. Industrial systems often operate with multiple modes since they work for multiple purposes or under different conditions. In order to deal with the difficulty of anomaly detection due to multiple operating modes, multiple modeling strategies are employed. However, estimating the optimal set of models is a challenging problem due to the lack of supervision and computational burden. In this paper, we propose DeconAnomaly, a deep learning framework to estimate the optimal set of models using transfer learning for unsupervised anomaly detection under a multiple modeling strategy. It reduces computational burden with transfer learning and optimizes the number of models based on a surrogate metric of detection performance. The experimental results show clear advantages of DeconAnomaly.

Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models

Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers motivated people to extend their functionality largely beyond just a “chatbot”, and use it as an assistant or even replacement for domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill such a gap, explosively increase research, and practices have been conducted in very recent years on the domain specialization of LLMs, which, however, calls for a comprehensive and systematic review to better summarizes and guide this promising domain. In this survey paper, first, we propose a systematic taxonomy that categorizes the LLM domain specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.

Interpretable Skill Learning for Dynamic Treatment Regimes through Imitation

Interpretable Skill Learning for Dynamic Treatment Regimes through Imitation Imitation learning that mimics experts’ skills from their demonstrations has shown great success in discovering dynamic treatment regimes, i.e., the optimal decision rules to treat an individual patient based on related evolving treatment and covariate history. Existing imitation learning methods, however, still lack the capability to interpret the underlying rationales of the learned policy in a faithful way. Moreover, since dynamic treatment regimes for patients often exhibit varying patterns, i.e., symptoms that transit from one to another, the flat policy learned by a vanilla imitation learning method is typically undesired. To this end, we propose an Interpretable Skill Learning (ISL) framework to resolve the aforementioned challenges for dynamic treatment regimes through imitation. The key idea is to model each segment of experts’ demonstrations with a prototype layer and integrate it with the imitation learning layer to enhance the interpretation capability. On one hand, the ISL framework is able to provide interpretable explanations by matching the prototype to exemplar segments during the inference stage, which enables doctors to perform reasoning of the learned demonstrations based on human-understandable patient symptoms and lab results. On the other hand, the obtained skill embedding consisting of prototypes serves as conditional information to the imitation learning layer, which implicitly guides the policy network to provide a more accurate demonstration when the patients’ state switches from one stage to another. Thoroughly empirical studies demonstrate that our proposed ISL technique can achieve better performance than state-of-the-art methods. Moreover, the proposed ISL framework also exhibits good interpretability which cannot be observed in existing methods.

Dynamic Prompting: A Unified Framework for Prompt Tuning

Dynamic Prompting: A Unified Framework for Prompt Tuning It has been demonstrated that prompt tuning is highly effective in efficiently eliciting knowledge from language models (LMs). However, the prompt tuning still lags behind fine tuning, especially when the LMs are small. P tuning v2 (Liu et al., 2021b) makes it comparable with finetuning by adding continuous prompts for every layer of the pre trained model. However, prepending fixed soft prompts for all instances, regardless of their discrepancy, is doubtful. In particular, the inserted prompt position, length, and the representations ofprompts for diversified instances through different tasks could all affect the prompt tuning performance. To fill this gap, we propose dynamic prompting (DP): the position, length, and prompt representation can all be dynamically optimized with respect to different tasks and instances. We conduct comprehensive experiments on the SuperGlue benchmark tovalidate our hypothesis and demonstrate substantial improvements. We also derive a unified framework for supporting our dynamic prompting strategy. In particular, we use a simple learning network and Gumble Softmax for learning instance dependent guidance. Experimental results show that simple instance level position aware soft prompts can improve the classification accuracy of up to 6 points on average on five datasets, reducing its gap with fine tuning. Besides, we also prove its universal usefulness under full data, few shot, andmultitask regimes. Combining them together can even further unleash the power of DP, narrowing the distance between fine tuning.

Exploring the limits of ChatGPT for Query or Aspect based Text Summarization

Exploring the limits of ChatGPT for Query or Aspect based Text Summarization Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive summarization. The emergence of large language models (LLMs) like GPT3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies (Goyal et al., 2022, Zhang et al., 2023) have shown that LLMs generated news summaries are already on par with humans. However, the performance of LLMs for more practical applications like aspect or query based summaries is underexplored. To fill this gap, we conducted an evaluation of ChatGPT’s performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT’s performance is comparable to traditional fine tuning methods in terms of Rouge scores. Moreover, we highlight some unique differences between ChatGPT generated summaries and human references, providing valuable insights into the superpower of ChatGPT for diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research to systematically examine the characteristics of ChatGPT generated summaries through extensive human evaluation.

Time Series Contrastive Learning with Information-Aware Augmentations

Time Series Contrastive Learning with Information-Aware Augmentations Various contrastive learning approaches have been proposed in recent years and achieve significant empirical success. While effective and prevalent, contrastive learning has been less explored for time series data. A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples, such that an encoder can be trained to learn robust and discriminative representations. Unlike image and language domains where “desired” augmented samples can be generated with the rule of thumb guided by prefabricated human priors, the ad-hoc manual selection of time series augmentations is hindered by their diverse and human-unrecognizable temporal structures. How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question. In this work, we address the problem by encouraging both high fidelity and variety based on information theory. A theoretical analysis leads to the criteria for selecting feasible data augmentations. On top of that, we propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning. Experiments on various datasets show highly competitive performance with up to a 12.0% reduction in MSE on forecasting tasks and up to 3.7% relative improvement in accuracy on classification tasks over the leading baselines.