Xujiang Zhao NEC Labs America

Xujiang Zhao

Researcher

Data Science and System Security

Posts

Position Really Matters: Towards a Holistic Approach for Prompt Tuning

Prompt tuning is highly effective in efficiently extracting knowledge from foundation models, encompassing both language, vision, and vision-language models. However, the efficacy of employing fixed soft prompts with a predetermined position for concatenation with inputs for all instances, irrespective of their inherent disparities, remains uncertain. Variables such as the position, length, and representations of prompts across diverse instances and tasks can substantially influence the performance of prompt tuning. We first provide a theoretical analysis, revealing that optimizing the position of the prompt to encompass the input can capture additional semantic information that traditional prefix or postfix prompt tuning methods fail to capture. Then, we present a holistic parametric prompt tuning strategy that dynamically determines different factors of prompts based on specific tasks or instances. Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks, including NLP, vision recognition, and vision-language tasks. Furthermore, we establish the universal applicability of our approach under full-data, few-shot, and multitask settings.

MixLLM: Dynamic Routing in Mixed Large Language Models

Large Language Models (LLMs) exhibit potential artificial generic intelligence recently, however, their usage is costly with high response latency. Given mixed LLMs with their own strengths and weaknesses, LLM routing aims to identify the most suitable model for each query in the stream to maximize response quality and minimize cost and latency. However, the challenges involve: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a varying (e.g., new LLM addition or old LLM removal) set of LLM candidates over time. To bridge these gaps, we develop MixLLM, a dynamic contextual-banditbased routing system for query-LLM assignment. Specifically, we first leverage query tags to enhance query embeddings for the routing task. Next, we design lightweight prediction models to estimate the response qualities and costs of queries over LLMs. We then devise a meta-decision maker to choose the query-LLM assignments to best tradeoff response quality, cost, and latency. Finally, the system benefits from continual training, allowing it to adapt to evolving queries and user feedback over time. Our extensive experiments show that MixLLM achieves the best trade-offs in response quality, cost, and latency (97.25% of GPT-4’s quality at 24.18% of the cost under the time constraint). 

SFS: Smarter Code Space Search improves LLM Inference Scaling

We frame code generation as a black-box optimization problem within the code space and demonstrate how optimization-inspired techniques can enhance inference scaling. Based on this perspective, we propose SCATTERED FOREST SEARCH (SFS), a novel approach that improves solution diversity and better exploits feedback during evolutionary search. Our theoretical analysis illustrates how these methods help avoid local optima during optimization, leading to more efficient exploration. Extensive experiments on HumanEval, MBPP, APPS, CodeContests, and Leetcode reveal significant performance gains. For instance, our method achieves a pass@1 rate of 67.1% on HumanEval+ and 87.2% on HumanEval with GPT-3.5, marking improvements of 8.6% and 4.3% over the state-of-the-art, while also halving the iterations needed to find the correct solution. Furthermore, our approach scales more efficiently than existing search techniques, including tree search, line search, and repeated sampling.

Large Language Models Can Be Contextual Privacy Protection Learners

The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless, such domain-specific fine-tuning data often contains contextually sensitive personally identifiable information (PII). Direct fine-tuning LLMs on this data without privacy protection poses a risk of data leakage of sensitive PII during inference time. To address this challenge, we introduce Contextual Privacy Protection Language Models (CPPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding inference-time data privacy. Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning, etc. Extensive experiments across diverse datasets and scenarios demonstrate the effectiveness of our approaches. In particular, instruction tuning with both positive and negative examples, stands out as a promising method, effectively protecting private data while enhancing the model s knowledge. Our work underscores the potential for Large Language Models as robust contextual privacy protection learners.

Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments

Recognizing domain generalization as a commonplace challenge in machine learning, data distribution might progressively evolve across a continuum of sequential domains in practical scenarios. While current methodologies primarily concentrate on bolstering model effectiveness within these new domains, they tend to neglect issues of fairness throughout the learning process. In response, we propose an innovative framework known as Disentanglement for Counterfactual Fairness-aware Domain Generalization (DCFDG). This approach adeptly removes domain-specific information and sensitive information from the embedded representation of classification features. To scrutinize the intricate interplay between semantic information, domain-specific information, and sensitive attributes, we systematically partition the exogenous factors into four latent variables. By incorporating fairness regularization, we utilize semantic information exclusively for classification purposes. Empirical validation on synthetic and authentic datasets substantiates the efficacy of our approach, demonstrating elevated accuracy levels while ensuring the preservation of fairness amidst the evolving landscape of continuous domains.

RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on their Riemannian geometry, where the geodesic distances accurately capture the development of correlations. We propose Rio-CPD, a non-parametric correlation-aware online change point detection framework that combines the Riemannian geometry of the manifold of symmetric positive definite matrices and the cumulative sum statistic (CUSUM) for detecting change points. Rio-CPD enhances CUSUM by computing the geodesic distance from present observations to the Frechet mean of previous observations. With careful choice of metrics equipped to the Riemannian geometry, Rio-CPD is simple and computationally efficient. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy and efficiency.

Knowledge-enhanced Prompt Learning for Open-domain Commonsense Reasoning

Neural language models for commonsense reasoning often formulate the problem as a QA task and make predictions based on learned representations of language after fine-tuning. However, without providing any fine-tuning data and pre-defined answer candidates, can neural language models still answer commonsense reasoning questions only relying on external knowledge? In this work, we investigate a unique yet challenging problem-open-domain commonsense reasoning that aims to answer questions without providing any answer candidates and fine-tuning examples. A team comprising NECLA (NEC Laboratories America) and NEC Digital Business Platform Unit proposed method leverages neural language models to iteratively retrieve reasoning chains on the external knowledge base, which does not require task-specific supervision. The reasoning chains can help to identify the most precise answer to the commonsense question and its corresponding knowledge statements to justify the answer choice. This technology has proven its effectiveness in a diverse array of business domains.

Uncertainty Quantification for In-Context Learning of Large Language Models

In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) and revolutionized various fields by providing a few task-relevant demonstrations in the prompt. However, trustworthy issues with LLM’s response, such as hallucination, have also been actively discussed. Existing works have been devoted to quantifying the uncertainty in LLM’s response, but they often overlook the complex nature of LLMs and the uniqueness of in-context learning. In this work, we delve into the predictive uncertainty of LLMs associated with in-context learning, highlighting that such uncertainties may stem from both the provided demonstrations (aleatoric uncertainty) and ambiguities tied to the model’s configurations (epistemic uncertainty). We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion. Extensive experiments are conducted to demonstrate the effectiveness of the decomposition. The code and data are available at: https://github.com/lingchen0331/UQ_ICL.

Pruning as a Domain-specific LLM Extractor

Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. This leads to suboptimal performance due to lacking specificity on the target domain or generality on different tasks when applied to domain-specific challenges. This work introduces an innovative unstructured dual-pruning methodology, D-PRUNER, for domain-specific compression on LLM. It extracts a compressed, domain-specific, and task agnostic LLM by identifying LLM weights that are pivotal for general capabilities, like linguistic capability and multi-task solving, and domain-specific knowledge. More specifically, we first assess general weight importance by quantifying the error incurred upon their removal with the help of an open-domain calibration dataset. Then, we utilize this general weight importance to refine the training loss, so that it preserves generality when fitting into a specific domain. Moreover, by efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model emphasizing generality and specificity. Our comprehensive experiments across various tasks in healthcare and legal domains show the effectiveness of D-PRUNER in domain-specific compression. Our code is available at https: //github.com/psunlpgroup/D-Pruner.

Advancing Sustainability in Global Supply Chains through Agent-based Simulation

In today’s world, with its complex global supply chains, the difficulties and uncertainties we face offer both challenges and opportunities for making things better, especially in terms of efficiency and sustainability. These challenges grow due to unpredictable events, such as natural disasters, unexpected incidents, and unusual business practices, pushing us towards more advanced modeling methods that focus on reducing risks and enhancing sustainability. In this paper, we present a new agent-based simulation approach that goes beyond the usual limits of supply chain simulations by incorporating sustainability directly into supply chain operations using reinforcement learning (RL) algorithms. We introduce MOGI, a sustainable supply chain simulation system that takes carbon emissions into account in its main operations. Additionally, we examine how effective a multi-agent RL strategy is in dealing with the complex and uncertain nature of supply chains that span multiple levels. By comparing this strategy with traditional heuristic methods, our study looks at how well single versus multiple RL agents can manage risks and improve sustainability in both the beginning and end parts of the supply chain. The results of our experiments show that strategies based on RL are much better than traditional methods at managing risks, making profits, and achieving sustainability goals.