Xujiang Zhao NEC Labs America

Xujiang Zhao

Researcher

Data Science and System Security

Posts

Pruning as a Domain-specific LLM Extractor

Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. This leads to suboptimal performance due to lacking specificity on the target domain or generality on different tasks when applied to domain-specific challenges. This work introduces an innovative unstructured dual-pruning methodology, D-PRUNER, for domain-specific compression on LLM. It extracts a compressed, domain-specific, and task agnostic LLM by identifying LLM weights that are pivotal for general capabilities, like linguistic capability and multi-task solving, and domain-specific knowledge. More specifically, we first assess general weight importance by quantifying the error incurred upon their removal with the help of an open-domain calibration dataset. Then, we utilize this general weight importance to refine the training loss, so that it preserves generality when fitting into a specific domain. Moreover, by efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model emphasizing generality and specificity. Our comprehensive experiments across various tasks in healthcare and legal domains show the effectiveness of D-PRUNER in domain-specific compression. Our code is available at https: //github.com/psunlpgroup/D-Pruner.

Uncertainty Quantification for In-Context Learning of Large Language Models

In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) and revolutionized various fields by providing a few task-relevant demonstrations in the prompt. However, trustworthy issues with LLM’s response, such as hallucination, have also been actively discussed. Existing works have been devoted to quantifying the uncertainty in LLM’s response, but they often overlook the complex nature of LLMs and the uniqueness of in-context learning. In this work, we delve into the predictive uncertainty of LLMs associated with in-context learning, highlighting that such uncertainties may stem from both the provided demonstrations (aleatoric uncertainty) and ambiguities tied to the model’s configurations (epistemic uncertainty). We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion. Extensive experiments are conducted to demonstrate the effectiveness of the decomposition. The code and data are available at: https://github.com/lingchen0331/UQ_ICL.

Advancing Sustainability in Global Supply Chains through Agent-based Simulation

In today’s world, with its complex global supply chains, the difficulties and uncertainties we face offer both challenges and opportunities for making things better, especially in terms of efficiency and sustainability. These challenges grow due to unpredictable events, such as natural disasters, unexpected incidents, and unusual business practices, pushing us towards more advanced modeling methods that focus on reducing risks and enhancing sustainability. In this paper, we present a new agent-based simulation approach that goes beyond the usual limits of supply chain simulations by incorporating sustainability directly into supply chain operations using reinforcement learning (RL) algorithms. We introduce MOGI, a sustainable supply chain simulation system that takes carbon emissions into account in its main operations. Additionally, we examine how effective a multi-agent RL strategy is in dealing with the complex and uncertain nature of supply chains that span multiple levels. By comparing this strategy with traditional heuristic methods, our study looks at how well single versus multiple RL agents can manage risks and improve sustainability in both the beginning and end parts of the supply chain. The results of our experiments show that strategies based on RL are much better than traditional methods at managing risks, making profits, and achieving sustainability goals.

Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty

Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text, typically in the form of (subject, relation, object) triples. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks due to two key issues. First, LLMs struggle to distinguish irrelevant context from relevant relations and generate structured output due to the restrictions on fine-tuning the model. Second, LLMs generate responses autoregressively based on probability, which makes the predicted relations lack confidence. In this paper, we assess the capabilities of LLMs in improving the OIE task. Particularly, we propose various in-context learning strategies to enhance LLM’s instruction-following ability and a demonstration uncertainty quantification module to enhance the confidence of the generated relations. Our experiments on three OIE benchmark datasets show that our approach holds its own against established supervised methods, both quantitatively and qualitatively.

Open-Ended Commonsense Reasoning with Unrestricted Answer Scope

Open-ended Commonsense Reasoning is defined as solving a commonsense question without providing 1) a short list of answer candidates and 2) a pre-defined answer scope. Conventional ways of formulating the commonsense question into a question-answering form or utilizing external knowledge to learn retrieval-based methods are less applicable in the open-ended setting due to an inherent challenge. Without pre-defining an answer scope or a few candidates, open-ended commonsense reasoning entails predicting answers by searching over an extremely large searching space. Moreover, most questions require implicit multi-hop reasoning, which presents even more challenges to our problem. In this work, we leverage pre-trained language models to iteratively retrieve reasoning paths on the external knowledge base, which does not require task-specific supervision. The reasoning paths can help to identify the most precise answer to the commonsense question. We conduct experiments on two commonsense benchmark datasets. Compared to other approaches, our proposed method achieves better performance both quantitatively and qualitatively.

Adaptation Speed Analysis for Fairness-Aware Causal Models

For example, in machine translation tasks, to achieve bidirectional translation between two languages, the source corpus is often used as the target corpus, which involves the training of two models with opposite directions. The question of which one can adapt most quickly to a domain shift is of significant importance in many fields. Specifically, consider an original distribution p that changes due to an unknown intervention, resulting in a modified distribution p*. In aligning p with p*, several factors can affect the adaptation rate, including the causal dependencies between variables in p. In real-life scenarios, however, we have to consider the fairness of the training process, and it is particularly crucial to involve a sensitive variable (bias) present between a cause and an effect variable. To explore this scenario, we examine a simple structural causal model (SCM) with a cause-bias-effect structure, where variable A acts as a sensitive variable between cause (X) and effect (Y). The two models respectively exhibit consistent and contrary cause-effect directions in the cause-bias-effect SCM. After conducting unknown interventions on variables within the SCM, we can simulate some kinds of domain shifts for analysis. We then compare the adaptation speeds of two models across four shift scenarios. Additionally, we prove the connection between the adaptation speeds of the two models across all interventions.

Calibrate Graph Neural Networks under Out-of-Distribution Nodes via Deep Q-learning

Graph neural networks (GNNs) have achieved great success in dealing with graph-structured data that are prevalent in the real world. The core of graph neural networks is the message passing mechanism that aims to generate the embeddings of nodes by aggregating the neighboring node information. However, recent work suggests that GNNs also suffer the trustworthiness issues. Our empirical study shows that the calibration error of the in-distribution (ID) nodes would be exacerbated if a graph is mixed with out-of-distribution (OOD) nodes, and we assume that the noisy information from OOD nodes is the root for the worsened calibration error. Both previous study and our empirical study suggest that adjusting the weights of edges could be a promising way to reduce the adverse impact from the OOD nodes. However, how to precisely select the desired edges and modify the corresponding weights is not trivial, since the distribution of OOD nodes is unknown to us. To tackle this problem, we propose a Graph Edge Re-weighting via Deep Q-learning (GERDQ) framework to calibrate the graph neural networks. Our framework aims to explore the potential influence of the change of the edge weights on target ID nodes by sampling and traversing the edges in the graph, and we formulate this process as a Markov Decision Process (MDP). Many existing GNNs could be seamlessly incorporated into our framework. Experimental results show that when wrapped with our method, the existing GNN models can yield lower calibration error under OOD nodes as well as comparable accuracy compared to the original ones and other strong baselines. The source code is available at:https://github.com/DamoSWL/Calibration-GNN-OOD.

Multi-Label Temporal Evidential Neural Networks for Early Event Detection

Early event detection aims to detect events even before the event is complete. However, most of the existing methods focus on an event with a single label but fail to be applied to cases with multiple labels. Another non-negligible issue for early event detection is a prediction with overconfidence due to the high vacuity uncertainty that exists in the early time series. It results in an over-confidence estimation and hence unreliable predictions. To this end, technically, we propose a novel framework, Multi-Label Temporal Evidential Neural Network (MTENN), for multi-label uncertainty estimation in temporal data. MTENN is able to quality predictive uncertainty due to the lack of evidence for multi-label classifications at each time stamp based on belief/evidence theory. In addition, we introduce a novel uncertainty estimation head (weighted binomial comultiplication (WBC)) to quantify the fused uncertainty of a sub-sequence for early event detection. We validate the performance of our approach with state-of-the-art techniques on real-world audio datasets.

Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers motivated people to extend their functionality largely beyond just a “chatbot”, and use it as an assistant or even replacement for domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill such a gap, explosively increase research, and practices have been conducted in very recent years on the domain specialization of LLMs, which, however, calls for a comprehensive and systematic review to better summarizes and guide this promising domain. In this survey paper, first, we propose a systematic taxonomy that categorizes the LLM domain specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.

Dynamic Prompting: A Unified Framework for Prompt Tuning

It has been demonstrated that prompt tuning is highly effective in efficiently eliciting knowledge from language models (LMs). However, the prompt tuning still lags behind fine tuning, especially when the LMs are small. P tuning v2 (Liu et al., 2021b) makes it comparable with finetuning by adding continuous prompts for every layer of the pre trained model. However, prepending fixed soft prompts for all instances, regardless of their discrepancy, is doubtful. In particular, the inserted prompt position, length, and the representations ofprompts for diversified instances through different tasks could all affect the prompt tuning performance. To fill this gap, we propose dynamic prompting (DP): the position, length, and prompt representation can all be dynamically optimized with respect to different tasks and instances. We conduct comprehensive experiments on the SuperGlue benchmark tovalidate our hypothesis and demonstrate substantial improvements. We also derive a unified framework for supporting our dynamic prompting strategy. In particular, we use a simple learning network and Gumble Softmax for learning instance dependent guidance. Experimental results show that simple instance level position aware soft prompts can improve the classification accuracy of up to 6 points on average on five datasets, reducing its gap with fine tuning. Besides, we also prove its universal usefulness under full data, few shot, andmultitask regimes. Combining them together can even further unleash the power of DP, narrowing the distance between fine tuning.