Posts

Position Really Matters: Towards a Holistic Approach for Prompt Tuning

Prompt tuning is highly effective at efficiently extracting knowledge from foundation models, spanning language, vision, and vision-language models. However, it remains unclear whether fixed soft prompts, concatenated with inputs at a predetermined position for all instances regardless of their inherent differences, are truly effective. Variables such as the position, length, and representations of prompts across diverse instances and tasks can substantially influence the performance of prompt tuning. We first provide a theoretical analysis revealing that optimizing the position of the prompt to encompass the input can capture additional semantic information that traditional prefix or postfix prompt tuning methods fail to capture. We then present a holistic parametric prompt tuning strategy that dynamically determines different factors of prompts based on specific tasks or instances. Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks, including NLP, vision recognition, and vision-language tasks. Furthermore, we establish the universal applicability of our approach under full-data, few-shot, and multitask settings.
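As a concrete illustration of position as a tunable factor, the minimal PyTorch sketch below splices a trainable soft prompt into the input embedding sequence at an arbitrary index rather than only at the start or end. The class name, shapes, and initialization are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PositionalSoftPrompt(nn.Module):
    """Illustrative sketch: a soft prompt that can be inserted at an arbitrary
    position inside the input embedding sequence, instead of always being
    prepended (prefix tuning) or appended (postfix tuning)."""

    def __init__(self, prompt_len: int, hidden_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor, position: int) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Split the input at the chosen position and splice the prompt in.
        left, right = input_embeds[:, :position], input_embeds[:, position:]
        return torch.cat([left, prompt, right], dim=1)

# position=0 recovers prefix tuning, position=seq_len recovers postfix tuning;
# intermediate values place the prompt inside the input sequence.
embeds = torch.randn(2, 16, 768)
module = PositionalSoftPrompt(prompt_len=4, hidden_dim=768)
out = module(embeds, position=8)   # shape: (2, 20, 768)
```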

MixLLM: Dynamic Routing in Mixed Large Language Models

Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, their usage is costly and incurs high response latency. Given mixed LLMs with their own strengths and weaknesses, LLM routing aims to identify the most suitable model for each query in the stream to maximize response quality and minimize cost and latency. However, the challenges involve: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a varying (e.g., new LLM addition or old LLM removal) set of LLM candidates over time. To bridge these gaps, we develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment. Specifically, we first leverage query tags to enhance query embeddings for the routing task. Next, we design lightweight prediction models to estimate the response qualities and costs of queries over LLMs. We then devise a meta-decision maker to choose the query-LLM assignments that best trade off response quality, cost, and latency. Finally, the system benefits from continual training, allowing it to adapt to evolving queries and user feedback over time. Our extensive experiments show that MixLLM achieves the best trade-offs in response quality, cost, and latency (97.25% of GPT-4's quality at 24.18% of the cost under the time constraint).
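To make the routing decision concrete, here is a minimal sketch of a meta-decision rule that scores candidate LLMs by predicted quality minus a cost penalty under a latency budget. The dataclass fields, weights, and fallback rule are illustrative assumptions; MixLLM's actual prediction models and meta-decision maker are learned components.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_quality: float  # e.g., output of a per-LLM quality prediction model
    predicted_cost: float     # e.g., estimated cost of answering this query
    latency: float            # e.g., expected response latency in seconds

def route(candidates, quality_weight=1.0, cost_weight=0.5, latency_budget=5.0):
    """Illustrative meta-decision rule (not the paper's exact formulation):
    among candidates that satisfy the latency constraint, pick the LLM that
    maximizes a weighted quality/cost trade-off."""
    feasible = [c for c in candidates if c.latency <= latency_budget]
    if not feasible:  # fall back to the fastest model if none fits the budget
        return min(candidates, key=lambda c: c.latency)
    return max(feasible,
               key=lambda c: quality_weight * c.predicted_quality
                             - cost_weight * c.predicted_cost)

# Hypothetical candidate pool; the numbers are placeholders.
pool = [Candidate("gpt-4", 0.95, 1.00, 4.0),
        Candidate("small-llm", 0.80, 0.05, 0.8)]
print(route(pool).name)
```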

DISC: Dynamic Decomposition Improves LLM Inference Scaling (SSI-FM)

Inference scaling methods often rely on decomposing problems into steps, followed by sampling and selecting the best next steps. However, these steps and their sizes are typically fixed or depend on domain knowledge. We propose dynamic decomposition, a method that adaptively and automatically breaks down solution and reasoning traces into manageable steps during inference. By allocating compute more effectively, particularly by subdividing challenging steps and sampling them more frequently, dynamic decomposition significantly enhances inference efficiency. Experiments on benchmarks such as APPS, MATH, and LiveCodeBench demonstrate that dynamic decomposition outperforms static approaches, including token-level, sentence-level, and single-step decompositions. These findings highlight the potential of dynamic decomposition to improve a wide range of inference scaling techniques.
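The sketch below illustrates the core recursion in plain Python under simplifying assumptions: a step is sampled, its difficulty is estimated from the samples, and hard steps are split into smaller sub-steps that receive more of the remaining sampling budget. The helpers sample_step and estimate_difficulty are placeholders, not the paper's components.

```python
import random

def sample_step(prefix: str, step_size: int, n: int) -> list[str]:
    """Placeholder: sample n candidate continuations of roughly step_size tokens."""
    return [prefix + f"<step{random.randint(0, 999)}>" for _ in range(n)]

def estimate_difficulty(candidates: list[str]) -> float:
    """Placeholder difficulty signal, e.g. one minus agreement or pass rate."""
    return random.random()

def dynamic_decompose(prefix: str, step_size: int, budget: int,
                      threshold: float = 0.6) -> str:
    """Illustrative recursion: sample a step; if it looks hard, split it into
    two smaller sub-steps and spend more of the budget on them."""
    candidates = sample_step(prefix, step_size, n=min(budget, 4))
    if budget <= 1 or step_size <= 1 or estimate_difficulty(candidates) < threshold:
        return candidates[0]                      # easy step: accept a sample
    half = step_size // 2
    left = dynamic_decompose(prefix, half, budget // 2, threshold)
    return dynamic_decompose(left, step_size - half, budget - budget // 2, threshold)

print(dynamic_decompose(prefix="", step_size=8, budget=16))
```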

DISC: Dynamic Decomposition Improves LLM Inference Scaling (DL4C)

Inference scaling methods often rely on decomposing problems into steps, followed by sampling and selecting the best next steps. However, these steps and their sizes are typically fixed or depend on domain knowledge. We propose dynamic decomposition, a method that adaptively and automatically breaks down solution and reasoning traces into manageable steps during inference. By allocating compute more effectively—particularly by subdividing challenging steps and sampling them more frequently—dynamic decomposition significantly enhances inference efficiency. Experiments on benchmarks such as APPS, MATH, and LiveCodeBench demonstrate that dynamic decomposition outperforms static approaches, including token-level, sentence-level, and single-step decompositions. These findings highlight the potential of dynamic decomposition to improve a wide range of inference scaling techniques.

Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

The advent of large language models (LLMs) has revolutionized the field of text generation, producing outputs that closely mimic human-like writing. Although academic and industrial institutions have developed detectors to prevent the malicious usage of LLM-generated texts, other research has cast doubt on the robustness of these systems. To stress test these detectors, we introduce a humanized proxy-attack (HUMPA) strategy that effortlessly compromises LLMs, causing them to produce outputs that align with human-written text and mislead detection systems. Our method attacks the source model by leveraging a reinforcement learning (RL) fine-tuned humanized small language model (SLM) in the decoding phase. Through an in-depth analysis, we demonstrate that our attack strategy is capable of generating responses that are indistinguishable to detectors, preventing them from differentiating between machine-generated and human-written text. We conduct systematic evaluations on extensive datasets using proxy-attacked open-source models, including Llama2-13B, Llama3-70B, and Mixtral-8×7B, in both white- and black-box settings. Our findings show that the proxy-attack strategy effectively deceives the leading detectors, resulting in an average AUROC drop of 70.4% across multiple datasets, with a maximum drop of 95.0% on a single dataset. Furthermore, in cross-discipline scenarios, our strategy also bypasses these detectors, leading to a significant relative decrease of up to 90.9%, while in the cross-language scenario, the drop reaches 91.3%. Despite our proxy-attack strategy successfully bypassing the detectors with such significant relative drops, we find that the generation quality of the attacked models remains preserved, even within a modest utility budget, when compared to the text produced by the original, unattacked source model.
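For intuition, the snippet below shows one common way a small proxy model can steer a large model's decoding: shifting the source model's next-token logits by the difference between a fine-tuned small model and its base version. This is a generic proxy-guided decoding sketch, not necessarily HUMPA's exact update rule; the function name and the alpha parameter are illustrative assumptions.

```python
import torch

def proxy_guided_logits(source_logits: torch.Tensor,
                        proxy_logits: torch.Tensor,
                        base_logits: torch.Tensor,
                        alpha: float = 1.0) -> torch.Tensor:
    """Illustrative proxy-guided decoding step: nudge the source model's
    next-token distribution toward the behavior learned by a small tuned
    proxy, measured against the proxy's untuned base model."""
    return source_logits + alpha * (proxy_logits - base_logits)

# Hypothetical shapes: (batch, vocab_size); values are placeholders.
src, tuned, base = (torch.randn(1, 32000) for _ in range(3))
next_token = torch.argmax(proxy_guided_logits(src, tuned, base), dim=-1)
```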

SFS: Smarter Code Space Search improves LLM Inference Scaling

We frame code generation as a black-box optimization problem within the code space and demonstrate how optimization-inspired techniques can enhance inference scaling. Based on this perspective, we propose Scattered Forest Search (SFS), a novel approach that improves solution diversity and better exploits feedback during evolutionary search. Our theoretical analysis illustrates how these methods help avoid local optima during optimization, leading to more efficient exploration. Extensive experiments on HumanEval, MBPP, APPS, CodeContests, and LeetCode reveal significant performance gains. For instance, our method achieves a pass@1 rate of 67.1% on HumanEval+ and 87.2% on HumanEval with GPT-3.5, marking improvements of 8.6% and 4.3% over the state-of-the-art, while also halving the iterations needed to find the correct solution. Furthermore, our approach scales more efficiently than existing search techniques, including tree search, line search, and repeated sampling.
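A rough sketch of such an evolutionary code search loop is shown below: candidates are seeded from scattered hint "directions" to encourage diversity, scored with execution feedback, and the best ones are refined in later rounds. The llm_generate and run_tests helpers are placeholders, and the loop is a simplified stand-in for the SFS procedure rather than its actual implementation.

```python
import random

def llm_generate(prompt: str) -> str:
    """Placeholder for an LLM call that returns a candidate program."""
    return f"solution<{hash(prompt) % 1000}>"

def run_tests(program: str) -> float:
    """Placeholder feedback signal, e.g. fraction of unit tests passed."""
    return random.random()

def scattered_forest_search(task: str, directions: list[str],
                            iterations: int = 3) -> str:
    """Illustrative loop: seed the population from scattered directions for
    diversity, then refine the top candidates using execution feedback."""
    population = [(llm_generate(f"{task}\nHint: {d}"), d) for d in directions]
    best, best_score = population[0][0], -1.0
    for _ in range(iterations):
        scored = sorted(((run_tests(p), p, d) for p, d in population), reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0][0], scored[0][1]
        # Keep the top half and feed the feedback back into the next prompt.
        population = [(llm_generate(f"{task}\nImprove {p} (score {s:.2f}); hint: {d}"), d)
                      for s, p, d in scored[: max(1, len(scored) // 2)]]
    return best

print(scattered_forest_search("two-sum", ["use a hash map", "sort first", "brute force"]))
```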

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

Visual Language Models (VLMs) like GPT-4V have broadened the scope of LLM applications, yet they face significant challenges in accurately processing visual details, particularly in scientific diagrams. This paper explores the necessity of meticulous visual detail collection and region decomposition for enhancing the performance of VLMs in scientific diagram analysis. We propose a novel approach that combines traditional computer vision techniques with VLMs to systematically decompose diagrams into discernible visual elements and aggregate essential metadata. Our method employs techniques from the OpenCV library to identify and label regions, followed by a refinement process using shape detection and region merging algorithms, which are particularly suited to the structured nature of scientific diagrams. This strategy not only improves the granularity and accuracy of visual information processing but also extends the capabilities of VLMs beyond their current limitations. We validate our approach through a series of experiments that demonstrate enhanced performance in diagram analysis tasks, setting a new standard for integrating visual and language processing in a multimodal context.
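As an illustration of the region-decomposition step, the sketch below uses standard OpenCV calls to binarize a diagram, extract contours, and return bounding boxes with a coarse shape label. The parameter values and shape heuristic are assumptions; the full pipeline described above additionally performs shape-detection refinement and region merging.

```python
import cv2

def extract_regions(image_path: str, min_area: int = 100):
    """Illustrative region decomposition with standard OpenCV primitives:
    binarize the diagram, find external contours, and report a bounding box
    plus a rough shape label for each sufficiently large region."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    regions = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # skip noise and tiny marks
        x, y, w, h = cv2.boundingRect(contour)
        # Approximate the contour polygon to guess a coarse shape label.
        approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        shape = {3: "triangle", 4: "rectangle"}.get(len(approx), "polygon/ellipse")
        regions.append({"bbox": (x, y, w, h), "shape": shape})
    return regions
```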

TSLA: Unified Time Series and Language Model

Real-world time series data often require analysis or interpretation from domain experts. Some tasks, like time series question answering, involve both time series and natural language questions, posing challenges for single-modality language models to understand their interaction. To this end, we present TSLA (Time Series Language Model), a framework designed to enhance the language model with an understanding of time series data for multi-modality tasks. TSLA comprises three key components. (1) A Time Series Tokenizer learns to represent time series data as discrete tokens, making the data more manageable for language models. (2) Joint (Pre-)Training on task-agnostic time series and text data integrates time series tokens and text tokens to model the interplay between time series and language concepts. (3) Multi-task Instruction Tuning fine-tunes the pretrained TSLA for various downstream tasks relevant to user interests. For evaluation, we applied TSLA to time series data from human motions on four tasks: time series captioning, time series question answering, text-based time series synthesis, and text-based time series continuation. The results demonstrate TSLA's effectiveness in handling multiple time series analysis tasks, pointing the way for future research.
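To show the kind of interface a time series tokenizer exposes, here is a deliberately simple stand-in that discretizes values by uniform binning. TSLA's tokenizer is learned, so the binning scheme, bin count, and value range below are purely illustrative.

```python
import numpy as np

class SimpleTimeSeriesTokenizer:
    """Illustrative stand-in for a learned time series tokenizer: map a
    continuous series to discrete token ids by uniform value binning."""

    def __init__(self, num_bins: int = 256, low: float = -1.0, high: float = 1.0):
        self.edges = np.linspace(low, high, num_bins - 1)

    def encode(self, series: np.ndarray) -> np.ndarray:
        return np.digitize(series, self.edges)           # token ids in [0, num_bins)

    def decode(self, tokens: np.ndarray) -> np.ndarray:
        centers = np.concatenate([[self.edges[0]],
                                  (self.edges[:-1] + self.edges[1:]) / 2,
                                  [self.edges[-1]]])
        return centers[tokens]

tokenizer = SimpleTimeSeriesTokenizer()
ts = np.sin(np.linspace(0, 6.28, 50))
ids = tokenizer.encode(ts)      # discrete tokens a language model can consume
recon = tokenizer.decode(ids)   # approximate reconstruction of the series
```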

Graph Neural Networks, Explained: Our Role in the Future of AI

NEC Laboratories America (NECLA) is advancing the frontier of Graph Neural Networks (GNNs), a transformative AI technology that processes complex, interconnected data. Through innovations like PTDNet for robust learning, novel frameworks for explainability, StrGNN for anomaly detection in dynamic graphs, and GERDQ for calibration with out-of-distribution nodes, NECLA is addressing critical challenges in GNN development. These breakthroughs have real-world implications in fields such as cybersecurity, bioinformatics, and recommendation systems, positioning NECLA as a leader in the evolution of graph-based AI.

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Time series data is essential in various applications, including climate modeling, healthcare monitoring, and financial analytics. Understanding the contextual information associated with real-world time series data is often crucial for accurate and reliable event predictions. In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. In addition, TimeCAP employs a multi-modal encoder that synergizes with the LLM agents, enhancing predictive performance through mutual augmentation of inputs with in-context examples. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction, including those utilizing LLMs as predictors, achieving an average improvement of 28.75% in F1 score.
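A minimal sketch of the two-agent flow is shown below: one call produces a textual context summary, and a second call uses that summary together with in-context examples to predict the event. The llm placeholder, prompts, and example data are assumptions, and the multi-modal encoder is omitted.

```python
def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to an LLM."""
    return "..."

def contextualize(series: list[float], metadata: str) -> str:
    """Agent 1: turn raw values plus metadata into a textual context summary."""
    return llm(f"Summarize the context of this time series ({metadata}): {series}")

def predict_event(series: list[float], context: str, examples: list[str]) -> str:
    """Agent 2: predict the event using the series, the generated context,
    and in-context examples."""
    shots = "\n".join(examples)
    return llm(f"{shots}\nContext: {context}\nSeries: {series}\nWill the event occur?")

# Illustrative two-stage pipeline with made-up data.
series = [0.1, 0.3, 0.2, 0.5]
context = contextualize(series, metadata="hourly temperature, New York")
answer = predict_event(series, context, examples=["Context: ...\nAnswer: yes"])
```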