arXiv (pronounced “archive”) is an online platform and preprint repository that serves as a digital archive for scholarly research papers and scientific manuscripts. It allows researchers from various academic disciplines, including physics, mathematics, computer science, biology, and many others, to share their work with the global scientific community before formal peer review and publication in traditional academic journals.

Posts

Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models

Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers motivated people to extend their functionality largely beyond just a “chatbot”, and use it as an assistant or even replacement for domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill such a gap, explosively increase research, and practices have been conducted in very recent years on the domain specialization of LLMs, which, however, calls for a comprehensive and systematic review to better summarizes and guide this promising domain. In this survey paper, first, we propose a systematic taxonomy that categorizes the LLM domain specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.

Dynamic Prompting: A Unified Framework for Prompt Tuning

Dynamic Prompting: A Unified Framework for Prompt Tuning It has been demonstrated that prompt tuning is highly effective in efficiently eliciting knowledge from language models (LMs). However, the prompt tuning still lags behind fine tuning, especially when the LMs are small. P tuning v2 (Liu et al., 2021b) makes it comparable with finetuning by adding continuous prompts for every layer of the pre trained model. However, prepending fixed soft prompts for all instances, regardless of their discrepancy, is doubtful. In particular, the inserted prompt position, length, and the representations ofprompts for diversified instances through different tasks could all affect the prompt tuning performance. To fill this gap, we propose dynamic prompting (DP): the position, length, and prompt representation can all be dynamically optimized with respect to different tasks and instances. We conduct comprehensive experiments on the SuperGlue benchmark tovalidate our hypothesis and demonstrate substantial improvements. We also derive a unified framework for supporting our dynamic prompting strategy. In particular, we use a simple learning network and Gumble Softmax for learning instance dependent guidance. Experimental results show that simple instance level position aware soft prompts can improve the classification accuracy of up to 6 points on average on five datasets, reducing its gap with fine tuning. Besides, we also prove its universal usefulness under full data, few shot, andmultitask regimes. Combining them together can even further unleash the power of DP, narrowing the distance between fine tuning.

Exploring the limits of ChatGPT for Query or Aspect based Text Summarization

Exploring the limits of ChatGPT for Query or Aspect based Text Summarization Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive summarization. The emergence of large language models (LLMs) like GPT3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies (Goyal et al., 2022, Zhang et al., 2023) have shown that LLMs generated news summaries are already on par with humans. However, the performance of LLMs for more practical applications like aspect or query based summaries is underexplored. To fill this gap, we conducted an evaluation of ChatGPT’s performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT’s performance is comparable to traditional fine tuning methods in terms of Rouge scores. Moreover, we highlight some unique differences between ChatGPT generated summaries and human references, providing valuable insights into the superpower of ChatGPT for diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research to systematically examine the characteristics of ChatGPT generated summaries through extensive human evaluation.

RoVaR: Robust Multi agent Tracking through Dual layer Diversity in Visual and RF Sensor Fusion

RoVaR: Robust Multi agent Tracking through Dual layer Diversity in Visual and RF Sensor Fusion The plethora of sensors in our commodity devices provides a rich substrate for sensor fused tracking. Yet, today’s solutions are unable to deliver robust and high tracking accuracies across multiple agents in practical, everyday environments a feature central to the future of immersive and collaborative applications. This can be attributed to the limited scope of diversity leveraged by these fusion solutions, preventing them from catering to the multiple dimensions of accuracy, robustness (diverse environmental conditions) and scalability (multiple agents) simultaneously. In this work, we take an important step towards this goal by introducing the notion of dual layer diversity to the problem of sensor fusion in multi agent tracking. We demonstrate that the fusion of complementary tracking modalities, passive/relative (e.g., visual odometry) and active/absolute tracking (e.g., infrastructure assisted RF localization) offer a key first layer of diversity that brings scalability while the second layer of diversity lies in the methodology of fusion, where we bring together the complementary strengths of algorithmic (for robustness) and data driven (for accuracy) approaches. RoVaR is an embodiment of such a dual layer diversity approach that intelligently attends to cross modal information using algorithmic and data driven techniques that jointly share the burden of accurately tracking multiple agents in the wild. Extensive evaluations reveal RoVaR’s multi dimensional benefits in terms of tracking accuracy (median of 15cm), robustness (in unseen environments), light weight (runs in real time on mobile platforms such as Jetson Nano/TX2), to enable practical multi agent immersive applications in everyday environments.

Fast Few Shot Debugging for NLU Test Suites (arXiv)

Read Fast Few shot Debugging for NLU Test Suites (arXiv) from our Machine Learning Department. We study few shot debugging of transformer based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem. Given a few debugging examples of a certain phenomenon, and a held out test set of the same phenomenon, we aim to maximize accuracy on the phenomenon at a minimal cost of accuracy on the original test set. We examine several methods that are faster than full epoch retraining. We introduce a new fast method, which samples a few in danger examples from the original training set. Compared to fast methods using parameter distance constraints or Kullback Leibler divergence, we achieve superior original accuracy for comparable debugging accuracy.

Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation

Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation This work aims to assess how well a model performs under distribution shifts without using labels. While recent methods study prediction confidence, this work reports prediction dispersity is another informative cue. Confidence reflects whether the individual prediction is certain, dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well performing model should give predictions with high confidence and high dispersity. That is, we need to consider both properties so as to make more accurate estimates. To this end, we use the nuclear norm that has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB 200), and diverse types of distribution shifts (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitation of the nuclear norm, study its improved variant under severe class imbalance, and discuss potential directions.

Ordinal Quadruplet: Retrieval of Missing Labels in Ordinal Time Series

Ordinal Quadruplet: Retrieval of Missing Labels in Ordinal Time Series In this paper, we propose an ordered time series classification framework that is robust against missing classes in the training data, i.e., during testing we can prescribe classes that are missing during training. This framework relies on two main components: (1) our newly proposed ordinal quadruplet loss, which forces the model to learn latent representation while preserving the ordinal relation among labels, (2) testing procedure, which utilizes the property of latent representation (order preservation). We conduct experiments based on real world multivariate time series data and show the significant improvement in the prediction of missing labels even with 40% of the classes are missing from training. Compared with the well known triplet loss optimization augmented with interpolation for missing information, in some cases, we nearly double the accuracy.

Multi user Beam Alignment in Presence of Multi path

Multi user Beam Alignment in Presence of Multi path To overcome the high path loss and the intense shadowing in millimeter wave (mmWave) communications, effective beamforming schemes are required which incorporate narrow beams with high beamforming gains. The mmWave channel consists of a few spatial clusters each associated with an angle of departure (AoD). The narrow beams must be aligned with the channel AoDs to increase the beamforming gain. This is achieved through a procedure called beam alignment (BA). Most of the BA schemes in the literature consider channels with a single dominant path while in practice the channel has a few resolvable paths with different AoDs, hence, such BA schemes may not work correctly in the presence of multi path or at the least do not exploit such multipath to achieve diversity or increase robustness. In this paper, we propose an efficient BA scheme in presence of multi path. The proposed BA scheme transmits probing packets using a set of scanning beams and receives feedback for all the scanning beams at the end of the probing phase from each user. We formulate the BA scheme as minimizing the expected value of the average transmission beamwidth under different policies. The policy is defined as a function from the set of received feedback to the set of transmission beams (TB). In order to maximize the number of possible feedback sequences, we prove that the set of scanning beams (SB) has a special form, namely, Tulip Design. Consequently, we rewrite the minimization problem with a set of linear constraints and a reduced number of variables which is solved by using an efficient greedy algorithm.

Codebook Design for Composite Beamforming in Next generation mmWave Systems

In pursuance of the unused spectrum in higher frequencies, millimeter wave (mmWave) bands have a pivotal role. However, the high path loss and poor scattering associated with mmWave communications highlight the necessity of employing effective beamforming techniques. In order to efficiently search for the beam to serve a user and to jointly serve multiple users it is often required to use a composite beam which consists of multiple disjoint lobes. A composite beam covers multiple desired angular coverage intervals (ACIs) and ideally has maximum and uniform gain (smoothness) within each desired ACI, negligible gain (leakage) outside the desired ACIs, and sharp edges. We propose an algorithm for designing such ideal composite codebook by providing an analytical closed form solution with low computational complexity. There is a fundamental trade off between the gain, leakage and smoothness of the beams. Our design allows to achieve different values in such trade off based on changing the design parameters. We highlight the shortcomings of the uniform linear arrays (ULAs) in building arbitrary composite beams. Consequently, we use a recently introduced twin ULA (TULA) antenna structure to effectively resolve these inefficiencies. Numerical results are used to validate the theoretical findings.

SplitBrain: Hybrid Data and Model Parallel Deep Learning

SplitBrain: Hybrid Data and Model Parallel Deep Learning The recent success of deep learning applications has coincided with those widely available powerful computational resources for training sophisticated machine learning models with huge datasets. Nonetheless, training large models such as convolutional neural networks using model parallelism (as opposed to data parallelism) is challenging because the complex nature of communication between model shards makes it difficult to partition the computation efficiently across multiple machines with an acceptable trade off. This paper presents SplitBrain, a high performance distributed deep learning framework supporting hybrid data and model parallelism. Specifically, SplitBrain provides layer specific partitioning that co locates compute intensive convolutional layers while sharding memory demanding layers. A novel scalable group communication is proposed to further improve the training throughput with reduced communication overhead. The results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of memory consumption for data and model parallel VGG over CIFAR 10.