arXiv (pronounced “archive”) is an online platform and preprint repository that serves as a digital archive for scholarly research papers and scientific manuscripts. It allows researchers from various academic disciplines, including physics, mathematics, computer science, biology, and many others, to share their work with the global scientific community before formal peer review and publication in traditional academic journals.


Beyond One Model Fits All: A Survey of Domain Specialization for Large Language Models

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task agnostic foundation for a wide range of applications. The great promise of LLMs as general task solvers motivated people to extend their functionality largely beyond just a “chatbot”, and use it as an assistant or even replacement for domain experts and tools in specific domains such as healthcare, finance, and education. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). To fill such a gap, explosively increase research, and practices have been conducted in very recent years on the domain specialization of LLMs, which, however, calls for a comprehensive and systematic review to better summarizes and guide this promising domain. In this survey paper, first, we propose a systematic taxonomy that categorizes the LLM domain specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. We also present a comprehensive taxonomy of critical application domains that can benefit from specialized LLMs, discussing their practical significance and open challenges. Furthermore, we offer insights into the current research status and future trends in this area.

Dynamic Prompting: A Unified Framework for Prompt Tuning

It has been demonstrated that prompt tuning is highly effective in efficiently eliciting knowledge from language models (LMs). However, the prompt tuning still lags behind fine tuning, especially when the LMs are small. P tuning v2 (Liu et al., 2021b) makes it comparable with finetuning by adding continuous prompts for every layer of the pre trained model. However, prepending fixed soft prompts for all instances, regardless of their discrepancy, is doubtful. In particular, the inserted prompt position, length, and the representations ofprompts for diversified instances through different tasks could all affect the prompt tuning performance. To fill this gap, we propose dynamic prompting (DP): the position, length, and prompt representation can all be dynamically optimized with respect to different tasks and instances. We conduct comprehensive experiments on the SuperGlue benchmark tovalidate our hypothesis and demonstrate substantial improvements. We also derive a unified framework for supporting our dynamic prompting strategy. In particular, we use a simple learning network and Gumble Softmax for learning instance dependent guidance. Experimental results show that simple instance level position aware soft prompts can improve the classification accuracy of up to 6 points on average on five datasets, reducing its gap with fine tuning. Besides, we also prove its universal usefulness under full data, few shot, andmultitask regimes. Combining them together can even further unleash the power of DP, narrowing the distance between fine tuning.

Exploring the limits of ChatGPT for Query or Aspect based Text Summarization

Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive summarization. The emergence of large language models (LLMs) like GPT3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies (Goyal et al., 2022, Zhang et al., 2023) have shown that LLMs generated news summaries are already on par with humans. However, the performance of LLMs for more practical applications like aspect or query based summaries is underexplored. To fill this gap, we conducted an evaluation of ChatGPT’s performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT’s performance is comparable to traditional fine tuning methods in terms of Rouge scores. Moreover, we highlight some unique differences between ChatGPT generated summaries and human references, providing valuable insights into the superpower of ChatGPT for diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research to systematically examine the characteristics of ChatGPT generated summaries through extensive human evaluation.

Confidence and Dispersity Speak – Characterizing Prediction Matrix for Unsupervised Accuracy Estimation

This work aims to assess how well a model performs under distribution shifts without using labels. While recent methods study prediction confidence, this work reports prediction dispersity is another informative cue. Confidence reflects whether the individual prediction is certain, dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well performing model should give predictions with high confidence and high dispersity. That is, we need to consider both properties so as to make more accurate estimates. To this end, we use the nuclear norm that has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB 200), and diverse types of distribution shifts (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitation of the nuclear norm, study its improved variant under severe class imbalance, and discuss potential directions.

Ordinal Quadruplet: Retrieval of Missing Labels in Ordinal Time Series

In this paper, we propose an ordered time series classification framework that is robust against missing classes in the training data, i.e., during testing we can prescribe classes that are missing during training. This framework relies on two main components: (1) our newly proposed ordinal quadruplet loss, which forces the model to learn latent representation while preserving the ordinal relation among labels, (2) testing procedure, which utilizes the property of latent representation (order preservation). We conduct experiments based on real world multivariate time series data and show the significant improvement in the prediction of missing labels even with 40% of the classes are missing from training. Compared with the well known triplet loss optimization augmented with interpolation for missing information, in some cases, we nearly double the accuracy.

Codebook Design for Composite Beamforming in Next generation mmWave Systems

In pursuance of the unused spectrum in higher frequencies, millimeter wave (mmWave) bands have a pivotal role. However, the high path loss and poor scattering associated with mmWave communications highlight the necessity of employing effective beamforming techniques. In order to efficiently search for the beam to serve a user and to jointly serve multiple users it is often required to use a composite beam which consists of multiple disjoint lobes. A composite beam covers multiple desired angular coverage intervals (ACIs) and ideally has maximum and uniform gain (smoothness) within each desired ACI, negligible gain (leakage) outside the desired ACIs, and sharp edges. We propose an algorithm for designing such ideal composite codebook by providing an analytical closed form solution with low computational complexity. There is a fundamental trade off between the gain, leakage and smoothness of the beams. Our design allows to achieve different values in such trade off based on changing the design parameters. We highlight the shortcomings of the uniform linear arrays (ULAs) in building arbitrary composite beams. Consequently, we use a recently introduced twin ULA (TULA) antenna structure to effectively resolve these inefficiencies. Numerical results are used to validate the theoretical findings.

Multi user Beam Alignment in Presence of Multi path

To overcome the high path loss and the intense shadowing in millimeter wave (mmWave) communications, effective beamforming schemes are required which incorporate narrow beams with high beamforming gains. The mmWave channel consists of a few spatial clusters each associated with an angle of departure (AoD). The narrow beams must be aligned with the channel AoDs to increase the beamforming gain. This is achieved through a procedure called beam alignment (BA). Most of the BA schemes in the literature consider channels with a single dominant path while in practice the channel has a few resolvable paths with different AoDs, hence, such BA schemes may not work correctly in the presence of multi path or at the least do not exploit such multipath to achieve diversity or increase robustness. In this paper, we propose an efficient BA scheme in presence of multi path. The proposed BA scheme transmits probing packets using a set of scanning beams and receives feedback for all the scanning beams at the end of the probing phase from each user. We formulate the BA scheme as minimizing the expected value of the average transmission beamwidth under different policies. The policy is defined as a function from the set of received feedback to the set of transmission beams (TB). In order to maximize the number of possible feedback sequences, we prove that the set of scanning beams (SB) has a special form, namely, Tulip Design. Consequently, we rewrite the minimization problem with a set of linear constraints and a reduced number of variables which is solved by using an efficient greedy algorithm.

SplitBrain: Hybrid Data and Model Parallel Deep Learning

The recent success of deep learning applications has coincided with those widely available powerful computational resources for training sophisticated machine learning models with huge datasets. Nonetheless, training large models such as convolutional neural networks using model parallelism (as opposed to data parallelism) is challenging because the complex nature of communication between model shards makes it difficult to partition the computation efficiently across multiple machines with an acceptable trade off. This paper presents SplitBrain, a high performance distributed deep learning framework supporting hybrid data and model parallelism. Specifically, SplitBrain provides layer specific partitioning that co locates compute intensive convolutional layers while sharding memory demanding layers. A novel scalable group communication is proposed to further improve the training throughput with reduced communication overhead. The results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of memory consumption for data and model parallel VGG over CIFAR 10.

CamTuner: Reinforcement Learning based System for Camera Parameter Tuning to enhance Analytics

Video analytics systems critically rely on video cameras, which capture high quality video frames, to achieve high analytics accuracy. Although modern video cameras often expose tens of configurable parameter settings that can be set by end users, deployment of surveillance cameras today often uses a fixed set of parameter settings because the end users lack the skill or understanding to reconfigure these parameters. In this paper, we first show that in a typical surveillance camera deployment, environmental condition changes can significantly affect the accuracy of analytics units such as person detection, face detection and face recognition, and how such adverse impact can be mitigated by dynamically adjusting camera settings. We then propose CAMTUNER, a framework that can be easily applied to an existing video analytics pipeline (VAP) to enable automatic and dynamic adaptation of complex camera settings to changing environmental conditions, and autonomously optimize the accuracy of analytics units (AUs) in the VAP. CAMTUNER is based on SARSA reinforcement learning (RL) and it incorporates two novel components: a light weight analytics quality estimator and a virtual camera. CAMTUNER is implemented in a system with AXIS surveillance cameras and several VAPs (with various AUs) that processed day long customer videos captured at airport entrances. Our evaluations show that CAMTUNER can adapt quickly to changing environments. We compared CAMTUNER with two alternative approaches where either static camera settings were used, or a strawman approach where camera settings were manually changed every hour (based on human perception of quality). We observed that for the face detection and person detection AUs, CAMTUNER is able to achieve up to 13.8% and 9.2% higher accuracy, respectively, compared to the best of the two approaches (average improvement of 8% for both AUs).

Uncertainty Aware Physically Guided Proxy Tasks for Unseen Domain Face Anti-Spoofing

Face anti-spoofing (FAS) seeks to discriminate genuine faces from fake ones arising from any type of spoofing attack. Due to the wide variety of attacks, it is implausible to obtain training data that spans all attack types. We propose to leverage physical cues to attain better generalization on unseen domains. As a specific demonstration, we use physically guided proxy cues such as depth, reflection, and material to complement our main anti-spoofing (a.k.a liveness detection) task, with the intuition that genuine faces across domains have consistent face like geometry, minimal reflection, and skin material. We introduce a novel uncertainty-aware attention scheme that independently learns to weigh the relative contributions of the main and proxy tasks, preventing the over confident issue with traditional attention modules. Further, we propose attribute-assisted hard negative mining to disentangle liveness irrelevant features with liveness features during learning. We evaluate extensively on public benchmarks with intra-dataset and inter-dataset protocols. Our method achieves superior performance especially in unseen domain generalization for FAS.