The University of California, Riverside (UCR), founded in 1954, is a public institution and part of the UC system, located in Southern California’s Inland Empire. UCR is recognized for its diverse student body, strong research centers, and colleges spanning the Humanities, Engineering, and Natural Sciences, offering a wide range of undergraduate programs. We have collaborated with UC Riverside on AI system optimization, including deep learning architecture refinement and robust classification techniques. Our joint efforts contribute to the development of scalable models suitable for deployment in diverse applications. Please read about our latest news and collaborative publications with the University of California, Riverside.

Posts

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

Visual Language Models (VLMs) like GPT-4V have broadened the scope of LLM applications, yet they face significant challenges in accurately processing visual details, particularly in scientific diagrams. This paper explores the necessity of meticulous visual detail collection and region decomposition for enhancing the performance of VLMs in scientific diagram analysis. We propose a novel approach that combines traditional computer vision techniques with VLMs to systematically decompose diagrams into discernible visual elements and aggregate essential metadata. Our method employs techniques in OpenCV library to identify and label regions, followed by a refinement process using shape detection and region merging algorithms, which are particularly suited to the structured nature of scientific diagrams. This strategy not only improves the granularity and accuracy of visual information processing but also extends the capabilities of VLMs beyond their current limitations. We validate our approach through a series of experiments that demonstrate enhanced performance in diagram analysis tasks, setting a new standard for integrating visual and language processing in a multimodal context.

Distantly-Supervised Joint Extraction with Noise-Robust Learning

Joint entity and relation extraction is a process that identifies entity pairs and their relations using a single model. We focus on the problem of joint extraction in distantly-labeled data,whose labels are generated by aligning entity mentions with the corresponding entity and relation tags using a knowledge base (KB). One key challenge is the presence of noisy labels arising from both incorrect entity and relation annotations, which significantly impairs the quality of supervised learning. Existing approaches, either considering only one source of noise or making decisions using external knowledge, cannot well-utilize significant information in the training data. We propose DENRL, a generalizable framework that 1) incorporates a lightweight transformer backbone into a sequence labeling scheme for joint tagging, and 2) employs a noise-robust framework that regularizes the tagging model with significant relation patterns and entity-relation dependencies, then iteratively self-adapts to instances with less noise from both sources. Surprisingly, experiments1 on two benchmark datasets show that DENRL, using merely its own parametric distribution and simple data-driven heuristics, outperforms large language model-based baselines by a large margin with better interpretability.

GLAD: Content-Aware Dynamic Graphs for Log Anomaly Detection

Logs play a crucial role in system monitoring and debugging by recording valuable system information, including events and status. Although various methods have been proposed to detect anomalies in log sequences, they often overlook the significance of considering relationships among system components, such as services and users, which can be identified from log contents. Understanding these relationships is vital for identifying anomalies and their underlying causes. To address this issue, we introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect relational anomalies in system logs. GLAD incorporates log semantics, relationship patterns, and sequential patterns into a unified framework for anomaly detection. Specifically, GLAD first introduces a field extraction module that utilizes prompt-based few-shot learning to extract essential field information, such as services and users, from log contents. Then GLAD constructs dynamic log graphs for sliding windows by leveraging the log events and extracted fields. These graphs represent events and fields as nodes and their relationships as edges. Subsequently, we propose atemporal-attentive graph edge anomaly detection model for identifying anomalous relationships in the dynamic log graphs. This model employs a Graph Neural Network (GNN)-based encoder enhanced with transformers to capture structural, content, and temporal features. We evaluate our proposed method on three datasets, and the results demonstrate the effectiveness of GLAD in detecting anomalies indicated by varying relation patterns.

Efficient Controllable Multi-Task Architectures

We aim to train a multi-task model such that users can adjust the desired compute budget and relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without heavy computational overhead to train and save models for various scenarios. To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable. Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures based on user’s constraints. Our training strategy involves a novel `Configuration-Invariant Knowledge Distillation’ loss that enforces backbone representations to be invariant under different runtime width configurations to enhance accuracy. Further, we present a simple but effective search algorithm that translates user constraints to runtime width configurations of both the shared encoder and task decoders, for sampling the sub-architectures. The key rule for the search algorithm is to provide a larger computational budget to the higher preferred task decoder, while searching a shared encoder configuration that enhances the overall MTL performance. Various experiments on three multi-task benchmarks (PASCALContext, NYUDv2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantage of our approach. For example, our method shows a higher controllability by 33.5% in the NYUD-v2 dataset over prior methods, while incurring much less compute cost.

DeepGAR: Deep Graph Learning for Analogical Reasoning

Analogical reasoning is the process of discovering and mapping correspondences from a target subject to a base subject. As the most well-known computational method of analogical reasoning, Structure-Mapping Theory (SMT) abstracts both target and base subjects into relational graphs and forms the cognitive process of analogical reasoning by finding a corresponding subgraph (i.e., correspondence) in the target graph that is aligned with the base graph. However, incorporating deep learning for SMT is still under-explored due to several obstacles: 1) the combinatorial complexity of searching for the correspondence in the target graph, 2) the correspondence mining is restricted by various cognitive theory-driven constraints. To address both challenges, we propose a novel framework for Analogical Reasoning (DeepGAR) that identifies the correspondence between source and target domains by assuring cognitive theory-driven constraints. Specifically, we design a geometric constraint embedding space to induce subgraph relation from node embeddings for efficient subgraph search. Furthermore, we develop novel learning and optimization strategies that could end-to-end identify correspondences that are strictly consistent with constraints driven by the cognitive theory. Extensive experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of the proposed DeepGAR over existing methods. The code and data are available at: https://github.com/triplej0079/DeepGAR.

Controllable Dynamic Multi-Task Architectures

Multi-task learning commonly encounters competition for resources among tasks, specifically when model capacity is limited. This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. In contrast to the existing dynamic multi-task approaches that adjust only the weights within a fixed architecture, our approach affords the flexibility to dynamically control the total computational cost and match the user-preferred task importance better. We propose a disentangled training of two hyper networks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights. Experiments on three multi-task benchmarks, namely PASCAL-Context, NYU-v2, and CIFAR-100, show the efficacy of our approach. Project page is available at https://www.nec-labs.com/-mas/DYMU.

Domain Adaptive Semantic Segmentation using Weak Labels

We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained based on a model prediction for unsupervised domain adaptation (UDA), or from a human oracle in a new weakly-supervised domain adaptation (WDA) paradigm for semantic segmentation. Using weak labels is both practical and useful, since (i) collecting image-level target annotations is comparably cheap in WDA and incurs no cost in UDA, and (ii) it opens the opportunity for category-wise domain alignment. Our framework uses weak labels to enable the interplay between feature alignment and pseudo-labeling, improving both in the process of domain adaptation. Specifically, we develop a weak-label classification module to enforce the network to attend to certain categories, and then use such training signals to guide the proposed category-wise alignment method. In experiments, we show considerable improvements with respect to the existing state-of-the-arts in UDA and present a new benchmark in the WDA setting.