Integrated Systems

Our Integrated Systems department innovates, designs, and prototypes high-performance intelligent distributed systems, applications, and services on complex, large-scale communication networks like 5G and beyond. We develop next-generation wireless technologies for sensing the world, localizing critical assets, and improving the capacity, coverage, and scalability of communication networks like 5G and beyond.

New application needs have always sparked human innovation. A decade ago, cloud computing enabled high-value enterprise services with a global reach and scale but with several minutes or seconds of delay. Large-scale services like enterprise resource planning (ERP) were a corner-case scenario, often designed as one-off systems. Today, applications like social networks, automated trading, and video streaming have made large-scale services the norm rather than the exception. In the future, advances in 5G networks and an explosion in smart devices, microservices, databases, networking, and computing tiers will make services so complex that humans cannot tune or manage them.

The sheer scale, dynamic nature, and concurrency in services on 5G slices will require them to be intelligent and autonomic. They will need to continuously self-assess, learn, and automatically adjust for resource needs, data quality, and service reliability. The need for increased efficiency and reduced latency between measurement and action drives our design of real-time distributed systems for feature extraction, computation, and machine learning on multimodal streaming data. We are conducting extensive research on creating end-to-end solutions using multimodal sensing technologies in the retail, public safety, and transportation domains.

Our 5G cellular network research encompasses the development of technologies on the Radio Access Network (RAN), the mobile edge, and the 5G LAN. Within the RAN, we are developing technologies that optimize massive MIMO/MU-MIMO deployments and millimeter-wave access (e.g., transmission at 28 GHz to nomadic/mobile users). At the mobile edge (MEC), we focus on virtualization, scalability, and cloud deployment of appropriate services. Our 5G LAN research extends the benefits of 5G slicing technology to enterprise LANs to position the enterprise as the new MEC.

Read our news and publications from our world-class team of researchers from our Integrated Systems department.

Posts

SimCache: Similarity Caching for Efficient VLM-based Scene Understanding

Scene understanding systems analyze visual contexts by detecting objects, their attributes, and the interactions among them to provide a holistic interpretation. Understanding a scene requires analyzing multiple salient regions within a single video frame. Recently, Vision-Language Models (VLMs) have emerged as powerful tools for scene understanding, leveraging learned world knowledge to enable deployment without specialized training or fine-tuning. However, deploying VLMs in real-time applications is challenging due to their high computational and memory requirements, which limit processing throughput. We propose SimCache, a novel software-based caching mechanism that optimizes VLM-based scene understanding systems by reducing redundant computations. SimCache stores the embedding representation of a salient region and its detected activity, enabling reuse of VLM computations for similar regions in future frames. Specifically, SimCache exploits two types of redundancy: (1) temporal locality, reusing computations for similar regions across adjacent frames, and (2) semantic locality, reusing computations for visually distinct regions that represent the same activity at different times. SimCache includes a multi-tier cache architecture with specialized cache search and refinement policies to exploit redundancy efficiently and accurately. Experiments on action recognition datasets demonstrate that SimCache improves system throughput by up to 9.4× and reduces VLM computations by up to 24.4× with minimal accuracy loss.

Efficient Semantic Communication Through Transformer-Aided Compression

Transformers, known for their attention mechanisms, have proven highly effective in focusing on critical elements within complex data. This feature can effectively be used to address the time-varying channels in wireless communication systems. In this work, we introduce a channel-aware adaptive framework for semantic communication, where different regions of the image are encoded and compressed based on their semantic content. By employing vision transformers, we interpret the attention mask as a measure of the semantic contents of the patches and dynamically categorize the patches to be compressed at various rates as a function of the instantaneous channel bandwidth. Our method enhances communication efficiency by adapting the encoding resolution to the content’s relevance, ensuring that even in highly constrained environments, critical information is preserved. We evaluate the proposed adaptive transmission framework using the TinyImageNet dataset, measuring both reconstruction quality and accuracy. The results demonstrate that our approach maintains high semantic fidelity while optimizing bandwidth, providing an effective solution for transmitting multiresolution data in limited bandwidth conditions.

Latency-driven Execution of LLM-generated Application Code on the Computing Continuum

Latency-critical applications demand quick responses. Ideally, detailed insights are preferable for the best decision making and response actions. However, in situations when detailed insights cannot be provided quickly, even basic information goes a long way in tackling the situation effectively. For example, in marine security application, it is critical to immediately notify as soon as an unauthorized vessel is seen. Hence, timely response may be prioritized over the response based on entire details. To address such latency-critical situations, in this paper, we propose a novel system called DiCE-EC, which leverages LLM to generate distributed code with speculative execution on Edge (fast and simple response using resource constrained hardware) and Cloud (detailed response using powerful hardware, but may be fast or slow depending on network conditions). DiCE-EC breaks down application into smaller components and executes them asynchronously across the edge and cloud computing continuum. As network conditions vary, we show through real-world marine security application, that DiCE-EC is effective in dynamically choosing detailed insights from cloud when received within latency-constraint, or falling back to simple response from edge to guarantee timely alert delivery. Without such dynamic selection of response from edge or cloud, existing systems either always provide simple responses or drop alerts. We perform real network measurements in the Gulf of Pozzuoli in Naples, Italy along accessible areas (inland and in a Ferry) and generate 1 million realistic measurements across four inaccessible regions, and demonstrate that DiCE-EC never misses an alert, while baseline misses up to ?4% alerts with real data and up to ?1% (10,000 alerts) with generated data.

LLM-based Distributed Code Generation and Cost-Efficient Execution in the Cloud

The advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), is reshaping the software industry by automating code generation. Many LLM-driven distributed processing systems rely on serial code generation constrained by predefined libraries, limiting flexibility and adaptability. While some approaches enhance performance through parallel execution or optimize edge-cloud distributed processing for specific domains, they often overlook the cost implications of deployment, restricting scalability and economic feasibility across diverse cloud environments. This paper presents DiCE-C, a system that eliminates these constraints by starting directly from a natural language query. DiCE-C dynamically identifies available tools at runtime, programmatically refines LLM prompts, and employs a stepwise approach—first generating serial code and then transforming it into distributed code. This adaptive methodology enables efficient distributed execution without dependence on specific libraries. By leveraging high-level parallelism at the Application Programming Interface (API) level and managing API execution as services within a Kubernetes-based runtime, DiCE-C reduces idle GPU time and facilitates the use of smaller, cost-effective GPU instances. Experiments with a vision-based insurance application demonstrate that DiCE-C reduces cloud operational costs by up to 72% when using smaller GPUs (A6000 and A4000 GPU machines vs. A100 GPU machine) and by 32% when using identical GPUs (A100 GPU machines). This flexible and cost-efficient approach makes DiCE-C a scalable solution for deploying LLM-generated vision applications in cloud environments.

Real-Time Network-Aware Roadside LiDAR Data Compression

LiDAR technology has emerged as a pivotal tool in Intelligent Transportation Systems (ITS), providing unique capabilities that have significantly transformed roadside traffic applications. However, this transformation comes with a distinct challenge: the immense volume of data generated by LiDAR sensors. These sensors produce vast amounts of data every second, which can overwhelm both private and public 5G networks that are used to connect intersections. This data volume makes it challenging to stream raw sensor data across multiple intersections effectively. This paper proposes an efficient real-time compression method for roadside LiDAR data. Our approach exploits a special characteristic of roadside LiDAR data: the background points are consistent across all frames. We detect these background points and send them to edge servers only once. For each subsequent frame, we filter out the background points and compress only the remaining data. This process achieves significant temporal compression by eliminating redundant background data and substantial spatial compression by focusing only on the filtered points. Our method is sensor-agnostic, exceptionally fast, memory-efficient, and adaptable to varying network conditions. It offers a 2.5x increase in compression rates and improves application-level accuracy by 40% compared to current state-of-the-art methods.

CAMTUNER: Adaptive Video Analytics Pipelines via Real-time Automated Camera Parameter Tuning

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few of such parameters, e.g., exposure, focus and white balance, are automatically adjusted by the camera internally, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge, the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and it incorporates two novel components: a light-weight analytics quality estimator and a virtual camera that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%-4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending mere incremental enhancements achieved through refining deep-learning models.

Optimal Single-User Interactive Beam Alignment with Feedback Delay

Communication in Millimeter wave (mmWave) band relies on narrow beams due to directionality, high path loss, and shadowing. One can use beam alignment (BA) techniques to find and adjust the direction of these narrow beams. In this paper, BA at the base station (BS) is considered, where the BS sends a set of BA packets to scan different angular regions while the user listens to the channel and sends feedback to the BS for each received packet. It is assumed that the packets and feedback received at the user and BS, respectively, can be correctly decoded. Motivated by practical constraints such as propagation delay, a feedback delay for each BA packet is considered. At the end of the BA, the BS allocates a narrow beam to the user including its angle of departure for data transmission and the objective is to maximize the resulting expected beamforming gain. A general framework for studying this problem is proposed based on which a lower bound on the optimal performance as well as an optimality achieving scheme are obtained. Simulation results reveal significant performance improvements over the state-of-the-art BA methods in the presence of feedback delay.

EdgeSync: Efficient Edge-Assisted Video Analytics via Network Contention-Aware Scheduling

With the advancement of 5G, edge-assisted video analytics has become increasingly popular, driven by the technology’s ability to support low-latency, high-bandwidth applications. However, in scenarios where multiple clients competing for network resources, network contention poses a significant challenge. In this paper, we propose a novel scheduling algorithm that intelligently batches and aligns the offloading of multiple video analytics clients to optimize both network and edge server resource utilization while meeting the Service Level Objective (SLO). Experiment with a cellular network testbed shows that our approach successfully processes 93% or more of inference requests from 7 different clients to the edge server while meeting the SLOs, whereas other approaches achieve a lower success rate, ranging from 65% to 85% under the same condition.

G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration

Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset’s balance. Training G-Litter augmented dataset using YOLOv8 for object detection demonstrated an increase in detection performance, improving recall by 7.82% and mAP50 by 3.87% (compared with baseline results). This study emphasizes the potential for combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.

RAG-check: Evaluating Multimodal Retrieval Augmented Generation Performance

Retrieval-augmented generation (RAG) improves large language models (LLMs) by using external knowledge to guide response generation, reducing hallucinations. However, RAG, particularly multi-modal RAG, can introduce new hallucination sources: (i) the retrieval process may select irrelevant pieces (e.g., documents, images) as raw context from the database, and (ii) retrieved images are processed into text-based context via vision-language models (VLMs) or directly used by multi-modal language models (MLLMs) like GPT-4o, which may hallucinate. To address this, we propose a novel framework to evaluate the reliability of multi-modal RAG using two performance measures: (i) the relevancy score (RS), assessing the relevance of retrieved entries to the query, and (ii) the correctness score (CS), evaluating the accuracy of the generated response. We train RS and CS models using a ChatGPT-derived database and human evaluator samples. Results show that both models achieve ~88% accuracy on test data. Additionally, we construct a 5000-sample human-annotated database evaluating the relevancy of retrieved pieces and the correctness of response statements. Our RS model aligns with human preferences 20% more often than CLIP in retrieval, and our CS model matches human preferences ~91% of the time. Finally, we assess various RAG systems’ selection and generation performances using RS and CS.