Integrated Systems

Our Integrated Systems department innovates, designs, and prototypes high-performance intelligent distributed systems, applications, and services on complex, large-scale communication networks such as 5G and beyond. We develop next-generation wireless technologies for sensing the world, localizing critical assets, and improving the capacity, coverage, and scalability of these networks.

New application needs have always sparked human innovation. A decade ago, cloud computing enabled high-value enterprise services with global reach and scale, but with seconds or even minutes of delay. Large-scale services like enterprise resource planning (ERP) were corner cases, often designed as one-off systems. Today, applications like social networks, automated trading, and video streaming have made large-scale services the norm rather than the exception. In the future, advances in 5G networks and an explosion in smart devices, microservices, databases, networking, and computing tiers will make services so complex that humans cannot tune or manage them.

The sheer scale, dynamic nature, and concurrency of services on 5G slices will require them to be intelligent and autonomic. They will need to continuously self-assess, learn, and automatically adjust for resource needs, data quality, and service reliability. The need for increased efficiency and reduced latency between measurement and action drives our design of real-time distributed systems for feature extraction, computation, and machine learning on multimodal streaming data; a toy sketch of such a measure-and-act loop appears below. We are conducting extensive research on creating end-to-end solutions using multimodal sensing technologies in the retail, public safety, and transportation domains.
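As a rough, purely illustrative sketch of such a loop, the toy below measures a stream, extracts a feature over a window, and takes an autonomic action. Every name, threshold, and rate here is invented; a real deployment would be distributed across machines and modalities.

```python
# Toy measure -> extract features -> act loop on a single process.
# All values are hypothetical; production systems are distributed.
import queue
import random
import threading
import time

measurements = queue.Queue(maxsize=1000)

def sensor(n=500):
    # Stand-in for a multimodal feed (video, radio, telemetry).
    for _ in range(n):
        measurements.put({"ts": time.time(), "value": random.random()})
        time.sleep(0.001)

def controller(batches=5, window_size=100):
    for _ in range(batches):
        window = [measurements.get()["value"] for _ in range(window_size)]
        load = sum(window) / window_size  # toy feature extraction
        # Self-assess and act: placeholder for an autonomic scaling decision.
        print("scale up" if load > 0.55 else "steady", round(load, 3))

threading.Thread(target=sensor, daemon=True).start()
controller()
```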

Our 5G cellular network research encompasses the development of technologies on the Radio Access Network (RAN), the mobile edge, and the 5G LAN. Within the RAN, we are developing technologies that optimize massive MIMO/MU-MIMO deployments and millimeter-wave access (e.g., transmission at 28 GHz to nomadic/mobile users). At the mobile edge (MEC), we focus on virtualization, scalability, and cloud deployment of appropriate services. Our 5G LAN research extends the benefits of 5G slicing technology to enterprise LANs to position the enterprise as the new MEC.

Read news and publications from the world-class team of researchers in our Integrated Systems department.

Posts

Optimal Single-User Interactive Beam Alignment with Feedback Delay

Communication in the millimeter wave (mmWave) band relies on narrow beams due to directionality, high path loss, and shadowing. Beam alignment (BA) techniques can be used to find and adjust the direction of these narrow beams. In this paper, BA at the base station (BS) is considered, where the BS sends a set of BA packets to scan different angular regions while the user listens to the channel and sends feedback to the BS for each received packet. It is assumed that the packets received at the user and the feedback received at the BS can be correctly decoded. Motivated by practical constraints such as propagation delay, a feedback delay for each BA packet is considered. At the end of the BA, the BS allocates a narrow beam, including its angle of departure, to the user for data transmission, and the objective is to maximize the resulting expected beamforming gain. A general framework for studying this problem is proposed, from which a lower bound on the optimal performance as well as an optimality-achieving scheme are obtained. Simulation results reveal significant performance improvements over state-of-the-art BA methods in the presence of feedback delay.
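To make the cost of delayed feedback concrete, here is a hedged toy simulation of bisection-style BA in the noiseless setting described above: each ACK/NACK arrives only after a fixed number of packets, so the BS keeps probing a stale midpoint in the meantime. The paper's framework and optimal scheme are more general than this sketch.

```python
# Toy interactive beam alignment with delayed feedback (illustrative only).
# The BS bisects the angular uncertainty region, but each ACK/NACK arrives
# `delay` packets late, so early probes are redundant: the cost of delay.
import random

def beam_align(total_packets=8, delay=2):
    lo, hi = 0.0, 1.0              # normalized angle-of-departure interval
    user = random.random()         # true user direction
    pending = []                   # probes whose feedback is still in flight
    for _ in range(total_packets):
        mid = (lo + hi) / 2        # probe the current midpoint
        pending.append((mid, user < mid))
        if len(pending) > delay:   # feedback for an old probe arrives now
            mid_old, ack = pending.pop(0)
            if ack: hi = min(hi, mid_old)   # user lies below the old midpoint
            else:   lo = max(lo, mid_old)
    for mid_old, ack in pending:   # feedback still in flight when BA ends
        if ack: hi = min(hi, mid_old)
        else:   lo = max(lo, mid_old)
    return 1.0 / (hi - lo)         # beamforming gain ~ 1 / final beam width

# Monte Carlo estimate of the expected beamforming gain under delay.
print(sum(beam_align() for _ in range(10000)) / 10000)
```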

EdgeSync: Efficient Edge-Assisted Video Analytics via Network Contention-Aware Scheduling

With the advancement of 5G, edge-assisted video analytics has become increasingly popular, driven by the technology’s ability to support low-latency, high-bandwidth applications. However, in scenarios where multiple clients compete for network resources, network contention poses a significant challenge. In this paper, we propose a novel scheduling algorithm that intelligently batches and aligns the offloading of multiple video analytics clients to optimize both network and edge server resource utilization while meeting the Service Level Objective (SLO). Experiments with a cellular network testbed show that our approach successfully processes 93% or more of inference requests from 7 different clients to the edge server while meeting the SLOs, whereas other approaches achieve lower success rates, ranging from 65% to 85% under the same conditions.
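The intuition can be sketched in a few lines, assuming a single shared uplink and fixed per-frame upload and batched-inference times (all numbers below are invented, not measurements from the paper): serializing uploads avoids contention, and one batched inference serves every client within the SLO.

```python
# Toy contention-aware schedule: clients upload one at a time over the
# shared uplink, then the edge server runs a single batched inference.
# All timings are made up for illustration.
def schedule(clients, uplink_ms=10, infer_ms=25, slo_ms=100):
    t = 0
    for _ in clients:          # serialized uploads: no two transfers overlap
        t += uplink_ms
    done = t + infer_ms        # one batched inference covers all frames
    return [(c, done <= slo_ms) for c in clients]

print(schedule([f"cam{i}" for i in range(7)]))  # every client meets the SLO
```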

G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration

Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset’s balance. Training a YOLOv8 object detector on the augmented G-Litter dataset improved detection performance, increasing recall by 7.82% and mAP50 by 3.87% compared with baseline results. This study emphasizes the potential of combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.
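A hedged sketch of what such a pipeline can look like using the public diffusers API: the model ID, class names, and prompt templates are illustrative stand-ins (the template function replaces the Alpaca-generated prompts), and a CUDA GPU is assumed.

```python
# Illustrative augmentation loop: templated prompts (standing in for
# LLM-generated ones) drive Stable Diffusion to synthesize images
# for underrepresented classes. Model ID and class names are examples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def prompts_for(cls):
    # Stand-in for LLM-generated prompt variations.
    scenes = ["floating on choppy open water", "half-submerged near a beach",
              "seen from a drone in turbid coastal water"]
    return [f"a photo of {cls} {scene}, marine debris" for scene in scenes]

for cls in ["plastic bottle", "fishing net"]:   # underrepresented classes
    for i, prompt in enumerate(prompts_for(cls)):
        image = pipe(prompt).images[0]
        image.save(f"aug_{cls.replace(' ', '_')}_{i}.png")
```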

RAG-check: Evaluating Multimodal Retrieval Augmented Generation Performance

Retrieval-augmented generation (RAG) improves large language models (LLMs) by using external knowledge to guide response generation, reducing hallucinations. However, RAG, particularly multi-modal RAG, can introduce new hallucination sources: (i) the retrieval process may select irrelevant pieces (e.g., documents, images) as raw context from the database, and (ii) retrieved images are processed into text-based context via vision-language models (VLMs) or directly used by multi-modal language models (MLLMs) like GPT-4o, which may hallucinate. To address this, we propose a novel framework to evaluate the reliability of multi-modal RAG using two performance measures: (i) the relevancy score (RS), assessing the relevance of retrieved entries to the query, and (ii) the correctness score (CS), evaluating the accuracy of the generated response. We train RS and CS models using a ChatGPT-derived database and human evaluator samples. Results show that both models achieve ~88% accuracy on test data. Additionally, we construct a 5000-sample human-annotated database evaluating the relevancy of retrieved pieces and the correctness of response statements. Our RS model aligns with human preferences 20% more often than CLIP in retrieval, and our CS model matches human preferences ~91% of the time. Finally, we assess various RAG systems’ selection and generation performances using RS and CS.
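The evaluation flow can be pictured with the stub below: score every retrieved piece with an RS model, filter, then score the generated answer with a CS model. Both scorers here are crude word-overlap stand-ins for the trained models, and all strings are invented.

```python
# Stubbed RAG-check flow: RS filters retrieved pieces, CS grades the answer.
# The real RS and CS are trained models; these heuristics just shape the flow.
def rs_model(query, piece):
    q = set(query.split())
    return len(q & set(piece.split())) / len(q)   # word-overlap stand-in

def cs_model(statement, context):
    # Crude substring check standing in for a learned correctness scorer.
    return all(word in context for word in statement.split())

query = "what color is the lighthouse"
retrieved = ["the lighthouse is painted red and white",
             "ferry schedule for tuesday"]
relevant = [p for p in retrieved if rs_model(query, p) > 0.3]  # RS filter
answer = "the lighthouse is red and white"
print(relevant, cs_model(answer, " ".join(relevant)))          # CS grade
```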

DiCE-M: Distributed Code Generation and Execution for Marine Applications – An Edge-Cloud Approach

Edge computing has emerged as a transformative technology that reduces application latency, improves cost efficiency, enhances security, and enables large-scale deployment of applications across various domains. In environmental monitoring, systems such as MegaSense [49] use low-cost sensors to gather and process real-time air quality data through edge-cloud collaboration, highlighting the critical role of edge computing in enabling scalable, efficient solutions. Similarly, marine science increasingly requires real-time processing and analysis of marine data from remote, resource-constrained environments. In this paper, we extend the power of edge computing by integrating it with Generative Artificial Intelligence (GenAI), specifically large language models (LLMs), to address challenges in marine science applications. We propose DiCE-M (Distributed Code generation and Execution for Marine applications), a robust system that uses an LLM to generate distributed code for marine applications and then utilizes a runtime to efficiently execute it on an edge+cloud computing infrastructure. Specifically, DiCE-M leverages edge computing to execute lightweight AI models locally on unmanned surface vehicles (USVs) while offloading complex tasks to the cloud, thus balancing computational load and enabling real-time monitoring in marine environments. We use marine litter identification as an example application to demonstrate the utility of DiCE-M. Our results show that DiCE-M reduces latency by more than 2X when marine litter is not detected and cuts cloud computing costs by more than half compared to traditional cloud-based approaches. By selectively cropping and transmitting relevant image portions, DiCE-M further improves bandwidth efficiency, making it a reliable and cost-effective solution for deploying AI-driven applications on resource-constrained USVs in dynamic marine environments.
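A sketch of the edge+cloud split (not the actual DiCE-M code): a small detector runs on the USV, and only when litter is suspected is the relevant crop posted to a cloud service. The model file and endpoint URL are hypothetical.

```python
# Edge side of the split: detect locally, crop, and offload only on demand.
# yolov8n.pt and the endpoint are placeholders, not the DiCE-M artifacts.
import io
import requests
from PIL import Image
from ultralytics import YOLO

edge_model = YOLO("yolov8n.pt")        # lightweight model running on the USV

def process_frame(path, cloud_url="https://example.com/analyze"):
    frame = Image.open(path)
    boxes = edge_model(frame)[0].boxes
    if len(boxes) == 0:
        return None                    # nothing suspected: stay on the edge
    x1, y1, x2, y2 = map(int, boxes.xyxy[0].tolist())
    crop = frame.crop((x1, y1, x2, y2))   # transmit only the relevant pixels
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    return requests.post(cloud_url, data=buf.getvalue()).json()
```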

DiCE: Distributed Code generation and Execution

Generative artificial intelligence (GenAI), specifically Large Language Models (LLMs), has shown tremendous potential in automating several tasks and improving human productivity. Recent works have shown them to be quite useful in writing and summarizing text (articles, blogs, poems, stories, songs, etc.), answering questions, brainstorming ideas, and even writing code. Several LLMs have emerged specifically targeting code generation. Given a prompt, these LLMs can generate code in any desired programming language. Many tools like ChatGPT, CoPilot, CodeWhisperer, Cody, DeepSeek Coder, StarCoder, etc. are now routinely used by software developers. However, most of the prior work in automatic code generation using LLMs has focused on obtaining “correct” and working code that mainly runs on a single computer (serial code). In this paper, we take this to the next level, where LLMs are leveraged to generate code for execution on a distributed infrastructure. We propose a novel system called DiCE, which takes serial code as input, automatically generates a distributed version of the code, and efficiently executes it on a distributed setup. DiCE consists of two main components: (a) an LLM-based tool (Synthia) that understands dependencies in serial code and automatically generates a distributed version of the code using a specialized programming model and semantics, and (b) a runtime (Hermod) that understands the semantics in the distributed code and realizes efficient execution on a cluster of machines (distributed infrastructure). DiCE currently focuses on visual programs synthesized by tools like ViperGPT [1] and VisReP [2] (serial code), automatically identifies higher-level task parallelism opportunities (e.g., parallel object detection), transforms the code to exploit the parallelism, and finally efficiently executes it on a cluster of machines. Through our experiments using 100 examples from the GQA dataset [3], we show that the serial codes generated by ViperGPT are successfully transformed into distributed codes, which are then efficiently executed on a cluster of machines by DiCE. We note that DiCE correctly identifies opportunities for parallelism and distributes tasks on separate GPUs within the cluster. We observe average speed-ups of 2X, 2.95X, and 3.7X, and average efficiencies of 1, 0.74, and 0.48 for clusters of 2, 4, and 8 nodes, respectively.
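The kind of transformation DiCE performs can be pictured as below, where detect() is a stub for a ViperGPT-style detector call and the GPU assignment is purely illustrative; in the real system, Synthia emits the parallel form and Hermod schedules it across machines.

```python
# Serial vs. parallel object detection, as a stand-in for DiCE's rewrite.
from concurrent.futures import ThreadPoolExecutor

def detect(image, obj, gpu):
    # Stub for a ViperGPT-style detector call pinned to one GPU.
    return f"{obj}@gpu{gpu}"

image, objects = "frame.jpg", ["car", "person", "dog", "bicycle"]

# Serial form, as emitted by tools like ViperGPT:
serial = [detect(image, o, gpu=0) for o in objects]

# Parallel form: independent detections run as concurrent tasks.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(
        lambda pair: detect(image, pair[1], gpu=pair[0] % 4),
        enumerate(objects)))

print(serial, parallel)
```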

Transformer-Aided Semantic Communications

The transformer architecture employed in large language models (LLMs), a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication, where proper encoding of the relevant data is critical, especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compressing and compactly representing the input image, with the goal of preserving semantic information throughout the transmission process. Through the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.
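A toy version of the attention-guided selection step, with random arrays in place of real ViT patch embeddings and attention scores: keep the most-attended patches, transmit only those, and zero-fill the rest at the receiver.

```python
# Attention-masked transmission in miniature (random stand-in data).
import numpy as np

patches = np.random.rand(196, 768)     # 14x14 grid of ViT patch embeddings (toy)
attention = np.random.rand(196)        # per-patch attention scores (toy)
keep = 49                              # transmit 25% of the patches

mask = np.argsort(attention)[-keep:]   # indices of the most salient patches
payload = patches[mask]                # only these embeddings are sent

# Receiver: zero-fill the dropped patches before reconstruction.
recovered = np.zeros_like(patches)
recovered[mask] = payload
print(payload.nbytes / patches.nbytes) # achieved compression ratio: 0.25
```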

iRAG: Advancing RAG for Videos with an Incremental Approach

Retrieval-augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for understanding videos is appealing, but there are two critical limitations. One-time, upfront conversion of all content in a large corpus of videos into text descriptions entails high processing times. Also, not all information in the rich video data is typically captured in the text descriptions. Since user queries are not known a priori, developing a system for video-to-text conversion and interactive querying of video data is challenging. To address these limitations, we propose an incremental RAG system called iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of videos. Unlike traditional RAG, iRAG quickly indexes large repositories of videos, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the videos to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long video-to-text conversion times and overcomes information loss due to conversion of video to text by doing on-demand, query-specific extraction of details in the video data. This ensures high quality of responses to interactive user queries that are often not known a priori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of a large corpus of videos. Experimental results on real-world datasets demonstrate 23x to 25x faster video-to-text ingestion, while ensuring that the latency and quality of responses to interactive user queries are comparable to responses from a traditional RAG where all video data is converted to text upfront before any user querying.
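In miniature, the incremental workflow looks like this: cheap captions are indexed for every clip up front, while expensive detail extraction runs lazily for only the clips a query retrieves. Both extractors below are stubs standing in for VLM calls, and the retrieval scoring is a word-overlap placeholder.

```python
# iRAG-style incremental workflow with stubbed fast/slow VLM passes.
def coarse_caption(clip):
    return f"coarse summary of {clip}"        # fast, upfront VLM pass

def detailed_analysis(clip):
    return f"rich description of {clip}"      # slow VLM pass, run on demand

index = {clip: coarse_caption(clip) for clip in ["clip1", "clip2", "clip3"]}
details = {}                                  # filled lazily, per query

def answer(query, top_k=1):
    score = lambda c: sum(w in index[c] for w in query.split())
    hits = sorted(index, key=score, reverse=True)[:top_k]
    for clip in hits:
        if clip not in details:               # on-demand, query-specific work
            details[clip] = detailed_analysis(clip)
    return [details[c] for c in hits]

print(answer("what happens in clip2"))
```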

TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs

Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforcement capabilities, traffic management, and pedestrian safety. However, efficiently managing and analyzing multi-camera feeds poses challenges due to the vast amount of data, and analyzing such huge volumes of video requires advanced analytical tools. While Large Language Models (LLMs) like ChatGPT, equipped with retrieval-augmented generation (RAG) systems, excel in text-based tasks, integrating them into traffic video analysis requires converting video data into text using a Vision-Language Model (VLM), which is time-consuming and delays the timely use of traffic videos for generating insights and investigating incidents. To address these challenges, we propose TrafficLens, a tailored algorithm for multi-camera traffic intersections. TrafficLens employs a sequential approach that exploits the overlapping coverage areas of cameras. It iteratively applies VLMs with varying token limits, using previous outputs as prompts for subsequent cameras, enabling rapid generation of detailed textual descriptions while reducing processing time. Additionally, TrafficLens intelligently bypasses redundant VLM invocations through an object-level similarity detector. Experimental results with real-world datasets demonstrate that TrafficLens reduces video-to-text conversion time by up to 4× while maintaining information accuracy.
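The cascade can be sketched as follows, with the VLM and the object-level detector both stubbed out and all camera data invented: each camera's description seeds the next prompt, and a camera whose objects were already described is skipped.

```python
# TrafficLens-style cascade with stubbed VLM and object detector.
def vlm(frame, prompt, max_tokens):           # stub for a VLM call
    return f"{prompt} + {frame} ({max_tokens} tok)".strip(" +")

def objects_in(frame):                        # stub object-level detector
    return {"cam1": {"car", "bus"},
            "cam2": {"car", "bus"},           # overlaps cam1 entirely
            "cam3": {"truck"}}[frame]

description, seen = "", set()
for cam, budget in [("cam1", 256), ("cam2", 128), ("cam3", 128)]:
    objs = objects_in(cam)
    if objs <= seen:                          # redundant view: skip the VLM
        continue
    description = vlm(cam, description, budget)
    seen |= objs
print(description)                            # cam2 was bypassed entirely
```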

Optimizing LLM API usage costs with novel query-aware reduction of relevant enterprise data

Costs of LLM API usage rise rapidly when proprietary enterprise data is used as context for user queries to generate more accurate responses from LLMs. To reduce costs, we propose LeanContext, which generates query-aware, compact and AI model-friendly summaries of relevant enterprise data context. This is unlike traditional summarizers that produce query-unaware human-friendly summaries that are also not as compact. We first use retrieval augmented generation (RAG) to generate a query-aware enterprise data context, which includes key, query-relevant enterprise data. Then, we use reinforcement learning to further reduce the context while ensuring that a prompt consisting of the user query and the reduced context elicits an LLM response that is just as accurate as the LLM response to a prompt that uses the original enterprise data context. Our reduced context is not only query-dependent, but it is also variable-sized. Our experimental results demonstrate that LeanContext (a) reduces costs of LLM API usage by 37% to 68% (compared to RAG), while maintaining the accuracy of the LLM response, and (b) improves accuracy of responses by 26% to 38% when state-of-the-art summarizers reduce RAG context.
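A sketch of the reduction step under two simplifications: the RAG retrieval is taken as given, and a word-overlap heuristic stands in for the reinforcement-learned reducer. The budget, query, and context strings are all invented.

```python
# Query-aware context reduction (heuristic stand-in for LeanContext's RL).
def reduce_context(query, sentences, budget=0.4):
    q = set(query.lower().split())
    ranked = sorted(sentences,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    keep = max(1, int(len(ranked) * budget))   # variable-sized, query-dependent
    return " ".join(ranked[:keep])

rag_context = ["Revenue grew 12% in Q3.",
               "The office moved to Austin.",
               "Q3 revenue growth was driven by APAC.",
               "Lunch is catered on Fridays."]
prompt = reduce_context("what drove Q3 revenue growth", rag_context)
print(prompt)   # the reduced context sent to the LLM, cutting token costs
```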