Integrated Systems

Read our publications from our world-class team of researchers from our Integrated Systems department which innovates, designs, and prototypes high-performance intelligent distributed systems, applications, and services on complex, large-scale communication networks like 5G and beyond. We develop next-generation wireless technologies for sensing the world, localizing critical assets, and improving the capacity, coverage, and scalability of communication networks like 5G and beyond.

Posts

Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks

Providing wireless users with high-quality video content has become increasingly important. However, ensuring consistent video quality poses challenges due to variable encodedbitrate caused by dynamic video content and fluctuating channel bitrate caused by wireless fading effects. Suboptimal selection of encoder parameters can lead to video quality loss due to underutilized bandwidth or the introduction of video artifacts due to packet loss. To address this, a real-time deep learning-based H.264 controller is proposed. This controller leverages instantaneous channel quality data driven from the physical layer, along with the video chunk, to dynamically estimate the optimal encoder parameters with a negligible delay in real-time. The objective is to maintain an encoded video bitrate slightly below the available channel bitrate. Experimental results, conducted on both QCIF dataset and a diverse selection of random videos from public datasets, validate the effectiveness of the approach. Remarkably, improvements of 10-20 dB in PSNR with respect to the state-of-the art adaptive bitrate video streaming is achieved, with an average packet drop rate as low as 0.002.

CLAP: Cost and Latency-Aware Placement of Microservices on the Computing Continuum

For microservices-based real-time stream processing applications, computing at the edge delivers fast responses for low workloads, but as workload increases, the response time starts to slow down due to limited compute capacity. Abundant compute capacity in the cloud delivers fast responses even for higher workloads but incurs very high cost of operation. For applications which can tolerate latencies up to a certain limit, using either of them has one or the other drawback and for different applications and edge infrastructures, it is non-trivial to decide when to use only edge resources and when to leverage cloud resources. In this paper, we propose CLAP, which dynamically understands the relationship between workload and application latency, and automatically adjusts placement of microservices across edge and cloud computing continuum, with the goal of jointly reducing latency as well as cost of running microservices based streaming applications. CLAP leverages Reinforcement Learning (RL) technique to learn the optimal placement for a given workload and based on the learnings, adjusts placement of microservices as the application workload changes. We conduct experiments with real-world video analytics applications and show that CLAP adapts placement of microservices in response to varying workloads and achieves low latency for applications in a cost-efficient manner. Particularly, we show that for two real world video analytics applications i.e. human attributes and face recognition, CLAP is able to reduce average cost (across 4 days at different locations) by 47% and 58% for human attributes detection and face recognition application, respectively, while consistently maintaining latency below the tolerable limit.

Transformer-Aided Semantic Communications

The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for their ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication where proper encoding of the relevant data is critical especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.

iRAG: An Incremental Retrieval Augmented Generation System for Videos

Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for combined understanding of multimodal data such as text, images and videos is appealing but two critical limitations exist: one-time, upfront capture of all content in large multimodal data as text descriptions entails high processing times, and not all information in the rich multimodal data is typically in the text descriptions. Since the user queries are not known apriori, developing a system for multimodal to text conversion and interactive querying of multimodal data is challenging.To address these limitations, we propose iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of large corpus of multimodal data. Unlike traditional RAG, iRAG quickly indexes large repositories of multimodal data, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the multimodal data to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long multimodal to text conversion times, overcomes information loss issues by doing on-demand query-specific extraction of details in multimodal data, and ensures high quality of responses to interactive user queries that are often not known apriori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of large, real-world multimodal data. Experimental results on real-world long videos demonstrate 23x to 25x faster video to text ingestion, while ensuring that quality of responses to interactive user queries is comparable to responses from a traditional RAG where all video data is converted to text upfront before any querying.

Improving Real-time Data Streams Performance on Autonomous Surface Vehicles using DataX

In the evolving Artificial Intelligence (AI) era, the need for real-time algorithm processing in marine edge environments has become a crucial challenge. Data acquisition, analysis, and processing in complex marine situations require sophisticated and highly efficient platforms. This study optimizes real-time operations on a containerized distributed processing platform designed for Autonomous Surface Vehicles (ASV) to help safeguard the marine environment. The primary objective is to improve the efficiency and speed of data processing by adopting a microservice management system called DataX. DataX leverages containerization to break down operations into modular units, and resource coordination is based on Kubernetes. This combination of technologies enables more efficient resource management and real-time operations optimization, contributing significantly to the success of marine missions. The platform was developed to address the unique challenges of managing data and running advanced algorithms in a marine context, which often involves limited connectivity, high latencies, and energy restrictions. Finally, as a proof of concept to justify this platform’s evolution, experiments were carried out using a cluster of single-board computers equipped with GPUs, running an AI-based marine litter detection application and demonstrating the tangible benefits of this solution and its suitability for the needs of maritime missions.

LARA: Latency-Aware Resource Allocator for Stream Processing Applications

One of the key metrics of interest for stream processing applications is “latency”, which indicates the total time it takes for the application to process and generate insights from streaming input data. For mission-critical video analytics applications like surveillance and monitoring, it is of paramount importance to report an incident as soon as it occurs so that necessary actions can be taken right away. Stream processing applications are typically developed as a chain of microservices and are deployed on container orchestration platforms like Kubernetes. Allocation of system resources like “cpu” and “memory” to individual application microservices has direct impact on “latency”. Kubernetes does provide ways to allocate these resources e.g. through fixed resource allocation or through vertical pod autoscaler (VPA), however there is no straightforward way in Kubernetes to prioritize “latency” for an end-to end application pipeline. In this paper, we present LARA, which is specifically designed to improve “latency” of stream processing application pipelines. LARA uses a regression-based technique for resource allocation to individual microservices. We implement four real-world video analytics application pipelines i.e. license plate recognition, face recognition, human attributes detection and pose detection, and show that compared to fixed allocation, LARA is able to reduce latency by up to ? 2.8X and is consistently better than VPA. While reducing latency, LARA is also able to deliver over 2X throughput compared to fixed allocation and is almost always better than VPA.

Enabling Cooperative Hybrid Beamforming in TDD-based Distributed MIMO Systems

Distributed massive MIMO networks are envisioned to realize cooperative multi-point transmission in next-generation wireless systems. For efficient cooperative hybrid beamforming, the cluster of access points (APs) needs to obtain precise estimates of the uplink channel to perform reliable downlink precoding. However, due to the radio frequency (RF) impairments between the transceivers at the two en-points of the wireless channel, full channel reciprocity does not hold which results in performance degradation in the cooperative hybrid beamforming (CHBF) unless a suitable reciprocity calibration mechanism is in place. We propose a two-step approach to calibrate any two hybrid nodes in the distributed MIMO system. We then present and utilize the novel concept of reciprocal tandem to propose a low-complexity approach for jointly calibrating the cluster of APs and estimating the downlink channel. Finally, we validate our calibration technique’s effectiveness through numerical simulation.

Differentiable JPEG: The Devil is in The Details

JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by 3.47dB (PSNR) on average. For strong compression rates, we can even improve PSNR by 9.51dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.

Scale Up while Scaling Out Microservices in Video Analytics Pipelines

Modern video analytics applications comprise multiple microservices chained together as pipelines and executed on container orchestration platforms like Kubernetes. Kubernetes automatically handles the scaling of these microservices for efficient application execution. There are two popular choices for scaling microservices in Kubernetes i.e. scaling Out using Horizontal Pod Autoscaler (HPA) and scaling Up using Vertical Pod Autoscaler (VPA). Both these have been studied independently, but there isn’t much prior work studying the joint scaling of these two. This paper investigates joint scaling, i.e., scaling up while scaling out (HPA) is in action. In particular, we focus on scaling up CPU resources allocated to the application microservices. We show that allocating fixed resources does not work well for different workloads for video analytics pipelines. We also show that Kubernetes’ VPA in conjunction with HPA does not work well for varying application workloads. As a remedy to this problem, in this paper, we propose DataX AutoScaleUp, which performs efficiently scaling up of CPU resources allocated to microservices in video analytics pipelines while Kubernetes’ HPA is operational. DataX AutoScaleUp uses novel techniques to adjust the allocated computing resources to different microservices in video analytics pipelines to improve overall application performance. Through real-world video analytics applications like Face Recognition and Human Attributes, we show that DataX AutoScaleUp can achieve up to 1.45X improvement in application processing rate when compared to alternative approaches with fixed CPU allocation and dynamic CPU allocation using VPA.

Semantic Multi-Resolution Communications

Deep learning based joint source-channel coding (JSCC) has demonstrated significant advancements in data reconstruction compared to separate source-channel coding (SSCC). This superiority arises from the suboptimality of SSCC when dealing with finite block-length data. Moreover, SSCC falls short in reconstructing data in a multi-user and/or multi-resolution fashion, as it only tries to satisfy the worst channel and/or the highest quality data. To overcome these limitations, we propose a novel deep learning multi-resolution JSCC framework inspired by the concept of multi-task learning (MTL). This proposed framework excels at encoding data for different resolutions through hierarchical layers and effectively decodes it by leveraging both current and past layers of encoded data. Moreover, this framework holds great potential for semantic communication, where the objective extends beyond data reconstruction to preserving specific semantic attributes throughout the communication process. These semantic features could be crucial elements such as class labels, essential for classification tasks, or other key attributes that require preservation. Within this framework, each level of encoded data can be carefully designed to retain specific data semantics. As a result, the precision of a semantic classifier can be progressively enhanced across successive layers, emphasizing the preservation of targeted semantics throughout the encoding and decoding stages. We conduct experiments on MNIST and CIFAR10 dataset. The experiment with both datasets illustrates that our proposed method is capable of surpassing the SSCC method in reconstructing data with different resolutions, enabling the extraction of semantic features with heightened confidence in successive layers. This capability is particularly advantageous for prioritizing and preserving more crucial semantic features within the datasets.