Integrated Systems

Read publications from our world-class team of researchers in the Integrated Systems department, which innovates, designs, and prototypes high-performance intelligent distributed systems, applications, and services on complex, large-scale communication networks such as 5G and beyond. We develop next-generation wireless technologies for sensing the world, localizing critical assets, and improving the capacity, coverage, and scalability of these networks.

Posts

Why is the video analytics accuracy fluctuating, and what can we do about it?

It is common practice to think of a video as a sequence of images (frames) and to re-use deep neural network models trained only on images for similar analytics tasks on videos. In this paper, we show that this "leap of faith" (that deep learning models that work well on images will also work well on videos) is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observe that the root cause of these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an "unintentional adversary" because these slight changes in the image pixel values of consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. Our experiments with a number of different cameras, and a variety of different video analytics tasks, show that the inadvertent adversarial effect from the camera can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (∼40% fewer mistakes in tracking). Our paper also provides new directions and techniques to mitigate the camera's adversarial effect on deep learning models used for video analytics applications.
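
As a rough sketch of the transfer-learning step described above (and not the paper's actual training code), one could freeze most of an image-trained detector and fine-tune only its head on labeled frames captured by the target camera. The "head" parameter-name prefix and the assumption that the model returns a scalar training loss are illustrative.

```python
import torch

def finetune_on_camera(model, loader, epochs=5, lr=1e-4):
    """Transfer-learning sketch: keep the image-trained backbone frozen and adapt
    only the detection head to frames captured by the target camera.
    Assumes `model(images, targets)` returns a scalar training loss and that the
    head parameters are exactly those whose names start with "head" (illustrative)."""
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("head")   # freeze everything else

    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:              # loader yields labeled camera frames
            loss = model(images, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```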

Efficient Compression Method for Roadside LiDAR Data

Roadside LiDAR (Light Detection and Ranging) sensors are being explored for intelligent transportation systems aiming at safer and faster traffic management and vehicular operations. A key challenge in such systems is to efficiently transfer massive point-cloud data from roadside LiDAR devices to the edge, connected through a 5G network, for real-time processing. In this paper, we consider the problem of compressing roadside (i.e., static) LiDAR data in real time, a setting unexplored by current methods. Existing point-cloud compression methods assume moving LiDARs (mounted on vehicles) and do not exploit spatial consistency across frames over time. To this end, we develop a novel grouped wavelet technique for static roadside LiDAR data compression (SLiC). Our method compresses LiDAR data both spatially and temporally using a kd-tree data structure based on Haar wavelet coefficients. Experimental results show that SLiC compresses up to 1.9× more effectively than the state-of-the-art compression method. Moreover, SLiC is computationally more efficient, achieving a 2× improvement in bandwidth usage over the best alternative. Even with this gain in communication and storage efficiency, SLiC retains the accuracy of down-the-pipeline applications.
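
To make the wavelet intuition concrete, here is a toy numpy illustration (not the SLiC implementation, which organizes coefficients spatially and temporally in a kd-tree): for a static scene, a range value observed across consecutive frames is nearly constant, so one level of the Haar transform pushes almost all energy into the low-pass averages, and the detail coefficients can be thresholded to zero.

```python
import numpy as np

def haar_1d(x):
    """One level of the Haar transform: averages (low-pass) and differences (detail)."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    diff = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return avg, diff

# Toy stand-in for one range value observed over 8 consecutive frames of a
# static roadside scene: nearly constant over time, so the detail terms are tiny.
ranges_over_time = np.array([12.50, 12.51, 12.49, 12.50, 12.50, 12.52, 12.50, 12.49])

avg, diff = haar_1d(ranges_over_time)
threshold = 0.05
compressed_detail = np.where(np.abs(diff) < threshold, 0.0, diff)  # most details vanish
print(avg, compressed_detail)
```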

5GLoR: 5G LAN Orchestration for Enterprise IoT Applications

5G-LAN is an enterprise local area network (LAN) that leverages 5G technology for wireless connectivity instead of WiFi. 5G technology is unique: it uses network slicing to distinguish customers within the same traffic class using new QoS technologies in the RF domain. This ability is not supported by most enterprise LANs, which rely primarily on DiffServ-like technologies that distinguish among traffic classes rather than customers. We first show that this mismatch in QoS between the 5G network and the LAN affects the accuracy of insights from LAN-resident analytics applications. We systematically analyze the root causes of the QoS mismatch and propose a first-of-its-kind 5G-LAN orchestrator (5GLoR). 5GLoR is a middleware that applications can use to preserve the QoS of their 5G data streams through the enterprise LAN. In most cases, the loss of QoS is not due to oversubscription of LAN switches but primarily due to the inefficient assignment of 5G data to queues at ingress and egress ports. 5GLoR periodically analyzes the status of these queues, provides suitable DSCP identifiers to the application, and installs relevant switch re-write rules (to change DSCP identifiers between switches) to continuously preserve the QoS of the 5G data through the LAN. 5GLoR improves RTP frame-level delay and inter-frame delay by 212% and 122%, respectively, for the WebRTC application. Additionally, with 5GLoR, the accuracy of two example applications (face detection and recognition) improves by 33%, while latency is reduced by about 25%. Our experiments show that applications on a 5G-LAN with 5GLoR perform well, in both accuracy and latency, compared to the same applications on MEC. This is significant because a 5G-LAN offers an order of magnitude more computing, networking, and storage resources to the applications than the resource-constrained MEC, and mature enterprise technologies can be used to deploy, manage, and update IoT applications.
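
For concreteness, the sketch below shows the one standard building block in this pipeline: how an application could tag its outgoing 5G data stream with a DSCP value on Linux via the IP_TOS socket option. Taking the value from a 5GLoR-like orchestrator, and the specific codepoint and addresses used, are illustrative assumptions.

```python
import socket

def open_tagged_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose packets carry the given DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tos = dscp << 2                      # DSCP occupies the upper 6 bits of the TOS byte
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock

suggested_dscp = 46                      # e.g., Expedited Forwarding; value is illustrative
sock = open_tagged_udp_socket(suggested_dscp)
sock.sendto(b"rtp-frame-bytes", ("192.0.2.10", 5004))   # placeholder destination
```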

DataXc: Flexible and efficient communication in microservices-based stream analytics pipelines

A big challenge in converting a monolithic application into a performant microservices-based application is the design of efficient mechanisms for microservices to communicate with each other. Prior proposals range from custom point-to-point communication among microservices using protocols like gRPC, to service meshes like Linkerd, to flexible, many-to-many communication using broker-based messaging systems like NATS. We propose a new communication mechanism, DataXc, that is more efficient than prior proposals in terms of message latency, jitter, message processing rate, and use of network resources. To the best of our knowledge, DataXc is the first communication design that has the desirable flexibility of a broker-based messaging system like NATS and the high performance of a rigid, custom point-to-point communication method. DataXc proposes a novel "pull"-based communication method (i.e., consumers fetch messages from producers). This is unlike prior proposals like NATS, gRPC, or Linkerd, all of which are "push"-based (i.e., producers send messages to consumers). Push-based methods make it difficult to take advantage of the differential processing rates of consumers like video analytics tasks. In contrast, DataXc's "pull"-based design avoids unnecessary communication of messages that are eventually discarded by the consumers. Also, unlike prior proposals, DataXc successfully addresses several key challenges in streaming video analytics pipelines, such as non-uniform processing of frames from multiple cameras and high variance in the latency of frames processed by consumers, both of which adversely affect the quality of insights from streaming video analytics. We report results on two popular real-world streaming video analytics pipelines (video surveillance and video action recognition). Compared to NATS, DataXc is just as flexible, but it has far superior performance: up to 80% higher processing rate, 3× lower latency, 7.5× lower jitter, and 4.5× lower network bandwidth usage. Compared to gRPC or Linkerd, DataXc is highly flexible and achieves up to 2× higher processing rate, lower latency, and lower jitter, but it also consumes more network bandwidth.
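
A toy single-process sketch of the pull idea (not DataXc's implementation) is shown below: the producer overwrites a single latest-frame slot, and a slower consumer fetches a frame only when it is ready, so frames it cannot keep up with are never sent downstream in the first place.

```python
import threading
import time

class LatestFrameSlot:
    """Producer overwrites one slot; the consumer *pulls* only when ready, so
    stale frames are dropped at the source instead of being pushed and discarded."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def publish(self, frame):
        with self._lock:
            self._frame = frame                # older, unread frames are simply replaced

    def pull(self):
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

slot = LatestFrameSlot()

def producer():
    for i in range(100):
        slot.publish(f"frame-{i}")
        time.sleep(0.01)                       # camera at ~100 fps

def slow_consumer():
    processed = 0
    while processed < 10:
        frame = slot.pull()                    # fetch only when ready; skipped frames cost nothing
        if frame is not None:
            processed += 1
        time.sleep(0.05)                       # analytics at ~20 fps

threading.Thread(target=producer).start()
slow_consumer()
```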

RoVaR: Robust Multi-agent Tracking through Dual-layer Diversity in Visual and RF Sensor Fusion

The plethora of sensors in our commodity devices provides a rich substrate for sensor-fused tracking. Yet, today's solutions are unable to deliver robust, high tracking accuracy across multiple agents in practical, everyday environments, a feature central to the future of immersive and collaborative applications. This can be attributed to the limited scope of diversity leveraged by these fusion solutions, preventing them from catering simultaneously to the multiple dimensions of accuracy, robustness (diverse environmental conditions), and scalability (multiple agents). In this work, we take an important step towards this goal by introducing the notion of dual-layer diversity to the problem of sensor fusion in multi-agent tracking. We demonstrate that the fusion of complementary tracking modalities, passive/relative (e.g., visual odometry) and active/absolute (e.g., infrastructure-assisted RF localization), offers a key first layer of diversity that brings scalability, while the second layer of diversity lies in the methodology of fusion, where we bring together the complementary strengths of algorithmic (for robustness) and data-driven (for accuracy) approaches. RoVaR is an embodiment of such a dual-layer diversity approach that intelligently attends to cross-modal information using algorithmic and data-driven techniques that jointly share the burden of accurately tracking multiple agents in the wild. Extensive evaluations reveal RoVaR's multi-dimensional benefits in terms of tracking accuracy, scalability, and robustness to enable practical multi-agent immersive applications in everyday environments.
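
The snippet below is a deliberately simplistic illustration of the first layer of diversity only: fusing drift-prone relative visual-odometry deltas with noisy but absolute RF fixes via a fixed blending weight. RoVaR's algorithmic/data-driven, attention-based fusion is far richer, and all numbers here are made up.

```python
import numpy as np

def fuse(vo_deltas, rf_fixes, alpha=0.9):
    """Toy complementary fusion: integrate relative visual-odometry deltas for
    smooth local motion, and blend in absolute RF fixes to bound drift."""
    est = rf_fixes[0].copy()
    track = [est.copy()]
    for delta, rf in zip(vo_deltas, rf_fixes[1:]):
        predicted = est + delta                      # relative, drift-prone
        est = alpha * predicted + (1 - alpha) * rf   # absolute, noisy but drift-free
        track.append(est.copy())
    return np.array(track)

vo = np.array([[0.10, 0.0]] * 5)                     # agent moving along x
rf = np.array([[0.0, 0.0], [0.12, 0.01], [0.19, -0.02],
               [0.31, 0.0], [0.42, 0.01], [0.48, 0.0]])
print(fuse(vo, rf))
```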

Application-specific, Dynamic Reservation of 5G Compute and Network Resources by using Reinforcement Learning

5G services and applications explicitly reserve compute and network resources in today's complex and dynamic infrastructure of multi-tiered computing and cellular networking to ensure application-specific service quality metrics, and the infrastructure providers charge the 5G services for the resources reserved. A static, one-time reservation of resources at service deployment typically results in extended periods of under-utilization of reserved resources during the lifetime of the service operation. This is due to a plethora of reasons, such as changes in the content from the IoT sensors (for example, a change in the number of people in the field of view of a camera) or a change in the environmental conditions around the IoT sensors (for example, time of day, rain, or fog can affect data acquisition by sensors). Under-utilization of a specific resource like compute can also be due to temporary inadequate availability of another resource like network bandwidth in a dynamic 5G infrastructure. We propose a novel Reinforcement Learning-based online method to dynamically adjust an application's compute and network resource reservations to minimize under-utilization of requested resources, while ensuring acceptable service quality metrics. We observe that a complex application-specific coupling exists between the compute and network usage of an application. Our proposed method learns this coupling during the operation of the service and dynamically modulates the compute and network resource requests to minimize under-utilization of reserved resources. Through experimental evaluation using a real-world video analytics application, we show that our technique captures this complex compute-network coupling relationship in an online manner, i.e., while the application is running, and dynamically adapts to save up to 65% compute and 93% network resources on average (over multiple runs), without significantly impacting application accuracy.
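
Schematically, the online loop could resemble the tabular Q-learning sketch below, where the state encodes observed utilization and service quality and each action nudges the compute or network reservation up or down. The environment interface, state discretization, and reward shown here are illustrative stand-ins, not the paper's formulation.

```python
import random

# Each action is a (delta_compute, delta_network) reservation adjustment step.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

def control_loop(env, q_table, steps=1000, eps=0.1, lr=0.1, gamma=0.9):
    """env.observe() returns a discretized state, e.g. (compute_util, net_util, quality_ok);
    env.step(action) applies the reservation change and returns (next_state, reward),
    where the reward penalizes under-utilization and quality violations (assumed API)."""
    state = env.observe()
    for _ in range(steps):
        if random.random() < eps:                       # explore
            a = random.randrange(len(ACTIONS))
        else:                                           # exploit the learned values
            a = max(range(len(ACTIONS)), key=lambda i: q_table.get((state, i), 0.0))
        next_state, reward = env.step(ACTIONS[a])
        best_next = max(q_table.get((next_state, i), 0.0) for i in range(len(ACTIONS)))
        q_table[(state, a)] = (1 - lr) * q_table.get((state, a), 0.0) \
                              + lr * (reward + gamma * best_next)
        state = next_state
    return q_table
```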

Cosine Similarity based Few-Shot Video Classifier with Attention-based Aggregation

Meta-learning algorithms for few-shot video recognition use complex, episodic training, but they often fail to learn effective feature representations. In contrast, we propose a new and simpler few-shot video recognition method that does not use meta-learning, yet its performance compares well with the best meta-learning proposals. Our new few-shot video classification pipeline consists of two distinct phases. In the pre-training phase, we learn a good video feature extraction network that generates a feature vector for each video. After a sparse sampling strategy selects frames from the video, we generate a video feature vector from the sampled frames. Our proposed video feature extractor network, which consists of an image feature extraction network followed by a new Transformer encoder, is trained end-to-end by including a classifier head that uses a cosine similarity layer instead of the traditional linear layer to classify a corpus of labeled video examples. Unlike prior work in meta-learning, we do not use episodic training to learn the image feature vector. Also, unlike prior work that averages frame-level feature vectors into a single video feature vector, we combine individual frame-level feature vectors using a new Transformer encoder that explicitly captures the key temporal properties in the sequence of sampled frames. End-to-end training of the video feature extractor ensures that the proposed Transformer encoder captures important temporal properties in the video, while the cosine similarity layer explicitly reduces the intra-class variance of videos that belong to the same class. Next, in the few-shot adaptation phase, we use the learned video feature extractor to train a new video classifier using the few available examples from novel classes. Results on the SSV2-100 and Kinetics-100 benchmarks show that our proposed few-shot video classifier outperforms meta-learning-based methods and achieves the best state-of-the-art accuracy. We also show that our method can easily discern between actions and their inverse (for example, picking something up vs. putting something down), while prior art, which averages image feature vectors, is unable to do so.
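
A generic PyTorch sketch of such a cosine similarity classification head (not the paper's exact implementation) normalizes both the video feature and the per-class weight vectors, so the logits are scaled cosine similarities rather than unnormalized dot products:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Scores a video feature by cosine similarity to a learned per-class weight
    vector, instead of using an unnormalized linear layer."""
    def __init__(self, feat_dim: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale   # temperature; cosine logits lie in [-1, 1] and need scaling

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        f = F.normalize(features, dim=-1)
        w = F.normalize(self.weight, dim=-1)
        return self.scale * f @ w.t()              # (batch, num_classes) cosine logits

head = CosineClassifier(feat_dim=512, num_classes=100)   # dimensions are illustrative
logits = head(torch.randn(8, 512))                       # e.g., 8 video feature vectors
```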

Mosaic: Leveraging Diverse Reflector Geometries for Omnidirectional Around-Corner Automotive Radar

A large number of traffic collisions occur as a result of obstructed sight lines, such that even an advanced driver assistance system would be unable to prevent the crash. Recent work has proposed the use of around-the-corner radar systems to detect vehicles, pedestrians, and other road users in these occluded regions. Through comprehensive measurement, we show that these existing techniques cannot sense occluded moving objects in many important real-world scenarios. To solve this problem of limited coverage, we leverage multiple, curved reflectors to provide comprehensive coverage over the most important locations near an intersection. In scenarios where curved reflectors are insufficient, we evaluate the relative benefits of using additional flat planar surfaces. Using these techniques, we more than double the probability of detecting a vehicle near the intersection in three real urban locations and enable NLoS radar sensing using an entirely new class of reflectors.

Chimera: Context-Aware Splittable Deep Multitasking Models for Edge Intelligence

The design of multitasking deep learning models has mostly focused on improving the accuracy of the constituent tasks, but the challenges of efficiently deploying such models in a device-edge collaborative setup (common in 5G deployments) have not been investigated. Towards this end, in this paper, we propose an approach called Chimera for training (done offline) and deployment (done online) of multitasking deep learning models that are splittable across the device and the edge. In the offline phase, we train our multitasking setup such that features from a pre-trained model for one of the tasks (called the Primary task) are extracted, and task-specific sub-models are trained to generate the other (Secondary) tasks' outputs through a knowledge-distillation-like training strategy that mimics the outputs of pre-trained models for those tasks. The task-specific sub-models are designed to be significantly more lightweight than the original pre-trained models for the Secondary tasks. Once the sub-models are trained, during deployment, for a given deployment context characterized by the configurations, we search for the optimal (in terms of both model performance and cost) deployment strategy for the generated multitasking model by finding one or more suitable layers at which to split the model, so that inference workloads are distributed between the device and the edge server and inference is done collaboratively. Extensive experiments on benchmark computer vision tasks demonstrate that Chimera generates splittable multitasking models that are at least ~3× more parameter-efficient than existing such models, and end-to-end device-edge collaborative inference becomes ~1.35× faster with our context-aware splitting decisions.
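
As a rough illustration of the split-point search (not Chimera's actual cost model), one can estimate, for every candidate split, the latency of running the prefix on the device, shipping the intermediate activation over the link, and running the suffix on the edge. The per-layer timings, activation sizes, and bandwidth below are placeholders.

```python
def best_split(device_ms, edge_ms, sizes_kb, bandwidth_kb_per_s):
    """device_ms[i], edge_ms[i]: latency of layer i on device / edge;
    sizes_kb[i]: data crossing the link when splitting after layer i
    (sizes_kb[0] is the raw input). Returns (layers kept on device, latency in ms)."""
    n = len(device_ms)
    best = None
    for i in range(n + 1):                       # keep the first i layers on the device
        transfer = sizes_kb[i] / bandwidth_kb_per_s * 1000.0
        total = sum(device_ms[:i]) + transfer + sum(edge_ms[i:])
        if best is None or total < best[1]:
            best = (i, total)
    return best

# Illustrative numbers only: a 4-layer model over a ~4 Mbps (500 kB/s) uplink.
print(best_split([5, 20, 40, 25], [1, 3, 5, 3], [600, 60, 20, 200, 4], 500))
```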

Codebook Design for Hybrid Beamforming in 5G Systems

Massive MIMO and hybrid beamforming are among the key physical-layer technologies for next-generation wireless systems. In the last stage of hybrid beamforming, the goal is to generate a sharp beam with maximal and preferably uniform gain. We highlight the shortcomings of uniform linear arrays (ULAs) in generating such perfect beams, i.e., beams with maximal uniform gain and sharp edges, and propose a solution based on a novel antenna configuration, the twin-ULA (TULA). Building on TULA, we propose two antenna configurations: Delta and Star. We pose the problem of finding the beamforming coefficients as a continuous optimization problem, for which we find an analytical closed-form solution via a quantization/aggregation method. Thanks to the derived closed-form solution, the beamforming coefficients can be obtained easily and with low complexity. Through numerical analysis, we illustrate the effectiveness of the proposed antenna structure and beamforming algorithm in reaching close-to-perfect beams.
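
As background for why a plain ULA falls short, the numpy sketch below evaluates the textbook ULA array factor with uniform weights; the resulting pattern exhibits the ripple and slow roll-off that motivate the TULA design. This is standard array math, not the paper's TULA/Delta/Star construction.

```python
import numpy as np

def ula_array_factor(weights, d_over_lambda, theta):
    """Far-field array factor magnitude of an N-element ULA with element
    spacing d (expressed as d/lambda) and complex weights, over angles theta."""
    n = np.arange(len(weights))
    phase = 2 * np.pi * d_over_lambda * np.outer(np.cos(theta), n)
    return np.abs(np.exp(1j * phase) @ weights)

theta = np.linspace(0, np.pi, 721)           # angles in radians
w = np.ones(16) / 16                         # uniform weights, 16 elements
gain = ula_array_factor(w, 0.5, theta)       # half-wavelength spacing
print(theta[np.argmax(gain)], gain.max())    # broadside peak near pi/2
```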