We innovate, design and prototype high-performance intelligent distributed systems, applications and services on complex and large-scale communication networks like 5G and beyond. We develop next generation wireless technologies for sensing the world, localizing critical assets, and improving the capacity, coverage and scalability of communication networks like 5G and beyond.


Deep Video Codec Control

Deep Video Codec Control Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the emph(Unknown sysvar: (de facto)) standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec’s compression such that bandwidth constraints are satisfied and video distortion is minimized. While, both standard video codes and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.

Enabling Cooperative Hybrid Beamforming in TDD-based Distributed MIMO Systems

Enabling Cooperative Hybrid Beamforming in TDD-based Distributed MIMO Systems Distributed massive MIMO networks are envisioned to realize cooperative multi-point transmission in next-generation wireless systems. For efficient cooperative hybrid beamforming, the cluster of access points (APs) needs to obtain precise estimates of the uplink channel to perform reliable downlink precoding. However, due to the radio frequency (RF) impairments between the transceivers at the two en-points of the wireless channel, full channel reciprocity does not hold which results in performance degradation in the cooperative hybrid beamforming (CHBF) unless a suitable reciprocity calibration mechanism is in place. We propose a two-step approach to calibrate any two hybrid nodes in the distributed MIMO system. We then present and utilize the novel concept of reciprocal tandem to propose a low-complexity approach for jointly calibrating the cluster of APs and estimating the downlink channel. Finally, we validate our calibration technique’s effectiveness through numerical simulation.

Blind Cyclic Prefix-based CFO Estimation in MIMO-OFDM Systems

Blind Cyclic Prefix-based CFO Estimation in MIMO-OFDM Systems Low-complexity estimation and correction of carrier frequency offset (CFO) are essential in orthogonal frequency division multiplexing (OFDM). In this paper, we propose a low-overhead blind CFO estimation technique based on cyclic prefix (CP), in multi-input multi-output (MIMO)-OFDM systems. We propose to use antenna diversity for CFO estimation. Given that the RF chains for all antenna elements at a communication node share the same clock, the carrier frequency offset (CFO) between two points may be estimated by using the combination of the received signal at all antennas. We improve our method by combining the antenna diversity with time diversity by considering the CP for multiple OFDM symbols. We provide a closed-form expression for CFO estimation and present algorithms that can considerably improve the CFO estimation performance at the expense of a linear increase in computational complexity. We validate the effectiveness of our estimation scheme via extensive numerical analysis.

Semantic Multi-Resolution Communications

Semantic Multi-Resolution Communications Deep learning based joint source-channel coding (JSCC) has demonstrated significant advancements in data reconstruction compared to separate source-channel coding (SSCC). This superiority arises from the suboptimality of SSCC when dealing with finite block-length data. Moreover, SSCC falls short in reconstructing data in a multi-user and/or multi-resolution fashion, as it only tries to satisfy the worst channel and/or the highest quality data. To overcome these limitations, we propose a novel deep learning multi-resolution JSCC framework inspired by the concept of multi-task learning (MTL). This proposed framework excels at encoding data for different resolutions through hierarchical layers and effectively decodes it by leveraging both current and past layers of encoded data. Moreover, this framework holds great potential for semantic communication, where the objective extends beyond data reconstruction to preserving specific semantic attributes throughout the communication process. These semantic features could be crucial elements such as class labels, essential for classification tasks, or other key attributes that require preservation. Within this framework, each level of encoded data can be carefully designed to retain specific data semantics. As a result, the precision of a semantic classifier can be progressively enhanced across successive layers, emphasizing the preservation of targeted semantics throughout the encoding and decoding stages. We conduct experiments on MNIST and CIFAR10 dataset. The experiment with both datasets illustrates that our proposed method is capable of surpassing the SSCC method in reconstructing data with different resolutions, enabling the extraction of semantic features with heightened confidence in successive layers. This capability is particularly advantageous for prioritizing and preserving more crucial semantic features within the datasets.

Retrospective : A Dynamically Configurable Coprocessor For Convolutional Neural Networks

Retrospective : A dynamically configurable coprocessor for convolutional neural networks In 2008, parallel computing posed significant challenges due to the complexities of parallel programming and the bottlenecks associated with efficient parallel execution. Inspired by the remarkable scalability achieved by networking and storage systems in handling extensive packet traffic and persistent data respectively by leveraging best-effort service, we proposed a new and fundamentally different approach of best-effort computing.Having observed that a broad spectrum of existing and emerging computing workloads were from applications that had an inherent forgiving nature [2], [5], we proposed best effort computing. The new approach resulted in disproportionate gains in power, energy and latency, while improving performance. While contemplating the concept of best-effort computing [2], we noticed the resurgence of convolutional neural networks, which generated approximate but acceptable outcomes for numerous recognition, mining, and synthesis tasks. The lead author of this retrospective had previously conducted research on neural networks for his doctoral dissertation over a decade ago, and the reemergence of neural networks proved both surprising and exciting. Recognizing the connection between best-effort computing and convolutional neural networks, in 2008 we embarked on developing a programmable and dynamically reconfigurable convolutional neural network capable of performing best effort computing for various machine learning tasks that inherently allow for multiple acceptable answers. This combination of our thoughts on best-effort computing and the gradual evolution of convolutional neural networks (deep neural networks emerged much later) culminated in our 2010 ISCA work on dynamically reconfigurable convolutional neural networks.

AnB: Application-In-A-Box To Rapidly Deploy and Self-Optimize 5G Apps

AnB: Application-in-a-Box to rapidly deploy and self-optimize 5G apps We present Application in a Box (AnB) product concept aimed at simplifying the deployment and operation of remote 5G applications. AnB comes pre-configured with all necessary hardware and software components, including sensors like cameras, hardware and software components for a local 5G wireless network, and 5G-ready apps. Enterprises can easily download additional apps from an App Store. Setting up a 5G infrastructure and running applications on it is a significant challenge, but AnB is designed to make it fast, convenient, and easy, even for those without extensive knowledge of software, computers, wireless networks, or AI-based analytics. With AnB, customers only need to open the box, set up the sensors, turn on the 5G networking and edge computing devices, and start running their applications. Our system software automatically deploys and optimizes the pipeline of microservices in the application on a tiered computing infrastructure that includes device, edge, and cloud computing. Dynamic resource management, placement of critical tasks for low-latency response, and dynamic network bandwidth allocation for efficient 5G network usage are all automatically orchestrated. AnB offers cost savings, simplified setup and management, and increased reliability and security. We’ve implemented several real-world applications, such as collision prediction at busy traffic light intersections and remote construction site monitoring using video analytics. With AnB, deployment and optimization effort can be reduced from several months to just a few minutes. This is the first-of-its-kind approach to easing deployment effort and automating self-optimization of the application during system operation.

Elixir: A System To Enhance Data Quality For Multiple Analytics On A Video Stream

Elixir: A system to enhance data quality for multiple analytics on a video stream IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals including retail, health- care, safety and security, transportation, manufacturing, etc. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), off the video feed coming out of every camera. As AUs typically use deep learning-based AI/ML models, their performance depend on the quality of the input video, and recent work has shown that dynamically adjusting the camera setting exposed by popular network cameras can help improve the quality of the video feed and hence the AU accuracy, in a single AU setting. In this paper, we first show that in a multi-AU setting, changing the camera setting has disproportionate impact on different AUs performance. In particular, the optimal setting for one AU may severely degrade the performance for another AU, and further the impact on different AUs varies as the environmental condition changes. We then present Elixir, a system to enhance the video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives from different AUs and adjusts the camera setting to simultaneously enhance the performance of all AUs. To define the multiple objectives in MORL, we develop new AU-specific quality estimator values for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot) running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4975) and 158.6% (3507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).

FactionFormer: Context-Driven Collaborative Vision Transformer Models for Edge Intelligence

FactionFormer: Context-Driven Collaborative Vision Transformer Models for Edge Intelligence Edge Intelligence has received attention in the recent times for its potential towards improving responsiveness, reducing the cost of data transmission, enhancing security and privacy, and enabling autonomous decisions by edge devices. However, edge devices lack the power and compute resources necessary to execute most Al models. In this paper, we present FactionFormer, a novel method to deploy resource-intensive deep-learning models, such as vision transformers (ViT), on resource-constrained edge devices. Our method is based on a key observation: edge devices are often deployed in settings where they encounter only a subset of the classes that the resource intensive Al model is trained to classify, and this subset changes across deployments. Therefore, we automatically identify this subset as a faction, devise on-the fly a bespoke resource-efficient ViT called a modelette for the faction and set up an efficient processing pipeline consisting of a modelette on the device, a wireless network such as 5G, and the resource-intensive ViT model on an edge server, all of which work collaboratively to do the inference. For several ViT models pre-trained on benchmark datasets, FactionFormer’s modelettes are up to 4× smaller than the corresponding baseline models in terms of the number of parameters, and they can infer up to 2.5× faster than the baseline setup where every input is processed by the resource-intensive ViT on the edge server. Our work is the first of its kind to propose a device-edge collaborative inference framework where bespoke deep learning models for the device are automatically devised on-the-fly for most frequently encountered subset of classes.

StreetAware: A High-Resolution Synchronized Multimodal Urban Scene Dataset

StreetAware: A High-Resolution Synchronized Multimodal Urban Scene Dataset Access to high-quality data is an important barrier in the digital analysis of urban settings, including applications within computer vision and urban design. Diverse forms of data collected from sensors in areas of high activity in the urban environment, particularly at street intersections, are valuable resources for researchers interpreting the dynamics between vehicles, pedestrians, and the built environment. In this paper, we present a high-resolution audio, video, and LiDAR dataset of three urban intersections in Brooklyn, New York, totaling almost 8 unique hours. The data were collected with custom Reconfigurable Environmental Intelligence Platform (REIP) sensors that were designed with the ability to accurately synchronize multiple video and audio inputs. The resulting data are novel in that they are inclusively multimodal, multi-angular, high-resolution, and synchronized. We demonstrate four ways the data could be utilized — (1) to discover and locate occluded objects using multiple sensors and modalities, (2) to associate audio events with their respective visual representations using both video and audio modes, (3) to track the amount of each type of object in a scene over time, and (4) to measure pedestrian speed using multiple synchronized camera views. In addition to these use cases, our data are available for other researchers to carry out analyses related to applying machine learning to understanding the urban environment (in which existing datasets may be inadequate), such as pedestrian-vehicle interaction modeling and pedestrian attribute recognition. Such analyses can help inform decisions made in the context of urban sensing and smart cities, including accessibility-aware urban design and Vision Zero initiatives.

RIS-aided mmWave Beamforming for Two-way Communications of Multiple Pairs

RIS-aided mmWave Beamforming for Two-way Communications of Multiple Pairs Millimeter‑wave (mmWave) communications is a key enabler towards realizing enhanced Mobile Broadband (eMBB) as a key promise of 5G and beyond, due to the abundance of bandwidth available at mmWave bands. An mmWave coverage map consists of blind spots due to shadowing and fading especially in dense urban environments. Beamformingemploying massive MIMO is primarily used to address high attenuation in the mmWave channel. Due to their ability in manipulating the impinging electromagnetic waves in an energy‑efficient fashion, Reconfigurable Intelligent Surfaces (RISs) are considered a great match to complement the massive MIMO systems in realizing the beamforming task and therefore effectively filling in the mmWave coverage gap. In this paper, we propose a novel RIS architecture, namely RIS‑UPA where the RIS elements are arranged in a Uniform Planar Array (UPA). We show how RIS‑UPA can be used in an RIS‑aided MIMO system to fill the coverage gap in mmWave by forming beams of a custom footprint, with optimized main lobe gain, minimum leakage, and fairly sharp edges. Further, we propose a configuration for RIS‑UPA that can support multiple two‑way communication pairs, simultaneously. We theoretically obtain closed‑form low‑complexity solutions for our design and validate our theoretical findings by extensive numerical experiments.