Learning Phase Mask for Privacy-Preserving Passive Depth Estimation

With over a billion sold each year, cameras are not only becoming ubiquitous, but are driving progress in a wide range of domains such as mixed reality, robotics, and more. However, severe concerns regarding the privacy implications of camera-based solutions currently limit the range of environments where cameras can be deployed. The key question we address is: Can cameras be enhanced with a scalable solution to preserve users’ privacy without degrading their machine intelligence capabilities? Our solution is a novel end-to-end adversarial learning pipeline in which a phase mask placed at the aperture plane of a camera is jointly optimized with respect to privacy and utility objectives. We conduct an extensive design space analysis to determine operating points with desirable privacy-utility tradeoffs that are also amenable to sensor fabrication and real-world constraints. We demonstrate the first working prototype that enables passive depth estimation while inhibiting face identification.

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic and class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category that is required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks: open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a new state of the art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/VL-PLM.
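The pseudo-labeling step described above can be sketched roughly as follows. This is an illustrative sketch only, not the VL-PLM implementation: it assumes that region embeddings and category-name text embeddings have already been computed by some vision and language model, and the function name `pseudo_label`, the `temperature`, and the `score_threshold` are hypothetical choices made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(proposals, text_embeddings, temperature=0.01, score_threshold=0.8):
    """Assign a category to each class-agnostic proposal (hypothetical helper).

    proposals: list of (box, region_embedding) pairs.
    text_embeddings: dict mapping category name -> text embedding.
    Only proposals whose best class probability reaches the threshold
    are kept as pseudo labels.
    """
    names = list(text_embeddings)
    labels = []
    for box, emb in proposals:
        sims = [cosine(emb, text_embeddings[n]) / temperature for n in names]
        probs = softmax(sims)
        best = max(range(len(names)), key=probs.__getitem__)
        if probs[best] >= score_threshold:
            labels.append((box, names[best], probs[best]))
    return labels
```

The category list passed in `text_embeddings` can be any label space needed downstream, which is what makes the same procedure usable for both open-vocabulary and semi-supervised detection.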

Why is the video analytics accuracy fluctuating, and what can we do about it?

It is a common practice to think of a video as a sequence of images (frames), and re-use deep neural network models that are trained only on images for similar analytics tasks on videos. In this paper, we show that this “leap of faith” that deep learning models that work well on images will also work well on videos is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observed that the root cause of these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an “unintentional adversary” because these slight changes in the image pixel values in consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. Our experiments with a number of different cameras, and a variety of different video analytics tasks, show that the inadvertent adversarial effect from the camera can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (∼40% fewer mistakes in tracking). Our paper also provides new directions and techniques to mitigate the camera’s adversarial effect on deep learning models used for video analytics applications.

Field Trials of Vibration Detection, Localization and Classification over Deployed Telecom Fiber Cables

We review sensing fusion results from integrating fiber sensing with video for machine-learning-based detection, localization, and classification of impulsive acoustic events. Classification accuracy of >97% was achieved on aerial coils, and >99% using fiber-based signal enhancers.

Efficient Compression Method for Roadside LiDAR Data

Roadside LiDAR (Light Detection and Ranging) sensors have recently been explored for intelligent transportation systems that aim at safer and faster traffic management and vehicular operations. A key challenge in such systems is to efficiently transfer massive point-cloud data from the roadside LiDAR devices to the edge, connected through a 5G network, for real-time processing. In this paper, we consider the problem of compressing roadside (i.e., static) LiDAR data in real time, a condition that current methods leave unexplored. Existing point-cloud compression methods assume moving LiDARs (mounted on vehicles) and do not exploit spatial consistency across frames over time. To this end, we develop a novel grouped wavelet technique for static roadside LiDAR data compression, called SLiC. Our method compresses LiDAR data both spatially and temporally using a kd-tree data structure based on Haar wavelet coefficients. Experimental results show that SLiC can compress up to 1.9× more effectively than the state-of-the-art compression method. Moreover, SLiC is computationally more efficient and achieves a 2× improvement in bandwidth usage over the best alternative. Even with this gain in communication and storage efficiency, SLiC retains the accuracy of downstream applications.
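To illustrate why Haar wavelet coefficients suit a static sensor, here is a minimal sketch of a plain 1D Haar transform with coefficient thresholding. It is not the SLiC algorithm: the grouping and kd-tree structure are omitted, and the function names and the threshold are assumptions made for this example. The input is imagined as the range readings of one fixed beam over successive frames, which barely change when the sensor and scene are static.

```python
def haar_forward(signal):
    """Full Haar decomposition; len(signal) must be a power of two.

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avgs + diffs
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        avgs, diffs = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(avgs, diffs):
            merged += [a + d, a - d]
        out[:2 * n] = merged
        n *= 2
    return out

def compress(signal, threshold):
    """Zero out small coefficients; long runs of zeros are cheap to encode."""
    return [c if abs(c) >= threshold else 0.0 for c in haar_forward(signal)]
```

For a nearly constant signal, almost every detail coefficient falls below the threshold, so only the average survives, and the reconstruction error stays bounded by the sum of the discarded details along each path.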

COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning over actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic concepts in the video. We propose COMPOSER, a multiscale Transformer-based architecture that performs attention-based reasoning over tokens at each scale and learns group activity compositionally. In addition, prior works suffer from scene biases and raise privacy and ethical concerns. We use only the keypoint modality, which reduces scene biases and avoids acquiring detailed visual data that may contain private or biased information about users. We improve the multiscale representations in COMPOSER by clustering the intermediate scale representations, while maintaining consistent cluster assignments between scales. Finally, we use techniques such as auxiliary prediction and data augmentations tailored to the keypoint signals to aid model training. We demonstrate the model’s strength and interpretability on two widely used datasets (Volleyball and Collective Activity). COMPOSER achieves up to +5.4% improvement with just the keypoint modality. Code is available at https://github.com/hongluzhou/composer.

Unsupervised Anomaly Detection with Self-Training and Knowledge Distillation

Anomaly Detection (AD) aims to find defective patterns or abnormal samples among data, and has been a hot research topic due to its many real-world applications. While various AD methods have been proposed, most of them assume the availability of a clean (anomaly-free) training set, which may be hard to guarantee in many real-world industry applications. This motivates us to investigate Unsupervised Anomaly Detection (UAD), in which the training set includes both normal and abnormal samples. In this paper, we address the UAD problem by proposing a Self-Training and Knowledge Distillation (STKD) model. STKD combats anomalies in the training set by iteratively alternating between excluding samples with high anomaly probabilities and training the model on the purified training set. Although the model is then trained on a cleaner training set, the anomalies that inevitably remain may still have a negative impact. STKD alleviates this by regularizing the model to respond similarly to a teacher model that has not been trained with noisy data. Experiments show that STKD consistently produces more robust performance under different levels of anomalies.
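The alternation between purifying the training set and retraining can be sketched with a deliberately simple stand-in model. This is only an illustration of the self-training loop under stated assumptions: the "model" here is just the mean of scalar samples, the anomaly score is the distance to that mean, and `drop_fraction` and `rounds` are hypothetical parameters. The knowledge distillation regularizer is not shown.

```python
def fit(samples):
    """Toy 'model': the mean of the (scalar) training samples."""
    return sum(samples) / len(samples)

def anomaly_score(model, x):
    """Toy anomaly score: distance of a sample from the model's mean."""
    return abs(x - model)

def self_train(samples, drop_fraction=0.2, rounds=3):
    """Alternate between training and excluding high-scoring samples.

    Each round trains on the currently kept set, re-scores ALL samples
    (so a wrongly excluded sample can return), and keeps the lowest-scoring
    ones for the next round.
    """
    n_keep = max(1, len(samples) - int(len(samples) * drop_fraction))
    kept = list(samples)
    for _ in range(rounds):
        model = fit(kept)
        ranked = sorted(samples, key=lambda x: anomaly_score(model, x))
        kept = ranked[:n_keep]
    return fit(kept), kept
```

Re-scoring the full set every round, rather than only the kept samples, is a design choice of this sketch; it keeps an early bad model from permanently discarding normal data.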

Analyzing Coreference and Bridging in Product Reviews

Product reviews may have complex discourse including coreference and bridging relations to a main product, competing products, and interacting products. Current approaches to aspect-based sentiment analysis (ABSA) and opinion summarization largely ignore this complexity. On the other hand, existing systems for coreference and bridging were trained in a different domain. We collect mention type annotations relevant to coreference and bridging for 498 product reviews. Using these annotations, we show that a state-of-the-art factuality score fails to catch coreference errors in product reviews, and that a state-of-the-art coreference system trained on OntoNotes does not perform nearly as well on product mentions. As our dataset grows, we expect it to help ABSA and opinion summarization systems to avoid entity reference errors.

5GLoR: 5G LAN Orchestration for Enterprise IoT Applications

5G-LAN is an enterprise local area network (LAN) that leverages 5G technology for wireless connectivity instead of WiFi. 5G technology is unique: it uses network slicing to distinguish customers in the same traffic class using new QoS technologies in the RF domain. This unique ability is not supported by most enterprise LANs, which rely primarily on DiffServ-like technologies that distinguish among traffic classes rather than customers. We first show that this mismatch in QoS between the 5G network and the LAN affects the accuracy of insights from the LAN-resident analytics applications. We systematically analyze the root causes of the QoS mismatch and propose a first-of-a-kind 5G-LAN orchestrator (5GLoR). 5GLoR is a middleware that applications can use to preserve the QoS of their 5G data streams through the enterprise LAN. In most cases, the loss of QoS is not due to the oversubscription of LAN switches but primarily due to the inefficient assignment of 5G data to queues at ingress and egress ports. 5GLoR periodically analyzes the status of these queues, provides suitable DSCP identifiers to the application, and installs relevant switch re-write rules (to change DSCP identifiers between switches) to continuously preserve the QoS of the 5G data through the LAN. 5GLoR improves the RTP frame-level delay and inter-frame delay by 212% and 122%, respectively, for the WebRTC application. Additionally, with 5GLoR, the accuracy of two example applications (face detection and recognition) improved by 33%, while the latency was reduced by about 25%. Our experiments show that applications on a 5G-LAN with the proposed 5GLoR perform well (in accuracy and latency) compared to the same applications on MEC. This is significant because 5G-LAN offers an order of magnitude more computing, networking, and storage resources to the applications than the resource-constrained MEC, and mature enterprise technologies can be used to deploy, manage, and update IoT applications.
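The queue-aware DSCP assignment described above might look roughly like the sketch below. This is a hypothetical illustration, not the 5GLoR implementation: the function `plan_dscp`, the per-switch queue-occupancy and queue-to-DSCP tables, the "least-loaded queue" policy, and the rule format are all assumptions introduced for this example.

```python
def plan_dscp(path, queue_depth, queue_to_dscp):
    """Pick a DSCP per switch and derive re-write rules (hypothetical).

    path: ordered list of switch names the stream traverses.
    queue_depth: switch -> {queue_id: current occupancy}.
    queue_to_dscp: switch -> {queue_id: DSCP value mapped to that queue}.
    Returns (dscp_for_application, rewrite_rules).
    """
    chosen = []
    for switch in path:
        depths = queue_depth[switch]
        # Assumed policy: steer the stream to the least-loaded queue.
        best_queue = min(depths, key=depths.get)
        chosen.append(queue_to_dscp[switch][best_queue])

    rules = []
    for i in range(len(path) - 1):
        # A rule is needed only where the next switch wants a different DSCP.
        if chosen[i] != chosen[i + 1]:
            rules.append({
                "switch": path[i],
                "match_dscp": chosen[i],
                "set_dscp": chosen[i + 1],
            })
    return chosen[0], rules
```

Running such a planner periodically, with fresh queue statistics each time, would correspond to the continuous adjustment the abstract describes; how the real system collects statistics and installs rules is not specified here.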

Using AI To Safely Put The First Woman On The Moon

We are helping to safely bring the first woman astronaut to the moon as part of the National Aeronautics and Space Administration (NASA) Artemis Project with our System Invariant Analysis Technology (SIAT). With Lockheed Martin Space’s T-Tauri AI platform, our SIAT analytics engine takes the data from 150,000 sensors and creates a model incorporating over 22 billion data relationships. The AI model is then analyzed to find any irregularities that could lead to a malfunction of any of the spacecraft’s systems.