Deep Learning is a subfield of artificial intelligence (AI) and machine learning (ML) that focuses on the development and application of neural networks, which are computational models inspired by the structure and function of the human brain. Deep learning algorithms aim to learn and represent data in increasingly abstract and hierarchical ways, allowing them to automatically discover patterns, features, and representations from raw input data.


DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

Data augmentation techniques, such as image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter’s built-in assumption that each training image’s contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at

Unsupervised Anomaly Detection Under A Multiple Modeling Strategy Via Model Set Optimization Through Transfer Learning

Unsupervised anomaly detection approaches have been widely accepted in applications for industrial systems. Industrial systems often operate with multiple modes since they work for multiple purposes or under different conditions. In order to deal with the difficulty of anomaly detection due to multiple operating modes, multiple modeling strategies are employed. However, estimating the optimal set of models is a challenging problem due to the lack of supervision and computational burden. In this paper, we propose DeconAnomaly, a deep learning framework to estimate the optimal set of models using transfer learning for unsupervised anomaly detection under a multiple modeling strategy. It reduces computational burden with transfer learning and optimizes the number of models based on a surrogate metric of detection performance. The experimental results show clear advantages of DeconAnomaly.

DyCo: Dynamic, Contextualized AI Models

Devices with limited computing resources use smaller AI models to achieve low-latency inferencing. However, model accuracy is typically much lower than the accuracy of a bigger model that is trained and deployed in places where the computing resources are relatively abundant. We describe DyCo, a novel system that ensures privacy of stream data and dynamically improves the accuracy of small models used in devices. Unlike knowledge distillation or federated learning, DyCo treats AI models as black boxes. DyCo uses a semi-supervised approach to leverage existing training frameworks and network model architectures to periodically train contextualized, smaller models for resource-constrained devices. DyCo uses a bigger, highly accurate model in the edge-cloud to auto-label data received from each sensor stream. Training in the edge-cloud (as opposed to the public cloud) ensures data privacy, and bespoke models for thousands of live data streams can be designed in parallel by using multiple edge-clouds. DyCo uses the auto-labeled data to periodically re-train, stream-specific, bespoke small models. To reduce the periodic training costs, DyCo uses different policies that are based on stride, accuracy, and confidence information.We evaluate our system, and the contextualized models, by using two object detection models for vehicles and people, and two datasets (a public benchmark and another real-world proprietary dataset). Our results show that DyCo increases the mAP accuracy measure of small models by an average of 16.3% (and up to 20%) for the public benchmark and an average of 19.0% (and up to 64.9%) for the real-world dataset. DyCo also decreases the training costs for contextualized models by more than an order of magnitude.

Semi-supervised Identification and Mapping of Water Accumulation Extent using Street-level Monitoring Videos

Urban flooding is becoming a common and devastating hazard, which causes life loss and economic damage. Monitoring and understanding urban flooding in a highly localized scale is a challenging task due to the complicated urban landscape, intricate hydraulic process, and the lack of high-quality and resolution data. The emerging smart city technology such as monitoring cameras provides an unprecedented opportunity to address the data issue. However, estimating water ponding extents on land surfaces based on monitoring footage is unreliable using the traditional segmentation technique because the boundary of the water ponding, under the influence of varying weather, background, and illumination, is usually too fuzzy to identify, and the oblique angle and image distortion in the video monitoring data prevents georeferencing and object-based measurements. This paper presents a novel semi-supervised segmentation scheme for surface water extent recognition from the footage of an oblique monitoring camera. The semi-supervised segmentation algorithm was found suitable to determine the water boundary and the monoplotting method was successfully applied to georeference the pixels of the monitoring video for the virtual quantification of the local drainage process. The correlation and mechanism-based analysis demonstrate the value of the proposed method in advancing the understanding of local drainage hydraulics. The workflow and created methods in this study have a great potential to study other street level and earth surface processes.

Learning Phase Mask for Privacy-Preserving Passive Depth Estimation

With over a billion sold each year, cameras are not only becoming ubiquitous, but are driving progress in a wide range of domains such as mixed reality, robotics, and more. However, severe concerns regarding the privacy implications of camera-based solutions currently limit the range of environments where cameras can be deployed. The key question we address is: Can cameras be enhanced with a scalable solution to preserve users’ privacy without degrading their machine intelligence capabilities? Our solution is a novel end-to-end adversarial learning pipeline in which a phase mask placed at the aperture plane of a camera is jointly optimized with respect to privacy and utility objectives. We conduct an extensive design space analysis to determine operating points with desirable privacy-utility tradeoffs that are also amenable to sensor fabrication and real-world constraints. We demonstrate the first working prototype that enables passive depth estimation while inhibiting face identification.

Field Tests of Impulsive Acoustic Event Detection, Localization, and Classification Over Telecom Fiber Networks

We report distributed-fiber-optic-sensing results on impulsive acoustic events localization/classification over telecom networks. A deep-learning-based model was trained to classify starter-gun and fireworks signatures with high accuracy of > 99% using fiber-based-signal-enhancer and >97% using aerial coils.

Superclass-Conditional Gaussian Mixture Model for Coarse-To-Fine Few-Shot Learning

Learning fine-grained embeddings is essential for extending the generalizability of models pre-trained on “coarse” labels (e.g., animals). It is crucial to fields for which fine-grained labeling (e.g., breeds of animals) is expensive, but fine-grained prediction is desirable, such as medicine. The dilemma necessitates adaptation of a “coarsely” pre-trained model to new tasks with a few “finer-grained” training labels. However, coarsely supervised pre-training tends to suppress intra-class variation, which is vital for cross-granularity adaptation. In this paper, we develop a training framework underlain by a novel superclass-conditional Gaussian mixture model (SCGM). SCGM imitates the generative process of samples from hierarchies of classes through latent variable modeling of the fine-grained subclasses. The framework is agnostic to the encoders and only adds a few distribution related parameters, thus is efficient, and flexible to different domains. The model parameters are learned end-to-end by maximum-likelihood estimation via a principled Expectation-Maximization algorithm. Extensive experiments on benchmark datasets and a real-life medical dataset indicate the effectiveness of our method.

F3S: Free Flow Fever Screening

Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F 3 S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting. F 3 S performs real-time sensor fusion of visual camera with thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, and we dynamically associate semantically-equivalent regions across visual and thermal frames by using a new, dynamic alignment technique that analyzes content and context in real-time, (b) we track people through occlusions, identify the eye (inner canthus), forehead, face and head regions where possible, and provide an accurate temperature reading by using a prioritized refinement algorithm, and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment like masks, or sunglasses or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F 3 S has been deployed at over a dozen large commercial establishments, providing contact-less, free-flow, real-time fever screening for thousands of employees and customers in indoors and outdoor settings.

SIGL: Securing Software Installations Through Deep Graph Learning

Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity, building a data provenance graph that it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) for the encoder and a standard multilayer perceptron for the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most likely to have triggered malicious behavior, works on different audit platforms and operating systems, and is robust to training data contamination and adversarial attack. It can be used with application-specific models, even in the presence of new software versions, as well as application-agnostic meta-models that encompass a wide range of applications and installers.

Fusing the Old with the New: Learning Relative Pose with Geometry-Guided Uncertainty

Learning methods for relative camera pose estimation have been developed largely in isolation from classical geometric approaches. The question of how to integrate predictions from deep neural networks (DNNs) and solutions from geometric solvers, such as the 5-point algorithm [37], has as yet remained under-explored. In this paper, we present a novel framework that involves probabilistic fusion between the two families of predictions during network training, with a view to leveraging their complementary benefits in a learnable way. The fusion is achieved by learning the DNN un- certainty under explicit guidance by the geometric uncertainty, thereby learning to take into account the geometric solution in relation to the DNN prediction. Our network features a self-attention graph neural network, which drives the learning by enforcing strong interactions between different correspondences and potentially modeling complex relationships between points. We propose motion parmeterizations suitable for learning and show that our method achieves state-of-the-art performance on the challenging DeMoN [61] and ScanNet [8] datasets. While we focus on relative pose, we envision that our pipeline is broadly applicable for fusing classical geometry and deep learning.