Dual Projection Generative Adversarial Networks for Conditional Image Generation

Conditional Generative Adversarial Networks (cGANs) extend the standard unconditional GAN framework to learning joint data-label distributions from samples, and have been established as powerful generative models capable of generating high-fidelity imagery. A challenge of training such a model lies in properly infusing class information into its generator and discriminator. For the discriminator, class conditioning can be achieved by either (1) directly incorporating labels as input or (2) involving labels in an auxiliary classification loss. In this paper, we show that the former directly aligns the class-conditioned fake-and-real data distributions P(image|class) (data matching), while the latter aligns data-conditioned class distributions P(class|image) (label matching). Although class separability does not directly translate to sample quality and becomes a burden if classification itself is intrinsically difficult, the discriminator cannot provide useful guidance for the generator if features of distinct classes are mapped to the same point and thus become inseparable. Motivated by this intuition, we propose a Dual Projection GAN (P2GAN) model that learns to balance between data matching and label matching. We then propose an improved cGAN model with Auxiliary Classification that directly aligns the fake and real conditionals P(class|image) by minimizing their f-divergence. Experiments on a synthetic Mixture of Gaussian (MoG) dataset and a variety of real-world datasets including CIFAR100, ImageNet, and VGGFace2 demonstrate the efficacy of our proposed models.
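
The two conditioning routes above correspond to two discriminator heads. As a minimal PyTorch sketch (architecture and names are hypothetical, not the paper's code): a projection head scores class-conditioned realness (data matching), while an auxiliary classification head estimates P(class|image) (label matching); a P2GAN-style model would learn to weight the two.

    import torch
    import torch.nn as nn

    class DualHeadDiscriminator(nn.Module):
        """Illustrative discriminator with projection and auxiliary-classifier heads."""
        def __init__(self, feat_dim=128, num_classes=10):
            super().__init__()
            # Stand-in feature extractor; a real model would use conv blocks.
            self.features = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
            self.psi = nn.Linear(feat_dim, 1)                    # unconditional realness score
            self.embed = nn.Embedding(num_classes, feat_dim)     # projection head: data matching
            self.classifier = nn.Linear(feat_dim, num_classes)   # aux head: label matching

        def forward(self, x, y):
            h = self.features(x)
            proj = (self.embed(y) * h).sum(dim=1, keepdim=True)  # label-conditioned score
            logits = self.classifier(h)                          # estimates P(class|image)
            return self.psi(h) + proj, logits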

Learning Cross-Modal Contrastive Features for Video Domain Adaptation

Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, a technique originally derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, so it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces that originally lack connections across modalities or are poorly aligned across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
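
A concrete instance of the cross-modal objective, as a hedged PyTorch sketch (function and tensor names are illustrative, not the paper's exact loss): treat the RGB and flow embeddings of the same clip as a positive pair and all other clips in the batch as negatives.

    import torch
    import torch.nn.functional as F

    def cross_modal_infonce(z_rgb, z_flow, temperature=0.1):
        """Contrastive (InfoNCE-style) loss across modalities.

        z_rgb, z_flow: (N, D) clip embeddings from the two modalities; the
        positive for clip i in one modality is clip i in the other modality.
        """
        z_rgb = F.normalize(z_rgb, dim=1)
        z_flow = F.normalize(z_flow, dim=1)
        logits = z_rgb @ z_flow.t() / temperature            # (N, N) similarities
        targets = torch.arange(z_rgb.size(0), device=z_rgb.device)
        # Symmetrize over RGB->flow and flow->RGB directions.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))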

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep learning-based networks often require significant computation and may capture scene context using various modalities, which further increases compute costs. Efficient methods such as those used for AR/VR often rely only on human-keypoint information but suffer from a loss of scene context that hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object-based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate on the AVA action and Kinetics datasets that object keypoints can be modeled to recover the context lost by relying on keypoint information alone.
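
To make the representation concrete, here is a toy PyTorch sketch (layer sizes and names are hypothetical, not KeyNet's architecture): human and object keypoints are stacked into a per-frame tensor, encoded into interaction features, and summarized temporally, with no RGB input anywhere.

    import torch
    import torch.nn as nn

    class KeypointActionNet(nn.Module):
        """Toy action classifier over human + object keypoints only."""
        def __init__(self, num_keypoints=25, num_classes=60, hidden=256):
            super().__init__()
            self.frame_enc = nn.Sequential(                  # per-frame interaction features
                nn.Linear(num_keypoints * 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            self.temporal = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, kpts):                             # kpts: (B, T, K, 3) = (x, y, conf)
            B, T, K, C = kpts.shape
            h = self.frame_enc(kpts.reshape(B, T, K * C))
            _, last = self.temporal(h)                       # summarize the sequence
            return self.head(last[-1])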

Towards Robustness of Deep Neural Networks via Regularization

Recent studies have demonstrated the vulnerability of deep neural networks against adversarial examples. Inspired by the observation that adversarial examples often lie outside the natural image data manifold and that the intrinsic dimension of image data is much smaller than its pixel space dimension, we propose to embed high-dimensional input images into a low-dimensional space and apply regularization on the embedding space to push the adversarial examples back to the manifold. The proposed framework, called Embedding Regularized Classifier (ER-Classifier), improves the adversarial robustness of the classifier through embedding regularization. Besides improving classification accuracy against adversarial examples, the framework can be combined with detection methods to detect adversarial examples. Experimental results on several benchmark datasets show that our proposed framework achieves good performance against strong adversarial attack methods.
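
As a minimal sketch of the idea (a simple moment-matching penalty stands in for the paper's embedding regularizer; dimensions and names are illustrative): classify from a low-dimensional embedding and penalize embeddings that drift from a standard-normal prior.

    import torch
    import torch.nn as nn

    class ERClassifier(nn.Module):
        """Encoder + classifier; the embedding space is explicitly regularized."""
        def __init__(self, in_dim=3072, embed_dim=16, num_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256),
                                         nn.ReLU(), nn.Linear(256, embed_dim))
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, x):
            z = self.encoder(x)
            return self.classifier(z), z

    def prior_penalty(z):
        # Stand-in regularizer: push embedding moments toward N(0, I).
        return z.mean(dim=0).pow(2).sum() + (z.var(dim=0) - 1.0).pow(2).sum()

    # Training objective (sketch): cross_entropy(logits, y) + lam * prior_penalty(z)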

UAC: An Uncertainty-Aware Face Clustering Algorithm

We investigate ways to leverage uncertainty in face images to improve the quality of face clusters. We observe that popular clustering algorithms do not produce better quality clusters when clustering probabilistic face representations that implicitly model uncertainty: these algorithms predict up to 9.6X more clusters than the ground truth for the IJB-A benchmark. We empirically analyze the causes of this unexpected behavior and identify excessive false positives and false negatives (when comparing face-pairs) as the main reasons for poor quality clustering. Based on this insight, we propose an uncertainty-aware clustering algorithm, UAC, which explicitly leverages uncertainty information during clustering to decide when a pair of faces is similar or when a predicted cluster should be discarded. UAC (a) considers the uncertainty of the faces in a face-pair, (b) bins face-pairs into different categories based on an uncertainty threshold, (c) intelligently varies the similarity threshold during clustering to reduce false negatives and false positives, and (d) discards predicted clusters that exhibit a high measure of uncertainty. Extensive experimental results on several popular benchmarks and comparisons with state-of-the-art clustering methods show that UAC produces significantly better clusters by leveraging uncertainty in face images: the predicted number of clusters is at most 0.18X more than the ground truth for the IJB-A benchmark.
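
The pairwise decision rule can be sketched as follows (thresholds, binning, and parameter names are hypothetical placeholders, not UAC's exact procedure): uncertain pairs must clear a higher similarity bar, and clusters whose members are too uncertain on average are discarded.

    import numpy as np

    def pair_is_similar(sim, u_i, u_j, base_thresh=0.5, slope=0.3, u_bin=0.5):
        """Uncertainty-aware pairwise decision for two face embeddings."""
        pair_u = max(u_i, u_j)               # pair uncertainty from per-face scores
        # Bin the pair by uncertainty and raise the bar for uncertain pairs.
        thresh = base_thresh + (slope * pair_u if pair_u > u_bin else 0.0)
        return sim >= thresh

    def discard_uncertain_clusters(clusters, uncertainty, max_mean_u=0.8):
        """Drop predicted clusters whose mean member uncertainty is too high."""
        return [c for c in clusters
                if np.mean([uncertainty[i] for i in c]) <= max_mean_u]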

Employing Telecom Fiber Cables as Sensing Media for Road Traffic Applications

Distributed fiber optic sensing (DFOS) systems allow deployed fiber cables to serve as sensing media, not merely as a dedicated medium for data transmission. The fiber cable can monitor the ambient environment over a wide area for many applications. We review recent field trial results and show how artificial intelligence (AI) can help in the application of road traffic monitoring. The results show that fiber sensing can capture periodic traffic changes at hourly, daily, weekly, and seasonal time scales.

AppSlice: A system for application-centric design of 5G and edge computing applications

Applications that use edge computing and 5G to improve response times consume both compute and network resources. However, 5G networks manage only network resources without considering the application's compute requirements, and container orchestration frameworks manage only compute resources without considering the application's network requirements. We observe that there is a complex coupling between an application's compute and network usage, which can be leveraged to improve application performance and resource utilization. We propose a new, declarative abstraction called app slice that jointly considers the application's compute and network requirements. This abstraction leverages container management systems to manage edge computing resources and 5G network stacks to manage network resources, while a new runtime system explicitly manages the coupling between compute and network usage and delivers the declarative semantics of the app slice. The runtime system also jointly manages the edge compute and network resource usage automatically across different edge computing environments and 5G networks by using two adaptive algorithms. We implement a complex, real-world, real-time monitoring application using the proposed app slice abstraction, and demonstrate on a private 5G/LTE testbed that the proposed runtime system significantly improves application performance and resource utilization compared with the case where the coupling between compute and network resource usage is ignored.
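
As an illustration of the abstraction (field names and the admission check are hypothetical, not the system's actual syntax), an app slice declares each microservice's compute and network needs side by side so a runtime can reason about them jointly:

    # Hypothetical app-slice spec; field names are illustrative only.
    app_slice = {
        "name": "video-monitoring",
        "microservices": [
            {"name": "camera-ingest",
             "compute": {"cpu_cores": 0.5, "memory_mb": 256},    # container request
             "network": {"uplink_mbps": 20, "latency_ms": 30}},  # 5G QoS request
            {"name": "detector",
             "compute": {"cpu_cores": 2.0, "memory_mb": 2048, "gpus": 1},
             "network": {"downlink_mbps": 50, "latency_ms": 30}},
        ],
    }

    def fits(spec, cpu_free, mem_free, bw_free_mbps):
        """Toy joint admission check; the real runtime also adapts usage online."""
        cpu = sum(m["compute"]["cpu_cores"] for m in spec["microservices"])
        mem = sum(m["compute"]["memory_mb"] for m in spec["microservices"])
        bw = sum(m["network"].get("uplink_mbps", 0)
                 + m["network"].get("downlink_mbps", 0)
                 for m in spec["microservices"])
        return cpu <= cpu_free and mem <= mem_free and bw <= bw_free_mbps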

A Silicon Photonic-Electronic Neural Network for Fiber Nonlinearity Compensation

In optical communication systems, fibre nonlinearity is the major obstacle to increasing the transmission capacity. Typically, digital signal processing techniques and hardware are used to deal with optical communication signals, but increasing speed and computational complexity create challenges for such approaches. Highly parallel, ultrafast neural networks using photonic devices have the potential to ease the requirements placed on digital signal processing circuits by processing the optical signals in the analogue domain. Here we report a silicon photonic–electronic neural network for fibre nonlinearity compensation in submarine optical-fibre transmission systems. Our approach uses a photonic neural network based on wavelength-division multiplexing, built on a silicon photonic platform compatible with complementary metal–oxide–semiconductor technology. We show that the platform can be used to compensate for optical fibre nonlinearities and improve the quality factor of the signal in a 10,080 km submarine fibre communication system. The Q-factor improvement is comparable to that of a software-based neural network implemented on a workstation with a 32-bit graphics processing unit.
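
For intuition, the software baseline such a photonic network is compared against can be sketched as a small PyTorch model (window size and layer widths are hypothetical): predict a nonlinear correction for the center symbol from a sliding window of received I/Q samples.

    import torch
    import torch.nn as nn

    class NonlinearityCompensator(nn.Module):
        """Toy NN mapping a window of received I/Q samples to a correction for
        the center symbol; a photonic NN computes a similar map in analogue."""
        def __init__(self, window=9, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * window, hidden), nn.Tanh(),   # 2 = (I, Q) per symbol
                nn.Linear(hidden, 2))                       # (I, Q) correction

        def forward(self, iq_window):                       # (B, 2 * window)
            return self.net(iq_window)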

DataX: A system for Data eXchange and transformation of streams

The exponential growth in smart sensors and rapid progress in 5G networks are creating a world awash with data streams. However, a key barrier to building performant multi-sensor, distributed stream processing applications is high programming complexity. We propose DataX, a novel platform that improves programmer productivity by enabling easy exchange, transformation, and fusion of data streams. The DataX abstraction simplifies the application's specification and exposes parallelism and dependencies among the application's functions (microservices). The DataX runtime automatically sets up appropriate data communication mechanisms, enables effortless reuse of microservices and data streams across applications, and leverages serverless computing to transform, fuse, and auto-scale microservices. DataX makes it easy to write, deploy, and reliably operate distributed applications at scale; no available stream processing system synthesizes these capabilities into a single platform.
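
The programming model can be illustrated with a hypothetical declaration (the decorator, Stream type, and names are invented for illustration and are not DataX's actual API): a microservice declares only its input and output streams and its transformation logic, leaving wiring, reuse, and scaling to the runtime.

    from dataclasses import dataclass

    @dataclass
    class Stream:
        name: str                      # logical stream, resolvable across applications

    def microservice(inputs, outputs):
        """Register a function as a stream transformer; a runtime would set up
        communication and auto-scaling behind this declaration."""
        def wrap(fn):
            fn.inputs, fn.outputs = inputs, outputs
            return fn
        return wrap

    @microservice(inputs=[Stream("camera.frames")], outputs=[Stream("people.count")])
    def count_people(frame):
        # Pure transformation logic; no sockets, brokers, or scaling code here.
        return {"count": len(frame.get("detections", []))}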

Guided Acoustic Brillouin Scattering Measurements in Optical Communication Fibers

Guided acoustic-wave Brillouin scattering (GAWBS) noise is measured using a novel homodyne measurement technique for four commonly used fibers in long-distance optical transmission systems. The measurements are made with single spans and then shown to be consistent with separate multi-span long-distance measurements. The inverse dependence of the GAWBS noise on the fiber effective area is confirmed by comparing different fibers with effective areas varying between 80 µm² and 150 µm². The line-broadening effect of the coating is observed, and the correlation between the width of the GAWBS peaks and the acoustic mode profile is confirmed. An extensive model of the GAWBS noise in long-distance fibers is presented, including corrections to some commonly repeated mistakes in previous reports. It is established through the model, and verified with the measurements, that the depolarized scattering caused by TR2m modes contributes twice as much to the optical noise in the polarization orthogonal to the original source as it does to the noise in the parallel polarization. Using this relationship, the polarized and depolarized contributions to the measured GAWBS noise are separated for the first time. As a result, a direct comparison between the theory and the measured GAWBS noise spectrum is shown for the first time, with excellent agreement. It is confirmed that the total GAWBS noise can be calculated from fiber parameters under certain assumptions. It is predicted that the level of depolarized GAWBS noise created by the fiber may depend on the polarization diffusion length, and consequently, possible ways to reduce GAWBS noise are proposed. Using the developed theory, the dependence of GAWBS noise on the location of the core is calculated to show that multi-core fibers would have a similar level of GAWBS noise no matter where their cores are positioned.
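
The quoted 2:1 relation is what makes the separation possible. A sketch of the algebra, under the additional assumption (standard, but not stated above) that polarized scattering contributes only to the parallel polarization:

    % N_par, N_perp: measured GAWBS noise parallel/orthogonal to the source;
    % N_pol, N_dep: polarized and depolarized contributions in the parallel polarization.
    \begin{align}
      N_{\perp} &= 2\,N_{\mathrm{dep}}, &
      N_{\parallel} &= N_{\mathrm{pol}} + N_{\mathrm{dep}}, \\
      \Rightarrow\ N_{\mathrm{dep}} &= \tfrac{1}{2}\,N_{\perp}, &
      N_{\mathrm{pol}} &= N_{\parallel} - \tfrac{1}{2}\,N_{\perp}.
    \end{align}

Measuring the noise in both polarizations thus yields both contributions directly, which is what enables the first direct theory-versus-measurement comparison claimed above.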