Constellation Design with Geometric and Probabilistic Shaping

A systematic study, including theory, simulation and experiments, is carried out to review the generalized pairwise optimization algorithm for designing optimized constellation. In order to verify its effectiveness, the algorithm is applied in three testing cases: 2-dimensional 8 quadrature amplitude modulation (QAM), 4-dimensional set-partitioning QAM, and probabilistic-shaped (PS) 32QAM. The results suggest that geometric shaping can work together with PS to further bridge the gap toward the Shannon limit.

Joint Pixel and Feature-level Domain Adaptation in the Wild

Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning in a domain adversarial neural network, we propose a novel regularization in the form of domain adversarial entropy minimization. Next, we posit that insights from computer vision are more amenable to injection at the pixel level and specifically address the key challenge of adaptation across different semantic levels. In particular, we use 3D geometry and image synthetization based on a generalized appearance flow to preserve identity across higher-level pose transformations, while using an attribute-conditioned CycleGAN to translate a single source into multiple target images that differ in lower-level properties such as lighting. We validate on a novel problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level and implicit, unspecified factors through feature-level adaptation. Extensive experiments achieve state-of-the-art results, demonstrating the effectiveness of complementing feature and pixel-level information via our proposed domain adaptation method.

Video Generation From Text

Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). The static features, called “gist,” are used to sketch text-conditioned background color and object layout structure. Dynamic features are considered by transforming input text into an image filter. To obtain a large amount of data for training the deep-learning model, we develop a method to automatically create a matched text-video corpus from publicly available online videos. Experimental results show that the proposed framework generates plausible and diverse short-duration smooth videos, while accurately reflecting the input text information. It significantly outperforms baseline models that directly adapt text-to-image generation procedures to produce videos. Performance is evaluated both visually and by adapting the inception score used to evaluate image generation in GANs.

Adaptive Feature Abstraction for Translating Video to Text

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose a new approach to generating adaptive spatiotemporal representations of videos for the captioning task. A novel attention mechanism is developed, which adaptively and sequentially focuses on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.

Learning random-walk label propagation for weakly-supervised semantic segmentation

Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data for this task relative to other vision tasks. We propose a novel training approach to address this difficulty. Given cheaply-obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings. A standard CNN-based segmentation network is trained to mimic these labelings. The label-propagation process is defined via random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss. We show that by learning the label-propagator jointly with the segmentation predictor, we are able to effectively learn semantic edges given no direct edge supervision. Experiments also show that training a segmentation network in this way outperforms the naive approach.

Adaptive Memory Networks

Adaptive Memory Networks We present Adaptive Memory Networks (AMN) that processes input-question pairs to dynamically construct a network architecture optimized for lower inference times for Question Answering (QA) tasks. AMN processes the input story to extract entities and stores them in memory banks. Starting from a single bank, as the number of input entities increases, AMN learns to create new banks as the entropy in a single bank becomes too high. Hence, after processing an input-question(s) pair, the resulting network represents a hierarchical structure where entities are stored in different banks, distanced by question relevance. At inference, one or few banks are used, creating a tradeoff between accuracy and performance. AMN is enabled by dynamic networks that allow input dependent network creation and efficiency in dynamic mini-batching as well as our novel bank controller that allows learning discrete decision making with high accuracy. In our results, we demonstrate that AMN learns to create variable depth networks depending on task complexity and reduces inference times for QA tasks.

SkyLiTE: End-to-End Design of Low-altitutde UAV Networks for Providing LTE Connectivity

Un-manned aerial vehicle (UAVs) have the potential to change the landscape of wide-area wireless connectivity by bringing them to areas where connectivity was sparing or non-existent (e.g. rural areas) or has been compromised due to disasters. While Google’s Project Loon and Facebook’s Project Aquila are examples of high-altitude, long-endurance UAV-based connectivity efforts in this direction, the telecom operators (e.g. AT&T and Verizon) have been exploring low-altitude UAV-based LTE solutions for on-demand deployments. Understandably, these projects are in their early stages and face formidable challenges in their realization and deployment. The goal of this document is to expose the reader to both the challenges as well as the potential offered by these unconventional connectivity solutions. We aim to explore the end-to-end design of such UAV-based connectivity networks particularly in the context of low-altitude UAV networks providing LTE connectivity. Specifically, we aim to highlight the challenges that span across multiple layers (access, core network, and backhaul) in an inter-twined manner as well as the richness and complexity of the design space itself. To help interested readers navigate this complex design space towards a solution, we also articulate the overview of one such end-to-end design, namely SkyLiTE– a self-organizing network of low-altitude UAVs that provide optimized LTE connectivity in a desired region.

Design and Comparison of Advanced Modulation Formats Based on Generalized Mutual Information

Generalized mutual information (GMI) has been comprehensively studied in multidimensional constellation and probabilistic-shaped (PS) constellation together with different forward error correction (FEC) coding schemes. The simulation results confirm that GMI is an efficient and accurate tool to compare their post-FEC performance. In particular for uniformly geometric-shaped constellation, the pre-FEC Q-factor is highly correlated with GMI though the correlation is reduced at lower FEC coding rate. Furthermore, GMI can be used to design optimized constellation together with generalized pairwise optimization algorithm to mitigate the GMI loss in non-Gray-mapped constellation. The GMI-optimized 32QAM (opt32) shows ~0.5 dB signal-to-noise ratio improvement between 3 and 4 b/s GMI in both simulated and experimental results. Optimized two-dimensional 8 QAM is also designed to show the consistent GMI improvement over multi-dimensional 8 QAM-equivalent formats. In simulations, PS-64 QAM outperforms opt32 when a long sequence block is used in the distribution matcher.

A 4D Light-Field Dataset & CNN Architectures for Material Recognition

We introduce a new light-field dataset of materials and take advantage of the recent success of deep learning to perform material recognition on the 4D light field. Our dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total. To the best of our knowledge, this is the first mid-size dataset for light-field images.

A Continuous Occlusion Model for Road Scene Understanding

We present a physically interpretable 3D model for handling occlusions with applications to road scene understanding. Given object detection and SFM point tracks, our unified model probabilistically assigns point tracks to objects and reasons about object detection scores and bounding boxes. It uniformly handles static and dynamic objects, thus outperforming motion segmentation for association problems. It also demonstrates occlusion-aware 3D localization in road scenes.