Posts

Joint Pixel and Feature-level Domain Adaptation in the Wild

Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning in a domain adversarial neural network, we propose a novel regularization in the form of domain adversarial entropy minimization. Next, we posit that insights from computer vision are more amenable to injection at the pixel level and specifically address the key challenge of adaptation across different semantic levels. In particular, we use 3D geometry and image synthetization based on a generalized appearance flow to preserve identity across higher-level pose transformations, while using an attribute-conditioned CycleGAN to translate a single source into multiple target images that differ in lower-level properties such as lighting. We validate on a novel problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level and implicit, unspecified factors through feature-level adaptation. Extensive experiments achieve state-of-the-art results, demonstrating the effectiveness of complementing feature and pixel-level information via our proposed domain adaptation method.

Learning random-walk label propagation for weakly-supervised semantic segmentation

Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data for this task relative to other vision tasks. We propose a novel training approach to address this difficulty. Given cheaply-obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings. A standard CNN-based segmentation network is trained to mimic these labelings. The label-propagation process is defined via random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss. We show that by learning the label-propagator jointly with the segmentation predictor, we are able to effectively learn semantic edges given no direct edge supervision. Experiments also show that training a segmentation network in this way outperforms the naive approach.

WarpNet: Weakly Supervised Matching for Single-View Reconstruction

Our WarpNet matches images of objects in fine-grained datasets without using part annotations. It aligns an object in one image with a different object in another by exploiting a fine-grained dataset to create artificial data for training a Siamese network with an unsupervised discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. This allows single-view reconstruction with quality comparable to using human annotation.