MEDIA ANALYTICS
PUBLICATIONS
CVPR 2021 | How do you combine solutions from classical geometric methods with deep learning methods in a principled way? We present a framework that learns relative camera pose estimation along with its probabilistic fusion using estimation from geometric methods.
CVPR 2021 | Our work addresses two key challenges in trajectory prediction: (i) learning multimodal outputs and (ii) improving predictions by imposing constraints using driving knowledge. Recent methods have achieved strong performance using multi-choice learning objectives like winner-takes-all (WTA), but these depend heavily on their initialization to produce diverse outputs. Our first contribution proposes a novel divide-and-conquer (DAC) approach.
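To make the winner-takes-all idea above concrete, here is a minimal sketch of a WTA objective over K trajectory hypotheses: only the hypothesis closest to the ground truth receives the error signal. The function name and array shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def wta_loss(hypotheses, target):
    """Winner-takes-all loss: of K predicted hypotheses, only the one
    closest to the target contributes to the loss (and thus the gradient).

    hypotheses: (K, D) array of predicted trajectories (flattened)
    target:     (D,)  ground-truth trajectory
    """
    errors = np.sum((hypotheses - target) ** 2, axis=1)  # per-hypothesis L2 error
    winner = int(np.argmin(errors))                      # index of the best hypothesis
    return errors[winner], winner
```

Because all gradient flows to the winner, a poorly initialized hypothesis may never be selected, which is the initialization sensitivity the abstract refers to.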
BMVC 2020 | We tackle unsupervised domain adaptation where the discrepancy between the labeled source and unlabeled target domains is large, owing to many factors of inter- and intra-domain variation. We propose decomposing the domain discrepancy into multiple smaller discrepancies by introducing unlabeled bridging domains that connect the source and target domains, making each discrepancy easier to minimize.
ECCV 2020 | We address the problem of domain adaptation in videos for the task of human action recognition. Inspired by image-based domain adaptation, we propose to (i) learn to align important (discriminative) clips to achieve an improved representation for the target domain and (ii) employ a self-supervised task that encourages the model to focus on actions rather than scene context, yielding representations that are more robust to domain shifts.
ECCV 2020 | We propose a simple but effective multi-source domain-generalization technique based on deep neural networks that incorporates optimized normalization layers that are specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain.
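A minimal sketch of the idea of domain-specific normalization layers described above: features are normalized with a mix of batch-style and instance-style statistics, and each domain keeps its own affine parameters. The function, shapes, and fixed mixing weight are illustrative assumptions, not the paper's exact formulation (which learns the mixing).

```python
import numpy as np

def domain_specific_norm(x, domain, gamma, beta, w=0.5, eps=1e-5):
    """Mix batch- and instance-style normalization, then apply the
    affine transform belonging to the batch's domain.

    x:            (N, C, L) feature maps (L = flattened spatial dims)
    domain:       index of the domain this batch comes from
    gamma, beta:  dicts mapping domain -> (C, 1) affine parameters
    w:            mixing weight between batch and instance statistics
    """
    # Batch-style statistics: shared over samples and spatial positions.
    bn = (x - x.mean(axis=(0, 2), keepdims=True)) / np.sqrt(
        x.var(axis=(0, 2), keepdims=True) + eps)
    # Instance-style statistics: per sample, per channel.
    inst = (x - x.mean(axis=2, keepdims=True)) / np.sqrt(
        x.var(axis=2, keepdims=True) + eps)
    mixed = w * bn + (1.0 - w) * inst
    return gamma[domain] * mixed + beta[domain]
```

Keeping the affine parameters per domain lets the shared backbone learn domain-invariant structure while each domain retains its own feature statistics.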
ECCV 2020 | We propose advances that address two key challenges in future trajectory prediction: (i) multi-modality in both training data and predictions and (ii) constant-time inference regardless of the number of agents. Existing trajectory predictors are fundamentally limited by the lack of diversity in training data, which is difficult to acquire with sufficient coverage of possible modes.
ECCV 2020 | Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant—application-relevant categories can be picked and merged from arbitrary existing datasets.
CVPR 2020 | Traditional recognition models trained on high-quality data require target-domain data to adapt to unconstrained, low-quality face recognition; a model ensemble is further needed to obtain a universal representation, which significantly increases model complexity.
CVPR 2020 | We address the problem of inferring the layout of complex road scenes from video sequences. To this end, we formulate it as a top-view road attributes prediction problem, and our goal is to predict these attributes for each frame both accurately and consistently.
With increasing applications of semantic segmentation, numerous datasets have been proposed in the past few years. Yet labeling remains expensive, thus, it is desirable to jointly train models across aggregations of datasets to enhance data volume and diversity. However, label spaces differ across datasets and may even be in conflict with one another.
WACV 2020 | We propose an active learning approach for transferring representations across domains. Active adversarial domain adaptation (AADA) explores a duality between two related problems: (i) adversarial domain alignment and (ii) importance sampling for adapting models across domains. The former uses a domain-discriminative model to align domains, while the latter utilizes it to weigh samples to account for distribution shifts.
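The duality described above can be sketched as a sampling score for active annotation: the domain discriminator supplies an importance weight (how target-like a sample looks), which is combined with the task model's predictive entropy. The function name and exact combination are illustrative assumptions in the spirit of AADA, not its verbatim criterion.

```python
import numpy as np

def active_adaptation_score(p_source, class_probs, eps=1e-12):
    """Rank an unlabeled target sample for annotation.

    p_source:    discriminator's probability that the sample comes from
                 the source domain (low -> sample looks 'target-like')
    class_probs: task model's predicted class distribution for the sample
    """
    # Importance weight: a density-ratio estimate from the discriminator.
    weight = (1.0 - p_source) / (p_source + eps)
    # Predictive entropy: uncertainty of the task model.
    entropy = -np.sum(class_probs * np.log(class_probs + eps))
    return weight * entropy
```

Samples that are both target-like (high weight) and uncertain (high entropy) are ranked first, so labeling effort goes where the domain shift hurts most.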
WACV 2020 | Blind video deblurring is a challenging task because the blur due to camera shake, object movement and defocusing is heterogeneous in both temporal and spatial dimensions. Traditional methods train on datasets synthesized with a single level of blur, and thus do not generalize well across levels of blurriness.
AAAI 2020 | Our aim is to learn privacy-preserving and task-oriented representations that defend against model inversion attacks. To achieve this, we propose an adversarial reconstruction-based framework for learning latent representations that cannot be decoded to recover the original input images.
IROS 2019 | We address the problem of 3D object detection from 2D monocular images in autonomous driving scenarios. We lift the 2D images to 3D representations using learned neural networks and leverage existing networks working directly on 3D data to perform 3D object detection and localization.
ICCV 2019 | Traditional intrinsic image decomposition focuses on decomposing images into reflectance and shading, leaving surface normals and lighting entangled in shading. In this work, we propose a global-local spherical harmonics (GLoSH) lighting model to improve the lighting component and jointly predict reflectance and surface normals.
CVPR 2019 | Training with under-represented data leads to biased classifiers in conventionally trained deep networks. We propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
CVPR 2019 | We provide a solution that allows knowledge transfer from fully annotated source images to unlabeled target ones, which are often captured under different conditions. We adapt at multiple semantic levels, from feature to pixel, with complementary insights for each type. Using the proposed method, we achieve better recognition accuracy on car images in an unlabeled surveillance domain by adapting knowledge from car images on the web.
ICML 2019 | We introduce Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces. In contrast to previous models, ours runs without the aid of spectral clustering. This enables our algorithm to gracefully scale to large datasets.
ICLR 2019 | We propose Feature Transfer Network, a novel deep neural network for image-based face verification and identification that can adapt to biases such as ethnicity, gender or age in a target set. Unlike existing methods, our network can even handle novel identities present in the target domain.
ICLR 2019 | Simulation can be a useful tool when obtaining and annotating training data is costly. However, optimal tuning of simulator parameters can itself be a laborious task. We implement a meta-learning algorithm in which a reinforcement learning agent, as the meta-learner, automatically adjusts the parameters of a non-differentiable simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data.
ACCV 2018 | We exploit existing annotations in source images and transfer such visual information to segment videos with unseen object categories. Without using any annotations in the target video, we propose a method to jointly mine useful segments and learn feature representations that better adapt to the target frames.
ACCV 2018 | We introduce a method that simultaneously learns an embedding space along with subspaces within it to minimize a notion of reconstruction error. This addresses the problem of subspace clustering in an end-to-end learning paradigm. To achieve our goal, we propose a scheme to update subspaces within a deep neural network.
ECCV 2018 | We propose a method to forecast a vehicle’s ego-motion as a distribution over spatiotemporal paths, conditioned on features embedded in an overhead map. The method learns a policy and induces a distribution over simulated trajectories that is both “diverse” (produces most paths likely under the data) and “precise” (mostly produces paths likely under the data).
ECCV 2018 | We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD.
ECCV 2018 | We propose a convolutional neural network that learns to predict occluded portions of a scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top view.
CVPR 2018 | We develop a semantic segmentation method for adapting source ground-truth labels to an unseen target domain. To this end, we treat semantic segmentation as structured prediction with spatial similarities between the source and target domains, and adopt multi-level adversarial learning in the output space.
CVPR 2018 | We propose a fast and accurate video object segmentation algorithm that can immediately start the segmentation process after receiving images. We first utilize a part-based tracking method to deal with challenging factors such as large deformation, occlusion and cluttered background. We next construct an efficient region-of-interest segmentation network to generate part masks, with a similarity-based scoring function to refine these object parts and generate final segmentation outputs.
NeurIPS 2017 | Deep object detectors require prohibitive runtimes to process an image for real-time applications. Model compression can learn compact models with fewer parameters, but accuracy is significantly degraded. In this work, we propose a new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation and hint learning.
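The combination of knowledge distillation and hint learning mentioned above can be sketched as a two-term objective: the student matches the teacher's softened class distribution (cross-entropy at a temperature) plus an L2 "hint" loss on intermediate features. Function names, the temperature, and the weighting are illustrative assumptions, not the paper's exact losses.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled, numerically stable softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                 temperature=2.0, hint_weight=0.5):
    """Soft-target distillation plus a feature-level hint loss."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Cross-entropy against the teacher's softened distribution.
    soft_ce = -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean()
    # Hint learning: regress the teacher's intermediate features.
    hint = np.mean((student_feat - teacher_feat) ** 2)
    return soft_ce + hint_weight * hint
```

The loss is minimized when the student reproduces both the teacher's output distribution and its intermediate features, which is what lets a compact detector recover accuracy lost to compression.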
ICCV 2017 | Despite rapid advances in face recognition, there remains a clear gap between the performance of still-image-based face recognition and video-based face recognition. To address this, we propose an image-to-video feature-level domain adaptation method to learn discriminative video-frame representations.
ICCV 2017 | We propose an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos. The proposed SegFlow has two branches where useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework. The unified framework can be trained iteratively offline to learn a generic notion, or it can be fine-tuned online for specific objects.
ICCV 2017 | Generic data-driven deep face features might confound images of the same identity under large poses with other identities. We propose a feature reconstruction metric learning to disentangle identity and pose information in the latent feature space. The disentangled feature space encourages identity features of the same subject to be clustered together in spite of pose variation.
ICCV 2017 | Despite recent advances in deep face recognition, severe accuracy drops are observed under large pose variations. Learning pose-invariant features is feasible but needs expensively labeled data. In this work, we focus on frontalizing faces in the wild under various head poses.
ICCV 2017 | We present a scene-parsing method that utilizes global context information based on both parametric and non-parametric models. Compared to previous methods which only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts.
CVPR 2017 | We demonstrate that it is possible to learn features for network-flow-based data association via backpropagation by expressing the optimum of a smoothed network flow problem as a differentiable function of the pairwise association costs. We apply this approach to multi-object tracking with a network-flow formulation.
CVPR 2017 | Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data. Given cheaply obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings using random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss.
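The random-walk propagation above can be sketched with the standard absorbing-Markov-chain computation: treat labeled pixels as absorbing states and solve a linear system for the probability that a walk from each unlabeled pixel first hits each class. The graph construction and function names here are illustrative assumptions; the paper works on image pixels with learned transition weights.

```python
import numpy as np

def hitting_probabilities(P, labeled, labels, n_classes):
    """Probability that a random walk from each unlabeled node first
    hits a labeled node of each class.

    P:       (N, N) row-stochastic transition matrix over graph nodes
    labeled: indices of labeled (absorbing) nodes
    labels:  class index of each labeled node
    """
    N = P.shape[0]
    labeled_set = set(labeled)
    unlabeled = [i for i in range(N) if i not in labeled_set]
    Q = P[np.ix_(unlabeled, unlabeled)]        # walks among unlabeled nodes
    R = P[np.ix_(unlabeled, labeled)]          # steps into labeled nodes
    B = np.zeros((len(labeled), n_classes))
    B[np.arange(len(labeled)), labels] = 1.0   # one-hot absorbing labels
    # Fundamental-matrix solve: H = (I - Q)^{-1} R B
    H = np.linalg.solve(np.eye(len(unlabeled)) - Q, R @ B)
    return unlabeled, H
```

Each row of `H` is a soft class distribution for an unlabeled node, which is the kind of "guessed dense labeling" with built-in uncertainty that the loss can consume.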
CVPR 2017 | We introduce a deep stochastic IOC RNN encoder-decoder framework, DESIRE, for the task of future prediction of multiple interacting agents in dynamic scenes. It produces accurate future predictions by tackling multi-modality of futures while accounting for a rich set of both static and dynamic scene contexts.
NeurIPS 2016 | We tackle the problem of unsatisfactory convergence of training a deep neural network for metric learning by proposing multi-class N-pair loss. Unlike many other objective functions that ignore the information lying in the interconnections between the samples, N-pair loss utilizes full interaction of the examples from different classes within a batch.
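A minimal sketch of the multi-class N-pair loss described above: with one (anchor, positive) pair per class in a batch, each anchor is pulled toward its own positive and pushed away from the positives of all other classes at once, which reduces to a softmax cross-entropy on the similarity matrix. Shapes and the function name are illustrative.

```python
import numpy as np

def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss over N (anchor, positive) pairs,
    one pair per class.

    anchors, positives: (N, D) embedding matrices, row i from class i.
    """
    logits = anchors @ positives.T  # (N, N) pairwise similarities
    # log(1 + sum_{j != i} exp(f_i.f+_j - f_i.f+_i)) per anchor equals
    # cross-entropy of the softmax rows against the diagonal targets.
    logits = logits - logits.max(axis=1, keepdims=True)       # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because every anchor interacts with all N-1 negative classes in the batch, the loss uses far more of the batch's pairwise structure than a plain triplet loss does.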
ECCV 2016 | We investigate the novel problem of generating images from visual attributes. We model the image as a composite of foreground and background, and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder.
ECCV 2016 | We propose a cascaded framework for localizing landmarks in non-rigid objects. The first stage initializes the shape as constrained to lie within a low-rank manifold, and the second stage estimates local deformations parameterized as thin-plate spline transformations. Since our framework does not incorporate either handcrafted features or part connectivity, it is easy to train and test and generally applicable to various object types.
ECCV 2016 | We introduce a new light-field dataset of materials and take advantage of the recent success of deep learning to perform material recognition on the 4D light field. Our dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total.
CVPR 2016 | We derive a spatially-varying (SV) BRDF-invariant theory for recovering 3D shape and reflectance from light-field cameras. Our key theoretical insight is a novel analysis of diffuse plus single-lobe SVBRDFs under a light-field setup. We show that although direct shape recovery is not possible, an equation relating depths and normals can still be derived.
CVPR 2016 | Our WarpNet matches images of objects in fine-grained datasets without using part annotations. It aligns an object in one image with a different object in another by exploiting a fine-grained dataset to create artificial data for training a Siamese network with an unsupervised discriminative learning approach.
CVPR 2016 | We model the multi-level relevance among fine-grained classes for fine-grained categorization. We jointly optimize classification and similarity constraints in a proposed multi-task learning framework, and we embed label structures such as hierarchy or shared attributes into the framework by generalizing the triplet loss.
CVPR 2016 | We propose an iterative framework for fine-grained categorization and dataset bootstrapping. Using deep metric learning with humans in the loop, we learn a low-dimensional feature embedding with anchor points on manifolds for each category. In each round, images with high confidence scores are sent to humans for labeling, and the model is retrained based on the updated dataset.
CVPR 2016 | We exploit the rich relationships among fine-grained classes for fine-grained image classification. We model the relations using the proposed bipartite-graph labels (BGL) and incorporate them into CNN training. Our system is computationally efficient in inference thanks to the bipartite structure.
CVPR 2016 | We present a physically interpretable 3D model for handling occlusions with applications to road scene understanding. Given object detection and SFM point tracks, our unified model probabilistically assigns point tracks to objects and reasons about object detection scores and bounding boxes.
WACV 2016 | We propose a novel framework for monocular traffic scene recognition, relying on a decomposition into high-order and atomic scenes to meet those challenges. High-order scenes carry semantic meaning useful for ADAS applications, while atomic scenes are easy to learn and represent elemental behaviors based on 3D localization of individual traffic participants.