Atomic Scenes for Scalable Traffic Scene Recognition in Monocular Videos

We propose a novel framework for monocular traffic scene recognition, relying on a decomposition into high-order and atomic scenes to meet those challenges. High-order scenes carry semantic meaning useful for AWS applications, while atomic scenes are easy to learn and represent elemental behaviors based on 3D localization of individual traffic participants. We propose a novel hierarchical model that captures co-occurrence and mutual-exclusion relationships while incorporating both low-level trajectory features and high-level scene features, with parameters learned using a structured support vector machine. We propose efficient inference that exploits the structure of our model to obtain real-time rates.

Attribute2Image: Conditional Image Generation From Visual Attributes

This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. Therefore, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.