Overview: In multi-task learning, tasks commonly compete for resources when model capacity is limited. We develop neural architectures that allow control over the relative importance of tasks and the total compute cost at inference time.
Overview: Our foundational models enable ubiquitous usage of computer vision across scenarios, applications and user preferences.
Overview: Our simulation framework uses advances in neural rendering, diffusion models and large language models to automatically transform drive data into a full 3D sensor simulation testbed with unmatched photorealism.
Overview: We develop open vocabulary perception methods that combine the power of vision and language to provide rich descriptions of objects in scenes, including their attributes, behaviors, relations and interactions.
Overview: Our techniques from unsupervised and semi-supervised learning, such as domain adaptation and domain generalization, enable robust and responsible AI solutions across applications such as image classification, face recognition, facial anti-spoofing, object detection and semantic segmentation.