MEDIA ANALYTICS
Agentic LLMs for AI Orchestration
We develop agentic LLMs that solve complex workflows by orchestrating a combination of computer vision, logic and compute modules. Given a natural language task specification, our LLM generates a plan to accomplish the task using available tools. The plan is represented as a Python program synthesized to invoke the available tools, which can be any module that can be called programmatically.
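For illustration, a minimal sketch of this plan-as-program pattern is shown below; the tool registry, the synthesize_plan stub and the example tools are hypothetical placeholders, not our production interface.

# Minimal sketch of plan-as-program orchestration; all names are hypothetical.
# A registry maps tool names to callables that the synthesized plan may invoke.
TOOL_REGISTRY = {
    "detect_objects": lambda image_path: ["car", "pedestrian"],   # stand-in for a vision module
    "count": lambda items: len(items),                            # stand-in for a compute module
}

def synthesize_plan(task: str) -> str:
    """Stand-in for the LLM call that turns a natural language task into a Python program."""
    # A real system would prompt an LLM with the task and the available tool signatures.
    return (
        "objects = detect_objects(image_path)\n"
        "result = count(objects)\n"
    )

def execute_plan(task: str, **inputs):
    plan = synthesize_plan(task)
    scope = dict(TOOL_REGISTRY, **inputs)
    exec(plan, scope)          # the plan only sees registered tools and the task inputs
    return scope.get("result")

print(execute_plan("How many objects are in the image?", image_path="frame.png"))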
Autonomous Driving
While autonomous cars are rapidly becoming a reality, it remains a challenge to scalably deploy them across geographies and conditions. Our full-stack autonomy solutions include perception, prediction, planning, simulation and devops, leveraging the latest advances in generative AI, neural rendering, large language models, diffusion models and transformers.
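As a rough sketch of how such a modular stack composes per sensor frame (all interfaces and values below are illustrative placeholders, not our deployed system):

# Hypothetical sketch of per-frame composition of the autonomy stack.
from dataclasses import dataclass, field

@dataclass
class SceneState:
    objects: list = field(default_factory=list)       # perception output
    trajectories: list = field(default_factory=list)  # prediction output
    plan: list = field(default_factory=list)          # planned ego waypoints

def perceive(sensor_frame, state: SceneState) -> SceneState:
    state.objects = [{"id": 0, "class": "vehicle", "xyz": (12.0, 3.5, 0.0)}]  # stand-in detector
    return state

def predict(state: SceneState, horizon_s: float = 6.0) -> SceneState:
    state.trajectories = [[(12.0 + t, 3.5, 0.0) for t in range(int(horizon_s))]
                          for _ in state.objects]      # stand-in constant-velocity rollout
    return state

def plan(state: SceneState) -> SceneState:
    state.plan = [(t * 2.0, 0.0, 0.0) for t in range(6)]  # stand-in ego plan
    return state

state = plan(predict(perceive(sensor_frame=None, state=SceneState())))
print(len(state.objects), len(state.trajectories), len(state.plan))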
Foundational Vision-Language Models
Our foundational models enable ubiquitous usage of computer vision across scenarios, applications and user preferences. By combining very large-scale computer vision and natural language datasets with innovations in visual instruction following, our foundational models yield deeper domain-specific insights, at lower data center costs, and with fewer hallucinations.
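A minimal sketch of visual instruction following against such a model might look as follows; the FoundationalVLM class and its query interface are hypothetical stand-ins, not a released API.

# Hypothetical sketch of visual instruction following; the interface is a placeholder.
from dataclasses import dataclass

@dataclass
class VLMResponse:
    text: str
    confidence: float

class FoundationalVLM:
    """Stand-in for a vision-language model tuned with visual instruction following."""
    def query(self, image_path: str, instruction: str) -> VLMResponse:
        # A real model would encode the image, condition on the instruction,
        # and decode a grounded answer; here we return a canned response.
        return VLMResponse(text="Two forklifts are active near the loading bay.", confidence=0.87)

vlm = FoundationalVLM()
answer = vlm.query("warehouse.jpg", "Describe any safety-relevant activity in this scene.")
print(answer.text, answer.confidence)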
Neural Rendering and Diffusion for Simulation
Our simulation framework utilizes advances in neural rendering, diffusion models and large language models to automatically transform drive data into a full 3D sensor simulation testbed with unmatched photorealism. We offer language-based control to generate safety-critical scenarios such as collisions, traffic rule violations and other unsafe behaviors, to improve the perception and planning abilities of autonomous vehicles.
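As an illustrative sketch of language-based scenario control (the generate_scenario interface and the Scenario fields below are hypothetical placeholders, not our simulation API):

# Hypothetical sketch of language-conditioned scenario generation.
from dataclasses import dataclass

@dataclass
class Scenario:
    description: str
    actors: list
    events: list

def generate_scenario(drive_log: str, prompt: str) -> Scenario:
    """Stand-in for a pipeline that reconstructs a drive log with neural rendering and
    edits it according to LLM-parsed constraints before re-rendering sensor data."""
    # A real system would parse the prompt into scene edits (e.g., an inserted cut-in vehicle)
    # and synthesize the corresponding camera and lidar streams.
    return Scenario(
        description=prompt,
        actors=["ego", "cut_in_vehicle"],
        events=[{"t": 3.2, "type": "cut_in", "gap_m": 5.0}],
    )

scn = generate_scenario("log_0421.bin", "Insert an aggressive cut-in 5 meters ahead of the ego vehicle.")
print(scn.actors, scn.events)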
Open Vocabulary Perception
Perception methods such as object detection and image segmentation are basic building blocks of most computer vision applications. We develop open vocabulary perception methods that combine the power of vision and language to provide rich descriptions of objects in scenes, including their attributes, behaviors, relations and interactions.
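A toy sketch of the underlying idea, matching free-form text queries against region features in a shared embedding space, is shown below; the encoders here are trivial stand-ins for learned models.

# Hypothetical sketch of open-vocabulary matching between text queries and image regions.
import math

def embed_text(query: str) -> list:
    # Stand-in text encoder: hash characters into a tiny vector.
    return [sum(ord(c) for c in query) % 97 / 97.0, len(query) / 32.0]

def embed_region(region: dict) -> list:
    return region["feature"]  # stand-in region feature from a detector backbone

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

regions = [{"box": (10, 20, 80, 120), "feature": [0.4, 0.9]},
           {"box": (200, 40, 260, 90), "feature": [0.9, 0.1]}]
queries = ["a pedestrian carrying an umbrella", "a parked delivery van"]

for q in queries:
    scores = [cosine(embed_text(q), embed_region(r)) for r in regions]
    best = max(range(len(regions)), key=lambda i: scores[i])
    print(q, "->", regions[best]["box"], round(scores[best], 3))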
Prediction and Planning
We are pioneers in the development of generative models that predict long-horizon future trajectories of dynamic objects, with probabilistic outcomes that account for the diverse future actions consistent with the same past. Our methods, including DESIRE, SMART and DAC, achieve diversity, scene consistency, constant-time inference and multimodality that adheres to lane geometries and driving rules.
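For intuition, a toy sketch of multimodal prediction, sampling several probabilistic futures conditioned on the same past, is given below; the constant-velocity generator is a placeholder, not DESIRE, SMART or DAC.

# Hypothetical sketch of sampling K trajectory modes for one agent.
import random

def predict_futures(past_xy, k=3, horizon=8, dt=0.5):
    """Return k (probability, trajectory) samples conditioned on the same past."""
    vx = (past_xy[-1][0] - past_xy[0][0]) / (dt * (len(past_xy) - 1))
    vy = (past_xy[-1][1] - past_xy[0][1]) / (dt * (len(past_xy) - 1))
    modes = []
    for _ in range(k):
        turn = random.uniform(-0.2, 0.2)           # latent intent, e.g., lane keep vs. drift
        x, y, traj = past_xy[-1][0], past_xy[-1][1], []
        for t in range(horizon):
            x += vx * dt
            y += (vy + turn * t) * dt
            traj.append((round(x, 2), round(y, 2)))
        modes.append((1.0 / k, traj))
    return modes

past = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
for prob, traj in predict_futures(past):
    print(prob, traj[:3], "...")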
Multimodal LLMs for AI DevOps
Safety-critical applications must account for all scenarios, including rare situations that pose high risks despite being under-observed in everyday operation. Applications like autonomous driving require extensive data collection, data curation, model training and verification, which are prohibitively expensive and pose barriers to new entrants in the space.
3D Perception
We have pioneered the development of learned bird's-eye view representations for road scenes, which form a basis for image-based 3D perception in applications like autonomous driving. Our techniques for 3D localization of objects achieve high accuracy for object position, orientation and part locations with just a monocular camera, using novel geometric and learned priors.
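A small sketch of one such geometric prior, back-projecting the bottom of a 2D detection onto a flat ground plane using known camera intrinsics and mounting height, is shown below; the intrinsics and pixel values are illustrative only.

# Hypothetical sketch of a ground-plane prior for monocular 3D localization.
import numpy as np

K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])          # example pinhole intrinsics
camera_height = 1.5                       # meters above a flat ground plane

def box_bottom_to_ground(u: float, v: float) -> np.ndarray:
    """Intersect the ray through pixel (u, v) with the ground plane Y = camera_height."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction in camera coordinates
    scale = camera_height / ray[1]                   # ray[1] > 0 for pixels below the horizon
    return ray * scale                               # 3D point (X, Y, Z) in the camera frame

# Bottom-center of a detected vehicle box, in pixels.
print(box_bottom_to_ground(u=1100.0, v=620.0))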
Robustness and Fairness
Modern applications of computer vision demand robustness across scenarios as well as social acceptability. For example, object detection must work across daytime and low-light conditions, and face recognition should produce accurate outputs across ethnicities. To address these challenges, we develop universal representation learning methods that go beyond the limitations of expensive, high-quality labeled data to utilize large-scale and diverse unlabeled data.
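As a simplified sketch of mixing labeled and unlabeled data during training (the model, augmentations and consistency weighting below are illustrative, not our published method):

# Hypothetical sketch of a semi-supervised step: supervised loss on labeled data plus a
# consistency loss that lets large unlabeled collections shape the representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def weak_aug(x):   # stand-in augmentations
    return x + 0.01 * torch.randn_like(x)

def strong_aug(x):
    return x + 0.10 * torch.randn_like(x)

labeled_x, labeled_y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
unlabeled_x = torch.randn(32, 3, 32, 32)

supervised = F.cross_entropy(model(weak_aug(labeled_x)), labeled_y)

# Consistency: predictions on a weakly augmented view supervise a strongly augmented view.
with torch.no_grad():
    targets = F.softmax(model(weak_aug(unlabeled_x)), dim=1)
consistency = F.mse_loss(F.softmax(model(strong_aug(unlabeled_x)), dim=1), targets)

loss = supervised + 1.0 * consistency
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(supervised), float(consistency))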
Robust and Unbiased Face Recognition
Our face recognition methods achieve high accuracy on competitive public benchmarks through the use of universal representation learning techniques that leverage very large-scale datasets, with robustness to variations such as occlusions, blur, lighting or accessories. We develop methods in long-tail recognition that account for the low sample diversity of most identities in face recognition datasets.
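One common ingredient for long-tailed identity distributions, class-balanced sampling, is sketched below purely for illustration; the toy dataset and sampler are placeholders rather than our training pipeline.

# Hypothetical sketch of class-balanced sampling over a long-tailed identity distribution.
import random
from collections import defaultdict

# Toy long-tailed dataset: identity -> list of sample indices.
samples_by_identity = defaultdict(list)
long_tail = {"id_0": 500, "id_1": 120, "id_2": 8, "id_3": 3, "id_4": 1}
idx = 0
for identity, count in long_tail.items():
    for _ in range(count):
        samples_by_identity[identity].append(idx)
        idx += 1

def balanced_batch(batch_size=16):
    """Sample identities uniformly, then one image per chosen identity, so rare
    identities are seen as often as frequent ones."""
    identities = random.choices(list(samples_by_identity), k=batch_size)
    return [random.choice(samples_by_identity[i]) for i in identities]

print(balanced_batch())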
Privacy-Aware and Federated Learning
Privacy impacts every stakeholder in the AI solution ecosystem, including consumers, operators, solution providers and regulators. This is especially true for applications such as healthcare, safety and finance which require collecting and analyzing highly sensitive data. We develop AI solutions to assure customers that private information is not leaked at any stage of the data lifecycle.
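A minimal sketch of a federated averaging (FedAvg-style) round, in which only model parameters leave each site while raw data stays local, is shown below; the local update rule and toy data are placeholders.

# Hypothetical FedAvg-style sketch: clients train locally, the server aggregates parameters.
import numpy as np

def local_update(weights: np.ndarray, local_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Stand-in local training: one gradient-like step toward the local data mean."""
    return weights - lr * (weights - local_data.mean(axis=0))

def federated_round(global_weights: np.ndarray, client_datasets: list) -> np.ndarray:
    updates, sizes = [], []
    for data in client_datasets:
        updates.append(local_update(global_weights.copy(), data))
        sizes.append(len(data))
    # Weighted average of client models; only parameters are shared with the server.
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, size=(50, 4)) for c in (0.0, 1.0, 2.0)]
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, clients)
print(np.round(w, 3))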
Privacy-Aware Cameras
Besides privacy-aware learning, we also develop methods for privacy-aware sensing. In particular, we develop novel computational cameras that allow computer vision analysis even in sensitive environments like hospitals or smart homes. Our key innovation is a camera that removes private information at the point of capture. Our adversarial training approach achieves high accuracy and high privacy simultaneously through learned phase masks inserted in the focal plane of the camera.
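For intuition, the adversarial objective can be sketched as below; the element-wise mask stands in for the learned optics (it is not a physical phase-mask simulation), and both networks are toy placeholders.

# Hypothetical sketch of adversarial training for privacy-aware capture.
import torch
import torch.nn as nn
import torch.nn.functional as F

mask = nn.Parameter(torch.ones(1, 1, 16, 16))                   # stand-in for learned optics
task_net = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 2))   # e.g., fall detection
adversary = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 2))  # e.g., identity recovery

opt_cam = torch.optim.Adam([mask] + list(task_net.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.rand(32, 1, 16, 16)
y_task = torch.randint(0, 2, (32,))
y_private = torch.randint(0, 2, (32,))

for _ in range(100):
    encoded = x * mask                                  # simulated privacy-preserving capture
    # Camera and task network: keep the task accurate while confusing the adversary.
    loss_cam = F.cross_entropy(task_net(encoded), y_task) \
               - F.cross_entropy(adversary(encoded), y_private)
    opt_cam.zero_grad()
    loss_cam.backward()
    opt_cam.step()
    # Adversary: try to recover the private attribute from the encoded image.
    loss_adv = F.cross_entropy(adversary((x * mask).detach()), y_private)
    opt_adv.zero_grad()
    loss_adv.backward()
    opt_adv.step()
print(float(loss_cam), float(loss_adv))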
Dynamic Multi-Task Architectures
Multi-task learning commonly encounters competition for resources among tasks when model capacity is limited. We develop neural architectures that allow control over the relative importance of tasks and the total compute cost at inference time. Our controllable multi-task networks dynamically adjust architecture and weights to match desired task preferences as well as resource constraints.
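A simplified sketch of preference- and budget-conditioned execution is given below; the gating rule and per-head costs are illustrative placeholders rather than our architecture.

# Hypothetical sketch of running only the task heads that fit a compute budget.
import torch
import torch.nn as nn

class ControllableMultiTaskNet(nn.Module):
    def __init__(self, tasks=("segmentation", "depth", "normals")):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(64, 16) for t in tasks})
        self.head_cost = {t: 1.0 for t in tasks}          # stand-in per-head compute cost

    def forward(self, x, preferences: dict, budget: float):
        """Run heads in order of preference, skipping those that exceed the budget."""
        features = self.backbone(x)
        outputs, spent = {}, 0.0
        for task in sorted(preferences, key=preferences.get, reverse=True):
            if spent + self.head_cost[task] > budget:
                continue                                   # drop low-priority heads when over budget
            outputs[task] = self.heads[task](features)
            spent += self.head_cost[task]
        return outputs

net = ControllableMultiTaskNet()
out = net(torch.randn(2, 64), {"segmentation": 0.6, "depth": 0.3, "normals": 0.1}, budget=2.0)
print(sorted(out))   # only the two highest-preference tasks run under this budget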
Embodied AI
We develop embodied agents for robotics applications that require exploration, navigation and transport in complex scenes. Our modular hierarchical transport policy builds a topological graph of the scene to perform exploration, then combines motion planning algorithms, which reach point goals within explored locations, with object navigation policies, which move towards semantic targets at unknown locations.
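A toy sketch of this hierarchical dispatch is shown below; the topological graph, the breadth-first planner and the object-navigation fallback are placeholders for the learned components.

# Hypothetical sketch of hierarchical dispatch between point-goal planning and object navigation.
from collections import deque

topo_graph = {                      # nodes discovered during exploration, with observed objects
    "hall": {"edges": ["kitchen", "office"], "objects": ["plant"]},
    "kitchen": {"edges": ["hall"], "objects": ["mug"]},
    "office": {"edges": ["hall"], "objects": []},
}

def point_goal_path(start: str, goal: str):
    """Breadth-first search over the topological graph (stand-in for motion planning)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in topo_graph[path[-1]]["edges"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def transport(target_object: str, start: str = "hall"):
    for node, info in topo_graph.items():
        if target_object in info["objects"]:
            return ("point_goal", point_goal_path(start, node))   # known location: plan to it
    return ("object_nav", target_object)                          # unknown: semantic search policy

print(transport("mug"))      # ('point_goal', ['hall', 'kitchen'])
print(transport("stapler"))  # ('object_nav', 'stapler')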