Overview: We develop an agentic LLM that solves complex workflows by orchestrating a combination of computer vision, logic, and compute modules. Given a natural language task specification, our LLM generates a plan that accomplishes the task using the available tools.
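The plan-then-execute loop described above can be sketched as follows. This is a minimal illustration, not the actual system: the tool names, the stand-in tool implementations, and the hard-coded plan are all hypothetical, and a real agent would obtain the plan from the LLM given the task specification.

```python
# Minimal sketch of tool-based plan execution for an agentic workflow.
# All tool names and the example plan are hypothetical stand-ins; a
# real system would have the LLM produce the plan from the task spec.

def detect_objects(image_id):
    """Stand-in for a computer-vision module."""
    return ["cat", "laptop"]

def count_items(items):
    """Stand-in for a compute module."""
    return len(items)

TOOLS = {"detect_objects": detect_objects, "count_items": count_items}

def execute_plan(plan):
    """Run a plan given as a list of (tool_name, args) steps.

    A step may reference the previous step's result via the
    placeholder string "$prev".
    """
    result = None
    for tool_name, args in plan:
        args = [result if a == "$prev" else a for a in args]
        result = TOOLS[tool_name](*args)
    return result

# Example task: "How many objects are in image 42?"
plan = [("detect_objects", ["img_42"]), ("count_items", ["$prev"])]
print(execute_plan(plan))  # prints 2 with the stand-in tools above
```

Keeping the tool registry as plain functions makes the executor independent of which vision, logic, or compute modules are plugged in.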
Overview: We develop embodied agents for robotics applications that require exploration, navigation, and transport in complex scenes. Our modular hierarchical transport policy builds a topological graph of the scene during exploration, then combines motion planning algorithms, which reach point goals within explored locations, with object navigation policies that move toward semantic targets at unknown locations.
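The hierarchical dispatch above can be sketched as follows, under stated assumptions: the topological graph, the room and object names, and the one-line object-navigation stand-in are illustrative, not the actual learned policy.

```python
from collections import deque

# Sketch of hierarchical navigation: plan a point-goal path over the
# topological graph if the target was observed during exploration,
# otherwise fall back to a semantic object-navigation policy.
# Graph, object positions, and the object-nav stand-in are toy data.

def shortest_path(graph, start, goal):
    """BFS point-goal planning over the topological graph."""
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

def object_nav(target):
    """Stand-in for a learned semantic object-navigation policy."""
    return ["<explore toward '%s'>" % target]

def navigate(graph, positions, start, target):
    """Dispatch: point-goal planning for known targets, object nav otherwise."""
    if target in positions:
        return shortest_path(graph, start, positions[target])
    return object_nav(target)

graph = {"hall": ["kitchen", "office"], "kitchen": ["hall"], "office": ["hall"]}
positions = {"mug": "kitchen"}  # objects seen while exploring
print(navigate(graph, positions, "office", "mug"))   # known target: graph path
print(navigate(graph, positions, "office", "keys"))  # unknown target: object nav
```

The topological graph keeps planning cheap for places the agent has already seen, while the semantic policy handles everything outside the explored map.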
Overview: Our foundation models make computer vision broadly usable across scenarios, applications, and user preferences.
Overview: We develop open vocabulary perception methods that combine the power of vision and language to provide rich descriptions of objects in scenes, including their attributes, behaviors, relations and interactions.
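One common way to realize open-vocabulary perception is to score image regions against free-form text in a shared vision-language embedding space (CLIP-style). The sketch below uses toy embedding vectors in place of trained encoders; the labels and numbers are illustrative assumptions, not outputs of the actual models.

```python
import numpy as np

# Sketch of open-vocabulary recognition via a shared vision-language
# embedding space. The vectors here are toy stand-ins; a real system
# would embed regions and phrases with trained image/text encoders.

def normalize(v):
    """Unit-normalize vectors so dot products equal cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical text embeddings for free-form descriptions; any phrase
# can serve as a "class", which is what makes the vocabulary open.
text_labels = ["a red mug", "a sleeping cat", "a person riding a bike"]
text_emb = normalize(np.array([[0.9, 0.1, 0.0],
                               [0.0, 1.0, 0.2],
                               [0.1, 0.0, 1.0]]))

# Hypothetical embedding of one detected object region.
region_emb = normalize(np.array([0.05, 0.95, 0.25]))

# Cosine similarity of the region against every description.
scores = text_emb @ region_emb
best = text_labels[int(np.argmax(scores))]
print(best)  # "a sleeping cat" for these toy vectors
```

The same scoring extends beyond object names to attributes, relations, and interactions simply by embedding richer phrases as the text side.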