Overview: We developed an agentic LLM system that solves complex workflows using a combination of computer vision, logic, and compute modules. Given a natural-language task specification, the LLM generates a plan that accomplishes the task with the available tools.
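A minimal sketch of the plan-execution idea, with hypothetical tool names and a hard-coded plan standing in for what the LLM would actually emit:

```python
def detect_objects(image_id):
    # Stand-in for a computer-vision module: returns object labels for an image.
    return ["car", "person", "bicycle"]

def count_items(items):
    # Stand-in for a compute module.
    return len(items)

# Tool registry the planner can draw from (names are illustrative).
TOOLS = {"detect_objects": detect_objects, "count_items": count_items}

def execute_plan(plan, initial_input):
    """Run each step in order, piping the previous output into the next tool."""
    value = initial_input
    for tool_name in plan:
        value = TOOLS[tool_name](value)
    return value

# A plan the LLM might emit for "How many objects are in frame_001?"
plan = ["detect_objects", "count_items"]
print(execute_plan(plan, "frame_001"))  # 3
```

In a real deployment the plan would be generated per request and validated before execution; here it only illustrates how tool outputs chain together.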
Overview: Multi-task learning commonly suffers from competition among tasks for resources when model capacity is limited. We develop neural architectures that allow control over both the relative importance of tasks and the total compute cost at inference time.
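One simple way to think about trading off task importance against a compute budget is to split a fixed resource (e.g., channels per task branch) in proportion to importance weights. This toy allocator is only a sketch of that idea, not the architecture itself; the names and budget are hypothetical:

```python
def allocate_width(importance, budget_channels):
    """Split a fixed channel budget across task branches in proportion
    to per-task importance weights (each branch gets at least 1 channel)."""
    total = sum(importance.values())
    return {
        task: max(1, round(budget_channels * weight / total))
        for task, weight in importance.items()
    }

# Prioritize detection 3:1 over segmentation under a 64-channel budget.
print(allocate_width({"segmentation": 1.0, "detection": 3.0}, 64))
# {'segmentation': 16, 'detection': 48}
```

Raising the budget grows all branches while preserving the importance ratio, which is the inference-time control knob the overview describes.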
Overview: Our AI DevOps pipeline builds a high-fidelity digital twin of sensor data that enables self-improvement of deployed models. We leverage our foundational vision-language models to automatically diagnose issues in currently deployed AI, pseudo-label or simulate training data, develop models with continual learning, and verify them with LLM-based checks across diverse scenarios.
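The pseudo-labeling step can be illustrated with a confidence-threshold filter: predictions from the deployed model are reused as training labels only when the model is sufficiently confident. The data and threshold below are toy values:

```python
def pseudo_label(predictions, threshold=0.9):
    """Keep only predictions confident enough to reuse as training labels.

    predictions: list of (sample_id, predicted_label, confidence) tuples.
    Returns (sample_id, label) pairs for the retained samples.
    """
    return [(sid, label) for sid, label, conf in predictions if conf >= threshold]

# Toy outputs from a deployed model on unlabeled frames.
preds = [
    ("img_1", "car", 0.97),
    ("img_2", "person", 0.55),   # too uncertain: dropped
    ("img_3", "truck", 0.91),
]
print(pseudo_label(preds))  # [('img_1', 'car'), ('img_3', 'truck')]
```

A production pipeline would add class balancing and drift checks on top of this filter; the sketch shows only the core selection rule.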
Overview: We develop open-vocabulary perception methods that combine vision and language to provide rich descriptions of objects in scenes, including their attributes, behaviors, relations, and interactions.
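At the core of open-vocabulary perception is matching an image-region embedding against embeddings of free-form text, rather than a fixed label set. The toy vectors below stand in for the outputs of vision and text encoders; everything here is illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_label(region_embedding, text_embeddings):
    """Return the free-form description whose embedding is closest to the region."""
    return max(text_embeddings, key=lambda t: cosine(region_embedding, text_embeddings[t]))

# Toy embeddings; any new description can be added without retraining.
texts = {
    "a red car": [0.9, 0.1, 0.0],
    "a person walking": [0.0, 0.8, 0.6],
}
region = [0.85, 0.2, 0.05]
print(best_label(region, texts))  # a red car
```

Because candidate labels are just text, the vocabulary is open: adding "a bicycle leaning on a wall" requires only embedding the new phrase.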
Overview: Our techniques from unsupervised and semi-supervised learning, such as domain adaptation and domain generalization, enable robust and responsible AI solutions across applications including image classification, face recognition, facial anti-spoofing, object detection, and semantic segmentation.
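A very reduced view of domain adaptation is aligning feature statistics between a labeled source domain and an unlabeled target domain, so a model trained on the source transfers better. This one-dimensional mean-matching sketch (toy data, illustrative only) shows the statistic-alignment idea in its simplest form:

```python
def align_means(source_feats, target_feats):
    """Shift target features so their mean matches the source mean,
    a minimal statistic-matching view of domain adaptation."""
    mu_s = sum(source_feats) / len(source_feats)
    mu_t = sum(target_feats) / len(target_feats)
    return [x - mu_t + mu_s for x in target_feats]

src = [0.0, 1.0, 2.0]      # feature values from the labeled source domain
tgt = [10.0, 11.0, 12.0]   # shifted feature values from the unlabeled target domain
aligned = align_means(src, tgt)
print(aligned)  # [0.0, 1.0, 2.0]
```

Practical methods align far richer statistics (covariances, or full distributions via adversarial training), but the goal is the same: make target features look like source features so the source-trained predictor applies.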