iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
Grounding large language models (LLMs) in domain-specific tasks like post-hoc dash-cam driving video analysis is challenging due to their general-purpose training and lack of structured inductive biases. As vision is often the sole modality available for such analysis (i.e. no LiDAR, GPS, etc.), existing




