Purdue University is a public research university in Indiana, known for its top programs in engineering, aerospace, and data science, and it supports innovation and entrepreneurship across academia and industry. NEC Labs America partners with Purdue University on federated analytics, interpretability, and privacy-preserving inference, and our joint work helps advance trusted machine learning pipelines. Read about our latest news and collaborative publications with Purdue University below.

Posts

SimCache: Similarity Caching for Efficient VLM-based Scene Understanding

Scene understanding systems analyze visual contexts by detecting objects, their attributes, and the interactions among them to provide a holistic interpretation. Understanding a scene requires analyzing multiple salient regions within a single video frame. Recently, Vision-Language Models (VLMs) have emerged as powerful tools for scene understanding, leveraging learned world knowledge to enable deployment without specialized training or fine-tuning. However, deploying VLMs in real-time applications is challenging due to their high computational and memory requirements, which limit processing throughput. We propose SimCache, a novel software-based caching mechanism that optimizes VLM-based scene understanding systems by reducing redundant computations. SimCache stores the embedding representation of a salient region and its detected activity, enabling reuse of VLM computations for similar regions in future frames. Specifically, SimCache exploits two types of redundancy: (1) temporal locality, reusing computations for similar regions across adjacent frames, and (2) semantic locality, reusing computations for visually distinct regions that represent the same activity at different times. SimCache includes a multi-tier cache architecture with specialized cache search and refinement policies to exploit redundancy efficiently and accurately. Experiments on action recognition datasets demonstrate that SimCache improves system throughput by up to 9.4× and reduces VLM computations by up to 24.4× with minimal accuracy loss.
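The abstract does not describe SimCache's data structures or policies in detail, so the following is only a minimal Python sketch of the general similarity-caching idea under our own assumptions. We assume each salient region arrives as a plain float embedding vector, that a cache hit means cosine similarity at or above a fixed threshold, that a small, strict tier of recent regions approximates temporal locality, and that a larger, looser tier of past activities approximates semantic locality. The names `SimilarityTier`, `TwoTierSimCache`, and `classify`, the `vlm` callback, and all capacities and thresholds are hypothetical, and the code does not reproduce the paper's actual search and refinement policies.

```python
import math
from collections import OrderedDict
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class SimilarityTier:
    """Hypothetical cache tier: bounded LRU store of (embedding, activity) entries."""

    def __init__(self, capacity: int, threshold: float):
        self.capacity = capacity
        self.threshold = threshold
        self.entries: "OrderedDict[int, Tuple[Vector, str]]" = OrderedDict()
        self._next_id = 0

    def lookup(self, emb: Vector) -> Optional[str]:
        # Linear scan for the most similar entry at or above the threshold.
        best_id, best_sim = None, self.threshold
        for key, (cached, _) in self.entries.items():
            sim = cosine(emb, cached)
            if sim >= best_sim:
                best_id, best_sim = key, sim
        if best_id is None:
            return None
        self.entries.move_to_end(best_id)  # refresh recency on a hit
        return self.entries[best_id][1]

    def insert(self, emb: Vector, activity: str) -> None:
        self.entries[self._next_id] = (list(emb), activity)
        self._next_id += 1
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class TwoTierSimCache:
    """Hypothetical two-tier cache placed in front of an expensive VLM call."""

    def __init__(self, vlm: Callable[[Vector], str],
                 temporal_cap: int = 8, temporal_thr: float = 0.95,
                 semantic_cap: int = 256, semantic_thr: float = 0.85):
        self.vlm = vlm
        self.temporal = SimilarityTier(temporal_cap, temporal_thr)
        self.semantic = SimilarityTier(semantic_cap, semantic_thr)
        self.vlm_calls = 0

    def classify(self, emb: Vector) -> str:
        # Tier 1: small, strict tier of recently seen regions (temporal locality).
        hit = self.temporal.lookup(emb)
        if hit is not None:
            return hit
        # Tier 2: larger, looser tier of past activities (semantic locality, approximated).
        hit = self.semantic.lookup(emb)
        if hit is not None:
            self.temporal.insert(emb, hit)  # promote so nearby frames hit tier 1
            return hit
        # Miss in both tiers: run the expensive VLM and cache its answer in both tiers.
        self.vlm_calls += 1
        activity = self.vlm(emb)
        self.temporal.insert(emb, activity)
        self.semantic.insert(emb, activity)
        return activity


def fake_vlm(emb: Vector) -> str:
    # Stand-in for a real VLM: a toy rule on a 2-D "embedding".
    return "walking" if emb[0] > emb[1] else "sitting"


cache = TwoTierSimCache(fake_vlm)
frames = [[1.0, 0.1], [1.0, 0.12], [0.1, 1.0], [0.98, 0.1]]
labels = [cache.classify(f) for f in frames]
print(labels)           # → ['walking', 'walking', 'sitting', 'walking']
print(cache.vlm_calls)  # → 2
```

In this toy run, four regions trigger only two VLM calls because the second and fourth embeddings are near-duplicates of the first. Checking the small tier first keeps the common case (consecutive frames) cheap, while the larger tier catches recurrences after the small tier has evicted them; a real system would likely replace the linear scan with an approximate nearest-neighbor index.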