Simultaneous Video Analytics refers to the concurrent analysis of multiple aspects or features within a video stream. This may include object detection, tracking, recognition, and other computer vision tasks run side by side in real time. The goal is to extract comprehensive information from the video feed to support applications such as surveillance and automated monitoring.

Posts

Elixir: A System To Enhance Data Quality For Multiple Analytics On A Video Stream

IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals including retail, healthcare, safety and security, transportation, manufacturing, etc. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), off the video feed coming out of every camera. As AUs typically use deep learning-based AI/ML models, their performance depends on the quality of the input video, and recent work has shown that dynamically adjusting the camera settings exposed by popular network cameras can help improve the quality of the video feed and hence the AU accuracy, in a single-AU setting. In this paper, we first show that in a multi-AU setting, changing the camera settings has a disproportionate impact on the performance of different AUs. In particular, the optimal setting for one AU may severely degrade the performance of another AU, and the impact on different AUs varies further as the environmental conditions change. We then present Elixir, a system to enhance the video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives from different AUs and adjusts the camera settings to simultaneously enhance the performance of all AUs. To define the multiple objectives in MORL, we develop new AU-specific quality estimator values for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot) running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4,975) and 158.6% (3,507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).
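
The abstract does not spell out Elixir's MORL formulation, so the following is only an illustrative sketch of the general idea: several AUs each score a camera setting, and an agent searches for a setting that serves all of them at once. Everything specific here is an assumption made for illustration and not taken from the paper: the brightness/contrast grid, the `PREFERRED` table, the `toy_quality` estimator, the equal weights, and the use of weighted-sum scalarization with a simple epsilon-greedy agent in place of a full multi-objective RL algorithm.

```python
import random
from typing import Dict, List, Tuple

Setting = Tuple[int, int]  # hypothetical (brightness, contrast) pair

# Hypothetical grid of camera settings the agent may choose from.
SETTINGS: List[Setting] = [(b, c) for b in (25, 50, 75) for c in (25, 50, 75)]

# Made-up "ideal" setting per AU, used only to drive the toy estimator.
PREFERRED: Dict[str, Setting] = {
    "face": (75, 50),
    "person": (50, 75),
    "car": (50, 50),
    "license_plate": (50, 25),
}


def toy_quality(au: str, setting: Setting) -> float:
    """Stand-in for an AU-specific quality estimator; returns a value in [0, 1]."""
    pref_b, pref_c = PREFERRED[au]
    distance = abs(setting[0] - pref_b) + abs(setting[1] - pref_c)
    return max(0.0, 1.0 - distance / 100.0)


def scalarize(qualities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse the per-AU objectives into one reward via a weighted sum."""
    return sum(weights[au] * qualities[au] for au in weights)


class EpsilonGreedyAgent:
    """Keeps a running mean of the scalarized reward for each setting."""

    def __init__(self, settings: List[Setting], weights: Dict[str, float],
                 epsilon: float = 0.2, seed: int = 0) -> None:
        self.settings = list(settings)
        self.weights = weights
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {s: 0.0 for s in self.settings}
        self.count = {s: 0 for s in self.settings}

    def greedy(self) -> Setting:
        # Setting with the highest estimated reward so far.
        return max(self.settings, key=lambda s: self.value[s])

    def choose(self) -> Setting:
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.settings)
        return self.greedy()

    def update(self, setting: Setting, qualities: Dict[str, float]) -> None:
        reward = scalarize(qualities, self.weights)
        self.count[setting] += 1
        # Incremental mean update.
        self.value[setting] += (reward - self.value[setting]) / self.count[setting]


def train_agent(steps: int = 2000, epsilon: float = 0.2, seed: int = 0) -> Setting:
    """Run the toy loop and return the setting the agent ends up preferring."""
    weights = {au: 1.0 / len(PREFERRED) for au in PREFERRED}  # equal weights (assumption)
    agent = EpsilonGreedyAgent(SETTINGS, weights, epsilon, seed)
    for _ in range(steps):
        setting = agent.choose()
        qualities = {au: toy_quality(au, setting) for au in PREFERRED}
        agent.update(setting, qualities)
    return agent.greedy()


if __name__ == "__main__":
    face_best = max(SETTINGS, key=lambda s: toy_quality("face", s))
    print("best setting for faces alone:", face_best)
    print("compromise setting for all AUs:", train_agent())
```

In this toy setup the setting that is best for the face AU alone differs from the compromise the agent settles on, which mirrors the abstract's observation that tuning for one AU can hurt another. A weighted sum is the simplest way to combine objectives; a real system would also need quality estimators that track actual AU accuracy and would have to react to changing environmental conditions, neither of which this sketch attempts.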