Kunal Rao NEC Labs America

Researcher

Integrated Systems

Posts

Why is the video analytics accuracy fluctuating, and what can we do about it?

It is a common practice to think of a video as a sequence of images (frames), and re-use deep neural network models that are trained only on images for similar analytics tasks on videos. In this paper, we show that this “leap of faith” that deep learning models that work well on images will also work well on videos is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observed that the root cause for these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an “unintentional adversary” because these slight changes in the image pixel values in consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. Our experiments with a number of different cameras, and a variety of different video analytics tasks, show that the inadvertent adversarial effect from the camera can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (∼40% fewer mistakes in tracking).
Our paper also provides new directions and techniques to mitigate the camera’s adversarial effect on deep learning models used for video analytics applications.

DataXc: Flexible and efficient communication in microservices-based stream analytics pipelines

A big challenge in changing a monolithic application into a performant microservices-based application is the design of efficient mechanisms for microservices to communicate with each other. Prior proposals range from custom point-to-point communication among microservices using protocols like gRPC to service meshes like Linkerd to a flexible, many-to-many communication using broker-based messaging systems like NATS. We propose a new communication mechanism, DataXc, that is more efficient than prior proposals in terms of message latency, jitter, message processing rate and use of network resources. To the best of our knowledge, DataXc is the first communication design that has the desirable flexibility of broker-based messaging systems like NATS and the high performance of a rigid, custom point-to-point communication method. DataXc proposes a novel “pull” based communication method (i.e., consumers fetch messages from producers). This is unlike prior proposals like NATS, gRPC or Linkerd, all of which are “push” based (i.e., producers send messages to consumers). Such “push” based methods make it difficult to take advantage of differential processing rates of consumers like video analytics tasks. In contrast, DataXc proposes a “pull” based design that avoids unnecessary communication of messages that are eventually discarded by the consumers. Also, unlike prior proposals, DataXc successfully addresses several key challenges in streaming video analytics pipelines like non-uniform processing of frames from multiple cameras, and high variance in latency of frames processed by consumers, all of which adversely affect the quality of insights from streaming video analytics. We report results on two popular real-world, streaming video analytics pipelines (video surveillance, and video action recognition).
Compared to NATS, DataXc is just as flexible, but it has far superior performance: up to 80% higher processing rate, 3X lower latency, 7.5X lower jitter and 4.5X lower network bandwidth usage. Compared to gRPC or Linkerd, DataXc is highly flexible, achieves up to 2X higher processing rate, lower latency and lower jitter, but it also consumes more network bandwidth.
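
The difference between “push” and “pull” described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not DataXc's actual design: the `LatestSlot` class, the `simulate` function and the notion of a consumer "period" are hypothetical names introduced here. The producer keeps only its newest frame, and a consumer fetches a frame only when it is ready for one, so frames that a slow consumer would have discarded are never transferred.

```python
class LatestSlot:
    """Hypothetical producer-side buffer that keeps only the newest message."""

    def __init__(self):
        self._seq = -1      # sequence number of the newest message
        self._msg = None    # the newest message itself

    def publish(self, msg):
        # Overwrite the previous message; older frames are simply dropped.
        self._seq += 1
        self._msg = msg

    def pull(self, last_seen):
        # Hand out the newest message only if the consumer has not seen it yet.
        if self._seq > last_seen:
            return self._seq, self._msg
        return None


def simulate(num_frames, consumer_period):
    """Return the frames actually transferred to one pull-based consumer.

    The consumer is assumed to be ready for a new frame once every
    `consumer_period` ticks (1 = keeps up with the producer).
    """
    slot = LatestSlot()
    last_seen = -1
    transferred = []
    for tick in range(num_frames):
        slot.publish(f"frame-{tick}")
        if tick % consumer_period == 0:
            item = slot.pull(last_seen)
            if item is not None:
                last_seen, msg = item
                transferred.append(msg)
    return transferred


fast = simulate(num_frames=10, consumer_period=1)
slow = simulate(num_frames=10, consumer_period=3)
print(len(fast), len(slow))
```

In this toy setting a push-based producer would send all 10 frames to both consumers, whereas the slow pull-based consumer receives only the frames it can actually process.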

Application-specific, Dynamic Reservation of 5G Compute and Network Resources by using Reinforcement Learning

5G services and applications explicitly reserve compute and network resources in today’s complex and dynamic infrastructure of multi-tiered computing and cellular networking to ensure application-specific service quality metrics, and the infrastructure providers charge the 5G services for the resources reserved. A static, one-time reservation of resources at service deployment typically results in extended periods of under-utilization of reserved resources during the lifetime of the service operation. This is due to a plethora of reasons like changes in content from the IoT sensors (for example, change in number of people in the field of view of a camera) or a change in the environmental conditions around the IoT sensors (for example, time of the day, rain or fog can affect data acquisition by sensors). Under-utilization of a specific resource like compute can also be due to temporary inadequate availability of another resource like the network bandwidth in a dynamic 5G infrastructure. We propose a novel Reinforcement Learning-based online method to dynamically adjust an application’s compute and network resource reservations to minimize under-utilization of requested resources, while ensuring acceptable service quality metrics. We observe that a complex application-specific coupling exists between the compute and network usage of an application. Our proposed method learns this coupling during the operation of the service, and dynamically modulates the compute and network resource requests to minimize under-utilization of reserved resources. Through experimental evaluation using a real-world video analytics application, we show that our technique is able to capture the complex compute-network coupling relationship in an online manner (i.e., while the application is running), and dynamically adapts and saves up to 65% compute and 93% network resources on average (over multiple runs), without significantly impacting application accuracy.

ROMA: Resource Orchestration for Microservices-based 5G Applications

With the growth of 5G, Internet of Things (IoT), edge computing and cloud computing technologies, the infrastructure (compute and network) available to emerging applications (AR/VR, autonomous driving, Industry 4.0, etc.) has become quite complex. There are multiple tiers of computing (IoT devices, near edge, far edge, cloud, etc.) that are connected with different types of networking technologies (LAN, LTE, 5G, MAN, WAN, etc.). Deployment and management of applications in such an environment is quite challenging. In this paper, we propose ROMA, which performs resource orchestration for microservices-based 5G applications in a dynamic, heterogeneous, multi-tiered compute and network fabric. We assume that only application-level requirements are known, and the detailed requirements of the individual microservices in the application are not specified. As part of our solution, ROMA identifies and leverages the coupling relationship between compute and network usage for various microservices and solves an optimization problem in order to appropriately identify how each microservice should be deployed in the complex, multi-tiered compute and network fabric, so that the end-to-end application requirements are optimally met. We implemented two real-world 5G applications in video surveillance and intelligent transportation system (ITS) domains. Through extensive experiments, we show that ROMA is able to save up to 90%, 55% and 44% compute and up to 80%, 95% and 75% network bandwidth for the surveillance application (watchlist) and the transportation application (person detection and car detection), respectively. This improvement is achieved while honoring the application performance requirements, and it is over an alternative scheme that employs a static and overprovisioned resource allocation strategy by ignoring the resource coupling relationships.

DataXe: A System for Application Self-optimization in Serverless Edge Computing Environments

A key barrier to building performant, remotely managed and self-optimizing multi-sensor, distributed stream processing edge applications is high programming complexity. We recently proposed DataX [1], a novel platform that improves programmer productivity by enabling easy exchange, transformations, and fusion of data streams on virtualized edge computing infrastructure. This paper extends DataX to include (a) serverless computing that automatically scales stateful and stateless analytics units (AUs) on virtualized edge environments, (b) novel communication mechanisms that efficiently communicate data among analytics units, and (c) new techniques to promote automatic reuse and sharing of analytics processing across multiple applications in a lights-out, serverless computing environment. Synthesizing these capabilities into a single platform has been substantially more transformative than any available stream processing system for the edge. We refer to this enhanced and efficient version of DataX as DataXe. To the best of our knowledge, this is the first serverless system for stream processing. For a real-world video analytics application, we observed that the DataXe implementation of the analytics application is about 3X faster than a standalone implementation of the analytics application with custom, handcrafted communication, multiprocessing and allocation of edge resources.

Edge-based fever screening system over private 5G

Edge computing and 5G have made it possible to perform analytics closer to the source of data and achieve super-low latency response times, which isn’t possible with centralized cloud deployment. In this paper, we present a novel fever screening system, which uses edge machine learning techniques and leverages private 5G to accurately identify and screen individuals with fever in real-time. In particular, we present novel deep-learning-based techniques for fusion and alignment of cross-spectral visual and thermal data streams at the edge. Our novel Cross-Spectral Generative Adversarial Network (CS-GAN) synthesizes visual images that have the key, representative object level features required to uniquely associate objects across visual and thermal spectrum. Two key features of CS-GAN are a novel, feature-preserving loss function that results in high-quality pairing of corresponding cross-spectral objects, and dual bottleneck residual layers with skip connections (a new network enhancement) to not only accelerate real-time inference, but to also speed up convergence during model training at the edge. To the best of our knowledge, this is the first technique that leverages 5G networks and limited edge resources to enable real-time feature-level association of objects in visual and thermal streams (30 ms per full HD frame on an Intel Core i7-8650 4-core, 1.9GHz mobile processor). To the best of our knowledge, this is also the first system to achieve real-time operation, which has enabled fever screening of employees and guests in arenas, theme parks, airports and other critical facilities. By leveraging edge computing and 5G, our fever screening system is able to achieve 98.5% accuracy and is able to process ∼5X more people when compared to a centralized cloud deployment.

Magic-Pipe: Self-optimizing video analytics pipelines

Microservices-based video analytics pipelines routinely use multiple deep convolutional neural networks. We observe that the best allocation of resources to deep learning engines (or microservices) in a pipeline, and the best configuration of parameters for each engine vary over time, often at a timescale of minutes or even seconds based on the dynamic content in the video. We leverage these observations to develop Magic-Pipe, a self-optimizing video analytics pipeline that uses AI techniques to periodically self-optimize. First, we propose a new, adaptive resource allocation technique to dynamically balance the resource usage of different microservices, based on dynamic video content. Then, we propose an adaptive microservice parameter tuning technique to balance the accuracy and performance of a microservice, also based on video content. Finally, we propose two different approaches to reduce unnecessary computations due to unavoidable mismatch of independently designed, re-usable deep-learning engines: a deep learning approach to improve the feature extractor performance by filtering inputs for which no features can be extracted, and a low-overhead graph-theoretic approach to minimize redundant computations across frames. Our evaluation of Magic-Pipe shows that pipelines augmented with self-optimizing capability exhibit application response times that are an order of magnitude better than the original pipelines, while using the same hardware resources, and achieving similar high accuracy.

SmartSlice: Dynamic, Self-optimization of Application’s QoS requests to 5G networks

Applications can tailor a network slice by specifying a variety of QoS attributes related to application-specific performance, function or operation. However, some QoS attributes like guaranteed bandwidth required by the application do vary over time. For example, network bandwidth needs of video streams from surveillance cameras can vary a lot depending on the environmental conditions and the content in the video streams. In this paper, we propose a novel, dynamic QoS attribute prediction technique that assists any application to make optimal resource reservation requests at all times. Standard forecasting techniques that use traditional cost functions like MAE, MSE, RMSE, MDA, etc. do not work well because they do not take into account the direction (whether the forecasting of resources is more or less than needed), magnitude (by how much the forecast deviates, and in which direction), or frequency (how many times the forecast deviates from actual needs, and in which direction). The direction, magnitude and frequency have a direct impact on the application’s accuracy of insights, and the operational costs. We propose a new, parameterized cost function that takes into account all three of them, and guides the design of a new prediction technique. To the best of our knowledge, this is the first work that considers time-varying application requirements and dynamically adjusts slice QoS requests to 5G networks in order to ensure a balance between application’s accuracy and operational costs. In a real-world deployment of a surveillance video analytics application over 17 cameras, we show that our technique outperforms other traditional forecasting methods, and it saves 34% of network bandwidth (over a ~24-hour period) when compared to a static, one-time reservation.
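
To see why a symmetric cost such as MAE cannot distinguish over-forecasting from under-forecasting, consider the sketch below. The `asymmetric_cost` function, its weights (`w_under`, `w_over`, `w_freq`) and the sample bandwidth values are illustrative assumptions made here; they are not the parameterized cost function proposed in the paper. The sketch penalizes under-forecasts (which would starve the application) more heavily than over-forecasts (which only waste reserved bandwidth), and adds a term for how often under-forecasts occur.

```python
def mae(actual, forecast):
    """Mean absolute error: blind to the direction of the deviation."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def asymmetric_cost(actual, forecast, w_under=3.0, w_over=1.0, w_freq=0.5):
    """Hypothetical cost that accounts for direction, magnitude and frequency.

    w_under / w_over weight the magnitude of under- and over-forecasts,
    and w_freq weights the fraction of time steps that were under-forecast.
    """
    n = len(actual)
    under = [max(a - f, 0.0) for a, f in zip(actual, forecast)]
    over = [max(f - a, 0.0) for a, f in zip(actual, forecast)]
    magnitude = (w_under * sum(under) + w_over * sum(over)) / n
    frequency = w_freq * sum(1 for u in under if u > 0) / n
    return magnitude + frequency


# Made-up bandwidth needs (e.g. in Mbps) and two forecasts that deviate
# by the same amount in opposite directions.
actual = [10.0, 12.0, 8.0, 14.0]
over_forecast = [12.0, 14.0, 10.0, 16.0]
under_forecast = [8.0, 10.0, 6.0, 12.0]

print(mae(actual, over_forecast), mae(actual, under_forecast))
print(asymmetric_cost(actual, over_forecast),
      asymmetric_cost(actual, under_forecast))
```

Both forecasts have the same MAE, yet the asymmetric cost ranks the under-forecast as considerably worse, which is the kind of distinction a direction-aware cost function is meant to capture.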

CamTuner: Reinforcement Learning based System for Camera Parameter Tuning to enhance Analytics

Video analytics systems critically rely on video cameras, which capture high quality video frames, to achieve high analytics accuracy. Although modern video cameras often expose tens of configurable parameter settings that can be set by end users, deployment of surveillance cameras today often uses a fixed set of parameter settings because the end users lack the skill or understanding to reconfigure these parameters. In this paper, we first show that in a typical surveillance camera deployment, environmental condition changes can significantly affect the accuracy of analytics units such as person detection, face detection and face recognition, and that such adverse impact can be mitigated by dynamically adjusting camera settings. We then propose CAMTUNER, a framework that can be easily applied to an existing video analytics pipeline (VAP) to enable automatic and dynamic adaptation of complex camera settings to changing environmental conditions, and autonomously optimize the accuracy of analytics units (AUs) in the VAP. CAMTUNER is based on SARSA reinforcement learning (RL) and it incorporates two novel components: a lightweight analytics quality estimator and a virtual camera. CAMTUNER is implemented in a system with AXIS surveillance cameras and several VAPs (with various AUs) that processed day-long customer videos captured at airport entrances. Our evaluations show that CAMTUNER can adapt quickly to changing environments. We compared CAMTUNER with two alternative approaches where either static camera settings were used, or a strawman approach where camera settings were manually changed every hour (based on human perception of quality). We observed that for the face detection and person detection AUs, CAMTUNER is able to achieve up to 13.8% and 9.2% higher accuracy, respectively, compared to the best of the two approaches (average improvement of 8% for both AUs).
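
For readers unfamiliar with SARSA, the sketch below shows the standard tabular update rule, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. It is a generic textbook illustration, not CAMTUNER's implementation: the state names (`"dark"`, `"ok"`), the action names (`"brightness_up"`, `"hold"`), the reward value and the hyperparameters are all hypothetical, and the reward merely stands in for whatever signal an analytics quality estimator might provide.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one on-policy SARSA update to the table `q` and return the new value.

    `q` maps (state, action) pairs to estimated long-term reward.
    """
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: the scene looked too dark, the agent raised the
# brightness, observed a small reward, and will next hold the settings.
q = defaultdict(float)          # unseen (state, action) pairs default to 0.0
q[("ok", "hold")] = 1.0         # assumed prior estimate for the next step

new_value = sarsa_update(
    q,
    state="dark", action="brightness_up",
    reward=0.2,
    next_state="ok", next_action="hold",
)
print(new_value)
```

Unlike Q-learning, the update uses the action that the agent actually takes next (`next_action`) rather than the best available action, which is what makes SARSA an on-policy method.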

AppSlice: A system for application-centric design of 5G and edge computing applications

Applications that use edge computing and 5G to improve response times consume both compute and network resources. However, 5G networks manage only network resources without considering the application’s compute requirements, and container orchestration frameworks manage only compute resources without considering the application’s network requirements. We observe that there is a complex coupling between an application’s compute and network usage, which can be leveraged to improve application performance and resource utilization. We propose a new, declarative abstraction called app slice that jointly considers the application’s compute and network requirements. This abstraction leverages container management systems to manage edge computing resources, and 5G network stacks to manage network resources, while the joint consideration of coupling between compute and network usage is explicitly managed by a new runtime system, which delivers the declarative semantics of the app slice. The runtime system also jointly manages the edge compute and network resource usage automatically across different edge computing environments and 5G networks by using two adaptive algorithms. We implement a complex, real-world, real-time monitoring application using the proposed app slice abstraction, and demonstrate on a private 5G/LTE testbed that the proposed runtime system significantly improves the application performance and resource usage when compared with the case where the coupling between the compute and network resource usage is ignored.