Reinforcement Learning is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment. The agent aims to maximize a cumulative reward signal over time by taking a sequence of actions in the environment. This learning process involves trial and error: the agent explores various actions and learns from their consequences.
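
As a minimal illustration of this loop, the sketch below uses a toy environment and a random placeholder policy (all names are hypothetical, not from any specific library) to show an agent taking actions, receiving rewards, and accumulating a return over an episode.

```python
# Minimal sketch of the agent-environment loop described above.
import random

class Environment:
    """Toy 1-D environment: the agent moves left/right and is rewarded for reaching +5."""
    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):              # action: -1 (left) or +1 (right)
        self.position += action
        done = abs(self.position) >= 5
        reward = 1.0 if self.position >= 5 else 0.0
        return self.position, reward, done

class Agent:
    """Random agent: a placeholder for a learned policy."""
    def act(self, state):
        return random.choice([-1, +1])

env, agent = Environment(), Agent()
for episode in range(3):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)                  # trial ...
        state, reward, done = env.step(action)     # ... and error
        total_reward += reward                     # cumulative reward the agent tries to maximize
    print(f"episode {episode}: return = {total_reward}")
```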

Posts

Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition running on remote servers critically rely on surveillance cameras to capture high-quality video streams in order to achieve high accuracy. Modern IP cameras come with a large number of parameters that directly affect the quality of the captured video stream. While a few of these parameters, e.g., exposure, focus, and white balance, are adjusted automatically by the camera, the remaining ones are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this paper, we first show that changes in environmental conditions can have a significant adverse effect on the accuracy of insights from the AUs, but that such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. We then present CamTuner, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CamTuner is based on SARSA reinforcement learning and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera, which together drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that, compared to a VAP using the default camera settings, CamTuner enhances VAP accuracy by detecting 15.9% additional persons and 2.6%–4.2% additional cars (without any false positives) in a large enterprise parking lot, and 9.7% additional cars in a 5G smart traffic intersection scenario, which enables a new use case of accurate and reliable automatic vehicle collision prediction (AVCP). CamTuner opens doors for new ways to significantly enhance video analytics accuracy beyond incremental improvements from refining deep-learning models.
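
As a rough sketch of the kind of SARSA update such a framework relies on (the discretization of camera settings into actions and the quality-estimator reward below are illustrative assumptions, not CamTuner's exact formulation):

```python
# Hedged sketch of tabular SARSA at the core of a CamTuner-style agent.
import random
from collections import defaultdict

ACTIONS = ["brightness+", "brightness-", "contrast+", "contrast-", "noop"]  # hypothetical NAUTO knobs
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = defaultdict(float)                          # Q[(state, action)]

def epsilon_greedy(state):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_step(state, action, reward, next_state):
    """One on-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    next_action = epsilon_greedy(next_state)
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return next_action

# Toy driver: the "environment" is a stub analytics-quality estimator.
state, action = "noon", epsilon_greedy("noon")
for _ in range(100):
    reward = random.random()                    # stand-in for the lightweight quality estimator
    next_state = random.choice(["noon", "dusk"])
    action = sarsa_step(state, action, reward, next_state)
    state = next_state
```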

Application-specific, Dynamic Reservation of 5G Compute and Network Resources by using Reinforcement Learning

5G services and applications explicitly reserve compute and network resources in today’s complex and dynamic infrastructure of multi-tiered computing and cellular networking to ensure application-specific service quality metrics, and the infrastructure providers charge the 5G services for the reserved resources. A static, one-time reservation of resources at service deployment typically results in extended periods of under-utilization of the reserved resources during the lifetime of the service operation. This is due to many reasons, such as changes in the content from IoT sensors (for example, a change in the number of people in the field of view of a camera) or a change in the environmental conditions around the IoT sensors (for example, time of day, rain, or fog can affect data acquisition by sensors). Under-utilization of a specific resource like compute can also be due to temporary inadequate availability of another resource like network bandwidth in a dynamic 5G infrastructure. We propose a novel Reinforcement Learning-based online method to dynamically adjust an application’s compute and network resource reservations to minimize under-utilization of requested resources, while ensuring acceptable service quality metrics. We observe that a complex application-specific coupling exists between the compute and network usage of an application. Our proposed method learns this coupling during the operation of the service, and dynamically modulates the compute and network resource requests to minimize under-utilization of reserved resources. Through experimental evaluation using a real-world video analytics application, we show that our technique captures the complex compute-network coupling relationship in an online manner, i.e., while the application is running, and dynamically adapts to save up to 65% compute and 93% network resources on average (over multiple runs), without significantly impacting application accuracy.
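
A highly simplified sketch of learning reservations online is shown below; the utilization stub, reward shaping, and discrete reservation levels are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch: shrink or grow compute/network reservations from observed demand,
# rewarding savings while penalizing violations of a service-quality target.
import random

levels = [0.25, 0.5, 0.75, 1.0]                     # fraction of the initially requested reservation
Q = {(c, n): 0.0 for c in levels for n in levels}   # value of each (compute, network) reservation pair
alpha, epsilon = 0.2, 0.1

def measure(compute_res, network_res):
    """Stub for the running application: hypothetical demand and whether quality stayed acceptable."""
    demand_c, demand_n = random.uniform(0.2, 0.6), random.uniform(0.1, 0.5)
    quality_ok = compute_res >= demand_c and network_res >= demand_n
    return demand_c, demand_n, quality_ok

for step in range(500):
    if random.random() < epsilon:                   # explore a random reservation pair
        c, n = random.choice(list(Q))
    else:                                           # exploit the best pair found so far
        c, n = max(Q, key=Q.get)
    demand_c, demand_n, ok = measure(c, n)
    # Reward: penalize under-utilization (reserved minus used) and, heavily, quality violations.
    reward = -max(c - demand_c, 0) - max(n - demand_n, 0) - (0.0 if ok else 5.0)
    Q[(c, n)] += alpha * (reward - Q[(c, n)])

print("learned reservation (compute, network):", max(Q, key=Q.get))
```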

Learning Transferable Reward for Query Object Localization with Policy Adaptation

We propose a reinforcement learning-based approach to query object localization, in which an agent is trained to localize objects of interest specified by a small exemplary set. We learn a transferable reward signal formulated from the exemplary set via ordinal metric learning. Our proposed method enables test-time policy adaptation to new environments where reward signals are not readily available, and outperforms fine-tuning approaches that are limited to annotated images. In addition, the transferable reward allows repurposing the trained agent from one specific class to another. Experiments on corrupted MNIST, CU-Birds, and COCO datasets demonstrate the effectiveness of our approach.
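
One way such an exemplar-derived reward could look is sketched below, under the assumption of a fixed-size crop embedding and a simple sign reward; this is an illustration, not the paper's ordinal-metric-learning formulation.

```python
# Hedged sketch: reward the agent when its current crop moves closer (in embedding space)
# to the exemplary set that specifies the query object.
import numpy as np

def embed(crop):
    """Stand-in for a learned feature extractor; crops are assumed resized to a common shape."""
    v = crop.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def exemplar_prototype(exemplar_crops):
    """Average embedding of the small exemplary set."""
    return np.mean([embed(c) for c in exemplar_crops], axis=0)

def step_reward(prev_crop, new_crop, prototype):
    """+1 if the agent's new crop moved closer to the exemplar prototype, -1 otherwise."""
    d_prev = np.linalg.norm(embed(prev_crop) - prototype)
    d_new = np.linalg.norm(embed(new_crop) - prototype)
    return 1.0 if d_new < d_prev else -1.0

# Toy usage with random 32x32 "crops":
rng = np.random.default_rng(0)
proto = exemplar_prototype([rng.random((32, 32)) for _ in range(5)])
print(step_reward(rng.random((32, 32)), rng.random((32, 32)), proto))
```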

DataXe: A System for Application Self-optimization in Serverless Edge Computing Environments

A key barrier to building performant, remotely managed, and self-optimizing multi-sensor, distributed stream processing edge applications is high programming complexity. We recently proposed DataX [1], a novel platform that improves programmer productivity by enabling easy exchange, transformation, and fusion of data streams on virtualized edge computing infrastructure. This paper extends DataX to include (a) serverless computing that automatically scales stateful and stateless analytics units (AUs) on virtualized edge environments, (b) novel communication mechanisms that efficiently move data among analytics units, and (c) new techniques to promote automatic reuse and sharing of analytics processing across multiple applications in a lights-out, serverless computing environment. Synthesizing these capabilities into a single platform makes it substantially more transformative than any available stream processing system for the edge. We refer to this enhanced and more efficient version of DataX as DataXe. To the best of our knowledge, this is the first serverless system for stream processing. For a real-world video analytics application, we observed that the DataXe implementation of the analytics application is about 3X faster than a standalone implementation with custom, handcrafted communication, multiprocessing, and allocation of edge resources.

Dynamic Causal Discovery in Imitation Learning

Using deep reinforcement learning (DRL) to recover expert policies via imitation has proven promising in a wide range of applications. However, it remains a difficult task to interpret the control policy learned by the agent. The difficulties mainly come from two aspects: 1) agents in DRL are usually implemented as deep neural networks (DNNs), which are black-box models and lack interpretability; 2) the latent causal mechanism behind the agents’ decisions may vary along the trajectory rather than staying static across time steps. To address these difficulties, in this paper we propose a self-explaining imitation framework that can expose the causal relations among state and action variables behind its decisions. Specifically, a dynamic causal discovery module is designed to extract the causal graph at each time step based on the historical trajectory and the current state, and a causality encoding module is designed to model the interactions among variables with the discovered causal edges. After encoding causality into variable embeddings, a prediction model conducts imitation learning on top of the obtained representations. These three components are trained end-to-end, and the discovered causal edges provide interpretations of the rules captured by the agent. Comprehensive experiments are conducted on a simulation dataset to analyze the framework’s causal discovery capacity, and we further test it on the real-world medical dataset MIMIC-IV. Experimental results demonstrate its potential for providing explanations behind decisions.
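
The sketch below illustrates the three stages with plain numpy; the scoring function, thresholding, and aggregation are illustrative stand-ins for the learned modules, not the paper's architecture.

```python
# (1) score a causal adjacency matrix from the recent trajectory, (2) encode each variable
# using only its discovered parents, (3) predict the action from the encodings.
import numpy as np

rng = np.random.default_rng(0)
n_vars, history_len = 4, 8
W_score = rng.normal(size=(history_len, history_len))   # hypothetical learned parameters
W_pred = rng.normal(size=(n_vars,))

def discover_causal_graph(trajectory):
    """trajectory: (history_len, n_vars) recent history -> hard adjacency (n_vars, n_vars)."""
    scores = trajectory.T @ W_score @ trajectory          # pairwise variable interaction scores
    return (scores > np.median(scores)).astype(float)     # keep the strongest edges

def encode_with_causality(state, adjacency):
    """Each variable's embedding aggregates only the variables pointing to it."""
    return adjacency.T @ state / (adjacency.sum(axis=0) + 1e-8)

trajectory = rng.normal(size=(history_len, n_vars))
state = trajectory[-1]
A = discover_causal_graph(trajectory)                     # discovered edges double as explanations
action = np.tanh(W_pred @ encode_with_causality(state, A))
print("discovered edges:\n", A, "\npredicted action:", action)
```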

Magic-Pipe: Self-optimizing video analytics pipelines

Microservices-based video analytics pipelines routinely use multiple deep convolutional neural networks. We observe that the best allocation of resources to deep learning engines (or microservices) in a pipeline, and the best configuration of parameters for each engine, vary over time, often at a timescale of minutes or even seconds, based on the dynamic content in the video. We leverage these observations to develop Magic-Pipe, a self-optimizing video analytics pipeline that uses AI techniques to periodically self-optimize. First, we propose a new, adaptive resource allocation technique to dynamically balance the resource usage of different microservices based on dynamic video content. Then, we propose an adaptive microservice parameter tuning technique to balance the accuracy and performance of a microservice, also based on video content. Finally, we propose two different approaches to reduce unnecessary computations caused by the unavoidable mismatch of independently designed, reusable deep-learning engines: a deep learning approach that improves feature extractor performance by filtering out inputs for which no features can be extracted, and a low-overhead graph-theoretic approach that minimizes redundant computations across frames. Our evaluation of Magic-Pipe shows that pipelines augmented with self-optimizing capability exhibit application response times that are an order of magnitude better than the original pipelines, while using the same hardware resources and achieving similar high accuracy.
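
As one hypothetical instance of a graph-theoretic redundancy check (an assumption for illustration, not Magic-Pipe's exact method), detections in consecutive frames can be matched by IoU so that cached features of matched objects are reused instead of recomputed:

```python
# Hedged sketch: greedy bipartite matching of detections across consecutive frames by IoU.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def reusable_matches(prev_boxes, curr_boxes, threshold=0.8):
    """Pairs (prev_idx, curr_idx) whose cached features can be reused in the current frame."""
    matches, used = [], set()
    for i, pb in enumerate(prev_boxes):
        best = max(((iou(pb, cb), j) for j, cb in enumerate(curr_boxes) if j not in used),
                   default=(0.0, None))
        if best[0] >= threshold:
            matches.append((i, best[1]))
            used.add(best[1])
    return matches

prev = [(10, 10, 50, 80), (100, 40, 160, 120)]
curr = [(12, 11, 52, 82), (300, 40, 360, 120)]
print(reusable_matches(prev, curr))   # only the first object's features are reused
```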

CamTuner: Reinforcement Learning based System for Camera Parameter Tuning to enhance Analytics

Video analytics systems critically rely on video cameras capturing high-quality video frames to achieve high analytics accuracy. Although modern video cameras often expose tens of configurable parameter settings that can be set by end users, surveillance camera deployments today typically use a fixed set of parameter settings because end users lack the skill or understanding to reconfigure these parameters. In this paper, we first show that in a typical surveillance camera deployment, environmental condition changes can significantly affect the accuracy of analytics units such as person detection, face detection, and face recognition, and we show how such adverse impact can be mitigated by dynamically adjusting camera settings. We then propose CAMTUNER, a framework that can be easily applied to an existing video analytics pipeline (VAP) to enable automatic and dynamic adaptation of complex camera settings to changing environmental conditions, and to autonomously optimize the accuracy of analytics units (AUs) in the VAP. CAMTUNER is based on SARSA reinforcement learning (RL) and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera. CAMTUNER is implemented in a system with AXIS surveillance cameras and several VAPs (with various AUs) that processed day-long customer videos captured at airport entrances. Our evaluations show that CAMTUNER can adapt quickly to changing environments. We compared CAMTUNER with two alternative approaches: one where static camera settings were used, and a strawman approach where camera settings were manually changed every hour (based on human perception of quality). We observed that for the face detection and person detection AUs, CAMTUNER achieves up to 13.8% and 9.2% higher accuracy, respectively, compared to the better of the two alternatives (an average improvement of 8% for both AUs).
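
A hedged sketch of the virtual-camera idea used for offline training: candidate settings are approximated by transforming an already-captured frame and scored by a (stubbed) quality estimator; the specific transforms below are assumptions for illustration.

```python
# Approximate the effect of candidate camera settings on an existing frame so an RL agent
# can explore settings offline without touching the physical camera.
import numpy as np

def virtual_camera(frame, brightness=0, contrast=1.0):
    """frame: HxWx3 uint8 image; returns the frame as it might look under the new settings."""
    out = frame.astype(np.float32) * contrast + brightness
    return np.clip(out, 0, 255).astype(np.uint8)

def quality_estimator(frame):
    """Stand-in for the learned analytics-quality estimator that supplies the RL reward."""
    return float(frame.std())

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
candidates = [{"brightness": b, "contrast": c} for b in (-40, 0, 40) for c in (0.8, 1.0, 1.2)]
rewards = {tuple(s.values()): quality_estimator(virtual_camera(frame, **s)) for s in candidates}
print("best candidate setting (brightness, contrast):", max(rewards, key=rewards.get))
```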

Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover the doctor’s treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or require clinical guidance; 3) only positive trajectories (e.g., survived patients) are considered in current imitation learning models, with negative trajectories (e.g., deceased patients) largely ignored, even though they are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, in this paper we propose an adversarial cooperative imitation learning model, ACIL, to deduce optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancy between the trajectories generated by the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the discriminators are utilized to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
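
The sketch below shows one plausible way the rewards from the two discriminators could be combined into a single policy reward; the exact reward shaping is an assumption, not the paper's formula.

```python
# Illustrative combination of adversarial and cooperative discriminator scores.
import numpy as np

def adversarial_reward(d_adv_score):
    """d_adv_score: probability the (state, action) looks like a positive expert trajectory."""
    return np.log(d_adv_score + 1e-8)          # GAIL-style: higher when indistinguishable from positives

def cooperative_reward(d_coop_score):
    """d_coop_score: probability the (state, action) does NOT look like a negative trajectory."""
    return np.log(d_coop_score + 1e-8)          # higher when far from deceased-patient trajectories

def combined_reward(d_adv_score, d_coop_score, weight=0.5):
    return weight * adversarial_reward(d_adv_score) + (1 - weight) * cooperative_reward(d_coop_score)

# e.g. a policy sample that resembles positives (0.9) and is far from negatives (0.8):
print(combined_reward(0.9, 0.8))
```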

Learning Robust Representations with Graph Denoising Policy Network

Existing representation learning methods based on graph neural networks and their variants rely on the aggregation of neighborhood information, which makes them sensitive to noise in the graph, e.g., erroneous links between nodes or incorrect/missing node features. In this paper, we propose the Graph Denoising Policy Network (GDPNet) to learn robust representations from noisy graph data through reinforcement learning. GDPNet first selects signal neighborhoods for each node, and then aggregates the information from the selected neighborhoods to learn node representations for downstream tasks. Specifically, in the signal neighborhood selection phase, GDPNet optimizes the neighborhood for each target node by formulating the removal of noisy neighborhoods as a Markov decision process and learning a policy with task-specific rewards received from the representation learning phase. In the representation learning phase, GDPNet aggregates features from signal neighbors to generate node representations for downstream tasks, and provides task-specific rewards to the signal neighbor selection phase. These two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust representations for nodes. Experimental results on the node classification task demonstrate the effectiveness of GDPNet, which outperforms state-of-the-art graph representation learning methods on several well-studied datasets.
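
A toy numpy sketch of the two phases, with a linear scoring policy standing in for the learned selection policy (an illustration, not GDPNet's exact architecture):

```python
# (1) a policy scores each neighbor and keeps the "signal" ones, (2) the node representation
# aggregates only the kept neighbors; the downstream task would return a reward to the policy.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 6, 4
features = rng.normal(size=(n_nodes, dim))
neighbors = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5]}          # possibly noisy adjacency
W_policy = rng.normal(size=(2 * dim,))                    # hypothetical learned policy weights

def select_signal_neighbors(node):
    """Keep a neighbor when the policy scores the (node, neighbor) feature pair positively."""
    return [v for v in neighbors.get(node, [])
            if W_policy @ np.concatenate([features[node], features[v]]) > 0]

def represent(node):
    kept = select_signal_neighbors(node)
    if not kept:
        return features[node]
    return np.mean(np.vstack([features[node]] + [features[v] for v in kept]), axis=0)

# The representation feeds a downstream classifier whose accuracy serves as the policy's reward.
print("node 0 representation:", represent(0))
```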

Learning To Simulate

Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
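
A minimal sketch of this outer loop, assuming a Gaussian policy over two simulator parameters and a stub in place of data synthesis and model training (a REINFORCE-style update; all details are illustrative):

```python
# Sample simulator parameters, "train" a model on the synthesized data, and use validation
# accuracy as the reward for updating the parameter policy.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, lr = np.zeros(2), 0.3, 0.05            # Gaussian policy over 2 simulator parameters

def train_and_validate(sim_params):
    """Stub: synthesize data with sim_params, train the task model, return validation accuracy."""
    optimum = np.array([0.5, -0.2])                # pretend these settings yield the best training data
    return float(np.exp(-np.sum((sim_params - optimum) ** 2)))

for iteration in range(200):
    params = mu + sigma * rng.normal(size=mu.shape)   # sample simulation parameters
    reward = train_and_validate(params)               # accuracy of the model trained on synthesized data
    mu += lr * reward * (params - mu) / sigma**2      # REINFORCE gradient for a fixed-sigma Gaussian

print("learned simulator parameters:", mu)
```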