FedSkill: Privacy Preserved Interpretable Skill Learning via Imitation
Imitation learning, which replicates experts’ skills from their demonstrations, has shown significant success in various decision-making tasks. However, two critical challenges still hinder the deployment of imitation learning techniques in real-world applications. First, existing methods lack the intrinsic interpretability needed to explain the rationale behind a learned skill, which makes the learned policy untrustworthy. Second, because expert demonstrations from each end user (client) are scarce, learning a policy across different data silos is necessary but challenging in privacy-sensitive applications such as finance and healthcare. To this end, we present a privacy-preserved interpretable skill learning framework (FedSkill) that enables global policy learning to incorporate data from different sources and provides explainable interpretations to each local user without violating privacy and data sovereignty. Specifically, our proposed interpretable skill learning model captures the varying patterns in the trajectories of expert demonstrations and extracts prototypical information as skills that provide implicit guidance for policy learning and explicit explanations in the reasoning process. Moreover, we design a novel aggregation mechanism, coupled with the base skill learning model, to preserve global information utilization and maintain local interpretability under the federated framework. Thorough experiments on three datasets and empirical studies demonstrate that FedSkill not only outperforms state-of-the-art imitation learning methods but also exhibits good interpretability under a federated setting. FedSkill is the first attempt to bridge the gaps among federated learning, interpretable machine learning, and imitation learning.
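The FedSkill abstract does not spell out its aggregation mechanism, so the sketch below is not that mechanism. It only illustrates the general federated idea the abstract relies on: clients share model parameters rather than raw demonstrations, and a server combines them. It uses plain size-weighted averaging (the standard FedAvg rule); the function name, parameter layout, and weighting scheme are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical parameter layout: a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local dataset size.

    Only parameters leave the clients; the demonstrations themselves stay local.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one dataset size per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(params[name][i] * size for params, size in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(federated_average(clients, client_sizes=[1, 3]))
```

A method that also needs to keep client-specific skills interpretable would presumably aggregate only part of the model and keep the rest local, but which part is not stated in the abstract.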
Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations
Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations, which are difficult and expensive to collect. In practice, a trade-off usually has to be made between demonstration quality and quantity. To address this problem, we consider imitation from sub-optimal demonstrations, using both a small clean demonstration set and a large noisy set. Some pioneering works have been proposed, but they suffer from several limitations, e.g., they assume that a demonstration has the same optimality at every time step, and they fail to provide any interpretation of the knowledge learned from the noisy set. To address these problems, we propose SDIL, which evaluates and imitates at the sub-demonstration level and encodes action primitives of varying quality into different skills. Concretely, SDIL consists of a high-level controller that discovers skills and a skill-conditioned module that captures action-taking policies. It is trained with a two-phase pipeline: skills are first discovered using all demonstrations, and the controller is then adapted to the clean set only. A mutual-information-based regularization and a dynamic sub-demonstration optimality estimator are designed to promote disentanglement in the skill space. Extensive experiments on two gym environments and a real-world healthcare dataset demonstrate the superiority of SDIL in learning from sub-optimal demonstrations, and an examination of the learned skills shows its improved interpretability.
Dynamic Causal Discovery in Imitation Learning
Using deep reinforcement learning (DRL) to recover expert policies via imitation has been found to be promising in a wide range of applications. However, it remains difficult to interpret the control policy learned by the agent. The difficulty comes mainly from two aspects: 1) agents in DRL are usually implemented as deep neural networks (DNNs), which are black-box models and lack interpretability; 2) the latent causal mechanism behind an agent’s decisions may vary along the trajectory rather than staying static across time steps. To address these difficulties, we propose a self-explaining imitation framework that can expose the causal relations among state and action variables behind its decisions. Specifically, a dynamic causal discovery module is designed to extract the causal graph based on the historical trajectory and the current states at each time step, and a causality encoding module is designed to model the interactions among variables along the discovered causal edges. After causality is encoded into the variable embeddings, a prediction model conducts imitation learning on top of the obtained representations. These three components are trained end-to-end, and the discovered causal edges provide interpretations of the rules captured by the agent. Comprehensive experiments are conducted on a simulation dataset to analyze the framework’s causal discovery capacity, and we further test it on MIMIC-IV, a real-world medical dataset. Experimental results demonstrate its potential to provide explanations behind decisions.
Hierarchical Imitation Learning with Contextual Bandits for Dynamic Treatment Regimes
Imitation learning has proven effective in mimicking experts’ behaviors from their demonstrations without access to explicit reward signals. Meanwhile, complex tasks, e.g., dynamic treatment regimes for patients with comorbidities, often show significant variability in expert demonstrations that span multiple sub-tasks. In these cases, it can be difficult for a single flat policy to handle tasks with hierarchical structure. In this paper, we propose a hierarchical imitation learning model, HIL, that jointly learns latent high-level policies and sub-policies (for individual sub-tasks) from expert demonstrations without prior knowledge. First, HIL learns sub-policies by imitating expert trajectories, using sub-task switching guidance from the high-level policies. Second, HIL collects feedback from its sub-policies to optimize the high-level policy, which is modeled as a contextual multi-armed bandit that selects the best sub-policy at each time step based on contextual information derived from the demonstrations. Compared with state-of-the-art baselines on real-world medical data, HIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting the hierarchical structure in expert demonstrations.
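The high-level step described above, in which a contextual bandit picks one sub-policy per time step and is updated from the feedback it receives, can be sketched as follows. This is a minimal illustration and not the HIL model: the class name, the epsilon-greedy rule, the per-arm linear reward estimate, and the learning rate are all assumptions, since the abstract does not say which bandit algorithm is used.

```python
import random
from typing import List, Sequence


class EpsilonGreedyContextualBandit:
    """Hypothetical high-level selector: one linear reward estimate per sub-policy (arm)."""

    def __init__(self, n_arms: int, context_dim: int,
                 epsilon: float = 0.1, lr: float = 0.1, seed: int = 0) -> None:
        self.weights: List[List[float]] = [[0.0] * context_dim for _ in range(n_arms)]
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def predict(self, arm: int, context: Sequence[float]) -> float:
        """Estimated reward of running sub-policy `arm` in this context."""
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def select(self, context: Sequence[float]) -> int:
        """Explore with probability epsilon, otherwise pick the best-scoring sub-policy."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        scores = [self.predict(arm, context) for arm in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, context: Sequence[float], reward: float) -> None:
        """One gradient step on the squared error of the chosen arm's estimate."""
        error = reward - self.predict(arm, context)
        self.weights[arm] = [w + self.lr * error * x
                             for w, x in zip(self.weights[arm], context)]


if __name__ == "__main__":
    bandit = EpsilonGreedyContextualBandit(n_arms=2, context_dim=2, epsilon=0.0)
    bandit.update(arm=1, context=[1.0, 0.0], reward=1.0)
    print(bandit.select([1.0, 0.0]))
```

In a full hierarchical setup, the reward passed to `update` would come from how well the chosen sub-policy imitated the expert at that step; here it is simply supplied by the caller.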
Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes
Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover doctors’ treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or need clinical guidance; 3) current imitation learning models consider only positive trajectories (e.g., surviving patients) and largely ignore negative trajectories (e.g., deceased patients), which are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, we propose an adversarial cooperative imitation learning model, ACIL, to deduce optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancies between the trajectories generated by the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the discriminators are used to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
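The abstract says the policy is refined with reward signals from both discriminators but does not give the formula. One plausible way to combine them, shown purely as an assumption and not as the ACIL reward, is to reward state-action pairs that the adversarial discriminator scores as resembling positive trajectories and that the cooperative discriminator scores as unlike negative trajectories. The function name, argument names, and the log-sum form below are hypothetical.

```python
import math


def combined_reward(p_positive_like: float, p_negative_like: float, eps: float = 1e-8) -> float:
    """Assumed reward: log p_positive_like + log(1 - p_negative_like).

    p_positive_like: adversarial discriminator's probability that a generated
        state-action pair comes from a positive trajectory.
    p_negative_like: cooperative discriminator's probability that the pair
        comes from a negative trajectory.
    eps keeps the logarithms finite at the boundaries 0 and 1.
    """
    for p in (p_positive_like, p_negative_like):
        if not 0.0 <= p <= 1.0:
            raise ValueError("discriminator outputs must be probabilities in [0, 1]")
    return math.log(p_positive_like + eps) + math.log(1.0 - p_negative_like + eps)


if __name__ == "__main__":
    # A pair that looks positive and unlike negative examples scores higher
    # than one that resembles negative examples.
    print(combined_reward(0.9, 0.1) > combined_reward(0.9, 0.8))
```

Under this assumed form the reward is highest when a generated pair looks like a positive trajectory and unlike a negative one, which matches the stated goal of mimicking the former while differing from the latter.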