Dynamic Causal Discovery in Imitation Learning

Publication Date: 12/14/2021

Event: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice – A NeurIPS 2021 Workshop

Reference: pp. 1-11, 2021

Authors: Tianxiang Zhao, Pennsylvania State University; Wenchao Yu, NEC Laboratories America, Inc.; Lu Wang, East China Normal University; Suhang Wang, Pennsylvania State University; Wei Cheng, NEC Laboratories America, Inc.; Xiang Zhang, Pennsylvania State University; Yuncong Chen, NEC Laboratories America, Inc.; Xuchao Zhang, NEC Laboratories America, Inc.; Haifeng Chen, NEC Laboratories America, Inc.

Abstract: Using deep reinforcement learning (DRL) to recover expert policies via imitation has shown promise in a wide range of applications. However, interpreting the control policy learned by the agent remains difficult, for two main reasons: 1) agents in DRL are usually implemented as deep neural networks (DNNs), which are black-box models that lack interpretability; and 2) the latent causal mechanism behind an agent's decisions may vary along the trajectory rather than remaining static across time steps. To address these difficulties, in this paper we propose a self-explaining imitation framework that can expose the causal relations among state and action variables behind its decisions. Specifically, a dynamic causal discovery module extracts a causal graph at each time step based on the historical trajectory and the current state, and a causality encoding module models the interactions among variables along the discovered causal edges. After causality is encoded into variable embeddings, a prediction model performs imitation learning on top of the obtained representations. These three components are trained end-to-end, and the discovered causal edges provide interpretations of the rules captured by the agent. Comprehensive experiments on a simulation dataset analyze the framework's causal discovery capacity, and we further test it on the real-world medical dataset MIMIC-IV. Experimental results demonstrate its potential to provide explanations for its decisions.
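To make the pipeline concrete, the following is a minimal PyTorch sketch of the three-module design the abstract describes: per-step causal graph discovery from the trajectory history, message passing along the discovered edges, and a prediction head trained end-to-end by behavioral cloning. All module names, layer sizes, and the soft-adjacency parameterization here are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the framework described in the abstract.
# All design details (GRU history encoder, sigmoid soft adjacency,
# single message-passing round) are assumptions for illustration.
import torch
import torch.nn as nn

class DynamicCausalDiscovery(nn.Module):
    """Infer a per-step soft causal graph over state variables from the
    historical trajectory and the current state (hypothetical design)."""
    def __init__(self, n_vars, hidden):
        super().__init__()
        self.rnn = nn.GRU(n_vars, hidden, batch_first=True)
        self.edge_scorer = nn.Linear(hidden + n_vars, n_vars * n_vars)

    def forward(self, history, state):
        # history: (B, T, n_vars); state: (B, n_vars)
        _, h = self.rnn(history)                      # (1, B, hidden)
        ctx = torch.cat([h.squeeze(0), state], dim=-1)
        n = state.size(-1)
        logits = self.edge_scorer(ctx)
        return torch.sigmoid(logits).view(-1, n, n)   # soft adjacency (B, n, n)

class CausalityEncoding(nn.Module):
    """One round of message passing along the discovered causal edges."""
    def __init__(self, emb):
        super().__init__()
        self.embed = nn.Linear(1, emb)                # per-variable embedding
        self.msg = nn.Linear(emb, emb)

    def forward(self, state, adj):
        x = self.embed(state.unsqueeze(-1))           # (B, n, emb)
        return x + torch.relu(adj @ self.msg(x))      # aggregate parent messages

class ImitationPolicy(nn.Module):
    """Discovery + encoding + prediction, trained jointly."""
    def __init__(self, n_vars, n_actions, hidden=64, emb=16):
        super().__init__()
        self.discovery = DynamicCausalDiscovery(n_vars, hidden)
        self.encoder = CausalityEncoding(emb)
        self.head = nn.Linear(n_vars * emb, n_actions)

    def forward(self, history, state):
        adj = self.discovery(history, state)          # interpretable causal edges
        z = self.encoder(state, adj)
        return self.head(z.flatten(1)), adj

# Behavioral-cloning step on random tensors (shapes only).
policy = ImitationPolicy(n_vars=8, n_actions=4)
hist = torch.randn(32, 10, 8)                         # batch of trajectories
s = torch.randn(32, 8)                                # current states
a = torch.randint(0, 4, (32,))                        # expert actions
logits, adj = policy(hist, s)
loss = nn.functional.cross_entropy(logits, a)
loss.backward()                                       # all three modules train end-to-end

Because the adjacency matrix is produced anew at every time step, inspecting it at inference time is what would yield the per-decision causal explanations the abstract refers to.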

Publication Link: https://neurips.cc/virtual/2021/33878