Policy adaptation is the process of adjusting or fine-tuning a decision-making policy, typically one learned by reinforcement learning, so that it performs well in a specific target environment or task. A policy is the mapping from observations to actions that the system follows when making decisions. Adaptation is needed when a policy trained in a source domain must be deployed in a different target domain, where the dynamics, observations, or rewards may differ. The goal is to recover or improve the policy's performance in the new context.
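
To make the source-to-target idea concrete, here is a minimal sketch on a toy three-armed bandit. It assumes a softmax policy updated with the exact policy gradient of expected reward. The reward vectors, learning rate, step counts, and function names are all illustrative assumptions, not taken from any particular system. The policy is first trained on the source rewards, then adapted by continuing the same updates on the target rewards.

```python
import math

def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, rewards, lr):
    """One exact gradient-ascent step on expected reward for a softmax policy."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # For a softmax policy: d E[r] / d logit_i = p_i * (r_i - E[r])
    return [l + lr * p * (r - expected) for l, p, r in zip(logits, probs, rewards)]

def train(logits, rewards, steps, lr=0.5):
    """Run `steps` policy-gradient updates starting from `logits`."""
    for _ in range(steps):
        logits = policy_gradient_step(logits, rewards, lr)
    return logits

# Hypothetical domains: arm 0 is best in the source, arm 2 is best in the target.
source_rewards = [1.0, 0.2, 0.0]
target_rewards = [0.1, 0.3, 1.0]

# Train in the source domain, then adapt the same policy in the target domain.
source_logits = train([0.0, 0.0, 0.0], source_rewards, steps=200)
adapted_logits = train(source_logits, target_rewards, steps=2000)

source_probs = softmax(source_logits)
adapted_probs = softmax(adapted_logits)
print("source policy: ", [round(p, 3) for p in source_probs])
print("adapted policy:", [round(p, 3) for p in adapted_probs])
```

In this sketch the adaptation phase needs more steps than the initial training because the source-trained policy starts out nearly deterministic on the wrong arm. This is one reason adaptation is often treated as a problem in its own right rather than as ordinary continued training.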


Learning Transferable Reward for Query Object Localization with Policy Adaptation

We propose a reinforcement learning-based approach to query object localization, for which an agent is trained to localize objects of interest specified by a small exemplary set. We learn a transferable reward signal formulated using the exemplary set by ordinal metric learning. Our proposed method enables test-time policy adaptation to new environments where the reward signals are not readily available and outperforms fine-tuning approaches that are limited to annotated images. In addition, the transferable reward allows repurposing the trained agent from one specific class to another class. Experiments on corrupted MNIST, CU-Birds, and COCO datasets demonstrate the effectiveness of our approach.
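
As a rough illustration of how a reward could be formulated from an exemplary set, the sketch below assumes that image regions are already embedded as vectors, that the exemplary set is summarized by its mean embedding (a prototype), and that the reward is the sign of the change in distance to that prototype. The ordinal loss is a generic pairwise hinge loss. The abstract does not specify these details, so every function name, the margin value, and the toy embeddings are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prototype(exemplar_embeddings):
    """Mean embedding of the exemplary set (an assumed way to summarize it)."""
    n = len(exemplar_embeddings)
    return [sum(col) / n for col in zip(*exemplar_embeddings)]

def ordinal_margin_loss(proto, emb_better, emb_worse, margin=0.1):
    """Pairwise hinge loss: zero once the better-ranked box is closer to the
    prototype than the worse-ranked box by at least `margin`."""
    return max(0.0, distance(proto, emb_better) - distance(proto, emb_worse) + margin)

def transferable_reward(proto, emb_before, emb_after):
    """+1 if the agent's move brought the box embedding closer to the prototype,
    -1 otherwise. No box annotation is needed, only the exemplary set."""
    return 1.0 if distance(proto, emb_after) < distance(proto, emb_before) else -1.0

# Toy 2-D embeddings (hypothetical values).
exemplars = [[1.0, 0.0], [0.8, 0.2]]
proto = prototype(exemplars)
emb_near = [0.7, 0.2]   # a box that overlaps the object well
emb_far = [0.0, 1.0]    # a box that mostly misses the object

loss_ordered = ordinal_margin_loss(proto, emb_near, emb_far)   # ordering respected
loss_violated = ordinal_margin_loss(proto, emb_far, emb_near)  # ordering violated
reward_good = transferable_reward(proto, emb_far, emb_near)
reward_bad = transferable_reward(proto, emb_near, emb_far)
print(loss_ordered, round(loss_violated, 3), reward_good, reward_bad)
```

Because a reward of this kind depends only on embeddings and the exemplary set, it can in principle be computed on unannotated test images, which is what makes test-time policy updates possible. Swapping in an exemplary set from a different class changes the prototype and therefore the objective the agent pursues.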