Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning


As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), this paper introduces a new RL problem in which the agent must learn to execute sequences of instructions after acquiring useful skills that solve subtasks. In this problem, we consider two types of generalization: to previously unseen instructions and to longer sequences of instructions. For generalization to unseen instructions, we propose a new analogy-making objective that encourages a neural network to learn correspondences between similar subtasks. For generalization to longer sequences of instructions, we present a hierarchical deep RL architecture in which a meta controller learns to use the acquired skills while executing the instructions. To deal with delayed reward, we propose a new neural architecture for the meta controller that learns when to update the subtask, which makes learning more stable. Experimental results on a stochastic 3D visual domain show that analogy-making can be successfully applied to various generalization scenarios, and that our hierarchical architecture generalizes well to longer instructions as well as to unseen instructions.
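To make the idea of a meta controller that "learns when to update the subtask" more concrete, here is a minimal sketch of a gated subtask update. It is an illustration under stated assumptions, not the paper's architecture: the linear gate (`w_gate`, `b_gate`), the linear subtask scorer (`W_subtask`), the fixed `threshold`, and the function name `gated_subtask_update` are all hypothetical. Greedy thresholding replaces whatever stochastic decision a trained agent would make. The point it shows is that when the gate stays closed, the controller keeps executing its previous subtask instead of re-deciding at every step.

```python
import numpy as np


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_subtask_update(features, prev_subtask, w_gate, b_gate, W_subtask, threshold=0.5):
    """Hypothetical gated meta-controller step (assumed linear gate and scorer).

    Returns (subtask_index, updated_flag).
    """
    # Probability of committing to a new subtask at this step (hypothetical linear gate).
    p_update = sigmoid(float(features @ w_gate) + b_gate)
    if p_update > threshold:
        # Gate open: re-select the subtask greedily from hypothetical linear scores.
        scores = W_subtask @ features
        return int(np.argmax(scores)), True
    # Gate closed: keep executing the previously chosen subtask unchanged.
    return prev_subtask, False


if __name__ == "__main__":
    features = np.array([1.0, 0.5])
    # Three hypothetical subtasks; their scores are 0.2, 0.95 and 0.45.
    W_subtask = np.array([[0.1, 0.2], [0.9, 0.1], [0.3, 0.3]])
    print(gated_subtask_update(features, 2, np.zeros(2), 5.0, W_subtask))   # gate open
    print(gated_subtask_update(features, 2, np.zeros(2), -5.0, W_subtask))  # gate closed
```

With a strongly positive gate bias the controller switches to subtask 1, the highest-scoring one. With a strongly negative bias it keeps subtask 2. Making the gate a learned quantity, rather than updating every step, is the design choice the abstract credits with more stable learning under delayed reward.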