Interactive reinforcement learning for task-oriented dialogue management


Dialogue management is the component of a dialogue system that determines the optimal action for the system to take at each turn. An important consideration for dialogue managers is the ability to adapt to new user behaviors unseen during training. In this paper, we investigate policy gradient based methods for interactive reinforcement learning where the agent receives action-specific feedback from the user and incorporates this feedback into its policy. We show that using the feedback to directly shape the policy enables a dialogue manager to learn new interactions faster compared to interpreting the feedback as a reward value.