Device Placement Optimization with Reinforcement Learning


The past few years have seen much success in applying neural networks to many practical problems. This success has been accompanied by growth in the size and computational requirements of training and inference with neural networks. A common approach to addressing these requirements is to use a heterogeneous distributed environment with a mix of hardware devices such as CPUs and GPUs. Importantly, the decision of how to place parts of a neural model on devices is most often made by human experts relying on heuristics. In this paper, we propose a method which learns to optimize device placement. Key to our method is the use of a recurrent neural network to predict a set of device placements for a target neural computation graph. The execution time of the graph under the predicted placements is then used as the reward signal to optimize the parameters of the recurrent neural network. Our main result is that on Inception for ImageNet classification, and on LSTM models for language modeling and neural machine translation, our model finds non-trivial device placements that significantly outperform handcrafted heuristics and traditional algorithmic methods.
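To make the learning loop concrete, the following is a minimal sketch of reward-driven placement search on a toy problem. It is not the implementation used in this paper. It makes several simplifying assumptions: a table of independent per-operation softmax logits stands in for the recurrent neural network, a hand-written cost model (`simulated_runtime`) stands in for measuring real execution time, and the policy is updated with a plain REINFORCE rule using a moving-average baseline. The operation costs, device speeds, communication cost, and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy problem: 6 operations in a chain, 2 identical devices.
OP_COST = [4.0, 4.0, 1.0, 1.0, 4.0, 4.0]   # compute cost per operation
DEVICE_SPEED = [1.0, 1.0]                   # relative speed per device
COMM_COST = 0.5                             # cost per cross-device edge


def simulated_runtime(placement):
    """Assumed cost model: busiest device's load plus transfer overhead."""
    load = [0.0] * len(DEVICE_SPEED)
    for op, dev in enumerate(placement):
        load[dev] += OP_COST[op] / DEVICE_SPEED[dev]
    # Count chain edges whose endpoints sit on different devices.
    transfers = sum(1 for a, b in zip(placement, placement[1:]) if a != b)
    return max(load) + COMM_COST * transfers


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE over a per-operation softmax policy (stand-in for an RNN)."""
    rng = random.Random(seed)
    n_ops, n_dev = len(OP_COST), len(DEVICE_SPEED)
    logits = [[0.0] * n_dev for _ in range(n_ops)]
    baseline = None
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample one device per operation from the current policy.
        placement = [rng.choices(range(n_dev), weights=p)[0] for p in probs]
        # Reward is the negative runtime: faster placements score higher.
        reward = -simulated_runtime(placement)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log-softmax w.r.t. logits is one_hot(dev) - probs.
        for op, dev in enumerate(placement):
            for d in range(n_dev):
                grad = (1.0 if d == dev else 0.0) - probs[op][d]
                logits[op][d] += lr * advantage * grad
    # Return the most likely device for each operation.
    return [max(range(n_dev), key=lambda d: row[d]) for row in logits]


if __name__ == "__main__":
    best = train()
    print(best, simulated_runtime(best))
```

In this toy cost model, placing every operation on one device costs 18.0, while splitting the chain in the middle costs 9.5, so the policy has a clear signal to learn from. A real system would replace `simulated_runtime` with a measurement of the graph's actual execution time under the sampled placement.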