Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks


Gated recurrent neural networks have become the default choice for modeling sequence data in various domains. The underlying mechanism that enables such remarkable performance is, however, not well understood. We aim to demystify the difference in trainability between vanilla and gated RNNs. We introduce a new gated variant of RNNs, the minimal recurrent neural network (minimalRNN). Its simple update rule enables us to analyze signal propagation using mean field theory and random matrix theory. We develop a closed-form critical initialization scheme that achieves dynamical isometry in both vanilla and minimal RNNs, which results in a significant improvement in training RNNs. In contrast to the narrow region of good random initializations in the vanillaRNN, the minimalRNN enjoys a much broader range of good initializations (some easily achievable by adjusting only the bias term), which explains the better trainability of gated RNNs. We demonstrate that the minimalRNN achieves performance comparable to that of its more complex counterparts, such as LSTMs and GRUs, on a language modeling task.