Dale Schuurmans

Prof. Schuurmans has had a diverse career in ML research, with over 160 refereed publications and paper awards from the International Conference on Machine Learning (ICML), the International Joint Conference on Artificial Intelligence (IJCAI), and the AAAI Conference on Artificial Intelligence (AAAI). He currently serves as associate editor-in-chief for the IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) and as an associate editor for the Journal of Artificial Intelligence Research (JAIR). Previously he served as an associate editor for the Journal of Machine Learning Research (JMLR), Machine Learning (MLJ), and Artificial Intelligence (AIJ), and as a program co-chair for ICML-2004, NIPS-2008, and AAAI-2016.

At Google, he is working on a number of research projects with the Brain and Sibyl teams. One project has been investigating a reduction of supervised deep learning to game playing, which has not only revealed surprising connections but has also led to new training methods. The project first establishes a bijection between the Nash equilibria of a certain game and the KKT points of the deep learning problem; an interesting finding has then been that "regret matching", a classical online learning algorithm from the game theory literature, achieves competitive training performance while producing sparser models than current deep learning algorithms.
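
For readers unfamiliar with the term, the sketch below illustrates the classical regret-matching update of Hart and Mas-Colell on a toy two-player zero-sum game (rock-paper-scissors). It is only meant to convey the online rule named above; the NumPy setup, payoff matrix, and function names are illustrative assumptions, and this is not the deep-learning training procedure developed in the project.

```python
import numpy as np

# Row player's payoff for rock-paper-scissors (zero-sum: column payoff is the negative).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def strategy_from_regret(cum_regret):
    """Regret matching: play each action with probability proportional to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))  # no positive regret yet: play uniformly

def regret_matching_selfplay(payoff, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = payoff.shape[0]
    cum_regret = {p: np.zeros(n) for p in ("row", "col")}
    strategy_sum = {p: np.zeros(n) for p in ("row", "col")}
    for _ in range(iters):
        sigma = {p: strategy_from_regret(cum_regret[p]) for p in cum_regret}
        a_row = rng.choice(n, p=sigma["row"])
        a_col = rng.choice(n, p=sigma["col"])
        # Regret of each alternative action vs. the action actually played,
        # holding the opponent's action fixed.
        cum_regret["row"] += payoff[:, a_col] - payoff[a_row, a_col]
        cum_regret["col"] += -payoff[a_row, :] - (-payoff[a_row, a_col])
        for p in sigma:
            strategy_sum[p] += sigma[p]
    # In two-player zero-sum games the *average* strategies approach a Nash equilibrium.
    return {p: strategy_sum[p] / iters for p in strategy_sum}

if __name__ == "__main__":
    avg = regret_matching_selfplay(PAYOFF)
    print("average row strategy:", np.round(avg["row"], 3))  # close to [1/3, 1/3, 1/3]
    print("average col strategy:", np.round(avg["col"], 3))
```

Because regret matching puts probability mass only on actions whose cumulative regret is positive, its iterates are naturally sparse, which loosely echoes the sparsity observation mentioned above.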

Another project has been investigating a new approach to structured output learning that exploits a simple connection between forward and inverse reinforcement learning. The key observation is that these two problems can be expressed as minimizing an identical Bregman divergence but in opposite directions. The connection is not merely theoretical: it allows one to draw a precise relation between tempered log-likelihood and regularized expected reward, revealing that their difference involves a simple variance term. This observation suggests new approaches for exploiting supervised data to efficiently estimate the expected (regularized) reward of a given policy. Experiments on speech and translation data are already showing improvements in test evaluations.
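
To make the "same divergence, opposite directions" observation concrete, here is a generic sketch of the standard definitions involved; the symbols p_target and p_theta are placeholders, and this is not claimed to be the project's exact objective.

```latex
% Illustrative sketch of the generic definitions, not the project's exact formulation.
% For a strictly convex potential F, the Bregman divergence is
\[
  D_F(p \,\|\, q) \;=\; F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle .
\]
% Choosing F(p) = \sum_y p(y) \log p(y) (negative Shannon entropy) makes D_F the
% KL divergence, so the two "directions" are
\[
  \min_\theta \; D_F\bigl(p_{\mathrm{target}} \,\big\|\, p_\theta\bigr)
  \quad \text{(likelihood-style objective)}
  \qquad \text{vs.} \qquad
  \min_\theta \; D_F\bigl(p_\theta \,\big\|\, p_{\mathrm{target}}\bigr)
  \quad \text{(regularized expected-reward-style objective)} .
\]
```

In this generic form, the forward direction recovers a maximum-likelihood-style fit while the reverse direction has the shape of entropy-regularized reward maximization, which is the kind of correspondence the paragraph above exploits.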