John Canny

John Canny is a Professor in computer science at UC Berkeley. He received his Ph.D. and Masters degrees from MIT in the 1980s. He is best known for having the same last name as a famous edge detector. His Ph.D. on roadmaps received the ACM Doctoral Dissertation Award, and helped garner a Foundations of Computer Science (FOCS) Machtey Award.

While he has worked on a prolific amount of things, some of his favorite past projects include: NP-completeness of the shortest path problem, a formula for efficiently reducing sparse polynomial systems to eigenvalue problems, a universal planar manipulator, collaborative filtering with privacy, and learning games on cell phones.

In the last few years he has been exploring roofline design for machine learning: establishing hardware limits from machines and networks, and designing algorithms to achieve them. BIDMach is a rooflined machine learning (and recently deep learning) toolkit with a focus on sparse data, and implementation on commodity hardware.

At Google he is working on improving inference and representations for deep learning. This has three threads right now:

  1. Deploying sparse allreduce and anytime updates on Tensorflow. This leverages Kylix which is a hierarchical, fault-tolerant allreduce protocol. Kylix can be run synchronously, but also supports graded, out-of-sequence model updates for lowest possible latency.
  2. MCMC updates for distributed model optimization. Most distributed deep learning systems try to tie the state of individual machines together. But a variety of evidence (the most relevant is Elastic Averaging SGD from Facebook) suggests that allowing parameter particles some autonomy for local search does better. This thread is based on a fast and simple Metropolis-Hastings test, and proposal distributions which combine exploration and exploitation.
  3. Sparse layer representations. For feature sets that follow power-law distributions, power-law shaped layers can better match model dof to observations. This leads to better performance on regression tasks, and is being evaluated on sequence-to-sequence models. An open question is whether power-law features have value deep inside the network. We are building them to find out.