
Under the hood of the Pixel 2: How AI is supercharging hardware

We all know the feeling: there’s an amazing song on the radio, and you’re frantic to make sure you can find it when you get home. In the past, you might have written down a few hasty lyrics to look it up later. But today, smarter technology on mobile phones makes it easier than ever to find the information you need, right when you need it.

The Now Playing feature on the Pixel 2 uses machine learning to recognize when music is playing and display the song and artist names on your lock screen, so that you no longer need to scramble to find the name of that song. The key is a miniaturized neural network that runs on a tiny chip in the Pixel 2. This system is trained to recognize the audio fingerprints of over 70,000 songs, and it’s updated weekly with the latest from Google Play Music. Most importantly, this audio recognition happens on the device, so when a song comes on, your phone can compare just a few seconds of music to its internal database and quickly recall the song details without sending anything to the cloud. This means that Now Playing works fast, is private to you, and keeps power consumption to a minimum.
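To make that on-device lookup concrete, here is a minimal sketch of the matching step in Python, with hypothetical names and shapes: a small on-device model (not shown) turns a few seconds of audio into a compact fingerprint, and the phone simply finds the most similar entry in its local database.

```python
import numpy as np

def best_match(query_fp: np.ndarray, database: dict[str, np.ndarray]) -> str:
    """Find the song whose stored fingerprint is closest to the query.

    query_fp: unit-norm embedding of a few seconds of ambient audio,
        produced by a small on-device model (not shown here).
    database: song title -> precomputed unit-norm fingerprint, refreshed
        periodically but always matched locally on the phone.
    """
    # Cosine similarity reduces to a dot product for unit-norm vectors;
    # nothing about the audio ever leaves the device.
    return max(database, key=lambda title: float(database[title] @ query_fp))
```

A real matcher would also need a "no match" threshold so the lock screen stays quiet when nothing recognizable is playing; that detail is omitted here.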

Now Playing is a complex feature made simple by bringing the right hardware, software, and AI together in a single product. And that’s an approach that teams across Google are using to create smarter features, more helpful products, and entirely new paradigms for technology. It’s a unique moment where the combination of AI, software, and hardware can help us design more helpful and delightful experiences for users.

The making of a Pixel-perfect portrait

When we built the Pixel 2, we wanted to create features that help people focus on the things that matter, like moments and memories. Now Playing is one of many features on Pixel 2 made possible by pairing AI with the right hardware and software. Another is Portrait Mode, a setting that allows people to take professional-looking, shallow depth-of-field images on a mobile phone without any manual editing. The result is a crisp, focused subject in the foreground set against a subtly blurred background: the perfect backdrop for portraits of your favorite people.

Historically, Portrait Mode-style images required an SLR camera with a large lens, a small aperture and a steady photographer to capture the subject in focus. But today, roughly 85% of all photos are taken on mobile, which offers an interesting set of challenges: a small lens, a fixed aperture and a photographer who might not be so steady. To recreate this effect, research and hardware teams at Google worked hand-in-hand to develop a Portrait Mode process that’s almost as striking as the photos it takes.

AI is central to creating the Portrait Mode effect once an image is captured. The Pixel 2 contains a specialized neural network that researchers trained on almost a million images to recognize what’s important in a photo. “Instead of just treating each pixel as a pixel, we try to understand what it is,” explains Portrait Mode lead, Yael Pritch Knaan. By using machine learning, the device can make predictions about what should stay sharp in the photo and create a mask around it.
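As an illustration only, the masking step can be pictured something like this in Python; the probability map and threshold are stand-ins, not the model or post-processing the Pixel actually ships.

```python
import numpy as np

def keep_sharp_mask(subject_prob: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Turn per-pixel 'this belongs to the subject' probabilities into a
    binary mask of what should stay in focus.

    subject_prob: H x W array of values in [0, 1] from a hypothetical
        segmentation network; 0.6 is an arbitrary illustrative threshold.
    """
    return subject_prob >= threshold
```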

To create the Portrait Mode effect, the area outside of that mask needs to be blurred. But to make that blur realistic, we need to know how far each object in the image is from the camera. This is where the hardware really shines. Many phones do this by placing two cameras next to each other, but the Pixel 2 team was determined to do it with a single camera, using a sensor feature called dense dual-pixel autofocus. Pixel Camera Product Manager, Isaac Reynolds, explains, “When we picked the hardware, we knew we were getting a sensor where every pixel is split into two sub-pixels. This architecture lets us take two pictures out of the same camera lens at the same time: one through the left side of the lens and one through the right. This tiny difference in perspective gives the camera depth perception just like your own two eyes, and it generates a depth map of objects in the image from that.”
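A rough way to picture how two barely shifted views become depth is the brute-force disparity search below. This is a conceptual Python sketch under simplified assumptions, not the Pixel’s pipeline, which uses far more sophisticated matching and smoothing.

```python
import numpy as np

def relative_depth(left: np.ndarray, right: np.ndarray, max_shift: int = 4) -> np.ndarray:
    """Estimate a relative depth map from the two sub-pixel views.

    left, right: H x W grayscale images seen through the left and right
        halves of the lens. For each pixel we test a handful of small
        horizontal shifts and keep the one that lines the views up best.
    """
    shifts = np.arange(-max_shift, max_shift + 1)
    # Matching cost of explaining each pixel with each candidate shift.
    costs = np.stack([np.abs(left - np.roll(right, s, axis=1)) for s in shifts])
    best = shifts[np.argmin(costs, axis=0)]
    # Points on the plane of focus line up at zero shift; larger |shift|
    # means the point sits farther from that plane, in front or behind.
    return np.abs(best).astype(float)
```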

This allows the software to provide the finishing touch: a realistic blur. Using the depth map, the Portrait Mode software renders everything outside the mask with the beautifully blurry background effect known as bokeh. The result is an image that rivals professional quality with just a quick tap. And as big as this breakthrough was in computational photography, it was an even bigger breakthrough for selfies. Now, the front-facing camera can capture that professional-quality shot from anywhere, with a quick point, pose, and shoot.
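Putting the mask and the depth map together, the final blend can be pictured roughly as below. A Gaussian blur stands in for true disc-shaped bokeh, and using a single blur strength for the whole background is a simplification; this is a sketch of the idea, not the shipped renderer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def portrait_composite(image: np.ndarray, mask: np.ndarray,
                       depth: np.ndarray, max_blur: float = 8.0) -> np.ndarray:
    """Keep the masked subject sharp and blur the rest of the frame,
    scaling the blur by how far the background sits from the focal plane.

    image: H x W x 3 photo, mask: H x W booleans marking the subject,
    depth: H x W relative depth offsets (0 = on the plane of focus).
    """
    background = ~mask
    background_depth = np.median(depth[background]) if background.any() else 0.0
    sigma = max_blur * background_depth / (depth.max() + 1e-6)
    blurred = np.stack([gaussian_filter(image[..., c], sigma=sigma)
                        for c in range(image.shape[-1])], axis=-1)
    # Sharp pixels inside the mask, blurred pixels everywhere else.
    return np.where(mask[..., None], image, blurred)
```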

A new paradigm for learning

Machine learning is shaping more than just products and features on mobile devices. New devices like smartphones are also pushing us to change the way that we design machine learning. One example is Federated Learning, a new kind of machine learning approach that runs directly on mobile devices. Phones, tablets, and watches are as powerful as the supercomputers of decades past, but they also have limited battery and connectivity. Machine learning can make these devices smarter by learning from their surroundings and usage patterns, but it must also work within those constraints, including keeping users’ data private. That’s where Federated Learning comes in.

The idea behind Federated Learning is that everyone’s smartphones could get even smarter if we enabled them to learn together. Machine learning traditionally requires training data to be stored at a central location, like a datacenter. Federated Learning decentralizes the learning process, enabling devices to collaboratively learn improvements without sharing their data. It’s sort of like each device submits an anonymous survey, and Google produces an update that reflects everyone’s feedback.

Here’s how it works: a Federated Learning-based system starts with a central machine learning model, which is distributed to a fleet of devices (like the Pixel phones). Every device personalizes the model locally, learning from the interactions and patterns of its user, and packages its learnings into anonymized “updates.” Thousands of these updates are securely averaged, so no individual’s information is disclosed and the central machine learning model gets better. This means everyone’s phone gets smarter without sharing any personal data.
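A toy round of this process, sketched in Python with a two-parameter linear model and synthetic “device” data, looks like the following. The single gradient step, the learning rate, and the secure-aggregation step reduced to a plain average are all simplifications for illustration.

```python
import numpy as np

def local_update(global_w, x, y, lr=0.1):
    """One phone's contribution: take a training step on data that never
    leaves the device (here, a toy linear model) and return only the
    resulting change in weights, not the data itself."""
    grad = x.T @ (x @ global_w - y) / len(y)
    return -lr * grad  # the anonymized "update"

def federated_round(global_w, device_datasets, lr=0.1):
    """Combine many updates into one improvement to the shared model.
    In the real system the averaging happens under a secure aggregation
    protocol, so no individual update is ever inspected on its own."""
    updates = [local_update(global_w, x, y, lr) for x, y in device_datasets]
    return global_w + np.mean(updates, axis=0)

# Toy usage: three "devices", each holding its own private data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    x = rng.standard_normal((20, 2))
    devices.append((x, x @ true_w + 0.1 * rng.standard_normal(20)))

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, devices)  # w converges toward true_w
```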

Federated Learning works through collaborative learning, using de-identified, aggregated information from many devices to improve machine learning models.

We’re already using Federated Learning to improve several Google products. The first- and second-generation Pixel phones, for example, use Federated Learning to surface more accurate, useful settings search results so that people can find what they’re looking for faster. The Pixel has thousands of settings to adjust, from font size and brightness to app preferences and battery use. Different settings apply to different people and use cases, so personalizing users’ experiences with machine learning can help people more easily find the settings they care about.

By using Federated Learning, the team replaced a hard-coded ranking system with a model that was trained on mobile phone usage. True to the Federated Learning model, each phone contributed improvements to the global model without sending any training data to Google’s servers. “Federated learning helps us improve your experience using Pixel while keeping data from your interaction with your phone private,” says Research Scientist Daniel Ramage.

Federated Learning complements the traditional method of centralized machine learning, helping create more useful experiences on your phone without data ever leaving the device. And that’s really the core vision of bringing hardware, software, and AI together. We think that AI has tremendous potential to free people up to focus on the things that matter, from phones that can understand your world to headphones that can handle translation and cameras that can work hands-free. These experiences are designed to be more helpful, more intuitive, and more effective, so that you can do more of the things you love.
