Algorithms and Theory
Google’s mission presents many exciting algorithmic and optimization challenges across different product areas including Search, Ads, Social, and Google Infrastructure. These include optimizing internal systems such as scheduling the machines that power the numerous computations done each day, as well as optimizations that affect core products and users, from online allocation of ads to page-views to automatic management of ad campaigns, and from clustering large-scale graphs to finding best paths in transportation networks. Other than employing new algorithmic ideas to impact millions of users, Google researchers contribute to the state-of-the-art research in these areas by publishing in top conferences and journals.
Google is deeply engaged in Data Management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface them creatively through Google products, such as Search (e.g., structured snippets, Docs, and many others). The overarching goal is to create a plethora of structured data on the Web that maximally help Google users consume, interact and explore information. Through those projects, we study various cutting-edge data management research issues including information extraction and integration, large scale data analysis, effective data exploration, etc., using a variety of techniques, such as information retrieval, data mining and machine learning.
A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this type of data in order to make it easier to access high-quality datasets. This type of data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges in their management.
Furthermore, Data Management research across Google allows us to build technologies that power Google's largest businesses through scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Some examples of such technologies include F1, the database serving our ads infrastructure; Mesa, a petabyte-scale analytic data warehousing system; and Dremel, for petabyte-scale data processing with interactive response times. Dremel is available for external customers to use as part of Google Cloud’s BigQuery.
Data Mining and Modeling
The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, questions in practice are rarely so clean as to just to use an out-of-the-box algorithm. A big challenge is in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.
Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.
Distributed Systems and Parallel Computing
No matter how powerful individual computers become, there are still reasons to harness the power of multiple computational units, often spread across large geographic areas. Sometimes this is motivated by the need to collect data from widely dispersed locations (e.g., web pages from servers, or sensors for weather or traffic). Other times it is motivated by the need to perform enormous computations that simply cannot be done by a single CPU.
From our company’s beginning, Google has had to deal with both issues in our pursuit of organizing the world’s information and making it universally accessible and useful. We continue to face many exciting distributed systems and parallel computing challenges in areas such as concurrency control, fault tolerance, algorithmic efficiency, and communication. Some of our research involves answering fundamental theoretical questions, while other researchers and engineers are engaged in the construction of systems to operate at the largest possible scale, thanks to our hybrid research model.
Economics and Electronic Commerce
Google is a global leader in electronic commerce. Not surprisingly, it devotes considerable attention to research in this area. Topics include 1) auction design, 2) advertising effectiveness, 3) statistical methods, 4) forecasting and prediction, 5) survey research, 6) policy analysis and a host of other topics. This research involves interdisciplinary collaboration among computer scientists, economists, statisticians, and analytic marketing researchers both at Google and academic institutions around the world.
A major challenge is in solving these problems at very large scales. For example, the advertising market has billions of transactions daily, spread across millions of advertisers. It presents a unique opportunity to test and refine economic principles as applied to a very large number of interacting, self-interested parties with a myriad of objectives.
It is remarkable how some of the fundamental problems Google grapples with are also some of the hardest research problems in the academic community. At Google, this research translates direction into practice, influencing how production systems are designed and used.
Our Education Innovation research area includes publications on: online learning at scale, educational technology (which is any technology that supports teaching and learning), curriculum and programming tools for computer science education, diversity and broadening participation in computer science the hiring and onboarding process at Google.
Google's highest leverage is in transforming scientific research itself. Many scientific endeavors can benefit from large scale experimentation, data gathering, and machine learning (including deep learning). We aim to accelerate scientific research by applying Google’s computational power and techniques in areas such as drug discovery, biological pathway modeling, microscopy, medical diagnostics, material science, and agriculture. We collaborate closely with world-class research partners to help solve important problems with large scientific or humanitarian benefit.
Hardware and Architecture
The machinery that powers many of our interactions today — Web search, social networking, email, online video, shopping, game playing — is made of the smallest and the most massive computers. The smallest part is your smartphone, a machine that is over ten times faster than the iconic Cray-1 supercomputer. The capabilities of these remarkable mobile devices are amplified by orders of magnitude through their connection to Web services running on building-sized computing systems that we call Warehouse-scale computers (WSCs).
Google’s engineers and researchers have been pioneering both WSC and mobile hardware technology with the goal of providing Google programmers and our Cloud developers with a unique computing infrastructure in terms of scale, cost-efficiency, energy-efficiency, resiliency and speed. The tight collaboration among software, hardware, mechanical, electrical, environmental, thermal and civil engineers result in some of the most impressive and efficient computers in the world.
Human-Computer Interaction and Visualization
HCI researchers at Google have enormous potential to impact the experience of Google users as well as conduct innovative research. Grounded in user behavior understanding and real use, Google’s HCI researchers invent, design, build and trial large-scale interactive systems in the real world. We declare success only when we positively impact our users and user communities, often through new and improved Google products. HCI research has fundamentally contributed to the design of Search, Gmail, Docs, Maps, Chrome, Android, YouTube, serving over a billion daily users. We are engaged in a variety of HCI disciplines such as predictive and intelligent user interface technologies and software, mobile and ubiquitous computing, social and collaborative computing, interactive visualization and visual analytics. Many projects heavily incorporate machine learning with HCI, and current projects include predictive user interfaces; recommenders for content, apps, and activities; smart input and prediction of text on mobile devices; user engagement analytics; user interface development tools; and interactive visualization of complex data.
Information Retrieval and the Web
The science surrounding search engines is commonly referred to as information retrieval, in which algorithmic principles are developed to match user interests to the best information about those interests.
Google started as a result of our founders' attempt to find the best matching between the user queries and Web documents, and do it really fast. During the process, they uncovered a few basic principles: 1) best pages tend to be those linked to the most; 2) best description of a page is often derived from the anchor text associated with the links to a page. Theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.
Search and Information Retrieval on the Web has advanced significantly from those early days: 1) the notion of "information" has greatly expanded from documents to much richer representations such as images, videos, etc., 2) users are increasingly searching on their Mobile devices with very different interaction characteristics from search on the Desktops; 3) users are increasingly looking for direct information, such as answers to a question, or seeking to complete tasks, such as appointment booking. Through our research, we are continuing to enhance and refine the world's foremost search engine by aiming to scientifically understand the implications of those changes and address new challenges that they bring.
Google is at the forefront of innovation in Machine Intelligence, with active research exploring virtually all aspects of machine learning, including deep learning and more classical algorithms. Exploring theory as well as application, much of our work on language, speech, translation, visual processing, ranking and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, applying learning algorithms to understand and generalize.
Machine Intelligence at Google raises deep scientific and engineering challenges, allowing us to contribute to the broader academic research community through technical talks and publications in major conferences and journals. Contrary to much of current theory and practice, the statistics of the data we observe shifts rapidly, the features of interest change as well, and the volume of data often requires enormous computation capacity. When learning systems are placed at the core of interactive services in a fast changing and sometimes adversarial environment, combinations of techniques including deep learning and statistical models need to be combined with ideas from control and game theory.
Research in machine perception tackles the hard problems of understanding images, sounds, music and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications such as: content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially-labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception that achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large collection of Google Photos. The ability to mine meaningful information from multimedia is broadly applied throughout Google.
Machine Translation is an excellent example of how cutting-edge research and world-class infrastructure come together at Google. We focus our research efforts on developing statistical translation techniques that improve with more data and generalize well to new languages. Our large scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality. This research backs the translations served at translate.google.com, allowing our users to translate text, web pages and even speech. Deployed within a wide range of Google services like GMail, Books, Android and web search, Google Translate is a high-impact, research-driven product that bridges language barriers and makes it possible to explore the multilingual web in 90 languages. Exciting research challenges abound as we pursue human quality translation and develop machine translation systems for new languages.
Mobile devices are the prevalent computing device in many parts of the world, and over the coming years it is expected that mobile Internet usage will outpace desktop usage worldwide. Google is committed to realizing the potential of the mobile web to transform how people interact with computing technology. Google engineers and researchers work on a wide range of problems in mobile computing and networking, including new operating systems and programming platforms (such as Android and ChromeOS); new interaction paradigms between people and devices; advanced wireless communications; and optimizing the web for mobile settings. In addition, many of Google’s core product teams, such as Search, Gmail, and Maps, have groups focused on optimizing the mobile experience, making it faster and more seamless. We take a cross-layer approach to research in mobile systems and networking, cutting across applications, networks, operating systems, and hardware. The tremendous scale of Google’s products and the Android and Chrome platforms make this a very exciting place to work on these problems.
Some representative projects include mobile web performance optimization, new features in Android to greatly reduce network data usage and energy consumption; new platforms for developing high performance web applications on mobile devices; wireless communication protocols that will yield vastly greater performance over today’s standards; and multi-device interaction based on Android, which is now available on a wide variety of consumer electronics.
Natural Language Processing
Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.
Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and can be run efficiently in a highly distributed environment.
Our syntactic systems predict part-of-speech tags for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, modification, and others. We focus on efficient algorithms that leverage large amounts of unlabeled data, and recently have incorporated neural net technology.
On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
Recent work has focused on incorporating multiple sources of knowledge and information to aid with analysis of text, as well as applying frame semantics at the noun phrase, sentence, and document level.
Networking is central to modern computing, from connecting cell phones to massive Cloud-based data stores to the interconnect for data centers that deliver seamless storage and fine-grained distributed computing at the scale of entire buildings. With an understanding that our distributed computing infrastructure is a key differentiator for the company, Google has long focused on building network infrastructure to support our scale, availability, and performance needs.
Our research combines building and deploying novel networking systems at massive scale, with recent work focusing on fundamental questions around data center architecture, wide area network interconnects, Software Defined Networking control and management infrastructure, as well as congestion control and bandwidth allocation. By publishing our findings at premier research venues, we continue to engage both academic and industrial partners to further the state of the art in networked systems.
Quantum Computing merges two great scientific revolutions of the 20th century: computer science and quantum physics. Quantum physics is the theoretical basis of the transistor, the laser, and other technologies which enabled the computing revolution. But on the algorithmic level, today's computing machinery still operates on "classical" Boolean logic. Quantum computing is the design of hardware and software that replaces Boolean logic by quantum law at the algorithmic level. For certain computations such as optimization, sampling, search or quantum simulation this promises dramatic speedups. We are particularly interested in applying quantum computing to artificial intelligence and machine learning. This is because many tasks in these areas rely on solving hard optimization problems or performing efficient sampling.
Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today.
Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.
Security, Privacy and Abuse Prevention
The Internet and the World Wide Web have brought many changes that provide huge benefits, in particular by giving people easy access to information that was previously unavailable, or simply hard to find. Unfortunately, these changes have raised many new challenges in the security of computer systems and the protection of information against unauthorized access and abusive usage. At Google, our primary focus is the user, and his/her safety. We have people working on nearly every aspect of security, privacy, and anti-abuse including access control and information security, networking, operating systems, language design, cryptography, fraud detection and prevention, spam and abuse detection, denial of service, anonymity, privacy-preserving systems, disclosure controls, as well as user interfaces and other human-centered aspects of security and privacy. Our security and privacy efforts cover a broad range of systems including mobile, cloud, distributed, sensors and embedded systems, and large-scale machine learning.
Delivering Google's products to our users requires computer systems that have a scale previously unknown to the industry. Building on our hardware foundation, we develop technology across the entire systems stack, from operating system device drivers all the way up to multi-site software systems that run on hundreds of thousands of computers. We design, build and operate warehouse-scale computer systems that are deployed across the globe. We build storage systems that scale to exabytes, approach the performance of RAM, and never lose a byte. We design algorithms that transform our understanding of what is possible. Thanks to the distributed systems we provide our developers, they are some of the most productive in the industry. And we write and publish research papers to share what we have learned, and because peer feedback and interaction helps us build better systems that benefit everybody.
Our goal in Speech Technology Research is to make speaking to devices--those around you, those that you wear, and those that you carry with you--ubiquitous and seamless.
Our research focuses on what makes Google unique: computing scale and data. Using large scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and experiment with the kind of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which class of algorithms merely compensate for lack of data and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field into directions that would never have been considered, were it not for the availability of significantly higher orders of magnitude of data.
We are also in a unique position to deliver very user-centric research. Researchers are able to conduct live experiments to test and benchmark new algorithms directly in a realistic controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we focus on solving real problems and with real impact for users.
We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance to every language on the planet. We currently have systems operating in more than 55 languages, and we continue to expand our reach to more users. The challenges of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day on YouTube range from lectures, to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation, language detection to new levels of difficulty. The potential payoff is immense: imagine making every lecture on the web accessible to every language. This is the kind of impact for which we are striving.