Language

We advance the state of the art in natural language technologies and build systems that learn to understand and generate language in context.

About the team

Our team comprises multiple research groups working on a wide range of natural language understanding and generation projects. We pursue long-term research to develop novel capabilities that can address the needs of current and future Google products. We publish frequently and evaluate our methods on established scientific benchmarks (e.g., SQuAD, GLUE, SuperGLUE) or develop new ones for measuring progress (e.g., Conceptual Captions, Natural Questions, TyDi QA). We collaborate with other teams across Google to deploy our research to the benefit of our users. Our product contributions often stretch the boundaries of what is technically possible. Applications of our research have resulted in better language capabilities across all major Google products.

Our researchers are experts in natural language processing and machine learning with varied backgrounds and a passion for language. Computer scientists and linguists work hand in hand to define language tasks, collect valuable data, and support internationalization. Researchers and engineers work together to develop new neural network models that are sensitive to the nuances of language while taking advantage of the latest advances in specialized compute hardware (e.g., TPUs) to produce scalable solutions that can be used by billions of users.

Team focus summaries

Language representations

Learn contextual language representations that capture meaning at various levels of granularity and are transferable across tasks.

Question answering

Learn end-to-end models for real-world question answering that requires complex reasoning about concepts, entities, relations, and causality in the world.

Document understanding

Learn document representations from geometric features and spatial relations, multi-modal content features, and syntactic, semantic, and pragmatic signals.

Dialogue

Advance next-generation dialogue systems in human-machine and multi-human-machine interactions to achieve natural user interactions and enrich conversations between human users.

Generation

Produce natural and fluent output for spoken and written text for different domains and styles.

Multilinguality

Learn high-quality models that scale to all languages and locales and are robust to multilingual inputs, transliterations, and regional variants.

Language & vision

Understand visual inputs (image & video) and express that understanding using fluent natural language (phrases, sentences, paragraphs).

Translation

Use state-of-the-art machine learning techniques and large-scale infrastructure to break language barriers and offer human-quality translations across many languages, making it easy to explore the multilingual world.

Summarization

Learn to summarize single and multiple documents into cohesive and concise summaries that accurately represent the documents.

Classification

Learn end-to-end models that classify the semantics of text, such as topic, sentiment or sensitive content (i.e., offensive, inappropriate, or controversial content).

Speech and language algorithms

Represent, combine, and optimize models for speech-to-text and text-to-speech.

Entities, relations, and reasoning

Learn models that infer entities (people, places, things) from text and that can perform reasoning based on their relationships.

Grounded language understanding

Use and learn representations that span language and other modalities, such as vision, space, and time, and adapt and use them for problems requiring language-conditioned action in real or simulated environments (e.g., vision-and-language navigation).

Semantic parsing

Learn models for predicting executable logical forms given text in varying domains and languages, situated within diverse task contexts.

Sentiment analysis

Learn models that can detect sentiment attribution and changes in narrative, conversation, and other text or spoken scenarios.

Trustworthiness

Learn models of language that are predictable and understandable, perform well across the broadest possible range of linguistic settings and applications, and adhere to our principles of responsible practices in AI.

Featured publications

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina N. Toutanova

NAACL 2019 (2018)

Natural Questions: a Benchmark for Question Answering Research

Tom Kwiatkowski

Jennimaria Palomaki

Olivia Redfield

Michael Collins

Ankur Parikh

Chris Alberti

Danielle Epstein

Illia Polosukhin

Matthew Kelcey

Jacob Devlin

Kenton Lee

Kristina N. Toutanova

Llion Jones

Ming-Wei Chang

Andrew Dai

Jakob Uszkoreit

Quoc Le

Slav Petrov

Transactions of the Association for Computational Linguistics (2019) (to appear)

BERT Rediscovers the Classical NLP Pipeline

Ian Tenney

Dipanjan Das

Ellie Pavlick

Association for Computational Linguistics (2019) (to appear)

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

Piyush Sharma

Nan Ding

Sebastian Goodman

Radu Soricut

ACL (2018)

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

Christian Buck

Jannis Bulian

Massimiliano Ciaramita

Wojciech Paweł Gajewski

Andrea Gesmundo

Neil Houlsby

Wei Wang

Sixth International Conference on Learning Representations (2018)

Massively Multilingual Neural Machine Translation

Melvin Johnson

Orhan Firat

Roee Aharoni

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, pp. 3874-3884 (to appear)

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Kellie Webster

Marta Recasens

Vera Axelrod

Jason Baldridge

Transactions of the Association for Computational Linguistics, vol. 6 (2018), pp. 605-618

Matching the Blanks: Distributional Similarity for Relation Learning

Livio Baldini Soares

Nicholas Arthur FitzGerald

Jeffrey Ling

Tom Kwiatkowski

ACL 2019 - The 57th Annual Meeting of the Association for Computational Linguistics (2019) (to appear)

Counterfactual Fairness in Text Classification through Robustness

Sahaj Garg

Vincent Perot

Nicole Limtiaco

Ankur Taly

Ed H. Chi

Alex Beutel

AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (2019)

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Naveen Ari

Colin Andrew Cherry

Wolfgang Macherey

Chung-Cheng Chiu

Semih Yavuz

Ruoming Pang

Wei Li

Colin Raffel

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Association for Computational Linguistics, Florence, Italy (2019), pp. 1313-1323