Slav Petrov
Slav Petrov is a Distinguished Scientist / Senior Research Director at Google leading a globally distributed team that conducts natural language understanding and machine learning research. His work has been recognized with multiple Best Paper Awards (ACL'11, NAACL'12, ACL'16) and provides better language understanding to billions of users in a variety of Google products spanning Web Search, Assistant, Ads, Translate & Chrome. Slav is the recipient of the 2014 John Atanasoff Award by the President of Bulgaria and a World Champion at RoboCup 2004. For many years, Slav taught Statistical Natural Language Processing at New York University. He holds a PhD from the University of California at Berkeley.
Slav has spent roughly equal parts of his life in Bulgaria, Germany and the US. Whenever Bulgaria plays Germany in soccer, he supports Bulgaria.
See also his personal webpage for more information (including presentation slides).
Authored Publications
Measuring Attribution in Natural Language Generation Models
Iulia Turc
Computational Linguistics, vol. 49 (2023), pp. 777-840
With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.
PaLM: Scaling Language Modeling with Pathways
Sharan Narang
Jacob Devlin
Maarten Bosma
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Guy Gur-Ari
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Zongwei Zhou
Michele Catasta
Jason Wei
arxiv:2204.02311 (2022)
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Pat Verga
Jianmo Ni
arXiv (2022)
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
Measuring and Reducing Gendered Correlations in Pre-trained Models
Alex Beutel
Emily Pitler
arXiv (2020)
Large pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode correlations undesired in many applications, like "surgeon" being associated more with "he" than "she". We explore such gendered correlations as a case study, to learn how we can configure and train models to mitigate the risk of encoding unintended associations. We find that it is important to define correlation metrics, since they can reveal differences among models with similar accuracy. Large models have more capacity to encode gendered correlations, but this can be mitigated with general dropout regularization. Counterfactual data augmentation is also effective, and can even reduce correlations not explicitly targeted for mitigation, potentially making it useful beyond gender too. Both techniques yield models with comparable accuracy to unmitigated analogues, and still resist re-learning correlations in fine-tuning.
Natural Questions: a Benchmark for Question Answering Research
Olivia Redfield
Danielle Epstein
Illia Polosukhin
Matthew Kelcey
Jacob Devlin
Llion Jones
Ming-Wei Chang
Jakob Uszkoreit
Transactions of the Association for Computational Linguistics (2019) (to appear)
We present the Natural Questions corpus, a question answering dataset. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
Natural Language Processing with Small Feed-Forward Networks
Jan A. Botha
Emily Pitler
Anton Bakalov
Alex Salcianu
Ryan McDonald
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, pp. 2879-2885
We show that small and shallow feedforward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
Universal Semantic Parsing
Siva Reddy
Oscar Tackstrom
Mark Steedman
Mirella Lapata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The aim of this document is to provide a list of dependency tags that are to be used for the Arabic dependency annotation task, with examples provided for each tag. The dependency representation is a simple description of the grammatical relationships in a sentence. It represents all sentence relations uniformly typed as dependency relations. The dependencies are all binary relations between a governor (also known as the head) and a dependant (any complement of or modifier to the head).
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman
Martin Popel
Milan Straka
Jan Hajic
Joakim Nivre
Filip Ginter
Juhani Luotolahti
Sampo Pyysalo
Martin Potthast
Francis Tyers
Elena Badmaeva
Memduh Gokirmak
Anna Nedoluzhko
Silvie Cinkova
Jan Hajic jr.
Jaroslava Hlavacova
Václava Kettnerová
Zdenka Uresova
Jenna Kanerva
Stina Ojala
Anna Missilä
Christopher D. Manning
Sebastian Schuster
Siva Reddy
Dima Taji
Nizar Habash
Herman Leung
Marie-Catherine de Marneffe
Manuela Sanguinetti
Maria Simi
Hiroshi Kanayama
Valeria de Paiva
Kira Droganova
Héctor Martínez Alonso
Çagrı Çöltekin
Umut Sulubacak
Hans Uszkoreit
Vivien Macketanz
Aljoscha Burchardt
Kim Harris
Katrin Marheinecke
Georg Rehm
Tolga Kayadelen
Ali Elkahky
Zhuoran Yu
Emily Pitler
Saran Lertpradit
Michael Mandl
Jesse Kirchner
Hector Fernandez Alcalde
Esha Banerjee
Antonio Stella
Atsuko Shimada
Sookyoung Kwak
Gustavo Mendonca
Tatiana Lando
Rattima Nitisaroj
Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Globally Normalized Transition-Based Neural Networks
Association for Computational Linguistics (2016)
We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.
Universal Dependencies v1: A Multilingual Treebank Collection
Joakim Nivre
Marie-Catherine de Marneffe
Filip Ginter
Yoav Goldberg
Jan Hajic
Christopher D. Manning
Ryan McDonald
Sampo Pyysalo
Natalia Silveira
Reut Tsarfaty
Daniel Zeman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Structured Training for Neural Network Transition-Based Parsing
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL '15) (2015)
Improved Transition-Based Parsing and Tagging with Neural Networks
Greg Coppola
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15)
Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
Jason Mann
David Zhang
Lu Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Demonstrations), Association for Computational Linguistics (2014)
Learning Compact Lexicons for CCG Semantic Parsing
Yoav Artzi
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP '14)
Source-Side Classifier Preordering for Machine Translation
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13) (2013)
We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.
Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald
Joakim Nivre
Yoav Goldberg
Yvonne Quirmbach-Brundage
Keith Hall
Oscar Tackstrom
Claudia Bedini
Nuria Bertomeu Castello
Jungmee Lee
Association for Computational Linguistics (2013)
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Tackstrom
Ryan McDonald
Joakim Nivre
Transactions of the Association for Computational Linguistics (2013), pp. 1-12
Using Search-Logs to Improve Query Tagging
Keith B. Hall
Ryan McDonald
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (ACL '12) (2012)
Syntactic Annotations for the Google Books Ngram Corpus
Yuri Lin
Jean-Baptiste Michel
Erez Lieberman Aiden
William Brockman
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Demo Papers (ACL '12) (2012)
Google's Hybrid Approach to Research
Alfred Spector
Communications of the ACM, vol. 55 Issue 7 (2012), pp. 34-37
In this viewpoint, we describe how we organize computer science research at Google. We focus on how we integrate research and development and discuss the benefits and risks of our approach.
Vine Pruning for Efficient Multi-Pass Dependency Parsing
Alexander Rush
The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '12), Best Paper Award
Coarse-to-fine inference has been shown to be a robust approximate method for improving the efficiency of structured prediction models while preserving their accuracy. We propose a multi-pass coarse-to-fine architecture for dependency parsing using linear-time vine pruning and structured prediction cascades. Our first-, second-, and third-order models achieve accuracies comparable to those of their unpruned counterparts, while exploring only a fraction of the search space. We observe speed-ups of up to two orders of magnitude compared to exhaustive search. Our pruned third-order model is twice as fast as an unpruned first-order model and also compares favorably to a state-of-the-art transition-based parser for multiple languages.
A Universal Part-of-Speech Tagset
Ryan McDonald
Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC '12) (2012)
Overview of the 2012 Shared Task on Parsing the Web
Ryan McDonald
Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) (2012)
Multi-Source Transfer of Delexicalized Dependency Parsers
Ryan McDonald
Keith B. Hall
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11)
Efficient Parallel CKY Parsing on GPUs
Youngmin Yi
Chao-Yue Lai
Kurt Keutzer
Proceedings of the International Conference on Parsing Technologies (IWPT '11) (2011)
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11) (2011), Best Paper Award
We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable for a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as constraints in an unsupervised model. Across six European languages, our approach results in an average absolute improvement of 9.7% over the state-of-the-art baseline, and 17.0% over vanilla hidden Markov models induced with EM.
Training a Parser for Machine Translation Reordering
Jason Katz-Brown
Ryan McDonald
Franz Och
David Talbot
Hiroshi Ichikawa
Masakazu Seno
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11)
We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
Uptraining for Accurate Deterministic Question Parsing
Michael Ringgaard
Hiyan Alshawi
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
Learning Better Monolingual Models with Unannotated Bilingual Text
David Burkett
Dan Klein
Fourteenth Conference on Computational Natural Language Learning (CoNLL '10) (2010)
Self-training with Products of Latent Variable Grammars
Zhongqiang Huang
Mary Harper
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
Products of Random Latent Variable Grammars
Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL/HLT '10) (2010)
Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs
Alexandre Bouchard-Côté
Dan Klein
Advances in Neural Information Processing Systems 22 (NIPS '09) (2009)
Generative and Discriminative Latent Variable Grammars
The Generative and Discriminative Learning Interface Workshop at NIPS 2009
Coarse-to-Fine Natural Language Processing
Ph.D. Thesis, University of California at Berkeley (2009)
Coarse-to-Fine Syntactic Machine Translation using Language Projections
Aria Haghighi
Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Honolulu, Hawaii, pp. 108-116
Efficient Sentence Segmentation Using Syntactic Features
Benoit Favre
Dilek Hakkani-Tür
Dan Klein
Spoken Language Technologies (SLT), Goa, India (2008)
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Honolulu, Hawaii, pp. 867-876
Discriminative Log-Linear Grammars with Latent Variables
Dan Klein
Advances in Neural Information Processing Systems 20 (NIPS), MIT Press, Cambridge, MA (2008), pp. 1153-1160
Parsing German with Latent Variable Grammars
Dan Klein
Proceedings of the Workshop on Parsing German at ACL '08, Association for Computational Linguistics, Columbus, Ohio (2008), pp. 33-39
Improved Inference for Unlexicalized Parsing
Dan Klein
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, Association for Computational Linguistics, Rochester, New York, pp. 404-411
Learning and Inference for Hierarchically Split PCFGs
The Infinite PCFG Using Hierarchical Dirichlet Processes
Percy Liang
Michael Jordan
Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 688-697
Learning Structured Models for Phone Recognition
Adam Pauls
Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 897-905
Learning Accurate, Compact, and Interpretable Tree Annotation
Leon Barrett
Romain Thibaux
Dan Klein
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (ACL/COLING), Association for Computational Linguistics, Sydney, Australia (2006), pp. 433-440
Detecting Categories in News Video using Acoustic, Speech and Image Features
Arlo Faria
Pascal Michaillat
Alexander Berg
Andreas Stolcke
Dan Klein
Jitendra Malik
Proceedings of (VIDEO) TREC (TrecVid 2006)
Non-Local Modeling with a Mixture of PCFGs
Leon Barrett
Dan Klein
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), Association for Computational Linguistics, New York City (2006), pp. 14-20
3D Tracking = Classification + Interpolation
Carlo Tomasi
Arvind Sastry
Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV) (2003)