
PaLM 2

PaLM 2 is our next-generation language model with improved multilingual, reasoning, and coding capabilities. It builds on Google’s legacy of breakthrough research in machine learning and responsible AI.

It outperforms our previous state-of-the-art LLMs, including PaLM, at advanced reasoning tasks such as code and math, classification and question answering, translation and multilingual proficiency, and natural language generation. It can accomplish these tasks because of the way it was built: it brings together compute-optimal scaling, an improved dataset mixture, and model architecture improvements.

PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. All versions of PaLM 2 are evaluated rigorously for potential harms and biases, capabilities and downstream uses in research and in-product applications. PaLM 2 is used in other state-of-the-art models, like Sec-PaLM. We continue to implement the latest versions of PaLM 2 in generative AI tools like the PaLM API.

What PaLM 2 can do

Reasoning

PaLM 2 can decompose a complex task into simpler subtasks and is better at understanding the nuances of human language than previous LLMs, like PaLM. For example, PaLM 2 excels at understanding riddles and idioms, which requires grasping the ambiguous and figurative meanings of words rather than only their literal meanings.

Multilingual translation

PaLM 2 was pre-trained on parallel multilingual text and on a much larger corpus of different languages than its predecessor, PaLM. As a result, PaLM 2 excels at multilingual tasks.

Coding

PaLM 2 was pre-trained on a large quantity of web page, source code, and other datasets. This means that it excels at popular programming languages like Python and JavaScript, but is also capable of generating specialized code in languages like Prolog, Fortran, and Verilog. Combining this with its language capabilities can help teams collaborate across languages.

How PaLM 2 was built and evaluated

Building PaLM 2

PaLM 2 excels at tasks like advanced reasoning, translation, and code generation because of how it was built. It improves upon its predecessor, PaLM, by unifying three distinct research advancements in large language models:

Use of compute-optimal scaling: The basic idea of compute-optimal scaling is to scale the model size and the training dataset size in proportion to each other. This technique makes PaLM 2 smaller than PaLM yet more efficient, with better overall performance: faster inference, fewer parameters to serve, and a lower serving cost.

Improved dataset mixture: Previous LLMs, like PaLM, used pre-training datasets that were mostly English-only text. PaLM 2 improves on that corpus with a more multilingual and diverse pre-training mixture, which includes hundreds of human and programming languages, mathematical equations, scientific papers, and web pages.

Updated model architecture and objective: PaLM 2 has an improved architecture. PaLM 2 and its latest version were trained on a variety of different tasks, all of which help PaLM 2 learn different aspects of language.
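
As a rough illustration of the compute-optimal scaling idea described above, the sketch below splits a fixed training budget between model size and dataset size so that the two grow in proportion. It is not PaLM 2's actual recipe. It assumes the common approximation that training cost is about 6 × parameters × tokens, and the function name and the default ratio of 20 tokens per parameter are hypothetical values chosen only for illustration.

```python
import math


def compute_optimal_allocation(flops_budget, tokens_per_param=20.0):
    """Split a training FLOPs budget between parameters (N) and tokens (D).

    Assumptions (illustrative, not taken from PaLM 2):
      * training cost C is approximately 6 * N * D
      * tokens and parameters are kept at a fixed ratio D = r * N
    Solving C = 6 * N * (r * N) gives N = sqrt(C / (6 * r)).
    """
    if flops_budget <= 0 or tokens_per_param <= 0:
        raise ValueError("budget and ratio must be positive")
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Quadrupling the budget doubles both model size and dataset size.
    for budget in (1e21, 4e21):
        n, d = compute_optimal_allocation(budget)
        print(f"budget={budget:.1e}  params={n:.3e}  tokens={d:.3e}")
```

Under these assumptions, both quantities grow with the square root of the budget, which is one way to read "in proportion to each other": extra compute is shared between a larger model and more data rather than spent on model size alone.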

Evaluating PaLM 2

PaLM 2 achieves state-of-the-art results on reasoning benchmark tasks. For example, the May 2023 version of PaLM 2 was evaluated on tasks such as WinoGrande and BigBench-Hard and on benchmarks such as XSum, WikiLingua, and XLSum. On the latter benchmarks, it achieved significantly better multilingual results than our previous large language model, PaLM, and improved translation capability over PaLM and Google Translate in languages like Portuguese and Chinese.

PaLM 2 and its ongoing version updates continue to follow our responsible AI development practices and commitment to safety.

Pre-training Data: We apply our Responsible AI Practices, filter duplicate documents to reduce memorization, and have shared analysis of how people are represented in pre-training data.

New Capabilities: PaLM 2 demonstrates improved multilingual toxicity classification capabilities, and has built-in control over toxic generation.

Evaluations: We evaluate potential harms and bias across a range of downstream uses for PaLM 2 and its version updates, including dialog, classification, translation, and question answering. This includes developing new evaluations that measure toxic language harms and social bias related to identity terms in generative question-answering and dialog settings.
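
To make the duplicate filtering mentioned under Pre-training Data more concrete, here is a minimal sketch of exact-duplicate removal. It is not the pipeline used for PaLM 2, whose details are not given here. The function names and the normalization rule (lowercasing and collapsing whitespace) are assumptions made for illustration, and production pipelines typically also handle near-duplicates.

```python
import hashlib


def normalize(text):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def drop_exact_duplicates(documents):
    """Keep the first copy of each document, compared after normalization."""
    seen = set()
    unique = []
    for doc in documents:
        # Hash the normalized text so the set stores fixed-size digests.
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = ["Hello  world", "hello world", "A different page"]
    print(drop_exact_duplicates(corpus))
```

Removing repeated documents means the model sees each passage fewer times during training, which is the link to reduced memorization that the text describes.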