PaLM 2

PaLM 2 is our next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI.

It excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM. It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements.

PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It’s being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API.


What PaLM 2 can do

  • Reasoning

    PaLM 2 can decompose a complex task into simpler subtasks and is better at understanding nuances of the human language than previous LLMs, like PaLM. For example, PaLM 2 excels at understanding riddles and idioms, which requires understanding ambiguous and figurative meaning of words, rather than the literal meaning.

  • Multilingual translation

    PaLM 2 was pre-trained on parallel multilingual text and on a much larger corpus of different languages than its predecessor, PaLM. This makes PaLM 2 excel at multilingual tasks.

  • Coding

    PaLM 2 was pre-trained on a large quantity of webpage, source code and other datasets. This means that it excels at popular programming languages like Python and JavaScript, but is also capable of generating specialized code in languages like Prolog, Fortran, and Verilog. Combining this with its language capabilities can help teams collaborate across languages.

How PaLM 2 was built and evaluated

Building PaLM 2

PaLM 2 excels at tasks like advanced reasoning, translation, and code generation because of how it was built. It improves upon its predecessor, PaLM, by unifying three distinct research advancements in large language models:

  • Use of compute-optimal scaling: The basic idea of compute-optimal scaling is to scale the model size and the training dataset size in proportion to each other. This new technique makes PaLM 2 smaller than PaLM, but more efficient with overall better performance, including faster inference, fewer parameters to serve, and a lower serving cost.
  • Improved dataset mixture: Previous LLMs, like PaLM, used pre-training datasets that were mostly English-only text. PaLM 2 improves on its corpus with a more multilingual and diverse pre-training mixture, which includes hundreds of human and programming languages, mathematical equations, scientific papers, and web pages.
  • Updated model architecture and objective: PaLM 2 has an improved architecture and was trained on a variety of different tasks, all of which helps PaLM 2 learn different aspects of language.

Evaluating PaLM 2

PaLM 2 achieves state of the art results on reasoning benchmark tasks such as WinoGrande and BigBench-Hard. It is significantly more multilingual than our previous large language model, PaLM, achieving better results on benchmarks such as XSum, WikiLingua and XLSum. PaLM 2 also improves translation capability over PaLM and Google Translate in languages like Portuguese and Chinese.

PaLM 2 continues our responsible AI development and commitment to safety.

  • Pre-training Data: We remove forms of sensitive personally identifiable information, filter duplicate documents to reduce memorization, and share analysis of how people are represented in pre-training data.
  • New Capabilities: PaLM 2 demonstrates improved multilingual toxicity classification capabilities, and has built-in control over toxic generation.
  • Evaluations: We evaluate potential harms and bias across a range of potential downstream uses for PaLM 2, including dialog, classification, translation, and question answering. This includes developing new evaluations for measuring potential harms in generative question-answering settings and dialog settings related to toxic language harms and social bias related to identity terms.

How PaLM 2 is powering our generative AI features