Foundation Models

Learn more about Google’s foundation models, which span text-to-image, text-to-code, and speech-to-text.

Imagen Model Family

Unlocking visual creativity

Imagen is our family of image generation and editing models. These models build on advances in large Transformer language models and diffusion models. The family is being incorporated into multiple Google products, including image generation in Google Slides, Cloud Vertex AI, and Android’s Generative AI wallpaper.

Imagen is a text-to-image model with a high degree of photorealism and deep language representations. Imagen consists of multiple diffusion models, which start by generating a small image and progressively increase its resolution. The basic intuition is that diffusion models learn to generate structure from noise, conditioned on the provided text prompts.
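The coarse-to-fine idea can be illustrated with a toy sketch. This is not Imagen's implementation: `toy_denoiser` is a hypothetical, hand-written stand-in for a trained text-conditioned network, and the image sizes, step counts, and noise levels are arbitrary assumptions chosen to keep the example small. It only shows the shape of the process: start from noise at low resolution, denoise toward what the prompt describes, then upsample and refine at each higher resolution.

```python
import random

def toy_denoiser(size, prompt):
    """Hypothetical stand-in for a trained network: returns the 'clean' image
    the prompt describes. Here that is just a brightness ramp whose direction
    depends on whether the prompt contains the word 'horizontal'."""
    horizontal = "horizontal" in prompt
    denom = max(size - 1, 1)
    return [[(x if horizontal else y) / denom for x in range(size)]
            for y in range(size)]

def mean_abs_error(a, b):
    """Average absolute per-pixel difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def denoise(image, prompt, steps):
    """Iteratively pull a noisy image toward the denoiser's estimate."""
    estimate = toy_denoiser(len(image), prompt)
    for step in range(steps):
        blend = 1.0 / (steps - step)  # reaches 1.0 on the final step
        image = [[p + blend * (e - p) for p, e in zip(row, erow)]
                 for row, erow in zip(image, estimate)]
    return image

def upsample(image, factor):
    """Nearest-neighbour upsampling: each pixel becomes a factor x factor block."""
    return [[p for p in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def cascade(prompt, base_size=8, factors=(2, 2), steps=10, seed=0):
    rng = random.Random(seed)
    # Stage 1: start from pure Gaussian noise at low resolution and denoise.
    image = [[rng.gauss(0.0, 1.0) for _ in range(base_size)]
             for _ in range(base_size)]
    image = denoise(image, prompt, steps)
    # Later stages: upsample, add a little fresh noise, denoise at the new size.
    for factor in factors:
        image = upsample(image, factor)
        image = [[p + rng.gauss(0.0, 0.1) for p in row] for row in image]
        image = denoise(image, prompt, steps)
    return image

if __name__ == "__main__":
    result = cascade("a horizontal gradient")
    print(len(result), "x", len(result[0]))
```

With the default arguments, the base stage works on an 8x8 grid and the two later stages each double the resolution, so the result is 32x32. In a real system each stage would be a separate learned diffusion model rather than a fixed blend toward a known answer.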

The Pathways Autoregressive Text-to-Image model (Parti) is an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.

Muse is a text-to-image Transformer model that achieves strong image generation performance while being significantly more efficient than diffusion or autoregressive models. It’s trained on a masked modeling task in a discrete visual token space, conditioned on a text embedding extracted from a pre-trained large language model (LLM). Muse also supports text-guided and mask-based image editing out of the box.
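To make the masked modeling task concrete, here is a minimal sketch of how one training example might be prepared. It is an illustration under stated assumptions, not Muse's actual code: `MASK_ID`, `mask_tokens`, the token values, and the masking ratio are all hypothetical. The idea is that an image is first represented as a sequence of discrete token ids, some positions are replaced by a mask token, and the model (which would also receive the text embedding) is trained to predict the original ids at the masked positions.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the mask token

def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random subset of image tokens.

    Returns (masked_input, targets), where targets maps each masked
    position to the original token id the model should predict there.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)  # copy so the caller's list is not modified
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK_ID
    return masked, targets

if __name__ == "__main__":
    image_tokens = [5, 9, 2, 7, 7, 1, 3, 8]  # made-up token ids for illustration
    masked, targets = mask_tokens(image_tokens, 0.5, random.Random(0))
    print(masked)
    print(targets)
```

Only the masked positions would contribute to the training loss in this setup; the unmasked tokens serve as context alongside the text embedding.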

MedLM

Bringing the power of foundation models to healthcare

MedLM, developed by Google Research, is a family of foundation models fine-tuned for the healthcare industry. MedLM can help with a wide range of tasks, including answering medical questions, generating summaries, and automating manual administrative processes. It builds on our efforts in Med-PaLM, the first large language model to reach expert-level performance on medical licensing exam-style questions. MedLM is now generally available to Google Cloud customers in the United States through Vertex AI.

Codey

Empowering developers to be more productive and creative with large language models

Codey is our family of foundational coding models built on PaLM 2. Codey was fine-tuned on a large dataset of high-quality, permissively licensed code from external sources and supports 20+ coding languages, including Python, Java, JavaScript, and Go. The Codey models have been used to enhance software development tasks across Google surfaces such as Colab, Android Studio, Google Cloud, and Google Search. Benefits for developers include faster coding, higher code quality, and a narrower skills gap between novice and expert developers. Tasks that Codey can help with include:

  • Code completion: Codey suggests the next few lines based on the existing context of code.
  • Code generation: Codey generates code based on natural language prompts from a developer.
  • Code chat: Codey lets developers converse with a bot to get help with debugging, documentation, learning new concepts, and other code-related questions.

Chirp

Google's family of Universal Speech Models enabling automatic speech recognition for 100+ languages.

Chirp is Google's family of state-of-the-art Universal Speech Models, trained on 12 million hours of speech to enable automatic speech recognition (ASR) for 100+ languages. The models can perform ASR on under-resourced languages, such as Amharic, Cebuano, and Assamese, in addition to widely spoken languages like English and Mandarin. Chirp covers this wide variety of languages by combining self-supervised learning on an unlabeled multilingual dataset with fine-tuning on a smaller set of labeled data.