Learn more about Google’s foundation models, which span text-to-image, text-to-code, and speech-to-text.
Imagen Model Family
Unlocking visual creativity
Imagen is our family of image generation and editing models. These models build on advances in large Transformer language models and diffusion models. The family is being incorporated into multiple Google products, including image generation in Google Slides, Cloud Vertex AI, and Android’s Generative AI wallpaper.
Imagen is a text-to-image model with a high degree of photorealism and deep language representations. Imagen consists of multiple diffusion models, which start by generating a small image and progressively increase its resolution. The basic intuition is that diffusion models learn to generate structure from noise, conditioned on the provided text prompts.
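To make the cascade idea concrete, here is a toy sketch of the control flow only: a low-resolution image is produced from noise, then repeatedly upsampled and refined. It is not Imagen's implementation. The functions `prompt_to_target`, `denoise`, `upsample`, and `generate`, along with the sizes and step counts, are hypothetical, and the "denoiser" is a hand-written stand-in for a learned network.

```python
import random


def prompt_to_target(prompt, size):
    """Hypothetical stand-in for text conditioning: a deterministic
    pattern derived from the prompt, at the requested resolution."""
    seed = sum(ord(ch) for ch in prompt)
    return [[((seed + r * c) % 7) / 6.0 for c in range(size)]
            for r in range(size)]


def denoise(image, target, steps=10):
    """Stand-in for a learned denoiser: each step removes a share of the
    remaining noise, so structure emerges gradually."""
    for step in range(steps):
        remaining = steps - step
        image = [[px + (t - px) / remaining for px, t in zip(row, t_row)]
                 for row, t_row in zip(image, target)]
    return image


def upsample(image):
    """Nearest-neighbour 2x upsampling of a square image."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def generate(prompt, base_size=8, num_upsamplers=2, seed=0):
    rng = random.Random(seed)
    # Base stage: start from pure noise at a small resolution.
    image = [[rng.gauss(0.0, 1.0) for _ in range(base_size)]
             for _ in range(base_size)]
    image = denoise(image, prompt_to_target(prompt, base_size))
    size = base_size
    # Super-resolution stages: upsample, perturb, then denoise again,
    # still conditioned on the same prompt.
    for _ in range(num_upsamplers):
        size *= 2
        image = upsample(image)
        image = [[px + rng.gauss(0.0, 0.1) for px in row] for row in image]
        image = denoise(image, prompt_to_target(prompt, size))
    return image


image = generate("a corgi riding a bicycle")
print(len(image), len(image[0]))  # prints: 32 32
```

In a real system each stage would be a separately trained diffusion model; the sketch only shows how the stages chain together.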
The Pathways Autoregressive Text-to-Image model (Parti) is an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.
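"Autoregressive" here means the image is produced as a sequence of tokens, each chosen with knowledge of the text and of every token generated so far. The loop below is a minimal illustration of that idea under stated assumptions, not Parti's code: `next_token_scores` is a hypothetical, hand-written scoring function standing in for a trained decoder, and greedy selection is used for simplicity.

```python
def next_token_scores(text_tokens, image_tokens, vocab_size):
    """Hypothetical stand-in for a trained decoder: returns one score per
    candidate token, computed deterministically from the context."""
    context = sum(text_tokens) + sum(
        (i + 1) * tok for i, tok in enumerate(image_tokens))
    return [(context * 31 + v * 17) % 101 for v in range(vocab_size)]


def generate_image_tokens(text_tokens, length, vocab_size=16):
    image_tokens = []
    for _ in range(length):
        # Each new token depends on the text and on all earlier tokens.
        scores = next_token_scores(text_tokens, image_tokens, vocab_size)
        best = max(range(vocab_size), key=scores.__getitem__)  # greedy pick
        image_tokens.append(best)
    return image_tokens


tokens = generate_image_tokens([3, 1, 4], length=8)
print(len(tokens))  # prints: 8
```

A real model would then map the generated tokens back to pixels with an image detokenizer, which is omitted here.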
Muse is a text-to-image Transformer model that achieves strong image generation performance while being significantly more efficient than diffusion or autoregressive models. It’s trained on a masked modeling task in a discrete visual token space, conditioned on a text embedding extracted from a pre-trained large language model (LLM). Muse also supports text-guided and mask-based image editing out of the box.
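The masked modeling task can be pictured as follows: an image is represented as a list of discrete token ids, some of them are hidden, and the model is trained to recover the hidden ids. The sketch below only builds such a training example; it is an illustration under assumptions, not Muse's data pipeline. The `MASK` id, the `make_masked_example` helper, and the mask ratio are all hypothetical, and the text-embedding conditioning is left out.

```python
import random

MASK = -1  # hypothetical id reserved for hidden positions


def make_masked_example(tokens, mask_ratio, rng):
    """Hide a random subset of image tokens.

    Returns the masked input sequence and a dict mapping each hidden
    position to the original token the model should predict there.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    for p in positions:
        inputs[p] = MASK
    targets = {p: tokens[p] for p in positions}
    return inputs, targets


rng = random.Random(0)
image_tokens = [5, 9, 2, 7, 7, 1, 3, 8]
inputs, targets = make_masked_example(image_tokens, 0.5, rng)
print(inputs.count(MASK))  # prints: 4
```

Because the model learns to fill in many hidden positions at once, it does not have to emit tokens strictly one at a time, which is one intuition for the efficiency claim above.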
Empowering developers to be more productive and creative with large language models
Google's family of Universal Speech Models enabling automatic speech recognition for 100+ languages.
Chirp is Google's family of state-of-the-art Universal Speech Models, trained on 12 million hours of speech to enable automatic speech recognition (ASR) for 100+ languages. The models can perform ASR on under-resourced languages, such as Amharic, Cebuano, and Assamese, in addition to widely spoken languages like English and Mandarin. Chirp covers this wide variety of languages by combining self-supervised learning on an unlabeled multilingual dataset with fine-tuning on a smaller set of labeled data.