Learn more about Google’s foundation models that include text-to-image, text-to-code and speech-to-text.
Imagen Model Family
Unlocking visual creativity
Imagen is our family of image generation and editing models. These models build on advances in large Transformer language models and diffusion models, and are being incorporated into multiple Google products, including image generation in Google Slides, Cloud Vertex AI, and Android’s generative AI wallpaper.
Imagen is a text-to-image model that combines a high degree of photorealism with a deep level of language understanding. It consists of multiple diffusion models, which start by generating a small image and then progressively increase its resolution. The basic intuition is that diffusion models learn to generate structure from noise, conditioned on the provided text prompt.
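The cascade described above can be sketched as a toy pipeline. Everything here is an illustrative assumption (the resolutions, the `denoise` and `upsample` stand-ins, the blending rule), not Imagen's actual implementation; it only shows the shape of the idea: generate small from noise, then repeatedly upsample and refine.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(noisy, text_embedding, steps=4):
    """Toy stand-in for a diffusion model: iteratively nudge noise
    toward a 'structured' image conditioned on the text embedding."""
    img = noisy
    for _ in range(steps):
        # A real model predicts and removes noise at each step; here we
        # just blend toward a deterministic target derived from the text.
        target = np.full_like(img, text_embedding.mean())
        img = 0.5 * img + 0.5 * target
    return img

def upsample(img, factor=4):
    """Nearest-neighbor upsampling; a real super-resolution diffusion
    model would add detail at the new resolution."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

text_embedding = rng.normal(size=128)  # stand-in for a text-encoder output

# Stage 1: generate a small base image from pure noise.
base = denoise(rng.normal(size=(64, 64)), text_embedding)

# Stages 2-3: progressively increase resolution, denoising at each scale.
mid = denoise(upsample(base), text_embedding)   # 64x64  -> 256x256
high = denoise(upsample(mid), text_embedding)   # 256x256 -> 1024x1024

print(base.shape, mid.shape, high.shape)
```

Each super-resolution stage sees both the upsampled image and the text conditioning, which is why the cascade can keep adding text-consistent detail rather than merely enlarging pixels.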
The Pathways Autoregressive Text-to-Image model (Parti) is an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.
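"Autoregressive" here means the image is produced as a sequence of discrete image tokens, predicted one at a time. The toy sampler below illustrates only that loop; the vocabulary size, sequence length, and random "model" are assumptions for illustration, and a real system would use a trained Transformer and decode the tokens back into pixels.

```python
import random

random.seed(0)

VOCAB_SIZE = 8192  # assumed size of the discrete image-token vocabulary
SEQ_LEN = 16       # toy sequence length (a real image needs far more tokens)

def next_token_scores(prefix, text_prompt):
    """Toy stand-in for a Transformer: score every candidate next token
    given the tokens generated so far and the text prompt."""
    return [random.random() for _ in range(VOCAB_SIZE)]

def generate(text_prompt):
    tokens = []
    for _ in range(SEQ_LEN):
        # Greedy decoding: append the highest-scoring next token.
        scores = next_token_scores(tokens, text_prompt)
        tokens.append(max(range(VOCAB_SIZE), key=scores.__getitem__))
    return tokens  # a real system would decode these tokens into pixels

image_tokens = generate("a photo of an astronaut riding a horse")
print(len(image_tokens))
```

Because each token is conditioned on all previous tokens plus the prompt, the model can maintain global consistency across a complex composition, at the cost of strictly sequential generation.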
Muse is a text-to-image Transformer model that achieves strong image generation performance while being significantly more efficient than diffusion or autoregressive models. It’s trained on a masked modeling task in a discrete visual token space, conditioned on a text embedding extracted from a pre-trained large language model (LLM). Muse also supports text-guided and mask-based image editing out-of-the-box.
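The masked-modeling setup above can be sketched in a few lines. All specifics below (vocabulary size, token count, masking ratio, the random "predictor") are illustrative assumptions; the point is the training signal: hide some visual tokens and ask the model to fill them in, conditioned on the text.

```python
import random

random.seed(0)

VOCAB_SIZE = 8192  # assumed discrete visual-token vocabulary size
NUM_TOKENS = 64    # toy image represented as 64 discrete tokens
MASK = -1          # sentinel value for a masked position

def predict_masked(tokens, text_embedding):
    """Toy stand-in for the Transformer: fill every masked position.
    A real model predicts all masked tokens in parallel, conditioned
    on the text embedding, over a few refinement iterations."""
    return [random.randrange(VOCAB_SIZE) if t == MASK else t for t in tokens]

# Training-style example: mask a random half of the token grid.
tokens = [random.randrange(VOCAB_SIZE) for _ in range(NUM_TOKENS)]
mask_positions = set(random.sample(range(NUM_TOKENS), k=NUM_TOKENS // 2))
masked = [MASK if i in mask_positions else t for i, t in enumerate(tokens)]

text_embedding = [0.0] * 128  # stand-in for a frozen LLM text encoder output
filled = predict_masked(masked, text_embedding)

print(masked.count(MASK), filled.count(MASK))
```

The same mechanism explains the out-of-the-box editing: masking the tokens that correspond to a user-selected image region and regenerating only those positions yields mask-based, text-guided edits without retraining.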
Bringing the power of foundation models to healthcare
MedLM, developed by Google Research, is a family of foundation models fine-tuned for the healthcare industry. MedLM can help with a wide range of tasks, including answering medical questions, generating summaries, automating manual administrative processes, and more. It builds on our efforts in Med-PaLM, the first large language model to reach expert-level performance on medical licensing exam-style questions. MedLM is now generally available to Google Cloud customers in the United States through Vertex AI.
Empowering developers to be more productive and creative with large language models
Google's family of Universal Speech Models enabling automatic speech recognition for 100+ languages.
Chirp is Google's family of state-of-the-art Universal Speech Models trained on 12 million hours of speech to enable automatic speech recognition (ASR) for 100+ languages. The models can perform ASR on under-resourced languages, such as Amharic, Cebuano, and Assamese, in addition to widely spoken languages like English and Mandarin. Chirp is able to cover such a wide variety of languages by leveraging self-supervised learning on a large unlabeled multilingual dataset, followed by fine-tuning on a smaller set of labeled data.