Learn about our leading AI models

Discover the AI models behind our most impactful innovations, understand their capabilities, and find the right one when you're ready to build your own AI project.

Agentic, Multimodal, Text, Code, Image, Audio

Gemini 2.5 Pro Preview

Best for coding and complex prompts.

Enhanced reasoning

State-of-the-art in key math and science benchmarks.

Advanced coding

Easily generate code for web development tasks.

Natively multimodal

Understands input across text, audio, images and video.

Long context

Explore vast datasets with a 1-million-token context window.

Agentic, Ready for developers, Multimodal, Text, Code, Image, Audio

Gemini 2.0 Flash

Our powerful workhorse model with low latency and enhanced performance, built to power agentic experiences.

Native image generation (Coming soon)

Create or edit images and seamlessly blend them with text.

Native text-to-speech (Coming soon)

Easily steer Gemini’s speaking style to match any mood.

Native tool use

Build agents that use Google Search, code execution and more.

Agentic, Multimodal, Text, Code, Image

Gemini 2.0 Flash Thinking Experimental

Our enhanced reasoning model, capable of showing its thoughts to improve performance and explainability.

Enhanced performance

Improvements on math and science benchmarks.

Long context

A one-million token context window enables deeper analysis of long-form text.
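For a rough sense of how much long-form text fits in a one-million-token window, the sketch below estimates a token count from character length. The 4-characters-per-token ratio, the reserved output budget, and the function names are all assumptions made for illustration. Real counts depend on the model's tokenizer and the language of the text, so treat this as a back-of-the-envelope check only.

```python
# Back-of-the-envelope check of whether documents fit in a long context window.
# ASSUMPTION: ~4 characters per token is a common heuristic for English text,
# not a property of any specific tokenizer.
CONTEXT_WINDOW_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4  # hypothetical average


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    """Return True if all documents plus an output budget fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    book = "word " * 500_000  # about 2.5 million characters of placeholder text
    print(estimate_tokens(book), fits_in_context([book]))
```

For a real workload, a tokenizer-based count would replace `estimate_tokens`; the comparison against the window size stays the same.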

Improved thinking

More consistency between thoughts and answers.

Tool use

Turn on code execution to run and evaluate code.

Ready for developers, Multimodal, Text, Code, Image, Audio

Gemini 2.0 Flash-Lite

Our most cost-efficient model yet.

Highly efficient

Better quality than 1.5 Flash, at the same speed and cost.

More context

A 1-million-token context window and multimodal input.

Ready for developers, Multimodal, Text, Code, Image, Audio

Gemini 1.5 Flash

Our lightweight model, optimized for tasks where speed and efficiency matter the most.

Built for speed

Sub-second average first-token latency for the vast majority of developer and enterprise use cases.

Quality at lower cost

On most common tasks, 1.5 Flash models achieve comparable quality to larger models, at a fraction of the cost.

Long-context understanding

Process hours of video and audio, and hundreds of thousands of words or lines of code.

Ready for developers, Multimodal, Text, Code

Gemini 1.5 Pro

Our best model for reasoning across large amounts of information.

Complex reasoning about vast amounts of information

Can seamlessly analyze, classify and summarize large amounts of content within a given prompt.

Reasoning across modalities

Can perform highly sophisticated understanding and reasoning tasks for different modalities.

Relevant problem-solving with longer blocks of code

When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications, and explain how different parts of the code work.

Ready for developers, Multimodal, Text, Code

Gemini 1.0 Pro

Our model for scaling across a wide range of tasks.

Complex reasoning systems

Fine-tuned both as a coding model that generates candidate solutions and as a reward model that recognizes and extracts the most promising code candidates.
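The two roles described here, one model proposing candidate solutions and another scoring them, follow a generate-then-rank pattern. The sketch below illustrates only that pattern. `propose_candidates` and `reward_score` are hypothetical stand-ins (a fixed list of programs and a test-pass counter), not the actual models or any Google API.

```python
from typing import Callable

# Toy test cases for a function named `add`: ((arguments), expected result).
TestCase = tuple[tuple[int, int], int]


def propose_candidates() -> list[str]:
    """Hypothetical stand-in for a coding model: returns candidate programs."""
    return [
        "def add(a, b): return a - b",
        "def add(a, b): return a + b",
        "def add(a, b): return abs(a) + abs(b)",
    ]


def reward_score(source: str, tests: list[TestCase]) -> float:
    """Hypothetical stand-in for a reward model: fraction of tests passed."""
    namespace: dict = {}
    try:
        # Toy only: never exec untrusted code outside a sandbox.
        exec(source, namespace)
        fn = namespace["add"]
        passed = sum(1 for args, expected in tests if fn(*args) == expected)
    except Exception:
        return 0.0
    return passed / len(tests)


def select_best(candidates: list[str], score: Callable[[str], float], top_k: int = 1) -> list[str]:
    """Rank candidates by score, highest first, and keep the top_k."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


tests: list[TestCase] = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]
best = select_best(propose_candidates(), lambda c: reward_score(c, tests))
print(best)
```

In this toy setup the score comes from running tests; a learned reward model would instead estimate candidate quality directly, but the ranking step is the same.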

Advanced audio understanding

Significantly outperforms the USM and Whisper models across all ASR and AST tasks, both for English and multilingual test sets.

Ready for developers, Multimodal, Text, Code

Gemini 1.0 Ultra

Our largest model for highly complex tasks.

Multimodal reasoning

Natively understands and reasons across sequences of audio, images, and text.

Complex coding

Excels at coding and achieves state-of-the-art performance when integrated into AlphaCode 2.

Mathematical reasoning

Advanced analytical capabilities and strong performance on competition-grade problem sets.

Ready for developers, Multimodal, Text, Code

Gemini 1.0 Nano

Our most efficient model for on-device tasks.

Reasoning, functionality & language understanding

Excels at on-device tasks such as summarization, reading comprehension, and text completion, and exhibits impressive capabilities in reasoning, STEM, coding, multimodal, and multilingual tasks relative to its size.

Broad accessibility

With capabilities accessible to a larger set of platforms and devices, the Gemini models expand accessibility to everyone.

Open models, Ready for developers, Text

Gemma

A family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Responsible by design

Incorporating comprehensive safety measures, these models help ensure responsible and trustworthy AI solutions through curated datasets and rigorous tuning.

Unmatched performance at size

Gemma models achieve exceptional benchmark results at their 2B and 7B sizes, even outperforming some larger open models.

Framework flexible

With Keras 3.0, enjoy seamless compatibility with JAX, TensorFlow, and PyTorch, empowering you to effortlessly choose and switch frameworks depending on your task.

Open models, Ready for developers, Code

CodeGemma

A collection of lightweight open code models built on top of Gemma. CodeGemma models perform a variety of tasks like code completion, code generation, code chat, and instruction following.

Intelligent code completion and generation

Complete lines, functions, and even generate entire blocks of code, whether you're working locally or using Google Cloud resources.

Enhanced accuracy

Trained on 500 billion tokens of data from web documents, mathematics, and code. Generates code that's not only more syntactically correct but also semantically meaningful, reducing errors and debugging time.

Multi-language proficiency

Supports Python, JavaScript, Java, Kotlin, C++, C#, Rust, Go, and other languages.

Open models, Ready for developers, Text

RecurrentGemma

A technically distinct model that leverages recurrent neural networks and local attention to improve memory efficiency.

Reduced memory usage

Lower memory requirements allow for the generation of longer samples on devices with limited memory, such as single GPUs or CPUs.

Higher throughput

Can perform inference at significantly higher batch sizes, thus generating substantially more tokens per second (especially when generating long sequences).

Research innovation

Showcases a non-transformer model that achieves high performance, highlighting advancements in deep learning research.

Open models, Ready for developers, Text

PaliGemma

Our first multimodal Gemma model, designed for class-leading fine-tuning performance across diverse vision-language tasks.

Powerful fine-tuning

Designed for class-leading fine-tuning performance on a wide range of vision-language tasks, such as image and short-video captioning, visual question answering, understanding text in images, object detection, and object segmentation.

Extensive language support

Supports a wide range of languages.

Ready for developers, Text, Code

PaLM 2

A next generation language model with improved multilingual, reasoning and coding capabilities.

Advanced reasoning

Demonstrates improved capabilities in logic, common sense reasoning, and mathematics.

Multilingual translation

Improved ability to understand, generate, and translate nuanced text, including idioms, poems, and riddles. PaLM 2 also passes advanced language proficiency exams at the “mastery” level.

Improved coding

Excels at popular programming languages like Python and JavaScript, but is also capable of generating specialized code in languages like Prolog, Fortran, and Verilog.

Ready for developers, Image

Imagen

A family of text-to-image models able to generate high-quality images and understand prompts written in natural language.

High-quality images

Able to generate images in a wide range of visual styles, with rich lighting and capturing even small details thanks to advancements in modeling techniques.

Text rendering support

Text-to-image models often struggle to include text accurately. Imagen 3 improves this process, ensuring the correct words or phrases appear in the generated images.

Prompt understanding

Imagen 3 understands prompts written in natural, everyday language, making it easier to get the output you want without complex prompt engineering.

Safety

Includes built-in safety precautions to help ensure that generated images align with Google’s Responsible AI principles.

Ready for developers, Code

Codey

A family of models that generate code based on a natural language description. It can be used to create functions, web pages, unit tests, and other types of code.

Code completion

Suggests the next few lines based on the existing context of code.

Code generation

Generates code based on natural language prompts from a developer.

Code chat

Lets developers converse with a bot to get help with debugging, documentation, learning new concepts, and other code-related questions.

Ready for developers, Text

Chirp

A family of Universal Speech Models trained on 12 million hours of speech to enable automatic speech recognition (ASR) for 100+ languages.

Broad language support

Can transcribe in over 100 languages with excellent speech recognition.

High accuracy

Achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages. It delivers 98% speech recognition accuracy in English and over 300% relative improvement in several languages with fewer than 10 million speakers.
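For readers unfamiliar with the metric: Word Error Rate is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition. The function name is an assumption, and the sketch skips the text normalization (casing, punctuation) that real evaluations usually apply, so it is not the evaluation code behind the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the transcript is much longer than the reference.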

Large model size

Chirp's 2-billion-parameter model outpaces previous speech models to deliver superior performance.

Video

Veo

Our most capable generative video model. A tool to explore new applications and creative possibilities with video generation.

Advanced cinematic effects

With just text prompts, it creates high-quality 1080p videos that can run longer than 60 seconds. You can also control the camera and prompt for effects such as time lapses or aerial shots of a landscape.

Detail and tone understanding

Interprets and visualizes the tone of prompts. Subtle cues in body language, lighting, and even color choices could dramatically shift the look of a generated video.

Improved consistency and quality of video

Able to retain visual consistency in appearance, locations and style across multiple scenes in a longer video.

More control

Veo lets users edit videos through prompts, including modifying, adding, or replacing visual elements. It can also generate a video from an image input, using the image to fit within any frame of the output and the prompt as guidance for how the video should proceed.

Healthcare, Ready for developers, Text

MedLM

A family of models fine-tuned for the healthcare industry.

Transform your healthcare workflow

Revolutionizes the way medical information is accessed, analyzed, and applied. Reduces administrative burdens and helps synthesize information seamlessly.

Build customized solutions

MedLM is a customizable solution that can embed into your workflow and integrate with your data to augment your healthcare capabilities.

Innovate safely and responsibly

Born from a belief that together, technology and medical experts can innovate safely, MedLM helps you stay on the cutting edge.

Education, Text

LearnLM

A family of models fine-tuned for learning and based on Gemini, infused with education capabilities and grounded in pedagogical evaluations.

Inspire active learning

Allow for practice and healthy struggle with timely feedback.

Manage cognitive load

Present relevant, well-structured information in multiple modalities.

Adapt to learner

Dynamically adjust to goals and needs, grounding in relevant materials.

Stimulate curiosity

Inspire engagement to provide motivation through the learning journey.

Deepen metacognition

Plan, monitor and help the learner reflect on progress.

Cybersecurity, Text

SecLM

A security-specialized AI API that combines multiple models, business logic, retrieval, and grounding into a cohesive system that is tuned for security-specific tasks.

Industry-leading threat data

Tuned, trained and grounded in threat intelligence from Google, VirusTotal, and Mandiant to bring up-to-date security information and context to users.

Infused in Google Cloud Security products

Gemini in Security agents use SecLM to help defenders protect their organizations.

Supercharging security use cases

Cybersecurity professionals can easily make sense of complex information and perform specialized tasks and workflows.

Ready to build?

Explore developer tools

Responsibility is the bedrock of all of our models.

AI is helping us deliver on our mission in exciting new ways, yet it's still an emerging technology that surfaces new challenges and questions as it evolves.

To us, building AI responsibly means addressing these challenges and questions while maximizing the benefits for people and society. In navigating this complexity, we’re guided by our AI Principles and cutting-edge research, along with feedback from experts, users, and partners.

These efforts are helping us continually improve our models with new advances like AI-assisted red teaming, and prevent their misuse with technologies like SynthID. They are also unlocking exciting, real-world progress towards some of society’s most pressing challenges, like predicting floods and accelerating research on neglected diseases.

*SynthID helps identify AI-generated content by embedding an imperceptible watermark on text, images, audio, and video content generated by our models.
