Cloud Tensor Processing Units (TPUs)

Accelerate AI development with Google Cloud TPUs

Cloud TPUs optimize performance and cost for all AI workloads, from training to inference. Using world-class data center infrastructure, TPUs offer high reliability, availability, and security.

Not sure if TPUs are the right fit? Learn about when to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads.

Overview

What is a Tensor Processing Unit (TPU)?

Google Cloud TPUs are custom-designed AI accelerators optimized for training and inference of large AI models. They are ideal for a variety of use cases, such as chatbots, code generation, media content generation, synthetic speech, vision services, recommendation engines, and personalization models.

What are the advantages of Cloud TPUs?

Cloud TPUs are designed to scale cost-efficiently for a wide range of AI workloads, spanning training, fine-tuning, and inference. Cloud TPUs provide the versatility to accelerate workloads on leading AI frameworks, including PyTorch, JAX, and TensorFlow. Seamlessly orchestrate large-scale AI workloads through Cloud TPU integration in Google Kubernetes Engine (GKE). Customers looking for the simplest way to develop AI models can also leverage Cloud TPUs in Vertex AI, a fully-managed AI platform.

When to use Cloud TPUs?

Cloud TPUs are optimized for training large, complex deep learning models that involve many matrix calculations, such as large language models (LLMs). Cloud TPUs also include SparseCores, dataflow processors that accelerate the embedding-heavy models common in recommendation systems. Other use cases include healthcare applications such as protein folding modeling and drug discovery.

How are Cloud TPUs different from GPUs?

GPUs are specialized processors originally designed for manipulating computer graphics. Their parallel structure makes them well suited to algorithms that process the large blocks of data commonly found in AI workloads. Learn more.

A TPU is an application-specific integrated circuit (ASIC) designed by Google for neural networks. TPUs have specialized features, such as the matrix multiply unit (MXU) and a proprietary interconnect topology, that make them ideal for accelerating AI training and inference.
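The MXU's core operation is the multiply-accumulate at the heart of matrix multiplication. As a minimal illustration (plain Python, not TPU code), this is the computation that an MXU executes in hardware as a single systolic-array pass:

```python
def matmul(a, b):
    """Naive matrix multiply: the multiply-accumulate pattern an MXU
    accelerates. a is rows x inner, b is inner x cols."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]  # multiply-accumulate
    return out

# A 2x2 example. Dense layers, attention, and convolutions all reduce
# to many such products, which is why MXUs dominate AI throughput.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

On a TPU the same product is computed for large tiles in one hardware pass rather than element by element, which is where the speedup over general-purpose processors comes from.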

Cloud TPU versions

Cloud TPU version | Description | Availability
Cloud TPU v5p | The most powerful Cloud TPU for training AI models | Available in North America (us-east5) during Preview
Cloud TPU v5e | The most efficient, versatile, and scalable Cloud TPU | Generally available in North America (US West/East regions)


How It Works

Get an inside look at the magic of Google Cloud TPUs, including a rare inside view of the data centers where it all happens. Customers use Cloud TPUs to run some of the world's largest AI workloads and that power comes from much more than just a chip. In this video, take a look at the components of the TPU system, including data center networking, optical circuit switches, water cooling systems, biometric security verification and more.


Common Uses

Run large-scale AI training workloads

Powerful, scalable, and efficient AI training

Cloud TPU Multislice training is a full-stack technology that enables fast, easy, and reliable large-scale AI model training across tens of thousands of TPU chips.

Near-linear scaling to tens of thousands of chips with Multislice training


Fine-tune foundational AI models

Adapt LLMs for your applications with PyTorch/XLA

Efficiently fine-tune foundation models by leveraging your own training data that represents your use case. Cloud TPU v5e provides up to 1.9x higher LLM fine-tuning performance per dollar compared to Cloud TPU v4.

Cloud TPU LLM fine tuning Performance/$


      Serve large-scale AI inference workloads

      Maximize performance/$ with AI infrastructure that scales

      Cloud TPU v5e enables high-performance and cost-effective inference for a wide range of AI workloads, including the latest LLMs and Gen AI models. TPU v5e delivers up to 2.5x more throughput performance per dollar and up to 1.7x speedup over Cloud TPU v4. Each TPU v5e chip provides up to 393 trillion int8 operations per second, allowing complex models to make fast predictions. A TPU v5e pod delivers up to 100 quadrillion int8 operations per second, or 100 petaOps of compute power.
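The pod-level figure follows directly from the per-chip figure. As a quick sanity check (plain Python; the 256-chips-per-pod value is an assumption drawn from public v5e documentation, not from this page):

```python
# Per-chip peak from the text: 393 trillion int8 operations per second.
per_chip_ops = 393e12
chips_per_pod = 256  # assumed full v5e pod size (per public Cloud TPU docs)

pod_ops = per_chip_ops * chips_per_pod
print(f"{pod_ops / 1e15:.1f} petaOps")  # 100.6 petaOps, matching the ~100 petaOps quoted
```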

      Cloud TPU v5e pod


      Cloud TPU in GKE

      Effortless scaling with GKE

      Combine the power of Cloud TPUs with the flexibility and scalability of GKE to build and deploy machine learning models faster and more easily than ever before. With Cloud TPUs available in GKE, you can now have a single consistent operations environment for all your workloads, standardizing automated MLOps pipelines.

      TPU in GKE Architecture


      Cloud TPU in Vertex AI

      Vertex AI Training & Predictions with Cloud TPUs

Customers looking for the simplest way to develop AI models can deploy Cloud TPU v5e with Vertex AI, an end-to-end platform for building AI models on fully managed infrastructure that's purpose-built for low-latency serving and high-performance training.


      Pricing

Cloud TPU pricing

All Cloud TPU pricing is per chip-hour.

Cloud TPU version | Evaluation price (USD) | 1-year commitment (USD) | 3-year commitment (USD)
Cloud TPU v5p | Starting at $4.2000 | Starting at $2.9400 | Starting at $1.8900
Cloud TPU v5e | Starting at $1.2000 | Starting at $0.8400 | Starting at $0.5400

Cloud TPU pricing varies by product and region; view details here.
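As a rough sketch of how these rates translate into a budget (plain Python, using the starting per-chip-hour prices quoted above; region-specific pricing and fees are not modeled, so use the pricing calculator for real quotes):

```python
# Starting per-chip-hour rates (USD) from the pricing table above.
RATES = {
    ("v5p", "evaluation"): 4.20,
    ("v5p", "1yr"): 2.94,
    ("v5p", "3yr"): 1.89,
    ("v5e", "evaluation"): 1.20,
    ("v5e", "1yr"): 0.84,
    ("v5e", "3yr"): 0.54,
}

def estimate_cost(version, term, chips, hours):
    """Rough estimate: chips x hours x per-chip-hour rate.
    Ignores region differences, discounts, and fees."""
    return RATES[(version, term)] * chips * hours

# e.g. a 64-chip v5e job running for 24 hours at the evaluation rate:
print(f"${estimate_cost('v5e', 'evaluation', 64, 24):,.2f}")  # $1,843.20
```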


      PRICING CALCULATOR

      Estimate your monthly Cloud TPU costs, including region specific pricing and fees.

      CUSTOM QUOTE

      Connect with our sales team to get a custom quote for your organization.

      Start your proof of concept

      Try Cloud TPUs for free

      Get a quick intro to using Cloud TPUs

      Run TensorFlow on Cloud TPU VM

      Run JAX on Cloud TPU VM

      Run PyTorch on Cloud TPU VM
