Google Cloud Documentation
  • Gemini Enterprise Agent Platform
  • Overview
  • Beginner's guide
  • Get started
  • Get started with Agent Platform
  • Develop Gemini API code with the Gen AI SDK
  • Get an API key
  • Configure application default credentials
  • Migrate from Google AI Studio to Agent Platform
  • Get started with Gemini 3
  • Gemini 3 prompting guide
  • Google GenAI libraries
  • Generative AI cookbook
  • Access Gemini models using OpenAI libraries
  • Express mode
    • Overview
    • Console tutorial
    • API tutorial
  • Select models
    • Model Garden
    • Overview of Model Garden
    • Use models in Model Garden
    • Test model capabilities
    • Google Models
    • All Google models
    • Gemini
      • Migrate to the latest Gemini models
      • Pro
      • Gemini 3.1 Pro
      • Gemini 3 Pro
      • Gemini 3 Pro Image
      • Gemini 3 Pro Image Preview
      • Gemini 2.5 Pro
      • Flash
      • Gemini 3.5 Flash
      • Gemini 3.1 Flash Image
      • Gemini 3.1 Flash Image Preview
      • Gemini 3 Flash
      • Gemini 2.5 Flash
      • Gemini 2.5 Flash Image
      • Gemini 2.5 Flash Live API
      • Flash-Lite
      • Gemini 3.1 Flash-Lite
      • Gemini 2.5 Flash-Lite
      • Embedding
      • Gemini Embedding 2
    • Veo
      • Veo 2
      • Veo 3
      • Veo 3.1
    • Lyria
      • Lyria 2
      • Lyria 3
    • Model versions
    • Partner Models
    • Partner models overview
    • Claude
      • Overview
      • Request predictions
      • Quotas for Anthropic Claude models
      • Batch predictions
      • Structured outputs
      • Prompt caching
      • Count tokens
      • Web search
      • Safety classifiers
      • Model details
      • Claude Fable 5
      • Claude Opus 4.8
      • Claude Opus 4.7
      • Claude Sonnet 4.6
      • Claude Opus 4.6
      • Claude Opus 4.5
      • Claude Sonnet 4.5
      • Claude Opus 4.1
      • Claude Haiku 4.5
      • Claude Opus 4
      • Claude Sonnet 4
    • Grok
      • Overview
      • Responses API
      • Function calling
      • Structured output
      • Reasoning
      • Model details
      • Grok 4.1 Fast
      • Grok 4.20
      • Grok 4.3
    • Mistral AI
      • Overview
      • Model details
      • Mistral Medium 3
      • Mistral OCR (25.05)
      • Mistral Small 3.1 (25.03)
      • Codestral 2
    • Deploy partner models from Model Garden
    • Model deprecations (MaaS)
    • Open Models
    • Overview
    • DeepSeek
      • Overview
      • DeepSeek-V3.2
      • DeepSeek-V3.1
      • DeepSeek-R1-0528
      • DeepSeek-OCR
    • Embedding (e5)
      • Multilingual E5 Small
      • Multilingual E5 Large
    • Google Gemma
      • Model-as-a-Service (MaaS)
      • Gemma-4-26B-A4B-IT MaaS
      • Use Gemma
      • Tutorial: Deploy and inference Gemma (GPU)
      • Tutorial: Deploy and inference Gemma (TPU)
    • Kimi
      • Overview
      • Kimi K2 Thinking
    • Llama
      • Overview
      • Request predictions
      • Model details
      • Llama 4 Maverick
      • Llama 4 Scout
      • Llama 3.3
    • MiniMax
      • Overview
      • MiniMax M2
    • OpenAI
      • Overview
      • OpenAI gpt-oss-120b
      • OpenAI gpt-oss-20b
    • Qwen
      • Overview
      • Qwen 3 Next Instruct 80B
      • Qwen 3 Next Thinking 80B
      • Qwen 3 Coder
      • Qwen 3 235B
    • ZAI.org
      • Overview
      • GLM 5
      • GLM 4.7
    • Managed open models (MaaS)
      • Overview
      • Use open models via Model as a Service (MaaS)
      • Grant access to open models
      • API
      • Call MaaS APIs for open models
      • Function calling
      • Thinking
      • Structured output
      • Batch prediction
    • Self-deployed open models
      • Overview
      • Deploy open models
        • Deploy open models from Model Garden
        • Deploy open models with prebuilt containers
        • Deploy open models with a custom vLLM container
        • Deploy models with custom weights
      • Use Hugging Face Models
      • Tutorials
        • Optimize model performance with advanced features in Model Garden
        • Hex-LLM
        • Comprehensive guide to vLLM for Text and Multimodal LLM Serving (GPU)
        • vLLM TPU
        • xDiT
        • Deploy Llama 3 models with SpotVM and Reservations
  • Build
    • Prompt design
    • Introduction to prompting
    • Prompting strategies
      • Overview
      • Give clear and specific instructions
      • Use system instructions
      • Include few-shot examples
      • Add contextual information
      • Structure prompts
      • Compare prompts
      • Instruct the model to explain its reasoning
      • Break down complex tasks
      • Experiment with parameter values
      • Prompt iteration strategies
    • Task-specific prompt guidance
      • Design multimodal prompts
      • Design chat prompts
    • Capabilities
    • Safety
      • Overview
      • Responsible AI
      • System instructions for safety
      • Configure content filters
      • Gemini for safety filtering and content moderation
      • Abuse monitoring
      • Process blocked responses
      • Content Credentials
      • AI Content Detection API
    • Text and code generation
      • Text generation
      • System instructions
      • Function calling
      • Structured output
      • Content generation parameters
      • Code execution
    • Image generation
      • Generate images with Gemini
      • Generate images from video with Gemini
      • Edit images with Gemini
      • Gemini image generation best practices
      • Gemini image generation limitations
      • Responsible AI and usage for Gemini image generation
      • Imagen documentation
    • Video generation
      • Introduction to Veo
      • Text to video
      • First frame image to video
      • First and last frames to video
      • Ingredients to videos with image references
      • Extend videos
      • Insert objects
      • Remove objects
      • Veo prompt guide
      • Veo best practices
      • Turn off Veo's prompt rewriter
      • Responsible AI for Veo
    • Music generation
      • Introduction to Lyria
      • Generate music using Lyria
      • Lyria prompt guide
    • Media analysis
      • Image understanding
      • Video understanding
      • Audio understanding
      • Document understanding
      • Bounding box detection
    • Grounding
      • Overview
      • Grounding with Google Search
      • Grounding with Google Maps
      • Grounding with Agent Search
      • Grounding with your search API
      • Grounding responses using RAG
      • Grounding with Elasticsearch
      • Grounding with Parallel web search
      • Grounding with Exa web search
      • Web Grounding for Enterprise
    • URL context
    • Thinking
      • Overview
      • Thought signatures
    • Computer Use
    • Live API
      • Overview
      • Get started
        • Get started using the Gen AI SDK
        • Get started using WebSockets
        • Get started using ADK
      • Start and manage live sessions
      • Send audio and video streams
      • Configure language and voice
      • Configure Gemini capabilities
      • Asynchronous function calling
      • Best practices with Live API
      • Troubleshooting Live API
      • Demo apps and resources
    • Embeddings
      • Overview
      • Text embeddings
        • Get text embeddings
        • Choose an embeddings task type
      • Get multimodal embeddings
      • Get batch embeddings inferences
    • Translation
    • Generate speech from text
    • Transcribe speech
    • Development tools
    • Use AI-powered prompt writing tools
      • Overview
      • Optimize prompts
        • Overview
        • Zero-shot optimizer
        • Few-shot optimizer
        • Data-driven optimizer
      • Use prompt templates
    • Model tuning
    • Introduction to tuning
    • Tuning Gemini models
      • Supervised fine-tuning
        • About supervised fine-tuning
        • Prepare your data
        • Use supervised fine-tuning
        • Supported modalities
          • Text tuning
          • Document tuning
          • Image tuning
          • Audio tuning
          • Video tuning
          • Tune function calling
      • Preference tuning
        • About preference tuning
        • Prepare your data
        • Use preference tuning
      • Use tuning checkpoints
      • Use continuous tuning
      • Tuning recommendations with LoRA and QLoRA
      • Distillation
    • Open models
      • Supervised and distillation fine-tuning
    • Embeddings models
      • Tune text embeddings models
    • Translation models
      • About supervised fine-tuning
      • Prepare your data
      • Use supervised fine-tuning
    • Migrate
    • Call Agent Platform models using OpenAI libraries
      • Overview
      • Authenticate
      • Examples
      • Migrate from OpenAI SDK
  • Evaluate
    • Overview
    • Tutorial: Perform evaluation using the console
    • Perform evaluation using the GenAI Client in Agent Platform SDK
      • Tutorial: Evaluate models using the GenAI Client in Agent Platform SDK
      • Define your evaluation metrics
        • Define your evaluation metrics
        • Details for managed rubric-based metrics
      • Prepare your evaluation dataset
      • Run an evaluation
      • View and interpret evaluation results
      • Evaluate agents
    • Alternative evaluation methods
    • Evaluate using the evaluation module in Agent Platform SDK
      • Tutorial: Perform evaluation using the evaluation module in Agent Platform SDK
      • Define your evaluation metrics
      • Prepare your evaluation dataset
      • Run an evaluation
      • Interpret evaluation results
      • Templates for model-based metrics
      • Evaluate agents
      • Evaluate a judge model
      • Configure a judge model
    • Run AutoSxS pipeline
    • Run a computation-based evaluation pipeline
  • Deploy
    • Consumption options overview
    • Provisioned Throughput
      • Provisioned Throughput overview
      • Supported models
      • Calculate Provisioned Throughput requirements
      • Provisioned Throughput for Live API
      • Provisioned Throughput for Gemini 3 (Nano Banana) models
      • Provisioned Throughput for Veo 3 models
      • Single Zone Provisioned Throughput
      • Purchase Provisioned Throughput
      • Use Provisioned Throughput
    • PayGo
      • Standard PayGo
      • Priority PayGo
      • Flex PayGo
    • Batch inference
      • Overview
      • Create batch job from Cloud Storage
      • Create batch job from BigQuery
      • Resume an incomplete batch job
    • Quotas and system limits
    • Cache reused prompt context
      • Overview
      • Create a context cache
      • Use a context cache
      • Get context cache information
      • Update a context cache
      • Delete a context cache
      • Context cache for fine-tuned Gemini models
    • Deploy generative AI models
    • Troubleshooting error code 429
    • Retry strategy
  • Administer
    • Access control
    • Networking
    • Security controls
    • Control access to Model Garden models
    • Enable Data Access audit logs
    • Save and share prompts
    • Monitor models
    • Monitor cost using custom metadata labels
    • Request-response logging
    • Secure a gen AI app by using IAP
      • Overview
      • Set up your project and source repository
      • Create a Cloud Run service
      • Create a load balancer
      • Configure IAP
      • Test your IAP-secured app
      • Clean up your project
  • Build your own model
    • Overview
    • MLOps on Agent Platform
    • Interfaces for Agent Platform
    • Agent Platform beginner's guides
      • Train an AutoML model
      • Train a custom model
      • Get inferences from a custom model
      • Train a model using Agent Platform and the Python SDK
        • Introduction
        • Prerequisites
        • Create a notebook
        • Create a dataset
        • Create a training script
        • Train a model
        • Make an inference
    • Integrated ML frameworks
      • PyTorch
      • TensorFlow
    • Agent Platform for BigQuery users
    • Glossary
    • Get started
    • Set up a project and a development environment
    • Install the Agent Platform SDK for Python
    • Authenticate to Agent Platform
    • Choose a training method
    • Try a tutorial
      • Tutorials overview
      • AutoML tutorials
        • Hello image data
          • Overview
          • Set up your project and environment
          • Create a dataset and import images
          • Train an AutoML image classification model
          • Evaluate and analyze model performance
          • Deploy a model to an endpoint and make an inference
          • Clean up your project
        • Hello tabular data
          • Overview
          • Set up your project and environment
          • Create a dataset and train an AutoML classification model
          • Deploy a model and request an inference
          • Clean up your project
      • Custom training tutorials
        • Train a custom tabular model
        • Train a TensorFlow Keras image classification model
          • Overview
          • Set up your project and environment
          • Train a custom image classification model
          • Serve predictions from a custom image classification model
          • Clean up your project
        • Fine-tune an image classification model with custom data
    • Use Agent Platform development tools
    • Development tools overview
    • Use the Agent Platform SDK
      • Overview
      • Introduction to the Agent Platform SDK for Python
      • Agent Platform SDK for Python classes
        • Agent Platform SDK classes overview
        • Data classes
        • Training classes
        • Model classes
        • Prediction classes
        • Tracking classes
    • Terraform support for Agent Platform
    • Agent Platform Training
    • Overview
    • Agent Platform serverless training
      • Overview of serverless training in Agent Platform
      • Load and prepare data
        • Data preparation overview
        • Use Cloud Storage as a mounted file system
        • Mount an NFS share for serverless training
        • Use managed datasets
      • Prepare training application
        • Understand the serverless training service
        • Prepare training code
        • Use prebuilt containers
          • Create a Python training application for a prebuilt container
          • Prebuilt containers for serverless training
        • Use custom containers
          • Custom containers for serverless training
          • Create a custom container
          • Containerize and run training code locally
      • Train on a persistent resource
        • Overview
        • Create persistent resource
        • Run training jobs on a persistent resource
        • Get persistent resource information
        • Reboot a persistent resource
        • Delete a persistent resource
      • Configure training job
        • Choose a custom training method
        • Configure container settings for training
        • Configure compute resources for training
        • Use reservations with training
        • Use Spot VMs with training
      • Submit training job
        • Create custom jobs
        • Hyperparameter tuning
          • Hyperparameter tuning overview
          • Use hyperparameter tuning
        • Create training pipelines
        • Schedule jobs based on resource availability
        • Use distributed training
        • Training with Cloud TPU VMs
        • Use private IP for custom training
        • Use Private Service Connect interface for training (recommended)
      • Monitor and debug
        • Monitor and debug training using an interactive shell
        • Profile model training performance
      • Tutorial: Build a pipeline for continuous training
      • Create custom organization policy constraints
    • Vertex AI training clusters
      • Overview
      • Get started with training clusters
      • Deployment considerations
        • Compute resources
        • Networking
        • Storage
        • Orchestration
      • Create and manage clusters
        • Create cluster
        • Manage cluster
        • Manage accounts and job scheduling on a cluster
      • Cluster resiliency
      • Feature guides
        • Using