Image generation with Gemini

Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to perform image-related tasks such as generating and editing images. All generated images include a SynthID watermark.

Image generation may not be available in all regions and countries. For more information, see the Gemini models page.

Image generation (text-to-image)

The following example shows code that generates an image from a descriptive prompt. You must include responseModalities: ["TEXT", "IMAGE"] in your configuration. These models do not support image-only output.

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

# The response interleaves text parts and image parts
for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    # inline_data.data holds the raw image bytes
    image = Image.open(BytesIO(part.inline_data.data))
    image.save('gemini-native-image.png')
    image.show()

JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {

  const ai = new GoogleGenAI({});

  const contents =
    "Hi, can you create a 3d rendered image of a pig " +
    "with wings and a top hat flying over a happy " +
    "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-preview-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });
  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"

  "google.golang.org/genai"
)

func main() {

  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  // Request both text and image output; image-only output is not supported.
  config := &genai.GenerateContentConfig{
      ResponseModalities: []string{"TEXT", "IMAGE"},
  }

  result, err := client.Models.GenerateContent(
      ctx,
      "gemini-2.0-flash-preview-image-generation",
      genai.Text("Hi, can you create a 3d rendered image of a pig " +
                 "with wings and a top hat flying over a happy " +
                 "futuristic scifi city with lots of greenery?"),
      config,
  )
  if err != nil {
      log.Fatal(err)
  }

  for _, part := range result.Candidates[0].Content.Parts {
      if part.Text != "" {
          fmt.Println(part.Text)
      } else if part.InlineData != nil {
          // InlineData.Data holds the raw image bytes.
          imageBytes := part.InlineData.Data
          outputFilename := "gemini_generated_image.png"
          if err := os.WriteFile(outputFilename, imageBytes, 0644); err != nil {
              log.Fatal(err)
          }
      }
  }
}

REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png

AI-generated image of a fantastical flying pig
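
The REST example above pulls the image out of the response with grep and cut, treating the JSON as plain text. As a hedged alternative, the minimal Python sketch below parses the JSON instead. It assumes the response has the shape the examples rely on (candidates, then content, then parts, with image parts under inlineData carrying base64 data). The function name extract_parts and the sample dictionary are hypothetical and exist only for illustration.

```python
import base64


def extract_parts(response_json):
    """Split a generateContent JSON response into text strings and image bytes."""
    texts, images = [], []
    for candidate in response_json.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
            elif "inlineData" in part:
                # Image bytes arrive base64-encoded in the JSON body.
                images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images


if __name__ == "__main__":
    # Hypothetical sample standing in for a saved REST response.
    sample = {
        "candidates": [{
            "content": {
                "parts": [
                    {"text": "Here is your image."},
                    {"inlineData": {
                        "mimeType": "image/png",
                        "data": base64.b64encode(b"not a real png").decode("ascii"),
                    }},
                ]
            }
        }]
    }
    texts, images = extract_parts(sample)
    print(len(texts), "text part(s),", len(images), "image part(s)")
```

Each entry in the returned image list can be written to disk in binary mode, for example as gemini-native-image.png.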

Image editing (text-and-image-to-image)

To edit an image, add an image as input. The following example demonstrates uploading base64-encoded images. For multiple images and larger payloads, see the image input section.

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

# Load the input image to edit
image = Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, This is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

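# The response interleaves text parts and image parts
for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    # inline_data.data holds the raw bytes of the edited image
    image = Image.open(BytesIO(part.inline_data.data))
    image.show()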

JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {

  const ai = new GoogleGenAI({});

  // Load the image from the local file system
  const imagePath = "path/to/image.png";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  // Prepare the content parts
  const contents = [
    { text: "Can you add a llama next to the image?" },
    {
      inlineData: {
        mimeType: "image/png",
        data: base64Image,
      },
    },
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-preview-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });
  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

Go

package main

import (
 "context"
 "fmt"
 "log"
 "os"

 "google.golang.org/genai"
)

func main() {

 ctx := context.Background()
 client, err := genai.NewClient(ctx, nil)
 if err != nil {
     log.Fatal(err)
 }

 // Read the image to edit from disk.
 imagePath := "/path/to/image.png"
 imgData, err := os.ReadFile(imagePath)
 if err != nil {
     log.Fatal(err)
 }

 // Send the text instruction and the image together as one user turn.
 parts := []*genai.Part{
   genai.NewPartFromText("Hi, This is a picture of me. Can you add a llama next to me?"),
   &genai.Part{
     InlineData: &genai.Blob{
       MIMEType: "image/png",
       Data:     imgData,
     },
   },
 }

 contents := []*genai.Content{
   genai.NewContentFromParts(parts, genai.RoleUser),
 }

 config := &genai.GenerateContentConfig{
     ResponseModalities: []string{"TEXT", "IMAGE"},
 }

 result, err := client.Models.GenerateContent(
     ctx,
     "gemini-2.0-flash-preview-image-generation",
     contents,
     config,
 )
 if err != nil {
     log.Fatal(err)
 }

 for _, part := range result.Candidates[0].Content.Parts {
     if part.Text != "" {
         fmt.Println(part.Text)
     } else if part.InlineData != nil {
         imageBytes := part.InlineData.Data
         outputFilename := "gemini_generated_image.png"
         if err := os.WriteFile(outputFilename, imageBytes, 0644); err != nil {
             log.Fatal(err)
         }
     }
 }
}

REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \
    -H "x-goog-api-key: $GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{
      \"contents\": [{
        \"parts\":[
            {\"text\": \"Hi, This is a picture of me. Can you add a llama next to me?\"},
            {
              \"inline_data\": {
                \"mime_type\":\"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
        ]
      }],
      \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]}
    }"  \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png
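
Escaping JSON inside a double-quoted shell string is easy to get wrong. The same request body can be assembled with a JSON library instead; the sketch below shows one way to do that in Python. The field names mirror the REST example above, while build_edit_payload and its arguments are hypothetical names chosen for illustration.

```python
import base64
import json


def build_edit_payload(prompt, image_bytes, mime_type="image/jpeg"):
    """Return the generateContent request body for a text-plus-image edit."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        # JSON cannot carry raw bytes, so the image is base64-encoded.
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice, read them from the image file.
    body = build_edit_payload("Can you add a llama next to me?", b"\xff\xd8\xff")
    print(json.dumps(body)[:80])
```

The serialized body can be written to a file and passed to curl with -d @file, which sidesteps shell quoting altogether.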

Other image generation modes

Gemini supports other image interaction modes based on prompt structure and context, including:

Text to image(s) and text (interleaved): Outputs images with related text.
- Example prompt: "Generate an illustrated recipe for a paella."
Image(s) and text to image(s) and text (interleaved): Uses input images and text to create new related images and text.
- Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
Multi-turn image editing (chat): Keep generating or editing images conversationally.
- Example prompts: [upload an image of a blue car.], "Turn this car into a convertible.", "Now change the color to yellow."
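
Multi-turn editing can be sketched with the SDK's chat interface, which carries earlier turns into later requests. The example below assumes client.chats.create and chat.send_message from the google-genai Python SDK. collect_image_bytes, run_edit_turns, and main are hypothetical helpers, and the file path is a placeholder.

```python
def collect_image_bytes(response):
    """Return the raw bytes of every image part in a response."""
    images = []
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            images.append(part.inline_data.data)
    return images


def run_edit_turns(chat, turns):
    """Send each turn to the chat in order; return the images produced per turn."""
    results = []
    for message in turns:
        response = chat.send_message(message)
        results.append(collect_image_bytes(response))
    return results


def main():
    # Imported here so the helpers above stay usable without the SDK installed.
    from google import genai
    from google.genai import types
    from PIL import Image

    client = genai.Client()
    chat = client.chats.create(
        model="gemini-2.0-flash-preview-image-generation",
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    car = Image.open("/path/to/blue-car.png")  # placeholder path
    turns = [
        [car, "Turn this car into a convertible."],
        "Now change the color to yellow.",
    ]
    for i, images in enumerate(run_edit_turns(chat, turns)):
        for j, data in enumerate(images):
            with open(f"edit-turn{i}-{j}.png", "wb") as f:
                f.write(data)
```

Calling main() requires the google-genai and Pillow packages and an API key in the environment. Each later turn only needs the new instruction, because the chat object resends the history.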

Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
Image generation may not always trigger:
- The model may output text only. Try asking for image outputs explicitly (for example, "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through. Try again or try a different prompt.
When generating text for an image, it is best to generate the text first and then ask for an image that contains that text.
Image generation is not available in some regions and countries. See the Models page for more information.
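
Because a response may come back with text only, a caller might check for an image part and retry with a more explicit request, as the list above suggests. The sketch below is one hedged way to do that. has_image, generate_with_retry, the attempt count, and the appended wording are all assumptions made for illustration; generate stands for any callable that takes a prompt and returns a response shaped like the ones in the SDK examples.

```python
def has_image(response):
    """True if any part of the first candidate carries inline image data."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates or candidates[0].content is None:
        return False
    parts = candidates[0].content.parts or []
    return any(p.inline_data is not None for p in parts)


def generate_with_retry(generate, prompt, max_attempts=3):
    """Call generate(prompt) until the response contains an image.

    After a response without an image, the prompt is extended with an
    explicit request for one. Returns the last response, which may still
    lack an image if every attempt came back text-only.
    """
    response = None
    current = prompt
    for _ in range(max_attempts):
        response = generate(current)
        if has_image(response):
            break
        current = prompt + " Generate an image."
    return response
```

With the Python SDK, generate could be a small lambda that wraps client.models.generate_content with the model name and response modalities used earlier.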

When to use Imagen

In addition to Gemini's built-in image generation capabilities, you can also access Imagen, our specialized image generation model, through the Gemini API.

Choose Gemini when:

You need contextually relevant images that draw on world knowledge and reasoning.
Seamlessly blending text and images is important.
You want accurate visuals embedded within long text sequences.
You want to edit images conversationally while maintaining context.

Choose Imagen when:

Image quality, photorealism, artistic detail, or specific styles (for example, impressionism or anime) are top priorities.
You are performing specialized editing tasks such as product background updates or image upscaling.
You are adding branding or style, or generating logos and product designs.

Imagen 4 is the recommended model for generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can generate only one image at a time.
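
The one-image limit for Imagen 4 Ultra can be handled in client code before the request is sent. The sketch below is illustrative only. It assumes the google-genai Python SDK's client.models.generate_images call with types.GenerateImagesConfig, and it detects Ultra models by looking for "ultra" in the model ID, which is a heuristic of this example rather than a documented rule. images_per_request and generate_with_imagen are hypothetical names, and the model ID is left as a parameter because this page does not list one.

```python
def images_per_request(model, requested):
    """Clamp the image count to 1 for Ultra models, which return one image per call."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    # Assumption: Ultra model IDs contain the substring "ultra".
    return 1 if "ultra" in model.lower() else requested


def generate_with_imagen(client, model, prompt, requested=1):
    """Generate images with Imagen and return their raw bytes."""
    # Imported here so images_per_request stays usable without the SDK installed.
    from google.genai import types

    response = client.models.generate_images(
        model=model,
        prompt=prompt,
        config=types.GenerateImagesConfig(
            number_of_images=images_per_request(model, requested),
        ),
    )
    return [generated.image.image_bytes for generated in response.generated_images]
```

To get several Ultra images, a caller would invoke generate_with_imagen repeatedly rather than raising the count.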

What's next

See the Veo guide to learn how to generate videos with the Gemini API.
To learn more about Gemini models, see Gemini models and Experimental models.