SAM Image Segmentation: Hands-On Application with a Detailed Example (Point Prompts), Advanced

Latest recommended article published on 2025-07-03 15:52:22

License: CC 4.0 BY-SA

Tags:

#python #plotly #pandas #image-processing

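Preface:

1. Applying SAM with a single input point to a single image:

SAM Image Segmentation: Hands-On Application with a Detailed Example (Point Prompts) - CSDN Blog

2. The aim of this article is to bring SAM's point prompts into video processing:

The points fed to SAM are the gaze coordinates recorded by an eye tracker in a specific scene. The original video supplies the images, and each gaze coordinate corresponds to exactly one frame, so every frame is processed as a single image with a single point prompt. The resulting images are then encoded back into a video for output.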
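The one-to-one pairing of frames and gaze points can be sketched as below. This is only an illustration: `pair_frames_with_gaze` is a hypothetical helper, and the strings stand in for decoded frames. It assumes that processing should simply stop when either the frames or the gaze samples run out.

```python
def pair_frames_with_gaze(frames, gaze_points):
    """Yield (frame_index, frame, gaze_point) until either input runs out."""
    # zip stops at the shorter input, so a missing gaze row never raises IndexError
    for index, (frame, point) in enumerate(zip(frames, gaze_points)):
        yield index, frame, point


frames = ["frame0", "frame1", "frame2"]   # stand-ins for decoded images
gaze_points = [(100, 200), (110, 210)]    # one fewer sample than frames
pairs = list(pair_frames_with_gaze(frames, gaze_points))
```

Here the third frame is skipped because no gaze sample exists for it.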

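Contents:

1. Beginners can first read the article in item 1 of the Preface to learn the basic single-image workflow.

2. Inputs and outputs of the main program

Input: the original video (session.mp4) and the corresponding gaze coordinates (gaze.xlsx)

Output: the segmented result video (myResultVideoTest.mp4)

① Key function show_mask(image, mask):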

# Input: the original image and a mask. Output: the image with the mask painted on.
def show_mask(image, mask):
    # Overlay colour; it is written straight into the OpenCV frame, so the order is BGR
    color = np.array([235, 251, 134], dtype=np.uint8)
    h, w = mask.shape[-2:]
    # Boolean indexing paints every masked pixel at once instead of looping over h * w pixels
    image[mask.reshape(h, w).astype(bool)] = color
    return image

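Pass in a single image and its corresponding mask, and the function returns the final image + mask result directly. Note that the input image is modified in place.

② Reading the files and defining variables: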
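As a variation on the same idea, the sketch below returns a copy instead of modifying the input and allows a semi-transparent overlay. `overlay_mask` and its `alpha` parameter are illustrative names, not part of SAM or of the original code. It assumes the mask holds one value per pixel and the image is an (H, W, 3) uint8 array in the same channel order as `color`.

```python
import numpy as np


def overlay_mask(image, mask, color=(235, 251, 134), alpha=1.0):
    """Return a copy of `image` with `color` blended into the masked pixels."""
    # Accept (H, W) or (1, H, W) masks and turn them into a boolean (H, W) index
    mask = np.asarray(mask).astype(bool).reshape(image.shape[:2])
    result = image.copy()
    color = np.array(color, dtype=np.float32)
    # alpha = 1.0 replaces the pixel, smaller values let the original show through
    blended = (1.0 - alpha) * result[mask].astype(np.float32) + alpha * color
    result[mask] = np.round(blended).astype(np.uint8)
    return result


# Tiny demonstration on a synthetic 4x4 black image
demo_image = np.zeros((4, 4, 3), dtype=np.uint8)
demo_mask = np.zeros((4, 4), dtype=bool)
demo_mask[1:3, 1:3] = True
demo_result = overlay_mask(demo_image, demo_mask)
```

Returning a copy keeps the decoded frame available for other uses, at the cost of one extra array per frame.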

sam_checkpoint = "C:\\Users\\MelonZhou\\Desktop\\PythonCode\\MindLink_SAM\\sam_vit_h_4b8939.pth"  # path to the model checkpoint
model_type = "vit_h"  # model type
device = "cuda"  # "cpu"  or  "cuda"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)  # move the model to the chosen device
predictor = SamPredictor(sam)  # create the predictor

df = pd.read_excel("gaze.xlsx", sheet_name="Sheet1")  # gaze coordinates (x, y), one row per video frame
myArrayList = df.to_numpy()  # convert the sheet to a NumPy array

video_path = "session.mp4"  # path to the original video
cap = cv2.VideoCapture(video_path)  # create the video reader
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width of session.mp4 (1920)
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height of session.mp4 (1080)

# Define the codec and create the video writer (30 fps output, colour frames)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
video = cv2.VideoWriter('myResultVideoTest.mp4', fourcc, 30, (frame_width, frame_height), True)

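The gaze coordinate for each frame is read from the Excel file with the pandas library.

The video is read and split into individual frames with cv2.VideoCapture() and its related reading functions.

Because the processed images have to be written back out as a video, cv2.VideoWriter() and its related encoding functions are used as well.

③ Splitting the video into single frames and calling the SAM model on each one in a loop: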
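Before a gaze row is handed to SAM, it may be worth checking that it is usable. The sketch below is one possible way to do that: `gaze_to_prompt` is a hypothetical helper, and it assumes that tracking losses are stored as NaN and that coordinates are already in pixels. Out-of-frame points are clamped to the frame border.

```python
import numpy as np


def gaze_to_prompt(gaze_xy, frame_width, frame_height):
    """Convert one gaze sample to point-prompt arrays, or None if unusable."""
    x, y = float(gaze_xy[0]), float(gaze_xy[1])
    if np.isnan(x) or np.isnan(y):
        return None  # assumed encoding of a blink or tracking loss
    # Clamp to the valid pixel range so the prompt always lies inside the frame
    x = min(max(x, 0.0), frame_width - 1)
    y = min(max(y, 0.0), frame_height - 1)
    point_coords = np.array([[x, y]], dtype=np.float32)  # shape (1, 2), (x, y) order
    point_labels = np.array([1], dtype=np.int32)          # 1 marks a foreground point
    return point_coords, point_labels


gaze = np.array([[960.0, 540.0], [2500.0, -10.0], [np.nan, 300.0]])
prompts = [gaze_to_prompt(row, 1920, 1080) for row in gaze]
```

A frame whose prompt is `None` could, for example, be written to the output video unchanged.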

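listIndex = 0  # index of the current frame and of its gaze row
while cap.isOpened():
    start_time = time.time()
    ret, frame = cap.read()
    if not ret or listIndex >= len(myArrayList):
        break  # stop at the end of the video or when the gaze rows run out
    image = frame  # the frame is a NumPy array in BGR order
    image1 = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # SAM expects RGB, while OpenCV decodes to BGR

    predictor.set_image(image1)
    input_point = np.array([myArrayList[listIndex]])  # the gaze point (x, y) for this frame
    input_label = np.array([1])  # label of the point: 1 = foreground
    masks, scores, logit = predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=True,  # returns several candidate masks; with False it returns a single mask
    )

    imageResult = show_mask(image, masks[1])  # paint the second candidate mask onto the BGR frame
    video.write(imageResult)

    # Only has an effect while an OpenCV window is open
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    listIndex += 1
    end_time = time.time()
    execution_time = end_time - start_time  # time spent on this frame
    print("Numbers", listIndex, ":  ", scores, "   ", execution_time, "s")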

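This part is carried over entirely from the article in item 1 of the Preface, so readers are advised to understand that article first.

3. Full code: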
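The loop above always uses `masks[1]`. With `multimask_output=True`, the predictor also returns one quality score per candidate mask, so an alternative is to keep the highest-scoring candidate. The sketch below shows that selection on synthetic arrays; `pick_best_mask` is an illustrative helper, not part of the original code or of the SAM API.

```python
import numpy as np


def pick_best_mask(masks, scores):
    """Return the candidate mask with the highest score, plus its index."""
    best_index = int(np.argmax(scores))
    return masks[best_index], best_index


# Synthetic stand-ins for the (3, H, W) masks and (3,) scores of one prediction
masks = np.zeros((3, 2, 2), dtype=bool)
masks[2, 0, 0] = True
scores = np.array([0.71, 0.88, 0.93])
best_mask, best_index = pick_best_mask(masks, scores)
```

Whether the highest score gives the most useful region for a gaze point depends on the scene, so a fixed index may still be the better choice for some footage.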

import time
import cv2
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from segment_anything import sam_model_registry, SamPredictor
import warnings
warnings.filterwarnings("ignore")

# Input: the original image and a mask. Output: the image with the mask painted on.
def show_mask(image, mask):
    # Overlay colour; it is written straight into the OpenCV frame, so the order is BGR
    color = np.array([235, 251, 134], dtype=np.uint8)
    h, w = mask.shape[-2:]
    # Boolean indexing paints every masked pixel at once instead of looping over h * w pixels
    image[mask.reshape(h, w).astype(bool)] = color
    return image


sam_checkpoint = "C:\\Users\\MelonZhou\\Desktop\\PythonCode\\MindLink_SAM\\sam_vit_h_4b8939.pth"  # path to the model checkpoint
model_type = "vit_h"  # model type
device = "cuda"  # "cpu"  or  "cuda"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)  # move the model to the chosen device
predictor = SamPredictor(sam)  # create the predictor

df = pd.read_excel("gaze.xlsx", sheet_name="Sheet1")  # gaze coordinates (x, y), one row per video frame
myArrayList = df.to_numpy()  # convert the sheet to a NumPy array

video_path = "session.mp4"  # path to the original video
cap = cv2.VideoCapture(video_path)  # create the video reader
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width of session.mp4 (1920)
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height of session.mp4 (1080)

# Define the codec and create the video writer (30 fps output, colour frames)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
video = cv2.VideoWriter('myResultVideoTest.mp4', fourcc, 30, (frame_width, frame_height), True)

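listIndex = 0  # index of the current frame and of its gaze row
# Read the video frame by frame
while cap.isOpened():
    start_time = time.time()
    ret, frame = cap.read()
    if not ret or listIndex >= len(myArrayList):
        break  # stop at the end of the video or when the gaze rows run out
    image = frame  # the frame is a NumPy array in BGR order
    image1 = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # SAM expects RGB, while OpenCV decodes to BGR

    predictor.set_image(image1)
    input_point = np.array([myArrayList[listIndex]])  # the gaze point (x, y) for this frame
    input_label = np.array([1])  # label of the point: 1 = foreground
    masks, scores, logit = predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=True,  # returns several candidate masks; with False it returns a single mask
    )

    imageResult = show_mask(image, masks[1])  # paint the second candidate mask onto the BGR frame
    video.write(imageResult)

    # Only has an effect while an OpenCV window is open
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    listIndex += 1
    end_time = time.time()
    execution_time = end_time - start_time  # time spent on this frame
    print("Numbers", listIndex, ":  ", scores, "   ", execution_time, "s")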

# Release resources
cap.release()
video.release()
cv2.destroyAllWindows()