Skip to content

Commit ee9fcd7

Browse files
committed
update README & code
1 parent 4149228 commit ee9fcd7

File tree

4 files changed

+48
-105
lines changed

4 files changed

+48
-105
lines changed

LongCoT/LongCoT.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,7 @@
180180
"source": [
181181
"# Please set the API key here\n",
182182
"os.environ['OPENAI_API_KEY'] = 'your api key'\n",
183-
"seed_vl_version = \"doubao-1.5-vision-pro-250328\"\n",
183+
"seed_vl_version = \"doubao-1-5-thinking-vision-pro-250428\"\n",
184184
"client = OpenAI(\n",
185185
" base_url=\"https://ark.cn-beijing.volces.com/api/v3\",\n",
186186
" api_key=os.environ.get(\"OPENAI_API_KEY\"),\n",

README.md

Lines changed: 18 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,32 @@
1-
<div align="center">
2-
👋 Hi, everyone!
3-
<br>
4-
We are <b>ByteDance Seed team.</b>
1+
<div>
2+
<center>
3+
<img src="./assets/banner.png" width=400>
4+
</center>
55
</div>
66

7-
87
<p align="center">
9-
You can get to know us better through the following channels👇
10-
<br>
11-
<a href="https://seed.bytedance.com/">
12-
<img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
13-
<a href="https://github.com/user-attachments/assets/5793e67c-79bb-4a59-811a-fcc7ed510bd4">
14-
<img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>
15-
<a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">
16-
<img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>
17-
<a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">
18-
<img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>
8+
🌐 <a href=""> Homepage (upcoming)</a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co">Hugging Face (upcoming)</a>&nbsp&nbsp | &nbsp&nbsp📄 <a href="https://arxiv.org/abs/">arXiv (upcoming)</a>
199
</p>
2010

21-
![seed logo](./assets/logo.jpg)
11+
## 🌟 Highlights
2212

23-
# Seed1.5-VL Cookbook
13+
* Seed1.5-VL is a vision-language foundation model featuring a 532M-parameter vision encoder and a 20B active parameter Mixture-of-Experts (MoE) LLM, designed to advance general-purpose multimodal understanding and reasoning.
14+
15+
* Seed1.5-VL delivers strong performance across numerous public benchmarks, achieving state-of-the-art results in areas including multimodal reasoning and agent-centric tasks.
16+
17+
* This repository offers usage cookbook and best practices designed to help developers effectively use Seed1.5-VL.
2418

25-
Welcome to the **Seed1.5-VL** API Cookbook! This collection of code samples is designed to help you get started with using the Seed1.5-VL API. Our flagship Seed1.5-VL has been deployed on [Volcano Engine](https://www.volcengine.com/product/doubao). After obtaining your `API_KEY`, you can use the examples in this cookbook to rapidly understand and leverage the diverse capabilities of our Seed1.5-VL.
2619

27-
## News
20+
## 📢 News
2821
* `2025-05-12:` We have released the [Seed1.5-VL Technical Report](./Seed1.5-VL-Technical-Report.pdf).
2922
* `2025-05-12`: We are extremely delighted to release the flagship Seed1.5-VL on Volcano Engine. The model id is `doubao-1-5-thinking-vision-pro-250428`. You can try it now!
3023

31-
## Quick Start
24+
25+
## 📖 Seed1.5-VL Cookbook
26+
27+
Welcome to the **Seed1.5-VL** API Cookbook! This collection of code samples is designed to help you get started with using the Seed1.5-VL API. Our flagship Seed1.5-VL has been deployed on [Volcano Engine](https://www.volcengine.com/product/doubao). After obtaining your `API_KEY`, you can use the examples in this cookbook to rapidly understand and leverage the diverse capabilities of our Seed1.5-VL.
28+
29+
### Quick Start
3230

3331
- [x] Cookbook for online/offline [Gradio Demo](./GradioDemo)
3432
- [x] Cookbook for turning on/off [LongCoT](./longCoT)

Video/video_understanding.ipynb

Lines changed: 29 additions & 84 deletions
Original file line numberDiff line numberDiff line change
@@ -55,16 +55,16 @@
5555
},
5656
{
5757
"cell_type": "code",
58-
"execution_count": 7,
58+
"execution_count": null,
5959
"id": "ed96287d-8fd1-454b-9bfd-9f0eaff6c56e",
6060
"metadata": {},
6161
"outputs": [],
6262
"source": [
63-
"# 定义抽帧策略枚举类\n",
6463
"class Strategy(Enum):\n",
65-
" # 固定间隔抽帧策略,例如每1秒抽一帧\n",
64+
" # sampling stragegies\n",
65+
" # constant interval: sampling at a constant interval, fps sampling\n",
6666
" CONSTANT_INTERVAL = \"constant_interval\"\n",
67-
" # 均匀间隔抽帧策略,根据设定的最大帧数均匀从视频全长度抽取\n",
67+
" # even interval: sampling at an even interval, uniform sampling\n",
6868
" EVEN_INTERVAL = \"even_interval\""
6969
]
7070
},
@@ -78,7 +78,7 @@
7878
},
7979
{
8080
"cell_type": "code",
81-
"execution_count": 18,
81+
"execution_count": null,
8282
"id": "974013e8-6436-403f-a5a3-fa245f322939",
8383
"metadata": {},
8484
"outputs": [],
@@ -92,142 +92,87 @@
9292
" use_timestamp: bool = True,\n",
9393
" keyframe_naming_template: str = \"frame_{:04d}.jpg\",\n",
9494
") -> list[str]:\n",
95-
" \"\"\"将视频按照指定策略抽帧\n",
96-
" 参数:\n",
97-
" video_file_path (str): 视频文件路径\n",
98-
" output_dir (str): 输出目录\n",
99-
" extraction_strategy (Optional[Strategy], optional): 抽帧策略。\n",
100-
" 固定间隔 比如 1s 抽一帧 或\n",
101-
" 均匀间隔 根据设定的最大帧数 均匀从视频全长度均匀抽取\n",
102-
" 默认固定间隔 1s 抽一帧\n",
103-
" interval_in_seconds (Optional[float], optional): 固定间隔抽帧的间隔时间. 默认 1s 抽一帧\n",
104-
" max_frames (Optional[int], optional): 最大抽帧帧数. 默认 10 帧\n",
105-
" use_timestamp (bool): 是否输出视频时间戳, 默认True\n",
106-
" keyframe_naming_template (_type_, optional): 抽帧图片命名模板\n",
107-
" 返回:\n",
108-
" list[str]: 抽帧图片路径列表\n",
109-
" list[float]: 视频采样帧对应的时间戳\n",
95+
" \"\"\"sampling videos and extract keyframes with different strategies.\n",
96+
" Args:\n",
97+
" video_file_path (str): video path\n",
98+
" output_dir (str): output directory for sampled keyframes\n",
99+
" extraction_strategy (Optional[Strategy], optional): extraction strategy. Defaults to Strategy.EVEN_INTERVAL.\n",
100+
" interval_in_seconds (Optional[float], optional): the sampling interval\n",
101+
" max_frames (Optional[int], optional): maximum number of sampled frames. Defaults to 10.\n",
102+
" use_timestamp (bool): whether to output video timestamps. Defaults to True.\n",
103+
" keyframe_naming_template (_type_, optional): keyframe naming template. Defaults to \"frame_{:04d}.jpg\".\n",
104+
" Returns:\n",
105+
" list[str]: sampled keyframe paths\n",
106+
" list[float]: timestamps of sampled keyframes\n",
110107
" \"\"\"\n",
111-
" # 检查输出目录是否存在,如果不存在则创建\n",
112108
" if not os.path.exists(output_dir):\n",
113109
" os.makedirs(output_dir)\n",
114-
" # 使用OpenCV打开视频文件\n",
115110
" cap = cv2.VideoCapture(video_file_path)\n",
116-
" # 获取视频的帧率\n",
117111
" fps = cap.get(cv2.CAP_PROP_FPS)\n",
118-
" # 获取视频的总帧数\n",
119112
" length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n",
120113
"\n",
121-
" # 根据策略选择抽帧间隔\n",
122114
" if extraction_strategy == Strategy.CONSTANT_INTERVAL:\n",
123-
" # 计算固定间隔抽帧的帧间隔\n",
124115
" frame_interval = int(fps * interval_in_seconds)\n",
125116
" elif extraction_strategy == Strategy.EVEN_INTERVAL:\n",
126-
" # 计算均匀间隔抽帧的帧间隔\n",
127117
" frame_interval = int(length / max_frames)\n",
128118
" else:\n",
129-
" # 如果策略无效,抛出异常\n",
130119
" raise ValueError(\"Invalid extraction strategy\")\n",
131-
" # 初始化帧计数器\n",
132120
" frame_count = 0\n",
133-
" # 初始化关键帧列表\n",
134121
" keyframes = []\n",
135122
" timestamps = []\n",
136-
" # 循环读取视频帧\n",
137123
" while True:\n",
138-
" # 读取一帧\n",
139124
" ret, frame = cap.read()\n",
140-
" # 如果读取失败,跳出循环\n",
141125
" if not ret:\n",
142126
" break\n",
143-
" # 如果当前帧是关键帧\n",
144127
" if frame_count % frame_interval == 0:\n",
145-
" # 生成关键帧的文件名\n",
146128
" image_path = os.path.join(\n",
147129
" output_dir, keyframe_naming_template.format(len(keyframes))\n",
148130
" )\n",
149-
" # 将关键帧保存为图片\n",
150131
" cv2.imwrite(\n",
151132
" image_path,\n",
152133
" frame,\n",
153134
" )\n",
154-
" # 将关键帧路径添加到列表中\n",
155135
" keyframes.append(image_path)\n",
156136
" timestamps.append(round(frame_count / fps, 1))\n",
157-
" # 增加帧计数器\n",
158137
" frame_count += 1\n",
159-
" # 如果关键帧数量达到最大值,跳出循环\n",
160138
" if len(keyframes) >= max_frames:\n",
161139
" break\n",
162140
"\n",
163-
" print(\"抽取帧数:\", len(keyframes))\n",
164-
" # 返回关键帧路径列表\n",
141+
" print(\"sampled frames:\", len(keyframes))\n",
165142
" if use_timestamp:\n",
166143
" return keyframes, timestamps\n",
167144
" return keyframes, None\n",
168145
"\n",
169146
"def resize(image):\n",
170-
" \"\"\"\n",
171-
" 调整图片大小以适应指定的尺寸。\n",
172-
" 参数:\n",
173-
" image (numpy.ndarray): 输入的图片,格式为numpy数组。\n",
174-
" 返回:\n",
175-
" numpy.ndarray: 调整大小后的图片。\n",
176-
" \"\"\"\n",
177-
" # 获取图片的原始高度和宽度\n",
178147
" height, width = image.shape[:2]\n",
179-
" # 根据图片的宽高比确定目标尺寸\n",
180148
" if height < width:\n",
181149
" target_height, target_width = 480, 640\n",
182150
" else:\n",
183151
" target_height, target_width = 640, 480\n",
184-
" # 如果图片尺寸已经小于或等于目标尺寸,则直接返回原图片\n",
185152
" if height <= target_height and width <= target_width:\n",
186153
" return image\n",
187-
" # 计算新的高度和宽度,保持图片的宽高比\n",
188154
" if height / target_height < width / target_width:\n",
189155
" new_width = target_width\n",
190156
" new_height = int(height * (new_width / width))\n",
191157
" else:\n",
192158
" new_height = target_height\n",
193159
" new_width = int(width * (new_height / height))\n",
194-
" # 调整图片大小\n",
195160
" return cv2.resize(image, (new_width, new_height))\n",
196161
"\n",
197-
"# 定义方法将指定路径图片resize到合适大小并转为Base64编码\n",
198162
"def encode_image(image_path: str) -> str:\n",
199-
" \"\"\"\n",
200-
" 将指定路径的图片进行编码\n",
201-
" 参数:\n",
202-
" image_path (str): 图片文件的路径\n",
203-
" 返回:\n",
204-
" str: 编码后的图片字符串\n",
205-
" \"\"\"\n",
206-
" # 读取图片\n",
207163
" image = cv2.imread(image_path)\n",
208-
" # 调整图片大小\n",
209164
" image_resized = resize(image)\n",
210-
" # 将图片编码为JPEG格式\n",
211165
" _, encoded_image = cv2.imencode(\".jpg\", image_resized)\n",
212-
" # 将编码后的图片转换为Base64字符串\n",
213166
" return base64.b64encode(encoded_image).decode(\"utf-8\")\n",
214167
"\n",
215168
"def construct_messages(image_paths: list[str], timestamps: list[float], prompt: str) -> list[dict]:\n",
216169
" \"\"\"\n",
217-
" 构造包含文本和图像的消息列表。\n",
218-
" 参数:\n",
219-
" image_paths (list[str]): 图像文件路径列表。\n",
220-
" timestamps (list[float]): 视频的时间戳。\n",
221-
" prompt (str): 文本提示。\n",
222-
" 返回:\n",
223-
" list[dict]: 包含文本和图像的消息列表。\n",
170+
" construct messages for the video understanding\n",
224171
" \"\"\"\n",
225-
" # 初始化消息内容列表\n",
226172
" content = []\n",
227-
" # 遍历图像路径列表\n",
228173
" for idx, image_path in enumerate(image_paths):\n",
229-
" # 为每个图像路径构造一个图像URL消息\n",
230174
" if timestamps is not None:\n",
175+
" # add timestamp for each frame\n",
231176
" content.append({\n",
232177
" \"type\": \"text\",\n",
233178
" \"text\": f'[{timestamps[idx]} second]'\n",
@@ -236,9 +181,7 @@
236181
" {\n",
237182
" \"type\": \"image_url\",\n",
238183
" \"image_url\": {\n",
239-
" # 使用Base64编码将图像转换为数据URL\n",
240184
" \"url\": f\"data:image/jpeg;base64,{encode_image(image_path)}\",\n",
241-
" # 指定图像细节级别为低\n",
242185
" \"detail\":\"low\"\n",
243186
" },\n",
244187
" }\n",
@@ -248,7 +191,6 @@
248191
" \"type\": \"text\",\n",
249192
" \"text\": prompt,\n",
250193
" })\n",
251-
" # 返回包含文本和图像的消息列表\n",
252194
" return [\n",
253195
" {\n",
254196
" \"role\": \"user\",\n",
@@ -274,7 +216,7 @@
274216
},
275217
{
276218
"cell_type": "code",
277-
"execution_count": 12,
219+
"execution_count": null,
278220
"id": "f48fd468-12d6-46c9-ae19-c2c981bdc6c2",
279221
"metadata": {},
280222
"outputs": [
@@ -324,11 +266,12 @@
324266
"# sampling video frames\n",
325267
"sampling_fps = 1\n",
326268
"max_frames = 30\n",
269+
"sampling_interval = 1.0 / sampling_fps\n",
327270
"selected_images, timestamps = preprocess_video(\n",
328271
" video_file_path=video_path,\n",
329272
" output_dir=\"video_frames\",\n",
330273
" extraction_strategy=Strategy.CONSTANT_INTERVAL,\n",
331-
" interval_in_seconds=sampling_fps,\n",
274+
" interval_in_seconds=sampling_interval,\n",
332275
" use_timestamp=True,\n",
333276
" max_frames=max_frames\n",
334277
")\n",
@@ -348,7 +291,7 @@
348291
},
349292
{
350293
"cell_type": "code",
351-
"execution_count": 15,
294+
"execution_count": null,
352295
"id": "89845b90-e976-45c1-84cd-7239963101ee",
353296
"metadata": {},
354297
"outputs": [
@@ -369,11 +312,12 @@
369312
"# sampling video frames\n",
370313
"sampling_fps = 1\n",
371314
"max_frames = 30\n",
315+
"sampling_interval = 1.0 / sampling_fps\n",
372316
"selected_images, timestamps = preprocess_video(\n",
373317
" video_file_path=video_path,\n",
374318
" output_dir=\"video_frames\",\n",
375319
" extraction_strategy=Strategy.CONSTANT_INTERVAL,\n",
376-
" interval_in_seconds=sampling_fps,\n",
320+
" interval_in_seconds=sampling_interval,\n",
377321
" use_timestamp=True,\n",
378322
" max_frames=max_frames\n",
379323
")\n",
@@ -393,7 +337,7 @@
393337
},
394338
{
395339
"cell_type": "code",
396-
"execution_count": 17,
340+
"execution_count": null,
397341
"id": "2c4fbac9-5c82-447b-b174-bec330bd70df",
398342
"metadata": {},
399343
"outputs": [
@@ -418,11 +362,12 @@
418362
"# sampling video frames\n",
419363
"sampling_fps = 1\n",
420364
"max_frames = 30\n",
365+
"sampling_interval = 1.0 / sampling_fps\n",
421366
"selected_images, timestamps = preprocess_video(\n",
422367
" video_file_path=video_path,\n",
423368
" output_dir=\"video_frames\",\n",
424369
" extraction_strategy=Strategy.CONSTANT_INTERVAL,\n",
425-
" interval_in_seconds=sampling_fps,\n",
370+
" interval_in_seconds=sampling_interval,\n",
426371
" use_timestamp=True,\n",
427372
" max_frames=max_frames\n",
428373
")\n",

assets/banner.png

41.8 KB
Loading

0 commit comments

Comments
 (0)