[misc] Improve distributed related env variables and setup #487

jzhang38 · 2025-06-07T20:52:48Z

Remove fastvideo_args.device and fastvideo_args.device_str; use get_torch_device() instead.
Remove unused functions in weight_utils.py.
Prefer maybe_init_distributed_environment_and_model_parallel when initializing the distributed environment.
Move check_fastvideo_args inside post_init.
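
A minimal sketch of the pattern these changes describe, not FastVideo's actual code: the args object no longer stores a device, callers ask a helper for it, and validation runs when the object is constructed. `ArgsSketch`, `device_str_from_env`, and `_check` are hypothetical names, and the validation rules are illustrative assumptions. The helper returns a plain string where the real `get_torch_device()` presumably returns a `torch.device`, and it assumes the launcher sets `LOCAL_RANK`, as torchrun does.

```python
import os
from dataclasses import dataclass


def device_str_from_env(cuda_available: bool = True) -> str:
    """Hypothetical stand-in for get_torch_device(): derive the device from
    the process environment instead of storing it on the args object."""
    if not cuda_available:
        return "cpu"
    # torchrun exports LOCAL_RANK for every worker; default to 0 otherwise.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


@dataclass
class ArgsSketch:
    """Hypothetical args container with no device / device_str fields."""
    model_path: str
    num_gpus: int = 1
    tp_size: int = 1
    sp_size: int = 1

    def __post_init__(self) -> None:
        # Validation runs on construction, so callers cannot forget it.
        self._check()

    def _check(self) -> None:
        # Illustrative rules only; the real check_fastvideo_args may differ.
        if self.num_gpus < 1:
            raise ValueError("num_gpus must be >= 1")
        if self.tp_size > self.num_gpus or self.sp_size > self.num_gpus:
            raise ValueError("tp_size and sp_size cannot exceed num_gpus")
```

Because the device is derived on demand, there is no field on the args object that can drift out of sync with the rank the process is actually running as.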

SolitaryThinker · 2025-06-08T04:41:45Z

fastvideo/data_preprocess/v1_preprocess.py

Should we just move this and the related V1 preprocessing files (except those under scripts) to fastvideo/v1?

SolitaryThinker · 2025-06-08T04:43:57Z

fastvideo/data_preprocess/v1_preprocess.py

    fastvideo_args = FastVideoArgs(model_path=args.model_path,
-                                   num_gpus=world_size,
-                                   device_str="cuda",
+                                   num_gpus=get_world_size(),


Perhaps we should just remove num_gpus altogether, including in inference? If we always hardcode num_gpus=get_world_size(), users would need to control it using CUDA_VISIBLE_DEVICES. When I added num_gpus, I was only considering inference and also assumed tp_size == sp_size.

I think num_gpus is easier to control; if it is not specified, we can fall back to the world size.
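
The fallback proposed here could look roughly like the following. This is a sketch under assumptions, not the merged behavior: `resolve_num_gpus` is a hypothetical helper, and the world size is read from the `WORLD_SIZE` variable that torchrun sets rather than from FastVideo's `get_world_size()`.

```python
import os
from typing import Optional


def resolve_num_gpus(num_gpus: Optional[int] = None) -> int:
    """Return the explicit num_gpus if given, else fall back to world size."""
    # WORLD_SIZE is set by torchrun; a plain single-process run has none.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if num_gpus is None:
        return world_size
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return num_gpus
```

With this shape, an explicit value such as `resolve_num_gpus(2)` wins regardless of the environment, while `resolve_num_gpus()` follows whatever the launcher configured.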

SolitaryThinker · 2025-06-08T04:53:17Z

Thanks!

JerryZhou54

lgtm

jzhang38 added 4 commits June 7, 2025 18:14

remove unused file in weight_utils

cffce5e

rename preprocess files with v1 prefix

ea7c90e

remove args.device and args.device_str

bedeff2

fix

24c13e6

jzhang38 marked this pull request as draft June 7, 2025 20:52

jzhang38 added 3 commits June 7, 2025 20:56

rm

701ca9e

improve distributed calling

a6a6aaa

precommit passed

05146cc

jzhang38 marked this pull request as ready for review June 7, 2025 21:35

jzhang38 requested review from SolitaryThinker and kevin314 June 7, 2025 21:36

jzhang38 had a problem deploying to runpod-runners June 7, 2025 21:36 — with GitHub Actions Failure

move check args

d13aceb

jzhang38 temporarily deployed to runpod-runners June 7, 2025 22:43 — with GitHub Actions Inactive

SolitaryThinker approved these changes Jun 8, 2025

JerryZhou54 approved these changes Jun 8, 2025

jzhang38 merged commit 46e7a15 into main Jun 8, 2025
7 checks passed

jzhang38 deleted the py/improve_distributed_calling branch June 10, 2025 23:44

qimcis pushed a commit to qimcis/FastVideo that referenced this pull request Oct 30, 2025

[misc] Improve distributed related env variables and setup (hao-ai-la…

623c245

…b#487)

Edenzzzz · Jun 9, 2025 (edited)

5 participants
