Welcome to the official repository for Space-Aware Vision-Language Model (SA-VLM), Space-Aware Instruction Tuning (SAIT) and Space-Aware Benchmark (SA-Bench), developed to enhance guide dog robots' assistance for visually impaired individuals.
Guide dog robots hold the potential to significantly improve mobility and safety for visually impaired people. However, traditional Vision-Language Models (VLMs) often struggle with accurately interpreting spatial relationships, which is crucial for navigation in complex environments.
Our work introduces:
- SAIT Dataset: A dataset automatically generated by our data-generation pipeline, designed to enhance VLMs' understanding of physical environments by focusing on virtual paths and 3D surroundings.
- SA-Bench: A benchmark with an evaluation protocol to assess the effectiveness of VLMs in delivering walking guidance. By integrating spatial awareness into VLMs, our approach enables guide dog robots to provide more accurate and concise guidance, improving the safety and mobility of visually impaired users.
- Training Dataset and Benchmark: We release the SAIT dataset and SA-Bench, providing resources to the community for developing and evaluating space-aware VLMs.
- Automated Data Generation Pipeline: An innovative pipeline that automatically generates data focusing on spatial relationships and virtual paths in 3D space.
- Improved VLM Performance: Our space-aware instruction-tuned model outperforms state-of-the-art algorithms in providing walking guidance.
- Our dataset uses some images from the SideGuide dataset. Due to copyright restrictions, please download the SideGuide images as follows:
- Download the following datasets. They include images and labels, but exclude the SideGuide images:
- Dataset Preparation:
  - For the SAIT dataset, copy the image files listed in `llava_gd_space_aware.json` from the downloaded SideGuide dataset into the `original_images` folder.
  - For SA-Bench, each image should have a corresponding `.xml` file with the same filename. If an image is missing but its `.xml` file exists, copy the corresponding image from the SideGuide dataset.
  - A small helper script that automates both copy steps is sketched after this list.
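The following Python sketch is one way to automate the two copy steps above. It assumes `llava_gd_space_aware.json` is a LLaVA-style list of entries with an `image` field, that SA-Bench stores `.jpg` images next to the `.xml` labels, and that the paths below are placeholders you adjust to your local layout.

```python
# Helper sketch for dataset preparation (assumptions noted above).
import json
import shutil
from pathlib import Path

SIDEGUIDE_DIR = Path("path/to/SideGuide/images")       # downloaded SideGuide images (placeholder)
SAIT_JSON = Path("path/to/llava_gd_space_aware.json")  # SAIT instruction file (placeholder)
SAIT_IMAGE_DIR = Path("path/to/SAIT/original_images")  # SAIT image folder (placeholder)
SABENCH_DIR = Path("path/to/SA-Bench")                 # SA-Bench folder with .xml labels (placeholder)

# 1) SAIT: copy every image referenced in the instruction-tuning JSON.
#    Assumes each entry is a dict with an "image" field holding the filename.
with open(SAIT_JSON, "r", encoding="utf-8") as f:
    entries = json.load(f)

SAIT_IMAGE_DIR.mkdir(parents=True, exist_ok=True)
for entry in entries:
    name = entry.get("image")
    if name and not (SAIT_IMAGE_DIR / name).exists():
        src = SIDEGUIDE_DIR / name
        if src.exists():
            shutil.copy2(src, SAIT_IMAGE_DIR / name)

# 2) SA-Bench: every .xml label should sit next to an image with the same stem.
#    Assumes .jpg images; change the suffix if your copy uses another format.
for xml_path in SABENCH_DIR.glob("*.xml"):
    img_path = xml_path.with_suffix(".jpg")
    if not img_path.exists():
        src = SIDEGUIDE_DIR / img_path.name
        if src.exists():
            shutil.copy2(src, img_path)
```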
- We utilized LLaVA-OneVision as the baseline network. For installation and training instructions, please refer to this link.
- If you would like to evaluate your VLM using SA-Bench, please use the following script.
```bash
CUDA_VISIBLE_DEVICES=${GPU_NUM} python ./eval/eval_savlm.py \
    --model_ckpt_path <path-to-ckpt-dir> \
    --eval_db_dir <path-to-SA-Bench-dir> \
    --output_dir <path-to-output-dir>
```
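For example, with a single GPU and placeholder paths (adjust these to your own checkpoint, benchmark, and output locations):

```bash
CUDA_VISIBLE_DEVICES=0 python ./eval/eval_savlm.py \
    --model_ckpt_path ./checkpoints/sa-vlm \
    --eval_db_dir ./data/SA-Bench \
    --output_dir ./results/sa-bench-eval
```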
Our experiments demonstrate that the space-aware instruction-tuned model:
- Provides more accurate and concise walking guidance compared to existing VLMs.
- Shows improved understanding of spatial relationships in complex environments.

For detailed results and analysis, please refer to the paper.
```bibtex
@misc{savlm_icra2025,
      title={Space-Aware Instruction Tuning: Dataset and Benchmark for Guide Dog Robots Assisting the Visually Impaired},
      author={ByungOk Han and Woo-han Yun and Beom-Su Seo and Jaehong Kim},
      year={2025},
      eprint={2502.07183},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2502.07183}
}
```
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2023-00215760, Guide Dog: Development of Navigation AI Technology of a Guidance Robot for the Visually Impaired Person).
This research (paper) used datasets from ‘The Open AI Dataset Project (AI-Hub, S. Korea)’. All data information can be accessed through ‘AI-Hub (www.aihub.or.kr)’.
For questions or collaborations, please contact:
- ByungOk Han: [email protected]
- Woo-han Yun: [email protected]
We will add the license information later.