Fixing CUDA compile and load errors in the bitsandbytes project
Environment: Linux (CentOS 7), GCC 9.3.1, Python 3.10.9, CUDA 12.4, PyTorch 2.4.0
Finding the problem & the error output
It started because my vLLM model-loading code still carried a peft dependency, so the moment it hit from peft import prepare_model_for_int8_training, PeftModelForSeq2SeqLM it raised the following error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-12.4/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 124
CUDA SETUP: Required library version not found: libbitsandbytes_cuda124.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=124
python setup.py install
CUDA SETUP: Setup Failed!
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=124
python setup.py install
It is easy to see that bitsandbytes does not ship a binary matching the installed CUDA version. So I followed its suggested steps one by one:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=124
python setup.py install
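As an aside, the third line of that recipe can silently do nothing: a variable assigned on its own line is a plain shell variable, not an exported environment variable, so the python child process never sees CUDA_VERSION. A minimal illustration:

```shell
# An assignment on its own line stays local to the current shell;
# child processes do not inherit it:
CUDA_VERSION=124
sh -c 'echo "CUDA_VERSION is ${CUDA_VERSION:-unset}"'   # prints: CUDA_VERSION is unset

# Prefix it on the same command line (or export it first) and the child sees it:
CUDA_VERSION=124 sh -c 'echo "CUDA_VERSION is ${CUDA_VERSION:-unset}"'   # prints: CUDA_VERSION is 124
```

So even taken at face value, the recipe would need CUDA_VERSION=124 on the same line as (or exported before) the install command.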
Then I tested with python -m bitsandbytes, and it still failed:
bitsandbytes library load error: Configured CUDA binary not found at /.../bitsandbytes/libbitsandbytes_cuda124.so
If you are using Intel CPU/XPU, please install intel_extension_for_pytorch to enable required ops
Traceback (most recent call last):
File "", line 318, in <module>
lib = get_native_library()
(rest of traceback omitted)
RuntimeError: Configured CUDA binary not found at /.../bitsandbytes/libbitsandbytes_cuda124.so
================ bitsandbytes v0.47.0.dev0 =================
Platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17
libc: glibc-2.17
Python: 3.10.9
PyTorch: 2.4.0+cu124
CUDA: 12.4
HIP: N/A
XPU: N/A
Related packages:
accelerate: 1.8.1
diffusers: not found
numpy: 1.26.4
pip: 25.1.1
peft: 0.7.0
safetensors: 0.5.3
transformers: 4.53.2
triton: 3.0.0
trl: not found
============================================================
PyTorch settings found: CUDA_VERSION=124, Highest Compute Capability: (8, 0).
Library not found: /.../bitsandbytes/libbitsandbytes_cuda124.so. Maybe you need to compile it from source?
Checking that the library is importable and CUDA is callable...
Traceback (most recent call last): (omitted here)
RuntimeError:
🚨 Forgot to compile the bitsandbytes library? 🚨
1. You're not using the package but checked-out the source code
2. You MUST compile from source
Attempted to use bitsandbytes native library functionality but it's not available.
This typically happens when:
1. bitsandbytes doesn't ship with a pre-compiled binary for your CUDA version
2. The library wasn't compiled properly during installation from source
To make bitsandbytes work, the compiled library version MUST exactly match the linked CUDA version.
If your CUDA version doesn't have a pre-compiled binary, you MUST compile from source.
You have two options:
1. COMPILE FROM SOURCE (required if no binary exists):
https://huggingface.co/docs/bitsandbytes/main/en/installation#cuda-compile
2. Use BNB_CUDA_VERSION to specify a DIFFERENT CUDA version from the detected one, which is installed on your machine and matching an available pre-compiled version listed above
Original error: Configured CUDA binary not found at /public/***/bitsandbytes-main/bitsandbytes/libbitsandbytes_cuda124.so
🔍 Run this command for detailed diagnostics:
python -m bitsandbytes
If you've tried everything and still have issues:
1. Include ALL version info (operating system, bitsandbytes, pytorch, cuda, python)
2. Describe what you've tried in detail
3. Open an issue with this information:
https://github.com/bitsandbytes-foundation/bitsandbytes/issues
Native code method attempted to call: lib.cadam32bit_grad_fp32()
The core problem is still that the .so file was never compiled:
bitsandbytes/libbitsandbytes_cuda124.so still does not exist…
So, determined to produce this _cuda124.so file, I tried building with cmake:
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4
-- The CXX compiler identification is GNU 9.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-9/root/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cpu)
-- Configuring done (4.6s)
-- Generating done (0.1s)
CMake Warning:
Manually-specified variables were not used by the project:
CUDA_TOOLKIT_ROOT_DIR
-- Build files have been written to: /**/bitsandbytes-main/build
The fix
After stepping into a few more scattered pitfalls, I put together the installation procedure that actually works, for reference:
We trigger the CUDA build manually with CMake, which is one of the officially recommended installation paths for bitsandbytes.
1. Install the build tools
pip install scikit-build-core cmake ninja
2. Get the official bitsandbytes source
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
Now, pause here for a second: even if you charge ahead, the next step may not trigger a CUDA build at all; you already saw the error where it silently falls back to CPU mode. Given the problem and errors above, the real goal here is to build bitsandbytes/libbitsandbytes_cuda124.so rather than _cpu.so; the CPU library is what gets produced by default.
3. Create the build directory and configure
mkdir -p build
cd build
The key step (explicitly select the CUDA backend; the configure log should then say "Configuring bitsandbytes (Backend: cuda)" rather than the "(Backend: cpu)" seen earlier):
cmake .. -DCMAKE_BUILD_TYPE=Release -DCOMPUTE_BACKEND=cuda
Then continue with the usual next step:
make -j$(nproc)
At this point you will very likely see a flood of warnings. Remember: as long as none of them is an error, you are fine. I watched the same warnings repeat over and over, assumed the build had entered an infinite loop (it genuinely looks like one), waited a few minutes, and killed it to go hunting for a problem. (There are ways to deal with those warnings, but they do not affect the issue this post solves, so I will not go into them.) So how long is normal? The CUDA kernels are compiled once per supported architecture, which can take roughly 3 to 15 minutes even on a powerful machine. You can speed this up by restricting the build to your own GPU's architecture before running make -j$(nproc):
export TORCH_CUDA_ARCH_LIST="8.0"
(The value should match the compute capability reported in the diagnostics above, which was 8.0 on this machine.) Also, to rule out a hung process, just check your process's state in top.
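If you would rather not hard-code the architecture, one option (a sketch, assuming PyTorch is installed and can see the GPU) is to derive the value from the device PyTorch reports:

```shell
# torch.cuda.get_device_capability() returns a (major, minor) tuple,
# e.g. (8, 0) per the diagnostics above; join it into "8.0".
export TORCH_CUDA_ARCH_LIST=$(python -c "import torch; print('.'.join(map(str, torch.cuda.get_device_capability())))")
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
```

This keeps the build target in sync with whatever GPU PyTorch actually detects.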
…a few minutes later…
When the terminal shows:
[ 42%] Linking CXX shared library bitsandbytes/libbitsandbytes_cuda124.so
[100%] Built target bitsandbytes
Congratulations! 🎉 Success.
4. Test
Does the CUDA-versioned .so file we set out to build exist now?
ls -l /../bitsandbytes/libbitsandbytes_cuda124.so
Can bitsandbytes be imported correctly now?
python -m bitsandbytes
On success, the output makes it very obvious.
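The file check can also be wrapped into a tiny script that fails loudly when only the CPU backend was built (a sketch; it assumes you run it from the repository root):

```shell
# Report whether the CUDA binary was actually produced by the build.
SO=bitsandbytes/libbitsandbytes_cuda124.so
if [ -f "$SO" ]; then
    echo "found: $SO"
else
    echo "missing: $SO (only the CPU backend was built?)"
fi
```

If it reports "missing", re-run the cmake step and make sure the configure log mentions the cuda backend, not cpu.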
Extras
For reading the source code and finding the official docs:
https://github.com/bitsandbytes-foundation/bitsandbytes
For downloading whl files (not really needed in this post):
https://pypi.org/project/bitsandbytes/