Deploying Xinference on Windows Without Docker

Latest recommended article published on 2025-08-27 18:55:26

hnmpf


License: CC 4.0 BY-SA

Tags: docker, container operations

Article link: https://blog.csdn.net/hnmpf/article/details/149062385

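Background:

Ollama cannot deploy RERANK-type models well, and these models play a pivotal role in building a knowledge base. So I started exploring Xinference, a model-serving tool with a good reputation online.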

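Materials:

Miniconda: Miniconda3-py312_24.11.1-0-Windows-x86_64.exe (for download and installation, see other articles on the topic)

A Windows PC, preferably one with a discrete GPU, because for model inference even a very good CPU is far slower than a low-end GPU.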

Steps:

Setting up the Xinference runtime environment

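1. Run conda create -n Xinference python=3.10.18 to create the base environment and pin the Python version.

2. Run conda activate Xinference to activate the Xinference environment.

3. Run conda env list to view the virtual environments that have been created; the one marked with "*" is the currently active environment.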

conda-script.py: error: unrecognized arguments: -list

(your-root-path\Xinference) C:\Users\mpf>conda env list

# conda environments:
#
base                   D:\ProgramData\miniconda3
Xinference           * E:\pythonConda\Xinference
myflask_web_env        E:\pythonConda\myflask_web_env
transform_env          E:\pythonConda\transform_env
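The same check can be scripted. The sketch below parses text in the format that conda env list prints and returns the environment marked with "*". The function name active_env is hypothetical, the sample text is copied from the output above, and rows without an environment name are not handled.

```python
def active_env(listing: str):
    """Return (name, path) of the environment marked with '*', or None."""
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # An active row looks like: <name> * <path>
        if len(parts) >= 3 and parts[1] == "*":
            return parts[0], " ".join(parts[2:])
    return None


# Sample copied from the console output above (raw string keeps backslashes).
SAMPLE = r"""
# conda environments:
#
base                   D:\ProgramData\miniconda3
Xinference           * E:\pythonConda\Xinference
myflask_web_env        E:\pythonConda\myflask_web_env
transform_env          E:\pythonConda\transform_env
"""

print(active_env(SAMPLE))
```

To use it on a live system, the output of conda env list could be captured and passed in as the listing argument.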

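4. Use torch.cuda.is_available() to check whether the current conda environment can see the GPU; True means the GPU is usable. This requires PyTorch to be installed in the environment already.

(your-environment-path\Xinference) C:\Users\mpf>python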
Python 3.10.18 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:08:55) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>
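For a check that does not fail when PyTorch is missing, something like the following sketch may help. The function name cuda_status and its three return strings are my own choices; only importlib.util.find_spec and torch.cuda.is_available() are real APIs.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether PyTorch can see a CUDA GPU in this environment."""
    # find_spec avoids an ImportError when torch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    return "cuda-available" if torch.cuda.is_available() else "cpu-only"


print(cuda_status())
```

A result of "cpu-only" would suggest that the installed PyTorch build or the GPU driver needs attention before models can run on the GPU.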

5. Install xinference with: pip install "xinference[all]==1.7.0.post1" (pip pins a version with ==, not a single =).

6. Check the installed version with: pip show xinference

(your-root-directory\Xinference) C:\Users\mpf>pip show xinference
Name: xinference
Version: 1.7.0.post1
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: qinxuye@xprobe.io
License: Apache License 2.0
Location: e:\pythonconda\xinference\lib\site-packages
Requires: aioprometheus, async-timeout, click, fastapi, gradio, huggingface-hub, modelscope, nvidia-ml-py, openai, passlib, peft, pillow, pydantic, pynvml, python-jose, requests, setproctitle, sse_starlette, tabulate, timm, torch, tqdm, typing_extensions, uvicorn, xoscar
Required-by:
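The version check can also be done from Python with the standard importlib.metadata module. This is a minimal sketch; installed_version and EXPECTED are illustrative names, and the expected value is simply the version pinned in step 5.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


EXPECTED = "1.7.0.post1"  # the version pinned in step 5
found = installed_version("xinference")
if found is None:
    print("xinference is not installed in this environment")
elif found != EXPECTED:
    print(f"xinference {found} is installed, expected {EXPECTED}")
else:
    print(f"xinference {found} OK")
```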

7. Start the xinference service with: xinference-local --host 127.0.0.1 --port 9997

(your-root-directory\Xinference) C:\Users\mpf>xinference-local --host 127.0.0.1 --port 9997
2025-07-02 09:23:36,492 xinference.core.supervisor 9900 INFO     Xinference supervisor 127.0.0.1:63528 started
2025-07-02 09:23:37,090 xinference.core.worker 9900 INFO     Worker metrics is disabled due to the environment XINFERENCE_DISABLE_METRICS=1
2025-07-02 09:23:37,091 xinference.core.worker 9900 INFO     Purge cache directory: C:\Users\mpf\.xinference\cache
2025-07-02 09:23:37,093 xinference.core.worker 9900 INFO     Connected to supervisor as a fresh worker
2025-07-02 09:23:37,123 xinference.core.worker 9900 INFO     Xinference worker 127.0.0.1:63528 started
2025-07-02 09:24:13,247 xinference.api.restful_api 10892 INFO     Starting Xinference at endpoint: http://127.0.0.1:9997
2025-07-02 09:24:13,318 xinference.api.restful_api 10892 INFO     Supervisor metrics is disabled due to the environment XINFERENCE_DISABLE_METRICS=1
2025-07-02 09:24:13,422 uvicorn.error 10892 INFO     Uvicorn running on http://127.0.0.1:9997 (Press CTRL+C to quit)

8. Enter http://127.0.0.1:9997/ in the browser's address bar to open the Xinference web page.
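If the page does not load, it can help to first confirm that something is actually listening on the port. The sketch below only tests for a TCP connection with the standard socket module; it does not verify that the listener is Xinference. The helper name is_listening is hypothetical; the host and port are the ones used in step 7.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if is_listening("127.0.0.1", 9997):
    print("Something is listening on 127.0.0.1:9997")
else:
    print("Nothing is listening on 127.0.0.1:9997; check the xinference-local console")
```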

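Errors encountered while deploying Xinference

Problem 1: xinference-local --host 127.0.0.1 --port 9997 fails at startup with "RuntimeError: Cluster is not available after multiple attempts"

Answer: set the following environment variables before starting the service: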

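rem When certain conditions are met, Xinference automatically reports worker health. Setting this variable to 1 disables the health check.
set XINFERENCE_DISABLE_HEALTH_CHECK=1

rem Xinference enables the metrics exporter on the supervisor and workers by default. Setting this variable to 1 disables the /metrics endpoint on the supervisor and disables the HTTP service (which only serves /metrics) on the workers.
set XINFERENCE_DISABLE_METRICS=1

rem Number of health check attempts at Xinference startup. If the check has still not succeeded after this many attempts, startup fails with an error. Default: 3.
set XINFERENCE_HEALTH_CHECK_ATTEMPTS=18

rem Interval between health checks at Xinference startup. If the check has still not succeeded after this time, startup fails with an error. Default: 3.
set XINFERENCE_HEALTH_CHECK_INTERVAL=30

rem Health check timeout.
set XINFERENCE_HEALTH_CHECK_TIMEOUT=30

rem By default Xinference stores models, logs, and other required files in the .xinference folder under the current user's home directory. This variable changes that directory. Do not wrap the value in quotes: cmd would make them part of the path.
set XINFERENCE_HOME=E:\home

rem Model download source. The default is huggingface; it can also be set to modelscope.
set XINFERENCE_MODEL_SRC=huggingface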
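Variables set with set only last for the current cmd window. As an alternative, the service can be started from a small Python launcher that passes the same settings to the child process. This is a sketch under assumptions: the names XINFERENCE_ENV, build_env, and launch are mine, the values mirror the settings above, and the home directory is written as a plain Windows path, which is my reading of the intended location.

```python
import os
import subprocess

# Values mirror the settings listed above; all values must be strings.
XINFERENCE_ENV = {
    "XINFERENCE_DISABLE_HEALTH_CHECK": "1",
    "XINFERENCE_DISABLE_METRICS": "1",
    "XINFERENCE_HEALTH_CHECK_ATTEMPTS": "18",
    "XINFERENCE_HEALTH_CHECK_INTERVAL": "30",
    "XINFERENCE_HEALTH_CHECK_TIMEOUT": "30",
    "XINFERENCE_HOME": "E:\\home",  # assumed directory; adjust as needed
    "XINFERENCE_MODEL_SRC": "huggingface",
}


def build_env(base=None) -> dict:
    """Copy the base environment and overlay the Xinference settings."""
    env = dict(os.environ if base is None else base)
    env.update(XINFERENCE_ENV)
    return env


def launch(host: str = "127.0.0.1", port: int = 9997) -> subprocess.Popen:
    """Start xinference-local with the settings applied (not called here)."""
    cmd = ["xinference-local", "--host", host, "--port", str(port)]
    return subprocess.Popen(cmd, env=build_env())
```

Calling launch() from the activated Xinference environment should be equivalent to running the set commands and then xinference-local in the same window.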

Problem 2: xinference-local --host 0.0.0.0 --port 9997 fails at startup with "RuntimeError: Cluster is not available after multiple attempts"

Answer: start the service with xinference-local --host 127.0.0.1 --port 9997 instead, because 0.0.0.0 does not work as the host address on Windows.