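[compshare] (1): Recommending UCloud, a platform for renting GPUs by the hour: bare-metal instances make it quick and easy to deploy the xinf (Xinference) inference framework with its web UI and run the qwen large model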

fly-iot

Last modified 2024-07-02 09:28:27


Licensed under CC 4.0 BY-SA

Categories: large models, AI technology, compshare. Tags: qwen, large models, artificial intelligence, gpu, compute

First published 2024-05-26 20:40:08

Article link: https://blog.csdn.net/freewebsys/article/details/139188230


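About the compshare compute-sharing platform

compshare is the compute-sharing platform operated by UCloud.
UCloud is a well-known vendor-neutral cloud provider in China. It is listed on the STAR Market and was the first cloud computing company to go public in China.
The Compshare GPU platform belongs to UCloud and focuses on cost-effective RTX 4090 compute. Each instance gets a dedicated IP, billing is available by the hour, day, or month, and access to GitHub and Hugging Face is accelerated.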

https://www.compshare.cn/?ytag=GPU_flyiot_Lcsdn_csdn_display

1. Create an instance on the platform

Choose an Ubuntu image; its driver supports CUDA versions up to 12.4.

You can then log in through the web shell. After the instance starts, wait for the driver installation to finish before using the GPU.

2. Log in remotely with the account and serve on port 8888

ssh root@117.50.xxx.xxx
Enter the password and you can start working.

The environment variables are not set by default, so run the following before starting the service:

export PATH=${PATH}:/home/ubuntu/.local/bin
export HF_ENDPOINT=https://hf-mirror.com
export XINFERENCE_MODEL_SRC=modelscope
export XINFERENCE_HOME=/home/ubuntu/xinf-data
xinference-local --host 0.0.0.0 --port 8888
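Once `xinference-local` is running, it can be handy to confirm from a script that the service is answering on port 8888 before launching a model. The sketch below is only an illustration using the Python standard library. It assumes the service exposes an OpenAI-style `GET /v1/models` endpoint on `127.0.0.1:8888`; the helper names `list_models` and `wait_until_ready` are made up for this example.

```python
import json
import time
import urllib.request


def list_models(base_url="http://127.0.0.1:8888", timeout=5.0):
    """Return the decoded JSON body of GET /v1/models (assumed endpoint)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(fetch=list_models, attempts=30, delay=2.0):
    """Call fetch() until it succeeds, sleeping `delay` seconds between tries.

    Connection failures from urllib are OSError subclasses, so they are
    treated as "service not up yet" and retried.
    """
    for _ in range(attempts):
        try:
            return fetch()
        except OSError:
            time.sleep(delay)
    raise TimeoutError(f"service not ready after {attempts} attempts")


# Example (needs the service from the commands above to be running):
#   print(wait_until_ready())
```

The `fetch` parameter exists only so the retry loop can be exercised without a live server.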

The model downloads quickly; in the log below the 1.15 GB file is transferring at about 12.3 MB/s.

2024-05-25 08:03:08,915 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-05-25 08:03:08,917 - modelscope - INFO - Loading ast index from /home/ubuntu/xinf-data/modelscope/ast_indexer
2024-05-25 08:03:08,917 - modelscope - INFO - No valid ast index found from /home/ubuntu/xinf-data/modelscope/ast_indexer, generating ast index from prebuilt!
2024-05-25 08:03:08,961 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 e9a811c5e567c666896afa26370f3928 and a total number of 976 components indexed
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 661/661 [00:00<00:00, 1.16MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 51.0/51.0 [00:00<00:00, 108kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 206/206 [00:00<00:00, 365kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 7.11k/7.11k [00:00<00:00, 11.8MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1.59M/1.59M [00:00<00:00, 19.6MB/s]
Downloading:  14%|██████████████                                                                                          | 160M/1.15G [00:13<01:27, 12.3MB/s]

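Only one model can run at a time. Launching a second model fails with the error below; restart the service and launch the model again.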

  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/home/ubuntu/.local/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/xinference/core/worker.py", line 629, in launch_builtin_model
    subpool_address, devices = await self._create_subpool(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/xinference/core/worker.py", line 467, in _create_subpool
    else self.allocate_devices(model_uid=model_uid, n_gpu=gpu_cnt)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/xinference/core/worker.py", line 365, in allocate_devices
    raise RuntimeError("No available slot found for the model")
RuntimeError: [address=0.0.0.0:62972, pid=27149] No available slot found for the model
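The traceback shows the worker has no free GPU slot for another model. As a possible alternative to restarting the whole service, the running model could be terminated over HTTP to free the slot. The sketch below assumes Xinference accepts `DELETE /v1/models/{model_uid}` for this purpose; that is an assumption to verify against the installed version, and `terminate_request` and `terminate_model` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote


def terminate_request(model_uid, base_url="http://127.0.0.1:8888"):
    """Build (but do not send) a DELETE request for one running model.

    The /v1/models/{model_uid} path is an assumption about the REST API.
    """
    url = f"{base_url}/v1/models/{quote(model_uid, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def terminate_model(model_uid, base_url="http://127.0.0.1:8888"):
    """Send the DELETE request and return the HTTP status code."""
    req = terminate_request(model_uid, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


# Example (needs the service to be running with this model loaded):
#   terminate_model("qwen1.5-chat")
```

If the endpoint is not available in your version, restarting the service as described above remains the fallback.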

3. Call the API

The API can be called successfully. The following request tests the response speed of the qwen model.

curl -X 'POST' 'http://0.0.0.0:8888/v1/chat/completions' -H 'Content-Type: application/json' -d '{
    "model": "qwen1.5-chat","stream": true,
    "messages": [
        {
            "role": "user",
            "content": "北京景点?"
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7
}'
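The same streaming request can be sent from Python, which makes it easier to collect the generated text. This is a minimal sketch using only the standard library. It assumes the response is an OpenAI-style event stream, with `data: {...}` lines that carry `choices[].delta.content` and an optional final `data: [DONE]`. The function names are made up for this example.

```python
import json
import urllib.request


def iter_delta_text(lines):
    """Yield text fragments from an OpenAI-style event stream.

    `lines` is any iterable of str or bytes lines, e.g. an HTTP response.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(prompt, base_url="http://127.0.0.1:8888", model="qwen1.5-chat"):
    """POST the same JSON body as the curl command and yield text fragments."""
    body = json.dumps({
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.7,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_delta_text(resp)


# Example (needs the service to be running with the model loaded):
#   for piece in stream_chat("北京景点?"):
#       print(piece, end="", flush=True)
```

Timing the loop with `time.perf_counter()` and counting the fragments gives a rough idea of generation speed.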