Xorbits Inference Client API Usage Guide



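Project Overview

Xorbits Inference (Xinference) is a model inference serving framework that supports several types of pretrained models, including large language models (LLM), embedding models, image generation models, and audio models. Its client API lets developers integrate these models into their own applications.

Starting the Service and Connecting

To use the Xorbits Inference client API, first start a local inference service. The command below is the one shown in the original log; in recent releases the local launcher is named xinference-local, so use that if xinference on its own only prints help: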

>>> xinference
2023-10-17 16:32:21,700 xinference   24584 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997

Once started, the service prints its endpoint address, http://127.0.0.1:9997 by default. Clients communicate with the service through this address.
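
The two client styles used in this guide both start from this endpoint: the Xinference client takes the address as printed, while the OpenAI-compatible interface uses the same address with a /v1 suffix. A small sketch of that mapping follows; `openai_base_url` is a hypothetical helper name introduced here for illustration, not part of Xinference:

```python
def openai_base_url(endpoint: str) -> str:
    """Derive the OpenAI-compatible base URL from a Xinference endpoint.

    Strips any trailing slash so the result never contains "//v1".
    """
    return endpoint.rstrip("/") + "/v1"


# Endpoint as printed in the startup log above.
endpoint = "http://127.0.0.1:9997"
print(openai_base_url(endpoint))
```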

Using Large Language Models (LLM)

Listing available models

>>> xinference registrations -t LLM

Type    Name                     Language      Ability                        Is-built-in
------  -----------------------  ------------  -----------------------------  -------------
LLM     baichuan                 ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2               ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2-chat          ['en', 'zh']  ['embed', 'generate', 'chat']  True
...

Using the Xinference client

from xinference.client import Client

client = Client("http://localhost:9997")
# Launch a model on the server; the returned UID identifies the running instance.
model_uid = client.launch_model(
    model_name="glm4-chat",
    model_engine="llama.cpp",
    model_format="ggufv2",
    model_size_in_billions=9,
    quantization="Q4_K"
)
model = client.get_model(model_uid)

messages = [{"role": "user", "content": "What is the largest animal in the world?"}]
response = model.chat(
    messages,
    generate_config={"max_tokens": 1024}
)
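
The example above stops at the `response` object. A minimal sketch for pulling the reply text out of it, assuming the response is a dictionary in the OpenAI chat-completion layout (`choices[0]["message"]["content"]`). `extract_reply` and `sample_response` are hypothetical names, and the sample dictionary is hand-written rather than real model output:

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from a chat-completion style dict.

    Assumes the OpenAI-compatible layout: choices[0]["message"]["content"].
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for a real response, for illustration only.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The blue whale."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

Raising on an empty `choices` list keeps a malformed response from surfacing later as an unrelated `IndexError`.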

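Using the OpenAI-compatible interface

import openai

# The OpenAI SDK requires a non-empty api_key; a placeholder is used here.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
response = client.chat.completions.create(
    model=model_uid,  # UID returned by launch_model in the previous example
    messages=[
        {
            "content": "What is the largest animal in the world?",
            "role": "user",
        }
    ],
    max_tokens=1024
)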

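Tool calling

Xorbits Inference supports OpenAI-style function calling:

tools = [
    {
        "type": "function",
        "function": {
            "name": "uber_ride",
            "description": "Find a suitable ride for the customer based on location, ride type, and waiting time",
            "parameters": {
                "type": "object",
                "properties": {
                    # JSON Schema uses "integer", not "int", for whole numbers.
                    "loc": {"type": "integer", "description": "Starting location of the ride"},
                    "type": {"type": "string", "enum": ["plus", "comfort", "black"], "description": "Ride type"},
                    "time": {"type": "integer", "description": "Time the customer is willing to wait, in minutes"},
                },
            },
        },
    }
]

# `client` is the openai.Client created in the previous example.
response = client.chat.completions.create(
    model="chatglm3",  # UID of a launched model that supports tool calls
    messages=[{"role": "user", "content": "Call a 'Plus' Uber in the 94704 zip code area, arriving within 10 minutes"}],
    tools=tools,
)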
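
The request above only asks the model which tool to call; running the tool is left to the application. A hedged sketch of that dispatch step, assuming the assistant message follows the OpenAI layout in which `tool_calls[i]["function"]["arguments"]` is a JSON-encoded string. The local `uber_ride` implementation, `TOOL_REGISTRY`, `run_tool_calls`, and `sample_message` are all hypothetical and exist only for illustration:

```python
import json
from typing import Any, Callable, Dict, List


def uber_ride(loc: int, type: str, time: int) -> str:
    """Hypothetical local implementation of the `uber_ride` tool."""
    return f"Booked a {type} ride from {loc}, arriving within {time} minutes"


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"uber_ride": uber_ride}


def run_tool_calls(message: Dict[str, Any]) -> List[str]:
    """Execute every tool call found in an assistant message dict."""
    outputs = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOL_REGISTRY[fn["name"]]
        # In the OpenAI layout, `arguments` is a JSON-encoded string.
        kwargs = json.loads(fn["arguments"])
        outputs.append(handler(**kwargs))
    return outputs


# Hand-written stand-in for an assistant message, not real model output.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "uber_ride",
                "arguments": '{"loc": 94704, "type": "plus", "time": 10}',
            },
        }
    ],
}

print(run_tool_calls(sample_message))
```

With the OpenAI SDK the message is an object rather than a dictionary; `response.choices[0].message.model_dump()` should yield a dictionary of this shape in SDK versions 1.x.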

Using Embedding Models

Listing available models

>>> xinference registrations -t embedding

Type       Name                     Language      Dimensions  Is-built-in
---------  -----------------------  ----------  ------------  -------------
embedding  bge-base-en              ['en']               768  True
embedding  bge-base-en-v1.5         ['en']               768  True
embedding  bge-base-zh              ['zh']               768  True
...

Using the Xinference client

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-small-en-v1.5", model_type="embedding")
model = client.get_model(model_uid)

embedding = model.create_embedding("What is the capital of China?")

Using the OpenAI-compatible interface

import openai

client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
embedding = client.embeddings.create(model=model_uid, input=["What is the capital of China?"])
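
Embedding vectors are usually compared rather than read directly. A minimal sketch of cosine similarity in plain Python, using toy vectors instead of real model output. It assumes the vector for the first input can be read from `embedding["data"][0]["embedding"]` with the Xinference client or `embedding.data[0].embedding` with the OpenAI SDK; `cosine_similarity` is a helper name chosen here:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
query_vec = [1.0, 2.0, 3.0]
doc_vec = [2.0, 4.0, 6.0]
unrelated_vec = [3.0, 0.0, -1.0]

print(cosine_similarity(query_vec, doc_vec))
print(cosine_similarity(query_vec, unrelated_vec))
```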

Using Image Generation Models

Listing available models

>>> xinference registrations -t image

Type    Name                          Family            Is-built-in
------  ----------------------------  ----------------  -------------
image   sd-turbo                      stable_diffusion  True
image   sdxl-turbo                    stable_diffusion  True
image   stable-diffusion-v1.5         stable_diffusion  True

Using the Xinference client

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="stable-diffusion-v1.5", model_type="image")
model = client.get_model(model_uid)

image = model.text_to_image("an apple")

Using the OpenAI-compatible interface

import openai

client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
image = client.images.generate(model=model_uid, prompt="an apple")
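
The calls above return a response object, not an image file. One way to get a file on disk is to request base64 output and decode it locally. The sketch below assumes the request was made with `response_format="b64_json"` through the OpenAI SDK, so that the payload is available as `image.data[0].b64_json`; whether a given Xinference image model honors that option is an assumption to verify. `save_b64_image` and `fake_payload` are hypothetical names:

```python
import base64
import tempfile
from pathlib import Path


def save_b64_image(b64_data: str, path: Path) -> int:
    """Decode a base64 image payload, write it to `path`, return bytes written."""
    raw = base64.b64decode(b64_data)
    path.write_bytes(raw)
    return len(raw)


# Stand-in payload for illustration; a real one would come from the response,
# e.g. image.data[0].b64_json when response_format="b64_json" is requested.
fake_payload = base64.b64encode(b"not-a-real-png").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "apple.png"
    written = save_b64_image(fake_payload, out_path)
    print(written, out_path.exists())
```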

Using Audio Models

Listing available models

>>> xinference registrations -t audio

Type    Name               Family    Multilingual    Is-built-in
------  -----------------  --------  --------------  -------------
audio   whisper-base       whisper   True            True
audio   whisper-base.en    whisper   False           True
audio   whisper-large-v3   whisper   True            True

Using the Xinference client

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="whisper-large-v3", model_type="audio")
model = client.get_model(model_uid)

# The Xinference client takes the raw audio bytes.
with open("audio.mp3", "rb") as audio_file:
    transcription = model.transcriptions(audio_file.read())

Using the OpenAI-compatible interface

import openai

client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
# The OpenAI SDK takes the open file object rather than its bytes.
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(model=model_uid, file=audio_file)

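Using Rerank Models

A rerank model scores how relevant each document is to a query:

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-reranker-base", model_type="rerank")
model = client.get_model(model_uid)

query = "A man is eating pasta"
corpus = [
    "A man is eating food",
    "A man is eating a piece of bread",
    "A girl is holding a baby",
    "A man is riding a horse",
    "A woman is playing the violin"
]
# Note the argument order: documents first, then the query.
results = model.rerank(corpus, query)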
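
To turn `results` into an ordered list of documents, the scores have to be mapped back to the corpus. A sketch under the assumption that the response is a dictionary of the form `{"results": [{"index": ..., "relevance_score": ...}, ...]}`, where `index` points into the submitted document list. `rank_documents` is a hypothetical helper, and the scores in `sample_results` are made up for illustration:

```python
from typing import Any, Dict, List, Tuple


def rank_documents(
    results: Dict[str, Any], corpus: List[str]
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least relevant."""
    ordered = sorted(
        results["results"],
        key=lambda item: item["relevance_score"],
        reverse=True,
    )
    return [(corpus[item["index"]], item["relevance_score"]) for item in ordered]


corpus = [
    "A man is eating food",
    "A man is riding a horse",
    "A woman is playing the violin",
]

# Scores below are made up for illustration, not real model output.
sample_results = {
    "results": [
        {"index": 1, "relevance_score": 0.02},
        {"index": 0, "relevance_score": 0.91},
        {"index": 2, "relevance_score": 0.01},
    ]
}

for doc, score in rank_documents(sample_results, corpus):
    print(f"{score:.2f}  {doc}")
```

Sorting explicitly avoids relying on the server returning results in any particular order.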

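Best Practices

For Chinese-language workloads, the bge family of embedding models (for example bge-base-zh) and the chatglm family of LLMs are recommended.
For image generation, more detailed prompts usually produce better images.
Audio transcription supports multiple languages; for non-English audio, choose a multilingual model (Multilingual = True in the listing above) rather than an English-only one such as whisper-base.en.
Rerank models are useful in search and recommendation scenarios, where they reorder candidate results by relevance to the query.

With the models and APIs that Xorbits Inference provides, developers can build AI applications quickly without handling model deployment and optimization themselves.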

Authorship statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.