Implementing Retrieval-Augmented Generation (RAG) with magentic: A Practical Guide

Article link: https://blog.csdn.net/gitblog_01042/article/details/148886405

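Retrieval-Augmented Generation (RAG) is an important technique in current large language model applications. It addresses the problem that a model's knowledge stops at its training cutoff and lags behind new information. Using the magentic project, this article walks through building a code repository recommendation system with RAG.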

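What Is Retrieval-Augmented Generation (RAG)?

RAG combines external knowledge retrieval with a generative model, giving the large language model up-to-date external information to draw on. The core idea is:

When the user submits a query, the system first retrieves relevant information from an external knowledge source
The retrieved content is passed to the generative model together with the user's query
The generative model produces the final answer from the retrieved content and its own knowledge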
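
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the in-memory KNOWLEDGE_BASE, the word-overlap ranking in retrieve, and the generate placeholder are all hypothetical stand-ins, not part of magentic or of the system built later in this article.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


# Hypothetical in-memory knowledge base standing in for a real external source.
KNOWLEDGE_BASE = [
    Document("magentic", "magentic integrates LLM calls into Python functions"),
    Document("ghapi", "ghapi is a Python client for the GitHub API"),
    Document("pydantic", "pydantic validates data using Python type hints"),
]


def retrieve(query: str, top_k: int = 2) -> list:
    """Step 1: rank documents by how many query words they share."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(doc.text.lower().split())), doc) for doc in KNOWLEDGE_BASE
    ]
    # Sort by overlap only; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, docs: list) -> str:
    """Step 2: combine the retrieved content with the user's query."""
    context = "\n".join(f"- {doc.title}: {doc.text}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 3: placeholder for the model call; a real system calls an LLM here."""
    return f"[model answer based on {len(prompt)} characters of prompt]"


def answer(query: str) -> str:
    return generate(build_prompt(query, retrieve(query)))
```

Real systems usually replace the word-overlap ranking with a search API or a vector index, but the data flow (retrieve, augment, generate) stays the same.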

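This architecture is particularly well suited to scenarios that need the latest information or proprietary information, and it can substantially reduce model hallucination.

Environment Setup

First, install the required Python packages: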

pip install magentic
pip install ghapi

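Then select the GPT-3.5-turbo model. The %env line below is an IPython/Jupyter magic command, so it only works in a notebook; in a shell, export the MAGENTIC_OPENAI_MODEL environment variable instead: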

%env MAGENTIC_OPENAI_MODEL=gpt-3.5-turbo
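
Outside a notebook, the same setting can be applied from a plain Python script. The sketch below assumes the variable is set before magentic is imported, which is the safe ordering; the missing_settings helper is a hypothetical convenience, not a magentic function.

```python
import os

# Same setting as the %env line above, applied from plain Python.
os.environ["MAGENTIC_OPENAI_MODEL"] = "gpt-3.5-turbo"

# OPENAI_API_KEY is the variable the OpenAI client reads for credentials.
REQUIRED = ("MAGENTIC_OPENAI_MODEL", "OPENAI_API_KEY")


def missing_settings(names=REQUIRED) -> list:
    """Return the names of required environment variables that are not set."""
    return [name for name in names if not os.environ.get(name)]
```

Calling missing_settings() before the first model call gives an early, readable hint when the API key has not been configured.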

Basic Recommendation Function

We start with a basic recommendation function that uses no external information:

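from magentic import prompt

# @prompt turns the function into an LLM call: the template is filled with
# the function's arguments and the model's reply is returned as a str.
@prompt(
    """What are the latest github repos I should use related to {topic}?
    Recommend three in particular that I should check out and why.
    Provide a link to each, and a note on whether they are actively maintained.
    """
)
def recommmend_github_repos(topic: str) -> str: ...  # body stays empty on purpose

# This makes a live API call, so OpenAI credentials must be configured.
output = recommmend_github_repos("LLMs")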

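This basic implementation has clear problems:

It cannot know about repositories created after the model's knowledge cutoff
It sometimes produces incorrect information (hallucinations)
Its recommendations may be outdated

Integrating the Repository Search API

To address these problems, we integrate a repository search function: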

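from ghapi.all import GhApi
from pydantic import BaseModel

class GithubRepo(BaseModel):
    full_name: str
    # GitHub returns null for repositories without a description
    description: str | None = None
    html_url: str
    stargazers_count: int
    pushed_at: str  # timestamp of the most recent push

def search_github_repos(query: str, num_results: int = 10) -> list[GithubRepo]:
    # Unauthenticated requests work but are subject to stricter rate limits
    github = GhApi(authenticate=False)
    results = github.search.repos(query, per_page=num_results)
    # Keep only the fields declared on GithubRepo and validate their types
    return [GithubRepo.model_validate(item) for item in results["items"]]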

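This search function:

Queries for repositories matching the given keywords
Returns each repository's name, description, URL, star count, and last push time
Uses Pydantic to validate the data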
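
To see what the function extracts without making a network call, here is a rough offline sketch. The RepoSummary dataclass is a stand-in for the GithubRepo model, SAMPLE_RESPONSE is a hand-written payload with made-up values that only mimics the shape of a search response, and days_since_push is a hypothetical helper showing how pushed_at could feed an "actively maintained" check.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RepoSummary:  # stand-in for the GithubRepo model
    full_name: str
    description: Optional[str]
    html_url: str
    stargazers_count: int
    pushed_at: str


# Hand-written sample shaped like a search response; all values are made up.
SAMPLE_RESPONSE = {
    "items": [
        {
            "full_name": "example-org/example-repo",
            "description": None,  # repositories may have no description
            "html_url": "https://github.com/example-org/example-repo",
            "stargazers_count": 42,
            "pushed_at": "2024-01-15T12:00:00Z",
            "forks_count": 7,  # extra fields are simply ignored
        }
    ]
}

FIELDS = ("full_name", "description", "html_url", "stargazers_count", "pushed_at")


def parse_items(response: dict) -> list:
    """Keep only the five fields of interest from each result item."""
    return [
        RepoSummary(**{name: item[name] for name in FIELDS})
        for item in response["items"]
    ]


def days_since_push(repo: RepoSummary, now: datetime) -> int:
    """Whole days between the last push and `now` (both treated as UTC)."""
    pushed = datetime.strptime(repo.pushed_at, "%Y-%m-%dT%H:%M:%SZ")
    return (now - pushed.replace(tzinfo=timezone.utc)).days
```

A small number of days since the last push is one simple signal that a repository is still maintained, which is the kind of evidence the model needs to answer the "actively maintained" part of the prompt.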

Implementing the RAG Recommendation System

Now we combine the search function with the generative model to build an actual RAG system:

# Same question as before, but the template now also receives search results.
@prompt(
    """What are the latest github repos I should use related to {topic}?
    Recommend three in particular that I should check out and why.
    Provide a link to each, and a note on whether they are actively maintained.

    Here are the latest search results for this topic on GitHub:
    {search_results}
    """,
)
def recommmend_github_repos_using_search_results(
    topic: str, search_results: list[GithubRepo]
) -> str: ...

def recommmend_github_repos(topic: str) -> str:
    # Retrieve first, then generate with the retrieved results in the prompt
    search_results = search_github_repos(topic, num_results=10)
    return recommmend_github_repos_using_search_results(topic, search_results)
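
To make the data flow concrete, the sketch below builds an augmented prompt by hand. It is an assumption-laden illustration: magentic fills the template itself, and how it renders a list of models inside the prompt is not shown in this article, so the render_results layout, the shortened PROMPT_TEMPLATE, and the sample values are all hypothetical.

```python
# Shortened version of the template used above, for illustration only.
PROMPT_TEMPLATE = """What are the latest github repos I should use related to {topic}?

Here are the latest search results for this topic on GitHub:
{search_results}
"""


def render_results(results: list) -> str:
    """Turn each search result into one line of text for the prompt."""
    lines = []
    for repo in results:
        lines.append(
            f"{repo['full_name']} ({repo['stargazers_count']} stars, "
            f"last push {repo['pushed_at']}): {repo['html_url']}"
        )
    return "\n".join(lines)


def build_augmented_prompt(topic: str, results: list) -> str:
    """Insert the topic and the rendered search results into the template."""
    return PROMPT_TEMPLATE.format(
        topic=topic, search_results=render_results(results)
    )


# Made-up result used only to show the shape of the final prompt text.
example_results = [
    {
        "full_name": "example-org/example-repo",
        "stargazers_count": 42,
        "pushed_at": "2024-01-15T12:00:00Z",
        "html_url": "https://github.com/example-org/example-repo",
    }
]
example_prompt = build_augmented_prompt("LLMs", example_results)
```

Printing example_prompt shows the question followed by the retrieved lines, which is the "augmented" input that the model sees in place of the bare question.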

Key points of the recommmend_github_repos implementation: