【RAG From Beginner to Mastery Series】【RAG From Scratch Tutorial 2: Query Transformations】


Preface

"Retrieval-Augmented Generation" (RAG) tutorial 2: this installment covers how to optimize the query side of a RAG pipeline.

1. Overview

1-1. The RAG Concept

Concept: today's LLMs are trained on large corpora of pre-existing text. This creates a problem: an LLM knows little about very recent information or private data, because such content was not part of its training set. Fine-tuning (retraining the LLM for a specific task) can help, but it is costly and technically involved. A newer approach, Retrieval-Augmented Generation (RAG), takes a different route: retrieve relevant material from external data sources (such as databases or web pages) and feed it to the chatbot so it can answer questions better. In effect, RAG gives the chatbot a "plug-in" that lets it reach knowledge beyond its training data.

[Figure: the RAG pipeline (retrieve external documents, then generate an answer)]
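Conceptually, the flow is retrieve, augment, generate. A minimal sketch of that loop in Python (rag_answer, vector_db, and llm are illustrative placeholders here, not a specific library API):

def rag_answer(question, vector_db, llm):
    # 1. Retrieve: look up passages related to the question in an external store
    context_docs = vector_db.similarity_search(question)
    # 2. Augment: place the retrieved passages into the prompt
    prompt = f"Answer using this context:\n{context_docs}\n\nQuestion: {question}"
    # 3. Generate: the LLM answers with the extra knowledge "plugged in"
    return llm.invoke(prompt)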

1-2. Prerequisites

1-2-1. ModelScopeEmbeddings (text embedding model)

ModelScope Embeddings, released by Alibaba DAMO Academy, maps text, images, and other data into high-dimensional vectors that downstream machine-learning models can consume. These embeddings capture semantic information and are widely used in natural language processing (NLP), computer vision (CV), and related fields.

Install the library:

pip install modelscope

Demo:

from langchain_community.embeddings import ModelScopeEmbeddings

model_id = "damo/nlp_corom_sentence-embedding_english-base"
embeddings = ModelScopeEmbeddings(model_id=model_id)
text = "This is a test document."
query_result = embeddings.embed_query(text)        # embed a single query string
doc_results = embeddings.embed_documents(["foo"])  # embed a list of document texts

Output: the embedding vectors for the query and the documents (screenshots omitted).

1-2-2. FAISS: Introduction & Installation (vector similarity search)

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors, developed by Meta (formerly Facebook). It is designed for vector similarity search over large-scale datasets and is particularly well suited to vector retrieval tasks in machine learning and natural language processing. FAISS provides many index types and algorithms, and runs on both CPU and GPU for fast vector search.

Key features of FAISS (a minimal raw-FAISS sketch follows the install commands below)

  • Efficient similarity search: fast search over large-scale datasets, with both exact and approximate modes.
  • Multiple index types: flat indexes (Flat Index), inverted file indexes (IVF), product quantization (PQ), and more.
  • GPU acceleration: searches can run on GPUs for extra speed.
  • Batch processing: multiple query vectors can be processed in one call for higher throughput.
  • Flexibility: several distance metrics are supported, e.g. Euclidean distance (L2) and inner product (Inner Product).

Install:

# CPU or GPU version
pip install faiss-cpu
# or
pip install faiss-gpu
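Before the LangChain demo, here is a minimal stand-alone sketch of raw FAISS usage with an exact flat L2 index (the random vectors are placeholder data):

import faiss
import numpy as np

d = 64                                               # vector dimensionality
xb = np.random.random((1000, d)).astype('float32')   # database vectors
xq = np.random.random((5, d)).astype('float32')      # query vectors

index = faiss.IndexFlatL2(d)   # exact (brute-force) search under L2 distance
index.add(xb)                  # add the database vectors to the index
D, I = index.search(xq, 4)     # distances and ids of the 4 nearest neighbors
print(I[0], D[0])              # neighbors of the first query; smaller D is closer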

Demo walkthrough: use LangChain to load a long text file, split it into chunks, and then run a similarity search using Hugging Face embeddings and a FAISS vector store.

  • CharacterTextSplitter: splits long text into small chunks.
  • FAISS: builds the vector database.
  • TextLoader: loads text files (the demo below simply reads the file directly).
  • HuggingFaceEmbeddings: another class for generating text embedding vectors.
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# This is a long document we can split up.
with open('./index.txt', encoding='utf-8') as f:
    state_of_the_union = f.read()

text_splitter = CharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)
docs = text_splitter.create_documents([state_of_the_union])

embeddings = HuggingFaceEmbeddings()
db = FAISS.from_documents(docs, embeddings)

query = "学生的表现怎么样?"  # "How did the students perform?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Output: the most similar chunk (screenshot omitted).

Note: the score attached to each result is an L2 distance, so lower is better; see the snippet below.
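To inspect those scores directly, the LangChain FAISS store also offers similarity_search_with_score; a small follow-up to the demo above:

docs_and_scores = db.similarity_search_with_score(query, k=3)
for doc, score in docs_and_scores:
    # With the default setup the score is an L2 distance: smaller = more similar
    print(f"{score:.4f}  {doc.page_content[:50]}")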

1-2-3. Tiktoken (tokenizer)

Tiktoken is an efficient tokenizer developed by OpenAI for the GPT family of models (e.g. GPT-3, GPT-4). It converts natural-language text into the token sequences the models consume, and can decode token sequences back into text. It is designed to be fast, flexible, and easy to integrate into all kinds of NLP workflows.

Install:

pip install tiktoken

Usage:

import tiktoken

# Load an encoder
encoder = tiktoken.get_encoding("cl100k_base")
text = "这是一个示例文本。"  # "This is a sample text."

# Encode the text into tokens
tokens = encoder.encode(text)
print(tokens)

# Decode the tokens back into text
decoded_text = encoder.decode(tokens)
print(decoded_text)
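In RAG pipelines the most common task is simply counting tokens, for example when choosing a chunk size. tiktoken can also resolve the appropriate encoding for a given model:

# Pick the encoding that matches a specific model, then count tokens
enc = tiktoken.encoding_for_model("gpt-4")
n_tokens = len(enc.encode("How many tokens does this sentence use?"))
print(n_tokens)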

2. RAG From Scratch: Query Transformations

RAG From Scratch: Query Transformations. Query transformation refers to approaches that focus on rewriting and/or modifying the question used for retrieval.

[Figure: overview of query transformation approaches]

2-1. Environment Setup

Packages used:

pip install langchain_community 
pip install tiktoken 
pip install langchain-openai 
pip install langchainhub 
pip install chromadb 
pip install langchain
pip install modelscope

# I use FAISS as the vector search tool here
# CPU or GPU version
pip install faiss-cpu
# or
pip install faiss-gpu

2-2. Multi-Query Retriever

Multi-query retriever: with an LLM we can generate several related questions that approach the user's query from different angles, automatically optimizing the search. For each generated question the system retrieves a set of relevant documents, then merges them into a larger, potentially more relevant collection. Generating questions from multiple perspectives overcomes the limitations of purely distance-based retrieval and yields richer, more comprehensive results, as shown in the figure below:

[Figure: multi-query retrieval workflow]

2-2-1. Load Web Content

  • WebBaseLoader: loads page content from the given URLs.
  • bs4.SoupStrainer: parses only HTML elements with specific class names (post-content, post-title, post-header) to cut parsing time.
  • blog_docs: the loaded document objects.
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

2-2-2. Split the Documents

  • RecursiveCharacterTextSplitter: recursively splits the documents into chunks.
  • chunk_size=300: maximum number of tokens per chunk.
  • chunk_overlap=50: number of overlapping tokens between chunks, to preserve context.
  • splits: the resulting document chunks.
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

Output: the resulting document chunks (screenshot omitted).

2-2-3. Embed the Documents and Create the Vector Store

  • ModelScopeEmbeddings: converts the document chunks into vectors with a ModelScope embedding model.
  • FAISS: stores the embedding vectors in a FAISS vector database.
  • retriever: a retriever for querying relevant documents.
from langchain_community.embeddings import ModelScopeEmbeddings
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(
    documents=splits,
    embedding=ModelScopeEmbeddings(),
)
retriever = vectorstore.as_retriever()

2-2-4. Initialize the LLM

  • ChatOpenAI: initializes an LLM instance using the qwen-max model.
  • temperature=0: controls randomness of the generated text; 0 yields deterministic output.
  • max_tokens=1024: caps the maximum length of generated text.
  • base_url: the base URL of the OpenAI-compatible API endpoint.
from langchain_openai import ChatOpenAI
import os

llm = ChatOpenAI(
    model="qwen-max",
    temperature=0,
    max_tokens=1024,
    timeout=None,
    max_retries=2,
    api_key=os.environ.get('DASHSCOPE_API_KEY'),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

2-2-5. Generating Queries from Multiple Perspectives

  • ChatPromptTemplate: a prompt template for generating multiple queries.
  • generate_queries: feeds the user question to the LLM and produces several related queries.

from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_perspectives
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# A quick test; note how natural and helpful the generated variants are.
generate_queries.invoke({"question": "How should one deal with employees who make unreasonable demands?"})

Output:

["How can an employee's unreasonable requests be handled appropriately?",
"What are effective strategies for handling unreasonable demands from employees?",
"How should a manager respond when an employee makes excessive demands?",
"What measures should a company take with employees who routinely make unrealistic demands?",
"When an employee makes an unacceptable request at work, how can you solve the problem while also maintaining a good working relationship?"]
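One practical caveat: if the model separates the five questions with blank lines, split("\n") produces empty strings that would then be sent to the retriever as queries. A slightly more defensive variant of the parsing step:

generate_queries = (
    prompt_perspectives
    | llm
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])  # drop blank lines
)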

2-2-6. Retrieve Relevant Documents

  • get_unique_union: merges the retrieved documents and removes duplicates.
  • retrieval_chain: generates the queries, retrieves documents for each, and returns the de-duplicated union.
  • retriever.map(): applies the retriever to each generated query, yielding one list of documents per query.
  • docs: the documents ultimately retrieved.
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
len(docs)

2-2-7. Build the RAG Chain and Generate the Final Answer

  • ChatPromptTemplate: a prompt template for producing the final answer.
  • final_rag_chain: combines the retrieved documents with the user question to generate the final answer.
  • invoke: runs the RAG chain and outputs the result.
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# The chain takes two inputs:
# 1. context: the relevant documents found by retrieval_chain.
# 2. question: itemgetter("question") is a function from Python's standard
#    operator module; it extracts the value stored under a given key of the
#    input dict (here, the key "question").
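#    For example (illustrative values):
#    itemgetter("question")({"question": "hi", "context": "..."})  ->  "hi"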
final_rag_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

print(final_rag_chain.invoke({"question":question}))

输出:

Task decomposition for LLM (Large Language Model) agents is a process where complex tasks are broken down into smaller, more manageable subtasks. This approach helps the agent to handle and execute complex tasks more efficiently. The idea is to make the problem-solving process more structured and step-by-step, which can improve the overall performance and reliability of the agent.

In the context provided, task decomposition can be achieved in several ways:

  1. Chain of Thought (CoT): This technique involves instructing the model to “think step by step” to break down a large task into smaller, simpler steps. This not only makes the task more manageable but also provides insight into the model’s reasoning process.

  2. Tree of Thoughts (ToT): This extends the CoT approach by exploring multiple reasoning possibilities at each step, creating a tree structure. The search process can use breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or through majority voting.

  3. Simple Prompting: Using straightforward prompts like “Steps for XYZ. 1.” or “What are the subgoals for achieving XYZ?” to guide the LLM in breaking down the task.

  4. Task-Specific Instructions: Providing specific instructions tailored to the task, such as “Write a story outline” for writing a novel.

  5. Human Inputs: Incorporating human input to help define and refine the subtasks.

By decomposing tasks, LLM agents can better manage complexity, improve their planning and execution, and ultimately enhance their problem-solving capabilities.

2-3. Decomposition (breaking a complex question into sub-questions; sub-questions are interdependent)

Overview: decomposition aims to break a complex question into several sub-questions, where each later sub-question draws on the results of the earlier ones. It suits scenarios where a complex problem must be solved step by step.

Workflow:

  • Use the LLM to generate several sub-questions (generate_queries_decomposition).
  • Retrieve and answer each sub-question in turn; every sub-question after the first also receives the previous sub-questions and their answers.
  • The answer to the last sub-question serves as the final answer.

[Figure: recursive decomposition; each answer feeds the next sub-question]

2-3-1. Generate Sub-Questions

Overview: loading the web content, splitting the documents, embedding them into a vector store, and initializing the LLM are the same as in the previous sections; only the decomposition prompt template differs, as shown below.

from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

# Chain
generate_queries_decomposition = (prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "How can I learn artificial intelligence well?"
questions = generate_queries_decomposition.invoke({"question":question})

print(questions)

Output:

['1. What are the fundamentals of artificial intelligence?', '2. Which programming languages and technologies does learning AI require?', '3. What are some high-quality online courses and resources for AI?']

2-3-2. Build the RAG Chain

Advantages of this approach:

  • Each sub-question gets its own retrieval pass, pulling the relevant context from the RAG index.
  • When answering each sub-question, the previously generated sub-questions and their answers are dynamically folded in as background knowledge, helping the model grasp the question as a whole.
template = """Here is the question you need to answer:
\n --- \n {question} \n --- \n
Here is any available background question + answer pairs:
\n --- \n {q_a_pairs} \n --- \n
Here is additional context relevant to the question: 
\n --- \n {context} \n --- \n
Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser


# Format a question together with its answer
def format_qa_pair(question, answer):
    """Format Q and A pair"""

    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

q_a_pairs = ""
for q in questions:
    rag_chain = (
            {"context": itemgetter("question") | retriever,
             "question": itemgetter("question"),
             "q_a_pairs": itemgetter("q_a_pairs")}
            | decomposition_prompt
            | llm
            | StrOutputParser())

    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair

print(answer)

2-4. Decomposition (breaking a complex question into sub-questions; sub-questions are independent)

Overview: decomposition here breaks a complex question into several sub-questions, answers each of them independently, and finally synthesizes the answers. It suits scenarios where a complex problem must be solved step by step.

Workflow:

  • Use the LLM to generate several sub-questions (generate_queries_decomposition).
  • Retrieve and generate an answer for each sub-question independently.
  • Synthesize the sub-answers into the final answer.

[Figure: independent decomposition; answer each sub-question separately, then synthesize]

2-4-1. Generate Sub-Questions

Overview: loading the web content, splitting the documents, embedding them into a vector store, and initializing the LLM are the same as in the previous sections; only the decomposition prompt template differs, as shown below.

from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

# Chain
generate_queries_decomposition = (prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

questions

Output:

["1. What are the key components of a language model in an LLM-powered autonomous agent?",
"2. How does an LLM-powered autonomous agent integrate decision-making processes?",
"3. What role does data input and output handling play in an LLM-powered autonomous agent system?"]

2-4-2. Retrieve Documents per Sub-Question

retrieve_and_rag: for each sub-question, run the prebuilt RAG chain, using the retrieved context plus the sub-question to obtain an answer.

  • sub_questions: the sub-questions produced by decomposition.
  • retrieved_docs: the documents retrieved for each sub-question.
# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")


def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """RAG on each sub-question"""

    # Use our decomposition chain to generate sub-questions
    sub_questions = sub_question_generator_chain.invoke({"question": question})

    # Initialize a list to hold RAG chain results
    rag_results = []

    for sub_question in sub_questions:
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.get_relevant_documents(sub_question)

        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs,
                                                                "question": sub_question})
        rag_results.append(answer)
    return rag_results, sub_questions


# Run retrieval and RAG over each sub-question
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

The prompt_rag template pulled from the hub looks like this:
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don’t know the answer, just say that you don’t know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:

The value of answers:
["In an LLM-powered autonomous agent system, the core elements include planning capabilities such as subgoal and decomposition, and reflection and refinement. These components enable the agent to manage complex tasks by breaking them into smaller, manageable parts and learning from past actions to improve future performance.",
"In an LLM-powered autonomous agent system, the LLM can integrate with external data sources and APIs by first using an API search engine to find the appropriate API, then consulting the corresponding documentation to make the call. This process is part of a larger workflow where the LLM makes decisions at each step, which can be evaluated for accuracy.",
"In an LLM-powered autonomous agent system, decision-making and action execution are crucial for the agent to break down tasks into subgoals, plan, and carry out actions. The agent uses its capability to reflect on past actions, learn from mistakes, and refine future steps, which enhances the efficiency and effectiveness of task completion. These processes enable the system to function as a general problem solver."]

The value of questions:
["1. What are the core elements of a language model in an LLM-powered autonomous agent system?",
"2. How does an LLM-powered autonomous agent system integrate with external data sources and APIs?",
"3. What role does decision-making and action execution play in an LLM-powered autonomous agent system?"]

2-4-3. Build the Final RAG Chain

format_qa_pairs: formats the sub-questions (questions) and their corresponding answers (answers) into one clean string for downstream use.

def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""

    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()


context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
        prompt
        | llm
        | StrOutputParser()
)
print(final_rag_chain.invoke({"context": context, "question": question}))

输出:

An LLM-powered autonomous agent system is composed of several key components that work together to enable the agent to perform complex tasks effectively. The main components include:

  1. Planning Capabilities:

    • Subgoal and Decomposition: The system can break down complex tasks into smaller, more manageable subgoals. This allows the agent to tackle each part systematically.
    • Reflection and Refinement: The agent can reflect on past actions, learn from any mistakes, and refine its approach for future tasks. This continuous learning process enhances the agent’s performance over time.
  2. Integration with External Data Sources and APIs:

    • The LLM can search for and utilize external data sources and APIs to gather necessary information or perform specific functions. This involves using an API search engine to find the appropriate API, consulting the documentation, and making the API call. This integration is a critical part of the agent’s workflow, allowing it to leverage external resources to make informed decisions.
  3. Decision-Making and Action Execution:

    • The system is equipped with robust decision-making capabilities, enabling it to plan and execute actions based on the subgoals and available data. The agent can evaluate the outcomes of its actions and use this feedback to improve future decision-making. This iterative process of planning, executing, and refining actions makes the system a versatile problem solver.

These components collectively enable the LLM-powered autonomous agent to manage and complete complex tasks efficiently and effectively, continuously improving its performance through learning and adaptation.

References
rag-from-scratch official GitHub repository.


Summary

If we're not headed the same way, I wish you smooth sailing ahead. ☀
