10. Paper Notes: Retrieval-Augmented Generation for Large Language Models: A Survey



Paper: Retrieval-Augmented Generation for Large Language Models: A Survey
Link: https://arxiv.org/abs/2312.10997
Official Code: https://github.com/tongji-kgllm/rag-survey

Naïve RAG, the baseline approach that preceded Graph RAG:
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.

Main Content

The survey "Retrieval-Augmented Generation for Large Language Models: A Survey" comprehensively examines retrieval-augmented generation (RAG), covering its development history, technical frameworks, application tasks, evaluation methods, and open challenges and future directions. It serves as a thorough reference for understanding the role RAG plays in large language models.
[Figure 1 from the paper omitted.]
Figure 1. Technology tree of RAG research. The stages involving RAG mainly include pre-training, fine-tuning, and inference. With the advent of LLMs, RAG research initially focused on exploiting LLMs' strong in-context learning abilities, concentrating mainly on the inference stage. Subsequent work went deeper, gradually combining RAG with LLM fine-tuning. Researchers have also been exploring ways to enhance language models at the pre-training stage through retrieval-augmentation techniques.

  1. Introduction

    • Although large language models (LLMs) have achieved remarkable results, they show limitations in domain-specific or knowledge-intensive tasks, such as producing hallucinations. Retrieval-augmented generation (RAG) augments LLMs by retrieving relevant document chunks from an external knowledge base, effectively reducing erroneous generations, and has become a key technique (see the minimal pipeline sketch after this list).
    • RAG has developed rapidly, passing through an early stage tied to the rise of the Transformer architecture and a fast-growth stage after the appearance of ChatGPT; its research focus has gradually deepened from pre-training toward integration with LLM fine-tuning techniques.
    • The paper's contributions include a systematic review of RAG methods, an analysis of their core components, a summary of evaluation methods, and an outlook on future directions.
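
To make the retrieve-then-generate idea in the first bullet concrete, here is a minimal sketch of a naïve RAG loop. This is an illustration, not the survey's own code: the word-overlap scorer is a toy stand-in for BM25 or dense retrieval, and `llm_generate` is a hypothetical hook for any LLM completion function.

```python
# Minimal sketch of a naive RAG loop: retrieve top-k chunks for a query,
# then condition generation on them. Scorer and llm_generate are placeholders.
def score(query, chunk):
    # Toy relevance score via word overlap; real systems use BM25
    # or dense embeddings instead.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def retrieve(query, corpus, k=2):
    return sorted(corpus, key=lambda ch: score(query, ch), reverse=True)[:k]

def naive_rag(query, corpus, llm_generate):
    context = "\n".join(retrieve(query, corpus))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)  # any LLM completion function
```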
### Retrieval-Augmented Generation in Knowledge-Intensive NLP Tasks: Implementation and Best Practices

Retrieval-augmented generation (RAG) for knowledge-intensive natural language processing tasks combines a parametric sequence-to-sequence generator with a non-parametric retrieval memory, improving performance on tasks that require access to external information not present during training. This approach lets models retrieve relevant documents or passages from a large corpus at inference time and generate responses conditioned on the retrieved context.

#### Key Components of the RAG Framework

A typical implementation involves two main components:

1. **Retriever**: fetches potentially useful pieces of text based on the input query.
2. **Generator**: an encoder-decoder architecture such as BART or T5 that generates output given both the query and the retrieved contexts as input.

This two-stage process allows systems to leverage vast amounts of unstructured data without explicit retraining when new facts become available.

#### Practical Steps for Implementing RAG Models

To implement such an architecture effectively, several factors should be considered, including choosing appropriate pre-trained retrievers and generators fine-tuned for question answering or similar objectives where factual accuracy is paramount. Integrating these modules into existing pipelines also requires weighing latency constraints against quality, especially in real-time application scenarios.

For instance, here is how you might set up a simple pipeline using the Hugging Face Transformers library. Note that `RagTokenForGeneration` needs a `RagRetriever` to fetch passages at generation time; the snippet below adds the one the original omitted, using the library's dummy index for demonstration:

```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
# The retriever supplies passages at generation time; use_dummy_dataset=True
# loads a small demo index instead of the full Wikipedia index.
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained(
    "facebook/rag-token-nq", retriever=retriever
)

def rag_pipeline(question):
    inputs = tokenizer(question, return_tensors="pt", truncation=True)
    generated_ids = model.generate(input_ids=inputs["input_ids"])
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(rag_pipeline("who holds the record in 100m freestyle"))
```

In practice, tuning the hyperparameters of each stage separately (for example, the number of retrieved passages versus beam size) can yield better overall results than treating the system monolithically, since the two stages play distinct roles in the system design.

#### Best Practices When Working with RAG Systems

When deploying RAG-based solutions, the following guidelines help maximize effectiveness while minimizing potential pitfalls:

- Ensure high-quality indexing over the document collections used by the retriever, since poor recall directly hurts downstream generation.
- Regularly update the underlying corpora so they remain current; stale resources propagate outdated information into the texts produced from them.
- Closely monitor changes both upstream (e.g., modifications affecting source-material accessibility) and inside your own infrastructure, because alterations elsewhere often necessitate corresponding local adjustments.

By following these recommendations alongside frameworks like the one shown above, developers are well positioned to build robust conversational agents that deliver accurate answers across diverse domains requiring expertise beyond what general-purpose pretrained models alone offer.
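
The first best practice above (high-quality indexing) is worth making concrete. Below is a hedged sketch of building a dense index with sentence-transformers and FAISS; the encoder choice (`all-MiniLM-L6-v2`) and the sample documents are illustrative assumptions, not something the survey prescribes.

```python
# Illustrative sketch (not from the survey): dense indexing with
# sentence-transformers + FAISS, the kind of indexing step the best
# practices above refer to. Encoder choice and documents are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

documents = [
    "RAG retrieves document chunks from an external knowledge base.",
    "Stale corpora can propagate outdated information into generations.",
]

# Encode and L2-normalize so inner product equals cosine similarity.
vecs = encoder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

def search(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)
    scores, ids = index.search(q, k)
    return [(documents[i], float(s)) for i, s in zip(ids[0], scores[0])]

print(search("How does RAG use external knowledge?"))
```

Refreshing the index on a schedule (the second best practice) then amounts to re-encoding new or changed chunks and rebuilding or incrementally updating this structure.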