Retrieval Augmented Generation



What Is Retrieval Augmented Generation, or RAG?


Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data. It works by retrieving data or documents relevant to a question or task and providing them to the LLM as context. RAG has shown success in support chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.



What challenges does the retrieval augmented generation approach solve?


Problem 1: LLMs do not know your data

LLMs are deep learning models trained on massive datasets to understand, summarize and generate novel content. Most LLMs are trained on a wide range of public data, so a single model can respond to many types of tasks or questions. Once trained, however, most LLMs have no ability to access data beyond their training data cutoff point. This makes LLMs static and may cause them to respond incorrectly, give out-of-date answers or hallucinate when asked questions about data they have not been trained on.

Problem 2: AI applications must leverage custom data to be effective


For LLMs to give relevant and specific responses, organizations need the model to understand their domain and answer from their data, rather than giving broad, generalized responses. For example, organizations build customer support bots with LLMs, and those solutions must give company-specific answers to customer questions. Others are building internal Q&A bots that should answer employees' questions about internal HR data. How do companies build such solutions without retraining those models?

Solution: Retrieval augmentation is now an industry standard


An easy and popular way to use your own data is to provide it as part of the prompt with which you query the LLM. This is called retrieval augmented generation (RAG): you retrieve the relevant data and use it as augmented context for the LLM. Instead of relying solely on knowledge derived from its training data, a RAG workflow pulls in relevant information, connecting static LLMs with real-time data retrieval.

With a RAG architecture, organizations can deploy any LLM and augment it to return relevant results for their organization by giving it a small amount of their data, without the cost and time of fine-tuning or pretraining the model.
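
To make the pattern concrete, here is a minimal sketch of retrieve-then-augment in Python. The toy keyword retriever, the sample documents and the OpenAI model name are illustrative assumptions, not part of any particular product; a production system would retrieve from a vector index instead.

```python
# Minimal sketch of the retrieve-then-augment pattern described above.
# The toy retriever and the model name are illustrative assumptions; any
# document store and any chat-completion LLM fit the same shape.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm ET, Monday through Friday.",
    "Premium plans include priority phone support.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    # A real system would use embeddings and a vector index instead.
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

def answer_with_rag(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer_with_rag("Can I return a product after two weeks?"))
```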

What are the use cases for RAG?


There are many different use cases for RAG. The most common ones are:

1. Question and answer chatbots: Incorporating LLMs into chatbots allows them to automatically derive more accurate answers from company documents and knowledge bases. Chatbots are used to automate customer support and website lead follow-up, answering questions and resolving issues quickly.

2. Search augmentation: Augmenting search engine results with LLM-generated answers can better address informational queries and make it easier for users to find the information they need to do their jobs.

3. Knowledge engine — ask questions of your data (e.g., HR, compliance documents): Company data can be used as context for LLMs, allowing employees to get easy answers to their questions, including HR questions about benefits and policies and security and compliance questions.

What are the benefits of RAG?
The RAG approach has a number of key benefits, including:

1. Providing up-to-date and accurate responses: RAG ensures that the response of an LLM is not based solely on static, stale
training data. Rather, the model uses up-to-date external data sources to provide responses.

2. Reducing inaccurate responses, or hallucinations: By grounding the LLM's output in relevant, external knowledge, RAG attempts to mitigate the risk of responding with incorrect or fabricated information (also known as hallucinations). Outputs can include citations of original sources, allowing human verification.

3. Providing domain-specific, relevant responses: Using RAG, the LLM will be able to provide contextually relevant responses
tailored to an organization's proprietary or domain-specific data.

4. Being efficient and cost-effective: Compared to other approaches to customizing LLMs with domain-specific data, RAG is simple
and cost-effective. Organizations can deploy RAG without needing to customize the model. This is especially beneficial when
models need to be updated frequently with new data.

When should I use RAG and when should I fine-tune the model?
RAG is the right place to start: it is easy and may be entirely sufficient for some use cases. Fine-tuning is most appropriate when you want the LLM's behavior to change or the model to learn a different "language." The two are not mutually exclusive. As a future step, you can fine-tune a model to better understand domain language and the desired output form, and also use RAG to improve the quality and relevance of the response.

When I want to customize my LLM with data, what are all the options and which method is the best
(prompt engineering vs. RAG vs. fine-tune vs. pretrain)?
There are four architectural patterns to consider when customizing an LLM application with your organization's data. These techniques
are outlined below and are not mutually exclusive. Rather, they can (and should) be combined to take advantage of the strengths of
each.

Prompt engineering
Definition: Crafting specialized prompts to guide LLM behavior
Primary use case: Quick, on-the-fly model guidance
Data requirements: None
Advantages: Fast, cost-effective, no training required
Considerations: Less control than fine-tuning

Retrieval augmented generation (RAG)
Definition: Combining an LLM with external knowledge retrieval
Primary use case: Dynamic datasets and external knowledge
Data requirements: External knowledge base or database (e.g., vector database)
Advantages: Dynamically updated context, enhanced accuracy
Considerations: Increases prompt length and inference computation

Fine-tuning
Definition: Adapting a pretrained LLM to specific datasets or domains
Primary use case: Domain or task specialization
Data requirements: Thousands of domain-specific or instruction examples
Advantages: Granular control, high specialization
Considerations: Requires labeled data, computational cost

Pretraining
Definition: Training an LLM from scratch
Primary use case: Unique tasks or domain-specific corpora
Data requirements: Large datasets (billions to trillions of tokens)
Advantages: Maximum control, tailored for specific needs
Considerations: Extremely resource-intensive
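
For contrast with the RAG examples elsewhere on this page, here is what the lightest-weight pattern, prompt engineering, looks like in practice: behavior is shaped entirely by instructions, with no external data pipeline. The client, model name and prompt wording are illustrative assumptions.

```python
# Prompt engineering alone: steer the model with instructions, no external data.
# The client and model name are illustrative; any chat-completion API works the same way.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    "You are a support assistant for an airline. Answer in two sentences or "
    "fewer, and if you are not sure of an answer, say so rather than guessing."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Can I change my flight online?"},
    ],
)
print(response.choices[0].message.content)
```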

Regardless of the technique selected, building a solution in a well-structured, modularized manner ensures organizations will be
prepared to iterate and adapt. Learn more about this approach and more in The Big Book of MLOps.

What is a reference architecture for RAG applications?


There are many ways to implement a retrieval augmented generation system, depending on specific needs and data nuances. Below is
one commonly adopted workflow to provide a foundational understanding of the process.

1. Prepare data: Document data is gathered alongside metadata and subjected to initial preprocessing — for example, PII handling
(detection, filtering, redaction, substitution). To be used in RAG applications, documents need to be chunked into appropriate
lengths based on the choice of embedding model and the downstream LLM application that uses these documents as context.

2. Index relevant data: Produce document embeddings and hydrate a Vector Search index with this data.

3. Retrieve relevant data: Retrieve the parts of your data that are relevant to a user's query. That text is then provided as part of the prompt sent to the LLM.

4. Build LLM applications: Wrap the prompt-augmentation and LLM-query components into an endpoint, which can then be exposed to applications such as Q&A chatbots via a simple REST API. A minimal sketch of steps 1-3 is shown below.
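
The sketch referenced above implements steps 1 through 3 against an in-memory index. The sentence-transformers embedding model and fixed-size character chunking are illustrative assumptions; a production deployment would use token-aware chunking and a managed index such as Databricks Vector Search.

```python
# Minimal sketch of the prepare / index / retrieve steps with an in-memory index.
# The embedding model and naive chunking are illustrative assumptions; production
# systems would use token-aware chunking and a managed vector index.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

# 1. Prepare data: naive fixed-size chunking (real systems chunk by tokens/structure).
def chunk(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

documents = ["...full text of a policy document...", "...full text of an HR handbook..."]
chunks = [c for doc in documents for c in chunk(doc)]

# 2. Index relevant data: embed every chunk once, up front.
index = model.encode(chunks, normalize_embeddings=True)

# 3. Retrieve relevant data: embed the query, take the top-k by cosine similarity.
def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]

# Step 4 would wrap retrieve() plus an LLM call behind a REST endpoint.
print(retrieve("What is the refund policy?"))
```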

Databricks also recommends some key architectural elements of a RAG architecture:

Vector database: Some (but not all) LLM applications use vector databases for fast similarity searches, most often to provide
context or domain knowledge in LLM queries. To ensure that the deployed language model has access to up-to-date information,
regular vector database updates can be scheduled as a job. Note that the logic to retrieve from the vector database and inject
information into the LLM context can be packaged in the model artifact logged to MLflow using MLflow LangChain or PyFunc
model flavors.
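
As a sketch of that packaging idea, the snippet below wraps retrieval-plus-injection logic in an MLflow PyFunc model so the whole chain is logged as one artifact. The `retrieve` and `generate` helpers are hypothetical placeholders for a vector database lookup and an LLM call; only the PyFunc wrapper reflects standard MLflow usage.

```python
# Sketch: packaging vector-retrieval + prompt-injection logic as an MLflow
# PyFunc model, so the whole chain ships as one logged artifact.
import mlflow
import mlflow.pyfunc

def retrieve(question: str) -> list[str]:
    return ["(chunks returned by a vector database lookup)"]  # hypothetical placeholder

def generate(prompt: str) -> str:
    return "(answer returned by an LLM)"  # hypothetical placeholder

class RAGModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        answers = []
        for question in model_input["question"]:
            docs = "\n\n".join(retrieve(question))
            prompt = f"Context:\n{docs}\n\nQuestion: {question}"
            answers.append(generate(prompt))
        return answers

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="rag_chain", python_model=RAGModel())
```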

MLflow LLM Deployments or Model Serving: In LLM-based applications where a third-party LLM API is used, MLflow LLM Deployments or Model Serving support for external models can be used as a standardized interface to route requests to vendors such as OpenAI and Anthropic. In addition to providing an enterprise-grade API gateway, MLflow LLM Deployments or Model Serving centralizes API key management and provides the ability to enforce cost controls.
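
As a sketch of that routing pattern, the snippet below queries an external model through the MLflow Deployments client, so API keys and cost controls live in one governed gateway. The endpoint name is a hypothetical example of an external-model endpoint configured by an administrator.

```python
# Sketch: routing a request to a vendor LLM through the MLflow Deployments client.
# "chat-gpt-4o" is a hypothetical endpoint name configured to proxy a vendor model.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="chat-gpt-4o",  # hypothetical external-model endpoint
    inputs={"messages": [{"role": "user", "content": "What is RAG?"}]},
)
print(response)
```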

Model Serving: In the case of RAG using a third-party API, one key architectural change is that the LLM pipeline makes external API calls from the Model Serving endpoint to internal or third-party LLM APIs. This adds complexity, potential latency and another layer of credential management. By contrast, in the fine-tuned model example, the model and its environment are deployed together.

Resources
Databricks blog posts
Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps

Best Practices for LLM Evaluation of RAG Applications

Databricks Demo

Databricks eBook — The Big Book of MLOps

Databricks customers using RAG


JetBlue

JetBlue has deployed "BlueBot," a chatbot that uses open source generative AI models complemented by corporate data, powered by
Databricks. This chatbot can be used by all teams at JetBlue to get access to data that is governed by role. For example, the finance
team can see data from SAP and regulatory filings, but the operations team will only see maintenance information.

Also read this article.

Chevron Phillips

Chevron Phillips Chemical uses Databricks to support their generative AI initiatives, including document process automation.

Thrivent Financial
Thrivent Financial is looking at generative AI to improve search, produce better-summarized and more accessible insights, and boost engineering productivity.

Where can I find more information about retrieval augmented generation?
There are many resources available to find more information on RAG, including:

Blogs
Creating High-Quality RAG Applications With Databricks

Databricks Vector Search Public Preview

Improve RAG Application Response Quality With Real-Time Structured Data


Build Gen AI Apps Faster With New Foundation Model Capabilities

Best Practices for LLM Evaluation of RAG Applications

Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps (Achieve greater accuracy using retrieval augmented generation
(RAG) with your own data)

E-books
The Big Book of GenAI

The Compact Guide to RAG

The Big Book of MLOps

Demos
Deploy Your LLM Chatbot With Retrieval Augmented Generation (RAG), llama2-70B (MosaicML Inferences) and Vector Search

Contact Databricks to schedule a demo and talk to someone about your LLM and retrieval augmented generation (RAG) projects
