Retrieval Augmented Generation (RAG)
With a RAG architecture, organizations can deploy any LLM and augment it to return relevant, organization-specific results by giving the model a small amount of their own data, without the cost and time of fine-tuning or pretraining.

What are some common use cases for RAG?
1. Question and answer chatbots: Incorporating LLMs with chatbots allows them to automatically derive more accurate answers
from company documents and knowledge bases. Chatbots are used to automate customer support and website lead follow-up to
answer questions and resolve issues quickly.
2. Search augmentation: Incorporating LLMs with search engines that augment search results with LLM-generated answers can
better answer informational queries and make it easier for users to find the information they need to do their jobs.
3. Knowledge engine — ask questions on your data (e.g., HR, compliance documents): Company data can be used as context for
LLMs and allow employees to get answers to their questions easily, including HR questions related to benefits and policies and
security and compliance questions.
What are the benefits of RAG?
The RAG approach has a number of key benefits, including:
1. Providing up-to-date and accurate responses: RAG ensures that the response of an LLM is not based solely on static, stale
training data. Rather, the model uses up-to-date external data sources to provide responses.
2. Reducing inaccurate responses, or hallucinations: By grounding the LLM model's output on relevant, external knowledge, RAG
attempts to mitigate the risk of responding with incorrect or fabricated information (also known as hallucinations). Outputs can
include citations of original sources, allowing human verification.
3. Providing domain-specific, relevant responses: Using RAG, the LLM will be able to provide contextually relevant responses
tailored to an organization's proprietary or domain-specific data.
4. Being efficient and cost-effective: Compared to other approaches to customizing LLMs with domain-specific data, RAG is simple
and cost-effective. Organizations can deploy RAG without needing to customize the model. This is especially beneficial when
models need to be updated frequently with new data.
When should I use RAG and when should I fine-tune the model?
RAG is the right place to start, being easy and possibly entirely sufficient for some use cases. Fine-tuning is most appropriate in a
different situation, when one wants the LLM's behavior to change, or to learn a different "language." These are not mutually exclusive. As
a future step, it's possible to consider fine-tuning a model to better understand domain language and the desired output form — and
also use RAG to improve the quality and relevance of the response.
When I want to customize my LLM with data, what are my options, and which method is best (prompt engineering vs. RAG vs. fine-tuning vs. pretraining)?
There are four architectural patterns to consider when customizing an LLM application with your organization's data. These techniques
are outlined below and are not mutually exclusive. Rather, they can (and should) be combined to take advantage of the strengths of
each.
| Technique | Definition | Primary use case | Data requirements | Advantages | Considerations |
|---|---|---|---|---|---|
| Prompt engineering | Crafting specialized prompts to guide LLM behavior | Quick, on-the-fly model guidance | None | Fast, cost-effective, no training required | Less control than fine-tuning |
| Retrieval augmented generation (RAG) | Combining an LLM with external knowledge retrieval | Dynamic datasets and external knowledge | External knowledge base or database (e.g., vector database) | Dynamically updated context, enhanced accuracy | Increases prompt length and inference computation |
| Fine-tuning | Adapting a pretrained LLM to specific datasets or domains | Domain or task specialization | Thousands of domain-specific or instruction examples | Granular control, high specialization | Requires labeled data, computational cost |
| Pretraining | Training an LLM from scratch | Unique tasks or domain-specific corpora | Large datasets (billions to trillions of tokens) | Maximum control, tailored for specific needs | Extremely resource-intensive |
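To make the lightest-weight pattern concrete, here is a minimal sketch of prompt engineering, assuming the openai Python package (version 1.0 or later); the model name, the company in the system prompt and the question are illustrative assumptions, and any chat-capable LLM could be substituted.

```python
# A minimal sketch of the prompt engineering pattern: all "customization"
# lives in the prompt text, with no training and no retrieval.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model works here
    messages=[
        # The system prompt steers tone, scope and output format.
        {"role": "system", "content": "You are a support assistant for a "
         "chemical manufacturer. Answer in at most three sentences."},
        {"role": "user", "content": "What PPE is required for tank cleaning?"},
    ],
)
print(response.choices[0].message.content)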
Regardless of the technique selected, building a solution in a well-structured, modularized manner ensures organizations will be
prepared to iterate and adapt. Learn more about this approach and more in The Big Book of MLOps.
What does a typical RAG application workflow look like?
1. Prepare data: Document data and its metadata are gathered and subjected to initial preprocessing, such as PII handling (detection, filtering, redaction, substitution). To be used in RAG applications, documents must be chunked into appropriate lengths based on the choice of embedding model and on the downstream LLM application that uses these documents as context.
2. Index relevant data: Produce document embeddings and hydrate a Vector Search index with this data.
3. Retrieve relevant data: Retrieve the parts of your data that are relevant to a user's query. That text data is then provided as part of the prompt that is sent to the LLM.
4. Build LLM applications: Wrap the prompt augmentation and LLM query logic into a single endpoint. This endpoint can then be exposed to applications such as Q&A chatbots via a simple REST API, as in the sketch below.
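Putting the four steps together, the following is a minimal, self-contained sketch in Python. It assumes the sentence-transformers and numpy packages; the embedding model name, the naive fixed-size chunker and the in-memory index are illustrative stand-ins for a production setup (such as a managed Vector Search index), not a prescribed implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Prepare data: naive fixed-size chunking (a real pipeline would also
#    handle metadata and PII before this point).
def chunk(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

documents = [
    "...full text of an HR benefits policy...",          # placeholder content
    "...full text of a security compliance handbook...",
]
chunks = [c for doc in documents for c in chunk(doc)]

# 2. Index relevant data: embed every chunk once, up front.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
index = embedder.encode(chunks, normalize_embeddings=True)

# 3. Retrieve relevant data: cosine similarity reduces to a dot product
#    here because the embeddings are normalized.
def retrieve(query: str, k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]

# 4. Build the LLM application: inject the retrieved chunks into the prompt
#    that is sent to whichever LLM the serving endpoint wraps.
def build_prompt(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("How much parental leave do employees get?"))
```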
Several components of this architecture are worth highlighting:
Vector database: Some (but not all) LLM applications use vector databases for fast similarity searches, most often to provide context or domain knowledge in LLM queries. To ensure that the deployed language model has access to up-to-date information, regular vector database updates can be scheduled as a job. Note that the logic to retrieve from the vector database and inject information into the LLM context can be packaged in the model artifact logged to MLflow, using the MLflow LangChain or PyFunc model flavors.
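As a hedged illustration of that packaging, here is a minimal PyFunc sketch. The `retrieve` and `call_llm` helpers are hypothetical stand-ins for the vector database lookup and the LLM client, and the artifact path name is illustrative.

```python
import mlflow
import pandas as pd

def retrieve(query: str) -> str:
    """Hypothetical stand-in for a vector database similarity search."""
    return "...top-k document chunks..."

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted or third-party LLM."""
    return "...model response..."

class RagModel(mlflow.pyfunc.PythonModel):
    # Retrieval and prompt injection live inside the model artifact, so the
    # serving endpoint only ever sees a single predict() call.
    def predict(self, context, model_input: pd.DataFrame) -> list[str]:
        answers = []
        for query in model_input["query"]:
            prompt = f"Context:\n{retrieve(query)}\n\nQuestion: {query}"
            answers.append(call_llm(prompt))
        return answers

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="rag_model", python_model=RagModel())
```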
MLflow LLM Deployments or Model Serving: In LLM-based applications where a third-party LLM API is used, the MLflow LLM Deployments or Model Serving support for external models can be used as a standardized interface to route requests to vendors such as OpenAI and Anthropic. In addition to providing an enterprise-grade API gateway, MLflow LLM Deployments or Model Serving centralizes API key management and provides the ability to enforce cost controls.
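As a brief sketch of what that routing looks like from client code, assuming an external-model endpoint (here hypothetically named "my-openai-chat") has already been configured in the workspace:

```python
# Requires the mlflow package with the Databricks deployments plugin and a
# configured Databricks workspace; endpoint name and prompt are illustrative.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="my-openai-chat",  # hypothetical endpoint routing to OpenAI
    inputs={"messages": [{"role": "user", "content": "What is RAG?"}]},
)
print(response)
```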
Model Serving: In the case of RAG using a third-party API, one key architectural change is that the LLM pipeline makes external API calls from the Model Serving endpoint to internal or third-party LLM APIs. This adds complexity, potential latency and another layer of credential management. By contrast, in the fine-tuned model example, the model and its serving environment are deployed together.
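For completeness, here is a hedged sketch of calling such a Model Serving endpoint over REST; the endpoint name, environment variables and payload shape are illustrative assumptions.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g., https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

resp = requests.post(
    f"{host}/serving-endpoints/rag-chatbot/invocations",  # hypothetical endpoint name
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"query": "What is our travel policy?"}]},
    timeout=60,
)
print(resp.json())
```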
Resources
Databricks blog posts
Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps
Databricks Demo
JetBlue
JetBlue has deployed "BlueBot," a chatbot that uses open source generative AI models complemented by corporate data, powered by Databricks. This chatbot can be used by all teams at JetBlue to access data that is governed by role. For example, the finance team can see data from SAP and regulatory filings, but the operations team will only see maintenance information.
Chevron Phillips
Chevron Phillips Chemical uses Databricks to support their generative AI initiatives, including document process automation.
Thrivent Financial
Thrivent Financial is looking at generative AI to improve search, produce better-summarized and more accessible insights, and increase engineering productivity.
Where can I find more information about retrieval augmented generation?
There are many resources available to find more information on RAG, including:
Blogs
Creating High-Quality RAG Applications With Databricks
Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps (Achieve greater accuracy using retrieval augmented generation
(RAG) with your own data)
E-books
The Big Book of GenAI
Demos
Deploy Your LLM Chatbot With Retrieval Augmented Generation (RAG), llama2-70B (MosaicML Inferences) and Vector Search
Contact Databricks to schedule a demo and talk to someone about your LLM and retrieval augmented generation (RAG) projects