
McKinsey Explainers

What is retrieval-augmented generation (RAG)?

Retrieval-augmented generation, or RAG, is a process applied to large language models to make their outputs more relevant for the end user.

October 2024
In recent years, large language models (LLMs) have made tremendous progress in their ability to generate content. But some leaders who hoped these models would increase business efficiency and productivity have been disappointed. Off-the-shelf generative AI (gen AI) tools have yet to live up to the considerable hype surrounding them. Why is that? For one thing, LLMs are trained on only the information that's available to the providers that build them. This can limit their utility in environments where a wider range of more nuanced, enterprise-specific knowledge is needed.

Retrieval-augmented generation, or RAG, is a process applied to LLMs to make their outputs more relevant in specific contexts. RAG allows LLMs to access and reference information outside the LLM's own training data, such as an organization's specific knowledge base, before generating a response—and, crucially, with citations included. This capability enables LLMs to produce highly specific outputs without extensive fine-tuning or training, delivering some of the benefits of a custom LLM at considerably less expense.

Consider a typical gen AI chatbot that's deployed in a customer service context. While it may offer some general guidance, the chatbot is working from an LLM that was trained on only a specific amount of information, so it isn't accessing the enterprise's unique policies, procedures, data, or knowledge base. As a result, its answers will lack specificity and relevance to a user's inquiry. For example, when a customer asks about the status of their account or payment options, the chatbot might respond with only generic information; because the chatbot isn't accessing the company's specific data, the response it gives doesn't consider that customer's specific situation.

Because RAG deployments have access to vast amounts of information that is more up to date and enterprise-specific, they can provide much more accurate, relevant, and coherent outputs. This is particularly helpful in applications and use cases that require highly accurate outputs, such as enterprise knowledge management and copilots that are specific to a given domain (for example, a workflow or process, journey, or function within the company).

Learn more about QuantumBlack, AI by McKinsey.

How does RAG work?

RAG involves two phases: ingestion and retrieval. To understand these concepts, it helps to imagine a large library with millions of books.

The initial "ingestion" phase is akin to stocking the shelves and creating an index of their contents, which allows a librarian to quickly locate any book in the library's collection. As part of this process, a set of dense vector representations—numerical representations of data, also known as "embeddings" (for more, see sidebar, "What are embeddings?")—is generated for each book, chapter, or even selected paragraphs.

Once the library is stocked and indexed, the "retrieval" phase begins. Whenever a user asks a question on a specific topic, the librarian uses the index to locate the most relevant books. The selected books are then scanned for relevant content, which is carefully extracted and synthesized into a concise output. The original question informs the initial research and selection process, guiding the librarian to present only the most pertinent and accurate information in response. This process might involve summarizing key points from multiple sources, quoting



What are embeddings?

Embeddings are numerical representations of words or phrases that are unique points in a multidimensional digital space, where similar ideas and concepts are clustered together. Each embedding is defined by a vector—that is, a set of numbers that describes a particular characteristic or trait of the word or phrase, such as color, shape, or meaning. A vector is a coordinate on a map: it pinpoints the exact location of something in relation to its other features. Embeddings allow LLMs to retrieve only the most relevant data. Just as a catalog system in a library allows a librarian to quickly locate related text, embeddings help users organize and retrieve relevant information. Here's an example of how they work:

— The word "king" is represented as a vector in the multidimensional space.

— The word "man" is also represented as a vector in that space. Because "king" and "man" share a semantic meaning, their vectors are similar as well.

— The word "woman," on the other hand, is represented as a vector that is different from both "king" and "man."

— When we subtract "man" from "king" and add "woman" to the space, the vectors are manipulated accordingly. This results in a new vector that represents the concept of "queen."
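The vector arithmetic in the sidebar can be sketched in a few lines of Python. The three-dimensional vectors below are invented for illustration only (real embedding models produce vectors with hundreds or thousands of dimensions), and cosine similarity is one common way to measure how close two vectors are:

```python
import math

# Toy 3-dimensional "embeddings", hand-picked so the arithmetic works.
# Dimensions loosely encode: royalty, maleness, femaleness.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# king - man + woman, computed component by component
result = [k - m + w for k, m, w in zip(embeddings["king"],
                                       embeddings["man"],
                                       embeddings["woman"])]

# Find the vocabulary word whose vector is closest to the result.
closest = max(embeddings, key=lambda word: cosine_similarity(result, embeddings[word]))
print(closest)  # prints: queen
```

In a real system the vectors come from a trained embedding model, but the retrieval logic is the same: nearest vector wins.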

authoritative texts, or even generating new content based on the insights that can be gleaned from the library's resources.

Through these ingestion and retrieval phases, RAG can generate highly specific outputs that would be impossible for traditional LLMs to produce on their own. The stocked library and index provide a foundation for the librarian to select and synthesize information in response to a query, leading to a more relevant and thus more helpful answer.

In addition to accessing a company's internal "library," many RAG implementations can query external systems and sources in real time. Examples of such searches include the following:

— Database queries. RAG can retrieve relevant data that are stored in structured formats, such as databases or tables, making it easy to search and analyze this information.

— Application programming interface (API) calls. RAG can use APIs to access specific information from other services or platforms.

— Web search/scraping. In some cases, RAG implementations can scrape web pages for relevant information, although this method is more prone to errors than others, due to the underlying data quality.
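The two phases can be sketched end to end in a short program. Everything here is a simplified stand-in: `embed` is a toy bag-of-words embedder rather than a real embedding model, the "library" is an in-memory list, and in a real system the final prompt would be sent to an LLM:

```python
import math

def embed(text, vocab):
    """Toy embedding: a word-count vector over a fixed vocabulary.
    A production system would call a trained embedding model instead."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return [float(words.count(term)) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# --- Ingestion phase: stock the "library" and index every chunk. ---
documents = [
    "Refunds are processed within 5 business days of approval.",
    "Premium accounts include unlimited phone support.",
    "Passwords must be reset every 90 days.",
]
vocab = sorted({w for doc in documents for w in doc.lower().replace(".", "").split()})
index = [(doc, embed(doc, vocab)) for doc in documents]

# --- Retrieval phase: embed the question, rank chunks by similarity. ---
question = "How many days do refunds take?"
q_vec = embed(question, vocab)
top_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# --- Generation: ground the LLM's prompt in the retrieved chunk. ---
prompt = (
    "Answer using only the context below, and cite it.\n"
    f"Context: {top_chunk}\n"
    f"Question: {question}"
)
```

Here the refund-policy chunk ranks highest for the refund question, so the model answers from the company's own policy rather than from its general training data.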



Which areas of the business stand to benefit from RAG systems?

RAG has far-reaching applications in various domains, including customer service, marketing, finance, and knowledge management. By integrating RAG into existing systems, businesses can generate outputs that are more accurate than they would be using an off-the-shelf LLM, which can improve customer satisfaction, reduce costs, and enhance overall performance. Here are some specific examples of where and how RAG can be applied:

— Enterprise-knowledge-management chatbots. When an employee searches for information within their organization's intranet or other internal knowledge sources, the RAG system can retrieve relevant information from across the organization, synthesize it, and provide the employee with actionable insights.

— Customer service chatbots. When a customer interacts with a company's website or mobile app to inquire about a product or service, the RAG system can retrieve relevant information based on corporate policies, customer account data, and other sources, then provide the customer with more accurate and helpful responses.

— Drafting assistants. When an employee starts drafting a report or document that requires company-specific data or information, the RAG system retrieves the relevant information from enterprise data sources, such as databases, spreadsheets, and other systems, then provides the employee with prepopulated sections of the document. This output can help the employee develop the document more efficiently and more accurately.

Learn more about QuantumBlack, AI by McKinsey.

What are some challenges associated with RAG?

While RAG is a powerful tool for enhancing an LLM's capabilities, it is not without its limitations. Like LLMs, RAG is only as good as the data it can access. Here are some of its specific challenges:

— Data quality issues. If the knowledge that RAG is sourcing is not accurate or up to date, the resulting output may be incorrect.

— Multimodal data. RAG may not be able to read certain graphs, images, or complex slides, which can lead to issues in the generated output. New multimodal LLMs, which can parse complex data formats, can help mitigate this.

— Bias. If the underlying data contains biases, the generated output is likely to be biased as well.

— Data access and licensing concerns. Intellectual property, licensing, and privacy and security issues related to data access need to be considered throughout the design of a RAG system.

To help address these challenges, enterprises can establish data governance frameworks—or, if they already have them, ramp up those frameworks to help ensure the quality, accessibility, and timeliness of the underlying data used in RAG. Organizations that are implementing RAG systems should also carefully consider any copyright issues with respect to RAG-derived content, biases in the overall data set, and the level of interoperability between data sets that were not previously centrally accessible.
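In practice, governance rules like these often translate into simple guards in the retrieval pipeline. The sketch below is a hypothetical example: it drops retrieved chunks that are stale or that the requesting user is not allowed to see, before any chunk reaches the LLM. The metadata fields (`updated`, `acl`), the 365-day cutoff, and the fixed reference date are all invented for illustration:

```python
from datetime import date, timedelta

# Hypothetical retrieved chunks, with governance metadata attached at ingestion.
chunks = [
    {"text": "2024 refund policy: 5 business days.", "updated": date(2024, 6, 1),  "acl": "public"},
    {"text": "2019 refund policy: 30 days.",         "updated": date(2019, 1, 15), "acl": "public"},
    {"text": "Internal escalation playbook.",        "updated": date(2024, 5, 20), "acl": "internal"},
]

def governed(chunks, user_groups, max_age_days=365, today=date(2024, 7, 1)):
    """Filter chunks before they reach the LLM prompt."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        c for c in chunks
        if c["updated"] >= cutoff     # freshness: mitigates out-of-date answers
        and c["acl"] in user_groups   # access control: mitigates data-leak risk
    ]

# A public-facing chatbot only sees fresh, public chunks.
allowed = governed(chunks, user_groups={"public"})
```

Here only the current public refund policy survives: the 2019 policy is filtered as stale, and the internal playbook is filtered by access level.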



How is RAG evolving?

As RAG's capabilities and potential applications continue to evolve, we expect several emerging trends to shape its future:

— Standardization. The increasing standardization of underlying software patterns means that there will be more off-the-shelf solutions and libraries available for RAG implementations, making them progressively easier to build and deploy.

— Agent-based RAG. Agents are systems that can reason and interact with each other and require less human intervention than earlier AI systems. These tools can enable RAG systems to flexibly and efficiently adapt to changing contexts and user needs so they can better respond to more complex and more nuanced prompts.

— LLMs that are optimized for RAG. Some LLMs are now being trained specifically for use with RAG. These models are tailored to meet the unique needs of RAG tasks, such as quickly retrieving data from a vast corpus of information, rather than relying solely on the LLM's own parametric knowledge. One example of these optimized LLMs is the AI-powered answer engine Perplexity AI, which has been fine-tuned to perform in various RAG applications (for example, answering complex questions and summarizing text).

LLMs enhanced with retrieval-augmented generation can harness the strengths of both humans and machines, enabling users to tap into vast knowledge sources and generate more accurate and relevant responses. As this technology continues to evolve, we expect significant improvements in its scalability, adaptability, and impact on enterprise applications, with the potential to spur innovation and create value.

Learn more about QuantumBlack, AI by McKinsey. And check out AI-related job opportunities if you're interested in working with McKinsey.

Articles referenced:

— "Why agents are the next frontier of generative AI," McKinsey Quarterly, July 24, 2024, Lareina Yee, Michael Chui, and Roger Roberts, with Stephen Xu

— "A data leader's technical guide to scaling gen AI," July 8, 2024, Asin Tavakoli, Carlo Giovine, Joe Caserta, Jorge Machado, and Kayvaun Rowshankish, with Jon Boorstein and Nathan Westby

— "Choose the right transformation 'bite size'," March 27, 2024, Eric Lamarre, Kate Smaje, and Rodney Zemmel

Get to know and directly engage with senior McKinsey experts on RAG.
Lareina Yee is a senior partner in McKinsey’s Bay Area office, where Michael Chui is a senior fellow
and Roger Roberts is a partner; Mara Pometti is a consultant in the London office; Patrick Wollner is
a consultant in the Vienna office; and Stephen Xu is a senior director of product management in
the Toronto office.

Copyright © 2024 McKinsey & Company. All rights reserved.

