Beyond RAG: Building Advanced Context-Augmented LLM Applications
Jerry Liu, LlamaIndex co-founder/CEO
LlamaIndex: Context Augmentation for your LLM app
RAG
[Diagram: the RAG pipeline in two stages. Data Parsing & Ingestion: Data Ingestion → Data Parsing → Index. Data Querying: Retrieval → LLM + Prompts → Response.]
Naive RAG
[Diagram: a naive RAG pipeline. Data Ingestion → Data Parsing (PyPDF) → Index (sentence splitting, chunk size 256) → Dense Retrieval (top-k = 5) → LLM + Simple QA Prompt → Response.]
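For reference, the naive pipeline above maps to a few lines of LlamaIndex. A minimal sketch, assuming a recent llama-index release, an OpenAI key, and a hypothetical ./data directory of documents:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Parse + ingest: load the documents (PyPDF under the hood for .pdf files)
# and split them into sentence-aware chunks of 256 tokens.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents, transformations=[SentenceSplitter(chunk_size=256)]
)

# Query: dense retrieval with top-k = 5 and a simple QA prompt.
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What are the main risk factors for Tesla?"))
```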
Naive RAG is Limited
RAG Prototypes are Limited
Naive RAG approaches tend to work well for simple questions over a small set of documents.
● “What are the main risk factors for Tesla?” (over Tesla 2021 10K)
● “What did the author do during his time at YC?” (Paul Graham essay)
Pain Points

There are certain questions we want to ask where naive RAG will fail. Examples:

● Summarization Questions: “Give me a summary of the entire <company> 10K annual report”
● Comparison Questions: “Compare the open-source contributions of candidate A and candidate B”
● Structured Analytics + Semantic Search: “Tell me about the risk factors of the highest-performing rideshare company in the US”
● General Multi-part Questions: “Tell me about the pro-X arguments in article A, and tell me about the pro-Y arguments in article B, make a table based on our internal style guide, then generate your own conclusion based on these facts.”
Can we do more?
In the naive setting, RAG is boring.
🚫 It’s just a glorified search system
🚫 There are many questions/tasks that naive RAG can’t answer.
💡 Can we go beyond simple search/QA to building a general context-augmented research assistant?
Beyond RAG: Adding Layers of Agentic Reasoning
From RAG to Agents
[Diagram: Query → RAG → Response]
Single-shot
No query understanding/planning
No tool use
No reflection, error correction
No memory (stateless)
From RAG to Agents
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization

[Diagram: Query → Agent (calling Tools) → RAG → Response]
From Simple to Advanced Agents
[Diagram: a spectrum from simple to advanced. Agent ingredients (simple; lower cost, lower latency): Routing, One-Shot Query Planning, Conversation Memory, Tool Use. Full agents (advanced; higher cost, higher latency): ReAct, Dynamic Planning + Execution.]
Routing
Simplest form of agentic reasoning: given a user query and a set of choices, output the subset of choices to route the query to.
Routing
Use Case: Joint QA and
Summarization
Guide
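As a sketch of what joint QA and summarization routing looks like in LlamaIndex (assuming a recent llama-index release; module paths may differ across versions, and ./data is a hypothetical document folder):

```python
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# Build two indices over the same documents: one for QA, one for summaries.
documents = SimpleDirectoryReader("./data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

# Wrap each query engine as a tool; the descriptions drive the routing choice.
qa_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions over the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for full-document summarization requests.",
)

# The selector asks the LLM to pick a tool given the query + descriptions.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[qa_tool, summary_tool],
)
print(router.query("Give me a summary of the entire report"))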
Conversation Memory
In addition to the current query, take conversation history into account as input to your RAG pipeline.
Conversation Memory
How to account for conversation history in a RAG pipeline?

● Condense question
● Condense question + context
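Both strategies are available as chat modes in LlamaIndex. A minimal sketch, assuming a recent llama-index release where as_chat_engine accepts these mode strings:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)

# "condense_question": rewrite (query + chat history) into a standalone
# question, then run it through the RAG pipeline as usual.
chat_engine = index.as_chat_engine(chat_mode="condense_question")

# "condense_plus_context": additionally prepend retrieved context each turn.
# chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

print(chat_engine.chat("Compare that to the previous year."))
```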
Query Planning

Break down a query into parallelizable sub-queries. Each sub-query can be executed against any set of RAG pipelines.

[Diagram: the query “Compare revenue growth of Uber and Lyft in 2021” is split into “Describe revenue growth of Uber in 2021” and “Describe revenue growth of Lyft in 2021”; each sub-query runs top-2 retrieval against the Uber 10-K and Lyft 10-K indices respectively, e.g. retrieving chunks 4 and 8 of each filing.]
Query Planning

Example: “Compare revenue growth of Uber and Lyft in 2021”

Query Planning Guide
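A sketch of this pattern with LlamaIndex's SubQuestionQueryEngine (assuming a recent llama-index release; the ./uber_10k and ./lyft_10k data directories are hypothetical):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One index per 10-K filing, each with top-2 retrieval.
uber_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./uber_10k").load_data()
).as_query_engine(similarity_top_k=2)
lyft_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./lyft_10k").load_data()
).as_query_engine(similarity_top_k=2)

tools = [
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name="uber_10k", description="Uber 2021 10-K filing"),
    ),
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name="lyft_10k", description="Lyft 2021 10-K filing"),
    ),
]

# The engine prompts the LLM to generate sub-questions, routes each to the
# matching tool, then synthesizes a final comparison from the sub-answers.
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(engine.query("Compare revenue growth of Uber and Lyft in 2021"))
```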
Tool Use
Use an LLM to call an API, and infer the parameters of that API.
Tool Use
In normal RAG you just pass the query through. But what if you used the LLM to infer all the parameters for the API interface?

A key capability in many QA use cases (auto-retrieval, text-to-SQL, and more).
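A minimal sketch of tool use with LlamaIndex (assuming a recent llama-index release and an OpenAI key; the search_flights function is hypothetical). The LLM reads the tool's signature and docstring and fills in the arguments itself:

```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

# Hypothetical API: the LLM infers origin/destination/date from the query.
def search_flights(origin: str, destination: str, date: str) -> str:
    """Search for flights between two cities on a given date (YYYY-MM-DD)."""
    return f"Flights from {origin} to {destination} on {date}: ..."

tool = FunctionTool.from_defaults(fn=search_flights)

llm = OpenAI(model="gpt-4o")
# predict_and_call lets the LLM pick the tool and infer its parameters.
response = llm.predict_and_call(
    [tool], "Find me a flight from SFO to JFK on 2024-06-07"
)
print(response)
```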
Let’s put them together
● All of these are agent ingredients
● Let’s put them together for a full agent system
○ Query planning
○ Memory
○ Tool Use
● Let’s add additional components
○ Reflection
○ Controllability
○ Observability
Core Components of a Full Agent
Minimum necessary ingredients:
● Query planning
● Memory
● Tool Use
ReAct: Reasoning + Acting with LLMs
Source: https://react-lm.github.io/
ReAct: Reasoning + Acting with LLMs
Query Planning: Generate the next step given previous steps (chain-of-thought prompt).

Tool Use: Sequential tool calling.

Memory: Maintain a simple buffer.
ReAct: Reasoning + Acting with LLMs
ReAct + RAG Guide
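A minimal ReAct + RAG sketch with LlamaIndex (assuming a recent llama-index release; qa_tool and summary_tool are the query-engine tools from the routing example above):

```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# qa_tool / summary_tool: QueryEngineTools over your indices (see above).
agent = ReActAgent.from_tools(
    [qa_tool, summary_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,  # print the Thought / Action / Observation trace
)

# The agent interleaves reasoning steps with sequential tool calls and
# keeps a simple chat-history buffer across turns.
print(agent.chat("Summarize the report, then list its main risk factors"))
```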
Can we make this even better?
● Stop being so short-sighted: plan ahead at each step
● Parallelize execution where we can
LLMCompiler (Kim et al. 2023)

An agent compiler for parallel multi-function planning + execution.
LLMCompiler
Query Planning: Generate a DAG of steps. Replan if steps don’t reach the desired state.

Tool Use: Parallel function calling.

Memory: Maintain a simple buffer.
LLMCompiler Agent
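To make the planner's core idea concrete, here is a toy sketch in plain Python (not the LLMCompiler API; the plan and its string-returning tasks are hypothetical stand-ins for an LLM-generated DAG of tool calls). Each node runs as soon as its dependencies finish, so independent branches execute in parallel:

```python
import asyncio

# A toy plan: each node names the nodes it depends on.
plan = {
    "uber_growth": {"deps": [], "fn": lambda: "Uber revenue growth in 2021: ..."},
    "lyft_growth": {"deps": [], "fn": lambda: "Lyft revenue growth in 2021: ..."},
    "compare": {
        "deps": ["uber_growth", "lyft_growth"],
        "fn": lambda: "Comparison of Uber vs. Lyft revenue growth: ...",
    },
}

async def run_node(name: str, done: dict) -> str:
    # Wait for all dependencies to finish before executing this node.
    await asyncio.gather(*(done[d] for d in plan[name]["deps"]))
    result = plan[name]["fn"]()  # stand-in for a real tool/function call
    print(f"{name}: {result}")
    return result

async def main():
    done = {}
    # Schedule every node up front; independent branches run in parallel.
    for name in plan:
        done[name] = asyncio.ensure_future(run_node(name, done))
    await asyncio.gather(*done.values())

asyncio.run(main())
```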
Tree-based Planning
● Tree of Thoughts (Yao et al. 2023)
● Reasoning via Planning (Hao et al. 2023)
● Language Agent Tree Search (Zhou et al. 2023)
Tree-based Planning
Query planning in the face of uncertainty: instead of planning out a fixed sequence of steps, sample a few different states.

Run Monte-Carlo Tree Search (MCTS) to balance exploration vs. exploitation.
Self-Reflection
Use feedback to improve agent execution and reduce errors:

● Human feedback
● 🤖 LLM feedback

Use few-shot examples instead of retraining the model!
Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023)
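A toy sketch of the LLM-feedback variant (plain Python around a LlamaIndex LLM client; not the Reflexion implementation): generate an answer, have the LLM critique it, and retry with the accumulated critique fed back into the prompt instead of retraining the model.

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")

def reflect_and_retry(task: str, max_tries: int = 3) -> str:
    feedback = ""
    for _ in range(max_tries):
        # Generate an attempt, including any prior critique as guidance.
        answer = llm.complete(f"{task}\n\nPrior feedback:\n{feedback}").text
        # Ask the LLM to verify its own work.
        critique = llm.complete(
            f"Task: {task}\nAnswer: {answer}\n"
            "Reply DONE if the answer fully solves the task, "
            "otherwise explain what is wrong."
        ).text
        if critique.strip().startswith("DONE"):
            return answer
        feedback += critique + "\n"
    return answer

print(reflect_and_retry("List the three largest US rideshare companies."))
```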
Additional Requirements
● Observability: see the full trace of the agent
○ Observability Guide
● Control: Be able to guide the intermediate steps of an agent step-by-step
○ Lower-Level Agent API
● Customizability: Define your own agentic logic around any set of tools.
○ Custom Agent Guide
○ Custom Agent with Query Pipeline Guide
● Multi-agents: Define multi-agent interactions!
○ Synchronously: Define an explicit flow between agents
○ Asynchronously: Treat each agent as a microservice; agents communicate with each other.
■ Upcoming in LlamaIndex!
○ Current Frameworks: Autogen, CrewAI
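For the control point above, LlamaIndex exposes a lower-level, step-wise agent API. A sketch, assuming a recent llama-index release and reusing the agent from the ReAct example earlier:

```python
# Step-wise execution: inspect and guide each intermediate step of the
# agent instead of running a query end to end.
task = agent.create_task("Compare revenue growth of Uber and Lyft in 2021")

# Run one reasoning/tool step at a time; examine the output between steps.
step_output = agent.run_step(task.task_id)
while not step_output.is_last:
    print(step_output)  # observability: the intermediate trace
    step_output = agent.run_step(task.task_id)

print(agent.finalize_response(task.task_id))
```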
LlamaIndex + W&B
Tracing and observability are essential developer tools for RAG/agent development.

We have first-class integrations with Weights & Biases.

Guide