LLM Indexing and Chaining
Community Session
Outline
Introductory Talk - Thinking about LLMs as Builders
● Chaining
● Indexing
● LLMs, Vector DBs, and LLM Ops
Community Breakout Discussion Activities
Live Interactive Build Demo!
● Document querying with LangChain
© 2023 FourthBrain
How most people use LLMs
What most people ask about LLMs
Thinking about LLMs as Builders
● Primary Chains
○ Prompt Chain
○ Tools Chain
○ Data Indexing Chain
● LangChain (also LlamaIndex, Haystack, and others) provides a standard interface for Chains
● Chains = main abstraction innovation
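A chain can be pictured as a list of steps where each step's output becomes the next step's input. The sketch below is plain Python, not the LangChain API; `make_chain`, `prompt_step`, `fake_llm`, and `parse_step` are hypothetical names, and `fake_llm` is an offline stand-in for a real model call.

```python
from typing import Callable, List

# A "step" takes a string and returns a string.
Step = Callable[[str], str]


def make_chain(steps: List[Step]) -> Step:
    """Compose steps so that each one's output feeds the next."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def prompt_step(question: str) -> str:
    # Fill a fixed prompt template with the user's question.
    return f"Answer in one word. Question: {question}"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: it returns the last word
    # of the prompt, padded with whitespace, so the example runs offline.
    return "  " + prompt.rstrip("?").split()[-1] + "\n"


def parse_step(raw: str) -> str:
    # Clean up the raw model output.
    return raw.strip().lower()


chain = make_chain([prompt_step, fake_llm, parse_step])
print(chain("What is the capital of France?"))
```

Swapping `fake_llm` for a real model call, or adding a tool or retrieval step to the list, leaves the rest of the chain unchanged, which is the appeal of a standard interface.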
“LangChain provides a generic interface for
interacting with LLMs”
“[Using] LLMs in isolation is often insufficient for creating a truly
powerful app - the real power comes when you can combine
them with other sources of computation or knowledge.” ~
Harrison Chase, Creator of LangChain
Creating an Index (with a Data-Indexing Chain)
1. Splitting documents into chunks
2. Creating embeddings for each chunk
3. Storing chunks and their embeddings in a vector store
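The three steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `split_into_chunks`, `toy_embed`, and `build_index` are hypothetical names, the chunk size and overlap are arbitrary, and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model.

```python
import hashlib
import math
from typing import Dict, List


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Step 1: split a document into overlapping chunks of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not pure overlap.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Step 2: toy embedding (hashed bag of words), NOT a real model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    # Normalize to unit length so vectors are comparable.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(text: str) -> List[Dict]:
    """Step 3: store each chunk next to its embedding."""
    return [{"text": c, "embedding": toy_embed(c)} for c in split_into_chunks(text)]


doc = (
    "The guide is a reference book for travelers. It covers planets, "
    "food, and customs. Entries are short and often out of date."
)
index = build_index(doc)
print(len(index), "chunks indexed")
```

In a real application the list returned by `build_index` would live in a vector store, and `toy_embed` would be replaced by an embedding model.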
A Simplified LangChain Application
Same chains…
● Prompt Chain
● Tools Chain
● Data Indexing Chain
Same primary components…
● LLM
● Vector Database
● Document(s)
Def: Vector Store (a.k.a. Vector Database)
● Optimized for storing documents and their embeddings
● Fetches the most relevant documents for a particular query
○ → those whose embeddings are most similar to the embedding of the query
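"Most similar" is commonly measured with cosine similarity between the query embedding and each stored embedding. The sketch below is a brute-force, in-memory illustration; `TinyVectorStore` is a hypothetical class and the three-dimensional vectors are made up by hand. Production vector databases typically use approximate nearest-neighbor indexes instead of scanning every item.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, embedding) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: List[float]) -> None:
        self._items.append((text, embedding))

    def query(self, query_embedding: List[float], k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored item, then return the k best matches.
        scored = [
            (text, cosine_similarity(query_embedding, emb))
            for text, emb in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("towels", [0.9, 0.1, 0.0])
store.add("spaceships", [0.1, 0.9, 0.0])
store.add("tea", [0.0, 0.2, 0.9])

# The query vector points mostly along the first axis, like "towels".
results = store.query([1.0, 0.2, 0.0], k=2)
for text, score in results:
    print(f"{text}: {score:.3f}")
```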
LLMs, Chaining, Data Indexing, Vector DBs, Documents…
How does all of this fit together?
LLM Ops
● Definition: How we store, index, and retrieve knowledge that we need to perform useful LLM tasks
Players to Watch (Chaining)
● LangChain ~ $10M seed funding
● LlamaIndex ~ Open Source
● Haystack ~ $9.2M seed funding/debt financing
○ Extractive QA
● AgentGPT ~ Open Source Project by Level AI ($20M Series B, 2022)
○ Call centers
The more mature infrastructure layer is…
Vector Store DB Companies (LangChain Support)
● Chroma ~ $18M seed round
● FAISS (Facebook)
● Elasticsearch (established company)
● Milvus ~ $60M Series B (extension), $43M in ’21
● Pinecone ~ $100M Series B
● Qdrant ~ $7.5M seed round
● Weaviate ~ $50M Series B
Example Project Ideas from “Building with LLMs” Course
Simple (1-step)
● Natural Language Website Search: Scrape all text from {hotel}.com webpages
and index it in a vector store so that any information can be searched with an
LLM
Mild (2-step)
● Technical Q&A (“AI Tech Support”): Create a fine-tuned LLM to answer FAQs
about technical documentation; if it has no answer, use a non-fine-tuned LLM
to search all relevant documentation and find the answer.
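The two-step flow can be sketched as a primary answerer with a fallback. Everything here is hypothetical and runs offline: `faq_model` is a dictionary lookup standing in for the fine-tuned LLM, and `doc_search_model` is a keyword-overlap search standing in for a general LLM searching the documentation.

```python
from typing import Callable, Optional

# Made-up FAQ and documentation snippets for illustration only.
FAQ = {
    "how do i reset my password?": "Use the 'Forgot password' link on the login page.",
}
DOCS = [
    "To rotate an API key, open Settings and choose Regenerate key.",
    "Invoices are emailed on the first day of each month.",
]


def faq_model(question: str) -> Optional[str]:
    # Stand-in for the fine-tuned model: returns None when it has no answer.
    return FAQ.get(question.strip().lower())


def _words(text: str) -> set:
    # Lowercase and drop basic punctuation before splitting into words.
    for ch in "?.,":
        text = text.replace(ch, "")
    return set(text.lower().split())


def doc_search_model(question: str) -> str:
    # Stand-in for the fallback: pick the doc sharing the most words.
    return max(DOCS, key=lambda d: len(_words(question) & _words(d)))


def answer_with_fallback(
    question: str,
    primary: Callable[[str], Optional[str]],
    fallback: Callable[[str], str],
) -> str:
    """Step 1: try the primary model. Step 2: fall back if it has no answer."""
    answer = primary(question)
    if answer is not None:
        return answer
    return fallback(question)


print(answer_with_fallback("How do I reset my password?", faq_model, doc_search_model))
print(answer_with_fallback("How do I rotate my API key?", faq_model, doc_search_model))
```

How a real fine-tuned model signals "no answer" is a design decision; returning `None` here is only an assumption for the sketch.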
Example Project Ideas from “Building with LLMs” Course (continued)
Medium (2+ step)
● Qualitative + Quantitative Q&A (“The AI VP”): Create a fine-tuned LLM to
generate SQL queries for your database structure using common queries useful to
your product/sales/etc. team, then perform the SQL query and return the
quantitative result. Compare the result against the question asked, and combine
into a holistic response.
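The steps in this idea (generate SQL, run it, combine the result with the question) can be sketched with Python's built-in `sqlite3` module. The `sales` table, its rows, and `fake_sql_llm` are made up for illustration; `fake_sql_llm` returns a fixed query in place of a fine-tuned model, and the final response uses a string template where a real build might use another LLM call.

```python
import sqlite3


def fake_sql_llm(question: str) -> str:
    # Hypothetical stand-in for a fine-tuned text-to-SQL model.
    return (
        "SELECT region, SUM(amount) AS total FROM sales "
        "GROUP BY region ORDER BY total DESC LIMIT 1"
    )


def answer_question(conn: sqlite3.Connection, question: str) -> str:
    # Step 1: generate SQL from the natural-language question.
    sql = fake_sql_llm(question)
    # Guard: only run read-only queries produced by the model.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    # Step 2: run the query and fetch the quantitative result.
    region, total = conn.execute(sql).fetchone()
    # Step 3: combine the question and the result into one response.
    return f"Q: {question}\nA: {region} had the highest sales, totaling {total:.2f}."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

print(answer_question(conn, "Which region had the highest sales?"))
```

Checking that generated SQL is read-only before running it is a precaution added for this sketch, not something the project description specifies.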
Breakouts! (20 minutes)
What useful LLM tasks (projects) are you interested in building solutions for?
10 per room
Assign ONE person from your room to take notes and share!
Indexing, Chaining, and LLM Ops
This week’s build - Questioning Your Document
● Model: OpenAI’s gpt-3.5-turbo
● Dataset: Hitchhiker’s Guide to the Galaxy
● Chaining Tool: LangChain
● Vector DB Tool: ChromaDB
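The overall shape of document question-answering can be sketched without any of these tools: retrieve the most relevant chunks, place them in a prompt, and send the prompt to the model. This is not the LangChain or ChromaDB API. `keyword_retrieve` is a word-overlap stand-in for a vector store lookup, `echo_llm` returns the prompt instead of calling gpt-3.5-turbo, and the chunks are made-up sentences, not text from the dataset.

```python
from typing import Callable, List

# Made-up chunks, lowercased and without punctuation to keep matching simple.
CHUNKS = [
    "a towel is the most useful thing a traveler can carry",
    "the ship runs on an improbability drive",
    "tea is hard to find in space",
]


def keyword_retrieve(question: str, k: int) -> List[str]:
    # Stand-in for a vector store query: rank chunks by shared words.
    q_words = set(question.lower().split())
    ranked = sorted(
        CHUNKS,
        key=lambda c: len(q_words & set(c.split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, chunks: List[str]) -> str:
    # Place the retrieved chunks into the prompt as context.
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Use only the context below to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def echo_llm(prompt: str) -> str:
    # Stand-in for a model call: returns the prompt so it can be inspected.
    return prompt


def ask_document(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    chunks = retrieve(question, k)
    return llm(build_prompt(question, chunks))


print(ask_document("why carry a towel", keyword_retrieve, echo_llm, k=1))
```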
Let’s Check it Out!