
Module 2

Embeddings, Vector Databases,


and Search

© Databricks Inc. — All rights reserved


Learning Objectives

By the end of this module you will:


• Understand vector search strategies and how to evaluate search results

• Understand the utility of vector databases

• Differentiate between vector databases, vector libraries, and vector plugins

• Learn best practices for when to use vector stores and how to improve
search-retrieval performance



How do language models learn knowledge?

Through model training or fine-tuning


• Via model weights
• More on fine-tuning in a later module

Through model inputs


• Insert knowledge or context into the input
• Ask the LM to incorporate the context in its output

This is what we will cover:


• How do we use vectors to search and provide relevant context to LMs?



Passing context to LMs helps factual recall

• Fine-tuning is usually better-suited to teach a model specialized tasks


• Analogy: Studying for an exam weeks away

• Passing context as model inputs improves factual recall


• Analogy: Take an exam with open notes
• Downsides:
• Context length limitation
• E.g., OpenAI’s gpt-3.5-turbo accepts a maximum of ~4,096 tokens (a few pages) as context
• Common mitigation method: pass document summaries instead
• Anthropic’s Claude: 100k token limit
• An ongoing research area (Pope et al., Fu et al.)
• Longer context = higher API costs = longer processing times

Source: OpenAI


Refresher: We represent words with vectors

We can project these vectors onto 2D to see how they relate graphically.

Source: “Word Embedding: Basics. Create a vector from a word” by Hariom Gautam (Medium)
Turn images and audio into vectors too
Data objects → Vectors → Tasks

• Images → [ … ] → object recognition, scene detection, product search
• Text → [ … ] → translation, question answering, semantic search
• Audio → [ … ] → speech to text, music transcription, machinery malfunction detection
Use cases of vector databases
• Similarity search: text, images, audio
• Semantic match, rather than keyword match!
• De-duplication
• Example on enhancing product search
• Very useful for knowledge-based Q/A

• Recommendation engines
• Example blog post: Spotify uses vector search to recommend podcast episodes

• Finding security threats
• Vectorizing virus binaries and finding anomalies

[Figure: shared embedding space for queries and podcast episodes. Related queries and episodes cluster together, e.g. “Are electric cars better for the environment?”, “electric cars climate impact”, “Environmental impact of electric vehicles”; likewise “How to cope with the pandemic”, “dealing with covid ptsd”, “Dealing with covid anxiety”.]

Source: Spotify



Search and Retrieval-Augmented Generation
The RAG workflow
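A minimal sketch of that workflow: retrieve the most relevant documents, then stuff them into the prompt. The toy bag-of-words embedding below stands in for a real embedding model, and the documents are made up for illustration.

```python
# Minimal RAG sketch: embed docs, retrieve the closest one, build a prompt.
# The embedding here is a toy bag-of-words vector, NOT a real model.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

docs = [
    "Electric vehicles reduce tailpipe emissions.",
    "Podcasts about coping with pandemic anxiety.",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query, k=1):
    # Rank stored documents by similarity to the query embedding.
    q = embed(query)
    return sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)[:k]

def build_prompt(query):
    # The retrieved documents become the context passed to the LLM.
    context = "\n".join(doc for doc, _ in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("Are electric cars better for the environment?")
```

The resulting `prompt` string would then be sent to the LLM, which answers using the retrieved context rather than only its weights.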



How does
vector search work?



Vector search strategies
• K-nearest neighbors (KNN)

• Approximate nearest neighbors (ANN)


• Trade accuracy for speed gains
• Examples of indexing algorithms:
• Tree-based: ANNOY by Spotify
• Proximity graphs: HNSW
• Clustering: FAISS by Facebook
• Hashing: LSH
• Vector compression: ScaNN by Google

Source: Weaviate

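The exhaustive KNN baseline that the ANN methods above approximate can be sketched in a few lines; note that it must scan every stored vector per query, which is what the ANN indexes avoid.

```python
# Exact k-nearest-neighbor search by brute force: O(N * d) per query.
# ANN indexes (ANNOY, HNSW, FAISS, LSH, ScaNN) avoid scanning every vector,
# trading a little recall for large speedups.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, vectors, k=2):
    # Scan every stored vector and keep the indices of the k closest.
    scored = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return scored[:k]

vectors = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (5.0, 5.0)]
print(knn((0.0, 0.1), vectors, k=2))  # [0, 2]
```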


How to measure if 2 vectors are similar?
L2 (Euclidean) and cosine are most popular

Distance metrics: the higher the metric, the less similar.
Similarity metrics: the higher the metric, the more similar.

Source: buildin.com

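The two most popular metrics can be computed directly from their definitions; this small sketch shows how a distance and a similarity behave on the same pair of vectors.

```python
# L2 (Euclidean) distance: higher = less similar.
# Cosine similarity: higher = more similar (1.0 = same direction).
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm

a, b = [1.0, 0.0], [0.6, 0.8]
print(l2_distance(a, b))        # ≈ 0.894 (small distance = similar)
print(cosine_similarity(a, b))  # 0.6 (cosine of the angle between them)
```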


Compressing vectors with Product Quantization
PQ stores vectors with fewer bytes

Quantization = representing vectors with a smaller set of reference vectors


• Naive example: round(8.954521346) = 9

Trade-off between recall and memory savings

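The rounding example above is scalar quantization; product quantization applies the same idea per sub-vector. A toy sketch follows, with tiny hand-picked codebooks where real PQ would learn them with k-means:

```python
# Product Quantization sketch: split each vector into sub-vectors and store
# only the ID of the nearest codebook centroid for each sub-vector.
# Codebooks here are tiny and hand-picked; real PQ learns them via k-means.
import math

codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],   # codebook for sub-vector 1 (dims 0-1)
    [(0.0, 1.0), (1.0, 0.0)],   # codebook for sub-vector 2 (dims 2-3)
]

def nearest(point, centroids):
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def pq_encode(vec):
    subs = [vec[0:2], vec[2:4]]
    return [nearest(s, cb) for s, cb in zip(subs, codebooks)]

def pq_decode(codes):
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

v = [0.9, 1.1, 0.1, 0.9]
codes = pq_encode(v)        # 2 small integers instead of 4 floats
approx = pq_decode(codes)   # lossy reconstruction: recall vs. memory trade-off
```

Storing two centroid IDs instead of four floats is exactly the recall-for-memory trade-off the slide describes: `approx` is close to `v`, but not equal to it.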


FAISS: Facebook AI Similarity Search
Forms clusters of dense vectors and conducts Product Quantization

• Given a query vector, identify which cell it belongs to
• Find all other vectors belonging to that cell
• Compute the Euclidean distance between those vectors and the query vector
• Limitation: not good with sparse vectors (refer to the GitHub issue)

Source: Pinecone
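The cell-based search described above (the inverted-file idea behind FAISS's IVF indexes) can be sketched without the library; centroids are hand-picked here, whereas FAISS learns them with k-means.

```python
# Inverted-file (IVF) search sketch: vectors are clustered into cells, and a
# query is compared only against the vectors in its own cell instead of the
# whole collection.
import math

centroids = [(0.0, 0.0), (10.0, 10.0)]
cells = {0: [], 1: []}          # inverted lists: cell id -> member vectors

def cell_of(vec):
    # Assign a vector to the cell of its nearest centroid.
    return min(range(len(centroids)),
               key=lambda i: math.dist(vec, centroids[i]))

for vec in [(0.5, 0.2), (0.1, 0.9), (9.5, 10.2), (10.5, 9.8)]:
    cells[cell_of(vec)].append(vec)

def ivf_search(query):
    # Search only the query's own cell, not all vectors.
    candidates = cells[cell_of(query)]
    return min(candidates, key=lambda v: math.dist(query, v))

result = ivf_search((0.4, 0.3))
```

Because only one cell is scanned, a true nearest neighbor sitting just across a cell boundary can be missed; that is the accuracy-for-speed trade ANN accepts.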


HNSW: Hierarchical Navigable Small Worlds
Builds proximity graphs based on Euclidean (L2) distance

• Uses skip-list-style linked lists to find an element, e.g. x = “11”
• Traverses from the entry node toward the query vector to find the nearest neighbor
• What happens if there are too many nodes? Use hierarchy!

Source: Pinecone
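The traversal step can be sketched as greedy search on a single proximity graph; full HNSW stacks such graphs into a hierarchy of layers. The graph below is hand-built for illustration.

```python
# Greedy search on a (single-layer) proximity graph, the core move in HNSW:
# from an entry node, repeatedly hop to the neighbor closest to the query,
# stopping when no neighbor improves on the current node.
import math

points = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (4.0, 0.0), 3: (4.0, 2.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # hand-built neighbor lists

def greedy_search(query, entry=0):
    current = entry
    while True:
        best = min(graph[current] + [current],
                   key=lambda n: math.dist(points[n], query))
        if best == current:          # no neighbor is closer: local minimum
            return current
        current = best

found = greedy_search((4.1, 1.9))
```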
The ability to search for similar objects is not limited to fuzzy text or exact matching rules.



Filtering



Adding filtering function is hard
I want Nike-only: need an additional metadata index for “Nike”

Types (Source: Pinecone):

• Post-query
• In-query
• Pre-query

No one-sized shoe fits all


Different vector databases implement this differently
Post-query filtering
Applies filters to top-k results after user queries

• Leverages ANN speed

• # of results is highly unpredictable
• Maybe no products meet the requirements



In-query filtering
Compute both product similarity and filters simultaneously

• Product similarity as vectors

• Branding as a scalar

• Leverages ANN speed

• May hit system OOM!


• Especially when many filters are applied

• Suitable for row-based data



Pre-query filtering
Search for products within a limited scope

• All data needs to be filtered first == brute-force search!
• Slows down the search

• Not as performant as
post- or in-query filtering

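The contrast between the filtering strategies above can be sketched over a toy product index (the products and brand names are made up for illustration):

```python
# Post-query vs. pre-query filtering sketch over a toy product index.
import math

products = [
    {"vec": (0.1, 0.2), "brand": "Nike"},
    {"vec": (0.2, 0.1), "brand": "Adidas"},
    {"vec": (0.9, 0.9), "brand": "Nike"},
]

def post_filter(query, brand, k=2):
    # Filter AFTER taking the top-k: keeps ANN speed, but the result count
    # is unpredictable and can even be empty.
    top_k = sorted(products, key=lambda p: math.dist(p["vec"], query))[:k]
    return [p for p in top_k if p["brand"] == brand]

def pre_filter(query, brand, k=2):
    # Filter FIRST, then brute-force search the reduced set: predictable
    # result count, but no ANN index speedup on the filtered subset.
    subset = [p for p in products if p["brand"] == brand]
    return sorted(subset, key=lambda p: math.dist(p["vec"], query))[:k]

q = (0.0, 0.0)
nike_post = post_filter(q, "Nike")   # top-2 contains an Adidas item, so < k
nike_pre = pre_filter(q, "Nike")     # always up to k Nike items
```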


Vector stores
Databases, libraries, plugins



Why are vector databases (VDBs) so hot?
Query time and scalability

• Specialized, full-fledged databases for unstructured data
• Inherit database properties, i.e. Create-Read-Update-Delete (CRUD)

• Speed up query search for the closest vectors
• Rely on ANN algorithms
• Organize embeddings into indices

Image Source: Weaviate


What about vector libraries or plugins?
Many don’t support filter queries, i.e. “WHERE” clauses

Libraries create vector indices:
• Approximate Nearest Neighbor (ANN) search algorithms
• Sufficient for small, static data
• Do not have CRUD support: need to rebuild the index, and need to wait for a full import to finish before querying
• Stored in-memory (RAM)
• No data replication

Plugins provide architectural enhancements:
• Relational databases or search systems may offer vector search plugins, e.g., Elasticsearch, pgvector
• Generally less rich features: fewer metric choices, fewer ANN choices, less user-friendly APIs

Caveat: things are moving fast! These weaknesses could improve soon!


Do I need a vector database?
Best practice: start without one; scale out as necessary.

Pros:
• Scalability: millions/billions of records
• Speed: fast query time (low latency)
• Full-fledged database properties
• If you use vector libraries instead, you need to devise your own way to store the objects and do filtering
• If data changes frequently, a vector database is cheaper than using an online model to compute embeddings dynamically

Cons:
• One more system to learn and integrate
• Added cost



Popular vector database comparisons

                 Billion-scale vector   Approximate Nearest    LangChain
                 support                Neighbor algorithm     integration

Open-sourced
  Chroma         No                     HNSW                   Yes
  Milvus         Yes                    FAISS, ANNOY, HNSW
  Qdrant         No                     HNSW
  Redis          No                     HNSW
  Weaviate       No                     HNSW
  Vespa          Yes                    Modified HNSW

Not open-sourced
  Pinecone       Yes                    Proprietary            Yes

*Note: the information is collected from public documentation. It is accurate as of May 2023.



Best practices



Do I always need a vector store?
Vector store includes vector databases, libraries or plugins

• Vector stores extend LLMs with knowledge

• The returned relevant documents become the LLM context
• Context can reduce hallucination

• Which use cases do not need context augmentation?


• Summarization
• Text classification
• Translation



How to improve retrieval performance?
This means users get better responses

• Embedding model selection


• Do I have the right embedding model for my data?
• Do my embeddings capture BOTH my documents and queries?

• Document storage strategy


• Should I store the whole document as one? Or split it up into chunks?



Tip 1: Choose your embedding model wisely
The embedding model should represent BOTH your queries and documents



Tip 2: Ensure embedding space is the same
for both queries and documents

• Use the same embedding model for indexing and querying


• OR, if you use different embedding models, make sure they are trained on similar data (and therefore produce the same embedding space!)
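A sketch of the rule in code: one embedding function, used for both indexing and querying. The character-frequency `embed` below is a toy stand-in for a real embedding model.

```python
# Sketch: the SAME embedding function must be used at indexing time and at
# query time, otherwise the distances between queries and documents are
# meaningless. `embed` is a toy character-frequency vector here.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

# Index documents with `embed`...
doc_vectors = [embed(d) for d in ["vector search", "fine tuning"]]

# ...and embed queries with the very same function, so both live in the
# same 26-dimensional space.
query_vector = embed("search vectors")
```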



Chunking strategy: Should I split my docs?
Split into paragraphs? Sections?

• Chunking strategy determines


• How relevant is the context to the prompt?
• How much context (how many chunks) can I fit within the model’s token limit?
• Do I need to pass this output to the next LLM? (See the module on chaining LLMs into a workflow)

• Splitting a doc into smaller chunks means one doc can produce N vectors of M tokens each

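A naive fixed-size chunker with overlap shows the N-vectors-per-document idea; real splitters (e.g. LangChain's text splitters) also respect sentence and paragraph boundaries.

```python
# Naive fixed-size chunker: split one document into chunks of at most `size`
# words, with `overlap` words shared between consecutive chunks so that
# context straddling a boundary is not lost.
def chunk(text, size=20, overlap=5):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(50))
chunks = chunk(doc)   # each chunk becomes one vector in the index
```

With 50 words, size 20 and overlap 5, this yields 4 chunks; each becomes its own embedding vector.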


Chunking strategy is use-case specific
Another iterative step! Experiment with different chunk sizes and approaches

• How long are our documents?


• One sentence?
• N sentences?

• If chunk = sentence, embeddings focus on specific meaning

• If chunk = multiple paragraphs, embeddings capture broader theme


• How about splitting by headers?

• Do we know user behavior? How long are the queries?


• Long queries may have embeddings more aligned with the chunks returned
• Short queries can be more precise



Chunking best practices are not yet well-defined
It’s still a very new field!

Existing resources:
• Text Splitters by LangChain
• Blog post on semantic search by Vespa - light mention of chunking
• Chunking Strategies by Pinecone



Preventing silent failures and undesired behavior
• For users: include explicit instructions in prompts
• "Tell me the top 3 hikes in California. If you do not know the answer, do not
make it up. Say 'I don’t have information for that.'"
• Helpful when upstream embedding model selection is incorrect

• For software engineers


• Add failover logic
• If distance x exceeds threshold y, show a canned response rather than showing nothing
• Add a basic toxicity classification model on top
• Prevent users from submitting offensive inputs
• Discard offensive content to avoid training on it or saving it to the VDB
• Configure the VDB to time out if a query takes too long to return a response

Source: BBC
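The failover logic above can be sketched as a distance threshold check; the threshold value and the tiny index here are made up for illustration and would need tuning per embedding model.

```python
# Failover sketch: if even the best match is too far from the query, return
# a canned response instead of irrelevant context (a silent-failure guard).
import math

CANNED = "I don't have information for that."
THRESHOLD = 1.0   # hypothetical distance cutoff; tune per embedding model

index = [("hiking doc", (0.1, 0.1)), ("cooking doc", (0.2, 0.0))]

def answer(query_vec):
    doc, vec = min(index, key=lambda item: math.dist(item[1], query_vec))
    if math.dist(vec, query_vec) > THRESHOLD:
        return CANNED          # nothing relevant: fail over, don't fabricate
    return f"Context: {doc}"

answer((0.1, 0.2))   # close to the index: returns context
answer((9.0, 9.0))   # far from everything: canned response
```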



Module Summary
Embeddings, Vector Databases and Search - What have we learned?

• Vector stores are useful when you need context augmentation.


• Vector search is all about calculating vector similarities or distances.
• A vector database is a regular database with out-of-the-box search
capabilities.
• Vector databases are useful if you need database properties, have big
data, and need low latency.
• Select the right embedding model for your data.
• Iterate on your document splitting/chunking strategy.



Time for some code!

