2(d) Vector Space Model

The Vector Space Model (VSM) represents text as mathematical vectors to compare documents and find relevant ones, and is commonly used in search engines and information retrieval. It treats each distinct word as a dimension and each document as a vector, scoring relevance by how closely a document's vector aligns with the vector of the search query. Similarity is measured using cosine similarity, which allows documents to be ranked efficiently by their content.

Uploaded by

sushilkpal9457

What’s the Vector Space Model?

The Vector Space Model is like a magic sorting robot in the library. It looks at all the books (or
documents) and tries to figure out which ones are most related to the words you’re searching for —
like "space" and "adventures." Here's how it works:

How it Works:

1. Words as Directions:
Every distinct word in the books gets its own direction (axis) in space. For example, the word
"space" is one direction, and the word "adventure" is another.

2. Books as Arrows (Vectors):

Imagine every book becomes an arrow that points toward the words it talks about. If a book
talks a lot about "space" and "adventure," its arrow will point close to those two words.

3. Comparing Directions:
When you ask, "Show me books about space adventures," the robot compares the arrows of
all the books to the direction of your search (another arrow). The closer a book’s arrow is to
your search arrow, the better match it is!

4. Scores and Sorting:

The robot gives each book a score (higher means closer to your search) and shows you the
ones with the best scores first.

Example:

• Books in the library:

  ◦ Book 1: "Space Travel and Rockets"

  ◦ Book 2: "Fairy Tales and Magic"

  ◦ Book 3: "Adventure Stories on Mars"

• Your search: "Space Adventures"

• What happens:

  ◦ The robot sees that Book 1 talks a lot about "space" and Book 3 talks about both
    "space" and "adventure." Book 2 doesn’t match much.

  ◦ It sorts the books like this:

    1. Book 3 (most relevant)

    2. Book 1

    3. Book 2 (not relevant)

Example Question:

You have three documents in your library:

1. Document 1: "The cat is on the mat."

2. Document 2: "Dogs are in the house."

3. Document 3: "The cat and the dog are friends."

You search for: "cat dog"

Let’s use the Vector Space Model to find which document is most relevant to your search.

Steps to Solve:

1. Create a Vocabulary (Unique Words):

From all the documents, list all unique words:

["cat", "dog", "is", "on", "the", "mat", "dogs", "are", "in", "house", "and", "friends"]

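2. Represent Each Document as a Vector:

Mark each vocabulary word with 1 if it appears in the document and 0 if it does not (a repeated
word such as "the" still counts as 1):

• Doc 1: [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

• Doc 2: [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0]

• Doc 3: [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1]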

3. Represent the Query as a Vector:

For the query "cat dog", create a vector:

["cat", "dog", "is", "on", "the", "mat", "dogs", "are", "in", "house", "and", "friends"]

Query vector = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

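(1 for "cat", 1 for "dog", 0 for all other words.)

4. Compute Cosine Similarity:

For each document, divide the dot product of the query vector and the document vector by the
product of the two vectors' lengths. A score of 1 means the vectors point in the same direction;
a score of 0 means the document shares no words with the query.

5. Sort the Documents by Similarity:

• Doc 3: 0.577 (most relevant)

• Doc 1: 0.316

• Doc 2: 0

Final Answer:

The query "cat dog" matches best with Document 3, followed by Document 1. Document 2 is
irrelevant: it contains "dogs", which this vocabulary treats as a different word from "dog".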
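
The scores in this example come from cosine similarity. As a sketch of the arithmetic, write q for the query vector and d for a document vector (these symbols are introduced here for illustration and are not in the original notes). The general formula, followed by the calculation for Document 1, which shares only "cat" with the query and has five non-zero entries, is:

```latex
\[
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}
\]
\[
\cos(\mathbf{q}, \mathbf{d}_1)
  = \frac{1}{\sqrt{2}\,\sqrt{5}}
  = \frac{1}{\sqrt{10}}
  \approx 0.316
\]
```

Document 2 shares no vocabulary word with the query, so its dot product, and therefore its score, is 0.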

Example Question:

You have three documents:

1. Document 1: "I love to eat pizza and pasta."

2. Document 2: "Pasta is my favorite food."

3. Document 3: "I enjoy pizza with extra cheese."

You search for: "pizza pasta"

Let’s use the Vector Space Model step by step to find which document is most relevant to your
search.

Steps to Solve:

1. Create a Vocabulary (Unique Words):

List all unique words from the documents:

["I", "love", "to", "eat", "pizza", "and", "pasta", "is", "my", "favorite", "food", "enjoy", "with", "extra",
"cheese"]

2. Represent Each Document as a Vector:

Count how often each vocabulary word occurs in each document. Here every word occurs at most
once per document, so each entry is 0 or 1:

• Doc 1: [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

• Doc 2: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]

• Doc 3: [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

3. Represent the Query as a Vector:

For the query "pizza pasta", create a vector:

["I", "love", "to", "eat", "pizza", "and", "pasta", "is", "my", "favorite", "food", "enjoy", "with", "extra",
"cheese"]

Query vector = [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

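(1 for "pizza" and 1 for "pasta"; 0 for all other words.)

4. Compute Cosine Similarity:

For each document, divide the dot product of the query vector and the document vector by the
product of the two vectors' lengths.

5. Sort the Documents by Similarity:

• Doc 1: 0.535 (most relevant)

• Doc 2: 0.316

• Doc 3: 0.289

Final Answer:

The query "pizza pasta" matches best with Document 1, which contains both query words.
Documents 2 and 3 each contain one query word; Document 2 ranks ahead of Document 3 because it
has fewer other words (five non-zero entries against six), so its vector points closer to the query.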
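
The steps of this example can be sketched in Python. This is a minimal illustration, not a production implementation: the function names (`tokenize`, `build_vocabulary`, `to_binary_vector`, `cosine_similarity`) are made up for this sketch, and it assumes lowercase word matching with no stemming or stop-word removal.

```python
import math
import re


def tokenize(text):
    # Lowercase the text and keep runs of letters; punctuation is dropped.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    # Collect unique words in order of first appearance.
    vocab = []
    for text in texts:
        for word in tokenize(text):
            if word not in vocab:
                vocab.append(word)
    return vocab


def to_binary_vector(text, vocab):
    # 1 if the vocabulary word appears in the text, otherwise 0.
    words = set(tokenize(text))
    return [1 if term in words else 0 for term in vocab]


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


docs = {
    "Doc 1": "I love to eat pizza and pasta.",
    "Doc 2": "Pasta is my favorite food.",
    "Doc 3": "I enjoy pizza with extra cheese.",
}
query = "pizza pasta"

vocab = build_vocabulary(docs.values())
query_vec = to_binary_vector(query, vocab)
scores = {
    name: cosine_similarity(query_vec, to_binary_vector(text, vocab))
    for name, text in docs.items()
}

# Highest score first.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Running the script prints one line per document, ordered from most to least similar to the query. Swapping `to_binary_vector` for a version that counts occurrences would give term-frequency vectors instead of binary ones.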

What is a Vector Space Model (VSM)?

A Vector Space Model (VSM) is a way to represent text as mathematical vectors so that we can
compare documents and find the most relevant ones. It is used in search engines, document
ranking, and information retrieval.

Instead of treating documents as plain text, VSM converts them into vectors of numbers so that
they can be compared efficiently by the words they share.

How Do We Compare Documents?

We can measure similarity between documents using the cosine similarity formula, which is the
cosine of the angle between two vectors.

• Doc1 vs. Doc2 (Do they have similar words? Yes! "bananas" is common)

• Doc1 vs. Doc3 (Do they have similar words? Yes! "love" is common)

• Doc2 vs. Doc3 (Do they have similar words? Yes! "mangoes" is common)

The smaller the angle between two vectors (a cosine closer to 1), the more similar the documents are.
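
Because cosine similarity depends only on the angle between two vectors, it ignores how long a document is. The following sketch illustrates this with three small made-up vectors (`short_doc`, `long_doc`, and `other_doc` are hypothetical and do not come from the examples in these notes):

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


short_doc = [1, 1, 0]  # mentions the first two words once each
long_doc = [3, 3, 0]   # same words in the same proportions, three times as often
other_doc = [0, 0, 1]  # mentions only the third word

same_direction = cosine_similarity(short_doc, long_doc)
no_overlap = cosine_similarity(short_doc, other_doc)

print(f"short vs. long:  {same_direction:.3f}")
print(f"short vs. other: {no_overlap:.3f}")
```

The first pair points in the same direction, so its similarity is (up to floating-point rounding) 1; the second pair shares no words, so its similarity is 0.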

Easy Example: Vector Space Model in Real Life

Imagine you are searching for "delicious mangoes" on Google.

1️⃣ The search engine converts your query ("delicious mangoes") into a vector:
Query Vector: [0, 0, 1, 0, 1, 0]

2️⃣ It compares this vector with all document vectors.

3️⃣ The most similar document (Doc2: "Bananas and mangoes are delicious") is ranked highest
because it has both "mangoes" and "delicious"!

This is the basic idea behind ranking documents for a query with the Vector Space Model. Real
search engines like Google build on it with many additional signals.
