2(d) Vector Space Model

The Vector Space Model (VSM) represents text as mathematical vectors to compare documents and find relevant ones, and is commonly used in search engines and information retrieval. It treats each word as a dimension and each document as a vector, scoring relevance by how closely a document's vector points in the same direction as the vector of the search query. Similarity is measured using cosine similarity, which lets documents be ranked by their content.


What’s the Vector Space Model?

The Vector Space Model is like a magic sorting robot in the library. It looks at all the books (or
documents) and tries to figure out which ones are most related to the words you’re searching for,
like "space" and "adventures."

How it Works:

1. Words as Directions:
Every word in the books is like its own direction (axis) in space. For example, the word "space"
is one direction, and the word "adventure" is another.

2. Books as Arrows (Vectors):
Imagine every book becomes an arrow that points toward the words it talks about. If a book
talks a lot about "space" and "adventure," its arrow will point close to those two words.

3. Comparing Directions:
When you ask, "Show me books about space adventures," the robot compares the arrows of
all the books to the direction of your search (another arrow). The closer a book’s arrow is to
your search arrow, the better match it is!

4. Scores and Sorting:
The robot gives each book a score (higher means closer to your search) and shows you the
ones with the best scores first.
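
The "closeness" of two arrows in step 3 is usually measured by the cosine of the angle between them. As a sketch, writing d for a book's arrow and q for the search arrow (both symbols are labels chosen here for illustration, not names from the text):

```latex
\cos\theta
  = \frac{\mathbf{d}\cdot\mathbf{q}}{\lVert\mathbf{d}\rVert\,\lVert\mathbf{q}\rVert}
  = \frac{\sum_{i} d_i\, q_i}{\sqrt{\sum_{i} d_i^{2}}\;\sqrt{\sum_{i} q_i^{2}}}
```

When the arrows are built from word counts, the score falls between 0 and 1: a score of 1 means the two arrows point the same way, and 0 means the book shares no words with the search.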

Example:

• Books in the library:

o Book 1: "Space Travel and Rockets"

o Book 2: "Fairy Tales and Magic"

o Book 3: "Adventure Stories on Mars"

• Your search: "Space Adventures"

• What happens:

o The robot sees that Book 1 talks a lot about "space" and Book 3 talks about both
"space" and "adventure." Book 2 doesn’t match much.

o It sorts the books like this:

1. Book 3 (most relevant)

2. Book 1

3. Book 2 (not relevant)


Example Question:

You have three documents in your library:

1. Document 1: "The cat is on the mat."

2. Document 2: "Dogs are in the house."

3. Document 3: "The cat and the dog are friends."

You search for: "cat dog"

Let’s use the Vector Space Model to find which document is most relevant to your search.

Steps to Solve:

1. Create a Vocabulary (Unique Words):

From all the documents, list all unique words:


["cat", "dog", "is", "on", "the", "mat", "dogs", "are", "in", "house", "and", "friends"]

2. Represent Each Document as a Vector:

For each document, write 1 if the vocabulary word appears in it and 0 if it does not:

• Doc 1: [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

• Doc 2: [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0]

• Doc 3: [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1]
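
The document vectors can also be built mechanically. The following is a minimal Python sketch, assuming a simple tokenizer that lowercases the text and keeps only alphabetic words; the names `VOCAB`, `DOCS`, `tokenize`, and `to_binary_vector` are made up for this illustration.

```python
import re

# Vocabulary order copied from the example; any fixed order works as long
# as documents and queries all use the same one.
VOCAB = ["cat", "dog", "is", "on", "the", "mat",
         "dogs", "are", "in", "house", "and", "friends"]

DOCS = {
    "Doc 1": "The cat is on the mat.",
    "Doc 2": "Dogs are in the house.",
    "Doc 3": "The cat and the dog are friends.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def to_binary_vector(text, vocab=VOCAB):
    """Return 1 for each vocabulary word present in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocab]


doc_vectors = {name: to_binary_vector(text) for name, text in DOCS.items()}
for name, vector in doc_vectors.items():
    print(name, vector)
```

The same helper can turn a search into a vector, for example `to_binary_vector("cat dog")`.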

3. Represent the Query as a Vector:

For the query "cat dog", create a vector:


["cat", "dog", "is", "on", "the", "mat", "dogs", "are", "in", "house", "and", "friends"]

Query vector = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


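(1 for "cat", 1 for "dog", 0 for all other words.)

4. Calculate the Cosine Similarity:

Cosine similarity = (A · B) / (|A| × |B|), where A · B is the dot product of the two vectors and
|A| is the length of vector A. The query vector has length √2.

• Doc 1: 1 / (√5 × √2) = 0.316

• Doc 2: 0 (it shares no words with the query)

• Doc 3: 2 / (√6 × √2) = 0.577

5. Sort the Documents by Similarity:

• Doc 3: 0.577 (most relevant)

• Doc 1: 0.316

• Doc 2: 0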

Final Answer:

The query "cat dog" matches best with Document 3, followed by Document 1. Document 2 scores 0
because the model treats "dogs" and "dog" as different words, so it shares no words with the query.
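
The scoring and sorting for this example can be checked with a few lines of Python. This is a minimal sketch, assuming binary vectors over the 12-word vocabulary above; `cosine_similarity` and `rank` are illustrative helper names, not part of any library.

```python
import math

# Binary vectors over the 12-word vocabulary from the example.
# Doc 2 has a 1 in the "the" slot because the sentence contains that word.
doc_vectors = {
    "Doc 1": [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Doc 2": [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0],
    "Doc 3": [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1],
}
query_vector = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(vectors, query):
    """Score every document against the query, best match first."""
    scores = {name: cosine_similarity(vec, query) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(doc_vectors, query_vector)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The zero-length check is a design choice of this sketch: it returns a score of 0 for an empty document or query rather than dividing by zero.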

Example Question:

You have three documents:

1. Document 1: "I love to eat pizza and pasta."

2. Document 2: "Pasta is my favorite food."

3. Document 3: "I enjoy pizza with extra cheese."

You search for: "pizza pasta"

Let’s use the Vector Space Model step by step to find which document is most relevant to your
search.

Steps to Solve:

1. Create a Vocabulary (Unique Words):

List all unique words from the documents:


["I", "love", "to", "eat", "pizza", "and", "pasta", "is", "my", "favorite", "food", "enjoy", "with", "extra",
"cheese"]

2. Represent Each Document as a Vector:

Count the frequency of each word in each document:


Vectors for the documents:

• Doc 1: [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

• Doc 2: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]

• Doc 3: [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

3. Represent the Query as a Vector:

For the query "pizza pasta", create a vector:


["I", "love", "to", "eat", "pizza", "and", "pasta", "is", "my", "favorite", "food", "enjoy", "with", "extra",
"cheese"]

Query vector = [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]


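(1 for "pizza" and 1 for "pasta"; 0 for all other words.)

4. Calculate the Cosine Similarity:

Cosine similarity = (A · B) / (|A| × |B|). The query vector has length √2.

• Doc 1: 2 / (√7 × √2) = 0.535

• Doc 2: 1 / (√5 × √2) = 0.316

• Doc 3: 1 / (√6 × √2) = 0.289

5. Sort the Documents by Similarity:

• Doc 1: 0.535 (most relevant)

• Doc 2: 0.316

• Doc 3: 0.289

Final Answer:

The query "pizza pasta" matches best with Document 1, followed by Document 2, and finally
Document 3. Document 3 ranks last because it is longer than Document 2 while matching the same
number of query words (one).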
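
The full 15-slot vectors are not strictly needed to get these scores. As an alternative sketch, the word counts can be kept in a `collections.Counter`, where any word that is absent simply counts as 0. The tokenizer (lowercase, alphabetic words only) and the names `DOCS`, `QUERY`, and `term_counts` are assumptions made for this illustration.

```python
import math
import re
from collections import Counter

DOCS = {
    "Doc 1": "I love to eat pizza and pasta.",
    "Doc 2": "Pasta is my favorite food.",
    "Doc 3": "I enjoy pizza with extra cheese.",
}
QUERY = "pizza pasta"


def term_counts(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    # Looking up a missing word in a Counter returns 0, so only shared
    # words contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query_counts = term_counts(QUERY)
scores = {name: cosine_similarity(term_counts(text), query_counts)
          for name, text in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Because every word in these three documents occurs only once, the counts here are the same as the 0/1 vectors used in the steps above.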

What is a Vector Space Model (VSM)?

A Vector Space Model (VSM) is a way to represent text as mathematical vectors so that we can
compare documents and find the most relevant ones. It is used in search engines, document
ranking, and information retrieval.

Instead of treating words as just text, VSM converts them into numbers (vectors) so that documents
can be compared efficiently by the words they share.

How Do We Compare Documents?

We can measure similarity between documents using the cosine similarity formula (like checking
the angle between two vectors).

• Doc1 vs. Doc2 (Do they share words? Yes! "bananas" is common)

• Doc1 vs. Doc3 (Do they share words? Yes! "love" is common)

• Doc2 vs. Doc3 (Do they share words? Yes! "mangoes" is common)

The closer the vectors, the more similar the documents are.

Easy Example: Vector Space Model in Real Life

Imagine you are searching for "delicious mangoes" on Google.

1️⃣ The search engine converts your query ("delicious mangoes") into a vector:
Query Vector: [0, 0, 1, 0, 1, 0]

2️⃣ It compares this vector with all document vectors.

3️⃣ The most similar document (Doc2: "Bananas and mangoes are delicious") is ranked highest
because it has both "mangoes" and "delicious"!

This is the basic idea behind how search engines like Google rank matching documents for your
query using the Vector Space Model.
