2(d) Vector Space Model
The Vector Space Model is like a magic sorting robot in the library. It looks at all the books (or
documents) and tries to figure out which ones are most related to the words you’re searching for —
like "space" and "adventures." Here's how it works:
How it Works:
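1. Words as Directions:
Every different word in the books gets its own direction in space. For example, the word "space" is one direction, and the word "adventure" is another.
2. Books as Arrows:
Each book becomes an arrow. The more often a book uses a word, the further its arrow stretches in that word's direction.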
3. Comparing Directions:
When you ask, "Show me books about space adventures," the robot compares the arrows of
all the books to the direction of your search (another arrow). The more closely a book's arrow points in the same direction as your search arrow, the better the match!
Example:
What happens:
The robot sees that Book 1 talks a lot about "space," while Book 3 talks about both "space" and "adventure." Book 2 doesn't match much.
In the ranking, Book 1 comes second.
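To make the arrow picture concrete, here is a minimal Python sketch. The word counts are hypothetical (the example does not give the books' contents) and are chosen only to mirror the description: Book 1 mostly mentions "space", Book 3 mentions both words, and Book 2 is about something else. The three axes ("space", "adventure", and an assumed "cooking") are illustrative, and the sketch measures the angle between each book's arrow and the search arrow.

```python
import math

# Hypothetical axes: one direction per word.
AXES = ["space", "adventure", "cooking"]

# Hypothetical word counts for each book (not from the original notes).
books = {
    "Book 1": [5, 1, 0],  # talks a lot about "space"
    "Book 2": [0, 0, 4],  # about something else entirely
    "Book 3": [3, 3, 0],  # talks about both "space" and "adventure"
}

# The search "space adventures" points equally toward both words.
query = [1, 1, 0]

def angle_degrees(a, b):
    """Angle between two arrows; a smaller angle means a better match."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against tiny floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Rank books by how closely their arrows point toward the search arrow.
ranking = sorted(books, key=lambda name: angle_degrees(books[name], query))
for name in ranking:
    print(f"{name}: {angle_degrees(books[name], query):.1f} degrees")
```

With these assumed counts, Book 3 points exactly along the search arrow (0 degrees), Book 1 is about 33.7 degrees away, and Book 2 is at 90 degrees because it shares no words with the search.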
Let’s use the Vector Space Model to find which document is most relevant to your search.
Steps to Solve:
Document vectors (one entry per vocabulary word):
Doc 1: [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Doc 2: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
Doc 3: [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1]
Cosine similarity with the query:
Doc 1: 0.316
Doc 2: 0
Final Answer:
The query "cat dog" matches best with Document 3, followed by Document 1. Document 2 is
irrelevant.
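The scores above are cosine similarities between each document vector and the query vector. The word list behind the vectors is not shown, so this sketch assumes, hypothetically, that "cat" and "dog" sit at vocabulary positions 0 and 1, giving the query vector [1, 1, 0, ..., 0]. Under that assumption it reproduces Doc 1 = 0.316 and Doc 2 = 0, and gives Doc 3 about 0.577, which is why Doc 3 ranks first.

```python
import math

# Document vectors copied from the example above.
docs = {
    "Doc 1": [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Doc 2": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "Doc 3": [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1],
}

# Hypothetical query vector: assumes "cat" and "dog" are vocabulary
# positions 0 and 1 (the original word list is not given).
query = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

scores = {name: cosine(vec, query) for name, vec in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Doc 1 shares one query word and has five words in total, so its score is 1 / (√5 · √2) ≈ 0.316; under the assumed positions Doc 3 contains both query words out of six, so it scores 2 / (√6 · √2) ≈ 0.577.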
Example Question:
Let’s use the Vector Space Model step by step to find which document is most relevant to your
search.
Steps to Solve:
Document vectors (one entry per vocabulary word):
Doc 1: [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Doc 2: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
Doc 3: [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
Cosine similarity with the query:
Doc 3: 0.354
Doc 2: 0.316
Final Answer:
The query "pizza pasta" matches best with Document 1, followed by Document 3, and finally Document 2.
A Vector Space Model (VSM) is a way to represent text as mathematical vectors so that we can
compare documents and find the most relevant ones. It is used in search engines, document
ranking, and information retrieval.
Instead of treating words as just text, VSM converts them into numbers (vectors) so that documents can be compared efficiently by the words they contain.
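Converting text into numbers can be as simple as counting words. The sketch below builds a shared vocabulary and turns each document into a vector of word counts (a "bag of words"). The lowercase-and-keep-letters tokenizer and the first sample sentence are illustrative assumptions; only "Bananas and mangoes are delicious" comes from these notes. Real systems often add stemming, stop-word removal, or TF-IDF weighting.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(texts):
    """Sorted list of every distinct word; each word becomes one axis."""
    return sorted({word for text in texts for word in tokenize(text)})

def to_vector(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    words = tokenize(text)
    return [words.count(term) for term in vocabulary]

texts = [
    "I love bananas",                     # hypothetical example document
    "Bananas and mangoes are delicious",  # Doc2 from the notes
]
vocab = build_vocabulary(texts)
print(vocab)
for text in texts:
    print(to_vector(text, vocab), "<-", text)
```

Each position in a vector corresponds to the word at the same position in the vocabulary, which is what lets two documents be compared entry by entry.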
How Do We Compare Documents?
We can measure similarity between documents using the cosine similarity formula (like checking
the angle between two vectors).
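Written out, the cosine similarity between two term vectors A and B (one entry per vocabulary word, n words in total) is their dot product divided by the product of their lengths:

```latex
\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
             = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
```

For word-count vectors every entry is non-negative, so the score runs from 0 (no shared words, a 90-degree angle) to 1 (the same direction). For example, a document vector with five 1s that shares one word with a two-word query scores 1 / (√5 · √2) = 1/√10 ≈ 0.316.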
Doc1 vs. Doc2 (Do they share any words? Yes! Both contain "bananas".)
Doc1 vs. Doc3 (Do they share any words? Yes! Both contain "love".)
Doc2 vs. Doc3 (Do they share any words? Yes! Both contain "mangoes".)
The smaller the angle between two document vectors (that is, the higher their cosine similarity), the more similar the documents are.
1️⃣ The search engine converts your query ("delicious mangoes") into a vector:
Query Vector: [0, 0, 1, 0, 1, 0]
2️⃣ It computes the cosine similarity between the query vector and each document vector.
3️⃣ The most similar document (Doc2: "Bananas and mangoes are delicious") is ranked highest
because it has both "mangoes" and "delicious"!
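The three steps can be put together in a short end-to-end sketch. Only Doc2's text ("Bananas and mangoes are delicious") comes from the notes; the texts for Doc1 and Doc3 are hypothetical stand-ins built from the shared words mentioned above ("bananas" and "love" in Doc1, "love" and "mangoes" in Doc3). Because the vocabulary is built from these assumed texts, the vectors here do not match the 6-entry query vector shown above.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

def to_vector(text, vocabulary):
    """Word-count vector over a fixed vocabulary."""
    words = tokenize(text)
    return [words.count(term) for term in vocabulary]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = {
    "Doc1": "I love bananas",                     # hypothetical
    "Doc2": "Bananas and mangoes are delicious",  # from the notes
    "Doc3": "I love mangoes",                     # hypothetical
}
query = "delicious mangoes"

# Step 1: build the vocabulary and convert the query to a vector.
vocabulary = sorted({w for text in documents.values() for w in tokenize(text)})
query_vec = to_vector(query, vocabulary)

# Step 2: score every document against the query.
scores = {name: cosine(to_vector(text, vocabulary), query_vec)
          for name, text in documents.items()}

# Step 3: rank documents from most to least similar.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

With these assumed texts, Doc2 scores about 0.632 (it contains both query words), Doc3 about 0.408 (only "mangoes"), and Doc1 scores 0, so Doc2 is ranked highest.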
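This is the basic idea behind how a search engine can find the best-matching documents for a query using the Vector Space Model; large engines such as Google combine text matching with many other ranking signals.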