
Module 2

Classic models: the Boolean model (a set-theoretic model)
• In the Boolean model of information retrieval:
• Documents and queries are both represented as sets of index terms
(important keywords).
• It uses Boolean logic: AND, OR, and NOT to match queries with documents.
• Since it treats terms as sets and applies logical operations, it is called a set-theoretic model.
• Example:
• Suppose we have 3 documents:
• D1: "machine learning, data, algorithm" → {machine, learning, data,
algorithm}
• D2: "deep learning, neural networks, AI" → {deep, learning, neural,
networks, AI}
• D3: "statistics, data, probability" → {statistics, data, probability}
• Query:
• Q = "data AND learning"
• This means we want documents that have both "data" and "learning".
• 🔹 Evaluation using Boolean logic:
• D1 has both "data" AND "learning" → ✅ (relevant)
• D2 has "learning" but not "data" → ❌
• D3 has "data" but not "learning" → ❌
• Only D1 matches the query.
• Conclusion:
• The Boolean model treats document terms as sets and uses set
operations (like intersection for AND) to find matches.
That's why it's called a set-theoretic model.
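The AND query above can be sketched directly with Python set operations (the document contents come from the example; the function and variable names are illustrative, not from the original):

```python
# Boolean retrieval over the three example documents.
# Each document is represented as a set of index terms.
docs = {
    "D1": {"machine", "learning", "data", "algorithm"},
    "D2": {"deep", "learning", "neural", "networks", "AI"},
    "D3": {"statistics", "data", "probability"},
}

def boolean_and(query_terms, docs):
    """Return IDs of documents containing ALL query terms (AND = intersection)."""
    return sorted(doc_id for doc_id, doc_terms in docs.items()
                  if set(query_terms) <= doc_terms)  # subset test

# Query: "data AND learning" -> only D1 contains both terms.
print(boolean_and({"data", "learning"}, docs))  # ['D1']
```

An OR query would use set union-style membership (any term present), and NOT would exclude documents containing the term.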
In the vector model, documents and queries are represented as vectors in a t-dimensional space; as a result, the model is an algebraic model.

• In the Vector Space Model (VSM) of information retrieval:


• Documents and queries are represented as vectors in a t-dimensional
space, where t is the total number of index terms (keywords).
• Each term is a dimension, and the value in each dimension is a weight
(e.g., term frequency or TF-IDF).
• The similarity between a document and a query is calculated using
algebraic methods, such as cosine similarity.
• Therefore, it is called an algebraic model.
• Example:
• Let’s say our index terms (vocabulary) are:
[data, machine, learning, AI] → 4 terms → 4D vector space
• Document D1:
• Contains: data (2 times), machine (1), learning (1), AI (0)
→ Vector: D1 = [2, 1, 1, 0]
• Query Q:
• Contains: data (1), learning (1)
→ Vector: Q = [1, 0, 1, 0]
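Using these two vectors, the cosine similarity mentioned above can be computed with a short sketch (standard library only; the helper name is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

D1 = [2, 1, 1, 0]   # weights for [data, machine, learning, AI]
Q  = [1, 0, 1, 0]   # query contains data and learning

# dot = 2*1 + 1*0 + 1*1 + 0*0 = 3; |D1| = sqrt(6); |Q| = sqrt(2)
print(round(cosine_similarity(D1, Q), 3))  # 0.866
```

A score of about 0.866 (close to 1) indicates D1 is a strong match for Q; a document sharing no terms with the query would score 0.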
Conclusion:
• Each document and query is a point/vector in a multi-dimensional space.
• Matching is based on algebraic similarity, not just keyword overlap.
• Hence, it is called an algebraic model.
