Module 2
Classic model: Boolean model: Extended set-theoretic model
• In the Boolean model of information retrieval:
• Documents and queries are both represented as sets of index terms
(important keywords).
• It uses Boolean logic: AND, OR, and NOT to match queries with documents.
• Since it treats terms as sets and applies logical operations, it is called a set-theoretic model.
• Example:
• Suppose we have 3 documents:
• D1: "machine learning, data, algorithm" → {machine, learning, data,
algorithm}
• D2: "deep learning, neural networks, AI" → {deep, learning, neural,
networks, AI}
• D3: "statistics, data, probability" → {statistics, data, probability}
• Query:
• Q = "data AND learning"
• This means we want documents that have both "data" and "learning".
• 🔹 Evaluation using Boolean logic:
• D1 has both "data" AND "learning" → ✅ (relevant)
• D2 has "learning" but not "data" → ❌
• D3 has "data" but not "learning" → ❌
• Only D1 matches the query.
• Conclusion:
• The Boolean model treats document terms as sets and uses set
operations (like intersection for AND) to find matches.
That's why it's called a set-theoretic model.
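The matching above can be sketched directly with Python sets; this is a minimal illustration of AND as set intersection, not part of the original notes:

```python
# Each document is represented as a set of index terms (from the example above).
D1 = {"machine", "learning", "data", "algorithm"}
D2 = {"deep", "learning", "neural", "networks", "AI"}
D3 = {"statistics", "data", "probability"}

# Query: "data AND learning" -> every query term must appear in the document.
query = {"data", "learning"}

def matches(doc_terms, query_terms):
    # AND corresponds to set containment: the query terms must be a
    # subset of the document's term set (equivalently, the intersection
    # must equal the query set).
    return query_terms.issubset(doc_terms)

results = [name for name, doc in [("D1", D1), ("D2", D2), ("D3", D3)]
           if matches(doc, query)]
# results == ["D1"]  -- only D1 contains both "data" and "learning"
```

Replacing `issubset` with `query_terms & doc_terms` (intersection) and testing for non-emptiness would instead model the OR operator.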
In the vector model, documents and queries are represented as vectors in a t-dimensional space; as a result, it is an algebraic model.
• In the Vector Space Model (VSM) of information retrieval:
• Documents and queries are represented as vectors in a t-dimensional
space, where t is the total number of index terms (keywords).
• Each term is a dimension, and the value in each dimension is a weight
(e.g., term frequency or TF-IDF).
• The similarity between a document and a query is calculated using
algebraic methods, such as cosine similarity.
• Therefore, it is called an algebraic model.
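The TF-IDF weighting mentioned above can be sketched as follows; the toy corpus and the simple tf × ln(N/df) formula here are illustrative assumptions, not part of the notes:

```python
import math

# Toy corpus: each document is a list of term occurrences (assumed example).
docs = {
    "D1": ["data", "data", "machine", "learning"],
    "D2": ["deep", "learning", "neural"],
    "D3": ["statistics", "data", "probability"],
}
N = len(docs)  # total number of documents

def tf_idf(term, doc_terms):
    tf = doc_terms.count(term)                     # raw term frequency in this doc
    df = sum(term in d for d in docs.values())     # number of docs containing the term
    idf = math.log(N / df)                         # inverse document frequency
    return tf * idf

# "data" occurs twice in D1 and appears in 2 of the 3 documents:
w = tf_idf("data", docs["D1"])  # 2 * ln(3/2)
```

Rare terms (high idf) get boosted, while terms appearing in every document get weight 0, since ln(N/N) = 0.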
• Example:
• Let’s say our index terms (vocabulary) are:
[data, machine, learning, AI] → 4 terms → 4D vector space
• Document D1:
• Contains: data (2 times), machine (1), learning (1), AI (0)
→ Vector: D1 = [2, 1, 1, 0]
• Query Q:
• Contains: data (1), learning (1)
→ Vector: Q = [1, 0, 1, 0]
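The similarity between D1 and Q above can then be computed with cosine similarity; a minimal Python sketch (the implementation is an assumption, only the vectors come from the example):

```python
import math

# Vectors over the vocabulary [data, machine, learning, AI] from the example.
D1 = [2, 1, 1, 0]
Q  = [1, 0, 1, 0]

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim = cosine(D1, Q)  # 3 / (sqrt(6) * sqrt(2)) ≈ 0.866
```

A value near 1 means the document and query point in nearly the same direction in term space; documents can then be ranked by this score rather than just matched or rejected.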
Conclusion:
• Each document and query is a point/vector in a multi-dimensional space.
• Matching is based on algebraic similarity, not just keyword overlap.
• Hence, it is called an algebraic model.