Abdullaahi Faisal Faarah BSCS/014/2021
1. Boolean Model in Information Retrieval
The Boolean model is one of the earliest and simplest approaches to
information retrieval. It is based on Boolean algebra and treats
documents as sets of terms. Queries are formulated as Boolean
expressions using logical operators such as AND, OR, and NOT.
Representation: Each document is represented as a binary vector
indicating the presence or absence of each term; term frequency is
ignored.
Retrieval Mechanism: A document is retrieved only if it exactly
satisfies the Boolean query expression, so the result is an unranked
set with no partial matches.
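The retrieval mechanism above can be sketched with ordinary set operations. This is a minimal illustration, not a full query parser: the three toy documents, the helper name postings, and the example queries are all assumptions made for the example.

```python
# Toy collection; the documents and queries are illustrative assumptions.
docs = {
    "d1": "information retrieval with boolean queries",
    "d2": "vector space retrieval and ranking",
    "d3": "boolean algebra for logic circuits",
}

# Each document becomes a set of terms, which is equivalent to a binary vector.
doc_terms = {doc_id: set(text.split()) for doc_id, text in docs.items()}
all_ids = set(doc_terms)


def postings(term):
    """Return the ids of all documents that contain the term (hypothetical helper)."""
    return {doc_id for doc_id, terms in doc_terms.items() if term in terms}


# AND is set intersection, OR is set union, NOT is complement against all ids.
and_result = postings("boolean") & postings("retrieval")
or_result = postings("boolean") | postings("retrieval")
not_result = postings("retrieval") & (all_ids - postings("boolean"))

print(sorted(and_result))  # documents matching: boolean AND retrieval
print(sorted(or_result))   # documents matching: boolean OR retrieval
print(sorted(not_result))  # documents matching: retrieval AND NOT boolean
```

A document either belongs to the result set or it does not; nothing in this sketch orders the matches.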
2. Vector Space Model (VSM) in Information Retrieval
The Vector Space Model, introduced by Salton et al., represents
documents and queries as vectors in a multi-dimensional space where
each dimension corresponds to a term from the document collection.
Representation: Documents and queries are represented as
weighted term vectors, typically using TF-IDF (Term
Frequency–Inverse Document Frequency) to assign weights.
Retrieval Mechanism: The similarity between a query and a
document is calculated using cosine similarity or another
similarity measure, and documents are ranked in decreasing order of
similarity.
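As a rough sketch of the representation and retrieval steps above, the following computes TF-IDF vectors and cosine similarity in plain Python. The toy documents, the query, and the particular weighting (raw term frequency times log(N / df)) are assumptions chosen for the example; many other TF-IDF variants are in use.

```python
import math
from collections import Counter

# Toy collection and query; both are illustrative assumptions.
docs = {
    "d1": "information retrieval ranks documents",
    "d2": "vector space model for retrieval",
    "d3": "cooking recipes for pasta",
}
query = "vector retrieval"

tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
n_docs = len(tokenized)

# Document frequency: the number of documents in which each term occurs.
df = Counter(term for tokens in tokenized.values() for term in set(tokens))


def tfidf(tokens):
    """Weight = raw tf * log(N / df); terms absent from the collection are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_vectors = {doc_id: tfidf(tokens) for doc_id, tokens in tokenized.items()}
query_vector = tfidf(query.split())

scores = {doc_id: cosine(query_vector, vec) for doc_id, vec in doc_vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Unlike the Boolean model, a document that matches only part of the query still receives a nonzero score and a position in the ranking.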
3. Probabilistic Models in Information Retrieval
Probabilistic models capture the uncertainty in the relevance of
documents to a given query by estimating the probability that each
document is relevant.
Basic Model: The Binary Independence Model (BIM) is the
foundation. Documents are represented as binary term vectors and
ranked by their probability of being relevant to a query, computed
with Bayes' theorem under the assumption that terms occur
independently of one another.
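Ranking Function: Documents are ranked by computing a
retrieval status value (RSV), derived from the odds of relevance and
typically expressed as a sum of per-term log odds-ratio weights over
the query terms that occur in the document.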
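One way to make the RSV concrete is the sketch below. It assumes that relevance judgments are available (here a single hypothetical relevant document) and estimates the term probabilities with 0.5 smoothing, a common textbook choice; the toy collection, the query, and the helper name term_weight are assumptions for illustration only.

```python
import math

# Toy collection as binary term sets; all data here is an illustrative assumption.
docs = {
    "d1": {"probabilistic", "retrieval", "model"},
    "d2": {"retrieval", "evaluation"},
    "d3": {"probabilistic", "graphical", "model"},
    "d4": {"cooking", "pasta"},
}
relevant = {"d1"}  # hypothetical relevance judgments for the query
query_terms = {"probabilistic", "retrieval"}

N = len(docs)      # collection size
R = len(relevant)  # number of known relevant documents


def term_weight(term):
    """Log odds-ratio weight of a term, with 0.5 smoothing (hypothetical helper)."""
    n_t = sum(1 for terms in docs.values() if term in terms)
    r_t = sum(1 for doc_id in relevant if term in docs[doc_id])
    p = (r_t + 0.5) / (R + 1)            # P(term present | relevant)
    u = (n_t - r_t + 0.5) / (N - R + 1)  # P(term present | non-relevant)
    return math.log((p * (1 - u)) / (u * (1 - p)))


# RSV: sum the weights of the query terms that occur in the document.
rsv = {
    doc_id: sum(term_weight(t) for t in query_terms if t in terms)
    for doc_id, terms in docs.items()
}
ranking = sorted(rsv, key=rsv.get, reverse=True)
print(ranking)
```

Because the weights are summed term by term, the independence assumption is what allows the document score to decompose in this way.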