Chapter 4
Systems
Chapter Four- Part I
Template for Search Engine Evaluation
Task
Cover page (course title, group members’ names, search engines’ names)
Introduction (only one page)
How and why you selected the search engines for the comparison, and a
brief description of each search engine
Comparison (only one page)
Use the table on the next page
Conclusion (only one page)
Discussion of the key findings you want to emphasize
References
File naming convention: team leader’s name followed by the first letter of
his/her father’s name, then -UG-IR-SEE
Example: tibebeb-UG-IR-SEE
Criteria SE1 SE2 SE3
Searching options
Stemming technique
Ranking approach
Similarity measure approach
User interface
Relevance feedback mechanism
Others
Objectives
Topics
Overview
Boolean/Logical Model
Vector Space Model (VSM)
Modeling of Modern IR Systems
Cont…
Ranking is an ordering of the documents retrieved
that (hopefully) reflects the relevance of the
documents to the user query.
Ranking is based on fundamental premises
regarding the notion of relevance, such as:
common sets of index terms
sharing of weighted terms
likelihood of relevance
Cont…
Each distinct set of premises (regarding document
relevance) leads to a distinct IR model.
A model is an idealization or abstraction of an actual
process (here, retrieval).
It represents something that exists or is
planned in the real world and that is in some way
too complex or large for us to understand
as it stands.
What is a retrieval model?
What an IR Model includes
Two elements
The retrieval mechanism: used to match query with a set
of documents
The ways in which the user’s information need can be
formulated as a query that can be searched by that
mechanism
Thus, a retrieval model specifies the details of:
Document representation
Query representation
Retrieval function
Building a Model
To build a model, we first need to think about the
representations of the documents and of the user’s
information need.
Given these representations, the next step is to conceive
a framework in which they can be modeled.
This framework should also provide the intuition for
constructing a ranking function.
In the Boolean model, the framework is composed of sets
of documents and the standard operations on sets.
For the vector space model, the framework is composed
of a t-dimensional vector space and standard linear
algebra operations on vectors.
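As a rough illustration of the two frameworks, the sketch below represents the same toy documents once as sets of terms (Boolean model) and once as t-dimensional vectors (vector space model). The vocabulary, the documents, and the use of raw term counts with a dot product are assumptions made for this example only, not something the slides prescribe.

```python
# Toy vocabulary of t = 3 index terms (an assumption for illustration).
VOCAB = ["information", "retrieval", "system"]

# Hypothetical documents, already tokenized.
DOCS = {
    "d1": ["information", "retrieval"],
    "d2": ["retrieval", "system", "retrieval"],
}


def as_set(tokens):
    """Boolean framework: a document is a set of index terms."""
    return set(tokens)


def as_vector(tokens):
    """Vector space framework: a document is a t-dimensional vector.

    Raw term counts are used here purely for simplicity.
    """
    return [tokens.count(term) for term in VOCAB]


def dot(u, v):
    """A standard linear algebra operation that a ranking function can use."""
    return sum(a * b for a, b in zip(u, v))


query = ["retrieval", "system"]

# Boolean framework: set operations give a yes/no answer per document.
boolean_hits = sorted(
    name for name, toks in DOCS.items() if as_set(query) <= as_set(toks)
)

# Vector framework: vector operations give a score per document.
scores = {name: dot(as_vector(query), as_vector(toks)) for name, toks in DOCS.items()}

print(boolean_hits)  # ['d2']
print(scores)        # {'d1': 1, 'd2': 3}
```

The set-based version can only say whether a document matches, while the vector-based version yields numbers that can be sorted into a ranking.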
Boolean/Logical model
It is a simple retrieval model based on set theory and
Boolean algebra; it is more about retrieval than about
document representation.
Documents and queries are represented as sets of index
terms.
It provides a framework that is easy to grasp for a
common user of an IR system.
Cont…
Index term     Documents
Information    1, 2, 3, 4
Storage        1
Retrieval      1, 2, 4
System         2
Processing     3
Management     3
Archives       4
Suppose our query is: information AND retrieval.
Which documents will be retrieved based on the Boolean
model?
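One way to check the answer is to treat the table as an inverted index and intersect the two posting sets. This is a minimal sketch; the dictionary layout and the helper name `boolean_and` are choices made for this example.

```python
# Inverted index copied from the table: index term -> document numbers.
index = {
    "information": {1, 2, 3, 4},
    "storage": {1},
    "retrieval": {1, 2, 4},
    "system": {2},
    "processing": {3},
    "management": {3},
    "archives": {4},
}


def boolean_and(term_a, term_b):
    """Return the documents that contain both terms (set intersection)."""
    return index.get(term_a, set()) & index.get(term_b, set())


result = sorted(boolean_and("information", "retrieval"))
print(result)  # [1, 2, 4]
```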
Cont…
The basic assumption is that there is a domain, and
both the author of the document and the readers
belong to that domain; at any one time the domain
has t index terms.
What this means is that any document in the
domain is written in these terms.
Relevance – matching
Matching, as a concept, is the degree of similarity
between a document D and a query Q.
The degree of similarity determines the degree of
closeness between D and Q.
The more terms the query and the document share,
the more likely it is that the author and the user
are talking about the same thing.
Cont…
Boolean model’s matching considers that index terms
are either present or absent in a document.
Thus, the index term weights are assumed to be all
binary.
That is, wij ∈ {0, 1}
A query q is composed of index terms linked by the
three connectives NOT, AND, and OR, for example:
q = ka ∧ (kb ∨ ¬kc)
Thus, intersection:
takes operands and returns their degree of commonness
is a function that counts the number of matches
How do you explain the
essence of relevance in an IR system
designed using the Boolean model?
Example:
Cont…
2. Query: Find all documents that do not contain
“information”
This is a query which attempts to find documents
that do not contain a particular pattern
Boolean expression (representation)
NOT information
Result
A set whose elements are all documents that
do not contain the pattern “information”
Cont…
Most queries search for more than one term
Find all documents containing “information” and
“retrieval”
Find all documents containing “information” or
“retrieval” (or both)
Find all documents containing “information” or
“retrieval”, but not both
Each of the three queries illustrates a particular
concept that may form a Boolean expression, namely
conjunction, disjunction, and exclusive disjunction, respectively.
Cont…
Boolean expressions may be formed from other
Boolean expressions to yield complex structures.
Query
Find all documents containing both “information” and
“retrieval”, or not containing both “retrieval”
and “science”
Boolean expression
(information AND retrieval) OR NOT (retrieval AND science)
Parentheses avoid ambiguity.
The following list defines how to evaluate Boolean
operators in terms of sets, where U is the set of all
docs and d1, d2 are the sets of docs containing p1, p2:
U − d1 is the set of all docs not containing p1 (NOT)
d1 ∩ d2 is the set of all docs containing both
p1 and p2 (AND)
d1 ∪ d2 is the set of all docs containing
either p1 or p2 (OR)
(d1 ∪ d2) − (d1 ∩ d2) is the set of all docs
containing either p1 or p2, but not both (XOR)
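The four definitions map directly onto set operations. The sketch below is one possible rendering; the function names and the small document collection are hypothetical and chosen only to make the example runnable.

```python
def docs_not(universe, d1):
    """NOT: all docs in the collection that are not in d1 (U - d1)."""
    return universe - d1


def docs_and(d1, d2):
    """AND: docs in both sets (intersection)."""
    return d1 & d2


def docs_or(d1, d2):
    """OR: docs in at least one of the sets (union)."""
    return d1 | d2


def docs_xor(d1, d2):
    """XOR: docs in exactly one of the sets (union minus intersection)."""
    return (d1 | d2) - (d1 & d2)


# Hypothetical collection: U is every doc, d1/d2 hold docs containing p1/p2.
U = {"D1", "D2", "D3", "D4"}
d1 = {"D1", "D2"}
d2 = {"D2", "D3"}

print(sorted(docs_not(U, d1)))   # ['D3', 'D4']
print(sorted(docs_and(d1, d2)))  # ['D2']
print(sorted(docs_or(d1, d2)))   # ['D1', 'D2', 'D3']
print(sorted(docs_xor(d1, d2)))  # ['D1', 'D3']
```

Note that NOT is the only operator that needs the whole collection U, which is why a bare NOT query can return almost everything.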
Cont…
Thus, in the Boolean model,
the use of AND requires that both terms that it
connects be present in the retrieved documents, and
the use of OR requires that at least one of the terms
be present.
This is an inclusive use of OR, meaning that it is
acceptable for both of the terms to be present.
Cont…
If an exclusive use of OR is desired (one term or the
other, but not both), the construction is more
complex:
(A AND NOT B) OR (B AND NOT A)
or
(A OR B) AND NOT (A AND B)
NOT requires that the specified term be absent from
any retrieved document.
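That the two exclusive-OR constructions are equivalent can be checked by enumerating all four combinations of A and B. This is a small sketch; the function names are made up for the example.

```python
from itertools import product


def xor_form_1(a, b):
    """(A AND NOT B) OR (B AND NOT A)"""
    return (a and not b) or (b and not a)


def xor_form_2(a, b):
    """(A OR B) AND NOT (A AND B)"""
    return (a or b) and not (a and b)


# Truth table rows: (A, B, result of form 1, result of form 2).
table = [
    (a, b, xor_form_1(a, b), xor_form_2(a, b))
    for a, b in product([False, True], repeat=2)
]

for row in table:
    print(row)
```

Both result columns agree on every row, and each is true exactly when A and B differ.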
Exercise: Consider a set of five docs and assume that
they contain the terms shown in the table.
Doc.   Terms
D1     Algorithm, information, retrieval
D2     Retrieval, science
D3     Algorithm, information, science
D4     Pattern, retrieval, science
D5     Science, algorithm
Advantages of the Boolean model
Simplicity
Limitations (Drawbacks) of the Boolean Model
Cont…
In the Boolean model, no ranking of the documents is
provided (absence of a grading scale).
Because all retrieved documents are considered equal,
the retrieved set is not ordered.
All retrieved documents are presumed to be equally useful.
There is no mechanism to show the relative importance of
the different components of a query.
Cont…
Query formulation using the Boolean operators is
difficult.
Boolean expressions have precise semantics.
Thus, it is not simple to translate an information
need into a Boolean expression; most users find
this translation awkward.
Cont…
“NOT”, for instance, retrieves every document that does
not contain a specific term.
A query such as ‘NOT aardvark’ runs the risk of retrieving
virtually the entire database.
Another point is that the separation between retrieved and
non-retrieved documents is too strict. That means:
For q = t1 ∧ t2 ∧ t3, documents containing two of the
terms are rejected just like those containing none.
Analogously, for q = t1 ∨ t2 ∨ t3, there is no ordering within
the retrieved documents.
Generally, it has poor retrieval quality, and its main problem
is the inability to recognize partial matches, which
frequently leads to poor performance.
Exercise
Given the following four documents with the following
contents:
D1 = “computer information retrieval”
D2 = “computer retrieval”
D3 = “information”
D4 = “computer information”
What are the relevant documents retrieved for the
queries:
Q1 = “information retrieval”
Q2 = “information ¬computer”
The Boolean Model: Example
• Given the following, determine the documents retrieved by a Boolean
model based IR system.
• Index Terms: K1, …, K8.
• Documents:
1. D1 = {K1, K2, K3, K4, K5}
2. D2 = {K1, K2, K3, K4}
3. D3 = {K2, K4, K6, K8}
4. D4 = {K1, K3, K5, K7}
5. D5 = {K4, K5, K6, K7, K8}
6. D6 = {K1, K2, K3, K4}
• Query: K1 ∧ (K2 ∨ ¬K3)
• Answer: {D1, D2, D4, D6} ∩ ({D1, D2, D3, D6} ∪ {D3, D5})
= {D1, D2, D6}
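The answer can be reproduced step by step with set operations. This sketch assumes the query reads K1 AND (K2 OR NOT K3), which is the reading consistent with the three intermediate sets shown in the answer; the helper name `containing` is made up for the example.

```python
# Documents from the example, each as a set of index terms.
docs = {
    "D1": {"K1", "K2", "K3", "K4", "K5"},
    "D2": {"K1", "K2", "K3", "K4"},
    "D3": {"K2", "K4", "K6", "K8"},
    "D4": {"K1", "K3", "K5", "K7"},
    "D5": {"K4", "K5", "K6", "K7", "K8"},
    "D6": {"K1", "K2", "K3", "K4"},
}
universe = set(docs)


def containing(term):
    """Return the names of all documents that contain the given index term."""
    return {name for name, terms in docs.items() if term in terms}


k1 = containing("K1")                 # {D1, D2, D4, D6}
k2 = containing("K2")                 # {D1, D2, D3, D6}
not_k3 = universe - containing("K3")  # {D3, D5}

# Assumed query: K1 AND (K2 OR NOT K3)
answer = k1 & (k2 | not_k3)
print(sorted(answer))  # ['D1', 'D2', 'D6']
```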
The Boolean Model: Example
Given the following three documents, construct the term-document matrix and
find the relevant documents retrieved by the Boolean model for the given
query.
• D1: “Shipment of gold damaged in a fire”
• D2: “Delivery of silver arrived in a silver truck”
• D3: “Shipment of gold arrived in a truck”
• Query: “gold silver truck”
Use the table below for the term-document matrix.
• Find the relevant documents for the queries (use AND, OR):
(a) gold delivery
(b) ship gold
(c) silver truck
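A binary term-document matrix for these three documents can be built as sketched below. The tokenization (lowercasing and splitting on whitespace, with no stemming and no stop-word removal) is an assumption for this example, and the helper names are hypothetical.

```python
documents = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}

# Assumed tokenization: lowercase, split on whitespace, keep every word.
tokens = {name: set(text.lower().split()) for name, text in documents.items()}
terms = sorted(set().union(*tokens.values()))

# Binary matrix: one row per term, one column per document (D1, D2, D3).
matrix = {term: [int(term in tokens[name]) for name in documents] for term in terms}


def docs_with(term):
    """Documents whose row entry for this term is 1."""
    return {name for name in documents if term in tokens[name]}


def query_and(*query_terms):
    """Documents containing all of the query terms."""
    return set.intersection(*(docs_with(t) for t in query_terms))


def query_or(*query_terms):
    """Documents containing at least one of the query terms."""
    return set.union(*(docs_with(t) for t in query_terms))


for term in ("gold", "silver", "truck"):
    print(term, matrix[term])

print(sorted(query_and("silver", "truck")))  # ['D2']
print(sorted(query_or("silver", "truck")))   # ['D2', 'D3']
```

Under this tokenization the term "ship" matches nothing, because the documents contain "shipment" and no stemming is applied; the other queries can be checked the same way with `query_and` and `query_or`.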