Information Retrieval Basics and Advances

This document provides an introduction to information retrieval (IR). It discusses how IR systems work by indexing a corpus and retrieving the documents relevant to a user query. Key components of IR systems include text processing, indexing, searching, ranking, and the user interface. The history and development of IR are then outlined, from early keyword-based systems to current advances in web search, question answering, and learning techniques. Related fields that influence IR, such as database management, library science, artificial intelligence, and natural language processing, are also mentioned.

Uploaded by

Saad Bin Shahid

Information Retrieval

Introduction

Dr Sharifullah Khan
NUST SEECS

1
Acknowledgement

These slides have been borrowed from:
Professor Dr. Raymond J. Mooney
Computer Science, The University of Texas at Austin, USA
https://www.cs.utexas.edu/users/mooney/

2
Information Retrieval (IR)

The indexing and retrieval of textual documents.
Searching for pages on the World Wide Web is the killer app.
Concerned firstly with retrieving documents relevant to a query.
Concerned secondly with retrieving from large sets of documents efficiently.

3
Typical IR Task

Given:
    A corpus of textual natural-language documents.
    A user query in the form of a textual string.
Find:
    A ranked set of documents that are relevant to the query.

4
IR System

[Figure: a query string and a document corpus are input to the IR system, which outputs ranked documents (1. Doc1, 2. Doc2, 3. Doc3, ...).]

5
Relevance

Relevance is a subjective judgment and may include:
    Being on the proper subject.
    Being timely (recent information).
    Being authoritative (from a trusted source).
    Satisfying the goals of the user and his/her intended use of the information (information need).

6
Keyword Search

The simplest notion of relevance is that the query string appears verbatim in the document.
A slightly less strict notion is that the words in the query appear frequently in the document, in any order (bag of words).
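
The two notions of relevance above can be contrasted in a few lines of Python. This is only a minimal sketch: the helper names (tokenize, verbatim_match, bag_of_words_score) and the sample sentence are assumptions made for illustration, not part of the original slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def verbatim_match(query, document):
    """Strict notion: the query string appears verbatim (ignoring case)."""
    return query.lower() in document.lower()


def bag_of_words_score(query, document):
    """Looser notion: total occurrences of the query words, in any order."""
    counts = Counter(tokenize(document))
    return sum(counts[word] for word in set(tokenize(query)))


# Hypothetical sample document.
doc = "The retrieval of text documents is called information retrieval."
print(verbatim_match("information retrieval", doc))      # True
print(verbatim_match("retrieval information", doc))      # False
print(bag_of_words_score("retrieval information", doc))  # 3
```

Reordering the query words breaks the verbatim match but leaves the bag-of-words score unchanged, which is the difference the slide describes.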

7
Problems with Keywords

May not retrieve relevant documents that include synonymous terms.
    "restaurant" vs. "café"
    "PRC" vs. "China"
May retrieve irrelevant documents that include ambiguous terms.
    "bat" (baseball vs. mammal)
    "Apple" (company vs. fruit)
    "bit" (unit of data vs. act of eating)
8
Beyond Keywords

We will cover the basics of keyword-based IR, but we will focus on extensions and recent developments that go beyond keywords.
We will cover the basics of building an efficient IR system, but we will focus on basic capabilities and algorithms rather than systems issues that allow scaling to industrial-size databases.
9
Intelligent IR

Taking into account the meaning of the words used.
Taking into account the order of words in the query.
Adapting to the user based on direct or indirect feedback.
Taking into account the authority of the source.

10
IR System Architecture

[Figure: architecture diagram. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Database Manager, Index (inverted file), Text Database. Labeled flows: User Need, Text, Logical View, User Feedback, Query, Retrieved Docs, Ranked Docs.]
11
IR System Components

Text Operations forms index words (tokens).
    Stopword removal
    Stemming
Indexing constructs an inverted index of word-to-document pointers.
Searching retrieves documents that contain a given query token from the inverted index.
Ranking scores all retrieved documents according to a relevance metric.
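
The four components above can be sketched end to end in Python. Everything here is an illustrative assumption rather than the course's implementation: the stopword list is a tiny made-up one, crude_stem is a toy suffix stripper (not the Porter stemmer), and the relevance metric is simply summed term frequency.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def crude_stem(token):
    """Toy stemmer: strip one common suffix. Not the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def text_operations(text):
    """Tokenize, drop stopwords, and stem to form index terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def build_index(corpus):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, freq in Counter(text_operations(text)).items():
            index[term][doc_id] = freq
    return index


def search_and_rank(index, query):
    """Searching and ranking: score documents by summed term frequency."""
    scores = Counter()
    for term in text_operations(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document corpus.
corpus = {
    "Doc1": "Indexing the documents in a corpus",
    "Doc2": "Searching and ranking documents",
    "Doc3": "The history of the web",
}
index = build_index(corpus)
print(search_and_rank(index, "ranking of documents"))
# [('Doc2', 2), ('Doc1', 1)]
```

Only documents that share at least one index term with the query are scored, so Doc3 never appears in the result. A real system would typically replace raw term frequency with a weighted metric.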

12
IR System Components (continued)

User Interface manages interaction with the user:
    Query input and document output.
    Relevance feedback.
    Visualization of results.
Query Operations transform the query to improve retrieval:
    Query expansion using a thesaurus.
    Query transformation using relevance feedback.
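
Query expansion using a thesaurus can be illustrated with a small Python sketch. The THESAURUS dictionary and the expand_query helper are hypothetical and hand-built for this example; a real system might draw synonyms from a resource such as WordNet or from term co-occurrence statistics.

```python
# Hypothetical hand-built thesaurus (an assumption for illustration).
THESAURUS = {
    "restaurant": ["cafe", "diner"],
    "prc": ["china"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms plus any listed synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("cheap restaurant"))
# ['cheap', 'restaurant', 'cafe', 'diner']
```

The expanded term list would then be passed to the search component in place of the original query, so that documents using a synonym can still be retrieved.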

13
Web Search

Application of IR to HTML documents on the World Wide Web.
Differences:
    Must assemble document corpus by spidering the web.
    Can exploit the structural layout information in HTML (XML).
    Documents change uncontrollably.
    Can exploit the link structure of the web.
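
As a rough illustration of the first, second, and last points, the sketch below uses Python's standard html.parser module to pull the title (structural information) and the outgoing links (link structure) from one page, which is the basic step a spider repeats. The class name LinkAndTitleParser, the sample page, and the example.edu and example.org URLs are assumptions made for this example. No network access is performed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collect the page title and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page content and base URL.
page = """<html><head><title>IR Course</title></head>
<body><a href="/slides/intro.html">Intro</a>
<a href="https://example.org/trec">TREC</a></body></html>"""

parser = LinkAndTitleParser("https://example.edu/ir/index.html")
parser.feed(page)
print(parser.title)
print(parser.links)
```

A spider would fetch each collected link in turn, adding new pages to the corpus, while the recorded links between pages provide the raw material for link analysis.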

14
Web Search System

[Figure: the Web feeds a spider that builds the document corpus; a query string and the corpus are input to the IR system, which outputs ranked documents (1. Page1, 2. Page2, 3. Page3, ...).]

15
Other IR-Related Tasks

Automated document categorization
Information filtering (spam filtering)
Information routing
Automated document clustering
Recommending information or products
Information extraction
Information integration
Question answering
16
History of IR

1960-70s:
    Initial exploration of text retrieval systems for small corpora of scientific abstracts, and law and business documents.
    Development of the basic Boolean and vector-space models of retrieval.
    Prof. Salton and his students at Cornell University are the leading researchers in the area.

17
IR History Continued

1980s:
    Large document database systems, many run by companies:
        Lexis-Nexis
        Dialog
        MEDLINE

18
IR History Continued

1990s:
    Searching FTPable documents on the Internet
        Archie
        WAIS
    Searching the World Wide Web
        Lycos
        Yahoo
        Altavista

19
IR History Continued

1990s continued:
    Organized Competitions
        NIST TREC
    Recommender Systems
        Ringo
        Amazon
        NetPerceptions
    Automated Text Categorization & Clustering

20
IR History Continued

2000s:
    Link analysis for Web Search
        Google
    Automated Information Extraction
    Parallel Processing
        Map/Reduce
    Question Answering
        TREC Q/A track

21
IR History Continued

2000s continued:
    Multimedia IR
        Image
        Video
        Audio and music
    Cross-Language IR
        DARPA Tides
    Document Summarization
    Learning to Rank

22
Recent IR History

2010s:
    Intelligent Personal Assistants
        Siri
        Cortana
        Google Now
        Alexa
    Complex Question Answering
        IBM Watson
    Distributional Semantics
    Deep Learning
23
Related Areas

Database Management
Library and Information Science
Artificial Intelligence
Natural Language Processing
Machine Learning

24
Database Management

Focused on structured data stored in relational tables rather than free-form text.
Focused on efficient processing of well-defined queries in a formal language (SQL).
Clearer semantics for both data and queries.
Recent move towards semi-structured data (XML) brings it closer to IR.

25
Library and Information Science

Focused on the human user aspects of information retrieval (human-computer interaction, user interface, visualization).
Concerned with effective categorization of human knowledge.
Concerned with citation analysis and bibliometrics (structure of information).
Recent work on digital libraries brings it closer to CS & IR.
26
Artificial Intelligence

Focused on the representation of knowledge, reasoning, and intelligent action.
Formalisms for representing knowledge and queries:
    First-order Predicate Logic
    Bayesian Networks
Recent work on web ontologies and intelligent information agents brings it closer to IR.
27
Natural Language Processing

Focused on the syntactic, semantic, and pragmatic analysis of natural language text and discourse.
Ability to analyze syntax (phrase structure) and semantics could allow retrieval based on meaning rather than keywords.

28
Natural Language Processing: IR Directions

Methods for determining the sense of an ambiguous word based on context (word sense disambiguation).
Methods for identifying specific pieces of information in a document (information extraction).
Methods for answering specific NL questions from document corpora or structured data like Freebase or Google's Knowledge Graph.
29
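
Word sense disambiguation can be illustrated with a very small Python sketch. It uses a simplified Lesk-style heuristic (pick the sense whose description shares the most words with the context), which is one possible approach and is not named in the slides. The SENSES dictionary and its glosses are invented for this example.

```python
# Hypothetical sense inventory with hand-written glosses (an assumption).
SENSES = {
    "bat": {
        "baseball": "wooden club used to hit the ball in baseball game",
        "mammal": "nocturnal flying mammal that lives in caves",
    }
}


def disambiguate(word, context, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bat", "He swung the bat and hit the ball"))  # baseball
print(disambiguate("bat", "The bat lives in caves"))             # mammal
```

An IR system could apply the same idea to an ambiguous query term, using the other query words as context, before matching against documents.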
Machine Learning

Focused on the development of computational systems that improve their performance with experience.
Automated classification of examples based on learning concepts from labeled training examples (supervised learning).
Automated methods for clustering unlabeled examples into meaningful groups (unsupervised learning).
30
Machine Learning: IR Directions

Text Categorization
    Automatic hierarchical classification (Yahoo).
    Adaptive filtering/routing/recommending.
    Automated spam filtering.
Text Clustering
    Clustering of IR query results.
    Automatic formation of hierarchies (Yahoo).
Learning for Information Extraction
Text Mining
Learning to Rank
31
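
Automated spam filtering is a convenient example of supervised text categorization. The sketch below uses a multinomial Naive Bayes classifier with add-one smoothing, a common choice that the slides themselves do not name. The four training messages, the "spam" and "ham" labels, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training examples.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("lecture notes on indexing", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(training)
print(classify("cash prize offer", model))  # spam
print(classify("indexing lecture", model))  # ham
```

The classifier learns only from the labeled examples, which is the supervised setting described on the previous slide. With a realistic training set the same structure extends to routing or hierarchical categorization by adding more labels.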
Thank you for your kind attention

Questions are welcome
