lec7

Wildcard queries are utilized when users are uncertain about spelling, seek multiple variants, or are unsure if stemming is applied. The document discusses techniques for processing wildcard queries, including the use of B-trees and permuterm indexes to efficiently handle queries with wildcards. Additionally, it introduces bigram indexes to facilitate searching for terms based on character sequences.

Uploaded by

menaahmed15200


WILD-CARD QUERIES

WILD-CARD QUERIES: *
Wildcard queries are used in any of the following situations:
(1) User is uncertain of the spelling of a query term (e.g., Sydney vs. Sidney,
which leads to the wildcard query S*dney)
(2) User is aware of multiple variants of spelling a term and seeks
documents containing any of the variants (e.g., color vs. colour)
(3) User seeks documents containing variants of a term that would be
caught by stemming, but is unsure whether the search engine performs
stemming (e.g., judicial vs. judiciary, leading to the wildcard query
judicia*)
(4) User is uncertain of the correct rendition of a foreign word or phrase
(e.g., the query Universit* Stuttgart).

Sec. 3.2

Wild-card queries: *
• mon*: find all docs containing any word beginning with “mon”.
• Easy with binary tree (or B-tree) lexicon: retrieve all words in range: mon ≤ w < moo
• *mon: find words ending in “mon”: harder
– Maintain an additional B-tree for terms written backwards (reverse B-tree), e.g., lemon is stored as nomel.
– Can retrieve all words in range: nom ≤ w < non.
Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent ?
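The range lookups above can be sketched in a few lines. This is a minimal illustration, not a real B-tree: a sorted Python list searched with `bisect` stands in for the tree, and the helper name `prefix_range` and the toy lexicon are assumptions made for the example.

```python
from bisect import bisect_left

def prefix_range(sorted_terms, prefix):
    """Return every term w with prefix <= w < upper, i.e. all terms starting with prefix."""
    if not prefix:
        return list(sorted_terms)
    # Upper bound: bump the last character, e.g. "mon" -> "moo".
    # Assumes the last character is not the largest possible code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_terms, prefix)
    hi = bisect_left(sorted_terms, upper)
    return sorted_terms[lo:hi]

# Toy lexicon (hypothetical); the sorted list plays the role of the B-tree.
terms = sorted(["lemon", "melon", "money", "monkey", "month", "moon", "salmon"])
# Reverse "B-tree": the same terms written backwards, then sorted.
reversed_terms = sorted(t[::-1] for t in terms)

# mon*  ->  mon <= w < moo in the forward lexicon
starts_with_mon = prefix_range(terms, "mon")
# *mon  ->  nom <= w < non in the reversed lexicon, then un-reverse the hits
ends_with_mon = sorted(t[::-1] for t in prefix_range(reversed_terms, "nom"))

print(starts_with_mon)
print(ends_with_mon)
```

A production lexicon would keep both orderings in on-disk B-trees; the sorted lists here only mimic the ordered range scan.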

Sec. 3.2

B-trees handle *’s at the end of a query term
• How can we handle *’s in the middle of query
term?
– co*tion
• We could look up co* AND *tion in a B-tree
and intersect the two term sets
– Expensive
• The solution: transform wild-card queries so
that the *’s occur at the end
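To make the cost of the intersection approach concrete, here is a small sketch of co*tion evaluated as co* AND *tion. Sorted lists again stand in for the forward and reverse B-trees; the function names and the lexicon are illustrative assumptions.

```python
from bisect import bisect_left

def prefix_range(sorted_terms, prefix):
    """All terms in the sorted list that start with prefix (prefix must be non-empty)."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return sorted_terms[bisect_left(sorted_terms, prefix):bisect_left(sorted_terms, upper)]

def match_middle_wildcard(terms, prefix, suffix):
    """Evaluate prefix*suffix by intersecting a forward and a reverse range scan."""
    forward = sorted(terms)
    backward = sorted(t[::-1] for t in terms)
    with_prefix = set(prefix_range(forward, prefix))
    with_suffix = {t[::-1] for t in prefix_range(backward, suffix[::-1])}
    # Both sets can be large even when their intersection is small,
    # which is why this approach is considered expensive.
    # The length check keeps the prefix and suffix from overlapping in a match.
    return sorted(t for t in with_prefix & with_suffix
                  if len(t) >= len(prefix) + len(suffix))

lexicon = ["caution", "coalition", "collection", "cotton", "station"]
result = match_middle_wildcard(lexicon, "co", "tion")
print(result)
```

Here three terms start with "co" and four end in "tion", but only two survive the intersection.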

Sec. 3.2

Query processing
• At this point, we have an enumeration of all terms
in the dictionary that match the wild-card query.
• We still have to look up the postings for each
enumerated term.
• E.g., consider the query:
se*ate AND fil*er
This may result in the execution of many Boolean
AND queries.
This gives rise to the Permuterm Index.
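As a rough sketch of this step, the example below evaluates se*ate AND fil*er over a toy index. The postings dictionary and document IDs are invented for illustration, and `fnmatchcase` from the standard library is used only as a stand-in for the dictionary enumeration step. Instead of running one AND query per pair of enumerated terms, the sketch unions the postings for each wildcard first and then intersects the two unions, which yields the same documents.

```python
from fnmatch import fnmatchcase

# Hypothetical term -> set of document IDs.
postings = {
    "sedate": {1, 4},
    "senate": {2, 4, 7},
    "separate": {3, 7},
    "filter": {4, 7, 9},
    "filler": {2, 3},
    "file": {1, 2, 3, 4},
}

def docs_for_wildcard(pattern):
    """Union the postings of every dictionary term that matches the pattern."""
    docs = set()
    for term, doc_ids in postings.items():
        if fnmatchcase(term, pattern):  # stand-in for permuterm or k-gram enumeration
            docs |= doc_ids
    return docs

def wildcard_and(*patterns):
    """AND together the per-wildcard document sets."""
    result = docs_for_wildcard(patterns[0])
    for pattern in patterns[1:]:
        result &= docs_for_wildcard(pattern)
    return sorted(result)

hits = wildcard_and("se*ate", "fil*er")
print(hits)
```

With three terms matching se*ate and two matching fil*er, the pairwise formulation would already need six AND queries.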
Sec. 3.2.1

Permuterm index
• For term hello, index under:
– hello$, ello$h, llo$he, lo$hel, o$hell, $hello
where $ is a special symbol.
• Queries:
– X lookup on X$
– X* lookup on $X*
– *X lookup on X$*
– *X* lookup on X*
– X*Y lookup on Y$X*
– X*Y*Z ??? Exercise!
• Example: Query = hel*o
– X=hel, Y=o
– Lookup o$hel*
• Example: Query = fi*mo*er
– 1. Look up er$fi*
– 2. Filter terms to ensure mo in the middle (fishmonger but not filibuster)
Sec. 3.2.1

Permuterm query processing

• Rotate query wild-card to the right
• Now use B-tree lookup as before.
• Permuterm problem: ≈ quadruples lexicon size
Empirical observation for English.
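The rotation and lookup steps can be sketched as follows. This is a minimal illustration that handles queries with exactly one *; a sorted list of (rotation, term) pairs stands in for the B-tree, and all function names are assumptions made for the example.

```python
from bisect import bisect_left

END = "$"  # end-of-term marker

def rotations(term):
    """All rotations of term + '$', e.g. hello -> hello$, ello$h, ..., $hello."""
    s = term + END
    return [s[i:] + s[:i] for i in range(len(s))]

def build_permuterm(terms):
    """Sorted (rotation, original term) pairs; stands in for a B-tree over rotations."""
    return sorted((rot, term) for term in terms for rot in rotations(term))

def rotate_query(query):
    """Rotate X*Y to Y$X so the wildcard ends up on the right (single * only)."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    head, tail = query.split("*")
    return tail + END + head

def lookup(index, query):
    """Prefix range scan over the rotations, returning the original terms."""
    prefix = rotate_query(query)
    keys = [rot for rot, _ in index]
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo, hi = bisect_left(keys, prefix), bisect_left(keys, upper)
    return sorted({term for _, term in index[lo:hi]})

index = build_permuterm(["hello", "help", "halo", "hero"])
print(lookup(index, "hel*o"))  # rotated to the prefix o$hel
print(lookup(index, "he*"))    # rotated to the prefix $he
print(lookup(index, "*lo"))    # rotated to the prefix lo$
print(len(index))              # one entry per rotation of every term
```

Each term of length n contributes n + 1 rotations, which is where the growth in lexicon size comes from.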

Sec. 3.2.2

Bigram (k-gram) indexes

• Enumerate all k-grams (sequences of k chars) occurring in any term
• e.g., from text “April is the cruelest month” we get the 2-grams (bigrams)
$a,ap,pr,ri,il,l$,$i,is,s$,$t,th,he,e$,$c,cr,ru,ue,el,le,es,st,t$,$m,mo,on,nt,th,h$

– $ is a special word boundary symbol

• Maintain a second inverted index from bigrams to dictionary terms that match each bigram.
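A short sketch of k-gram enumeration, assuming each word is lowercased and padded with $ on both sides before being sliced; the function name `kgrams` is an illustrative choice.

```python
def kgrams(term, k=2):
    """Return the k-grams of a term after padding it with the boundary symbol $."""
    padded = "$" + term + "$"
    return [padded[i:i + k] for i in range(len(padded) - k + 1)]

text = "April is the cruelest month"
# Bigrams are taken per word, so no bigram spans a word boundary.
bigrams = [gram for word in text.lower().split() for gram in kgrams(word)]

print(kgrams("month"))
print(len(bigrams))
```

A word of n characters yields n + 1 bigrams, so "month" contributes six of them.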
Sec. 3.2.2

Bigram index example

• The k-gram index finds terms based on a query consisting of k-grams (here k=2).
– $m → mace, madden
– mo → among, amortize
– on → along, among

Sec. 3.2.2

Processing wild-cards
• Query mon* can now be run as
– $m AND mo AND on
• Gets terms that match AND version of our wildcard query.
• But we’d also enumerate moon, which contains all three bigrams yet does not match mon*.
• Must post-filter these terms against the query (e.g., red* run as $r AND re AND ed would also enumerate retired). [post-filtering step]
• Surviving enumerated terms are then looked up in the term-document inverted index.
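The whole pipeline (bigram index, AND of the query's bigrams, post-filter) can be sketched as below. The lexicon and function names are illustrative assumptions, and `fnmatchcase` from the standard library serves as the post-filter; note that it also gives special meaning to `?` and `[...]`, which this sketch assumes do not occur in queries.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def kgrams(s, k=2):
    """All substrings of length k in s (empty list if s is shorter than k)."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]

def build_kgram_index(terms, k=2):
    """Map each k-gram to the set of dictionary terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in kgrams("$" + term + "$", k):
            index[gram].add(term)
    return index

def query_kgrams(query, k=2):
    """Pad the query with $, split at each *, and collect the k-grams of the pieces."""
    grams = []
    for piece in ("$" + query + "$").split("*"):
        grams.extend(kgrams(piece, k))
    return grams

def wildcard_lookup(index, query, k=2):
    """Return (candidates from the AND of k-grams, candidates surviving the post-filter)."""
    grams = query_kgrams(query, k)
    if not grams:
        raise ValueError("query is too short for a k-gram lookup")
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    matched = sorted(t for t in candidates if fnmatchcase(t, query))
    return sorted(candidates), matched

lexicon = ["demon", "monkey", "month", "moon", "red", "reduce", "retired", "salmon"]
index = build_kgram_index(lexicon)

print(wildcard_lookup(index, "mon*"))  # moon is a candidate but is filtered out
print(wildcard_lookup(index, "red*"))  # retired is a candidate but is filtered out
```

The candidate lists show the false positives that the AND of bigrams lets through; only the filtered terms would go on to the term-document index.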
Sec. 3.2.2

Processing wild-card queries

• As before, we must execute a Boolean query
for each enumerated, filtered term.
• Wild-cards can result in expensive query execution, e.g., pyth* AND prog*
• If you encourage “laziness” people will respond!
– Example search box: “Search: Type your search terms, use ‘*’ if you need to. E.g., Alex* will match Alexander.”
