06 VectorSpaceModel PDF
Jaime Arguello
INLS 509: Information Retrieval
[email protected]
[Figure: diagram with the surviving labels "retrieved objects" and "evaluation"]
[Figure: vectors drawn in a three-dimensional space with axes X, Y, and Z]
[Figure: vectors drawn in a three-dimensional term space with axes "man", "dog", and "bite"; coordinates are 0 or 1]
$$\sum_{i=1}^{V} x_i \times y_i$$

[Table: binary term vectors over the vocabulary, from "able" to "zoom"; inner product => 2]
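The inner product above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `inner_product` and the two example vectors are assumptions made up for this sketch.

```python
def inner_product(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical binary vectors over a 5-term vocabulary (not from the slides).
doc = [1, 0, 1, 1, 0]
query = [1, 0, 1, 0, 0]

# With 0/1 weights, the inner product counts the terms the two vectors share.
score = inner_product(doc, query)
print(score)
```

With binary vectors the score is simply the number of vocabulary terms that occur in both the document and the query.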
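$$\frac{\sum_{i=1}^{V} x_i \times y_i}{\sqrt{\sum_{i=1}^{V} x_i^2} \times \sqrt{\sum_{i=1}^{V} y_i^2}}$$

The first square root in the denominator is the length of vector x; the second is the length of vector y.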
query 0 1 0 0 1 ... 1
• 0’s and 1’s indicate whether the term occurs (at least once) in the document/query
• Let’s explore a more sophisticated representation
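As a reference point before moving on, the binary representation can be sketched as follows. The helper name `binary_vector`, the whitespace tokenization, and the three-term vocabulary are assumptions chosen for illustration only.

```python
def binary_vector(text, vocabulary):
    """Return 1 for each vocabulary term that occurs at least once, else 0."""
    tokens = set(text.lower().split())  # naive whitespace tokenization (assumption)
    return [1 if term in tokens else 0 for term in vocabulary]


# Hypothetical vocabulary; each position is one dimension of the vector space.
vocabulary = ["bite", "dog", "man"]

doc_vec = binary_vector("man bite dog", vocabulary)
query_vec = binary_vector("dog dog", vocabulary)  # repeated terms still map to 1
print(doc_vec, query_vec)
```

Note that the repeated "dog" in the query still produces a 1, which is exactly the information a more sophisticated weighting would try to keep.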
• Plot:
Rocky Balboa is a struggling boxer trying to make the big time. Working in a meat factory in Philadelphia for a
pittance, he also earns extra cash as a debt collector. When heavyweight champion Apollo Creed visits
Philadelphia, his managers want to set up an exhibition match between Creed and a struggling boxer, touting the
fight as a chance for a "nobody" to become a "somebody". The match is supposed to be easily won by Creed, but
someone forgot to tell Rocky, who sees this as his only shot at the big time. Rocky Balboa is a small-time boxer
who lives in an apartment in Philadelphia, Pennsylvania, and his career has so far not gotten off the canvas. Rocky
earns a living by collecting debts for a loan shark named Gazzo, but Gazzo doesn't think Rocky has the
viciousness it takes to beat up deadbeats. Rocky still boxes every once in a while to keep his boxing skills sharp,
and his ex-trainer, Mickey, believes he could've made it to the top if he was willing to work for it. Rocky goes to a
pet store that sells pet supplies, and this is where he meets a young woman named Adrian, who is extremely shy,
with no ability to talk to men. Rocky befriends her. Adrian later surprised Rocky with a dog from the pet shop that
Rocky had befriended. Adrian's brother Paulie, who works for a meat packing company, is thrilled that someone
has become interested in Adrian, and Adrian spends Thanksgiving with Rocky. Later, they go to Rocky's apartment,
where Adrian explains that she has never been in a man's apartment before. Rocky sets her mind at ease, and they
become lovers. Current world heavyweight boxing champion Apollo Creed comes up with the idea of giving an
unknown a shot at the title. Apollo checks out the Philadelphia boxing scene, and chooses Rocky. Fight promoter
Jergens gets things in gear, and Rocky starts training with Mickey. After a lot of training, Rocky is ready for the
match, and he wants to prove that he can go the distance with Apollo. The 'Italian Stallion', Rocky Balboa, is an
aspiring boxer in downtown Philadelphia. His one chance to make a better life for himself is through his boxing
and Adrian, a girl who works in the local pet store. Through a publicity stunt, Rocky is set up to fight Apollo Creed,
the current heavyweight champion who is already set to win. But Rocky really needs to triumph, against all the
odds...
$$idf_t = \log\left(\frac{N}{df_t}\right)$$
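A minimal sketch of inverse document frequency, idf_t = log(N / df_t), where N is the number of documents and df_t is the number of documents containing term t. The function name, the toy corpus, and the choice of the natural logarithm are assumptions; the slides do not state a log base.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to log(N / df_t) over a list of tokenized documents."""
    n_docs = len(documents)
    df = {}
    for tokens in documents:
        for term in set(tokens):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    # Natural log is an assumption; any base gives the same ordering of terms.
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Hypothetical four-document corpus (not from the slides).
corpus = [
    ["man", "bite", "dog"],
    ["dog", "bite", "man"],
    ["man", "walk", "dog"],
    ["rocky", "boxer"],
]
idf = inverse_document_frequency(corpus)
print(idf["man"], idf["rocky"])
```

Terms that occur in few documents ("rocky") receive a higher idf than terms that occur in most of them ("man").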
$$tf_t \times idf_t$$
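The product tf_t × idf_t can be sketched as below. The function name `tf_idf`, the idf values, and the token list are hypothetical and chosen only to show the arithmetic; terms missing from the idf table are given a weight of 0 as a simplifying assumption.

```python
from collections import Counter


def tf_idf(tokens, idf):
    """Weight each term in a document by term frequency times idf."""
    tf = Counter(tokens)  # raw term counts within this one document
    # Terms absent from the idf table get weight 0.0 (assumption of this sketch).
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


# Hypothetical idf values (not from the slides).
idf = {"rocky": 2.0, "boxer": 1.5, "the": 0.1}

weights = tf_idf(["rocky", "the", "boxer", "rocky", "the", "the"], idf)
print(weights)
```

Even though "the" is the most frequent token here, its low idf keeps its weight below that of "rocky" and "boxer".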
[Term cloud, partially recovered: chance, creed, current, debt, doesn, earns, every, exhibition, extra, far, fight, for, gazzo, gets, girl]
[Term cloud: chooses, collecting, collector, creed, current, deadbeats, debt, debts, distance, doesn, downtown, earns, ease, easily, exhibition, extra, extremely, factory, fight, forgot, gazzo, gear, gotten, heavyweight, his, is, jergens, later, loan, lot, lovers, managers, match, meat, mickey, named, nobody, odds, packing, paulie, pennsylvania, pet, philadelphia, pittance, promoter, publicity, ready, rocky, sells, set, shark, sharp, shot, shy, somebody, someone, stallion, store, struggling, stunt, supplies, supposed, surprised, thanksgiving, think, thrilled, time, title, touting, trainer, training, triumph, up, ve, viciousness, visits, where, who, willing, won, works]
[Term cloud, partially recovered: beat, befriended, befriends, better, boxer, boxes, boxing, italian, jergens, keep, living, loan, lot, lovers, managers, match, meat, ready, rocky, sells, shark, sharp, shop, shy, skills, somebody, spends, won]
• TF usually equals 1
$$idf_t = \log\left(\frac{N}{df_t}\right)$$
[Figure: doc_1, query, and doc_2 drawn as vectors against the axes "man" and "bite"]
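$$\frac{\sum_{i=1}^{V} x_i \times y_i}{\sqrt{\sum_{i=1}^{V} x_i^2} \times \sqrt{\sum_{i=1}^{V} y_i^2}}$$

$$\frac{(1 \times 1) + (0 \times 1) + (1 \times 0)}{\sqrt{1^2 + 0^2 + 1^2} \times \sqrt{1^2 + 1^2 + 0^2}} = 0.5$$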
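The worked example above, with x = (1, 0, 1) and y = (1, 1, 0), can be checked with a short sketch. The function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions of this sketch, not something the slides specify.

```python
import math


def cosine_similarity(x, y):
    """Inner product of x and y divided by the product of their lengths."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    length_x = math.sqrt(sum(xi * xi for xi in x))
    length_y = math.sqrt(sum(yi * yi for yi in y))
    if length_x == 0 or length_y == 0:
        return 0.0  # assumption: treat an all-zero vector as matching nothing
    return dot / (length_x * length_y)


# The two vectors from the worked example.
x = [1, 0, 1]
y = [1, 1, 0]
similarity = cosine_similarity(x, y)
print(round(similarity, 4))
```

Because the score is normalized by both vector lengths, scaling either vector by a positive constant leaves the similarity unchanged.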
Z = dog
basis vectors for 3-dimensional space
w1         w2        MI        w1        w2         MI
francisco  san       6.619     dollars   million    5.437
angeles    los       6.282     brooke    rick       5.405
prime      minister  5.976     teach     lesson     5.370
united     states    5.765     canada    canadian   5.338
9          11        5.639     un        ma         5.334
winning    award     5.597     nicole    roman      5.255
brooke     taylor    5.518     china     chinese    5.231
con        un        5.514     japan     japanese   5.204
un         la        5.512     belle     roman      5.202
belle      nicole    5.508     border    mexican    5.186
• Assumption:
[Figure: plot of y = x and y = 1 + log(x), with the x-axis running from -5 to 5]
Linear tf: $$tf_t \times \log\left(\frac{N}{df_t}\right)$$

Sub-linear tf: $$(1 + \log(tf_t)) \times \log\left(\frac{N}{df_t}\right)$$
term           tf.idf (linear tf)   tf.idf (sub-linear tf)
rocky          96.72                20.08
philadelphia   30.95                16.15
boxer          22.19                13.24
fight          10.02                7.01
mickey         8.96                 7.58
for            4.75                 2.00
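The two weighting schemes in the table can be compared with a short sketch. The function names are hypothetical. The value tf = 19 for "rocky" is an assumption inferred from the ratio of the two table columns, as is the use of the natural logarithm; neither is stated on the slides.

```python
import math


def linear_tf_idf(tf, idf):
    """tf.idf with raw term frequency."""
    return tf * idf


def sublinear_tf_idf(tf, idf):
    """tf.idf with dampened term frequency, (1 + log(tf)) * idf."""
    if tf <= 0:
        return 0.0  # a term that does not occur gets no weight
    return (1 + math.log(tf)) * idf  # natural log (assumption)


# Assumed values for "rocky": tf inferred, idf derived from the linear column.
tf = 19
idf = 96.72 / tf

linear = linear_tf_idf(tf, idf)
sublinear = sublinear_tf_idf(tf, idf)
print(round(linear, 2), round(sublinear, 2))
```

Under these assumptions the sub-linear weight comes out close to the 20.08 shown in the table, which illustrates how the logarithm keeps a very frequent term from dominating the vector.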
‣ a document
‣ a query
‣ a sentence
‣ a word
‣ an entire encyclopedia
• Rank documents based on their cosine similarity to query
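Ranking by cosine similarity can be sketched with sparse term-weight dictionaries. Everything named here (`cosine`, `rank`, the document ids, and the weights) is hypothetical and only meant to illustrate the idea.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    length_a = math.sqrt(sum(w * w for w in a.values()))
    length_b = math.sqrt(sum(w * w for w in b.values()))
    if length_a == 0 or length_b == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (length_a * length_b)


def rank(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents and query with binary weights (not from the slides).
documents = {
    "d1": {"man": 1, "bite": 1, "dog": 1},
    "d2": {"man": 1, "walk": 1, "dog": 1},
    "d3": {"boxer": 1, "fight": 1},
}
query = {"man": 1, "bite": 1}

ranking = rank(query, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

The same two functions work unchanged if the dictionaries hold tf.idf weights instead of 0/1 values, since only the vector entries differ.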
• A power tool!
news
shopping
images
computers
sports
politics