Lab 2
Lab Content
❑ What is Data?
❑ Types of Attributes
❑ Properties of Attribute Values
❑ Similarity Measures
   ✓ Cosine similarity
Properties of Attribute Values
🠶 The type of an attribute depends on which of the following properties it possesses:
❖ Distinctness: = ≠
❖ Order: < >
❖ Addition: + -
❖ Multiplication: * /
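The four properties above are what separate the usual attribute types. The sketch below assumes the standard taxonomy in which nominal attributes support only distinctness, ordinal attributes add order, interval attributes add addition, and ratio attributes add multiplication. The table `PROPERTIES` and the helper `attribute_type` are hypothetical names used only for illustration.

```python
from __future__ import annotations

# Assumed taxonomy: each attribute type adds one property to the previous one.
PROPERTIES = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "addition"},
    "ratio": {"distinctness", "order", "addition", "multiplication"},
}


def attribute_type(props: set[str]) -> str:
    """Return the most specific attribute type whose required properties are all in props."""
    best = None
    for name, required in PROPERTIES.items():
        # Dict order runs from least to most specific, so the last match wins.
        if required <= props:
            best = name
    if best is None:
        raise ValueError("an attribute must at least support distinctness")
    return best


print(attribute_type({"distinctness", "order"}))  # ordinal
```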
🠶 Dissimilarity
▪ Numerical measure of how different two data objects are
▪ Lower when objects are more alike
▪ Minimum dissimilarity is often 0
▪ Upper limit varies
❑ Euclidean distance
❑ Mahalanobis Distance
❑ Minkowski distance
❑ Supremum distance
Euclidean Distance
Example
x = (0, 1, 0, 1), y = (1, 0, 1, 0)
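Euclidean distance: $d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$
$d(x, y) = \sqrt{(0-1)^2 + (1-0)^2 + (0-1)^2 + (1-0)^2} = \sqrt{4} = 2$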
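The same calculation as a short Python sketch. `euclidean` is a hypothetical helper name, and the function assumes both vectors have the same length.

```python
import math


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


x = (0, 1, 0, 1)
y = (1, 0, 1, 0)
print(euclidean(x, y))  # 2.0
```

Python 3.8 and later also provide `math.dist(x, y)`, which computes the same Euclidean distance for two points of equal dimension.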
Euclidean Distance (cont.)
Example
Mahalanobis Distance
Example
x = (2, 3), y = (3, 4)
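The Mahalanobis distance needs a covariance matrix, which this example does not show, so the sketch below does not attempt to reproduce the lab's answer. It assumes the common definition $\sqrt{(x-y)^T \Sigma^{-1} (x-y)}$, handles only 2-D vectors (as in the lab's examples), and uses two hypothetical covariance matrices: the identity, for which the result equals the Euclidean distance, and a correlated one. Some texts define the distance without the square root; squaring the returned value gives that form. `mahalanobis_2d` is a hypothetical helper name.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance sqrt((x - y)^T S^-1 (x - y)) for 2-D vectors.

    cov is a 2x2 covariance matrix [[a, b], [c, d]] and must be invertible.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form dx^T * inv * dx
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


x, y = (2, 3), (3, 4)
identity = [[1, 0], [0, 1]]            # hypothetical: uncorrelated, unit variance
correlated = [[0.3, 0.2], [0.2, 0.3]]  # hypothetical: positively correlated features
print(mahalanobis_2d(x, y, identity))    # same as the Euclidean distance, sqrt(2)
print(mahalanobis_2d(x, y, correlated))  # differs once the features are correlated
```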
🠶 Minkowski distance with r = 2: Euclidean distance
L p1 p2 p3 p4
p1 0 2 3 5
p2 2 0 1 3
p3 3 1 0 2
p4 5 3 2 0
Distance Matrix
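The Minkowski distance generalizes the measures listed earlier through its parameter r: r = 1 gives the city-block distance, r = 2 the Euclidean distance, and r → ∞ the supremum distance. The sketch below (hypothetical helper names `minkowski` and `distance_matrix`) builds a pairwise distance matrix like the one above. The coordinates of p1 to p4 are not given in this section, so the points used here are assumptions chosen for illustration.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance (sum |x_k - y_k|^r)^(1/r).

    r = 1 gives the city-block (L1) distance, r = 2 the Euclidean distance,
    and r = math.inf the supremum (L-infinity) distance.
    """
    diffs = [abs(xk - yk) for xk, yk in zip(x, y)]
    if r == math.inf:
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


def distance_matrix(points, r):
    """Symmetric matrix of pairwise Minkowski distances, zeros on the diagonal."""
    return [[minkowski(p, q, r) for q in points] for p in points]


# Hypothetical coordinates for p1..p4 (the lab's points are not shown here).
points = [(0, 2), (2, 0), (3, 1), (5, 1)]
for row in distance_matrix(points, math.inf):
    print(row)
```

For these assumed points, `r = math.inf` prints rows matching the table above; passing `r = 1` or `r = 2` gives the city-block or Euclidean matrix for the same points.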
For the following vectors, x and y, calculate these distance measures:
1. Supremum distance
2. Euclidean distance
3. Mahalanobis distance
Solutions
• x = (2, 0), y = (5, 1)
1. Supremum distance = $\max(|2-5|, |0-1|) = 3$
2. Euclidean distance = $\sqrt{(2-5)^2 + (0-1)^2} = \sqrt{10} \approx 3.162$
3. Mahalanobis distance = 4
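A quick check of the first two answers in Python. The Mahalanobis answer is not reproduced, because the covariance matrix it relies on is not given here.

```python
import math

x = (2, 0)
y = (5, 1)
diffs = [abs(xk - yk) for xk, yk in zip(x, y)]  # [3, 1]

supremum = max(diffs)                           # largest coordinate gap
euclid = math.sqrt(sum(d ** 2 for d in diffs))  # sqrt(9 + 1)

print(supremum)          # 3
print(round(euclid, 3))  # 3.162
```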
Similarity: Cosine Similarity
🠶 If d1 and d2 are two document vectors, then
cos(d1, d2) = (d1 • d2) / (||d1|| ||d2||)
where • indicates vector dot product and || d || is the length of vector d.
🠶 Example:
d1 = 3 2 0 5 0 0 0 2 0 0
d2 = 1 0 0 0 0 0 0 1 0 2
d1 • d2= 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3+2*2+0*0+5*5+0*0+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481
||d2|| = (1*1+0*0+0*0+0*0+0*0+0*0+0*0+1*1+0*0+2*2)^0.5 = (6)^0.5 = 2.449
cos(d1, d2) = 5 / (6.481 * 2.449) = 0.3150
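The cosine computation above as a Python sketch. `cosine_similarity` is a hypothetical helper name; the function rejects zero vectors because their length is 0 and the ratio is undefined.

```python
import math


def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # 0.315
```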
Extended Jaccard Coefficient (Tanimoto)
🠶 Example
d1 = ( 0 , 1 , 0 , 1)
d2 = ( 1 , 0 , 1 , 0 )
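The extended Jaccard (Tanimoto) coefficient is commonly defined as $EJ(x, y) = \frac{x \cdot y}{\|x\|^2 + \|y\|^2 - x \cdot y}$; for binary vectors it reduces to the ordinary Jaccard coefficient. The sketch below assumes that definition (`extended_jaccard` is a hypothetical name) and applies it to the example vectors.

```python
def extended_jaccard(x, y):
    """Tanimoto coefficient: (x . y) / (||x||^2 + ||y||^2 - x . y)."""
    dot = sum(a * b for a, b in zip(x, y))
    sq1 = sum(a * a for a in x)
    sq2 = sum(b * b for b in y)
    denom = sq1 + sq2 - dot
    if denom == 0:
        # Only happens when both vectors are all zeros.
        raise ValueError("undefined when both vectors are zero")
    return dot / denom


d1 = (0, 1, 0, 1)
d2 = (1, 0, 1, 0)
print(extended_jaccard(d1, d2))  # 0.0: no nonzero position in common, so x . y = 0
```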
Solutions