Answer Model Final2021-2022Term1

The document describes a k-means clustering algorithm question. It provides an explanation of the k-means algorithm and its flowchart. It then applies the algorithm to cluster eight sample points into three clusters over two iterations, calculating distances between points and cluster centers and updating the cluster centers.

Uploaded by

Nada Ahmed

Benha University Faculty of Computers & AI

1st Term (Nov. 2021) Final Exam    Date: 09/01/2022
Model Answer
Time: 3 hours
Class: Fourth Year    Total Marks: 65
Subject: Big Data    Examiner(s): Prof. E. Badr
Course Code: SC 446

Answer the following questions [4 questions in 2 pages]:

Question No. 1 [20 Marks]

a) Write the k-means clustering algorithm and explain it using its flowchart.

b) Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1 (2, 10), A2 (2, 5), A3 (8, 4), A4 (5, 8), A5 (7, 5), A6 (6, 4), A7 (1, 2), A8 (4, 9)
Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as:
ρ(a, b) = |x2 – x1| + |y2 – y1|
Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution:
a)

K-Means Clustering Algorithm:
Step-01:
Choose the number of clusters K.
Step-02:
• Select K data points as the initial cluster centers, for example at random.
• Where possible, select the cluster centers so that they are as far as possible from each other.
Step-03:
Calculate the distance between each data point and each cluster center.
• The distance may be calculated using either a given distance function or the Euclidean distance formula.
Step-04:
• Assign each data point to a cluster.
• A data point is assigned to the cluster whose center is nearest to it.
Step-05:
Re-compute the centers of the newly formed clusters.
• The center of a cluster is the mean of all the data points contained in that cluster.
Step-06:
Repeat Step-03 to Step-05 until any of the following stopping criteria is met:
• The centers of the newly formed clusters do not change.
• The data points remain in the same clusters.
• The maximum number of iterations is reached.
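
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the model answer: the names `manhattan` and `k_means`, the default `max_iter=100`, and the choice to leave an empty cluster's center unchanged are all assumptions made for the sketch. The distance function defaults to the one given in part (b).

```python
def manhattan(a, b):
    """Distance from the question: |x2 - x1| + |y2 - y1|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def k_means(points, centers, distance=manhattan, max_iter=100):
    """Run k-means from the given initial centers.

    Returns (centers, labels), where labels[i] is the index of the
    cluster that points[i] was assigned to in the last pass.
    """
    centers = [tuple(map(float, c)) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Steps 03-04: assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: distance(p, centers[k]))
                  for p in points]
        # Step 05: recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(old)  # assumption: empty cluster keeps its center
        # Step 06: stop early when the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# The eight points and initial centers from part (b), limited to two iterations.
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centers, labels = k_means(points, [(2, 10), (5, 8), (1, 2)], max_iter=2)
print(centers)
print(labels)
```

Passing `max_iter=2` corresponds to the "maximum number of iterations" stopping criterion; with a larger limit the loop would instead stop once the centers stop changing.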

b)
We follow the K-Means Clustering Algorithm described above.
Iteration-01:
• We calculate the distance of each point from each of the three cluster centers.
• The distance is calculated using the given distance function.
The following calculation shows the distance between point A1(2, 10) and each of the three cluster centers.
Calculating the distance between A1(2, 10) and C1(2, 10):
ρ(A1, C1) = |x2 – x1| + |y2 – y1| = |2 – 2| + |10 – 10| = 0
Calculating the distance between A1(2, 10) and C2(5, 8):
ρ(A1, C2) = |x2 – x1| + |y2 – y1| = |5 – 2| + |8 – 10| = 3 + 2 = 5
Calculating the distance between A1(2, 10) and C3(1, 2):
ρ(A1, C3) = |x2 – x1| + |y2 – y1| = |1 – 2| + |2 – 10| = 1 + 8 = 9
In the same manner, we calculate the distance of the other points from each of the three cluster centers.
Next,
• We draw a table showing all the results.
• Using the table, we decide which point belongs to which cluster.
• A point belongs to the cluster whose center is nearest to it.

Given point   Distance from center    Distance from center    Distance from center    Point belongs
              (2, 10) of Cluster-01   (5, 8) of Cluster-02    (1, 2) of Cluster-03    to cluster
A1(2, 10)     0                       5                       9                       C1
A2(2, 5)      5                       6                       4                       C3
A3(8, 4)      12                      7                       9                       C2
A4(5, 8)      5                       0                       10                      C2
A5(7, 5)      10                      5                       9                       C2
A6(6, 4)      10                      5                       7                       C2
A7(1, 2)      9                       10                      0                       C3
A8(4, 9)      3                       2                       10                      C2
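
The Iteration-01 distance table can be checked mechanically. The short sketch below recomputes every distance with the given function and picks the nearest center for each point; the variable names (`points`, `centers`, `table`) are illustrative choices, not part of the model answer.

```python
# Points and initial centers as given in the question.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centers = {"C1": (2, 10), "C2": (5, 8), "C3": (1, 2)}


def manhattan(a, b):
    """Distance from the question: |x2 - x1| + |y2 - y1|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


table = {}
for name, p in points.items():
    # Distance from this point to each of the three centers.
    dists = {c: manhattan(p, xy) for c, xy in centers.items()}
    # The point belongs to the cluster whose center is nearest.
    nearest = min(dists, key=dists.get)
    table[name] = (dists, nearest)
    print(name, p, dists["C1"], dists["C2"], dists["C3"], nearest)
```

Changing the `centers` dictionary to the updated centers reproduces the table of any later iteration in the same way.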

From here, the new clusters are:

Cluster-01:
The first cluster contains the point:
• A1(2, 10)
Cluster-02:
The second cluster contains the points:
• A3(8, 4)
• A4(5, 8)
• A5(7, 5)
• A6(6, 4)
• A8(4, 9)
Cluster-03:
The third cluster contains the points:
• A2(2, 5)
• A7(1, 2)
Now,
• We re-compute the cluster centers.
• The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
• We have only one point, A1(2, 10), in Cluster-01.
• So, the cluster center remains the same.
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-01.
Iteration-02:
• We calculate the distance of each point from each of the three updated cluster centers.
• The distance is calculated using the given distance function.
The following calculation shows the distance between point A1(2, 10) and each of the three cluster centers.
Calculating the distance between A1(2, 10) and C1(2, 10):
ρ(A1, C1) = |x2 – x1| + |y2 – y1| = |2 – 2| + |10 – 10| = 0
Calculating the distance between A1(2, 10) and C2(6, 6):
ρ(A1, C2) = |x2 – x1| + |y2 – y1| = |6 – 2| + |6 – 10| = 4 + 4 = 8
Calculating the distance between A1(2, 10) and C3(1.5, 3.5):
ρ(A1, C3) = |x2 – x1| + |y2 – y1| = |1.5 – 2| + |3.5 – 10| = 0.5 + 6.5 = 7
In the same manner, we calculate the distance of the other points from each of the three cluster centers.
Next,
• We draw a table showing all the results.
• Using the table, we decide which point belongs to which cluster.
• A point belongs to the cluster whose center is nearest to it.

Given point   Distance from center    Distance from center    Distance from center      Point belongs
              (2, 10) of Cluster-01   (6, 6) of Cluster-02    (1.5, 3.5) of Cluster-03  to cluster
A1(2, 10)     0                       8                       7                         C1
A2(2, 5)      5                       5                       2                         C3
A3(8, 4)      12                      4                       7                         C2
A4(5, 8)      5                       3                       8                         C2
A5(7, 5)      10                      2                       7                         C2
A6(6, 4)      10                      2                       5                         C2
A7(1, 2)      9                       9                       2                         C3
A8(4, 9)      3                       5                       8                         C1

From here, the new clusters are:

Cluster-01:
The first cluster contains the points:
• A1(2, 10)
• A8(4, 9)
Cluster-02:
The second cluster contains the points:
• A3(8, 4)
• A4(5, 8)
• A5(7, 5)
• A6(6, 4)
Cluster-03:
The third cluster contains the points:
• A2(2, 5)
• A7(1, 2)
Now,
• We re-compute the cluster centers.
• The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)

For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)

For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)

This completes Iteration-02.

After the second iteration, the centers of the three clusters are:

• C1(3, 9.5)
• C2(6.5, 5.25)
• C3(1.5, 3.5)

Question 2 [20 Marks]

a) Derive the two equations (using the least squares method) that determine the constants A and B for the best-fit line Y = AX + B.

b) Fit the least squares line to the following data and find Y(10).
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Solution:
a)
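The squared deviation of the i-th data point from the curve is
\[ d_i^2 = [\, y_i - (y_i)_{\text{curve}} \,]^2 \]

Since
\[ (y_i)_{\text{curve}} = A x_i + B \]

then
\[ d_i^2 = [\, y_i - (A x_i + B) \,]^2 \]

Taking the summation from 1 to n:
\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} [\, y_i - (A x_i + B) \,]^2 \]

This summation is a function of A and B only, because the data values x_i and y_i are known:
\[ E(A, B) = \sum_{i=1}^{n} [\, y_i - (A x_i + B) \,]^2 \]

Setting the partial derivative with respect to A equal to zero:
\[ \frac{\partial E(A, B)}{\partial A} = \sum_{i=1}^{n} 2\,[\, y_i - (A x_i + B) \,]\,(-x_i) = 0 \]
which gives
\[ A \sum_{i=1}^{n} x_i^2 + B \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \qquad (1) \]

Setting the partial derivative with respect to B equal to zero:
\[ \frac{\partial E(A, B)}{\partial B} = \sum_{i=1}^{n} 2\,[\, y_i - (A x_i + B) \,]\,(-1) = 0 \]
which gives
\[ A \sum_{i=1}^{n} x_i + nB = \sum_{i=1}^{n} y_i \qquad (2) \]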

b)
x      y      x^2    xy
1      1      1      1
3      2      9      6
4      4      16     16
6      4      36     24
8      5      64     40
9      7      81     63
11     8      121    88
14     9      196    126
Σx = 56    Σy = 40    Σx^2 = 524    Σxy = 364
With n = 8, equation (2) gives 56A + 8B = 40.
Equation (1) gives 524A + 56B = 364.
Solving these two equations gives A = 7/11 and B = 6/11, so y = (7/11)x + 6/11 and therefore y(10) = 76/11 ≈ 6.91.
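
The arithmetic above can be verified with a short script. This is a minimal sketch, not part of the model answer: the helper name `fit_line` is hypothetical, and it solves the two normal equations by Cramer's rule with exact fractions so that the result can be compared directly with 7/11 and 6/11.

```python
from fractions import Fraction

# Data from part (b).
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]


def fit_line(xs, ys):
    """Solve the normal equations (1) and (2) for A and B exactly."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the 2x2 system; nonzero unless all x values are equal.
    det = n * sxx - sx * sx
    a = Fraction(n * sxy - sx * sy, det)
    b = Fraction(sy * sxx - sx * sxy, det)
    return a, b


A, B = fit_line(xs, ys)
y10 = A * 10 + B
print(A, B, y10)  # 7/11 6/11 76/11
```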

Question 3 [20 Marks]

Choose the correct answer:
1) True-False: Linear Regression is a supervised machine learning algorithm.
A) TRUE B) FALSE

2) Which of the following methods do we use to find the best fit line for data in Linear Regression?
A) Least Square Error B) Maximum Likelihood C) Logarithmic Loss D) Both A and B

3) Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
A) AUC-ROC B) Accuracy C) Logloss D) Mean-Squared-Error

4) Which of the following statements is true about outliers in linear regression?

A) Linear regression is sensitive to outliers B) Linear regression is not sensitive to outliers
C) Can’t say D) None of these

Question Context 5:
Consider the following data, where one input (X) and one output (Y) are given.

5) What would be the root mean square training error for this data if you run a Linear Regression
model of the form (Y = A0+A1X)?
A) Less than 0 B) Greater than zero C) Equal to 0 D) None of these

Question Context 6:
Suppose you find that your linear regression model is underfitting the data.
6) In such a situation, which of the following options would you consider?
1.Add more variables 2.Start introducing polynomial degree variables 3.Remove some variables
A) 1 and 2 B) 2 and 3 C) 1 and 3 D) 1, 2 and 3

7) In practice, Line of best fit or regression line is found when _____________

a) Sum of residuals (∑(Y – h(X))) is minimum
b) Sum of the absolute value of residuals (∑|Y – h(X)|) is maximum
c) Sum of the square of residuals (∑(Y – h(X))²) is minimum
d) Sum of the square of residuals (∑(Y – h(X))²) is maximum

8) If a linear regression model fits perfectly, i.e., the train error is zero, then
a) Test error is also always zero b) Test error is non-zero
c) Couldn’t comment on Test error d) Test error is equal to Train error

9) How many coefficients do you need to estimate in a simple linear regression model (One independent
variable)?
a) 1 b) 2 c) 3 d) 4

10) In a simple linear regression model (one independent variable), if we change the input variable by
1 unit, by how much will the output variable change?
a) by 1 b) no change c) by intercept d) by its slope

11) In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to ………….
a) (X-intercept, Slope) b) (Slope, X-Intercept) c) (Y-Intercept, Slope) d) (slope, Y-Intercept)

Question No. 4 [5 Marks]

Formulate the travelling salesman problem (TSP) as a mathematical model and apply this model to
the complete graph K4.
Solution:
Label the cities as 1, 2, ..., n, where n is the total number of cities, and arbitrarily take city 1 as the origin.
Define the decision variables:
x_ij = 1 if the tour goes directly from city i to city j, and x_ij = 0 otherwise.

In addition, for each city i = 1, 2, ..., n, let u_i be a non-negative real auxiliary variable, and let c_ij be the distance between cities i and j. Then the MTZ (Miller-Tucker-Zemlin) formulation of the TSP is the following:
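
The formulation itself does not appear in the text at this point. As an assumption about what was intended, one common statement of the MTZ model is sketched below, where x_ij = 1 if the tour goes directly from city i to city j and 0 otherwise, c_ij is the distance between cities i and j, and u_i is the auxiliary ordering variable with u_1 = 1. Other equivalent forms of the subtour-elimination constraint exist.

```latex
\begin{aligned}
\min\ & \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} c_{ij}\, x_{ij} \\
\text{s.t.}\ & \sum_{i=1,\, i \neq j}^{n} x_{ij} = 1, && j = 1, \dots, n \\
& \sum_{j=1,\, j \neq i}^{n} x_{ij} = 1, && i = 1, \dots, n \\
& u_i - u_j + 1 \le (n - 1)(1 - x_{ij}), && 2 \le i \neq j \le n \\
& 2 \le u_i \le n, && i = 2, \dots, n \\
& x_{ij} \in \{0, 1\}, && i \neq j
\end{aligned}
```

For the complete graph K4, n = 4. Under this form of the model there are 12 binary variables x_ij (one per ordered pair of distinct cities), 4 + 4 = 8 constraints forcing each city to be entered once and left once, and 6 subtour-elimination constraints (one per ordered pair of distinct cities taken from 2, 3, 4).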

GOOD LUCK
Prof. Dr. E. Badr
