1. What is the primary goal of unsupervised learning?
A) Minimize error between predicted and actual outputs
B) Discover patterns or structures in input data
C) Classify data into predefined categories
D) Predict future outcomes based on historical data
Answer: B) Discover patterns or structures in input data
2. Which algorithm is commonly used for clustering in unsupervised learning?
A) Decision Trees
B) Support Vector Machines
C) K-Means
D) Random Forest
Answer: C) K-Means
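For concreteness, here is a minimal K-Means sketch in Python, assuming scikit-learn and NumPy are available; the two synthetic blobs and the choice of two clusters are illustrative only.

```python
# Minimal K-Means usage sketch (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),   # points around (0, 0)
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),   # points around (5, 5)
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)          # cluster index for each point
centroids = kmeans.cluster_centers_     # learned cluster centres
print(labels[:5], centroids)
```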
3. Which of the following tasks is NOT typically associated with unsupervised learning?
A) Anomaly detection
B) Dimensionality reduction
C) Classification
D) Clustering
Answer: C) Classification
4. What is a common application of Principal Component Analysis (PCA) in unsupervised learning?
A) Data labeling
B) Outlier detection
C) Feature extraction and dimensionality reduction
D) Classifying data points into categories
Answer: C) Feature extraction and dimensionality reduction
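A short sketch of PCA for feature extraction and dimensionality reduction, assuming scikit-learn; the random 4-feature matrix and the choice of two components are illustrative.

```python
# PCA sketch: project data onto its strongest directions of variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 samples, 4 features (synthetic)

pca = PCA(n_components=2)                # keep the two leading components
X_reduced = pca.fit_transform(X)         # shape (100, 2)
print(X_reduced.shape, pca.explained_variance_ratio_)
```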
5. Which evaluation metric is often used to assess the quality of clustering algorithms?
A) F1 Score
B) Accuracy
C) Silhouette Score
D) Mean Absolute Error
Answer: C) Silhouette Score
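The silhouette score from question 5 can be computed as in the sketch below, assuming scikit-learn; the blob data and k = 4 are made up for illustration.

```python
# Evaluating clustering quality with the silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Ranges from -1 (poor) to +1 (dense, well-separated clusters).
print("silhouette:", silhouette_score(X, labels))
```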
6. Which of the following is NOT a type of clustering algorithm?
A) Hierarchical Clustering
B) K-Means Clustering
C) DBSCAN
D) Decision Tree
Answer: D) Decision Tree
7. Which unsupervised learning technique is suitable for detecting outliers in a dataset?
A) K-Means clustering
B) PCA
C) Isolation Forest
D) Linear Regression
Answer: C) Isolation Forest
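A sketch of outlier detection with Isolation Forest, assuming scikit-learn; the synthetic inliers/outliers and the contamination value are illustrative choices, not recommendations.

```python
# Isolation Forest flags anomalous points without any labels.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
inliers = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-8, 8, size=(10, 2))
X = np.vstack([inliers, outliers])

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
pred = iso.predict(X)                 # +1 = inlier, -1 = flagged outlier
print("flagged outliers:", int((pred == -1).sum()))
```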
8. What does the "elbow method" typically help determine in K-Means clustering?
A) The optimal number of clusters
B) The best feature to use for clustering
C) The outliers in the dataset
D) The distance between data points
Answer: A) The optimal number of clusters
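The elbow method from question 8 is usually applied as below: fit K-Means for a range of k values and look for the bend in the inertia (within-cluster sum of squares) curve. scikit-learn and matplotlib are assumed; the data is synthetic.

```python
# Elbow method sketch: inertia versus number of clusters.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.show()
```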
9. Which clustering algorithm is known for its ability to handle clusters of arbitrary shapes and sizes?
A) K-Means
B) DBSCAN
C) Agglomerative Hierarchical Clustering
D) Divisive Clustering
Answer: B) DBSCAN
10. What does DBSCAN stand for?
A) Density-Based Spatial Clustering of Applications with Noise
B) Distribution-Based Spectral Clustering with Accuracy
C) Divisive Binary Search Clustering for Analyzing Networks
D) Density-Boosted Sequential Classification with Neural Networks
Answer: A) Density-Based Spatial Clustering of Applications with Noise
11. Which of the following is a characteristic of DBSCAN?
A) It requires the number of clusters to be specified in advance.
B) It assigns each data point to the nearest centroid.
C) It is sensitive to the order of data points.
D) It can identify outliers as noise.
Answer: D) It can identify outliers as noise.
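Questions 9-13 describe DBSCAN's key behaviours: no predefined number of clusters, arbitrary-shaped clusters, and noise points labelled -1. A minimal sketch follows, assuming scikit-learn; the eps and min_samples values are illustrative only.

```python
# DBSCAN sketch: density-based clusters plus noise labelled -1.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.08, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", list(labels).count(-1))
```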
12. Which clustering algorithm is based on the concept of "density reachability" and "density connectivity"?
A) K-Means
B) Agglomerative Hierarchical Clustering
C) DBSCAN
D) Distribution Model-Based Clustering
Answer: C) DBSCAN
13. What does the acronym "DB" in DBSCAN refer to?
A) Density-Based
B) Distribution-Based
C) Divisive-Based
D) Distance-Based
Answer: A) Density-Based
14. Which clustering algorithm forms clusters by merging or dividing them based on their distance or similarity?
A) DBSCAN
B) Distribution Model-Based Clustering
C) Agglomerative Hierarchical Clustering
D) Divisive Clustering
Answer: C) Agglomerative Hierarchical Clustering
15. In hierarchical clustering, what is the agglomerative approach?
A) Starting with each point as a separate cluster and then merging them iteratively.
B) Starting with one cluster containing all points and then dividing it into smaller clusters.
C) Assigning each point to the nearest centroid.
D) Dividing the dataset into equal-sized partitions.
Answer: A) Starting with each point as a separate cluster and then merging them iteratively.
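A short sketch of agglomerative (bottom-up) hierarchical clustering, assuming scikit-learn and SciPy: every point starts as its own cluster and the closest clusters are merged repeatedly. The blob data, three clusters, and Ward linkage are illustrative choices.

```python
# Agglomerative hierarchical clustering sketch.
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

# Flat clustering with a chosen number of clusters.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# The full merge tree, which a dendrogram would visualise.
Z = linkage(X, method="ward")
print(labels[:10], Z.shape)   # Z has one row per merge: (n_samples - 1, 4)
```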
16. Which of the following clustering algorithms does not require the number of clusters to be specified beforehand?
A) K-Means
B) Distribution Model-Based Clustering
C) Agglomerative Hierarchical Clustering
D) DBSCAN
Answer: D) DBSCAN
17. Which clustering algorithm is based on the concept of iteratively partitioning the data into clusters until convergence?
A) K-Means
B) DBSCAN
C) Agglomerative Hierarchical Clustering
D) Distribution Model-Based Clustering
Answer: A) K-Means
18. Which clustering algorithm is sensitive to the initial placement of centroids?
A) DBSCAN
B) Agglomerative Hierarchical Clustering
C) K-Means
D) Divisive Clustering
Answer: C) K-Means
19. Which clustering algorithm is based on modeling the distribution of data points in the feature space?
A) K-Means
B) DBSCAN
C) Distribution Model-Based Clustering
D) Agglomerative Hierarchical Clustering
Answer: C) Distribution Model-Based Clustering
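Distribution model-based clustering is commonly realised with a Gaussian mixture model, as in the sketch below (scikit-learn assumed): each cluster is modelled as a Gaussian and points receive soft membership probabilities. The data and the three components are illustrative.

```python
# Gaussian mixture model as an example of distribution model-based clustering.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
hard_labels = gmm.predict(X)        # most likely component per point
soft_probs = gmm.predict_proba(X)   # membership probability for each component
print(hard_labels[:5], soft_probs[0].round(3))
```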
20. In hierarchical clustering, which approach starts with one cluster containing all points and then divides it into smaller clusters?
A) Agglomerative
B) Divisive
C) DBSCAN
D) K-Means
Answer: B) Divisive
21. What is the main disadvantage of agglomerative hierarchical clustering?
A) It requires a predefined number of clusters.
B) It is computationally expensive for large datasets.
C) It cannot handle clusters of arbitrary shapes.
D) It is sensitive to outliers.
Answer: B) It is computationally expensive for large datasets.
22. Which clustering algorithm is based on the concept of partitioning the data into spherical clusters?
A) DBSCAN
B) Agglomerative Hierarchical Clustering
C) K-Means
D) Distribution Model-Based Clustering
Answer: C) K-Means
23. Which clustering algorithm forms clusters by continuously merging the nearest clusters until a stopping criterion is met?
A) K-Means
B) DBSCAN
C) Agglomerative Hierarchical Clustering
D) Distribution Model-Based Clustering
Answer: C) Agglomerative Hierarchical Clustering
24. Which distance measure calculates the distance between two points as the sum of the absolute differences of their coordinates?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Minkowski Distance
Answer: B) Manhattan Distance
25. Which distance measure is often used for text mining and document clustering, where the magnitude of the vectors is not important?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Hamming Distance
Answer: C) Cosine Similarity
26. What is the range of values for Euclidean distance?
A) [0, ∞)
B) (-∞, ∞)
C) [0, 1]
D) [0, n]
Answer: A) [0, ∞)
27. Which distance measure is suitable for data with binary attributes?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Hamming Distance
Answer: D) Hamming Distance
28. Which distance measure is also known as the L2 norm?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Hamming Distance
Answer: A) Euclidean Distance
29. Which distance measure considers both the magnitude and orientation of vectors, often used in recommendation systems and text mining?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Hamming Distance
Answer: C) Cosine Similarity
30. Which distance measure is also known as the city block distance or taxicab metric?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Minkowski Distance
Answer: B) Manhattan Distance
31. Which distance measure generalizes both Euclidean and Manhattan distances based on a parameter p?
A) Euclidean Distance
B) Manhattan Distance
C) Cosine Similarity
D) Minkowski Distance
Answer: D) Minkowski Distance
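The distance measures from questions 24-31 can be computed directly with SciPy, as sketched below; the example vectors are arbitrary. Note that Minkowski distance with p = 1 reduces to Manhattan distance and with p = 2 to Euclidean distance.

```python
# Common distance measures computed with SciPy (assumed installed).
from scipy.spatial import distance

a, b = [1, 2, 3], [4, 6, 3]
print("Euclidean :", distance.euclidean(a, b))          # sqrt(9 + 16 + 0) = 5.0
print("Manhattan :", distance.cityblock(a, b))          # 3 + 4 + 0 = 7
print("Cosine sim:", 1 - distance.cosine(a, b))         # SciPy returns the cosine *distance*
print("Minkowski p=1:", distance.minkowski(a, b, p=1))  # same as Manhattan
print("Minkowski p=2:", distance.minkowski(a, b, p=2))  # same as Euclidean

# Hamming distance suits binary/categorical vectors (fraction of differing positions).
u, v = [1, 0, 1, 1], [1, 1, 0, 1]
print("Hamming   :", distance.hamming(u, v))            # 2 of 4 positions differ = 0.5
```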
32. What is the primary goal of the K-means clustering algorithm?
A) Minimize the within-cluster variance
B) Maximize the between-cluster variance
C) Minimize the number of clusters
D) Maximize the number of clusters
Answer: A) Minimize the within-cluster variance
33. What is the typical initialization method for K-means clustering?
A) Randomly selecting K data points as initial cluster centroids
B) Sorting the data points based on their features
C) Initializing all cluster centroids to zero
D) Using hierarchical clustering to determine initial centroids
Answer: A) Randomly selecting K data points as initial cluster centroids
34. Which step is repeated iteratively in the K-means algorithm until convergence?
A) Randomly initializing cluster centroids
B) Calculating the distance between data points and centroids
C) Updating the cluster centroids based on the mean of data points in each cluster
D) Assigning data points to the nearest centroid
Answer: C) Updating the cluster centroids based on the mean of data points in each cluster
35. What is the computational complexity of one K-means iteration with respect to the number of data points n (for a fixed number of clusters and dimensions)?
A) O(n log n)
B) O(n^2)
C) O(n)
D) O(n^3)
Answer: C) O(n)
36. How is the optimal number of clusters for K-means typically determined?
A) By performing feature selection
B) By using the Silhouette score
C) By plotting the elbow method curve
D) By using hierarchical clustering to identify clusters
Answer: C) By plotting the elbow method curve
37. Which of the following is a limitation of the K-means algorithm?
A) It is sensitive to the initial placement of centroids
B) It cannot handle non-linearly separable clusters
C) It requires a predefined number of clusters
D) It is computationally expensive for large datasets
Answer: A) It is sensitive to the initial placement of centroids
38. What does the "K" in K-means represent?
A) Kernel
B) Number of features
C) Number of iterations
D) Number of clusters
Answer: D) Number of clusters
39. Which step follows centroid initialization in the K-means algorithm?
A) Assigning data points to the nearest centroid
B) Randomly selecting K data points as initial cluster centroids
C) Calculating the distance between data points and centroids
D) Updating the cluster centroids based on the mean of data points in each cluster
Answer: A) Assigning data points to the nearest centroid
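Questions 32-39 cover the mechanics of K-Means. The from-scratch sketch below (NumPy assumed) makes the loop explicit: random centroid initialization, assignment of each point to the nearest centroid, centroid update as the cluster mean, repeated until assignments stabilise. It ignores edge cases such as empty clusters; the data and k = 3 are illustrative.

```python
# From-scratch sketch of the K-Means (Lloyd's) loop.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # convergence check
            break
        centroids = new_centroids
    return labels, centroids

# Three synthetic groups of 30 points each, then cluster with k = 3.
X = np.random.default_rng(1).normal(size=(90, 2)) + np.repeat([[0, 0], [5, 5], [0, 5]], 30, axis=0)
labels, centroids = kmeans(X, k=3)
print(np.bincount(labels), centroids.round(2))
```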
40. A retail company wants to segment its customers based on their purchasing behavior. Which clustering algorithm is most suitable for this case study?
A) K-Means
B) Decision Trees
C) Linear Regression
D) Support Vector Machines
Answer: A) K-Means
41. What would be the most likely input data for this customer segmentation task?
A) Images of products
B) Customer reviews
C) Customer purchase history and demographics
D) Employee performance metrics
Answer: C) Customer purchase history and demographics
42. In customer segmentation, what could be potential clusters that the retail company might identify?
A) Groups of customers who purchase similar types of products
B) Customer satisfaction ratings
C) Employee job roles
D) Geographical locations of customers
Answer: A) Groups of customers who purchase similar types of products
43. How might the retail company utilize customer segmentation results?
A) To optimize employee schedules
B) To personalize marketing strategies
C) To increase office productivity
D) To reduce manufacturing costs
Answer: B) To personalize marketing strategies
44. Which evaluation metric would the retail company likely use to assess the quality of customer segmentation?
A) F1 Score
B) Accuracy
C) Silhouette Score
D) Mean Squared Error
Answer: C) Silhouette Score
45. After clustering, what action might the retail company take based on the segments identified?
A) Reduce the number of products offered
B) Send targeted promotional offers to specific customer segments
C) Increase product prices uniformly
D) Expand into new geographical regions
Answer: B) Send targeted promotional offers to specific customer segments
46. What preprocessing steps might be necessary before applying clustering to the customer data?
A) Normalizing numerical features and encoding categorical features
B) Removing missing values from the dataset
C) Scaling the target variable
D) Transforming text data into numerical representations
Answer: A) Normalizing numerical features and encoding categorical features
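A sketch of the preprocessing mentioned in question 46, assuming scikit-learn and pandas: numerical features are standardized and categorical features one-hot encoded before clustering. The column names and the toy data frame are hypothetical.

```python
# Preprocessing sketch for customer-segmentation data before clustering.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

customers = pd.DataFrame({
    "annual_spend": [1200.0, 250.0, 4300.0, 800.0],
    "visits_per_month": [4, 1, 9, 3],
    "favourite_category": ["grocery", "electronics", "grocery", "clothing"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["annual_spend", "visits_per_month"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["favourite_category"]),
])

pipeline = Pipeline([("prep", preprocess), ("cluster", KMeans(n_clusters=2, n_init=10, random_state=0))])
labels = pipeline.fit_predict(customers)   # preprocess, then cluster
print(labels)
```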
47. What potential challenges might the retail company encounter when implementing customer segmentation?
A) Difficulty in accessing customer data
B) Lack of computational resources
C) Inadequate domain knowledge
D) Limited availability of clustering algorithms
Answer: C) Inadequate domain knowledge