CMSC-691-Assignment-3
Assignment 3
Total Points: 30
Due: April 23, 2025
This assignment consists of two parts. Answer all the questions for both parts in a single
document, and also submit your R or Python source files. Both datasets are in the
CourseDataSet folder on Blackboard.
Part 1: [15]
Using Breadbasket dataset, do the following:
1. Describe the dataset and list the distinct items in the dataset.
2. Perform a market basket analysis and uncover the association rules. Make sure each rule has
at least two and at most five items. Then filter the rules so that “Coffee” does not appear
on the right-hand side.
3. Sort the rules using a metric of your choice (e.g., lift).
4. Choose one of the top three rules, describe it, and explain what information it gives
you.
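The metrics behind steps 2 and 3 can be sketched in plain Python. This is only an illustration under stated assumptions: the transactions are made-up toy data rather than the Breadbasket file, the helper name `pair_rules` is hypothetical, and the sketch covers two-item rules only. Rules with up to five items would need a full frequent-itemset miner such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Toy transactions standing in for the Breadbasket data (made-up values).
transactions = [
    {"Coffee", "Bread"},
    {"Coffee", "Cake"},
    {"Bread", "Cake"},
    {"Coffee", "Bread", "Cake"},
    {"Tea", "Cake"},
]


def pair_rules(transactions, min_support=0.2):
    """Return two-item rules lhs -> rhs with support, confidence, and lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence / P(rhs)
            rules.append(
                {
                    "lhs": lhs,
                    "rhs": rhs,
                    "support": support,
                    "confidence": confidence,
                    "lift": lift,
                }
            )
    return rules


# Drop rules with "Coffee" on the right-hand side, then sort by lift.
rules = [r for r in pair_rules(transactions) if r["rhs"] != "Coffee"]
rules.sort(key=lambda r: r["lift"], reverse=True)

for r in rules[:3]:
    print(f'{r["lhs"]} -> {r["rhs"]}: lift={r["lift"]:.2f}')
```

A lift above 1 suggests the two items appear together more often than they would if purchased independently, which is the kind of interpretation step 4 asks for.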
Part 2: [15]
1. You will use MongoDB for this part. First, download and install MongoDB on your computer.
Then do the following.
2. For this assignment, it is best that you use Python.
3. Download the movies, tags, and ratings files. Write a program that reads the three
CSV files (movies, ratings, tags) and inserts all the records into three separate
collections (movies, ratings, tags).
4. Next, write a program to add five movies that you have watched this year, or would
like to watch, to the collection “movies”. Make sure that you assign unique movie IDs
and specify the genres (genres need not be completely accurate).
5. Corresponding to the movies that you added, write a program to add some suitable
ratings to the collection “ratings” and some suitable “tags” to the collection “tags”.
Make sure that you use a unique userid for yourself.
6. For the following items, you must use the Aggregation Pipeline. If you use any other
method, no credit will be given.
a. Develop code to find the number of movies released per year.
b. Develop code to find the number of movies per genre.
c. Develop code to find the number of movies per rating.
d. Develop code to find the number of movies tagged.
e. Develop code to find the most popular tag.
7. What to submit?
a. Jupyter Notebook file that contains all the above code.
b. Summarize all the data you added.
c. A document that summarizes what you learnt while doing the Assignment.
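As a rough illustration of items 3 and 6 in Part 2, the sketch below parses CSV text into documents and builds two aggregation pipelines as plain Python lists. Several things here are assumptions and not taken from the assignment: the column names (`movieId`, `title`, `genres`, `tag`) follow a MovieLens-style layout with genres separated by `|`, the helper names are hypothetical, and the database name in the commented usage is made up. Adjust all of these to the actual files.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Parse CSV text into a list of dicts, one per row.

    All values are strings; convert IDs and ratings to numbers as needed.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


def movies_per_genre_pipeline():
    """Count movies per genre, assuming a pipe-separated `genres` field."""
    return [
        {"$project": {"genre": {"$split": ["$genres", "|"]}}},
        {"$unwind": "$genre"},  # one document per (movie, genre) pair
        {"$group": {"_id": "$genre", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def most_popular_tag_pipeline():
    """Return the single most frequent value of the assumed `tag` field."""
    return [
        {"$group": {"_id": "$tag", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]


# Made-up sample row in the assumed layout.
sample = "movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n"
docs = csv_to_documents(sample)
print(docs)

# With a running MongoDB server and pymongo installed, usage might look like:
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["assignment3"]  # hypothetical name
# db.movies.insert_many(docs)
# for row in db.movies.aggregate(movies_per_genre_pipeline()):
#     print(row)
```

Building each pipeline as a function that returns a list keeps the stages easy to inspect and reuse in a notebook before running them against the server.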
Links:
• https://docs.mongodb.com/manual/administration/install-community/
• https://docs.mongodb.com/manual/installation/
• https://docs.mongodb.com/drivers/pymongo/
• https://www.mongodb.com/developer/quickstart/python-quickstart-aggregation/
• https://www.analyticsvidhya.com/blog/2020/08/how-to-create-aggregation-pipelines-in-a-mongodb-database-using-pymongo/
• https://www.mongodb.com/docs/manual/core/aggregation-pipeline/
• https://www.mongodb.com/basics/aggregation-pipeline
• https://www.mongodb.com/docs/v6.0/core/aggregation-pipeline/
• https://www.mongodb.com/docs/manual/reference/operator/aggregation/count/