CMSC-691-Assignment-3

The assignment for CMSC 691 involves two parts: a market basket analysis using the Breadbasket dataset and a MongoDB project using movie data. Students must describe the dataset, perform association rule mining, and implement various data manipulations in Python with MongoDB. Submissions include code, a summary of added data, and reflections on the learning experience.

CMSC 691 – Introduction to Data Science

Assignment 3
Total Points – 30 Due: April 23, 2025

This assignment consists of two parts. For each part, please answer all the questions in a single
document. Also submit your R or Python files. You will find both datasets in the
CourseDataSet folder on Blackboard.

Part 1: [15]
Using the Breadbasket dataset, do the following:
1. Describe the dataset and list the distinct items in the dataset.
2. Do a market basket analysis and uncover the association rules. Make sure your rules have
at least two and at most five items. Then filter the rules so that “Coffee” doesn’t appear
on the right-hand side.
3. Sort the rules using metrics of your choice (e.g., lift).
4. Choose one of the top three rules, describe it, and explain what information it
provides.
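
As a rough illustration of the metrics behind steps 2 to 4, the sketch below computes support, confidence, and lift for a single rule and applies the rule-size and “Coffee” filters by hand. The transactions are toy data invented for this example (not taken from the Breadbasket file), and the function names are hypothetical. In practice a library such as mlxtend (Python) or arules (R) would normally generate the rules.

```python
# Toy transactions invented for illustration; NOT the Breadbasket data.
transactions = [
    {"Coffee", "Bread"},
    {"Coffee", "Cake"},
    {"Bread", "Cake"},
    {"Coffee", "Bread", "Cake"},
    {"Tea"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Support, confidence, and lift for the rule lhs -> rhs."""
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    s_both = support(set(lhs) | set(rhs), transactions)
    confidence = s_both / s_lhs
    return {"support": s_both, "confidence": confidence, "lift": confidence / s_rhs}


def keep_rule(lhs, rhs):
    """Keep rules with 2 to 5 items in total and no Coffee on the right-hand side."""
    return 2 <= len(lhs) + len(rhs) <= 5 and "Coffee" not in rhs


metrics = rule_metrics({"Bread"}, {"Cake"}, transactions)
print(metrics)
```

A lift above 1 suggests the two sides occur together more often than they would if they were independent; a lift near 1 suggests no association.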

Part 2: [15]
1. You will use MongoDB for this part. First, download and install MongoDB on your
computer. Then do the following.
2. For this assignment, it is best that you use Python.
3. Please download the movies, tags and ratings files. Write a program to read the three
given CSV files (movies, ratings, tags) and insert all the records into three separate
collections (movies, ratings, tags).
4. Next, write a program to add five movies that you have watched this year, or would
like to watch, to the collection “movies”. Make sure that you assign unique movie IDs
and specify the genres (genres need not be completely accurate).
5. Corresponding to the movies that you added, write a program to add some suitable
ratings to the collection “ratings” and some suitable “tags” to the collection “tags”.
Make sure that you use a unique userid for yourself.
6. For the following items, you must use the Aggregation Pipeline. If you use any other
method, no credit will be given.
a. Develop code to find the number of movies released per year.
b. Develop code to find the number of movies per genre.
c. Develop code to find the number of movies per rating.
d. Develop code to find the number of movies that have been tagged.
e. Develop code to find the most popular tag.
7. What to submit?
a. A Jupyter Notebook file that contains all the above code.
b. A summary of all the data you added.
c. A document that summarizes what you learnt while doing the Assignment.
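
One possible shape for the aggregation pipelines in item 6 is sketched below. It assumes MovieLens-style fields, which the assignment does not confirm: `title` ending in a year such as "(1995)", `genres` as a "|"-separated string, and `movieId`, `rating`, and `tag` in the ratings and tags collections. The names `PIPELINES`, `YEAR_REGEX`, and `run` are hypothetical, and `$regexFind` needs MongoDB 4.2 or later. "Movies per rating" is read here as distinct movies that received each rating value; other readings are possible.

```python
import re

# Assumption: titles end with the release year in parentheses, e.g. "Toy Story (1995)".
YEAR_REGEX = r"\((\d{4})\)\s*$"

# Each entry maps a task to (collection name, pipeline stages).
PIPELINES = {
    # (a) movies per release year, with the year pulled out of the title
    "movies_per_year": ("movies", [
        {"$project": {"found": {"$regexFind": {"input": "$title", "regex": YEAR_REGEX}}}},
        {"$project": {"year": {"$arrayElemAt": ["$found.captures", 0]}}},
        {"$group": {"_id": "$year", "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]),
    # (b) movies per genre, splitting the "|"-separated genres string
    "movies_per_genre": ("movies", [
        {"$project": {"genre": {"$split": ["$genres", "|"]}}},
        {"$unwind": "$genre"},
        {"$group": {"_id": "$genre", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]),
    # (c) distinct movies per rating value
    "movies_per_rating": ("ratings", [
        {"$group": {"_id": {"rating": "$rating", "movieId": "$movieId"}}},
        {"$group": {"_id": "$_id.rating", "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]),
    # (d) number of distinct movies that have at least one tag
    "movies_tagged": ("tags", [
        {"$group": {"_id": "$movieId"}},
        {"$count": "taggedMovies"},
    ]),
    # (e) the single most frequently used tag
    "most_popular_tag": ("tags", [
        {"$group": {"_id": "$tag", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]),
}


def run(db, name):
    """Run one named pipeline against a PyMongo database object."""
    collection, pipeline = PIPELINES[name]
    return list(db[collection].aggregate(pipeline))


# The year pattern can be checked locally without a database.
year = re.search(YEAR_REGEX, "Toy Story (1995)").group(1)
```

With a PyMongo database handle `db`, a call such as `run(db, "most_popular_tag")` would return the result documents as a list. Titles without a trailing year fall into a group whose `_id` is null.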

For this part, it may be easier to set up a virtual environment
(https://pypi.org/project/virtualenv/).
Use PyMongo: https://pypi.org/project/pymongo/
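
A minimal sketch of the loading and insertion steps with PyMongo follows. The CSV header (`movieId,title,genres`), the connection string, the database name `cmsc691`, and the helper names are all assumptions made for illustration. The sample title "Example Film (2025)" is a placeholder, not a real record. The two helper functions run without a database; only `load_into_mongo` needs a running MongoDB server.

```python
import csv
import io


def rows_to_docs(file_obj, int_fields=(), float_fields=()):
    """Turn CSV rows into dicts, converting the named columns to numbers."""
    docs = []
    for row in csv.DictReader(file_obj):
        for field in int_fields:
            row[field] = int(row[field])
        for field in float_fields:
            row[field] = float(row[field])
        docs.append(dict(row))
    return docs


def with_new_ids(new_movies, max_existing_id):
    """Give each new movie a movieId above every ID already in use."""
    return [
        {"movieId": max_existing_id + offset, **movie}
        for offset, movie in enumerate(new_movies, start=1)
    ]


# In-memory stand-in for movies.csv (assumed header: movieId,title,genres).
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation\n"
)
docs = rows_to_docs(sample, int_fields=("movieId",))
added = with_new_ids(
    [{"title": "Example Film (2025)", "genres": "Drama"}],  # placeholder movie
    max_existing_id=1,
)


def load_into_mongo(path, collection_name, **field_types):
    """Insert one CSV file into one collection; needs a running MongoDB server."""
    from pymongo import MongoClient  # imported here so the helpers above run without it

    client = MongoClient("mongodb://localhost:27017")  # assumed local default
    db = client["cmsc691"]  # hypothetical database name
    with open(path, newline="", encoding="utf-8") as handle:
        file_docs = rows_to_docs(handle, **field_types)
    result = db[collection_name].insert_many(file_docs)
    return len(result.inserted_ids)
```

A call such as `load_into_mongo("movies.csv", "movies", int_fields=("movieId",))` would then load one file, assuming that file name and column. The largest existing ID could be looked up with `db.movies.find_one(sort=[("movieId", -1)])` before calling `with_new_ids`.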

Links:
• https://docs.mongodb.com/manual/administration/install-community/
• https://docs.mongodb.com/manual/installation/
• https://docs.mongodb.com/drivers/pymongo/
• https://www.mongodb.com/developer/quickstart/python-quickstart-aggregation/
• https://www.analyticsvidhya.com/blog/2020/08/how-to-create-aggregation-pipelines-in-a-mongodb-database-using-pymongo/
• https://www.mongodb.com/docs/manual/core/aggregation-pipeline/
• https://www.mongodb.com/basics/aggregation-pipeline
• https://www.mongodb.com/docs/v6.0/core/aggregation-pipeline/
• https://www.mongodb.com/docs/manual/reference/operator/aggregation/count/
