ML Python Lab

The document provides instructions for a machine learning exam that involves:
1. Loading an income dataset and exploring the data through visualizations and statistics.
2. Commenting on any imbalances, outliers or missing values found in the data.
3. Dropping irrelevant columns for classification and dealing with any missing values, with explanations.
4. Finding the best classification method and hyperparameters through cross-validation optimized for the F1 macro measure.
5. Reporting performance measures and confusion matrices for the best models.
6. Commenting on the results.
The solution must be in a Python notebook following specific naming, formatting and submission guidelines.

Uploaded by

葛恩泽

Exam for Machine Learning Python Lab

Consider the file income.csv, explore the data, drop the columns that you consider useless for the classification and find the best classification scheme.
The solution must be produced as a Python Notebook, assuming that the dataset is in the same folder as the notebook.

The notebook must include appropriate comments and must operate as follows:

1. Load the data file and explore the data, showing size, data descriptions, data distributions with boxplot, pairplots (2pt)

2. Comment the exploration of step 1 pointing out if there are imbalanced distributions, outliers, missing values (2pt)

3. Drop the columns that are not relevant for the classification operation, if any, and explain why you do that. Deal with missing values, if any (4pt)

4. Find the best classification scheme considering two classification methods, find the best hyperparameters using cross validation; the optimization must be focused on the f1_macro measure (4pt)

5. Show the performance measures and the confusion matrices for the best hyperparameters of each model (2pt)

6. Comment the results (2pt)

Quality of the code (4pt)
• Include appropriate comments with reference to the numbered requirements
• Useless cells, pieces of code and non-required output will be penalised
• Remove the code you use for testing and inspecting the variables during the development
• Naming style of variables must be uniform and in English
• Bad indentation and messy code will be penalised
• Non generalised solution, such as three sequential statements with the same kind of operation instead of a loop, will be penalised

Additional directions, the assignments not compliant with the rules below will not be considered:
• The notebook name must be youremailusername.ipynb in lowercase letters (underscore instead of dot inside the email username can also be accepted). E.g. if your email is [email protected], the notebook filename will be mario.rossi45.ipynb (mario_rossi45.ipynb can also be accepted)
• The solution must directly access the data in the same folder of the notebook, the name of the file must be the same as the file provided. If the notebook is developed using Google Colab, the code must be able to work also out of the Google Colab environment without any change.
• Upload the notebook only to http://eol.unibo.it in the activity specified by the teacher, any other way of submitting the notebook will be ignored

Cooperative work will be heavily sanctioned.
The candidate can freely access any kind of materials.
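
As a rough illustration of steps 1 to 3, the sketch below runs the exploration and cleaning on a small synthetic table. The table is only a stand-in: the column names (row_id, age, hours_per_week, income) and all values are invented here and are not taken from income.csv. In the actual notebook the table would instead come from pd.read_csv("income.csv"), a relative path that works when the file sits in the same folder as the notebook. The plots requested in step 1 are not drawn; the numeric summaries shown are the quantities a boxplot would display.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for income.csv: column names and values are invented.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "row_id": np.arange(n_rows),  # identifier, carries no class information
    "age": rng.integers(18, 70, size=n_rows).astype(float),
    "hours_per_week": rng.normal(40, 8, size=n_rows),
    "income": rng.choice(["<=50K", ">50K"], size=n_rows, p=[0.8, 0.2]),
})
df.loc[::25, "age"] = np.nan         # inject a few missing values
df.loc[3, "hours_per_week"] = 120.0  # inject one obvious outlier

# Step 1: size and data descriptions
print(df.shape)
print(df.describe(include="all"))

# Step 2: class balance, missing values, outliers (1.5 * IQR rule)
class_share = df["income"].value_counts(normalize=True)
missing_per_column = df.isna().sum()

numeric_columns = df.select_dtypes(include="number").columns.drop("row_id")
q1 = df[numeric_columns].quantile(0.25)
q3 = df[numeric_columns].quantile(0.75)
iqr = q3 - q1
is_outlier = (df[numeric_columns] < q1 - 1.5 * iqr) | (df[numeric_columns] > q3 + 1.5 * iqr)
outliers_per_column = is_outlier.sum()

print(class_share)
print(missing_per_column)
print(outliers_per_column)

# Step 3: drop the identifier column and fill missing numeric values
# with the column median (one possible choice, less sensitive to outliers
# than the mean)
df_clean = df.drop(columns=["row_id"])
df_clean[numeric_columns] = df_clean[numeric_columns].fillna(
    df_clean[numeric_columns].median()
)
```

Median imputation is only one option; dropping the affected rows is equally defensible when few values are missing, and either choice should be justified in a comment as step 3 requires.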

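Steps 4 and 5 could be organised along the following lines. This is a minimal sketch under stated assumptions: the data is synthetic (generated with make_classification, not income.csv), and the two methods (a decision tree and k-nearest neighbours) and their parameter grids are arbitrary picks for illustration, not choices prescribed by the exam. The only element taken from the text is the scoring, which is set to f1_macro in the cross-validated search.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the cleaned income data (assumption).
features, target = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.25, stratify=target, random_state=0
)

# Two candidate methods with illustrative hyperparameter grids.
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
    "k_neighbors": (
        KNeighborsClassifier(),
        {"n_neighbors": [3, 5, 9]},
    ),
}

results = {}
# One loop handles every model, instead of repeating the same statements.
for name, (estimator, param_grid) in candidates.items():
    # Step 4: 5-fold cross-validated grid search optimised for f1_macro
    search = GridSearchCV(estimator, param_grid, scoring="f1_macro", cv=5)
    search.fit(features_train, target_train)

    # Step 5: performance measures and confusion matrix on held-out data
    predictions = search.predict(features_test)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1_macro": search.best_score_,
        "confusion_matrix": confusion_matrix(target_test, predictions),
    }
    print(name, search.best_params_, round(search.best_score_, 3))
    print(classification_report(target_test, predictions))
    print(results[name]["confusion_matrix"])
```

Keeping the models in a dictionary and iterating over it matches the code-quality rule about generalised solutions, and the stratified split preserves the class proportions, which matters when the target is imbalanced.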