ML Python Lab
Consider the file income.csv, explore the data, drop the columns that you consider useless for the classification and find the best classification scheme.
The solution must be produced as a Python Notebook, assuming that the dataset is in the same folder as the notebook.
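As an illustration of the "same folder" assumption, a minimal sketch of loading the file through a relative path is given below. The helper name load_income and its folder parameter are hypothetical and not part of the assignment; only the file name income.csv comes from the text. The sketch assumes pandas is installed and that the notebook's working directory is the folder containing the notebook, which is typically the case when it is opened from Jupyter.

```python
from pathlib import Path

import pandas as pd


def load_income(folder: Path = Path(".")) -> pd.DataFrame:
    """Read income.csv from `folder` (hypothetical helper, not required by the assignment).

    The default "." is the current working directory, assumed here to be
    the folder that contains the notebook.
    """
    return pd.read_csv(folder / "income.csv")
```

In a notebook cell, `df = load_income()` followed by `df.shape` would then report the size of the dataset without hard-coding an absolute path.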
The notebook must include appropriate comments and must operate as follows:
1. Load the data file and explore the data, showing size, data descriptions, data distributions with boxplots and pairplots . . . 2pt
2. Comment the exploration of step 1, pointing out if there are imbalanced distributions, outliers, missing values . . . 2pt
3. Drop the columns that are not relevant for the classification operation, if any, and explain why you do that. Deal with missing values, if any . . . 4pt

Quality of the code . . . 4pt
• Include appropriate comments with reference to the numbered requirements
• Useless cells, pieces of code and non-required output will be penalised
• Remove the code you use for testing and inspecting the variables during the development
• Naming style of variables must be uniform and in English
• Bad indentation and messy code will be penalised
• Non-generalised solutions, such as three sequential statements with the same kind of operation instead of a loop, will be penalised

Additional directions (assignments not compliant with the rules below will not be considered):