Lab 8
Module-7
Report for the Iris Dataset Classification Program
Introduction:
The provided program demonstrates the process of loading the Iris dataset, preprocessing it,
and applying two types of machine learning models: Logistic Regression and Multi-layer
Perceptron (MLP) Neural Network. The program evaluates the models using classification
reports with precision and recall scores, and visualizes the results using confusion matrices.
Step-by-Step Approach
Import Necessary Libraries:
The program imports numpy and pandas for data manipulation, scikit-learn for machine
learning models, and matplotlib and seaborn for plotting.
Load the Dataset:
Use load_iris() from scikit-learn to load the Iris dataset into variables X (features) and
y (target labels).
Split the Dataset:
Split the data into training and testing sets using train_test_split with a 70/30 train-test
ratio and a random state for reproducibility.
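The loading and splitting steps can be sketched as follows. This is a minimal illustration, not the program's exact code: the value random_state=42 is an assumption, since the report only says that a random state is set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 features, 3 classes.
iris = load_iris()
X = iris.data    # features
y = iris.target  # target labels

# 70/30 train-test split. random_state=42 is an assumed value for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 150 samples, a 70/30 split leaves 105 samples for training and 45 for testing.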
Standardize the Features:
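Standardize the feature values to have a mean of 0 and a standard deviation of 1 using
StandardScaler. This step matters for both models used here: logistic regression and the
MLP are trained with iterative, gradient-based solvers, which tend to converge faster and
more reliably when all features are on a comparable scale.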
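A minimal sketch of the standardization step is shown below. It assumes the scaler is fitted on the training set only and then reused for the test set, which is the usual way to avoid leaking test statistics into training; the report does not state how the program handles this. The random_state value is likewise an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed random state
)

scaler = StandardScaler()
# Learn the per-feature mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same training-set statistics to the test data.
X_test_scaled = scaler.transform(X_test)

print(np.round(X_train_scaled.mean(axis=0), 6))
print(np.round(X_train_scaled.std(axis=0), 6))
```

The scaled training features have mean 0 and standard deviation 1 per column; the scaled test features will be close to, but not exactly, those values.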
Logistic Regression Model:
Initialize and Train:
Initialize LogisticRegression with a maximum of 200 iterations and train it using the
training data.
Predict:
Predict the classes for the test data.
Evaluate:
Print the confusion matrix, classification report, and precision and recall scores to
evaluate model performance.
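The three logistic regression steps above might look like the sketch below. The macro averaging used for the precision and recall scores and the random_state value are assumptions; the report does not say which averaging the program uses.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed random state
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train with a maximum of 200 iterations.
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train_scaled, y_train)

# Predict the classes for the test data.
y_pred = log_reg.predict(X_test_scaled)

# Evaluate.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# average="macro" is an assumption: it weights the three classes equally.
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
print(f"precision={precision:.3f} recall={recall:.3f}")
```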
Neural Network Model (MLP):
Initialize and Train:
Initialize MLPClassifier with three hidden layers of 10 neurons each, and a maximum
of 1000 iterations.
Predict:
Predict the classes for the test data.
Evaluate:
Print the confusion matrix, classification report, and precision and recall scores to
evaluate model performance.
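A comparable sketch for the MLP follows. The hidden layer layout and iteration limit come from the description above; random_state=42 is an assumption added so that the weight initialization is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed random state
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Three hidden layers of 10 neurons each, at most 1000 training iterations.
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000, random_state=42)
mlp.fit(X_train_scaled, y_train)

# Predict and evaluate on the held-out test data.
y_pred_mlp = mlp.predict(X_test_scaled)
cm_mlp = confusion_matrix(y_test, y_pred_mlp)
print(cm_mlp)
print(classification_report(y_test, y_pred_mlp, target_names=iris.target_names))
```

The tuple passed to hidden_layer_sizes has one entry per hidden layer, so (10, 10, 10) gives three hidden layers between the 4 input features and the 3 output classes.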
Visualization:
Plot confusion matrices for both Logistic Regression and MLP models using seaborn
heatmaps for better visualization of the results.
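One way to produce such a figure is sketched below. The colour map, figure size, axis labels, random_state values, and the output file name confusion_matrices.png are all illustrative assumptions rather than details taken from the program.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed random state
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "MLP": MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000, random_state=42),
}

# One heatmap per model, placed side by side.
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X_train_scaled, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test_scaled))
    sns.heatmap(
        cm, annot=True, fmt="d", cmap="Blues", ax=ax,
        xticklabels=iris.target_names, yticklabels=iris.target_names,
    )
    ax.set_title(name)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")

fig.tight_layout()
fig.savefig("confusion_matrices.png")  # hypothetical output file name
```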
Results and Evaluation
Logistic Regression:
Confusion Matrix and Classification Report:
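The confusion matrix is a 3x3 table of actual classes (rows) against predicted classes
(columns). Diagonal entries count correctly classified samples and off-diagonal entries
count misclassifications; the true positive, false positive, true negative, and false
negative counts for each class can be derived from it.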
The classification report provides detailed metrics such as precision, recall, and
F1-score for each class.
Precision and recall scores are also printed for an overall understanding of model
performance.
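To make the link between a multi-class confusion matrix and the per-class metrics concrete, the sketch below derives the counts by hand. The matrix values are hypothetical and chosen only for illustration; they are not results from the program.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
cm = np.array([
    [15,  0,  0],
    [ 0, 13,  2],
    [ 0,  1, 14],
])

tp = np.diag(cm)                # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp        # predicted as the class but actually another class
fn = cm.sum(axis=1) - tp        # actually the class but predicted as another class
tn = cm.sum() - (tp + fp + fn)  # everything else

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
print("precision:", precision)
print("recall:", recall)
```

For this example matrix, the second class has 13 true positives, 1 false positive, and 2 false negatives, giving a precision of 13/14 and a recall of 13/15.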
Neural Network (MLP):
Confusion Matrix and Classification Report:
Similarly, the MLP confusion matrix and classification report offer insights into the
model's performance across different classes.
Plots:
Two confusion matrices are plotted side-by-side for easy comparison of the
performance of Logistic Regression and MLP models.
Optimization Considerations:
Hyperparameter Tuning:
For both models, consider using techniques such as grid search or random search to
find the optimal hyperparameters.
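As an illustration, a grid search over the regularization strength C of the logistic regression model could be sketched as below. The candidate values, the 5-fold setting, and the random_state are assumptions chosen for the example, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed random state
)

# Putting the scaler in a pipeline refits it inside every cross-validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=200)),
])

# Assumed candidate values for the inverse regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

RandomizedSearchCV takes a similar estimator and parameter specification and samples a fixed number of combinations instead of trying all of them, which can be cheaper for the larger MLP search space.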
Feature Engineering:
Examine the features for possible transformations or interactions that could improve
model performance.
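One simple form of feature interaction is the pairwise product of the original measurements. The sketch below uses scikit-learn's PolynomialFeatures for this; whether the extra columns actually help on Iris is not established here and would need to be checked empirically.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)

# Add pairwise products of the four original features (no squares, no bias column).
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_expanded = interactions.fit_transform(X)

print(X.shape, "->", X_expanded.shape)
```

The four original features yield six pairwise products, so the expanded matrix has ten columns.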
Advanced Models:
Test more sophisticated models such as Support Vector Machines (SVM), Random
Forests, or Gradient Boosted Trees.
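A quick comparison of these model families might be sketched as follows. All models are left at their scikit-learn defaults apart from an assumed random_state, so the numbers indicate a baseline rather than tuned performance.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed random state
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

candidates = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=42),
}

# Fit each candidate on the same split and record its test accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train_scaled, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_scaled))
    print(f"{name}: {scores[name]:.3f}")
```

Tree-based models do not need standardized inputs, but using the scaled features for all three keeps the comparison on identical data.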
Cross-validation:
Use cross-validation to ensure the model’s performance is robust and generalizes well
to unseen data.
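A minimal cross-validation sketch for the logistic regression model is given below. The choice of 5 stratified folds and the random_state value are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler on the training part of each fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=200)),
])

# 5 stratified folds keep the class proportions equal in every fold (assumed setting).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", cv_scores.round(3))
print(f"mean={cv_scores.mean():.3f} std={cv_scores.std():.3f}")
```

Reporting the mean together with the spread across folds gives a more stable estimate than the accuracy on a single 70/30 split.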
Conclusion:
This program demonstrates the full workflow of loading, preprocessing, training,
evaluating, and visualizing machine learning models on the Iris dataset. Applying the
optimization strategies outlined above, such as hyperparameter tuning and cross-validation,
could improve performance further.