National University of Computer and Emerging Sciences, Lahore
Potato Disease Classification
Touqeer Ahson Roll No. 21L-6034
Afeef Junaid Roll No. 21L-6036
Mubeen Ilyas Roll No. 21L-7627
AI Project Report
December 12, 2024
1. Introduction
1.1 Project Title
Plant Disease Classification
1.2 Problem Statement
Plant diseases cause substantial losses in agricultural yield. Early detection and
identification allow plants to be treated in time, which limits further crop loss.
Manual inspection is slow and error-prone, so this project automates disease
detection using deep learning.
1.3 Objectives
Develop a convolutional neural network (CNN) model to classify plant leaves as
Healthy, Early Blight, or Late Blight.
Include data augmentation to enhance model robustness.
Evaluate the model using accuracy, precision, recall, and other performance
metrics.
1.4 Significance of the Project
The project contributes to sustainable agriculture by providing a cost-effective and
efficient means of disease detection, allowing farmers to take timely action for
higher and better-quality yields.
2. AI Goals and Subareas
2.1 Overview of AI Goals
This project aligns with the broader AI goal of advancing computer vision
applications for agriculture. It illustrates how AI can support productivity and
decision-making in an important industry.
2.2 Relevant Subareas of AI
Deep Learning: Convolutional neural networks for image classification.
Computer Vision: Detection and classification of plant diseases based on the visual
characteristics of leaves.
2.3 Achievements in AI
Recent CNN architectures, such as ResNet and VGG, have substantially improved
performance on image classification tasks.
3. Dataset Description
3.1 Source of the Dataset
The dataset consists of 2,152 labeled images of plant leaves, obtained from an open
dataset on Kaggle (see References).
3.2 Dataset Size and Features
Total Images: 2,152
Classes: 3 (Early Blight, Late Blight, Healthy Leaf)
Image Dimensions: 256x256 pixels
3.3 Preprocessing Steps
Data Augmentation: Images were randomly flipped, rotated by up to 40%, zoomed in
or out by up to 20%, and adjusted in contrast by up to 20%, and finally cropped
at a random position by 20 pixels.
Resizing: Images were resized back to the target size of 256x256 pixels.
Normalization: Pixel values were scaled to the range 0 to 1.
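The preprocessing steps above could be expressed with Keras preprocessing layers roughly as follows. This is a minimal sketch and not the project's actual code: the choice of layers, the flip mode, the reading of "40%" as a rotation factor of 0.4, and the reading of the 20-pixel crop as a random 236x236 crop are all assumptions, and the names augment, resize_and_rescale, and preprocess are hypothetical.

```python
import tensorflow as tf

IMAGE_SIZE = 256  # target size stated in the report

# Augmentation, applied only while training. All factors below are
# assumed readings of the percentages given in the report.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.4),   # assumption: factor 0.4
    tf.keras.layers.RandomZoom(0.2),       # zoom in or out by up to 20%
    tf.keras.layers.RandomContrast(0.2),   # contrast change of up to 20%
    # assumption: "crop by 20 pixels" means a random 236x236 crop
    tf.keras.layers.RandomCrop(IMAGE_SIZE - 20, IMAGE_SIZE - 20),
])

# Resize back to the target size and scale pixels from [0, 255] to [0, 1].
resize_and_rescale = tf.keras.Sequential([
    tf.keras.layers.Resizing(IMAGE_SIZE, IMAGE_SIZE),
    tf.keras.layers.Rescaling(1.0 / 255),
])


def preprocess(images, training=False):
    """Augment (training only), then resize and normalize a batch of images."""
    if training:
        images = augment(images, training=True)
    return resize_and_rescale(images)
```

Keeping augmentation separate from resizing and rescaling means validation and test images pass through the same normalization without being randomly altered.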
4. Workflow Overview
4.1 Project Workflow Diagram
4.2 Key Steps
Data Collection
Data Preprocessing
Model Development
Model Testing and Evaluation
Testing on unseen data.
5. Module Breakdown
5.1 Module 1: Preprocessing
Description: Augmentation techniques were used to add variance to the dataset
and reduce overfitting. Images were resized, normalized, and augmented before
being passed into the data pipeline.
Implementation Details: Preprocessing was implemented with the TensorFlow and
Keras libraries.
5.2 Module 2: Model Development and Training
Description: The CNN model was designed with five convolutional layers
followed by dense layers for classification.
Implementation Details: The model used Conv2D, MaxPooling2D, Flatten, and
Dense layers with ReLU and Softmax activation functions. The Adam optimizer
was used to improve learning and convergence.
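A model of the kind described above might look like the following sketch. Only the layer types, the count of five convolutional layers, the activations, the Adam optimizer, the 256x256 input size, and the three output classes come from this report. The filter counts, kernel size, dense-layer width, and loss function are assumptions, and build_model is a hypothetical name.

```python
import tensorflow as tf


def build_model(image_size=256, channels=3, num_classes=3):
    """Five Conv2D/MaxPooling2D stages followed by a dense classifier."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(image_size, image_size, channels)),
        # Filter counts (32, 64, ...) and 3x3 kernels are assumed values.
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),  # assumed width
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        # assumption: integer class labels, hence the sparse loss
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling build_model().summary() prints the layer shapes, which is a quick way to check that the feature maps shrink as expected before the Flatten layer.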
6. Methodologies and Approaches
6.1 Selected Algorithms or Techniques
Rationale for choosing the method: The project uses a CNN architecture to extract
hierarchical features from images. Data augmentation techniques enhance model
robustness.
6.2 Comparison with Existing Methods
Compared to traditional machine learning methods, CNNs provide superior performance
in handling high-dimensional image data.
7. Results and Discussion
7.1 Evaluation Metrics
Accuracy
Precision
Recall
F1-Score
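For reference, the four metrics listed above can be computed from true and predicted labels as in the sketch below. It uses only the Python standard library. The function name classification_metrics and the label strings in the usage note are illustrative and not taken from the project.

```python
def classification_metrics(y_true, y_pred, classes):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        # Define the ratio as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class
```

For example, with the made-up labels y_true = ["healthy", "early", "late", "early"] and y_pred = ["healthy", "early", "early", "early"], the function reports an accuracy of 0.75, and the "early" class has precision 2/3 and recall 1.0.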
7.2 Performance Analysis
Loss and Validation Loss Analysis
1. Training Loss (red line): The training loss decreases steadily, indicating that the
model is learning and fitting the training data well.
2. Validation Loss (blue line): The validation loss also decreases initially, which
suggests that the model generalizes well to unseen data. At epoch 8 there is a
sudden rise in validation loss, suggesting potential overfitting, but by epoch 11
the validation loss has dropped below its epoch 8 value and it generally stays low.
Summary:
The model achieves generally good training performance, but the fluctuations in
validation loss could be smoothed further by adding dropout and weight decay.
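One way the suggested regularization could be added to the dense part of the network is sketched below. This was not part of the trained model. The dropout rate, the penalty strength, and the dense-layer width are assumed values, regularized_head is a hypothetical name, and an L2 kernel penalty is used here as a common stand-in for weight decay.

```python
import tensorflow as tf


def regularized_head(num_classes=3, dropout_rate=0.3, l2_strength=1e-4):
    """Dense classifier head with dropout and an L2 weight penalty."""
    layers = tf.keras.layers
    return tf.keras.Sequential([
        layers.Flatten(),
        layers.Dense(
            64,
            activation="relu",
            # L2 penalty on the weights (stand-in for weight decay)
            kernel_regularizer=tf.keras.regularizers.l2(l2_strength),
        ),
        # Randomly zeroes a fraction of activations during training only.
        layers.Dropout(dropout_rate),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Dropout is inactive at inference time, so the same head can be used unchanged for validation and testing.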
Accuracy and validation accuracy:
The Accuracy vs. Validation Accuracy graph shows how well the model classifies data:
Training Accuracy (green line): The training accuracy increases steadily and
reaches close to 1.0, showing the model's capability to fit the training data.
Validation Accuracy (orange line): The validation accuracy also improves
significantly, reaching a high value. This indicates that the model generalizes well
for most epochs. The fluctuations in validation accuracy hint at some overfitting,
but they are not severe, and by the final epochs the validation accuracy is close to
the training accuracy.
Summary:
The high validation accuracy demonstrates that the CNN is effective at extracting
meaningful features from images.
The minor fluctuations suggest that further refinement, such as improved data
augmentation and early stopping, would be helpful.
Overall summary:
The CNN model performs well on both the training and validation datasets, with high
accuracies and low losses.
On the test data, the model correctly identifies the leaf type among the three given
classes. The fluctuations in the graphs could be smoothed by improving the data
augmentation, for example by cropping out the image background so that the model
focuses on the main object.
7.3 Key Findings and Interpretations
The model achieved high accuracy and demonstrated robustness in classifying plant
diseases across varied image conditions.
8. Challenges and Resolutions
8.1 Challenges Faced
Imbalanced dataset
Overfitting during training
Computational limitations
8.2 Solutions and Workarounds
Used data augmentation to address dataset imbalance.
Implemented early stopping to prevent overfitting.
Optimized batch size and learning rate for efficient training.
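The early stopping rule mentioned above can be illustrated in plain Python. This is a sketch of the general patience-based rule and not the project's code; in Keras the same behaviour is available through the tf.keras.callbacks.EarlyStopping callback. The function name early_stopping_epoch and the default patience value are assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on the best value
    by more than min_delta for `patience` consecutive epochs. Epochs are
    0-indexed. If the rule never triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

A larger patience tolerates temporary rises in validation loss, such as the one seen around epoch 8, at the cost of training for more epochs before stopping.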
9. Appreciation of AI
9.1 Achievements in AI Through This Project
The project showcases how CNNs can be effectively applied to solve complex
classification problems in agriculture, contributing to the AI-driven transformation of
traditional industries.
9.2 Difficulties in Advancing AI Goals
Limited access to labeled agricultural datasets remains a challenge. This project
highlights the need for more collaborative efforts in dataset creation.
10. Conclusion and Future Work
10.1 Summary of Achievements
Successfully developed and trained a CNN model for plant disease classification.
Incorporated data augmentation techniques to enhance model performance.
10.2 Limitations
Dependence on high-quality images for accurate classification.
Limited scope to three classes.
10.3 Future Work and Improvements
Extend the model to classify additional plant diseases.
Deploy the model as a mobile application for real-time use by farmers.
11. References
Dataset: https://www.kaggle.com/datasets/arjuntejaswi/plant-village
Libraries:
TensorFlow: https://www.tensorflow.org/guide/keras/sequential_model
imghdr: https://docs.python.org/3.11/library/imghdr.html
opencv-python: https://opencv.org/get-started/?utm_source=opcv&utm_medium=home