ISSN: 2347-4688, Vol. 11, No.(3) 2023, pg. 968-980

Current Agriculture Research Journal

www.agriculturejournal.org

Crop Selection and Yield Prediction using Machine Learning Approach

PRITESH PATIL*, PRANAV ATHAVALE, MANAS BOTHARA, SIDDHI TAMBOLKAR and ADITYA MORE

Department of Information Technology, AISSMS Institute of Information Technology, Pune, India.

Abstract

In recent years, the agriculture sector has been researched extensively, driven by advances in technologies such as machine learning and smart computing. With the dynamic economics of agri-produce, it is becoming challenging for farmers to utilize land efficiently and maximize profit in a specific landscape. Crop Yield Prediction (CYP) is crucial and depends greatly on environmental factors such as soil contents, humidity and rainfall, as well as area under cultivation and other required metrics. Because they do not sufficiently incorporate these multiple environmental circumstances, a number of existing tools and techniques used for CYP, such as historical averages, tend to produce inaccurate findings. In such a situation, with multiple crop options, it is essential for farmers to plan the crop strategy in advance. If the farmer can get an estimate of the crop yield in advance, cultivation can be planned accordingly. To solve this problem, a machine learning approach is implemented as a base for accurate predictions. Crop prediction is done by a classification model, and yield prediction uses regression models to learn from the data. Multiple ML models are analyzed based on performance metrics, and the best-performing model is incorporated in the backend. Among the models used for yield prediction, Random Forest Regression gives the best results, with an MAE of 0.64 and an R2 score of 0.96. For crop prediction, the Naïve Bayes classifier gives the most accurate results, with an accuracy of 99.39%. The study emphasizes how machine learning could revolutionize crop management techniques by giving farmers insights for optimizing resource allocation and boosting overall crop yield.

Article History
Received: 11 May 2023
Accepted: 31 October 2023

Keywords
Crop Yield Prediction; Digital Agriculture; Machine Learning; Naïve Bayes; Random Forest.

Introduction

The field of machine learning is advancing day by day. Learning is important when we need sample data or experience rather than the ability to immediately design a computer program to solve a particular problem. When there is no human

CONTACT Pritesh Patil [email protected] Department of Information Technology, AISSMS Institute of Information
Technology, Pune, India.

© 2023 The Author(s). Published by Enviro Research Publishers.

This is an Open Access article licensed under a Creative Commons license: Attribution 4.0 International (CC-BY).
Doi: https://dx.doi.org/10.12944/CARJ.11.3.26
PATIL et al., Curr. Agri. Res., Vol. 11(3) 968-980 (2023) 969

knowledge or when people are unable to express their expertise, learning becomes important. Computers are programmed with machine learning in order to improve performance criteria based on actual or hypothetical facts. A computer program learns to optimize the parameters of the model using training input or previous information. The model may be descriptive, drawing conclusions from the data, or predictive, estimating future trends.1 A subset of artificial intelligence (AI), machine learning (ML) enables computers to learn a specific task, such as playing chess or making recommendations on social networks, without having to be explicitly programmed. Precision farming and Agri-technology, now referred to as Digital Agriculture, are evolving into emerging fields of research that employ highly data-driven techniques to boost productivity in agriculture while shrinking the adverse effects on the environment. Machine learning, alongside big data technology and robust computing infrastructure, has arisen to create potential solutions for unravelling, quantifying, and comprehending data-intensive processes in agricultural operational environments. Data analysis, as an evolved scientific discipline, is essential to the development of a wide range of crop management applications. Many times, it is possible to use ML efficiently without having to integrate data from many sources. There tends to be less emphasis on data integration when large datasets are easily available, especially on a major scale. The main force behind this development is the complexity of data preprocessing and analytical processes, as opposed to the machine learning models' generally straightforward implementation.2

The agriculture sector contributed almost 20% of India’s GDP in the year 2019-20.3 It is also the principal source of employment in India. In addition to being a significant part of the global economy, it is crucial for the continued existence of humanity. Weather, pests, and the readiness of harvesting operations are the main factors that influence agricultural production. For managing agricultural risk, it is essential to have accurate crop history information.4 Unethical practices are being used to produce higher yields of less-nutritious hybrid cultivars as the population grows. These techniques tend to harm soil quality and result in environmental loss. Given the changing patterns of weather conditions and also economics, it is getting difficult to choose the right crop for the farmer. The use of various fertilizers is also unclear because of seasonal climate variations and changes in the availability of fundamental resources like soil, water, and air. The agricultural yield rate is continuously decreasing in this situation.5 Farmers today cultivate crops based on knowledge gained from earlier generations. Since the traditional method of cultivation has been refined, there are either excessive or insufficient yields without really meeting the need.6 If the producer knows yield estimates in advance, it would help to form the crop strategy.

Machine learning is a rapidly expanding methodology that supports and guides the decision process in various applications across many industries. The majority of modern gadgets benefit from models being examined before deployment. The primary idea is to increase the efficiency and profits of the agriculture industry by using data as a tool with models. Precision farming, which prioritizes quality above unfavorable environmental variables, would be the main focus.7 ML has advanced its applications in agriculture in areas like predicting soil properties, rainfall analysis, yield prediction, disease and weed detection, ML-based computer vision and many more.8

The use of computer vision, machine learning, and IoT applications will help boost productivity, enhance quality, and ultimately increase the profitability of farmers and related industries. To increase the overall harvesting output, precision farming is crucial in the world of agriculture.9 For example, smart irrigation systems, crop disease prediction, crop selection, weather forecasting, and determining the minimum support price are all examples of techniques employed in agriculture. These methods will increase field productivity while requiring less work from farmers.10 Crop yield estimation may be used for a variety of purposes, including helping farmers enhance production, optimizing the supply-demand cycle for fertilizers, insecticides, and other agricultural products, predicting prices, and calculating the risk levels for agricultural insurance.11

Literature Review
Prior research12 used data that included nutrients and other environmental elements to anticipate crops. For CYP, several feature selection techniques and ML models are employed. In this study, the
following factors were looked at. To assess the effectiveness of feature selection and classification algorithms, F1 Score, Mean Absolute Error (MAE), Logarithmic Loss (LL), Accuracy (ACC), Specificity (S), Recall (R), Precision (P), and Area Under the Curve (AUC) were utilized. Using Modified Recursive Feature Elimination (MRFE), six variables are selected: average soil and air temperatures, minimum and maximum air temperatures, precipitation, and humidity. A variety of train-test splits, including (25-75), (30-70), (35-65), (40-60), (45-55), (50-50), (55-45), (60-40), (65-35), (70-30), and (75-25), are used and evaluated against the previously stated accuracy criteria. Additionally, versions of the feature selection techniques such as MRFE, RFE, and Boruta have been applied. According to the results, the Random Forests Classifier is the most accurate in comparison with kNN and the other classifiers discussed above. As characteristic ranges broadened, the measurement values decreased.

Another study by Anakha Venugopal, Jinsu Mani, Aparna S Rima Mathew, Prof. Vinu Williams13 uses several machine learning approaches to forecast agricultural production. By taking into account variables like temperature, rainfall, area, and other characteristics, farmers will be able to select the crop that will provide the highest produce by using the forecasts made by ML models. The study is focused on Kerala’s agri-produce. Among the classifier models utilized there, Random Forest has the highest accuracy, followed by Logistic Regression and Naive Bayes.

In another study,14 a smartphone app used in the proposed method connects farmers to the internet. GPS helps locate the user. The user enters the location and soil type. Machine learning algorithms can pick the list of most profitable crops and can also forecast crop yields for user-selected crops. Machine learning models, including random forest (RF), artificial neural network (ANN), support vector machine (SVM), multivariate linear regression (MLR), and k-nearest neighbor (KNN), are used to estimate crop productivity. Random forest demonstrated the best outcomes with 95% accuracy. The algorithm also makes recommendations on when to apply fertilizers to increase yield. This research focused on the limitations of present approaches and their applicability for yield prediction. The suggested approach then connects the farmers with an effective yield forecasting system via an app for smartphones. To assist them in selecting a crop, people may select from a number of attributes. The integrated prediction system assists farmers in estimating the crop produce. A user may research possible crops and their yield using the integrated recommendation system in order to make better educated judgements. Based on data from the states of Maharashtra and Karnataka, several ML models like RF, KNN, MLR, SVM, and ANN were built and compared for accuracy. Results confirm that the RF Regressor, which has a 95% accuracy rate, is the best standard algorithm when applied to the presented datasets.

In another work,15 the Random Forest algorithm is used. In spite of extensive research into challenges and topics like weather, temperature, humidity, and rainfall, there are still no acceptable remedies or ideas to deal with the difficulty we face. In nations like India, there are numerous different sorts of rising economic growth, including in the agriculture sector. Additionally, crop yield predictions can be made using the processing. That study proved the value of data mining techniques for predicting agricultural output based on input features related to the climate. All new grains and regions chosen for the investigation should have accuracy of prediction above 75%, demonstrating improved predictive performance. The produced website is user-friendly. The website was developed utilizing data from that area to predict crop yield.

According to a study,16 selecting the best crop before sowing will increase agricultural yield. It depends on a variety of factors, such as the soil type and its composition, climate, local terrain, crop yield, market prices, etc. Techniques like Decision Trees, K-nearest Neighbors, and Artificial Neural Networks have a position in the crop selection framework, which depends on a variety of different factors. Machine learning has been used to choose crops based on how natural disasters like famine could affect them. Researchers have successfully employed artificial neural networks to choose crops depending on soil and climate.

When attempting to create a high-performance predictive model, ML studies face a variety of difficulties. To tackle the issue at hand, it is essential to choose the appropriate algorithms, and both the algorithms and the supporting platforms must be able to handle the sheer amount of data.17

A study18 suggested a method for unsupervised fuzzy categorization that identifies crop kinds with springtime harvests. The categorization outcomes are likely to get better with time. The strategy used in19 made use of the Bayesian network categorization supervised learning model. Crop information is analyzed with environmental parameters like temperature and rainfall to categorize crops.

A study by D. A. Reddy, B. Dadore, and A. Watekar20 highlights how, despite being one of the nations with the highest agricultural output, India's agricultural productivity is still fairly low. Productivity needs to be increased so that farmers may get better profit from decreased costs. In order to reliably and successfully propose a suitable crop based on soil data, it offers solutions such as a recommender utilizing an ensemble approach with majority voting over random tree, CHAID, kNN, and naive Bayes classifiers. Soil types, soil characteristics, and crop yield data are taken into consideration when advising the farmer on the best crop to grow. The majority voting process, which is the most popular ensemble technique, is used in this system. Any number of base learners may be used in the voting process, but a minimum of two base learners is required. The chosen learners complement one another and impart knowledge to the others. With more competition, a better forecast may be made. The specified training data set is used to train the model. When a new record has to be categorized, each model chooses the class independently. The class predicted by the consensus of learners is chosen as the class label for the current record.

A study21 builds a random forest, a group of decision trees that considers two-thirds of the records in the datasets, and takes into account data sets on temperature, production, perception, and rainfall. These decision trees are then applied to the remaining data to ensure accurate categorization. For accurate crop production prediction based on the input qualities, the test data may be applied to the generated training sets. The RF method and the dataset were used to evaluate the efficacy of this technique. The advantage of the random forest approach is that overfitting is less of an issue with random forests than it is with decision tree-based models. The random forest does not need to be pruned. The loaded data sets are divided into train and test data in proportions of 67% and 33% (0.67 and 0.33), respectively. In order to enable the mapping of attribute values to appropriate values and list placement, the training data must be categorized. By contrasting the initial data with model predictions, the probability is determined. Based on the result, the highest likelihood is utilized to make a forecast. The accuracy may be calculated by comparing the generated class value with the test data set.

According to a different study,22 agriculture has positive economic effects on the country. It falls short, nevertheless, in terms of using modern machine learning techniques. As a result, our farmers ought to be knowledgeable about all of the most recent machine learning technology and fresh approaches. The productivity of agriculture is increased by using these methods. To increase agricultural productivity rates, a number of machine learning approaches are used. These techniques can help with agricultural problems. We may also assess the accuracy of the yield by looking at several ways. Thus, we may perform better by contrasting the accuracy of several crops. In agriculture, sensor technology is widely used. The study helps increase agricultural yield rates and helps choose the right crop for the chosen site and season.

Materials and Methods
Data Pre-processing
Data pre-processing is a technique that transforms unprocessed, uncleaned data into a form ready for further analysis. Data may be gathered from multiple sources, but as they are collected in raw form, analysis is not possible. We convert data into a comprehensible format by using several strategies, such as substituting missing values and null values. Fields in the dataset which are insignificant for label prediction are eliminated. If required, One-Hot Encoding is performed on the dataset to have the dataset ready for regression model fitting. The division of train and test data is the final stage in the
data preparation process. As training the machine learning algorithm usually requires as many data points as possible, the data is typically split unevenly. The training dataset, which in this case makes up 70% of the total data, is used to train machine learning models and make accurate predictions.

Factors Affecting Crop Yield
The yield of every crop is impacted by a wide range of variables. These are essentially the characteristics that aid in estimating a crop yield. For crop yield prediction, this study includes parameters such as temperature, rainfall, area, humidity, soil nutrients, pH, and AUC (Area under Cultivation).

Comparison and Selection of ML Algorithm
We must first assess and compare different algorithms before selecting the one that best matches this particular dataset. Machine learning is an effective way to solve the crop prediction problem as it learns from past data and gives predictions on current parameters. In order to make precise predictions and cope with erratic patterns in weather conditions like temperature and rainfall, various machine learning classifiers like Logistic Regression, Naïve Bayes, Random Forest and KNN are used and compared on performance metrics, and the model with the best accuracy is selected for crop prediction. For yield prediction, regressors like Linear Regression, Random Forests and Decision Tree Regression are compared on metrics like MAE, Median Absolute Error and R2 Score. The model with the best values is selected for predicting yield.

Naïve Bayes
Based on Bayes' theorem, the Naïve Bayes model is frequently employed in many classification tasks. The multinomial, Bernoulli, and Gaussian algorithms make up the three Naive Bayes algorithms. It operates under the presumption that each feature has an equal chance of occurring and that the likelihood of each feature occurring is independent of the probabilities of the occurrence of all other features. The Bayes theorem gives the likelihood of an event happening given that another event has occurred. Multi-class classification makes use of the Bayes theorem. Also, in comparison to other ML techniques, it is quicker and simpler to construct. Additionally, it doesn't need a lot of training data. Both discrete and continuous data may be used with it. It is extremely scalable and unaffected by insignificant features.

Decision Trees
A decision tree is a tree structure that resembles a flowchart and is frequently employed in supervised machine learning for classification and prediction. A DT may be transformed into a set of rules, with each path from the root node to a leaf node serving as a different rule. In a decision tree, each leaf node has a class that may be reached if an attribute matches the prerequisite for the branch that leads to it. Each internal node corresponds to a test, condition, or attribute.6

KNN
The machine learning approach known as kNN, which is supervised and nonparametric, is used to solve classification and regression problems. Labeled data is used with supervised algorithms. The technique relies on the distances between the points, which may be calculated in a few different ways. A distance must always be zero or positive; to ensure this, the differences are squared, raised to a given power, or taken as absolute values. Pre-processing of all the labelled data is necessary before we apply the kNN algorithm. All of the data must first be normalized. As kNN struggles to function when there are too many features present, feature selection must then be used to eliminate the insignificant features. Missing data must be filled in; otherwise, that particular record must be eliminated. The performance can be enhanced by including more training samples. The fundamental drawback of KNN is that as the size of the dataset grows, the cost of computing rises and the algorithm's speed decreases.

Random Forests (RF)
The RF technique is a perfect example of ensemble learning in action since it combines several classifiers to tackle a challenging problem and improve a model's efficiency. The "forest" created with this approach is actually a collection of decision trees. In each decision split, RF features are chosen at random. Picking traits that encourage prediction and lead to increased efficiency reduces the correlation across trees. The Random Forest ML
classification approach generates the final output by combining the results of all the decision trees after segmenting the dataset into smaller subsets or trees. The Bagging subcategory of ensemble learning methods includes Random Forest. A sample of rows and features from the primary dataset is selected at random and fed into the Random Forest technique's decision trees. It can also carry out jobs requiring both regression and classification. It also works well with huge, high-dimensional data sets, and most significantly, it greatly improves the model's accuracy and reduces the overfitting problem.
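
The regressor comparison described under "Comparison and Selection of ML Algorithm" might look roughly like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (the paper's yield dataset is not reproduced here), and the hyperparameters and the choice of MAE as the tie-breaking metric are hypothetical rather than taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, median_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the yield data (assumption, not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 4))
y = 3.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + 0.1 * rng.normal(size=400)

# 70:30 split, matching the ratio reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The three regressor families named in the text; settings are illustrative.
regressors = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}


def evaluate(model):
    """Fit on the training split and report the three metrics on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "MedAE": median_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }


scores = {name: evaluate(model) for name, model in regressors.items()}

# Pick the model with the lowest MAE (one possible selection rule).
best = min(scores, key=lambda name: scores[name]["MAE"])

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("Selected:", best)
```

On real data the ranking can differ between metrics, so the selection rule should be fixed before looking at the test scores.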

Fig. 1: System Architecture

System Architecture

System architecture is represented in Figure 1. The end user interacts with a web user interface (UI) which is hosted on a server. The OpenWeatherMap API is connected to the server to deliver weather data. The machine learning models are trained and tested by the admin and are loaded on the server for predicting crop and yield in tonnes per hectare of land area.

Datasets
The public datasets have been chosen because they are readily available and easily accessible. Kaggle is a popular platform for finding and sharing datasets, so we were able to find datasets that met our criteria. We selected 3 datasets, namely:

India Agriculture Crop Production23

The dataset has the following features: State, District, Crop, Year, Season, Area, Area Units, Production, Production Units, Yield. This dataset is used to build the regression model for yield prediction. Yield is the required label. It has a total of 12176 records containing 27 unique crops and 4 unique seasons. Crops are as follows: Arhar/tur, bajra, castor seed, gram, groundnut, jowar, linseed, maize, moong, niger seed, other cereals, kharif pulses, rabi pulses, summer pulses, ragi, rapeseed and mustard, rice, safflower, sesamum, millets, soyabean, sugarcane, sunflower, tobacco, urad, wheat, oilseeds.

District Wise Rainfall Normal23
This dataset is used for collecting district-wise rainfall data to predict yield. It is used for extracting the district-wise average annual rainfall for each district of Maharashtra, India. This feature is combined with the yield dataset mentioned above to get estimates of production for a particular crop in the given season.

Crop Recommendation23

The dataset is used for crop prediction. It has features like N, P, K, rainfall, humidity, pH and crop. N, P, K stand for the Nitrogen, Phosphorous and Potassium nutrients in soil. It has 2200 total records containing 22 unique crops. The data consists of 100
records for each of the following crops: rice, maize, chickpea, kidney beans, pigeon peas, moth beans, mung beans, black gram, lentil, pomegranate, banana, mango, grapes, watermelon, muskmelon, apple, orange, papaya, coconut, cotton, jute, coffee.

Data Pre-Processing
Crop Prediction
Before modelling the data, we need to carry out data pre-processing. It is done in the following steps, as shown in Figure 2.

Handling Missing Values
The line df = pd.read_csv('Crop_recommendation2.csv', na_values='=') reads the CSV file into a DataFrame (df), replacing any occurrences of '=' with NaN values, which are commonly used to represent missing data in pandas.

Separating the Target Variable
The line b = df['label'] extracts the target variable column ('label') from the DataFrame df and assigns it to the variable b.

Creating a Preprocessing Pipeline
The code creates a pipeline (my_pipeline) using scikit-learn's Pipeline class. The pipeline consists of two steps:

Imputation
The missing values in the DataFrame are imputed using the SimpleImputer transformer with a strategy set to "mean". This strategy replaces the missing values with the mean of the corresponding column.

Standardization
The features are standardized using the StandardScaler transformer. This step scales the features to have zero mean and unit variance.

Applying the Preprocessing Pipeline
The line X = my_pipeline.fit_transform(df) applies the preprocessing pipeline (my_pipeline) to the entire DataFrame (df). It fits the pipeline on the data to learn the mean values (for imputation) and the standardization parameters. Then, it transforms the data by applying the learned transformations.

Train-Test Split
The train_test_split() function from scikit-learn is used to split the processed features (X) and the target variable (b) into training and testing sets. The stratify=b parameter ensures that the class distribution is maintained in both the training and testing sets. The split is performed with a test size of 30% (test_size=0.3) and a random state of 42 (random_state=42).

These pre-processing steps help handle missing values, standardize the features, and split the data into training and testing sets for further analysis and model training.

Fig. 2: Preprocessing for Crop prediction
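
The crop-prediction preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' exact code: the small DataFrame is made up to stand in for Crop_recommendation2.csv, only a subset of columns is used, and the 'label' column is dropped before the pipeline is applied (an assumption, since mean imputation and scaling only make sense on numeric columns). The names df, b, my_pipeline and X follow the text.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up frame standing in for Crop_recommendation2.csv (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "N": rng.uniform(0, 140, 40),
    "P": rng.uniform(5, 145, 40),
    "K": rng.uniform(5, 205, 40),
    "rainfall": rng.uniform(20, 300, 40),
    "label": ["rice"] * 20 + ["maize"] * 20,
})
df.loc[[3, 17], "N"] = np.nan  # simulate missing values

# Separate the target variable, as in the text.
b = df["label"]
features = df.drop(columns="label")  # assumption: only numeric columns are transformed

# Two-step pipeline: mean imputation, then standardization.
my_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
X = my_pipeline.fit_transform(features)

# Stratified 70:30 split with the random state given in the text.
X_train, X_test, b_train, b_test = train_test_split(
    X, b, test_size=0.3, stratify=b, random_state=42
)

print(X_train.shape, X_test.shape)
print(b_test.value_counts().to_dict())
```

A common variation is to fit the pipeline on the training split only and then transform the test split, so that test data does not influence the learned means and scales.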

Yield Prediction

In data preprocessing, we cleaned the data, since missing values, outliers, or errors need to be addressed before the data can be used for machine learning. We also integrated the District Wise Rainfall Normal23 and India Agriculture Crop Production23 datasets, as they had to be merged before being passed to the machine learning model. We applied data transformation to the India Agriculture Crop Production23 dataset, as it had categorical variables which need to be encoded as numerical values to pass to the machine learning model. We used One-Hot Encoding for data transformation. We had to do data reduction to limit the dataset to the state of Maharashtra; otherwise the dataset would have been too large in terms of rows and columns.

Fig. 3: Preprocessing for Yield Prediction
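
The four yield-preprocessing operations (reduction, cleaning, integration, transformation) can be sketched with pandas as below. The rows and rainfall figures are made up for illustration, and the column name Annual_Rainfall is a hypothetical choice; State, District, Crop, Season and Yield follow the feature list given for the production dataset.

```python
import pandas as pd

# Made-up rows standing in for the India Agriculture Crop Production dataset.
production = pd.DataFrame({
    "State": ["Maharashtra", "Maharashtra", "Karnataka", "Maharashtra"],
    "District": ["PUNE", "NASHIK", "MYSURU", "PUNE"],
    "Crop": ["Rice", "Wheat", "Ragi", "Wheat"],
    "Season": ["Kharif", "Rabi", "Kharif", "Rabi"],
    "Yield": [2.1, 1.8, 1.5, None],
})

# Made-up rows standing in for the District Wise Rainfall Normal dataset.
rainfall = pd.DataFrame({
    "District": ["PUNE", "NASHIK"],
    "Annual_Rainfall": [722.0, 1000.0],  # illustrative values in mm
})

# Data reduction: keep only Maharashtra.
mh = production[production["State"] == "Maharashtra"]

# Data cleaning: drop records with a missing label.
mh = mh.dropna(subset=["Yield"])

# Data integration: attach district-wise rainfall to each production record.
merged = mh.merge(rainfall, on="District", how="inner")

# Data transformation: one-hot encode the categorical columns.
X = pd.get_dummies(
    merged[["District", "Crop", "Season", "Annual_Rainfall"]],
    columns=["District", "Crop", "Season"],
)
y = merged["Yield"]

print(X.columns.tolist())
```

An inner join silently drops districts that are missing from the rainfall table, so on real data it is worth comparing row counts before and after the merge.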

Feature Selection
A machine learning model's performance can be improved through feature selection, which is the process of choosing a subset of the relevant features from the available data. For Crop Prediction, the following features were selected: Nitrogen, Phosphorous, Potassium, Temperature, Humidity, pH and Rainfall. For Yield Prediction, the features selected are as follows: City, Crop, Annual Rainfall (in mm), Season.

Train Test Splitting of Data

We have split the data in the ratio 70:30 using random sampling and stratification. Choosing an appropriate train-test split is important in ML, because it can affect the accuracy and generalization of the resulting model.

API Integration
The user's city, taken as an input, is given to the API call as a parameter. The temperature and humidity fields from the API response are given to the crop prediction model as input along with other data. This helps the system give real-time predictions. The OpenWeatherMap API is used for this purpose. To create the API URL, the base URL and an API key, which is unique to each subscription, are used. The user’s city name is passed in the complete URL as a parameter and the response is collected. From the collected response, the required fields, i.e., temperature and humidity, are passed to the ML model for predicting the crop.

Training and Evaluation of Models

Crop prediction uses a multi-class classification machine learning model to predict the crop for a given set of input features, whereas yield prediction uses a regression model to predict the yield for a given set of input features. For training and evaluation of models, the Google Colab platform is used. The user interface for the project is built using ReactJs, and the backend is built using the Python Flask framework.
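
The API integration step (building the request URL from a base URL, API key and city, then pulling temperature and humidity out of the response) can be sketched without any network access. The paper does not state which OpenWeatherMap endpoint or parameters were used, so the current-weather endpoint, the units=metric parameter, the function names and the sample payload below are all assumptions for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: OpenWeatherMap's current-weather API (not stated in the paper).
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key, base_url=BASE_URL):
    """Compose the request URL from the base URL, the city and the API key."""
    query = urlencode({"q": city, "appid": api_key, "units": "metric"})
    return f"{base_url}?{query}"


def extract_features(payload):
    """Keep only the fields the crop model needs: temperature and humidity."""
    main = payload["main"]
    return {
        "temperature": float(main["temp"]),
        "humidity": float(main["humidity"]),
    }


# Hypothetical response body, trimmed to the fields used here.
sample_response = json.loads(
    '{"name": "Pune", "main": {"temp": 27.4, "humidity": 61}}'
)

url = build_weather_url("Pune", "YOUR_API_KEY")  # placeholder key
weather = extract_features(sample_response)

print(url)
print(weather)
```

In a running backend the URL would be fetched with an HTTP client and the resulting JSON passed to extract_features; error handling for unknown cities or failed requests is left out of this sketch.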
PATIL et al., Curr. Agri. Res., Vol. 11(3) 968-980 (2023) 976

Application and Advantages over existing Results and Discussions

versions Crop Prediction
The model can be used to create an impact on right First, datasets are loaded and cleaned from insignificant
crop selection as the user would get a fair prediction of yield as well as crop. Yield prediction is also important in the financial assessment of a crop strategy. The model is useful if the user wants to compare yield for multiple crop options and then select the best one. It could also be used across a wide geography to estimate the yield for a particular crop. This project can be used directly by end users such as farmers to obtain predictions for their conditions. Alternatively, it can be used by government agencies for planning and policy making if modified with wider access to reliable closed-source government data. It can also be used by NGOs that work on educating farmers in adopting new technologies and precision agriculture. It can also be used in fields where monetary calculations depend on how much yield could be produced, such as insurance claims or loan policies.

The project improves prediction accuracy through suitable data gathering, cleaning, and selection of the most accurate model. The project also incorporates both crop and yield prediction, so it uses classification as well as regression models for the necessary functionality. It adds value to the modern agriculture setup by providing a way to improve the reliability of crop selection, which in turn improves yield and financial stability.

features. After Data Preparation, data is split into training and testing data, and various models are fitted and tested for accuracy. Feature Importance is calculated to determine the relative significance or contribution of individual features in the ML model. For the crop prediction model, Drop Column Importance is calculated. It is related to "permutation importance" (also called "feature importance by feature shuffling"), but it retrains the model without the feature instead of shuffling the feature's values.

Drop Column Importance = Baseline Metric − Metric Without the Feature

Drop column importance is based on the idea that removing a feature that is crucial to the performance of the model would cause it to perform significantly worse than before. It is calculated in the following steps:

1. Train a model with all features.
2. Measure baseline performance on validation data.
3. Select the feature whose importance is to be calculated.
4. Train a model with all features except the selected one.
5. Calculate performance on the validation data.
6. The feature importance is the drop in performance from the baseline.
7. Repeat steps 3 through 6 for every feature.
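The seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the nearest-centroid classifier, the use of accuracy as the metric, and the tiny two-feature dataset are all assumptions chosen so that the example runs with the standard library only.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def fit_nearest_centroid(X: List[Row], y: List[int]) -> Model:
    """Toy stand-in model (assumption): predict the class with the closest centroid."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Row) -> int:
        # Ties are resolved in favour of the smallest label.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def drop_column(X: List[Row], j: int) -> List[List[float]]:
    """Return a copy of X without column j."""
    return [[v for k, v in enumerate(row) if k != j] for row in X]


def drop_column_importance(X_train, y_train, X_val, y_val) -> List[float]:
    # Steps 1-2: train on all features and measure the baseline.
    baseline = accuracy(fit_nearest_centroid(X_train, y_train), X_val, y_val)
    importances = []
    for j in range(len(X_train[0])):  # steps 3 and 7: every feature in turn
        # Step 4: retrain without feature j.
        model = fit_nearest_centroid(drop_column(X_train, j), y_train)
        # Step 5: evaluate on validation data without feature j.
        score = accuracy(model, drop_column(X_val, j), y_val)
        # Step 6: importance is the drop from the baseline.
        importances.append(baseline - score)
    return importances


# Made-up data: feature 0 separates the classes, feature 1 is constant.
X_train = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.5, 5.0], [10.5, 5.0]]
y_val = [0, 1]

result = drop_column_importance(X_train, y_train, X_val, y_val)
print(result)  # [0.5, 0.0]
```

Dropping the informative feature costs half of the validation accuracy, while dropping the constant feature costs nothing. Because each feature requires a full retraining, the cost of this method grows with the number of features.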

Fig. 4: Feature Importance for Crop Prediction


As shown in Figure 4, rainfall is the most important feature for crop prediction, followed by humidity, Potassium (K), Phosphorus (P), Nitrogen (N) and pH.

The classification models' results for crop prediction are depicted in Figure 5.

Fig. 5: Accuracy Metric for Classification Models

When trained on the dataset, KNN gives an accuracy of 97.72%, RF gives 99.24%, the Naïve Bayes classifier gives 99.39%, and Logistic Regression gives 94.69%. Based on these results, the Naïve Bayes classifier is incorporated in the backend for Crop Prediction.
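As a small illustration of the comparison, accuracy is the fraction of test samples whose predicted crop matches the true crop, and the model with the highest value is selected. The sketch below uses the accuracies reported in the text; the crop labels in the usage example are made up, and the function name `accuracy` is an assumption, not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Accuracies (in percent) as reported in the text.
reported_accuracy = {
    "KNN": 97.72,
    "Random Forest": 99.24,
    "Naive Bayes": 99.39,
    "Logistic Regression": 94.69,
}

# Select the model with the highest reported accuracy.
best_model = max(reported_accuracy, key=reported_accuracy.get)
print(best_model)  # Naive Bayes

# Made-up labels, only to show how the metric itself is computed.
example = accuracy(["rice", "maize", "rice", "jute"],
                   ["rice", "maize", "jute", "jute"])
print(example)  # 0.75
```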

Fig. 6: Code Snippet for Feature Importance in Yield Prediction

Yield Prediction

Calculating feature importance for a Random Forest Regressor with one-hot encoded features involves determining the contribution of each feature to the model's predictive performance. It is done through the following steps.

1. Train the Random Forest Regressor.
2. Access Feature Importances: the Random Forest Regressor has a built-in attribute named feature_importances_.
3. Map Feature Importances to Original Features: every one-hot encoded feature is mapped to its original feature.
4. Aggregate Feature Importances: aggregating the mapped values gives the importance of every categorical feature.
5. Rank Feature Importances in descending order of importance.

As shown in Figure 7, Crop is the most important feature for predicting yield, followed by District, Rainfall and Season.
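Steps 3 to 5 can be sketched as follows. This is an illustrative reconstruction, not the snippet shown in Figure 6. It assumes that encoded columns are named `<feature>_<category>` (the convention used by common one-hot encoders), and the importance values are made-up placeholders for what `feature_importances_` would return after step 2; they are not the paper's results.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def aggregate_importances(
    encoded_names: Sequence[str],
    importances: Sequence[float],
    categorical: Sequence[str],
    sep: str = "_",
) -> List[Tuple[str, float]]:
    """Sum one-hot importances per original feature and rank them."""
    totals: Dict[str, float] = defaultdict(float)
    for name, value in zip(encoded_names, importances):
        # Step 3: map e.g. "Crop_Rice" back to "Crop";
        # numeric columns such as "Rainfall" map to themselves.
        original = next(
            (c for c in categorical if name.startswith(c + sep)), name
        )
        # Step 4: aggregate by summing.
        totals[original] += value
    # Step 5: rank in descending order of importance.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical encoded column names and importances (illustration only).
encoded_names = [
    "Crop_Rice", "Crop_Wheat",
    "District_Pune", "District_Nashik",
    "Season_Kharif", "Season_Rabi",
    "Rainfall",
]
importances = [0.30, 0.20, 0.15, 0.10, 0.05, 0.05, 0.15]

ranking = aggregate_importances(
    encoded_names, importances, categorical=["Crop", "District", "Season"]
)
for feature, score in ranking:
    print(f"{feature}: {score:.2f}")
```

Summing is a natural choice here because the impurity-based importances of a fitted forest add up to 1, so the aggregated values still add up to 1 across the original features.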

Fig. 7: Feature Importance for Yield Prediction

Yield prediction is done by regression. To compare different regression models, performance metrics such as Mean Absolute Error, Median Absolute Error and R2 Score are used. The results are depicted in Figure 8.
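The three metrics can be written out directly from their standard definitions. The sketch below is a plain-Python illustration under that assumption; the paper does not state which library was used, and the sample values are made up. The function names mirror those in `sklearn.metrics` for familiarity.

```python
from statistics import mean, median


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def median_absolute_error(y_true, y_pred):
    """Median of the absolute errors; robust to a few large misses."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up yields: three exact predictions and one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

mae = mean_absolute_error(y_true, y_pred)
medae = median_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, medae, round(r2, 2))  # 0.5 0.0 0.2
```

A single large miss raises the mean absolute error but leaves the median untouched, which is one reason to report both when a few predictions are far off.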

Fig. 8: Regression Results for Yield Prediction

Random Forest Regressor gives the most reliable results when given the required inputs, with a Mean Absolute Error of 0.64, a Median Absolute Error of 0.16 and an R2 Score of 0.96. Decision Tree Regressor has a Mean Absolute Error of 0.80, a Median Absolute Error of 0.18 and an R2 Score of 0.94. Linear Regression has a Mean Absolute Error of 1.08, a Median Absolute Error of 0.47 and an R2 Score of 0.92.

Conclusion
Crop yield prediction is a complex process which relies on several different factors including weather, soil, fertilizers, pest infestations, etc. In this paper, we predict the crop yield using weather and soil parameters. The research is based on datasets limited to districts in Maharashtra. The system incorporates regression techniques to estimate yield and multi-class classification to predict the type of crop. Among the models used for yield prediction, Random Forest Regression gives the best results, with an MAE of 0.64 and an R2 score of 0.96. For crop prediction, the Naïve Bayes classifier gives the most accurate results, with an accuracy of 99.39%. The suggested method aids farmers in choosing which crop to plant in the field and in estimating how much yield a crop would give in that specific environment. The dataset used in the research can be improved by collecting real-time data through IoT devices. Factors such as irrigation and fertilizer use can also be included for better prediction. A mobile app can be developed with added services such as price estimates in accordance with current market prices. Paid datasets may provide more reliable and accurate data, which in turn might improve model accuracy. They may also contain more features that correlate better with the label.

Acknowledgements
We are grateful to Prof. Pritesh A Patil for providing valuable input and feedback throughout the research process.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Conflict of Interest
The authors declare no conflict of interest regarding this research. However, it should be noted that the first author of this paper is an employee of a company that develops and markets machine learning software for crop yield prediction. The results and conclusions presented here are solely based on the authors' research and do not reflect any external influence.

References

1. Alpaydın, Ethem. "Introduction to machine learning, second edition." MIT Press, 2010. ISBN: 978-0-262-01243-0.
2. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D., "machine learning in agriculture: a review", Sensors, vol. 18, no. 8, pp. 2674, August 2018. https://doi.org/10.3390/s18082674
3. Sabitha, "A study on sectorial contribution of gdp in india from 2010 to 2019", AJEBA, vol. 19, no. 1, pp. 18-31, January 2020. Article no. AJEBA. 62227.
4. Jain A., "Analysis of growth and instability in the area, production, yield, and price of rice in India", Journal of Social Change and Development, vol. 2, pp. 46-66, N/A, 2018.
5. Wolfert S, Ge L, Verdouw C, Bogaardt MJ, "Big data in smart farming– a review. Agricultural Systems", vol. 153, pp. 69-80, May 2017.
6. Sangeeta, Shruthi G. "Design and implementation of crop yield prediction model in agriculture." International Journal of Engineering Research & Technology (IJERT), vol. 9, no. 4, pp. 305-310, Apr. 2020.
7. Johnson LK, Bloom JD, Dunning RD, Gunter CC, Boyette MD, Creamer NG, "Farmer harvest decisions and vegetable loss in primary production. Agricultural Systems", vol. 176, pp. 102672, November 2019.
8. Sharma A, Jain A, Gupta P, Chowdary V. Machine learning applications for precision agriculture: A comprehensive review. IEEE Access. 2020 Dec 31;9:4843-73.
9. Meshram V, Patil K, Meshram V, Hanchate D, Ramkteke SD. Machine learning in agriculture domain: A state-of-art survey. Artificial Intelligence in the Life Sciences. 2021 Dec 1;1:100010.
10. Reddy, D. J., & Kumar, M. R. (2021). Crop Yield Prediction using Machine Learning Algorithm. 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS). doi:10.1109/iciccs51141.2021.9432236
11. Ranjini B Guruprasad, Kumar Saurav, Sukanya Randhawa, "Machine learning methodologies for paddy yield Estimation in India: a case study", 2019.
12. S. P. Raja, B. Sawicka, Z. Stamenkovic and G. Mariammal, "Crop prediction based on characteristics of the agricultural environment using various feature selection techniques and classifiers," IEEE Access, vol. 10, pp. 23625-23641, 2022, doi: 10.1109/ACCESS.2022.3154350.
13. Venugopal, Anakha, S, Aparna, Mani, Jinsu, Mathew, Rima, Williams, Vinu. "Crop yield prediction using machine learning algorithms." International Journal of Engineering Research & Technology (IJERT) NCREIS – 2021, vol. 09, no. 13, pp. 1-6, Dec. 2021.
14. S. M. PANDE, P. K. RAMESH, A. ANMOL, B. R. AISHWARYA, K. ROHILLA and K. SHAURYA, "Crop recommender system using machine learning approach," 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), 2021, pp. 1066-1071, doi: 10.1109/ICCMC51019.2021.9418351.
15. Suresh, N., et al. "Crop yield prediction using random forest algorithm." 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 279-282, 2021, doi: 10.1109/ICACCS51430.2021.9441871.
16. E. Manjula and S. Djodiltachoumy, "A model for prediction of crop yield," Int. J. Comput. Intell. Inform., vol. 6, no. 4, pp. 298–305, 2017.
17. van Klompenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using machine learning: A systematic literature review in computers and electronics in Agriculture, 177, pp. 105709. doi: 10.1016/j.compag.2020.105709.
18. M. Liu, T. Wang, A. K. Skidmore, and X. Liu, "Heavy metal-induced stress in rice crops detected using multi-temporal Sentinel-2 satellite images," Sci. Total Environ., vol. 637-638, pp. 18-29, Oct. 2018.
19. K. E. Eswari and L. Vinitha, "Crop yield prediction in tamil nadu using bayesian network," Int. J. Intell. Adv. Res. Eng. Comput., vol. 6, no. 2, pp. 1571-1576, 2018.
20. D. A. Reddy, B. Dadore, and A. Watekar, "Crop recommendation system to maximize crop yield in ramtek region using machine learning," Int. J. Sci. Res. Sci. Technol., vol. 6, no. 1, pp. 485-489, Feb. 2019.
21. Priya, P., Muthaiah, U., Balamurugan, M. "Predicting yield of the crop using machine learning algorithm." International Journal of Computer Science and Mobile Computing, vol. 4, no. 5, pp. 1-7, May 2015.
22. Medar, Ramesh, S, Vijay, Shweta. "Crop yield prediction using machine learning techniques." International Journal of Advanced Research in Computer Science and Software Engineering, vol. 9, no. 5, pp. 1-6, May 2019.
23. https://www.kaggle.com/

