ML Project Report
Problem Statement:
You are hired by one of the leading news channels, CNBE, which wants to analyze recent elections. A survey was
conducted on 1525 voters with 9 variables. You have to build a model to predict which party a voter will vote for on
the basis of the given information, in order to create an exit poll that will help in predicting the overall win and
the seats covered by a particular party.
1.1 Read the dataset. Describe the data briefly. Interpret the inferences for each. Cover the initial steps
such as head(), info(), data types, etc. The null value check, summary statistics, and skewness must be
discussed.
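As a reference for the skewness discussion, the sketch below computes sample skewness by hand. It is illustrative only: the two lists are made-up values, not columns from the survey data. In a pandas workflow one would normally call `skew()` on the DataFrame, which reports the bias-corrected coefficient that `bias_corrected=True` reproduces here.

```python
import math


def skewness(values, bias_corrected=True):
    """Sample skewness g1 = m3 / m2**1.5 (m2, m3 are central moments).

    With bias_corrected=True the adjusted Fisher-Pearson coefficient
    sqrt(n(n-1)) / (n-2) * g1 is returned instead.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant column: no spread, treat as unskewed
    g1 = m3 / m2 ** 1.5
    if bias_corrected:
        return math.sqrt(n * (n - 1)) / (n - 2) * g1
    return g1


# Made-up example values (not from the survey data).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_skewed))  # positive: long tail to the right
```

A value near zero suggests a roughly symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail.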
1.2 Perform exploratory data analysis (EDA), including univariate and bivariate analysis.
Check for outliers.
- Univariate Analysis:
- Bivariate and Multivariate Analysis:
- Checking for Correlations:
1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not?
Data Split: Split the data into train and test (70:30).
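The target encoding and the 70:30 split can be sketched as follows. This is a minimal stdlib illustration, not the report's actual code: the party labels, the feature rows, and the seed are hypothetical placeholders. In practice scikit-learn's `train_test_split` with `test_size=0.30` and a fixed `random_state` would typically do the same job.

```python
import random


def encode_target(labels, positive_label):
    """Map string class labels to 1 (positive_label) or 0 (anything else)."""
    return [1 if label == positive_label else 0 for label in labels]


def split_70_30(rows, targets, test_size=0.30, seed=1):
    """Shuffle row indices reproducibly and hold out test_size of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy data: 10 voters, 2 numeric features each.
features = [[25, 3], [40, 2], [33, 4], [58, 1], [47, 5],
            [61, 2], [29, 3], [52, 4], [38, 1], [45, 5]]
votes = ["Party A", "Party B", "Party A", "Party A", "Party B",
         "Party A", "Party B", "Party A", "Party A", "Party B"]

y = encode_target(votes, positive_label="Party A")
X_train, X_test, y_train, y_test = split_70_30(features, y)
print(len(X_train), len(X_test))  # 7 training rows, 3 test rows
```

Fixing the seed keeps the split reproducible, so train and test scores can be compared across models on the same rows.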
- Encoding the Target Variable to get the values in 0s and 1s:
- KNN Model:
- Gaussian Naïve Bayes Model:
1.6 Apply Model Tuning, Bagging (Random Forest should be applied for Bagging), and
Boosting
1. The worst-performing models are KNN and Bagging, as their recall values for class 1 are very low on the test set.
2. The best model is the Naive Bayes model, as its recall values and accuracy are stable across both the train and
test sets.
Bagging and Gradient Boosting, which performed well on the train set, performed worst on the test set.
Naive Bayes and KNN showed some stability, but the difference between their train and test scores led us to
apply SMOTE, which increases the recall scores of both.
The overall best model for this prediction is the Naive Bayes model, with and without grid search.
Naive Bayes with SMOTE and KNN with SMOTE give better accuracy, recall and F1 scores on both the training
and testing sets.
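The model comparison above rests on recall for class 1 and on accuracy, measured on both the train and test sets. As a reminder of what those numbers mean, here is a small sketch that computes them from true and predicted labels. The label lists are invented for illustration and are not the report's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model found: tp / (tp + fn)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Invented labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(recall(y_true, y_pred))    # 0.75 (3 of 4 actual 1s recovered)
print(accuracy(y_true, y_pred))  # 0.75 (6 of 8 predictions correct)
```

Running the same functions on train and test predictions makes the stability check explicit: a large drop in class-1 recall from train to test is the pattern described for Bagging and Gradient Boosting.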
In this project, we work with the inaugural corpus from NLTK in Python. We will be
looking at the following speeches of the Presidents of the United States of America:
1.1 Find the number of characters, words, and sentences for the mentioned documents
The documents selected from the inaugural corpus are speeches by three former presidents of the United States:
Pres. Roosevelt, Pres. Kennedy and Pres. Nixon.
- Word Count:
- Characters Count:
- Sentence Count:
- Lowercase conversion:
- Punctuation and Special Characters:
- Stemming:
- Stop words:
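The counting and cleaning steps listed above can be sketched with the standard library alone. This is an approximation under stated assumptions: the sample text is made up, the stop-word set is a tiny hypothetical stand-in for NLTK's English list, and sentences are split naively on `.`, `!` and `?`. Stemming is left out because it would need a stemmer such as NLTK's `PorterStemmer`.

```python
import re
import string

# Tiny illustrative stop-word set; NOT the full NLTK English list.
STOP_WORDS = {"the", "is", "we", "and", "will", "a", "of", "to", "in"}


def text_counts(text):
    """Count characters, whitespace-separated words, and sentences."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
    }


def clean_tokens(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, then drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in stop_words]


# Made-up sample text, not a quotation from any speech.
sample = ("The nation is strong. We will keep the nation free, "
          "and we will work together!")

print(text_counts(sample))
print(clean_tokens(sample))
```

Counts produced this way can differ from NLTK's tokenizers, which treat punctuation marks as separate tokens, so the figures in the report should be read against whichever method produced them.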
1.3 Which word occurs most often in each president's inaugural address?
Mention the top three words (after removing the stop-words).
- President Roosevelt:
- President Kennedy:
- President Nixon:
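Finding the top three words for a speech is a frequency count over the cleaned tokens. A minimal sketch, assuming the tokens have already been lowercased and stripped of stop-words; the token list here is invented for illustration and does not come from any of the three speeches.

```python
from collections import Counter


def top_words(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


# Invented token list for illustration only.
tokens = ["us", "nation", "us", "america", "nation", "us",
          "peace", "america", "nation", "us"]

print(top_words(tokens))  # [('us', 4), ('nation', 3), ('america', 2)]
```

Applying `top_words` to each president's cleaned token list separately gives the per-speech results reported under this question.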
1.4 Plot the word cloud of each of the speeches of the variable. (after removing the
stop-words)
- President Roosevelt:
- President Kennedy:
- President Nixon:
1.5 Inference:
Pres. Nixon's speech is the longest of the three, with a total word count
of 1769 across 51 sentences.
In all three speeches, "us" is the most frequently used word, followed by "nation" and "america".