Mini Project: Cardio Good Fitness
Module 1
Submitted By:
Rathin Kukreja
Project Aim:
Minimum Steps for exploration:
1. Importing the dataset into R
2. Understanding the structure of dataset
3. Graphical exploration
4. Descriptive statistics
5. Insights from the dataset
Setting Working Directory:
> setwd("C:/Users/RATHIN KUKREJA/Desktop")
Importing dataset into R
> cardio_data_set<-read.csv("CardioGoodFitness.csv")
> cardio_data_set
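The Factor columns that str() reports later in this document depend on read.csv converting strings to factors, which is the default only in R versions before 4.0. The following is a minimal sketch of requesting that behaviour explicitly, assuming a newer R. It uses a temporary CSV built from the first three rows shown by str(), and the names sample_rows, tmp_csv and cardio_sample are illustrative and not part of the project.

```r
# Illustrative only: the first three rows reported by str(), written to a temp CSV.
sample_rows <- data.frame(
  Product       = c("TM195", "TM195", "TM195"),
  Age           = c(18L, 19L, 19L),
  Gender        = c("Male", "Male", "Female"),
  Education     = c(14L, 15L, 14L),
  MaritalStatus = c("Single", "Single", "Partnered"),
  Usage         = c(3L, 2L, 4L),
  Fitness       = c(4L, 3L, 3L),
  Income        = c(29562L, 31836L, 30699L),
  Miles         = c(112L, 75L, 66L)
)
tmp_csv <- tempfile(fileext = ".csv")
write.csv(sample_rows, tmp_csv, row.names = FALSE)

# stringsAsFactors = TRUE turns text columns into factors on R >= 4.0,
# where the default became FALSE.
cardio_sample <- read.csv(tmp_csv, stringsAsFactors = TRUE)
str(cardio_sample)
```

For the real file, the same argument would be passed as read.csv("CardioGoodFitness.csv", stringsAsFactors = TRUE).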
Dimensions of Data Set
> dim(cardio_data_set)
[1] 180 9
Summarising the Data Set
> summary(cardio_data_set)
  Product        Age           Gender     Education       MaritalStatus
 TM195:80   Min.   :18.00   Female: 76   Min.   :12.00   Partnered:107
 TM498:60   1st Qu.:24.00   Male  :104   1st Qu.:14.00   Single   : 73
 TM798:40   Median :26.00                Median :16.00
            Mean   :28.79                Mean   :15.57
            3rd Qu.:33.00                3rd Qu.:16.00
            Max.   :50.00                Max.   :21.00
     Usage          Fitness          Income           Miles
 Min.   :2.000   Min.   :1.000   Min.   : 29562   Min.   : 21.0
 1st Qu.:3.000   1st Qu.:3.000   1st Qu.: 44059   1st Qu.: 66.0
 Median :3.000   Median :3.000   Median : 50597   Median : 94.0
 Mean   :3.456   Mean   :3.311   Mean   : 53720   Mean   :103.2
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.: 58668   3rd Qu.:114.8
 Max.   :7.000   Max.   :5.000   Max.   :104581   Max.   :360.0
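summary() reports location measures but not spread or group-wise figures. The following is a minimal sketch of how those could be added, using only the first ten Gender, Age and Miles values that str() reports, so the numbers it prints describe that sample and not the full 180 rows. The names cardio_sample, spread and miles_by_gender are illustrative.

```r
# First ten rows of three columns, as reported by str(); not the full data set.
cardio_sample <- data.frame(
  Gender = factor(c("Male", "Male", "Female", "Male", "Male",
                    "Female", "Female", "Male", "Male", "Female")),
  Age    = c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21),
  Miles  = c(112, 75, 66, 85, 47, 66, 75, 85, 141, 85)
)

# Standard deviation and interquartile range for each numeric column.
spread <- sapply(cardio_sample[c("Age", "Miles")],
                 function(x) c(sd = sd(x), IQR = IQR(x)))
print(spread)

# Mean Miles within each Gender level.
miles_by_gender <- aggregate(Miles ~ Gender, data = cardio_sample, FUN = mean)
print(miles_by_gender)
```

On the full data frame the same calls would take cardio_data_set in place of cardio_sample.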
Structure of Each Feature
> str(cardio_data_set)
'data.frame': 180 obs. of 9 variables:
$ Product      : Factor w/ 3 levels "TM195","TM498",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Age          : int 18 19 19 19 20 20 21 21 21 21 ...
$ Gender       : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 1 2 2 1 ...
$ Education    : int 14 15 14 12 13 14 14 13 15 15 ...
$ MaritalStatus: Factor w/ 2 levels "Partnered","Single": 2 2 1 2 1 1 1 2 2 1 ...
$ Usage        : int 3 2 4 3 4 3 3 3 5 2 ...
$ Fitness      : int 4 3 3 3 2 3 3 3 4 3 ...
$ Income       : int 29562 31836 30699 32973 35247 32973 35247 32973 35247 37521 ...
$ Miles        : int 112 75 66 85 47 66 75 85 141 85 ...
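Beyond str(), a few one-line checks can confirm column types, missing values and how the factor levels combine. The following is a minimal sketch, offered as an optional addition and not as part of the original analysis. It rebuilds the first ten Gender, MaritalStatus and Usage values from the factor codes shown above, and the names cardio_sample, na_counts and gender_by_status are illustrative.

```r
# First ten rows, decoded from the str() output (level 1 = "Female"/"Partnered").
cardio_sample <- data.frame(
  Gender        = factor(c("Male", "Male", "Female", "Male", "Male",
                           "Female", "Female", "Male", "Male", "Female")),
  MaritalStatus = factor(c("Single", "Single", "Partnered", "Single", "Partnered",
                           "Partnered", "Partnered", "Single", "Single", "Partnered")),
  Usage         = c(3L, 2L, 4L, 3L, 4L, 3L, 3L, 3L, 5L, 2L)
)

# Class of every column.
print(sapply(cardio_sample, class))

# Number of missing values per column.
na_counts <- colSums(is.na(cardio_sample))
print(na_counts)

# Cross-tabulation of the two factor columns.
gender_by_status <- table(cardio_sample$Gender, cardio_sample$MaritalStatus)
print(gender_by_status)
```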
Bar Plotting of Age
> barplot(table(cardio_data_set$Age))
Panelling Graphics
> par(mfrow=c(3,3))
> hist(cardio_data_set$Age, main="Age Distribution", xlab="Age", ylab="Frequency", col="blue")
> hist(cardio_data_set$Education, main="Education", xlab="Education", ylab="Frequency", col="blue")
> hist(cardio_data_set$Usage, main="Usage", xlab="Usage", ylab="Frequency", col="blue")
> boxplot(cardio_data_set$Age, horizontal=TRUE, main="Age Distribution", xlab="Age", ylab="Frequency", col="red")
> boxplot(cardio_data_set$Education, horizontal=TRUE, main="Education", xlab="Education", ylab="Frequency", col="red")
> boxplot(cardio_data_set$Usage, horizontal=TRUE, main="Usage", xlab="Usage", ylab="Frequency", col="red")
The graphs below show that the most common age group among cardio product users is 20-25 years, and that most users have an education level between 14 and 16.
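The panelled histogram and boxplot calls above repeat the same arguments for each variable. The following is a minimal sketch of the same layout written as a loop, assuming a 2 x 3 grid (which six plots fill exactly) in place of the 3 x 3 grid used above. It runs on the first ten Age, Education and Usage values reported by str(), and the names cardio_sample and old_par are illustrative.

```r
# First ten rows of the three plotted columns, as reported by str().
cardio_sample <- data.frame(
  Age       = c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21),
  Education = c(14, 15, 14, 12, 13, 14, 14, 13, 15, 15),
  Usage     = c(3, 2, 4, 3, 4, 3, 3, 3, 5, 2)
)

# Save the current layout so it can be restored afterwards.
old_par <- par(mfrow = c(2, 3))

# Top row: one histogram per variable.
for (v in names(cardio_sample)) {
  hist(cardio_sample[[v]], main = v, xlab = v, ylab = "Frequency", col = "blue")
}

# Bottom row: one horizontal boxplot per variable.
for (v in names(cardio_sample)) {
  boxplot(cardio_sample[[v]], horizontal = TRUE, main = v, xlab = v, col = "red")
}

# Restore the previous panel layout.
par(old_par)
```

The boxplots in this sketch omit ylab="Frequency", since a horizontal boxplot has no frequency axis. Restoring old_par keeps later plots from inheriting the grid.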
The graphs below also show that most people using cardio fitness products have an income between 40K and 60K units.
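Observations read from graphs can also be checked numerically. The following is a minimal sketch, assuming 5-year age bins that are closed on the right like the default bins of hist(). It uses only the first ten Age and Income values reported by str(); these rows appear to be sorted from the youngest customers upward, so the sample results do not reflect the full 180 rows. The names age_counts and share_between are illustrative.

```r
# First ten Age and Income values from str(); NOT representative of the full data.
age    <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21)
income <- c(29562, 31836, 30699, 32973, 35247, 32973, 35247, 32973, 35247, 37521)

# Count ages in 5-year bins, right-closed with the lowest value included.
age_counts <- table(cut(age, breaks = seq(15, 50, by = 5), include.lowest = TRUE))
print(age_counts)
print(round(prop.table(age_counts), 2))

# Share of values inside the closed interval [lower, upper].
share_between <- function(x, lower, upper) mean(x >= lower & x <= upper)
print(share_between(income, 40000, 60000))
```

On the full data, age and income would be replaced by cardio_data_set$Age and cardio_data_set$Income. The income quartiles in the summary() output (44059 and 58668) already indicate that at least half of all customers fall inside the 40K to 60K range.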