Data Science Using R
List of Experiments (Week-wise)

Week 1: R AS CALCULATOR APPLICATION
    a. Using with and without R objects on the console.
    b. Using mathematical functions on the console.
    c. Write an R script to create R objects for a calculator application and save it in a specified location on disk.
Week 2: DESCRIPTIVE STATISTICS IN R
    a. Write an R script to find basic descriptive statistics using the summary, str and quantile functions on the mtcars & cars datasets.
    b. Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the iris dataset.
Week 3: READING AND WRITING DIFFERENT TYPES OF DATASETS
    a. Reading different types of data sets (.txt, .csv) from the web and disk, and writing to a file in a specific disk location.
    b. Reading an Excel data sheet in R.
    c. Reading an XML dataset in R.
Week 4: VISUALIZATIONS
    a. Find the data distributions using box and scatter plots.
    b. Find the outliers using plots.
    c. Plot the histogram, bar chart and pie chart on sample data.
Week 5: CORRELATION AND COVARIANCE
    a. Find the correlation matrix.
    b. Plot the correlation plot on the dataset and visualize it, giving an overview of the relationships among the iris data.
    c. Analysis of covariance / variance (ANOVA), if the data have categorical variables, on the iris data.
Week 6: REGRESSION MODEL
    Import data from web storage. Name the dataset and perform logistic regression to find the relation between the variables affecting the admission of a student to an institute, based on his or her GRE score, GPA and rank. Also check whether the model fits well. require(foreign), require(MASS).
Week 7: MULTIPLE REGRESSION MODEL
    Apply multiple regression if the data have a continuous independent variable. Apply it on the above dataset.
Week 8: REGRESSION MODEL FOR PREDICTION
    Apply regression model techniques to predict the data on the above dataset.
Week 9: CLASSIFICATION MODEL
    a. Install relevant packages for classification.
    b. Choose a classifier for the classification problem.
    c. Evaluate the performance of the classifier.
Week 10: CLUSTERING MODEL
    a. Clustering algorithms for unsupervised classification.
    b. Plot the cluster data using R visualizations.
WEEK-1
R AS CALCULATOR APPLICATION
Aim: To perform various operations on the console, with and without R objects.
A. Using the console without R objects
   Using the console with R objects
B. Using mathematical functions on the console
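An illustrative console session for parts A and B; the particular expressions and values are only examples:

# Without R objects: expressions are evaluated directly on the console
2 + 3 * 4            # 14
(7 - 2)^2            # 25

# With R objects: values are stored in the environment and reused
a <- 12
b <- 5
a + b                # 17
a %% b               # remainder: 2
a %/% b              # integer division: 2

# Mathematical functions on the console
sqrt(144)            # 12
abs(-7.5)            # 7.5
log(100, base = 10)  # 2
round(3.14159, 2)    # 3.14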
C. Write an R script to create R objects for a calculator application and save it in a specified location on disk.
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}

# take input from the user
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice <- as.integer(readline(prompt = "Enter choice[1/2/3/4]: "))
num1 <- as.integer(readline(prompt = "Enter first number: "))
num2 <- as.integer(readline(prompt = "Enter second number: "))
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice, add(num1, num2), subtract(num1, num2),
                 multiply(num1, num2), divide(num1, num2))
print(paste(num1, operator, num2, "=", result))
OUTPUT:
[1] "Select operation."
[1] "1.Add"
[1] "2.Subtract"
[1] "3.Multiply"
[1] "4.Divide"
Enter choice[1/2/3/4]: 4
Enter first number: 300
Enter second number: 4
[1] "300 / 4 = 75"
Theory: When objects are created on the console, they are stored in the R environment (the workspace). A value remains there until the same variable name is assigned a new value, and the workspace is cleared when the session ends unless it is explicitly saved.
WEEK-2
DESCRIPTIVE STATISTICS IN R
Aim: Write an R program to find descriptive statistics using summary().
a. Write an R script to find basic descriptive statistics using the summary, str and quantile functions on the mtcars & cars datasets.
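An illustrative script for part (a); the particular columns inspected are only examples:

# Basic descriptive statistics on the built-in mtcars and cars datasets
str(mtcars)                 # structure: variable names, types and first values
summary(mtcars)             # min, quartiles, median, mean and max for every column
quantile(mtcars$mpg)        # quartiles of a single variable

str(cars)
summary(cars)
quantile(cars$speed, probs = c(0.25, 0.5, 0.75))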
b. Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the iris dataset.
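A short sketch for part (b); the filter condition and selected columns are only examples:

# subset(): rows of iris that satisfy a condition, keeping selected columns
setosa_long <- subset(iris, Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Sepal.Width, Species))
head(setosa_long)

# aggregate(): mean of every numeric column, grouped by Species
aggregate(. ~ Species, data = iris, FUN = mean)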
Theory:
Descriptive statistics, as the name implies, refers to statistics that describe your dataset. For a large dataset, it gives you a bite-sized summary that helps you understand your data. Think of it as the résumé of the data you are going to work with: it tells you what your data holds. Statisticians often create a descriptive statistics report as a first step before diving into rigorous analytics and inferential statistics on a dataset.
WEEK-3
READING AND WRITING DIFFERENT TYPES OF DATASETS
Aim: Write a script to read and write different types of datasets.
a. Reading different types of data sets (.txt, .csv) from the web and disk, and writing to a file in a specific disk location.
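A sketch for part (a); the URL and file paths below are placeholders and must be replaced with real locations:

# Reading a .csv file from the web (placeholder URL)
web_data <- read.csv("https://example.com/sample.csv")

# Reading .csv and .txt files from disk (adjust the paths to your machine)
csv_data <- read.csv("C:/data/sample.csv")
txt_data <- read.table("C:/data/sample.txt", header = TRUE, sep = "\t")

# Writing a data frame to a specific location on disk
write.csv(csv_data, "C:/data/output/sample_copy.csv", row.names = FALSE)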
Theory:
Usually we will be working with data that is already stored in a file and must be read into R before we can work on it. R can read data from a variety of file formats.
To read an entire data frame directly, the external file will normally have a special form:
The first line of the file should have a name for each variable in the data frame.
Each additional line of the file has, as its first item, a row label, followed by the values for each variable.

WEEK-4
VISUALIZATIONS
Aim: Write an R script to visualize the various plot types.
a. Find the data distributions using box and scatter plots.
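An illustrative script for part (a), using the built-in mtcars data; the chosen variables are only examples:

# Boxplot: distribution of mpg for each cylinder count in mtcars
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot: relationship between car weight and mpg
plot(mtcars$wt, mtcars$mpg,
     main = "MPG vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")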
b. Find the outliers using a plot.
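A sketch for part (b), again on mtcars; boxplot() returns the points it flags as outliers in its $out component:

# Points beyond the whiskers of the boxplot are treated as outliers
hp_box <- boxplot(mtcars$hp, main = "Horsepower", ylab = "hp")
hp_box$out   # the value(s) plotted as outliers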
c. Plot the histogram, bar chart and pie chart on sample data.
Histogram:
Bar chart:
Pie chart:
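An illustrative script for part (c) that draws the three charts above from the mtcars sample data:

# Histogram of mpg
hist(mtcars$mpg, main = "Histogram of MPG",
     xlab = "Miles per gallon", col = "lightblue")

# Bar chart and pie chart of the number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, main = "Cars by cylinder count",
        xlab = "Cylinders", ylab = "Number of cars")
pie(cyl_counts, main = "Share of cars by cylinder count")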
Theory: boxplot() in R helps to visualize the distribution of the data by quartile and to detect the presence of outliers. You can also use the geometric object geom_boxplot() from the ggplot2 library to draw a boxplot.
WEEK-5
CORRELATION AND COVARIANCE
Aim: Write an R script to plot the correlation on the iris dataset.
a. Find a correlation matrix and plot the correlation on the iris dataset.
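A sketch for part (a), on the numeric columns of iris:

# Correlation and covariance matrices of the numeric columns of iris
iris_num <- iris[, 1:4]
round(cor(iris_num), 2)   # correlation matrix
round(cov(iris_num), 2)   # covariance matrix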
b. Plot the correlation plot on the dataset and visualize it, giving an overview of the relationships among the variables in the iris data.
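A sketch for part (b). The corrplot package is an assumption here (it is not part of base R and must be installed separately); pairs() is shown as a base-R alternative:

# install.packages("corrplot")   # run once if corrplot is not installed
library(corrplot)
corrplot(cor(iris[, 1:4]), method = "circle")

# Base-R alternative: scatter-plot matrix coloured by species
pairs(iris[, 1:4], col = iris$Species)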
c. Analysis of covariance / variance (ANOVA), if the data have categorical variables, on the iris data.
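A sketch for part (c): a one-way ANOVA using the categorical Species variable of iris:

# One-way ANOVA: does mean Sepal.Length differ across the Species factor?
iris_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(iris_aov)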
Theory:
Covariance in R programming
In statistics, covariance is a measure of how two variables of a dataset vary together; that is, it depicts the way the two variables are related to each other.
For instance, when two variables are highly positively correlated, they move in the same direction.
Correlation in R programming
Correlation on a statistical basis is the method of finding the relationship between the
variables in terms of the movement of the data. That is, it helps us analyze the effect of
changes made in one variable over the other variable of the dataset.
When two variables are highly (positively) correlated, we say that they convey the same information and have the same effect on the other variables of the dataset.
WEEK-6
REGRESSION MODEL
Aim: Write a script to read data from the web and find the relation between variables.
Import data from web storage. Name the dataset and perform logistic regression to find the relation between the variables affecting the admission of a student to an institute, based on his or her GRE score, GPA and rank. Also check whether the model fits well. require(foreign), require(MASS).
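An illustrative script for this experiment. The URL below points to a commonly used copy of the admissions data (columns admit, gre, gpa, rank); replace it with your own source if it is unavailable:

require(foreign)
require(MASS)

# Read the admissions data from web storage (substitute your own copy if needed)
admissions <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
admissions$rank <- factor(admissions$rank)

# Logistic regression: probability of admission from GRE, GPA and rank
admit_model <- glm(admit ~ gre + gpa + rank,
                   data = admissions, family = "binomial")
summary(admit_model)

# Crude fit check: likelihood-ratio test of the model against the null model
with(admit_model,
     pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))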
Theory:
Regression Analysis in R
Regression analysis is a group of statistical processes used in R programming and statistics to
determine the relationship between dataset variables. Generally, regression analysis is used
to determine the relationship between the dependent and independent variables of the
dataset. Regression analysis helps to understand how the dependent variable changes when one of the independent variables changes while the other independent variables are kept constant. This helps in building a regression model and, further, in forecasting values with respect to a change in one of the independent variables. On the basis of the type of dependent variable, the number of independent variables, and the shape of the regression line, there are four main types of regression analysis techniques: Linear Regression, Logistic Regression, Multinomial Logistic Regression and Ordinal Logistic Regression.
WEEK-7
MULTIPLE REGRESSION MODEL
Aim: Write a script to apply multiple regression.
a. Apply multiple regression, if the data have a continuous independent variable.
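A sketch for this experiment, assuming the admissions data frame loaded in Week 6 is still available; the continuous GRE score is modelled here from GPA (continuous) and rank (categorical):

# Multiple linear regression on the admissions data from the previous week
multi_model <- lm(gre ~ gpa + rank, data = admissions)
summary(multi_model)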
Theory:
Multiple linear regression is a statistical analysis technique used to predict a variable’s
outcome based on two or more variables. It is an extension of linear regression and is also known as multiple regression. The variable to be predicted is the dependent variable, and
the variables used to predict the value of the dependent variable are known as independent
or explanatory variables.
Multiple linear regression enables analysts to determine the variation explained by the model and each independent variable's relative contribution. Multiple regression is of two types: linear and non-linear regression.
WEEK-8
REGRESSION MODEL FOR PREDICTION
Aim: Write a script to apply regression model techniques to predict data.
a. Apply regression model techniques to predict the data on the above dataset.
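A sketch for this experiment, assuming the admissions data frame and the multi_model fit from the previous week; the two new student records are hypothetical:

# Predict GRE scores for two hypothetical students with the fitted model
new_students <- data.frame(gpa  = c(3.2, 3.8),
                           rank = factor(c(2, 1), levels = levels(admissions$rank)))
predict(multi_model, newdata = new_students, interval = "prediction")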
Theory:
In R programming, predictive models are extremely useful for forecasting future outcomes
and estimating metrics that are impractical to measure. For example, data scientists could
use predictive models to forecast crop yields based on rainfall and temperature, or to
determine whether patients with certain traits are more likely to react badly to a new
medication.
WEEK-9
CLASSIFICATION MODEL
Aim: Write an R script to install the required packages and classify a given problem.
a. Install relevant packages for classification.
install.packages("rpart.plot")
install.packages("tree")
install.packages("ISLR")
install.packages("rattle")
library(tree)
library(ISLR)
library(rpart.plot)
library(rattle)
b. Choose a classifier for the classification problem.
SOURCE CODE:
> library(caret)                 # provides createDataPartition(); install.packages("caret") if needed
> Hitters <- na.omit(Hitters)    # drop players with a missing Salary
> tree.fit <- tree(Salary ~ Hits + Years, data = Hitters)
> summary(tree.fit)

Regression tree:
tree(formula = Salary ~ Hits + Years, data = Hitters)
Number of terminal nodes:  8
Residual mean deviance:  101200 = 25820000 / 255
Distribution of residuals:
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -1238.00  -157.50   -38.84     0.00    76.83  1511.00

> plot(tree.fit)
> text(tree.fit, cex = 0.8)

# Split the data into training and test sets
> split <- createDataPartition(y = Hitters$Salary, p = 0.5, list = FALSE)
> train <- Hitters[split, ]
> test <- Hitters[-split, ]

# Create tree model
> trees <- tree(Salary ~ ., train)
> plot(trees)
> text(trees, pretty = 0)
OUTPUT:
SOURCE CODE:
#Cross validate to see whether pruning the tree will improve performance
> cv.trees <- cv.tree(trees)
> plot(cv.trees)
> prune.trees <- prune.tree(trees, best=4)
> plot(prune.trees)
> text(prune.trees, pretty=0)
OUTPUT:
SOURCE CODE:
> yhat <- predict(prune.trees, test)
> plot(yhat, test$Salary)
> abline(0, 1)
> mean((yhat - test$Salary)^2)
OUTPUT:
[1] 150179.7
Theory:
In classification in R, we try to predict a target class. The possible classes are already known, and so are the properties that identify each class. The algorithm needs to identify which class a given data object belongs to.
WEEK-10
CLUSTERING MODEL
Aim: Write a script to apply clustering algorithms for unsupervised classification.
a. Clustering algorithms for unsupervised classification.
SOURCE CODE:
> library(cluster)
> set.seed(20)
> irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
# nstart = 20 means that R tries 20 different random starting assignments and then
# selects the one with the lowest within-cluster variation.
> irisCluster
OUTPUT:
Cluster means:
  Petal.Length Petal.Width
1     1.462000    0.246000
2     4.269231    1.342308
3     5.595833    2.037500
Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[42] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2
[83] 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3
[124] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3
Within cluster sum of squares by cluster:
[1] 2.02200 13.05769 16.29167
(between_SS / total_SS = 94.3 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
SOURCE CODE:
> library(ggplot2)   # needed for ggplot() and geom_point()
> irisCluster$cluster <- as.factor(irisCluster$cluster)
> ggplot(iris, aes(Petal.Length, Petal.Width, color = irisCluster$cluster)) + geom_point()
OUTPUT:
SOURCE CODE:
> d <- dist(as.matrix(mtcars)) # find distance matrix
> hc <- hclust(d) # apply hierarchical clustering
> plot(hc) # plot the dendrogram
OUTPUT:
b. Plot the cluster data using R visualizations.
SOURCE CODE:
## generate 25 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
clusplot(pam(x, 2))
OUTPUT:
SOURCE CODE:
## add noise, and try again:
x4 <- cbind(x, rnorm(25), rnorm(25))
clusplot(pam(x4, 2))
OUTPUT:
Theory:
In clustering in R, we try to group similar objects together. The principle behind clustering is that objects within a group are similar to one another, while objects in different groups are dissimilar.