
Data Analytics Lesson 12 Slides

Lesson 4 of the Data Analytics diploma focuses on linear regression, including multiple linear regression with multiple predictor variables and model fit statistics in R. It introduces data frames and the Tidyverse collection of tools for data transformation and visualization, emphasizing the use of tibbles for better data handling. Additionally, it covers the Lubridate package for managing dates and times in R, including functions for converting strings to date formats.

Diploma in Data Analytics
Lesson 4: Linear regression continued…

Lesson Objectives
• Linear regression continued
• Data frames
• Dates in R

Lesson 4
Linear regression continued
Recap linear regression

• $\hat{y}_i = a + b x_i$
• lm() (fitted model stored as lm.fit)
• summary(lm.fit)
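As a reminder of how the recap fits together, here is a minimal sketch of a simple linear regression. The built-in `mtcars` data set and the choice of `mpg` as response and `wt` as predictor are assumptions made for illustration; they are not taken from the slides.

```r
# Simple linear regression on the built-in mtcars data (illustrative choice).
# mpg is the response y, wt (weight) is the single predictor x.
lm.fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept a and slope b
coef(lm.fit)

# Coefficient table, residual standard error and R-squared
summary(lm.fit)
```

Note that R is case-sensitive, so the function is `summary()`, not `Summary()`.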
Multiple linear regression
• More than one predictor variable
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$
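In R, extra predictors are added to the model formula with `+`. The sketch below assumes the built-in `mtcars` data, with `wt` and `hp` standing in for $X_1$ and $X_2$; these variable choices are illustrative only.

```r
# Multiple linear regression: mpg modelled on two predictors.
# The formula mpg ~ wt + hp corresponds to Y = b0 + b1*X1 + b2*X2 + error.
lm.fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# One intercept plus one estimated coefficient per predictor
coef(lm.fit2)

# Full summary, including the model fit statistics
summary(lm.fit2)
```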
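Residual standard error
• Estimate of the standard deviation of $\varepsilon$
• Average deviation of the response from the true regression line
• Measures the lack of fit of the model

$R^2$ statistic
• Proportion of variability in the response explained by the model
• $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, where the residual sum of squares (RSS) measures the amount of variability left unexplained by the model and TSS is the total sum of squares
• $0 \le R^2 \le 1$
• Close to 0: model does not explain variability in response well
• Close to 1: model explains variability in response well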
Model fit statistics in R
summary(lm.fit)
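The two fit statistics can be pulled out of the summary object, and recomputing them by hand shows where they come from. This is a sketch on the built-in `mtcars` data (an assumed example data set); the `.manual` names are hypothetical helpers introduced here.

```r
# Fit a model and keep its summary (mtcars is an illustrative data set).
lm.fit <- lm(mpg ~ wt + hp, data = mtcars)
fit.summary <- summary(lm.fit)

# Values reported by summary()
rse <- fit.summary$sigma       # residual standard error
r2 <- fit.summary$r.squared    # R-squared statistic

# The same values computed by hand
rss <- sum(residuals(lm.fit)^2)                  # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total sum of squares
rse.manual <- sqrt(rss / df.residual(lm.fit))    # sqrt(RSS / (n - p - 1))
r2.manual <- 1 - rss / tss                       # share of variability explained
```

A smaller residual standard error and an $R^2$ closer to 1 both indicate a better fit to the data used for fitting.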
Data frames
Image by: https://www.linkedin.com/learning/learning-the-r-tidyverse/what-is-the-tidyverse
Tidyverse recap
• Collection of R packages for transforming and visualising data
• Tibbles
• Data frame with tweaks
More about tibbles
(package contained in tidyverse)
• Create tibble from data frame: as_tibble()
• Create new tibble: tibble()
• Define a tibble row by row: tribble()
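The three ways of creating a tibble can be sketched as follows. The example assumes the `tibble` package is installed (it is loaded with the tidyverse); the `student` and `mark` columns are made-up illustration data. The current spelling of the conversion function is `as_tibble()`.

```r
library(tibble)

# Create a tibble from an existing data frame
cars_tbl <- as_tibble(mtcars)

# Create a new tibble column by column
scores <- tibble(
  student = c("A", "B", "C"),
  mark = c(72, 65, 88)
)

# Define the same tibble row by row; column names are given as ~name
scores_by_row <- tribble(
  ~student, ~mark,
  "A", 72,
  "B", 65,
  "C", 88
)
```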
Tibble vs
data frame
• Tibbles have a refined print method
• Shows only the first 10 rows and the columns that fit on screen, so large data is easy to work with

• Subsetting
• Stricter
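Two common examples of the stricter subsetting are sketched below, assuming the `tibble` package is installed. The small `df` data frame and its column names are hypothetical.

```r
library(tibble)

df <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))
tbl <- as_tibble(df)

# Selecting one column with [ : a data frame drops to a plain vector,
# while a tibble always returns another tibble.
df_col <- df[, "abc"]
tbl_col <- tbl[, "abc"]

# Partial name matching with $ : a data frame silently matches "ab" to "abc",
# while a tibble returns NULL and raises a warning (suppressed here).
df_partial <- df$ab
tbl_partial <- suppressWarnings(tbl$ab)

# Printing a tibble shows at most the first 10 rows by default
print(as_tibble(mtcars))
```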
Dates & times
in R
Lubridate package
in R
• Easier to handle dates and times
in R
• Built by: H. Wickham & G.
Grolemund
• Maintained by: V. Spinu
Current dates and times

• Name of current timezone: Sys.timezone()


• Current date: Sys.Date()
• Current date and time: Sys.time()

Lubridate package
• Current date and time: now()
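The base R and lubridate calls above can be tried side by side. This sketch assumes the `lubridate` package is installed; the variable names are illustrative.

```r
library(lubridate)

# Base R
tz_name <- Sys.timezone()   # name of the current time zone
today_base <- Sys.Date()    # current date, class "Date"
now_base <- Sys.time()      # current date and time, class "POSIXct"

# lubridate (function names are lower case)
now_lub <- now()            # current date and time, class "POSIXct"

class(today_base)
class(now_lub)
```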
Converting strings to dates
• Convert strings to date format: as.Date(y, format = "%m/%d/%Y")
• Lubridate package: ymd(x), mdy(x)
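A short sketch of both approaches, assuming the `lubridate` package is installed. The strings `x` and `y` are made-up example values; `format()` is shown for going back from a date to a string.

```r
library(lubridate)

# Base R: the format string describes how the input is laid out
y <- "03/15/2024"                        # month/day/year
d1 <- as.Date(y, format = "%m/%d/%Y")

# lubridate: the function name gives the order of the components
x <- "2024-03-15"                        # year-month-day
d2 <- ymd(x)
d3 <- mdy(y)

# Going the other way: date back to a string in a chosen layout
s <- format(d1, "%d/%m/%Y")
```

All three conversions give the same `Date` value, 15 March 2024.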
Challenge
Create multiple linear
regression models each with
different predictor variables.
Which model fits the data best
and why?
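One possible way to set up the comparison is sketched below. It assumes the built-in `mtcars` data and an arbitrary choice of predictors, and it adds adjusted $R^2$ (also reported by `summary()`) because plain $R^2$ never decreases when a predictor is added. It is a starting point, not a model answer.

```r
# Three candidate models with different predictors (illustrative choices)
fits <- list(
  wt = lm(mpg ~ wt, data = mtcars),
  wt_hp = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_qsec = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Collect the fit statistics of each model into one table
comparison <- data.frame(
  model = names(fits),
  rse = sapply(fits, function(f) summary(f)$sigma),
  r.squared = sapply(fits, function(f) summary(f)$r.squared),
  adj.r.squared = sapply(fits, function(f) summary(f)$adj.r.squared)
)

comparison
```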

#exploredata
