Diploma in Data Analytics
Lesson 4: Linear regression continued…
Lesson Objectives
• Linear regression continued
• Data frames
• Dates in R
Linear regression continued
Recap linear regression
• $\hat{y}_i = a + b x_i$
• lm(): fits the model, e.g. lm.fit <- lm(y ~ x)
• summary(lm.fit): coefficient estimates and model fit statistics
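As a minimal sketch of the recap, the example below fits a simple linear regression and rebuilds the fitted values from the intercept a and slope b. The built-in mtcars data and the choice of mpg as response and wt as predictor are assumptions made for illustration only; they are not part of the lesson material.

```r
# Simple linear regression on the built-in mtcars data (assumed example dataset).
# Response: mpg (miles per gallon); predictor: wt (weight in 1000 lbs).
lm.fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept (a) and slope (b)
a <- coef(lm.fit)[["(Intercept)"]]
b <- coef(lm.fit)[["wt"]]

# Fitted values are a + b * x_i for each observation
y_hat <- a + b * mtcars$wt

# Coefficient table and model fit statistics
summary(lm.fit)
```

Note that R is case-sensitive, so the function is summary(), not Summary().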
Multiple linear regression
• More than one predictor variable
• 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + … + 𝛽𝑛 𝑋𝑛 + 𝜀
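A multiple regression is fitted with the same lm() call by adding predictors to the right-hand side of the formula. The sketch below assumes the built-in mtcars data with wt and hp as predictors; new_car is a hypothetical observation used only to show prediction.

```r
# Multiple linear regression with two predictors (assumed example data)
lm.fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# One estimated coefficient per term: intercept, wt and hp
coef(lm.fit2)

# Predict the response for a hypothetical new observation
new_car <- data.frame(wt = 3, hp = 120)
predict(lm.fit2, newdata = new_car)
```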
Residual standard error
• Estimate of the standard deviation of 𝜀
• Average deviation of the response from the true regression line
• Measures the lack of fit of the model
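The residual standard error can be computed by hand to see what it measures. The sketch below uses the standard definition, the square root of RSS / (n - p - 1) for n observations and p predictors, and compares it with the value R reports. The mtcars model is an assumed example.

```r
# Assumed example model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit)^2)   # residual sum of squares
n   <- nrow(mtcars)            # number of observations
p   <- length(coef(fit)) - 1   # number of predictors (excluding the intercept)

# Residual standard error computed by hand
rse <- sqrt(rss / (n - p - 1))
rse

# The same value as stored in the model summary
summary(fit)$sigma
```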
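$R^2$ statistic
• Proportion of variability in the response explained by the model
• $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, where the residual sum of squares (RSS) measures the amount of variability left unexplained by the model and the total sum of squares (TSS) measures the total variability in the response
• 0 ≤ $R^2$ ≤ 1
• Close to 0: model does not explain variability in response well
• Close to 1: model explains variability in response well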
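To make the statistic concrete, the sketch below computes R-squared by hand as 1 - RSS/TSS, where RSS is the residual sum of squares and TSS the total sum of squares of the response, and checks it against the value in the model summary. The mtcars model is an assumed example.

```r
# Assumed example model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit)^2)                   # variability left unexplained
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variability in the response

# Proportion of variability explained by the model
r2 <- 1 - rss / tss
r2

# The same value as stored in the model summary
summary(fit)$r.squared
```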
Model fit statistics in R
• summary(lm.fit): reports the residual standard error and R-squared
Data frames
Image by: https://www.linkedin.com/learning/learning-the-r-tidyverse/what-is-the-tidyverse
Tidyverse recap
• Collection of R packages for transforming and visualising data
More about tibbles (package contained in tidyverse)
• Tibbles: data frames with tweaks
• Create tibble from data frame: as_tibble()
• Create new tibble: tibble()
• Define a tibble row by row: tribble()
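The three ways of creating a tibble can be sketched as follows. This assumes the tibble package is installed; the scores data is invented purely for illustration.

```r
library(tibble)

# Convert an existing data frame to a tibble (row names are dropped)
cars_tbl <- as_tibble(mtcars)

# Create a new tibble column by column
scores <- tibble(
  student = c("A", "B", "C"),
  mark    = c(72, 85, 64)
)

# Define the same tibble row by row
scores_rowwise <- tribble(
  ~student, ~mark,
  "A",      72,
  "B",      85,
  "C",      64
)
```

tribble() is mainly convenient for small tables typed by hand, because each row reads like a row of the table.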
Tibble vs data frame
• Tibbles have a refined print method
• Shows only the first 10 rows and the columns that fit on screen, so large data is easy to work with
• Subsetting is stricter: no partial matching of column names, and [ always returns a tibble
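The stricter subsetting can be seen by running the same operations on a data frame and on a tibble. This is a small sketch assuming the tibble package is installed; df and tbl are made-up objects.

```r
library(tibble)

df  <- data.frame(value = 1:3, label = c("a", "b", "c"))
tbl <- as_tibble(df)

# Selecting one column with [ : a data frame drops to a vector,
# a tibble stays a tibble
class(df[, "value"])   # "integer"
class(tbl[, "value"])  # "tbl_df" "tbl" "data.frame"

# Partial matching with $ : a data frame silently matches "val" to "value",
# a tibble returns NULL with a warning
df$val
tbl$val
```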
Dates & times in R
Lubridate package in R
• Easier to handle dates and times in R
• Built by: H. Wickham & G. Grolemund
• Maintained by: V. Spinu
Current dates and times
• Name of current timezone: Sys.timezone()
• Current date: Sys.Date()
• Current date and time: Sys.time()
Lubridate package
• Current date and time: now()
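The base R and lubridate calls for the current date and time can be tried side by side. The sketch assumes the lubridate package is installed; the variable names are illustrative.

```r
library(lubridate)

# Base R
Sys.timezone()              # name of the current time zone
today_date   <- Sys.Date()  # current date, class "Date"
current_time <- Sys.time()  # current date and time, class "POSIXct"

# lubridate equivalent of Sys.time()
lub_now <- now()

class(today_date)
class(current_time)
class(lub_now)
```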
Converting strings to dates
• Convert strings to date format: as.Date(y, format = "%m/%d/%Y")
• Lubridate package: ymd(x), mdy(x)
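A short sketch of both approaches, assuming the lubridate package is installed. The strings x and y are made-up examples of a year-month-day and a month/day/year date.

```r
library(lubridate)

y <- "03/14/2021"   # month/day/year
x <- "2021-03-14"   # year-month-day

# Base R: the format string must describe the layout of the input
d1 <- as.Date(y, format = "%m/%d/%Y")

# lubridate: the function name gives the order of the components
d2 <- mdy(y)
d3 <- ymd(x)

# All three are Date objects for the same day
class(d1)
d1 == d2
d1 == d3
```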
Challenge
Create multiple linear regression models, each with different predictor variables. Which model fits the data best and why?
#exploredata