Time Series Forecasting Fundamentals

This document provides an overview of time series forecasting concepts and steps. It discusses key concepts like predictive vs descriptive goals in time series analysis, and the eight steps of forecasting: 1) setting goals, 2) collecting data, 3) exploring data, 4) setting horizons, 5) choosing techniques, 6) applying techniques and making forecasts, 7) evaluating performance, and 8) adapting models. It also discusses predictor variables, examples of time series data, and issues around data quality that are important for accurate forecasting.

Uploaded by

ravi m

1 Forecasting Concepts
2 Exploratory TS Data Analysis
3 Time Series Object
4 Spotting Trends in Time Series

Time Series Analysis : Chapter 1 Code

Ravi Mummigatti

1 Forecasting Concepts

1.1 Fundamental Concepts

Forecasting is a technique that uses historical data to estimate how a quantity will evolve in the future (its trends).

Through forecasting we try to predict what will happen in the future based on what happened in the past.

Forecasting can be used in any organization (company or institution) that works with quantitative data such as production, inventory, number of customers, market demand, sales volume (or sales value), prices, revenue and profit, costs and expenditures, inflation, unemployment rate, traffic and so on.

Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning.

The predictability of an event or a quantity depends on several factors, including:

1. how well we understand the factors that contribute to it;

2. how much data is available;

3. whether the forecasts can affect the thing we are trying to forecast.

For example, forecasts of electricity demand can be highly accurate because all three conditions are usually satisfied. We have a good idea of the contributing factors: electricity demand is driven largely by temperatures, with smaller effects from calendar variation such as holidays, and from economic conditions.

On the other hand, when forecasting currency exchange rates, only one of the conditions is satisfied: there is plenty of available data. However, we have a limited understanding of the factors that affect exchange rates, and forecasts of the exchange rate have a direct effect on the rates themselves. If there are well-publicized forecasts that the exchange rate will increase, then people will immediately adjust the price they are willing to pay, and so the forecasts are self-fulfilling.

Often in forecasting, a key step is knowing when something can be forecast accurately, and when forecasts will be no better than tossing a coin. Good forecasts capture the genuine patterns and relationships which exist in the historical data, but do not replicate past events that will not occur again.

Forecasting situations vary widely in their time horizons, factors determining actual outcomes, types of data patterns, and many other aspects. Forecasting methods can be simple, such as using the most recent observation as a forecast (which is called the naïve method), or highly complex, such as neural nets and econometric systems of simultaneous equations. Sometimes, there will be no data available at all. For example, we may wish to forecast the sales of a new product in its first year, but there are obviously no data to work with. In situations like this, we use judgmental forecasting. The choice of method depends on what data are available and the predictability of the quantity to be forecast.
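
The naïve method mentioned above can be sketched in a few lines of R. This is only an illustration: it uses the built-in Nile series as example data, and the horizon `h` and the object names are assumptions made for the sketch, not part of the original text.

```r
# Naive forecast: the most recent observation is used as the forecast
# for every future period (illustrative sketch on the built-in Nile series)
h <- 3                                        # hypothetical forecast horizon
last_obs <- as.numeric(tail(Nile, n = 1))     # last observed value

# Store the forecasts as a ts object that starts one year after the data end
naive_fc <- ts(rep(last_obs, h),
               start = end(Nile)[1] + 1,
               frequency = frequency(Nile))
print(naive_fc)
```

Every forecast equals the final observation, which is why the naïve method is often used as a baseline that more complex methods should beat.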

1.2 Forecasting & Planning Goals

Forecasting is a common statistical task in business, where it helps to inform decisions about the scheduling of production, transportation and personnel, and provides a guide to long-term strategic planning. However, business forecasting is often done poorly, and is frequently confused with planning and goals. They are three different things.

Forecasting is about predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts.

Goals are what you would like to have happen. Goals should be linked to forecasts and plans, but this does not always occur. Too often, goals are set without any plan for how to achieve them, and no forecasts for whether they are realistic.

Planning is a response to forecasts and goals. Planning involves determining the appropriate actions that are required to make your forecasts match your goals.

Short-term forecasts are needed for the scheduling of personnel, production and transportation. As part of the scheduling process, forecasts of demand are often also required.

Medium-term forecasts are needed to determine future resource requirements, in order to purchase raw materials, hire personnel, or buy machinery and equipment.

Long-term forecasts are used in strategic planning. Such decisions must take account of market opportunities, environmental factors and internal resources.

Examples of time series data include:

Annual Google profits

Quarterly sales results for Amazon


Monthly rainfall

Weekly retail sales

Daily IBM stock prices

Hourly electricity demand

1.3 Forecasting Steps

Predictor variables and time series forecasting

Predictor variables are often useful in time series forecasting. For example, suppose we wish to forecast the hourly electricity demand (ED) of a hot region during the summer period. A model with predictor variables might be of the form

ED = f(current temperature, strength of economy, population, time of day, day of week, error).

The relationship is not exact: there will always be changes in electricity demand that cannot be accounted for by the predictor variables. The “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model. We call this an explanatory model because it helps explain what causes the variation in electricity demand.

Because the electricity demand data form a time series, we could also use a time series model for forecasting. In this case, a suitable time series forecasting equation is of the form

$\mathrm{ED}_{t+1} = f(\mathrm{ED}_t, \mathrm{ED}_{t-1}, \mathrm{ED}_{t-2}, \mathrm{ED}_{t-3}, \ldots, \text{error})$

where t is the present hour, t+1 is the next hour, t−1 is the previous hour, and so on.

Here, prediction of the future is based on past values of a variable, but not on external variables which may affect the system. Again, the “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model.
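
To make the time series form concrete, here is a minimal sketch that regresses a series on its own past values. It is not the electricity model itself: the built-in Nile series stands in for ED, and choosing a linear f with two lags is an assumption made purely for illustration.

```r
# Build columns y_t, y_{t-1}, y_{t-2} from the series using embed()
lagged <- embed(as.numeric(Nile), 3)
colnames(lagged) <- c("y", "lag1", "lag2")
lag_df <- as.data.frame(lagged)

# Assume a linear f(): y_t = b0 + b1 * y_{t-1} + b2 * y_{t-2} + error
fit <- lm(y ~ lag1 + lag2, data = lag_df)

# One-step-ahead forecast built from the two most recent observations
n <- length(Nile)
newdata <- data.frame(lag1 = Nile[n], lag2 = Nile[n - 1])
next_value <- predict(fit, newdata = newdata)
print(next_value)
```

Only past values of the series enter the model; no external predictor such as temperature is used, which is the defining feature of the time series form.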

Forecasting can be accomplished in “Eight Steps”:

1. Set the forecasting goal

2. Collect data

3. Explore data

4. Set the forecasting horizon

5. Choose a forecasting technique

6. Apply the technique and make forecasts

7. Evaluate the forecast performance

8. Adapt or change the forecasting model

Let us delve a bit deeper into these steps.

Set the Forecasting Goal

In this step we have to determine why we want to make forecasts. This step requires an understanding of the way the forecasts will be used, who requires the forecasts, and how the forecasting function fits within the organisation requiring the forecasts.

There are two main goals of using time series data: predictive goals (Time Series Forecasting) and descriptive goals (Time Series Analysis).

Time Series Forecasting aims to predict or estimate future values using past values. The purpose of forecasting is to answer questions like:

1. How will our company sales look in the next few months?

2. What will the oil price be next week?

3. How many paying customers will we have the next month?

4. How will the Apple stock price change in the next few days?

Time Series Analysis tries to describe the series by determining its components, trends, cycles, seasonal patterns, relationships with other data, etc. Here the purpose is not forecasting, but decision making.

For example, a retail company could analyze the evolution of its merchandise inventory over recent years to determine the most appropriate supply policies and avoid unwanted situations like stock-outs or excess inventory.

A passenger transportation company could investigate the ridership patterns in different periods of the year to plan the number and capacity of the vehicles to be used. In this case, there is no interest in making predictions about future ridership levels.

Data Collection

Most of the data, for example sales volume, ridership and monthly profits, is available from internal sources, that is, from departments of the organization such as sales and accounting.

For other data, such as inflation rate, oil prices, market demand and market shares, it might be necessary to turn to external sources.

Two kinds of information are required: (a) statistical data, and (b) the accumulated expertise of the people who collect the data and use the forecasts. Often, it will be difficult to obtain enough historical data to be able to fit a good statistical model. In that case, judgmental forecasting methods can be used. Occasionally, old data will be less useful due to structural changes in the system being forecast; then we may choose to use only the most recent data.

Data quality is essential for time series forecasting because the sample is generally small, often only a few hundred data points.

The factors that can negatively affect data quality include measurement inaccuracy, data entry errors, missing data and corrupted data. Corruption errors appear during the writing, reading, storage, transmission, or processing of data (we are talking about data in electronic format).

Data collection is not a one-time endeavor but an ongoing activity, because new data appear over time and must be gathered and recorded on a regular basis.

Exploring Data

Data exploration is mainly based on visualization, so we must always start by charting our data.

This way we can (1) see the general trend, (2) detect cycles and seasonal patterns, and (3) identify outliers.

Various tools, such as data decomposition, can be used at this step.

We start by graphing the data and answering the following questions:

Are there consistent patterns?

Is there a significant trend?

Is seasonality important? Is there evidence of the presence of business cycles?

Are there any outliers in the data that need to be explained by those with expert knowledge?

How strong are the relationships among the variables available for analysis?
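
As a rough illustration of the decomposition tool mentioned above, the sketch below splits the built-in monthly AirPassengers series into trend, seasonal and random components with `decompose()` from base R. Choosing the multiplicative form is an assumption made for this example.

```r
# Classical decomposition into trend, seasonal and random components
# (multiplicative form assumed for illustration)
parts <- decompose(AirPassengers, type = "multiplicative")

# One panel each for the observed data, trend, seasonal and random parts
plot(parts)

# Estimated seasonal indices, one per month of the year
round(parts$figure, 2)
```

The trend panel helps answer whether there is a significant trend, and the seasonal indices show how strongly each month deviates from the yearly average.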

Setting the Forecast Horizon

The Forecast Horizon is defined as the number of time periods between the current period and the date of a future forecast. For example, for the case of monthly data, if the current period is month T, then a forecast of sales for month T+3 has a forecast horizon of three steps. For quarterly data, a step is one quarter (three months), but for annual data, one step is one year (twelve months). The forecast changes with the forecast horizon. The choice of the most appropriate forecasting models and strategy usually depends on the forecasting horizon.

Long-term forecasting is often useful. However, forecast accuracy will probably diminish the further we go into the future. If long-term forecasting is absolutely necessary, it is highly recommended to review our forecasting model on a regular basis by including fresh information as we collect new data.

For example, if we make a 12-month sales forecast, it is advisable to update our model and our forecasts every month, as soon as the sales volume for the last month becomes known.
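
The idea of a three-step horizon and of refreshing the forecast each month can be sketched as follows. This is an illustration only: the built-in AirPassengers series stands in for monthly sales, Holt-Winters smoothing (`HoltWinters()` from base R) is an arbitrary choice of technique, and the cut-off dates are assumptions.

```r
# Treat December 1959 as the current period T and forecast T+1, T+2, T+3
train <- window(AirPassengers, end = c(1959, 12))
fit <- HoltWinters(train)
fc <- predict(fit, n.ahead = 3)      # horizons of one, two and three steps
print(fc)

# One month later the January 1960 value is known:
# refit on the extended data and forecast the next three months again
train2 <- window(AirPassengers, end = c(1960, 1))
fc2 <- predict(HoltWinters(train2), n.ahead = 3)
print(fc2)
```

The second set of forecasts starts one month later than the first and uses one more observation, which is the regular update the text recommends.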

Choose the Forecasting Technique

Depending on the data pattern, we can pick either a Data Driven forecasting method or a Model Based forecasting method.

Data Driven forecasting methods learn patterns from the data themselves, whilst Model Based forecasting methods use a statistical or mathematical model to estimate the series values.

If forecasting is a one-time task, in-house expertise is available, and only a small number of series exist, model based methods are typically used, and these are typically “manual”. On the other hand, if forecasting is ongoing, no in-house expertise is available, and there are many series to forecast, data driven methods are typically used, and these are “automated” and computationally fast. Ensembles, which combine forecasts from different methods, are also often used.

In the case of Model Based forecasting, the model to use depends on the availability of historical data, the strength of relationships between the forecast variable and any explanatory variables, and the way in which the forecasts are to be used. These models include regression models, exponential smoothing methods, Box-Jenkins ARIMA models and several advanced methods including neural networks and vector autoregression.
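
A minimal sketch of applying two of the techniques named above and combining them in a simple ensemble is given below. The data set (AirPassengers), the model orders, the log transform and the equal-weight average are all assumptions chosen for illustration; they are not recommendations from the text.

```r
# Hold back 1960 and forecast it from the data up to December 1959
train <- window(AirPassengers, end = c(1959, 12))
h <- 12

# Exponential smoothing: Holt-Winters with multiplicative seasonality
hw_fit <- HoltWinters(train, seasonal = "multiplicative")
hw_fc <- as.numeric(predict(hw_fit, n.ahead = h))

# Box-Jenkins style seasonal ARIMA(0,1,1)(0,1,1)[12] fitted on the log scale
arima_fit <- arima(log(train), order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
arima_fc <- exp(as.numeric(predict(arima_fit, n.ahead = h)$pred))

# Simple ensemble: equal-weight average of the two sets of forecasts
ensemble_fc <- ts((hw_fc + arima_fc) / 2, start = c(1960, 1), frequency = 12)
print(ensemble_fc)
```

Averaging is the simplest way to combine forecasts; other weighting schemes are possible but are outside the scope of this sketch.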

Evaluating Forecasting Models

After deciding which alternative methods are suitable for the available data, the next step is to evaluate how well each method performs in forecasting the time series.

Measures such as R² and the sign and magnitude of the regression coefficients will help provide a general assessment of our models. However, for forecasting, an examination of the error terms from the model is usually the best strategy for assessing performance.

First, each method is used to forecast the data series. Second, the forecast from each method is evaluated to see how well it fits relative to the actual historical data.

Forecast fit is based on taking the difference between each individual forecast and the actual value. This exercise produces the forecast errors. Instead of examining individual forecast errors, it is preferable and much easier to evaluate a single measure of overall forecast error for the entire data under analysis. The error $e_t$ of an individual forecast, the difference between the actual value and the forecast of that value, is given as:

$e_t = Y_t - F_t$

where:

$e_t$ = the error of the forecast (a measure of its accuracy)

$Y_t$ = the actual value

$F_t$ = the forecast value

A number of measures have been developed to help assess the error, i.e. the accuracy, of forecasts, including Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Information criteria such as Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are also used to compare fitted models; they balance goodness of fit against model complexity rather than measuring forecast error directly.

The best forecasting model is the one with the smallest overall error measure. The choice of which error criteria are appropriate depends on the forecaster’s business goals, knowledge of the data, and personal preferences.
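
The error measures can be computed directly from the definition of the forecast error. The sketch below is illustrative only: it holds back the last year of the built-in AirPassengers series and evaluates a naïve forecast (the last training observation repeated); the split point and object names are assumptions.

```r
# Split the series: data to December 1959 for fitting, 1960 for evaluation
train <- window(AirPassengers, end = c(1959, 12))
actual <- as.numeric(window(AirPassengers, start = c(1960, 1)))

# Naive forecast: repeat the last training observation for all 12 months
forecasted <- rep(as.numeric(tail(train, n = 1)), length(actual))

# Forecast errors, actual minus forecast, then the summary measures
errors <- actual - forecasted
rmse <- sqrt(mean(errors^2))   # Root Mean Square Error
mae <- mean(abs(errors))       # Mean Absolute Error
print(c(RMSE = rmse, MAE = mae))
```

RMSE penalises large errors more heavily than MAE, so the two measures can rank competing methods differently.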

Key Notations

Notation | Description
t = 1, 2, 3, … | Time periods. These periods can be hours, days, weeks, months, quarters, semesters, years, …
$y_1, y_2, y_3, \ldots, y_n$ | The time series values, measured over n periods
$F_t$ | Forecasted (estimated) value for the period t
$F_{t+k}$ | The k-step-ahead forecast when the current period is t
$e_t$ | The forecast error for the period t, which is the difference between the actual series value and the forecasted value: $y_t - F_t$

2 Exploratory TS Data Analysis

In this section, we will gain insights into how to organize and visualize time series data in R. We will learn several simplifying assumptions that are widely used in time series analysis, and common characteristics of financial time series.

2.1 Exploring raw time series

The most common first step when conducting time series analysis is to display your time series dataset in a visually intuitive format. The most useful way to view raw time series data in R is to use the print() (https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/print) command, which displays the Start, End, and Frequency of your data along with the observations.

Another useful command for viewing time series data in R is the length() (https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/length) function, which tells you the total number of observations in your data.

Some datasets are very long, and previewing a subset of data is more suitable than displaying the entire series. The head(___, n = ___) and tail(___, n = ___) functions, in which n is the number of items to display, focus on the first and last few elements of a given dataset respectively.

Let us explore the famous River Nile annual streamflow data, Nile. This time series dataset includes some metadata. When calling print(Nile), note that Start = 1871 indicates that 1871 is the year of the first annual observation, and End = 1970 indicates that 1970 is the year of the last annual observation.

The data are measurements of the annual flow of the river Nile at Aswan (formerly Assuan) in 10^8 m^3.

Note: This is one of the standard datasets accompanying R and can be called directly.

Use the print() function to display the River Nile data. The data object is called Nile.

# Print the Nile dataset

print(Nile)

## Time Series:

## Start = 1871

## End = 1970

## Frequency = 1

## [1] 1120 1160 963 1210 1160 1160 813 1230 1370 1140 995 935 1110 994 1020

## [16] 960 1180 799 958 1140 1100 1210 1150 1250 1260 1220 1030 1100 774 840

## [31] 874 694 940 833 701 916 692 1020 1050 969 831 726 456 824 702

## [46] 1120 1100 832 764 821 768 845 864 862 698 845 744 796 1040 759

## [61] 781 865 845 944 984 897 822 1010 771 676 649 846 812 742 801

## [76] 1040 860 874 848 890 744 749 838 1050 918 986 797 923 975 815

## [91] 1020 906 901 1170 912 746 919 718 714 740

Use the length() function to identify the number of elements in your Nile dataset.

# List the number of observations in the Nile dataset

length(Nile)

## [1] 100

Use head() to display the first 10 elements of the Nile dataset. To do so, set the n argument equal to 10.

# Display the first 10 elements of the Nile dataset

head(Nile , n = 10)

## [1] 1120 1160 963 1210 1160 1160 813 1230 1370 1140

Use tail() to display the last 12 elements of the Nile dataset, again setting an appropriate value for the n argument.

# Display the last 12 elements of the Nile dataset

tail(Nile , n =12)

## [1] 975 815 1020 906 901 1170 912 746 919 718 714 740

2.2 Basic time series plots

While simple commands such as print(), length(), head(), and tail() provide crucial information about your time series data, another very useful way to explore any data is to generate a plot.

In this exercise, we will plot the River Nile annual stream flow data using the plot() (https://www.rdocumentation.org/packages/graphics/versions/3.3.1/topics/plot) function. For time series data objects such as Nile, a Time index for the horizontal axis is typically included. From the previous exercise, you know that this data spans from 1871 to 1970, and horizontal tick marks are labeled as such. The default label of "Time" is not very informative. Since these data are annual measurements, you should use the label "Year". While we’re at it, we should change the vertical axis label to "River Volume (1e9 m^{3})".

Additionally, it helps to have an informative title, which can be set using the argument main. For your purposes, a useful title for this figure would be “Annual River Nile Volume at Aswan, 1871-1970”.

Finally, the default plotting type for time series objects is "l" for line. Connecting consecutive observations can help make a time series plot more interpretable. Sometimes it is also useful to include both the observation points and the lines, in which case we instead use "b" for both.

Use plot() to display the Nile dataset.

# Plot the Nile data

plot(Nile)

Use a second call to plot() to display the data, but add the additional arguments xlab = "Year" and ylab = "River Volume (1e9 m^{3})".

# Plot the Nile data with xlab and ylab arguments

plot(Nile, xlab = "Year", ylab = "River Volume (1e9 m^{3})")


Use a third call to plot() with your Nile data, but this time also add a title and include observation points in the figure by specifying the following arguments: main = "Annual River Nile Volume at Aswan, 1871-1970" and type = "b".

# Plot the Nile data with xlab, ylab, main, and type arguments

plot(Nile,
     xlab = "Year",
     ylab = "River Volume (1e9 m^{3})",
     main = "Annual River Nile Volume at Aswan, 1871-1970",
     type = "b")


2.3 Sampling Frequency

Sampling frequency: exact. Some time series data is exactly evenly spaced. For example, hourly temperature measurements for every hour in a day.

Sampling frequency: approximate. Some time series data is only approximately evenly spaced. For example, temperature measurements recorded every time you check your email.

Sampling frequency: missing values. Some time series data is evenly spaced, but with missing values. For example, hourly temperature measurements recorded only while you are awake.

Basic assumptions: The analysis of time series data proceeds with some simplifying assumptions:

1. Consecutive observations are equally spaced.

2. A discrete-time observation index is applied.

In practice, these assumptions may only hold approximately, and sometimes data may be missing. For example, daily log returns on a stock may only be available for weekdays, and data may not be available for certain holidays. Monthly CPI values are equally spaced by month, but not by days.

Identifying the sampling frequency: In addition to viewing our data and plotting it over time, there are several additional operations that can be performed on time series datasets.

The start() and end() functions return the time index of the first and last observations, respectively.

The time() function calculates a vector of time indices, with one element for each time index on which the series was observed.

The deltat() function returns the fixed time interval between observations.

The frequency() function returns the number of observations per unit time.

The cycle() function returns the position in the cycle of each observation.

We will be applying these functions to the AirPassengers dataset, which reports the monthly total international airline passengers (in thousands) from 1949 to 1960.

Let us begin by plotting the AirPassengers data using a simple call to plot().

# Plot AirPassengers

plot(AirPassengers)

Now let us list the first and last time observations in AirPassengers using start() and end(), respectively.

# View the start and end dates of AirPassengers

start(AirPassengers)

## [1] 1949 1

end(AirPassengers)

## [1] 1960 12

The time series starts in January 1949 and ends in December 1960.

Now let us gain some additional insight into this dataset by applying the deltat(), frequency(), and cycle() commands to AirPassengers.

# Use deltat(), frequency(), and cycle() with AirPassengers

frequency(AirPassengers)

## [1] 12

deltat(AirPassengers)

## [1] 0.08333333

cycle(AirPassengers)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

## 1949 1 2 3 4 5 6 7 8 9 10 11 12

## 1950 1 2 3 4 5 6 7 8 9 10 11 12

## 1951 1 2 3 4 5 6 7 8 9 10 11 12

## 1952 1 2 3 4 5 6 7 8 9 10 11 12

## 1953 1 2 3 4 5 6 7 8 9 10 11 12

## 1954 1 2 3 4 5 6 7 8 9 10 11 12

## 1955 1 2 3 4 5 6 7 8 9 10 11 12

## 1956 1 2 3 4 5 6 7 8 9 10 11 12

## 1957 1 2 3 4 5 6 7 8 9 10 11 12

## 1958 1 2 3 4 5 6 7 8 9 10 11 12

## 1959 1 2 3 4 5 6 7 8 9 10 11 12

## 1960 1 2 3 4 5 6 7 8 9 10 11 12

Frequency: monthly, indicated by 12 observations per year.

Fixed time interval between observations: 0.083, i.e. 1/12 of a year, implying monthly data.

Cycle: monthly, indicated by the month numbers 1 to 12 repeating every year from 1949 to 1960.
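
The time() function described earlier can be applied in the same way. The short sketch below is an illustration; the object name `tt` is an assumption. It shows that the time indices are fractional years spaced exactly deltat() apart.

```r
# Vector of time indices, one per observation, expressed in fractional years
tt <- time(AirPassengers)
head(tt, n = 3)

# Consecutive time indices differ by the fixed interval reported by deltat()
spacing <- diff(as.numeric(tt))
all.equal(spacing, rep(deltat(AirPassengers), length(spacing)))
```

For this series the spacing is 1/12 of a year throughout, which matches the frequency of 12 reported above.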

The sampling frequency is often only approximate, and the interval between observations is not quite a fixed unit. For example, there are usually 365 days in a year based on the Gregorian calendar. However, (almost) every four years there are 366 days (leap years). This compensates for the fact that the Earth completes an orbit around the Sun in approximately 365.2422 days, on average.

As a simplifying assumption, we often ignore these small discrepancies and proceed as though the sampling frequency and observation intervals are fixed constants. Typically, our results will not be sensitive to this approximation when the underlying process is not changing too quickly.

For example, the sampling frequency would be exact in the case of hourly observations recorded each hour for several days, since there are always the same number of hours in each day.

2.4 Missing values

Sometimes there are missing values in time series data, denoted NA in R, and it is useful to know their locations. It is also important to know how missing values are handled by various R functions. Sometimes we may want to ignore any missingness, but other times we may wish to impute or estimate the missing values.
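
As a small illustration of locating missing values and of how a function such as mean() treats them, consider the sketch below. The series `x` is made up for the example and does not come from any dataset in this document.

```r
# Small hypothetical monthly series with two missing values
x <- ts(c(5, 7, NA, 6, 8, NA, 9, 10), start = c(2020, 1), frequency = 12)

anyNA(x)                     # is anything missing at all?
na_pos <- which(is.na(x))    # positions of the missing observations
time(x)[na_pos]              # the time indices at which they occur

mean(x)                      # NA: missing values propagate by default
mean(x, na.rm = TRUE)        # mean of the six observed values only
```

Knowing the positions and time indices of the gaps makes it easier to decide whether to ignore them or to impute them.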

Let’s again consider the monthly AirPassengers dataset, but now with the data for the year 1956 missing. We will explore the implications of this missing data and impute some new data to solve the problem.

We will modify the AirPassengers dataset to introduce missing values for the year 1956.

# Create a data frame holding a copy of the original data

my_Airpassengers = data.frame(AirPassengers)

# Convert the data frame to a time series object
# Frequency is monthly, i.e. 12 observations per year, and the start year is 1949

df <- ts(my_Airpassengers, frequency = 12, start = 1949)

# Introduce missing values for 1956 (observations 85 to 96) by replacing them with NA
# NA must be unquoted: the string "NA" would coerce the whole series to character

df2 <- replace(df, 85:96, NA)

# Inspect

str(df2)

## Time-Series [1:144, 1] from 1949 to 1961: 112 118 132 129 ...

## - attr(*, "dimnames")=List of 2

## ..$ : NULL

## ..$ : chr "AirPassengers"

Let us plot the time series.

# Plot New Air Passengers

plot(df2)


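The mean() function calculates the sample mean, but it returns NA if any NA values are present. We will use mean(___, na.rm = TRUE) to calculate the mean with all missing values removed. Note that the call below is applied to the original AirPassengers series, which has no missing values.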

# Compute the mean of AirPassengers

mean(AirPassengers , na.rm=TRUE)

## [1] 280.2986

It is common to replace missing values with the mean of the observed values.

# Impute mean values to NA in New AirPassengers

df2[85:96] <- mean(AirPassengers, na.rm = TRUE)

Let us now plot the new air passengers time series.

# Generate another plot of New AirPassengers

plot(df2,
     main = "Monthly Air Passenger Volume - Missing Imputed with Mean")

Let us now overlay the original series on the plot with the missing values imputed by the mean.

# Plot Time Series with Missing Imputed

plot(df2,
     main = "Monthly Air Passenger Volume - Missing Imputed v/s Original")

# Overlay the Original Time Series

points(AirPassengers, type = "l", col = 2, lty = 3)


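Based on our plot, it seems that simple data imputation using the mean is not a great way to approximate what is really going on in the AirPassengers data: the imputed values form a flat segment that ignores both the upward trend and the seasonal pattern of the series.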
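
One possible alternative, shown here only as a hedged sketch and not as the method used in this document, is linear interpolation across the gap with `approx()` from base R. The object names are assumptions, and interpolation still ignores seasonality; it merely follows the trend between the last value before and the first value after the gap.

```r
# Recreate the gap: set the 1956 observations (positions 85 to 96) to NA
gappy <- AirPassengers
gappy[85:96] <- NA

# Interpolate linearly between the observed points on either side of the gap
observed <- as.vector(!is.na(gappy))
interp <- approx(x = which(observed),
                 y = as.numeric(gappy)[observed],
                 xout = seq_along(gappy))$y
filled <- ts(interp, start = start(AirPassengers),
             frequency = frequency(AirPassengers))

# Compare the interpolated series with the original
plot(filled, main = "Monthly Air Passenger Volume - Missing Interpolated v/s Original")
lines(AirPassengers, col = 2, lty = 3)
```

The interpolated segment rises with the trend instead of sitting at the overall mean, although it still misses the summer peak of 1956.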

3 Time Series Object

3.1 Creating a TS Object

A time series is more than a vector of numbers; it also includes the time indices for each observation. Given a vector of numbers, you can apply the ts() function to create a time series object. Such objects are of the ts class. They represent data that is at least approximately evenly spaced over time. If you want the time series to start in the year 2001 with 1 observation per year, you should apply the ts() function with the additional arguments start = 2001 and frequency = 1. You can use the function is.ts() to check whether a given object is a time series.
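
A minimal sketch of the call just described follows. The values in `values` are made up for illustration.

```r
# Hypothetical annual observations
values <- c(12, 15, 11, 18, 20)

# One observation per year, starting in 2001
annual_ts <- ts(values, start = 2001, frequency = 1)

is.ts(values)       # a plain numeric vector is not a time series
is.ts(annual_ts)    # the object created by ts() is
time(annual_ts)     # time indices attached to each observation
```

The numbers themselves are unchanged; ts() only attaches the time information to them.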

The advantage of creating and working with time series objects of the ts class is that many methods are available that make use of time series attributes, such as time index information. For example, as you’ve seen in earlier exercises, calling plot() on a ts object will automatically generate a plot over time.

The function ts() (https://www.rdocumentation.org/packages/stats/versions/3.3.1/topics/ts) can be applied to create time series objects. A time series object is a vector (univariate) or matrix (multivariate) with additional attributes, including time indices for each observation, the sampling frequency and time increment between observations, and the cycle length for periodic data. Such objects are of the ts class, and represent data that has been observed at (approximately) equally spaced time points.

The value of the frequency argument in the ts() function is the number of observations per unit of time (per cycle), and so determines the time interval between data points. A value of 12 indicates monthly observations within a yearly cycle. Other values and their meanings are as below:

frequency = 12: one data point for every month of a year.

frequency = 4: one data point for every quarter of a year.

frequency = 6: one data point for every 10 minutes of an hour.

frequency = 24*6: one data point for every 10 minutes of a day.

frequency = 52: one data point for every week of a 52-week year.
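
To see how the frequency argument changes the time index, the sketch below attaches the same eight made-up values to a monthly and a quarterly index. The values and start date are assumptions chosen for illustration.

```r
x <- 1:8    # hypothetical values

# The same values read as monthly and as quarterly observations
monthly <- ts(x, start = c(2020, 1), frequency = 12)
quarterly <- ts(x, start = c(2020, 1), frequency = 4)

deltat(monthly)      # interval between observations, as a fraction of a year
deltat(quarterly)

end(monthly)         # eight months after the start
end(quarterly)       # eight quarters after the start

print(quarterly)
```

With frequency = 4 the eight values span two full years, whereas with frequency = 12 they cover only the first eight months of 2020.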

We will familiarize ourselves with the ts class by encoding some time series data (saved as data_vector) into a ts object and exploring the result. Our time series data_vector starts in the year 2004 and has 4 observations per year (i.e. it is quarterly data).

# create data vector

data_vector = c(2.0521941073,4.2928852797,3.3294132944,3.5085950670,0.0009576938,

1.9217186345,0.7978134128,0.2999543435,0.9435687536,0.5748283388,

-0.0034005903,0.3448649176,2.2229761136,0.1763144576,2.7097622770,

1.2501948965,-0.4007164754,0.8852732121,-1.5852420266,-2.2829278891,

-2.5609531290,-3.1259963754,-2.8660295895,-1.7847009207,-1.8894912908,

-2.7255351194,-2.1033141800,-0.0174256893,-0.3613204151,-2.9008403327,

-3.2847440927,-2.8684594718,-1.9505074437,-4.8801892525,-3.2634605353,

-1.6396062522,-3.3012575840,-2.6331245433,-1.7058354022,-2.2119825061,

-0.5170595186,0.0752508095,-0.8406994716,-1.4022683487,-0.1382114230,

-1.4065954703,-2.3046941055,1.5073891432,0.7118679477,-1.1300519022)

# print data vector

print(data_vector)
## [1] 2.0521941073 4.2928852797 3.3294132944 3.5085950670 0.0009576938

## [6] 1.9217186345 0.7978134128 0.2999543435 0.9435687536 0.5748283388

## [11] -0.0034005903 0.3448649176 2.2229761136 0.1763144576 2.7097622770

## [16] 1.2501948965 -0.4007164754 0.8852732121 -1.5852420266 -2.2829278891

## [21] -2.5609531290 -3.1259963754 -2.8660295895 -1.7847009207 -1.8894912908

## [26] -2.7255351194 -2.1033141800 -0.0174256893 -0.3613204151 -2.9008403327

## [31] -3.2847440927 -2.8684594718 -1.9505074437 -4.8801892525 -3.2634605353

## [36] -1.6396062522 -3.3012575840 -2.6331245433 -1.7058354022 -2.2119825061

## [41] -0.5170595186 0.0752508095 -0.8406994716 -1.4022683487 -0.1382114230

## [46] -1.4065954703 -2.3046941055 1.5073891432 0.7118679477 -1.1300519022


# plot data vector

plot(data_vector)

R has automatically used the index of each element (1 to 50) for the horizontal axis of the plot. These indices are positions in the vector, not points in time.

Let us now convert the data vector to a time series object with ts(), setting the start argument to 2004 and the frequency argument to 4, and assign the result to time_series.

# Convert data_vector to a ts object with start = 2004 and frequency = 4

time_series = ts(data_vector ,

frequency = 4 ,

start = 2004)

Let us Print and Plot the Time Series


# print the time series

print(time_series)

## Qtr1 Qtr2 Qtr3 Qtr4

## 2004 2.0521941073 4.2928852797 3.3294132944 3.5085950670

## 2005 0.0009576938 1.9217186345 0.7978134128 0.2999543435

## 2006 0.9435687536 0.5748283388 -0.0034005903 0.3448649176

## 2007 2.2229761136 0.1763144576 2.7097622770 1.2501948965

## 2008 -0.4007164754 0.8852732121 -1.5852420266 -2.2829278891

## 2009 -2.5609531290 -3.1259963754 -2.8660295895 -1.7847009207

## 2010 -1.8894912908 -2.7255351194 -2.1033141800 -0.0174256893

## 2011 -0.3613204151 -2.9008403327 -3.2847440927 -2.8684594718

## 2012 -1.9505074437 -4.8801892525 -3.2634605353 -1.6396062522

## 2013 -3.3012575840 -2.6331245433 -1.7058354022 -2.2119825061

## 2014 -0.5170595186 0.0752508095 -0.8406994716 -1.4022683487

## 2015 -0.1382114230 -1.4065954703 -2.3046941055 1.5073891432

## 2016 0.7118679477 -1.1300519022


# plot the time series

plot(time_series)
3.2 Validating if an Object is a TS Object
As you can see, ts objects are treated differently by commands such as print() and plot(). For example, plot() uses the time index automatically only when it is given a ts object.

When you create your own datasets, you can build them as ts objects. Recall data_vector, which was just a vector of numbers, and time_series, the ts object created from data_vector using the ts() function together with the start time and the observation frequency.

When you use datasets from others, such as those included in an R package, you can check whether they are ts objects using the is.ts() (https://www.rdocumentation.org/packages/stats/versions/3.3.1/topics/ts) command. The result is TRUE when the data is of the ts class and FALSE otherwise.

Let us use is.ts() on the data_vector and time_series objects created above, then on the Nile dataset used earlier, and finally on the AirPassengers dataset.

# Check whether data_vector and time_series are ts objects

is.ts(data_vector)

## [1] FALSE


is.ts(time_series)

## [1] TRUE


# Check whether Nile is a ts object

is.ts(Nile)

## [1] TRUE


# Check whether AirPassengers is a ts object

is.ts(AirPassengers)

## [1] TRUE

We can see that the Nile and AirPassengers datasets we worked with earlier are both encoded as ts objects, while data_vector is not.
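One way to use this check in practice is to convert data only when it is not already a ts object. The sketch below assumes a hypothetical helper named ensure_ts(), which is not part of base R, and a made-up vector of quarterly values.

```r
# ensure_ts() is a hypothetical helper for illustration, not part of base R
ensure_ts <- function(x, ...) {
  if (is.ts(x)) {
    x            # already a ts object: return it unchanged
  } else {
    ts(x, ...)   # otherwise pass the extra arguments (start, frequency) on to ts()
  }
}

raw <- c(5.1, 4.8, 5.6, 6.0, 5.9, 6.3, 6.8, 7.1)  # made-up quarterly values
is.ts(raw)                                         # a plain numeric vector

quarterly <- ensure_ts(raw, start = c(2004, 1), frequency = 4)
is.ts(quarterly)                                   # now a ts object

# A built-in ts dataset passes through untouched
identical(ensure_ts(Nile), Nile)
```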

3.3 Plotting a Time Series Object
Plotting the data is often one of the most useful first steps in time series analysis. If the dataset under study is of the ts class, the plot() function has methods that automatically incorporate the time index into the figure.

Let’s consider the eu_stocks dataset (available in R by default as EuStockMarkets, which is the name used in the code below). This dataset contains daily closing prices of major European stock indices from 1991-1998, specifically from Germany (DAX), Switzerland (SMI), France (CAC), and the UK (FTSE). The data were observed only when the markets were open, so there are no observations on weekends and holidays. We will proceed with the approximation that this dataset has evenly spaced observations and is a four-dimensional time series.

We will:

Use is.ts() to check whether eu_stocks is a ts object.

View the start, end, and frequency of eu_stocks using the start(), end(), and frequency() functions, respectively.

Generate a simple plot of eu_stocks using the plot() command.

Generate a more detailed plot of eu_stocks using the ts.plot() command, with one color per index and a legend.


# Check whether eu_stocks is a ts object

is.ts(EuStockMarkets)

## [1] TRUE


# View the start, end, and frequency of eu_stocks

start(EuStockMarkets)

## [1] 1991 130


end(EuStockMarkets)

## [1] 1998 169


frequency(EuStockMarkets)

## [1] 260

Start: 130th business day of 1991; End: 169th business day of 1998; Frequency: 260 observations (business days) per year.
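The two-number output of start() and end() can be related to the decimal time index that R stores internally. The following is a small sketch of that arithmetic; the variable names s, f and start_decimal are introduced here for illustration only.

```r
# EuStockMarkets ships with R in the datasets package
s <- start(EuStockMarkets)       # c(year, position within the year)
f <- frequency(EuStockMarkets)   # observations per year

# Convert "130th observation of 1991" into a decimal year:
# the first observation of a year sits at offset 0, so subtract 1
start_decimal <- s[1] + (s[2] - 1) / f
start_decimal

# tsp() returns start, end and frequency in the same decimal form
tsp(EuStockMarkets)

# Number of observations per index and number of indices
dim(EuStockMarkets)
```

Comparing start_decimal with the first element of tsp() shows that both describe the same point in time, roughly halfway through 1991.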


# Generate a simple plot of eu_stocks

plot(EuStockMarkets)

# Use ts.plot with eu_stocks

ts.plot(EuStockMarkets,

col = 1:4,

xlab = "Year",

ylab = "Index Value",

main = "Major European Stock Indices, 1991-1998")

legend("topleft", colnames(EuStockMarkets), lty = 1, col = 1:4, bty = "n")


4 Spotting Trends in Time Series
No Trend

Some time series do not exhibit any clear trend over time, as seems to be the case for figures A and B.

Linear Trend

Here are examples of series with linear trends over time. On the left, you see an upward trend, and on the right, a downward trend.

Rapid Growth

Upward trends may increase more quickly than linearly. Figures A and B are two examples of rapid growth trends over time. Rapid decay is also possible, but it is less common in most applications.

Periodic Trend

Some series exhibit periodic or sinusoidal trends over time. In figure A you see a periodic series with a cycle length of about 75 observations. In figure B the series oscillates more quickly and the cycle length is much smaller.

Variance in Trends

Time series can also exhibit trends in variability. Figures A and B both show examples of series with increasing variance over time.
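The trend types described above can be reproduced with simulated data. The sketch below is only an illustration: the seed, series length, slopes and noise levels are arbitrary assumptions and are not the data behind the original figures.

```r
set.seed(42)   # arbitrary seed so the simulated series are reproducible
n   <- 200     # arbitrary series length
idx <- 1:n

no_trend   <- ts(rnorm(n))                                       # noise around a constant level
linear_up  <- ts(0.05 * idx + rnorm(n))                          # upward linear trend
rapid      <- ts(exp(0.03 * idx) * exp(rnorm(n, sd = 0.1)))      # faster-than-linear growth
periodic   <- ts(sin(2 * pi * idx / 75) + rnorm(n, sd = 0.2))    # cycle of about 75 observations
incr_var   <- ts(rnorm(n, sd = idx / n))                         # spread grows over time

# Draw the five series in one figure, then restore the plotting settings
op <- par(mfrow = c(3, 2), mar = c(3, 3, 2, 1))
plot(no_trend,  main = "No trend")
plot(linear_up, main = "Linear trend")
plot(rapid,     main = "Rapid growth")
plot(periodic,  main = "Periodic trend")
plot(incr_var,  main = "Increasing variance")
par(op)
```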

4.1 Removing Trends : Logarithmic Transformation
The logarithmic function log() (https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/log) is a data transformation that can be applied to positively valued time series data. It slightly shrinks observations that are greater than one towards zero, while greatly shrinking very large observations. This property can stabilize variability when a series exhibits increasing variability over time. It may also be used to linearize a rapid growth pattern over time.
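To see why the logarithm linearizes rapid growth, consider a noise-free series that grows by a constant percentage at each step. The starting value 100 and the 5% growth rate below are assumptions chosen for illustration.

```r
# Hypothetical series growing by a constant 5% per step (no noise, for clarity)
growth <- ts(100 * 1.05^(0:49))

# On the log scale: log(100 * 1.05^t) = log(100) + t * log(1.05), a straight line in t
log_growth <- log(growth)

# The step-to-step change of the logged series is therefore constant
range(diff(log_growth))
log(1.05)

# Shrinkage: large values are pulled in far more strongly than small ones
log(c(2, 10, 1000, 100000))

# Compare the curved original with the straight logged series
op <- par(mfrow = c(1, 2))
plot(growth,     main = "Original")
plot(log_growth, main = "Logged")
par(op)
```

With real data the logged series will not be an exact line, but a roughly constant growth rate shows up as a roughly constant slope.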

Let us load a time series with rapid growth from a CSV file and visualize it.


# create a data frame with rapid growth data

rapid_growth = read.csv("rapid_growth.csv")

head(rapid_growth)

## Values

## 1 505.9547

## 2 447.3556

## 3 542.5831

## 4 516.0634

## 5 506.9599

## 6 535.0162

# convert to time series object

rapid_growth_ts = ts(rapid_growth , start=1 , frequency = 1)

plot(rapid_growth_ts)

Let us apply the log() function to rapid_growth_ts, saving the result as linear_growth.


# Log rapid_growth

linear_growth <- log(rapid_growth_ts)

Now let us use ts.plot() to show the transformed series linear_growth and note the condensed vertical range of the transformed data.


# Plot linear_growth using ts.plot()

ts.plot(linear_growth)
We see that the logarithmic transformation helps stabilize our data by turning the rapid growth into approximately linear growth over time.

4.2 Removing trends in level by differencing
The first difference transformation of a time series z[t] consists of the differences (changes) between successive observations over time, that is, z[t] − z[t−1].

Differencing a time series can remove a time trend. The function diff() (https://www.rdocumentation.org/packages/base/versions/3.3.1/topics/diff) calculates the first difference, or change series. A difference series lets you examine the increments or changes in a given time series. It always has one fewer observation than the original series.
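A small worked example makes the definition concrete. The five values below are made up for illustration; the manual computation simply subtracts each observation from the one after it and matches what diff() returns.

```r
# Made-up series of five observations
z <- ts(c(3, 5, 8, 12, 17))

# First difference: z[t] - z[t-1] for t = 2, ..., 5
dz <- diff(z)
dz

# The same calculation by hand, dropping the first and the last value in turn
manual <- z[-1] - z[-length(z)]
manual

# One fewer observation than the original series
length(z)
length(dz)

# A purely linear trend has a constant first difference, so the trend disappears
trend <- ts(2 * (1:10) + 1)
diff(trend)
```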

Let us load a time series with a linear trend from a CSV file and visualize it.


# create a data frame with linear trend data

linear_trend = read.csv("level_differencing.csv")

# convert to time series

linear_trend_ts = ts(linear_trend , start = 1 , frequency = 1)

# plot the time series

plot(linear_trend_ts)
Let us now apply the diff() function to linear_trend_ts and store the result in another variable.


# Generate the first difference

linear_trend_diff_ts = diff(linear_trend_ts)

Let us use ts.plot() to view a time series plot of the transformed series.


# Plot the transformed Time Series

ts.plot(linear_trend_diff_ts)
Let us now examine the lengths of the two time series


# length of original time series

length(linear_trend_ts)

## [1] 104


# length of transformed time series

length(linear_trend_diff_ts)

## [1] 103

By removing the long-term time trend, we can view the amount of change from one observation to the next.

4.3 Removing seasonal trends with seasonal differencing
For time series exhibiting seasonal trends, seasonal differencing can be applied to remove these periodic patterns. For example, monthly data may exhibit a strong twelve-month pattern. In such situations, changes in behavior from year to year may be of more interest than changes from month to month, which may largely follow the overall seasonal pattern.

The function diff(..., lag = s) calculates the lag s difference, or length s seasonal change series. For monthly or quarterly data, an appropriate value of s would be 12 or 4, respectively. The diff() function has lag = 1 as its default, which gives the first difference. Just as a first-differenced series has one fewer observation, a seasonally differenced series has s fewer observations than the original series.
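The effect of the lag argument can be checked on a small constructed series. In the sketch below the quarterly pattern, the drift of 0.5 per quarter and the start year 2010 are all assumptions chosen so that the result is easy to verify by hand.

```r
# Hypothetical quarterly series: a fixed seasonal pattern repeated for 5 years
# plus a slow upward drift of 0.5 per quarter
pattern <- c(-4, 10, 5, -10)
x <- ts(rep(pattern, times = 5) + 0.5 * (0:19),
        start = c(2010, 1), frequency = 4)

# Seasonal difference: x[t] - x[t-4], each quarter minus the same quarter a year earlier
dx <- diff(x, lag = 4)
dx

# The seasonal pattern cancels; what remains is the year-over-year drift (4 * 0.5)
unique(as.numeric(dx))

# The differenced series has lag = 4 fewer observations and starts one year later
length(x) - length(dx)
start(dx)
```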

Let us load a time series with values ranging from below -10 to above +10 and a quarterly seasonality.


# create the data frame

seasonal = read.csv("seasonal_ts.csv")

head(seasonal)

## Values

## 1 -4.198033

## 2 9.569009

## 3 5.175143

## 4 -9.691646

## 5 -3.215294

## 6 10.843669


# convert to time series and plot it

seasonal_ts = ts(seasonal , frequency = 4)

plot(seasonal_ts)
Now let us apply the diff(..., lag = 4) function to the time series, saving the result as dx.

We will use ts.plot() to show the transformed series dx and note the condensed vertical range of the transformed data.


# Generate the lag-4 difference of seasonal_ts and save it as dx

dx = diff(seasonal_ts , lag = 4)

# Plot dx

ts.plot(dx)
Notice how seasonal differencing removes the repeating quarterly pattern and lets us focus on the change from one year to the same quarter of the next.
