Time Series Modeling: Shouvik Mani April 5, 2018
Shouvik Mani
April 5, 2018
Outline
Properties of time series data
Forecasting
What is a time series?
A time series is a sequence of observations over time.
Notation: We have observations $X_1, \dots, X_N$, where $X_t$ denotes the observation at time $t$.
In this lecture, we will consider time series with observations at equally-spaced times
(not always the case, e.g. point processes).
Dependent Observations
Observations in a time series are generally dependent on one another; in particular, each value tends to depend on the values that came before it.
Why is this important? Most statistical models assume that individual observations
are independent. But this assumption does not hold for time series data.
Analysis of time series data must take into account the time order of the data.
Trend and Seasonality
Many time series display trends and seasonal effects.
The season (or period) is the length of the cycle (e.g. an annual season).
Stationarity
Formally, a time series is stationary if $X_{1:k}$ and $X_{t:t+k-1}$ have the same joint distribution,
for all $k$ and $t$. (Every section of length $k$ has the same distribution of values.)
Is this time series stationary?
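To make the definition concrete, here is a minimal sketch (simulated data, not from the lecture) comparing white noise, which is stationary, with a random walk, which is not. It assumes NumPy is available; the helper name `variance_at` is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200

# White noise: every X_t is drawn from the same N(0, 1) distribution.
white_noise = rng.normal(size=(n_paths, n_steps))

# Random walk: cumulative sum of white noise, so Var(X_t) grows with t.
random_walk = white_noise.cumsum(axis=1)

def variance_at(paths, t):
    """Sample variance of the value at index t across independent realizations."""
    return paths[:, t].var()

for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    early, late = variance_at(paths, 9), variance_at(paths, 199)
    print(f"{name}: early variance {early:.2f}, late variance {late:.2f}")
```

For white noise the two variances should be close to each other; for the random walk the later variance is much larger, so its distribution changes over time and the series cannot be stationary.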
Applications of Time Series
A few applications of time series data:
• Description
• Explanation
• Control
• Forecasting
Application: description
Can we identify and measure the trends, seasonal effects, and outliers in the series?
[Figure: the original series decomposed into a trend component and a seasonal component]
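One simple way to separate a series into trend and seasonal components is a classical additive decomposition. The sketch below is an assumption-laden illustration, not the lecture's method: it uses a synthetic monthly series (made-up values), a 12-point centered moving average for the trend, and per-month averages of the detrended series for the seasonal component.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical data): linear trend + annual cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
series = pd.Series(
    350 + 0.15 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=120),
    index=pd.date_range("2000-01-01", periods=120, freq="MS"),
)

# Trend: 12-point centered moving average, which averages out one full annual cycle.
trend = series.rolling(window=12, center=True).mean()

# Seasonal component: average detrended value for each calendar month.
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# Whatever is left after removing trend and seasonality.
residual = series - trend - seasonal
```

The trend is undefined (NaN) for the first and last few months because the centered window does not fit there.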
Application: explanation
Can we use one time series to explain/predict values in another series?
Model using linear systems: convert one series to another using linear operations.
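As a minimal sketch of a linear system, the output series below is a weighted sum of current and past input values. The filter weights `h` are hypothetical, chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input series
h = np.array([0.5, 0.3, 0.2])             # hypothetical weights h_0, h_1, h_2

# y_t = h_0 * x_t + h_1 * x_{t-1} + h_2 * x_{t-2},
# treating x as zero before the start of the series.
y = np.convolve(x, h)[: len(x)]
print(y)
```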
Application: control
Can we identify when a time series is deviating away from a target?
[Figure: a metric plotted over time against a target line, with an upper limit and a lower limit]
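A minimal sketch of this kind of check, assuming a fixed target and a symmetric tolerance band (both values are made up for illustration):

```python
import numpy as np

target = 100.0
tolerance = 5.0   # hypothetical half-width of the allowed band
lower, upper = target - tolerance, target + tolerance

metric = np.array([99.2, 101.5, 103.9, 106.3, 100.4, 94.1, 98.7])

# Flag the time steps where the metric falls outside [lower, upper].
out_of_control = (metric < lower) | (metric > upper)
breach_times = np.flatnonzero(out_of_control)
print(breach_times)
```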
Example: Keeling Curve
Daily observations of atmospheric CO2 concentrations since 1958 at the Mauna Loa
Observatory in Hawaii.
Time plot
The first thing you should do in any time series analysis is plot the data.
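import matplotlib.pyplot as plt

# df is assumed to hold a 'date' column and a 'CO2' column (in ppm)
plt.plot(df['date'], df['CO2'])
plt.xlabel('Date', fontsize=12)
plt.ylabel('CO2 Concentration (ppm)', fontsize=12)
plt.title('Keeling Curve: 1990 - Present', fontsize=14)
plt.show()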
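Measuring the trend
A centered moving average estimates the trend at time $t$ by averaging the $2k + 1$ observations in a window around $t$:
$\hat{X}_t = \frac{1}{2k + 1} \sum_{i=-k}^{k} X_{t+i}$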
Implementing the moving average is easy.
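# 12-point trailing moving average (one full year if the data are monthly)
moving_avg = df['CO2'].rolling(12).mean()
fig = plt.figure(figsize=(12, 6))
# Plot against the same 'date' column used for the time plot
plt.plot(df['date'], moving_avg)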
plt.xlabel('Date', fontsize=12)
plt.ylabel('CO2 Concentration (ppm)', fontsize=12)
plt.title('Trend of Keeling Curve: 1990 - Present', fontsize=14)
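Note that `rolling(12)` uses a trailing window (the current value and the 11 before it). For comparison, here is a small sketch of a centered moving average that averages k observations on each side of t, checked against pandas' `center=True` option. The function name `centered_moving_average` is illustrative, not part of any library.

```python
import numpy as np
import pandas as pd

def centered_moving_average(x, k):
    """Average of the 2k + 1 values x[t-k], ..., x[t+k]; NaN where the window does not fit."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(k, len(x) - k):
        out[t] = x[t - k : t + k + 1].mean()
    return out

x = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
manual = centered_moving_average(x, k=1)

# pandas equivalent: a window of 2k + 1 = 3 points, centered on each observation.
with_pandas = pd.Series(x).rolling(window=3, center=True).mean().to_numpy()
```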
Removing the trend
We can also remove the trend by first-order differencing.
$X'_t = X_t - X_{t-1}$
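A minimal sketch of first-order differencing with pandas, using a made-up series that has a linear trend plus a repeating pattern. After differencing, the values fluctuate around a constant level instead of growing.

```python
import numpy as np
import pandas as pd

t = np.arange(12)
# Hypothetical series: linear trend (slope 2) plus a repeating 4-step pattern.
x = pd.Series(10 + 2 * t + np.tile([0, 1, 0, -1], 3))

# First-order difference: each value minus the previous one.
# The first entry has no predecessor, so it is dropped.
x_diff = x.diff().dropna()

print(x.tolist())
print(x_diff.tolist())
```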
Forecasting
Can we predict future values of the Keeling curve using observed values?
Now, we will introduce a class of linear models called the ARIMA models, which can
be used for time series forecasting.
There are several variants of ARIMA models, and they build on each other.
AR(p) + MA(q) → ARIMA(p, d, q) → SARIMA(p, d, q)(P, D, Q)
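Autoregressive Model: AR
An autoregressive model predicts the response $X_t$ using a linear combination of its own past values.
$X_t = \theta_0 + \theta_1 X_{t-1} + \theta_2 X_{t-2} + \dots + \theta_p X_{t-p}$
Parameterized by $p$, the number of past values to include.
This is the same as doing linear regression with lagged features. For example, this is
how you would set up your dataset to fit an autoregressive model with $p = 2$: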
Original series:
t    X_t
1    400
2    500
3    300
4    100
5    200

Dataset with lagged features:
X_{t-2}    X_{t-1}    X_t
400        500        300
500        300        100
300        100        200
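The lagged dataset above can be built with `shift`, and the autoregressive coefficients estimated by ordinary least squares. This is a sketch under stated assumptions: the helper `fit_ar` and the simulated AR(2) series (with coefficients 1, 0.6, -0.2) are illustrative and not part of the lecture.

```python
import numpy as np
import pandas as pd

# Rebuild the p = 2 lagged table from the five example observations.
x = pd.Series([400.0, 500.0, 300.0, 100.0, 200.0])
lagged = pd.DataFrame({"X_t-2": x.shift(2), "X_t-1": x.shift(1), "X_t": x}).dropna()
print(lagged)

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model; returns [theta_0, theta_1, ..., theta_p]."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    # Column j holds the lag-j values aligned with the targets s[p:], s[p+1:], ...
    lags = np.column_stack([s[p - j : n - j] for j in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    theta, *_ = np.linalg.lstsq(design, s[p:], rcond=None)
    return theta

# Simulate a hypothetical AR(2) process and check that the fit recovers its coefficients.
rng = np.random.default_rng(0)
sim = np.zeros(5000)
for t in range(2, len(sim)):
    sim[t] = 1.0 + 0.6 * sim[t - 1] - 0.2 * sim[t - 2] + rng.normal()

theta = fit_ar(sim, p=2)
print(theta)
```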
Moving Average Model: MA
A moving average model predicts the response $X_t$ using a linear combination of past
forecast errors.
$X_t = \beta_0 + \beta_1 \epsilon_{t-1} + \beta_2 \epsilon_{t-2} + \dots + \beta_q \epsilon_{t-q}$
Parameterized by $q$, the number of past errors to include. The prediction $\hat{X}_t$ can be
viewed as a weighted moving average of past forecast errors.
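To illustrate how past forecast errors feed into the next prediction, here is a small sketch that computes one-step-ahead predictions for given coefficients. The coefficients are hypothetical and fixed by hand; estimating them from data is a separate problem not shown here. Errors before the start of the series are assumed to be zero.

```python
import numpy as np

def ma_one_step_predictions(x, beta0, betas):
    """One-step-ahead MA(q) predictions for fixed coefficients.

    x_hat[t] = beta0 + sum over j of betas[j-1] * eps[t-j],
    where eps[t] = x[t] - x_hat[t] is the forecast error at time t.
    """
    q = len(betas)
    x_hat = np.zeros(len(x))
    eps = np.zeros(len(x))
    for t in range(len(x)):
        past = [eps[t - j] if t - j >= 0 else 0.0 for j in range(1, q + 1)]
        x_hat[t] = beta0 + float(np.dot(betas, past))
        eps[t] = x[t] - x_hat[t]
    return x_hat, eps

x = np.array([10.0, 12.0, 9.0, 11.0])
x_hat, eps = ma_one_step_predictions(x, beta0=10.0, betas=[0.5])
print(x_hat, eps)
```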
AutoRegressive Integrated Moving Average Model: ARIMA
Combining an autoregressive (AR) and a moving average (MA) model, we get the ARIMA
model.
$X'_t = \theta_0 + \theta_1 X'_{t-1} + \dots + \theta_p X'_{t-p} + \beta_1 \epsilon_{t-1} + \dots + \beta_q \epsilon_{t-q}$
Note that now we are regressing on $X'_t$, which is the differenced version of the series $X_t$. The order
of differencing is determined by the parameter $d$. For example, if $d = 1$:
$X'_t = X_t - X_{t-1}$ for $t = 2, 3, \dots, N$
So the ARIMA model is parameterized by: p (order of the AR part), q (order of the MA
part), and d (degree of differencing).
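The role of d can be sketched with NumPy: the model works on the differenced series, and forecasts of the differences are summed back up to return to the original scale. The forecast values below are made up purely for illustration.

```python
import numpy as np

x = np.array([400.0, 500.0, 300.0, 100.0, 200.0])

x_diff1 = np.diff(x, n=1)   # d = 1: differences of consecutive values
x_diff2 = np.diff(x, n=2)   # d = 2: differences of the differences

# Hypothetical forecasts of the d = 1 differenced series.
diff_forecast = np.array([50.0, -20.0, 10.0])

# Undo the differencing: start from the last observed value and accumulate.
level_forecast = x[-1] + np.cumsum(diff_forecast)
print(x_diff1, x_diff2, level_forecast)
```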
Seasonal ARIMA: SARIMA
Extension of ARIMA to model seasonal data.
Includes a non-seasonal part (same as ARIMA) and a seasonal part. The seasonal
part is similar to ARIMA, but involves backshifts of the seasonal period.
In total, 6 parameters (plus the seasonal period m):
• (p, d, q) for the non-seasonal part
• (P, D, Q) for the seasonal part
In practice, just do a grid search over the (p, q) and (P, Q) values to find the
parameters that optimize performance (usually minimize AIC).
Implementing an ARIMA model
Let's fit a SARIMA model to the Keeling curve to forecast future values.
The DataFrame df contains the variable CO2, which we want to predict.
df.head()
Fit a SARIMA model using the statsmodels library.
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(df['CO2'],
                order=(1, 1, 1),               # (p, d, q)
                seasonal_order=(1, 1, 1, 12))  # (P, D, Q, m)
result = model.fit()
print(result.summary().tables[1])
Generating point forecasts and confidence intervals 100 time steps into the future.
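pred = result.get_forecast(steps=100)   # forecast 100 steps past the end of the data
pred_point = pred.predicted_mean        # point forecasts
pred_ci = pred.conf_int(alpha=0.01)     # 99% confidence intervals (lower and upper bounds)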