The SARIMAX Model
Full Name:
The Seasonal Autoregressive Integrated Moving Average eXogenous Model
Mathematical Notation:
SARIMAX (p, d, q) (P, D, Q, s)
Short Description:
The SARIMAX is the seasonal equivalent of the ARIMAX model. Of course, there exist seasonal
versions of the other models as well (SARMA, SARIMA, SARMAX, etc.).
Seasonal models help capture patterns which
aren’t ever-present but appear periodically. For
example, the number of flights leaving an
international hub like JFK Airport in NYC is far
larger in December than in October.
That is mainly due to the festive period in
many countries in December. Therefore, we need a way to
account for this expected influx of demand in
December, and we can do so by checking the
values in December of the previous year.
Equivalents of the SARIMAX:
The SARIMAX is among the most complicated
models we can have, since it can incorporate
seasonality, integration and/or exogenous
variables.
However, it doesn’t have to.
By setting the values of certain orders to 0, or
by not providing certain information, the model
can be simplified.
For instance, by not including exogenous
variables and having no integration, the model
automatically becomes equivalent to a SARMA.
The equation on the left is exactly that: a
SARIMAX equivalent of a SARMA.
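The simplification rule above can be sketched in a few lines of Python. This is only an illustration: `equivalent_model` is a hypothetical helper (not part of any library) that names the simpler model a given specification collapses to, assuming an order of 0 switches the corresponding component off.

```python
def equivalent_model(order, seasonal_order, has_exog):
    """Name the simpler model a SARIMAX specification reduces to.

    Hypothetical helper for illustration only.
    """
    p, d, q = order
    P, D, Q, s = seasonal_order

    name = ""
    if (P, D, Q) != (0, 0, 0):   # any seasonal order switched on
        name += "S"
    if p or P:                   # autoregressive part
        name += "AR"
    if d or D:                   # integration
        name += "I"
    if q or Q:                   # moving-average part
        name += "MA"
    if has_exog:                 # exogenous variables supplied
        name += "X"
    return name or "constant only"


# No exogenous variables and no integration: a SARMA.
print(equivalent_model((1, 0, 1), (1, 0, 1, 10), has_exog=False))  # SARMA
print(equivalent_model((1, 1, 1), (1, 0, 1, 10), has_exog=True))   # SARIMAX
```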
The Original Equation:
So, a seasonal model has 7 orders split into two parts – seasonal vs nonseasonal: SARIMAX (p, d, q) (P, D, Q, s)
The nonseasonal ones are the ARIMA lags we’re already used to: p, d and q. The rest are the seasonal ones – P, D, Q
and s. The first 3 are obviously the seasonal equivalents of the p, d and q, while s is the only new one. It represents the
length of the season, hence the name – ‘s’.
Now, the seasonal orders (P and Q) determine the number of seasons we’re going back. For instance, if P = 2 and s = 10,
then we’re including the values from 1 and 2 seasons ago, which is the same as 10 and 20 periods ago.
Then, if p = 1, we’d be including 𝑋𝑡−1 , 𝑋𝑡−10 , 𝑋𝑡−11 , 𝑋𝑡−20 and 𝑋𝑡−21 . That is because for each of the two seasons, we also
need to include p-many past values relevant to it. Thus, for each season (𝑋𝑡−10 , 𝑋𝑡−20 ), we also include 1 additional
past value (𝑋𝑡−11 , 𝑋𝑡−21 ).
To make it easier, let’s see what a SARIMAX (1,0,0) (2,0,0,10) model looks like:
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜙10 𝑋𝑡−10 + 𝜙11 𝑋𝑡−11 + 𝜙20 𝑋𝑡−20 + 𝜙21 𝑋𝑡−21 + 𝜖𝑡
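To check which past values enter the model for other choices of p, P and s, the rule described above can be written as a small function. This is a minimal sketch; `ar_lags` is a hypothetical name introduced here for illustration.

```python
def ar_lags(p, P, s):
    """Return the lags of X that appear for nonseasonal order p,
    seasonal order P and season length s."""
    lags = set()
    for j in range(P + 1):        # j = 0 covers the nonseasonal part
        for i in range(p + 1):    # i = 0 is the seasonal value itself
            lag = j * s + i
            if lag > 0:           # lag 0 is the current value X_t
                lags.add(lag)
    return sorted(lags)


print(ar_lags(1, 2, 10))  # [1, 10, 11, 20, 21]
```

The output matches the five past values in the equation for the SARIMAX (1,0,0) (2,0,0,10) model.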
The Modified Equation:
However, the values for 𝜙11 and 𝜙21 are restricted. They must be equal to 𝜙1 𝜙10 and 𝜙1 𝜙20 respectively.
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜙10 𝑋𝑡−10 + 𝜙1 𝜙10 𝑋𝑡−11 + 𝜙20 𝑋𝑡−20 + 𝜙1 𝜙20 𝑋𝑡−21 + 𝜖𝑡
Thus, we can rewrite the equation to get the following:
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜙10 (𝑋𝑡−10 + 𝜙1 𝑋𝑡−11 ) + 𝜙20 (𝑋𝑡−20 + 𝜙1 𝑋𝑡−21 ) + 𝜖𝑡
For consistency, we like to use distinct notation for the seasonal coefficients as well, so we plug in Φ1 and Φ2 for 𝜙10
and 𝜙20.
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + Φ1 (𝑋𝑡−10 + 𝜙1 𝑋𝑡−11 ) + Φ2 (𝑋𝑡−20 + 𝜙1 𝑋𝑡−21 ) + 𝜖𝑡
Now, this is the actual model that gets regressed. In other words, Python only estimates a constant and 3 coefficients:
𝜙1 , Φ1 and Φ2 . Thus, even though the model uses many past values, it only needs to compute (p + P)-many
coefficients.
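To see how three estimated numbers generate all five lag coefficients, the sketch below multiplies the nonseasonal and seasonal parts together. It assumes the standard lag-polynomial form (1 − 𝜙1 B)(1 − Φ1 B^s − Φ2 B^2s) 𝑋𝑡 = 𝐶 + 𝜖𝑡, which is a convention not stated on the slide. Under that convention the cross terms at lags 11 and 21 come out as the product with a minus sign, a detail the simplified equations above leave out. `expand_ar` and the numeric values are hypothetical and chosen only for illustration.

```python
def expand_ar(phi, Phi, s):
    """Expand (1 - sum phi_i B^i)(1 - sum Phi_j B^(j*s)) and return
    {lag: coefficient} for the form X_t = C + sum(coef * X_{t-lag}) + eps_t."""
    nonseasonal = {0: 1.0}
    for i, c in enumerate(phi, start=1):
        nonseasonal[i] = -c
    seasonal = {0: 1.0}
    for j, c in enumerate(Phi, start=1):
        seasonal[j * s] = -c

    # Multiply the two polynomials term by term.
    product = {}
    for a, ca in nonseasonal.items():
        for b, cb in seasonal.items():
            product[a + b] = product.get(a + b, 0.0) + ca * cb

    # Move every lagged term to the right-hand side of the equation.
    return {lag: -c for lag, c in sorted(product.items()) if lag != 0}


# Illustrative values only: phi_1 = 0.5, Phi_1 = 0.4, Phi_2 = 0.2, s = 10.
for lag, coef in expand_ar([0.5], [0.4, 0.2], 10).items():
    print(f"lag {lag:>2}: {coef:+.2f}")
```

Only `phi` and `Phi` are free inputs; the coefficients at lags 11 and 21 are fully determined by them, which is why the fit reports just (p + P) autoregressive coefficients.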
Past Seasons and Past Residuals:
Now, if we decide to include residuals, we need to know that the seasonal orders don’t directly affect one another. To
see what we mean, here is what a SARIMAX (1,0,2) (2,0,1,10) looks like:
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜃1 𝜖𝑡−1 + 𝜃2 𝜖𝑡−2 + 𝜙10 𝑋𝑡−10 + 𝜙11 𝑋𝑡−11 + 𝜙20 𝑋𝑡−20 + 𝜙21 𝑋𝑡−21
+ 𝜃10 𝜖𝑡−10 + 𝜃11 𝜖𝑡−11 + 𝜃12 𝜖𝑡−12 + 𝜖𝑡
The new additions are the residual terms (the ones with 𝜃 coefficients). We see that adding residual lags doesn’t mean we’re
expanding the set of past values we’re including. In other words, we’re not including the value for t-12 just because we’re
adding the residual for that period.
Additionally, the coefficients 𝜃11 and 𝜃12 are restricted too and equal 𝜃10 𝜃1 and 𝜃10 𝜃2 respectively. Substituting as before,
and writing Θ1 for 𝜃10 (and Φ1 and Φ2 for 𝜙10 and 𝜙20 ), we reach the
following:
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜃1 𝜖𝑡−1 + 𝜃2 𝜖𝑡−2 + Φ1 (𝑋𝑡−10 + 𝜙1 𝑋𝑡−11 ) + Φ2 (𝑋𝑡−20 + 𝜙1 𝑋𝑡−21 ) + Θ1 (𝜖𝑡−10 + 𝜃1 𝜖𝑡−11 + 𝜃2 𝜖𝑡−12 ) + 𝜖𝑡
A Quick Look at the Coefficients:
Now that we know what the actual equation of a SARIMAX (1,0,2) (2,0,1,10) looks like, let’s make a few remarks:
𝑋𝑡 = 𝐶 + 𝜙1 𝑋𝑡−1 + 𝜃1 𝜖𝑡−1 + 𝜃2 𝜖𝑡−2 + Φ1 (𝑋𝑡−10 + 𝜙1 𝑋𝑡−11 ) + Φ2 (𝑋𝑡−20 + 𝜙1 𝑋𝑡−21 ) + Θ1 (𝜖𝑡−10 + 𝜃1 𝜖𝑡−11 + 𝜃2 𝜖𝑡−12 ) + 𝜖𝑡
Even though we are using 10 different past values and/or residuals, we’re only estimating 6 coefficients
(excluding the constant). Therefore, when we fit a model, we only get p + q + P + Q coefficients, rather than one for
each value we’re using.
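The two counts can be verified with a short sketch. It assumes no integration (d = D = 0), as in the example, and `count_terms_and_params` is a hypothetical helper introduced here for illustration.

```python
def count_terms_and_params(order, seasonal_order):
    """Return (number of lagged terms used, number of coefficients estimated),
    excluding the constant. Assumes d = D = 0."""
    p, d, q = order
    P, D, Q, s = seasonal_order

    def lags(n, N):
        # Every combination of a seasonal lag j*s and a nonseasonal lag i,
        # dropping lag 0 (the current period).
        return {j * s + i for j in range(N + 1) for i in range(n + 1)} - {0}

    past_values = lags(p, P)      # lags of X
    past_residuals = lags(q, Q)   # lags of the residuals
    n_terms = len(past_values) + len(past_residuals)
    n_params = p + q + P + Q
    return n_terms, n_params


print(count_terms_and_params((1, 0, 2), (2, 0, 1, 10)))  # (10, 6)
```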
Additionally, we have to include more past values because if the value from yesterday affects the value today, then
the value from 11 days ago affects the one from 10 days ago. This is the entire reason we’re not only including 𝑋𝑡−10 ,
𝑋𝑡−20 and 𝜖𝑡−10 in the model, but also the values that shape them.
You can think of the values we’re including as a time series with a different frequency. Notice how 𝑋𝑡−10 + 𝜙1 𝑋𝑡−11
and 𝑋𝑡−20 + 𝜙1 𝑋𝑡−21 are essentially the same thing 1 season (10 periods) apart. Then, we just think of seasonal
patterns as trends with a different frequency we need to include in order to make good estimations.
Implementation of the Model in Python:
The code on this slide has six annotated parts: the library the SARIMAX method comes from; the method we are importing; the variable storing the model characteristics that we will fit later; the time series we wish to analyse; the non-seasonal order of the model; and the seasonal order of the model.
*For a SARIMAX(p,d,q)(P,D,Q,s) model, simply change the order from (1,0,2) to (p,d,q), and the seasonal order from (2,0,1,10) to (P,D,Q,s).