Many time series practitioners turn to the well-known Box-Jenkins methods, such as ARIMA and SARIMA, when modelling time series. These methods are easily incorporated into the State Space framework. This proves to be beneficial: missing observations are dealt with naturally, no observations need to be discarded due to differencing or the introduction of lagged variables, and the loglikelihood is computed exactly. In this document, we show you how to estimate ARIMA and SARIMA models using statespacer, reproducing some of the estimation results of Box et al. (2015).

ARIMA modelling of the yearly sunspot data

To showcase the estimation of ARIMA models, we make use of the sunspot.year data, which contains yearly numbers of sunspots from 1700 to 1988. See ?sunspot.year for details. To stay in line with Box et al. (2015), we only use the data from 1770 to 1869. We estimate an \(\text{ARIMA}(3, ~ 0, ~ 0)\) model with a deterministic level (or constant, if you prefer) as follows:

# Load statespacer
library(statespacer)

# Load the dataset
library(datasets)
Data <- matrix(window(sunspot.year, start = 1770, end = 1869))

# Estimate the ARIMA model
fit <- statespacer(y = Data,
                   H_format = matrix(0),
                   local_level_ind = TRUE,
                   arima_list = list(c(3, 0, 0)),
                   format_level = matrix(0),
                   initial = c(0.5*log(var(Data)), 0, 0, 0),
                   verbose = TRUE,
                   standard_errors = TRUE)
#> Starting the optimisation procedure at: 2023-01-27 22:26:03
#> Parameter scaling:[1] 1 1 1 1
#> initial  value 5.022561 
#> iter  10 value 4.114363
#> final  value 4.111163 
#> converged
#> Finished the optimisation procedure at: 2023-01-27 22:26:03
#> Time difference of 0.194862127304077 secs

Note that we eliminate the observation error by setting its variance to 0 (via H_format = matrix(0)). It is perfectly fine to include an observation error alongside an ARIMA component, as long as you watch out for identification issues. For details about specifying proper initial values, please see vignette("dictionary", "statespacer").
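
The treatment of missing observations mentioned in the introduction can be illustrated directly. Below is a minimal sketch (not part of the original analysis): a few observations, at arbitrarily chosen positions, are replaced by NA and the same model is refitted. No observations need to be removed, as the Kalman filter simply skips the measurement update for missing entries.

# Sketch: refit the same model with some artificially missing observations
Data_na <- Data
Data_na[c(10, 25, 60), ] <- NA  # arbitrary positions, purely for illustration

fit_na <- statespacer(y = Data_na,
                      H_format = matrix(0),
                      local_level_ind = TRUE,
                      arima_list = list(c(3, 0, 0)),
                      format_level = matrix(0),
                      initial = c(0.5 * log(var(Data_na, na.rm = TRUE)), 0, 0, 0),
                      verbose = FALSE)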

We obtain the following estimates:

# Coefficients of the ARMA component
arma_coeff <- rbind(
   fit$system_matrices$AR$ARIMA1,
   fit$standard_errors$AR$ARIMA1
)
# Append the smoothed level (the constant) and its standard error
arma_coeff <- cbind(
   arma_coeff,
   c(fit$smoothed$level[1],
     sqrt(fit$system_matrices$Z_padded$level %*%
          fit$smoothed$V[,,1] %*%
          t(fit$system_matrices$Z_padded$level))
   )
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ar1", "ar2", "ar3", "intercept")
arma_coeff
#>                    ar1       ar2       ar3 intercept
#> coefficient 1.55976415 -1.005462 0.2129622 48.605905
#> std_error   0.09962468  0.155982 0.1003591  6.358039

goodness_fit <- rbind(
   fit$system_matrices$Q$ARIMA1,
   fit$diagnostics$loglik,
   fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#>                      [,1]
#> Variance       222.496532
#> Loglikelihood -411.116313
#> AIC              8.322326
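
As a side note, the reported AIC can be reproduced by hand, assuming statespacer scales the usual criterion by the sample size and counts both the estimated parameters and the diffuse initial state elements; this is an assumption on our part, but the numbers work out:

# Sketch: reproduce the reported AIC as (-2 * loglik + 2 * (q + w)) / n
n <- nrow(Data)  # 100 observations
q <- 4           # estimated parameters: 3 AR coefficients + 1 disturbance variance
w <- 1           # diffuse initial state element: the level
(-2 * fit$diagnostics$loglik + 2 * (q + w)) / n  # should match the AIC of 8.322326 reported above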

We see that the results are fairly similar to those obtained by Box et al. (2015); differences may occur because the estimation procedures differ. In contrast to traditional estimation procedures, we do not have to discard any observations, so the full information at hand is used. Note that not much more is needed to estimate VARIMA models: you only need to supply a dependent variable y that has more than one column, as the sketch below illustrates. It is also straightforward to add explanatory variables by making use of the addvar_list option; see vignette("seatbelt", "statespacer") for an example.
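
As an illustration of the multivariate case, here is a minimal, untested sketch of a bivariate \(\text{VARIMA}(1, ~ 0, ~ 0)\) fit on two series from the Seatbelts data. The choice of series and the length of initial are purely illustrative; consult vignette("dictionary", "statespacer") for the exact number of initial values a given specification requires.

# Sketch: a bivariate VARIMA(1, 0, 0) without observation error
Data2 <- matrix(log(Seatbelts[, c("front", "rear")]), ncol = 2)

fit_varima <- statespacer(y = Data2,
                          H_format = matrix(0, 2, 2),
                          arima_list = list(c(1, 0, 0)),
                          initial = rep(0, 7),  # illustrative: 3 Cholesky elements of Q + 4 AR coefficients
                          verbose = FALSE)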

SARIMA modelling of the airline data

To showcase the estimation of SARIMA models, we make use of the classic AirPassengers data, which contains monthly totals of international airline passengers from 1949 to 1960. See ?AirPassengers for details. We estimate a \(\text{SARIMA}(0, ~ 1, ~ 1)_{1} ~ \times ~ (0, ~ 1, ~ 1)_{12}\) model. Note that in the multivariate case there is a subtle difference between \(\text{SARIMA}(0, ~ 1, ~ 1)_{1} ~ \times ~ (0, ~ 1, ~ 1)_{12}\) and \(\text{SARIMA}(0, ~ 1, ~ 1)_{12} ~ \times ~ (0, ~ 1, ~ 1)_{1}\), as matrix multiplication is not commutative; the sketch below shows how this ordering is expressed in the specification.
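
A sketch of the two orderings (assuming the entries of s, together with the corresponding entries of ar, i, and ma, are applied in the order in which they are given; only the first specification is fitted below, and in this univariate case both orderings yield the same model):

# Corresponds to SARIMA(0, 1, 1)_1 x (0, 1, 1)_12, as fitted below
sarima_list_a <- list(list(s = c(12, 1), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))

# Hypothetical reversed ordering, corresponding to SARIMA(0, 1, 1)_12 x (0, 1, 1)_1
sarima_list_b <- list(list(s = c(1, 12), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))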

We proceed as follows:

# Load the dataset
Data <- matrix(log(AirPassengers))

# The SARIMA specification; note that it must be a list containing lists!
sarima_list <- list(list(s = c(12, 1), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))

# Fit the SARIMA model
fit <- statespacer(y = Data,
                   H_format = matrix(0),
                   sarima_list = sarima_list,
                   initial = c(0.5*log(var(diff(Data))), 0, 0),
                   verbose = TRUE)
#> Starting the optimisation procedure at: 2023-01-27 22:26:04
#> Parameter scaling:[1] 1 1 1
#> initial  value -1.034434 
#> final  value -1.616321 
#> converged
#> Finished the optimisation procedure at: 2023-01-27 22:26:04
#> Time difference of 0.942636966705322 secs

We obtain the following estimates:

# MA coefficients: non-seasonal (s = 1) and seasonal (s = 12)
arma_coeff <- rbind(
   c(fit$system_matrices$SMA$SARIMA1$S1, fit$system_matrices$SMA$SARIMA1$S12),
   c(fit$standard_errors$SMA$SARIMA1$S1, fit$standard_errors$SMA$SARIMA1$S12)
)

rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ma1 s = 1", "ma1 s = 12")
arma_coeff
#>               ma1 s = 1  ma1 s = 12
#> coefficient -0.40188859 -0.55694248
#> std_error    0.08963614  0.07309788
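
Because the two MA polynomials are multiplied, the model also implies a nonzero MA coefficient at lag 13, equal to the product of the two estimates (assuming the polynomials are parameterized as \(1 + \theta_1 B\) and \(1 + \Theta_1 B^{12}\), which the sign of the estimates suggests). A small sketch using only base R and the estimates above:

# Sketch: implied coefficients of the product (1 + theta_1 B)(1 + Theta_1 B^12)
theta_1 <- c(fit$system_matrices$SMA$SARIMA1$S1)   # non-seasonal MA coefficient
Theta_1 <- c(fit$system_matrices$SMA$SARIMA1$S12)  # seasonal MA coefficient
c(lag1 = theta_1, lag12 = Theta_1, lag13 = theta_1 * Theta_1)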

goodness_fit <- rbind(
   fit$system_matrices$Q$SARIMA1,
   fit$diagnostics$loglik,
   fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#>                        [,1]
#> Variance        0.001347882
#> Loglikelihood 232.750284785
#> AIC            -3.010420622
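
The AIC again appears to follow the same (assumed) convention as in the ARIMA example: with 3 estimated parameters and \(1 + 12 = 13\) diffuse initial state elements induced by the regular and seasonal differencing, the reported value can be reproduced as follows.

# Sketch: reproduce the reported AIC under the same assumed convention
n <- nrow(Data)  # 144 observations
q <- 3           # estimated parameters: 2 MA coefficients + 1 disturbance variance
w <- 13          # diffuse initial state elements due to the differencing (1 + 12)
(-2 * fit$diagnostics$loglik + 2 * (q + w)) / n  # should match the AIC reported above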

As you can see, fitting Box-Jenkins models with statespacer is quite easy!

References

Box, George E. P., Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. 2015. Time Series Analysis: Forecasting and Control. John Wiley & Sons.