A lot of time series practitioners resort to the well-known Box-Jenkins methods, such as ARIMA and SARIMA, when modelling time series. Those methods are easily incorporated into the State Space framework. This provides to be beneficial, as missing observations are easily dealt with in the State Space framework. Moreover, no observations need to be discarded due to the differencing and introduced lagged variables. The loglikelihood is calculated in an exact manner! In this document, we show you how to estimate ARIMA and SARIMA models using statespacer. We reproduce some estimation results as in Box et al. (2015).
To showcase the estimation of ARIMA models, we make use of the
sunspot.year
data, which contains yearly numbers of
sunspots from 1700 to 1988. See ?sunspot.year
for details.
We only use the data from 1770 to 1869, to stay in line with Box et al. (2015). We estimate an \(\text{ARIMA}(3, ~ 0, ~ 0)\) with
deterministic level (or constant if you prefer) as follows:
# Load statespacer
library(statespacer)
# Load the dataset
library(datasets)
Data <- matrix(window(sunspot.year, start = 1770, end = 1869))
# Estimate the ARIMA model
fit <- statespacer(y = Data,
H_format = matrix(0),
local_level_ind = TRUE,
arima_list = list(c(3, 0, 0)),
format_level = matrix(0),
initial = c(0.5*log(var(Data)), 0, 0, 0),
verbose = TRUE,
standard_errors = TRUE)
#> Starting the optimisation procedure at: 2023-01-27 22:26:03
#> Parameter scaling:[1] 1 1 1 1
#> initial value 5.022561
#> iter 10 value 4.114363
#> final value 4.111163
#> converged
#> Finished the optimisation procedure at: 2023-01-27 22:26:03
#> Time difference of 0.194862127304077 secs
Note that we eliminate the observation error by setting its variance
to 0, although it’s perfectly fine to include observation errors along
with ARIMA models, as long as you watch out for identification issues of
course. For details about specifying proper initial values, please see
vignette("dictionary", "statespacer")
.
We obtain the following estimates:
# Coefficients of the ARMA component
arma_coeff <- rbind(
fit$system_matrices$AR$ARIMA1,
fit$standard_errors$AR$ARIMA1
)
arma_coeff <- cbind(
arma_coeff,
c(fit$smoothed$level[1],
sqrt(fit$system_matrices$Z_padded$level %*%
fit$smoothed$V[,,1] %*%
t(fit$system_matrices$Z_padded$level))
)
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ar1", "ar2", "ar3", "intercept")
arma_coeff
#> ar1 ar2 ar3 intercept
#> coefficient 1.55976415 -1.005462 0.2129622 48.605905
#> std_error 0.09962468 0.155982 0.1003591 6.358039
goodness_fit <- rbind(
fit$system_matrices$Q$ARIMA1,
fit$diagnostics$loglik,
fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#> [,1]
#> Variance 222.496532
#> Loglikelihood -411.116313
#> AIC 8.322326
We see that the results are fairly similar to the results as obtained
by Box et al. (2015). Differences may
occur due to the different estimation procedures. We don’t have to
eliminate observations, so we use the full information available at
hand, in contrast to traditional estimation procedures. Note that not
much has to be done to estimate VARIMA models. In fact, you only need to
specify a dependent variable y
that has more than one
column! It’s also straightforward to add explanatory variables, by
making use of the addvar_list
option, see
vignette("seatbelt", "statespacer")
for an example of
adding explanatory variables.
To showcase the estimation of SARIMA models, we make use of the
classic AirPassengers
data, which contains monthly totals
of international airline passengers from 1949 to 1960. See
?AirPassengers
for details. We estimate a \(\text{SARIMA}(0, ~ 1, ~ 1)_{1} ~ \times ~ (0, ~ 1,
~ 1)_{12}\). Note that in the multivariate case, there is a
subtle difference between \(\text{SARIMA}(0, ~
1, ~ 1)_{1} ~ \times ~ (0, ~ 1, ~ 1)_{12}\) and \(\text{SARIMA}(0, ~ 1, ~ 1)_{12} ~ \times ~ (0, ~
1, ~ 1)_{1}\) as matrix multiplication is not commutative.
We proceed as follows:
# Load the dataset
Data <- matrix(log(AirPassengers))
# The SARIMA specification, must be a list containing lists!
sarima_list <- list(list(s = c(12, 1), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))
# Fit the SARIMA model
fit <- statespacer(y = Data,
H_format = matrix(0),
sarima_list = sarima_list,
initial = c(0.5*log(var(diff(Data))), 0, 0),
verbose = TRUE)
#> Starting the optimisation procedure at: 2023-01-27 22:26:04
#> Parameter scaling:[1] 1 1 1
#> initial value -1.034434
#> final value -1.616321
#> converged
#> Finished the optimisation procedure at: 2023-01-27 22:26:04
#> Time difference of 0.942636966705322 secs
We obtain the following estimates:
# Coefficients of the ARMA component
arma_coeff <- rbind(
c(fit$system_matrices$SMA$SARIMA1$S1, fit$system_matrices$SMA$SARIMA1$S12),
c(fit$standard_errors$SMA$SARIMA1$S1, fit$standard_errors$SMA$SARIMA1$S12)
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ma1 s = 1", "ma1 s = 12")
arma_coeff
#> ma1 s = 1 ma1 s = 12
#> coefficient -0.40188859 -0.55694248
#> std_error 0.08963614 0.07309788
goodness_fit <- rbind(
fit$system_matrices$Q$SARIMA1,
fit$diagnostics$loglik,
fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#> [,1]
#> Variance 0.001347882
#> Loglikelihood 232.750284785
#> AIC -3.010420622
As you can see, fitting the Box-Jenkins models with statespacer is quite easy!