This document provides extensive details about the object that is returned by statespacer(). To do so, we start by introducing the form of the general linear Gaussian state space model, following the notation used by Durbin and Koopman (2012). Getting a grasp of this notation will help you get the most out of the statespacer package!
There are many ways to write down the form of the general linear Gaussian state space model. We use the form used by Durbin and Koopman (2012):
\[ \begin{aligned} y_t ~ &= ~ Z_t\alpha_t ~ + ~ \varepsilon_t, &\varepsilon_t ~ &\sim ~ N(0, ~ H_t), \\ \alpha_{t+1} ~ &= ~ T_t\alpha_t ~ + ~ R_t\eta_t, &\eta_t ~ &\sim ~ N(0, ~ Q_t), \\ & &\alpha_1 ~ &\sim ~ N(a_1, ~ P_1), \end{aligned} \]
where \(y_t\) is the observation vector, a \(p ~ \times ~ 1\) vector of dependent variables at time \(t\), \(\alpha_t\) is the unobserved state vector, an \(m ~ \times ~ 1\) vector of state variables at time \(t\), and \(\varepsilon_t\) and \(\eta_t\) are the disturbance vectors of the observation equation and the state equation, respectively. To initialise the model, \(a_1\) is used as the initial guess of the state vector, and \(P_1\) is the corresponding uncertainty of that guess. The matrices \(Z_t\), \(H_t\), \(T_t\), \(R_t\), and \(Q_t\) are called the system matrices of the state space model. Different specifications of these system matrices lead to different interpretations of the model at hand.
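To make the notation concrete, here is a minimal base R sketch that simulates the simplest special case of the form above, a univariate local level model with \(Z_t = T_t = R_t = 1\), \(H_t = \sigma^2_\varepsilon\) and \(Q_t = \sigma^2_\eta\). The function name simulate_local_level and its arguments are hypothetical and not part of the statespacer package.

```r
# Simulate a univariate local level model: Z_t = T_t = R_t = 1,
# H_t = sigma2_eps and Q_t = sigma2_eta (illustrative helper, not package code).
simulate_local_level <- function(n, sigma2_eps, sigma2_eta, a1 = 0, P1 = 1) {
  alpha <- numeric(n)
  y <- numeric(n)
  # Initial state: alpha_1 ~ N(a1, P1)
  alpha[1] <- rnorm(1, mean = a1, sd = sqrt(P1))
  for (t in seq_len(n)) {
    # Observation equation: y_t = alpha_t + epsilon_t
    y[t] <- alpha[t] + rnorm(1, sd = sqrt(sigma2_eps))
    # State equation: alpha_{t+1} = alpha_t + eta_t
    if (t < n) {
      alpha[t + 1] <- alpha[t] + rnorm(1, sd = sqrt(sigma2_eta))
    }
  }
  list(y = y, alpha = alpha)
}

set.seed(1)
sim <- simulate_local_level(n = 100, sigma2_eps = 1, sigma2_eta = 0.1)
```

Other choices of the system matrices (for instance a \(2 ~ \times ~ 2\) matrix \(T_t\) for a level plus slope) follow the same two equations with matrix products in place of the scalar ones.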
Having obtained a better understanding of the notation used, it is easier to find our way around the object that is returned by statespacer(). Let’s say we store the object returned by statespacer() in a variable called fit, that is, fit <- statespacer(...). Then fit is a list containing many items, including other lists. This section describes the items that are included in fit one by one.
function_call is a list that contains, as the name suggests, the call to the statespacer() function, including default values for the input arguments that were not specified. For details about the various input arguments, check out ?statespacer.
system_matrices is a list containing all of the system matrices of each of the components. For the variance-covariance matrices \(H\) and \(Q\), it also contains two decompositions: the Cholesky \(LDL^{\top}\) decomposition, where \(L\) is the loading matrix and \(D\) is the diagonal matrix, and the correlation / standard deviation decomposition. The initial guess for the state vector, a1, is also included, together with the corresponding uncertainty, split into its diffuse component, P_inf, and its stationary component, P_star. Further, it contains Z_padded, which is a list containing the \(Z\) matrices of the components augmented with zeroes, such that each has dimension \(p ~ \times ~ m\). These matrices are useful to extract individual components (which is already done for you), or to extract standard deviations of the components. There’s also a vector called state_label, which labels the state vector to indicate which state parameters belong to which components. If components are specified that introduce parameters into the system matrices, then these parameters are also included here. At the moment, these parameters are lambda (frequency) and rho (damping factor) for the cycles, AR and MA for the ARIMA components, SAR and SMA for the SARIMA components, and self_spec for the self specified component. Note that coefficients of explanatory variables are put into the state vector, so these are treated as state parameters, and readily returned by the Kalman filter.
predicted is a list that contains the one-step ahead predicted (predicting time \(t\) using data up to time \(t ~ - ~ 1\)) objects as returned by the Kalman filter:
- yfit is the predicted value of \(y\).
- v is the prediction error.
- Fmat is the uncertainty of the prediction.
- a is the predicted state.
- P is the uncertainty of the predicted state.
- P_inf is the diffuse part of P.
- P_star is the non-diffuse part of P.
- a_fc is the predicted state for time \(N ~ + ~ 1\) (\(N\) being the last observed time point).
- P_fc is the uncertainty of a_fc.
- P_inf_fc is the diffuse part of P_fc.
- P_star_fc is the non-diffuse part of P_fc.
Further, the contributions of the components to the predicted values are extracted separately.
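To illustrate how the one-step ahead quantities relate to each other, the following base R sketch runs a hand-written Kalman filter for a univariate local level model (\(Z_t = T_t = R_t = 1\)) on the Nile data from the datasets package. It is not the package's implementation: the function kalman_local_level is hypothetical, the variances are illustrative values rather than estimates, and a large P1 stands in for the exact diffuse initialisation, so there are no P_inf and P_star parts here.

```r
# Hand-written Kalman filter for a univariate local level model.
# Output names mirror the items described above, but this is only a sketch.
kalman_local_level <- function(y, sigma2_eps, sigma2_eta, a1 = 0, P1 = 1e7) {
  n <- length(y)
  a <- numeric(n + 1)    # predicted states a_1, ..., a_{N+1}
  P <- numeric(n + 1)    # their variances
  v <- numeric(n)        # prediction errors
  Fmat <- numeric(n)     # variances of the prediction errors
  a[1] <- a1
  P[1] <- P1
  for (t in seq_len(n)) {
    v[t] <- y[t] - a[t]               # yfit equals a[t] because Z = 1
    Fmat[t] <- P[t] + sigma2_eps
    K <- P[t] / Fmat[t]               # Kalman gain
    a[t + 1] <- a[t] + K * v[t]
    P[t + 1] <- P[t] * (1 - K) + sigma2_eta
  }
  list(yfit = a[1:n], v = v, Fmat = Fmat, a = a[1:n], P = P[1:n],
       a_fc = a[n + 1], P_fc = P[n + 1])
}

y <- as.numeric(datasets::Nile)
kf <- kalman_local_level(y, sigma2_eps = 15000, sigma2_eta = 1500)
```

By construction, yfit + v reproduces the observations, and a_fc and P_fc are simply the last step of the same recursion, one time point beyond the data.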
filtered is a list that contains the filtered (estimates for time \(t\) using data up to time \(t\)) objects as returned by the Kalman filter. Here, a is the filtered state, P is the uncertainty of the filtered state, P_inf is the diffuse part of P, and P_star is the non-diffuse part of P. Further, the filtered values of the components are extracted separately.
smoothed is a list that contains smoothed (estimates for time \(t\) using all of the time points) objects as returned by the Kalman smoother:
- a is the smoothed state.
- V is the uncertainty of the smoothed state.
- eta is the smoothed state disturbance.
- eta_var is the uncertainty of eta.
- epsilon is the smoothed observation disturbance.
- epsilon_var is the uncertainty of epsilon.
Further, the smoothed values of the components are extracted separately.
diagnostics is a list that contains items useful for diagnostic tests and model selection:
- initialisation_steps is the number of timesteps required before initialisation of the diffuse elements of the state vector was achieved.
- loglik is the loglikelihood value at the estimated parameters.
- AIC is the Akaike Information Criterion for the model.
- BIC is the Bayesian Information Criterion for the model.
- r is the scaled smoothed state disturbance.
- N is the uncertainty of r.
- param_indices is a list containing the indices of the parameters in the parameter vector for each of the components.
- hessian is the hessian of the loglikelihood evaluated at the estimated parameters.
The following objects are only returned if diagnostics = TRUE:
- e is the smoothing error.
- D is the uncertainty of e.
- Tstat_observation is the T-statistic for testing whether deviations from the observation equation are significant.
- Tstat_state is the T-statistic for testing whether deviations from the state equation are significant.
- v_normalised is the normalised prediction error.
- Skewness is the skewness of v_normalised.
- Kurtosis is the kurtosis of v_normalised.
- Jarque_Bera is the Jarque-Bera statistic for testing for normality.
- Jarque_Bera_criticalvalue is the critical value of the Jarque-Bera test.
- correlogram is the correlogram of v_normalised.
- Box_Ljung are the Box-Ljung statistics for testing for serial correlation.
- Box_Ljung_criticalvalues are the critical values of the Box-Ljung tests.
- Heteroscedasticity are statistics for testing for heteroscedasticity.
- Heteroscedasticity_criticalvalues are the critical values of the heteroscedasticity tests.
optim is the list as returned by stats::optim or optimx::optimr, depending on whether you have optimx installed. See ?stats::optim and ?optimx::optimr for details. Only returned if fit = TRUE.
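As an illustration of the normality diagnostics described above, the sketch below computes skewness, kurtosis and the textbook Jarque-Bera statistic from a vector of normalised prediction errors in base R. The function jarque_bera is hypothetical, and the 5% significance level is an assumption; the exact definitions and level used inside statespacer() may differ.

```r
# Textbook skewness, kurtosis and Jarque-Bera statistic for a numeric vector
# of normalised prediction errors (illustrative, not the package's code).
jarque_bera <- function(v_normalised, level = 0.05) {
  x <- v_normalised[is.finite(v_normalised)]   # drop NA / Inf entries
  n <- length(x)
  m <- mean(x)
  m2 <- mean((x - m)^2)                        # second central moment
  skewness <- mean((x - m)^3) / m2^1.5
  kurtosis <- mean((x - m)^4) / m2^2
  statistic <- n * (skewness^2 / 6 + (kurtosis - 3)^2 / 24)
  list(
    Skewness = skewness,
    Kurtosis = kurtosis,
    Jarque_Bera = statistic,
    # Under normality the statistic is asymptotically chi-squared with 2 df
    Jarque_Bera_criticalvalue = qchisq(1 - level, df = 2)
  )
}

set.seed(42)
jb <- jarque_bera(rnorm(500))
```

Normality is rejected at the chosen level when Jarque_Bera exceeds Jarque_Bera_criticalvalue.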
This section provides details about the parameter vector that’s supplied to statespacer(). It clarifies which elements are used for which components.
Most components use a variance-covariance matrix, which is constructed using the Cholesky \(LDL^{\top}\) decomposition. The parameters supplied to build the variance-covariance matrix are ordered as follows: First, parameters are used for the diagonal matrix \(D\) and transformed by \(\exp(2x)\), which guarantees positive diagonal elements. Second, the remaining parameters are assigned columnwise to the loading matrix \(L\), so first the first column, then the second column, and so on.
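The ordering described above can be sketched in base R as follows. The helper build_covariance is hypothetical, and it assumes a full-rank \(p ~ \times ~ p\) matrix with \(L\) unit lower triangular; the package itself may treat restricted formats differently.

```r
# Build a p x p variance-covariance matrix from a parameter vector:
# the first p parameters go to D via exp(2x), the rest fill L columnwise.
build_covariance <- function(param, p) {
  stopifnot(length(param) == p * (p + 1) / 2)
  D <- diag(exp(2 * param[seq_len(p)]), nrow = p)
  L <- diag(p)
  # lower.tri() indexes in column-major order, so the first column is
  # filled first, then the second column, and so on.
  L[lower.tri(L)] <- param[-seq_len(p)]
  L %*% D %*% t(L)
}

# Two variances (4 and 9 on the diagonal of D) and one loading (0.5)
Sigma <- build_covariance(c(0.5 * log(4), 0.5 * log(9), 0.5), p = 2)
```

Here \(D = \mathrm{diag}(4, 9)\) and the single loading 0.5 sits in the first column of \(L\), so Sigma has 4 and 10 on the diagonal and 2 off the diagonal.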
The parameters are assigned to the components in the following order:
BSM_vec
.damping_factor_ind = TRUE
. The remaining parameters are
used for the variance-covariance matrix.
s, first used for the AR coefficients of the first seasonality, and then the MA coefficients of the first seasonality, and so on for the subsequent seasonalities.
Care should be taken in specifying the initial parameters! Usually, I check out the variances of the dependent variables and then apply the transformation \(0.5\log(x)\) to the variances, and specify those as initial values for the parameters that go to the various variance-covariance matrices. This is the inverse of the \(\exp(2x)\) transformation applied to the diagonal parameters, so the initial variances start out at the scale of the data. For the AR and MA coefficients, it might be beneficial to initialise them close to 0, to prevent them from converging to unit root solutions. The information in this section should make the trial-and-error process of finding proper initial parameters less cumbersome!
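The rule of thumb for the variance parameters can be sketched in base R using the Nile data from the datasets package. The commented-out call at the end is an assumption about how such values might be passed on for a local level model with two variance parameters; check ?statespacer for the actual arguments.

```r
# Initial values for variance parameters: 0.5 * log(variance of y)
y <- matrix(datasets::Nile)
y_var <- as.numeric(var(y))
initial_variance_param <- 0.5 * log(y_var)

# Applying exp(2x) to the initial value recovers the sample variance
recovered <- exp(2 * initial_variance_param)

# Assumed usage (not run): one parameter for the observation disturbance
# variance and one for the level disturbance variance.
# fit <- statespacer(y = y, local_level_ind = TRUE,
#                    initial = rep(initial_variance_param, 2))
```

Starting every variance at the sample variance of the data overstates the individual components, but it puts the optimiser on the right order of magnitude.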