Next: Choice of parameters Up: SSA detection of structural Previous: Heterogeneities in time series


Description of the algorithm

Let $x_1, x_2, \ldots, x_N$ be a time series, where $N$ is large enough (possibly $N=\infty$). Let us choose two integers: an even integer $m$ $(m\leq N)$, the window width, and $M$ $(M\leq m/2)$, the lag parameter. We also set $K=m-M+1$. For each $n=0,1,\ldots, N-m$ we take the time interval $[n+1,n+m]$ and construct the trajectory matrix
\begin{displaymath}
{\bf X}^{(n)}=(x_{n+i+j-1})_{i,j=1}^{M,K}=
\left(\begin{array}{cccc}
x_{n+1} & x_{n+2} & \ldots & x_{n+K}\cr
x_{n+2} & x_{n+3} & \ldots & x_{n+K+1}\cr
\vdots & \vdots & \ddots & \vdots\cr
x_{n+M} & x_{n+M+1} & \ldots & x_{n+m}\cr
\end{array}\right)
\end{displaymath}
(4)
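As a concrete illustration, the construction of ${\bf X}^{(n)}$ in (4) can be sketched in Python/NumPy. This is our own sketch (the source defines no code); the function name and 0-based indexing convention are ours:

```python
import numpy as np

def trajectory_matrix(x, n, M, m):
    """Build the M x K trajectory matrix X^{(n)} of eq. (4), K = m - M + 1.

    x is a 0-indexed array, so x[t - 1] holds the series value x_t;
    the (i, j) entry (1-based) is x_{n+i+j-1}, i.e. x[n + i + j - 2],
    which with 0-based loop indices i, j is simply x[n + i + j]."""
    K = m - M + 1
    return np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
```

For example, with $x_t=t$, $n=0$, $M=3$, $m=6$ (so $K=4$) the top-left entry is $x_1$ and the bottom-right entry is $x_m=x_6$, matching (4).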


The columns of ${\bf X}^{(n)}$ are the vectors $X_j^{(n)}\;(j=1, \ldots,K)$, where

\begin{displaymath}X_j^{(n)}=(x_{n+j},\ldots,x_{n+M+j-1})^T\,, \;\;\; j\geq -n+1 .\end{displaymath}


For each $n=0,1,\ldots$ we define the lag-covariance matrix ${\bf S}_n={\bf X}^{(n)}({\bf X}^{(n)})^T$. The SVD of ${\bf S}_n$ gives us a collection of $M$ eigenvectors, and a particular combination of $l<M$ of them determines an $l$-dimensional subspace ${\cal L}_{n,I}$ in the $M$-dimensional space of vectors $X_j^{(n)}$. We denote the $l$ eigenvectors that determine the subspace ${\cal L}_{n,I}$ by $P_{i_1},\ldots,P_{i_l}$, and the normalized sum of squares of the (Euclidean) distances between the vectors $X_j^{(n)}\;(j=p+1,\ldots,q)$ and this $l$-dimensional subspace by ${\cal D}_{n,I,p,q}$. The normalization is by the number of vectors $q-p$ considered and by the lag parameter $M$. Since the eigenvectors are orthogonal and have unit norm, the square of the Euclidean distance between an $M$-vector $Z$ and the subspace ${\cal L}_{n,I}$ spanned by the $l$ eigenvectors $P_{i_1},\ldots,P_{i_l}$ is just

\begin{displaymath}\vert\vert Z \vert\vert^2- \vert\vert P^T Z\vert\vert^2 = Z^T Z - Z^T P P^T Z \end{displaymath}


where $\vert\vert\cdot\vert\vert$ is the usual Euclidean norm and $P$ is the ($M\times l$)-matrix with columns $P_{i_1},\ldots,P_{i_l}$. Therefore

\begin{displaymath}{\cal D}_{n,I,p,q}= \frac{1}{M(q-p)} \sum_{j=p+1}^q \left((X_j^{(n)})^T X_j^{(n)} - (X_j^{(n)})^T P P^T X_j^{(n)} \right)\end{displaymath}
(5)
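A minimal numerical sketch of the computation of ${\cal D}_{n,I,p,q}$ in (5), under our own conventions (the function name is ours; $I$ is taken here as a 0-based set of eigenvector indices, with eigenvectors sorted by decreasing eigenvalue):

```python
import numpy as np

def detection_statistic(x, n, M, m, p, q, I):
    """Sketch of eq. (5): normalized squared distance of the test
    vectors X_j^{(n)}, j = p+1, ..., q, from the subspace spanned by
    the eigenvectors of S_n selected by I (0-based indices here)."""
    K = m - M + 1
    # Trajectory matrix X^{(n)} and lag-covariance matrix S_n = X X^T.
    X = np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
    S = X @ X.T
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # indices, descending
    P = eigvecs[:, order[list(I)]]                # M x l matrix of chosen eigenvectors
    total = 0.0
    for j in range(p + 1, q + 1):                 # 1-based j as in the text
        # X_j^{(n)} = (x_{n+j}, ..., x_{n+M+j-1})^T, 0-indexed below.
        Z = np.array([x[n + j - 1 + i] for i in range(M)])
        total += Z @ Z - Z @ P @ (P.T @ Z)        # squared distance to L_{n,I}
    return total / (M * (q - p))
```

For a constant series, every lagged vector lies in the one-dimensional subspace of the leading eigenvector, so the statistic vanishes (up to rounding error).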
For fixed $n$, the part of the sample $x_{n+1}, \ldots, x_{n+m}$ that is used to construct the trajectory matrix ${\bf X}^{(n)}$ will be called the `base sample', and another subseries, $x_{n+p+1}, \ldots, x_{n+q+M-1}$, which is used to construct the vectors $X_j^{(n)}\;(j=p+1,\ldots,q)$ and thus to compute the normalized sum of squared distances ${\cal D}_{n,I,p,q}$, will be called the `test sample'. Of course, the base and test samples may intersect. We shall however assume that $q \geq m$, so that the test sample is not strictly a subsample of the base sample. The most interesting case is $p=m<q$ with, say, $q=m+M$.

If a change in the mechanism generating $x_t$ occurs at a certain point $\tau$, then we would expect the vectors $X_j =X_{j-n}^{(n)}$ with $j > \tau$ to lie further away from the $l$-dimensional subspace ${\cal L}_{n,I}$ than the vectors $X_j$ with $j \leq \tau$. This means that we expect the sequence $D(n)={\cal D}_{n,I,p,q}$, considered as a function of $n$, to start growing somewhere around $\hat{n}$ such that $\hat{n}+q+M-1 = \tau$. (This value $\hat{n}=\tau-q-M+1$ is the first value of $n$ for which the test sample $x_{n+p+1}, \ldots, x_{n+q+M-1}$ contains the change point.) This growth continues for some time; its expected duration depends on the signal and on the relations between $p$, $q$ and $m$. In the most interesting case, when $p=m$ and $q-p \leq M$, and in the case of a single change, the function $D(n)$ stops growing after $q-p$ iterations, that is, around the point $n = \tau -p -M$. During the following $M-(q-p)$ iterations one would expect reasonably high values of $D(n)$, which must then be followed by a decrease to, perhaps, a new level. (This is because the SSA decomposition should incorporate the new signal on the intervals $[n+1,n+m]$ with $n \geq \tau -M$.) Large values of $D(n)={\cal D}_{n,I,p,q}$ indicate the presence of a structural change in the time series.
We also use the CUSUM-type statistic
\begin{displaymath}W_1=S_1,\;\;\;W_{n+1}=\left(W_n+S_{n+1}-S_n\right)^+,\;\;\; n\geq 1,\end{displaymath}


where $S_n = {\cal D}_{n,I,p,q} / \mu_{n,I}$ and $\mu_{n,I}$ is an estimator of the normalized sum of squared distances ${\cal D}_{j,I,p,q}$ on the time intervals $[j+1,j+m]$ where the hypothesis of no change can be accepted. We use $\mu_{n,I} = {\cal D}_{n',I,0,K}$, where $n'$ is the largest value of $n' \leq n$ for which the hypothesis of no change is accepted. The formal description of the algorithm is as follows. Let $m$, $M$, $p$ and $q$ be integers such that $m$ is even, $M\leq m/2$, $0\leq p < q$ and $m \leq q$, and let $I$ be a subset of $\{1,\ldots,M\}$.
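The CUSUM-type recursion above can be sketched directly. This is a hedged illustration with our own naming; we assume the renormalization by $\mu_{n,I}$ has already been applied in forming the sequence $S_1, S_2, \ldots$:

```python
def cusum_statistics(S):
    """CUSUM-type recursion from the text: W_1 = S_1 and
    W_{n+1} = (W_n + S_{n+1} - S_n)^+, where (z)^+ = max(z, 0).
    S is the sequence S_1, S_2, ... as a 0-indexed list; the
    returned list holds W_1, W_2, ..."""
    W = [S[0]]
    for k in range(1, len(S)):
        W.append(max(W[-1] + S[k] - S[k - 1], 0.0))
    return W
```

The positive-part operation keeps $W_n$ from drifting below zero when the normalized statistic falls; the values in the example test below are not realistic statistics, only chosen to exercise that clipping.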
For every $n=0,1,\ldots$ we compute:

The decision rule of the algorithm is to announce a change if for some $n > m/2$ we obtain $W_n \geq h$, where $h$ is a fixed threshold. In this case, a change in the structure of the time series is announced to have occurred at about the point $\hat \tau = \hat{n} +q+M-1$, where $\hat{n}$ is the iteration at which the statistic $D(n)={\cal D}_{n,I,p,q}$ last started to grow before $W_n$ reached the threshold.
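Putting the pieces together, a self-contained and deliberately simplified monitoring loop might look as follows. All names are ours; we fix $I=\{1,\ldots,l\}$ (the $l$ leading eigenvectors) and freeze the normalizer $\mu$ at the first window's value rather than re-estimating it adaptively as the text prescribes:

```python
import numpy as np

def detect_change(x, M, m, p, q, l, h):
    """Simplified sketch of the detection loop: scan n = 0, 1, ...,
    and return the first n > m/2 with W_n >= h, or None.
    x is a 0-indexed float array; I = {1, ..., l} is hard-coded."""
    K = m - M + 1

    def D(n):
        # Statistic (5) for the window starting at n, using the
        # l leading eigenvectors of the lag-covariance matrix.
        X = np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
        w, V = np.linalg.eigh(X @ X.T)
        P = V[:, np.argsort(w)[::-1][:l]]
        tot = 0.0
        for j in range(p + 1, q + 1):
            Z = x[n + j - 1 : n + j - 1 + M]
            tot += Z @ Z - Z @ P @ (P.T @ Z)
        return tot / (M * (q - p))

    N = len(x)
    mu = max(D(0), 1e-12)          # crude floor to avoid division by ~0
    S_prev = D(0) / mu
    W = S_prev
    for n in range(1, N - (q + M - 1)):
        S_cur = D(n) / mu
        W = max(W + S_cur - S_prev, 0.0)
        S_prev = S_cur
        if n > m // 2 and W >= h:
            return n               # change announced at iteration n
    return None
```

On a noise-free sinusoid with a level shift, the statistic stays near zero until the test sample first touches the shifted part, so the returned iteration is close to $\hat{n}=\tau-q-M+1$.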