Next: Choice of parameters Up: SSA detection of structural Previous: Heterogeneities in time series


Description of the algorithm

Let $x_1, x_2, \ldots, x_N$ be a time series, where $N$ is large enough (possibly $N=\infty$). Let us choose two integers: an even integer $m$ $(m\leq N)$, the window width, and $M$ $(M\leq m/2)$, the lag parameter. We also set $K=m-M+1$. For each $n=0,1,\ldots, N-m$ we take the time interval $[n+1,n+m]$ and construct the trajectory matrix
\begin{displaymath}
{\bf X}^{(n)}=(x_{n+i+j-1})_{i,j=1}^{M,K}=
\left(\begin{array}{cccc}
x_{n+1} & x_{n+2} & \ldots & x_{n+K}\cr
x_{n+2} & x_{n+3} & \ldots & x_{n+K+1}\cr
\vdots & \vdots & \ddots & \vdots\cr
x_{n+M} & x_{n+M+1} & \ldots & x_{n+m}\cr
\end{array}\right)
\end{displaymath}
(4)
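As a concrete illustration, the construction of ${\bf X}^{(n)}$ in (4) can be sketched in Python/NumPy. This is our own sketch (the source defines no code); the function name and 0-based indexing convention are ours:

```python
import numpy as np

def trajectory_matrix(x, n, M, m):
    """Build the M x K trajectory matrix X^{(n)} of eq. (4), K = m - M + 1.

    x is a 0-indexed array, so x[t - 1] holds the series value x_t;
    the (i, j) entry (1-based) is x_{n+i+j-1}, i.e. x[n + i + j - 2],
    which with 0-based loop indices i, j is simply x[n + i + j]."""
    K = m - M + 1
    return np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
```

For example, with $x_t=t$, $n=0$, $M=3$, $m=6$ (so $K=4$) the top-left entry is $x_1$ and the bottom-right entry is $x_m=x_6$, matching (4).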


The columns of ${\bf X}^{(n)}$ are the vectors $X_j^{(n)}\;(j=1, \ldots,K)$, where

\begin{displaymath}X_j^{(n)}=(x_{n+j},\ldots,x_{n+M+j-1})^T\,, \;\;\; j\geq -n+1 .\end{displaymath}


For each $n=0,1,\ldots$ we define the lag-covariance matrix ${\bf S}_n={\bf X}^{(n)}({\bf X}^{(n)})^T$. The SVD of ${\bf S}_n$ gives us a collection of $M$ eigenvectors, and a particular combination of $l<M$ of them determines an $l$-dimensional subspace ${\cal L}_{n,I}$ in the $M$-dimensional space of vectors $X_j^{(n)}$. We denote the $l$ eigenvectors that determine the subspace ${\cal L}_{n,I}$ by $P_{i_1},\ldots,P_{i_l}$, and the normalized sum of squares of the (Euclidean) distances between the vectors $X_j^{(n)}\;(j=p+1,\ldots,q)$ and this $l$-dimensional subspace by ${\cal D}_{n,I,p,q}$. The normalization is by the number of vectors $q-p$ considered and by the lag parameter $M$. Since the eigenvectors are orthogonal and have unit norm, the square of the Euclidean distance between an $M$-vector $Z$ and the subspace ${\cal L}_{n,I}$ spanned by the $l$ eigenvectors $P_{i_1},\ldots,P_{i_l}$ is just

\begin{displaymath}\vert\vert Z \vert\vert^2- \vert\vert P^T Z\vert\vert^2 = Z^T Z - Z^T P P^T Z \end{displaymath}


where $\vert\vert\cdot\vert\vert$ is the usual Euclidean norm and $P$ is the ($M\times l$)-matrix with columns $P_{i_1},\ldots,P_{i_l}$. Therefore

\begin{displaymath}{\cal D}_{n,I,p,q}= \frac{1}{M(q-p)} \sum_{j=p+1}^q \left((X_j^{(n)})^T X_j^{(n)} - (X_j^{(n)})^T P P^T X_j^{(n)} \right)\end{displaymath}
(5)
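A minimal numerical sketch of the computation of ${\cal D}_{n,I,p,q}$ in (5), under our own conventions (the function name is ours; $I$ is taken here as a 0-based set of eigenvector indices, with eigenvectors sorted by decreasing eigenvalue):

```python
import numpy as np

def detection_statistic(x, n, M, m, p, q, I):
    """Sketch of eq. (5): normalized squared distance of the test
    vectors X_j^{(n)}, j = p+1, ..., q, from the subspace spanned by
    the eigenvectors of S_n selected by I (0-based indices here)."""
    K = m - M + 1
    # Trajectory matrix X^{(n)} and lag-covariance matrix S_n = X X^T.
    X = np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
    S = X @ X.T
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # indices, descending
    P = eigvecs[:, order[list(I)]]                # M x l matrix of chosen eigenvectors
    total = 0.0
    for j in range(p + 1, q + 1):                 # 1-based j as in the text
        # X_j^{(n)} = (x_{n+j}, ..., x_{n+M+j-1})^T, 0-indexed below.
        Z = np.array([x[n + j - 1 + i] for i in range(M)])
        total += Z @ Z - Z @ P @ (P.T @ Z)        # squared distance to L_{n,I}
    return total / (M * (q - p))
```

For a constant series, every lagged vector lies in the one-dimensional subspace of the leading eigenvector, so the statistic vanishes (up to rounding error).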
For fixed $n$, the part of the sample $x_{n+1}, \ldots, x_{n+m}$ that is used to construct the trajectory matrix ${\bf X}^{(n)}$ will be called the `base sample', and another subseries, $x_{n+p+1}, \ldots, x_{n+q+M-1}$, which is used to construct the vectors $X_j^{(n)}\;(j=p+1,\ldots,q)$ and thus to compute the normalized sum of squared distances ${\cal D}_{n,I,p,q}$, will be called the `test sample'. Of course, the base and test samples may intersect. We shall however assume that $q \geq m$, so that the test sample is not strictly a subsample of the base sample. The most interesting case is $p=m<q$ with, say, $q=m+M$.

If a change in the mechanism generating $x_t$ occurs at a certain point $\tau$, then we would expect the vectors $X_j =X_{j-n}^{(n)}$ with $j > \tau$ to lie further away from the $l$-dimensional subspace ${\cal L}_{n,I}$ than the vectors $X_j$ with $j \leq \tau$. This means that we expect the sequence $D(n)={\cal D}_{n,I,p,q}$, considered as a function of $n$, to start growing somewhere around $\hat{n}$ such that $\hat{n}+q+M-1 = \tau$. (This value $\hat{n}=\tau-q-M+1$ is the first value of $n$ for which the test sample $x_{n+p+1}, \ldots, x_{n+q+M-1}$ contains the change point.) This growth continues for some time; its expected duration depends on the signal and on the relations between $p$, $q$ and $m$. In the most interesting case, when $p=m$ and $q-p \leq M$, and in the case of a single change, the function $D(n)$ stops growing after $q-p$ iterations, that is, around the point $n = \tau -p -M$. During the following $M-(q-p)$ iterations one would expect reasonably high values of $D(n)$, which must then be followed by a decrease to, perhaps, a new level. (This is because the SSA decomposition should incorporate the new signal on the intervals $[n+1,n+m]$ with $n \geq \tau -M$.) Large values of $D(n)={\cal D}_{n,I,p,q}$ indicate the presence of a structural change in the time series.
We also use the CUSUM-type statistic
\begin{displaymath}W_1=S_1,\;\;\;W_{n+1}=\left(W_n+S_{n+1}-S_n\right)^+,\;\;\; n\geq 1,\end{displaymath}


where $S_n = {\cal D}_{n,I,p,q} / \mu_{n,I}$ and $\mu_{n,I}$ is an estimator of the normalized sum of squared distances ${\cal D}_{j,I,p,q}$ on the time intervals $[j+1,j+m]$ where the hypothesis of no change can be accepted. We use $\mu_{n,I} = {\cal D}_{n',I,0,K}$, where $n'$ is the largest value of $n' \leq n$ for which the hypothesis of no change is accepted. The formal description of the algorithm is as follows. Let $m$, $M$, $p$ and $q$ be integers such that $m$ is even, $M\leq m/2$, $0\leq p < q$ and $m \leq q$, and let $I$ be a subset of $\{1,\ldots,M\}$.
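The CUSUM-type recursion above can be sketched directly. This is a hedged illustration with our own naming; we assume the renormalization by $\mu_{n,I}$ has already been applied in forming the sequence $S_1, S_2, \ldots$:

```python
def cusum_statistics(S):
    """CUSUM-type recursion from the text: W_1 = S_1 and
    W_{n+1} = (W_n + S_{n+1} - S_n)^+, where (z)^+ = max(z, 0).
    S is the sequence S_1, S_2, ... as a 0-indexed list; the
    returned list holds W_1, W_2, ..."""
    W = [S[0]]
    for k in range(1, len(S)):
        W.append(max(W[-1] + S[k] - S[k - 1], 0.0))
    return W
```

The positive-part operation keeps $W_n$ from drifting below zero when the normalized statistic falls; the values in the example test below are not realistic statistics, only chosen to exercise that clipping.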
For every $n=0,1,\ldots$ we compute:

The decision rule of the algorithm is to announce a change if for some $n > m/2$ we obtain $W_n \geq h$, where $h$ is a fixed threshold. In this case, a change in the structure of the time series is announced to have occurred at about the point $\hat \tau = \hat{n} +q+M-1$, where $\hat{n}$ is the iteration at which the statistic $D(n)={\cal D}_{n,I,p,q}$ last started to grow before $W_n$ reached the threshold.
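Putting the pieces together, a self-contained and deliberately simplified monitoring loop might look as follows. All names are ours; we fix $I=\{1,\ldots,l\}$ (the $l$ leading eigenvectors) and freeze the normalizer $\mu$ at the first window's value rather than re-estimating it adaptively as the text prescribes:

```python
import numpy as np

def detect_change(x, M, m, p, q, l, h):
    """Simplified sketch of the detection loop: scan n = 0, 1, ...,
    and return the first n > m/2 with W_n >= h, or None.
    x is a 0-indexed float array; I = {1, ..., l} is hard-coded."""
    K = m - M + 1

    def D(n):
        # Statistic (5) for the window starting at n, using the
        # l leading eigenvectors of the lag-covariance matrix.
        X = np.array([[x[n + i + j] for j in range(K)] for i in range(M)])
        w, V = np.linalg.eigh(X @ X.T)
        P = V[:, np.argsort(w)[::-1][:l]]
        tot = 0.0
        for j in range(p + 1, q + 1):
            Z = x[n + j - 1 : n + j - 1 + M]
            tot += Z @ Z - Z @ P @ (P.T @ Z)
        return tot / (M * (q - p))

    N = len(x)
    mu = max(D(0), 1e-12)          # crude floor to avoid division by ~0
    S_prev = D(0) / mu
    W = S_prev
    for n in range(1, N - (q + M - 1)):
        S_cur = D(n) / mu
        W = max(W + S_cur - S_prev, 0.0)
        S_prev = S_cur
        if n > m // 2 and W >= h:
            return n               # change announced at iteration n
    return None
```

On a noise-free sinusoid with a level shift, the statistic stays near zero until the test sample first touches the shifted part, so the returned iteration is close to $\hat{n}=\tau-q-M+1$.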