Ciclical time series with squeezed time

Ivan Pacheco Soto¹

April 2005

Abstract

We discuss the estimation of the frequency in a strictly periodic time series that has been recorded as if fully observed at equally spaced times. This problem arises in the study of historical climate change based on the measurements of the thickness of laminae found in sedimentary rocks. We explore the effect of the missing time and the observations in a sinusoidal signal. Two types of random variations are introduced in the model. The first one is due to variations in the scale, where an unobserved gap could be regarded as the destruction of the record during a whole period of time. Hence it would seem that we have a complete record of the time series, but the time recorded would not correspond to the real scale of time. The second random variation is the observational error, which could be regarded as a measurement error or a common variation in the underlying process. A consistent estimator is obtained only when the first type of random variation is considered in the model. A modified version of this estimator is proposed when also the observational error is considered and its performance is assessed numerically.

We first approach this problem by estimating the frequency using the classical Spectral Analysis method. We found a closed expression for the location of the peak of the spectral density, which would be an estimate of the modulated frequency. Unfortunately we would need more information on the distribution of the missing time to estimate consistently the frequency of the squeezed time series. Then we apply the non-linear Newton's method, which leads to our proposed method. We estimate the parameters of the sinusoidal curve by sliding a window through the whole series and locally computing the least squares estimates. We represent these estimates as points in the complex plane. The changes in their path produce an estimate of the mean distribution of the size of the time gaps. A complex ratio of these estimates turns out to be a linear fractional transformation. The geometry of these functions allows us to develop a method to estimate the frequency of the sinusoidal curve.

In Chapter 5 we prove the consistency of the estimates for the frequency and the mean of the distribution of the time gaps by applying the Uniform Ergodic Theorem. These results hold for a sliding window of size two in the situation of no noise in the measurements. In Chapter 6 we present a simulation study of the estimation when some noise is added to the measurements.

1 Introductory Remarks

This dissertation discusses the estimation of the frequency in a strictly periodic time series that has been recorded as if fully observed at equally spaced times. In this situation, not only observations are missed but also time is lost, so we do not know whether there is a missing observation or how many observations are missed between two recorded measurements.

This problem arises in the study of historical climate change based on measurements of the thickness of laminae found in sedimentary rocks. It is well known that the growth rings in trees record climate fluctuations. Wide growth rings correspond to good years, and thin growth rings correspond to poor years. Since the rings correspond to annual cycles, they can also be used as a dating method. Similarly to tree rings, sediments which accumulate in the bottom of lakes can be used to determine climate fluctuations. The organic communities responsible for these sediments are sensitive to climate fluctuations, which become recorded in very thin and parallel layers or laminae. Couplets of laminae that result from seasonal changes are called varves. Typically, varves consist of alternating dark and light colored layers. In northern lakes, the light layers may form during the summer because only large particles can settle to the bottom due to the wave action. During the winter, the lake freezes over so that even very fine particles (including much of the organic material) can settle to the bottom and form a darker layer. The varving is produced in response to climate fluctuations. Therefore, the thickness of these varves can be used to determine climatic conditions, and are taken as proxies of climate change.

Like the growth rings of trees, varves record annual cycles. Unfortunately, there are many geological processes that affect the sequence of varve thickness in a sedimentation section. These sections might contain hiatuses, which might have been produced by erosion or non-deposition intervals of time. Thickness of the varves might have been affected by changes in sedimentation rates and degree of compaction. Furthermore, varves can also be generated by non-seasonal mechanisms such as storms, floods or by possible mixing of the sediments which homogenizes a certain thickness interval, so not all laminae are varves.

Varves have been used in the study of patterns of prehistoric climate change. The Eocene Green River Formation in Utah, Colorado and Wyoming is one of the most studied lake deposits. Bradley(1929) described the varves in this formation, and from measurements of the varves, he estimated the Green River epoch to have lasted between 5 and 8 million years. These varves constitute an important source for the study of prehistoric climate change. Varves from Lake Gosiute in the formation are generally accepted to represent truly annual cycles. One of the objectives in the study of these ancient sequences has been the identification of cycles, which would imply a cyclical climate signal.

Hiatuses in the sedimentation section can alter the calculated frequency of the cycles, so we would like to determine the effect of these gaps on the estimation of the frequency. The problem is that by looking at the sedimentation record the empty spaces seem fully occupied. There is no information of what took place between consecutive layers or varves. Figure illustrates this situation. The plot to the left shows three varves that have been identified from a sedimentation section, and the one to the right shows the history of the deposition of this record. It would appear that this sediment has accumulated during three consecutive years, but there has been an erosion during the third year and lack of deposition during the fifth year. Therefore the measurements recorded at the apparent times 1, 2 and 3 correspond to the real times 1, 4 and 6, respectively.

Figure Figure 1: Varve sequence.

2 Model

Suppose that a sinusoidal curve is observed at unobserved random times. An observation recorded at the apparent time t corresponds to the value of the sinusoidal curve at the real time t+S_t, where S_t represents the lost time up to this point.

The sinusoidal curve could represent the thickness of the varves in a sedimentation section. Then the apparent time is the numbering of the observed varves, and the lost time is the varves missed by erosion, null sedimentation or measurement errors.

Then the model is the following.

Y_t = A₀cos((t+S_t)l₀ + F) + sZ_t t = 1,2,...,

(1)

where

A₀ > 0, 0 < l₀ < p, and s > 0 are unknown real numbers;
F has a uniform distribution on (-p, p];
S_t = N₁ + ... + N_t, where N_t Î { 0, 1, ...,m} (m known) are i.i.d. with mean m, and P(N₁ = 0) > 0 and P(N₁ = m) > 0;
{ S_t} and {Z_t} are independent sequences of random variables;
Z_t's are i.i.d. with mean zero and |Z_t| £ 1.

The N_t's represent the lost time between consecutive observed points, i.e., they are the number of missing varves due to erosion or null sedimentation between two observed varves. When there is positive sedimentation and the record is destroyed by erosion, we assume that complete annual records are lost. Therefore N_t would represent an interval of time of null sedimentation that might also include intervals of time where there was positive sedimentation that was completely destroyed by erosion. Then the record of a varve thickness would represent a full annual cycle.

Z_t represents the error measurements and random fluctuations in the sedimentation process. We will need these random effects to be uniformly bounded. The magnitude of these effects is controlled by s.

We want to estimate the unknown frequency l₀ and the mean amount of lost time m from a realization of this process.

The process Y_t is stationary and the frequency is modulated by the lost time. In Chapter 3 we propose estimates for l₀ and m when there is no noise in the measurements, i.e., s = 0. Under this assumption, this results lead to a basic theory which is approximately valid if s is not too large.

3 Literature Review

3.1 Spectral Analysis

Spectral analysis is the main method that has been used in the study of cycles in varved sediments. With spectral analysis we can estimate the strength of various frequencies in the series, i.e., the power spectrum. Two spectral methods have been usually applied in the analysis of varved sediments: Fourier spectral analysis and maximum entropy spectral analysis (MESA).

In Fourier analysis the time series is decomposed into a weighted average of sinusoidal components. The spectrum is estimated by computing the periodogram, which can be computed by using the fast Fourier transform (FFT) of the time series. Under the main assumption that the time series is stationary, smoothing the periodogram gives a consistent estimate of the spectrum. A complete description of this method can be found in Brockwell and Davis (1991).

MESA is based on choosing a spectral density [^g] corresponding to the most random time series subject to the constraint that the first m Fourier coefficients of [^g] must be exactly the first m sample autocovariances of the time series under study. Then we choose the spectral density [^g] which maximizes the entropy

ó
õ

p

-p

lng(l) dl

over the class of all densities g, subject to the constraints

ó
õ

p

-p

e^ihl g(l) dl =

(l), h = 0,±1,...,±m,

where [^(g)] is the sample autocovariance function of the time series. Although this method seems very different from the method of Fourier analysis, the maximum entropy spectral estimate is the same as the estimate obtained by fitting an autoregressive model (Brockwell at al. (1991)). MESA might detect more efficiently strict periodicities in the time series. This method has also been used because it allows to estimate a more dense power spectrum.

The methods of spectral analysis described above assume the use of a stationary time series, but there might be changes in the length of cycles due to changes in the sedimentation rates or to hiatuses in the sequence, so the time series of the thickness of the varves might not be stationary. To deal with this situation, the spectrum is estimated locally by sliding a window through the whole series. Then the spectrum estimated from all these overlapping windows is stacked up in one plot with the successive spectrum aligned along one axis and the frequency aligned along the other one. These plots can also be contoured in color so that changes in power can be detected easier on the resulting plot. These diagrams can be used to detect non-stationarity of sections and changes in the sedimentation rates.

3.2 Estimation of Cycles

We review some of the references related to the statistical analysis on varved sediments from the Green River Formation. The authors looked for cycles in sequences of varve thickness on the assumption that changes in thickness offer a proxy for climate fluctuations.

Bradley(1929) gave a very detail geological description of the Green River Formation, but he only performed a visual analysis of the variation in thickness of the varves. After plotting a series of the thickness of the varves, Bradley concluded that the average distance between peaks in the series was a little more than 11 years, that he interpreted as the sunspot cycle. Bradley found that certain beds or layers were regularly spaced. He computed the average distance between two particular beds, then divided it by the mean varve thickness for the particular location, and in this way he computed and identified two cycles: one related to the Precession of the Equinoxes (around 21000 years long), and another cycle of 50 years long that has not been related to any known phenomena.

Crowley, Duchon and Rhi(1986) performed a more detail analysis on three time series of thickness of the varves. They used a variety of statistical techniques such as Fourier Analysis, MESA and tests for the spectrum. They also stacked up the successive spectrum obtained from sliding a window to detect periodic fluctuations in sections of the series. They concluded that the varve thicknesses are completely random, so they should not be taken as proxies of climate change. Ripepe, Roberts and Fisher(1991) suggested that the data they analyzed were not truly varved because the varved facies they studied were more likely to record non-seasonal phenomena such as dust storms.

Crowley at al.(1986) visually split the series to obtain stationary subsets. After testing for three models for a possible trend, they concluded that the varve thickness follows a lognormal distribution after removal of a linear trend and addition of a constant or multiplication by a power trend. Although they recognized the nonstationary character of the data, they performed a Fourier analysis and MESA for each complete series, and they found no evidence of a periodic fluctuation in varve thickness. In order to investigate periodic fluctuations in different parts of the time series, they performed a Fourier analysis and MESA of 200-year segments overlapped by 40 years. They called these methods moving Fourier analysis and moving MESA. After stacking the spectra, they did not find significant peaks, and concluded that the series are mainly noise.

The resolution of the system used to measure the thickness of the varves was 30 mm in Crowley at al.(1986), so any lamination thinner than this value might have been included in adjacent varve measurements. To study this situation, they simulated time series according to an autoregressive model of order 1 and added Gaussian noise centered at the varve boundaries. They found that random errors in boundary interpretation add variance at high frequencies in the spectrum but have a minor effect on the low frequencies. So they concluded that interpretation of varve boundaries have a small effect on the structure of the time series.

Ripepe at al.(1991) develop an image analysis method to identify, count and measure thickness of varves. They set an arbitrary lower limit for varve thickness at 30 mm. In their analysis they first filtered out frequencies with periodicities of less than 4 years and of more than 100 years. Then they computed FFT in sequential 200-year windows shifted in 25-year increments. For each window, the time series was detrended before spectral analysis was performed, and the spectrum was normalized to increase the effect of the low amplitude cyclicity. The spectra were stacked into a single spectrum. They identified three cycles that were interpreted as an El Niño (ENSO)-type phenomenon (4.8-5.6 years), the sunspot cycle (10.4-14.7 years) and a 30-year cycle of unknown nature.

In Schwarzacher(1993), Chapter 4, there is a summary of the techniques that have been applied in all these situations.

4 Synopsis

This work is divided in six chapters. The first one is the introduction and motivation of the problem. The second one contains the discussion of the spectral analysis. The next three chapters introduce our estimation method and prove the convergence of the estimators of the frequency and the mean size of the time jumps when there is no noise in the measurement. In the last chapter we present a simulation study for the case when some noise is added to the measurements. Now we describe in more detail the main parts of our work.

4.1 Spectral Analysis

In Chapter 2 we explore the use of the classical method of Spectral Analysis to estimate the true frequency l₀. Grenander and Rosenblatt(1984) showed that the spectral density of a stationary time series that is observed at random times can be expressed in terms of the Poisson kernel from Complex Analysis. We apply this result to the sinusoidal model and obtained the spectral density of the modulated time series. The line spectrum of the sinusoidal curve is transformed into a continuous spectrum when some observations are dropped at random.

For a small frequency l₀, we prove that the spectral density is a unimodal function on (0,p) and find an analytical expression for its location, which is greater than the true frequency. Hence there is a blueing of the spectrum, i.e., there is a shift to higher frequencies.

Unfortunately the analytical expression for the mode is not simple to work with, so we give an approximation to its value that clarifies how much blueing of the spectrum is taking place. The mode can be approximated by l₀ + argf(l₀) where f is the characteristic function of the time jumps. If the frequency of the squeezed time series were to be estimated with a periodogram, then we would obtain the modulated frequency, which would be quite close to l₀ + f(l₀). Hence we would not be able to estimate l₀ without further knowledge of the distribution of the time jumps.

4.2 Estimates of the Parameters

In Chapter 3 we start by applying the non-linear Newton's method to estimate the parameters in the squeezed time series. We slide a window of size three through the whole series and estimate locally the parameters, so we obtain a sequence of non-linear estimates ([^A]_t, [^(l_t)], [^(h)]_t), t = 1,2,... Figure shows that we can combine the plots of [^A]_t vs. h_t and l_t vs. h_t to obtain an estimate of the true frequency. Lemma 3.12 gives a result in this direction. This leads to our proposed method of estimation. Instead of using a non-linear method, we test for a frequency, a value between 0 and p, and estimate locally the amplitude and the phase in the sinusoidal model by using the classical (linear) least squares estimates.

We slide a window of size B through the whole series and obtain a sequence {[^(b)]_t}_t=1^n-B+1 of LSE's for the amplitude and the phase. If the test frequency were equal to the true frequency and there were not missing observations in a block of data, then the LSE's would be the same. Hence we look for instances where consecutive LSE's are approximately the same. Then we define the following function:

G_n,e(l) =

B æ
Ö

n-B
å
t=1

1_{{ |[^(b)]_t+1(l) [^(b)]_t^-1(l) - 1| < e}}

This function basically computes the proportion of consecutive LSE's that are approximately the same. We prove that the highest values of G_n,e occur when the test frequency l is a multiple of the true frequency l₀. Hence we combine the different peaks to estimate the frequency, and the estimate is

argmax
l Î (0,[(p)/(m+1)])

m+1
å
k=1

G_{n,e_n}(kl),

where e_n is a sequence decreasing to zero.

Suppose that only one observation is dropped at some point. The only difference of the LSE's before and after the missing observation would be a change of phase. We compute the angle between consecutive LSE's to keep track of the change of phase at each location. Hence an estimate of m the mean size of the time jumps is

n-B
å
t=1

arg(

t+1

(

)

-1
t

(

))

4.3 Sliding Linear Fractional Transformations (SLFT)

We observe that the ratio of the LSE's can be expressed as the image of a random walk in the unit circle S¹ under a SLFT f, i.e.,

t+1

-1
t

= f_s,l(e^x_tl₀ U) and f_s,l(z) =

az+b

cz+d

where x_t is a random walk on S¹, U is a uniform random variable on S¹, and the coefficients of f depends only on the frequencies l₀, l, and the time jumps contained in the sliding window. The main properties of the SLFT are given in Chapter 4.

Since [^(b)]_t+1[^(b)]_t^-1 » 1 when the test frequency is equal to the true frequency and there are no time jumps contained in the sliding window, a constant SLFT could identify the true frequency. If the sliding window has size B=2, Corollary shows that f º 1 if and only if all the time jumps in the window have the same magnitude and the test frequency is a multiple of the true frequency. Hence if there were a constant SLFT, we would expect a higher peak in the function G_n,e.

4.4 Consistency

First we define a Markov chain {J_t}_t=0^¥ with state space S to track the configuration of time jumps included in the sliding window of size B. Since the ratio of [^(b)]_t+1[^(b)]_t^-1 involves a random walk on the unit circle S¹, we apply the Random Ergodic Theorem to obtain pointwise convergence. Then we apply the Uniform Ergodic Theorem to prove that

G_{n,e_n}(l)

a.s.
®

B æ
Ö

å
s Î S

P(J₀ = s)n(f_s,l^-1({ 1 }))

as n ® ¥,

where n is the normalized circular Lebesgue measure. Note that if the SLFT f_s,l is a non-constant function, then the Lebesgue measure of the preimage of {1} would be zero. Thus the problem resides in finding when the SLFT is a constant function equal to one. We solve this problem for B=2 and prove that

G_{n,e_n}(l)

a.s.
®

ì
ï
í
ï
î

p_k

if l = (k+1)l₀ for some k = 0,1,...,m,

otherwise,

Hence, we show in Theorem that [^(l)]_n ® ^a.s. l₀ as n® ¥.

We also apply the Uniform Ergodic Theorem to prove in Proposition that

a.s.
®

l₀

å
s Î S

P(J₀ = s) argf_s,l₀(0) as n ® ¥.

Here the problem resides in identifying the limit. Proposition gives a result in this direction for a general sliding window, but it is based on a condition that is not clear or simple to verify. This condition is easily verified for a sliding window of width B=2. Theorem proves that [^(m)]_n ® ^a.s.m as n ® ¥.

Footnotes:

¹Under the direction of Dr. Simons from The University of North Carolina at Chapel Hill.

File translated from T_EX by T_TH, version 3.81.
On 04 Aug 2008, 12:23.

Ciclical time series with squeezed time

Ivan Pacheco Soto1

April 2005

Abstract

1 Introductory Remarks

2 Model

3 Literature Review

3.1 Spectral Analysis

3.2 Estimation of Cycles

4 Synopsis

4.1 Spectral Analysis

4.2 Estimates of the Parameters

4.3 Sliding Linear Fractional Transformations (SLFT)

4.4 Consistency

Footnotes:

Ivan Pacheco Soto¹