Gap filling and noise reduction of unevenly sampled data by means of the Lomb-Scargle periodogram

The Lomb-Scargle periodogram is widely used for the estimation of the power spectral density of unevenly sampled data. A small extension of the algorithm of the Lomb-Scargle periodogram permits the estimation of the phases of the spectral components. The amplitude and phase information is su ﬃ cient for the construction of a complex 5 Fourier spectrum. The inverse Fourier transform can be applied to this Fourier spectrum and provides an evenly sampled series (Scargle, 1989). We are testing the proposed reconstruction method by means of artiﬁcial time series and real observations of mesospheric ozone, having data gaps and noise. For data gap ﬁlling and noise reduction, it is necessary to modify the Fourier spectrum before the inverse Fourier 10 transform is done. The modiﬁcation can be easily performed by selection of the relevant spectral components which are above a given conﬁdence limit or within a certain frequency range. Examples with time series of lower mesospheric ozone show that the reconstruction method can reproduce steep ozone gradients around sunrise and sunset and superposed planetary wave-like oscillations observed by a ground-based 15 microwave radiometer at Payerne.


Introduction
Atmospheric data are often unevenly sampled due to spatial and temporal gaps in networks of ground stations, radiosondes, and satellites.Many measurement techniques such as lidar, solar UV backscatter, occultation depend on weather, solar zenith angle, or constellation geometry and cannot provide continuous data series.Unevenly sampled data have one advantage since aliasing effects at high frequencies are reduced, compared to evenly sampled data (Press et al., 1992).
There are several approaches to come over with undesired data gaps: linear and cubic interpolation and triangulation (appropriate for small gaps), spherical harmonics expansion, Kalman filtering and Bayesian cost functions (using a priori knowledge), and data assimiliation of observations into atmospheric general circulation models.Atmospheric parameters often have periodic oscillations due to forcing processes within the dynamical system of Sun, atmosphere, ocean, and soil.Thus, gap filling by spectral methods can also be considered as an efficient tool.Adorf (1995) performed a very brief and valuable survey on available interpolation methods for irregularly sampled data series in astronomy.Kondrashov and Ghil (2006) describe the application of singular spectrum analysis for spatio-temporal filling of missing points in geophysical data sets.The leading orthogonal eigenvectors of the lagcovariance matrix of data series are utilized for iterative gap filling.Ghil et al. (2002) give a detailed review on advanced spectral methods for reconstruction of climatic time series (e.g.singular spectrum analysis, maximum entropy method, multitaper method).
The review of Ghil et al. (2002) does not mention the Lomb-Scargle periodogram which provides the least-squares spectrum of unevenly sampled data series (Lomb, 1976;Scargle, 1982;Press et al., 1992).The Lomb-Scargle periodogram reduces to the Fourier transform in case of evenly sampled data.In presence of data gaps, the sine and cosine model functions are orthogonalized by additional phase factors (Lomb, 1976).Scargle (1989) investigated the reconstruction of unevenly sampled time series by application of the Lomb-Scargle periodogram and subsequent, inverse Fourier transform.The potential of the study of Scargle (1989) for gap filling of atmospheric data sets has not been recognized yet, maybe because the article has been published in an astronomical journal.Another reason is that most available computer programs of the Lomb-Scargle periodogram solely derive the spectral power density but not the phase or the complex Fourier spectrum.Hocke (1998) modified the Lomb-Scargle algorithm (Fortran program "period.f") of Press et al. (1992) in such a way that amplitudes and phases are returned as functions of frequency and tested the program by means of artificial time series.The modified algorithm has been applied in various studies, e.g. for estimation of amplitudes and phases of tides, planetary waves, and gravity waves in the mesosphere and lower thermosphere, and for investigation of energy dissipation rates in the auroral zone (Nozawa et al., 2005;Altadill et al., 2003; Ford et al., 2006;Hall et al., 2003).
The present study continues the basic idea of Scargle (1989) and utilizes the Lomb-Scargle periodogram for reconstruction of time series.We explain the derivation of the complex Fourier spectrum from the Lomb-Scargle periodogram (Sect.2).The complex Fourier spectrum should be reduced to its relevant spectral components before the inverse Fourier transform yields the equally spaced time series without gaps and without noise.The reconstruction method is tested with synthetic data and with real observations of lower mesospheric ozone (Sect.3).Particularly, the tests with time series of lower mesospheric ozone show that the reconstruction method is not only appropriate for gap filling.The reconstruction method also supports the interpretation of the data and retrieves the regular daily change of lower mesospheric ozone with a high temporal resolution which is required around sunset and sunrise.

Data analysis
The flow chart of the data analysis for reconstruction of data gaps and noise reduction is illustrated in Fig. 1.In the following, we explain the principle of the Lomb-Scargle periodogram.Special emphasis is put on the construction of a Fourier spectrum which is used for the inverse Fourier transform from the frequency domain back to the time domain.The selection of relevant spectral components has not been considered by Scargle (1989).This idea is related to the selection of principal components or leading orthogonal eigenfunctions of the lag-covariance matrix for filling of data gaps (Kondrashov and Ghil, 2006).

Lomb-Scargle periodogram
The Lomb-Scargle periodogram is equal to a linear least-squares fit of sine and cosine model functions to the observed time series y(t i ) which shall be centered around zero Introduction
where y(t i ) is the observable at time t i , a and b are constant amplitudes, ω is the angular frequency, n i is noise at time t i , and Θ is the additional phase which is required for the orthogonalization of the sine and cosine model functions of Eq. ( 1) when the data are unevenly spaced.
According to the addition theorem of sine and cosine functions, Eq. ( 1) can also be written as (2) This writing style shows that a cosine function with amplitude A and phase ϕ=Θ+φ is fitted to the observed time series.For the construction of the Fourier spectrum we need both, amplitude and phase.
The power spectral density P (ω) of the Lomb-Scargle periodogram is given by Printer-friendly Version Interactive Discussion σ 2 is the variance of the y series.The phase Θ is calculated with the four quadrant inverse tangent Scargle (1982) presented this formula of Θ as the exact solution for orthogonalization of the sine and cosine model functions of Eq. ( 1) in case of unevenly sampled data Equations (1-9) describe the Lomb-Scargle periodogram (Lomb, 1976;Scargle, 1982;Press et al., 1992;Bretthorst, 2001a).Scargle (1982) or Hocke (1998) explained that the variables R and I are closely related with the amplitudes of Eq. ( 1)

Construction of the complex Fourier spectrum
For calculation of the complex Fourier spectrum (as required for the FFT algorithm of the program language Matlab), the normalization of the spectral power density P (ω) of Eq. ( 3) is removed by multiplication of P with σ 2 (variance of the y series).The amplitude spectrum A F T is where n is the dimension of the complex Fourier spectrum F .It is better to start the time vector with t 1 =0 (before calculation of the Lomb-Scargle periodogram).Within the program period.f of Press et al. (1992), the phase is calculated with respect to the average time t ave =(t 1 +t N )/2.The change of the phase due to the time shifts can be easily corrected for each spectral component (Scargle, Introduction Conclusions References Printer-friendly Version Interactive Discussion 1989; Hocke, 1998).The phase ϕ F T of the complex Fourier spectrum (inclusive phase correction ωt ave ) is given by The real part of the Fourier spectrum is The imaginary part of the Fourier spectrum is The FFT-algorithm of the program language Matlab requires a complex vector F of such a format: The first number is the mean of the time series (zero mean), then comes the complex spectrum, and finally the reverse, complex conjugated spectrum is attached.
For the frequency grid, we used an oversampling factor ofac=4 in order to have a small spacing of the frequency grid points .The frequencies are at There are other small details, and the reader is referred to our Matlab program lspr.m which is an extension of the program period.f(Press et al., 1992) and which is provided as auxiliary material of the present study.

Reconstruction
Once the complex Fourier spectrum is in the right format, the data analysis is quite simple.The amplitude spectrum can be easily analysed and modified.For example, Introduction

Conclusions References
Tables Figures Full Screen / Esc

Printer-friendly Version
Interactive Discussion all spectral components with amplitudes below a certain confidence limit can be set to zero, and/or spectral components within certain frequency bands can be easily removed.Thus, there are many possibilities for modifications of the time series in the frequency domain, and several examples are shown in the next section.The inverse Fourier transform of the modified Fourier spectrum F m quickly provides the result in the time domain (Fig. 1).The real part of F −1 (F m ) is an evenly spaced time series with reduced noise and composed by the selected spectral components.We have not investigated yet, but it is likely, that the phase information could be utilized for modification of the Fourier spectrum.For example, the phase spectrum may contain valuable information on phase coupling of atmospheric oscillations.

Test with synthetic data
The periodic signal within the synthetic data series consist of 5 superposed sine waves with frequencies from 0.3 to 3 cycles/day.The time interval is 60 days long, and the sampling time is 20 min.A segment of the signal curve is shown in Fig. 2a.Random noise is added to the signal curve.The random noise is by a factor of 2.5 larger than the signals.The combined series of periodic signals, random noise, and data gaps are depicted in Fig. 2b.The series will be used for a test of data gap filling by the Lomb-Scargle periodogram.
For preparation, we subtract the arithmetic mean from the series of Fig. 2b and multiply the series with a Hamming window.Then we apply the data analysis as described by Fig. 1 and obtain the blue curve in Fig. 2c which is almost identical with the true signal curve (black line in Fig. 2c).For the sake of a better comparison of the blue and the black line, Fig. 2d shows a zoom of Fig. 2c.Our data analysis successfully reduced the noise of the red line in Fig. 2b which was the input series and correctly filled the large data gaps.The modification of the Fourier spectrum (Fig. 1) is not mentioned by Scargle (1989) and is described in more detail by the amplitude spectra of Fig. 3.The red line is the amplitude spectrum as obtained from the red time series of Fig. 2b by means of the Lomb-Scargle periodogram.The blue symbols denote the modified spectrum which is used for the inverse fast Fourier transform providing the blue line of Fig. 2c.All spectral components with amplitudes smaller than the dashed horizontal line of Fig. 3 are removed by setting their amplitudes to zero.
In case of geophysical data, confidence limits can be utilized for separation between relevant spectral components and noise.As already mentioned, more sophisticated methods of spectrum modification are also thinkable.In the following we give an example with real data, showing the consequence if the spectrum is not modified.The vertical distribution of ozone is retrieved from the recorded pressure-broadened ozone emission spectra by means of the optimal estimation method (Rodgers, 1976).A radiative transfer model is used to compute the expected ozone emission spectrum at the ground.Beyond 45 km altitude, the model atmosphere is based on monthly averages of temperature and pressure of the CIRA-86 climatology, adjusted to actual ground values using the hydrostatic equilibrium equation.Either a winter or summer CIRA-86 ozone profile is taken as a priori.The retrieved profiles minimize a cost func- tion that includes terms in both the measurement and state space.The altitude h is fixed in both the forward and inverse models, and altitude-dependent information is extracted from the measured spectra by reference to the used pressure profile.SOMORA retrieves the volume mixing ratio of ozone with less than 20% a priori contribution in the 25 to 65 km altitude range, with a vertical resolution of 8-10 km, and a time resolution (spectra integration time) of ∼30 min.More details concerning the instrument design, data retrieval, intercomparisons, and applications can be found in Calisesi (2000), Calisesi (2003), Calisesi et al. (2005), Hocke et al. (2006), and Hocke et al. (2007).Here, we use SOMORA measurements of ozone volume mixing ratio in the lower mesosphere at h=52 km having a time resolution of about 30 min.

Lower mesospheric ozone in summer and winter
A data segment with a length of 40 days and starting on 10 July 2004 is selected.The lower mesospheric ozone series of the SOMORA radiometer are depicted by the black line in Fig. 4. Two major gaps appear around day 6.5 and day 12.If we apply the data analysis (Fig. 1) without the modification of the spectrum, the red line is retrieved which well captures the original series.However the noise is not reduced and the gaps are just filled by the arithmetic mean of the time series.This is certainly not desired but the example gives us some insight into the Lomb-Scargle periodogram method.The variance of the evenly sampled series (red line) minimizes at the locations of the data gaps.
Reduction of the spectrum to its dominant spectral components seems to be necessary for an improved filling of the data gaps.Thus all spectral components with amplitudes smaller than the 98%-confidence limit (dashed horizontal line) are set to zero in Fig. 5 where the full spectrum is shown by the red line, and the modified spectrum is given by the blue symbols.Application of the fast Fourier transform to the modified spectrum yields the series of Fig. by blue and green vertical lines respectively.Due to fast recombination of atomic and molecular oxygen after sunset, the ozone volume mixing ratio is larger during nighttime.The reconstructed series well captures the regular daily variation.Additionally a planetary wave-like oscillation with a period of about 8 days is also present in the reconstructed series of Fig. 6.If we go back to the spectrum in Fig. 5, we see spectral components due to planetary wave-like oscillations (e.g.8-day, 2-day) as well as the diurnal, semidiurnal, and 6-h components.Because of these periodic signals, the reconstruction by spectral methods yields good results in Fig. 6.
Finally the reconstruction method is applied to lower mesospheric ozone observed at h=52 km in winter (December 2005) by the SOMORA radiometer.The daily variation of the red curve in Fig. 7 is different to Fig. 6, since the nights are longer in winter.Additionally, planetary wave activity is usually stronger in winter.However the red reconstructed series of Fig. 7 well represent the daily variations and the planetary wave-like oscillations in winter too.The major gap around day 13 is well filled by the Lomb-Scargle reconstruction method.We like to emphasize that the reduction of noise and the filling of data gaps are valuable for a better recognition of basic processes such as the impact of sunrise and sunset on lower mesospheric ozone.A good estimate of the regular daily variation of lower mesospheric ozone with a high temporal resolution is needed for analysis of gravity wave-induced short-term variance of ozone (Hocke et al., 2006).Digital filtering of the ozone series in the time domain cannot correctly distinguish between sudden changes of the ozone distribution around the solar terminator and gravity wave-induced short-term fluctuations of ozone.A reconstruction by means of spectral methods can solve this problem (Fig. 7).

Conclusions
The reconstruction of unevenly sampled data series with data gaps has been per- is reasonable for data series with various periodic signals.Compared to reconstruction methods in the time domain, the Lomb-Scargle reconstruction provides the amplitude and phase spectrum which are utilized for the modification of the time series in the spectral domain.Thus the data user has analytical support by the spectrum and a high degree of freedom for modifications, adjustable to specific data sets and retrieval purposes.
The reconstruction method has been successfully tested with synthetic data series and ground-based measurements of lower mesospheric ozone.The Lomb-Scargle periodogram is the basic element of the reconstruction method.Recently the Lomb-Scargle periodogram has been derived from the Bayesian probability theory, and the method of the Lomb-Scargle periodogram has been expanded for the nonsinusoidal case (Bretthorst, 2001a,b)

3. 2
Test with observational data of lower mesospheric ozone 3.2.1 SOMORA ozone microwave radiometer The stratospheric ozone monitoring radiometer (SOMORA) monitors the radiation of the thermal emission of ozone at 142.175 GHz.SOMORA has been developed at the Institute of Applied Physics, University Bern.The instrument was first put into operation on 1 January 2000 and was operated in Bern (46.95 N, 7.44 E) until May 2002.In June 2002, the instrument was moved to Payerne(46.82 N, 6.95 E) where its operation has been taken over by MeteoSwiss.SOMORA contributes primary data to the Network for Detection of Atmospheric Composition Change (NDACC).
6.The red line is the reconstructed series which has been retrieved from the black, original series.The data gap around day 12 is well filled.The time points of sunrise and sunset are independently determined and are marked 4612 formed by means of the Lomb-Scargle periodogram, with subsequent modification of the Fourier spectrum, and inverse fast Fourier transform.This reconstruction method Introduction

Fig. 1 .Fig. 2 .Fig. 3 .Fig. 4 .
Fig. 1.Flow chart of the reconstruction method: Phases and amplitudes of the spectral components of the series y(t) are calculated with a special version of the Lomb-Scargle periodogram(Hocke, 1998) and a complex Fourier spectrum F (ω) is constructed from these informations.The Fourier spectrum is modified, e.g.removal of spectral components which are below a certain confidence limit.The real part of the inverse fast Fourier transform of the modified Fourier spectrum F m (ω) provides the evenly spaced series y m (t) without noise and without gaps.

Fig. 5 .
Fig. 5. Complete amplitude spectrum of F (ω) (red) and modified spectrum of F m (ω) (blue) only considering values above the 98% confidence limit which is shown by the horizontal dashed line.The spectra are derived from the O 3 series of Fig. 4 by means of the Lomb-Scargle periodogram.The result of inverse Fourier transform of the complete spectrum F (ω) is shown in Fig. 4 by the red line.The result of inverse Fourier transform of the modified spectrum F m (ω) is shown in Fig. 6 by the red line.It is better to take the modified spectrum F m (ω) for reconstruction of the ozone series.

Fig. 6 .
Fig. 6.Comparison of the O 3 series (black) with the reconstructed series (red) using the modified spectrum of Fig. 5.The reconstruction method provides good predictions or estimates of O 3 VMR within the gaps of the original series, e.g.large gap around day 12.The daily variation of lower mesospheric ozone (h=52 km) is obviously well described by the variation of the reconstruced series (red): Rapid increase of ozone after sunset (time of sunset is indicated by the green vertical line) and rapid decrease of ozone after sunrise (blue vertical line).
. Possibly, the analysis and reconstruction of unevenly sampled data series by means of the Lomb-Scargle periodogram have potential for further developments and applications.Introduction