Lagrangian Gravity Wave spectra in the lower stratosphere of current (re)analyses

. Due to their increasing spatial resolution, numerical weather prediction (NWP) models and the associated analyses resolve a growing fraction of the gravity wave (GW) spectrum. However, it is unclear how well this “resolved” part of the spectrum actually compares to the actual atmospheric variability. In particular, the Lagrangian variability, relevant, e.g., to atmospheric dispersion and to microphysical modeling in the Upper Troposphere-Lower Stratosphere (UTLS), has not yet been documented 5 in recent products. To address this shortcoming, this paper presents an assessment of the GW spectrum as a function of the intrinsic (air parcel following) frequency in recent (re)analyses (ERA-Interim, ERA5, the ECMWF operational analysis and MERRA-2). Long-duration, quasi-Lagrangian balloon observations in the equatorial and Antarctic lower stratosphere are used as a reference for the atmospheric spectrum and compared to synthetic balloon observations along trajectories calculated using the wind and 10 temperature ﬁelds of the reanalyses. Overall, the reanalyses represent realistic features of the spectrum, notably the spectral gap between planetary and gravity waves and a peak in horizontal kinetic energy associated with inertial waves near the Coriolis frequency f in the polar region. In the tropics, they represent the slope of the spectrum at low frequency. However, the variability is generally underestimated, even in the low-frequency portion of the spectrum. In particular, the near-inertial peak, although present in the reanalyses, has a much reduced magnitude compared to balloon observations. We compare the variability of temperature, zonal momentum ﬂux and vertical wind speed, which are related to low, mid and high frequency waves, respectively. The distributions (PDFs) have similar shapes, but show increasing disagreement with increasing intrinsic frequency. Since at those altitudes they are mainly caused by gravity waves, we also compare the geographic distribution of vertical wind ﬂuctuations in the different products, which emphasizes the increase of both GW variance and intermittency with horizontal resolution. Finally, we quantify the fraction of resolved variability and its dependency on model resolution for the different variables. In all (re)analyses products, a signiﬁcant part of the variability is still missing and should hence be parameterized, in particular at high intrinsic frequency. Among the two polar balloon datasets used, one was broadcast on the global telecommunication system for assimilation in analyses while the other is made of independent observations (unassimilated in the reanalyses). Comparing the Lagrangian spectra between the two campaigns shows that they are largely inﬂuenced by balloon data assimilation, which especially enhances the variance at low frequency.

(2019), like other authors, implicitly assumed that most of the GWs seen in the observations were absent from the original reanalysis products. In the present paper, we compare GW induced fluctuations in modern (re)analyses from the same point of view as SPB, i.e. a Lagrangian point of view, to determine which part of the GW spectrum they resolve and how this resolved part compares with observations. Particular focus will be spent on the intermittency of the gravity wave fluctuations in the (re)analyses. 5 Besides the interest for studies using Lagrangian trajectories, the comparison presented here also serves another purpose, this time from the point of view of model developers. For different models there are different criteria of success, a priori emphasizing the representation of the large-scale circulation. As the resolution of the models increases, a richer array of phenomena is present in the resolved fields and the question arises whether or not to assimilate information on these processes. In that respect, assessing model errors on the GW field, which is described in both observations (e.g. radiosondes, GPS temperature 10 profiles) and model output, is essential from the modelers' point of view. The long-duration balloon dataset provides a unique opportunity to perform such a task and assess the realism of modeled GW in the lower stratosphere.
The paper is organized as follows. Section 2 presents the balloon dataset and reanalyses used, as well as the comparison methodology. Then, the results and comparisons are presented in Sect. 3 and discussed in Sect. 4. Finally, summary and conclusions are provided in Sect. 5. 15 2 Datasets and Methodology

Long-duration balloon observations
Most observations of the atmosphere are obtained either at a fixed location (Radar, lidar) or in motion relative to the flow (satellite, aircraft). In both cases, the relative speed between the measuring instruments and the air hampers direct Lagrangian analysis of flow variability. One of the platforms which overcome that limitation are long-duration superpressure balloons 20 (SPB), which provide observations in a quasi-Lagrangian frame of reference (e.g. Hertzog et al., 2002;Podglajen et al., 2016a).
In the present study, we use SPB measurements as a reference to evaluate the representation of Lagrangian (quasi intrinsicfrequency) spectra. The observations were gathered in the Lower Stratosphere (16-20 km above sea level) during three SPB campaigns coordinated by the French Space Agency (CNES): Vorcore, PreConcordiasi and Concordiasi. Table 1 summarizes the location and timing of the campaigns, as well as the sampling frequency and status regarding data assimilation. Vorcore 25 and Concordiasi took place in the Southern polar vortex during austral spring, while Preconcordiasi flights were launched from an equatorial location (the Seychelles) in boreal spring. Whereas data from the later Concordiasi campaign was broadcast on the Global Telecommunication System (GTS), the earlier Vorcore and PreConcordiasi observations are independent datasets which can be used to evaluate reanalyses (Boccara et al., 2008;Podglajen et al., 2014). For that reason, we will focus in Sect. 3 on Vorcore and PreConcordiasi, while Concordiasi will be used to assess the impact of balloon data assimilation in Sect. 4.1. 30 During the three campaigns, a whole set of measurements was performed, including e.g. ozone and particle measurements; for our purpose, however, only the horizontal position of the balloon from the on-board GPS, and pressure and temperature from the Thermodynamic SENsor (TSEN) will be used. Apart from occasional changes in the mass or volume of the system (such as dropsonde launching, changes in the incoming radiation flux) and balloon inertia acting at periods shorter than 15 min (Vincent and Hertzog, 2014;Podglajen et al., 2016b), SPB are essentially passively advected on isopycnic (constant-density) surfaces once they have ascended to their equilibrium level in the stratosphere. This quasi-Lagrangian behavior renders measurement interpretation relatively simple. Horizontal winds u and v are deduced from the successive positions of the balloon, while pressure is directly measured by the TSEN 5 meteorological sensors. In case of slightly uneven time sampling (varying sampling time step), the raw measurements were interpolated linearly onto a regular time grid. A fast Fourier transform algorithm was then applied to the time series, yielding the signal's Fourier transform (û(ω),v(ω),p(ω)). Periodograms are then obtained directly from |û(ω)| 2 , |v(ω)| 2 and |p(ω)| 2 .
In practice, we use a variant of Welch's method and estimate spectra by averaging periodograms obtained from 8-day windows with 4 days overlaps. Note that for consistency with, e.g., Fritts and Alexander (2003), our definition of the intrinsic-frequency 10 Fourier transform is such that the inverse transform reads: While this sign convention does not affect the periodograms (estimated from squared modulus quantities and insensitive to the phase), it matters as far as the phase of the signal is concerned, notably for the polarization relations relating the Fourier transforms of the different variables. 15 Combining the spectra of |û| 2 (ω) and |v| 2 (ω) leads to the horizontal kinetic energy spectrum E k h (ω) per unit mass: On the other hand, pressure fluctuations p along the balloon trajectory can be used to estimate the vertical displacement of isopycnic surfaces ζ ρ and isentropic surfaces ζ θ along air-parcel trajectories (see Hertzog et al., 2002;Podglajen et al., 2014Podglajen et al., , 2017: where ρ is the average segment density, α = g Cp + dT dz g R + dT dz depends on the local temperature lapse rate dT /dz (g is the gravitational acceleration, C p the thermal capacity of air at constant pressure and R the gas constant for air). dT /dz is not directly measured but interpolated from the ECMWF ERA5 reanalysis; the sensitivity to the exact value of dT dz is small and does not affect the conclusions presented below. Relation 3 relies on a few assumptions (small vertical excursions, small Eulerian pressure perturbation relative to temperature perturbations, adiabatic and hydrostatic flow) which are well-met for 5 the intermediate-frequency (periods between 1 day and 15 minutes) motions of interest here (as they are the ones resolvable by the reanalyses). Using Eq. 3, the potential energy spectrum per unit mass E p can be deduced from the estimated pressure spectrum: As an illustration, Figure 1 (updated from Podglajen et al. (2016a)) shows the intrinsic-frequency spectra (power spectral densities) of E k h , E p and of the zonal momentum flux per unit mass u w for the Concordiasi (polar) and PreConcordiasi (tropical) campaigns, which have the highest sampling frequency (Table 1). The important characteristics that are present in the observations and will be searched for in the reanalyses are: in the polar region, a spectral gap at f between low frequencies and GW frequencies, seen in both E p and E k h . Then, a local peak in E k h at frequencies just higher than f , almost absent in E p . In that frequency range (f <ω < 4 f ), in the tropics and in the mid-frequency range (ω f ) for the polar latitudes, the spectra scale with a power-law behavior ω −s with s 2 for E p and E k h , and s 1 for |u w |. This scaling appears untilω 100 cy/day.
Above about 100 cy/day, the balloon-observed E p and |u w | increase and peak near the Brunt-Väisälä frequency N . Although there exist potential physical reasons for a spectral peak of momentum and potential energy near N in the atmosphere 20 (Podglajen et al., 2016a), part of the balloon-observed enhancement is likely an artifact caused by the non-isopynic response of the platform (Podglajen et al., 2016a;Vincent and Hertzog, 2014). Furthermore, due to their expected small horizontal scale and the importance of non hydrostatic effects (absent from hydrostatic weather models), we can safely assume that buoyancy waves (near N ) are absent from hydrostatic, low resolution analysis weather forecast models. Hence, we will focus our analysis on intrinsic frequencies smaller than 48 cy/day (periods larger than 30 min). For pressure observations, there is a significant 25 aliasing at 48 cy/day from higher-frequency motions (in particular near the N -peak); this is overcome by averaging pressure observations at the highest-available frequency (1 min for Vorcore, 30 s for Concordiasi and PreConcordiasi, see Table 1) with a 15-minute Hann window weighting function and using this subsampled, low-pass-filtered version of the data in the subsequent analysis. the PreConcordiasi and Concordiasi balloon campaigns). The model version and spatial resolution of the different products are summarized in Table 2, together with the time resolution at which the outputs (reanalyses and forecasts) are saved and/or made available. For more details regarding the model set-ups, physical parameterization and data assimilation flow, we refer the reader to the S-RIP introduction paper (Fujiwara et al., 2017).

Atmospheric (re)analyses
Due to finite storage ability, the (output) time sampling is rather coarse (hourly at the best). This may have been a limiting 5 factor for our study: GWs indeed have intrinsic periods ranging from 12 hours down to a few minutes (see Fig. 1), so that with fields output every 3 or 6 hours the dominant fraction of GW variance is aliased towards lower frequencies, thus potentially affecting our estimates. However, this limitation is mitigated by the fact that we analyse spectra as a function of intrinsic frequencyω: 10 which combines the ground-based frequency of the motion ω and its horizontal scale (given by the zonal and meridional wavenumbers k and l). Investigations of different time sampling with ERA5 (for which hourly outputs are available) demonstrated that, while in polar regions the considered intrinsic frequency spectra are strongly sensitive to a change of time sampling from 6 to 3 hours, even more frequent (than 3-hourly) sampling only marginally affected the results. In light of this, the main body of the paper will not present the results obtained with JRA-55, available only every 6 hours (see Table 2) for which 15 our spectral analysis suffers from aliasing. For a fair comparison, all (re)analyses (including ERA5) will be used at a time resolution of 3 hours, except in Appendix A where the impact of the output frequency is investigated.

Comparison methodology
A simplistic approach to compare balloon observations and reanalyses would be to interpolate the model fields along the actual balloon trajectory. However, this might lead to erroneous conclusions regarding the variability present: if for instance the balloon were to record a vertical oscillation with a sheared flow u in the reanalysis, this may lead to an oscillation in the interpolated u wind although they might be absent from an actual trajectory computed with the reanalysis wind. To avoid that 5 complication, we directly computed isopycnic (balloon-like) trajectories using the reanalysis fields. In other words, we solve the following system of ordinary differential equations: the air density ρ being a strictly decreasing function of geometric altitude. In practice, System 6 is solved with ln(ρ) as the vertical coordinate, and the 2D trajectories are integrated using a Runge-Kutta scheme of order 4 and a time step of 1 minute, 10 adjusted when needed to satisfy the CFL criterion. We note that the dependency of the trajectory on the integration time step and the details of the numerical scheme is small compared to other sources of errors (Bowman et al., 2013). Gridded reanalyses fields (typically p, T , u and v) are interpolated in the horizontal and time dimension using cubic splines, leading to vertical profiles; then the vertical coordinate ln(ρ) is calculated and finally the wind is interpolated at the density level of the balloon. To examine the Lagrangian variability and estimate spectra, we calculate 8-days trajectories started at the balloon position every 15 4 days, thus matching the segments used in the observations. Examples of such trajectories for ERA5 reanalysis are displayed in Fig. 2.
At this point, two remarks should be made. First, regarding trajectory accuracy: while it is affected by the sampling frequency (in space and time) of the model and the interpolation method used (Stohl et al., 1995;Bowman et al., 2013), the main source of uncertainty in the lower stratosphere stems from errors in the reanalysis fields (Boccara et al., 2008;Podglajen et al., 2014). 20 In polar regions, the analyses compare well to observations at lower stratospheric altitudes (Boccara et al., 2008), so that there is generally not much difference between observed and simulated balloon trajectories over periods of ∼8 days, as illustrated in Fig. 2 for the case of ERA5. In the tropics, on the contrary, analyses may exhibit large deficiencies (Podglajen et al., 2014;Dharmalingam et al., 2019), and the computed trajectories can largely diverge from the observed ones. However, our results are not overly sensitive to the exact number and location of trajectory segments used. Errors in the large-scale wind are deemed very 5 unlikely to generate a sampling bias and artificially degrade GW variability (which they could do in particular and improbable cases, e.g., by making the balloons drift systematically in quieter areas than those where they actually flew).  The calculated trajectories are 3D-kinematic (with omega velocity, black), 3D-diabatic (with diabatic heating rates, orange), 2D-isentropic (blue) and 2D-isopycnic (red) trajectories; all were computed using ERA5 wind and temperature field at 3-hourly output resolution. The observed trajectory of the balloon is shown in grey. (Right) Pressure time series corresponding to the horizontal trajectories on the left panel.

120°W
Please note that the y-scale only covers a narrow range of altitudes (roughly 1 km maximum deviation peak-to-peak, i.e. about 3 model levels for ERA5).
Second, we note that physical applications and process studies are interested in air parcel trajectories, which are a priori distinct from the isopycnic trajectories of the balloons. However, as argued above and in Podglajen et al. (2016b), the horizontal positions of isopycnic, isentropic and true air parcel trajectories should remain close (relative to the distance traveled) at 10 synoptic time scales (a few days), and Equation 4 is expected to be a good approximation in the stratosphere with low diabatic heating. In order to verify this, we compared the 8-day trajectories and spectra for isopycnic, isentropic (constant potential temperature θ) and full-3D trajectories, computed either in p coordinate with ω = Dp/Dt as vertical velocity (kinematic tra-jectories) or with θ coordinate andθ vertical velocity (diabatic trajectories). The different horizontal and vertical trajectories for the special case of Vorcore balloon 2 on 2005/11/02 are displayed on Fig. 2. Figure 2 demonstrates the validity of our approximations for ERA5: the 3 trajectory types only slightly diverge from one another in the horizontal. Although as expected the isopycnic trajectory diverges more rapidly from the 3D trajectory than the isentropic one, for an integration time of 8 days the paths remain close. While their vertical positions are clearly distinct, the 5 isentropic, 3D-kinematic and isopycnic (scaled onto the isentropes using Eq. 3) trajectories are highly correlated at short time scales. As examined in Appendix B, the estimated Lagrangian energy spectra exhibit only slight differences. Hence, for a more direct comparison to balloon observations, we considered isopycnic trajectories (adjusted using Eq. 3) in the following.
3 Results: Intrinsic frequency spectra of Lagrangian fluctuations In Fig. 3 are depicted the intrinsic frequency spectra (power spectral density or PSD) of E k h , E p and the zonal pseudo- from isopycnic trajectories computed using 3-hourly wind data from the different (re)analyses. The trajectories are output every 15 minutes, so that from the point of view of the sampling, the highest resolvable frequency (Nyquist frequency) in the spectra is 1/48 cy/day. However, we should recall that, as mentioned above and explained in Appendix A, the effective resolution in terms of intrinsic frequency actually depends the limited time and the space resolution of the (re)analyses. 15

Horizontal kinetic energy spectra E kh (ω)
The top two panels of Fig. 3 show the E k h spectra (solid lines) obtained from polar (Vorcore, left) and equatorial (PreConcordiasi, right) trajectories for the balloon observations (black lines) and the reanalyses (colors). In the polar case, low frequency features (below the Coriolis frequency f ) are well-represented all reanalyses. However, in the gravity wave frequency-range (above f ), the variance in E k h is largely underestimated in all 3 systems compared to observations. Nevertheless, the reanalysis 20 E k h -spectra exhibit typical features similar to the observations, including: 1. a local minimum in E k h between f /2 and f , the characteristic spectral gap separating GW (ω > f ) from synoptic-scale motions (ω < f ) 2. a local maximum in E k h around 1.2 − 1.5 f (the near inertial peak, see Hertzog et al. (2002)) 3. a power-law decrease in E k h for frequencies larger than 1.5 f 25 The differences between reanalyses and observations mainly concern: 1. the frequency at which the peak around f in E k h (ω) occurs and its magnitude.
2. the power-law slope of the decreasing E k h forω f . The observations have the shallower slope, followed by ERA5 and ERA-Interim, MERRA-2 having the steepest slope.
Despite the quantitative disagreement between the different reanalyses, they also exhibit qualitative similarities with one another and with the observations. In particular, they all show a near-inertial E k h -peak, consistent with observations (Hertzog et al., 2002). In order to further investigate the agreement between the resolved perturbations and gravity wave theory, we examine to what extent the polarisation relation for inertio gravity waves are fulfilled. On the f -plane, with the convention given by Eq. 1, the polarization relation for the horizontal wind reads (Fritts and Alexander, 2003): whereû is the amplitude of the horizontal wind along the wave vector andû ⊥ the amplitude perpendicular to the wave vector. Keeping in mind the convention expressed by Eq. 1, Equation 7 indicates that low-frequency waves (near f ) induce anticyclonic flow rotation whereas high frequency waves have their horizontal wind perturbation aligned with the wave vector and no preferred rotation direction. To make use of this property and further demonstrate that the near-inertial peak in the 10 analyses is due to inertial oscillations, we turn to rotary spectral analysis. The rotary Fourier transform of the horizontal wind, As a consequence of Eq. 7, the rotary ratio R(ω), ratio of the PSD of anticlockwise horizontal motions (|Û (−ω)| 2 with our Fourier transform convention in Eq. 1) and clockwise ones (|Û (ω)| 2 ), verifies the relation: Note that the rotary ratio depends on the sign of f , so that the rotation direction is reversed in the southern hemisphere with respect to the northern hemisphere, with anticyclonic motions being always favored.
Equation 9 has been exploited in previous studies (e.g. Hertzog et al., 2002;Conway et al., 2019) to evaluate the consistency of the observed horizontal wind spectrum with linear gravity wave theory. In particular, the dominance of anticyclonic (here 20 in the Southern hemisphere) motions at low frequencies is consistent with Eq. 9 and indicates the importance of linear gravity waves, opposed to, e.g. stratified turbulence for which no systematic phase relation between the horizontal wind components is expected. In Fig. 3, R(ω) is displayed together with its theoretical value ratio given by Eq. 9. The dominance of anticyclonic motions is a further argument in favor of inertio-GW being responsible for the f -peak in both reanalyses and observations.
However, we note that it is less pronounced in the reanalyses than in the observations, and in ERA5 than ERA-Interim and 25 MERRA-2. Possible reasons for this will be examined in the next section.
In the tropics, the statistical agreement between reanalyses and observations regarding the representation of high-frequency variability is surprisingly better than in the polar latitudes. This contrasts with the representation of large-scale wind field which is better in polar regions (Boccara et al., 2008) than in equatorial ones (Podglajen et al., 2014). The agreement obtained for lowfrequency waves is quantitative as well as qualitative, and reaches up to a cut-off frequency which is about 4 cycles/day for the 30 ERA-Interim, ERA5 and the ECMWF operational analysis, and 2 cycles/day for MERRA-2. Below this cut-off frequency, the observed and modeled age spectra are in quantitative as well as qualitative agreement. At the cut-off frequency, the reanalysis spectra show a kink and above it they drop with much larger slopes than the observed ones, which keep decaying with a constant slope.

Potential energy spectra
The middle two panels of Fig. 3 show the spectra of the potential energy per unit mass E p , for the polar (left) and tropical (right) 5 flights. To a large extent, the situation is similar to the one described above for E k h : in the polar case, there is a qualitative agreement in the structure of the power spectra (notably regarding the spectral gap around f ), but a quantitative disagreement and a steeper power law-slope than in the observations.
Besides the direct value of the potential energy, it is again interesting to investigate whether the fluctuations in reanalyses verify the polarisation relation expected for gravity waves. In the absence of constructive interference, the ratio of potential to 10 horizontal kinetic energy for frequenciesω N should obey (Podglajen et al., 2016b):

Figure 3 exhibits
Ep(ω) for the 3 reanalyses and the observations (dashed lines), together with its theoretical value given in Eq. 10. The observations again closely follow theoretical expectations from ∼ 1.2 f , with a dominance of kinetic energy, to higher frequencies (the so-called mid-frequency range) where there is an equipartition between E p and E k h . The variability 15 in reanalysis trajectories shows the same evolution with frequency as the observations, and the E k h /E p ratio decreasing with increasing frequency. However, reanalysis also suffer from a significant overestimation of the kinetic energy compared to the potential energy. This is the case by a factor 1.5 to 3 for ERA5, and 5 for the ERA-Interim and MERRA-2. Together with the rotary ratio analysis, this suggests that a significant fraction of the variability does not obey the observed (and expected) polarisation relations for gravity waves. One might speculate that this inconsistency stems from numerical dissipation. 20 In the tropics, as for the E k h spectra, the E p spectra from reanalyses and observations are in better agreement than over the pole. In particular, the E k h /E p ratio is close to 1 for the reanalyses, as expected, whereas there is a quantitative agreement between observed and analyzed E p up to a threshold frequency similar to that encountered with E k h . Beside the reanalyses, for the tropical case the ECMWF operational analysis trajectories are also displayed, which shows similar quantitative results as for the ERA5 reanalysis.

EP flux and vertical wind spectra
The bottom two panels of Fig. 3 display the zonal pseudo-momentum 1 − f 2 /ω 2 |u w | flux spectra of resolved waves in the reanalyses. This quantity characterizes the forcing of the large-scale flow by the waves so that those panels enlighten us regarding the missing gravity wave drag from resolved waves compared to observations. Over the pole, the reanalyses underestimate the variability in the whole GW range, with ERA5 being the closest to the 30 balloon observations. Although the number of reanalysis products is limited, there appears to be a correlation between the fraction of variability resolved at low frequency and the vertical resolution of the product (given in Table 2). With the highest vertical resolution, ERA5 resolved the largest fraction of the variability, followed by ERA-Interim and MERRA-2 with coarser resolutions.
In the equatorial region, on the other hand, similar to the E k h and E p spectra, the momentum flux by low-frequency equatorial GW is comparable to observations in all products, up to the threshold frequency where it drops with a slope twice as large 5 as the actual slope.
We do not show the vertical wind spectrum, but its shape can be readily deduced from the potential energy spectrum as E kv = 1 2ŵ 2 ∝ ω N 2 E p , so that the inequal performance of the reanalysis can be deduced from the middle panel of Fig. 3. It should be noticed that the different quantities considered have changing power-law slopes in the gravity wave range fromω −2 for the horizontal kinetic energy E k h and potential energy E p power spectra, toω −1 for the EP flux spectrum 1 − f 2 /ω 2 |u w |, 10 and toω ∼0 for the vertical kinetic energy E kv . As a consequence, the variability of different fields emphasizes different parts of the spectrum: while u, v (E k h ) and T (E p ) are more connected to the low-frequency part of the gravity wave spectrum, 1 − f 2 /ω 2 |u w | corresponds to the intermediate frequencies, and the vertical wind component w (related to E kv ) to the high-frequency waves. 15 The PSDs examined above provide information on the autocorrelation and the repartition of the fluctuations as a function of intrinsic frequency. However, as pointed out by several authors (e.g. Hertzog et al., 2012;Podglajen et al., 2016b), the probability distributions are also relevant to determine whether the variance is due to ubiquitous, constant variability generated by the superposition of random waves, or if rare, large excursions created by specific wave events can occur. The latter behavior is expected for GW which are known to be intermittent.

Intermittency of the fluctuations
Similar to what was used for the Lagrangian spectra, we compare the distribution of the fluctuations in the reanalyses to the observations based on computed isopycnic trajectories corrected for the non-isopycnic behavior (by using the coefficient α defined in Eq. 3). In order to focus on the GW range, we filter the outputs of the trajectories to keep only the signal corresponding to intrinsic frequencies shorter than 48 cycles/day and larger than f (30 • )/(2π) 1 cycle/day (tropics) or f (70 • )/(2π) 2 25 cycles/day (pole). The lower bound is considered as the lower bound of the GW frequency range in the regions studied, while the upper bound corresponds to the Nyquist frequency of Vorcore balloon position information (every 15 minutes). It is below the frequency at which the balloons start to depart significantly from the isopycnic behavior, so that the comparison can only be marginally affected by the non-isopycnic balloon response. In that frequency range [ f 2π ; ∼ N 4π ], we consider temperature, zonal momentum flux and vertical velocity to characterize statistics and intermittency of respectively low, medium and 30 high-frequency waves. Figure 4 shows the probability distribution (PDF) of the three quantities (temperature, zonal momentum flux and vertical wind) for all the (re)analyses considered. From top to bottom, the considered fluctuations are primarily induced by waves of increasing frequency; |T l | 2 ∝ E p (ω) emphasizes the low frequencies while |u w | ∝ωE p (ω) and |w | 2 ∝ω 2 E p (ω) are related to increasing frequencies. Since high frequency waves are more poorly represented than low frequency ones (Fig. 3), it comes with no surprise that the PDF of analyzed and observed fluctuations are in increasing disagreement from temperature to momentum flux and from momentum flux to vertical velocity in both tropical and Southern polar regions.
The width of the temperature PDFs for the polar and tropical regions is redundant with the level of the variance in the E p 5 spectrum already shown in Fig. 3. The additional information provided in Fig. 4 concerns the shape of the PDF. In the tropics, GW temperature fluctuations are ubiquitous and characterized by a Gaussian PDF. This behavior is fairly well reproduced in all reanalyses examined here. In contrast, in the pole, the balloon PDFs are no longer Gaussian. In particular, they have longer tails than Gaussian PDFs, a behavior which is reproduced by reanalyses.
Regarding the momentum flux, the tropical and polar PDFs both exhibit log normal PDFs, as well as the reanalyses. We note 10 that the tropical regions during PreConcordiasi show more activity than the polar regions during Vorcore, but this is largely due to a wider GW range in the tropics, where f → 0. Finally, the vertical wind PDFs show Laplace distributions in both cases, and the order of the reanalyses is the same as for the rest. In summary, while the fraction of resolved variance varies from one analysis to the other, it appears that the basic intermittency properties of the GW field and the shape of the PDF of the fluctuations are consistent between observations and the different reanalyses. 15 It should also be noticed that the conclusion regarding the resolved fraction of variability for the different reanalyses are transferable from one campaign to the other. In other terms, for all quantities and regions examined here, we find that the more realistic reanalysis is ERA5 followed by ERA-Interim and MERRA-2. The 2010 ECMWF operational model performs better than ERA5, likely because of its superior horizontal resolution. 20 We have considered above the intermittency of GWs sampled by the balloons along their trajectories. This intermittency stems from both space and time variability of the GW field in which the balloons drift. A significant part of it comes from the geographic variability of wave sources, as illustrated in Fig. 5 through global maps of the vertical wind standard deviation at 100 hPa (related to high frequency wave activity) for the month of March 2010. The geographic structure is similar between the different reanalyses, which is consistent with the fact that intermittency is similar in the different products (see the shapes 25 in Fig. 4). GW amplitudes largely differ from one product to the other, but the geographic variability is to a large extent common to all reanalyses. In the season examined (boreal spring 2010), mountain ranges such as the Rockies, the Andes, the Himalaya or the Antarctic peninsula stand out as regions of large activities. Convective regions such as the Intertropical Convergence Zone are also characterized by a larger activity in all products. However, although the general geographic pattern of σ w do match between the different products, there are differences in the details. In particular, the features are sharper and 30 smaller scale (especially around orography) in the high resolution products (the operational analysis and ERA5) than in the lower resolution reanalyses (ERA-Interim, MERRA-2 and JRA-55). Although this property might be expected, it implies that intermittency is increased in those products.

Geographic distribution of the fluctuations
Since the different resolutions in the reanalyses imply different horizontal wave numbers which may undergo different filtering and vertical propagation, we present a vertical profile of σ w in the tropics for the different products in Fig. 6, panel a). There is a drastic reduction in σ w from the troposphere to the stratosphere, related to the increased stratospheric stability hampering vertical motions there. Fig. 6, panel b) displays the same σ w equatorial profile but normalized by the standard deviation from the ECMWF analysis. Although they do not resolve the same wave population, the different analyses essentially 5 show similar vertical structure in GW variance, except ERA5 for which σ w is relatively reduced around the tropopause. This is likely due to the increased stability in that region in the high vertical resolution ERA5, which can better resolve strong vertical gradients near the tropopause.

Discussion
4.1 Impact of balloon data assimilation 10 Balloon observations from the PreConcordiasi and Vorcore campaigns were not assimilated in the reanalyses, so that they provide an independent evaluation of the resolved GW variability. However, data collected during the Concordiasi campaign, which took place in austral winter 2010, was broadcast on the GTS and assimilated in most analyses, including ECMWF operational data, ERA5 and ERA-Interim, and MERRA-2 (Hoffmann et al., 2017). Hence, comparing the period of the (assimilated) Concordiasi campaign with that of the (unassimilated) Vorcore provides an opportunity to characterize the extent to which 15 balloon data assimilation in the atmospheric models impacts the representation of the wave field (either through spurious wave generation or realistic waves introduced in the initial state that propagate in the forecast). Since Concordiasi only flew in austral spring 2010, GW activity in the 2010 southern lower stratospheric polar vortex is not typical but largely influenced by data assimilation. However, with the development of the Loon dataset of superpressure balloons (Conway et al., 2019) and studies regarding its assimilation in NWP models (Coy et al., 2019), it is interesting to document the impact of such a SPB dataset. 20 The E k h spectra for the two campaigns are shown in Fig. 7. There are slight differences between the two observed spectra (black lines), notably with a more pronounced f -peak during Vorcore than Concordiasi. Those are likely due to the different latitudinal sampling during the two campaigns, with more variable f for Vorcore trajectories than Concordiasi's (see the shaded grey area in Fig. 7). However, the main differences between the two campaigns does not lie in the observed spectra but rather in the analyzed ones. Whereas the reanalysis E k h spectra during Vorcore largely underestimate E k h and in particular the 25 magnitude of the spectral peak near f , there is a clearer spectral peak in the reanalyses with assimilated balloon data. This better performance of the reanalyses during Concordiasi shows that besides adjusting the state of the flow, assimilation of the balloon dataset statistically reinforces some dynamical ingredients in the simulation (here inertial waves).

Time sampling
Proper representation of the GW variability requires sufficient time resolution, in particular to avoid aliasing of the spectral gap and the f -peak. As mentioned above, this is not as much of a critical issue for the model time step (which is small enough not to filter out the GW spatially resolved) as for its time sampling. The time sampling of the reanalysis should indeed be high 5 enough to isolate unequivocally the highest ground-based frequency present in the simulation. Furthermore, this parameter is one of the most sensitive in trajectory calculations (e.g. Pisso et al., 2010;Bowman et al., 2013), at least with the spatial resolution and integration time step we used. This is especially the case since the winds are instantaneous fields rather than time averages, which implies aliasing when subsampling (e.g. Stohl et al., 1995;Bowman et al., 2013). In order to test the impact of this key parameter on our Lagrangian spectrum estimation, we performed trajectory calculations with ERA5 varying the 10 time sampling of the reanalysis outputs between 6, 3 and 1 hour. We use the instantaneous ERA5 wind values and do not apply any time averaging which would reduce aliasing but cannot be applied to products with coarser temporal sampling (Hoffmann et al., 2019).
The results of this experiment are presented in Appendix A and can be summarized as follows. First, improving the time sampling of the reanalysis from every 6 to every 3 hours induces large changes in the Lagrangian GW spectrum over the pole 15 and enables inertial oscillations at periods of ∼ 12 hours to be resolved. On the contrary, a further improvement of the time resolution from every 3 hours to every 1 hour has little noticeable effect, suggesting that a large fraction of the resolved highintrinsic frequency GW activity is actually caused by the background wind moving the air parcels across "quasi-stationary" wave features with ground-based periods larger than about 6 hours. 20 The nominal resolution of the model, given in Table 2, largely controls the magnitude of GW fluctuations in reanalyses. This is shown in Figure 8, which displays the fraction of resolved variance in the 2-48 cy/day intrinsic frequency range in the reanalysis products compared to balloon observations, for the horizontal kinetic energy E k h , the potential energy E p , the zonal pseudo-momentum flux 1 − f 2 /ω 2 |u w | and the vertical kinetic energy E kv . Considering the (re)analyses from ECMWF (i.e. ERA-Interim, ERA5 and op. ECMWF), the sorting of the resolved variance reflects the horizontal resolution 25 of the products, for all considered variables. MERRA-2 stands out in this consideration: although its nominal resolution(∼ 60 km, see Table 2) is better than that of ERA-Interim (∼ 79 km), it systematically resolves a smaller fraction of GW variance.

Impact of vertical and horizontal resolution
However, this reanalysis has a non-spectral grid, which likely implies some additional diffusion to ensure numerical stability, thus reducing the effective resolution of the model (Holt et al., 2016).
Although the dependency on the reanalysis horizontal resolution is present for all variables considered, it is more pronounced 30 at the lowest intrinsic frequency (i.e. for the variables on the left in Fig. 8). Indeed, the dependency on horizontal resolution appears to be stronger for E k h than for variables with variance primarily contained at largeω (u w and E kv ). This is expected when acknowledging thatω is loosely related to the horizontal wavenumber: while E k h and E p are already partly resolved in low resolution products and do not depend as much on resolution, E kv increases strongly when including the additional small scales brought by the high resolution. Along the same line of argumentation, we note that the fraction of resolved variance is larger for the variables depending on low-frequency waves (E k h and E p ) compared to those whose variance is focused at high frequency. In particular, the fraction of resolved E p is closer to the fraction of resolved E k h in the tropics than over the pole, where there is a relative increase in E k h variance at low frequency which is better resolved than the variability at higher 5 frequency. The difference seen between the tropical and polar region in terms of resolved E k h and E p variance is almost absent for E kv .
Since GW occur at the mesoscale where the grey zone of global models currently lies, the strong sensitivity to horizontal resolution is expected: models with higher resolution are able to resolve a larger fraction of the spectrum due to their higher cut-off wavenumber. For a fair comparison of the datasets with different resolutions, we also considered a version of ERA5 10 truncated in spectral space to match the resolution of ERA-Interim (truncature T255). The GW present are those that can be represented on the ERA-Interim grid; differences hence arise from the different vertical resolutions, propagation properties and sources of large-scale horizontal waves. Note that this does not remove the impact of horizontal resolution entirely, since it still plays a role in the generation and propagation of the resolved waves, but rather provides a lower bound estimate of this impact.
The corresponding map of vertical wind standard deviation σ w = 2 E kv is shown in Fig. 5, panels d). Two interesting results 15 emerge from this exercise. First, the confinement of regions of high GW activity is similar between close-to-native resolution ERA5 and its spectrally-truncated version, but much less pronounced in ERA-Interim where regions of large GW activity are more spread around. Second, although they effectively have the same resolution, the truncated ERA5 has enhanced variance compared to ERA-Interim.
Besides horizontal resolution, vertical resolution and vertical mixing formulation have also been shown to play a dominant 20 role in controlling the wind energetics in atmospheric models (Skamarock et al., 2019). In particular, in their global simulations, Skamarock et al. (2019) found that both the horizontal kinetic energy spectrum and the wind shear (quantified by the gradient Richardson number) largely increased in amplitude with vertical resolution, with the first signs of convergence observed with vertical mesh spacing near 100 m. Near-inertial waves with reduced vertical scales are believed to largely contribute to the enhanced wind variance at high vertical resolutions (Waite and Snyder, 2009;Skamarock et al., 2019). The dispersion relation: indeed shows that whenω → f , the vertical wavelength is reduced, and these waves are potentially more sensitive to the vertical resolution. In accord with those previous works, our analysis suggests a specific difficulty to simulate small vertical wavelength waves near the inertial frequency. The E k h amplitude of GWs of frequency between 2 and 4 cy/day are only slightly 30 underestimated in the tropics (see Fig. 3) where they have larger vertical scales (f → 0 in Eq. 11), whereas it is much reduced in the polar latitude (Fig. 3) where those waves are near inertial and hence close to critical levels. However, the different reanalysis products from the ECMWF do not exhibit large changes in the description of the f-peak although the vertical resolution of the model in the lower stratosphere varies by more than a factor of four between ERA-Interim and ERA5 (Table 2). This is the case whether or not balloon data are assimilated (Fig. 7). When considering Fig. 8, there is no clear (i.e. monotonic) relationship between vertical resolution and fraction of resolved variance, with for instance the ECMWF operational analysis performing better in the tropics than the higher vertical resolution ERA5. It is likely that the limited vertical resolution of the reanalyses in the lower stratosphere is too low or the vertical mixing too large for the sensitivity to vertical resolution to clearly emerge.

5
A simple recommendation can be drawn from the analysis presented above. As pointed out by a number of studies, gravity waves are now partly represented in modern reanalysis, in particular if one considers the temperature and horizontal wind fluctuations, which are tied to the low frequency part of the gravity wave intrinsic frequency spectrum. In that context of low frequency-large scale waves, the use of reanalysis to quantify atmospheric gravity wave variability is justified, in particular for case studies in the tropics. However, the fraction of the variability resolved by the reanalyses gets smaller with the scale of 10 the waves that generate that variability, so that the momentum flux (intermediate scale waves) and a fortiori the vertical wind variability (tied to small-scale waves) are still underestimated.

Conclusions
This paper examined the representation of gravity wave induced fluctuations in the lower stratosphere of modern reanalyses through the computation of Lagrangian trajectories and their comparison with quasi-Lagrangian balloon observations. Con-15 sistent with previous studies, we find that reanalysis systems resolve a significant part of the gravity wave spectrum at low intrinsic frequencies, and represent part of the associated variability in temperature and horizontal wind. In particular, specific characteristics of the observed spectrum can be well reproduced, such as the spectral gap between gravity waves and the large-scale flow, or the increased horizontal wind variance near the Coriolis frequency. However, the resolved fraction of the variability decreases with increasing intrinsic frequency so that the vertical wind variability, tied to high intrinsic frequency, is 20 largely underestimated in modern systems. The momentum flux, peaking at intermediate scales, is also underestimated, though less than vertical velocity.
Generally, the analyses examined fulfill the expectation that the higher the nominal horizontal resolution, the larger the fraction of resolved variability. Contrasting the tropics with the southern high latitudes further suggests that small-vertical wavelength waves (approaching the inertial frequency) at high latitudes are not well represented. Despite the capability of the 25 underlying models, we also found that the waves in the analyses are not purely self generated by the models, but may rely for a significant part on data assimilation. In particular we found that the assimilation of long-duration balloon observations over Antarctica in austral spring enhanced GW activity in the reanalyses. In any case, the fact that GW are present in NWP models suggest that observations with GW signature (radiosondes, GPS radio-occultations) could be successfully assimilated.
In the future, improved vertical and horizontal resolutions (and maybe specific data assimilation strategies) will allow NWP models to resolve an increased fraction of GW induced variability, which will make them even more valuable than currently for GW related studies. The increased presence of this high-frequency horizontal wind variability requires using higher time sampling (3 hourly) for accurate trajectory calculations. However, we expect that vertical wind variability, which is tied to the smallest spatial scales (∼1-10 km) and for which current modeling approximations (such as the hydrostatic one) are not valid will still remain a challenge. This emphasizes the long-term need for gravity wave parameterizations in NWP-based process studies, especially for processes depending on vertical velocity.
Appendix A: Aliasing in the frequency spectrum due to the finite output frequency of the reanalyses 5 As indicated by Eq. 5, to a given intrinsic frequency corresponds a range of horizontal wavenumbers and ground relative frequency. It is unclear what the contribution of low and high horizontal phase speed is, and whether part of the missing activity results from undersampling the time space. Indeed, the horizontal kinetic energy (for example) at a given intrinsic frequency E k h (ω) actually depends on different ground-based frequencies: where δ is the 1D Dirac distribution and E is the 3D-power spectral density (note that only 3D integration is necessary since for internal gravity waves the dispersion relation directly imposes the vertical wavenumber given the three other wave parameters and the background). It is apparent that a givenω corresponds to a whole surface of ω and wavenumbers. High intrinsic frequency waves may hence have high ground relative frequency or be short-scale waves in a large background wind. The 15 two cases are affected differently by our analysis method. Since we retrieve the reanalysis at horizontal resolution close to the actual grid spacing, we fully resolve analysis waves which have high intrinsic frequencies due to their small horizontal scales (large kū). However, the reanalyses are sampled at a time resolution of typically 1 to 6 hours, i.e. lower than the model time step (less than 15 minutes). There can be aliasing in theω for high frequncy waves with large scales but high ω. In order to evaluate this effect, we take advantage of the 1-hourly ERA5 to compare trajectories using 1, 3 or 6 hourly reanalysis fields. 20 The results are displayed in Fig. A1 for Vorcore flights. Large differences occur for both the E k h spectrum and the rotary ratio between 6 and 3 hours: the inertial peak atω 12 hours is completely aliased with the resolution of 6 hours (for which it is close to the Nyquist frequency), but resolved at 3 hours. This shows that the waves dominating the energy spectrum at frequencies near f have predominantly large scales andω ω. On the contrary, further improving does not affect much the E k h spectrum, but leads to a more realistic rotary ratio at higher frequencies. This exercise demonstrates that the 3-hourly resolution 25 provided by most reanalyses (except JRA-55) is sufficient for our purpose, i.e. evaluating the GW intrinsic-frequency spectrum.
Appendix B: Comparison of GW Lagrangian spectra estimates using isopycnic, isentropic, kinematic and diabatic trajectories To test the validity of the assumptions made in Sect. 2.1 to use balloon isopycnic trajectories to infer Lagrangian spectra, we computed isopycnic, isentropic and 3D-kinematic and 3D-diabatic trajectories in the reanalyses. The resulting E k h and E p spectra are presented in Fig. B1 for ERA5 with two time resolutions: 3 and 1 hourly. For the E k h spectrum, the 3 calculations 5 converge at 1 and 3 hourly. For the E p spectrum, only the kinematic case with 3 hourly winds disagrees with the others. The discrepancy is resolved, suggesting aliasing of high frequency vertical motions at 3 hourly resolution ; this effect is filtered out in the isopycnic and isentropic cases, for which the vertical wind is implicitly accounted for through the fully resolved displacement of the θ surfaces.
Author contributions. AP designed the study and carried out the analysis, with important contributions from AH and RP. AH and AP developed the balloon trajectory code used in the study. BL guided the use of reanalysis data. AP wrote the paper, with contributions from all co-authors.
Competing interests. The authors declare that they have no conflict of interest.       Figure B1. Intrinsic-frequency spectrum of (left) kinetic energy E k h and (right) potential energy Ep per unit mass along ERA5 trajectories started at Vorcore balloon locations, for different trajectory types: isopycnic (constant ρ; note that Ep spectrum is in that case adjusted following Eq. 3, as is done for the balloons), isentropic (constant θ) and kinematic and different reanalysis output frequencies (top: 3 hours; bottom: 1 hour).