Gravitational separation of Ar / N 2 and age of air in the lowermost stratosphere in airborne observations and a chemical transport model

. Accurate simulation of atmospheric circulation, particularly in the lower stratosphere, is challenging due to unresolved wave–mean ﬂow interactions and limited high-resolution observations for validation. Gravity-induced pressure gradients lead to a small but measurable separation of heavy and light gases by molecular diffusion in the stratosphere. Because the relative abundance of Ar to N 2 is exclu-sively controlled by physical transport, the argon-to-nitrogen ratio (Ar / N 2 ) provides an additional constraint on circulation and the age of air (AoA), i.e., the time elapsed since entry of an air parcel into the stratosphere. Here we use airborne measurements of N 2 O and Ar / N 2 from nine campaigns with global coverage spanning 2008–2018 to calculate AoA and to quantify gravitational separation in the lowermost stratosphere. To this end, we develop a new N 2 O–AoA relationship using a Markov chain Monte Carlo algorithm. We ob-serve that gravitational separation increases systematically with increasing AoA for samples with AoA between 0 and 3 years. These observations are compared to a simulation of the TOMCAT/SLIMCAT 3-D chemical transport model, which has been updated to include gravitational fractionation of gases. We demonstrate that although AoA at old ages is slightly underestimated in the model, the relationship between Ar / N 2 and AoA is robust and agrees with the observations. This highlights the potential of Ar / N 2 to become a new AoA tracer that is subject only to physical transport phenomena and can supplement the suite of available AoA indicators.

Abstract. Accurate simulation of atmospheric circulation, particularly in the lower stratosphere, is challenging due to unresolved wave-mean flow interactions and limited highresolution observations for validation. Gravity-induced pressure gradients lead to a small but measurable separation of heavy and light gases by molecular diffusion in the stratosphere. Because the relative abundance of Ar to N 2 is exclusively controlled by physical transport, the argon-to-nitrogen ratio (Ar/N 2 ) provides an additional constraint on circulation and the age of air (AoA), i.e., the time elapsed since entry of an air parcel into the stratosphere. Here we use airborne measurements of N 2 O and Ar/N 2 from nine campaigns with global coverage spanning 2008-2018 to calculate AoA and to quantify gravitational separation in the lowermost stratosphere. To this end, we develop a new N 2 O-AoA relationship using a Markov chain Monte Carlo algorithm. We observe that gravitational separation increases systematically with increasing AoA for samples with AoA between 0 and 3 years. These observations are compared to a simulation of the TOMCAT/SLIMCAT 3-D chemical transport model, which has been updated to include gravitational fractionation of gases. We demonstrate that although AoA at old ages is slightly underestimated in the model, the relationship be-tween Ar/N 2 and AoA is robust and agrees with the observations. This highlights the potential of Ar/N 2 to become a new AoA tracer that is subject only to physical transport phenomena and can supplement the suite of available AoA indicators.

Introduction
Transport in the middle atmosphere is driven by a combination of advection by the Brewer-Dobson circulation (BDC) (Brewer, 1949;Dobson, 1956) and quasi-horizontal, twoway mixing by breaking waves (Holton et al., 1995). Models consistently predict an acceleration of the BDC due to climate change (Butchart, 2014), but subgrid-scale mixing processes and momentum transfer by unresolved buoyancy waves limit our ability to accurately simulate circulation in the stratosphere (Haynes, 2005;Plumb, 2007). An acceleration of the BDC has important repercussions for stratosphere-troposphere exchange (STE) and thus recovery of the ozone layer and the greenhouse effect of stratospheric water vapor; however, observational evidence of an acceler-ation of the BDC is weak (Engel et al., , 2017Waugh, 2009;Ray et al., 2010Ray et al., , 2014. The mean age of air (AoA) is a widely used indicator of stratospheric circulation (Hall and Plumb, 1994;Waugh and Hall, 2002;Linz et al., 2016). Air can be transported to any location r in the stratosphere via a myriad of different paths, and each path will have an associated transit time. The probability density function that describes the likelihood of air reaching location r with a specific transit time is called the age spectrum. Although the age spectrum is not directly observable, some aspects of its shape can be inferred from observations of short-and long-lived tracers (Andrews et al., 1999(Andrews et al., , 2001Schoeberl et al., 2005;Hauck et al., 2019Hauck et al., , 2020Podglajen and Ploeger, 2019). For tracers that are conserved in the stratosphere and whose concentrations increase approximately linearly with time in the troposphere, such as SF 6 and CO 2 , the mean AoA, i.e., the first moment of the distribution, can simply be calculated as the time difference, or "lag time", to when tracer concentrations in the upper troposphere last had comparable values as measured in a stratospheric sample (Hall and Plumb, 1994;Boering et al., 1996;Waugh and Hall, 2002). The stratospheric concentration of N 2 O has also been calibrated as an independent AoA tracer by relating the gradual photolytic loss of N 2 O in the stratosphere to AoA as determined from CO 2 (Boering et al., 1996;Andrews et al., 2001;Linz et al., 2017).
In contrast to early measurements made on rocket samples (Bieri and Koide, 1970), Ishidoya et al. (2008Ishidoya et al. ( , 2013 have shown using a balloon-borne sampling system that stratospheric air is detectably fractionated by gravitational settling (GS), with the degree of fractionation strongly correlated to AoA Sugawara et al., 2018;Belikov et al., 2019). GS leads to depletion of heavier gases in the stratosphere yielding lower ratios of heavy to light gases, e.g., Ar/N 2 , 18 O/ 16 O of O 2 and 15 N/ 14 N of N 2 , with increasing elevation. The vertical gradients induced by gravimetric separation are steeper at higher latitudes Sugawara et al., 2018), and consistent with patterns observed in stratospheric AoA Belikov et al., 2019). Gravimetric settling in the stratosphere has also been simulated in 1-D (Keeling, 1988;Ishidoya et al., 2008), 2-D Sugawara et al., 2018), and 3-D stratospheric models (Belikov et al., 2019); 2-D and 3-D models both show a pattern of GS which increases with altitude and latitude, similar to the patterns observed in tracers with a significant stratospheric sink such as N 2 O, and consistent with a positive correlation with AoA Sugawara et al., 2018;Belikov et al., 2019). Here we attempt to calibrate GS of Ar/N 2 as an AoA tracer similar to previous work on the N 2 O-AoA relationship.
Observing GS in the stratosphere is challenging, however, as the signals are small and because of the need to avoid artifacts caused by temperature-and pressure-induced fractionation near the sampling inlet (Blaine et al., 2006;Ishidoya et al., 2008Ishidoya et al., , 2013. The resulting scatter in existing balloonbased measurements precludes a clear evaluation of the relationship between GS and AoA (Belikov et al., 2019).
Here we present a dataset of gravitational fractionation of Ar/N 2 and AoA observations made on flask samples from three airborne research projects, totaling nine campaigns in the lowermost stratosphere with mean AoA < 3 years Stephens, 2017). The HIAPER Pole-to-Pole Observations (HIPPO) project was a global survey of the Pacific troposphere to lower stratosphere on the NSF/NCAR Gulfstream-V aircraft (Wofsy et al., 2011), composed of five individual campaigns from 2008 to 2011. The O 2 /N 2 Ratio and CO 2 Airborne Southern Ocean (ORCAS) study was conducted using the same aircraft but focused on the Drake Passage and Antarctic Peninsula during January-February 2016 . The Atmospheric Tomography (ATom) project was a survey of the troposphere and lower stratosphere of the Pacific and Atlantic Ocean basins on the NASA DC-8 aircraft, composed of four individual campaigns from 2016 to 2018 (Prather et al., 2017). The observations are compared to new simulations of GS with the TOMCAT/SLIMCAT (Chipperfield, 2006) tracer transport model. Our goals are to demonstrate the consistency of our data with gravitational fractionation, to evaluate model performance, and to highlight the potential of Ar/N 2 as a new age tracer.

Measurements
Discrete 1.5 L flask samples were taken with the NCAR/Scripps Medusa Whole Air Sampler Stephens et al., 2020) (https://www.eol.ucar.edu/ instruments/ncar-scripps-airborne-flask-sampler, last access: 7 June 2020). Medusa holds 32 borosilicate glass flasks sealed with Viton o-rings and uses active pressure control to fill the flasks with cryogenically dried air to ∼ 760 torr. Flasks are shipped to the Scripps Institution of Oceanography (SIO) for analysis of Ar/N 2 ratios on an IsoPrime Mass Spectrometer. We report changes in Ar/N 2 ratios in delta notation: where R SA is the mixing or isotope ratio in the sample and R REF the ratio in a reference mixture. Measurements are made on the Scripps O2 Program Argon Scale, as defined on 21 January 2020. δ(Ar/N 2 ) values are reported after applying an offset to the data to yield a mean of zero in the tropical airborne observations of the free troposphere between 3 and 8 km. Uncertainty in δ(Ar/N 2 ) observations arises from a combination of analytical limitations and artifactual fractionation during sampling. Replicate agreement of surface flasks shows a 1σ repeatability of ±6.1 per meg for δ(Ar/N 2 ), but additional scatter in the data may be introduced by small leaks in the Medusa system and thermal or pressure gradient fractionation at the sample intake Stephens et al., 2020). For further details, see . These effects are challenging to separate from true atmospheric variability and differ between campaigns because sampling strategies have generally improved over time. To constrain uncertainty due to aircraft sampling, we quantify total variability in observations from the presumably fairly homogeneous tropical troposphere between 3 and 8 km ( Fig. S1 in the Supplement). Data from the earlier campaigns HIPPO 1-5 show a pooled standard deviation of 24 per meg, whereas data from ATom 2-4 yield a pooled standard deviation of 9 per meg, illustrating advances in sampling and sample handling. While no ORCAS samples are available in the tropical troposphere, ORCAS data show similar scatter to ATom 2-4 data between 20 and 50 • S. For all campaigns, the uncertainty is small compared to the stratospheric signal of tens to hundreds per meg shown below. We also show that the stratospheric signal in δ(Ar/N 2 ) is not due to pressure-dependent inlet fractionation by evaluating the residual δ(Ar/N 2 ), which has been corrected for the natural gravitational signal (Fig. S2).
We have compiled available simultaneous, high-frequency measurements of a range of other trace gases, including N 2 O, CO 2 , O 3 , CH 4 , and CO, to identify Medusa samples with stratospheric influence and calculate mean AoA. N 2 O was measured continuously with a precision of 0.09 ppb at 1 Hz frequency using the Harvard Quantum Cascade Laser Spectrometer (QCLS) (Santoni et al., 2014) during HIPPO and ORCAS and measured every 1-3 min using the NOAA gas chromatograph PAN and Trace Hydrohalocarbon ExpeRiment (PANTHER) during ATom. The Unmanned Aircraft Systems Chromatograph for Atmospheric Trace Species (UCATS) was used to measure O 3 during HIPPO and ATom. O 3 was not measured on ORCAS. We use continuous H 2 O data from the NCAR open-path near-infrared multi-pass spectrometer Vertical Cavity Surface Emitting Laser (VC-SEL) (Zondlo et al., 2010) for HIPPO and ORCAS, whereas H 2 O was measured using the NASA Diode Laser Hygrometer (DLH) (Diskin et al., 2002) during ATom flights. CH 4 , CO 2 , and CO were measured by QCLS during HIPPO and ORCAS and by the NOAA Picarro (Karion et al., 2013) during ATom. An averaging kernel is applied to the continuous and semi-continuous aircraft data, such as N 2 O, O 3 , and H 2 O, to match it to Medusa samples. The kernel multiplies a weighting function w i (t) by all continuous data before time t s , when Medusa switched from sample flask i to sample flask i + 1. w i (t) for each sample i is given by where t is each 1 s increment of the continuous data and τ = V Q is the flushing time of air in a Medusa flask determined by the flask volume V and airflow Q. Stratospheric samples are identified based on their N 2 O, O 3 , and water vapor levels. Classification based on chemical composition rather than potential temperature or altitude effectively selects samples with a clear stratospheric signature in the lowermost stratosphere and excludes air which has experienced substantial mixing with tropospheric air masses. We label samples as "stratospheric" if (i) water vapor levels are below 15 ppm and either (ii.a) O 3 values exceed 140 ppb or (ii.b) N 2 O (detrended to a reference year of 2009) is below 315 ppb. These criteria yield 234 lower stratospheric samples with high-quality N 2 O and δ(Ar/N 2 ) data, spanning a wide range of latitudes poleward of 40 • in both the Northern Hemisphere and Southern Hemisphere (Fig. 1). We use Medusa samples from all five HIPPO campaigns, ORCAS, and ATom 2-4. We do not use samples from ATom 1 because inlet fractionation due to the unique inlet design and location on the DC-8 on this campaign introduced apparent biases on the order of 30 per meg. An additional 25 stratospheric samples are available from the START-08 campaign on the NSF/NCAR GV, but we have not used these here because of the limited number of flasks collected and greater scatter in the data.

AoA calculation
Stratospheric (mean) AoA for Medusa samples is calculated from N 2 O using an updated hemisphere-specific N 2 O-AoA relationship. Our method broadly follows Andrews et al. (2001), who assumed a bimodal age spectrum and used multiple observations of CO 2 binned by N 2 O values to resolve the seasonal cycle of CO 2 in each bin. Properties of the age spectrum for each N 2 O bin, including mean AoA, were constrained by optimizing the agreement between observed CO 2 concentrations and concentrations implied by randomly generated age spectra in each N 2 O bin. Andrews et al. (2001) used a highly efficient "genetic algorithm" to yield the most likely relationship between mean AoA and N 2 O in each bin. In contrast, we use a more computationally costly Markov chain Monte Carlo (MCMC) method that allows us to obtain more robust uncertainties for all estimated parameters of the age spectrum. Following Malinverno (2002) and Green (1995), our algorithm builds on a Metropolis-Hastings sampler (Metropolis et al., 1953;Hastings, 1970) to evaluate probability distributions for each age spectrum parameter and automatically chooses whether a unimodal or bimodal representation of the age spectrum is more appropriate in each N 2 O bin. Finally, we constructed a new tropical upper troposphere reference time series for CO 2 in this study to ensure maximum consistency between all observations used. Analytical and methodological uncertainties are propagated thoroughly and reported as the 95 % confidence interval around a mean. We consider the tropical upper troposphere to be the single entry point of air into the stratosphere for our AoA calculation. Mixing between tropospheric and lower stratospheric air can occur outside the tropics, leading to air masses that are transitional between stratospheric and tropospheric in the lowermost extratropical stratosphere (e.g., Škerlak et al., 2014). This additional pathway for tropospheric entry has been accounted for in previous studies via additional model information and/or multiple tracers (Ray et al., 1999;Hoor et al., 2004;Hauck et al., 2020). However, we are limited by the number of tracers available to us and therefore instead aim to largely exclude samples influenced through this additional pathway by using thresholds in N 2 O, O 3 , and H 2 O when identifying stratospheric samples. Because of this sample selection strategy, we note that the ages calculated here should not be considered representative of the region that air was sampled in.
Our upper troposphere reference time series (TRTS) consists of a long-term trend and a representation of the mean seasonal cycle. Because direct CO 2 observations in the region are limited, the long-term trend and a first guess of the mean seasonal cycle are estimated from monthly mean surface observations at the Mauna Loa station in Hawaii (MLO, 19 • N, 155 • W; Fig. 2a) (Keeling et al., 2001) which are later adjusted to match airborne observations in the tropics (20 • S-20 • N) above 8 km (Fig. 2c). The adjustment includes (i) a constant offset, (ii) a reduced amplitude of the seasonal cycle, and (iii) a phase lag of 1 month. As shown in Fig. 2, after these adjustments the TRTS matches the mean seasonal cycle and absolute value of the airborne data, thus accounting for known vertical gradients of CO 2 and reduced seasonality in the upper troposphere (Fig. 2d). The amplitude of the seasonal cycle in our time series is also in good agreement with the boundary condition used by Andrews et al. (1999Andrews et al. ( , 2001. The new TRTS is used to estimate the age spectrum in 13 N 2 O bins of 5 or 10 ppb width (320-325, ..., 290-295, 280-290, ..., 230-240 ppb) using a Markov chain Monte Carlo (MCMC) algorithm which compares observed CO 2 concentrations in the bin to concentration expected from the TRTS. To maximize data availability, we use high-frequency data (see Sect. 2.1) identified as stratospheric according to the criteria above from the 10 s merged products available for HIPPO, ORCAS, and ATom rather than data averaged to the lower Medusa sampling interval for this calibration exercise. Small corrections are applied to the observed CO 2 concentrations (< 0.2 ppm) to account for oxidation of CH 4 and CO in the stratosphere.
The MCMC algorithm considers random noise in the TRTS and uses a unimodal or bimodal inverse Gaussian shape of the age spectrum characterized by mean ages 1 and 2 and shape parameters λ 1 and λ 2 : where factor A determines the relative weight of each peak and a value of A = 1 yields a unimodal spectrum (Hall and Plumb, 1994;Andrews et al., 2001;. Though more complex representations of the age spectrum have recently been proposed, accounting for multiple modes and seasonal variations (e.g., Li et al., 2012;Hauck et al., 2019Hauck et al., , 2020Podglajen and Ploeger, 2019), these require additional information from multiple tracers or models and good seasonal data coverage that is not available here. In any case, we only rely on the first moment of the age distribution (i.e., the mean age) for calibrating the N 2 O-AoA relationship, and the mean age is not particularly sensitive to the assumed shape of the age spectrum (Andrews et al., 2001). By setting A = 1 for 50 % of all tested age spectra, the MCMC algorithm automatically selects whether a unimodal or bimodal representation of the age spectrum is optimal for matching observations. Although bimodal solutions with five instead of two free parameters will always be able to fit the data better, a larger number of parameters also decreases the likelihood of randomly selecting a combination of parameters that match the observations well because the fraction of the total parameter space region which yields good agreement with the observation decreases as more parameters are added (Malinverno, 2002). The algorithm is thus able to make an appropriate choice between unimodal and bimodal distributions from the data themselves, without any further a priori assumptions. The MCMC algorithm is set up separately with 2000 independent chains for each N 2 O bin to account for uncertainty in the TRTS and obtain best estimates of the mean AoA, i.e., the first moment of Eq. (3).
To simplify the algorithm, possible values of 1 , 2 , λ 1 , and λ 2 are repeatedly sampled from the same N 2 O bin-specific prior distributions. Details of the algorithm are presented in Appendix A. Finally, the resulting relationship between mean AoA and the N 2 O concentration of each bin is fit separately for each hemisphere by a quadratic polynomial, and the polynomial is evaluated at the N 2 O value of each Medusa sample to pair every observation of δ(Ar/N 2 ) with an AoA. Uncertainty in N 2 O and the polynomial fits are propagated by a Monte Carlo scheme. Overall, our method estimates the most likely mean AoA for each Medusa sample and improves upon previous methods by providing a framework for the treatment of uncertainty resulting from (i) analytical error, (ii) uncertainty in the shape of the age spectrum, and (iii) uncertainty in the composition of source gas introduced into the stratosphere.

TOMCAT/SLIMCAT model
TOMCAT/SLIMCAT (hereafter TOMCAT) is an offline 3-D chemical tracer transport model that has been used extensively for studies of stratospheric ozone depletion and circulation (e.g., Chipperfield, 2006;Chipperfield et al., 2017;Krol et al., 2018). For this study, TOMCAT was run over 31 years, from 1988 to 2018, with a timestep of 30 min at 2.8 • × 2.8 • horizontal resolution forced by the ERA-Interim reanalysis (Dee et al., 2011) at 60 vertical hybrid sigmapressure (σ -p) levels up to ∼ 60-65 km. The first 20 years (i.e., before the first flask observation) were treated as spinup. The TOMCAT AoA tracer is initialized at the surface and corrected to a value of AoA = 0 just below the tropical tropopause. Vertical motion was calculated from the divergence of the horizontal mass fluxes. Although this approach gives slightly younger stratospheric AoA than using isentropic levels and radiative heating rates, it allows a more detailed treatment of tropospheric transport (Chipperfield, 2006;Monge-Sanz et al., 2007). The model simulation was sampled at the times and locations of the Medusa flask observations to provide a direct comparison between the measurements and model.
Following the methodology of Belikov et al. (2019), we include an additional vertical flux term in the model representing the GS of gases in the atmosphere. The vertical flux f i (mol m −2 s −1 ) of tracer i due to molecular diffusion in Earth's gravitational field is given as (Banks and Kockarts, 1973;Belikov et al., 2019) where D i is the tracer-specific binary molecular diffusivity in air (m 2 s −1 ), N is the number density of air (mol m −3 ), C i is the mixing ratio, H i = RT gM i is the tracer-specific atmospheric equilibrium scale height (m), R is the fundamental gas constant (J K −1 mol −1 ), g is the gravitational constant (m s −2 ), M i is the tracer-specific atomic or molecular B. Birner et al.: Gravitational separation of Ar/N 2 and age of air in the lowermost stratosphere mass (kg mol −1 ), α i is the tracer-specific thermal diffusivity (m 2 s −1 ), and T is temperature (K). The three terms in Eq. (4) represent molecular diffusion driven by (i) vertical gradients in the mole fraction of i, (ii) pressure gradients caused by gravity and described by the barometric law, and (iii) temperature gradients (left to right). We neglect the first and third terms in the brackets in Eq. (4), leaving only the gravitational settling term 1 H i − 1 H air on the basis that both terms are orders of magnitude less important than the gravitational separation term under stratospheric conditions (see Appendix B) Belikov et al., 2019). No fluxes are allowed through the top boundary, and C i is held constant at the Earth's surface.
To simplify the numerical treatment, we simulate an idealized reference tracer of gravitational fractionation δ GST , which has a molecular mass 1 amu greater than air and diffusivity equal to that of Ar (see Appendices B and C). This tracer can be scaled offline to obtain the gravitational separation signal in any other species, including δ(Ar/N 2 ), e.g., The appropriate diffusivity values D Ar and D N 2 for Ar and N 2 in air are derived in Appendix B for a ternary mixture of Ar, O 2 , and N 2 , extending previous work (Ishidoya et al., 2013;Belikov et al., 2019). δ GST is divided by D GST in Eq. (5) and therefore does not depend on the exact value chosen for D GST .

NIES TM model
We compare our results to previous simulations of GS using the National Institute for Environmental Studies chemical transport model (NIES TM) published recently by Belikov et al. (2019). The NIES TM is a 3-D transport model of similar complexity to TOMCAT driven by the Japanese 25-year Reanalysis (JRA-25) with a hybrid sigma-isentropic (σ -θ ) vertical coordinate up to 5 hPa or ∼ 35 km. The model and GS results are described in detail in Belikov et al. (2013Belikov et al. ( , 2019, respectively.

N 2 O-AoA calibration
Our new relationships between N 2 O and mean AoA for the Northern Hemisphere and Southern Hemisphere (NH and SH) are well constrained by the observations and generally follow the mid-latitude NH calibration curve of Andrews et al. (2001) (Fig. 3d). At mean AoA > 2.5, the N 2 O-AoA relationship yields slightly younger ages in the NH than in the SH and compared to the previously published curve, suggesting there might be a latitudinal dependence of the relation-ship. Such a latitude dependence is, in fact, expected based on theory (Plumb, 2007). We expect curvature and a latitudinal difference in the N 2 O-AoA relationship because photolysis of N 2 O depends on latitude and altitude due to local sunlight availability. Furthermore, mixing of young and old air results in a mixture with an anomalously low N 2 O concentration for a given age. Since the SH, NH, and tropics feature different photolysis rates and show different degrees of mixing/isolation, different N 2 O-AoA relationships are expected. Unimodal age spectra are generally preferentially selected by the algorithm, in particular for the SH (Fig. 3a-c), but unimodal age spectra dominate the solution ensemble by a small margin except for AoA < 0.5 years (Fig. S3). Confidence intervals on age spectra parameters from bimodal spectra are considerably wider than for unimodal spectra. This implies that the parameters in a bimodal distribution are redundant and the shape of the age spectrum is not sufficiently constrained by the observations used. The width of the spectrum in each N 2 O bin varies widely within the prescribed bounds for most unimodal and bimodal age spectra (Fig. S3). It appears that not enough data are available from the airborne campaigns to determine the amplitude of the seasonal cycle with enough confidence to constrain the width of the age spectrum and distinguish the relative contribution of the old peak (influencing primarily the mean concentration difference to the troposphere) and the young peak (controlling the amplitude of seasonality) in bimodal age spectra. Additional observations with less scatter or information from different age tracers are needed to properly resolve the shape of the age spectra. Despite the limited resolution of the seasonal cycle, the observations are sufficient to place tight limits on the mean AoA in each N 2 O bin and yield a well-characterized relationship between N 2 O and mean AoA for each hemisphere.

The relationship between mean AoA and GS in models and observations
A comparison of the AoA-GS relationship with observations yields good agreement for the TOMCAT model results within the observational uncertainties (Fig. 4a), but the observations fall outside the range of GS predicted by the NIES TM (Belikov et al., 2019) for mean AoA > 1 year (Fig. 4b). For young samples with mean AoA < 3 years, GS of δ(Ar/N 2 ) increases by roughly 35-45 per meg per year of AoA in both TOMCAT and observations and converges to zero for the youngest samples. In the upper stratosphere, TOMCAT does not obtain any ages as old as observed by Ishidoya et al. (2008Ishidoya et al. ( , 2013Ishidoya et al. ( , 2018 and therefore cannot reproduce these observations directly. Changing the vertical coordinate system of TOMCAT or forcing the model with a different reanalysis product could improve agreement with the observations for old ages because TOMCAT in the configuration used here is known to slightly underestimate AoA in the upper stratosphere (Monge-Sanz et al., 2007; Chabril- A comparison of δ(Ar/N 2 ) and AoA between flask samples and TOMCAT is shown in Fig. S4. However, the direct comparison should not be considered a good test of model performance because our stratospheric composition criteria intentionally target unusually old air in the lowermost stratosphere. For the same reason, profiles of δ(Ar/N 2 ) and AoA are not evaluated here. Instead, we suggest the AoA-GS relationship provides a more robust metric for comparison. Overall, our observations validate the implementation of GS for young (< 3 years) ages in TOMCAT.
The relationship between mean AoA and GS differs between TOMCAT and the NIES TM (Fig. 5). The NIES TM shows weaker curvature in the relationship overall and produces larger declines in Ar/N 2 at ages less than 4.5 years compared to TOMCAT. In TOMCAT's mesosphere (which is at the limit of the domain covered by the ERA-Interim reanalyses forcing the model), AoA is near uniform, but GS continues to increase with increasing altitude, changing the relationship between mean AoA and GS in this region. The AoA-GS relationship in TOMCAT is very similar in all nontropical regions (outside 15 • > lat > −15 • ), whereas the cur-vature of the relationship is slightly stronger in the tropics. In contrast, the NIES TM has a clear dependence of the AoA-GS relationship on latitude. There is also some evidence in the observations of a dependence of the AoA-GS relationship on hemisphere. In the observations, δ(Ar/N 2 ) values appear to be slightly more negative in the SH than in the NH for the same age, particularly for AoA > 1.5 years. However, almost all SH samples with mean AoA older than 1.5 years come only from the ORCAS campaign. This campaign used a different inlet design than HIPPO and ATom, which could cause a small artifactual offset, but the scatter in our observations generally makes it difficult to separate signal from noise for small interhemispheric differences.

GS and AoA in TOMCAT
Annual mean δ(Ar/N 2 ) in TOMCAT follows the typical pattern of a tracer with a stratospheric sink (Chipperfield, 2006), as previously found in simulations using the NIES TM (Belikov et al., 2019) and the SOCRATES model (Ishidoya et al., 2013;Sugawara et al., 2018) (Fig. 6). δ(Ar/N 2 ) is zero in the troposphere and decreases with elevation. The most depleted δ(Ar/N 2 ) is observed at high latitudes where sinking air of the Brewer-Dobson circulation advects strongly frac- Figure 4. Comparison of age of air (AoA) and gravitational settling (GS) of δ(Ar/N 2 ) in models and observations. Panel (a) shows observations from the Northern Hemisphere (red) and Southern Hemisphere (blue) together with TOMCAT model output (SH: cyan; NH: yellow) selected from the closest model grid box in time and space. Because uncertainty for AoA is similar for all stratospheric samples but δ(Ar/N 2 ) uncertainty differs between campaigns, representative error bars (95 % confidence interval) are shown for HIPPO (triangles) and ATom/ORCAS (circles) at an arbitrary AoA. δ(Ar/N 2 ) is normalized to yield a delta value of zero in the equatorial free troposphere. In panel (b), observations from airborne campaigns (this study) and the balloon sampling system Sugawara et al., 2018) are plotted with the AoA-GS relationship observed in TOMCAT and the NIES TM (Belikov et al., 2019) as lines for each of the latitude bands shown in Fig. 5 and points for different altitude bins between 10 and 35 km. To yield an equivalent estimate of δ(Ar/N 2 ), δ( 13 CO 2 / 12 CO 2 ) results from the NIES TM (Belikov et al., 2019) have been rescaled according to Eq. (5) and offset by ∼ 10 per meg to account for the different tropospheric reference region in the definition of δ. tionated air downward. In the tropics, δ(Ar/N 2 ) values are considerably less fractionated at the same altitude due to upwelling of unfractionated tropospheric air. Vertical gradients in δ(Ar/N 2 ) generally increase with altitude and are largest in the mesosphere (> 50 km) because molecular diffusion increases with decreasing pressure and eventually dominates above the turbopause (not shown). There are strong seasonal changes in δ(Ar/N 2 ) depletion on the order of several thousand per meg, in particular in the high-latitude mesosphere, with the strongest fractionation occurring during the winter season (see movie available with online version of the paper).
The AoA tracer in TOMCAT shows a similar pattern to δ(Ar/N 2 ), with younger ages at low altitude and in the tropics and the oldest ages in the mesosphere. Vertical gradients in mean AoA are largest at high latitudes close to the tropopause. At high latitudes above 20 km and at low latitudes above 50 km, vertical gradients of AoA mostly disappear and AoA becomes nearly uniform.

Difference between TOMCAT and the NIES TM
We hypothesize that an adequate representation of the mesosphere in models is critical in determining the curvature of the AoA-GS relationship. The residence time of air above ∼ 40 km is rather short in TOMCAT, and AoA is nearly con-stant with altitude in this region. GS in contrast continues to increase with altitude because of the diffusivity dependence on pressure allowing gases to separate more effectively. The much lower top of the NIES TM (∼ 35 km vs. 60-65 km) reduces its ability to capture this effect, which impacts the AoA-GS relationship because the mesospheric signal is exported into the stratosphere, in particular in polar regions where mesospheric air descends. TOMCAT furthermore produces a less negative slope of the AoA-GS relationship for young air (mean AoA < 3 years) and greater similarity in the AoA-GS relationship between latitude bands than the NIES TM as shown in Fig. 5. This could in part be a consequence of using a different meteorological reanalysis product for forcing the two models (Chabrillat et al., 2018) or could indicate differences between the models in vertical and horizontal mixing intensity.

Estimating mean AoA from observed δ(Ar/N 2 )
Different tracers of AoA all have unique strengths and weaknesses. Estimating AoA in the lowermost stratosphere from CO 2 for example is limited by complexities involving seasonality and the possibility of multiple entry points of air into the stratosphere due to isentropic mixing with the midlatitude troposphere (Hall and Plumb, 1994;Andrews et al., 2001;Waugh and Hall, 2002). SF 6 -derived AoA in contrast is biased high at mid and high latitudes due to the influence of mesospheric air which has been photochemically depleted Figure 5. Comparison of the annual and zonal mean age of air (AoA) relationship with gravitational settling simulated in TOM-CAT and the NIES TM. The relationship is plotted as lines for the latitude bands indicated by marker color. For the NIES TM, markers show the vertical profile using all grid boxes available between ∼ 10 and 35 km. For TOMCAT, markers instead correspond to binned altitude bands between 10 and 35 km with a spacing of 2.5 km because of the finer vertical resolution of the model. To yield an equivalent estimate of δ(Ar/N 2 ), δ( 13 CO 2 / 12 CO 2 ) results from the NIES TM (Belikov et al., 2019) have been rescaled according to Eq. (5) and offset by ∼ 10 per meg to account for the different tropospheric reference region in the definition of δ.
in SF 6 (Kovács et al., 2017;Linz et al., 2017). N 2 O as an AoA tracer relies on the photolytic destruction of N 2 O in the stratosphere, which may depend strongly on location and had only been empirically calibrated for young ages at mid latitudes so far (Andrews et al., 2001;Linz et al., 2017).
Thanks to the robust relationship between mean AoA and δ(Ar/N 2 ) and the small seasonal cycle amplitude of δ(Ar/N 2 ) < 5 per meg in the upper troposphere (Fig. S1), mean AoA could also be estimated from δ(Ar/N 2 ). Using the current analytical δ(Ar/N 2 ) uncertainty of 12.2 per meg (2σ ) and the AoA-δ(Ar/N 2 ) relationship seen in TOMCAT (including variability from seasonal and latitudinal differences), we estimate that mean AoA could be calculated to within about ±0.4 years (2σ , Fig. S5). This is still considerably worse than the ±0.1-year confidence interval in mean AoA estimated from N 2 O. However, the uncertainty can be improved with future improvements in sampling, and the accuracy and precision of the measurements.
The heavy noble gases krypton and xenon will be roughly 3.6× and 5.8× more strongly fractionated in the stratosphere than Ar but are also more challenging to measure due to their ∼ 8000× and 100 000× lower abundance in the atmosphere. Nevertheless, future analysis of these gases in stratospheric air samples might further improve our ability to estimate AoA from the gravitational fractionation of gases and help diagnose artifactual fractionation, because heavier noble gases are more strongly fractionated under the influence of gravity and less sensitive to thermal fractionation (Seltzer et al., 2019).

Future directions
An open question in climate applications of noble gases is whether there could be a stratospheric influence on tropospheric δ(Ar/N 2 ). Tropospheric observations of δ(Ar/N 2 ) and other noble gas-elemental ratios have been used to infer ocean heat content changes by capitalizing on the temperature dependency of gas solubility in the oceans (Keeling et al., 2004;Headly and Severinghaus, 2007;Ritz et al., 2011;Bereiter et al., 2018). However, long-term trends (Butchart, 2014) and natural variability in stratospheric circulation and stratosphere-troposphere exchange (STE) such as the Quasi-Biennial Oscillation (QBO) (Baldwin et al., 2001) could advect a stratospheric GS signal into the troposphere and alias onto surface observations of these gases. To this end, we calculate the volumetric Ar deficit in the atmosphere in moles using TOMCAT (Fig. 7) relative to a hypothetical, unfractionated atmosphere with a homogenous mixing ratio (C o Ar ) at all elevations. Ar deficit is where C Ar is the simulated Ar mixing ratio and N air is the number density of air. The Ar deficit is concentrated in the lower stratosphere at around 20 km and at mid to high latitudes. Although the GS signal is considerably stronger in the mesosphere, the potential for perturbing the troposphere is low given the low molar density of air in the mesosphere. The region with the greatest potential to influence the troposphere therefore lies in the lower stratosphere. The total Ar deficit of the atmosphere above 200 hPa is approximately −2.9 × 10 13 mol and the atmosphere below 200 hPa contains roughly 1.3 × 10 18 mol of Ar. Perturbing STE and/or the stratospheric circulation by 10 %, consistent with interannual to decadal variability of STE in models (Salby and Callaghan, 2006;Flury et al., 2013;Ray et al., 2014;Montzka et al., 2018), thus should lead to a detectable signal of roughly 2.2 per meg (−2.9 × 10 13 /1.3×10 18 ×10 %×10 6 ) in tropospheric δ(Ar/N 2 ). Corresponding advection of stratospheric GS signals in N 2 amplifies the pure Ar signal by roughly 10 %. A careful investigation of such a signal in the δ(Ar/N 2 ) surface network data (Keeling et al., 2004) is needed because secular trends of δ(Ar/N 2 ) caused by degassing of Ar and N 2 from a warming ocean are also expected to be on the order of 2-3 per meg per decade. Previous studies have also used ratios involving heavier noble gases (Xe/N 2 , Kr/N 2 ) to reconstruct mean ocean temperature changes over glacial-interglacial timescales (Headly and Severinghaus, 2007;Bereiter et al., 2018;Baggenstos et al., 2019). Simultaneous changes in stratospheric circulation and STE affect heavier noble gases  more strongly than Ar/N 2 and will need to be accounted for in such applications of noble gas thermometry.

Conclusions
With improvements in data treatment, measurement quality, and modeling constraints, we have shown that gravitational fractionation of Ar relative to N 2 in the stratosphere and mesosphere is a potentially powerful constraint on circulation. High-precision observations of δ(Ar/N 2 ) in air samples of the lowermost stratosphere from nine airborne campaigns are well captured by the TOMCAT/SLIMCAT 3-D chemical transport model, which has been updated to account for the influence of gravity on air composition through molecular diffusion. In the observations and the model, δ(Ar/N 2 ) is directly related to stratospheric mean age of air (AoA) derived here using a new calibration of N 2 O. Our observations for mean AoA < 3 years produce a slope of roughly 35-45 per meg of δ(Ar/N 2 ) per year of AoA. TOMCAT/SLIMCAT shows better agreement with the new observations than the NIES transport model (Belikov et al., 2019), and we speculate that the model disagreement could be explained by (i) the factor of 2 lower top of the NIES transport model, (ii) the use of different reanalysis products, and/or (iii) differences in vertical and horizontal mixing. In this context, further work is also needed to study the influence of unresolved turbulence on AoA and δ(Ar/N 2 ) in chemical transport models.
As the importance of stratospheric circulation for ozone recovery, climate projections, and evaluation of tropospheric trends in halocarbons is increasingly recognized, a need for new observations from the still undersampled stratosphere is becoming evident. Combining δ(Ar/N 2 ) with other tracers of circulation could lead to new insights into atmospheric mixing and transport. δ(Ar/N 2 ) has potential advantages over existing approaches based on transient tracers such as CO 2 or N 2 O since δ(Ar/N 2 ) is only influenced by the physics of transport and mostly unaffected by seasonality in the tropical upper troposphere. Furthermore, because of its large gradients at high altitudes, δ(Ar/N 2 ) observations from the upper stratosphere and mesosphere could improve our understanding of circulation on seasonal and interannual timescales in a region where AoA becomes near uniform, and some photochemical age tracers such as N 2 O are almost fully depleted and thus lose sensitivity to circulation changes.

Appendix A: Description of the Markov chain Monte Carlo algorithm
The following list outlines key steps in our MCMC algorithm to calculate AoA from CO 2 for each N 2 O bin.

Start a new Markov chain and allow for uncertainty in
TRTS by adding white noise with an amplitude given by the scatter of upper tropospheric observations around the mean TRTS in Fig. 2.
2. With a 50 % chance, peak weighting factor A is set to zero and a unimodal spectrum is tested (k = 1). Alternatively, A is allowed to vary between 0 and 1 for a bimodal distribution (k = 2).
3. The other age spectrum parameters are selected from N 2 O bin-dependent prior distributions: 1 is sampled from a uniform prior distribution with a mean AoA predicted by the N 2 O-AoA relationship of Andrews et al. (2001) and with generous width (> 1.5 years). 2 is sampled from a uniform prior distribution with values between 1 and 6 years representing older stratospheric air. The shape parameters are defined as λ i ≡ 2 i 2γ i , and γ i is chosen randomly for each peak with values between 0.1 and 1, as observed in previous studies (Hall and Plumb, 1994;Andrews et al., 2001;Waugh and Hall, 2002).
4. Convolve the age spectrum calculated from the parameters (m ≡ [ 1 , 2 , λ 1 , λ 2 , A]) with the perturbed TRTS to obtain a possible time series of CO 2 in the N 2 O bin and calculate the misfit (e) between the CO 2 time series and observations. 5. Calculate the likelihood function P (d|k, m) for the set of parameters m and k, given a total of n observations (d): whereĈ e is the covariance matrix. We assume that all n observations are independent. Thus,Ĉ e has only diagonal entries of σ 2 CO 2 and det(Ĉ e ) simplifies to (σ 2 CO 2 ) n . The value of σ 2 CO 2 is different for each bin and determined iteratively as the approximate root mean square error of the observations around the final time series for each bin obtained at the end of the MCMC algorithm. Typical values of σ 2 CO 2 are between 0.18 and 1.28 ppm and generally decrease with increasing AoA of a N 2 O bin.
6. If this is the first pass of the chain, define k saved and m saved to equal k and m. Otherwise, calculate the selection criterion α ≡ min 1, P (d|k,m) P (d|k saved ,m saved ) and accept k and m as new saved values (k saved m saved ) with probability α. Sampling from the same prior distributions on each pass of the chain simplifies our expression of α compared to that presented by Malinverno (2002), making it only dependent on the likelihood ratio.
7. Repeat steps 2-6 1000 times, sampling parameter values from the same prior distributions, and store the final values of k saved and m saved obtained after the 1000th iteration (i.e., a plausible solution produced past the burnin period) for later use.
8. To sample the full posterior pdf (i.e., the full uncertainty about the age spectrum parameters), initialize 2000 different Markov chains by repeating steps 1-7.
Each stored value of m characterizes one age spectrum that is likely not far from the best solution, given the data d, yielding an ensemble of 2000 age spectra from which statistics can be computed. Note that each Markov chain is fully independent, so the algorithm can be easily parallelized to minimize computational costs.

Appendix B: Derivation of Eq. (4) from the Maxwell-Stefan equations
We start by approximating air as a ternary mixture of N 2 , O 2 , and Ar and later generalize to consider additional trace species. According to the Maxwell-Stefan equations (Taylor and Krishna, 1993), diffusion in this ternary mixture is governed by where C i ≡ n i /N is the mole fraction, n i is molar or number density (mol m −3 ), N is the total number density (mol m −3 ), f i is the molecular diffusion flux (mol m 2 s −1 ) relative to the molar average velocity of the mixture, d i is the thermodynamic driving force for molecular diffusion (m −1 ), and D i:j is the binary diffusion coefficient of the  Table B1. Molecular diffusion volumes (Reid et al., 1987) and masses used in this study. where P is pressure (atm), ν i is the molecular diffusion volume of a trace gas or air (Table B1), and M i is the molecular mass (g mol −1 ) of a gas.
For an ideal gas, d i is given by where ω i ≡ ρ i ρ = M i C i M air is the mass fraction of gas i, ρ i is density of i (kg m −3 ), P is pressure (Pa), T is temperature (K), k T i is the thermal-diffusion ratio of i, M i is the molecular mass of i (kg mol −1 ), and F i the external body force per mole (N mol −1 ) for i (Chapman et al., 1990;Taylor and Krishna, 1993). In the atmosphere, (vertical) pressure gradients are caused by gravity and well approximated by hydrostatic balance The gravitational force per mole F i is Substituting Eqs. (B7) and (B8) into Eq. (B6) yields where we use the definition of the scale height H i ≡ RT gM i . The two terms involving the body force cancel because all species experience the same gravitational force per unit mass. The tendency for gravimetric separation instead arises from the pressure gradient term proportional to 1 H i − 1 H air . Equations (B1) and (B2) can be inverted to solve for f N 2 and f Ar (Taylor and Krishna, 1993): where Here D air N 2 and D air Ar are the effective diffusivities of N 2 and Ar in air, while D air N 2 ×(Ar,O 2 ) and D air Ar×(N 2 ,O 2 ) reflect ternary cross-interactions, such as the tendency of N 2 to be impacted by any process that drives a diffusive flux of Ar.
In air, Ar is a minor gas (C Ar C N 2 ∼ C O 2 ), and therefore interactions of N 2 with Ar can be neglected in the N 2 flux and the diffusive fluxes of N 2 and O 2 must balance approximately, as in the case of a binary mixture of the two gases: Combining Eqs. (B10) and (B17) with Eq. (B9) yields where we have replaced the thermal diffusion ratio k T i with the better empirically constrained thermal diffusion factor ∼ 9.8 × 10 13 ∼ 1.6 × 10 −11 ∼ 3.2 × 10 −10 m s −1 B20 a 1.8 × 10 −2 observed at 293 K (Waldmann, 1947). b Assuming C Ar ≈ N Ar N N 2 . c Estimated from TOMCAT results.
α T i ≡ k T i C i C j defined as such only in binary mixtures. Therefore, k T N 2 ≈ α T N 2 :O2 C N 2 C O 2 and k T Ar ≈ α T Ar:air C Ar . Table B2 presents rough estimates of the magnitudes of the terms in Eqs. (B19) and (B20) in the stratosphere, showing that the thermal diffusion and cross-diffusion terms involving D air Ar×(N 2 ,O 2 ) are at least 2 orders of magnitude smaller than the remaining terms. Neglecting these smaller terms yields the governing Eq. (4) used in our model simulation: where we introduced M i ≡ M i − M air , the molecular mass difference to air (kg mol −1 ), as a shorthand. Equation (B21) is equally valid for trace gases such as Ar and major gases N 2 and O 2 when the appropriate diffusivities given by Eqs. (B12) and (B14) are used. In our case D N 2 ≡ D air N 2 = D air O 2 ≈ D N 2 :O 2 for N 2 and D Ar ≡ D air Ar ≈ D Ar:O 2 D N 2 :Ar C N 2 D Ar:O 2 +C O 2 D N 2 :Ar for Ar.
Appendix C: Calculating δ(Ar/N 2 ) from δ(GST) The conservation equation of gas i with mixing ratio C i accounting for advection (first term RHS), eddy mixing (second term RHS), and molecular diffusion (third term RHS, using the simplified Eq. B21) is given by where u is the velocity vector (m s −1 ), D e is the eddy diffusivity tensor (m 2 s −1 ), D i is the molecular diffusivity of species i in air (i.e., D air i m 2 s −1 ), g is the gravitational acceleration, R is the fundamental gas constant (J K −1 mol −1 ), and T is temperature (K). "∇·" represents the divergence operator. u, D e , D i , N , and T depend on x, y, z, and t and the largest gradients for these variables generally occur in the vertical direction.
Dividing Eq. (C1) by a reference value C i,0 , separating C i into a constant and a perturbation component (i.e., C i C i,0 = 1 + C i C i,0 = 1 + δ i ), and using the chain rule yields To simplify Eq. (C2), we assume that the perturbations in the mixing ratio are small (i.e., δ i 1), which is true at the 1 % level for δ(Ar/N 2 ) even in the mesosphere. Furthermore, we assume that U + D e L D i H i (U is a characteristic velocity scale, D e is a characteristic eddy diffusivity, and L is a characteristic length scale; H i = RT M i g ) or, equivalently stated in terms of the Peclet number, Pe ∼ U H i +D e H i L D i 1. We estimate typical values of the vertical Peclet number in the stratosphere to be between 1000 and 10 000 based on the height (∼ 20 km) and turnover time (∼ 5 years) of the stratosphere and the range of molecular diffusivities given in Table  B1. In the mesosphere, AoA is near uniform, and we conservatively assume a turnover time of 0.5 years and a height of 20 km. This implies a Peclet number of ∼ 500 for the mesosphere. Under these conditions, the third term on the RHS can be eliminated because it is O(Pe) smaller than the first and second terms and because it is O(δ i ) smaller than the fourth term. Thus, we obtain where the last term approximates the vertical divergence of the gravitational flux. At the top of the atmosphere and Earth's surface, the gravitational flux abruptly vanishes, and its divergence becomes large. If its divergence is small in the interior, Eq. (C3) can be conceptually interpreted as an advection-diffusion problem with large apparent sources or sinks at the bottom and top boundaries due to diffusive flux divergence. The steady-state solution to Eq. (C3) for a tracer δ i with no additional sources and sinks (other than the apparent sources from diffusive flux divergence) yields the steadystate profile for species i. This solution can be scaled to yield the steady-state solution for any other tracer δ j as as can easily be validated by solving Eq. (C4) for δ i and substitution into Eq. (C3). Note that D j M j D i M i does not depend on x, y, and z.
Furthermore, ratios of passive tracers k and l can be calculated directly from δ i by recognizing that δ k/ l ≈ δ k −δ l for δ k + δ l 1. Hence, we obtain the general version of Eq. (5) in the main text: