Twenty years of ground-based NDACC FTIR spectrometry at Izaña Observatory: overview and long-term comparison to other techniques

High-resolution Fourier Transform InfraRed (FTIR) solar observations are particularly relevant for climate studies, as they allow atmospheric gaseous composition and multiple climate processes to be monitored in detail. In this context, the present paper provides an overview of 20 years of FTIR measurements taken in the framework of NDACC (Network for the Detection of Atmospheric Composition Change) from 1999 to 2018 at the subtropical Izaña Observatory (IZO, Spain). Firstly, long-term instrumental performance is comprehensively assessed, corroborating the temporal stability and reliable instrumental characterisation of the two FTIR spectrometers installed at IZO since 1999. Then, the time series of all trace gases contributing to NDACC at IZO are presented (i.e. C2H6, CH4, ClONO2, CO, HCl, HCN, H2CO, HF, HNO3, N2O, NO2, NO, O3, OCS, and the water vapour isotopologues H₂¹⁶O, H₂¹⁸O, and HD¹⁶O), and the major accomplishments drawn from these observations are reviewed. In order to examine the quality and long-term consistency of the IZO FTIR observations, those NDACC products for which other high-quality measurement techniques are available at IZO (i.e. CH4, CO, H2O, NO2, N2O, and O3) have been compared with these techniques. This quality assessment was carried out on different timescales to examine which temporal signals are captured by the FTIR records, and to what extent. After 20 years of operation, the IZO NDACC FTIR observations have been found to be very consistent and reliable over time, demonstrating great potential for climate research. Long-term NDACC FTIR data sets, such as that of IZO, are indispensable tools for the investigation of atmospheric composition trends, multi-year phenomena and complex climate feedback processes, as well as for the validation of past and present space-based missions and chemistry climate models.

Introduction
The recognition that changes in the composition of the Earth's atmosphere are occurring on both long and short timescales, thereby modifying our environment and climate, has resulted in scientific debate as well as public concern in recent decades (Gottwald et al., 2006). Established examples, such as depletion of the ozone layer, warming of the air and oceans, rising sea level or melting cryosphere, have been widely reported in the literature (WMO, 2018; Masson-Delmotte et al., 2021, and references therein). In order to assess the significance of such changes and to better understand the physical and chemical processes involved, continuous, consistent, long-term monitoring of atmospheric composition is indispensable. These observational data sets are also fundamental for testing the ability of current climate models to provide reliable projections of future climate, and thus they are the basis for the design and implementation of efficient climate-change mitigation and adaptation policies.
Among the different atmospheric monitoring techniques, Fourier Transform InfraRed (FTIR) spectrometry is of particular interest for climate research. With this technique, the source radiation (typically the sun for ground-based atmospheric measurements) is modulated by an interferometer and all optical frequencies are recorded simultaneously in the measured interferogram (Griffiths and de Haseth, 2007). A Fourier transform is then used to retrieve the atmospheric absorption spectrum from the interferogram. By analysing the pressure broadening effect on these measured solar spectra through inversion schemes, the FTIR technique can provide atmospheric concentrations of many different trace gases simultaneously (e.g. Hase et al., 2004; Schneider et al., 2005; Wunch et al., 2011; Schneider et al., 2012; Kohlhepp et al., 2012; García et al., 2012; Sepúlveda et al., 2014; Barthlott et al., 2015; Wunch et al., 2015; Vigouroux et al., 2018; De Mazière et al., 2018).
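The statement that all optical frequencies are recorded simultaneously in the interferogram, and that a Fourier transform recovers the spectrum, can be illustrated with an idealised sketch. It assumes a noise-free, evenly sampled, one-sided interferogram containing only two monochromatic components placed exactly on the discrete frequency grid; the sample count, optical path difference (OPD) step and line positions are hypothetical, and real FTIR processing (apodisation, phase correction, the solar continuum, the instrumental line shape) is deliberately left out.

```python
import numpy as np

n_points = 4096                  # number of interferogram samples (hypothetical)
dx = 1.0e-4                      # optical path difference step in cm (hypothetical)
opd = np.arange(n_points) * dx   # OPD of each sample, cm

# Wavenumber grid (cm^-1) resolved by this sampling; spacing is 1 / (n_points * dx)
nu_grid = np.fft.rfftfreq(n_points, d=dx)

# Two monochromatic components (wavenumber -> amplitude), placed on grid points
lines = {nu_grid[1000]: 1.0, nu_grid[1600]: 0.5}

# Every component contributes a cosine in OPD, all summed into one interferogram
interferogram = sum(a * np.cos(2.0 * np.pi * nu * opd) for nu, a in lines.items())

# The Fourier transform separates the components again
spectrum = np.abs(np.fft.rfft(interferogram))
strongest = np.sort(nu_grid[np.argsort(spectrum)[-2:]])
print(strongest)
```

The achievable spectral resolution in this toy setting is the grid spacing 1/(n_points · dx), i.e. it improves with the maximum OPD, which is why high-resolution instruments use long optical path differences.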
The first continuous or semi-continuous records from ground-based FTIR spectrometers started in the late 1970s and early 1980s at just a few stations around the world. Nowadays, high-resolution FTIR instruments operate at a global scale mainly within two international networks for atmospheric composition monitoring: NDACC (Network for the Detection of Atmospheric Composition Change, https://www.ndaccdemo.org) and TCCON (Total Carbon Column Observing Network, https://tccon-wiki.caltech.edu). While NDACC mainly aims to establish a long-term database to detect changes and trends in atmospheric composition and to understand their impact on the Earth's atmosphere (De Mazière et al., 2018), TCCON focuses more on greenhouse gas research, improving our understanding of the carbon cycle and providing reference validation data sets for climate models and space-based observations (Wunch et al., 2011). Recently, these high-resolution FTIR observations have been extended by COCCON (COllaborative Carbon Column Observing Network; Frey et al., 2019), a research infrastructure of portable, compact, low-resolution ground-based FTIR instruments set up as a supplement to TCCON.
Given its strategic location, one of the most relevant ground-based FTIR stations is Izaña Observatory (IZO), where FTIR observations have been carried out since 1999 alongside other high-quality atmospheric measurements (Cuevas et al., 2019). IZO is located in the subtropical belt (∼30ºN), in the descending branch of the northern Hadley circulation cell and within the so-called subtropical transport barrier (Schneider et al., 2005, and references therein). This area, the transition between the tropics and mid-latitudes, plays a crucial role in chemical and dynamical transport processes in the atmosphere and is a direct indicator of climate change. Recent studies have demonstrated, for example, that the tropical belt has expanded over the past few decades, meaning that the descending limbs of the Hadley cells are shifting towards the poles in both hemispheres (Heffernan, 2016, and references therein). This poleward movement of large-scale atmospheric circulation systems, such as storm tracks and jet streams, and of their associated subtropical dry zones may lead to profound changes in the global climate system, affecting natural ecosystems, biodiversity and water resources (Seidel et al., 2008). Together with the so-called tropical bloating, climate models predict a speed-up of the stratospheric Brewer-Dobson circulation in response to current global warming, accelerating ozone recovery in the extratropics at the expense of a delay in the tropics and subtropics (Hegglin and Shepherd, 2009; WMO, 2018; Masson-Delmotte et al., 2021, and references therein). Nevertheless, these complex phenomena and their implications for the Earth's climate system and, in particular, for tropical and subtropical regions are still poorly understood (Seidel et al., 2008; Heffernan, 2016).
Unfortunately, mainly due to geographical and political factors, these regions suffer from a severe lack of observations that would allow their atmospheric structure and composition to be comprehensively investigated. Hence, the high-quality long-term FTIR measurements acquired at IZO offer excellent potential for climate research.
In this context, the present paper gives an overview of the FTIR measurement programme at IZO, reviewing its history during its first 20 years of operation (1999-2018) and its current status, and exploring its value for long-term climate research. Although the IZO FTIR station currently operates in the framework of NDACC, TCCON, and COCCON, this review focuses on NDACC FTIR activities throughout the entire 20-year period. For this purpose, the current paper has […]
FTIR solar spectra are only recorded when the line of sight between the instrument and the sun is cloud-free. Given the strategic location of IZO, these conditions are very common, with an average of 180 clear days a year (Cuevas et al., 2019).
Thus, FTIR solar measurements at IZO are typically taken two or three times a week (with about two spectra a day for each NDACC optical filter). Although the largest number of measurement days is concentrated in the warmest months, as shown in Figure 2, the monthly distribution of sampling days over the year is quite uniform. The total number of NDACC measurement days amounts to 2056 for the 1999-2018 period, an annual average of ∼100 measurement days.
An overview of the history of the FTIR instruments at IZO is given in Table 1. As previously mentioned, FTIR measurements started in 1999 with the installation of a Bruker IFS 120M. This spectrometer was replaced in 2005 by a more sophisticated model, a Bruker IFS 120/5HR, which is still operating today. During March and April 2005 both instruments measured side by side, which allowed the consistency of the two spectrometers to be documented (Sepúlveda et al., 2012).
Throughout the operation of the two FTIR systems, they have been housed in air-conditioned scientific containers (Figure 1 (c)) and operated in ventilated mode (i.e. the spectrometer is not evacuated) owing to the especially dry conditions at IZO. The IZO FTIR instruments have been very stable, especially the IFS 120/5HR system, and only two optical re-alignments were required during the first 20 years of operation (in June 2008 and February 2013). Apart from that, the most relevant interventions have been the replacement of the internal reference laser used to control the sampling of the interferogram in 2016 and 2017, due to frequency instabilities, and the solar tracker upgrade in 2012, when the quadrant-diode set-up was replaced by the CamTracker system (Gisi et al., 2011). By evaluating the image of the sun on the FTIR's entrance field stop acquired by a digital camera, the CamTracker system significantly improves on traditional tracking accuracies (reaching better than 10 arc seconds) and minimises FTIR pointing errors. In addition, some minor instrumental issues have occurred during these 20 years, causing short data gaps (see Table 1). For further details about the solar FTIR measurements at IZO, refer to Schneider et al. (2005), Sepúlveda et al. (2012), and García et al. (2012).

Atmospheric Remote Sensing Retrieval Principles
By evaluating the spectral signatures of vibrational-rotational transitions contained in the measured solar absorption spectra, the FTIR technique allows total column amounts and low-resolution vertical profiles of different atmospheric trace gases to be retrieved with a high degree of precision. For this purpose, refined FTIR retrieval strategies and inversion principles are used, based on the formalism given by Rodgers (2000). In summary, in the inversion procedure the measurement (solar absorption spectrum) is assembled into a measurement vector y, while the unknowns are described by a state vector x and a parameter vector p, which define the state of the atmosphere and the auxiliary and instrumental parameters, respectively. These magnitudes are connected by a forward model F that describes the physics of the measurement process (interaction of solar radiation with the atmosphere):
$$\mathbf{y} = \mathbf{F}(\mathbf{x}, \mathbf{p}) \qquad (1)$$
This is an ill-posed problem, i.e. there are many different atmospheric states (x) that produce almost identical spectra (y).
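To overcome this, the solution state is constrained by setting up a cost function:
$$J(\mathbf{x}) = [\mathbf{y} - \mathbf{F}(\mathbf{x}, \mathbf{p})]^T \mathbf{S}_y^{-1} [\mathbf{y} - \mathbf{F}(\mathbf{x}, \mathbf{p})] + [\mathbf{x} - \mathbf{x}_a]^T \mathbf{R} [\mathbf{x} - \mathbf{x}_a] \qquad (2)$$
The first term is a measure of the difference between the measured spectrum (y) and that simulated for a given atmospheric state (x), taking into account the part of the measurement signal that is not explained by the forward model assuming the state x and parameter values p (S_y is the covariance matrix of y − F(x, p)). The second term is the regularisation term. It constrains the atmospheric solution state (x) towards an a priori most likely state (x_a), whereby the kind and strength of the constraint are defined by the regularisation matrix R. Due to the non-linear behaviour of F(x, p), the cost function, Eq. (2), is minimised iteratively by numerical methods. For the (i + 1)th iteration it is:
$$\mathbf{x}_{i+1} = \mathbf{x}_a + \mathbf{G}_i \left[\mathbf{y} - \mathbf{F}(\mathbf{x}_i, \mathbf{p}) + \mathbf{K}_i (\mathbf{x}_i - \mathbf{x}_a)\right], \qquad \mathbf{G}_i = \left(\mathbf{K}_i^T \mathbf{S}_y^{-1} \mathbf{K}_i + \mathbf{R}\right)^{-1} \mathbf{K}_i^T \mathbf{S}_y^{-1} \qquad (3)$$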
where K is the Jacobian matrix (derivatives that capture how the measurement vector y changes for changes in the atmospheric state x), and G is the gain matrix (derivatives that capture how the retrieved state vector x̂ changes for changes in the measurement vector y).
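The iterative minimisation described above (the Gauss-Newton form given by Rodgers, 2000) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the PROFFIT implementation: `toy_forward` is a hypothetical, mildly non-linear stand-in for the radiative transfer, its Jacobian is analytic, and a simple diagonal R replaces the first-order Tikhonov-Phillips constraint used operationally.

```python
import numpy as np


def toy_forward(x, K0):
    """Hypothetical, mildly non-linear forward model F(x); not a radiative transfer code."""
    z = K0 @ x
    return z + 0.01 * z**2


def toy_jacobian(x, K0):
    """Analytic Jacobian K = dF/dx of toy_forward, evaluated at x."""
    z = K0 @ x
    return (1.0 + 0.02 * z)[:, None] * K0


def cost(x, y, K0, Sy_inv, R, xa):
    """Cost function: measurement misfit term plus regularisation term."""
    r = y - toy_forward(x, K0)
    d = x - xa
    return float(r @ Sy_inv @ r + d @ R @ d)


def gauss_newton(y, K0, Sy_inv, R, xa, n_iter=10):
    """Repeat x_{i+1} = x_a + G_i [y - F(x_i) + K_i (x_i - x_a)], starting from x_a."""
    x = xa.copy()
    for _ in range(n_iter):
        K = toy_jacobian(x, K0)
        # Gain matrix G_i = (K_i^T Sy^-1 K_i + R)^-1 K_i^T Sy^-1
        G = np.linalg.solve(K.T @ Sy_inv @ K + R, K.T @ Sy_inv)
        x = xa + G @ (y - toy_forward(x, K0) + K @ (x - xa))
    return x


rng = np.random.default_rng(0)
n, m = 4, 20                                # state size, number of spectral points
K0 = rng.uniform(0.0, 1.0, size=(m, n))
x_true = np.array([1.0, 1.2, 0.9, 1.1])
xa = np.ones(n)                             # a priori state
Sy_inv = np.eye(m) / 0.01**2                # inverse measurement covariance (noise std 0.01)
R = 0.1 * np.eye(n)                         # simplified (diagonal) regularisation matrix
y = toy_forward(x_true, K0) + rng.normal(0.0, 0.01, size=m)

x_hat = gauss_newton(y, K0, Sy_inv, R, xa)
print(cost(xa, y, K0, Sy_inv, R, xa), cost(x_hat, y, K0, Sy_inv, R, xa))
```

At convergence the update no longer changes x, which is equivalent to R(x − x_a) = Kᵀ S_y⁻¹ (y − F(x)), i.e. the gradient of the cost function vanishes.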
Because the vertical resolution of a remote sensing FTIR instrument is limited, a proper description of the relation between the retrieved and the actual state must be provided. This information is theoretically characterised by the averaging kernel matrix (A), which is calculated as A = GK and contains the derivatives that capture changes in the retrieved state x̂ for changes in the actual atmospheric state x. A links the retrieved and true states as follows:
$$\hat{\mathbf{x}} - \mathbf{x}_a = \mathbf{A} (\mathbf{x} - \mathbf{x}_a) \qquad (4)$$
Therefore, A describes the smoothing of the real atmospheric distribution due to the use of a constrained retrieval, and thus the vertical resolution and sensitivity that can be achieved by a remote sensing FTIR system. While the columns of A provide the response of the retrieved profile to a perturbation in the state vector, the rows of A describe the altitude regions that mainly contribute to the retrieved profile and therefore the vertical distribution of the FTIR sensitivity. As a measure of the total sensitivity, the trace of A (the so-called "Degrees Of Freedom for Signal", DOFS) gives the number of independent layers discernible by the remote sensing instrument.
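To make the averaging kernel (computed as the product of the gain matrix and the Jacobian, A = GK) and the DOFS concrete, the following sketch builds a hypothetical set of Gaussian-shaped weighting functions on a 0-40 km grid, computes G for a linear problem, and takes the trace of A. The grid, weighting functions, noise level and diagonal regularisation strength are illustrative assumptions, not IZO retrieval settings.

```python
import numpy as np


def gain_and_averaging_kernel(K, Sy_inv, R):
    """Return the gain matrix G and the averaging kernel A = G K."""
    G = np.linalg.solve(K.T @ Sy_inv @ K + R, K.T @ Sy_inv)
    return G, G @ K


z = np.linspace(0.0, 40.0, 21)                   # altitude grid in km (hypothetical)
centres = np.array([5.0, 12.0, 20.0, 30.0, 38.0])
# Hypothetical weighting functions: one Gaussian-shaped Jacobian row per measurement
K = np.exp(-(((z[None, :] - centres[:, None]) / 6.0) ** 2))
Sy_inv = np.eye(len(centres)) / 0.01**2          # inverse noise covariance
R = 1.0 * np.eye(len(z))                         # simplified regularisation matrix

G, A = gain_and_averaging_kernel(K, Sy_inv, R)
dofs = float(np.trace(A))                        # Degrees Of Freedom for Signal
row_20km = A[10]                                 # averaging kernel row at z = 20 km
print(f"DOFS = {dofs:.2f} (bounded by the {len(centres)} measurements)")
```

Because each eigenvalue of A lies between 0 and 1 for a positive-definite R, the DOFS can never exceed the number of independent measurements, however fine the altitude grid.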
Rewriting Eq. (4) and considering potential errors, the retrieved state x̂ can be linearised about a reference profile x_a (the a priori profile), the estimated model parameters p̂, and the measurement noise ε as:
$$\hat{\mathbf{x}} - \mathbf{x} = (\mathbf{A} - \mathbf{I})(\mathbf{x} - \mathbf{x}_a) + \mathbf{G}\mathbf{K}_p(\mathbf{p} - \hat{\mathbf{p}}) + \mathbf{G}\boldsymbol{\epsilon} \qquad (5)$$
where K_p represents the model parameter Jacobian matrix (i.e. the sensitivity matrix to model parameters). Eq. (5) is the basis for the analytic error estimation of the retrieved NDACC products: the first term corresponds to the smoothing error associated with the limited vertical sensitivity of the FTIR instrument, the second term accounts for errors due to uncertainties in the input/model parameters, and the third term represents the measurement noise. An extensive treatment of atmospheric remote sensing retrieval principles is given in Rodgers (2000).
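The three terms of Eq. (5) translate directly into covariance propagation: each error source is mapped into the retrieved state by sandwiching its covariance between the corresponding linear operator and its transpose. The sketch below assumes that the a priori covariance S_a, the parameter covariance S_p, the noise covariance S_ε and the parameter Jacobian K_p are known; all numerical values are hypothetical, and the split into statistical and systematic parts used at IZO is not reproduced.

```python
import numpy as np


def error_covariances(G, A, Sa, Kp, Sp, S_eps):
    """Propagate the three terms of the linearised error expression into covariances."""
    I = np.eye(A.shape[0])
    S_smooth = (A - I) @ Sa @ (A - I).T          # smoothing error
    S_param = G @ Kp @ Sp @ Kp.T @ G.T           # model-parameter error
    S_noise = G @ S_eps @ G.T                    # measurement-noise error
    return S_smooth, S_param, S_noise


rng = np.random.default_rng(1)
n, m, n_par = 3, 8, 2
K = rng.uniform(0.0, 1.0, size=(m, n))           # hypothetical state Jacobian
Kp = rng.uniform(-0.5, 0.5, size=(m, n_par))     # hypothetical parameter Jacobian
S_eps = (0.01**2) * np.eye(m)                     # measurement noise covariance
Sa = (0.2**2) * np.eye(n)                         # a priori covariance
Sp = np.diag([0.02**2, 0.05**2])                  # parameter uncertainty covariance
R = np.linalg.inv(Sa)                             # constraint taken here as Sa^-1 (assumption)

Se_inv = np.linalg.inv(S_eps)
G = np.linalg.solve(K.T @ Se_inv @ K + R, K.T @ Se_inv)
A = G @ K
S_smooth, S_param, S_noise = error_covariances(G, A, Sa, Kp, Sp, S_eps)
total_std = np.sqrt(np.diag(S_smooth + S_param + S_noise))
print(total_std)
```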

Retrieval Strategies
At IZO the FTIR programme routinely contributes to NDACC with TCs and VMR profiles of ethane (C2H6), methane (CH4), chlorine nitrate (ClONO2), carbon monoxide (CO), hydrogen chloride (HCl), hydrogen cyanide (HCN), hydrogen fluoride (HF), nitric acid (HNO3), nitrous oxide (N2O), and ozone (O3) (so-called "standard" products hereafter). All these compounds are retrieved with the non-linear least-squares fitting algorithm PROFFIT (PROFile FIT; Hase et al., 2004), considering the spectral regions and interfering gases given in Table 2. The inversion is solved using a first-order Tikhonov-Phillips regularisation (L1; Rodgers, 2000) for all NDACC products, with the exception of ClONO2, which is obtained using a scaling retrieval. […] cross sections or pseudolines are used (Birk and Wagner, 2000; Harrison et al., 2010). Finally, the NCEP (National Centers for Environmental Prediction) 12:00 UT daily temperature and pressure profiles are used for the forward simulations.
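A first-order Tikhonov-Phillips constraint penalises vertical gradients of the deviation from the a priori profile rather than the deviation itself. A minimal sketch of how such a regularisation matrix can be built is given below; the constant strength `alpha` and the uniform level spacing are illustrative assumptions, since the altitude-dependent weighting that PROFFIT applies for each gas is not described here.

```python
import numpy as np


def tikhonov_l1(n_levels, alpha):
    """First-order Tikhonov-Phillips regularisation matrix R = alpha * L1^T L1.

    L1 is the (n_levels - 1) x n_levels first-difference operator, so R penalises
    differences between adjacent levels of (x - x_a) but not a uniform offset.
    """
    L1 = np.diff(np.eye(n_levels), axis=0)     # row i: e_{i+1} - e_i
    return alpha * L1.T @ L1


R = tikhonov_l1(5, alpha=2.0)
x_minus_xa = np.full(5, 0.1)                     # uniform deviation from the a priori
penalty = x_minus_xa @ R @ x_minus_xa            # a pure column scaling costs nothing
print(R)
print(penalty)
```

This property is what lets an L1-constrained retrieval adjust the total column freely while still smoothing the profile shape.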
All of these settings are based on NDACC-IRWG recommendations (Infrared Working Group, IRWG, 2014) with small modifications. The most relevant changes are those related to CH4, for which the spectral micro-windows are adopted from Sepúlveda et al. (2014), and the spectroscopy parameters correspond to the improved linelist provided by Dubravica et al. […] and advantageous for reproducing the tropospheric CH4 signals (Hase and Sepúlveda, 2011; Sepúlveda et al., 2012, 2014). Another minor modification affects the absorption lines used for O3 retrievals, which are a simplification of the refined set-up presented by Schneider et al. (2008b). This strategy has been found to provide more precise O3 estimates than those retrieved with the traditional NDACC approach (1000-1005 cm⁻¹ broad micro-window) when compared with independent measurements (Schneider et al., 2008a, b; García et al., 2021).
Table 2. Summary of the spectral regions and interfering gases considered for standard and non-standard NDACC products at IZO. For details about specific retrieval strategies for the non-standard products refer to Vigouroux et al. (2018) for H2CO, to […] and Barthlott et al. (2017) for water vapour isotopologues, to Hase (2000) and Rinsland et al. (2003) for NO and NO2, and to Lejeune et al. (2016) for OCS.
In addition to the standard NDACC products, the IZO FTIR programme also contributes to this database with other trace gases (not required by the network; "non-standard" products hereafter): nitrogen dioxide (NO2), nitric oxide (NO), carbonyl sulphide (OCS), formaldehyde (H2CO), and water vapour isotopologues (H₂¹⁶O, H₂¹⁸O and HD¹⁶O). These non-standard NDACC gases are also retrieved with the PROFFIT code, using the settings and references listed in Table 2.

Water vapour isotopologue observations have been centrally retrieved and quality-filtered in the framework of the MUSICA project (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water; Schneider et al., 2012; Barthlott et al., 2017). Note that, for the water vapour products, the δ-notation is used to express the relation of the observed isotopologue ratio to the standard ratio VSMOW (Vienna Standard Mean Ocean Water; refer to Barthlott et al. (2017) for details about this MUSICA product).
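The δ-notation can be illustrated with a short function. It assumes the standard definition δD = (R/R_VSMOW − 1) × 1000 ‰, with R the HD¹⁶O/H₂¹⁶O ratio and the reference ratio taken as twice the VSMOW D/H ratio of 155.76 × 10⁻⁶; the constant name and example values are illustrative, and Barthlott et al. (2017) remains the reference for the exact MUSICA convention.

```python
# Assumed reference: HD16O/H2(16)O ratio of VSMOW = 2 x D/H (2 x 155.76e-6)
R_VSMOW_HDO = 3.1152e-4


def delta_d(hdo, h2o, r_std=R_VSMOW_HDO):
    """deltaD in per mil: relative deviation of the HD16O/H2(16)O ratio from r_std."""
    return 1000.0 * ((hdo / h2o) / r_std - 1.0)


# Hypothetical column amounts (any consistent unit) with a ratio 20% below VSMOW
example = delta_d(0.8 * R_VSMOW_HDO * 1.0e22, 1.0e22)
print(f"deltaD = {example:.1f} per mil")
```

Negative values, as in this example, indicate air depleted in HD¹⁶O relative to ocean water, which is the usual situation for atmospheric water vapour.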
All FTIR products presented here correspond to those publicly available from the NDACC archive (www.ndaccdemo.org).
MUSICA water vapour isotopologues are also available at the NASA LaRC Airborne Science Data for Atmospheric Composition (www-air.larc.nasa.gov). The only quality filter applied to the public FTIR products is the exclusion of observations taken at high solar zenith angles (≥85º), to avoid imprecise retrievals (mainly caused by misalignments of the solar tracker or spectroscopic issues). These data represent less than 1% of the total data set.

Product Characterisation: Vertical Sensitivity and Uncertainty Budget
The vertical sensitivity of an FTIR system changes significantly from gas to gas, since it depends on the target gas, the observation geometry, instrumental aspects (e.g. the signal-to-noise ratio) and the retrieval strategy. This can clearly be observed in Table 3, which summarises the DOFS statistics for all NDACC products. The FTIR vertical sensitivity ranges from roughly resolving four independent layers of the O3 vertical distribution (mean total DOFS of 4.12) to only retrieving information about the TCs of H2CO, NO2 and ClONO2 (recall that these gases are retrieved using a scaling retrieval, so the total DOFS is theoretically equal to unity). Between two and three atmospheric layers are discernible for CH4, CO, HCl, HCN, HNO3, H₂¹⁶O, N2O, and OCS, while for C2H6, δD, HF, and NO the sensitivity is limited to one or two layers. These total DOFS values mean that the vertical resolution amounts to roughly 10 km (from the ground up to the middle stratosphere), except for the H2O isotopologues. For the latter, the sensitivity is mainly confined to the troposphere, and the vertical resolution ranges from 2-3 km in the lower troposphere up to ∼8 km in the upper troposphere, as shown in Figure 3. This figure depicts the rows of the averaging kernel matrix A for typical measurement conditions at IZO for all NDACC products, where the A rows of the layers that are well resolved by the FTIR instrument are highlighted as coloured lines. For example, for O3, the four resolvable layers are the troposphere (∼5 km), the tropopause region (∼18 km), the lower stratosphere (∼28 km) and the middle stratosphere (∼39 km), while for trace gases with a total DOFS of about two, such as CH4 or HCN, the FTIR system basically distinguishes between signals from the troposphere (∼5 km) and the stratosphere (∼28 km).
Table 3. Overview of standard and non-standard NDACC products: number of measured spectra (N), mean (M) and standard deviation (σ) of the total DOFS, M and σ of the statistical uncertainty (Sta. Unc., in %), and M and σ of the systematic uncertainty (Sys. Unc., in %). Note that H₂¹⁶O values correspond to the simple MUSICA water vapour product (refer to Section 4.3 of Barthlott et al. (2017)), while δD products are taken from the quasi optimal estimation of {H2O, δD}-pair data (refer to Section 4.4 of Barthlott et al. (2017)). (*) refers to those trace gases evaluated using a scaling retrieval.
The characterisation of the retrieved FTIR products is completed by a theoretical assessment of the expected uncertainties according to Eq. (5), which evaluates how different error sources propagate into the retrieved products. At IZO, the error budget, computed analytically by the PROFFIT software, includes the impact of measurement noise and of model-parameter sources accounting for instrumental/model aspects (baseline parameters, Instrumental Line Shape (ILS) function, solar pointing, atmospheric temperature profiles, solar lines, and spectroscopic parameters), which are split into statistical and systematic contributions. Further details about the uncertainty analysis are given in Appendix A.
Figure 3. Example of the rows of the averaging kernel matrix (A), on a logarithmic scale, for standard and non-standard NDACC products for typical measurement conditions at IZO (spectra taken on 20 July 2013 at a solar zenith angle of ∼30º). Note that for H2O, only the A rows of the main water vapour isotopologue (H₂¹⁶O) are shown. Coloured lines represent A rows at altitudes representative of the layers discernible by the FTIR instrument. For better representation, the ordinate limit for each product has been adapted depending on whether the trace gas has a significant contribution in the middle/upper stratosphere (y-limit of 40, 50, or 60 km) or is predominantly distributed in the troposphere/lower stratosphere (y-limit of 10, 15, or 25 km).
Statistics on the uncertainties for all NDACC products are also included in Table 3. The mean statistical uncertainty over the 20-year period (one standard deviation, σ, in brackets) ranges from ∼0.4% (∼0.04%) for N2O up to ∼50% (∼20%) and ∼100% (∼1800%) for H2CO and ClONO2, respectively. As these large values indicate, the latter two gases are particularly difficult to retrieve from the solar absorption spectra at IZO owing to their weak spectral signatures and the relatively low TCs recorded at a subtropical station under background conditions (Kohlhepp et al., 2012; Vigouroux et al., 2018). Overall, statistical uncertainties are dominated by measurement noise, baseline, and atmospheric temperature errors (e.g. Schneider et al., 2008a, 2012; García et al., 2012; Sepúlveda et al., 2012; Vigouroux et al., 2018; García et al., 2021).
A similar pattern is found for the systematic uncertainty contributions: H2CO and ClONO2 present the largest errors, ∼50% (∼20%) and ∼100% (∼1600%), respectively, while the main water vapour isotopologue H₂¹⁶O shows a mean bias lower than 1.5% (∼0.2%). For all NDACC gases, the systematic uncertainty budget is dominated by spectroscopic errors. The MUSICA water vapour isotopologue products are retrieved using an improved spectroscopy based on HITRAN2012, but with line intensities (S) and broadening parameters (γ) modified by about 5-10% (Barthlott et al., 2017).
This modification is introduced to correct the bias in the water vapour profile products, so very small systematic errors are expected. […] Depending on the target gas, the predominant errors are mainly located in the troposphere, in the upper troposphere/lower stratosphere (UTLS) region, or in the middle/upper stratosphere. Statistical uncertainties of 5-10% are expected in the lower, middle and upper troposphere for C2H6, CO, H₂¹⁶O, and OCS, and as high as 20% for HCN tropospheric VMR estimates. HCl, HF, HNO3, NO and OCS are found to be especially sensitive to uncertainties in the UTLS region, while OCS and O3 exhibit the largest error impacts in the middle/upper stratosphere. Large values are also detected in the upper stratosphere for CO and HNO3, but only a subtle impact on the TCs is expected given the low concentrations of these gases at those altitudes. For CH4 and N2O, the error values are mostly limited to ∼2.5% throughout the atmosphere. The systematic uncertainties behave similarly to the statistical vertical profiles, although the error values are generally slightly higher. Particularly large error profiles are estimated for H2CO and ClONO2, for both statistical and systematic contributions, owing to their weak spectral signatures and low abundances at IZO, as mentioned above.

4 Long-term Performance
The long-term performance of ground-based FTIR instruments can be assessed through indirect tests. Here, the evolution of the Instrumental Line Shape (ILS) function and of the solar pointing is analysed, as well as the column-averaged amounts of dry air (Xair) and of carbon dioxide (XCO2) retrieved from the NDACC FTIR spectra, in order to identify instrumental inconsistencies and to document the temporal stability of the long-term IZO FTIR time series.

Instrumental Line Shape Function
A precise knowledge of the ILS function is essential to properly characterise instrument performance, since the ILS affects the absorption line shapes on which the retrieved information is based. This is of particular importance for stratospheric gases, because the full width at half maximum of their sharp absorption lines is of similar magnitude to that of the ILS (Schneider et al., 2008a, b; García et al., 2021). Therefore, the ILS function at IZO has been routinely monitored about every two months since 1999 using re-filled N2O cells at a pressure of 10 Pa. The ILS is then retrieved from the N2O absorption lines using the LINEFIT software (v14.5), as described in Hase (2012), and applied in the NDACC atmospheric retrievals. Note that the ILS function depends on the instrumental configuration (e.g. field stops, detectors, optical filters), so at IZO the ILS is estimated independently for each detector. For the MCT configuration, two broad micro-windows combining saturated and unsaturated N2O absorption lines between 1235.0-1279.5 and 1291.8-1301.9 cm⁻¹ are used, while for the InSb detector one micro-window between 2173.2 and 2210.0 cm⁻¹ is considered (Hase, 2012). In addition, sealed HBr cell measurements have been taken occasionally since 1999.
Continuous monitoring of the ILS function through independent cell measurements ensures that the actual instrumental status is taken into account in NDACC retrievals, but it also allows instrumental alignment and temporal stability to be verified.
As an example, Figure 5 depicts the time series of the ILS modulation efficiency amplitude (MEA) and phase error (PE) parameters for the NDACC filter 4 measurement settings (InSb detector) for the IZO FTIR instruments between 1999 and 2018. This figure documents that, in addition to suffering from a higher level of spectral noise in the cell and atmospheric measurements, the IFS 120M spectrometer has an ILS that is less stable over time than that of the IFS 120/5HR. It further illustrates how targeted interventions on the spectrometers properly corrected the instrumental issues detected: the MEA temporal degradations were mitigated by the two optical re-alignments in 2008 and 2013, while the PE asymmetries were minimised by replacing the internal reference laser in 2016 (recall Table 1). In recent years, the IFS 120/5HR has been very stable, with a loss of MEA not exceeding 2% and PE limited to ±0.04 rad throughout the OPD range. As documented by García et al. (2021), the ILS time series for the MCT configuration is very consistent with that reported for the InSb detector, corroborating the proper instrumental characterisation of the IZO FTIR instruments.

Mispointing of the solar tracker can generate a Doppler shift of solar lines with respect to telluric spectral features because of solar rotation (Gisi et al., 2011). This effect is considered in the operational NDACC retrievals by fitting a separate shift for the solar background lines, so its effect on trace gas observations is minor. Nevertheless, analysing the Doppler shift also provides a useful method for estimating the solar tracking accuracy. Figure 6 (a) shows the time series of the Doppler spectral scaling factor ∆ν/ν, which has been retrieved by observing the solar line shifts in the measured MIR spectra around 2104 cm⁻¹ using the PROFFIT software. After the quadrant-diode set-up with a semitransparent mirror was installed at the IZO FTIR instrument in February 2005 and further realignments were made in May 2007, an average scaling factor of -0.18·10⁻⁷ with a scatter of 4.48·10⁻⁷ was reached. The latter translates into a mispointing precision along the solar equator of ∼35 arc seconds (Gisi et al., 2011). The Doppler scaling values after 2012 clearly indicate the significant improvement brought by the more accurate CamTracker system: the scatter is reduced by a factor of about two, to within about 20 arc seconds. This ensures […]

XCO2 and Xair
Two approaches based on gas retrievals from the measured MIR spectra have been examined to assess the long-term consistency of the NDACC FTIR data sets. The first approach, the XCO2 method, is based on Schneider et al. (2012) and was further elaborated by Barthlott et al. (2015), who demonstrated that XCO2 retrievals from NDACC MIR spectra (referred to as NDACC XCO2 hereafter) can be used as a proxy for assessing the network consistency of NDACC FTIR measurements. This approach compares retrieved NDACC XCO2 data with a multi-regression XCO2 model that provides information on the long-term, seasonal, and latitudinal behaviour of XCO2 (Barthlott et al., 2015). To quantify this relationship, the R_XCO2 parameter is defined here as follows:
$$R_{XCO_2} = \frac{XCO_2^{NDACC}}{XCO_2^{model}} \qquad (6)$$
The XCO2 model is based on CarbonTracker results and Mauna Loa CO2 in situ records, and is adapted to the FTIR measurement site using only latitude and surface pressure as local inputs. The NDACC CO2 TCs, on the other hand, were retrieved by analysing four isolated CO2 absorption lines between 2620.55 and 2629.95 cm⁻¹, using a scaling retrieval with a fixed WACCM a priori VMR profile and the PROFFIT software. The XCO2 is then calculated by dividing the CO2 TC by the dry pressure column (DPC). The DPC is obtained by converting surface pressure (P_S, in pascals) to column air concentration (Barthlott et al., 2015), as follows:
$$DPC = \frac{P_S}{g(\phi)\,\mu_{dryair}} - TC_{H_2O}\,\frac{\mu_{H_2O}}{\mu_{dryair}} \qquad (7)$$
where µ_dryair is the molecular mass of dry air (∼28.96 × 10⁻³/N_A kg molecule⁻¹), µ_H2O is the molecular mass of water vapour (∼18.02 × 10⁻³/N_A kg molecule⁻¹), N_A is Avogadro's constant (∼6.022 × 10²³ molecules mol⁻¹), g(ϕ) is the latitude-dependent column-averaged gravitational acceleration, and TC_H2O is the water vapour TC. The TC_H2O data are a result of the MUSICA retrieval ([…], and references therein), and surface pressure is taken from the NCEP data used in the retrievals. Refer to Barthlott et al. (2015) for further details about the XCO2 approach.
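A sketch of the DPC conversion described above and of the R_XCO2 ratio follows, assuming R_XCO2 is the ratio of the NDACC XCO2 to the modelled XCO2. The column-averaged gravitational acceleration, the water vapour column and the modelled XCO2 are hypothetical inputs chosen only to give IZO-like magnitudes; operationally g(ϕ) depends on latitude and the vertical air distribution, and TC_H2O comes from the MUSICA retrieval.

```python
N_A = 6.02214076e23                 # Avogadro's constant, molecules mol^-1
MU_DRYAIR = 28.96e-3 / N_A          # molecular mass of dry air, kg molecule^-1
MU_H2O = 18.02e-3 / N_A             # molecular mass of water vapour, kg molecule^-1


def dry_pressure_column(ps_pa, g, tc_h2o):
    """Dry pressure column in molecules m^-2.

    ps_pa: surface pressure (Pa); g: column-averaged gravitational
    acceleration (m s^-2); tc_h2o: water vapour total column (molecules m^-2).
    The first term is the total air column; the second removes the water vapour mass.
    """
    return ps_pa / (g * MU_DRYAIR) - tc_h2o * MU_H2O / MU_DRYAIR


def r_xco2(tc_co2, dpc, xco2_model):
    """Ratio of the FTIR XCO2 (CO2 total column / DPC) to a modelled XCO2."""
    return (tc_co2 / dpc) / xco2_model


# Hypothetical IZO-like inputs: 770 hPa, assumed g of 9.78 m s^-2, ~3 mm of water vapour
dpc = dry_pressure_column(77000.0, 9.78, 1.0e26)
xco2_model = 400e-6                 # hypothetical modelled XCO2 (dry-air mole fraction)
tc_co2 = 1.02 * xco2_model * dpc    # synthetic CO2 column 2% above the model
print(f"DPC = {dpc:.3e} molecules m^-2, R_XCO2 = {r_xco2(tc_co2, dpc, xco2_model):.3f}")
```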
The second method is based on the Xair parameter, which can be used as a sensitive test of the temporal stability of an FTIR instrument because, for Xair, possible instrumental problems are not compensated (Frey et al., 2019). This quantity compares the measured TC of a well-known, very stable reference gas with surface pressure measurements (Eq. (8)). Therefore, for an ideal FTIR instrument, Xair values should be close to unity, and large deviations (∼1%) from this value might indicate instrumental problems (Wunch et al., 2015; Frey et al., 2019). Here, the nitrogen (N2) absorption signatures measured in the NDACC FTIR spectra have been used as the reference gas.
where F N2 is the dry-air mole fraction of nitrogen in the atmosphere (0.7808) and TC N2 is the N 2 TC. The latter is retrieved 330 by evaluating four N 2 spectral micro-windows between 2403.00 and 2426.3 cm −1 , considering a scaling retrieval with a fixed WACCM a priori VMR profile, and using PROFFIT software (Goldman et al., 2007).
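The X_air check can be sketched in the same spirit. The snippet assumes that Eq. (8) follows the TCCON-style convention, in which the reference-gas column expected from surface pressure (F_N2 × DPC) is divided by the retrieved reference-gas column; with this orientation, a reference-gas column overestimated by ∼2% gives X_air ≈ 0.98. The flagging rule (relative deviation larger than ∼1% from a bias-corrected nominal value of 0.98) is a hypothetical illustration of the threshold discussed in the text, not the authors' screening procedure.

```python
# Hypothetical sketch of an X_air stability check (assumed TCCON-style convention).

F_N2 = 0.7808  # dry-air mole fraction of N2 (value quoted in the text)


def xair(tc_ref, dpc, mole_fraction=F_N2):
    """Expected reference-gas column (mole_fraction * DPC) over the retrieved column."""
    return mole_fraction * dpc / tc_ref


def flag_xair(values, nominal=0.98, tolerance=0.01):
    """True where X_air deviates from the nominal value by more than `tolerance` (relative)."""
    return [abs(v / nominal - 1.0) > tolerance for v in values]


if __name__ == "__main__":
    dpc = 1.64e29                      # illustrative DPC (molecules m^-2)
    tc_n2 = F_N2 * dpc / 0.98          # a retrieved column overestimated by ~2%
    print(round(xair(tc_n2, dpc), 4))  # close to 0.98
    print(flag_xair([0.982, 0.995, 0.965]))
```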
Figure 6 also shows the time series of P_S (normalised by the typical surface pressure at IZO of 770 hPa), R_XCO2, and X_air^N2. Consistently, both the R_XCO2 and X_air^N2 data are biased by ∼2% with respect to unity (upward and downward, respectively), which is very likely due to errors in the MIR spectroscopic parameters of the CO2 and N2 absorption signatures analysed (Goldman et al., 2007; Barthlott et al., 2015). Therefore, taking R_XCO2 ∼1.02 and X_air^N2 ∼0.98 as references, the XCO2 and X_air^N2 approaches provide consistent results. Anomalous values are detected only in the period 1999-2000 and are attributed to the surface pressure records (marked as grey-white dots in Figure 6). In 2000, the type and location of the IZO surface pressure sensor changed (a Thyas sensor with a precision of ±1 hPa was used until June 2000, followed by a Setra sensor with a precision of ±0.3 hPa), leading to a jump of 0.30% in both the R_XCO2 and X_air^N2 parameters. Excluding this period, the mean X_air^N2 values (1σ in brackets) are 0.9821 (0.0038) and 0.9844 (0.0037) for the IFS 120M and IFS 120/5HR, respectively, while the mean R_XCO2 values are 1.0256 (0.0038) and 1.0234 (0.0023) for the IFS 120M and IFS 120/5HR, respectively. These results agree well with the reference X_air^N2 and R_XCO2 values of 0.98 and 1.02. Note that the reported R_XCO2 mean values are computed from the monthly time series, since the modelled XCO2 data cannot capture synoptic-scale (i.e. day-to-day) variations (Barthlott et al., 2015).

The switch of spectrometer from the IFS 120M to the IFS 120/5HR in 2005 is the most important change identified in both the X_air^N2 and R_XCO2 time series, leading to a mean bias of ∼0.20% between the two FTIR instruments. In addition, and consistent with the ILS analysis, the IFS 120/5HR system is found to be more stable than the IFS 120M spectrometer (for the 2005-2018 period, the R_XCO2 scatter is ∼40% smaller than for the IFS 120M, i.e. the IFS 120M scatter is ∼65% larger, while for X_air^N2 the difference is only ∼3%). However, these differences lie well within the estimated confidence ranges for both FTIR systems, so they are not expected to influence the long-term IZO NDACC time series (e.g. García et al., 2012; Sepúlveda et al., 2012). The other minor instrumental issues or interventions on the FTIR instruments (recall Table 1) do not seem to affect the X_air^N2 and R_XCO2 time series, since some of them can be partially corrected during the NDACC retrieval processing. This is the case, for example, for the frequency instabilities detected in the internal reference laser in 2016-2017. As illustrated in Figure 6 (b)-(c) (black-white dots), this issue has a negligible impact on the R_XCO2 and X_air^N2 values, because the spectral shift of the measured MIR spectra is fitted simultaneously by PROFFIT when retrieving the different NDACC products. No significant temporal drifts were detected in either reference time series (at the 95% confidence level), corroborating the long-term temporal stability of the IZO FTIR instruments. Figure 6 also includes the X_air^O2 time series, which is estimated similarly to X_air^N2 (Eq. (8)) but using the oxygen (O2) TCs retrieved from TCCON NIR spectra as the reference gas (Wunch et al., 2015). The X_air^O2 parameter also has a ∼2% bias due to O2 spectroscopic inconsistencies, and a typical mean value of 0.9832 is found for the IZO FTIR instrument. This value is consistent with results reported for other TCCON sites (Wunch et al., 2015).
It is worth highlighting that the dispersion of X_air^N2 (∼0.37%) is twice that found for the IZO FTIR instrument when using the TCCON X_air^O2 retrievals (0.18%).
This different behaviour could, in part, be due to the fact that the N2 TCs are retrieved from a few weak N2 absorption lines at ∼2400 cm−1, which makes them more sensitive to spectral measurement noise (and other disturbing effects). In addition, discrepancies in the […] Note also that the X_air^O2 and R_XCO2 results are very coherent, indicating that the R_XCO2 parameter can be successfully used to assess the reliability and stability of the NDACC FTIR data.
To sum up, the long-term performance analysis indicates that the IZO FTIR spectrometers do not suffer from major instrumental issues apart from those already identified. In addition, both instruments have been shown to be stable over time (especially the IFS 120/5HR) and well characterised during their first 20 years of operation. Therefore, the NDACC FTIR trace gas concentrations measured at IZO can be reliably used for long-term climate research.
In general, the abundances of trace gases observed at IZO are lower than at middle/high latitudes and in polluted areas because of the special measurement conditions of the observatory. As mentioned above, IZO is a high-altitude station, isolated from local and regional pollution sources, and located in the descending branch of the northern subtropical Hadley cell. Therefore, very low concentrations of water vapour and pollution-related gases are typically measured and, for some trace gases, the IZO records are close to the FTIR limit of detection (e.g. ClONO2). These background conditions are only sporadically disturbed by long-range transport of pollution and/or biomass-burning plumes from Europe and North America (Cuevas et al., 2013; García et al., 2017, and references therein), and by intrusions of polar or tropical streamer air masses, which cause sporadic downward and upward shifts of the UTLS region, respectively (…; Cuevas et al., 2013). These episodes are typically observed in winter, when larger variations are also generally detected in the FTIR TC time series owing to a more disturbed atmosphere (see Figure 7). In addition, direct stratospheric air mass intrusions are occasionally detected in spring and summer (Cuevas et al., 2013), leading to enhanced tropospheric concentrations of some trace gases (e.g. O3).
Chemically long-lived gases, such as HF or N2O, can be used to track these UTLS vertical movements and stratosphere-troposphere exchange (STE) events. Figure 7 shows the anti-correlation between the extreme TCs of HF and N2O, which are more abundant in the stratosphere and troposphere, respectively. This cross-relationship is also noticeable between other stratospheric gases, such as O3, NO2, or HNO3, and tropospheric compounds, such as CH4. In summer, by contrast, African air masses are frequently advected westwards over the Atlantic Ocean, modifying the atmospheric composition of the lower and middle troposphere over the subtropical North Atlantic. During these episodes, African boundary layer air is strongly injected into the free troposphere, so that high mineral dust concentrations are detected at IZO along with industrial pollutants, relatively humid air with isotopically enriched water vapour, and relatively low tropospheric O3 concentrations (Cuevas et al., 2013; González et al., 2016; García et al., 2017, and references therein).
[…] intra-annual variations are rather smooth and are largely dominated by the dynamical shift in the height of the subtropical tropopause, which is associated with the stratospheric general circulation. The higher the tropopause, the smaller the relative contribution of the stratosphere to the TCs. This results in minimum (maximum) TCs in summer (winter) for long-lived stratospheric gases (e.g. ClONO2, HCl, HF, HNO3) and the opposite behaviour for long-lived tropospheric gases (e.g. OCS, […] records at a global scale (including IZO among them), this oscillating behaviour can be caused by extratropical dynamical variability with a 5-7 year period, driven by interactions between the transport circulation and the quasi-biennial oscillation in tropical winds. This work also reveals that the amplitude of this short-term dynamical variability is large relative to the long-term trends, so it may strongly affect trend estimates derived from records shorter than several decades. Consistent with other northern NDACC stations (Strahan et al., 2020), the IZO records point to a decline in HCl (-0.21±0.19% yr−1) and an increase in HNO3 (+0.44±0.33% yr−1) over the 20-year period. However, when the two decades are considered separately, the trends become less apparent and could be affected by these short-term dynamical variations (+0.41±0.75% yr−1 and +0.19±0.58% yr−1 for 1999-2008 and 2009-2018, respectively, for HNO3, and -0.17±0.37% yr−1 and -0.14±0.33% yr−1 for 1999-2008 and 2009-2018, respectively, for HCl). Note that, for a better interpretation of the long-term evolution, Figure 8 represents the time series of annual TC anomalies relative […] (Figure 8 (c)). Specific anthropogenic activities can be further monitored by measuring the abundances of related gases. This is the case for non-methane hydrocarbons, such as C2H6, which shows rather steady values until 2009, when an upturn is detected (Figure 8 (c)).
The causes of this sharp rise, also observed at other globally distributed NDACC FTIR sites, are similar to those documented for the CH4 increase (i.e. the oil and natural gas production boom in the Northern Hemisphere, particularly in North America) (Franco et al., 2016; Mahieu et al., 2018, and references therein). As with H2CO, after the 2015 peak the IZO time series suggests a stabilisation of the C2H6 TCs, at least at subtropical latitudes.
In addition to ozone, greenhouse and air quality gases, a key element of the Earth's climate is the water cycle. Ground-based FTIR observations of water vapour isotopologue composition have proven to provide valuable information for understanding the different water cycle processes (moisture source, transport, cloud processes, and precipitation) and their relation to climate (e.g. Risi et al., 2012; Schneider et al., 2012; Barthlott et al., 2017, and references therein).
[… Figure 3. Also note that the range of the coloured scale, showing VMR concentrations, has been adapted for each trace gas. White areas correspond to data gaps due to instrumental issues (recall Table 1).]
For greenhouse gases, the long-term TC trend is largely the result of increasing tropospheric concentrations, for which the NDACC products provide similar positive rates in 1999-2018: +0.30±0.02% yr−1 and +0.29±0.01% yr−1 in the 2.37-5.6 km layer for CH4 and N2O, respectively. This monotonic increase is also observed in the lower/middle stratosphere (see Figure 11 (b)), although marked year-to-year fluctuations are detected, likely due to atmospheric transport processes and dynamical mechanisms, as pointed out in Section 5. Although greenhouse gas concentrations in the stratosphere are significantly lower than in the troposphere, the stratospheric accumulation rates are significantly greater (+0.52±0.08% yr−1 and +0.50±0.16% yr−1 in the 22-29 km layer for CH4 and N2O, respectively). This result is expected, since vertical transport and mixing are considerably faster than the respective destruction processes in the stratosphere (i.e. mainly photodissociation for N2O, and oxidation by the hydroxyl radical OH for CH4). This long-term behaviour, and possible short-term variations, may play an important role in modulating stratospheric temperatures and thus affect stratospheric chemical cycles (e.g. enhancing the recovery of stratospheric O3; Steinbrecht et al., 2017). For other trace gases, analysing the TCs alone would mask the vertical distribution of long-term patterns. This is the case for the tropospheric CO records, whose rate of decrease is almost three times that of the TC: -0.96±0.26% yr−1 in the 2.37-5.6 km layer for 1999-2018. This example, together with those described above, further emphasises the added value of the vertical information provided by the NDACC FTIR data.
Nonetheless, trend estimates for short-lived gases such as CO might be influenced by FTIR sampling effects (addressed in detail in Section 8.4).
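Linear trends such as those quoted above are typically estimated with a linear-plus-Fourier fit; Figure 14 (e), for instance, refers to Eq. (B1) and a bootstrap in Appendix B, which are not reproduced in this section. As a minimal sketch under stated assumptions, the snippet below fits an intercept, a linear term, and annual Fourier harmonics by ordinary least squares, expresses the slope in % yr−1 relative to the series mean, and estimates a 95% interval by resampling residuals. The number of harmonics, the normalisation by the mean, and the residual-resampling scheme are our assumptions, not necessarily the choices of Appendix B; all names are hypothetical and only the Python standard library is used.

```python
import math
import random


def _design_row(t, n_harmonics):
    """Regressors: intercept, linear time term, annual Fourier harmonics (t in years)."""
    row = [1.0, t]
    for k in range(1, n_harmonics + 1):
        row += [math.cos(2 * math.pi * k * t), math.sin(2 * math.pi * k * t)]
    return row


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_fourier(t, y, n_harmonics=2):
    """Ordinary least squares via the normal equations; returns [a0, a1, b1, c1, ...]."""
    rows = [_design_row(ti, n_harmonics) for ti in t]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(ata, aty)


def relative_trend(t, y, n_harmonics=2, n_boot=500, seed=0):
    """Slope in % per year relative to the series mean, with a 95% residual-bootstrap interval."""
    coef = fit_linear_fourier(t, y, n_harmonics)
    mean_y = sum(y) / len(y)
    trend = 100.0 * coef[1] / mean_y
    fitted = [sum(c * v for c, v in zip(coef, _design_row(ti, n_harmonics))) for ti in t]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    rng = random.Random(seed)
    boot = []
    for _ in range(n_boot):
        # Synthetic series: fitted model plus residuals drawn with replacement.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        boot.append(100.0 * fit_linear_fourier(t, y_star, n_harmonics)[1] / mean_y)
    boot.sort()
    lo = boot[int(0.025 * (n_boot - 1))]
    hi = boot[int(0.975 * (n_boot - 1))]
    return trend, (lo, hi)
```

For a 20-year monthly series, `relative_trend([i / 12 for i in range(240)], monthly_means)` would return the slope in % yr−1 and its interval (`monthly_means` being a hypothetical list of 240 values). Residual resampling ignores autocorrelation, so intervals from this sketch may be too narrow for strongly autocorrelated series.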

Other Climate Research Applications
Examples of scientific applications of the IZO FTIR time series are described in detail in Sections 5 and 6, especially those used to investigate greenhouse gas budgets and long-term changes in key atmospheric gases, such as ozone, chlorine and fluorine compounds, and air quality gases, at both regional and global scales. In addition, the validation of remote observations from different satellite instruments has been one of the priorities of the IZO FTIR programme. The high-quality NDACC FTIR data at IZO have been applied extensively for many years in the evaluation of, e.g., […]
[…] The TCCON FTIR products are retrieved with the GGG Suite software package (current version GGG2014), whose core is the non-linear least-squares fitting algorithm GFIT (Wunch et al., 2015). It performs scaling retrievals of the a priori VMR profiles to compute TCs of CH4, CO, and N2O, together with CO2, H2O, HDO, and HF. In order to minimise the impact of instrumental issues on the precision of the TCCON products, the retrieved column abundances are converted to total column-averaged dry-air mole fractions (XGas) using the simultaneously retrieved O2 TCs. The XGas mole fractions are then calibrated to the WMO gas standard scale maintained by NOAA (National Oceanic and Atmospheric Administration, www.noaa.gov) and provided as standard TCCON products (Wunch et al., 2011, 2015).
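The O2-based normalisation described above can be illustrated with a short sketch. It assumes the standard dry-air O2 mole fraction of 0.2095 and treats calibration as division of the raw XGas by a scale-tying factor, such as the XCO factor of 1.0672 quoted later in the comparison; both the division convention and the function names are assumptions for illustration, not GGG/GFIT code. The point it shows is why the ratio suppresses instrumental errors: any multiplicative error that affects the gas and O2 columns equally cancels.

```python
# Hypothetical sketch of TCCON-style XGas formation (not GGG/GFIT code).

O2_MOLE_FRACTION = 0.2095  # assumed dry-air mole fraction of O2


def xgas_from_o2(tc_gas, tc_o2):
    """Column-averaged dry-air mole fraction using the O2 column as dry-air proxy."""
    return O2_MOLE_FRACTION * tc_gas / tc_o2


def calibrate(xgas_raw, factor):
    """Tie a raw XGas to the in situ scale by dividing by a calibration factor (assumed convention)."""
    return xgas_raw / factor


if __name__ == "__main__":
    tc_o2 = 0.2095 * 1.6e29   # illustrative O2 column (molecules m^-2)
    tc_co = 100e-9 * 1.6e29   # illustrative CO column (100 ppb)
    x = xgas_from_o2(tc_co, tc_o2)
    # A 1% instrumental scaling error on both columns leaves XGas unchanged.
    x_biased = xgas_from_o2(1.01 * tc_co, 1.01 * tc_o2)
    print(x, x_biased, calibrate(x, 1.0672))
```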

Comparison Strategy
The comparison methodology used here is based on previous works carried out at IZO (e.g. Sepúlveda et al., 2012; García et al., 2012; Sepúlveda et al., 2014; García et al., …, 2018) and can be briefly summarised as follows: […] The different measurement approaches and instrument capabilities also lead to different vertical sensitivities. Even when the same instrument and technique are used, NDACC and TCCON averaging kernels tend to peak at different altitudes, so NDACC and TCCON FTIR products may reflect concentration variations in different atmospheric layers. All these aspects can introduce significant differences between the retrieved products and must be considered when interpreting the comparison results (e.g. Barthlott et al., 2015; Robles-González et al., 2016; Kiel et al., 2016, and references therein). For CH4, CO, and N2O, the NDACC TC data were converted to total column-averaged dry-air mole fractions using the DPC parameter (Eq. (7)) in order to be compared to the standardised, WMO-calibrated TCCON XGas retrievals.
This transformation also allows us to analyse the capability of the NDACC XGas products to capture tropospheric concentration variations by comparing them to IZO ground-level concentrations. In addition, the IZO ground-level records are compared to the NDACC tropospheric CH4, CO, and N2O concentrations (CH4_TRO, CO_TRO, and N2O_TRO, respectively), which are obtained as the mean of the retrieved NDACC VMR profiles between the IZO altitude (2.37 km a.s.l.) and the middle troposphere (5.6 km a.s.l.) (…; García et al., 2014).
In order to compare the FTIR vertical profiles with the highly resolved in situ profiles (meteorological and ECC sondes), the latter have been vertically degraded by applying the averaging kernels obtained in the FTIR retrieval procedure, following Eq. (4). In this way, the limited vertical sensitivity of the FTIR data is properly taken into account in the comparison (Rodgers, 2000). To homogenise the reference data sets, only those sondes with continuous measurements up to 12 km for H2O and up to 29 km for O3 have been considered. Above these altitudes, the sonde profiles have been completed with the corresponding FTIR a priori VMR profiles to compute the smoothed humidity and O3 profiles.
3. Temporal Criteria: Temporal collocation depends on the natural variability of each target gas, the FTIR uncertainty, and the characteristics of each reference technique; therefore, it varies from gas to gas.
For CH4, CO, and N2O, the daily night-time means (20:00-08:00 UT) of the IZO ground-level records are paired with the daily daytime means of the FTIR products (…; García et al., 2018). As previously mentioned, the IZO night-time surface data are well representative of the regional background signal of the free troposphere, while during daytime local air circulations may disturb the ground-level data. In addition, these trace gases show rather small intra-day variations, so pairing daily means may be more meaningful than comparing individual measurements. A similar temporal criterion is applied to the CH4, CO, and N2O column retrievals from the NDACC and TCCON data sets, i.e. the daily means of XCH4, XCO, and XN2O are matched.
Given the large natural variability of H2O, and in order to ensure that the different techniques observe similar air masses, the temporal coincidence window has been restricted to 1 hour for all the PWV products (Schneider et al., 2010). For the radiosonde data, the midpoint of the sounding is taken as the reference time (a sonde takes approximately one hour between launch and burst in the UTLS).

For the Brewer O3 TCs, the 1-hour temporal coincidence is also applied, since both the FTIR and Brewer techniques are precise enough to resolve intra-day O3 variations (Schneider et al., 2008b; …). For the DOAS-FTIR intercomparison, because the DOAS technique measures only during twilight and O3 and NO2 are photochemically active species, the FTIR observations are averaged before and after 12 UT (a.m. and p.m., respectively) (Robles-González et al., 2016). Finally, for the O3 profile comparison, the strategy described by Schneider et al. (2008a) and García et al. (2012) was followed, in which the ECC sondes are corrected daily by means of coincident Brewer data. These works showed that applying this correction significantly improves the quality and long-term stability of the ECC sonde data. Given that intra-day O3 variability is much lower than that of H2O, the temporal coincidence window is extended to 3 hours around the O3 sonde launch.

To avoid redundant data, in all the intercomparisons each FTIR measurement is paired only once, with the reference observation that minimises the time difference within the temporal coincidence window.

Temporal Decomposition:
Once the FTIR and reference IZO observations have been temporally paired, the quality of the FTIR products is assessed both by directly comparing the measured data sets and at different timescales, by means of a temporal decomposition of the measured time series. This temporal decomposition, explained in detail in Appendix B, adds value to the comparison, as it allows the temporal signals discernible by the FTIR system to be properly identified (i.e. single measurements, daily and seasonal averages, and long-term variations) (…; García et al., …, 2018).

Direct and Timescale Comparison
Table 4 summarises the direct comparison between the NDACC FTIR products and the reference data sets considered: TCs of […] R2 = 0.98) and the dispersion between techniques is smaller than 1% (1σ, where σ stands for the standard deviation of the relative differences with respect to the reference data, i.e. the Brewer data). This scatter includes the precision of both techniques and can therefore be considered a very conservative estimate of FTIR precision. A large contribution to this dispersion can be attributed to the impact of atmospheric temperature profile uncertainties on the FTIR O3 retrievals (Schneider et al., 2008a, b; García et al., 2012, 2021). When more refined O3 products (not the standard NDACC product) are considered, including a simultaneous temperature profile fit in the O3 retrieval, the scatter between Brewer and FTIR can be significantly reduced, by up to 0.5-0.7% (Schneider et al., 2008b; García et al., 2012, 2021). Although the agreement is very satisfactory, the Brewer and FTIR products differ in their absolute quantification of O3 TCs: the FTIR TCs are biased high by ∼3.6%. This well-known systematic discrepancy is likely introduced by inconsistencies between ultraviolet and infrared spectroscopic parameters. As presented in Section 3.4, uncertainties of 5% in the HITRAN O3 spectroscopy may explain a bias of ∼5% between FTIR and Brewer data.
Although a direct comparison between coincident DOAS and FTIR instruments is challenging, a satisfactory consistency has been documented for O3 TCs: the FTIR-DOAS mean difference is ∼8-10% with a σ of ∼3%. These values agree with the expected precision of DOAS (∼5%) and with previous DOAS-Brewer comparisons carried out at IZO (Gil […]; Table 4). When examining the different timescale signals, the O3 TC variations are captured similarly by all techniques on the short-term (measurement-to-measurement) and seasonal scales, as shown in Figure 12 (right panels) and summarised in Figure 14. For long-term signals, while Brewer and FTIR observations similarly reproduce the O3 evolution, with a correlation of ∼95%, the DOAS O3 observations show poorer agreement with the FTIR data (R ∼60%). This is likely due to a systematic bias introduced by the change of DOAS instrument in 2010 (the change point is clearly recognisable in the difference time series displayed in Figure 12 (a)). As a result, the DOAS long-term trends (Figure 14 (e)) overestimate the Brewer and FTIR values. Note that the linear trends shown in Figure 14 (e) have been computed from the coincident observations and may therefore differ from the values presented in Section 5, which are based on the FTIR monthly means.
Figure 14. Summary of the timescale comparison between FTIR and IZO reference data sets: the standard deviation of the relative differences (σ, in %) is displayed on the x-axis, and the size of the dots represents the determination coefficient, R2. These statistics are shown for (a) the measured time series and for the decomposed time series: (b) measurement-to-measurement, (c) seasonal, and (d) long-term variations (annual means). (e) Linear trends (in %/yr) for coincident FTIR and reference data sets, calculated by fitting a linear function combined with a Fourier series to the data according to Eq. (B1). Errors represent the 95% confidence interval and were determined using the bootstrap method (see details in Appendix B). Note that TRO refers to the comparison of the tropospheric quantities (GAW in situ records and FTIR VMR averages).
2. Nitrogen Dioxide: Moderate performance is found when comparing the DOAS and FTIR NO2 products. The scatter of the relative differences, ∼10%, agrees well with the expected precision of the DOAS NO2 data (∼12%), with the FTIR theoretical uncertainty budget (recall Table 3), and with previous works (∼9-11%; Robles-González et al., 2016). However, a remarkable asymmetry in the mean differences has been documented between the a.m. and p.m. FTIR-DOAS comparisons for all the timescales analysed (Figure 12 and Figure 14). In general, for both O3 and NO2, the DOAS a.m. observations compare better with the FTIR data than the p.m. values. This pattern is largely introduced by diurnal variations in the TCs due to photochemical processes, which are especially important for active species like NO2 and are captured differently by the two techniques. A photochemical correction (solar zenith angle dependent) can be applied to the FTIR data to refer the measurements to the DOAS acquisition time, which reduces both the bias between the techniques and the a.m.-p.m. asymmetry (Robles-González et al., 2016). In addition, part of these differences can be attributed to the different vertical sensitivities of the DOAS and FTIR techniques: while the DOAS method, with zenith-sky measurements at twilight, is almost insensitive to the troposphere and the tropopause region (Gil-Ojeda et al., 2012; Robles-González et al., 2016), the FTIR system can detect UTLS contributions (recall Figure 4).
The diurnal asymmetry might explain the inconsistency between the FTIR a.m. and p.m. records for the long-term signals. While the FTIR a.m. observations point to a significant NO2 increase over IZO (similar to that of the DOAS data), the FTIR p.m. records seem to indicate the opposite long-term behaviour (Figure 14 (e)). However, further studies are needed to better understand what drives the difference between these linear trends and to reconcile them.
3. Water Vapour: The consistency between the NDACC PWV product and the other PWV techniques is very satisfactory, with more than 95% of the variance explained (R2 > 0.95). As expected, the best performance is found for […] FTIR PWV products to all the PWV reference techniques, ranging from ∼10% to ∼38% for the TCCON and CIMEL data, respectively. Part of this overestimation (∼12%) is introduced by the NDACC PWV retrievals (Tu et al., 2020), which agrees with the bias obtained with respect to the calibrated TCCON data. The large bias with respect to CIMEL is likely due to calibration issues of the standard AERONET CIMEL PWV products. As recently pointed out by Almansa et al. (2020), a dedicated calibration of the CIMEL H2O channel (centred at 940 nm) at IZO reduces the reported bias by ∼20%.
In relation to PWV seasonality, although the comparability of the different techniques is excellent, the differences among them depend on the PWV values: maximum differences are observed under extreme PWV conditions (i.e. summer and winter months, Figure 12), which may be due in part to the different seasonal sensitivities of the measurement techniques. At the longest timescales, the NDACC FTIR, CIMEL, and GPS records consistently suggest that PWV over IZO has decreased significantly over the last two decades, but the magnitude of this decrease varies among techniques (Figure 14 (e)). However, the NDACC-TCCON PWV comparison yields the opposite result, due in part to a much smaller coincident data set (∼1100 pairs for NDACC-TCCON versus ∼11000 coincidences for NDACC FTIR-CIMEL-GPS).
4. Carbon Monoxide, Methane, and Nitrous Oxide: An excellent consistency between the NDACC and TCCON XCO and XCH4 records is observed for both the original measured time series and the signals on different timescales. The direct correlation is higher than 0.95 for both gases, with a scatter of the relative differences of 0.7% and 2.7% for XCH4 and XCO, respectively. These values lie within the expected NDACC systematic uncertainty budget (recall Table 3) and the random uncertainty of the TCCON retrievals (Wunch et al., 2015).
Although the overall comparability of the FTIR XN2O products is satisfactory, it is poorer than the agreement between the tropospheric NDACC N2O and the ground-level records. This behaviour can be due to the different vertical sensitivities of the NDACC and TCCON products, which make the NDACC XN2O data more influenced by the seasonality of the UTLS region, as seen in Figure 13 and reported by previous studies (…; Zhou et al., 2019). This figure also highlights the remarkable decoupling (even anti-correlation for XCH4 and XN2O) between the annual cycles of the XGas and tropospheric products. While the seasonality of the XGas observations is mostly dominated by the annual shift of the UTLS region, tropospheric concentrations of CO, CH4, and N2O are determined by the source emission patterns. As indicated by the comparison with ground-level records, the NDACC tropospheric products properly capture the tropospheric seasonal signals, demonstrating great potential for source/sink attribution studies. The performance of the NDACC tropospheric products can be further improved by more sophisticated retrieval strategies that reduce the stratospheric contribution (…; García et al., 2014). Likewise, the influence of the stratospheric signal on the XGas products can be partially corrected using co-retrieved XHF or XN2O (for XCH4) data (…; Wang et al., 2014, and references therein).
In relation to systematic differences, the mean bias is lower than 0.6% for XCH4 and XN2O, indicating no significant spectroscopic inconsistencies between the NIR and MIR spectral regions used for the TCCON and NDACC retrievals, respectively. This result is confirmed by comparing the tropospheric NDACC N2O product with the ground-level records.
However, for CH4, a bias of ∼2.6% with respect to the surface measurements appears, which is compatible with the assumed error of 3% in the spectroscopic line intensity parameter (Table 3). Concerning CO, the NDACC and TCCON XCO products generally differ by ∼5-6%, which is likely attributable to the TCCON post-calibration, as reported by previous works (e.g. Kiel et al., 2016). The TCCON XCO calibration factor is 1.0672±0.0200 (recall Section 8), which coincides with the bias obtained between the TCCON and NDACC XCO products. Note that the comparison with ground-level records points to an underestimation of the tropospheric NDACC CO values (unlike what is observed for the TCs), which further emphasises the presence of a systematic inconsistency.
The timescale analysis proved to be a very useful tool, particularly for the tropospheric NDACC comparisons. As illustrated by Figure 14, the agreement observed between the ground-level and tropospheric NDACC products is mainly the result of seasonal and long-term signals. Tropospheric measurement-to-measurement variations, especially for the long-lived gases CH4 and N2O, are smaller than the FTIR precision and are therefore scarcely captured by the remote sensing system (no correlation has been found between FTIR and ground-level observations on this timescale). Given that CO concentrations are more variable in the atmosphere, the NDACC FTIR product is able to capture part of its tropospheric variations (correlation of ∼0.60 on a daily scale).

5. Water vapour and ozone profiles: Figure 15 summarises the comparison between the FTIR O3 and H2O VMR profiles and the reference observations (smoothed ECC and humidity sondes). The vertical distribution of the differences exhibits different patterns for each gas. For O3, the difference profile is almost constant up to the UTLS region, with a mean bias of ∼10-15% and a scatter of ∼6-7%. Above this region, the discrepancies decrease considerably to below ∼5% (4.0% at 29 km), likely due to the better sensitivity of the FTIR system to O3 variations there (recall Figure 3). Part of the observed discrepancies is introduced by including both FTIR instruments in the comparison. As shown in Section 4, the ILS of the IFS 120M spectrometer is noisier and less stable over time than that of the IFS 120/5HR, leading to greater uncertainties in the IFS 120M O3 profile retrievals. Note that instrumental performance is especially critical for stratospheric gases like O3, since the ILS affects the absorption line shape from which the vertical information is retrieved. When considering only the more stable, well-aligned IFS 120/5HR instrument, García et al. (2021) documented an improvement in the comparability between FTIR and ECC sondes of ∼1% up to the UTLS and ∼0.5% in the middle stratosphere. In addition, as with the total columns, refined FTIR retrieval strategies (i.e. including a simultaneous temperature fit) can further improve the FTIR performance (Schneider et al., 2008a; García et al., 2012, 2021). On the other hand, the H2O comparison exhibits a strong vertical stratification, with the largest discrepancies located in the lower troposphere (i.e. just above the island). This pattern is likely due to the substantial impact of the local diurnal up-slope flow on the FTIR observations.
As stated above, diurnal insolation at IZO generates a thermal up-slope flow from the lowermost humid layers, causing a strong H2O diurnal cycle and thus affecting the H2O signals measured by the FTIR system (Schneider et al., 2010; González et al., 2016). To examine this effect and ensure that the comparison is carried out under free tropospheric conditions, the FTIR observations have also been restricted to low solar elevation angles (between 25º and 45º), and the resulting difference profile is also included in Figure 15 (in red). Despite a considerable decrease in the number of coincidences, this restriction ensures an optimal comparison in which quite similar air masses are detected by both the FTIR and the meteorological sondes. As a result, the comparability is significantly improved. Up to the middle troposphere, both the mean bias and the scatter range from 15 to 20%, while they increase in the upper troposphere (mean bias of 45.6% and σ of 31.0% at 8 km). At these altitudes, larger temporal and spatial variability of the humidity fields is expected (…, and references therein), which makes the comparison of the remote sensing and in-situ profiles difficult.
For both gases, the scatter values found agree well with the FTIR error estimation (recall Section 3.4), with the expected [...] could account for part of the dispersion observed between both data sets. Other sources of discrepancies, as mentioned, might be the different observing geometries. Note that the temporal decomposition analysis has not been carried out for the profile comparisons given the reduced number of coincidences.

Figure 15. Vertical profile of relative differences between NDACC FTIR VMR profiles and IZO reference data sets (mean and ±1σ in %) for (a) O3 and (b) H2O. For the latter, the comparison considering only coincidences with FTIR solar elevation angles between 25º and 45º is also included (in red). The dotted area represents the determination coefficient, R². Statistics of differences at altitude levels well distinguished by the FTIR instrument are included in the plots (5, 18, and 29 km for O3, and 3, 5, and 8 km for H2O). The number of coincident measurements is 276 in the period 1999-2018 for O3, and 154 and 32 in the period 2008-2017 for H2O without and with limiting the FTIR solar elevation angle, respectively.
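The per-level statistics shown in Figure 15 (mean and ±1σ of the relative differences, optionally restricted to FTIR solar elevation angles between 25º and 45º) can be sketched as follows. This is a minimal illustration, assuming that the FTIR and reference profiles have already been interpolated to a common altitude grid and that the sonde profiles have already been smoothed; the function name and the synthetic numbers are hypothetical and do not reproduce the actual IZO processing.

```python
import numpy as np


def relative_difference_stats(ftir, ref, elevation_deg=None, elev_range=(25.0, 45.0)):
    """Per-level mean and 1-sigma of 100 * (FTIR - ref) / ref, in %.

    ftir, ref: arrays of shape (n_coincidences, n_levels) on a common
    altitude grid (the reference is assumed to be already smoothed).
    elevation_deg: optional FTIR solar elevation angle per coincidence;
    if given, only coincidences inside elev_range (inclusive) are used.
    Returns (mean, std, number_of_coincidences_used).
    """
    ftir = np.asarray(ftir, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if elevation_deg is not None:
        elev = np.asarray(elevation_deg, dtype=float)
        keep = (elev >= elev_range[0]) & (elev <= elev_range[1])
        ftir, ref = ftir[keep], ref[keep]
    rel = 100.0 * (ftir - ref) / ref
    return rel.mean(axis=0), rel.std(axis=0, ddof=1), rel.shape[0]


# Hypothetical example: 3 coincidences, 2 altitude levels
ftir = [[1.1, 2.2], [1.2, 1.8], [0.9, 2.0]]
ref = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
mean_all, std_all, n_all = relative_difference_stats(ftir, ref)
mean_sea, std_sea, n_sea = relative_difference_stats(
    ftir, ref, elevation_deg=[30.0, 60.0, 40.0]
)
```

The sketch uses the sample standard deviation (`ddof=1`); which form underlies the published scatter values is not stated in this section.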

Influence of Sampling
Although the weather conditions at IZO are very favourable for solar measurements, the FTIR sampling may not be regular enough and may contain gaps that affect the reliability of the FTIR results, particularly on long-term timescales. In order to examine this effect, Figure 16 compares the linear trends for the coincident FTIR-reference data sets with those computed from the complete reference time series, without pairing to the FTIR observations, in the coincident period shown in Table 4. Note that only those products for which the sampling of the reference data sets is uniform and continuous have been considered (i.e. GPS PWV observations and ground-level CO, CH4, and N2O records). Although some of the other reference techniques have a higher measurement frequency than the FTIR system, they present their own sampling issues (the Brewer and CIMEL data are also biased towards cloud-free days, the DOAS technique measures only during twilight, and the TCCON data have a sampling similar to that of the NDACC FTIR observations). In addition, as in Section 8.3, the linear trends shown in Figure 16 have been computed from individual observations to assess the sampling effects; they could, therefore, differ from the values presented in Sections 5 and 6, which are based on FTIR monthly means.
The overall agreement between the different linear trend estimations is quite high. FTIR sampling has been found to have a minor impact, especially on the stable, long-lived CH4 and N2O gases. However, for the more variable CO records, a significant discrepancy is found between the trends computed from the entire ground-level time series and those computed from the records paired with the FTIR observations. This artefact could be partly attributed to the sparse FTIR sampling and the large dynamical variability in the winter months. As shown in Figure 13, the greatest differences between FTIR and ground-level data are concentrated in winter, which could induce a bias in the trend values. For PWV, the agreement between the coincident and entire GPS time series demonstrates that the uncertainties induced by FTIR sampling might also be considered negligible. Although there are notable differences in the magnitude of the obtained PWV linear trends, they lie within the respective confidence intervals.
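The comparison behind Figure 16 can be sketched as two trend fits: one on the complete reference series and one on the subset of reference samples that coincide with FTIR observations. This is an illustrative sketch only: it uses a plain ordinary least-squares slope rather than the multi-regression fit and bootstrap of Appendix B, and the coincidence mask, function names, and synthetic data are hypothetical.

```python
import numpy as np


def ols_trend(t_years, values):
    """Ordinary least-squares linear trend (units per year)."""
    slope, _intercept = np.polyfit(t_years, values, 1)
    return slope


def sampling_effect(t_years, reference, ftir_mask):
    """Trend of the complete reference series vs. the subset paired with FTIR.

    ftir_mask is a (hypothetical) boolean array flagging the reference
    samples that have a coincident FTIR observation.
    """
    t = np.asarray(t_years, dtype=float)
    x = np.asarray(reference, dtype=float)
    m = np.asarray(ftir_mask, dtype=bool)
    return ols_trend(t, x), ols_trend(t[m], x[m])


# Synthetic demo: 10 years of daily values with a trend of 2 units/yr, a
# seasonal cycle, noisier winters, and FTIR-like sampling that is sparser in winter.
rng = np.random.default_rng(0)
t = np.arange(0.0, 10.0, 1.0 / 365.25)
frac = t % 1.0
winter = (frac < 0.125) | (frac > 0.875)
x = (100.0 + 2.0 * t + 5.0 * np.cos(2 * np.pi * t)
     + rng.normal(0.0, np.where(winter, 6.0, 1.5)))
mask = rng.random(t.size) < np.where(winter, 0.15, 0.6)

trend_full, trend_paired = sampling_effect(t, x, mask)
print(f"complete series: {trend_full:.3f}/yr, FTIR-paired subset: {trend_paired:.3f}/yr")
```

Comparing the two slopes against their confidence intervals, rather than against each other alone, mirrors how the PWV differences are judged in the text.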

Summary and Conclusions
Long-term ground-based observations of atmospheric composition are essential to monitor the evolution of the Earth's atmospheric system. Within the NDACC framework, FTIR spectrometry provides the abundances of many trace gases simultaneously and with a high degree of precision, which can be used to understand tropospheric and stratospheric chemistry and transport. In this context, the current paper provides an overview of the first 20 years of NDACC FTIR measurements taken at the subtropical Izaña Observatory (IZO, Spain) between 1999 and 2018.
The great potential of the IZO NDACC FTIR records for climate research is internationally recognised: they have contributed to more than one hundred peer-reviewed scientific papers and to numerous international research activities and projects. The major accomplishments drawn from these works are briefly reviewed in the current paper, especially those related to the investigation of greenhouse gas budgets and long-term changes of pivotal atmospheric gases, and to the evaluation of space-based observations and climate model estimations. In addition, the long-term instrumental performance of the IZO FTIR systems, as well as the quality and long-term consistency of the different NDACC FTIR products, have been comprehensively and coherently assessed.

Together with the long-term monitoring of key atmospheric trace gases, analysing the temporal frequency, duration, and extent of vertical phenomena over IZO (e.g. STE episodes or UTLS vertical movements) could provide useful insights into long-term changes in the dynamics and chemistry of the subtropical atmosphere. Furthermore, the evaluation of possible links to dynamical mechanisms or teleconnection patterns and hemispheric phenomena, such as ENSO (El Niño-Southern Oscillation), NAO (North Atlantic Oscillation) or QBO (Quasi-Biennial Oscillation), might also serve as a tracer of climate change. Dedicated studies using the IZO trace gas time series would be of great use in better understanding these drivers and connections on short-term and long-term timescales. Multidecadal NDACC FTIR data sets, such as those produced at IZO, are therefore indispensable to address the major challenges of current climate research.

Appendix A: NDACC Uncertainty Budget
The NDACC uncertainty analysis includes the impact of measurement noise and of the different model parameter sources. In particular, the error contribution of the model parameters can be analytically estimated through the respective error covariance matrix $\mathbf{S}_{x,p}$:

$$\mathbf{S}_{x,p} = \mathbf{G}\,\mathbf{K}_p\,\mathbf{S}_p\,\mathbf{K}_p^{T}\,\mathbf{G}^{T}$$

where $\mathbf{G}$ is the gain matrix of the retrieval, $\mathbf{K}_p$ is the sensitivity (Jacobian) matrix of the measured spectrum with respect to the model parameters, and $\mathbf{S}_p$ is the covariance matrix of the uncertainties $\Delta p$. In this work, $\mathbf{S}_p$ is estimated considering the error sources, values, and partitioning between random and systematic contributions listed in Table A1. These have been identified as the leading error sources, with typical values, affecting the different FTIR products.
The error covariance matrix for measurement noise ($\mathbf{S}_{x,\epsilon}$) is analytically calculated by

$$\mathbf{S}_{x,\epsilon} = \mathbf{G}\,\mathbf{S}_{y,\epsilon}\,\mathbf{G}^{T}$$

where $\mathbf{S}_{y,\epsilon}$ is the covariance matrix for noise in the measurement.
The total statistical and systematic uncertainties (listed in Table 3) are then calculated as the root sum of squares of all statistical and systematic errors considered, respectively. Note that the measurement noise is considered purely random, while the spectroscopic parameters are purely systematic.
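The analytic error propagation and the root-sum-square combination described in this appendix can be sketched with NumPy. This is a minimal sketch, assuming the standard optimal-estimation forms $\mathbf{S}_{x,p} = \mathbf{G}\mathbf{K}_p\mathbf{S}_p\mathbf{K}_p^T\mathbf{G}^T$ and $\mathbf{S}_{x,\epsilon} = \mathbf{G}\mathbf{S}_{y,\epsilon}\mathbf{G}^T$, with $\mathbf{G}$ the retrieval gain matrix and $\mathbf{K}_p$ the sensitivity of the spectrum to the model parameters; the tiny matrices and function names are hypothetical, and the actual NDACC processing is not reproduced.

```python
import numpy as np


def parameter_error_covariance(G, K_p, S_p):
    """S_x,p = G K_p S_p K_p^T G^T: retrieval error due to model-parameter uncertainties."""
    return G @ K_p @ S_p @ K_p.T @ G.T


def noise_error_covariance(G, S_y_eps):
    """S_x,eps = G S_y,eps G^T: retrieval error due to measurement noise."""
    return G @ S_y_eps @ G.T


def root_sum_square(covariances):
    """Sum the variances (diagonals) of several error covariances and take the square root."""
    variances = sum(np.diag(S) for S in covariances)
    return np.sqrt(variances)


# Hypothetical toy case: 3 spectral points, 2 retrieved levels, 1 model parameter
G = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
K_p = np.array([[1.0], [2.0], [0.0]])
S_p_random = np.array([[0.25]])   # random part of the parameter uncertainty
S_y_eps = 0.01 * np.eye(3)        # measurement noise (purely random)

S_xp = parameter_error_covariance(G, K_p, S_p_random)
S_xe = noise_error_covariance(G, S_y_eps)
statistical_total = root_sum_square([S_xp, S_xe])
```

The systematic total would be obtained in the same way from the systematic parts of the parameter uncertainties (e.g. the spectroscopic parameters), leaving measurement noise out of that sum.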

Appendix B: Multi-annual Evolution
The multi-annual evolution of the measured time series has been modelled using a multi-regression fit with coefficients that account for a mean gas concentration and for variations on different timescales (García et al., 2018, and references therein):

$$x_m(t) = A_0 + A_1 t + \sum_{i=1}^{N-1}\left[A_{\sin,i}\sin\left(\frac{2\pi i\,t}{\Delta t}\right) + A_{\cos,i}\cos\left(\frac{2\pi i\,t}{\Delta t}\right)\right] + \sum_{i=1}^{P}\left[B_{\sin,i}\sin\left(\frac{2\pi i\,j(t)}{\Delta j}\right) + B_{\cos,i}\cos\left(\frac{2\pi i\,j(t)}{\Delta j}\right)\right]$$

The coefficients $A_i$ capture the long-term variations: $A_0$ and $A_1$ describe the linear changes, while the coefficients $A_{\sin,i}$ and $A_{\cos,i}$ define the amplitudes and phases of a Fourier series that includes all frequencies between 1 and $N-1$. Here, $N$ is the total number of years covered by the whole time series and $\Delta t = \max\{t\} - \min\{t\}$ is the time period covered by the whole time series. The coefficients $B_{\sin,i}$ and $B_{\cos,i}$ capture the intra-annual variation (seasonal cycle) by fitting the amplitudes and phases of a Fourier series that includes all frequencies between 1 and $P$. We consider frequencies up to 2 yr$^{-1}$ ($P = 2$), with $\Delta j = \max\{j(t)\} - \min\{j(t)\}$ the intra-annual Julian day period covered by the data and $j(t)$ the intra-annual Julian day (i.e. the Julian day counted from 0 at the start of each year, so that $j(t)$ lies between 0 and 366). Note that $N = \Delta t / \Delta j$.
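Because $x_m(t)$ is linear in all the coefficients $A_i$ and $B_i$, the multi-regression fit described above reduces to an ordinary linear least-squares problem. The following is a minimal sketch under stated assumptions: $t$ and $j(t)$ are in days, $j(t)$ is approximated as $t$ modulo 365.25 instead of a calendar-based Julian day, $N$ is rounded to the nearest integer, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def multiregression_design(t, j, n_years, p_intra=2):
    """Columns: A0, A1*t, inter-annual Fourier terms (i = 1..N-1, period dt/i),
    then intra-annual Fourier terms in j(t) (i = 1..P)."""
    t = np.asarray(t, dtype=float)
    j = np.asarray(j, dtype=float)
    dt = t.max() - t.min()
    dj = j.max() - j.min()
    cols = [np.ones_like(t), t]
    for i in range(1, n_years):
        cols += [np.sin(2 * np.pi * i * t / dt), np.cos(2 * np.pi * i * t / dt)]
    for i in range(1, p_intra + 1):
        cols += [np.sin(2 * np.pi * i * j / dj), np.cos(2 * np.pi * i * j / dj)]
    return np.column_stack(cols)


def fit_multiregression(t, j, x, p_intra=2):
    """Least-squares fit; returns (coefficients, fitted series, design matrix)."""
    t = np.asarray(t, dtype=float)
    j = np.asarray(j, dtype=float)
    n_years = int(round((t.max() - t.min()) / (j.max() - j.min())))  # N = dt / dj
    X = multiregression_design(t, j, n_years, p_intra)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(x, dtype=float), rcond=None)
    return coeffs, X @ coeffs, X


# Hypothetical daily series over ~10 years; t in days, j(t) approximated as t mod 365.25
t = np.arange(0.0, 3653.0)
j = t % 365.25
dj = j.max() - j.min()
x = (1800.0 + 0.02 * t
     + 10.0 * np.sin(2 * np.pi * j / dj) + 3.0 * np.cos(4 * np.pi * j / dj))
coeffs, fitted, X = fit_multiregression(t, j, x)
trend_per_year = coeffs[1] * 365.25   # A1 is per day here because t is in days
```

With about 10 years of data, this gives N = 10, so the design matrix has 2 + 2(N-1) + 2P = 24 columns, and the seasonal coefficients sit in the last four.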

The uncertainty ranges of the fit parameters, including the linear trends, are calculated using the bootstrap resampling method (García et al., 2012, 2021, and references therein). This approach is based on repeatedly estimating the fit parameters on a modified time series, which results from randomly perturbing the original time series with the residuals between the multi-regression fit and the original time series. Thereby, it does not assume that the residuals follow any specific distribution (e.g. Gaussian) or are uniform over time, assumptions that may not hold if the modelled fit is unable to properly capture the measured time series.
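The idea of deriving parameter uncertainties by resampling residuals can be illustrated with a standard residual bootstrap (fitted values plus residuals drawn with replacement, then refitted). This is a sketch of the general technique under assumptions, not a reproduction of the procedure used here, whose resampling details may differ; the function name, the 95% percentile interval, and the synthetic data are hypothetical.

```python
import numpy as np


def residual_bootstrap(X, x, coef_index=1, n_boot=1000, ci=95.0, seed=0):
    """Residual-bootstrap percentile interval for one regression coefficient.

    Each replicate is built as fit + residuals resampled with replacement from
    the empirical residuals, so no Gaussian shape is assumed. This simple
    version draws residuals independently and does not preserve any
    temporal ordering or correlation of the residuals.
    """
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    fitted = X @ beta
    resid = x - fitted
    samples = np.empty(n_boot)
    for b in range(n_boot):
        x_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        beta_star, *_ = np.linalg.lstsq(X, x_star, rcond=None)
        samples[b] = beta_star[coef_index]
    alpha = (100.0 - ci) / 2.0
    low, high = np.percentile(samples, [alpha, 100.0 - alpha])
    return beta[coef_index], low, high


# Hypothetical example: trend of 0.5 units/yr plus skewed (non-Gaussian) noise
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
x = 2.0 + 0.5 * t + rng.exponential(1.0, t.size)
X = np.column_stack([np.ones_like(t), t])
trend, low, high = residual_bootstrap(X, x)
```

In practice, `X` would be the full multi-regression design matrix, and `coef_index=1` selects the linear-trend coefficient $A_1$.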

This multi-regression fit has also been used to temporally decompose the FTIR and reference time series when addressing the quality assessment of the FTIR products (Section 8). For the hourly and daily mean comparisons (so-called measurement-to-measurement), we work with the de-seasonalised and de-trended time series to ensure that the comparison of the measured inter-day variability is not affected by the seasonal and long-term signals. This time series is calculated by subtracting from the measured time series (reference or FTIR) the corresponding linear trend and the inter-annual and intra-annual signals obtained from the multi-regression fit. For the seasonal comparison, an averaged annual cycle is computed from the multi-annual averaged monthly means of the de-trended time series (measured time series minus the linear trend and the inter-annual variations). Finally, the long-term signals given by the annual means are computed from the de-seasonalised time series, which is obtained by subtracting the intra-annual variations from the measured time series. Note that, to obtain the linear trends in percent, the measured time series is transformed to a logarithmic scale. Since the trace gas short-term variations are usually much smaller than the climatological or long-term background values, the variations on the logarithmic scale can be interpreted as variations relative to the long-term background reference.
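Two of the computations mentioned in this paragraph can be sketched briefly: the relative (percentage) trend obtained from log-transformed data, and the correlation of de-seasonalised, de-trended anomalies used in the measurement-to-measurement comparison. This is a minimal sketch under assumptions: a plain least-squares fit of ln(x) stands in for the full multi-regression fit, the fitted values passed to the correlation helper would in practice come from that fit, and all names and data are hypothetical.

```python
import numpy as np


def relative_trend_percent(t_years, x):
    """Linear trend in % per year from a least-squares fit of ln(x) against time."""
    t = np.asarray(t_years, dtype=float)
    slope, _intercept = np.polyfit(t, np.log(np.asarray(x, dtype=float)), 1)
    return 100.0 * slope


def anomaly_correlation(x_a, fit_a, x_b, fit_b):
    """Pearson correlation of two anomaly series (measurement minus its full fit)."""
    anom_a = np.asarray(x_a, dtype=float) - np.asarray(fit_a, dtype=float)
    anom_b = np.asarray(x_b, dtype=float) - np.asarray(fit_b, dtype=float)
    return np.corrcoef(anom_a, anom_b)[0, 1]


# Hypothetical series growing by 0.5 % per year relative to its background
t = np.linspace(0.0, 20.0, 241)
x = 1800.0 * np.exp(0.005 * t)
rate = relative_trend_percent(t, x)
```

Because ln(x) changes by approximately Δx/x for small changes, a slope of 0.005 yr⁻¹ in ln(x) corresponds to roughly 0.5% per year relative to the background.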
Data availability. The NDACC FTIR and DOAS products, as well as the ozone sondes, are available from the NDACC archive (www.ndaccdemo.org).
The TCCON FTIR data are accessible via the TCCON Data Archive, hosted by CaltechDATA (https://tccondata.org). The CIMEL PWV data can be downloaded from the AERONET database (https://aeronet.gsfc.nasa.gov/), while the water vapour sondes are available from the GRUAN [...]
[...] are in charge of the ozone sonde programme at IZO. Finally, all authors discussed the results and contributed to the final paper.
[...]gemeinschaft for the project MOTIV (Geschäftszeichen SCHN 1126/2-1), by the Ministerio de Economía y Competitividad from Spain through the projects CGL2012-37505 (project NOVIA) and CGL2016-80688-P (project INMENSE), and by EUMETSAT under its Fellowship Programme (project VALIASI). This work has been developed within the framework of the activities of the World Meteorological Organization (WMO) Commission for Instruments and Methods of Observation (CIMO) Izaña test bed for aerosols and water vapour remote sensing instruments.
García, O. E., Schneider, M., Hase, F., Blumenstock, T., Sepúlveda, E., Gómez-Peláez, A., Barthlott, S., Dohe, S., González, Y., Meinhardt, [...]
[...] The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate [...]