In situ temperature measurements in the upper troposphere and lowermost stratosphere from 2 decades of IAGOS long-term routine observation

. Despite several studies on temperature trends in the tropopause region, a comprehensive understanding of the evolution of temperatures in this climate-sensitive region of the atmosphere remains elusive. Here we present a unique global-scale, long-term data set of high-resolution in situ temperature data measured aboard passenger aircraft within the European Research Infrastructure IAGOS (In-service Aircraft for a Global Observing System; http: //www.iagos.org). This data set is used to investigate temperature trends within the global upper troposphere and lowermost stratosphere (UTLS, < 13 km) for the period of 1995– 2012 in different geographical regions and vertical layers of the UTLS. The largest number of observations is available over the North Atlantic. Here, a neutral temperature trend is found within the lowermost stratosphere. This contradicts the temperature trend in the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis, in which a signiﬁcant (95 % conﬁdence) temperature increase of + 0.56 Kdecade − 1 is found. Differences between trends derived from observations and reanalysis data can be traced back to changes in the temperature difference between observation and model data over the period studied. This study underpins the value of the IAGOS temperature observations as an anchor point for the evaluation of reanalyses and its suitability for independent trend analyses.


Introduction
Temperature changes in the lower stratosphere (∼ 50 hPa) obtained from radiosondes and satellite retrievals show cooling of about 0.5 K decade −1 over much of the globe during the period from 1979 to 1995. Since 1995, the cooling turned into a neutral trend with a larger increase (not significant) over the Antarctic region than over the tropics (Randel et al., 2009;Blunden and Arndt, 2014;Seidel et al., 2016). The robustness of the temperature trends in the lower stratosphere derived from radiosondes (since 1958) and from satellites (since 1979) suffers from instrumental uncertainties such as sensor changes, drifts, etc., implying large uncertainties in the trend estimates (Simmons et al., 2014). In recent years, several studies assessed the uncertainty of temperature trends in the lower stratosphere and the impact on changing trends of radiatively active constituents (such as ozone) or atmospheric dynamics (e.g., Fueglistaler et al., 2014;Seidel et al., 2016).
Temperature trends in the upper troposphere-lowermost stratosphere (UTLS) are even more uncertain due to insufficient regional coverage of in situ observations. Most studies of UTLS temperature trends are based on the global radiosonde network, but most of these data are from the Northern Hemisphere midlatitudes (70 % of radiosonde launches occurred between 30 and 60 • N). Furthermore, these observations suffer from time-varying biases, which cannot capture the large variability in the UTLS and inhomogeneity due to changes in instrumentation (Bencherif et al., 2006;Seidel and Randel, 2006;Xu and Powell, 2010). Satellite observations cover the spatial scale but are limited by their coarse vertical resolution, especially in the UTLS region. A wellsuited data source for temperature profiles is the relatively new Global Positioning System radio occultation (GPSRO) technique (Kursinski et al., 1997;Wickert et al., 2001). The reliability of these measurements within the UTLS region has been demonstrated with trend analyses of GPSRO temperature (Steiner et al., 2009) or GPSRO-derived trends in the thermal tropopause temperature and tropopause height (Schmidt et al., 2010;Rieckh et al., 2014;Khandu et al., 2016). Ho et al. (2017) demonstrated the usefulness of the GPSRO measurements to correct the temperature bias of radiosondes with different sensor types in the lowermost stratosphere (LMS), which is an important task to reduce the temperature uncertainties. Comparison of tropospheric temperature trends derived from homogenized satellite data sets and model simulations find more consistency but require longterm time series (> 17 years) before a robust trend arises from internal climate variability (Santer et al., 2011).
Since the radiative forcing from greenhouse gases, including water vapor, is sensitive to changes in the midtroposphere and the UTLS (Solomon et al., 2010;Riese et al., 2012), this region is extremely important for climate change and for controlling dynamical processes governing stratosphere-troposphere exchange (Gettelman et al., 2011). Furthermore, the variability and changes in the temperature in the UTLS play an important role in regulating the exchange of water vapor, ozone, and other trace gases between the troposphere and the stratosphere.
Continuous in situ observations of these properties in the UTLS region can only be conducted with satellite and aircraft measurements over a large spatial region. Automated aircraft temperature observations are collected, along with an increasing number of humidity data through the World Meteorological Organization (WMO) Aircraft Meteorological DAta Relay (AMDAR) program (WMO, 2014;Petersen, 2016). Petersen (2016) showed that at flight level the errors of temperature and wind in the 3-48 h forecast were reduced by nearly 50 % when assimilating data from passenger aircraft. However, Ballish and Kumar (2008) and Drüe et al. (2008) identified that the AMDAR aircraft temperature is strongly affected by a warm bias, which can fluctuate by altitude, aircraft type, and phase of flight, while the reason for this bias is not fully understood (Ingleby et al., 2016).
Previous studies already used the passenger aircraft temperature measurements from MOZAIC (Measurement of ozone and water vapor by Airbus in-service aircraft Marenco et al., 1998) for intercomparison with GPSRO measurements and ECMWF (European Centre for Medium-Range Weather Forecasts) analyses. Bortz et al. (2006) analyzed MOZAIC temperature measurements at cruise altitude from 1994 to 2003 within the tropical region. They found no significant temperature increase within the upper tropical troposphere. The authors concluded that the temperature mea-surements are representative enough to be used in intercomparison studies with satellite (Microwave Sounding Unit, MSU) and radiosonde measurements within this region, but at this time the data record was too short for trend estimates. Heise et al. (2008) compared around 2700 MOZAIC in situ temperature profiles with profiles from GPSRO and analyses from ECMWF between 2001 and 2006. They concluded that MOZAIC in situ temperature had no bias against the ECMWF temperature above 300 hPa, whereas GPSRO showed a cold bias of −0.9 K compared to the MOZAIC temperature. Since 2001, GPSRO data have been assimilated in numerical weather prediction models and reanalysis products. Schmidt et al. (2010) characterized the tropopause inversion layer in the Northern Hemisphere with temperature profiles from in situ measurements and from the model for the period from 2001 to 2009 and concluded that the cold point at the tropopause agreed well.
In this study, IAGOS (In-service Aircraft for a Global Observing System) temperature observations, which are available for almost 2 decades since 1994, are analyzed. The geographical coverage of the measurements is shown in Sect. 2, in which the reliability of the IAGOS observations and the data selection is discussed. The UTLS temperature distribution and derived temperature trends are presented in Sect. 3, and their robustness and suitability for the evaluation of global-scale reanalyses with the example of ERA-Interim (ERA-I) are discussed in Sect. 4. Due to varying geographical coverage of these data, they cannot provide a full global assessment yet, but as we will show in our study, they can serve as an anchor point for trend analyses and evaluation of reanalysis at least over the extratropics. This conclusion is included in Sect. 5.
2 Data selection and methods

The IAGOS European Research Infrastructure
Since 1994, IAGOS in situ observations of essential climate variables (temperature, water vapor, and ozone) in the UTLS are provided on a global scale by the European Research Infrastructure IAGOS . IAGOS builds on the former EU framework projects MOZAIC and CARIBIC (Civil Aircraft for the Regular Investigation of the atmosphere Based on an Instrument Container Brenninkmeijer et al., 2007).
Currently (2017), up to 10 passenger aircraft from various international airlines are equipped with scientific instruments to monitor the meteorological state (temperature, water vapor, and wind) and atmospheric chemical composition. In addition to the measurements of temperature and (relative) humidity, the IAGOS-CORE Package 1 includes instruments to measure ozone and carbon monoxide. Ozone is measured by UV absorption (Thermo Scientific for Model 49i Ozone Analyzer), and CO by infrared absorption (Thermo Scientific  . (b) Packages 1 and 2 installed aboard the Airbus A340-300, and the inlet plate including the Rosemount housing (photograph courtesy of Lufthansa).
for Model 48CTL CO Analyzer). Both instruments are regularly calibrated before and after deployment and the overall uncertainty for ozone is 2 ppb ± 2 % and 5 ppb ± 5 % for CO . Additionally, positions, pressure, ambient temperature (measured from the aircraft), aircraft speed, wind speed, and wind direction are provided by the A330 and A340 avionic systems (details are given by Petzold et al., 2015). For completeness, several additional parameters (NO y , NO, NO 2 ) are measured or will soon be measured (CH 4 , CO 2 ) by IAGOS Package 2a-d, which is described by Petzold et al. (2015).

Temperature measurements and evaluations
On IAGOS aircraft, temperature is measured in situ with a compact airborne sensing device AD-FS2 (Aerodata, Braunschweig, Germany), which is installed in an appropriate aeronautic housing . From 1994 to 2009 a platinum resistance sensor (Pt100) was attached near the humidity sensing device. After a design change from MOZAIC to IAGOS, this sensor is now directly at the humidity sensor ( Fig. 1). Intercomparison between both systems showed a temperature deviation of less than 0.1 K in the calibration chamber, which is below the sensor uncertainty. The temperature is measured with an uncertainty of ±0.25 K by a microprocessor-controlled transmitter unit (model HMT333, Vaisala, Finland), which passes the signal to the data acquisition system of the IAGOS Package 1 instrument, in which it is recorded with a time resolution of 4 s Petzold et al., 2015). Pre-and post-deployment calibrations from the laboratory are used to evaluate the temperature signal and to ensure the quality of the temperature measurement Neis et al., 2015). Typical deployment phases are in the range of 2-3 months. Accounting for corrections of adiabatic compression at the inlet part of the housing (Stickney et al., 1994;Moninger et al., 2003), the overall uncertainty of the ambient air temperature is ±0.5 K. More details are given in the standard operating procedure (SOP) of the IAGOS Capacitive Hygrometer (ICH), available at http://www.iagos.org. For the purpose of this analysis the data set was reduced to 1 min averages.
To ensure the reliability of the IAGOS temperature observations, we make use of temperature measurements from the AIRTOSS-ICE aircraft campaign (Aircraft Towed Sensor Shuttle -Inhomogeneous Cirrus Experiment), which focused on midlatitude cirrus clouds (Neis et al., 2015). During this campaign, the IAGOS temperature instrument was installed on the research aircraft (Learjet 35A) and provides the opportunity to compare both temperature measurements. The aircraft temperature measurement was made with a Pt100 thermistor mounted in the same type of Rosemount as for the IAGOS temperature measurements. The temperature sensor of the research aircraft was calibrated regularly with an uncertainty of about 0.5-1.0 K. Figure 2 shows the temperature correlation and the temperature bias ( T ) at pressures below 400 hPa during seven flights. The general behavior between both temperature measurements agreed well along the flight tracks. The mean deviation is T = −0.3 K. The temperature deviation is pressure dependent and decreases towards 200 hPa. The temperature correlation is high and the temperature bias is smaller than the overall uncertainty of 0.5 K (sensor and adiabatic compression correction), which demonstrates the capability of the IAGOS temperature sensor to measure the ambient temperature at cruise altitude very precisely. Figure 2. (a) Temperature correlation for all measurements below the pressure altitude 400 hPa and (b) temperature bias of the IAGOS temperature instrument (T ICH ) compared to a regularly quality-checked temperature sensor (T AC ) from the research aircraft for seven flights during the AIRTOSS-ICE campaign in 2013. The grey area marks the total range of the total uncertainty for the IAGOS temperature observations. The temperature bias was separated for measurements between 300 and 400 hPa (orange) and 200 and 300 hPa (blue) to highlight the shift to a smaller bias at lower pressure.   Figure 3. Relative frequency of the IAGOS temperature observations at cruise altitude (p < 350 hPa) in different regions (Table 1). The total number of 1 min averages is 14.8 million. Note, 1 % includes 0.2 million 1 min mean data points within the UTLS region.
The temperature measured from the aircraft (T AC ) is based on total air temperature (TAT) designed for subsonic aircraft (Goodrich Corporation, formerly Rosemount Aerospace). The total air temperature is defined as the ambient air temperature plus the temperature increase due to adiabatic compression in the Rosemount housing. Typically three TAT sensors (platinum resistance sensor) are installed at the nose region of the aircraft, but in general only one is used for the pilots and stored for IAGOS. The other two sensors are used to monitor the differences between all TAT sensors. In general, the airlines follow the AMDAR quality recommendations (WMO, 2003), and the TAT sensors are regularly checked by visible inspection. An exchange of one TAT sensor is per- formed if it differs more than 3 K from the other two TAT sensors.

Spatial and temporal data coverage
In this study all temperature measurements from the IAGOS-CORE flights at cruise altitude (p < 350 hPa) from January 1995 to December 2012 are used. More recent measurements are not yet validated and therefore not included in this study. Therefore, most of the IAGOS temperature observations rely on the former instrument design. Following previous IAGOS or MOZAIC analyses Dyroff et al., 2015;Thomas et al., 2015;Stratmann et al., 2016), we divide the data into 14 geographical regions ( Fig. 3 and Table 1). Seasonal and regional differences in the temperature behavior can then be linked to the different  12499 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008  dynamical patterns. The largest number of measurements is obtained in the midlatitudes (North America, North Atlantic, Europe, and Central Asia), whereas the number of measurements in the higher latitudes (northern Canada, Greenland, Scandinavia, northern Asia) and the tropical regions (Middle America, tropical Atlantic, North Africa, tropical Asia, South America, southern Africa) is much smaller and does not provide continuous coverage over the period presented. The data coverage for each region over the analyzed period is shown in Fig. 4. Changes in radiatively active chemical constituents (e.g., ozone and water vapor) within the LMS and the upper troposphere (UT) have different impacts on the ambient temperature in these layers. In general, the tropopause layer (TPL) separates the stable stratified LMS and the unstable UT. In the present study, the pressure of the thermal tropopause (p TPHWMO ) is derived from ERA-I (see below), and it is used to determine the position of the aircraft relative to this layer and to distinguish if the aircraft flew within the UT, the TPL, or the LMS. This is achieved using the following criteria: A comparable definition has been used by Thouret et al. (2006), in which the pressure of the 2 PVU (potential vorticity unit) surface was used to define the dynamical tropopause layer with a vertical depth of ±15 hPa. The thermal tropopause is valid within all latitude bands, whereas the dynamical tropopause cannot be used in tropical regions because the Coriolis parameter is zero. Therefore, potential vorticity (PV) goes to zero and the 2 PVU surface is not defined (Boothe and Homeyer, 2017). Within the IAGOS project, measurements of ozone (since 1994) and carbon monoxide (since 2002) are available and are used to justify our layer classification scheme. Figure 5 shows the cumulative distribution of ozone, carbon monoxide, and PV. All distributions demonstrate a clear separation between the three layers based on the thermal tropopause. The median ozone mixing ratio in the UT is 60 ppb, median CO mixing ratio is 90 ppb, and median PV is 0.5 PVU. Within the LMS, the median values for O 3 (310 ppb), for CO (40 ppb), and for PV (7 PVU) are consistent with previous studies (Pan et al., 2004;Kunz et al., 2008;Brioude et al., 2009;Schmidt et al., 2010Schmidt et al., ). 1995Schmidt et al., 1996Schmidt et al., 1997Schmidt et al., 1998Schmidt et al., 1999Schmidt et al., 2000Schmidt et al., 2001Schmidt et al., 2002Schmidt et al., 2003Schmidt et al., 2004Schmidt et al., 2005Schmidt et al., 2006Schmidt et al., 2007Schmidt et al., 2008

Meteorological reanalysis
ERA-I covers the period from 1979 until present, assimilating observational data from various satellites, radiosondes, buoys, commercial aircraft, and others (Dee et al., 2011;Simmons et al., 2014). Note that the IAGOS temperature observations are not assimilated in any numerical weather prediction model or reanalysis product, which makes it unique for model evaluation. For this study, the 6-hourly outputs from ERA-I (0.75 • × 0.75 • ) were interpolated onto a 1 • × 1 • horizontal grid and on 60 vertical levels of constant pressure and potential temperature (Kunz et al., 2014). Additionally, the variables of the PV, and the pressure of the thermal tropopause (p TPHWMO ) based on the WMO criteria were calculated (WMO, 1957;Reichler et al., 2003). The ERA-I data were linearly interpolated (longitude, latitude, pressure, time) onto each flight track with 4 s resolution as described by Kunz et al. (2014). As for IAGOS temperature observations, the ERA-I data set was reduced to 1 min averages.

IAGOS temperature measurements over different regions
Approximately 69 % of the IAGOS temperature measurements in the UTLS were obtained in the midlatitude band between 30 and 60 • N. Therefore, we show our detailed analysis within all three layers of the UTLS only over the North Atlantic region between January 1995 and December 2012 and provide additional material in the Supplement. Figure 6 shows monthly median temperature of the IAGOS observations in the LMS, TPL, and UT for the period from 1995 to 2012 over the North Atlantic region. All layers show a seasonal temperature variation of 5-10 K in each year. The warmest temperatures are observed within the LMS and the coldest (as expected) within the TPL. Between 2006 and 2008, the wintertime temperature minima at the TPL and the UT are almost 3-4 K warmer compared to the other years. These enhanced temperatures can also be observed over Europe and Central Asia within these layers. All other regions in the extratropics show time series in amplitude and seasonal variations mostly comparable to the North Atlantic region, if enough measurements were available for each vertical layer. Throughout all regions in the extratropics, the summer (winter) months are always the warmest (coldest) within all three layers (see Tables S1-S3 in the Supplement). Over the tropical regions, no measurements are available in the LMS and TPL because the TPL is mostly above cruise altitude. In the UT, the temperature is mostly constant throughout the year for each tropical region.

Temperature anomalies and trends
The temperature trends are derived from using a robust regression analysis over the full period of de-seasonalized temperature observations. In all regions, 18-year monthly averages (e.g., mean of all January values, mean of all Febru- ary values, etc.) were subtracted from the time series of each layer. The temperature trends exhibit no significant non-linear contributions. We ensured that the data are homoscedastic, and they are nearly normally distributed (median and mean values are very close; see Tables S1, S2, and S3). Additionally, we checked that the data are not autocorrelated. Temperature trends are reported only if at least 90 % of all months have at least 200 data points each (Fig. 4). The Mann-Kendall test is used to identify the trend significance (Mann, 1945;Kendall, 1975;Gilbert, 1987). The 90 % threshold was chosen because the Mann-Kendall significance of the trend analysis did not change when 10 % of the data were randomly excluded from the trend calculation. The robustness of temperature trends was tested by skipping the first or last year of the 18-year period (Table S4). Within all layers and all regions, each trend kept the same sign and the trend values varied within the standard error. The only exception was the UT over North America where the temperature trend changed from slight positive trend (18 years) to neutral when the final year was removed. Figure 7 shows the time series of temperature anomalies and linear trend lines over the North Atlantic region using the IAGOS observations. The monthly temperature anomalies in each layer vary by ±3 K (strongest within the UT). Temperature trends are +0.22 (±0.20) K decade −1 in the UT, +0.25 (±0.16) K decade −1 in the TPL, and −0.05 (±0.17) K decade −1 in the LMS. None of the trends are significant at the 95 % level. In the UT, the amplitudes of the smoothed time series are larger and decrease towards the upper levels. They show two warmer phases and two colder phases during the period analyzed. Figure 8 and Table 3 summarize the temperature trends for different regions in the different layers. Within the LMS, significant cooling is observed over Greenland (−1.39 (±0.29) K decade −1 ) and North America (−0.71 (±0.21) K decade −1 ). The smaller trend over Europe (−0.53 (±0.20) K decade −1 ) is not significant. The other regions do not have enough data for a meaningful trend analysis in the LMS. Within the TPL, there are only three regions with sufficient data coverage, and no significant trend is detected, although there is a weak indication for a small warming over North America and the North Atlantic and for a small cooling over Europe. In the UT, significant cooling is found over North America (−1.08 (±0.18) K decade −1 ), while temperatures over Europe (−0.59 (±0.1) K decade −1 ) insignificantly decrease. Over the North Atlantic (+0.22 (±0.20) K decade −1 ) and over Central Asia (+0.32 (±0.33) K decade −1 ), temperatures increase insignificantly. Over tropical Asia (−0.54 (±0.04) K decade −1 ) temperatures show a tendency to decrease, but without significance at the 95 % level. This result is puzzling because it is assumed that the temperature increases in the tropics in the UT, equivalent to the surface temperature (Khandu et al., 2016). One reason could be related to the tropopause definition, but O 3 (40-80 ppb) and PV (always below 1 PVU) indicate that the selected air mass is tropospheric. Another reason could be related to a higher variability in the temperature due to large-scale influence or simply that the data coverage is still too poor in this region, which might lead to a higher variability of the local temperatures which then mask the temperature trend.   Table 3.

Intercomparison with ERA-I reanalysis
ERA-I reanalysis is widely used for intercomparisons and is currently the latest global atmospheric reanalysis product provided by ECMWF (Dee et al., 2011;Fujiwara et al., 2017). Simmons et al. (2014) indicated that there is a problem with the temperature near the tropopause in the tropics and extratropics. Here, we demonstrate the use of IAGOS observations to evaluate the ERA-I reanalysis within the UTLS region. Figure 9 shows the intercomparison between the temperature of the IAGOS observations and the ERA-I reanalysis of all 1 min mean data over the North Atlantic region at cruise altitude (below 350 hPa). The good agreement between both temperatures is reflected by the high correlation coefficient (R 2 = 0.97) and the slope of 0.94 from the regression fit. The bias between the temperature of the IAGOS measurements and ERA-I is −0.02 K over the entire period. The absolute bias is 0.81 (±0.72) K, and 93 % of the data are close to the 1 : 1 line (±2 K). The spread of the correlation corresponds to a larger variability in the IAGOS temperature measurements when the aircraft flew through clouds, or from the interpolation of the gridded ERA-I temperature to the aircraft position. IAGOS measurements with a backscatter cloud probe have been available since 2011 and these measurements will be used in the future to distinguish between clear-sky and in-cloud measurements to further reduce the uncertainties. The good agreement between the 1 min mean temperature data from ERA-I and the IAGOS observation in the other regions is documented in Table 2, in which R 2 is always larger than 0.97, the slope of each regression fit is between 0.94 and 0.98, and the absolute bias varies between 0.60 and 0.98 K, with a SD of 0.56-0.79 K.

Anomalies and trends using ERA-I
The ERA-I temperature time series and temperature anomalies agree with the IAGOS measurements mostly in phase and amplitude within all vertical layers over the North Atlantic region (Figs. 6 and 7). Larger deviations can be found at the beginning of the time series in the mid-1990s, when the IAGOS temperatures are warmer than ERA-I, and after 2008, when the opposite occurs. The temperature trends from ERA-I show a slight warming over the North Atlantic region (+0.38 (±0.18) K decade −1 ) in the UT and 0.46 (±0.15) K decade −1 in the TPL. Neither the trends in the UT nor in the TPL are significant.
In the LMS, ERA-I temperatures show a significant increase of +0.56 (±0.17) K decade −1 over the 18 years of the study period at the 95 % significance level, which is not the case for the IAGOS observations, in which the temperature shows a insignificant decrease of −0.05 (±0.17) K decade −1 over the period analyzed (see Sect. 4.1 above). We calculated the temperature trends in the lower (5th percentile, colder temperatures) and upper (95th percentile, warmer temperatures) ranges of the data to determine the robustness of these trends within the LMS. For the 5th percentile, the temperature trend from ERA-I remains significant (99 % confidence) at +0.48 K decade −1 , while the trend from the IA-GOS observations still does not significantly decrease at −0.22 K decade −1 (87 % confidence). In the upper range, the differences between the temperature trends diminish to +0.12 K decade −1 (53 % confidence) using ERA-I and −0.07 K decade −1 (73 % confidence) using the IAGOS observations. This shows that within the colder temperature regimes, the trend difference is significant, while for the warmest temperatures regimes, the variability is too large to obtain a significant temperature trend. Figure 8 and Table 3 also contain the temperature trend estimates for ERA-I. In contrast to the IAGOS observations, a cooling tendency in the LMS can only be found over Greenland (−0.79 (±0.29) K decade −1 ), and over North America (−0.25 (±0.21) K decade −1 ), but both are not significant. The strong cooling trend over Greenland results from elevated temperatures in the late 1990s. Over Europe, where IAGOS observations showed an insignificant decrease, ERA-I exhibits a very small (and not significant) warming of +0.11 (±0.19) K decade −1 .
Within the TPL, the temperature trends are mostly comparable to the IAGOS observations (except over Europe), but all trends are not significant. The best consistency between the temperature trends from IAGOS observations and ERA-I is seen for the UT layer, except over Europe, where ERA-I shows slightly less cooling than IAGOS observations, and Central Asia, where ERA-I has a stronger warming trend.

Possible sources of the differing temperature trends
There is a remarkable coherence of the temperature time series and the anomalies of the observations and ERA-I in all regions and layers, but the linear temperature trends reveal a deviation between the two data sets, especially in the LMS. In order to investigate this in more detail, Fig. 10 shows the annual deviation between the IAGOS observations and ERA-I at the LMS over eight regions. From 1995 to 2002 the observations are always warmer than ERA-I within the Northern Hemisphere. The sign changes to neutral until 2008, and since 2009 the IAGOS observations have been colder than ERA-I. This temporally varying temperature bias is one reason why the temperature trends are not equal between the two data sets. The differences cannot be explained by the chosen definition to determine the different layers within the UTLS region. For example, ozone observations show seasonal variations with values between 350 and 640 ppb, which are robust values for stratospheric air masses with a typical seasonality (Brioude et al., 2009). Additionally we use CO observations (from 2002 to 2008) and PV with similar results. Table 2. Statistical parameters summarized for all regions studied (see Fig. 9). The slopes and intercepts are determined from the linear robust regression fit between ERA-I and IAGOS temperature. Additionally, for each region the absolute bias and its SD are given. Furthermore, we demonstrated that temperature measurements from IAGOS agree well with quality-controlled measurements aboard research aircraft (Fig. 2). Therefore, we expect that the trends from IAGOS are robust, which leads to the hypothesis that ERA-I exhibits temperature biases that vary with time. Some of the observed deviations can be explained by changes in the data assimilation sources in ERA-I, which employed several new satellite products at various times after the year 2000. Simmons et al. (2014) showed that the source of input data (e.g., inconsistent sea surface temperature bias) changed in ERA-I after 2002. Since 2001, temperature profiles have been assimilated from GPSRO measurements into ERA-I, which led to a cold bias in ERA-I, where the largest effects appeared after 2006 in ERA-I, when the number of assimilated data increased. This could lead to a warming of the lower stratosphere and a cooling at around 200 hPa (Poli et al., 2008(Poli et al., , 2010. Another source of the deviation could be expected from the increasing number of temperature measurements from the AMDAR system onboard passenger aircraft. Ballish and Kumar (2008) showed that the temperatures from passenger aircraft are warm-biased at cruise altitude (200-300 hPa). This has not been accounted for in ERA-I. Furthermore, the WMO reported that the number of aircraft reports increased from 100 000 in early 2000 to more than 350 000 after 2012 and became the third most important data source of assimilation for short-term forecast in numerical weather prediction (WMO, 2014;Petersen, 2016).
In order to test the second hypothesis, Fig. 7 also includes the temperature trend derived over the North Atlantic region using the aircraft temperature measurements (T AC ), which we assume to be comparable to AMDAR measurements. The temperature trend in the LMS is also positive, and it is in between the neutral temperature trend using the IAGOS temperature observations and the positive temperature trend using ERA-I. This gives an indication that the assimilated aircraft measurements could indeed be a cause for bias in the ERA-I data. It is therefore planned to introduce a bias correction for AMDAR aircraft observations in the next ERA reanalysis (ERA5, D. Dee, personal communication, 2016).

Conclusion
In this study, nearly 2 decades of in situ IAGOS temperature measurements by the European Research Infrastructure IAGOS using passenger aircraft were presented and used to determine regional trends in the UTLS. The quality of the temperature measurements is regularly evaluated in the laboratory through pre-and post-deployment calibration, and it has been assessed by intercomparison with temperature observations from research aircraft.
UTLS temperature time series and trends are analyzed for 14 different regions within the northern latitudes, midlatitudes, and tropics and are separated into the lowermost stratosphere, the tropopause layer, and the upper troposphere using the thermal tropopause from ERA-I as a reference layer.
The interannual variability within all regions and layers is mostly consistent between the IAGOS observations and ERA-I during the past 18 years (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012), which is not the case for the temperature trends. For the temperature trend, regions are only considered when at least 90 % of all months are available. Over the North Atlantic, where the largest number of measurements are available, we found a significant (95 % confidence level) positive trend (+0.56 K decade −1 ) in ERA-I and a small negative trend (−0.05 K decade −1 ) in the observations (not significant) in the LMS. A significant (95 % confidence) negative temperature trend over Greenland and North America in the LMS for the IAGOS temperature is found. Over Europe both temperature trends are not significant, but the temperature trend from the IAGOS observations is negative and for positive ERA-I. Within the tropopause layer we found mostly comparable trends, except over Europe, where the sign of trends is different, but all trends are not significant. The calculated temperature trends in the UT show all the same signs for each region, whereas only a significant trend is found over North America, with a large cooling rate of about −0.92 K decade −1 (ERA-I) and −1.08 K decade −1 (IAGOS) in the last 18 years.
The large deviation between the different LMS temperature trends from IAGOS observations and ERA-I data is mostly related to the temporally varying bias between both temperature time series. As we have no reason to assume an evolving bias in the IAGOS observations, we conclude that ERA-I temperatures are too cold between 1995 and 2001 and too warm after 2007. These dates roughly correspond with changes in the data streams that were used in the ERA-I data assimilation. The evolution of ERA-I temperature can be explained by additional assimilation of GPSRO data and data from the AMDAR passenger aircraft network, which have been shown to be warm-biased. These data sources play an increasing role in the ERA-I data assimilation after 2006. Our recommendation is therefore to include a bias correction for these temperature measurements in future versions of the European reanalysis and use IAGOS observations as an anchor point.
The IAGOS temperature measurements highlight the need of independent global measurements with high and longterm accuracy to quantify long-term changes, especially in the UTLS region, and to help identify inconsistencies between different data sets of observations and models. Due to the expansion of the IAGOS aircraft fleet with airlines from other continents, we are looking forward to investigating temperature changes in the UTLS over several more regions in a few years from now.
Data availability. The measurements are available free of charge on www.iagos.org.
Author contributions. FB conducted the analysis of the measurements and wrote the paper. HGJS, PN, SR, and UB were in charge of the instrument setup, calibration, and processing of the measurements. AP, MGS, and AW helped with the interpretation of the data and in the paper-writing process. PN, DB, and VT were in charge of providing the CO and ozone measurements of MOZAIC and IA-GOS.