On the possibilities to use atmospheric reanalyses to evaluate the warming structure in the Arctic

There has been growing interest in the vertical structure of the recent Arctic warming. We investigated temperatures at the surface, 925, 700, 500 and 300 hPa levels in the Arctic (north of 70° N) using observations and four reanalyses: ERA-Interim, CFSR, MERRA and NCEP II. For the period 1979–2011, the layers at 500 hPa and below show a warming trend in all seasons in all the chosen reanalyses and observations. Restricting the analysis to the 1998–2011 period, however, all the reanalyses show a cooling trend in the Arctic-mean 500 hPa temperature in autumn, and this also applies to both observations and the reanalyses when restricting the analysis to the locations with available IGRA radiosoundings. During this period, the surface observations mainly representing land areas surrounding the Arctic Ocean reveal no summertime trend, in contrast with the reanalyses whether restricted to the locations of the available surface observations or not. In evaluating the reanalyses with observations, we find that the reanalyses agree better with each other at the available IGRA sounding locations than for the Arctic average, perhaps because the sounding observations were assimilated into reanalyses. Conversely, using the reanalysis data only from locations matching available surface (air) temperature observations does not improve the agreement between the reanalyses. At 925 hPa, CFSR deviates from the other three reanalyses, especially in summer after 2000, and it also deviates more from the IGRA radiosoundings than the other reanalyses do. The CFSR error in summer T925 is due mainly to underestimations in the Canadian-Atlantic sector between 120° W and 0°. The other reanalyses also have negative biases in this longitude band.


Introduction
The surface warming in the Arctic is observed to be at least twice as large as the global average warming in recent decades (Hassol, 2004; Bekryaev et al., 2010). Proposed causes of the so-called Arctic amplification of the warming include the snow and ice feedbacks (Manabe, 1983; Hall, 2004) and the poleward energy transport from lower latitudes (Alexeev et al., 2005), among others. Graversen et al. (2008) and Screen and Simmonds (2010) used the vertical structure of Arctic temperature trends in reanalyses to gain insights into this issue. They argue that if the maximum warming occurs much above the surface, the poleward energy transport would be the primary mechanism for the warming amplification. Graversen et al. (2008) found the maximum warming well above the surface, whereas Screen and Simmonds (2010) showed the warming to be largest in the lowest layers. The two studies used different reanalyses and different periods for their analyses. Chung and Räisänen (2011), on the other hand, addressed the origin of Arctic warming rather than the Arctic amplification. They hypothesize, based on idealized climate model experiments, that if the summertime warming is largest well above the surface, the poleward energy transport would be mainly responsible for the Arctic warming irrespective of the warming structure in winter. Taken together, the vertical profile of Arctic warming has emerged as one of the top climate issues.
C. E. Chung et al.: On the possibilities to use atmospheric reanalyses to evaluate the warming structure
Here, we use sounding and surface observations together with reanalyses to investigate the recent warming and its vertical structure in the Arctic. In doing so, we also evaluate how well the reanalyses represent Arctic warming. Reanalysis evaluation is important because reanalyses are so broadly applied in climate research, including Arctic research. Reanalyses are not observations but are commonly treated as such in the literature. For example, the atmospheric forcing for ocean, sea ice, glacier, and hydrological models is often taken from reanalyses. Reanalyses are also employed in studies of climate variability and trends as well as of the occurrence of extreme events. Reanalyses are, however, not free of errors (Lüpkes et al., 2010; Bromwich et al., 2011; Jakobson et al., 2012).
A reanalysis is a system in which observations are assimilated into a global model in order to provide an atmospheric state that is continuous in space and time (Saha et al., 2010; Dee et al., 2011; Rienecker et al., 2011). Reanalyses differ from each other for several reasons. First, the usage of observations varies between reanalyses. While all the reanalyses assimilate radiosonde sounding data, there are differences in the assimilation of satellite data. Also, the assimilation method varies. ERA-Interim applies a method based on four-dimensional data assimilation (4-D-VAR), in which the exact time of observations is taken into account by sophisticated means, whereas other reanalyses apply simpler methods. Second, there are large differences in the horizontal and vertical resolutions of the models applied. Third, the physical parameterization schemes for radiative transfer, turbulent mixing, cloud physics, and surface processes vary between the models.
Over the Arctic, where there are very few in situ observations, the quality of reanalyses is particularly questioned. Recent studies (e.g., Screen et al., 2012) tend to utilize multiple reanalyses to establish the robustness of the vertical warming structure in the Arctic. In this study, we examine 2 m air temperature, temperature (T) at 925, 700, 500 and 300 hPa, and the temperature difference between the 925 and 500 hPa levels in the latest reanalyses so as to estimate the accuracy of each reanalysis product in representing Arctic warming. Recently, Alexeev et al. (2012) used sounding observations to evaluate the warming structures in the older NCEP/NCAR and ERA-40 reanalyses. Here, we consider the most recent reanalyses and focus on Arctic-averaged temperatures.

Surface (air) temperature
We analyze monthly mean 2 m air temperature fields from three latest-generation atmospheric reanalyses: ECMWF ERA-Interim reanalysis (ERA-I) (Dee et al., 2011), NASA Modern-Era Retrospective analysis for Research and Applications (MERRA) (Rienecker et al., 2011) and the NCEP Climate Forecast System Reanalysis (CFSR) (Saha et al., 2010). The 2 m air temperature will be referred to as the surface air temperature for the remainder of the paper. The CFSR surface air temperature analysis is available at two resolutions; here we use the T62 Gaussian grid version. We also analyze NCEP's earlier-generation reanalysis product, the so-called NCEP II reanalysis (Kanamitsu et al., 2002). The period of analysis is from 1979 to 2011. Due to data availability, CFSR is only analyzed until the year 2009.
The reanalyses agree on the large-scale features in surface air temperature climatology (see Fig. 1 for the summer season). Furthermore, for all four reanalyses considered, the domain-average surface air temperature for the 70°-90° N region shows a clear warming trend, both for annual and seasonal means (Fig. 2a-e). There are, however, also disagreements between the reanalyses, which will be discussed below.
To evaluate the surface air temperature in the reanalyses, we use the Goddard Institute for Space Studies (GISS) analysis of global surface temperature change (Hansen et al., 2010), referred to here as "GISTEMP". GISTEMP integrates in situ surface air temperature measurements over land with ship-based and satellite-derived sea surface temperature (SST) measurements. The SST measurements are, however, only used over year-round ice-free areas. This is because GISTEMP data were produced for comparison with the surface air temperature in climate models, and only in year-round ice-free areas is the SST anomaly a good approximation to the surface air temperature anomaly (Hansen et al., 2010). The GISTEMP version we use here is a gridded monthly mean data set with a 250 km smoothing. The use of a smoothing distance of 250 km instead of 1200 km (the default of GISTEMP) avoids the uncertainty related to the extrapolation of temperature measurements made at Arctic observation sites to large distances over the open ocean or sea ice, thereby providing a more robust point of comparison for the reanalyses. On the other hand, a consequence of the 250 km smoothing distance is that it leaves a large number of data gaps. The GISTEMP data we use are defined only for a fraction of the Arctic area (on average, 27 % from 1979 to 2011), mainly limited to the vicinity of the observation sites on land and to the permanently ice-free parts of the Barents Sea and Greenland Sea (see Fig. 11). The GISTEMP algorithm and its application to the Arctic are discussed in more detail in Hansen et al. (2010).
[Figure caption; beginning missing] ... N average is shown. On the right, the average is calculated using only those reanalysis data that correspond to the available GISTEMP data in location and time. Note that Ts refers to surface air temperature (SAT) for the reanalyses, and to a combination of SAT over land and SST over ocean for GISTEMP.
Figure 2f-j compare the 70-90° N average surface air temperature anomalies in the reanalyses with those of the GISTEMP data. Unlike Fig. 2a-e, which use all reanalysis data north of 70° N, Fig. 2f-j use, after linear interpolation onto the GISTEMP grid, only those months and grid cells corresponding to valid GISTEMP data. In generating Fig. 2f-j, the area averaging is done for each month, and then seasonal or annual means are computed. The temperature anomalies are defined with respect to the climatology of years 1979-2009. Note that while GISTEMP originally provides temperature anomalies with respect to the 1951-1980 climatology at each grid cell, the anomalies with respect to the 1979-2009 climatology are formed by subtracting the 1979-2009 mean of Arctic-average GISTEMP data.
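The sampling and anomaly steps described above can be sketched as follows. This is only an illustrative sketch, not the processing code used for Fig. 2: the function names are hypothetical, missing cells are marked with None, and the cosine-of-latitude area weighting is an assumption, since the text does not state how the area average is weighted.

```python
import math

def area_weighted_mean(values, lats):
    """Mean of grid-cell values weighted by cos(latitude).

    Cells whose value is None are treated as missing and skipped.
    The cos(latitude) weight is an assumption of this sketch.
    """
    num = 0.0
    den = 0.0
    for v, lat in zip(values, lats):
        if v is None:
            continue
        w = math.cos(math.radians(lat))
        num += w * v
        den += w
    return num / den if den > 0 else None

def matched_mean(reanalysis, observed, lats):
    """Average reanalysis values only over cells where the observation exists."""
    masked = [r if o is not None else None for r, o in zip(reanalysis, observed)]
    return area_weighted_mean(masked, lats)

def anomalies(series, years, base=(1979, 2009)):
    """Subtract the base-period mean (1979-2009 by default) from every value."""
    base_vals = [v for v, y in zip(series, years) if base[0] <= y <= base[1]]
    clim = sum(base_vals) / len(base_vals)
    return [v - clim for v in series]
```

In this sketch, matched_mean would be applied month by month before seasonal or annual means are formed, mirroring the order of operations given in the text.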
The surface air temperatures in the reanalyses are evaluated in Fig. 2f-j. In summer, the spread between the reanalyses (Fig. 2h) tends to decrease slightly from that in Fig. 2c. However, in winter, the spread seems to increase. Thus, overall, restricting the analyses to the regions with GISTEMP data does not improve the mutual agreement between the reanalyses. This might actually not be surprising because in the Arctic region GISTEMP is mainly based on in situ surface air temperatures from land, and among the reanalyses considered in this study, these data are only assimilated in ERA-Interim. This is done through a separate surface analysis, which is based on optimal interpolation, in contrast to the 4-D-VAR used in the main atmospheric analysis (Dee et al., 2011). MERRA assimilates in situ surface air temperatures only over oceans (from ships and buoys; Rienecker et al., 2011), while CFSR and NCEP II do not explicitly assimilate surface air temperature observations (Wang et al., 2011).

Upper-air temperatures
As for upper-air temperatures from the reanalyses, we again use ERA-Interim reanalysis, MERRA, CFSR and NCEP II reanalysis. The CFSR upper-air temperature product is available at two resolutions; here we use the 2.5° × 2.5° version. We first discuss the 500 hPa and 925 hPa levels because the temperature trend difference between these two levels is a good measure of the vertical warming structure.
The left panels in Figs. 3-6 show, for each season, the 70-90° N average temperature from the four reanalyses studied. Considering first the 500 hPa level, it is noted that in spring (Fig. 3a) and especially in fall (Fig. 5a) and in winter (Fig. 6a), MERRA is colder than the other reanalyses. In summer the differences between the reanalyses are small, mostly within 0.5 K (Fig. 4a). At the 925 hPa level, NCEP II is systematically colder than the other reanalyses in winter (Fig. 6b). A more striking feature is, however, that in summer, T925 for CFSR deviates substantially from the remaining three reanalyses, by −0.5 to −1.7 K, since around the year 2000 (Fig. 4b). This outlier behavior of CFSR is surprising because CFSR is an updated product from NCEP II.
Considering the temperature difference between T925 and T500, CFSR again appears as an outlier in summer in the 2000s (Fig. 4c). The low values of T925 (Fig. 4b) combined with relatively high T500 (Fig. 4a) result in a T925 − T500 difference that is smaller than in the other reanalyses, by more than 1 K in the mid-2000s (Fig. 4c). Another notable feature in Fig. 4c is that for ERA-Interim, the summertime temperature difference between 925 and 500 hPa increases substantially (by almost 2 K) from the late 1990s to the early 2000s. This feature is not reproduced by the other reanalyses, and it is mainly linked to a larger increase in T925 in ERA-Interim as compared with MERRA, CFSR and NCEP II (Fig. 4b). Furthermore, considering the other seasons, it is noted that in winter the difference between T925 and T500 is smaller for NCEP II than for the other reanalyses (Fig. 6c) as a result of the relative coldness of NCEP II at 925 hPa (Fig. 6b).
To evaluate the reanalyses, we use the Integrated Global Radiosonde Archive (IGRA) data (Elliott and Gaffen, 1991). IGRA consists of quality-assured soundings over the globe, and has 34 radiosonde stations north of 70° N. IGRA provides monthly and four-times-daily products, but we only use the latter (hereafter referred to as daily IGRA data) for comparing the reanalyses with IGRA data (the IGRA monthly means are not particularly reliable in the Arctic because they are averaged from the available observations, in many cases with less than 30 days of data). For comparison with the 00Z values of the reanalyses, IGRA soundings within two hours of 00Z are used. Similarly, the 10Z-14Z IGRA soundings are used to evaluate the 12Z reanalysis values. Soundings near 06Z and 18Z are only available from a fraction of the stations and are therefore not included in the analysis below. Naturally, the use of only two observations per day implies that the diurnal cycle is not fully represented. For example, if there are problems specific to a certain part of the diurnal cycle in a reanalysis (such as the representation of stable conditions in the morning, or convective conditions in the afternoon), they could go unnoticed in the analysis. This is mainly a concern when considering data for individual locations (e.g., Fig. 8).
[Figure caption; beginning missing] ... In the computation, we use the reanalysis data that correspond to the available IGRA observations.
In the data processing for reanalysis evaluation, we first linearly interpolated the 00Z and 12Z daily reanalysis data onto the IGRA stations. The interpolated data were then averaged over the station locations north of 70° N over each season for 00Z and for 12Z separately. In this averaging, only the reanalysis data corresponding to the available IGRA data were included. The same averaging procedure was repeated for the IGRA data. Then we averaged the 00Z and 12Z averages, resulting in Tables 1-4 and Figs. 3-7. The averaging of the 00Z and 12Z values was done to avoid a possible bias due to the asymmetry between 00Z and 12Z data volumes; the numbers of 00Z and 12Z soundings for each year and station are not always equal.
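The reason for averaging the 00Z and 12Z means with equal weight, rather than pooling all soundings, can be illustrated with a short sketch. The function names and the flat list layout (one entry per station and day, None for a missing sounding) are assumptions made here for illustration; the numbers are invented.

```python
def paired_mean(reanalysis, observed):
    """Means of reanalysis and observation over samples where both exist."""
    pairs = [(r, o) for r, o in zip(reanalysis, observed)
             if r is not None and o is not None]
    if not pairs:
        return None, None
    n = len(pairs)
    return sum(r for r, _ in pairs) / n, sum(o for _, o in pairs) / n

def combine_synoptic_hours(mean_00z, mean_12z):
    """Equal-weight average of the 00Z and 12Z means, whatever their sample counts."""
    return 0.5 * (mean_00z + mean_12z)

# Invented example: three 00Z soundings but only one 12Z sounding.
obs_00z = [250.0, 252.0, 254.0]
obs_12z = [256.0]
pooled = sum(obs_00z + obs_12z) / 4          # dominated by the 00Z samples
balanced = combine_synoptic_hours(sum(obs_00z) / 3, obs_12z[0])
```

With these invented values the pooled mean is pulled toward the better-sampled 00Z hour, whereas the equal-weight mean is not, which is the asymmetry the procedure is meant to avoid.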
The evaluation of the reanalyses, based on the available IGRA observations, is shown in the right panels of Figs. 3-6. The most salient feature of these figures is that the spread between the reanalyses is much smaller than in the left panels. If the spread is quantified in terms of the difference between each reanalysis seasonal Arctic average temperature and the mean over the four reanalyses, the spread decreases by 54 % when averaged over all the seasons and the 925, 700, 500 and 300 hPa levels. All the reanalyses incorporated sounding observations, but the reanalyses differ in the extent of other observations incorporated and in the method of doing it. For example, ERA-Interim, CFSR, and MERRA assimilate satellite radiances, whereas NCEP II assimilates temperature profiles based on satellite data (Saha et al., 2010; Rienecker et al., 2011; Dee et al., 2011). Differences may also exist in the assimilation method of sounding observations (Lüpkes et al., 2010). It is therefore expected that the reanalyses agree better with each other over the available IGRA data than for the true Arctic-mean values (i.e., the average over the entire area north of 70° N). Over the central Arctic regions lacking sounding data, the reanalysis systems have more freedom to form their own climate, and thus biases. While spreads between the reanalyses are generally smaller over the available IGRA data, CFSR shows a relatively large error in summer at 925 hPa since the year 2000, with a cold bias of 0.4-0.8 K (Fig. 4e).
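One way to make the spread measure concrete is sketched below. The text defines the spread through the difference between each reanalysis and the four-reanalysis mean but does not say how these differences are aggregated; the mean absolute deviation used here is an assumption, and the function names are hypothetical.

```python
def spread(values):
    """Mean absolute deviation of the reanalysis values from their common mean.

    The choice of the mean absolute deviation is an assumption of this sketch.
    """
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def percent_reduction(spread_full, spread_matched):
    """Relative decrease of the spread, in percent, when sampling is restricted."""
    return 100.0 * (spread_full - spread_matched) / spread_full
```

Under this definition, spread would be evaluated once for the true Arctic averages and once for the averages sampled at the IGRA locations, for each season and level, before the reductions are averaged.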
The temperature difference between the two levels (dT = T925 − T500) is shown in the bottom panels of Figs. 3-6. In summer, CFSR shows the largest deviations from the IGRA data, dT for CFSR being less than that for IGRA by almost 1 K for many years in the 2000s (Fig. 4f). In the 1990s, ERA-Interim deviates from the IGRA data most, and the deviation is about 0.5 K. ERA-Interim's rapid increase of dT for the true Arctic average from the late 1990s to the early 2000s, as shown in Fig. 4c, is significantly reduced in Fig. 4f.
We now extend our investigation to the 300 and 700 hPa levels. Figure 7a-d display, for each reanalysis and season, the bias compared to IGRA data at the 300, 500, 700, and 925 hPa levels, and Fig. 7e-h show the corresponding rms (root mean square) errors. The bias and rms error are defined for the seasonal and Arctic averages of temperature (see Fig. 7). The exact numerical values are provided in Tables 1-2. Different reanalyses exhibit different degrees of bias and rms error. The disagreement between the reanalyses is often particularly large in the lower and upper troposphere (925 and 300 hPa) compared to the middle troposphere (500 and 700 hPa), as shown in Fig. 7.
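For reference, the two error measures can be written as a minimal sketch, assuming they are computed from paired yearly values of the seasonal Arctic-average temperature (one value per year for a given season and level). The function names are hypothetical.

```python
import math

def bias(reanalysis, observed):
    """Mean of (reanalysis - observation) over the paired seasonal averages."""
    diffs = [r - o for r, o in zip(reanalysis, observed)]
    return sum(diffs) / len(diffs)

def rms_error(reanalysis, observed):
    """Root mean square of (reanalysis - observation) over the same pairs."""
    squares = [(r - o) ** 2 for r, o in zip(reanalysis, observed)]
    return math.sqrt(sum(squares) / len(squares))
```

The two measures are complementary: errors of opposite sign in different years cancel in the bias but not in the rms error, so a reanalysis can have a small bias and still a large rms error.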
Looking into seasonal and true Arctic average temperatures at the 300 and 700 hPa levels, the spread between the reanalyses often exceeds 1 K (not shown). When the Arctic average is made over the region with available IGRA data, the spread between the reanalyses is due primarily to an outlier reanalysis. In addition to the aforementioned CFSR behavior at 925 hPa in summer, NCEP II is an outlier in summer and fall at 300 hPa. It has a substantial warm bias of about 0.6 K compared to IGRA data, while the other reanalyses show smaller negative biases (−0.37 to −0.03 K). In view of this NCEP II deficiency and the CFSR error in summer at 925 hPa, we note that an outlier reanalysis product tends to correspond to the most erroneous product. Prior to the year 2000 this tendency is, however, less obvious. The tendency we noted here is based on the chosen four reanalyses.
It is further noted from Fig. 7a-d that all the reanalyses have a cold bias at 925 hPa. Also, MERRA has a cold bias at all four levels and in all four seasons. This cold bias tendency might be related to free-running climate models having a cold bias in the Arctic. Note that our evaluation of the reanalyses is limited to the use of the IGRA data, which must have been largely incorporated into each reanalysis product. Using sounding observations over the Arctic that were not assimilated, Jakobson et al. (2012) also found CFSR to perform the worst at 925 hPa (roughly 600 m). However, they found CFSR to be the best reanalysis for temperature in the lowermost 100 m layer, indicating that the performance of each reanalysis depends strongly on altitude.
We further discuss the aforementioned error in CFSR in summer at 925 hPa since 2000 (Fig. 4e). Figure 7b and f confirm that in terms of Arctic-mean values, the old NCEP II reanalysis is superior to the new CFSR in summer at 925 hPa. To better understand this surprising result, we show in Fig. 8 the difference between the reanalysis summertime T925 and the IGRA value for each station. The most conspicuous feature of Fig. 8 is that all the reanalyses have negative (cold) biases in the Canadian-Atlantic sector between 120° W and 0°, and less negative or slightly positive biases at the other longitudes. We found that this bias is not directly related to interpolation of data below the earth's surface since the surface altitudes of the IGRA stations and those of the nearby reanalysis grid cells are safely below the 925 hPa level. Out of the four reanalyses, CFSR and NCEP II have particularly negative biases over the 120° W-0° band compared to the bias at the other longitudes (Fig. 8). While the negative bias over the 120° W-0° band is largely canceled out by the positive bias at the other longitudes in NCEP II, the bias is still negative outside of the 120° W-0° band in CFSR. In this regard, the better performance of NCEP II for the Arctic mean values is fortuitous.

Temperature trends
In this section, we quantify temperature trends at selected levels. Considering first the surface level, the reanalyses and GISTEMP both show clear warming trends from 1979 to 2011 (Fig. 2), and Table 3 quantifies them. For the recent period (i.e., since around 1998), however, the warming trends in GISTEMP agree very well with those in the reanalyses, except in summer (Fig. 2f-j). In summer (Fig. 2h), GISTEMP shows no clear trend since 1998, while all the reanalyses show a warming trend. Hence, we need to have a closer look at the period 1998-2011. The trends of the time series of Figs. 2-6 are computed for the 1998-2011 period in Table 4 and Figs. 9-10. Returning to the surface temperature trend, the summertime warming trends in the reanalyses appear for the true Arctic averages as well as when only those data corresponding to the available GISTEMP data are sampled, while GISTEMP does not show a warming trend (Table 4). In contrast, in spring, autumn and winter, the reanalyses and GISTEMP all show significant warming trends. When annual mean trends are analyzed, all the reanalyses agree with the GISTEMP trend of about 1.6 K per 14 yr, except for ERA-Interim (about 2.1 K per 14 yr). We also note that both GISTEMP and the reanalyses show a large increase (∼ 1.5 K) in summertime temperature from 1996 to 1998, indicating that the trends are sensitive to the choice of the period considered.
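A trend expressed as a temperature change over the whole period (e.g., K per 14 yr for 1998-2011) can be sketched as a least-squares slope multiplied by the number of years. This is an illustrative sketch under that assumption, with hypothetical function names; the exact fitting procedure behind Tables 3 and 4 is not spelled out in the text.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope, in units of `values` per year."""
    n = len(years)
    y_mean = sum(years) / n
    v_mean = sum(values) / n
    sxy = sum((y - y_mean) * (v - v_mean) for y, v in zip(years, values))
    sxx = sum((y - y_mean) ** 2 for y in years)
    return sxy / sxx

def trend_over_period(years, values, start, end):
    """Slope fitted to the years start..end, scaled to the full period length.

    The period length is counted inclusively (1998-2011 gives 14 yr), which is
    an assumption consistent with the 33 yr quoted for 1979-2011.
    """
    selected = [(y, v) for y, v in zip(years, values) if start <= y <= end]
    ys = [y for y, _ in selected]
    vs = [v for _, v in selected]
    return linear_trend(ys, vs) * (end - start + 1)
```

Calling trend_over_period with different start years on the same series is a simple way to examine the sensitivity to the chosen period noted above.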
Figure 9 summarizes the observed trends from 1998 to 2011. In summer, the Ts trend is near zero while upper-air temperature trends are positive. Also, the warming at 925 hPa is almost equal to that at 500 hPa. In the other three seasons, on the other hand, the warming tendency is larger at lower altitude. Fall is particularly interesting, since the surface shows a very large warming while there has been a statistically significant cooling trend at 500 hPa. The reanalyses agree on the cooling trend at 500 hPa in fall, with trends of −0.74 K to −0.03 K per 14 yr for true Arctic 70-90° N mean values, and with trends of −1.25 K to −0.91 K per 14 yr when sampled according to the availability of IGRA data (Table 4). On the other hand, analyzing the entire 1979-2011 period reveals that the trend is not necessarily negative in fall and the warming tendency does not necessarily become larger at lower altitude (Table 3). In fact, for the period 1979-2011, the layers at 500 hPa and below all show a warming trend in all seasons in all the chosen reanalyses and observations (Table 3). This suggests the hypothesis that the recent warming is associated with different mechanisms than the earlier warming.
The trends shown in Tables 3 and 4 and Fig. 9 can be viewed as originating from one realization sampled from an ensemble of many time series. We assess the significance of the calculated trends using a null hypothesis of an ensemble of no-trend time series. Student's t test (Santer et al., 2000) is commonly applied to establish the significance, but this technique does not account for autocorrelation in the time series. Thus, we choose to apply Student's t test after using the prewhitening technique of Wang and Swail (2001) and Student's t test with effective sample size correction (Santer et al., 2000), as well as the normal Student's t test. The former two techniques factor in autocorrelation. If all three techniques establish significance, we consider the trend significant, as explained in Fig. 10. Figure 10 shows that the 1979-2011 trends are mostly significant while the 1998-2011 trends are often insignificant. Also, the trends from the true Arctic-mean time series (Fig. 10a-e) tend to be more significant than those from using only the reanalysis data corresponding to the available IGRA or GISTEMP data (Fig. 10f-j). This makes sense because more data would lead to more significant results. We also assess the significance of the difference in the trends. This can be done in two ways. One way is to construct confidence intervals for individual trends and then determine the overlap between a pair of confidence intervals. A 95 % confidence interval would contain the true trend at a 95 % probability. Thus, when two confidence intervals overlap with each other, the two corresponding trends might have come from the same population. We find confidence interval overlaps in most of the differences in the trends, indicating that the trend differences are too small to be statistically distinguishable. Santer et al. (2001) found similar results for the trends they calculated. The other way to assess the significance of a trend difference is to determine the significance of the trend of the time series difference. The advantage of this method is that variability common to the two time series is removed. Applying this method reveals that the trend differences are a good mixture of significant and insignificant differences. Again, the trend difference tends to be more significant for the true Arctic-mean time series.
[Figure caption; beginning missing] ... 3 and 4. Significance at the 95 % level is indicated by an oval for the 1979-2011 trend and by a dot for the 1998-2011 trend. When an oval (or dot) is located above I, M or S, it means that the trend is insignificant, may be significant or is significant, respectively. We apply three different significance testing techniques. When all of the techniques establish significance (or insignificance), we determine that the trend is significant (or insignificant). When the techniques give conflicting results, the trend is considered "maybe significant". All tests are two-tailed tests. The left panels represent the significance of the trends from the true Arctic-mean time series while the right panels address the trends with only the reanalysis data corresponding to the available IGRA or GISTEMP data.
Table 3. 70-90° N average temperature trend from 1979 to 2011 in units of temperature change (in K) over the 33 yr. In the case of CFSR, the trend is from 1979 to 2009 in units of temperature change over the 31 yr. There are five values separated by "|", and these five values represent Observation|ERA|CFSR|MERRA|NCEP II. Observation here refers to either GISTEMP or IGRA data. In the rows with observation (i.e., five values separated by a blank), the 70-90° N average is made with the reanalysis data corresponding to the available observation, whereas in the rows without observation (i.e., four values separated by a blank), true 70-90° N average values are used. The standard error of each trend is shown with ± in each row.
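The effective sample size correction and the trend-of-the-difference method mentioned in this section can be sketched as follows. The sketch follows the commonly cited form of the Santer et al. (2000) adjustment, n_eff = n (1 − r1) / (1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. Function names are hypothetical, the prewhitening test of Wang and Swail (2001) is not reproduced, and setting negative r1 to zero is a simplification made here, not something taken from the text.

```python
import math

def ols(x, y):
    """Least-squares slope, residuals, and sum of squared x deviations."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, resid, sxx

def lag1_autocorr(e):
    """Lag-1 autocorrelation coefficient of a series."""
    m = sum(e) / len(e)
    num = sum((e[i] - m) * (e[i + 1] - m) for i in range(len(e) - 1))
    den = sum((ei - m) ** 2 for ei in e)
    return num / den if den > 0 else 0.0

def adjusted_t(x, y):
    """t statistic of the trend using an effective sample size.

    Returns (t, n_eff); t is to be compared with a Student's t distribution
    with n_eff - 2 degrees of freedom (two-tailed).
    """
    slope, resid, sxx = ols(x, y)
    n = len(x)
    # Assumption of this sketch: negative autocorrelation is set to zero,
    # so that n_eff never exceeds the actual sample size.
    r1 = max(lag1_autocorr(resid), 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for a trend test")
    resid_var = sum(e * e for e in resid) / (n_eff - 2.0)
    std_err = math.sqrt(resid_var / sxx)
    return slope / std_err, n_eff

def difference_trend_t(x, series_a, series_b):
    """Test the trend of the difference series, removing common variability."""
    diff = [a - b for a, b in zip(series_a, series_b)]
    return adjusted_t(x, diff)
```

Because positive autocorrelation reduces n_eff, the adjusted standard error is larger than in the ordinary test, which makes the short 1998-2011 series harder to declare significant.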
The spatial distribution of recent summertime temperature trends for GISTEMP and the reanalyses is shown in Fig. 11. The near-zero summertime trend in GISTEMP is related to a negative temperature trend in the Eurasian sector between 20° and 110° E. This cooling trend is not well reproduced in the reanalyses. Also, over the ocean, the reanalyses noticeably differ from GISTEMP (Fig. 11), but we have to bear in mind that, using the 250 km smoothing, there are very few GISTEMP values in the central Arctic. In the sea areas that are open only in summer and autumn (e.g., parts of the Barents, Kara, Laptev, East Siberian, Chukchi, and Beaufort Seas), the GISTEMP data are probably less reliable than the reanalyses for these seasons. This is because (a) reanalyses effectively assimilate satellite SST observations from the seasonally ice-free seas, whereas GISTEMP applies SST observations only from permanently ice-free areas, and (b) the GISTEMP values seen in these sea areas are based on the extrapolation of land observations.
With regard to the differences between GISTEMP and reanalyses over the central Arctic (the area where GISTEMP has very few data), one might simply think that reanalyses are more accurate because they assimilate satellite radiances or satellite-based upper-air temperatures, and those should affect the surface air temperatures. However, satellite retrievals have a coarse vertical resolution and cannot represent well the fine structure of temperature in the shallow atmospheric boundary layer in the Arctic. Due to the close coupling of the open ocean and near-surface air, we believe that the surface air temperature in reanalyses is affected more by the assimilation of satellite data on SST than by that of upper-air temperatures.
In addition to differences in satellite data usage, another potential reason for the detected discrepancies (Figs. 2h and 10) is that in GISTEMP Ts refers to a combination of SST and surface air temperature, whereas in reanalyses Ts refers to surface air temperature (SAT). Even though SST observations are assimilated in reanalyses, SAT and SST are not perfectly correlated, and so their trends are not necessarily equal.

Discussion
As noted in the introduction, various factors can cause differences between temperature fields in different reanalyses. These include differences in the usage of observations, differences in the atmospheric model used in producing the reanalysis, and the methodology used for data assimilation. In general, it is difficult to pinpoint the exact causes for the differences between different reanalysis temperatures and their trends. However, some general comments can be made. First, as noted in Sect. 3, the spread between reanalysis upper-air temperatures is generally smaller over those regions where IGRA radiosounding data are available. This is expected because most or all of the radiosounding data must have been assimilated in the reanalyses, thereby constraining them. Over the central Arctic regions lacking sounding data, the reanalysis systems have more freedom to form their own climate, and thus biases.
In contrast, for surface air temperature, restricting the analysis to the locations with available GISTEMP data did not reduce the spread between the different reanalyses. As pointed out in Sect. 2, this might be explained by the fact that among the reanalyses considered, only ERA-Interim assimilates surface air temperatures from land stations. At the surface level, there are additional observations that can be used to validate reanalyses. Makshtas et al. (2007) used the observations from Russian drifting stations and Alexeev et al. (2012) used drifting Argo buoys to validate reanalyses over the Arctic. Testing the accuracy of each reanalysis product can perhaps be better carried out at the surface level.
While all the reanalyses agree that the Arctic is, in general, warming, they show substantial differences in the details, such as the vertical structure of summertime warming (as characterized by the time series of the temperature difference between 925 and 500 hPa in Fig. 4c). There are two general causes that can explain time-varying differences between the reanalyses. First, the reanalysis systems may respond differently to real changes in the Arctic environment, such as diminishing Arctic sea ice. For instance, the performance of the atmospheric models used in producing the reanalyses may depend on the surface status. Surface status can also vary among the reanalyses despite the assimilation of satellite-based sea ice extent data. The differences may originate from the remote sensing algorithm applied (Valkonen et al., 2008) and from the fact that, of the reanalyses included in this study, only CFSR couples the atmospheric model with an ocean model and a dynamic-thermodynamic sea ice model (Saha et al., 2010). Second, while the models used for reanalyses are "frozen", the availability of observations has changed during the reanalysis period, in particular, due to the increasing amounts of satellite data. Changes in the observation systems may have caused artificial trends and shifts in the reanalyses, such as those demonstrated for ERA-40 by Screen and Simmonds (2011). Overall, while it is beyond the scope of the present work to unravel the exact causes for the differences between the reanalyses, there is clearly a need for such studies.
Despite the aforementioned differences between the reanalyses, there is an optimistic aspect to our results. Alexeev et al. (2012) compared earlier reanalysis versions. The disagreement between ERA-40 and NCEP-I at 925 hPa in winter (as in Fig. 3 of their study) is much greater than the disagreement between the reanalyses we have analyzed (as shown in Fig. 6b). This indicates that as reanalyses are updated, they tend to agree more closely with each other.

Concluding remark
In this study, we have examined the Arctic warming structure from 1979 to 2011 in observations and reanalyses. Our analysis shows that the warming structure in the recent period (roughly from 1998) differs greatly from that in the earlier period. In the recent period, the surface was not clearly warming in summer, and 500 hPa air became colder in autumn. Before 1998, however, all the layers at 500 hPa or below were warming. These findings are supported by multiple reanalyses together with observations. The cooling trend in both observations and the reanalyses at 500 hPa in autumn since 1998 has not been sufficiently recognized in earlier studies.
While examining the warming structure, we have also examined the validity of the reanalyses in surface-air and upper-level temperatures. We have shown for the 1998–2011 period that all the reanalyses reveal warming trends at the surface in all the seasons, while GISTEMP shows no trend in summer. At the 925 hPa level, CFSR shows a larger Arctic-mean bias than NCEP II in summer (especially after the year 2000), although CFSR is supposedly an improvement over NCEP II. This CFSR error was shown to arise primarily from the 120° W–0° longitude band. It is our hope that this study will stimulate further investigations into the root cause of the reanalysis errors and biases. Our results also suggest that studies of the Arctic climate based on reanalyses should be undertaken with extreme caution.

Fig. 1. June–August average surface air temperature climatology for the period 1979–2011 in units of K. In the case of CFSR, the climatology is for 1979–2009.

Fig. 2. 70–90° N average Ts anomalies relative to 1979–2009 climatology from four reanalyses and GISTEMP. On the left, the true 70–90° N average is shown. On the right, the average is calculated using only those reanalysis data that correspond to the available GISTEMP data in location and time. Note that Ts refers to surface air temperature (SAT) for the reanalyses, and to a combination of SAT over land and SST over ocean for GISTEMP.
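The two kinds of average described in this caption (the true 70–90° N mean versus the mean over only those grid cells where observations exist) can be illustrated with a minimal sketch. It assumes a regular latitude-longitude grid and cos(latitude) area weighting; the function name `arctic_mean` and the boolean `obs_mask` are hypothetical and are not taken from the paper, whose exact averaging procedure may differ.

```python
import numpy as np

def arctic_mean(field, lats, obs_mask=None, lat_min=70.0):
    """Cos(latitude)-weighted mean of a (lat, lon) field north of lat_min.

    obs_mask (hypothetical helper input) is a boolean (lat, lon) array that is
    True where an observation is available; other grid cells are ignored.
    """
    field = np.asarray(field, dtype=float)
    lats = np.asarray(lats, dtype=float)
    # Assumed weighting: grid-cell area is proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    # Keep only cells north of lat_min that hold finite values.
    valid = (lats >= lat_min)[:, None] & np.isfinite(field)
    if obs_mask is not None:
        valid &= np.asarray(obs_mask, dtype=bool)
    if not valid.any():
        return float("nan")
    return float(np.sum(field[valid] * weights[valid]) / np.sum(weights[valid]))
```

Calling the function once without `obs_mask` and once with it gives the two numbers that would be compared for a single time step; repeating this per month or season would yield time series analogous to the left and right panels.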

Fig. 3. 70–90° N average T (temperature) from four reanalyses and the IGRA data. On the left, the true 70–90° N average is shown. On the right, the average is calculated using only those reanalysis data that match the available IGRA observations in location and time.

Fig. 5. Same as Fig. 3 except for the SON (September, October and November) season.
below); for the Arctic mean values, this problem is likely alleviated by the fact that different IGRA stations sample different parts of the diurnal cycle.

Fig. 7. Mean bias and rms error of seasonal 70–90° N average temperature in each reanalysis, relative to the IGRA data, in units of K. In the computation, we use the reanalysis data that correspond to the available IGRA observations.
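For readers who want to reproduce statistics of this kind, the following is a minimal sketch of a mean bias and rms error computed from paired reanalysis and observation values. The function name `bias_and_rmse` and the convention of dropping pairs with missing values are assumptions for illustration, not the authors' code.

```python
import numpy as np

def bias_and_rmse(reanalysis, observed):
    """Return (mean bias, rms error) of reanalysis minus observation, in K.

    Only pairs where both values are finite are used, mimicking the idea of
    sampling the reanalysis where and when observations are available.
    """
    rean = np.asarray(reanalysis, dtype=float)
    obs = np.asarray(observed, dtype=float)
    paired = np.isfinite(rean) & np.isfinite(obs)
    diff = rean[paired] - obs[paired]
    if diff.size == 0:
        return float("nan"), float("nan")
    bias = float(diff.mean())                   # positive: reanalysis too warm
    rmse = float(np.sqrt(np.mean(diff ** 2)))   # always >= |bias|
    return bias, rmse
```

Because the rms error combines the systematic bias with the scatter about it, a reanalysis can show a small mean bias and still have a large rms error.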

Fig. 8. Evaluation of the reanalyses with the IGRA data in June–August 2000–2009 average 925 hPa T. The reanalysis data used here match the available IGRA observations in location and time. Shown is reanalysis temperature bias at each IGRA station in units of K, with the circle size indicating the magnitude of the bias. When the bias is positive (negative), the circle is colored red (blue).

Fig. 9. 70–90° N average temperature trend from 1998 to 2011 in units of change (in K) over the 14 yr, as in Table 4. All the trends here are from pure observations (either GISTEMP or IGRA data) and not from the reanalyses. The error bar represents ± standard error of the trend.
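A trend expressed as total change over the period, with its standard error, can be sketched as follows. This assumes an ordinary least-squares fit to one value per year and converts the per-year slope to a total change by multiplying by the number of years (14 for 1998–2011). Both the function name `trend_over_period` and that conversion are assumptions; the paper does not spell out its exact procedure here.

```python
import numpy as np

def trend_over_period(years, values, span_years=None):
    """OLS trend as total change over the period, plus its standard error.

    span_years defaults to the number of data points (an assumption), so that
    a 1998-2011 annual series is scaled to "change over 14 yr".
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n = years.size
    if n < 3:
        raise ValueError("need at least three points for a standard error")
    x = years - years.mean()
    sxx = np.sum(x ** 2)
    slope = np.sum(x * values) / sxx             # K per year
    resid = values - values.mean() - slope * x   # residuals about the fit
    slope_se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    if span_years is None:
        span_years = n
    return float(slope * span_years), float(slope_se * span_years)
```

This standard error assumes uncorrelated residuals; serially correlated temperature anomalies would generally call for a wider error bar.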

Fig. 10. Significance of the trends in Tables 3 and 4. Significance at the 95 % level is indicated by an oval for the 1979–2011 trend and by a dot for the 1998–2011 trend. When an oval (or dot) is located above I, M or S, it means that the trend is insignificant, may be significant or is significant, respectively. We apply three different significance testing techniques. When all of the techniques establish significance (or insignificance), we determine that the trend is significant (or insignificant). When the techniques give conflicting results, the trend is considered "maybe significant". All tests are two-tailed tests. The left panels represent the significance of the trends from the true Arctic-mean time series while the right panels address the trends with only the reanalysis data corresponding to the available IGRA or GISTEMP data.

Fig. 11. June–August average Ts trend from 1998 to 2011 in units of change (in K) over the 14 yr for GISTEMP and for the four reanalyses. In the case of CFSR, the trend is from 1998 to 2009 in units of change over the 12 yr. White areas in the GISTEMP plot indicate missing data or insufficient data for the trend analysis. Note that Ts refers to surface air temperature (SAT) for the reanalyses, and to a combination of SAT over land and SST over ocean for GISTEMP. The 70° N circle is thickened in each panel.

Table 1. The rms error of seasonal and 70–90° N average temperature in the reanalyses in units of K. There are four values separated by "|", and these four values represent ERA|CFSR|MERRA|NCEP II. The rms error here is between each reanalysis and the IGRA data. To compute the error, we use the reanalysis data that correspond to the available IGRA observations. 00Z and 12Z daily data are used instead of monthly means.

Table 2. Same as Table 1, except for mean bias of seasonal 70–90° N average temperature of each reanalysis product, relative to the IGRA data, in units of K.

Table 4. 70–90° N average temperature trend from 1998 to 2011 in units of temperature change (in K) over the 14 yr. In the case of CFSR, the trend is from 1998 to 2009 in units of temperature change over the 12 yr.