Bias in CMIP6 models compared to observed regional dimming and brightening trends (1961–2014)

Anthropogenic aerosol emissions have increased considerably over the last century, but both their climate effects and the quantification of the emissions become highly uncertain further back in time. This uncertainty is partly due to a lack of observations in the pre-satellite era, and previous studies show that Earth system models (ESMs) do not adequately represent surface energy fluxes over the historical era. We investigated global and regional aerosol effects over the time period 1961–2014 by looking at surface downwelling shortwave radiation (SDSR). We used observations from ground stations as well as multiple experiments from five ESMs participating in the Coupled Model Intercomparison Project Phase 6 (CMIP6). Our results show that this subset of models reproduces the observed transient SDSR well in Europe, but poorly in China. The models do not reproduce the observed trend reversal in SDSR in China in the late 1980s, which is attributed to a change in the emission of sulfur dioxide in this region. The emissions of SO2 show no sign of a trend reversal that could explain the observed SDSR evolution over China, and neither do other aerosols relevant to SDSR. The results from various aerosol emission perturbation experiments from DAMIP, RFMIP and AerChemMIP suggest that it is likely that aerosol effects are responsible for the dimming signal, although not its full amplitude. Simulated cloud cover changes in the different models are not correlated with observed changes over China. Therefore we suggest that the discrepancy between modeled and observed SDSR evolution is partly caused by erroneous aerosol and aerosol precursor emission inventories. This is an important finding as it may help in interpreting whether ESMs reproduce the historical climate evolution for the right or wrong reasons.


Observations
The Global Energy Balance Archive (GEBA) holds data from ground-based stations measuring energy fluxes at the Earth's surface around the globe (Wild et al., 2017). Pyranometers, which have an accuracy limitation of 3-5 % of the full signal, were used at most of the measurement sites (Michalsky et al., 1999; Wild et al., 2013). We use the monthly mean data from 1487 stations measuring downwelling shortwave radiation in the time period 1961-2014. The GEBA data set has been complemented by a machine learning technique (random forests (Breiman, 2001)) as in Storelvmo et al. (2018) to cover temporal gaps in the measurements and facilitate comparison to the gridded model data. Monthly mean cloud cover data is taken from the Climate Research Unit Time Series 4.02 (CRU), which covers the period 1901-2017 (Harris et al., 2014). CRU consists of a climatology made from measurements at meteorological stations around the globe, interpolated to a 0.5° latitude/longitude resolution grid covering continental areas.

Models and CMIP6
Five ESMs (NorESM2, CanESM5, MIROC6, CESM2 and CNRM-ESM2-1) were chosen for this study, based on available data and their involvement in relevant model intercomparison projects within the Coupled Model Intercomparison Project Phase 6 (CMIP6) (Eyring et al., 2016). As this study focuses on dimming and brightening, we have chosen experiments from model intercomparison projects (MIPs) that include perturbed historical simulations with which one can single out the effect of anthropogenic aerosol emissions on our diagnostic variables. An overview of models and experiments covering the proposed CMIP6 reference and perturbation studies can be found in Table 1. This section will give a more detailed description of the experiments in Table 1 and explain why they were chosen.
Every model that takes part in CMIP6 has to deliver a set of common experiments, among these the historical simulation.
As can be seen in Table 1, this is the one experiment for which all the models have provided simulation results. All other experiments listed in Table 1 are simulations covering the historical period but with specific alterations that depend on which intercomparison project they are part of.
The Detection and Attribution Model Intercomparison Project (DAMIP) has the goal of improving estimates of the climate response to individual forcings (Gillett et al., 2016) and includes three relevant experiments. The experiment that traces the impact of anthropogenic aerosols alone as forcing agents over the historical period is called hist-aer. The hist-nat experiment includes only the perturbation due to the evolution of the natural forcing, e.g. from stratospheric volcanic aerosols and solar irradiance variations. Finally, the hist-GHG experiment has only forcings from changes in the well-mixed greenhouse gases. These experiments were chosen as they give a unique insight into how a fully coupled Earth system model attributes responses over the historical period to the different climate forcers.
While DAMIP provides a good framework for one of the main questions in CMIP6, namely how the Earth system responds to forcing, the RFMIP intercomparison focuses on understanding the forcing itself. The Radiative Forcing Model Intercomparison Project (RFMIP) contains a large set of experiments to further understand the radiative forcing of the past and the present (Pincus et al., 2016). We use two experiments from RFMIP, both with sea surface temperatures fixed to pre-industrial val-
The third MIP included in this study is the Aerosol Chemistry Model Intercomparison Project (AerChemMIP), which is designed to answer questions regarding the effect aerosols and other near-term climate forcers (NTCF) can have on climate.
NTCFs include methane, tropospheric ozone, aerosols and their precursors (Collins et al., 2017). Three experiments have been selected from AerChemMIP, two of which have pre-industrial aerosol emissions (hist-piAer) and pre-industrial NTCFs (hist-piNTCF), respectively, while the last experiment has prescribed sea surface temperatures from the historical simulation (histSST), with all forcing agents included. These experiments were chosen to see whether historical changes in tropospheric ozone, or a mixing layer in the ocean, may have had an effect on dimming.

The GEBA stations have been divided into regions based on the country and continent each GEBA station is registered to. The number of stations in a region is presented together with the first results in Figure 1. All model output and CRU results have been co-located to GEBA station locations using the nearest neighbour method. A global mean is defined here as the mean of a variable across all GEBA station locations. A regional mean is the mean of a variable across the GEBA station locations registered to that region in the GEBA data. Every station has been weighted equally. When a result is shown as an anomaly, as opposed to an absolute value, the mean of the first five years of the investigated time period has been subtracted from the time series in question. These "baseline" values can be found in supplementary Table ??.
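The co-location, regional averaging and anomaly steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the use of great-circle distance to pick the nearest grid cell, and the toy coordinates are all assumptions made for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_cell(station, cells):
    """Index of the grid-cell centre closest to a (lat, lon) station (hypothetical helper)."""
    lat, lon = station
    return min(range(len(cells)), key=lambda i: haversine_km(lat, lon, *cells[i]))

def regional_mean(station_series):
    """Equal-weight mean across stations, time step by time step."""
    n = len(station_series)
    return [sum(vals) / n for vals in zip(*station_series)]

def anomaly(series, n_base=5):
    """Subtract the mean of the first n_base values (the baseline) from every value."""
    base = sum(series[:n_base]) / n_base
    return [v - base for v in series]

# Toy example: three grid-cell centres and one station (made-up coordinates).
cells = [(30.0, 110.0), (30.0, 112.5), (32.5, 110.0)]
print(nearest_cell((31.0, 111.9), cells))
print(anomaly([180.0, 182.0, 178.0, 181.0, 179.0, 170.0]))
```

With annual values, `n_base=5` corresponds to the five-year baseline mentioned in the text; monthly data would need the baseline length adjusted accordingly.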
The model data has been retrieved from the Earth System Grid Federation (ESGF) (Cinquini et al., 2014). ESGF is a data management system consisting of multiple geographically distributed nodes that coordinate through a peer-to-peer (P2P) protocol (Fan et al., 2015). We have used one ensemble member per experiment, as not every experiment had the option of providing more than one simulation. Since we are working with values that are highly variable, a centered running mean of 10 years has been used as a smoothing technique.
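As an illustration of the smoothing step, a centered running mean over annual values could look like the sketch below. The handling of an even window (five values before the centre, four after) and the choice to leave the edges undefined are assumptions; the text does not specify either.

```python
def centered_running_mean(values, window=10):
    """Centered running mean; positions without a full window are returned as None.

    For an even window there is no unique centre, so this sketch assumes
    window // 2 values on the left and the remainder on the right.
    """
    left = window // 2
    right = window - left - 1
    out = []
    for i in range(len(values)):
        lo, hi = i - left, i + right
        if lo < 0 or hi >= len(values):
            out.append(None)  # not enough data on one side
        else:
            out.append(sum(values[lo:hi + 1]) / window)
    return out

# Twelve annual values: only three positions have a complete 10-year window.
smoothed = centered_running_mean([float(v) for v in range(12)])
print(smoothed)
```

Leaving the edges undefined avoids mixing window lengths within one curve, at the cost of shortening the smoothed series by nine values in total.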

Dimming and brightening
The change in SDSR in the historical simulations from the five models is presented together with GEBA data in Figure 1. To further identify where this discrepancy originates, we consider the geographical regions separately. Asia and Europe are relevant regions with regard to anthropogenic aerosol emissions (as explained in Section 1) and thereby also relevant to global dimming and brightening. The historical SDSR evolution in Europe and Asia is presented in Figure 1.

Dimming and brightening over China in various CMIP6 experiments
The CMIP6 framework consists of many simulations that can help investigate dimming and brightening (as explained in Section 2.2). In order to understand which forcing agents are responsible for the overall trends in SDSR in the models, we now investigate China for the experiments listed in Table 1. Figure 2 (a) shows the historical simulations together with observations of SDSR as previously seen in Figure 1 (f). Figure 2  pre-industrial aerosols (hist-piAer) and pre-industrial near-term climate forcers, including aerosols and ozone (hist-piNTCF) show very small or negligible changes in the SDSR over the time period considered.

Overall there is a clear difference in SDSR between experiments that include anthropogenic aerosol emissions and experiments that do not. Dimming is apparent in every simulation containing anthropogenic aerosol emissions, but absent in the simulations containing pre-industrial aerosols only. This points to anthropogenic aerosol emissions playing a key role in global dimming.
Whether the sea surface temperature is pre-industrial, prescribed historical, or decided by a coupled ocean model seems to be unimportant for the SDSR in most models.
No trend reversal is identified in any of the simulations in which dimming is identified, and therefore none of the model simulations show a temporal evolution of SDSR close to the one seen in observations over China.
All-sky SDSR changes can be further decomposed into a clear-sky contribution as well as a contribution from changes in cloud cover and/or other cloud properties. In the next section we present the decomposed contributions to all-sky SDSR in China to further understand the discrepancy seen in Figure 2.
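One simple way to picture this decomposition is to treat the cloud-related contribution as the residual between the all-sky and clear-sky anomalies. The sketch below assumes exactly that residual definition, which is an illustrative choice and not necessarily the procedure used in this study; the numbers are made up.

```python
def cloud_contribution(all_sky_anom, clear_sky_anom):
    """Residual attributed to clouds: all-sky anomaly minus clear-sky anomaly.

    Both inputs are anomaly time series (W m-2) on the same time axis.
    """
    if len(all_sky_anom) != len(clear_sky_anom):
        raise ValueError("series must have the same length")
    return [a - c for a, c in zip(all_sky_anom, clear_sky_anom)]

# Made-up anomalies: clear-sky dimming is stronger than all-sky dimming,
# so the residual is positive, i.e. clouds partly mask the clear-sky signal.
all_sky = [0.0, -2.0, -3.0]
clear_sky = [0.0, -4.0, -7.0]
print(cloud_contribution(all_sky, clear_sky))  # [0.0, 2.0, 4.0]
```

A positive residual in this convention means the cloud-related term offsets part of the clear-sky dimming.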

Clear sky SDSR and cloud cover in China
Clear-sky SDSR over China for the historical CMIP6 simulation is shown together with all-sky SDSR over China from GEBA in Figure 3 (a). If the simulated dimming is primarily caused by aerosol-radiation interactions, the dimming should be stronger in clear-sky SDSR than in all-sky SDSR for all models. This is what we see in Figure 3 (a). All models and observations show a change in behaviour from the late 1990s until 2010, where models show a steepening of their dimming trend while the observations go from a brightening trend to an SDSR stabilisation. This can be related to the cloud cover change presented in Figure 3 (b), where all models except for MIROC6 show a decrease in cloud cover over the same period. A decrease in cloud cover would entail a brightening, and will therefore act as a mask for the steep decrease in clear-sky SDSR. The simulated cloud cover changes are presented together with cloud cover observations from CRU in Figure 3 (b). The transient change in cloud cover presented by CRU is, if anything, opposite to what it would have to be to explain the observed all-sky SDSR.
It is important to note that the robustness of observed cloud cover changes must be verified by satellite observations, which goes beyond the scope of this study.
The pronounced trend reversal in observed all-sky SDSR in the late 1980s in China is not identified in all-sky SDSR, clear-sky SDSR, or cloud cover in any of the model simulations.
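To make the notion of a trend reversal concrete, one can compare least-squares slopes before and after a candidate break year. The sketch below is an illustrative check under that assumption, with a hypothetical break year of 1988 and synthetic data; it is not the detection method used in this study.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def slopes_around(years, values, break_year):
    """Slopes of the segments up to and from break_year (break year in both)."""
    before = [(t, v) for t, v in zip(years, values) if t <= break_year]
    after = [(t, v) for t, v in zip(years, values) if t >= break_year]
    s_before = ols_slope([t for t, _ in before], [v for _, v in before])
    s_after = ols_slope([t for t, _ in after], [v for _, v in after])
    return s_before, s_after

def dimming_then_brightening(years, values, break_year):
    """True if the trend is negative before and positive after break_year."""
    s_before, s_after = slopes_around(years, values, break_year)
    return s_before < 0 < s_after

years = list(range(1980, 1997))
v_shaped = [float(abs(t - 1988)) for t in years]   # synthetic reversal in 1988
monotone = [float(1980 - t) for t in years]        # synthetic continuous dimming
print(dimming_then_brightening(years, v_shaped, 1988))
print(dimming_then_brightening(years, monotone, 1988))
```

In practice the slopes would be computed on smoothed anomalies and accompanied by a significance test, which this sketch omits.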
In section 3.2, we showed that a dimming was only apparent in simulations that included anthropogenic aerosol emissions.
In this section we found the dimming in clear-sky SDSR to be stronger than in all-sky SDSR, indicating that the simulated dimming is primarily caused by aerosol-radiation interactions. The next section shows how the simulated aerosol burdens are connected to SDSR.

Atmospheric burden of SO4
In the atmosphere, it is the actual presence of an aerosol that scatters shortwave radiation, and the emission of its precursor is only an indirect indicator of this presence. Therefore, we present the simulated change in SO4 burden over Europe, where dimming and brightening were well represented in the simulations, and over China, where they were poorly represented (Figure 4 (a) and (b), respectively). As expected if sulfate aerosols have in fact played an important role in European dimming and brightening, the simulated SO4 burden shows a pattern strikingly similar to the observed SDSR over Europe (but with opposite sign) for all models. The maximum burden is found in the early to mid 1980s depending on the model, and the minimum SDSR around the same time. The models differ in the magnitude of the change in SO4 burden over Europe, but all show similar tendencies. NorESM2 is the model with the largest changes, and CESM2 the model with the smallest changes in SO4 burden. The same is seen over China, where NorESM2 has twice the SO4 burden of the next model at the end of the time period. In contrast to Europe, the observed SDSR does not mirror the simulated SO4 burden over the GEBA stations in China. For the SO4 burden to be the main cause of the observed changes in SDSR, the Asian SO4 burden would have to peak around the late 1980s, which is not seen in the models in Figure 4.
The temporal development of SDSR is represented poorly in Asia, and specifically in China. Following the above logic, this discrepancy could be rooted in errors in emissions or removal processes. The modeled emissions of SO2 over China showed no trace of the trend reversal in observed SDSR between 1980 and 1990. Assuming the sulfate burden is responsible for the observed trend reversal, we argue that errors in emission inventories in China could be part of the problem.
The sulfur dioxide emission inventory used as input for historical model simulations in CMIP6 is shown in Figure 3 and corresponds to Hoesly et al. (2018).
This figure also shows emission inventories of black carbon and organic carbon in China, and a closer look shows that neither of these aerosol emissions shows tendencies matching a trend reversal in observed SDSR between 1980 and 1990. Hoesly et al. (2018) have pointed to the need for future studies of emission uncertainties. Aas et al. (2019) have studied global and regional trends in atmospheric sulfur and found that uncertainties in emissions were largest in Asia, even though their study only went back to 1990.
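The mirror-image relationship between SO4 burden and SDSR discussed in this section can be quantified with a correlation coefficient: a value near -1 indicates that SDSR falls as the burden rises and recovers as it declines. The sketch below uses a plain Pearson correlation on made-up numbers; it is an illustrative diagnostic and not an analysis reported in this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up anomalies: burden rises then falls, SDSR does the opposite.
burden_anom = [1.0, 2.0, 3.0, 2.0, 1.0]
sdsr_anom = [-1.0, -2.0, -3.0, -2.0, -1.0]
print(pearson_r(burden_anom, sdsr_anom))
```

Because smoothed time series are strongly autocorrelated, such a coefficient is best read as a descriptive measure and not as a significance test.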

Conclusions
An earlier study has shown that previous generations of Earth System Models have not been able to reproduce the transient development of surface downwelling shortwave radiation (SDSR) in the decades since 1960, when observations became available. This discrepancy is hypothesized to be related to increasing and then partially decreasing trends in global aerosol emissions and subsequent aerosol radiative effects, but the exact cause is unknown.
In this paper, we compare observations to model-simulated surface downwelling shortwave radiation and cloud cover in specific regions for the time period 1961 to 2014. We found that in the historical CMIP6 experiment, models reproduce the transient development of SDSR well in Europe, but poorly in Asia. Observations in Asia exhibit a trend reversal in SDSR in the late 1980s that is primarily driven by SDSR changes in China. The multiple historical and historical perturbation experiments performed under CMIP6 reveal that, in China, only those simulations containing anthropogenic aerosol emissions show dimming. None of the simulations exhibit the observed trend reversal over China in the late 1980s (brightening). We suggest that the continuous decrease in SDSR is related to the continuous increase in atmospheric sulfate burden in the historical simulations over China.

Following this logic, the observed transient development of SDSR points to the sulfate burden in the models being wrong in this region. The sulfate burden is a result of sulfur dioxide emissions, gas-to-particle conversion and wet deposition. Sulfur dioxide emissions over China show no sign of the observed trend reversal in SDSR, and neither do black carbon or organic carbon emissions. We suggest that the cause of the discrepancy between model and observations in transient SDSR in China lies partly in erroneous emission inventories.

As the observed climate change is the result of warming from greenhouse gases and simultaneous cooling from aerosol radiative effects, getting aerosol emissions correct is an important part of Earth system models' ability to simulate the past for the right reasons.
Further studies could include other observations and proxies for aerosol effects in the historical era, such as long-term satellite-retrieved aerosol optical depth, deposition of anthropogenic sulphur, organic carbon and nitrate in ice cores, as well as daily temperature range records.