Tropospheric ozone in CCMI models and Gaussian process emulation to understand biases in the SOCOLv3 chemistry–climate model

. Previous multi-model intercomparisons have shown that chemistry–climate models exhibit signiﬁ-cant biases in tropospheric ozone compared with observations. We investigate annual-mean tropospheric column ozone in 15 models participating in the SPARC– IGAC (Stratosphere–troposphere Processes And their in Climate–International Global Atmospheric Chemistry) Chemistry-Climate Model Initiative (CCMI). These models exhibit a positive bias, on average, of up to 40 %–50 % in the Northern Hemisphere compared with observations derived from the Ozone Monitoring Instrument and Microwave Limb Sounder (OMI/MLS), and a negative bias of up to ∼ 30 % in the Southern Hemisphere. SOCOLv3.0 (version 3 of the Solar-Climate Ozone Links CCM), which participated in CCMI, simulates global-mean tropospheric ozone columns of 40.2 DU – approximately 33 % larger than the CCMI multi-model mean. Here we introduce an updated version of SOCOLv3.0, “SOCOLv3.1”, which includes an improved treatment of ozone sink processes, and results in a reduction in the tropospheric column ozone bias of up to 8 DU, mostly due to the inclusion of N 2 O 5 hydrolysis on tropospheric aerosols. As a result of these developments, tropospheric column ozone amounts simulated by SOCOLv3.1 are comparable with several other CCMI models. We apply Gaussian process emulation and sensitivity analysis to understand the remaining ozone bias in SOCOLv3.1. This shows that ozone precursors (nitrogen oxides (NO x ), carbon monoxide, methane and other volatile organic compounds, VOCs) are responsible for more than 90 % of the variance in tropospheric ozone. However, it may not be the emissions inventories themselves that result in the bias, but how the emissions are handled in SOCOLv3.1, and we discuss this in the wider context of


Introduction
Ozone is a key trace gas in the atmosphere. In the stratosphere, it absorbs UV-B (280 < λ < 320 nm) radiation and thus protects life at the surface. However in the troposphere, where approximately 10 % of the total atmospheric ozone burden resides, ozone is a greenhouse gas and air pollutant, with adverse affects on human health and crop yields Silva et al., 2013Silva et al., , 2017. Approximately 90 % of tropospheric ozone results from a series of photochemical reactions which are initiated by the reaction of NO x (nitrogen oxides, NO x = NO+NO 2 ) and either CO (carbon monoxide), CH 4 (methane) or an NMVOC (non-methane volatile organic compound) (Denman et al., 2007). These ozone precursors are emitted from, amongst other sources, fossil fuel burning, industrial processes and agriculture. Ozone can also be transported from the stratosphere in stratosphere-troposphere exchange (STE) events. Greenslade et al. (2017) calculate the mean fraction of total tropospheric ozone attributable to STE at three sites between 38 and 69 • S as 1 %-3 %, and show that during individual STE events, over 10 % of tropospheric ozone may be directly transported from the stratosphere. Due to its global tropospheric lifetime of ∼ 22 days, ozone is subject to intercontinental transport (Auvray and Bey, 2005), and this is modulated by decadal climate variability (Lin et al., 2014). Ozone is lost from the troposphere either by dry deposition or photochemical destruction.
Most chemistry-climate models (CCMs), which are used to understand chemistry-climate interactions and project future atmospheric composition, overestimate tropospheric ozone in the Northern Hemisphere compared with observations (Young et al., , 2018Parrish et al., 2014). In particular, version 3.0 of the SOCOL (Solar-Climate Ozone Links) CCM (Sect. 2.2) contains notable positive tropospheric ozone biases. Revell et al. (2015) identified that ozone concentrations in SOCOLv3.0 are up to 50 % too high in the Northern Hemisphere mid-troposphere (500 hPa) compared with observations from the Tropospheric Emission Spectrometer (TES). The reasons underlying SOCOLv3.0's tropospheric ozone bias were not completely clear to Revell et al. (2015), who noted that, while SOCOLv3.0 could accurately simulate the general geographic distribution of tropospheric ozone, the actual magnitude was wrong and likely to be "a source issue (that is, emissions), a sink issue (HNO 3 washout), or a combination of the two." Staehelin et al. (2017) showed that the mean tropospheric ozone burden in SOCOLv3.0 is 413 Tg, which is approximately 80 Tg larger than the multi-model mean burdens reported for the ACCENT (Atmospheric Composition Change: the European Network of Excellence; Stevenson et al., 2006) and ACCMIP (Atmospheric Chemistry and Climate Model Intercomparison Project;  activities, of 337 and 336 Tg, respectively. Furthermore, SOCOLv3.0 overestimates both the tropospheric ozone production and destruction rates compared to the multi-model means from ACCENT and ACCMIP (Staehelin et al., 2017). While SO-COLv3.0's production rates are overestimated by 34 % compared to ACCENT and 41 % compared to ACCMIP, the destruction rates are overestimated only by 20 % (ACCENT) and 31 % (ACCMIP).
Recently a newer version of SOCOL has been developed, "SOCOLv3.1", which remediates obvious deficiencies in SOCOLv3.0's representation of tropospheric processes (Sect. 2.3). We compare tropospheric column ozone in SO-COLv3.0 and 3.1 with observations derived from OMI/MLS, the Ozone Monitoring Instrument/Microwave Limb Sounder (Sect. 3.1), and use Gaussian process (GP) emulation and sensitivity analysis to investigate the remaining ozone bias in SOCOLv3.1 (Sect. 3.2). Because thousands of simulations are required to perform a sensitivity analysis, and this would be computationally inefficient with a CCM, we supplement SOCOLv3.1 with a GP emulator. This allows a sensitivity analysis to be performed at low computational cost. Variance-based sensitivity analysis evaluates a suite of model input parameters and their relationship to the variable of interest simultaneously.
Here, we apply GP emulation and variance-based sensitivity analysis to the SOCOLv3.1 tropospheric ozone budget to understand causes of the bias. In contrast to one-at-a-time testing, which investigates the model response to varying one input parameter while holding all others constant, GP emulation allows all parameters to be evaluated simultaneously and covers more of the parametric uncertainty space than one-ata-time testing. GP emulation is computationally efficient and allows the interacting effects of the uncertainties on different input parameters to be accounted for. It also generates much more information than one-at-a-time testing -typically the same level of information as a Monte Carlo approach, but requiring a fraction of the model simulations (O'Hagan, 2006). GP emulation has only been used by the global atmospheric modelling community in the last few years, in applications such as cloud and aerosol microphysics modelling (Lee et al., , 2012Carslaw et al., 2013;Johnson et al., 2015) and chemical transport modelling (Ryan et al., 2018). This is the first time the technique has been applied to global tropospheric ozone. Our GP emulator experiments have been designed to focus on recent developments regarding SOCOL's tropospheric chemistry scheme; however the methodology has the potential to be expanded to also include meteorological parameters.
SOCOLv3.0 participated in phase 1 of the Chemistry-Climate Model Initiative (CCMI) (Eyring et al., 2013b;Morgenstern et al., 2017), which is a joint activity of SPARC (Stratosphere-troposphere Processes And their Role in Climate) and IGAC (International Global Atmospheric Chemistry), and is the successor activity to phase 2 of the Chemistry-Climate Model Validation activity, CCMVal-2 (SPARC CCMVal, 2010). Unlike CCMVal-2, which focussed on stratospheric processes and composition, CCMI includes many models with comprehensive representations of the troposphere, and aims to additionally address aspects of tropospheric chemistry and circulation. Here, we examine tropospheric column ozone in SOCOLv3.0 and 14 other CCMI models. This is the first time that global distributions of tropospheric ozone have been examined in the CCMI models, and results are presented in Sect. 3.3.

CCM simulations to compare with observations
We use the ensemble mean of three free-running SOCOLv3.0 simulations of the recent past to compare with observations (ETH-PMOD, 2015). These simulations were performed for CCMI, and conform to REF-C1 specifications (Eyring et al., 2013b). The simulations cover the period 1960-2010, following a 10-year spin-up period. Greenhouse gas concentrations (CH 4 , CO 2 and N 2 O) follow observations until 2005, then Representative Concentration Pathway (RCP) 8.5 (Riahi et al., 2011). Ozone precursor emissions (including NO x , CO and NMVOCs) follow a historical emissions inventory until 2000 , then RCP 6.0 (Masui et al., 2011). Sea surface temperatures (SSTs) and sea ice concentrations were prescribed following HadISST observations (Rayner et al., 2003). Concentrations of ozone-depleting substances followed the World Meteorological Organization's A1 scenario (WMO, 2011), and stratospheric aerosol surface area densities and optical parameters were prescribed from the SAGE-4λ data set (Arfeuille et al., 2013;Luo, 2013).
We also examine annual-mean tropospheric ozone in REF-C1 simulations performed by models participating in CCMI, described by Morgenstern et al. (2017) and references therein. Using the simulated ozone volume mixing ratio and WMO-defined tropopause height from each model, tropospheric ozone columns were calculated for the year 2005 by integrating ozone between the surface and WMOdefined tropopause. The WMO definition of the tropopause was selected to be consistent with the OMI/MLS tropospheric ozone product (Ziemke et al., 2006). Between 2010 and 2014, the average tropospheric ozone burden derived from OMI/MLS was 300 Tg, which is very close to the multiinstrument mean of five satellite products over the same period, of 301 Tg .
Where multiple ensemble members ("realizations") of the REF-C1 simulation were submitted to the CCMI archive, the ensemble mean is shown. The exception is NIWA-UKCA, which submitted three realizations of the REF-C1 simulation; however only the first realization is shown as ozone precursor emissions were erroneously fixed at 1960 levels for the other two realizations (Morgenstern et al., 2017). The EMAC simulations used road traffic emissions which were updated every year rather than every month. Therefore when we examine year 2005 tropospheric column ozone in Sect. 3.3, the EMAC simulations used road traffic emissions for August 1954. Jöckel et al. (2016) show that this error results in tropospheric ozone columns that are ∼ 2 DU lower than if the correct emissions had been used. The UMUKCA-UCAM simulations used CCMVal-2 REF-B2 emissions for NO x aircraft emissions and NO x , CO and HCHO surface emissions.
L. E. Revell et al.: Tropospheric ozone in CCMI models 2.2 The SOCOLv3.0 chemistry-climate model The SOCOL CCM was developed in Switzerland at ETH Zurich and PMOD/WRC (the Physical Meteorological Observatory Davos/World Radiation Center). Version 3.0 of SOCOL Revell et al., 2015) consists of the middle atmosphere version of the ECHAM5 (European Centre Hamburg Model) atmosphere-only general circulation model (Roeckner et al., 2003) coupled to the MEZON (Model for Ozone Trends) chemistry transport model (Egorova et al., 2003). The chemical solver takes into account 41 chemical species, 140 gas-phase reactions, 46 photolysis reactions and 16 heterogeneous reactions. The oxidation of isoprene, an important NMVOC for the tropospheric ozone budget, is accounted for with the Mainz Isoprene Mechanism (MIM-1), which comprises 16 organic degradation products of isoprene and a further 44 chemical reactions (Pöschl et al., 2000). Global isoprene emissions are estimated to range from 440 to 660 Tg(C) yr −1 , which is comparable to the annual amount of CH 4 emissions (Guenther et al., 2006). About two-thirds of the annual global emissions of volatile organic compounds (VOCs) are accounted for in SOCOLv3.0 by isoprene and methane. Apart from isoprene and formaldehyde, other NMVOCs are not included explicitly in the model but their contribution to CO is accounted for via the addition of a certain fraction of NMVOC emissions to CO. For anthropogenic, biomass burning and biogenic NMVOC emissions the conversion factors to CO are 1.0, 0.31 and 0.83, respectively (Ehhalt et al., 2001).
Clear-sky photolysis rates are calculated using a lookuptable (LUT) approach, which provides photolysis rates as a function of overhead ozone and oxygen columns (Rozanov et al., 1999). Variability of solar irradiance is included in the LUTs. Cloud impacts on photolysis are accounted for in the troposphere by the inclusion of a cloud modification factor following the parametrization described by Chang et al. (1987). From a recent intercomparison of photolysis rates simulated by different CCMI models we learned that SOCOLv3.0 overestimates tropospheric NO 2 photolysis by roughly a factor of 2 compared to other models (Nicely et al., 2018). This overestimation is likely related to the treatment of backscattering from clouds in the calculations of the photolysis LUTs and the missing impact of aerosols. Both effects cannot be easily corrected by the implemented cloud modification factor, and so an online photolysis scheme is planned for future model versions.
Dry deposition velocities of O 3 , CO, NO, NO 2 , HNO 3 and H 2 O 2 are based on Hauglustaine et al. (1994). This simplified approach assumes constant dry deposition velocities over land and ocean, without accounting for seasonal or geographical variability. The tropospheric washout of HNO 3 and H 2 O 2 is calculated by using a constant removal rate of 4 × 10 −6 s −1 , irrespective of precipitation occurrence. At every chemical time step, i.e. every 2 h, 2.8 % of tropospheric HNO 3 and H 2 O 2 below 160 hPa are removed.
Boundary conditions for the ozone precursor gases NO x , CO and NMVOCs are implemented as surface emission fluxes. Methane's global average surface mixing ratio is prescribed on the six lowermost model levels. For this study, both SO-COL configurations were run with 39 vertical levels between the Earth's surface and 0.01 hPa (∼ 80 km) and T42 horizontal resolution (grid cells approximately 2.8 • × 2.8 • ).
2.3 Upgraded model version SOCOLv3.1 SOCOLv3.1 was developed to address SOCOLv3.0's representation of processes relevant to tropospheric ozone chemistry, with the aim of improving the model's large tropospheric ozone bias as shown by Revell et al. (2015). First, we implemented heterogeneous hydrolysis of N 2 O 5 on tropospheric aerosol, as this is an important removal process for atmospheric NO x and was not included in SOCOLv3.0. As SOCOLv3.0 does not explicitly simulate tropospheric aerosols, the new scheme makes use of the ECHAM5 internal tropospheric aerosol climatology considering aerosol properties of 11 Global Aerosol Data Set types (Köpke et al., 1997). The reaction probabilities for the different aerosol types are calculated following the parametrization by Evans and Jacob (2005). Second, the simplified treatment of dry deposition was replaced by a more sophisticated scheme in SOCOLv3.1 based on the surface resistance approach for the estimation of dry deposition velocities proposed by Wesely (1989). This takes into account actual meteorological conditions, different surface types and trace gas properties like solubility and reactivity. Further details of this scheme are given by Kerkweg et al. (2006).
Third, we adjusted how methane is prescribed in the model. In previous versions of SOCOL, methane was prescribed as a global surface average mixing ratio on the six lowermost model levels (covering approximately 2.5 km). This was changed to only the surface level in SOCOLv3.1. While the amount of methane entering the atmosphere is the same in both configurations, prescribing it on one level instead of six means that methane-induced ozone production in the mid-troposphere-upper troposphere is reduced. Because SOCOLv3 has a high OH bias compared to the AC-CMIP models (Staehelin et al., 2017), ozone production from methane oxidation is amplified by the continuous resupply of methane due to the mixing ratio boundary condition when methane is prescribed on six levels instead of one. An interhemispheric gradient and seasonal cycle in methane have also been implemented in SOCOLv3.1; however these were not used in this study and instead methane was prescribed as a global average surface mixing ratio to test the general sensitivity of tropospheric ozone to surface methane concentrations.
Finally, because the LUTs used in SOCOLv3.0 cause tropospheric NO 2 photolysis to be overestimated due to the treatment of backscattering from clouds (Sect. 2.2), we recal-culated LUTs for SOCOLv3.1. While the SOCOLv3.0 LUTs were calculated assuming 0.5 cloud coverage and a surface albedo of 0.3, the SOCOLv3.1 LUTs were based on clear-sky conditions and also used a surface albedo of 0.3.

SOCOLv3.1 simulations for GP emulator training and testing
Variance-based global sensitivity analysis quantifies the contribution of a single parameter to the variance of a model's output. Because the large number of model simulations required would make one-at-a-time testing computationally too expensive, a type of statistical model called a GP emulator can be used as a surrogate for the input-output relation of a complex model, such as a CCM (Le Gratiet et al., 2017). For "training" data on which the GP emulator is built, we know that the true value of the emulated output should be the same as the input, so the emulator should return the output with no uncertainty. For inputs that the emulator is not trained at, the outputs should have a probability distribution specified by a mean function and covariance function (O'Hagan, 2006). Here, we use tropospheric ozone columns from SOCOLv3.1 to train the emulator.
Interacting contributions to the overall uncertainty in tropospheric column ozone can be identified by comparing the main effect variance (the reduction in the ozone variance when a particular model forcing is fixed, e.g. NO x emissions), with the total effect variance (the remaining variance in the tropospheric column ozone when everything except a particular model forcing is fixed). Various software packages are available for GP emulation. We used the Gaussian Emulation Machine for Sensitivity Analysis (GEM-SA), available at http://tonyohagan.co.uk/academic/GEM/index.html (last access: 11 November 2018), to build an emulator for tropospheric column ozone.
Although many factors influence the tropospheric ozone budget, we restricted our analysis to nine model forcings/parametrizations (see Table 1 for details of the scalings applied). These are listed below, followed by a section rationalizing the inclusion of each variable. We reiterate that this list above does not constitute a comprehensive list of variables controlling tropospheric ozone; however by illustrating the methodology used, we aim to demonstrate its utility.
1. natural and anthropogenic NO x emissions (denoted in figures as "NO x ").
Variables (1-3) were selected due to their importance as tropospheric ozone precursors. CO and NMVOC emissions were varied simultaneously (3) because the only NMVOCs included explicitly in SOCOL are isoprene and formaldehyde; other NMVOCs are represented via additional CO using a "lumped" approach (Sect. 2.2). For models with a more complex representation of NMVOCs, we recommend testing CO and NMVOC emissions separately when constructing a GP emulator.
The remaining variables were included to investigate the sensitivity of tropospheric ozone to the model improvements implemented in SOCOLv3.1. SOCOLv3.0 and its predecessors prescribed methane on the lowermost six model levels. This was changed to only the surface level in SOCOLv3.1, and variable (5) was included in our analysis to investigate the sensitivity of tropospheric ozone to this implementation. The lowermost level in SOCOL covers approximately 100 m, and the six lowermost levels combined cover approximately 2.5 km. To explore whether other ozone precursors are sensitive to the number of levels they are prescribed on, variable (4) was included, even though it is prescribed only as a surface emissions flux in most, if not all, CCMs. By doing so, we aim to test the exchange of emissions between the boundary layer and free troposphere.
Because ozone production and destruction reactions are mostly photochemical, i.e. they occur in the presence of sunlight, we selected variable (6) to test the sensitivity of the current CMF parametrization, and examine impacts of the updated LUTs on tropospheric ozone in SOCOLv3.1. HNO 3 washout is the main sink for NO x , and therefore affects the ozone budget. Future SOCOL versions will include an online wet deposition scheme, and so variable (7) was selected to probe the sensitivity of tropospheric ozone to the rate of HNO 3 loss. Heterogeneous N 2 O 5 hydrolysis is similarly important as it leads to HNO 3 formation; however it was not included in SOCOLv3.0. Therefore variable (8) was included in our analysis to quantify its relevance for tropospheric ozone abundances. Finally, variable (9) was chosen to test the sensitivity of tropospheric ozone to the newly implemented dry deposition parametrization (Sect. 2.3).
Typically 10n simulations are recommended for training a GP emulator, where n is the number of variables under investigation (Loeppky et al., 2009). Hence we performed 90 SO-COLv3.1 training simulations, and used the resulting annual- Table 1. Range of the sensitivity forcings/parametrizations. P and L indicate whether the variable is of relevance to ozone production and/or loss, respectively.

Minimum Maximum Descriptions
(1) NO x emissions (P) 0 4 The surface NO x emissions field as a function of latitude and longitude was multiplied by a scaling factor between 0 and 4, to explore the sensitivity of tropospheric ozone to a range of NO x emissions.
(2) CH 4 concentrations (P) 0 4 The global-mean CH 4 mixing ratio was multiplied by a scaling factor between 0 and 4, to explore the sensitivity of tropospheric ozone to a range of CH 4 concentrations.
(3) CO+NMVOC (P) 0 4 As for (1), but the scaling factor was applied to CO and NMVOC emissions emissions simultaneously.
(4) ELEV for NO x 1 6 Emissions were prescribed on the lowermost six levels (between and CO+NMVOCs (P) the surface and ∼ 2.5 km), to test whether the number of levels is important for tropospheric ozone abundances.
As photolysis rates of 0 do not occur during daytime, we selected a lower bound of 0.25 to represent cloudy sky conditions.
(7) HNO 3 washout (L) 0 0.5 To test the sensitivity of tropospheric ozone to HNO 3 removal, we removed between 0 and 50 % of tropospheric gas-phase HNO 3 at each chemical time step.
(8) N 2 O 5 hydrolysis (L) 0.001 0.3 The probability of N 2 O 5 hydrolysis occurring. Since the default is 0.1, we explored the sensitivity of tropospheric ozone to a range from 0.001 to 0.3.
(9) O 3 dry deposition (L) 0 1 A specific reactivity of 0 stands for a nearly non-reactive gas, while 1 stands for a gas similarly reactive to ozone.   Table 1 for more details. For clarity the inputs have been scaled between 0 and 1. mean tropospheric ozone column to construct the GP emulator in several geographical regions (Europe, United States, Asia, the Southern Ocean and the global mean). For each of the 90 training simulations, the nine input variables were scaled simultaneously, with the scaling factors determined using a "maximin" Latin hypercube design, which generates a near-random sample of parameter values from a multidimensional distribution and fills the uncertainty space of the parameters (McKay et al., 1979). The Latin hypercube was generated using GEM-SA. For the discrete input parameters (e.g. (4) and (5) in the list above), the scaling factor was rounded to the nearest whole number. Table 1 summarizes the minimum and maximum scalings applied to each of the nine variables. This is discussed further in Sect. 3.2. Figure 1 shows the experimental design for the 90 training simulations.
SOCOLv3.1 training simulations were performed for the year 2005 (following a common model spin-up period of 10 years, which was discarded from our analysis). The feedback between chemistry and radiation was switched off to keep internal variability as small as possible. Switching off the chemistry-radiation feedback means that all simulations have the same meteorology (given that they started from the same initial conditions and ran with the same dynamical boundary conditions), despite having different chemistry. Therefore, we can be confident that the differences between the simulations are caused by differences in chemistry and not dynamics.
After constructing the GP emulator, the next step is to validate it by comparing emulator-predicted ozone with SOCOL-simulated ozone. This was done by performing a further 27 (i.e. 3n) SOCOLv3.1 "test" simulations. The setup for these simulations was similar to the training simulations, with a new Latin hypercube generated by GEM-SA to supply the scaling factors.   Fig. 2a), it overestimates tropospheric column ozone between 60 • N and 40 • S by up to 30 DU -approximately a factor of 2 (Fig. 2c). The improved treatment of ozone sink processes in SOCOLv3.1 means that tropospheric ozone columns are reduced regionally by up to 8 DU compared with SOCOLv3.0 (Fig. 2d-e). Individual sensitivity tests (not shown) indicate that this is due mostly to the inclusion of heterogeneous N 2 O 5 hydrolysis on tropospheric aerosol. Both SOCOLv3.0 and 3.1 show a small negative bias in tropospheric ozone over the Southern Ocean. This was also visible in the SOCOLv3.0 and TES comparison presented by Revell et al. (2015). Recent work by Luhar et al. (2017) has indicated that the Wesely (1989) dry deposition scheme overestimates the observed ozone deposition velocity by a factor of 2-4 in the Southern Ocean, where SSTs are low and chemical reactions are slow. Further upgrades to the model's deposition scheme may therefore improve comparisons of simulated and observed tropospheric ozone in cold oceanic regions.
The global-mean tropospheric ozone column in SO-COLv3.1 is 36.4 DU (Fig. 2d), which is still at the upper end of the range of the CCMI models (Fig. 6), but comparable to other models such as ACCESS (36.3 DU), EMAC-L47 (37.3 DU) and MRI-ESMr1 (35.7 DU). Despite the improvements to SOCOLv3.1, a large bias in tropospheric ozone of approximately 20 DU compared with OMI/MLS remains (Fig. 2f). The bias maximizes over continental regions in the Northern Hemisphere, and over Southeast Asia.

GP emulation and sensitivity analysis in
SOCOLv3.1 To understand the drivers of the remaining tropospheric ozone bias in SOCOLv3.1, we constructed a GP emulator from the 90 SOCOLv3.1 training simulations (Sect. 2.4). Tropospheric ozone predicted by the emulator is compared with SOCOLv3.1 test simulations in Fig. 3. In all geographical regions shown, the goodness of fit between emulated and simulated tropospheric ozone is high (R 2 ≥ 0.85) and the points fall mostly along the 1 : 1 line, indicating that the emulator performs well in these regions. The point with the largest simulated tropospheric ozone column corresponds to a simulation in which two ozone loss processes, HNO 3 washout and ozone dry deposition, were set to zero and large scalings (4.00 and 3.54) were applied to the ozone precursors NO x and CH 4 , respectively, following the Latin hypercube design (Fig. 1). The emulator underestimates tropospheric ozone for this point in all regions examined, indicating that it may not be well constrained at the extreme ends of the parameter uncertainty space. Figure 4 displays the sensitivity of global-mean tropospheric ozone to each parameter, obtained by averaging over all other parameters, and indicates whether tropospheric ozone increases or decreases in response to an individual forcing/parametrization. Greater uncertainty is indicated   where the lines diverge (appearing as a thicker line -i.e. the emulator is less well constrained). Tropospheric ozone exhibits a strong sensitivity to its precursor gases (Fig. 4a-c), and while the correlation between CH 4 and CO+NMVOCs is approximately linear, for NO x there appears to be a satura-tion effect for scaling factors greater than 1, likely due to the "NO x titration effect" (Thornton et al., 2002). In our calculations a uniform sampling distribution was applied when generating the Latin hypercube, which means that in 25 % of our training simulations the NO x (and CH 4 , CO and NMVOC)   scaling factors are less than 1, while in the other 75 % of simulations they are larger than 1.
To test whether the emulator may be biased due to the sampling distribution used, we calculated tropospheric column ozone as a function of NO x and CO+NMVOCs using the gradients in Fig. 4a and c. Assuming a uniform sampling distribution between 0 and 4, as per the Latin hypercube design used here, the sensitivity indices for NO x and CO+NMVOCs are 0.68 and 0.32, respectively. If we assume a piecewise uniform distribution, so that 50 % of the points are between 0 and 1, and 50 % are between 1 and 4, the sensitivity indices are 0.72 for NO x and 0.28 for CO+NMVOCs. That is, the differences are negligible, implying that the type of sampling distribution used does not bias the result. However, given the NO x saturation effect above 1 (Fig. 4a), if we assume a uniform distribution between 0 and 2 instead of 0 and 4, the NO x sensitivity index increases to 0.86, while the CO index decreases to 0.14. This shows the importance of selecting an appropriate range for the parameter uncertainty space. However, the conclusions of our emulator analysisthat ozone precursors are the dominant driver of tropospheric ozone variability -remain unchanged. Figure 5 shows the percentage of variance that each parameter contributes to in each geographic region, either jointly or alone. In all regions examined, ozone precursors -CH 4 , NO x , CO and NMVOCs -account for more than 90 % of the variance in tropospheric column ozone. In other words, changing these ozone source input parameters has a far larger impact on tropospheric ozone abundances than changing ozone sink parameters does, and this applies to both polluted regions (Europe, the United States and Asia) and relatively pristine environments (the Southern Ocean). NO x emissions are generally the dominant driver of variability (in the European region they are approximately equal to the contribution from CH 4 ; Fig. 5a). Over Asia, where CO emissions are larger than over Europe and the United States, the ratio of NO x : CO is also lower than it is over Europe and the United States (Revell et al., 2015). NO x emissions therefore become more important as a driver of ozone variability over Asia (Fig. 5c). In all regions, joint interactions between NO x , CH 4 and CO+NMVOCs play a relatively minor role compared with the individual influences of these species.
Although updating SOCOLv3.1 with regards to N 2 O 5 hydrolysis, HNO 3 washout, LUTs and ozone dry deposition results in a reduction in tropospheric ozone of up to 8 DU regionally (Fig. 2e), as drivers of tropospheric ozone variability in SOCOLv3.1 they are insignificant compared with ozone precursors. However, we cannot discount the possibility that it is not the ozone precursor emissions themselves that are responsible for SOCOLv3's tropospheric ozone bias, but rather L. E. Revell et al.: Tropospheric ozone in CCMI models  (Table 1), for the same regions shown in Fig. 3. For clarity only those which contribute at least 1 % to the variance are shown. NO x is the NO x emissions; CH 4 is the CH 4 concentrations; CO is the CO+NMVOC emissions; ELEV is the number of vertical model levels that NO x , CO and NMVOC emissions are prescribed on. HNO 3 is the rate of HNO 3 washout. Joint interactions, indicated by, e.g. NO x . CH 4 concentrations are also indicated where these contribute at least 1 % to the variance.
the way in which the emissions are handled by the model; this is considered further in the Discussion and conclusions.

Tropospheric ozone in the CCMI models
We now consider SOCOL's tropospheric ozone bias in the context of the CCMI models. Figure 6 illustrates the diversity in simulated tropospheric ozone amongst the CCMI models. Despite most of the models using ozone precursor emissions following the REF-C1 recommendations (Sect. 2.1), they simulate vastly different representations of tropospheric ozone. A few of the models are closely related, as discussed by Morgenstern et al. (2017); for example the CESM1 models, WACCM and CAM4-chem, are essentially the same model in terms of tropospheric ozone. They differ only in the height of the model lid, which is 140 km for WACCM and 40 km for CAM4-Chem. ACCESS and NIWA-UKCA can also be considered the same model for the REF-C1 experiment; although a coupled ocean was used for most of NIWA-UKCA's CCMI simulations, for the REF-C1 experiment they used the same prescribed sea surface conditions (temperature and ice coverage) as ACCESS. Differences between ACCESS and NIWA-UKCA in the REF-C1 simulation, therefore, are likely related to issues with the different compilers used which may induce small differences in stochastic physics and tropospheric age of air (Dietmüller et al., 2018).
The EMAC L47 and L90 models are also very similar; both have a model lid at 0.01 hPa (∼ 80 km), but they differ in the number of model levels between the surface and 0.01 hPa (47 and 90, respectively). They also use different time steps. Figure 7 shows the difference in tropospheric ozone between each of the CCMI models and OMI/MLS, and the root-mean-square error (RMSE) for the model-OMI/MLS difference. Alongside Fig. 6, Fig. 7 indicates clear outlying models in terms of tropospheric ozone. UMUKCA-UCAM simulates the smallest amount of tropospheric ozone (14.9 DU in the global mean Fig. 6o); however it only contains one NMVOC (formaldehyde) and does not lump NMVOCs together in the way that many other CCMs do. This means that additional NMVOC source gases are not considered by substituting with represented species, such as in SOCOLv3, whereby additional NMVOCs are included in the form of CO. Of the CCMI models, SOCOLv3.0 simulates the largest global-mean tropospheric ozone column, of 40.2 DU (Fig. 6a). In ULAQ-CCM, the zonal bands of large ozone abundances at northern and southern midlatitudes are related to the model's coarse horizontal resolution (5.6 • × 5.6 • ), which affects surface fluxes and tropospheric transport (Orbe et al., 2018).  Interestingly, EMAC-L90 simulates a better representation of tropospheric column ozone than EMAC-L47, despite the fact that EMAC-L90 has three fewer model levels between the surface and 300 hPa than EMAC-L47 and a longer time step. The difference in tropospheric column ozone between the two models likely results from the increased vertical resolution around the tropopause in EMAC-L90, which has 11 levels between 300 and 100 hPa compared with 7 in EMAC-L47, meaning that EMAC-L90 better simulates stratospheretroposphere exchange.  Fig. 8a was calculated for all models, while the MMM in Fig. 8d was calculated only for models with a RMSE less than 10 DU, as indicated in Fig. 7 i.e. all models except SOCOLv3.0, ACCESS CCM, EMAC-L47, ULAQ-CCM and UMUKCA-UCAM. The CCMI models simulate a global-mean tropospheric ozone abundance of 31.1 DU (Fig. 8a) and 30.2 DU (Fig. 8d), depending on the MMM definition applied. Both global-mean MMMs are close to the OMI/MLS global mean of 28.6 DU (Fig. 2b); however the MMMs differ markedly from OMI/MLS in terms of the global tropospheric ozone distribution. Compared to OMI/MLS, the models overestimate tropospheric column ozone almost everywhere between 60 • N and 60 • S (the region where OMI/MLS data are available), regardless of the MMM definition. The exception is at southern mid-latitudes, where the models underestimate tropospheric ozone compared to OMI/MLS. When the MMM is calculated for all models, the positive bias is up to 50 %, and the negative bias reaches up to −33 % (Fig. 8c). When models with an RMSE > 10 DU are discarded from the MMM, the negative bias is largely unchanged at −32 %, but the positive bias is reduced, and reaches up to 40 % (Fig. 8f).
These results broadly agree with models evaluated as part of ACCMIP , and phase 5 of the Coupled Model Intercomparison Project (CMIP5) (Eyring et al., 2013a), which used the same ozone precursor emissions as for CCMI. The ACCMIP models simulated, on average, up to 30 % more tropospheric column ozone compared with OMI/MLS at northern mid-latitudes . The global-annual-mean tropospheric ozone column simulated by these models was 30.8 DU, calculated from 15 models. For the 18 CHEM models participating in CMIP5 (those models with interactive chemistry, i.e. ozone was calculated online and not prescribed from a climatology), the climatological-mean annual-mean MMM averaged over 2000-2005 was 30.5 DU (Eyring et al., 2013a), which is similar to the MMMs calculated here. The CMIP5 and ACCMIP MMMs also show a stronger interhemispheric gradient than OMI/MLS observations do, consistent with our findings.
The standard deviation on the MMM is up to 11.3 DU when calculated for all models (Fig. 8b), and reduces to a maximum of 9.5 DU when calculated for only the models with RMSE < 10 DU (Fig. 8e). The variability between models is largest at northern mid-latitudes and in the continental outflow region off the west coast of Africa.

Discussion and conclusions
Despite using the ozone precursor emissions recommended for CCMI, SOCOLv3.0 simulates the largest global-mean tropospheric ozone abundance of all the CCMI models (Fig. 6), and exhibits a bias of ∼ 30 DU regionally compared with OMI/MLS observations (Fig. 2c). The CCMI MMM is biased high in the Northern Hemisphere and low in the Southern Hemisphere compared with OMI/MLS ( Fig. 8c and f), consistent with previous studies (ACCMIP and CMIP5). Although ACCMIP, CMIP5 and CCMI all used the same emissions inventories, it is nevertheless interesting that they all produced very similar global-mean tropospheric ozone abundances (approximately 30 DU), given the different foci of the different model intercomparison activities; CCMI focussed on models coupling the stratosphere and troposphere, while CMIP5 focussed on coupling the atmosphere and ocean.
We have developed a new model version, SOCOLv3.1, which includes an upgraded treatment of tropospheric ozone sink processes. This results in a reduction in tropospheric ozone of up to 8 DU (Fig. 2e), which is mostly due to the inclusion of N 2 O 5 hydrolysis on tropospheric aerosol. SO-COLv3.1 still exhibits a positive bias in tropospheric column relative to OMI/MLS (particularly in the Northern Hemisphere), but simulates tropospheric column ozone amounts that are much more comparable with the other CCMI models. Reducing SOCOL's tropospheric ozone bias is expected to lead to improvements in the simulated abundance of species which are oxidized by the hydroxyl radical, such as CO and CH 4 , since ozone is the primary source of OH. Revell et al. (2015) showed that CO in SOCOLv3 was up to 40 ppbv too low in the Northern Hemisphere compared with observations from TES, due to the tropospheric ozone bias. In SO-COLv3.1, the Northern Hemisphere CO bias is reduced by approximately a factor of 2 (not shown).
We have quantified the contribution to tropospheric ozone variance in SOCOLv3.1 from nine model forcings/parametrizations using GP emulation and sensitivity analysis. By switching off the coupling between chemistry and radiation in the emulator experiments, we aimed to limit dynamical and meteorological variability. We did not consider stratosphere-troposphere exchange in our emulator experiments. Staehelin et al. (2017) showed that SOCOLv3.0's ozone burden due to stratospheric influx, when calculated from ozone origin tracers as described by Garny et al. (2011) and Revell et al. (2015), is close to the multi-model mean values from the ACCMIP and ACCENT ensembles. Therefore, STE is unlikely to be a major driver of SOCOLv3's tropospheric ozone bias. To the best of our knowledge, this is the first time that GP emulation has been applied to global tropospheric ozone modelling. By selecting a relatively small number of model forcings/parametrizations and focussing largely on tropospheric ozone chemistry we aim to demonstrate the utility of the methodology; however it could also be extended to explore the variability in tropospheric ozone due to meteorological parameters.
Our GP emulation experiments and sensitivity analysis illustrate that the ozone precursors NO x , CH 4 , CO and NMVOCs are responsible for more than 90 % of the variance in tropospheric column ozone in the improved model version, SOCOLv3.1. While CH 4 is prescribed as a surface mixing ratio, the other ozone precursors are specified from emissions inventories. Collating emissions inventories is challenging as they are typically compiled using a bottom-up approach. Anthropogenic emissions must rely on accurate reporting, while for biogenic emissions there are no reporting requirements. Furthermore, emissions are generally prescribed in global models as monthly means, and thus do not reflect diurnal or weekly variability (Young et al., 2018). Hassler et al. (2016) identified that current global emissions inventories do not capture trends in the NO x /CO ratio, and previous multimodel studies have also identified potential deficiencies with the inventories Parrish et al., 2014). Jena et al. (2015) and Zhong et al. (2016) showed that different NO x emissions inventories can significantly alter simulated tropospheric ozone.
However, it may not be the emissions used for CCMI themselves that are incorrect, but rather problems in how they are handled in global models. Given the coarse grid sizes necessary to run a global model and still retain computational efficiency, resolution -horizontal, vertical and temporal -is likely important for simulating tropospheric ozone, especially in polluted regions where very large emissions in an urban environment may be spread over a model grid cell spanning thousands of square kilometres. In global models, polluted air coming from a point source is considered to be well mixed throughout a large grid cell, which would generally lead to more efficient ozone production (Young et al., 2018). Horizontal and vertical resolution are difficult to test in an emulator sensitivity study as presented here; however by examining the CCMI models collectively (Morgenstern et al., 2017), we can derive some insights. For example, we note that GEOSCCM, HadGEM3-ES and the CESM1 models (CAM4Chem and WACCM), which simulate the smallest RMSEs relative to OMI/MLS (Fig. 7d, e, j, k), have fairly high horizontal resolution relative to other CCMs, of 2 • × 2 • , 1.875 • × 1.25 • and 1.9 • × 2.5 • , respectively. Of the models analysed in this study, HadGEM3-ES also has the largest number of levels in the troposphere (48). Similarly, tropospheric ozone in the EMAC model with 90 levels (EMAC-L90) compares better with observations than the 47-level version (EMAC-L47) (Fig. 7h, i), which may be due to a more realistic simulation of the ozone gradient across the tropopause (Sect. 3.3).
Another respect in which SOCOLv3.0 is an outlier amongst the CCMI models is its chemical time step of 2 h. The other models analysed in this study have chemical time steps ranging from 6 min (CCSRNIES-MIROC3.2) to 1 h (the models based on the UK Met Office Unified Model, i.e. HadGEM3-ES, NIWA-UKCA, ACCESS and UMUKCA-UCAM). In a sensitivity test, SOCOLv3.0's chemical time step was reduced to 15 min, which reduced the ozone burden in polluted urban areas by approximately 5 DU (not shown). To test how SOCOL responds to prescribing a surface mixing ratio of NO x rather than an emissions flux, we performed a further sensitivity simulation in which surface NO 2 mixing ratios from the CESM1 WACCM REF-C1 simulation were prescribed instead of NO x emissions. This also resulted in a reduction of tropospheric ozone of up to 5 DU. In reality there is likely no single solution for reducing SO-COLv3.0's excessive tropospheric ozone bias; however assuming that the prescribed emissions are correct, then increasing the model's spatial and temporal resolution within the bounds of computational efficiency will likely reduce the bias.
We have shown the importance of ozone precursor emissions for simulating the tropospheric ozone budget with SO-COLv3.1. This is in line with the findings of Revell et al. (2015), who analysed three SOCOLv3.0 simulations for the period 1960-2100: REF-C2 (based on RCP 6.0), SEN-C2-fEmis (NO x , CO and NMVOC emissions fixed at constant 1960 levels) and SEN-C2-fEmis-fCH 4 (similar to SEN-C2-fEmis but with surface methane concentrations also fixed at constant 1960 levels). They showed that future global ozone abundances are governed largely by changes in methane and NO x , with methane causing an increase in tropospheric ozone that is approximately one-third of that caused by NO x . Future work should investigate how tropospheric ozone evolves in future under the various CCMI sensitivity scenarios in all CCMI models.
Finally, phase 6 of the Coupled Model Intercomparison Project (CMIP6) will use the emissions data set described by Hoesly et al. (2018). In this data set, year 2000 NO x emissions are ∼ 20 % larger than the emissions used for CCMI . Therefore, simulated ozone biases by the current generation of CCMs will likely be amplified in CMIP6.
Given the results of our multi-model intercomparison as well as previous multi-model studies, our results highlight the need for careful validation of emissions inventories used by global models. However, the way in which emissions are handled by the models also appears to result in biased ozone abundances, and further work is needed to address the challenges of simulating sub-grid processes of importance to tropospheric ozone, in SOCOLv3 as well as in other CCMs. GP emulation may prove a useful tool for such studies, and we have demonstrated its usefulness for understanding tropospheric ozone biases. GP emulation is a powerful tool, and should be considered for use by those wanting to perform detailed sensitivity analyses at low computational cost.
Data availability. The CCM data used here (except the CESM1 data) are held at the Centre for Environmental Data Analysis (CEDA; http://data.ceda.ac.uk/badc/wcrp-ccmi/data/CCMI-1/, last access: 11 November 2018). CESM1 WACCM and CESM1 CAM4chem data were downloaded from http://www.earthsystemgrid. org (last access: 11 November 2018). For instructions for access to both archives see http://blogs.reading.ac.uk/ccmi/ badc-data-access (IGAC/SPARC Chemistry-Climate Model Ini-tiative, 2018). GEOSCCM data were provided directly by Luke D.Oman to replace the GEOSCCM data currently held in the CEDA archive. SOCOLv3.1 data are available by contacting Laura E. Revell. Matrices for training and testing the GP emulator are in the Supplement.
Author contributions. LER and AS designed the experiments and interpreted the output, assisted by FT and AF. All other authors provided information pertaining to their model. LER performed the SOCOL sensitivity simulations and emulator analysis, and wrote the paper with assistance from all other authors.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Chemistry-Climate Modelling Initiative (CCMI) (ACP/AMT/ESSD/GMD inter-journal SI)". It is not associated with a conference.
Acknowledgements. We acknowledge the modelling groups for making their simulations available for this analysis, the joint WCRP SPARC-IGAC Chemistry-Climate Model Initiative (CCMI) for organizing and coordinating the model data analysis activity and the British Atmospheric Data Centre (BADC) for collecting and archiving the CCMI model output. The EMAC simulations were performed at the German Climate Computing Centre (DKRZ) through support from the Bundesministerium für Bildung und Forschung (BMBF). DKRZ and its scientific steering committee are gratefully acknowledged for providing the HPC and data archiving resources for this consortial project ESCiMo (Earth System Chemistry integrated Modelling). We acknowledge the UK Met Office for use of the MetUM. This research was partially supported by the New Zealand government's Strategic Science Investment Fund (SSIF) through the NIWA programme CACV. Olaf Morgenstern acknowledges funding by the New Zealand Royal Society Marsden Fund (grant 12-NIW-006). The authors wish to acknowledge the contribution of NeSI high-performance computing facilities to the results of this research. New Zealand's national facilities are provided by the New Zealand eScience Infrastructure (NeSI) and funded jointly by NeSI's collaborator institutions and through the Ministry of Business, Innovation and Employment's Research Infrastructure programme (https://www.nesi.org.nz, last access: 11 November 2018). Fiona Tummon was supported by SNSF grant number 20F121_138017. ACCESS-CCM runs were supported by the Australian Research Council's Centre of Excellence for Climate System Science (CE110001028), the Australian government's National Computational Merit Allocation Scheme (q90) and the Australian Antarctic science grant program (FoRCES 4012). The HadGEM3-ES simulations from the Met Office were supported by the Joint UK BEIS-Defra Met Office Hadley Centre Climate Programme (GA01101) and the European Commission's Seventh Framework Programme StratoClim project (grant agreement 603557). CCSRNIES research was supported by the Environment Research and Technology Development Fund (2-1303 and 2-1709) of the Ministry of the Environment, Japan, and computations were performed on NEC-SX9/A(ECO) computers at the CGER, NIES. UMUKCA-UCAM model integrations were performed using the ARCHER UK National Supercomputing Service and MONSooN system, a collaborative facility supplied under the Joint Weather and Climate Research Programme, which is a strategic partnership between the UK Met Office and the Natural Environment Research Council. Laura E. Revell thanks China Southern for partial support. The authors thank Edmund Ryan and one anonymous reviewer for their helpful and constructive comments.
Edited by: Paul Young Reviewed by: Edmund Ryan and one anonymous referee