A machine learning examination of hydroxyl radical differences among model simulations for CCMI-1

The hydroxyl radical (OH) plays critical roles within the troposphere, such as determining the lifetime of methane (CH4), yet is challenging to model due to its fast cycling and dependence on a multitude of sources and sinks. As a result, the reasons for variations in OH and the resulting methane lifetime (τCH4 ), both between models and in time, are difficult to diagnose. We apply a neural network (NN) approach to address this issue within a group of models that participated in the Chemistry-Climate Model Initiative (CCMI). Analysis of the historical specified dynamics simulations performed for CCMI indicates that the primary drivers of τCH4 differences among 10 models are the flux of UV light to the troposphere (indicated by the photolysis frequency JO1D), the mixing ratio of tropospheric ozone (O3), the abundance of nitrogen oxides (NOx ≡ NO+NO2), and details of the various chemical mechanisms that drive OH. Water vapour, carbon monoxide (CO), the ratio of NO : NOx , and formaldehyde (HCHO) explain moderate differences in τCH4 , while isoprene, methane, the photolysis frequency of NO2 by visible light (JNO2), overhead ozone column, and temperature account for little to no model variation in τCH4 . We also apply the NNs to analysis of temporal trends in OH from 1980 to 2015. All models that participated in the specified dynamics historical simulation for CCMI demonstrate a decline in τCH4 during the analysed timeframe. The significant contributors to this trend, in order of importance, are tropospheric O3, JO1D, NOx , and H2O, with CO also causing substantial interannual variability in OH burden. Finally, the identified trends in τCH4 are compared to calculated trends in the tropospheric mean OH concentration from previous work, based on analysis of observations. The comparison reveals a robust result for the effect of rising water vapour on OH and τCH4 , imparting an increasing and decreasing trend of about 0.5 % decade−1, respectively. The responses due to NOx , ozone column, and temperature are also in reasonably good agreement between the two studies.


Introduction
The hydroxyl radical (OH) is a key species of interest for numerous tropospheric chemistry studies over the past several decades (e.g. Prinn et al., 1987Prinn et al., , 1992Montzka et al., 2011;Prather et al., 2012;Holmes et al., 2013;Murray et al., 2013;Naik et al., 2013;Voulgarakis et al., 2013;McNorton et al., 2016;Rigby et al., 2017;Turner et al., 2017). As a result of its role as the primary daytime oxidant in the lower atmosphere, OH determines how quickly many tropospheric gases and aerosols degrade or transform chemically. Notably, loss of atmospheric methane (CH 4 ) is dominated by its reaction with OH. Uncertainties in the abundance of OH at the global scale, coupled with source terms of methane that are difficult to quantify, have driven disagreement in the causes of recent variations in the methane growth rate (Nisbet et al., 2019;Turner et al., 2019). As a key element in the methane budget, tropospheric OH must be studied further to clarify its present-day abundance as well as its variability over time.
Numerous studies have sought to constrain the OH abundance and resulting methane lifetime (τ CH 4 ) using observations, global atmospheric models, and combinations of the two. Historically, chemical inversion of methyl chloroform (MCF: CH 3 CCl 3 ) comprised the primary method capable of gleaning information about global-scale OH burdens (Lovelock, 1977;Prinn et al., 1987;Ravishankara and Albritton, 1995;Krol et al., 1998;Montzka et al., 2000;Bousquet et al., 2005), though additional species that are lost by reaction with OH were also tested for this purpose (Weinstock and Niki, 1969;Singh, 1977;Miller et al., 1998;Jöckel et al., 2002;Nisbet et al., 2016Nisbet et al., , 2019. Models have likewise been relied upon to derive tropospheric OH abundance and its evolution. Stevenson et al. (2006) found a large spread in τ CH 4 (6.3 to 12.5 years) from a suite of atmospheric chemistry models in an analysis performed more than a decade ago. A total of 7 years later, the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) generated both historical  and future  simulations from numerous chemistry-climate models, revealing still large discrepancies not only in present-day τ CH 4 (with values ranging from 7.1 to 14.0 years) but also in how τ CH 4 is expected to vary through the year 2100 given common emissions scenarios. Note that, here and throughout, τ CH 4 refers to the lifetime of methane due to reaction with tropospheric OH only. Most recently, the confluence of observations with advanced modelling techniques has enabled sophisticated analyses of global OH (Prather et al., 2012;Holmes et al., 2013;McNorton et al., 2016;Rigby et al., 2017;Turner et al., 2017). Despite the advent of numerous observing systems for species with some bearing on OH chemistry in the last several decades, it is widely acknowledged that current observations are insufficient to unambiguously derive current trends in OH (Prather and Holmes, 2017;Turner et al., 2017Turner et al., , 2019Nisbet et al., 2019).
While global models are insufficient for clarifying the outstanding questions regarding OH and τ CH 4 on their own, they can serve as valuable testbeds in which to evaluate the factors influencing OH chemistry. At the global scale, the dominant reactions responsible for producing, cycling, and sequestering OH (see, e.g.  are well characterized and represented, to varying degrees of explicitness, in modern chemical mechanisms. Despite general consensus on the immediate drivers of OH chemistry, large differences in OH can manifest due to infrequently diagnosed differences in, e.g. ultraviolet (UV) flux to the troposphere (needed to initiate ozone photolysis for subsequent OH primary production) due to variations in cloud parameterizations and radiative transfer codes. Similarly, differences in the representations of volatile organic compound (VOC) ox-idation pathways can influence the extent to which OH is recycled following reactions with hydrocarbons. Such nuances in the chemistry of OH make OH differences between models notoriously difficult to attribute. With properly coordinated simulations and sufficient model output, however, we have demonstrated that the barriers posed by complex, non-linear chemistry can be overcome.
The multi-dimensional system that describes OH behaviour is well suited for study via machine learning approaches. We have previously demonstrated the utility of neural networks (NNs) for quantifying differences in OH among a small group of chemical transport models (CTMs), which rely on the specification of meteorological conditions (Nicely et al., 2017). Other groups have similarly shown the promise of machine learning techniques to better parameterize within models such complex processes as convection (Gentine et al., 2018), radiative transfer (Krasnopolsky et al., 2009), ozone production (Nowack et al., 2018), and deposition (Silva et al., 2019), and to replace the numerical integrators that simulate chemistry within models (Keller and Evans, 2019). NNs in particular are capable of modelling complex non-linear functions, making them a suitable technique for studying the non-linear chemistry involved in OH production and loss. The community continues to develop best practices for harnessing the power of machine learning for applications in atmospheric science. We build here on the specific application of NNs to better understand model representations of OH.
In this study, we apply an NN approach to quantifying the causes of OH differences to the large group of models that participated in the Chemistry-Climate Model Initiative . We repeat our earlier analysis that identifies the primary drivers of OH and τ CH 4 differences among model simulations conducted with specified dynamics, for a single year. We then expand the approach to study temporal variations in OH for 1980-2015, allowing for attribution of trends and interannual variability in τ CH 4 to specific parameters. Finally, we compare the derived trends in OH simulated by the CCMI models to trends derived from a previous observation-based study.
2 Model simulations CCMI, carried out as an official activity of the International Global Atmospheric Chemistry (IGAC) and the Stratosphere-troposphere Processes And their Role in Climate (SPARC) communities, seeks to enable intermodel evaluation of chemistry-climate models . Phase 1 of CCMI has designed a set of simulations, covering both historical and future timeframes, with prescribed emissions inventories such that the interactive chemistry and its interplay with dynamical and radiative processes can be robustly compared between models. The analysis presented here focuses on one simulation, the historical specified dy-namics (SD) simulation from 1980 to 2010 (REF-C1SD) (Hegglin and Lamarque, 2015;Morgenstern et al., 2017). Details of the emissions inventories recommended for this simulation can be found in Eyring et al. (2013). We have also performed the intermodel comparison portion of this analysis (Sect. 3.2) for the historical free-running simulation conducted from 1960 to 2010 (REF-C1). However, since a comprehensive examination of OH within the REF-C1 simulations was conducted by Zhao et al. (2019), those results are presented in the Supplement. We also include output from models that are not formal participants in CCMI but provided simulations comparable to those being used here. These additional models are described below. Monthly mean fields are used for the various chemical, physical, and radiative parameters necessary for evaluating OH, described in Sect. 3. We analyse all models that include and provided output for the complete list of these variables.
Models that participated in the REF-C1SD simulation were nudged toward reanalysis meteorological fields such that dynamical conditions are represented with historical accuracy. The details of how nudging -of the winds, temperature, and sometimes pressure and water vapour fields -is conducted can be found in Morgenstern et al. (2017, Table S30). The nudging of these models to common fields does not necessarily improve model agreement, however, as in the case of large-scale tropospheric transport . Particularly relevant to this analysis is the nudging of specific humidity, which is only performed in the MOCAGE model, of the models we analysed. Models that produced REF-C1SD simulations for CCMI and provided the necessary output to complete this analysis include CAM4-Chem (Tilmes et al., 2016), EMAC-L47MA, EMAC-L90MA (Jöckel et al., 2016), MOCAGE (Josse et al., 2004;Guth et al., 2016), MRI-ESM1r1 (Deushi and Shibata, 2011;Yukimoto et al., 2012), and WACCM (Marsh et al., 2013;Solomon et al., 2015;Garcia et al., 2016). For both configurations of the EMAC model, the simulations that included nudging of wave-0 temperatures were used (Jöckel et al., 2016). All models, here and including those described below, include interactive stratospheric chemistry.
Four models also contributed SD-type simulations to be analysed alongside the REF-C1SD CCMI simulations. The Goddard Earth Observing System (GEOS) model (Molod et al., 2015) conducted a "Replay" run, meaning the general circulation model computes its own meteorological fields for a 3 h simulation period, then calculates the increment necessary to match a pre-existing reanalysis dataset, in this case the Modern-Era Retrospective Analysis for Research Applications version 2 (MERRA-2). The increment is then applied as a forcing to the meteorology at every time step during a second run of the same simulation period. This simulation includes full interactive tropospheric and stratospheric chemistry from the Goddard Modeling Initiative (GMI) chemical mechanism (Nielsen et al., 2017) with output for the years 1980-2018 at 0.625 • × 0.5 • horizontal res-olution and 72 vertical levels (Orbe et al., 2017;Stauffer et al., 2019;Wargan et al., 2018). This simulation is referred to as "GEOS Replay". Additionally, three chemical transport models (CTMs), which directly rely on established meteorological fields such as MERRA-2 rather than calculate them, provided output used in this analysis. The OsloCTM and GEOS-Chem CTM output all required variables for the year 2000, while the GMI CTM (Strahan et al., 2013) simulated the full 1980-2015 period. All CTMs except GEOS-Chem calculate water vapour interactively in the troposphere. GEOS-Chem instead uses specific humidity fields from the MERRA reanalysis. We note that, while the GEOS Replay simulation described above used the GMI chemistry package, all discussion of the simulation from "GMI" refers to the separate, stand-alone CTM. While CTMs read in and use external meteorological fields rather than "nudging" or "replaying" internally calculated fields, we expect them to similarly represent realistic meteorological conditions for a given year. As such, we group them with the REF-C1SD simulations from CCMI, bringing the total number of SD-type simulations analysed to 10.

Neural network setup
Neural networks are generated to predict the monthly mean OH mixing ratio for a given model following the method outlined in Nicely et al. (2017). Briefly, four NNs are trained for one model, each for one simulation month. To reduce the computational demands of NN training, we only establish NNs for four months, one for each season: January, April, July, and October. Separate NNs are trained for the SD (main text) and free-running (Supplement) simulations, and all training is performed with output from the year 2000. Each model grid box located below the tropopause (thermal, following the WMO definition, for all models except GEOS Replay, which uses a "blended" tropopause calculation combining thermal and potential vorticity definitions) is a single sample, so sample sizes are determined by a model's vertical and spatial native resolution. The number of tropospheric model grid points, and thus the training dataset sample size, is indicated for each model in Table S1 in the Supplement and always exceeds 100 000. Because separate NNs are trained for each month, and monthly mean output from each model simulation is used as input and training data, the dataset does not represent diurnal variations in OH chemistry. The training process adjusts weighting factors such that mixing ratios of OH are predicted accurately when 3-D fields of the following variables are input to the NN: pressure, latitude, temperature (T ), ozone (O 3 ), specific humidity (H 2 O), methane (CH 4 ), the sum of nitrogen oxide and nitrogen dioxide (NO x ≡ NO + NO 2 ), the ratio NO : NO x , carbon monoxide (CO), isoprene (ISOP = C 5 H 8 ), formaldehyde (HCHO), the photolysis frequency of NO 2 (J NO 2 ), the photolysis frequency of ozone to excited state O( 1 D) (J O 1 D), and stratospheric ozone column (O 3 COL). Note that many of the inputs covary with one another depending on the chemical regime or meteorological conditions. A strength of the NN approach is that the inputs chosen need not be independent of each other. The NO x and NO : NO x inputs are calculated using monthly mean NO and NO 2 fields. All chemical species are input to the NN as unitless mixing ratios, except for methane, which is normalized by the maximum tropospheric value and indicated by the notation CH NORM 4 . This normalization enables direct comparison of methane distributions between models, despite the fact that the use of boundary conditions sometimes results in substantially different amounts of methane between models. (While the CCMI models generally used roughly consistent boundary conditions, the additional simulations that were not formally part of CCMI exhibit methane concentrations outside the ranges of those in the CCMI models.) Pressure is provided in units of hPa, temperature in K, photolysis frequencies in s −1 , and O 3 COL in Dobson units (DU). Three of the inputs -HCHO, NO : NO x , and O 3 COL -have been introduced to this analysis after the work of Nicely et al. (2017), due to availability of output from all models and to the added information they encompass that may be relevant for OH chemistry. For instance, having knowledge of the partitioning of NO x likely enables one to more accurately predict OH quantities compared to knowing just the total abundance of NO x . Likewise, the introduction of O 3 COL is somewhat redundant when its primary effect on OH is through attenuation of ultraviolet (UV) flux to the troposphere, which is already encompassed by the input J O 1 D. However, J O 1 D is also altered by other factors such as clouds, which cannot as easily be included as an input for this analysis (some models provide 2-D cloud fraction fields, others output 3-D fields, and still others do not give any metric regarding clouds). Whether strong differences in J O 1 D are caused by clouds or overhead ozone should be clarified by inclusion of O 3 COL as an input.
The neural network architecture is consistent with that of Nicely et al. (2017) and is shown in Fig. 1. However, the number of computational nodes was doubled from 15 to 30 given the availability of more powerful computing resources. Two hidden layers each containing 30 nodes provided strong performance of the NN in reproducing the OH mixing ratios from a given model. For training, the model output is randomly split 80 %/10 %/10 % into training, validation, and test datasets. During that process, the data from the training set are used to actively adjust weighting factors, and the validation set is evaluated to determine a training stopping point. When errors in predicting the validation data grow after adjusting weighting factors some number of iterations in a row, it is determined that the NN model prior to the growth in errors likely reached a local minimum in its cost function. This manner of "early stopping" helps to prevent overfitting, though application of the NNs to alternative years Blue boxes designate inputs (left) and output (right), red triangles indicate bias terms, green circles indicate nodes at which activation functions are performed, and grey arrows represent the weighting terms, which are optimized through the training process. For full details of the neural network setup and training, we refer readers to Nicely et al. (2017). Although 15 nodes are shown here in each hidden layer, 30 were actually used for all NNs in this study.
is not immune to overfitting, an issue discussed further in Sect. 4.3.1. For further application of this method across varying timescales, we would recommend a more methodical approach to sampling model output in time as well as in space. The final 10 % of data is then used to independently test the resulting NN and compare between different training iterations. A total of five trainings were performed for each NN, and the NN with best performance (evaluated by the correlation coefficient from comparison of NN-calculated and model-simulated OH values) was chosen as the NN to be used in further analysis. Further details of the training process and evaluation metrics can be found in Nicely et al. (2017).
We note that alternative machine learning algorithms have seen increased application to problems within atmospheric science in the last few years and may be equally or even better suited than neural networks for studying non-linear chemical systems. In particular, random forest regressions and gradient-boosting techniques offer greater computational efficiency and, in the case of random forests, have the capability to quickly identify which inputs are most strongly influencing the calculated output, known as "feature importance" (Hu et al., 2017;Grange et al., 2018;Liu et al., 2018;Keller and Evans, 2019). Additionally, linear regression algorithms such as Ridge and Lasso regression may be beneficial in curbing issues related to extrapolation. We also do not intend to suggest that our chosen NN input list, architecture, and general method are the best approach; input vari-ables were largely determined by available output, and architecture testing was conducted on the computing resources available at the time of the study. It is possible that a single NN could suffice for predicting OH variations throughout an entire year, rather than for just a single month, following methodical subsampling methods to create the initial training dataset. As such, we encourage exploration of modifications to this method as well as additional algorithms for future machine learning applications to atmospheric chemistry.

Intermodel comparison approach
Once NNs are established for each model, an analysis is conducted to quantify the OH and τ CH 4 differences attributable to individual input terms. To accomplish this, each model, A, is paired with another model, B, such that one input to the NN of model A is substituted with the same field from model B. All other inputs are held fixed, using fields from model A for the year 2000. Fields are interpolated to the resolution of the native model, A in this case, bilinearly across latitude and longitude, and linearly in log(pressure) space for the vertical coordinate. Any resulting changes in OH can then be directly attributed to the substituted variable.
The "swaps" that are performed in the manner described above undergo a process we refer to as "extrapolation control," which restricts the substituted variable from leaving the range of values over which the native model's NN was trained. For example, if O 3 is being substituted from CAM4-Chem into the GMI NN, we not only check that a given CAM4-Chem O 3 value lies within the minimum and maximum GMI tropospheric O 3 values but also that the GMI value of CO at that grid point can be associated with the new CAM4-Chem O 3 value. This check is performed across all variables and essentially prevents the substitutions from venturing too far outside of the chemical regimes simulated within the native model. In the case that a swapped variable exceeds the acceptable range of values, it is revised up or down accordingly. For reference, we tally the number of instances in which extrapolation control is invoked for two categories: coarse adjustments, when a NN input value from another model falls entirely outside the range of the NN input values from the native model, and fine adjustment, when a value from another model must be tweaked to preserve the native model's chemical regimes. On average, coarse adjustments are incurred for 3.5 % of all swapped data points, while fine adjustments are made to 18.8 % of swapped values. We find that extrapolation control is critical to achieve meaningful results with the NN intermodel comparison method, though it necessarily forces the attributed changes in OH and τ CH 4 to be conservative estimates.
Metrics used to evaluate the results of variable swaps include tropospheric OH integrated columns for visualization and changes in τ CH 4 for a globally summed quantity. Tropospheric columns are integrated vertically and weighted by the mass of methane and the temperature-dependent rate con-stant of reaction between OH and methane. The global mean lifetime of methane is found using Eq. (1): where M air is the mass of air within a grid box, brackets denote number density, χ denotes mixing ratio, k OH+CH4 is the reaction rate constant for the OH + CH 4 reaction calculated for each grid box temperature, and summations are performed over all tropospheric model grid boxes. This formulation is equivalent to the standard lifetime calculation of burden divided by loss rate, adapted to the quantities most directly related to model outputs available (Chipperfield et al., 2014). Again, we note that this is strictly the atmospheric lifetime of methane with respect to loss by tropospheric OH. If one additionally includes all stratospheric grid boxes within the above summation, annual average lifetimes of almost all models consistently increase by ∼ 1.2 years.

Time series evaluation approach
A new element of this analysis applies the alreadyestablished NNs of each model to examine the time evolution of OH over several decades of simulation. For this, we focus on the REF-C1SD simulation set, as it contains the most realistic representation of historical emissions and meteorological conditions and thus is most likely to resemble true OH variations. All models that provided SD-type simulations as described in Sect.

Results and discussion
4.1 Native model and NN performance Figure 2 shows values of τ CH 4 found for all models that produced SD-type simulations. Annually and globally averaged lifetimes vary from 6.59 years (OsloCTM) to 8.41 years (GMI). All models exhibit the expected seasonal variation in τ CH 4 , with minimum values in the Northern Hemisphere (NH) summer months due to higher OH at this time of year. Specifically, the seasonal variation in the global mean is a result of greater anthropogenic influence in the NH and resulting increases in concentration of two OH precursors: ozone and NO x . An example of NN performance is shown for the January WACCM model in Fig. 3, relative to the native model OH fields. Tropospheric OH columns are shown for the model and NN alongside the absolute value of the difference between the two. In general, the NNs from all models show similar magnitudes and spatial patterns in their calculated OH field, with errors somewhat randomly scattered and maximizing locally to values of ∼ 10 % of the total column value. Figures S1-S4 in the Supplement show the performance of all NNs, for each of 10 SD-type model simulations and for each of the four months, while Table S2 provides further statistics on all NNs used here. Performance of all model NNs for the year 2000 is strong, with values of τ CH 4 calculated from the NN-generated OH field within 0.006 years of the parent model's τ CH 4 on average. The maximum error in τ CH 4 , an overestimate by 0.012 years, occurs for the MRI-ESM1r1 model in the month of January. Performance is generally poorest in boreal winter, with average offsets in τ CH 4 of 0.007 years, and strongest in boreal summer, for which the mean bias is only 0.004 years.

Intermodel comparison
The intermodel comparison component of this analysis can be understood fundamentally by the OH and τ CH 4 differences generated by substituting input fields between models. An example of the OH column and τ CH 4 changes that are calculated through individual variable swaps is shown in Fig. 4. The two models with the highest and lowest values of τ CH 4 , GMI and OsloCTM, respectively, are chosen for this example. Swaps performed between the two models for the month of January reveal that local O 3 , J O 1 D, HCHO, and NO x account for the largest differences in τ CH 4 for this particular model pairing. A complete budgeting of the changes in τ CH 4 attributable to all inputs for GMI and OsloCTM is shown in Table 1. Note that the values of τ CH 4 shown in Table 1 correspond to lifetimes for the month of January rather than annual averages and so will differ from the lifetimes noted at the beginning of Sect. 4.1. It is worth discussing several features that are evident in the visualized OH changes shown in Fig. 4. First is the spatial distribution of the OH variations. Depending on how the sink or source term undergoing the swap affects OH chemistry, the strongest impacts may occur in localized areas or may distribute evenly over the globe. For instance, varying local ozone and NO x (Fig. 4a, b and g, h, respectively) exert the greatest influence on OH over the climatological tropics, with maximum impacts over land but extending over the oceans as well. This is likely a result of the anthropogenic or biomass burning emissions sources, which limit the largest differences in ozone and NO x between the two models to areas proximate to the South American, African, and In- donesian source regions for the month of January. The OH changes resulting from substitutions of the inputs J O 1 D and HCHO, however, are distributed over oceans as well as over land masses and, in the case of HCHO, are strongest in remote marine regions. This pattern is common for species that influence OH chemistry through mechanisms that are largely independent of local emissions. In the case of HCHO, its role as a secondary source of OH through methane oxidation is relatively more important in the absence of large VOC concentrations; thus, its stronger influence is seen away from terrestrial vegetation.
The second feature to note in Fig. 4 is the symmetry between input swaps in opposing directions. In other words, the swap of an input from OsloCTM into the GMI NN generally yields OH column and τ CH 4 changes that are equal but opposite to the changes resulting from use of a GMI input in the OsloCTM NN. With few exceptions, almost all regions of OH increase (red) in one model's NN are matched by OH decreases (blue) in the other model's NN in Fig. 4. The changes in τ CH 4 are correspondingly similar in magnitude but opposite in sign. This behaviour is expected because a swap that may, e.g. increase an OH precursor and subsequently cause an increase in OH for one model will manifest as a decrease in that same precursor when the substitution occurs in the NN of the other model. While this pattern occurs for the vast majority of cases across all model pairings and swaps performed for this analysis, there are instances when symmetry is not maintained. This could happen for two reasons.
First, the sensitivities of the two models to a particular change in an OH precursor or sink could differ. For example, one model may be sensitive to an increase in isoprene, causing OH concentrations to drop in response. Another model may incorporate buffering effects, such as reactions involving oxidized volatile organic compounds (Taraborrelli et al., 2012;Lelieveld et al., 2016) that allow OH to be recycled following its reaction with isoprene, causing it to be less sensitive to the same change in methane. We refer to these variations in model sensitivities as chemical mechanism differences, as they are most likely a result of the chemical reactions, species representations, or reaction rates implemented within a model's chemical mechanism.
The second explanation for lack of symmetry in the OH response to a model swap is a forced asymmetry in the swapped inputs themselves, imposed by the extrapolation control technique described in Sect. 3.2. It is possible that the swap of an input in one direction, i.e. from Model A to Model B, could proceed with no alteration to the substituted variable, while the swap in the other direction, i.e. from Model B to A, results in the variable lying outside the trained range of Model A. The extrapolation control process will revise the substitute variable field from Model B, such that the difference between it and the native field from Model A is lessened. As such, the first swap into the NN of Model B will yield a larger magnitude change in the input as compared to the swap into the NN of Model A. The impact of these factors is indirectly quantified through a remainder term that falls out of a full budgeting analysis, described below.
A third consideration in interpreting the information presented in Fig. 4 is the conditions that must be met in order for a large change in OH to manifest through this analysis. First, the two models between which a swap is conducted must exhibit differences in the parameter of interest. Should the two models exhibit, e.g. very similar ozone fields, then swapping one model's O 3 with the other's will produce little difference in the NN-calculated OH. Second, the model must have some OH sensitivity to the variable being swapped. If a model is insensitive to changes in methane, swapping in a drastically different methane field may not cause a perceivable difference in OH. Therefore, the absence of an OH response does not necessarily mean that input fields are similar between models. Conversely, the existence of large OH changes indicates that differences in the swapped input field exist between the two models and that the native model demonstrates a dependence of OH on that input variable.
A fourth issue is the fact that NNs can exhibit some degree of random behaviour based on how they were trained and initialized. Our method involved training five NNs and selecting from those the one that performed best when compared to the independent test dataset. That single NN was used in all subsequent analysis. However, it is a useful exercise to evaluate the role of NN randomness in our results. We show, in Figs. S5 and S6, the left and right panels of Fig. 4, reproduced for the alternate NN trainings of the GMI and OsloCTM models, respectively. A visual comparison of tropospheric OH column differences among the five trainings of each model's NN reveals markedly similar spatial distributions and magnitudes. The values of calculated τ CH 4 changes ( τ CH 4 ) do differ somewhat between the training instances, with larger effects on some variable swaps than for others. For instance, the standard deviation of the values of τ CH 4 calculated for all five trainings of the GMI NN is about 0.2 years for the J (O 1 D) and HCHO swaps but less than 0.05 years for O 3 and NO x . We note, though, that some of the NNs displayed in Figs. S5 and S6 exhibit worse performance than the one ultimately chosen for subsequent use. As a result of this exercise, the uncertainties resulting from this analysis method may be considered, at most, to be ∼ 0.2 years.
The final point of interest in Fig. 4 is the general consistency in the signs of OH and τ CH 4 for each model. The substitutions of all four variables generally cause an increase in OH within the GMI NN (and corresponding decrease in τ CH 4 ) and a decrease in OH (increase in τ CH 4 ) within the OsloCTM. This feature is most pronounced for this particular pair of models due to our reasoning for choosing them: they exhibit the largest difference in τ CH 4 among our group of 10 models. Because the native GMI model has a longer τ CH 4 value compared to OsloCTM, it makes sense that incorporation of OsloCTM's various OH precursor and sink fields into the GMI NN will tend to decrease the GMI τ CH 4 , bringing it into closer agreement with that of OsloCTM. This characteristic points to the utility of this analysis as a budgeting tool for quantifying the cause of the difference in τ CH 4 between two models. The τ CH 4 accounting for the GMI and OsloCTM set of swaps conducted for January is shown in Table 1. When considering all 12 variable swaps that were performed, the NN analysis more than explains the original gap in τ CH 4 between the two models. The GMI January lifetime of 9.24 years is decreased to 6.71 years (τ ORIG + τ ) after summing all τ values, while the OsloCTM lifetime is increased from 7.18 years to 9.48. This budgeting rarely provides a perfect accounting of the τ CH 4 gap due to the same reasons that give rise to asymmetric OH responses to a given swap: chemical mechanism differences and asymmetric swaps of inputs due to extrapolation control. As a result, a remainder term, found as the difference between the other model's τ ORIG and the present model's τ ORIG + τ , is attributed to these factors. This term is listed in the last row of Table 1 with the label "Mech." Results from analysing individual model pairs reveal a multitude of insights regarding idiosyncrasies in emissions of, global distributions of, and OH sensitivities to the various input parameters. These results, available in the archived dataset described in the data availability section, may be especially useful to the reader with an interest in a particular species or model. However, with over 4000 plots (12 species × 10 models × 9 submodels × 4 months = 4320) and 180 τ CH 4 budget tables generated, it is beyond the scope of this paper to highlight and explain every interesting feature. Instead, we aggregate the results across all models to identify some primary conclusions. Figure 5 shows the change in τ CH 4 for a specific model and substituted input variable, averaged over all nine pairings. For example, the data point shown for CAM4-Chem J O 1 D is calculated from the nine τ CH 4 values obtained when swapping the J O 1 D fields from the other nine models into the CAM4-Chem NN. The circular point represents the mean of those nine values, while the whiskers indicate 1 standard deviation about the mean. Aggregate results shown in this manner are compiled both for individual months (available in the archived dataset noted above) as well as for annually averaged output. The latter is calculated as the average of the four monthly mean and standard deviation values, and is shown in Fig. 5.
As with the individual OH tropospheric column change plots (Fig. 4), numerous conclusions can be drawn by studying the aggregated results in Fig. 5. The method for reading the data in Fig. 5 is demonstrated in the following example. The mean τ CH 4 value attributable to J O 1 D for the WACCM model is +0.99 years. This indicates that use of J O 1 D fields from other models causes τ CH 4 to increase by ∼ 1 year, meaning the native J O 1 D field from WACCM imparts a low bias to τ CH 4 of 1 year, relative to the other models. A low τ CH 4 would result from OH concentrations being too high. Since OH and J O 1 D are positively correlated (i.e. J O 1 D can be thought of as a source for OH), the too-high OH is an indication of too-high J O 1 D. In general, positive values of τ CH 4 correspond to relative high biases in input parameters that are source terms for OH and to low biases for species that instead serve as sinks. This reasoning is less straightforward for species such as HCHO, which can both produce and consume OH, while it is also produced by OH-initiated oxidation. We stress that these comparisons are strictly relative to other models, not to any observation or other indication of truth. So, points that appear as outliers in Fig. 5 should not necessarily be interpreted as an erroneous result but rather considered an area for further examination.
The ordering of variables along the x axis of Fig. 5 denotes the average magnitude of τ CH 4 values across all models, with parameters on the left accounting for the largest τ CH 4 differences. As such, J O 1 D is the largest driver of OH differences in the CCMI SD model simulations, followed by local O 3 and NO x . The subsequent variables (H 2 O, CO, the NO : NO x ratio, and HCHO) cause moderate variations in tropospheric OH, while ISOP, CH 4 , J NO 2 , O 3 COL, and T are not responsible for intermodel spread in τ CH 4 . We note that T differences between the SD simulations are likely limited due the meteorological constraints imposed on the models. However, examination of the free-running simulations, discussed in the Supplement, also shows practically no impact of T on OH. Thus, we conclude that the effect of temperature on OH chemistry is likely indirect, acting through pathways embodied by other variables, such as H 2 O and species that exhibit strongly temperature-dependent reaction rates. Finally, the Mech. term, described in the discussion of Table 1, appears on the far right, indicating its origins as a remainder term from the budget analysis of individual model pairs. The magnitudes of τ CH 4 values attributed to chemical mechanism differences and asymmetric swaps between models are large enough to consistently rank the Mech. term third, between O 3 and NO x , in terms of importance for OH in this analysis. Especially in model simulations conducted with common emissions inventories (though inventories can be implemented very differently among models, as demonstrated by Young et al., 2013), we expect some of the disparity in a short-lived species like OH to emerge from differences in chemical mechanism implementations. In other words, when responses in OH to a given change in a source or sink term differ between two models, the remainder term (or term labelled "Mech." in Table 1) will increase, representing variations in the sensitivity of OH that presumably arise due to the two different implementations of the chemical mechanism. It is possible that other factors are represented by this term; e.g. other chemical species that influence OH chemistry but are not considered in the NN analysis could contribute to the Mech. term. However, previous analysis using a 0-D chemical box model as a "standard" mechanism in Nicely et al. (2017) suggested a correlation between actual biases in OH imparted by a given model's chemical mechanism and the remainder term resulting from the NN analysis. Therefore, we have some confidence that the Mech. term is mean- Variables along the x axis are ranked by averaged magnitude of the τ CH 4 values (i.e. inputs located farther left are responsible for larger differences in CH 4 lifetime), except for the "Mech.+Nonlin." term, which is shown last to indicate its role as a remainder term. Model name abbreviations are "CAM4" for CAM4-Chem, "EM47" for EMAC-L47MA, "EM90" for EMAC-L90MA, "GRep" for GEOS Replay, "GCHM" for GEOS-Chem, "GMI" for GMI, "MOC" for MOCAGE, "MRI" for MRI-ESM1r1, "OSLO" for OsloCTM, and "WACC" for WACCM.
ingful, though significant further study would be required to parse the actual mechanistic differences responsible for imparting bias in OH calculations.
Significant intermodel differences in the largest driver of τ CH 4 spread, J O 1 D, could arise from two possible sources. The amount of solar UV light penetrating down to the troposphere is largely dictated by the stratospheric column ozone amount. However, the differences in total ozone column are generally small and insufficient to cause the variations in J O 1 D seen among the CCMI models. Rather, J O 1 D likely varies to a great extent due to differences in cloud cover, and dissimilar treatments of clouds within model photolysis codes. Figure S7 highlights this effect by showing the ratio of J O 1 D at the surface to J O 1 D in the upper troposphere (UT) for each model. The relatively small column amounts of ozone within the troposphere should account for very little absorbed UV light, making it much more likely that deviations in this ratio from 1.0 are driven by scattering due to clouds and possibly aerosols. The fact that models show large spatial differences in this ratio is a strong indication that clouds underlie the model differences in J O 1 D.
While the model differences in J O 1 D, O 3 , NO x , and chemical mechanisms appear to drive the bulk of the τ CH 4 spread among this group of CCMI models, we emphasize that individual models may not adhere to these conclusions. As such, any efforts to improve a particular model should instead focus on the results specific to that model. For instance, HCHO plays a very small role in describing intermodel differences in OH on average, but for the OsloCTM model, HCHO is a much more important factor. Thus, we refrain from offering an across-the-board solution for remedying the large model spread in τ CH 4 and instead suggest a more individualized approach of studying plots such as those shown in Fig. 4 for more spatially and temporally resolved information. Visualizations of all model swaps, for all months and species, are available in our archived dataset described in the data availability section for this purpose.
There are several other qualifications to note when considering the results of the intermodel comparison. One is the negating effect between the J O 1 D and tropospheric O 3 variables. Many, but not all, model τ CH 4 values for J O 1 D in Fig. 5 are opposite in sign to the τ CH 4 values attributed to O 3 . Physically, photolysis of tropospheric ozone by light at wavelengths below 336 nm to form excited state O( 1 D) and subsequent reaction with H 2 O to form OH is a loss pathway for ozone. Therefore, more UV flux will tend to decrease tropospheric ozone concentrations while increasing OH, and vice versa. This physical mechanism, then, can explain the frequent cancellation of the τ CH 4 values attributed to these two factors. Should a modeller attempt to alter a model's OH field by forcing adjustments in its J O 1 D, the opposing impact of tropospheric O 3 may result in no change for the value of τ CH 4 . However, this does not preclude the finding that both J O 1 D and tropospheric O 3 are substantially different in the models for reasons we do not fully understand. Tropospheric ozone can also vary between models for reasons external to the radiative environment. For instance, differences in the stratosphere-troposphere exchange, wet and dry deposition, and lightning NO x emissions can each cause substantial variations in tropospheric ozone among models (Wild, 2007). Further parsing of the reasons for the ozone differences seen among the CCMI models is difficult without specialized out-put, including tracers such as ozone of stratospheric origin and NO x generated by lightning. We recommend a targeted study to address the underlying reasons for the variations in tropospheric ozone.
Another qualification concerns the issue of causation versus correlation. Machine learning techniques, and NNs in particular, are generally more adept at identifying the predictors of a certain phenomenon than traditional methods, such as multiple linear regression. However, it is still possible that an input that is tightly correlated with the output may be misidentified as a driver of variations in the output. This is particularly relevant to keep in mind for species that serve as sinks of OH, such as CO and methane. Whether a decline in OH initiates or results from an increase in its sinks is difficult to differentiate, even with advanced analysis methods. Therefore, descriptions of CO and methane as drivers of OH variations in this text may just as well be interpreted conversely, as downstream indicators of the change in oxidizing capacity.
A final qualification is this analysis constitutes a foundationally hypothetical experiment. It essentially addresses the following questions: what if we could switch the fields of just one chemical species between two global models? What would be the instantaneous impact on OH? On τ CH 4 ? This approach, then, necessarily neglects the roles of feedbacks in the atmospheric system (e.g. if the NO x field is perturbed, this will propagate to changes in ozone as well, with time). However, for the objective of teasing apart the influences on global OH abundance and τ CH 4 and explaining intermodel differences, a notoriously difficult task, we regard our approach as a valuable exercise.

Time series evaluation
The second half of our NN analysis interrogates temporal trends in OH and τ CH 4 . Figure 6 shows the evolution of τ CH 4 in the SD-type simulations conducted for 1980-2010. Two models, GEOS-Chem and OsloCTM, only provided output for the year 2000 and thus only appear as single points in Fig. 6. In addition, some models provided output beyond the year 2010; output from years through the end of 2015 was included when available. The lifetimes all show a general downward trend over time, consistent with the upward trend in global mean tropospheric OH concentration shown by Zhao et al. (2019;their Fig. 4). Results concerning attribution of the τ CH 4 time series are presented in Sect. 4.3.1, while derivation and analysis of trends are shown in Sect. 4.3.2.

Attribution of the τ CH 4 time series
Swaps of input variables to a NN are conducted on an intramodel basis, with the goal of determining which OH precursors and sinks are responsible for OH variations over time. The results of these swaps are shown for each model in Fig. 7. Changes in τ CH 4 attributable to each parameter are displayed  Fig. 7 is zero by design. As an input field from another year is swapped into the NN, however, OH differences manifest and are denoted by the corresponding change in τ CH 4 . Because we are relying on the same NNs used for the intermodel analysis, we emphasize that the methane fields used here are still normalized, separately for each year. As a result, the variations in τ CH 4 due to CH NORM 4 should not be interpreted as a measure of the methane feedback factor (Prather et al., 2001;Fiore et al., 2009;Holmes et al., 2013;Holmes, 2018). Instead of representing the change in OH with a change in absolute concentration of methane, the numbers shown here signify the change in OH with a change in how methane is distributed within the atmosphere, both vertically and spatially. Largely, one would expect this to remain constant over time, though results from this analysis of the CCMI simulations suggest there are some modest changes in τ CH 4 attributed to the distribution of tropospheric methane. Should a similar method be applied to analysis of temporal variations in OH in the future, we would encourage training the machine learning algorithm on data spanning all years such that use of methane absolute values would be possible.
While significant diversity in the drivers of OH variability across models is evident from Fig. 7, there are also several distinctive features that appear repeatedly. For instance, the response of τ CH 4 to changes in CO shows a prominent peak in the year 1998 in all models except one. To gauge the role of emissions in this response, we show in Figs. S8-S12 the time series of CO mixing ratios and other parameters averaged for the region most impactful to τ CH 4 : the tropical lower troposphere (latitudes between 30 • S and 30 • N, pressures greater than or equal to 700 hPa). Indeed, CO mixing ratios maximize in almost all models in the year 1998, likely " to designate the use of normalized CH 4 fields as inputs to the NNs, as described in Sects. 3.1 and 4.3. As a result, OH changes due to CH NORM 4 represent impacts of changes in how CH 4 is distributed within the troposphere, rather than how CH 4 concentrations are changing over time.
as a result of the emissions inventory reflecting the extreme biomass burning and strong El Niño-Southern Oscillation (ENSO) event during that and the preceding year (Duncan et al., 2003 and references therein). The increase in τ CH 4 can thus be explained by the increased CO sink of OH, causing a temporary depletion of the oxidant. In addition, less distinctive peaks in τ CH 4 due to CO are identified in other years with strong El Niño conditions, notably 1982-1983, 1987, and 1991-1992(Duncan et al., 2003. The impacts of several other variables on τ CH 4 also demonstrate behaviour with reasonably identifiable causes. A prolonged decrease in τ CH 4 due to J O 1 D from 1992 to 1998 is evident in the analysis of the CAM4-Chem, GEOS Replay, GMI, MRI-ESM1r1, and WACCM NNs. This may correspond to several confounding events that acted to increase the flux of UV light to the troposphere, increasing the primary production of OH and decreasing τ CH 4 , as seen in Fig. 7. First, solar activity reached a maximum around 1990, after which the decline in sunspots correlated strongly with a decline in tropical total ozone columns (Duncan and Logan, 2008). Second, the eruption of Mount Pinatubo in 1991 likely impacted J O 1 D through the decrease in stratospheric ozone that resulted (Tie and Brasseur, 1995;Aquila et al., 2013). Finally, the prolonged ENSO event of 1990(Allan and D'arrigo, 1999 may have caused reduction in cloud cover due to drought conditions (Duncan et al., 2003). Interestingly, the τ CH 4 response to H 2 O is moderately anticorrelated with CO. This is particularly evident for the year 1998 in many of the models, when large biomass burning events occurred in many regions of the world, such as the boreal forests of both Asia and North America, Central America and Mexico, and Indonesia, which were attributed in part to a strong El Niño in 1997 that transitioned in a strong La Niña in 1998. Although strong ENSO events cause drought conditions over some regions, it is more fundamentally associated with warming sea surface temperatures and increased evaporation, particularly in the tropical Pacific Ocean. Thus, it is reasonable that larger values of specific humidity will tend to increase OH primary production during an El Niño year, as suggested by the decrease in τ CH 4 shown in Fig. 7. An apparent increase in ozone also coincides with the 1998 ENSO event, determined by the decreasing component of τ CH 4 . The prevalence of biomass burning would indeed cause increases in tropospheric ozone through increased emissions of its precursors, CO, VOCs, and NO x . Additionally, the τ CH 4 response to O 3 shows the most distinguishable trend of all the variables over the full 1980-2015 period. Steady decreases in τ CH 4 due to O 3 imply an increasing tropospheric ozone burden, a modelling result supported by observations (Verstraeten et al., 2015).
We also note the appearance of spurious results in several cases. The τ CH 4 responses to CH NORM 4 in EMAC-L47MA and EMAC-L90MA as well as to O 3 COL in MOCAGE extend to very large negative values in the early part of the time series. To show the full extent of the EMAC τ CH 4 responses to CH NORM 4 , we show alternate versions of Fig. 7b and c with expanded y-axis ranges in Fig. S13. Chemical conditions during the 1980s would differ most markedly from the regimes simulated in the year 2000, on which the NNs are based. Particularly for concentrations of methane, which underwent monotonic rise aside from a stabilization period from 2000 to 2007 (Turner et al., 2019), conditions in 1980 could be quite different. However, as was noted in Sect. 3.1, methane inputs to the NNs are normalized against the maximum tropospheric value. The field of CH NORM 4 for each year is likewise normalized against the maximum methane for that year, so a strong response in τ CH 4 must indicate a significant change in the distribution of methane, not just in changes in its concentration over time. Indeed, Fig. S14 shows the normalized methane values used as input to the NNs for the pressure level closest to the surface. For each EMAC configuration (for the month in which the τ CH 4 response shown in Fig. 7 is largest and most unphysical), the methane distributions in the 1980s do show notable change from the year 2000 distribution used for training. Specifically, methane mixing ratios in the Southern Hemisphere drop relative to the larger concentrations in the Northern Hemisphere. Other models, such as WACCM, shown in the bottom panels of Fig. S14, show practically no interannual change in the methane distribution for a given month. This behaviour in the EMAC model likely results from implementation of a Newtonian relaxation scheme to determine a time-varying, latitude-dependent lower boundary condition for methane (Jöckel et al., 2016). Our spurious NN result may indeed be explained by a slowdown in the rate of increase in methane concentrations at the lower boundary initiated in 1980, evident in supplementary figure E1 of Jöckel et al. (2016). While this method of determining boundary conditions generally represents a more sophisticated treatment of methane, within the context of this analysis, it imparts an artificially strong signal in OH and τ CH 4 . Therefore, the unphysical results in Fig. 7b and c due to CH NORM 4 indicate an artefact due to the NN method, not a problem in the EMAC model itself.
For the other occurrence of anomalous behaviour, MOCAGE shows an unrealistically large response of τ CH 4 to O 3 COL in the 1980s (Fig. 7f), a result not corroborated by any other model. Figure S15 illustrates the likely cause of this behaviour. While most models exhibit modest changes in total O 3 COL between 1980 and 2000, including GEOS Replay shown in the top set of panels, the MOCAGE model (bottom panels) shows much larger column amounts in the year 1980. These values fall well outside the range of O 3 COL amounts on which the NN was trained, so unrealistic behaviour of the NN in this case is not surprising.
These examples of spurious results highlight an issue that must be treated with caution when using machine learning approaches. Because the application of our NN method to time series analysis is an extension beyond the originally intended purpose, not all NNs are sufficiently generalizable to reliably reproduce OH for years other than the training year (2000). To account for this, we evaluate each NN for all years by inputting variables from each year. With this test, all inputs are changed, not just a single input at a time. The resulting OH, as depicted in Figs. S16-S23 for select years, compares well to the native model's OH field for that year in many cases but not in all. Considerable bias occurs at low OH mixing ratios, though we note that near-zero concentrations will likely not affect the resulting globally integrated τ CH 4 unless values are grossly overestimated. This evaluation also represents a rigorous test of the NNs, as significant shifts in numerous inputs at once might push the NN algorithm into new phase space not encountered during training, much more so than only changing one input at a time, which is our approach in the subsequent time series analysis. Nonetheless, we limit the influence of poorly generalizable, or "overfit," NNs by only including in the multi-model mean results for the years in which a NN reproduces its native model's OH field with an r 2 value greater than or equal to 0.95. For four Figure 8. Same as Fig. 7 but the average across all eight models, except filtered to remove NN results for individual months and years during which NN performance is poor, as detailed in the text.
NNs (one per month) created for each of eight CCMI models, across 36 years, the potential application of the NNs to 1152 calculations (4 × 8 × 36) is reduced to 696 calculations using this test. Results from this point forward are subject to this quality check and were found to be insensitive to the r 2 threshold imposed. This insensitivity is demonstrated by alternate versions of the figures to come, placed in the Supplement, generated using all NNs rather than the quality-filtered NNs. Figure 8 shows the multi-model mean attribution of variations in τ CH 4 . Many of the same features identified in Fig. 7 also emerge here: clear definition of strong ENSO years in the CO response, apparent Mount Pinatubo effects in the J O 1 D response, and a general downward trend in τ CH 4 due to O 3 are all observed. Also, as might be expected from the intermodel comparison results discussed in the prior section, J O 1 D, O 3 , NO x , H 2 O, and CO account for many of the strongest OH variations over time (Fig. 7) as well as between models (Fig. 5). Figure S24 shows the analogue of Fig. 8, without the quality filter applied to the NNs described above. That is, all NN results from Fig. 7

Trends and interannual variability in the τ CH 4 time series
We also perform linear fits to each response time series in Fig. 8. The resulting trends in τ CH 4 are shown in Fig. 9a. The interannual variability of τ CH 4 is also calculated as the standard deviation of the detrended time series, shown in Fig. 9b, though it is relevant to note that CTMs have historically not captured the full interannual variability exhibited by observed OH proxies (Holmes et al., 2013). Figure S25 shows the equivalent of Fig. 9, without application of the NN quality filter described above. Negative trends in τ CH 4 due to O 3 , H 2 O, J O 1 D, and NO x stand out as largest in magnitude. The sum of all factors shown in Fig. 9a is −2.3±0.4 % decade −1 , which is comparable to the mean downward trend in τ CH 4 seen in Fig. 6, −1.8 % decade −1 . Time series of the model input variable fields show corresponding trends, with param- eters that serve as source terms of OH increasing over time (Figs. S9-S12). Tropospheric ozone and NO x show clear upward trends over time, while H 2 O and J O 1 D show upward trends with more variability, which is also conveyed by the error bars in Fig. 9a. It is interesting to note that H 2 O plays a stronger role in the overall temporal trend of τ CH 4 , as compared to its role in explaining intermodel differences. This is likely due to the fact that temperatures were constrained in the specified dynamics simulations, which in turn should determine the water vapour calculated within the models. The interannual variability attributed to CO in Fig. 9b is also consistent with the large year-to-year swings in tropical lower tropospheric CO mixing ratios shown in Fig. S8. While Fig. 9a suggests that CO exhibits very little overall trend between 1980 and 2015, we note there is a discernible increase in CO prior to ∼ 1998 in Fig. S8 followed by a steady decline thereafter. This is consistent with remote site measurements that show significant negative trends in CO since the late 1990s (Zeng et al., 2012). Finally, the attributed trends in τ CH 4 from the CCMI models (Fig. 9a) are compared in Fig. 10 to trends in tropospheric mean OH concentration ("[OH] TROP ") from a previous observation-based analysis (Nicely et al., 2018). In that work, TOMS/OMI/SBUV observations of total column ozone were used to infer radiative effects on the OH burden, while water vapour from the AIRS instrument, methane from surface observations, NO x from a global model simulation constrained to realistic emissions, and temperature from the MERRA-2 reanalysis were analysed to calculate chemical impacts on [OH] TROP . In Nicely et al. (2018), the trend in [OH] TROP due to NO x encompassed the effects of both the total abundance and the partitioning of NO x , while the O 3 COL factor encompassed all radiative effects on OH. Thus, to perform a "like-for-like" comparison, the τ CH 4 trends due to NO x and NO : NO x are combined, as are the trends due to O 3 COL and J O 1 D shown in Fig. 9a. Error bars shown in Fig. 10 represent the 1σ uncertainty in the slope of the linear fit and, in the case of combined trends, are found by summing in quadrature the individual uncertainties. Because τ CH 4 varies with the inverse of OH concentration, note that the x axis of Fig. 10 is inverted and a −1 : 1 line is shown in grey.
The trends in τ CH 4 from this analysis and in [OH] TROP from Nicely et al. (2018) are in reasonably good agreement for H 2 O, NO x , O 3 COL, and temperature. In particular, the two trends due to H 2 O agree within the uncertainties, with τ CH 4 decreasing by ∼ 0.5 % decade −1 and [OH] TROP increasing at almost the same rate. The impacts of NO x and O 3 COL are found to increase OH concentrations in both studies, though the impacts on τ CH 4 from the CCMI models are found to be larger in magnitude than the observational estimate. The small impact of temperature, tending to lessen the OH burden, is also in close agreement between the two studies, with the CCMI models again showing a slightly stronger re-sponse. The role of NO x in driving ∼ 0.3 % decade −1 decline in τ CH 4 is roughly consistent as well. Only the effect of ozone column falls relatively far from the −1 : 1 line, with analysis of the CCMI models suggesting a stronger decrease in τ CH 4 between 1980 and 2015, albeit with large uncertainties. This may result from inaccurate representations of stratospheric ozone in the CCMI models, mischaracterization of the impacts on UV photolysis in the troposphere, or a combination of both. Overall, the results depicted in Fig. 10 show relatively robust findings regarding the responses of [OH] TROP and τ CH 4 to the factors examined through these two independent studies.
Because the methane used as input for the CCMI NNs was normalized, as discussed above, the trend in τ CH 4 found in this analysis due to CH NORM 4 did not represent a methane feedback factor in the traditional sense. As such, it is not comparable to the trend in [OH] TROP due to methane found by Nicely et al. (2018) and so was not included in Fig. 10. However, even in the event that one were to retrain new NNs using absolute values of methane and sampling across all years to generate the training dataset, we would question the physical meaning of the resulting trends. With the current necessity of providing boundary conditions for surface methane rather than fluxes in models, our ability to realistically simulate methane is hampered. We encourage the further examination of the response of OH to methane on the global scale, which is likely a large influencer of tropospheric OH abundance, as indicated in Nicely et al. (2018) and Holmes et al. (2013).

Conclusions
We perform a neural network analysis of the monthly mean output from historical simulations of 10 models that participated in CCMI for the purposes of understanding OH and τ CH 4 differences and temporal trends. NNs are trained to reproduce OH mixing ratios for a given model using 3-D fields of 12 OH precursor and sink parameters. Performing swaps of the NN inputs between models produces a quantitative estimate of the difference in τ CH 4 that can be attributed to variations in the substituted variable. Among the 10 models that we examine, on average, variations in J O 1 D, local O 3 , NO x , and chemical mechanisms account for the largest differences in τ CH 4 . Model diversity in representations of H 2 O, CO, the partitioning of NO x , and HCHO is responsible for moderate OH differences, while isoprene, CH NORM 4 , J NO 2 , overhead ozone column, and temperature account for little to no variation in OH. However, the relative importance of a particular variable is highly model dependent, so any effort to improve the representation of OH within a given model should be guided by that particular model's results.
We also analyse time series of τ CH 4 using year 2000 NNs generated for the first half of the study. All models exhibit a downward trend in τ CH 4 between 1980 and 2015, varying from −0.54 to −2.97 % decade −1 (average of −1.83 % decade −1 ). Swaps of NN inputs are conducted between years rather than between models, so attributions of the factors influencing trends in τ CH 4 are found for each model and then combined into a multi-model mean result. This analysis indicates that the largest contributors to the decreasing trend in τ CH 4 are O 3 , J O 1 D, NO x , and H 2 O, while CO also imparts a large degree of interannual variability. Features due to strong ENSO events and associated biomass burning as well as the eruption of Mount Pinatubo are discernible in the time series of attributed variations in τ CH 4 . In particular, the species CO, H 2 O, and O 3 instigate prominent responses during strong El Niño years. Finally, the attributed trends in τ CH 4 from the NN analysis of CCMI model output are compared to trends in tropospheric mean OH concentration found previously in the observation-based study of Nicely et al. (2018). While the strong response of τ CH 4 to increasing H 2 O over time appears to be a robust result, disagreement on the methane feedback on OH between the two studies highlights limitations in the approaches of both, in addition to more systemic issues in the community's ability to model methane.
The NN and machine learning methods in general provide a valuable tool for performing insightful model intercomparisons of complex systems in a computationally efficient manner. These approaches, however, must be undertaken with care to avoid erroneous results and recognition of their limitations. At present, we have devised a method to identify the drivers of OH variations, whether between models or between years, at coarse temporal resolution. Much future work is needed, though; observations must be incorporated to introduce a ground truth element to this analysis in a manner that either adjusts for or avoids disconnects between coarse versus local/instantaneous spatiotemporal scales and appropriately accounts for measurement uncertainty; an analysis of model output with much higher temporal frequency is needed to identify exactly where model differences in chemical mechanisms lie; and subsequent studies of why the various OH precursor and sink fields differ are required to make this analysis of greatest utility for improving model representations of τ CH 4 . While these challenges are significant, they are not insurmountable, especially as machine learning and other advanced statistical analysis techniques continue to be developed and honed.
Data availability. All output from most of the models that participated in CCMI is available at the Centre for Environmental Data Analysis (CEDA), the Natural Environment Research Council's Data Repository for Atmospheric Science and Earth Observation, at http://data.ceda.ac.uk/badc/wcrp-ccmi/data/CCMI-1/output (CEDA Archive, 2019). WACCM and CAM4-Chem output for CCMI is available for download at http://www.earthsystemgrid. org (Climate Data Gateway at NCAR, 2019). For instructions for access to both archives, see http://blogs.reading.ac.uk/ccmi/ badc-data-access (IGAC/SPARC, 2019). Output from the models that were not formal participants in CCMI phase 1 is available from the co-authors who performed the model simulations; please contact the corresponding author with requests. A complete set of figures and tables generated by the model intercomparison and time series analyses is available at https://doi.org/10.13016/vvbp-p6o8 (Nicely et al., 2020).
Author contributions. JMN and RJS conducted initial design of the method. JMN carried out the analysis. Development and refinement of the analysis were further guided by BND, GMW, and TFH. All other authors provided model output central to the analysis. JMN drafted the manuscript, and all co-authors provided assistance in finalizing the figures and text.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Chemistry-Climate Modelling Initiative (CCMI) (ACP/AMT/ESSD/GMD inter-journal SI)". It is not associated with a conference.
Acknowledgements. Julie M. Nicely was supported by an appointment to the NASA Postdoctoral Program at the NASA Goddard Space Flight Center, administered by the Universities Space Research Association under contract with NASA. The authors also acknowledge the joint WCRP SPARC-IGAC Chemistry-Climate Model Initiative (CCMI) for organizing and making available the suite of model simulations used here. Special thanks is extended to the modelling groups which, at times, provided extra output that enabled this intercomparison to take place with the maximum number of participants. We also thank many colleagues who engaged in helpful discussions that shaped the direction of this work and inspired additional analyses, including, but not limited to, Vaishali Naik, Sarah Strode, Jason St. Clair, and Melanie Follette-Cook. This work was partly supported by JSPS KAKENHI grant no. JP19K12312. Laura E. Revell acknowledges China Southern for partial support. The EMAC simulations have been performed at the German Climate Computing Centre (DKRZ) through support from the Bundesministerium für Bildung und Forschung (BMBF). DKRZ and its scientific steering committee are gratefully acknowledged for providing the HPC and data-archiving resources for this consortial project ESCiMo (Earth System Chemistry integrated Modelling). Guang Zeng and Olaf Morgenstern acknowledge funding under the New Zealand Government's Strategic Science Investment Fund (SSIF).
Financial support. This research has been supported by an appointment to the NASA Postdoctoral Program at the NASA Goddard Space Flight Center, administered by the Universities Space Research Association under contract with NASA; the JSPS KAKENHI (grant no. JP19K12312); China Southern; the Bundesministerium für Bildung und Forschung (BMBF); and the New Zealand Government's Strategic Science Investment Fund (SSIF).
Review statement. This paper was edited by Paul Young and reviewed by Peer Johannes Nowack, Leif Denby, and one anonymous referee.