Constraints and biases in a tropospheric two-box model of OH

The hydroxyl radical (OH) is the main atmospheric oxidant and the primary sink of the greenhouse gas CH4. In two recent studies, constraints on the hydroxyl radical (OH) were derived using a tropospheric two-box model of methyl chloroform (MCF) and CH4. When OH variations as derived in this set-up were propagated to the CH4 budget, the constraints on OH from MCF still allowed for a wide range of CH4 emission scenarios. This is important, because global CH4 emissions are generally 5 considered best constrained by the global lifetime of CH4, which is determined mainly by OH. Here, we investigate how the use of a tropospheric two-box model in these studies can have affected derived constraints on OH, due to the simplifying assumptions inherent to a two-box model. First, instead of prescribing fixed model parameters for interhemispheric transport, chemical loss rates and loss to the stratosphere, we derive speciesand time-dependent quantities from a full 3D transport model simulation. We find significant deviations between the magnitude and time-dependence of the parameters we derive, 10 and the assumptions commonly reported and adopted in literature. Moreover, using output from the 3D model simulations, we investigated differences between the burden seen by the surface measurement network of the National Oceanic and Atmospheric Administration and the true tropospheric burden. Next, we accounted for these biases in a two-box model inversion of MCF and CH4, to investigate the impact of the biases on OH constraints. We find that the sensitivity of interannual OH anomalies to the biases is modest (1-2%), relative to the significant uncertain15 ties on derived OH (5-8%). However, in an inversion where we implemented all four bias corrections simultaneously, we did find a shift to a positive OH trend over the 1994-2015 period. Moreover, the magnitude of derived global mean OH and by extent that of global CH4 emissions are affected much more strongly by the bias corrections than their anomalies (∼10%). In this way, we identified and quantified direct limitations in the two-box model approach that can possibly be corrected for when a full 3D simulation is used to inform the two-box model. This derivation is, however, an extensive and species-dependent 20 exercise. Therefore, a good alternative would be to move the inversion problem of OH to a 3D model completely. It is crucial to account for the limitations of two-box models in future attempts to constrain the atmospheric oxidative capacity, especially because though MCF and CH4 behave similarly in large parts of our analysis, it is not obvious that this should be the case for alternative tracers that potentially constrain OH, other than MCF. 1 Atmos. Chem. Phys. Discuss., https://doi.org/10.5194/acp-2018-798 Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 16 August 2018 c © Author(s) 2018. CC BY 4.0 License.

resolvable.In future attempts to improve constraints on the atmospheric oxidative capacity through the use of simple models, a crucial first step is to consider and account for biases similar to those we have identified for the two-box model.

Introduction
For the interpretation of atmospheric observations in the context of, for example, atmospheric pollution, or in that of global warming, atmospheric models are often used.Atmospheric models vary in complexity from simple one box models to stateof-the-art 3D transport models.Different types of models are suitable for addressing different types of problems to different degrees of scrutiny.Therefore, there is no model category that fits all problems.Simple box models are easy to set up, computationally cheap, and transparent.For these and other reasons, their use in atmospheric studies is ubiquitous and has provided useful insights (e.g.Quay et al. (1999); Walker et al. (2000); Montzka et al. (2011); Schaefer et al. (2016); Schwietzke et al. (2016)).However, simple box models also put limitations on the derived results, as they are by definition less comprehensive than complex models.For example, box models do not explicitly contain much information on a species' spatial distribution, which can be important if interacting quantities (e.g.loss processes) are distributed non-homogeneously in space.Where exactly these limitations lie and what the gain is from increasing model complexity can be difficult to diagnose, and depends on the application.
A problem that has often been approached in box models is that of constraining the global atmospheric oxidizing capacity, which is largely determined by the tropospheric hydroxyl radical (OH) concentration (Montzka et al., 2000(Montzka et al., , 2011)).OH is dubbed the detergent of the atmosphere for its dominant role in the removal of a wide variety of pollutants, including urban pollutants (CO, NO x ), greenhouse gases (CH 4 , HFCs), and HCFCs, which are greenhouse gases, and also contribute to stratospheric ozone depletion.The budgets of many of these pollutants have been strongly perturbed since pre-industrial times, and it is important to understand what consequences this has had in the past, and could have in the future, for the atmosphere's oxidizing capacity.
Due to its high reactivity, OH has a lifetime of seconds, which inhibits extrapolation of direct measurements.Moreover, OH abundance is the net result of many different reactions and reaction cycles, and thus modelling it bottom-up in full-chemistry models is complex and dependent on uncertain emission inventories of the many gases involved.Therefore, the most robust observational constraints on OH on the larger scales are thought to be derived indirectly from its effect on tracers: gases that are predominantly removed by OH.Depending on how well the tracer emissions are known, the time evolution of the global mixing ratio of such a tracer can serve to constrain OH.The most well-established tracer for this purpose is methyl chloroform (MCF) (e.g.Montzka et al. (2000); Bousquet et al. (2005)).In part, this is because it was identified early on as a tracer with a well-defined production inventory that allowed emission estimates with small errors, relative to other gases (Lovelock (1977); Prinn et al. (1987)).Moreover, production of MCF was phased out in compliance with the Montreal Protocol, and the resulting rapid drop in emissions made loss against OH the dominant term in the MCF budget (Montzka et al., 2011).
Research and debate surrounding OH (Krol and Lelieveld (2003); Krol et al. (2003); Reimann et al. (2005); Prinn et al. (2005); Rigby et al. (2013); McNorton et al. (2016)) has lead to considerable improvements in its constraints: for example, a likely upper bound on global interannual variability of OH of a few percent (Montzka et al., 2011).Two recent studies derived OH variations in a tropospheric two-box model through an inversion of atmospheric MCF and CH 4 observations (Rigby et al. (2017); Turner et al. (2017)).In such an inversion, a range of parameters is optimized (most prominently emissions of MCF and CH 4 , and OH), so that the modelled mixing ratios best match atmospheric observations of the tracers involved.
Both studies found that constraints on OH in this set-up were weak enough that a wide range of OH concentration variations over time and, by extent, CH 4 emission scenarios were possible as an explanation for the post-2007 increase in its measured global mole fraction.This is an important conclusion, because the CH 4 growth rate, combined with the CH 4 lifetime (in turn dominated by MCF-derived OH), is generally assumed to provide the strongest top-down constraints on global CH 4 emissions and variations therein.We note that in Rigby et al. (2017) the two tropospheric boxes were supplemented by a single stratospheric box, making it technically a three-box model.However, due to our focus on the troposphere, we hereafter treat this type of model too as a two-box model, and where relevant we discuss the implication of the addition of a stratospheric box.
There are two important reasons to approach the problem of constraining OH in a model of exactly two tropospheric boxes.
Firstly, through the focus on annual timescales and hemispheric spatial scales, the result is only sensitive to interannual variability in large-scale transport of the modelled tracers.Moreover, by focusing on interannual variability as opposed to absolute OH or emission levels, remaining systematic offsets are not thought to significantly affect the outcome.
Secondly, a crucial part of the optimization consists of disentangling the influence of OH and that of emission variations on observed MCF mixing ratios.Ideally, MCF emission variations would be prior knowledge.However, though MCF production is well documented, the emission timing is much more uncertain (McCulloch and Midgley, 2001).MCF was mainly used as a solvent in, for example, paint and degreasers of metals.In these applications, MCF is released only when used, rather than when produced, which results in uncertainty in the emission timing.Moreover, due to the continuing decline of the atmospheric MCF mixing ratios, small, ongoing MCF emissions could eventually become important.Observation-inferred emissions exceeding bottom-up emission inventories have been identified both from the U.S. (Millet and Goldstein, 2004) and from Europe (Krol et al., 2003), as well as from other processes, such as MCF re-release from the ocean (Wennberg et al., 2004).Therefore, in the absence of other constraints, emission uncertainties would limit the use of MCF for deriving interannual variability of OH.However, in a two-box set-up, an additional constraint is provided by the IH mole fraction gradient of MCF.Emission inventories show that MCF emissions are predominantly located in the Northern Hemisphere (NH), whereas OH has a NH to SH ratio that is uncertain, but the ratio has a likely range of 0.80 to 1.10 (Montzka et al. (2000); Patra et al. (2014)).This means that emission variations have a strong effect on the IH mole fraction gradient of MCF, whereas the effect of large-scale OH variations is much weaker.Thus, the IH gradient is an important piece of information that can help to disentangle the influence of emissions from the influence of OH on MCF growth rate variations.This use of the IH gradient for constraining global emissions of anthropogenically emitted gases has also been recognized in previous research (Liang et al. (2017);Montzka et al. (2018)).
Despite the appealing degree of simplicity offered by the two-box model, its results still hinge on many simplifying assumptions, both explicit (e.g.interhemispheric transport) and implicit (e.g.intrahemispheric transport).In this context, the uncertain outcome of the two recent two-box model studies put forward an important question: how do the simplifying assumptions inherent to the two-box set-up affect the conclusions drawn from it?Or, conversely, would these conclusions change when moving the analysis to a 3D transport model?A recent study (Liang et al., 2017) partly explored these questions.The study investigated how to incorporate information from 3D transport models in a two-box model, to increase the robustness of two-box model derived constraints on OH.They found that there are key parameters in the two-box model that can be tuned to better represent the 3D simulation results, and thus ideally better represent atmospheric transport in general.For example, they found that IH transport rates can be species-dependent.
Here, we provide a different approach to the issue.In the first part of our study, we parametrized results from the 3D global transport and chemistry model TM5 into a two-box model.Through this parametrization, we explored difficulties in the translation from the 'reality' of a 3D transport model to a two-box model, and the assumptions made in the process.We focused on four aspects of the parametrization.
Firstly, we investigated the tracer-dependent nature of IH transport as reported by Liang et al. (2017).Secondly, we analysed the IH OH ratio.Previous research has shown that because of tracer-specific source-sink distributions, different tracers can be exposed to different global mean OH concentrations (Lawrence et al., 2001).We extended this observation to a speciesdependent IH OH ratio.Thirdly, we looked at the stratospheric loss for MCF specifically.This net loss to the stratosphere might be slowing after its emissions dropped (Krol and Lelieveld (2003); Bousquet et al. (2005)).Fourthly, we used the 3D simulation to investigate differences between the burden seen by the surface measurement network of the National Oceanic and Atmospheric Administration Global Monitoring Division (NOAA-GMD) and the true tropospheric and hemispheric burden in our 3D model: a bias that was also discussed in Liang et al. (2017).
In the second part of this study, we assessed the impact of these four potential biases on derived OH variations in a two-box inversion set-up that is very similar to Rigby et al. (2017) and Turner et al. (2017).The objective was to provide a quantitative estimate of the impact of biases in a two-box inversion, and to explore if and how these can be accounted for.Though this study is focused on the problem of OH, it also serves as a case study of potential pitfalls in two-box models in general, when applied to interpreting global-scale atmospheric observations.

Two-box inversion
In this section, we discuss the set-up of our two-box model inversion.The model incorporated two tracers (MCF and CH 4 ) and consisted of two boxes (the troposphere in the NH and in the SH), which were delineated by a fixed equator.The stratosphere was implicitly included in the model through a first-order loss process that was taken to be equal for both hemispheres.The governing equations for a tracer mixing ratio X are given in Equation 1.
Thus, within each hemisphere, there were emissions (E), loss to OH (k OH [OH]X), loss to the stratosphere (l strat X), loss to other processes (l other X; e.g.ocean deposition), and transport between the hemispheres (k IH (X N H − X SH )). that stratospheric loss becomes a transport rather than a first-order loss term.Where relevant, we point out further differences with these previous studies.
Since the objective was to leverage observed mixing ratios to infer information on tropospheric OH, we also set up an inverse estimation framework, complementary to the above forward model.The objective of the inversion was to optimize a state x, such that the forward model best reproduced the observations, without straying too far from a first best guess: the prior.Therefore, the state is the vector which contains all parameters that needed to be optimized.The optimization objective is analogous to minimizing the cost function J, as defined in Equation 2: with B and R the prior and observation error covariance matrix respectively, H the forward model, and y the observations.In addition, we compute the cost function gradient ∇J (Equation 3).
with H T the transpose of the forward model, also known as the adjoint model.Note that because the forward model H was non-linear (e.g.OH chemistry), we used the adjoint of the tangent-linear forward model.Calculation of the cost function gradient facilitates quicker convergence of the optimization.For the minimization we used the Broyden-Fletcher-Goldfarb-Shanno algorithm.In essence, this statistical inversion set-up is the same as that used in the 4DVAR system of ECMWF (Fisher, 1995) and TM5-4DVAR (Meirink et al., 2008).
For the optimization of MCF emissions, we used an extended version of the emission model from McCulloch and Midgley (2001).This emission model was adopted to account for the varying and uncertain release rates of MCF when used in different applications (e.g.degreasing agent, paint).This uncertainty results in a gap between the uncertainty in production, or integrated emissions (∼2%), and the uncertainty in annual emissions (up to 40%) (McCulloch and Midgley, 2001).Therefore, production was distributed between four different categories with different release rates: rapid, medium, slow and stockpile.In the prior distribution, the bulk of production (> 95%) was placed in the rapid category.To account for uncertainty in the production inventory, we also adopted an additional emission term superimposed on the production-derived emissions.The emissions in year i were then given by Equations 4 and 5.For each year i, we optimized four parameters for MCF emissions: three parameters that shifted emissions between the rapid production category and each of the other three categories (f i M edium , f i Slow and f i Stock in Equation 6), and the additional emissions term (E i Additional ), which had an uncertainty constant through time.This emission model is similar to that used in (Rigby et al., 2017), though ours leaves more freedom with respect to the timing of emissions.
for the emissions in year i, where: and, in the optimization: An important choice in the inversion set-up is which parameters to prescribe and which to optimize.Rigby et al. (2017) optimized all parameters, so as to explore the full uncertainty of the optimization within the inversion framework.Turner et al.
(2017) only optimized hemispheric MCF and CH 4 emissions and hemispheric OH, while the remaining uncertainties were partly explored in sensitivity tests.We choose to optimize four end-products for each year: global OH, global MCF emissions, global CH 4 emissions, and the CH 4 emission fraction in the NH.Thus we had a closed system, as we also fitted to four observations: the global mean mixing ratio and the IH gradient of both MCF and CH 4 .In addition to the 4DVAR inversion, we generated a Monte Carlo ensemble, where in each realization the prior and the observations were perturbed, relative to their respective uncertainties.Then, the new prior was optimized using the new observations.The Monte Carlo simulation quantified the sensitivity of the optimization to the prior choice and to the realization of the observations.The Monte Carlo set-up also allowed us to explore the sensitivity of the inversion to parameters that were not optimized, such as the fraction of MCF emissions in the NH.This approach had the added advantage that parameters that were perturbed in the Monte Carlo simulation, but not optimized in the 4DVAR system, did not need to have a Gaussian error distribution, which would normally be a prerequisite in a 4DVAR inversion.The specifics of our inversion set-up are given in Table 1.
2.2 TM5 set-up and two-box parametrizations

3D model set-up
For the 3D model simulations we used the atmospheric transport model TM5 (Krol et al., 2005).The model was operated at a 6 • × 4 • horizontal resolution, at 25 vertical hybrid sigma pressure levels.The simulation period was 1988-2015, where we treated 1988 and 1989 as spin-up years.TM5 transport was driven by meteorological fields from the ECMWF ERA-Interim reanalysis (Dee et al., 2011).Convection of tracer mass was based on the entrainment and detrainment rates from the ERA-Interim dataset.This is an update from the previous convective parametrization used by, for example, Patra et al. (2011).The new convective scheme results in in faster interhemispheric exchange of tracer mass, more in line with observations (Tsuruta et al., 2016).
We ran TM5 with three tracers: CH 4 , MCF and SF 6 .For CH 4 , we annually repeated the 2009-2010 a priori emission fields used by Pandey et al. (2016), and we also used the same fields for stratospheric loss to Cl and O( 1 D).For MCF, we used emissions from the TransCom-CH 4 project (Patra et al., 2011).Since these emissions were available only up to 2006, we assumed a globally uniform exponential decay of 20%/year afterwards, similar to Montzka et al. (2011).MCF-specific loss fields (ocean deposition and stratospheric photolysis) were also taken from the TransCom-CH 4 project.Details of the MCF loss and emission fields can be found in the TransCom-CH 4 protocol.The OH loss fields we used were a combination of the 3D fields from Spivakovsky et al. (2000) in the troposphere and stratospheric OH as derived using the 2D MPIC chemistry model (Brühl and Crutzen, 1993).The OH fields were scaled by a factor 0.92, as described by Huijnen et al. (2010).For SF 6 , we used emission fields from the TransCom Age of Air project (Krol et al., 2017), with no loss process implemented.
Since the above set-up is simplistic in some aspects (e.g. annually repeating CH 4 emissions), we also ran a 'nudged' simulation.In the nudged simulation, we scaled the mixing ratios of a tracer up or down in latitudinal bands, depending on the mismatch of the model with NOAA observations (analogous to Bândȃ et al. (2015)), with a relaxation time of 30 days.This method ensured that the model followed the long-term trend in observations, without requiring a full inversion.The nudged simulation provided a test of the sensitivity of our results to the source-sink distributions we used in the 3D simulation.

Parametrizing 3D model output to two-box model input
Here we outline how we used the TM5 simulations to derive two-box model parametrizations for stratospheric loss (l strat ) and for interhemispheric exchange (k IH ).Firstly, the 3D fields were divided in 3 boxes: the troposphere in the NH and in the SH, and the stratosphere.The border between the hemispheres was taken as the equator, fixed in time.Where relevant we discuss the sensitivity of our results to this demarcation.We defined a dynamical tropopause as the lowest altitude where the vertical temperature (T ) gradient is smaller than 2 K/km, clipped at a geopotential height of 9 and 18 km.Our analysis was Table 1.The relevant settings we used in the inversion of our two-box model.The upper section contains the parameters optimized in the inversion, which were also perturbed in the Monte-Carlo ensemble.These parameters have Gaussian uncertainties, and their mean and 1σ uncertainty are given.The middle section contains parameters that were perturbed in the Monte Carlo, but not optimized.The middle parameters have uniform uncertainties, of which the lower and upper bound are given.The bottom section contains parameters that were neither optimized nor perturbed.For these parameters, the left column gives the standard setting, whereas the alternative column indicates whether we also ran an inversion using a TM5-derived timeseries (see Section 2.2.2).
found to be insensitive to the exact definition of the tropopause.Next, we computed an annual budget for each box.For the two tropospheric boxes, this was done as in Equation 1.This was supplemented by Equation 7for the stratospheric box.
Emissions, local loss and mixing ratios per box could be derived from the 3D model in-and output, and thus l strat and k IH could be inferred from these equations.Note that we did not strictly need the stratospheric budget equation to resolve two parameters, but we used it to resolve numerical inaccuracies.Resolving the budget of each species in this manner provided the necessary input of the tropospheric two-box model defined in Section 2.1, such that on the hemispheric and annual scale, identical results were obtained with the 3D and the two-box models.
An additional parameter that we derived from the TM5 simulations was the IH OH ratio to which each tracer was exposed.
We quantified this parameter as the ratio between hemispheric lifetimes with respect to OH (τ OH ), as in Equation 8.Note that this might differ from the physical IH OH ratio, because of correlations between the tracer distribution, the OH field and the temperature distribution.

Model-sampled observations
The standard in tracking global trends in atmospheric trace gases are surface measurement networks: for CH 4 and MCF most notably the NOAA-GMD (Dlugokencky et al. (2009); Montzka et al. (2011)) and the AGAGE (Advanced Global Atmospheric Gases Experiment) (Prinn et al., 2018) networks.By selecting measurement sites far removed from sources, the theory is that a small number of sites already puts strong constraints on the global growth rate (Dlugokencky et al., 1994).In general, quantification of the robustness of the derived growth rates based solely on observations can be difficult, since there are likely systematic biases inherent to sampling a small number of surface sites.When assimilated into a 3D transport model, these biases will largely be resolved (if transport is correctly simulated).However, when the data is aggregated to two hemispheric averages, as in a two-box model, quantification of the potential biases is crucial.
We explored the resulting bias in our model framework.By subsampling the TM5 output at the locations of NOAA stations, at NOAA measurement instances, we generated a set of model-sampled observations.These model-sampled observations were intended to be as representative as possible for the real-world observations of the NOAA network.To aggregate the station data to hemispheric averages, we used methods similar to those deployed by NOAA (for MCF: Montzka et al. (2011), with further details on our adaption in Supplement 1; for CH 4 : Dlugokencky et al. (1994)).Hemispheric averages for CH 4 were derived from 27 sites, and MCF averages from 12 sites.By comparison of the resulting products with the calculated tropospheric burden, as derived from the full tropospheric mixing ratios, we could assess how well the burden derived from the NOAA network represents the model-simulated tropospheric burden.The two end-products we investigated for each tracer were the rate of change of the global mean mixing ratio and that of the IH gradient.Note that by mixing ratio we mean the dry air mole fraction.These two parameters best reflect the information as it is used in a two-box model: the global mean mixing ratio is used to constrain the combined effect of OH and emissions, while the IH gradient is used to distinguish between the two.Note that in previous box-model studies of MCF, often only global growth rates were derived (Montzka et al., 2000(Montzka et al., , 2011)).

Potential biases in the two-box model
By concentrating on the budget of MCF, we identified three parameters that need attention in the two-box model: IH transport, the IH OH ratio, and loss of MCF to the stratosphere.In addition, we investigated the potential bias in converting station data to hemispheric averages (see Section 2.2.3 and Supplement 1).We quantified these biases and propagated them in two-box model inversions, as discussed in Section 3.2, to quantify their impact on derived quantities related to OH.

Interhemispheric transport
IH transport of tracer mass can vary because of variations in IH transport of air mass (e.g.influenced by ENSO, particularly at Earth's surface (Prinn et al. (1992); Francey and Frederiksen (2016); Pandey et al. ( 2017))), or because of variations in the source-sink distribution, and thus of the tracer's concentration distribution itself.Generally, interannual variability in IH transport is considered to be in the order of 10% (Patra et al., 2011).Two-box model studies typically assume time-invariant IH exchange (Turner et al., 2017) and/or similar exchange rates for different tracers (Rigby et al., 2017).Here we investigated whether such assumptions hold for a tracer which undergoes strong source-sink redistributions over time, such as MCF.The IH transport variations we derived for each tracer are discussed in Section 3.1.1.

Surface sampling bias
As discussed in Section 2.2.3, we explored the bias that results from representing hemispheric averages using sparse surface observations.Surface networks are a valuable resource, because they provide high-quality, long-term measurements of a growing variety of tracers.However, temporal, horizontal, and vertical coverage of surface networks is limited.In Section 3.1.2we discuss how these limitations can result in biases in two-box model observations.

The interhemispheric OH ratio
The IH ratio of OH concentrations is an uncertain parameter.This is mostly because of a mismatch between results from full-chemistry models (1.13−1.42(Naik et al., 2013)) and from MCF-derived constraints (0.80−1.10 Brenninkmeijer et al.  2017)), and is similar to the ratio we used in the TM5 simulations (0.98 (Spivakovsky et al., 2000)).However, the bias we consider here is of a different nature: it is the difference between the physical IH OH ratio and the IH loss ratio a particular tracer is exposed to.It is known that different tracers can be exposed to different oxidative capacities (Lawrence et al., 2001).Therefore, different tracers might similarly be influenced by different IH ratios in OH.We explore this bias in Section 3.1.3.CH4, 27 sites were used; for MCF, 12 sites were used).Solid lines denote averages as derived directly from the NOAA surface sampling network (which are used in our standard inversion).Dashed lines denote the same timeseries, but adjusted by correction factors that were derived from our TM5 simulations.The correction factors reflect the differences between hemispheric averages based on model-sampled observations, and hemispheric averages derived from the full TM5 troposphere.Figure 3 shows the ratios between the standard and corrected timeseries.

MCF loss to the stratosphere
The second-most important loss process of MCF is stratospheric photolysis.In our TM5 set-up, this loss process resulted in an in-stratosphere lifetime (stratospheric burden/stratospheric loss) of 4 to 5 years.It is generally assumed that this instratosphere loss translates to a global lifetime of MCF with respect to the stratosphere (global burden/stratospheric loss) of 40 to 50 years (Naik et al. (2000); Ko et al. (2013)), which corresponds to ∼10% of global MCF loss.Rigby et al. (2017) assumed a time-invariant in-stratosphere lifetime, but due to the inclusion of a stratospheric box the global lifetime with respect to stratospheric loss could vary somewhat due to changes in the troposphere-stratosphere gradient.These variations were tuned to result in a global lifetime with respect to stratospheric loss of 40 (29−63) years.Turner et al. (2017) incorporated this loss process in the OH loss term.Due to the rapid drop in MCF emissions and the relatively slow nature of troposphere-stratosphere exchange, this lifetime could vary through time (Montzka et al. (2000); Krol and Lelieveld (2003); Bousquet et al. (2005)).We will investigate this possibility in Section 3.1.4.

Standard two-box inversion and bias correction
To assess the impact of the biases discussed in Section 2.3 on a two-box model inversion, we ran our inversion (see Section 2.1) using different settings.In the standard, default inversion we did not consider any of the four biases discussed above.Thus, we used constant IH exchange (1 year), constant stratospheric loss of MCF (45 years), and a constant IH OH ratio (0.98) (see Table 1).The first three potential bias corrections were then straightforwardly implemented by replacing these constant values with the timeseries we derived for each parameter from the full 3D simulations (details in Section 2.2.2).As mentioned in Section 2.1, the inversion did not include uncertainties in these three parameters.We did this because conventional uncertainties tend to be large, and therefore, including them would have attenuated the impact of the bias corrections, while the corrections  were the main interest of this comparison.For the surface sampling bias, we first computed a correction between the hemispheric means as derived from the model-sampled observations and the calculated (TM5) hemispheric, tropospheric means (with demarcation at the equator).Then, we applied this correction to the real-world NOAA hemispheric means we used in the standard inversion.This gave a new set of observations, which we used in the inversion (discussed in Section 3.1.2).Both the standard and the corrected set of observations are shown in Figure 1.Through comparison of the results of the standard inversion and of an inversion with one or more biases implemented simultaneously, we can evaluate the individual and cumulative impact of the biases on derived OH and CH 4 emissions.

Interhemispheric transport
The IH exchange coefficients, derived for the three different tracers as described in Section 2.2.2, are shown in Figure 2.
Clearly, the exchange rates differ between tracers both in mean value, as well as in interannual variability.MCF is the clear outlier, but SF 6 and CH 4 also show different variations.The drivers of these differences are differences in intrahemispheric tracer distributions, and in the underlying source and sink distributions.The three tracers differ strongly in this respect: SF 6 and MCF are emitted almost exclusively in the NH mid-latitudes, whereas CH 4 has significant emissions in the tropics and in the SH.SF 6 has no sink implemented in our simulations, whereas MCF and CH 4 have a sink with a distinct tropical maximum in OH.This all affects how IH transport of air mass translates to IH transport of tracer mass.
Most notable is the minimum in the IH exchange rate for MCF in the 2000-2005 period.The timing of the 1989-2003 decline in k IH coincides with the initial drop in MCF emissions.An important shift in the distribution of the MCF mixing ratio is that the global minimum shifts from the South Pole to the tropics.In the same period, there is a strong vertical redistribution which has also likely impacted IH exchange.It is not obvious that these changes should result in slower IH exchange, but in the end, in TM5, they do.
Another notable feature is the positive trend in the IH exchange rate for CH 4 (+0.35 ± 0.05 %/yr; p=0.00) and for SF 6 (+0.50± 0.01 %/yr; p=0.00).For CH 4 , we used annually repeating sources, whereas for SF 6 we did include emission variations (see Section 2.2).This means that for CH 4 , changes in the source-sink distribution did not contribute to the trend or to the variability.Indeed, in a simulation with annually repeating meteorology, we found near-zero variability in k IH for CH 4 (see Supplement 4).Therefore, there is something in the combination of the meteorological data, the treatment of this data in TM5 and the source-sink distribution of both CH 4 and SF 6 which resulted in a significantly positive trend in the IH exchange rate of both gases.This trend could either indicate an acceleration of IH transport of air mass, or a shift in the pattern of IH transport which favours IH exchange of CH 4 and SF 6 .It is unclear from this analysis what the underlying mechanism is exactly, except that it is driven by temporal variations in transport and thus that there are parameters in the meteorological fields which also show a trend: otherwise this final product cannot exhibit a trend.However, it might be that the sensitivity of TM5 transport to these parameters is biased.
To test the sensitivity of the derived IH exchange rates to the source-sink distribution, we compared k IH derived from the standard simulation to the nudged simulation (the nudging procedure is explained in Supplement 2).IH transport of CH 4 as derived from the nudged simulation showed higher interannual variations than in the standard simulation (more discussion in Supplement 4), which can be expected, as the source-sink distribution becomes more variable.However, the general characteristics were conserved: most notably, the positive trend over the entire period persisted, for CH 4 and for SF 6 .For MCF, we find that the general characteristics of derived k IH are similarly insensitive to nudging, with the main change being a deeper 2000-2005 minimum in the nudged simulation.In the end, we deem the anomalies presented in Figure 2 quite robust with respect to the spatio-temporal source-sink distribution.
When the hemispheric interface is shifted from the equator to 8 • N, which is more representative of the average position of the ITCZ, the IH exchange rate increases for all tracers, but the variability in IH exchange of CH 4 and SF 6 remains largely unaffected (see Supplement 4).However, for MCF, the variability shifts completely.Rather than decreasing after the emission drop, the IH exchange rate now increases.This sensitivity reflects that for a tracer with a relatively small IH gradient which minimizes in the tropics, it becomes difficult to define an IH transport rate in a two-box model.By extension, care should be taken when interpreting the IH gradient of MCF in later years, since the influence of IH transport is difficult to isolate.
Sensitivities in the derivation of the IH exchange rate are discussed in more detail in Supplement 4.

Surface sampling bias
Figures 1 and 3 show the surface network bias in the global mean mixing ratios and in the IH gradient.In Figure 3, the bias is quantified as the ratio between values derived from the model-sampled observations (see Section 2.2.3) and values derived from the hemispheric (TM5) tropospheres.A comparison with global mean mixing ratios derived from real-world NOAA observations is given in Supplement 2.  2. Mean observational errors as derived from TM5 simulations over the 1994-2015 period.The errors were quantified as the mean difference between annual means derived from model-sampled observations and annual means derived from the full tropospheric grid.CH4 uncertainties are given both in ppb/yr and relative to the global mean mixing ratio.Uncertainties for MCF are only given relative to the global mean, because of its strong temporal decline.
The bias in the IH gradient was particularly large, because averages based on NOAA surface stations systematically overestimated the tropospheric burden in the NH and underestimated the burden in the SH.Two important effects contributed to this bias.Firstly, in the NH, where most emissions were located, mixing ratios tended to decrease with altitude, while in the SH vertical gradients were much smaller, or even reversed.Secondly, latitudinal gradients of both MCF and CH 4 tended to be highest in the tropics, where few or no measurement sites were available.Again, due to high emissions in the NH, mixing 5 ratios in the NH decreased towards the equator, while mixing ratios increased towards the equator in the SH.Both biases were of opposite sign in each hemisphere.Thus, in a global average, these biases largely cancelled, and only a small overestimate remained (left panel in Figure 3).For the IH gradient, however, these biases added up, which resulted in an overestimate of the IH gradient by surface stations of up to 20-40% (right panel in Figure 3).For MCF before 1995 and for CH 4 throughout the analysis period, the bias from the vertical gradient dominated.The shift in the bias for MCF was driven by a shift in the latitudinal gradient.The IH gradient of MCF got a minimum in the tropics and apparently this exacerbated the effect of the lack of tropical stations, combined with the simple, linear latitudinal interpolation we adopted for MCF (see Supplement 1).
We note that the derived bias in the IH gradient is sensitive to the demarcation of the two tropospheric boxes.When we shifted the IH interface from the equator to 8 • N, the bias was reduced to 15% for CH 4 , and varied between 15 and 25% for MCF.The trend in the IH bias of MCF became smaller, but persisted.Liang et al. (2017) performed a similar analysis for MCF.They reported a similar low to absent bias in the global mean and a more significant bias in the IH gradient of MCF (∼ 10%).This is smaller than the bias we found, even if we demarcated the hemisphere at 8 • N.However, an important difference is that in Liang et al. (2017) model-sampled observations were compared to the surface grid, instead of to the full troposphere.Thus, their bias estimate did not include vertical effects.When we used the surface grid as a reference, the IH bias for CH 4 was reduced to −10%: i.e. it reversed.For MCF the bias shift persisted, and the maximum bias was only slightly reduced to 15%, indicating a dominant influence from the latitudinal dimension.We emphasize that for a tropospheric two-box model the comparison with the full troposphere is most relevant.This analysis also provided an estimate of uncertainties in the rate of change of the global mixing ratio and in that of the IH gradient: the relevant observational parameters in a two-box inversion.Table 2 gives the differences between the quantities derived from model-sampled observations and from the full troposphere, i.e. the "true" (TM5) error.We can compare this TM5derived uncertainty to uncertainties derived only from observations, which we used in the two-box inversions.For CH 4 , we used uncertainties as reported by NOAA.These were obtained by generating an ensemble of surface network realizations, where in each realization different sites are excluded or double-counted randomly (bootstrapping).For each realization, aggregated quantities such as the global mean growth rate can be derived.The spread within the ensemble then provides a measure for the uncertainty.For MCF no such uncertainties are reported.Therefore, we developed our own method, which is described in Supplement 1.
Following these methods, we found observation-derived uncertainties in the global mean growth rate of around 0.60 ppb/yr and 0.6%/yr for CH 4 and for MCF respectively.NOAA does not report an uncertainty in the IH gradient of CH 4 , but error propagation from hemispheric means gave an uncertainty of 1.1 ppb/yr.For MCF, we found a time-dependent uncertainty in the rate of change of the IH gradient of 1.0 − 1.5%.
The CH 4 errors we derived from the TM5 simulation were slightly higher than the uncertainties reported by NOAA.Furthermore, since we used annually repeating CH 4 emissions, variations in CH 4 emissions can further increase the error.Indeed, the nudged run (see Supplement 2) resulted in 20% higher uncertainties.However, it is important to note that the CH 4 uncertainties reported by NOAA are intended to reflect the match with the marine boundary layer (MBL), rather than with the full troposphere.Therefore, it is not surprising that the errors we find are somewhat higher.
For MCF, we adopted observation-derived uncertainties that were significantly lower than those used by Rigby et al. (2017) and Turner et al. (2017): both studies reported uncertainties of around 5% in hemispheric averages.Both studies used different methods, that were grounded on different observational information.In Rigby et al. (2017), temporal variability dominated the uncertainty estimate, while in Turner et al. (2017) spatial variations were used.Our method is more similar to Rigby et al. (2017), but with modifications that averaged out some of the temporal variability, under the assumption that variability at different measurement sites was largely uncorrelated (details in Supplement 1).This shows that observation-derived uncertainties  8).Additionally, the IH ratio in OH concentrations is shown.
in MCF averages are uncertain quantities, in large part due to the relatively low number of available surface sites.Therefore, the uncertainty derived from TM5 is an especially useful addition for MCF.
Table 2 shows that TM5-derived uncertainties in MCF averages are significantly lower than all observation-derived estimates.
This results indicates that even the use of a simple averaging algorithm and a small number of surface sites, relative to what is available for CH 4 , already results in well-constrained hemispheric and global growth rates for MCF.The TM5-derived estimate thus supports the use of our observation-derived uncertainty estimates, rather than the higher estimates used in previous studies.

Interhemispheric OH ratio
In the TM5 simulations from which the global loss rates were derived, the prescribed tropospheric OH fields were taken from Spivakovsky et al. (2000).In these fields, the IH OH ratio is 0.98, when the IH interface is considered to be the equator.One might expect a similar ratio between OH loss in the NH and in the SH, which we quantified through the IH ratio in tracer lifetime with respect to OH loss (Equation 8).We found that this is not the case (see Figure 4).
The loss ratio was up to 7% higher than the physical OH ratio.Moreover, the ratio was not the same for MCF and CH 4 , and the ratio that corresponded to MCF showed a trend.The IH asymmetry in temperature in our model was small, so that it didn't explain the different between the IH loss and the IH OH ratio.Instead, we found that the systematic positive offset was largely driven by an IH asymmetry in the spatio-temporal correlations between OH and temperature.Mostly, this was because the OH maximum in the NH was located at lower altitude than in the SH in our 3D model.Since at low altitudes temperatures are higher, and higher temperatures correspond to higher reaction rates, this asymmetry resulted in relatively high NH loss rates.
As such, the ratio bias was sensitive to the OH distribution used in the 3D model simulation.
The trend in the ratio for MCF was driven by the change in the spatial distribution of MCF after the emission drop in the mid-90s.Before the drop, the IH gradient of MCF was emission-driven and high (25%).This resulted in a negative correlation between OH/temperature and MCF in the NH, which drove the initially lower loss ratio.After the emission drop, the IH Figure 5.The tropospheric loss rate to the stratosphere, as derived from the TM5 simulations (see Section 2.2.2).
gradient became largely sink-dominated, and dropped to 3%.The ratio then became similar, though not identical, to that of CH 4 , which also has a relatively low IH gradient (5%).The exact reasons for the IH asymmetry in the OH loss rate were complex: further details are discussed in Supplement 3.
The derived IH OH ratio was sensitive to the demarcation of the two tropospheric boxes.When we shifted the position from the equator to 8 • N, all IH OH ratios were reduced by 10 to 15%.However, the offset between the physical IH OH ratio and the actual loss ratio remained similar, as did the trend in the loss ratio for MCF.

Loss to the stratosphere
Figure 5 shows the stratospheric loss rate, as derived from Equations 1 and 7. Most notably, the stratospheric loss rate showed a significant negative trend for MCF, decreasing by 68% from 1991 to 1997.The MCF lifetime with respect to stratospheric loss in 1990 as calculated from TM5 was similar to the range reported in literature: 40 to 50 years (Naik et al. (2000); Ko et al. (2013)).Afterwards however, the corresponding timescale for stratospheric loss quickly increases.As loss to the stratosphere is a secondary loss process, it is generally assumed that variability in MCF loss is driven predominantly by OH variations (Montzka et al. (2011);Turner et al. (2017); Rigby et al. (2017)).Here, we found that this is not necessarily the case.The decline in loss to the stratosphere was not an artefact resulting from treating a transport process as a loss process: when taking the exchange proportional to the troposphere-stratosphere gradient, we still found a decrease in the exchange rate of 63%.
Previous research has identified that the tropospheric lifetime with respect to stratospheric loss could be decreasing (Krol and Lelieveld (2003); Prinn et al. (2005); Bousquet et al. (2005)), but not to the degree that we found here, and not relative to the troposphere-stratosphere gradient.This is important, because it means that also a three-box model with an explicit stratospheric box, such as in Rigby et al. (2017), would not capture the decline.
The explanation we suggest for the increase in MCF lifetime with respect to stratospheric loss has to do with the nature of troposphere-stratosphere exchange, which consists of an upward and a downward flux.In practice, as MCF emissions decreased, the troposphere started to transport air to the stratosphere which was exposed to lower MCF emissions, while the stratosphere was still transporting older air back to the troposphere (in the downward branch of the Brewer-Dobson circulation (Butchart, 2014)) that was exposed to higher MCF emissions.Therefore, the delay between the two opposed fluxes resulted in a reduced net upward flux rate in an atmosphere with decreasing emissions compared to an atmosphere with increasing or constant emissions.Consistent with this hypothesis, we found that the stratospheric loss rate did not decrease in a TM5 simulation with MCF emissions fixed at 1988 levels, and that stratospheric loss did decrease, but recovered, when we fixed emissions at 2005 levels over the entire analysis period (results not shown).This also implies that the troposphere-stratosphere exchange will slowly recover when MCF emissions stop decreasing.
For CH 4 , we found a stratospheric lifetime of 160-170 years, similar to the range reported in Chipperfield and Liang (2013).
For SF 6 , there was no loss process implemented in our model.However, storage of SF 6 in the stratosphere acted as an effective sink to the troposphere, with a lifetime of 100-160 years.

Two-box inversion results
In this section, we present a comparison between the results of the standard inversion and an inversion that incorporated the four bias corrections (referred to as "four-biases").The inversion set-ups are described in Section 2.4.The OH and CH 4 emission anomalies of both inversions are presented in Figure 6, along with uncertainty envelopes of one standard deviation.
The envelopes are wide, and with respect to these envelopes there were no significant differences between our two inversions.
Interestingly, differences between the two inversions were the smallest in the 1998-2007 period, during which MCF provides the strongest constraints on OH (Montzka et al., 2011).Note that the final analysis period started from 1994 (rather than from 1990), because we only had sufficient NOAA coverage of MCF available from 1994 onwards.
Shown in grey in Figure 6 are the anomalies derived by Rigby et al. (2017) (from the NOAA dataset) and by Turner et al. (2017).The four inversions showed qualitatively similar time-dependencies, and differences generally fell within one standard  et al. (2017) it was shown that the use of a different dataset can result in different OH anomalies, though these differences were insignificant with respect to their uncertainty envelopes.Also visible is the uncertainty envelope of one standard deviation from Rigby et al. (2017), which is notably larger than our envelopes.This is likely due to a combination of the higher observational uncertainties and the higher number of optimized parameters adopted in Rigby et al. (2017).Further discussion of differences with these two studies is provided in Section 4.
It is illustrative to further investigate how the identified biases impact the results.For this purpose, Table 3 presents five metrics for each of the two inversions, as well as for inversions where we implemented the bias corrections one-by-one (taking standard settings for the other parameters).
The first metric is the mean absolute error (MAE) in the OH anomalies between each respective inversion and the standard inversion.The MAE provides an estimate of how much the OH estimate in a given year is affected by accounting for the bias.The highest MAE of 1.3% is small compared to the full envelope of each individual OH inversion (3 − 4%).This means that in terms of interannual variability over the entire period, the outcome was not much affected by the biases.However, as most biases showed their strongest trends over short periods, the peak values of the differences between inversions even out somewhat when averaging over the entire period.
Secondly, we derived an OH trend for each inversion set-up.As described in Section 2.1, we mapped the uncertainty of each inversion set-up in a Monte Carlo ensemble of inversions.We fitted a linear trend to the derived OH timeseries of each ensemble member.From the resulting collection of linear fit coefficients, we derived a mean linear fit coefficient and its standard deviation.Differences between the OH trends derived from the different inversions are insignificant.However, it is interesting to see that when all four biases are combined, we derived a shift to more positive OH trends.In the standard inversion, 43% of the ensemble shows a positive trend, whereas in the four-bias inversion 88 % of the ensemble shows a positive trend.
The final three metrics are the tropospheric lifetime of MCF with respect to OH ((k M CF +OH [OH]) −1 , as in Equation 1), the total tropospheric lifetime of CH 4 ((k CH4+OH [OH]+l other ) −1 , as in Equation 1) and the derived global mean CH 4 emissions, averaged over the 1994−2015 period.For global CH 4 emissions, we added the soil sink (32 [26−42] Tg/yr (Kirschke et al., 2013)), which was not included in the two-box model set-up.Naturally, these three are strongly correlated.When we compare the relative differences in for example the lifetime of MCF with respect to OH between different inversion set-ups to the MAE in anomalies, it is clear that the systematic offset between the different inversions (up to 10%) was much higher than the differences in anomalies (up to 1.3%).This is similar to what was seen for the biases themselves, where the systematic component tended to be much higher than the temporal variations (e.g. the bias in the IH OH ratio, shown in Figure 4).We discuss this offset in more detail in Section 4.

Discussion
A first point that deserves discussion is the low global CH 4 emissions  we derived compared to those reported in literature.Our best estimate corresponds to 539 Tg/yr (  (Bousquet et al., 2005).In their study, CH 4 emissions of 525 ± 8 Tg/yr were found over the 1984-2003 period, so that their estimate does not include the renewed CH 4 growth.
We found that several factors contribute to the differences.Firstly, in the model used by Turner et al. (2017) the atmospheric mass was taken as the global atmospheric mass (5.15•10 18 kg), whereas we used the tropospheric mass (4.4•10 18 kg).When we ran our two-box inversion with the global atmospheric mass, we also found emissions close to 600 Tg/yr.Secondly, we could close the gap with Rigby et al. ( 2017) by adjusting our a priori two-box model parameters.Specifically, when we adopted an IH exchange rate and an IH OH ratio similar to theirs (1.4 yr −1 and 1.07 respectively) in our standard inversion, we found global CH 4 emissions of 595 Tg/yr.This points to a strong sensitivity of the derived CH 4 emissions to these parameters of the two-box model, which in our case are derived from full 3D TM5 model simulations.
In our standard 3D simulation, the IH gradient of MCF tended to be overestimated compared to observations from the NOAA network up to 2005, while global mean mixing ratios were captured much better.In another recent study, an effort was made to find tracer alternatives to MCF (Liang et al., 2017).For this, their suggested method was to use 3D model output to improve the results of a two-box model through intelligent parametrizations.Clearly, this is similar to the work described here.For example, similar to us, they found different IH exchange time scales for different tracers.However, we explicitly resolved the two-box model in the 3D framework, while their study focused mostly on fitting parameters empirically to find a match between two-box and 3D model results.Additionally, for the parametrization, Liang et al. ( 2017) used hemispheric mean mixing ratios derived from the surface network, whereas we based mixing ratios on the full (hemispheric) troposphere in TM5.We identified a trend and strong, persistent variations in IH transport (CH 4 and MCF) and in the surface sampling bias (MCF) which were not identified in Liang et al. (2017).Additionally, they described a two-box strategy in which two tracers are used to derive the IH OH ratio, which can then also be used for other tracers.
Our work suggests that there should be careful consideration of different IH OH ratios seen by different tracers, and potential trends therein.A two-box inversion is sensitive to the IH OH ratio, and we have shown that the effective IH OH ratio a tracer is exposed to depends strongly on that tracer's source-sink distribution.Some of the differences between their findings and ours may be explained by the definition of hemispheric mean mixing ratio (surface-based versus full troposphere), but further reconciliation of the two approaches in future research is necessary.
It is worth noting that the TM5 model, on which the two-box parametrization is based, has its own limitations, and so has treating TM5 as 'the truth'.For example, our simulations were done on the coarse horizontal resolution of 6 • × 4 • .This will have impacted how well NOAA background sites are actually situated in the background.We checked that the TM5-derived observational timeseries were not systematically more polluted than the real-world NOAA-GMD observations.For this, we detrended and deseasonalized the CH 4 and MCF timeseries per surface site, and quantified the spread in the residuals.At most sites, we found no offset between residual spread in the TM5-derived versus the real-world timeseries.At a small number of sites, TM5-derived timeseries showed more spread in residuals, while at others the spread was less.Therefore, we found no evidence for systematic biases in TM5-sampled observations.Additionally, any transport model is susceptible to some form of transport errors, and using a different 3D model for the two-box parametrization will likely result in different parameters.
Therefore, we are careful in suggesting quantitative interpretation of our results.Certain aspects of the biases, such as a slowdown of MCF loss to the stratosphere and the strong variations in IH transport of MCF, are likely to also be found in other 3D transport models, as they are a direct consequence of the MCF emissions drop.Other aspects, such as the exact interannual variations of IH transport of CH 4 , or the 7% offset between the physical OH ratio and the effective OH ratio, should be interpreted with more care, as these more strongly depend on the input emission and loss fields, and on the exact treatment of transport in the 3D model.Additional sensitivity tests done in multiple transport models can help in identifying sensitivities of the derived bias corrections.However, our analysis does show a potential for these biases to arise and TM5 is a good starting point for exploring them, as TM5 has provided a strong basis for a wide variety of studies in the past (e.g.Alexe et al. (2015); Laan-Luijkx et al. (2015); Bândȃ et al. (2016)).

Summary and Conclusions
In this study, we investigated variations in the global atmospheric oxidizing capacity, in conjunction with variations in the global CH 4 budget.We specifically revisited the use of two-box models to infer information about these quantities using global observations of MCF and CH 4 .
We identified four two-box model parameters that can benefit from 3D model-derived information.Two of these are known and obvious (IH transport; surface sampling bias), while the other two are less so (stratospheric loss; IH OH ratio).Two-box model parameters for these processes that were quantified from full 3D model output showed strong temporal trends mainly for MCF, which have not been identified in any previous research.In general, the biases resulted from a combination of variations in transport and in the spatio-temporal source-sink distributions of each tracer.
We tested the impact of each of the biases in a two-box model inversion.As expected, we found that absolute OH and thus absolute CH 4 emissions show large deviations between the different inversions (∼10%).Given that large parts of these deviations were constant through time, they do not necessarily impact conclusions of past two-box modelling studies that focused on interannual variations.
Compared to the absolute differences, we found only small differences in OH anomalies (up to 1. 3%, averaged over 1994-2015) relative to the full uncertainty envelope found here (3-4%) or in Rigby et al. (2017) (8%).This indicates that significant uncertainties in parameters unrelated to the identified biases remain.As such, we confirm in large part the conclusions drawn by Rigby et al. (2017) and Turner et al. (2017) regarding the underdetermined state of the problem.In the end, we did find that the conclusions one can draw from each individual inversion could be strongly affected by the bias corrections: in the standard inversion only 43% of our Monte-Carlo ensemble showed a positive trend in OH over the 1994-2015 period, compared to 88% in the four-bias inversion.
The identified two-box model biases contribute to the already significant uncertainty in derived OH, and properly accounting for them can be a piece in the puzzle of improving constraints on OH.Moving forward, a likely next step is to incorporate more tracers in an effort to further tighten constraints on OH.In such a scenario, the tracer-dependent nature of the biases will likely increase the bias impact, and a proper 3D model analysis for each tracer becomes even more important.Already, efforts have been made to do so (Liang et al., 2017), and in this study we provide further suggestions for such an approach.
A distinct advantage in this approach is that information from multiple 3D transport models can be used to tune the two-box inversion, making the inversion outcome less reliant on transport parametrizations of any single 3D transport model.
Additionally, computational efficiency of simple models allows for complex statistical inversion frameworks, incorporating, for example, hierarchical uncertainties (Rigby et al., 2017).
On the other hand, the biases are often dependent on the sources and sinks used in the 3D model simulation.As such, a feedback loop between the two-box inversion and the 3D transport models might be necessary to correctly derive bias corrections, which makes such an analysis cumbersome.Additionally, a bias such as that in IH exchange of MCF might be difficult to resolve at all, because IH exchange of MCF is ill-defined in a two-box model (see Section 3.1.1and Supplement 4).
Therefore, we deem it important that a multi-tracer inversion in a full 3D model should also be performed, similar to the 3D inversion of MCF performed by Bousquet et al. (2005), but extended to more recent years.As an added advantage, a 3D model inversion would increase the pool of potential tracers that can be implemented to constrain OH.For example, the short-lived tracer 14 CO has been identified as a potential tracer to constrain OH (Brenninkmeijer et al. (1992); Quay et al. (2000); Krol et al. (2008)), but would not be implementable in a two-box model.
Competing interests.No competing interests are present.

Figure 1 .
Figure 1.Hemispheric, annual mean timeseries of CH4 (left) and MCF (right), as derived from the NOAA surface sampling network (for

Figure 2 .
Figure2.The IH exchange rate for MCF, CH4 and SF6, as derived from a TM5 simulation (see Section 2.2.2).

Figure 4 .
Figure4.The ratio between tracer lifetime with respect to OH loss in the SH troposphere and NH troposphere (see Equation8).Additionally,

Figure 6 .
Figure 6.The results of two inversions of the two-box model: tropospheric OH anomalies (left) and CH4 emission anomalies (right).In the standard inversion, we kept IH transport, NH/SH OH ratio and stratospheric loss of MCF constant, and we used NOAA observations.In the second inversion, we implemented all four bias corrections instead (as described in Section 2.4).Both the mean anomalies and the 1-standard deviation envelopes are shown, where anomalies were taken relative to the time-averaged mean in each respective ensemble member.Plotted in grey are the anomalies as derived by Rigby et al. (2017) (from the NOAA dataset) and by Turner et al. (2017) (from a combined NOAA+AGAGE dataset), adjusted so that they, too, average to zero.The 1-standard deviation envelope from the Rigby et al. (2017) estimate is hatched in grey.
Figure 3.The surface sampling bias in the global mixing ratio (left) and in the IH gradient (right) of MCF and of CH4.The bias was quantified as the ratio between values derived from the NOAA surface sampling network and values derived from the full (TM5) troposphere.The biases were derived from 27 and 12 sites for CH4 and for MCF, respectively.Figure 1 visualizes the impact of correcting for the sampling bias in real-world NOAA observations.

Table 3 .
Turner et al. (2017)scribe the outcome of the two-box inversions.The two-box inversions listed are the standard set-up, four inversions with one bias implemented, and one inversion with all biases implemented.From left to right: 1) Mean absolute error (MAE) in OH anomalies between the standard inversion and each respective inversion.2) Trend in OH over the 1994-2015 period.3)Meanlifetime of MCF with respect to OH (tropospheric burden MCF/loss to OH). 4) Mean total tropospheric lifetime of CH4 (tropospheric burden CH4/total loss CH4).5)Mean annual CH4 emissions (with soil sink).deviation,andalways within two standard deviations.Differences withTurner et al. (2017)are largest, most notably after 2010, which can be expected since they use a combined AGAGE+NOAA dataset, whereas we only use NOAA data.In Rigby

Table 3 )
Rigby et al. (2017))icantly lower than the 580-600 Tg/yr estimates reported by the two-box inversions ofTurner et al. (2017)andRigby et al. (2017).Our estimate is also on the low Turner et al. (2017)two-box model, an inversion would tend to reduce MCF emissions to efficiently bring down the IH MCF gradient.To subsequently close the global MCF budget, OH will be reduced, resulting in lower global CH 4 emissions in the two-box model inversion.The lower difference in observational uncertainties is likely an important reason for this: their solution corresponds to a statistical inversion framework where less weight is given to the observations.In the end, conclusions from our study and those drawn byRigby et al. (2017)andTurner et al. (2017)remain qualitatively similar.The post-2007 renewed growth of CH 4 need not be caused by a sudden increase in emissions in 2007.Rather, emissions could have increased more gradually over the 1994-2007 period, while CH 4 growth was suppressed temporarily by elevated OH levels.The lack of sensitivity of the inversion to the bias corrections and the large remaining uncertainty envelope in the final inversion both indicate that there are other parameters that result in significant uncertainties.Examples are the emission fraction in the NH, observational uncertainties and uncertainty in emission timing of MCF.Thus, while a first step can be made through the incorporation of 3D model information, we confirm the conclusion drawn inRigby et al. (2017)andTurner et al. (2017)that the current state of the problem is still strongly underdetermined.