Inverse modelling of carbonyl sulfide: implementation, evaluation and implications for the global budget

Carbonyl sulfide (COS) has the potential to be used as a climate diagnostic due to its close coupling to the biospheric uptake of CO2 and its role in the formation of stratospheric aerosol. The current understanding of the COS budget, however, lacks COS sources, which have previously been allocated to the tropical ocean. This paper presents a first attempt of global inverse modelling of COS within the 4-Dimensional variational data-assimilation system of the TM5 chemistry transport model (TM55 4DVAR) and a comparison of the results with independent COS observations. We focus on the global COS budget, including COS production from its precursors carbon disulfide (CS2) and dimethyl sulfide (DMS). To this end, we implemented COS uptake by soil and vegetation from an updated biosphere model (SiB4), and new inventories for anthropogenic and biomass burning emissions. The model framework is capable of closing the COS budget by optimizing for missing emissions using NOAA observations in the period 2000–2012. The addition of 432 Gg S a−1 COS is required to obtain a good fit with NOAA 10 observations. This missing source shows little year-to-year variations, but considerable seasonal variations. We found that the missing sources are likely located in the tropical regions, and an overestimated biospheric sink in the tropics cannot be ruled out. Moreover, high latitudes in the Northern Hemisphere require extra COS uptake or reduced emissions. HIPPO aircraft observations, NOAA airborne profiles from an ongoing monitoring program, and several satellite data sources are used to evaluate the optimized model results. This evaluation indicates that COS in the free troposphere remains underestimated after 15 optimization. Assimilation of HIPPO observations slightly improves this model bias, which implies that additional observations are urgently required to constrain sources and sinks of COS. We finally find that the biosphere flux dependency on surface COS mole fraction may substantially lower the fluxes of the SiB4 biosphere model over strong uptake regions. In planned further studies we will implement this biosphere dependency, and additionally assimilate satellite data with the aim to better separate the role of the oceans and the biosphere in the global COS budget. 20 1 https://doi.org/10.5194/acp-2020-603 Preprint. Discussion started: 9 July 2020 c © Author(s) 2020. CC BY 4.0 License.


Introduction
Carbonyl sulfide (COS or OCS) is a low abundant trace gas in the atmosphere with a lifetime about 2 years and a tropospheric mole fraction of about 484 pmol mol −1 (Montzka et al., 2007). COS is regarded as a promising diagnostic tool for constraining photosynthetic gross primary production (GPP) of CO 2 through similarities in their stomatal control (Montzka et al., 2007;Campbell et al., 2017;Berry et al., 2013;Whelan et al., 2018;Kooijmans et al., 2017Kooijmans et al., , 2019Wang et al., 2016). COS also con-25 tributes to stratospheric sulfur aerosols, which have a cooling effect on climate and hence mitigate climate warming (Crutzen, 1976;Andreae and Crutzen, 1997;Brühl et al., 2012;Kremser et al., 2016). In the recent decades, COS mole fractions in the troposphere have remained relatively constant, which implies that sources and sinks of COS are balanced. Whelan et al. (2018) reviewed the state of current understanding of the global COS budget and the applications of COS to ecosystem studies of the carbon cycle. The most pressing challenge currently is to close the global COS budget. 30 Previous studies show that substantial emissions of COS are coming from oceanic, anthropogenic, and biomass burning sources, and the largest sinks are uptake by plants and soils (Watts, 2000;Kettle et al., 2002;Berry et al., 2013). Oceanic emissions are thought to be the largest source of COS, both directly and indirectly, due to emissions of CS 2 and DMS (Lennartz et al., 2017(Lennartz et al., , 2019, which can be quickly oxidized to COS in the atmosphere (Sze and Ko, 1980). There are considerable uncertainties related to this indirect COS source, with reported yields to COS being (83 ± 8)% from CS 2 (Stickel et al., 1993), and 35 (0.7 ± 0.2)% from DMS under low NOx conditions at 298 K (Barnes et al., 1996). Blake et al. (2004) reported anthropogenic Asian emissions for COS and CS 2 , which appear to have been underestimated by 30-100% due to underestimated coal burning in China (Du et al., 2016). Zumkehr et al. (2018) recently presented a new global anthropogenic emission inventory for COS.
The new anthropogenic emission estimates are, with 406 Gg S a −1 in 2012, substantially larger than the previous estimate of 180.5 Gg S a −1 by (Berry et al., 2013). Another recent study (Stinecipher et al., 2019) concluded that it is unlikely that biomass 40 burning accounts for the balance between sources and sinks of COS, due to the relatively small contribution of biomass burning to the total emissions ((60±37) Gg S a −1 ). Suntharalingam et al. (2008) made a first attempt to simulate the global COS budget using the GEOS-Chem model and global-scale surface measurements from NOAA. In order to fit the observed seasonal cycle of COS mole fraction, they had to double the terrestrial vegetation uptake, reduce the southern extra-tropical ocean source, and assume an additional COS source 45 of 235 Gg S a −1 . Berry et al. (2013) implemented COS in the global biosphere model SiB3. They inferred that, in order to compensate for updated COS biosphere and soil sinks of 1093 Gg S a −1 , there must be additional COS sources of 600 Gg S a −1 , which was allocated to the ocean. Glatthor et al. (2015) and Launois et al. (2015) estimated direct COS emissions from the ocean as 992 and 813 Gg S a −1 , respectively, and also Kuai et al. (2015) hinted to underestimated COS sources from tropical oceans by optimizing sources using one month of COS satellite observations by TES-Aura. However, Lennartz et al. 50 (2017) used COS measurements in ocean water to show that the direct oceanic emissions were much lower (130 Gg S a −1 ) than top-down studies suggested. It is therefore not resolved whether ocean emissions account for the missing source.
In this paper, we address several important open questions concerning the COS budget using inverse modelling techniques, employing the TM5-4DVAR modelling system. We focus on the closure of the COS budget, the contributions of the potential The retrievals of TES, MIPAS and ACE-FTS v3.6 are provided on 14, 60, and 150 vertical levels in the atmosphere, respectively. We map our modelled COS profiles to these levels using a mass conserving interpolation scheme.

Seasonal decomposition
In Section 3.1 we apply a simple seasonal decomposition method to our calculated exchange fluxes. The seasonal decomposition is performed using Python module StatsModels version 0.10. The time series are decomposed into trend, seasonality and 125 noise: with y(t) the monthly exchange fluxes, and y t , y s , and y r the trend, seasonal, and residual components, respectively.

130
We have implemented the anthropogenic emissions based on a recent global gridded emission inventory of COS (Zumkehr et al., 2018). Since we aim to model COS, CS 2 and DMS as separated tracers, we disentangled the reported COS emissions into COS and CS 2 contributions. Here, we applied an assumed yield of 0.87 (Zumkehr et al., 2018), which means that 1 mole CS 2 yields 0.87 mole COS. As a precursor of COS, CS 2 reacts with OH to produce COS, and has an atmospheric lifetime of about 12 days (Khalil and Rasmussen, 1984 et al., 1993) and 81% (Chin and Davis, 1993). We decided to use a yield of 83% in our modelling, while we used the reported yield of 87% to produce the numbers listed in Table 1. This implies that we introduce slightly less COS in the atmosphere compared to using the Zumkehr et al. (2018) data as direct COS emissions. Note that we apply all categorical emissions or fluxes with a monthly time resolution.

Biomass burning emissions
We estimated biomass burning emissions based on the widely used GFED V.4.1 data set (Randerson et al., 2018) for six of the seven emissions categories listed in Table 2. In converting dry mass burned to COS emissions, we used the updated emission factors reported in Andreae (2019). For biofuel use, we base our estimates on the Community Emission Data System (CEDS) (Hoesly et al., 2018). We calculated COS emissions by first converting CO emissions to dry mass burned, which were converted 150 to COS emissions in a second step. Emission factors are listed in Table 2. In this process we made a distinction between biofuel with and without dung. Dung burning is mainly employed in South Asia (Fernandes et al., 2007)  calculates the uptake by vegetation based on (Berry et al., 2013). Currently, soil uptake is scaled to the CO 2 soil respiration term, and the implementation of specific COS soil models (Sun et al., 2015;Ogée et al., 2016) (Kuai et al., 2015)) and (Berry et al., 2013). The spatial and seasonal distribution of the biosphere uptake is shown in Figure S3. The uptake shows a large seasonal cycle in the NH and large uptake over tropical forests. The biosphere fluxes were deployed on a monthly timescale.

Ocean emissions 170
Climatological ocean emissions of COS and the COS precursors CS 2 and DMS are based on Suntharalingam et al. (2008) and Kettle et al. (2002). Large quantities of COS, DMS, and CS 2 are emitted from open oceans. The estimated DMS emissions are about 22 Tg S a −1 , and we note that even if the COS yield from oxidation of DMS is as small as 0.7 % (Barnes et al., 1996), already 156 Gg S a −1 COS is formed. The CS 2 direct emission from oceans is roughly 195 Gg S a −1 , yielding 81 Gg S a −1 of COS. When the ocean water is cold enough, oceans can turn into a sink of COS instead of a source (Lennartz et al., 2017).

175
Supplementary Figure S4 shows the spatial distribution of the January and July direct and indirect ocean emissions of COS.
Note that our estimates of all COS oceanic emissions as 277 Gg S a −1 are substantially smaller than the estimates of 813 Gg S a −1 by Launois et al. (2015).

TM5-4DVAR inverse modelling system
We have implemented three tracers (COS, CS 2 , and DMS) in the inverse modelling framework TM5-4DVAR (Krol et al., 2005, 180 2008; Meirink et al., 2008). In brief, the TM5 model is used to convert fluxes, collected in state vector x, to observations y: where H represents the global chemistry transport model TM5. Since the relation between fluxes and observations is currently linear, y = H(x) can be written as y = Hx. In a flux inversion a cost function is minimized. The cost function has the form: where x b represents the prior state of the fluxes, and B and R are the error covariance matrices of the fluxes and observations, respectively. B contains the errors assigned to the fluxes, as well as their correlations in space and time (i.e. B is a non-diagonal matrix). R contains the errors assigned to (y − Hx). These errors are assumed to be uncorrelated and they include, next to the observational errors, also errors related to the process of mapping coarse-scale fluxes x to localized observations y. The

190
adjoint of the TM5 model Meirink et al., 2008) is used to calculate the gradient of this cost function with respect to all elements in the state vector: In all inversions, y is represented by COS observations from the NOAA flask network data (Montzka et al., 2007). Our flux space, however, in addition to COS emissions, may represent CS 2 and DMS emissions from anthropogenic activity and oceans.

195
To map their influence on simulated COS observations y, we need to consider chemical conversions of CS 2 and DMS to COS.
CS 2 and DMS are short-lived trace gases, with atmospheric lifetimes of approximately 12 days (Khalil and Rasmussen, 1984) and 1.2 days (Khan et al., 2016;Boucher et al., 2003;Breider et al., 2010), respectively. For CS 2 we implemented OH-initiated conversion to COS, while for DMS we simply apply exponential decay with a lifetime of 1.2 days. COS itself is also destroyed by OH in the troposphere and by photolysis in the stratosphere. For OH, we use monthly varying climatological OH fields 200 (Spivakovsky et al., 2000), and applied a correction factor of 0.92 (Naus et al., 2019). In summary, the chemistry that is implemented therefore consists of the following four reactions: https://doi.org/10.5194/acp-2020-603 Preprint. Discussion started: 9 July 2020 c Author(s) 2020. CC BY 4.0 License.
where j 1 is the stratospheric photolysis frequency, and r 1 and r 2 are the rate constants of COS and CS 2 OH-oxidation, respec-210 tively. The fractions f 1 and f 2 represent the molar yields of COS from CS 2 (taken as 0.83 (Stickel et al., 1993)) and DMS (taken as 0.007 (Barnes et al., 1996)). The rate r 1 is calculated according to the Arrhenius equation: where T is temperature in K , and A is 1.13× 10 −12 cm 3 s −1 (Cheng and Lee, 1986). The rate r 2 is 2.0 × 10 −12 cm 3 s −1 .

Model-data mismatch errors
The diagonal elements of the error covariance matrix R in Eq. 5 contain contributions from observational errors, representation errors and errors related to applying large fluxes in the planetary boundary layer (Bergamaschi et al., 2010):

225
where σ t is the total error, σ o the observational error, σ r the representation error, and σ f an error related to applying large surface fluxes. The assumed observational error is shown in Figure 3. It is worth to note that observational errors are usually overwhelmed by the representation and flux errors. The representation error is calculated by sampling the modelled gradients in the vicinity of the sampled station (Bergamaschi et al., 2010). Finally, the flux error in each cell is linked to the magnitude of the monthly surface flux f (kg m −2 s −1 in each cell) applied in the model as: Here, f represent the sum of all COS prior flux components. In this sum, the biosphere flux is dominant over regions with strong biosphere uptake. Further, g is gravitational acceleration (9.8 m s −2 ), M air is molar mass of dry air (28.9 kg kmol −1 ), M S is molar mass of sulfur (32.1 kg kmol −1 ), ∆p (Pa) is the thickness of the first model layer, and ∆t is the time (s) over which the COS flux accumulates (we use 1 hour). Note that σ f is unit-less and is multiplied by 1 × 10 12 to obtain units of pmol Based on the total error, we define a χ 2 metric to quantify how well the observations are reproduced by the model (e.g. at a particular station).
where N is the number of individual observations. We can calculate this metric before optimization (prior) and after opti-240 mization (posterior). χ 2 is used to diagnose whether inversions are over-fitting or under-fitting the information contained in the measurement network. A value χ 2 ≈ 1.0 indicates that the inverse system is able to fit the data within the error setting (Hooghiemstra et al., 2011). A large posterior χ 2 indicates that the state does not have enough degrees of freedom to fit the observations properly (or the error settings are too small). A small posterior χ 2 indicates over-fitting of the observations (or too wide error settings).

Model settings
In this study, the TM5-4DVAR system is employed on a global resolution of 6 • × 4 • (longitude × latitude). Flux fields are coarsened from a resolution of 1 • × 1 • . To create a reasonable start field for the inversions, we initially performed an 11-year forward simulation starting with zero initial mole fractions and baseline surface fluxes augmented by 432 Gg S a −1 , distributed uniformly to close the global budget. After 11 years, sources and sinks are roughly in balance, with atmospheric mole fractions 250 of about 500 pmol mol −1 . Note that fluxes are used as zero-order terms, while the COS removal by OH and photolysis are first order removal terms that grow as the atmospheric COS increases.
We will present the results of four inversions. Firstly, we optimized the missing emissions, which amount to 432 Gg S a −1 .
This inversion will be denoted by S u throughout the paper. The aim of this inversion is to investigate the spatial structure and temporal variability of the missing COS emissions. This is the first time that a formal 4DVAR approach is applied to 255 the COS budget. To this end, we start from an emission field of 432 Gg S a −1 that is uniformly distributed globally. We optimize emissions on a monthly timescale, and assign a grid-scale prior error of 100%, which is an arbitrary number to give fluxes enough freedom to adjust. In a three-year inversion, the total number of state vector elements amounts to 97200 (36 months×45 latitudinal bins ×60 longitudinal bins). The total number of NOAA observations is much smaller, thus rendering the inversion under-determined. We therefore also use inversion S u to explore different settings of the temporal and spatial 260 correlation lengths, which control the degrees of freedom of the state vector. We explore spatial correlation lengths of 1000 km, 4000 km, 6000 km, 10000 km, and 20000 km, and temporal correlation lengths of 5.5, 7, 9.5, 12 months.
Secondly, we explore the optimization of specific categories in inversions S1-S3. In S1 we attempt to perform an "objective" inversion, in which we assign grid-scale errors of 50% to the biosphere and ocean (we optimize both COS and CS 2 ), and 10% to the anthropogenic COS and CS 2 emissions, and to the biomass burning emissions. Furthermore, in S2 we only optimise 265 ocean exchange, and in S3 we only optimise the biosphere exchange. The aim of inversions S1-S3 is to explore whether either ocean fluxes or the biosphere fluxes (or both) should be used to close the COS budget. Note that DMS ocean emissions are not optimized. The names and setting of the inversions are summarized in Table 3.
The cost function is minimized with Congrad, an efficient numerical algorithm for solving linear systems (Lanczos, 1950).
We perform flux inversions for the period 2000-2012. To decrease computational costs, we adopt the strategy to run parallel 3-year inversions, and we discard the optimized fluxes of the first 6 months (spin-up) and the last 6 months (spin-down). For example, the first inversion targets the period 1-1-2000 to 1-1-2003, the second inversion 1-1-2002 to 1-1-2005, and so on. In 275 the spin-up period the fluxes in the first 6 months are used to adjust the imperfect initial condition. In the spin-down period, fluxes are less reliable, because they have not been well constrained by observations. The optimized fluxes in the overlapping years are used to check the inversion results for consistency. In general, it is found that the optimized fluxes in the overlapping periods are highly consistent.

Closing the COS budget
In this section, we consider inversion S u , in which a uniform field emitting 432 Gg S a −1 is optimized. We use different settings for the spatial and temporal correlation lengths of this field in the inversion, and quantify the posterior goodness-of-fit using the χ 2 metric (Eq. 10). As presented in more detail in Supplementary Figure S5 we find, as expected, that χ 2 decreases with increasing degrees of freedom (smaller correlations).

285
Overall, the posterior fit to NOAA observations does not improve significantly for smaller correlation lengths. If we analyse the posterior fit to independent data from HIPPO, however, we find that the χ 2 reaches a minimum (see supplement Figure S5).
After this minimum, χ 2 values increase again, a possible sign of over-fitting. We therefore select 4000 km and 12 months for respectively the spatial and temporal correlation length, and use these values in the remainder of this study.  The correlation settings have a large impact on the optimized fluxes. Figure 5 shows the spatial distribution of the posterior flux field calculated with two different correlation settings. For correlations of 1000 km and 5.5 months (panel (a)) we detect 300 a typical pattern that signals over-fitting of the observations. In such a pattern, the optimized flux displays hot spots close to measurement locations (e.g. THD, MLO, SMO). For very long spatial correlations, e.g., 20000 km, posterior fits are poor (χ 2 > 6, see Figure S5) and optimized flux patterns show irregular behaviour ( Figure S6). Our best inversion (4000 km and 12 months) produces a smooth optimized flux without apparent spatial patterns near observational stations (Figure 5b). This pattern confirms the missing COS sources in the tropics Berry et al., 2013) and also requires more 305 uptake at high latitudes, especially in the NH.
To investigate the variation in the optimized fluxes of inversion S u , we decompose the flux components as described in inter-annual variability, but no clear long-term trend. This indicates that the "missing sources" are likely partially related to fluxes with strong seasonal cycles. In the next section, we will therefore explore the optimization of the ocean and biosphere fluxes.

315
In this section we will discuss the results of inversions S1, S2, and S3. The resulting global budgets are compared to literature values in Table 4. In addition, χ 2 metrics and biases of the various inversions are reported in Table 5 for the NOAA surface network, the HIPPO campaigns, and the NOAA airborne profiles. Note that we also report results for optimizations that assimilated the HIPPO observations next to the NOAA surface data. The period of the analysed inversions is 2008-2010.
The prior and posterior emission errors and error reduction of the different inversion scenarios are listed and discussed in 320 Supplement Table S1.
The three inversions are all able to close the global COS budget, however, with very different budget terms (Table 4).
Inversions S1 and S3 close the COS budget by a drastic reduction of the biosphere uptake in the tropics and more biosphere uptake at high latitudes. When the biosphere is not optimized (S2), the inversion enhances the CS 2 tropical oceanic source and reduces direct COS emissions from the high latitude oceans (Table 4). Both patterns lead to enhanced release from the tropics 325 and more uptake at high latitudes, as was found for inversion S u .
Concerning the posterior fit to observations, none of the S1-S3 inversions performed like inversion S u . The statistics in Table 5 show that S u leads to the best fit to the assimilated observations, and only a small remaining bias. Inversions S1 and S3 show better χ 2 statistics and smaller biases than inversion S2, because it is difficult to fit continental NOAA stations (LEF, HFM, NWR, THD) only by optimizing ocean fluxes. However, S1 and S3 show a tendency to turn the tropical biosphere sink 330 into a source, as shown in Figure 7, which depicts the posterior biosphere flux and flux increment for inversion S1. Note that while the uptake over high NH latitudes is enhanced, fluxes over regions in South America and over Indonesia have turned into a source. This behaviour can be explained by the under-determined nature of the inverse problem: there are simply not enough observations to constrain the tropical fluxes. Fast mixing in the tropics further complicates the detection of signals from the tropical biosphere using the NOAA surface network. Without additional observations it is therefore hard to unequivocally 335 close the tropical COS budget. Currently, inversion S1 mostly assigns the missing sources to reduced biosphere uptake in the tropics, but the superior S u inversion assigns the missing COS sources to a broad band in the tropics, without strong preference for land or ocean. Note that the behaviour of inversions S1, S2, and S3 is strongly driven by the predefined spatio-temporal patterns in the prior flux fields.
Although we currently cannot close the global COS budget with one specific known flux, it is instructive to explore the in-340 formation content of independent COS observations. In the next section, we will therefore evaluate the results of our inversions with HIPPO and NOAA airborne observations (Figure 1).

Evaluation with HIPPO and NOAA airborne profiles
From Table 5 it is clear that for all inversions the comparison to HIPPO observations is not very favourable. Most notably, the simulations with optimized fluxes show strong negative biases, and poor χ 2 statistics. However, Figure 8 shows that the 345 inversions S1 and S u (blue lines) largely improve the correspondence to HIPPO campaign 1 observations (red), relative to the prior simulation (black). The posterior simulations do capture the HIPPO observations much better. The remaining differences in the middle panels of Figure 8 show the general underestimation of the model. However, inversion S1 overestimates HIPPO in the southern tropics, likely caused by too large flux adjustments over South America, the region sampled by HIPPO campaign 1.

350
Interestingly, when the HIPPO observations are additionally assimilated in the inversion, biases are largely removed (Figure 8, lower panels) while the correspondence to the NOAA surface network deteriorates only slightly (Table 5). Posterior χ 2 values for the HIPPO campaigns remain relatively poor, however, signalling too strict error settings or processes that are not properly modeled.
From the comparison with HIPPO we find that our state vector has enough flexibility to fit additional observations, and 355 that the inversions are strongly observation-limited. Moreover, we find that the inversions based on only observations from the NOAA surface network tend to underestimate COS in the free troposphere. This is corroborated by observations from the NOAA airborne profiles, which are mostly collected over the US (see Figure 1). Figure 9 shows a comparison between profiles using results of inversion S1. Although most posterior profiles (blue) improve considerably compared to the prior simulation (black), they still underestimate observations (red) in the free troposphere. Note that the simulations based on inversion S1 360 correctly predict the draw-down of COS towards the surface for most measured profiles, and especially the match with the LEF site is very good at the surface, which confirms the performance of the inversion. If HIPPO observations are additionally assimilated (green), the agreement in the free troposphere slightly improves. For S1, χ 2 for the profile comparison reduces from 27.7 to 20.1, and the bias reduces from -13.9 to -9.7 pmol mol −1 (Table 5). This confirms the low bias of the free troposphere COS mole fractions in simulations with fluxes that are optimized using both NOAA surface and HIPPO observations.
It is now clear that inversions using surface data from the available NOAA network sites will not be able to separate various source categories, and specifically not in the data-void tropics. In the next section we will therefore investigate the prospects of using satellite data to constrain fluxes.

Satellite validation
In Figure 10 we present a comparison between MIPAS, ACE-FTS, and co-sampled TM5 COS profiles. The latitude-height 370 distributions of MIPAS, TM5 (convolved with the MIPAS AK) and ACE-FTS are shown in Figure 10(a-c). In Figure 10(d) we show inversion S1 (blue), inversion S1 with additional assimilation of HIPPO profiles (green), and averaged TM5 profiles shows that the prior simulation is too high in the tropical latitudes and on the NH (e.g., June, September, and December).
After assimilation, the agreement with TES improves, but now a general underestimate can be observed. The inversion in which also the HIPPO observations are assimilated bring the simulated mole fractions closer to TES (except for September), confirming our earlier findings based on the airborne observation. Thus, although the TES-derived columns are rather noisy, they offer good perspective to better constrain the COS budget in the tropics. Due to the sensitivity of TES to COS in the 390 middle troposphere (Kuai et al., 2015), the assimilation of TES in our 4DVAR system might be able to differentiate between the biosphere and ocean signal, something that turned out to be difficult using NOAA surface observations only.

Discussion
In this study we have presented inversions focused on the closure of the global COS budget. In general, our inversion modelling framework based on the TM5-4DVAR system is quite well capable to close the global budget (e.g., inversion S u , S1 and S3) 395 and to optimize flux fields such that surface observations are well reproduced. However, due to the lack of observations, we are unable to unambiguously assign the missing COS sources to either missing ocean emissions or to reduced tropical uptake by the biosphere. Firstly, the total number of observations remains relatively small, which leads to an under-determined inversion problem. Secondly, there are no observational sites that sample air masses from tropical Africa, South America, and South East Asia, which are regions with important COS fluxes. An important next step will therefore be the utilization of satellite data 400 in future inverse modelling studies. In the current study, we did not include all exchange fluxes that are reported in literature (Whelan et al., 2018). In general, we find that our inversions still underestimate COS in the free troposphere. Here, there might be a role for volcanic emissions (25-42 Gg S a −1 (Whelan et al., 2018)), or 'unnoticed' tropical sources like wetland exchange (-150 to 290 Gg S a −1 (Whelan et al., 2018)). Volcanic emissions are important to mitigate the stratospheric aerosol loading in the stratosphere (Sheng et al., 2015) and might be able to reduce the gap between modelled COS by TM5 and measurements.

405
Alternatively, missing COS could come from an atmospheric oxidation process that converts CS 2 or DMS to COS. We did not find strong evidence for enhanced CS 2 emissions from tropical oceans in our S1 inversion, although inversion S2 produced reasonable COS simulations by optimizing only COS and CS 2 emissions from the ocean. Moreover, our "best" S u inversion produced a flux field that indicated enhanced tropical sources over both land and ocean ( Figure 5). Thus, field studies that address tropical COS exchange processes are urgently needed (Lennartz et al., 2020).

410
The use of COS as a proxy for gross primary productivity needs a better level of understanding of the biosphere flux. Here we used monthly prior flux fields calculated with the SiB4 model (Berry et al., 2013) in which soil exchange and vegetation uptake are combined. In future studies, we might need a better prior description of this important global COS sink. For instance, recent studies (Ogée et al., 2016;Sun et al., 2018;Meredith et al., 2019;Spielmann et al., 2020) stress the importance of the soil-atmosphere COS exchange. Our inversions S1 and S3 calculate large increments in the biosphere exchange (Figure 7), 415 with general less uptake in the tropics (turning the flux even into a COS source) and enhanced uptake in the NH high latitudes.
Quantitatively, the COS uptake is reduced from a prior value of 1053 Gg S a −1 to 557 Gg S a −1 to close the COS budget. While we seriously question the validity of this result given the fact that most flux adjustments are projected in the data-void tropics, it is still instructive to consider the feedback of the atmospheric COS mole fractions on COS uptake. Since biosphere models operate mostly uncoupled to atmospheric transport models, we used a fixed mole fraction of 500 pmol mol −1 to construct the 420 prior biosphere fluxes. However, observations clearly show a large drawdown of COS near the surface (Hilton et al., 2017;Spielmann et al., 2020). We therefore explored the calculations in SiB4 and found that biosphere flux should scale linearly with atmospheric COS mole fractions (Berry et al., 2013). To estimate the potential impact of reduced mole fractions at the surface on the biosphere flux, we corrected the monthly SiB4 fluxes as:

425
where f biosp and f biosp,cor are the original and corrected monthly biosphere fluxes on the TM5 grid, and y(COS) is the monthly mean COS mole fractions (pmol mol −1 ) in the first model layer (approximately 50 m) from inversion S u . This simple correction, based on monthly mean fields, changes the biosphere sink from -1053 Gg S a −1 to -851 Gg S a −1 , an update of 202 Gg S a −1 (Supplementary Figures S7, S8 and S9). Interestingly, the corrected flux is strongly reduced over regions with an active tropical biosphere, in line with results from inversions S1 and S3. This indicates that uptake of COS should be treated 430 as a first order loss process, and that the SiB4 prior fields based on fixed atmospheric mole fractions of 500 pmol mol −1 likely 14 https://doi.org/10.5194/acp-2020-603 Preprint. Discussion started: 9 July 2020 c Author(s) 2020. CC BY 4.0 License.
overestimate COS uptake. However, such an approach makes the optimization problem non-linear. This, and the challenge of assimilating satellite observations, will be the subject of future studies.

Conclusions
In this study, we have implemented an inverse modelling framework for COS, coupled to the budgets of CS 2 and DMS. Inver-435 sions using the NOAA surface observation network have been evaluated with observations from HIPPO, airborne observations, and satellite products. Conclusions are: -In line with earlier studies, our inversions point to missing sources in the tropics and missing sinks at high latitudes. We identify that the flux field that closes the budget exhibits an irregular seasonal cycle. Whether the missing sources in the tropics originate from the land or ocean cannot be determined currently, because observations in the tropics are sparse.

440
Part of the tropical sources can be explained by the dependence of COS uptake on atmospheric mole fractions.
-Simulations that are optimized by only NOAA surface observations lack information about COS in the free troposphere.
When HIPPO observations are used as an additional data source in the inversions, the comparison to NOAA airborne observations and satellite products generally improves.
-Future improvements are expected from the assimilation of satellite data and better prior descriptions of the ocean and 445 biosphere fluxes.
Our future plan is therefore to assimilate satellite data into our 4DVAR inverse modelling system to have better constraints on COS in the free troposphere and lower stratosphere. Other developments target the coupling of COS and CO 2 in a shared inverse modelling system, with the aim to better constrain gross primary productivity.

35
https://doi.org/10.5194/acp-2020-603 Preprint. Discussion started: 9 July 2020 c Author(s) 2020. CC BY 4.0 License. 36 https://doi.org/10.5194/acp-2020-603 Preprint. Discussion started: 9 July 2020 c Author(s) 2020. CC BY 4.0 License. Table 3. Names and error settings of the inversions performed in this study. The values correspond to grid-scale errors. Monthly flux fields are optimized using spatial and temporal correlation lengths of 4000 km and 12 months, except for inversion Su, in which multiple settings are explored.
Biosphere Ocean COS Ocean CS2 Biomass burning Anthropogenic COS and CS2 "Unknown"