Plant gross primary production, plant respiration and carbonyl sulﬁde emissions over the globe inferred by atmospheric inverse modelling

. Carbonyl Sulphide (COS), a trace gas showing striking similarity to CO 2 in terms of biochemical diffusion pathway into leaves, has been recognized as a promising indicator of the plant gross primary production (GPP), the amount of carbon dioxide that is absorbed through photosynthesis by terrestrial ecosystems. However, large uncertainties about the other components of its atmospheric budget prevent us from directly relating the atmospheric COS measurements to GPP. The largest uncertainty comes from the closure of its atmospheric budget, with a source component missing. Here, we explore the beneﬁt 5 of assimilating both COS and CO 2 measurements into the LMDz atmospheric transport model to obtain consistent information on GPP, plant respiration and COS budget. To this end, we develop an analytical inverse system that optimizes biospheric ﬂuxes for the 15 plant functional types (PFTs) deﬁned in the ORCHIDEE global land surface model. Plant uptake of COS is parameterized as a linear function of GPP and of the leaf relative uptake (LRU), which is the ratio of COS to CO 2 deposition velocities in plants. A possible scenario for the period 2008-2019 leads to a global biospheric sink of 800 GgS . yr − 1 , with 10 higher absorption in the high latitudes and higher oceanic emissions between 400 and 600 GgS . yr − 1 most of which is located in the tropics. As for

1 Introduction indicate that about 100 GgS.yr −1 of COS is oxidized by OH in the low troposphere while 50 ± 15 GgS.yr −1 is photolysed in the stratosphere . The largest sources of COS are from human activities and the ocean, with minor contributions from biomass burning (50-100 GgS.yr −1 , Glatthor et al. (2017); Stinecipher et al. (2019)). The oceanic source has been estimated between 200 and 400 GgS.yr −1 (Lennartz et al., 2017(Lennartz et al., , 2020a. The missing source is unlikely to arise from direct ocean emissions since the ship cruises have recorded a sub-saturation of tropical sea waters with respect to COS 5 (Lennartz et al., 2017). COS production from atmospheric oxidation of dimethyl sulfide (DMS) and carbon disulfide (CS 2 ) are two other candidates that may support the missing source, as they have been reported to peak over the tropics. Recently, Lennartz et al. (2020a) developed a mechanistic model to simulate COS emissions via CS 2 and estimated a global source of 70 GgS.yr −1 , too low to support the missing source. However, this model still relies on many assumptions and has limitations such as the lack of oceanic horizontal transport. As for the emissions through DMS, the oxidation yield is currently deduced 10 from experiments carried out under conditions which are not representative of the atmospheric environment with high DMS concentrations and without NOx at 298 K (Barnes et al., 1996). The recent identification of novel DMS oxidation products (Berndt et al., 2019;Veres et al., 2020) could challenge our current understanding of the mechanistic links between DMS and COS formation into the atmosphere. Regarding the anthropogenic emissions, the inventory from Kettle et al. (2002) used by most top-down studies has been demonstrated to be incomplete (Blake et al., 2004;Du et al., 2016). The anthropogenic 15 inventory has been revised upward from 200 GgS.yr −1 to 400 GgS.yr −1 , with the largest source shifting from North America to Asia (Zumkehr et al., 2018). Yet, firn air sampled in Antartica and Greenland suggests that anthropogenic emissions are still underestimated and are closer to 600 GgS.yr −1 (Aydin et al., 2020).
As an alternative to modelling direct emissions, attempts have been made to constrain the COS budget through inverse or "top-down" approaches. With the help of a transport model and a priori information, these approaches adjust the surface fluxes 20 to better match simulated atmospheric concentrations with observations. Previous top-down assessments of the COS budget identified the missing source as likely being from the ocean, with a total oceanic release between 500 and 1000 GgS.yr −1 Here, we present an update of the Launois et al. (2015b) analytical system that jointly assimilates COS and CO 2 measurements using recent prior fluxes and many more degrees of freedom given to the inversion. The new system makes it possible to optimize each process by region and by month and in particular, the GPP for each of the 15 Plant Functional Types (PFT) of the ORCHIDEE (ORganizing Carbon and Hydrology In Dynamic Ecosystems, Krinner et al. (2005)) terrestrial model.
We assume a linear relationship between GPP and biospheric COS uptake under a leaf relative uptake (LRU) approach. 5 We also take advantage of the additional sophistication of the inversion system to assimilate COS measurements together with CO 2 measurements, in order to constrain both GPP and respiration fluxes. Our study period spans 12 years, from 2008 to 2019.
The objectives of our study are threefold: 1. Evaluating the analytical inverse system applied for the first time to the joint assimilation of COS and CO 2 measurements from a technical point of view, 10 Analytical versions of the LMDz tangent-linear and adjoint operators have been developed. Those codes respectively perform operations Mx and M T y * , with M the Jacobian matrix of LMDz, x a vector of input variables of LMDz (i.e. tracer surface fluxes and initial tracer values), and y * a vector of size the number of output variables (i.e. the atmospheric concentrations at observation location and time), at the machine epsilon despite conditional statements in the LMDz code.
In our study, we assimilate LMDz to one of its Jacobian matrices: we linearized LMDz beforehand around a top-down 5 estimation of the CO 2 surface fluxes from the Copernicus Atmosphere Monitoring Service (https://atmosphere.copernicus.eu/). "We checked that this linearization using CO 2 was still valid for COS fluxes and expected COS flux increment λ patterns through a test for the tangent linear model. Specifically, we checked the alignment of the non linear evolution of M (x 0 + λ) with the linear evolution M(λ) for the COS fluxes x 0 (not shown). The archived Jacobian matrix was generated by the adjoint code of LMDz. This way of doing is in principle an improvement over previous COS studies with LMDz (Launois et al.,10 2015b; Peylin et al., 2016) which used a rough approximation of the adjoint M T y * , called "retro-transport", in which the direction of time was simply reversed in LMDz without strict inversion of the order of calculations (Hourdin and Talagrand, 2006). In addition, we use a much more recent version of LMDz here (LMDz6A, Remaud et al. (2018), vs. LMDz3, Hourdin et al. (2006), and at higher resolution, in particular in the vertical (39 vs. 19 layers). The adjoint code of LMDz was initially developed for variational inversion, but we use this facility for the first time with LMDz in an analytical framework, to calculate 15 the rows of the Jacobian Matrix M that correspond to the places where, and the times when, we have observations to assimilate.
By definition, each value of M is a derivative of an output tracer concentration relative to an input surface flux or initial tracer value. More specifically, we use one adjoint run M T y * for each observation to assimilate, with the elements of y * set to zero or one. We use the Community Inversion Framework (CIF, Berchet et al. (2020)) to manage these computations.
In practice, we considered 8-day-average synthetic observations at each selected measurement site (see Section 2.2.1) be- 20 tween 2008 and 2019. The implication is that the atmospheric transport model can not represent the temporal variability within a week. For sites below 1000 m above sea level, only afternoon observations were used as the models do not simulate the accumulation of the tracers in the nocturnal boundary layer well (Locatelli et al., 2015). For elevated stations, both daytime and early nighttime observations were discarded because coarse-resolution models cannot represent the advection of air masses during the day by upslope winds over sunlit mountain slopes in the afternoon (Geels et al., 2007). After corresponding forward 25 runs that defined the tracer linearization trajectories, the adjoint model was run nine months backward in time from measurement time for each of these synthetic observations (with appropriate y * ), giving as output the series of integrated sensitivities of the corresponding measurement with respect to the surface fluxes throughout the nine months and to the concentrations at the initial point in time. For times prior to nine months, we have in fact not used the exact adjoint values. Instead, we extended the databases of adjoint outputs for the surface fluxes beyond the nine-month windows with two parts: (i) monthly adjoint outputs 30 between months 9 and 24 taken from computations for the year 2017, and (ii) beyond 24 months, a globally-homogeneous value (i.e. 1 GtC emitted at the surface is translated to an average concentration of 0.38 µmol.mol −1 , or parts per million, ppm). We have verified that the CO 2 and COS concentrations obtained by the resulting Jacobian matrix (Mx) match well the one produced by the full LMDz transport model over the period (See Fig. S3 of the Supplementary materiel).
In total, we have computed 15 stations × 12 years × 2 weeks × 12 months adjoint computations of 8 process time hours each on a local parallel cluster. 2 weeks correspond to the typical frequency of the COS measurements.
As explained below in Section 2.4.2, LMDz is complemented here for the modelling of COS in the atmosphere by a chemical sink, represented by a surface flux. been collected as pair flasks one to five times a month since 2000 and have then been analysed with gas chromatography and mass spectrometry detection. Most measurements have been performed in the afternoon between 11 and 17h local time when the boundary layer is well mixed. The COS measurements have been kept for this study only if the difference between the pair flasks is less than 6.3 ppt. With the exception of site WIS, most sites have at least one measurement per month for 11 months 20 out of 12 within each year over the years 2008-2019 (see Figure S17). These data represent an extension of the measurements first published in Montzka et al. (2007). The Jacobian Matrix M described in the previous section reveals the information content provided by these measurements in terms of tracer surface flux. In particular, it helps to identify to what extent each region of the globe is seen by the observations and therefore, it provides an indication of the details needed or not in the flux variables to be optimized. The transport 25 sensitivities to the sources integrated over two months are represented in Fig. 1 on average for the period 2016-2019. The zonal distribution of sensitivities reflects the zonal atmospheric circulation at mid and high latitudes, with the north (south) stations seeing the entire domain above (under) 30 • N. The tropics are not well constrained by the observations: the tropical circulation, mainly vertical, limits the extension of the footprint zone around SMO and MLO, leaving the Indo-Pacific region for the most part unconstrained. However, the tropical areas are slightly constrained by well mixed air masses coming from remote stations 30 (see Fig. S2). We also see that the southern and northern oceans are also more constrained by the observations than the continents, with the exception of North America which is relatively well covered by the measurements. Fig. 1 suggests the need to separate between each latitudinal band (Tropics, northern and southern latitudes) and also between oceans and continents in the inversion. of a monthly mean concentration at all stations from the NOAA network with respect to CO2 surface fluxes in the two previous months. The yellow dots denote the location of the surface sites. The site KUM is not depicted as it has the same coordinates than MLO but at sea level.

Observations
Note that, if computed with respect to the COS fluxes, the annual climatology of Jacobian shown on Fig. 1 would have the same spatial pattern but with a different unit given that the atmospheric transport is linear and there are no atmospheric chemical reactions.

Independent observations
An ensemble of independent observations -i.e. data that are not assimilated in LMDz -is used to evaluate the fluxes retrieved by our inverse system. We focus here on the observations used to evaluate the COS and the GPP fluxes.
The first observation program is the HIAPER Pole-to-Pole Observations (HIPPO, Wofsy (2011)). HIPPO consisted of five aircraft transects of many trace gas measurements, including for COS and CO 2 , in the troposphere over the Western Pacific:   (Hattori et al., 2020).
The French sampling site, GIF (48°42'N -2°08'E), is located about 20 km to the south west of Paris where ground level COS has been monitored on a hourly basis since August 2014 (Belviso et al., 2020). According to the recent COS global gridded anthropogenic emission inventory of (Zumkehr et al., 2018), the Paris region is an important source of COS (791 5 MgS/yr, J. Stinecipher personal communication November 2018) where its indirect emissions from the rayon industry largely overpass its direct emissions from the aluminium industry and traffic. These estimates have been challenged by Belviso et al.
(2020). The location of the HIPPO data, NOAA airborne profiles, Japanese and GIF sites are depicted in Figure 2.
The fourth observation program is made of the satellite COS retrievals from MIPAS. The MIPAS spectrometer measured limb-emission spectra for several trace gases in the mid-infrared (Fischer et al., 2008) (Glatthor et al., 2015(Glatthor et al., , 2017. The number of vertical layers of the MIPAS retrievals is 60. Between altitudes 7 and 25 km the accuracy of the COS profiles is around 50 ppt in the absence of clouds (in particular deep-convective ones) (Glatthor et al., 2015).
Last, the SIF satellite retrievals from the Global Ozone Monitoring Experiment-2 (GOME-2) make it possible to eval- 15 uate the seasonality of GPP inferred by inverse modelling for each PFT. SIF represents the amount of light reemitted by chlorophyll molecules as a byproduct of photosynthesis. Satellite-based SIF data is considered as a proxy for the GPP of terrestrial ecosystems at large spatial-temporal scales (Frankenberg et al., 2011;Guanter et al., 2012;Zhang et al., 2016;Yang et al., 2015;Li et al., 2018). We use release number 28 of the NASA GOME-2 (Global Ozone Monitoring Experiment-2 onboard the MetOp-A satellite) daily corrected SIF product (Joiner et al., 2013(Joiner et al., , 2016. The dataset is available at: https : 20 //avdc.gsf c.nasa.gov/pub/data/satellite/M etOp/GOM E_F/v28/. We used the monthly level 3 product gridded at a 0.5 • resolution between years 2008 and 2019. This GOME-2 SIF product was shown to be very similar in terms of seasonality and magnitude (after spectral scaling) to the reference Orbiting Carbon Observatory (OCO-2, launched in 2014, Sun et al. (2018)) data (Bacour et al., 2019). For each PFT, we average all the grid points within the LMDz grid points that have a fractional cover greater than 0.8. We lower this threshold to 0.3 for PFTs 7 (Boreal Broad-leaved Evergreen Forest), 8 (Bo-25 real Broad-leaved Summergreen Forest), 9 (Boreal Needleleaf Summergreen Forest) and 15 (Boreal C3 grass). The PFTs are further defined in section 2.4.

Data sampling
For each species and each measurement, the simulated concentration fields were sampled at the LMDz 3D grid box nearest to the observation location. As mentioned above, the observations at selected local times are assimilated as 8-day averages. 30 For the independent observations, LMDz is sampled at the closest time from the observations. All observations are dry-air  When comparing MIPAS data with LMDz simulations, the a priori and vertical sensitivity of the retrievals must be taken into account. For each MIPAS retrieval, the modelled COS profiles have been interpolated linearly to the MIPAS vertical resolution (60 layers) while ensuring the conservation of the column-average mixing ratio (Chevallier, 2015). They were then smoothed with the corresponding MIPAS averaging kernels.
The a priori profile for the COS retrievals is a zero profile (Glatthor et al., 2015), hence it had not to be taken into account. 5 As done in Glatthor et al. (2015), we focus here on the spatial distribution of the COS mixing ratio at the 250 hPa pressure level (still after convolution of the model with the averaging kernels) for the period 2008-2012. In order to dampen the random noise in the MIPAS observations, we aggregate the retrievals into in 5 • ×15 • latitude-longitude bins.

Inverse framework
Our inverse system seeks to estimate the amplitude of n sources or sinks of CO 2 and COS gathered in a vector x by reducing 10 the mismatch between the observed concentrations gathered in a vector y o and those simulated with the atmospheric transport model M forced by these sources and sinks. Together with an initial disaggregation operator (that converts the low-resolution control vector into gridded fluxes using gridded reference fluxes, see section 2.5.1) and a sampling operator (see previous section), the linearized transport model M is part of the linear observation operator H that relates x and the model-equivalent CO 2 and COS measurements y at the sites shown in Fig. 1: In order to regularize the inverse problem corresponding to Eq.
(1), we use a Bayesian framework involving an a priori control vector, x b with its associated uncertainty statistics, summarized in covariance matrix B. Within the Gaussian assumption of the prior and observation errors, the solution of the inverse problem can be simply expressed by the following equation (see for instance Tarantola (1989)) for the the posterior control vector x a and the uncertainty covariance matrix P a : with R the error covariance matrix of the observations, encompassing measurement errors and H errors. Within the Gaussian assumption with no bias for all errors, the above solution minimizes the cost function :

Gridded reference fluxes
In the following, we call "reference fluxes" the maps of CO 2 and COS fluxes that are used in the observation operator, the control vector x being a low-resolution multiplier to these (see Section 2.5.1). For use at resolution 3.75 • ×1.90 • , the maps of the following components of the CO 2 and COS fluxes have been interpolated from their native resolution. All projections 10 conserved mass.

CO 2 fluxes
Our reference fluxes combine several information sources. Fossil fuel emissions are from the gridded fossil emission dataset GCP-GridFED (version 2019.1) (Jones et al., 2021). Biomass burning fluxes vary inter-annually and are described by the GFED 4.1s database (https://www.globalfiredata.org/data.html). Monthly air-sea CO 2 exchange is prescribed from the Coper- ORCHIDEE explicitly parameterizes the main processes influencing the water, carbon and energy balances at the interface between land surfaces and atmosphere. The vegetation is represented by 15 PFTs with a spatial distribution prescribed from the ESA Climate Change Initiative (CCI) land cover products (Poulter et al., 2015). The plant phenology is prognostic and 20 PFT-specific. We used version 9 tuned for the CMIP6 exercise and forced by the global CRUJRA reanalysis at global scale (https://sites.uea.ac.uk/cru/data/) v1-v2, applying land-use change and realistic increase of CO 2 atmospheric concentration.
Emissions from the land use and wood harvest have been included beforehand in the respiration term. Biomass burning emissions are not taken into account in this respiration term from ORCHIDEE. The yearly global GPP from ORCHIDEE amounts to 126.7 GtC.y −1 during 2008-2019. This value is within the range of the GPP estimates (106-137 GtC.yr −1 ) based on pho-25 tosynthesis proxies (see Table S1) (Beer et al., 2009(Beer et al., , 2010Welp et al., 2011;Alemohammad et al., 2017;Jasechko, 2019;Jung et al., 2020;Ryu et al., 2011;Badgley et al., 2019;Stocker et al., 2019). The PFTs and their acronyms are defined in Table 1.
Note that GPP, respiration, COS vegetation and soil fluxes are null within PFT 1 (base soil).

COS fluxes
The components of the COS budgets that are considered are biomass burning, soil emissions and uptake, anthropogenic emissions, plant uptake, oceanic emissions and the atmospheric oxidation by the OH radical in the troposphere. Photolysis in the stratosphere, estimated to 30 GgS.yr −1 in the LMDz atmospheric transport model (not shown), and volcano emissions, in the range 23-43 GgS.yr −1 , have been neglected .
Reference air-surface exchanges from oxic soils have been simulated by the steady-state analytical model of Ogée et al. (2016) implemented in the ORCHIDEE land surface model with the Zobler soil classification at a 0.5 • both in longitudes and latitudes (Abadie et al., in prep). This model is built on the assumptions that the soil atmosphere exchanges are governed by three processes, namely diffusion through the soil column, production and irreversible uptake via hydrolysis. The COS uptake reflects for the most part the activity of CA, ubiquitous in soil microorganism, which efficiently converts COS into

Plant uptake
We chose the empirical formulation of the COS uptake by leaves from Sandoval-Soto et al. (2005) given by the linear In this equation, F COS and GP P are the COS uptake and the CO 2 uptake (both in ppm/m 2 /s), respectively, [COS] and [CO 2 ] being the ambient air concentrations of COS and CO 2 . v COS and v CO2 are the COS and CO 2 leaf uptake velocities.
The ratio of uptake velocities of COS compared to CO 2 is defined as the LRU: 20 We use a zero-order LRU approach (i.e. with no interaction between vegetation and COS mixing ratio), given the complexity of an order approach (i.e. a coupled atmospheric COS concentration -COS flux calculation). To address this shortcoming, we use the time-evolving hemispheric means of the COS and CO 2 atmospheric concentrations, N H mean and N S mean as done in Montzka et al. (2007). They are computed from monthly means at selected stations or groups of stations weighted by the cosine of latitude of atmospheric boxes encompassing different site groupings in this way: We have only made a distinction between C4 (LRU=1.21) and C3 plants (LRU=1.68) and disregarded the dependence on light and water vapor deficit that was observed at both leaf (Stimler et al., 2010) and ecosystem scales (Maseyk et al., 2014;30 Commane et al., 2015;Kooijmans et al., 2019). Our LRU set is derived from Whelan et al. (2018) and uses, for C3 plants, the median value of 53 LRU data and, for C4 plants, the median value of 4 LRU data. This simplification is supported by Hilton et al. (2017); Campbell et al. (2017) who showed that the uncertainty on the LRU parameter is of a second order importance compared to the uncertainties on the GPP and the other COS fluxes. Morevoer, Maignan et al. (2020) showed that using a mechanistic model or its LRU equivalent model (i.e. with a constant LRU per PFT in ORCHIDEE LSM) for the plant uptake 5 leads to similar results when transporting the COS fluxes with LMDz and comparing the COS concentrations at stations of the NOAA network. A physical reason making the LRU simplification acceptable is that the observation sites sample plant sink signals from multiple parts of the day. We have not taken into account the epyphites which can both emit and absorb COS depending on environmental conditions (Kuhn and Kesselmeier, 2000;Rastogi et al., 2018). We have not considered DMS and CS 2 as separate tracers as done in Ma et al. (2021). CS 2 has a lifetime estimated between 4 days (Khan et al., 2017) and 12 days (Khalil and Rasmussen, 1984) and DMS has a lifetime of 1.2 days. For the sake of simplicity, the oxidation of CS 2 and DMS by OH has been assumed to happen instantly in the atmosphere.

Biomass burning
We derived from measurements over the peatlands. Moreover, their weak inter-annual variability was shown to better reproduce the annual trend in atmospheric concentration at the Jungfraujoch station, the long-term trend being primarily driven by changes in anthropogenic activity . It should be also noted that, compared to the Kettle et al. (2002)

Control vector
Our inversion window covers 12 years. The spatiotemporal resolution of the control vector x over this period represents a compromise between the assumed resolution of the errors of the reference fluxes, the expected resolution of the flux increments that can be inferred by the sparse site distribution (see Figure 1), and considerations on computing time. Typically, a large 5 control vector (i.e. many controlled regions and types of emission) may represent the complexity of reality better than a small control vector (i.e. few regions and emission processes), but also increases the inversion calculation load without always improving inversion skill, given the scarce and uneven observation network. The variables in the control vector are therefore all multipliers of the above-described gridded reference fluxes, as described as follows, rather than grid-point fluxes themselves.
The choice of multipliers rather than increments implies that the initial sub-control-scale patterns are kept. The prior control 10 vector x b is simply a vector of ones.
We control COS oceanic fluxes in three latitudinal bands : the tropics, the northern latitudes and the southern latitudes. This separation allows the inverse system to modify the latitudinal distribution of the reference emissions, which remains subject to large uncertainties, while preserving the prior longitudinal patterns. This amounts to saying that the coastal sites located in the northern hemisphere constrain the total oceanic emissions over the whole northern hemisphere above 30 PFTs which are present in both (4, 5, 6, 10, 11, 12, 13, see Table 1) to take into account the different seasonality. For the anthropogenic COS emissions, we control a single annual emission coefficient and rely on the reference distribution of sources 20 between Europe, Asia and America: the lack of observations in the Asia-Pacific region does not allow us to separately optimize Asian emissions. All parameters are optimized on a monthly scale with the exception of anthropogenic emissions which are assumed to be constant throughout the year.
For the CO 2 , we neglect the uncertainty on the oceanic, fire and anthropogenic CO 2 emissions compared to that of the sum of the respiration and GPP. The parameters of the control vector are described in Table 3.    twofold factor there. For instance, LEF is located in the Midwestern States, a region contributing half of the summer carbon uptake in North America (Sweeney et al., 2015). Similarly, the standard deviation is also multiplied by two at station SMO further to the challenging representation of sub-grid-scale transport by deep convective clouds in the tropics. The resulting observation error standard deviation at each stations is shown in Figure 5.
Our prior error covariance matrix B (that applies to x b , a vector of ones, cf. Section 2.5.1) is described in Table 4. Although Correlations between PFTs -0.5-0.6 0.5-0.6 -- Table 4. Description of the prior error covariance matrix. Since the control vector is made of low-resolution multipliers to reference maps, the standard deviations are fractions of the reference values. The lag-1 autocorrelation coefficients are the correlations assigned between two consecutive time steps for each controlled variable, the time step being defined in Table 3. only one scenario that is optimal in terms of fit to observations among those that we find compatible with our knowledge of the errors of the reference maps. For instance: -GPP and respiration. The monthly-mean GPP from ORCHIDEE within each of the PFTs agrees with site-level GPP estimates from eddy covariance measurements in the range of 20 % (not shown). For PFT 2 (Tropical Broad-leaved Evergreen Forests), we reduce the 1-sigma uncertainty to 10%, a more realistic value given the large gross fluxes over the 5 tropics. We introduce some non diagonal terms in the prior error covariance matrix to represent likely error correlations between PFTs given that they share for most processes the same equations in the ORCHIDEE model. Thus, the errors in the PFTs mainly located over the high latitudes (PFTs 7,8,9,15), the mid-latitudes (PFTs 4, 5, 6, 10), the tropics (PFTs 2, 3, 11, 14, see Table 1 for a description of the PFTs) are set to be correlated with a factor 0.6 (high latitudes), 0.5 (mid-latitudes) and 0.6 (the tropics), respectively. Thus, over the high latitudes, the PFTs 7,8,9,15 are correlated with 10 a factor of 0.6. We further introduce temporal correlations for GPP and respiration. At the first order, we expect that the errors associated to the monthly GPP simulated by ORCHIDEE are positively correlated because: i) errors in the structure of the ORCHIDEE model likely lead to positively correlated flux errors, ii) parametric errors will also provide similar correlations. However, errors in the meteorological forcing may de-correlate the gross flux errors, which could justify for an exponential decay as a function of time. The memory effect linked for example to soil moisture (and thus 15 precipitation) may also induce error correlation (Stocker et al., 2019). For the annual global GPP, this set-up leads to a 1sigma uncertainty of 5 GtC.yr −1 for a reference value here of 125 GtC.yr −1 : this uncertainty may look small compared with the range of GPP estimates found in the literature (see Table S1) but is in agreement with the most recent estimation of 125 ± 5.2GtC.yr −1 from Stocker et al. (2019). The same set-up has been chosen for plant respiration. There are error correlations between GPP and respiration but these are neglected in this study.

20
-Oceanic emissions. Our resulting 1-sigma uncertainty of 350 GgS.yr −1 for the globe and the year, given a reference value of 271 GgS.yr −1 (see Fig. 4), is consistent with Lennartz et al. (2017Lennartz et al. ( , 2020a who estimated the ocean emissions between 120 -600 GgS.yr −1 . -Anthropogenic emissions. Our correlation length of 500 days damps interannual variations, consistent with Zumkehr et al. (2018) who found that they do not vary by more than 5 % from one year to the next. The resulting 1-sigma uncertainty of 197 GgS.yr −1 for the globe and the year, given a reference value of 370 GgS.yr −1 (see Fig. 4), is consistent with the estimation of 223-586 GgS.yr −1 given by Zumkehr et al. (2018).
-Soil fluxes. Our choice of a standard deviation of 30 % is rather arbitrary given the lack of measurements to evaluate the 5 reference soil flux within each PFT. We also assign a large autocorrelation length (100 days) to damp month-to-month variations, consistent with local measurements made at Harvard and Gif-Sur-Yvette (Belviso et al., 2020;Commane et al., 2015).

Post-processing of the CO 2 and COS simulations and measurements
The seasonal cycle is derived from the surface data using the CCGVU curve fitting procedure developed by Thoning et al. 10 (1989) (http://www.esrl.noaa.gov/gmd/ccgg/mbl/crvfit/crvfit.html). The procedure estimates a smooth function by fitting the time series to a first order polynomial equation for the growth rate combined with a two-harmonic function for the annual cycle, nd a low-pass filter with 80 and 667 days as short-term and long-term cutoff values, respectively.

Metrics
The simulated atmospheric concentrations (for CO 2 or COS here) are evaluated against measurements using the Root Mean 15 Square Error, RMSE, defined as: where N is the number of considered observations, C Obs (n) is the n th observed concentrations and C M od (n) is the n th modelled concentration. The unit of RMSE is in ppm (ppt) for CO 2 (COS).
The global χ 2 is equal to twice the cost function J(x) at its minimum (see Equation 3 for the general definition of the cost 20 function): The variables are defined in the section 2.3. This metric allows us to check the consistency of the error covariance matrices.
The χ 2 follows the so-called chi-square law, with the the number of degrees of freedom equal to the number of observations (N obs ) (as in our case the observation error covariance matrix is diagonal). The ratio χ 2 /N obs (normalized χ 2 ), should there-25 fore be close to 1. This means that the residuals between observed and simulated concentrations should be aligned with the assigned measurement errors, and the residuals should be distributed as a Gaussian around the observed values. A value larger (respectively smaller) than 1 may indicate that the assigned uncertainties (of the measurements and/or from the a priori fluxes) are too small (respectively too large). However, tuning the prior and observation covariance matrices with the sole normalized χ 2 may actually be misleading since the matrices involve many variables (including off-diagonal elements) that may play compensating roles in the χ 2 (Chevallier, 2007).
The χ 2 per station, χ 2 i , represents the contribution of each site to the first term of the global χ 2 . For a station i, the metric is defined as: 5 with x i ) T and y o i being the simulated and observed concentrations at station i. This value, divided by N obs (normalized χ 2 i ), should ideally be close to 1. A value larger (respectively smaller) than 1 may indicate that the assigned uncertainties of the measurements at this station are too small (respectively too large).  Table 5 shows the error reduction achieved by the inversion in terms of RMSE between the simulated and the observed concentrations. As expected, the inverse system has reduced the observation-model mismatch by about 85 % at most stations. Table 5 is also the error reduction for the detrended smooth curves in which only seasonal variations are retained.

Of interest in
It is indeed important to accurately represent the large COS and CO 2 surface depletion in spring as it mainly reflects the amplitude of the GPP over the continents. The seasonal error reduction is usually smaller than the overall error reduction: 15 the COS inversion mainly corrects the negative tendency in COS mixing ratio arisen from the unbalanced prior budget. For instance at MLO between 2008 and 2011, the tendency of the CO 2 (COS) concentrations a priori is 3.9 ppm.yr −1 (-57 ppt.yr −1 ) against 2.0 ppm.yr −1 (1.4 ppt.yr −1 ) in the observations. Yet, the inversion has reduced the seasonal misfits to observations at most sites except at LEF and MLO for CO 2 and MLO, THD, WIS for COS. At the northernmost sites (ALT, BRW, SUM, MHD), the error reduction exceeds 50% for both compounds. Despite some improvements, the inversion still 20 struggles to represent the seasonal cycle of the COS measurements at sites WIS, HFM, THD for which the RMSE remains greater than 15 ppt. THD is a coastal station which suffers from the influence of fluxes nearby (Riley et al., 2005). For this reason, modelling the variability of its CO 2 and COS mixing ratio has been shown to be particularly challenging (Ma et al., 2021). The inverse system also struggles to match CO 2 measurements at sites WIS, NWR, LEF with a seasonal RMSE greater than 1.5 ppm.

25
The consistency of the estimate with the measurement errors and the a priori flux errors assumed is analyzed first with the global normalized chi-squared statistic (see Equation 9). This metric should ideally be close to 1. In our case, the normalized χ 2 equals to 1.04, a value consistent with a fair configuration. The relative contribution of the measurement term (first term of Equation 9) to the total χ 2 (Equation 9 or cost function at its minimum) is much larger than that of the flux term (80% versus 20% on average), suggesting that the a priori constraint is rather loose. 30 In addition to the global consistency between data errors and a priori flux errors, the validity of the relative weights (inverse of the squared data error) assumed for the individual measurement residuals (i.e., at each station) is assessed (see Equation 10).  Table 5. Column "RE" presents the fractional reduction of the model vs. assimilated measurement RMSE (1 − RM SE post RM SE prior ). Column "RM SE seas prior " presents the RMSE of the a priori detrended time series compared to the assimilated measurement time series. Column "RM SE seas post " presents the RMSE of the a posteriori detrended time series. Column "RE seas " presents the reduction of uncertainties using the RMSE metrics applied to the detrended time series (1 − RM SE seas post RM SE seas prior ). Column "χ 2 " presents the reduced chi-squared statistics (without unit) for each station. The detrended curves have been filtered to remove the synoptic variability (see Sect. 2.6). The RMSE is in ppm (ppt) for CO2 (COS). All statistics are for the period 2009-2019.
To this end, Table 5 shows the χ 2 per station. The value is less than 1 for seven stations out of 15 for both compounds, meaning that the residuals are within the range of the assigned observation uncertainty. Among the stations with χ 2 values greater than 1, HFM stands out and likely we assigned too small uncertainties to this station.
In order to better visualize the improvement on the seasonal cycle, we compare in Figure 5 the simulated a priori and a posteriori concentrations against observations at three sites: BRW, NWR and LEF. These time series have been detrended 5 beforehand to retain the seasonal cycle. At BRW, the inversion has corrected the too low seasonal amplitude and the phase lag in the a priori concentrations within the range of observation uncertainties. At LEF, the a priori concentrations were already in good agreement with the observations and the inversion has not improved the simulated concentrations much. However, at NWR, the inversion struggles to correct the advanced phase, especially in the CO 2 simulations, consistent with a χ 2 greater than 1. One likely explanation is that our biome-scaling approach with one coefficient per PFT is too coarse to correct the  Table 6 summarizes our top-down assessment of the COS and the CO 2 budgets. The inversion doubled the COS oceanic emissions to 530 GgS.yr −1 . Given the missing source in the reference fluxes, the ocean dominance in the measurement footprints, and the efficient reduction of the global error by 90%, the increase of oceanic emissions is an expected behaviour of the Bayesian inverse system. In contrast, the inversion marginally decreased the total soil and vegetation absorption likely due 5 to the seasonal constraints. Following a decrease of 7 GtC.yr −1 of the GPP to match the COS constraint, the respiration has decreased by 10 GtC.yr −1 in order to keep a land carbon sink in agreement with the global atmospheric CO 2 budget. Thus, on a global scale, the inversion seems to have corrected the overestimated prior atmospheric trend by a larger decrease in respiration than in GPP. All residuals between the total prior and the posterior fluxes are within the assumed 1-sigma range of the prior uncertainty, except for respiration, where the increment is twice as large as the standard deviation. The residuals are 10 even much smaller than the prior standard deviation for the anthropogenic and the biomass burning emissions, suggesting that we could have narrowed the initial errors for those components.

Optimized fluxes
The total oceanic COS emission remains lower than previous top-down studies using different configurations and observations, which instead estimated an oceanic source between 700 and 1000 GgS.yr   Table 3 (Lennartz et al., 2017(Lennartz et al., , 2020a. However, COS emissions through DMS oxidation in pristine marine environment, could play 10 a role in sustaining this tropical source. Over the northern and southern oceans, high emissions in our reference oceanic flux from Lennartz et al. (2017) mainly arise from the direct oceanic emissions (see Fig. 3). The latter could be overestimated: the COS concentrations simulated by the ocean box model are higher than most of the measurements made in sea waters sampled over different parts of the globe (Lennartz et al., 2017). This remark supports the inversion decrease of the oceanic emissions over the mid and high latitudes. The decrease beyond 50 • towards the poles also reflects a seasonal cycle in COS sea water 15 concentrations of a much lower amplitude than the one in atmospheric COS in the marine boundary layer (Lennartz et al., 2020b). This strong marine seasonal cycle is not attenuated enough by mixing processes within the boundary layer and the inversion weakened the oceanic release to match the seasonal cycle in atmospheric COS concentrations at BRW and ALT. In particular, the emissions in the northern high latitudes have been suppressed in summer to correct the late peak in the time series at BRW on Figure 5. While oceanic emissions decrease in the high latitudes, the terrestrial sink tends to increase. The 20 change in terrestrial sink is mainly attributed to vegetation (see Fig. S4). The change in soil fluxes goes in the same direction as the change in COS vegetation uptake. Regarding the impact on the CO 2 budget, Figure 7 shows the latitudinal distribution of the net CO 2 vegetation fluxes defined as the sum of respiration and GPP before and after inversion. The inversion has increased almost threefold the net vegetation absorption above 50 • N. This response is a common feature of the current inverse systems which, by assimilating 5 CO 2 measurements only into an atmospheric transport model, infer a higher net vegetation sink in the high latitudes than land-surface models. Indeed, in Fig. 8 of Friedlingstein et al. (2020), the net land sink (above 30 • ) calculated as the average of 17 process models is between 0.5 GtC/y and 1.5 GtC/y whereas the flux calculated from 6 inverse systems is between 1 and 2.5 GtC/y averaged over the last ten years. More specifically, Figure 8 illustrates how the inversion changes the seasonal cycle of GPP and respiration within each of the 15 PFTs of the ORCHIDEE model. The changes in the global total per PFT 10 are shown separately in the supplementary material (see Fig. S4). In the tropics within PFTs 2 and 3 (Tropical Broad-leaved Evergreen and Raingreen Forests, see Table 1), the inversion decreased GPP by about 4 GtC.y −1 whereas respiration lost 1 GtC.yr −1 , leading to a small source of CO 2 . In the mid-latitudes (PFTs 4, 5 and 10, Table 1), the inversion weakened GPP and respiration by 5 GtC.yr −1 and 2 GtC.yr −1 , respectively. The second salient change is an increase in CO 2 absorption within the high latitudes covered by PFTs 7, 8, 9 and 10 (see Table 1). Indeed, GPP increased by almost 2 GtC.y −1 while respiration only decreased by 0.2 GtC.yr −1 in total. The increased GPP over the boreal latitudes explains the larger seasonal cycle of the a posteriori COS and CO 2 concentrations at sites BRW and ALT. The comparison of GPP and respiration from 5 ORCHIDEE against eddy covariance measurements at several sites around the globe pointed at an underestimation of these components, consistent with our inversion results (not shown). A complete validation of this ORCHIDEE version will be the topic of a future publication.

Comparison with independent observations
3.3.1 Evaluating the seasonal cycle with SIF data 10 In order to assess the realism of the a posteriori GPP, its seasonal cycle is compared with seasonal cycle of the GOME-2 SIF product. Although the ecosystem-dependent bias in the SIF products makes a direct comparison with GPP impossible, SIF has been recognized as an good indicator of the temporal dynamics in GPP. At the ecosystem scale, SIF is anti-correlated with the GPP: a maximum in SIF corresponds with a minimum in GPP. Figure 8 superimposed the maximum of the SIF on the GPP seasonal cycle. The normalized SIF seasonal cycle is further shown on Fig. S6. Ideally, the maximum coincides with 15 the minimum of the GPP seasonal cycle. Overall, the inversion has not altered the timing of the COS seasonal depletion.
The optimized seasonal cycle disagrees with the SIF satellite retrievals within PFT 2 (Tropical Broadleaved Evergreen), PFT 3 (Tropical Boreal Raingreen Forest) and PFT 14 (Tropical C3 grass), questioning the realism of a weaker CO 2 and COS absorption over the tropics. Within PFT 2, the inversion tends to produce a seasonal signal in opposite phase with SIF. In the mid-latitudes, the seasonal phase of GPP is slightly degraded within PFT 4 (Temperate Needle-leaved Evergreen Forest) while 20 it is improved within PFT 12 (C3 Agricultural Land). In the high latitudes, the phase of the seasonal cycle, which was in quite good agreement with the SIF in the GPP a priori, has not been altered by the inversion. To conclude, the atmospheric inversion does not lead to a clear improvement in the representation of the GPP seasonal cycle.

Comparison with independent atmospheric observations
As a second step, we assess the a posteriori concentrations using several datasets : the MIPAS satellite retrievals, the HIPPO 25 airborne measurements and the surface measurements over Japan and France (see section 2.2). In particular, the MIPAS retrievals of COS atmospheric concentrations at 250 hPa in the tropics give insight into the magnitude of the main biospheric sink located over Brazil during the wet season, when convective air masses reach the upper troposphere (Glatthor et al., 2017).
First, Figure 9 shows the a posteriori and a priori COS seasonal concentrations at 250 hPa, convolved with the MIPAS averaging kernels and averaged over the period 2009-2012. We see that the inversion reduced the RMSE by more than one third Hemisphere are shown for PFTs 4, 5, 6, 10, 11, 12 and 13. The identifiers of the PFTs are described in Table 1. The acronyms Tr, Bo and Te mean Tropical, Boreal and Temperate, respectively. oceanic emissions). Such an increase is consistent with Glatthor et al. (2015), who also needed to multiply the vegetation sink and the oceanic sources from Kettle et al. (2002) by 4 to better match the MIPAS retrievals. However, there are some remaining deficiencies. In particular, the COS depletion observed between Brazil and Africa is well reproduced but its amplitude is slightly underestimated. The simulated COS concentrations are also too small over the Pacific Ocean. The reasons could be an underestimation of the tropical emissions or a too homogeneous distribution of these emissions through the longitudes. We 5 have to remember that we have optimized a single factor for the oceanic emissions over the whole tropical band and thus the spatial gradients within the tropical band have not been optimized. This could explain the lack of variability over the ocean.
Over the mid-latitudes, the smaller concentrations in spring point at a too weak terrestrial sink or too strong oceanic emissions.
The lack of stratospheric COS loss could also be responsible for these underestimated concentrations since they are close to the tropopause near 60 • .   We further assess the latitudinal distribution of the COS sources and sinks given by the inversion with the help of the HIPPO airborne measurements. For this purpose, Figure 10 compares the inter-hemispheric gradient in the a posteriori and a priori COS and CO 2 concentrations against the HIPPO airborne measurements. We have verified beforehand that the transport model performs well at sites LEF and THD (see Fig. S7) whose continental and coastal locations respectively emphasize transport errors. The representation of vertical mixing is indeed crucial for continental sites (Geels et al., 2007) such as LEF 5 whereas coastal sites such as THD are difficult to represent in coarse resolution models (Riley et al., 2005). Given the good agreement between modelled and observed vertical profiles at these two sites (see Fig. S7), transport errors are assumed here to be of secondary importance compared to the uncertainty in the fluxes and differences between the concentrations apriori and aposteriori are ascribed to differences in the surface fluxes. Figure 10 shows that the a posteriori better matches the observed latitudinal distribution. Especially, the shared positive bias in the northern latitudes between COS and CO 2 has been corrected (see Supplementary material). In contrast to the Ma et al. (2021) top-down study, there is no significant negative bias in the COS vertical profiles here (see Fig. S7-11 Zumkehr et al. (2018) inventory. The site located in middle Japan, YOK, has a simulated concentration of almost 100 ppt higher than the observations. This implies an error in the inventory which indicates a source above the site (see Figure S9).
As for the southern site MIY, the model underestimates the COS concentration by 100 ppt, pointing at an underestimation of the anthropogenic sources over the eastern edge of China or Korea.
In summer, sites YOK and OTA sample air masses coming both from continental Japan and from the Pacific Ocean at the 15 East of Japan. The southernmost site MIY seems to be mostly affected by oceanic sources originating from the east (see the LMDz footprints on Figure S12). The sites OTA and YOK overestimate the COS concentrations by 60 and 150 ppt and reflect the influence of the misplaced anthropogenic source in center Japan ( Figure S13). At MIY, the comparison with observations suggests that the oceanic source is too strong because the atmospheric concentrations are overestimated by 40 ppt in southern Asia and in northern Japan. However, the oceanic source may not be overestimated in Southern Asia because we have assumed to a decrease in surface COS mixing ratio of 40 ppt in the vicinity of Japan. Also, there is an oceanic hot spot located in the footprint of the site (see Figure S13) which might not be reliable.
The spatial pattern of the Zumkehr et al. (2018) inventory seems to show too strong sources over Japan and too weak sources in the eastern edge of China. The inversion system could therefore have compensated the lack of anthropogenic source in the eastern part of China by increasing the oceanic source. However, it is difficult to extrapolate conclusions drawn from a specific 5 region to a larger scale. There is also no clear indication that the oceanic sources are overestimated eastward of Japan.
Finally, we perform a similar assessment of the optimized COS fluxes in winter at station GIF in France. The footprint of the station covers central France and countries at the eastern edge such as Belgium and the eastern part of Switzerland (see Figure S14). The confrontation of the posterior concentrations against measurements serves at evaluating the Zumkehr et al. (2018) anthropogenic inventory and, in particular, its spatial distribution over central France since the terrestrial sink is assumed 10 to be much smaller. Station MHD provides very small constraints over France and Eastern Europe as its footprint is mainly oceanic. The comparison between the posterior concentrations and atmospheric measurements on Figure 11c indicates that the anthropogenic sources within the footprint of the station are also overestimated: the a posteriori concentrations are more than 130 ppt higher than the one observed. This confirms the study of Belviso et al. (2020) which reported a misplaced hot-spot on Paris (see Fig. S14). In reality, the concentrations at GIF are 10 ppt lower than the one at the background MHD, reflecting a 15 dominant influence of the biospheric sink in this season.

Discussion and perspectives
To conclude, there is an need of continuous in-situ carbonyl sulfide observations. The lack of continuous in-situ observations, especially over the tropics, limits our capacity to infer the COS surface fluxes by inverse modelling and therefore to optimize GPP. There is some hope that new satellite products could address this issue but at this stage, current COS retrievals have also 20 their limitations such as, for instance, cloud interference or the lack of sensitivity to the surface fluxes (Glatthor et al., 2017;Kuai et al., 2015b;Vincent and Dudhia, 2017). Letting aside this obvious lack of observations to be assimilated, we are now discussing the way forward to improve our knowledge of the COS budget.
-Improving the anthropogenic inventory The inverse system has weakened the global anthropogenic source by almost 20 %. It is unclear whether this decrease results from an overestimation of the global emissions or from misplaced hot-25 spots within the footprints of the assimilated stations. For instance, the overestimated concentrations in the model at a site located in middle Japan point to a misplaced hot-spot near the station. If these measurements were assimilated, the inverse system would tend to produce an unrealistic negative flux increment over the area to match the observed concentrations. A similar inconsistency has been reported between measurements at the Gif-sur-Yvette background site and the hot-spot to the north, over Paris, stated in the Zumkehr et al. (2018) inventory. Thus, the reported hot spot 30 locations and magnitudes must be improved to be able to benefit from these new observations at Gif-Sur-Yvette and in Japan. Further work should includes a more thorough evaluation of European anthropogenic sources using COS retrievals from Fourier transform infrared spectrometry (Wang et al., 2016;Krysztofiak et al., 2015) along with a high-  resolution (e.g., 0.5 • ) chemical transport model in order to correct the spatial distribution of these emissions. Samples in industrial facilities could also be made to validate anthropogenic inventories. Currently, due to large uncertainties in the emission factors and the use of a proxy for spatial disaggregation, the anthropogenic inventory is more appropriate for interpreting atmospheric COS measurements from background sites like MLO than atmospheric COS measurements which have a significant influence from nearby emissions (e.g. Japan/YOK).

5
-Improving the relationship between COS plant uptake and GPP. For the LRU values, we have only made a distinction between C4 and C3 plants. A complementary experiment would be to optimize a set of LRU coefficient for each PFT together with the GPP fluxes. We plan to include the PFT dependence of the LRU by using the LRU dataset of Maignan et al. (2020) derived from a mechanistic vegetation model, and for which conductances will be further tuned with eddy-covariance flux measurements. LRU absolute values are indeed critical. For instance, if the LRU were larger at high latitude, the inversion would not need to increase the GPP as much. However, LRUs have been estimated to be lower in the boreal ecosystems (around 1 and 1.8 for Maignan et al. (2020) and Seibt et al. (2010) respectively) than in the tropical and temperate ecosystems (around 1.3 and 3 for Maignan et al. (2020) and Seibt et al. (2010) respectively).
So, using another existing LRU dataset will likely lead to a comparable GPP sink in the high latitudes. Another simpli-5 fication of our study is that the current LRU approach does not take into account the feedback between COS vegetation sink and atmospheric concentrations. The atmospheric concentrations vary on seasonal and interannual timescales but have been indeed considered constant per latitudinal band. Such a feedback might have significant impact on the total vegetation sink (see Fig. S18), in particular over the Amazon. Thus, refining our inverse system would require including the feedback between the atmospheric concentrations and the COS vegetation sink (first order approach). This will 10 involve representing the sharp drop of COS between the canopy and the boundary layer, which can reach 70 ppt in redwood forests (Campbell et al., 2015). However, current global models do not represent the turbulence within the canopy and the link with the atmospheric boundary layer, which does not allow to correctly simulate the vertical gradient of concentrations between the lowest layer of an atmospheric model and the canopy. Some promising developments were made with the ORCHIDEE LSM (Naudts et al., 2015) but more research is needed before they can be used for our 15 application.
-Increasing the realism of the soil fluxes The GPP estimate strongly relies on the realism of the soil fluxes. The soil fluxes need to be more constrained and their errors better defined. In particular, more attention should be paid to the seasonality of soil fluxes compared to the one of the vegetation fluxes in the field measurements. For instance, this would help to know whether the two months-lag between the soil and vegetation fluxes in the high latitudes is realistic.

20
-Improving the prior COS oceanic fluxes with the help of an ocean model. Prior oceanic emissions are probably overestimated in the high and mid latitudes as shown by Lennartz et al. (2017Lennartz et al. ( , 2020a and as suggested by the inverse system. found erroneous. Future work should include the correction of the rate in the NEMO-PISCES ocean model and also the implementation the oceanic physical processes responsible for the CS 2 emissions. Moreover, the single factor used to convert DMS into COS is very uncertain and may not apply to any atmospheric conditions (Von Hobe, 2020, personnal communication). Since there is so much DMS emitted by the ocean (ca. 28 TeragramS per year), a small change in the conversion factor (e.g. from the current 0.7% to e.g. 2.5%) could make a large difference. When the relative contribution of indirect COS sources to total ocean emissions is better known, an extension of this work could be to optimize each oceanic process separately.
-Implementing a complete chemistry of COS into the LMDz atmospheric transport model For an economy of computation time, we have assumed that the DMS and CS 2 oxidation into COS happens instantly in the atmosphere. 5 However, Ma et al. (2021) showed that such simplifications could modify the average COS surface concentrations up to 80 ppt over Eastern China and Japan in winter. These chemical reactions need be implemented in the LMDz atmospheric model in order to properly evaluate the Zumkehr et al. (2018) inventory with the help of COS atmospheric measurements.
The lifetime of the DMS, CS 2 and, to a lesser extent, COS into the atmosphere depends on the realism of the OH fields.
Therefore, the impact of their uncertainty on the inverse results needs also to be quantified. Chemical transport models 10 disagree on the spatial distribution of the OH fields and using other OH fields could significantly alter the COS budget as it was demonstrated for the methane budget (Zhao et al., 2020b, a). In addition, we plan to introduce the stratospheric chemistry of COS into the LMDz atmospheric transport model. The implementation of a complete chemistry while keeping a multi-year inversion window requires using a variational approach: the chemical reactions are indeed more difficult to implement in an analytical inverse system using pre-computed Jacobian matrices.  , 1986;Chiodini et al., 1991;Symonds et al., 1992;Sawyer et al., 2008;Notsu and Mori, 20 2010) could be mapped and given as an input to the atmospheric model. Likewise, DMS emissions from vegetation, tropical forests, soil and wetlands (Yi et al., 2008;Kanda et al., 1992;Minami et al., 1993) have not been included although their contribution to the total DMS release have been estimated in the past between 2 and 15 % (Watts, 2000;Gondwe et al., 2003). Moreover, we have neglected the COS emissions from the anoxic soils that might be a part of the missing tropical source, in particular within the waterlogged soils of the rice paddies (Yi et al., 2008). 25

Conclusion
We have developed an analytical system that optimizes GPP, plant respiration CO 2 flux and COS soil fluxes within the 15 PFTs defined in the ORCHIDEE terrestrial model, enabling to take into account the ecosystem-dependence of the fluxes. The LRU approach was used to link the GPP to the COS plant uptake. With this system, we have performed a joint assimilation of CO 2 and COS atmospheric measurements into the LMDz atmospheric transport model for the period 2008-2019. From a 30 technical point of view, the inverse system is able to find the components of the CO 2 and COS budgets that give a good fit with assimilated measurements. Inverse results point at a large oceanic COS source between 450 and 600 GgS.yr −1 , most of it located in the tropics. The inversion leads to a GPP increase of a few GtC in the high latitudes and a decrease in the same order of magnitude in lower latitudes (tropics and mid-latitudes) compared to the initial prior estimates from the ORCHIDEE LSM.
For COS, this means a vegetation sink of around -620 GgS.yr −1 , which is in the lower range of recent estimates based on topdown approaches (Launois et al. (2015b): -663-772 GgS.yr −1 , Ma et al. (2021): -557-1053 GgS.yr −1 ). The soil sink and the anthropogenic sources have both decreased and amount to -210 and 335 GgS.yr −1 , respectively. Biomass burning emissions 5 have been slightly revised upward to 65 GgS.yr −1 . Compared to GPP, plant respiration has almost not been affected in the high latitudes whereas its total value has decreased by only one quarter of the change in GPP in lower latitudes. The resulting CO 2 biospheric fluxes, defined here as the sum of the respiration and GPP, has lost 2 GtC.yr −1 above 30 • N compared to the prior fluxes simulated by the ORCHIDEE LSM. This behaviour is shared by current inverse systems which infer the net CO 2 fluxes from atmospheric CO 2 measurements (Friedlingstein et al., 2020).

10
Several aspects of the inferred COS fluxes, such as the inter hemispheric gradient, the tropical spatial distribution, the anthropogenic emissions over Japan, China and France, were evaluated with independent atmospheric measurements over different parts of the globe. In the tropics, independent observations of the upper-troposphere COS partial column and the SIF weaken our confidence in the change in tropical GPP; the inverse system actually lacks measurements in this area to ensure a robust partitioning between the oceanic and the continental components of the COS budget. Indeed, the footprint 15 map of the assimilated measurements indicates that the tropical areas, in particular the continents, are poorly constrained by the inverse system. The inverse system partly relies on the terrestrial reference fluxes and adjust the tropical source to match the surface measurements over the tropics. If the tropical oceanic release is indeed underestimated in the reference fluxes, its magnitude remains highly uncertain. In contrast, in the high-latitudes, independent measurements suggest that the inversion has rightly corrected an underestimation of the GPP in the ORCHIDEE land surface model. Concerning the COS anthropogenic 20 sources, Japanese measurements suggest that these are underestimated in Eastern China, highlighting the need for an improved anthropogenic inventory.
Acknowledgements. This study was funded by the CO2 Human Emissions (CHE) project which received funding from the European Union's Horizon 2020 research and innovation program under grant agreement no. 776186. The authors kindly thank the scientists who provided the measurements used in this study. In particular, the MIPAS averaging kernels were provided by Michael Kiefer. The surface measurements sulfide (OCS) and carbon disulfide (CS2): a compilation of measurements in seawater and the marine boundary layer, Earth System Science Data, 12, 591-609, https://doi.org/https://doi.org/10.5194/essd-12-591-2020, https://www.earth-syst-sci-data.net/12/591/2020/, publisher: Copernicus GmbH, 2020b.