The 2015–2016 carbon cycle as seen from OCO-2 and the global in situ network

Abstract. The Orbiting Carbon Observatory-2 has been on orbit since 2014, and its global coverage holds the potential to reveal new information about the carbon cycle through the use of top-down atmospheric inversion methods combined with column average CO2 retrievals. We employ a large ensemble of atmospheric inversions utilizing different transport models, data assimilation techniques, and prior flux distributions in order to quantify the satellite-informed fluxes from OCO-2 Version 7r land observations and their uncertainties at continental scales. Additionally, we use in situ measurements to provide a baseline against which to compare the satellite-constrained results. We find that, within the ensemble spread, in situ observations and satellite retrievals constrain a similar global total carbon sink of 3.7±0.5 PgC yr−1, and 1.5±0.6 PgC yr−1 for global land, for the 2015–2016 annual mean. This agreement breaks down in smaller regions, and we discuss the differences between the experiments. Of particular interest is the difference between the assimilation constraints in the tropics, with the largest differences occurring in tropical Africa, which could be an indication of the global perturbation from the 2015–2016 El Niño. Evaluation of posterior concentrations using TCCON and aircraft observations gives some limited insight into the quality of the different assimilation constraints, but the lack of such data in the tropics inhibits our ability to make strong conclusions there.



Introduction
Understanding the global carbon cycle and how it responds to human and natural forcing is a first order requirement for predicting the future trajectory of Earth's climate (Friedlingstein et al., 2013). Our current understanding is embodied in models of the oceans and land biosphere, which characterize processes such as photosynthesis, respiration, nutrient uptake and transport, fire, and chemical cycling, as well as fossil fuel inventories. Measurements of CO2 dry air mole fraction in the atmosphere serve as an integral constraint on the sum of these in the form of a net flux of CO2 to and from the atmosphere at the surface.

Many studies have used atmospheric transport models in conjunction with in situ CO2 observations to infer surface fluxes of CO2 (Gurney et al., 2002; Baker et al., 2006a; Peters et al., 2007; Chevallier et al., 2010a; Schuh et al., 2010; Feng et al., 2011; Basu et al., 2013; Deng et al., 2014; Lauvaux et al., 2016; Feng et al., 2017), but the uncertainty in these estimates grows quickly as we move downscale in space and time, particularly for regions in the tropics and southern hemisphere. This is partially due to the errors present in coarse global transport models, and partially due to a paucity of observations outside of North America and Europe.
To improve upon the sparse spatial coverage provided by the in situ CO2 network, estimates of column-averaged CO2 mole fraction (XCO2) have been derived from a variety of satellite-based instruments. XCO2 can be retrieved from high spectral resolution measurements of reflected sunlight. The first space-based instruments designed for this application include ENVISAT SCIAMACHY (Buchwitz et al., 2005), the Greenhouse gases Observing SATellite (GOSAT) TANSO-FTS (Kuze et al., 2009), and the Orbiting Carbon Observatory-2 (OCO-2) spectrometer (Crisp, 2015). [...] observational constraints on the carbon cycle. At this time, however, there are only a few publications that utilize the OCO-2 retrievals explicitly for top down flux estimation (Liu et al., 2017). In this work, we investigate the constraint on surface fluxes of CO2 provided by OCO-2 using an ensemble of atmospheric transport inversion frameworks. By characterizing the impact [...] were performed, including a description of the data assimilated and data that was used to evaluate the results. Section 4 presents optimized flux estimates and uncertainties from global to regional scales, along with evaluation using independent data, and discusses implications for our understanding of the carbon cycle. Section 5 examines the results in a broader context and suggests a few ways forward to reduce the remaining uncertainties. Finally, Section 6 provides a summary and overall conclusions.

GOSAT
The Thermal And Near-infrared Sensor for carbon Observation (TANSO) aboard GOSAT is a Fourier Transform Spectrometer (FTS) that measures radiances in the near-infrared (NIR), shortwave infrared (SWIR), and thermal infrared (TIR) bands. The NIR and SWIR bands are used to retrieve XCO2 at a spatial scale of approximately 100 km². GOSAT retrievals have been analyzed by a variety of teams using different schemes for retrieving column CO2 from the measured radiances (Takagi et al., 2014).
GOSAT XCO2 retrievals have been used in global CO2 flux inversions by a number of groups. Houweling et al. (2015) compared results from a number of modeling frameworks for 2009-2010 and found that the constraint from GOSAT retrievals resulted in a strong annual sink of 1.0 PgC in Europe, in agreement with Reuter et al. (2014) and Reuter et al. (2017), which was balanced mainly by outgassing in Northern Africa. Biases in the GOSAT retrievals were determined to be a potential cause of the large European sink obtained (Feng

OCO-2
OCO-2 measures radiances in the spectral bands near 0.765 µm, 1.61 µm, and 2.06 µm. These radiances are returned as 8 distinct soundings across a narrow swath no wider than 10 km. Each sounding has a spatial footprint that is less than 1.29 km by 2.25 km projected onto the surface. This fine spatial resolution is expected to increase the number of cloud-free scenes, and thus allow more successful retrievals with lower errors, as clouds are known to be a source of error in retrievals (O'Dell et al., 2018b). Additionally, this high spatial resolution permits the detection of some systematic biases which can appear as a set of unrealistically-varying XCO2 over so-called "small areas" (O'Dell et al., 2018b). OCO-2 flies in the EOS Afternoon Constellation (A-Train) with a 705 km sun-synchronous orbit and equator crossing time between 1:21 pm and 1:30 pm local time. The A-Train orbit has a 16-day ground track repeat cycle, which allows for complete global XCO2 coverage twice per month, with approximately 150 km horizontal offsets between nearby revisiting orbits.
Observations are made in one of three modes: nadir (looking at the sub-satellite point), directed toward the solar glint spot, or in the so-called target mode.
Both OCO-2 and GOSAT have been extensively evaluated against the Total Carbon Column Observing Network (TCCON) (Wunch et al., 2017). These validation activities reveal systematic errors in both data sets that must be removed using empirical corrections (Wunch et al., 2011). Even after bias correction, Wunch et al. (2017) demonstrated significant residual bias in the OCO-2 Version 7 glint soundings taken over the high southern latitude oceans. The land nadir and land glint observations contain residual bias (Wunch et al., 2017), but the magnitudes and spatial patterns of that bias are difficult to detect at regional scales with the TCCON network alone. Comparisons to in situ-constrained models clearly highlight some of these differences, but it is difficult to distinguish between bias and real signal in regions with sparse data density.

Flux Estimates with Satellite Observations
In addition to Houweling et al. (2015), numerous other studies have demonstrated that inference of fluxes with atmospheric transport inversions, or "top-down" estimates, can be sensitive to both modeled transport (Gurney et al., 2002; Baker et al., 2006a; Stephens et al., 2007; Houweling et al., 2010; Chevallier et al., 2010b; Nassar et al., 2011; Deng et al., 2015; Basu et al., 2018; Schuh et al., 2018) and assimilation technique (e.g. Peylin et al. (2013)). The covariance of errors due to seasonal sampling and transport has been studied in a series of idealized simulation experiments by Basu et al. (2018), who reported that this can be a significant source of error that may not be reflected in the spread for inversions constrained with OCO-2 retrievals. For example, Figure 5 in Basu et al. (2018) shows that for the boreal regions, the efflux due to the onset of senescence in the fall is overestimated with the OCO-2 retrievals by more than 0.1 PgC per year, but the spread in flux estimates due to transport is insufficient to differentiate between models and source data. Additionally, Schuh et al. (2018) showed that vertical and meridional mixing differences between two widely used transport models, TM5 and GEOS-Chem, lead to large differences in the inferred northern hemisphere meridional gradient, particularly when separated along the storm track in the Northern Midlatitudes. These findings, as well as those of Peylin et al. (2013) and others, show that inference using a single model is problematic, and that an ensemble of models with varying transport, prior fluxes, and data assimilation methodologies gives an estimate of the sensitivity of inferred flux to the assumptions spanned by the ensemble of models.

Experimental Design
The work reported here emerges from a large model intercomparison project (MIP) organized by the OCO-2 Science Team in order to understand how flux estimates using OCO-2 retrievals and in situ measurements depend on 1) transport, 2) data assimilation methodology, 3) prior flux (and its associated uncertainty) and 4) systematic errors in the OCO-2 retrievals. The OCO-2 MIP is composed of modelers using four different transport models with varying configurations, multiple different data assimilation frameworks, and diverse prior fluxes and uncertainties. This information is summarized in Table 2 and detailed in the supplementary information. We treat the scatter in the posterior fluxes across this ensemble induced by variability across these parameters as a proxy for the uncertainty in optimized fluxes.
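As a rough illustration of treating ensemble scatter as an uncertainty proxy, the sketch below summarizes one region's annual flux across ensemble members by its mean and spread. The member names and flux values are made up for illustration, and the use of the sample standard deviation is an assumption of this sketch rather than a statement of the statistic used by the MIP.

```python
from statistics import mean, stdev


def ensemble_summary(posterior_fluxes):
    """Return (mean, spread) of a flux across ensemble members.

    posterior_fluxes maps a member name to an annual flux in PgC/yr.
    The spread is the sample standard deviation (an assumption here).
    """
    values = list(posterior_fluxes.values())
    return mean(values), stdev(values)


# Hypothetical annual fluxes (PgC/yr) from four invented ensemble members.
example = {"member_a": -3.2, "member_b": -3.9, "member_c": -4.1, "member_d": -3.6}
flux_mean, flux_spread = ensemble_summary(example)
print(f"{flux_mean:.2f} +/- {flux_spread:.2f} PgC/yr")
```

A spread computed this way reflects only the assumptions actually varied across the ensemble, so it may understate errors that all members share.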
In order to control the drivers of ensemble spread, several assumptions for the different modeling efforts were standardized. The OCO-2 MIP team utilized a standard set of 10s average XCO2 values for the time period from September 6, 2014 through April 1, 2017, with appropriate model-data mismatch values as described below, to avoid spread due to data handling. Peylin et al. (2013, hereafter P13) noted a difference in flux estimates due to different assumed fossil fuel emissions, which are not typically optimized in global top down studies. To avoid this, all group members utilized the same fossil emissions, namely the Open-source Data Inventory for Anthropogenic CO2 monthly fossil fuel emissions (ODIAC2016; Oda and Maksyutov (2011), Oda and Maksyutov (Reference Date: September 23, 2016), Oda et al. (2018)) together with the TIMES diurnal and weekly scaling (Nassar et al., 2013). The OCO-2 MIP results are connected to other modeling studies such as Transcom (Gurney et al., 2002) and RECCAP (Peylin et al., 2013) through another set of inversions that were performed by each group using a standardized set of in situ measurements (described below).
This work utilizes the Version 7 retrospective (V7r) OCO-2 retrieval dataset with a few modifications. The V7 dataset was released in late 2015 and was the first retrieval version from the OCO-2 mission with the precision and accuracy in XCO2 required for scientific use. Initial work with these retrievals indicated a residual bias that was correlated with regions of high albedos in the 2 µm band and relatively low albedos in the O2 A-band. An additional correction was added to reduce the effects of this "s31" bias, which is related to the signal to noise ratio in the O2 band vs. the strong CO2 band. The fine-scale detail contained in individual OCO-2 retrievals is not resolvable by global transport models, which provide CO2 values for large grid boxes that are at least 100 km in each dimension, with specific values given in Table 2.
Rather than ingesting each OCO-2 retrieval falling inside a model grid cell separately, we compute a single representative retrieval value for a grid cell with appropriate uncertainty and assimilate that single value. The appropriate uncertainty to assign that representative retrieval is a function of the number of soundings it represents, their individual uncertainties, representativeness of soundings for the grid box, and the correlations between their individual errors. Since different models use grid boxes of different sizes, we grouped individual retrievals into 10-second bins (groundtrack swaths of 67 km in length), and we assume that the uncertainties between different 10s averages are independent. This assumption is in line with the conclusions of Worden et al. (2017). The spatial scale represented by the 10s averages is small enough to provide enough detail for the highest resolution global models included in this study. The OCO-2 10s sounding locations for nadir and glint retrievals are shown in the top row of Figure 1.

OCO-2 retrievals
Each 10s average consists of a single observing geometry (glint or nadir). In line with the conclusions of Wunch et al. (2017), the ocean glint retrievals are not assimilated due to poorly understood biases, particularly in the high southern latitudes. All OCO-2 experiments detailed in the Results and Discussion sections assimilate land glint and land nadir retrievals only.
We de-emphasize soundings that are taken close together in time and space, since their errors are likely to be strongly correlated. In the absence of a good description of spatial error correlations, we 1) averaged the retrievals into 1-second bins along track (6.7 km) and then 2) averaged all 1s spans with good retrievals within the 10s span to get the 10s values for a given observation geometry. The weighting of each individual value within the 1s and 10s spans is done according to the uncertainty in each sounding, so that assimilating the summary value will give the same result as assimilating the individual values separately (assuming they are independent), although we assign an uncertainty to each aggregate value that is higher to reflect the fact that errors in the individual retrievals are highly correlated, and to account for transport errors.
Computing the 1s averages: We first select only those retrievals in the OCO-2 Lite files (from the "lite_test_20170410" build) with "good" retrievals according to the "xco2_quality_flag" variable. An inverse variance weighted average (IVE) of many of the variables in the Lite files (time, latitude, longitude, surface pressure, prior, retrieved and bias-corrected XCO2, averaging kernel vector, CO2 vertical profile, pressure weighting function, and independent variables used as part of the bias correction procedure to screen and correct the retrievals) is computed from these selected retrievals across each 1s span as follows:
$$\overline{X}_{CO_2} = \frac{\sum_{i=1}^{N} \sigma_i^{-2}\, X_{CO_2,i}}{\sum_{i=1}^{N} \sigma_i^{-2}}$$
where $\overline{X}_{CO_2}$ denotes the 1s average, $X_{CO_2,i}$ are the values from each sounding, and $\sigma_i$ are the uncertainties in $X_{CO_2,i}$ for each shot (from variable xco2_uncert). If each shot in the span were independent, $\overline{X}_{CO_2}$ would have a theoretical uncertainty of:
$$\sigma_{indep} = \left(\sum_{i=1}^{N} \sigma_i^{-2}\right)^{-1/2}$$
where the uncertainty of the average drops by a factor of approximately $\sqrt{N}$, where $N$ is the number of shots in the average. However, since we believe the XCO2 retrievals in the small area viewed inside one second are actually highly correlated, we instead use an average uncertainty of the $N$ shots to represent the uncertainty of the average:
$$\sigma_{IVE,1s} = \left(\frac{1}{N}\sum_{i=1}^{N} \sigma_i^{-2}\right)^{-1/2}$$
Because even this average uncertainty is sometimes too low (since it captures only the random estimation errors in the retrieval and not any systematic errors), we compare it to the standard deviation of all retrieved XCO2 in the 1s interval, denoted by $\sigma_{spread}$, as well as to a minimum uncertainty threshold (for those cases in which there are too few shots to compute a realistic spread), denoted $\sigma_{floor}$, and we then set the uncertainty for $\overline{X}_{CO_2}$, denoted by $\hat{\sigma}$, to be the maximum of $\sigma_{IVE,1s}$, $\sigma_{spread}$, and $\sigma_{floor}$.
Computing the 10s averages: 10s average values are computed across all 1s spans $j$ with valid retrievals again as the IVE:
$$\overline{X}_{CO_2,10s} = \frac{\sum_{j=1}^{J} \hat{\sigma}_j^{-2}\, \overline{X}_{CO_2,j}}{\sum_{j=1}^{J} \hat{\sigma}_j^{-2}}$$
Again, we compute the average uncertainty as:
$$\sigma_{IVE,10s} = \left(\frac{1}{J}\sum_{j=1}^{J} \hat{\sigma}_j^{-2}\right)^{-1/2}$$
where $\overline{X}_{CO_2,j}$ and $\hat{\sigma}_j$ are the 1s averages and their assigned uncertainties, and $J$ is the number of 1s values in the sum (just those with good data available). An additional uncertainty representing the variability across models at the OCO-2 sounding locations, denoted $\sigma_{model}$, is added in quadrature to $\sigma_{IVE,10s}$, and this value is treated as the uncertainty for the 10s average XCO2, which is often referred to as the model-data mismatch (MDM) uncertainty. The MDM is effectively a weighting factor for each retrieval, with small values representing retrievals with the greatest expected utility in the assimilation.
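The two-stage averaging described above can be sketched in a few lines. This is a minimal illustration, not the MIP's processing code: it assumes that the "average uncertainty" is the inverse-variance mean of the individual uncertainties, that the spread is a sample standard deviation, and that all inputs have already been screened by the quality flag. All function names are hypothetical.

```python
import math
from statistics import stdev


def ive_average(values, sigmas):
    """Inverse-variance weighted mean of values with 1-sigma uncertainties."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def ive_sigma(sigmas):
    """Average uncertainty, assumed here to satisfy sigma^-2 = mean(sigma_i^-2)."""
    return math.sqrt(len(sigmas) / sum(1.0 / s**2 for s in sigmas))


def one_second_average(xco2, sigmas, sigma_floor):
    """Return (average, uncertainty) for the soundings in one 1 s span."""
    avg = ive_average(xco2, sigmas)
    # Sample standard deviation of the soundings; undefined for one sounding.
    spread = stdev(xco2) if len(xco2) > 1 else 0.0
    return avg, max(ive_sigma(sigmas), spread, sigma_floor)


def ten_second_average(one_second_values, sigma_model):
    """Combine (average, uncertainty) pairs from 1 s spans into a 10 s value.

    Returns (average, mdm), where mdm adds sigma_model in quadrature.
    """
    values = [v for v, _ in one_second_values]
    sigmas = [s for _, s in one_second_values]
    mdm = math.hypot(ive_sigma(sigmas), sigma_model)
    return ive_average(values, sigmas), mdm


# Illustrative numbers only (ppm).
avg_1s, sig_1s = one_second_average([400.0, 401.0], [1.0, 1.0], sigma_floor=0.3)
avg_10s, mdm_10s = ten_second_average([(400.0, 0.6), (402.0, 0.6)], sigma_model=0.8)
```

Taking the maximum of three candidate uncertainties keeps a 1 s value from being over-weighted when the retrieval's own error estimate is optimistic.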

In situ CO2 measurements
CO2 measurements collected in flasks or by continuous analyzers at surface, tower, and aircraft sites are an important anchor for this exercise because their error characteristics are generally well-known, being directly established via calibration traceable to WMO standards. Additionally, these measurements provide traceability to a long history of flux estimates derived from these data as an atmospheric constraint. The in situ measurements used in these simulations come from the GLOBALVIEW+ project, and from a system developed for this project to deliver near-real time (NRT) CO2 measurements, with spatial locations depicted in Figure 1. Both of these efforts are coordinated by collaborators at NOAA Earth Systems Research Laboratory (ESRL). Each August, the GLOBALVIEW+ project publishes a collection of CO2 measurements from academic and institutional data providers covering the previous calendar year. Measurements for this study were compiled from the GLOBALVIEW+ 2.1 and 3.1 (Cooperative Global Atmospheric Data Integration Project, 2017) releases. As of version 3.1, GLOBALVIEW+ contains more than 14 million individual measurements of CO2 in 353 datasets from 46 contributing laboratories, spanning the time range 1957 to 2016.
Several international measurement networks and campaigns are able to provide CO2 observations with little or no delay, and NOAA has collected and published these measurements from many different sites in the "Near Real Time" (NRT) format. Because many laboratories are not configured to deliver measurements in near-real time, there are many fewer datasets available in the NRT CO2 product. These include provisional flask measurements from NOAA surface and aircraft sites, made available as soon as laboratory analysis is complete but without final quality-control procedures. Some of the final quality-control analyses require a full year's worth of data. In other cases, analysis of multiple [...] data mismatch scheme used by the CarbonTracker project (CT2016 release; Peters et al. 2007, with updates documented at http://carbontracker.noaa.gov). This scheme is unique in that it assigns temporally-varying MDM values to account for large seasonal variability in the performance of models. Many measurements are deemed unsuitable for assimilation into models of this class, due to excessive vertical stratification during stable planetary boundary layer conditions, proximity to large anthropogenic sources, the influence of complex terrain, and other reasons.

TCCON
The Total Carbon Column Observing Network (TCCON) is a global network of Fourier-transform near-infrared (FTIR) spectrometers that retrieve the column average dry air mole fraction of trace gases such as CO2 and CH4 by analyzing the absorption of incident sunlight. The current version (GGG2014) of XCO2 from TCCON instruments is available at http://tccondata.org/, and a summary of all sites is given in Table 1. For this work, we downloaded all TCCON retrievals available as of July 6, 2017. We filtered the retrievals for outliers and averaged them to create 30 minute average XCO2 as follows:
1. We first filtered all retrievals by TCCON's own quality flag to select only "good quality" retrievals, and to classify them by site and date.
2. For each day at each site, we fit a function of the form α cos(ωt + φ) + β through the remaining retrievals, where t is the local solar time (LST) in hours, ω = 2π/(24 hours), and α, β and φ are free parameters to be fit.
3. We calculate σ, the standard deviation of the residuals from the fit, and reject the sounding with the largest residual if it is more than 3σ away from the fit function. Then we recalculate the function fit with the updated set of retrievals, and repeat until no more retrievals are being rejected by the 3σ cutoff.
4. If at any stage the number of remaining soundings in a day falls below 3, or the total time spanned by the remaining soundings falls below 1 hour, we reject all soundings for that day.
5. If σ > 1 ppm for the remaining soundings, we reject all soundings for that day.
6. Once this outlier selection is done, we reject soundings with solar zenith angle SZA > 60°, and average the remaining soundings in 30 minute windows. The window edges are aligned to integer and half hours of the LST. The SZA is likewise averaged, and then used to look up the averaging kernel according to the TCCON prescription.
Our outlier filtering and averaging helps us create a dataset which is more appropriate for comparing to coarse resolution global models, which are unlikely to reproduce local XCO2 fronts and high frequency features. Figure C1 shows our filtering and averaging in action on a typical day's TCCON retrievals at Park Falls.
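The daily outlier rejection and 30 minute averaging can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the paper: the cosine is fit by linear least squares using the identity α cos(ωt + φ) = a cos(ωt) + b sin(ωt), σ is taken as the root-mean-square residual, the quality-flag and SZA screening are omitted, and all function names are hypothetical.

```python
import math

OMEGA = 2.0 * math.pi / 24.0  # radians per hour of local solar time


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_diurnal(times, values):
    """Least-squares fit of a*cos(wt) + b*sin(wt) + c; returns a callable."""
    basis = [(math.cos(OMEGA * t), math.sin(OMEGA * t), 1.0) for t in times]
    A = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(basis, values)) for i in range(3)]
    a_c, a_s, c0 = solve3(A, b)
    return lambda t: a_c * math.cos(OMEGA * t) + a_s * math.sin(OMEGA * t) + c0


def filter_day(times, values, min_count=3, min_span_hours=1.0, max_sigma_ppm=1.0):
    """Return the retained (time, value) pairs, or [] if the day is rejected."""
    pts = sorted(zip(times, values))
    while True:
        if len(pts) < min_count or pts[-1][0] - pts[0][0] < min_span_hours:
            return []
        model = fit_diurnal([t for t, _ in pts], [y for _, y in pts])
        resid = [y - model(t) for t, y in pts]
        sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
        worst = max(range(len(pts)), key=lambda i: abs(resid[i]))
        if sigma > 0.0 and abs(resid[worst]) > 3.0 * sigma:
            pts.pop(worst)  # drop the single worst sounding, then refit
            continue
        return pts if sigma <= max_sigma_ppm else []


def half_hour_means(points):
    """Average values in 30 minute windows aligned to integer and half hours."""
    bins = {}
    for t, y in points:
        bins.setdefault(math.floor(t / 0.5), []).append(y)
    return {k * 0.5: sum(v) / len(v) for k, v in sorted(bins.items())}
```

Because only one sounding is removed per pass, a single large outlier cannot inflate σ enough to shield itself and then cause good neighbors to be discarded in the same iteration.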

Results and Discussion
Each posterior flux is constrained by a single observation type. Posterior flux estimates are presented for in situ observations, with locations shown in Figure 1, and OCO-2 land nadir (LN) and land glint (LG) observations only, due to the obvious bias present in the OCO-2 ocean glint observations as previously mentioned. Ocean nadir data is not provided as a standard data product due to low signal to noise ratios in the nadir viewing geometry over the ocean. Unless otherwise stated, prior and posterior fluxes have fossil fuel emissions presubtracted, meaning that fluxes over land are the sum of the photosynthesis, respiration, fires, and any effects from land use changes. Details of the different modeling assumptions are summarized in Table 2, and in greater detail in Appendix A.
The complete collection of regional flux datasets and imagery, as well as evaluation results, is available at the OCO-2 MIP portal at https://www.esrl.noaa.gov/gmd/ccgg/OCO2/i

Global Flux Estimates
Since CO2 is conserved at the global scale in these simulations, we expect that fluxes at that scale should be well-constrained even with a modest collection of observations. As we see in the left panel of the top row of Figure 2, this is the case. As the right panel shows, all observation types constrain a similar seasonal cycle with comparable peak sinks during the northern hemisphere growing season. Interestingly, this peak sink is about 0.75 PgC per month larger than that of the prior emissions, and with a smaller spread. Additionally, all observations lead to a shifted seasonal cycle in which the northern hemisphere growing season begins earlier and ends earlier than assumed in the prior. All data sets produce similar annual mean non-fossil fluxes, -3.5 PgC per year to -4 PgC per year, with a standard deviation of about 0.5 PgC per year across the ensemble. Schuh et al. (2018) showed some dependence of this number upon transport model, implying that further reduction of spread is likely still possible. Additionally, the satellite retrievals suggest a slightly stronger peak growing season sink in 2016 than 2015, though this is not affirmed by the in situ measurements and is within the uncertainty as seen in the model spread. The global mean [...] separately. Land fluxes drive the patterns seen in the top row of Figure 2. The summertime drawdown is shifted earlier in the year, and the peak of the drawdown is significantly larger, relative to the prior. Global ocean fluxes are largely unchanged relative to the prior. The shaded regions that pass outside of the prior spread are driven by 3 models that use larger prior uncertainties for ocean fluxes, allowing larger flux increments from atmospheric data, which indicates that the land data could provide some constraint on ocean fluxes were the prior constraint sufficiently weak. This pattern is repeated in the annual ocean fluxes in the left-hand panels.

Zonal Flux Estimates
OCO-2 observes across the sunlit portion of the Earth 14-15 times per day, spanning a large latitudinal range. This fact, combined with the general zonal structure of large scale winds in the atmosphere, suggests that the observations should constrain fluxes in zonal bands. The difference in seasonality in the northern and southern hemispheres, even in the tropics, leads us to examine fluxes split by hemisphere, together with the distinction of tropics and extratropics. Figure 3 shows prior and posterior fluxes at the monthly and annual time scales in the same manner as Figure 2, but split into zonal bands: Northern Extratropics (23N-90N), Northern Tropics (Equator-23N), Southern Tropics (23S-Equator), and Southern Extratropics (90S-23S).
The top row of Figure 3 depicts the results for the Northern Extratropics. The global seasonality patterns in Figure 2 are reproduced in the Northern Extratropics, with deeper sinks relative to the prior, and a growing season that is shifted earlier in the year. Interestingly, LG fluxes in this region have a weaker annual mean sink than the other two experiments, which is largely driven by enhanced outgassing in the autumn in 2016. OCO-2 land glint observations are limited to lower latitudes during the NH winter as a result of the longer path lengths than nadir at higher solar zenith angles and high latitudes, and hence there are fewer observations during this time period to constrain the LG results than the other two experiments. Additionally, retrieval biases are expected to grow with sensor and solar zenith angles (O'Dell et al., 2018b), and thus we speculate that this extra outgassing at higher latitudes is perhaps an artifact of the observations, either due to sampling or retrieval bias.
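For readers reproducing the zonal aggregation, the four bands used in Figure 3 can be expressed as a simple lookup on latitude. The sketch below is illustrative only: it assumes each grid cell is assigned to a band by its center latitude, and the cell fluxes are invented numbers.

```python
from collections import defaultdict

# (name, southern edge, northern edge) in degrees latitude.
BANDS = [
    ("Southern Extratropics", -90.0, -23.0),
    ("Southern Tropics", -23.0, 0.0),
    ("Northern Tropics", 0.0, 23.0),
    ("Northern Extratropics", 23.0, 90.0),
]


def band_of(lat):
    """Return the zonal band containing a latitude (cell-center convention)."""
    for name, south, north in BANDS:
        if south <= lat < north or (lat == 90.0 and north == 90.0):
            return name
    raise ValueError(f"latitude out of range: {lat}")


def zonal_totals(cells):
    """Sum (latitude, flux) pairs into per-band totals."""
    totals = defaultdict(float)
    for lat, flux in cells:
        totals[band_of(lat)] += flux
    return dict(totals)


# Hypothetical cell-level annual fluxes (PgC/yr).
cells = [(45.0, -1.5), (60.0, -0.5), (10.0, 0.4), (-10.0, 0.1), (-40.0, -0.2)]
totals = zonal_totals(cells)
```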
The Southern Extratropics in the bottom row of Figure 3 are characterized by very little land mass, and hence much less land retrieval data to constrain fluxes. Coupled with the fairly large uncertainty on land fluxes in this region and potential satellite bias at the larger solar zenith angles, we see an unsurprising lack of agreement for each experiment's ensemble. Given the global minimization structure of modern data assimilation systems, it is possible that the fluxes in this region represent a "residual" from matching stronger data constraints in other regions, though this is difficult to test directly. We also note the similar relative differences between the modes, between the Southern Extratropics and the Northern Extratropics, suggesting that biases between modes may drive differences at high latitudes.
The Northern and Southern Tropics are displayed in the middle two rows of Figure 3. OCO-2 observations have potential to significantly improve our understanding of the tropical carbon cycle, given their relatively frequent coverage in a region that is poorly observed by the existing in situ network. However, persistent cloudiness during the wet season and biomass burning aerosol in the dry season in the tropics can lead to both fewer observations and residual bias in those that occur in the vicinity of clouds and aerosols (Mer- [...] Figure 3, we see that the seasonal cycles resulting from the assimilation of OCO-2 data have a larger amplitude than those from the inversions in which in situ measurements were assimilated. The differences in the peak-to-trough fluxes were determined to be statistically significant for both the Northern and Southern Tropics (not shown). OCO-2 sees a source in 2016 in the Northern Tropics, [...] the ensemble mean and spread, and individual models may respond differently, though the comparison of individual models is beyond the scope of this work. The annual mean fluxes from the Northern Extratropics and the Tropics are expected to be strongly anti-correlated with one another across the ensemble, as atmospheric inversions attempt to match the annual growth rate in the global carbon sink. Houweling et al. (2015, hereafter H15) found that the surface flask network and GOSAT-constrained meridional gradients were indistinguishable above the ensemble spread, though there is a suggestion of a stronger tropical source. We found that the annual mean fluxes in the Northern Extratropics and Tropics are also of similar magnitude in the IS, LN and LG experiments when the Northern and Southern Tropics are combined, in agreement with H15. The in situ measurements used to produce the IS results are different from the data used in H15, as are the time periods being studied (2009-2010 vs. 2015-2016). Nonetheless, the flux gradient between the two regions is similar between H15 and the results in our study.

Northern Extratropical Region Flux Estimates
The posterior ensembles for the IS, LN and LG experiments exhibit similar seasonality, though different annual sinks, in the Northern Hemisphere extratropical zonal band, and so we examine the fluxes there by continent to determine whether this agreement extends to smaller regions. As is apparent in Figure 4, the different experiments agree over Europe. This contrasts with Houweling et al. (2015), who found that GOSAT retrievals called for a European sink that was much larger than that inferred from in situ measurements, though for a different year. North American fluxes show a more complex pattern, with the LN experiment evincing a larger drawdown in 2016 than 2015 that is not present in the other two experiments. Additionally, the annual flux for the LN experiment is less than that from the IS or LG experiments. This is driven by suppressed wintertime efflux for the LN experiment. Interestingly, both sets of OCO-2 retrievals suggest a peak sink that is a month earlier than the in situ measurements for both 2015 and 2016. In both Europe and North Asia, the LG experiment yields a stronger outgassing in the autumn than the other two experiments, which has the same potential explanation as for the Northern Extratropics taken as a whole that was discussed above. Interestingly, both North America and North Asia show larger sinks for 2015-2016 than is explicable by the ensemble spread present in P13, which could indicate that the sinks in these regions are growing with time, though our experiments encompass only a two year time period that is influenced by the El Niño, and further years of data are required to test this hypothesis.

Tropical Region Flux Estimates
The in situ measurements and OCO-2 land retrieval inversions give significantly different results for the two zonal bands focused on the Tropics. In order to gain further insight, we examine fluxes for six smaller regions that compose the signal for these bands to look for meridional information. These regions are subdivisions of the regions from the Transcom 3 5 project, split at the equator to avoid mixing the seasonality in the Northern and Southern Hemispheres. The results are displayed in Figure 5 and Figure 6, and demonstrate that the largest differences between the satellite-driven and in situ-driven experiments are in Tropical Africa, and that the annual fluxes for LN and LG differ most in Tropical Asia. Perhaps unsurprisingly, the flux patterns are different north and south of the equator and follow, to a 10 large extent, the phase of the mean prior, which tends towards dry season sources and wet season sinks. In Northern Tropical Africa, the difference between the in situ and satellite inversions is largely during the drier part of the year (November-March), indicating a much larger source from this region inferred from the OCO-2 retrievals than from in situ measurements. In Southern Tropical Africa, the OCO-2 experiments indicate a larger amplitude in 15 both dry and wet seasons (which anti-phased with the seasons in Northern Tropical Africa) and some indication of a shift of about a month later in the year for peak carbon efflux. The other four regions are somewhat more difficult to interpret, given the disagreement between models for any of the assimilation constraints. In particular, the different viewing modes of OCO-2 are seeing different things in Tropical South America, likely due to residual biases 20 in the observations. These differences must be interpreted in the context of the density and quality of measurements and the priors. 
There are more OCO-2 retrievals in this region relative to in situ measurements, but there are relatively fewer successful retrievals during the wet season due to the prevalence of clouds. Adjustments to the prior occur mainly during the dry season when there are more satellite measurements, although this is more true for Northern Tropical Africa; significant adjustments from the mean prior in Southern Tropical Africa occur during the wet season as well. Additionally, cloud edges could potentially bias retrievals and lead to spurious patterns in the posterior fluxes. This hypothesis is difficult to reject given the dearth of evaluation data in the tropics.
When Africa as a whole is considered, the total annual CO2 surface emissions from OCO-2 inversions are in better agreement with bottom-up estimates (e.g. Table 1 in Williams et al. (2007)) than the prior and in situ experiment flux estimates. Of further note is the similarity of flux seasonality in these regions derived from OCO-2 retrievals to land surface models employing prognostic phenology (i.e. ORCHIDEE and SiB4, which are used as prior fluxes by the CAMS and CSU models as described in Appendix A). These two factors indicate that the OCO-2 inferred fluxes may not be driven by retrieval biases.
Figure 6. As in the right column of Figure 2, but for selected terrestrial regions in the southern tropics on different continents.

Evaluation Against Independent Data
The fluxes discussed in the previous sections indicate different signals present in the OCO-2 land retrievals than from the global network of in situ measurements, particularly in the tropics. Given the scarcity of in situ measurements in these regions, particularly when compared to the number of OCO-2 soundings, this is not surprising. However, perennial cloudiness in the Tropics, as well as aerosols arising from biomass burning and dust, both reduce the number of OCO-2 soundings and potentially induce biases in the remaining data. These facts leave the question of accuracy in the posterior fluxes unanswered. In order to explore this question, we evaluate the posterior fluxes by sampling the resultant concentrations for comparison with TCCON and aircraft measurements.

TCCON
All modelers sampled their posterior concentration fields at TCCON retrieval locations and times to compare directly to the TCCON dataset as available during the full period starting January 1, 2015 and ending April 1, 2017. Not all sites have the same length of record due to latency in the release of quality controlled data. Time series of simulated and retrieved XCO2 at TCCON sites are shown in Figures C2-C4, from which the length of the available records for each site can be seen. Figure 7 depicts the overall error statistics for each model by site and data constraint. The model concentrations are sampled for each 30 minute average TCCON retrieval, as described in the experimental design, and then subtracted from the TCCON values to calculate statistics. For comparison to OCO-2 retrievals, available 10 s retrievals from OCO-2, selected using a 5 degree latitude and longitude geometric coincidence criterion, were averaged and compared to TCCON observations occurring within one hour of the overpass time, in much the same way that a coarse global transport model would be sampled for this purpose. For the LN and LG experiments in the middle and bottom rows of Figure 7, error statistics for co-located OCO-2 observations are also displayed in the first column of the panel to give a sense of the correlation between the OCO-2 retrievals and the resulting modeled concentrations at each TCCON site. Of note is the strong correlation between OCO-2 mismatches with TCCON and the posterior simulated concentration mismatches with TCCON. For example, the OCO-2 land nadir retrievals are biased high relative to most TCCON sites, in line with estimates from Chatterjee et al. (in preparation), and the LN inversion simulated concentrations show a similar high bias across models. The European TCCON sites show a consistent pattern, in which all model concentrations are biased high.
This indicates an issue with representativeness of coarse global transport models at these sites or with the accuracy of the TCCON retrievals, though no evidence for the latter has been presented in the literature. Another similarity across the results is the strong difference between residuals for the Dryden and Caltech sites, which are located very close to one another. This is due to the highly local nature of these observations and the relatively broad coincidence criteria used in the comparison. Coarse models are unable to simulate all of the variability at these sites. Caltech in particular is highly influenced by the Los Angeles basin, while Dryden, though geographically close to Caltech, is separated from the basin by mountains and thus samples the relatively clean environment outside the basin (Kort et al., 2012; Schwandner et al., 2017). The high bias at Dryden is likely due in part to models simulating conditions from inside the Los Angeles basin, and the low bias at Caltech due to models simulating some of the cleaner air north of the basin. The challenges of comparing point data to model grid cell concentrations highlight that representativeness and model resolution are key issues for using TCCON and other data sets to evaluate model results.
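The OCO-2 to TCCON coincidence matching described above can be illustrated with a minimal Python sketch. It assumes that the 5 degree criterion means a sounding lies within ±5° of the site in both latitude and longitude, and that the time window is ±1 hour around the TCCON observation. The function name `coincident_mean`, the dictionary layout of the soundings, and all numerical values are hypothetical and are not taken from the study.

```python
from datetime import datetime, timedelta


def coincident_mean(soundings, site_lat, site_lon, tccon_time,
                    box_deg=5.0, window=timedelta(hours=1)):
    """Return the mean XCO2 of soundings matching the site in space and time.

    Returns None when no sounding satisfies the (assumed) criteria.
    """
    selected = []
    for s in soundings:
        dlat = abs(s["lat"] - site_lat)
        # Wrap the longitude difference into [-180, 180) before taking its size.
        dlon = abs((s["lon"] - site_lon + 180.0) % 360.0 - 180.0)
        dt = abs(s["time"] - tccon_time)
        if dlat <= box_deg and dlon <= box_deg and dt <= window:
            selected.append(s["xco2"])
    return sum(selected) / len(selected) if selected else None


# Hypothetical site time and soundings (values are illustrative only).
site_time = datetime(2015, 7, 1, 21, 0)
soundings = [
    {"lat": 35.0, "lon": -117.0, "time": datetime(2015, 7, 1, 21, 30), "xco2": 400.0},
    {"lat": 36.0, "lon": -120.0, "time": datetime(2015, 7, 1, 20, 30), "xco2": 402.0},
    # Rejected: more than 5 degrees north of the site.
    {"lat": 45.0, "lon": -118.0, "time": datetime(2015, 7, 1, 21, 0), "xco2": 410.0},
    # Rejected: more than one hour after the TCCON observation.
    {"lat": 34.0, "lon": -118.0, "time": datetime(2015, 7, 1, 23, 30), "xco2": 390.0},
]
mean_xco2 = coincident_mean(soundings, 34.1, -118.1, site_time)
```

A box this wide can mix air from distinct regimes, which is one way the contrast between neighboring sites such as Caltech and Dryden can arise in such comparisons.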
There are four TCCON sites in the Tropics: Manaus, Ascension Island, Reunion Island, and

Surface and Aircraft In Situ Observations
The posterior concentrations were sampled at the locations and times of the surface sites shown in Figure 1  for different latitudes (along the horizontal axis) and altitudes by row. As depicted in the upper left panel of Figure 8, the IS posterior concentrations compare well with the PBL measurements; this is expected as they assimilate these data to optimize the surface fluxes. The IS experiments also exhibit the smallest bias throughout the atmosphere in the northern extratropics and above the PBL in the southern extratropics (largely represented by ORCAS data). In contrast, LN, LG and the prior all have a positive bias in the northern extratropics, indicating too much overall CO2 in that region at all three atmospheric layers. Interestingly, above the PBL in the tropics, LN has the lowest bias of the three experiments, though with the important caveat that this comparison is driven entirely by two seasons (boreal winter and spring) of ATom aircraft measurements with flights in the Atlantic and the Pacific. Thus, we cannot draw the conclusion that the enhanced outgassing in the northern tropics in the OCO-2 constrained fluxes is correct, particularly since LG posterior samples resemble the IS posterior samples more than LN in the tropics, while the LG fluxes are more in line with LN. Lastly, none of the observational constraints improves the overall simulated variability in atmospheric concentrations relative to the observations in any of the three atmospheric layers presented, at any latitude, as shown in the right column of Figure 8. This is likely due to the coarse spatial resolution of the models included in this study.
It is tempting to draw conclusions about surface fluxes from these comparisons with independent data. However, the general sparseness of these samples in space and/or time as

Discussion
We have used a suite of atmospheric inverse models to analyze the OCO-2 XCO2 retrieval data to identify CO2 flux signals that stand out above the noise of transport model error and inversion assumption differences. The OCO-2 retrievals for different viewing modes (LN, LG) were assimilated in separate experiments given the differences between the signals present in each, as detailed in Chatterjee et al. (in preparation). We have presented these flux results starting at the global scale, then moving to broad zonal results, and focusing finally on results at the continental scale; at this finest scale, we present results for the land regions only, since we do not expect the satellite data taken over land to provide a strong constraint on the ocean fluxes. The inversions point to several areas where the OCO-2 data drive robust differences from our prior flux estimates, in some cases differing from the results given by the in situ data and in other cases showing agreement.
In the northern extra-tropics, the most robust signal in the inversion results is the phase adjustment of the seasonal cycle of net ecosystem exchange on land, as well as a deeper maximum summertime drawdown relative to the prior mean fluxes. Peak carbon drawdown appeared approximately a month earlier than expected, as did the onset of net positive fluxes in the early fall. In future work, it would be useful to see how these shifts in NEE agree with the solar induced fluorescence products that are now being produced by OCO-2 (Frankenberg et al., 2014; Sun et al., 2017) and the TROPOspheric Monitoring Instrument (TROPOMI). In the southern extra-tropics, we did not find significant differences from the a priori fluxes, probably because the limited amount of land data available that far south precluded inference about the fluxes there. The OCO-2 data hint at a somewhat higher-amplitude seasonal cycle in the global ocean fluxes than we had in our priors, but the experiments of Basu et al. (2018) caution us that ocean fluxes inferred from land data only may be particularly susceptible to sampling bias, transport errors, over-reliance on prior fluxes, and the inability of coarse models to constrain land and ocean fluxes separately.
As mentioned previously, a key promise of satellite data is to provide new information relative to the global in situ network in the tropics, where the in situ data provide a minimal constraint, and that is in fact the case: the OCO-2 data imply a significantly larger seasonal cycle in the tropics than given in our prior or given by the in situ data, in terms of the land+ocean flux total. This greater seasonality is driven by the land fluxes, and most of it occurs in Africa, both north and south of the equator. The strongest of these deviations is evident in northern Africa, where annual net fluxes of carbon were positive (carbon efflux to the atmosphere) and much stronger than expected. The seasonality of fluxes in this area was also much stronger than in many of the prior land fluxes, which in our experiments arise from terrestrial ecosystem models. In particular, the positive adjustments in carbon fluxes in the November to June time frame were the driving force behind posterior adjustments to both annual fluxes and seasonal amplitude. While this topic is beyond the scope and focus of this paper, we feel obliged to discuss possible candidate processes that might contribute to what we see in North Africa. The positive flux adjustments we obtain there fall squarely within the strong local dry season, which makes stronger carbon inputs from fire an obvious possibility. However, fires are imposed within most of the modeling systems and the likelihood of fire emissions being wrong by 1 PgC or more seems slim, which implies that fires alone cannot explain the results. Liu et al. (2017) found that respiration was an important part of the anomalous efflux (relative to a La Niña period) from this region during the time period of interest, which offers a potential explanation. Northern Africa is an area with large expanses of high surface albedo and aerosols due to wind and dust sources.
Reasonable effort has been made to evaluate the potential biases in the area by running atmospheric inversions with simulated biases in areas of concern (not shown), as well as by analyzing downwind TCCON sites such as Ascension Island. With no clear indicators of bias and given the sparseness and representativeness of the available evaluation data, we cannot falsify either the IS-constrained tropical fluxes or the satellite-constrained fluxes, despite the large difference between them. Therefore, we must move forward with the hypothesis that this signal may be valid and is tied to variations in either respiration, photosynthesis, or both.
Next, we point to the observation made in Section 4.4 that the suite of inversion results for Northern Tropical Africa tends to move toward the fluxes from the SiB4 and ORCHIDEE prognostic biosphere models. An analysis of the SiB4 prior fluxes indicates very strong seasonal flux signals from C4 grasslands in the region. Grasslands have large quick-turnover carbon pools and thus it is not surprising that respiration and photosynthesis are strongly correlated seasonally. There are also strong respiration and photosynthesis fluxes in deciduous and evergreen broad-leaf plant types in this area, although the longer turnover wood pools imply that the seasonality in the NEE for this vegetation is likely driven more strongly by photosynthesis. Grasslands have historically been very difficult to model with NDVI/EVI driven diagnostic biosphere models such as CASA and thus seem a natural candidate to explain higher posterior NEE amplitude. The larger amplification in the dry season could also point to more subtle reductions in photosynthesis across forested regions not being captured by the diagnostic models, where there is often difficulty due to the saturation of vegetation indexes such as NDVI. The posterior adjustments from the models seem to imply a stronger annual source and a stronger seasonal cycle, likely implying some combination of effects from both forests and grasslands.
We also note the difficulty in constraining ocean fluxes with only LN data and in partitioning land and ocean fluxes due to inconsistencies between land nadir and ocean glint modes (Basu et al., 2018). Ocean glint retrievals in Version 7 of the data were unusable due to systematic biases discovered during this exercise. In light of this, several improvements were made in Version 8 (O'Dell et al., 2018a), and retained in Version 9 (Kiel et al., 2018), of the OCO-2 retrievals, which we hope will make the ocean glint data more informative in the next round of experiments. The continued difficulty of using data with biases between different modes (e.g. ocean glint vs land nadir) emphasizes the potential value of ancillary atmospheric tracers such as Atmospheric Potential Oxygen (APO) (Stephens et al., 1998), which could possibly be used to partition ocean and land NEE; of "online" bias correction methods, which allow the post-hoc OCO-2 bias correction to be performed in a consistent fashion within the atmospheric inversion framework; and of alternate methods of using the CO2 vertical information present in the retrievals.
Satellite retrievals have tremendous potential for constraining surface fluxes of CO2 (Rayner and O'Brien, 2001). In this study, we employ an ensemble of inversion models with different assumptions to estimate surface CO2 fluxes in 2015 and 2016, and their uncertainties. We find that OCO-2 retrievals inform fluxes that agree at global scales with those of in situ data. Furthermore, agreement is found where both satellite and in situ data are dense enough to provide sufficient constraint. The inferred fluxes differ significantly in the tropics, where the satellite retrievals suggest a much stronger seasonal cycle than the in situ measurements over most of the zone, and in particular a much stronger outgassing from the Northern Tropics, with the main differences occurring in Africa.
Ocean fluxes generally remain close to the prior in all experiments. Evaluating this new flux information is a difficult task. The TCCON retrievals suggest that the tropical outgassing in the LN experiments is too large, but this conclusion is weakened by the site dependence of the errors in these TCCON comparisons. PBL and aircraft observations lead to different conclusions, but again these are sparse and potentially do not capture the influence of fluxes from the regions in question.
Despite the difficulties in evaluating the OCO-2 derived flux estimates obtained here, the comparison to more traditional in situ-based estimates has been illuminating. The satellite results have exposed the sensitivity of the in situ results to the transport used, especially the vertical transport: spread in the in situ results is largest over tropical land regions, and the satellite results provide their most robust new insight into the global carbon cycle, especially in terms of the magnitude and timing of the seasonal cycle of flux. This process of questioning old results and testing the new results will continue as the satellite data are used in new ways. The impact of using vertical information from the satellite retrievals (instead of just the straight vertical mean given by XCO2) is a notable area of on-going research: the bias correction of the OCO-2 retrievals with respect to TCCON XCO2 should be expected to change considerably as the information from the satellites closer to the surface is emphasized more. In the future, the analysis shown here will be repeated with updated OCO-2 retrievals, and new analyses performed for a longer period that includes 2017 onward. The new Version 9 OCO-2 retrievals should have lower overall biases compared to Version 7 used for these experiments. In particular, the ocean glint retrievals should be significantly improved, due to the inclusion of aerosol dynamics that are expected to eliminate the bias in the high southern latitudes (O'Dell et al., 2018b). This will provide an exciting opportunity for constraining ocean fluxes. Additionally, an updated ACOS GOSAT product for the entire data record is due to be released in 2019, and the comparison of OCO-2 constrained fluxes with the much longer GOSAT record is critical for understanding the long term behavior of the tropical carbon cycle.

Author Contributions
The inferred fluxes are estimated in each horizontal grid point of the transport model with a temporal resolution of 8 days, separately for day-time and night-time. The state vector of the inversion system is therefore made of a succession of global maps with 9,200 grid points. Per month it gathers 73,700 variables (four day-time maps and four night-time maps). It also includes a map of the total CO2 columns at the initial time step of the inversion window in order to account for the uncertainty in the initial state of CO2.
regularly throughout the year. The gridded prior fluxes exhibit 3-hourly variations, but their inter-annual variations over land are only caused by anthropogenic emissions. Over land, the errors of the prior biosphere-atmosphere fluxes are assumed to dominate the error budget and the covariances are constrained by an analysis of mismatches with in situ flux measurements (Chevallier et al., 2006, 2012): temporal correlations on daily mean Net Ecosystem Exchange (NEE) errors decay exponentially with a length of one month, but night-time errors are assumed to be uncorrelated with daytime errors; spatial correlations decay exponentially with a length of 500 km; standard deviations are set to 0.8 times the climatological daily-varying heterotrophic respiration flux simulated by ORCHIDEE with a ceiling of 4 gC per m2 per day. Over a full year, the total 1-sigma uncertainty for the prior land fluxes amounts to about 3.0 GtC per year. The error statistics for the open ocean correspond to a global air-sea flux uncertainty of about 0.5 GtC per year and are defined as follows: temporal correlations decay exponentially with a length of one month; unlike land, daytime and night-time flux errors are fully correlated; spatial correlations follow an e-folding length of 1000 km; standard deviations are set to 0.1 gC per m2 per day. Land and ocean flux errors are not correlated.
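The spatial part of the land error statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it covers the spatial correlation and the capped standard deviation, ignores the temporal correlation and the day/night distinction, and uses a haversine great-circle distance. The function names (`land_sigma`, `great_circle_km`, `spatial_covariance`), the Earth radius value, and the respiration values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def land_sigma(resp_flux, factor=0.8, ceiling=4.0):
    """Prior standard deviation (gC per m2 per day): factor times respiration, capped."""
    return min(factor * resp_flux, ceiling)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def spatial_covariance(points, sigmas, length_km):
    """Covariance with exponentially decaying correlation: s_i * s_j * exp(-d_ij / L)."""
    n = len(points)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = great_circle_km(points[i][0], points[i][1],
                                points[j][0], points[j][1])
            cov[i][j] = sigmas[i] * sigmas[j] * math.exp(-d / length_km)
    return cov


# Two hypothetical land grid points on the equator, 500 km apart.
lon_500km = math.degrees(500.0 / EARTH_RADIUS_KM)
points = [(0.0, 0.0), (0.0, lon_500km)]
sigmas = [land_sigma(2.0), land_sigma(10.0)]  # 1.6, and 4.0 after the ceiling
cov = spatial_covariance(points, sigmas, length_km=500.0)
```

At a separation equal to the correlation length, the off-diagonal term is the product of the two standard deviations reduced by a factor of 1/e.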

A4 CSU-Schuh
We use a Bayesian technique with SiB4 as the carbon flux prior model for respiration and gross primary production (GPP). SiB4 is an integration of heterogeneous land-atmosphere fluxes, environmentally responsive prognostic phenology, dynamic carbon allocation, and cascading carbon pools from live biomass to surface litter to soil organic matter. Rather than relying on satellite data for the vegetation state, SiB4 brings together biological phenology, plant physiology, and ecosystem biogeochemistry to fully simulate the terrestrial carbon cycle, predicting consistent energy exchanges, carbon fluxes and carbon pools. To capture vegetation-specific phenology and biological processes, SiB4 uses twenty-four plant functional types (PFTs), including three specific crops (maize, soybean and winter wheat). For this work, SiB4 fluxes were provided at 1° x 1° resolution. Each 1° x 1° box could consist of up to 24 PFTs, responding in a joint way to the atmosphere. Thus there is no effective "round off" error from using a single dominant PFT or biome on a coarse land surface grid.
We use a conceptually simple inversion framework with the goal of providing optimized CO2 fluxes for plant functional types (PFTs) on continental scales. In particular, for each of 25 possible PFTs, and each of 11 Transcom land regions, we solve for β, the amplitudes of seven Fourier harmonics. This framework optimizes the seven coefficients for each of up to 25 PFTs for each of 11 Transcom Regions for GPP and respiration (separately) for a total of up to 7*25*11*2 = 3696 parameters. To illustrate this, two trivial univariate examples are presented for GPP in the Missouri Ozarks Ameriflux site and total respiration in the Howland Forest Ameriflux site in Maine. Ocean regions are divided into 30 regions according to Jacobson et al. (2007) and solved for in a similar fashion to land but with only 2 harmonics.
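As a rough illustration of the harmonic parameterization, the Python sketch below assumes that the seven coefficients per PFT and region consist of one mean term plus cosine and sine amplitudes for three annual harmonics. That reading, the function names (`harmonic_basis`, `seasonal_correction`, `count_parameters`), and the example coefficients are assumptions for illustration; how the resulting correction is applied to the SiB4 fluxes is not specified here and is not modeled. The sketch also evaluates the size of the land state vector for 24 and for 25 PFTs, which gives 3696 and 3850 parameters respectively.

```python
import math


def harmonic_basis(t_frac, n_harmonics=3):
    """Basis values at fraction-of-year t_frac: [1, cos, sin, cos, sin, ...]."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        row.append(math.cos(2.0 * math.pi * k * t_frac))
        row.append(math.sin(2.0 * math.pi * k * t_frac))
    return row


def seasonal_correction(beta, t_frac):
    """Evaluate the harmonic series with coefficients beta at time t_frac."""
    n_harmonics = (len(beta) - 1) // 2
    basis = harmonic_basis(t_frac, n_harmonics)
    return sum(b * f for b, f in zip(beta, basis))


def count_parameters(n_coeff, n_pft, n_regions, n_flux_types):
    """Number of unknowns: coefficients x PFTs x regions x flux types."""
    return n_coeff * n_pft * n_regions * n_flux_types


n_coeff = len(harmonic_basis(0.0))            # mean + 3 cos/sin pairs
beta = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]    # hypothetical coefficients
value_at_new_year = seasonal_correction(beta, 0.0)   # 1.0 + 0.5 * cos(0)
value_at_mid_year = seasonal_correction(beta, 0.5)   # 1.0 + 0.5 * cos(pi)
total_24_pfts = count_parameters(n_coeff, 24, 11, 2)
total_25_pfts = count_parameters(n_coeff, 25, 11, 2)
```

With three harmonics the shortest resolved period is one third of a year, so corrections on time scales much shorter than a few months cannot be represented by this basis.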
In practice, each of the stochastically fixed coefficients to the betas are run through the highest frequency flux signals to be expected. With three harmonics, we expect to be able to recover seasonal corrections on time scales down to about 2 months. Each pulse provides a vector of sensitivities of the observations to that particular pulse. We then concatenate these vectors into a large Jacobian (sensitivity matrix) and solve for the regression coefficients

A5 CT-NRT
CarbonTracker Near-Real Time (CT-NRT) is an extension of the formal CarbonTracker CO2 analysis system, designed to bridge the gap between annual updates of NOAA's formal CarbonTracker product. It extends model results beyond the most recent CarbonTracker release until the end of available ERA-interim meteorology needed to drive its transport model, TM5. The release of CT-NRT used in this study, CT-NRT.v2017, was initialized in September 2014 from the CT2016 release of CarbonTracker (Peters et al., 2007, with updates documented at http://carbontracker.noaa.gov). CT-NRT uses a unique set of flux priors, derived from the optimized fluxes of CT2016. The 2001-2015 climatology of these optimized terrestrial fluxes is augmented with a statistical model of flux anomalies, also derived from CT2015 results. Ocean and wildfire prior fluxes are set to the seasonally-varying climatology of optimized CT2016 fluxes without interannual variability. This prior not only has a long-term mean terrestrial sink, but also attempts to represent interannual variability in land CO2 flux due to anomalies of temperature, precipitation, and solar insolation. This prior was developed to mitigate the smaller number of in situ CO2 measurements available for assimilation in near-real time, as it is presumably less biased than the standard CarbonTracker prior with its small land sink.

A6 TM54DVAR-NOAA
The TM5 4DVAR system is a Bayesian inverse modeling framework that infers surface fluxes of a tracer given measured tracer mole fractions in the atmosphere (Meirink et al., 2008 measurements with surface fluxes (Krol et al., 2005). TM5 and its adjoint are used for a variational estimate of surface fluxes. For this work, we ran TM5 globally at 3° lon x 2° lat with 25 vertical layers. We used TM5 4DVAR to solve for terrestrial and oceanic CO2 fluxes, with fixed fossil fuel fluxes described elsewhere in this manuscript. Prior oceanic fluxes were constructed from a climatological average of CT2015 oceanic flux estimates. Terrestrial CO2 fluxes (the sum of net ecosystem exchange and fire fluxes) were taken from SiB CASA GFED 4 (van der Velde et al., 2013). The uncertainty on the terrestrial fluxes was fixed to be 0.5 x heterotrophic respiration from SiB CASA, while the uncertainty on oceanic fluxes was fixed at 1.57 times the absolute flux at each grid cell and time step. The uncertainty of the prior flux is assumed to have exponential spatio-temporal correlation, with length and time scales of 1000 km and 3 weeks for the oceanic component and 250 km and 1 week for the terrestrial component. OCO-2 retrievals assimilated are described elsewhere in this document, while the in situ CO2 measurements assimilated were identical to the set used by CT-NRT.
The OU results utilize the same model and data assimilation framework as the TM54DVAR-NOAA group, but with different inputs. The OU experiments utilize the CT-NRT unoptimized prior emissions, and uncertainties derived from different climatological fluxes. The initial conditions are provided by CarbonTracker, and the model constrains monthly 6° by 4° emissions from March 1, 2014 through April 1, 2017. The OU system uses the same prior fluxes as CT-NRT, and so provides a measure of the contribution of the data assimilation framework, prior uncertainties, and spatial resolution to posterior emissions.
Conversely, the OU experiment provides the impact of prior emissions and uncertainties and spatial resolution relative to the TM54DVAR-NOAA results.

A8 University of Edinburgh (UoE)
The UoE inversions are based on an existing EnKF (Ensemble Kalman Filter) framework (Feng et al., 2009, 2016) for inferring surface CO2 fluxes by optimally fitting model simulation with the in-situ or space-based measurements of atmospheric CO2 concentrations. We use the global 3D chemistry transport model (CTM) GEOS-Chem of version 9.02 to sim-  ; and 4) three-hourly terrestrial biosphere fluxes (CASA, ). We assume a 60% uncertainty for land monthly fluxes, and 40% for oceanic fluxes. Errors for land (ocean) prior fluxes are also assumed to be correlated with each other with a correlation length of 500 (800) km. By optimally fitting model simulation with observations, we infer monthly CO2 fluxes over 792 geographic regions (475 land regions and 317 ocean regions), compared to the 199 global regions used in our previous experiments (Feng et al., 2009).

A9 University of Toronto (UT)
UT results employ the GEOS-Chem (http://geos-chem.org) global three-dimensional chemical transport model, driven by assimilated meteorological observations from the Goddard Earth Observing System version 5 of the NASA Global Modeling Assimilation Office. The model configuration is the same as that used in Deng et al. (2016). The resolution of the model is 4° x 5°, with 47 vertical levels extending from the surface to 0.01 hPa. The assimilation is carried out using a four-dimensional variational (4D-Var) approach (Henze et al., 2007). The a priori CO2 flux inventories are the following. For biomass burning, we used monthly emissions from the Global Fire Emissions Database version 4 (http://www.globalfiredata.org/). The atmosphere-ocean flux of CO2 is based on the monthly climatology of Takahashi et al. (2009). For the biospheric flux of CO2, we use 3-hourly fluxes from the Boreal Ecosystem Productivity Simulator. As in Deng et al. (2014), it is assumed that the annual terrestrial ecosystem exchange is neutral in each grid box (Deng and Chen, 2011b). Although the temporal resolution for the terrestrial ecosystem exchange is 3 h, the optimized scaling factors are estimated with a monthly temporal resolution.
A diagonal prior error covariance matrix was used, and it is assumed (Deng et al., 2016) that the 1-sigma uncertainty is 16% of the fossil fuel emissions and 38% of the biomass burning emissions in each month and each model grid box. The uncertainty of the ocean flux is assumed to be 44%, and for both gross primary production and total ecosystem respiration we assumed an uncertainty of 22% in each 3 hour time step and in each model grid box.
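A diagonal prior error covariance of this kind can be written down directly from the fractional uncertainties quoted above. The following Python sketch is only an illustration: the category keys, the function name `diagonal_prior_variances`, and the example flux values are hypothetical, and the state is treated as a flat list of fluxes rather than as scaling factors on a model grid.

```python
# Fractional 1-sigma uncertainties quoted in the text, keyed by hypothetical labels.
FRACTIONAL_SIGMA = {
    "fossil_fuel": 0.16,
    "biomass_burning": 0.38,
    "ocean": 0.44,
    "gpp": 0.22,
    "respiration": 0.22,
}


def diagonal_prior_variances(prior_fluxes):
    """Return the diagonal of the prior error covariance.

    prior_fluxes is a list of (category, flux) pairs; each variance is
    (fractional sigma * |flux|) squared, and all off-diagonal terms are zero.
    """
    return [(FRACTIONAL_SIGMA[cat] * abs(flux)) ** 2 for cat, flux in prior_fluxes]


# Hypothetical prior fluxes in arbitrary units.
state = [("fossil_fuel", 10.0), ("ocean", -2.0), ("gpp", -5.0)]
variances = diagonal_prior_variances(state)
```

Because the matrix is diagonal, errors in different grid boxes, time steps, and flux categories are treated as independent in this setup.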
ObsPack NRT was used, but observations from 'SCT', 'STR', 'TPD', 'PUY', 'KAS', and 'SSL' were removed.
Red circles denote retrievals that were rejected by the outlier filter, SZA filter and TCCON flagging, while blue plus signs denote retrievals that passed those filters. Green diamonds denote the 30 minute averages of the accepted retrievals that were eventually used by the modelers for this study.
Figure C2. The time series of monthly mean residuals between simulated XCO2 and TCCON observed XCO2 by site and data constraint for sites in North America. Each line represents a different model. The sites are arranged from north to south by site latitude. The colors denote the prior concentrations (grey), as well as the posterior concentrations from forward runs using fluxes constrained by in situ (IS, red), land nadir (LN, green), and land glint (LG, blue) data. For the LN and LG residuals, monthly OCO-2 overpass residuals are displayed as stars over the model residuals.