The impact of improved satellite retrievals on estimates of biospheric carbon balance

The Orbiting Carbon Observatory 2 (OCO-2) is NASA’s first satellite dedicated to monitoring CO2 from space and could provide novel insight into CO2 fluxes across the globe. However, one continuing challenge is the development of a robust retrieval algorithm: an estimate of atmospheric CO2 from satellite observations of near-infrared radiation. The OCO-2 retrievals have undergone multiple updates since the satellite’s launch, and the retrieval algorithm is now on its ninth version. Some of these retrieval updates, particularly version 8, led to marked changes in the CO2 observations, changes of 0.5 ppm or more. In this study, we evaluate the extent to which current OCO-2 observations can constrain monthly CO2 sources and sinks from the biosphere, and we particularly focus on how this constraint has evolved with improvements to the OCO-2 retrieval algorithm. We find that improvements in the CO2 retrieval are having a potentially transformative effect on satellite-based estimates of the global biospheric carbon balance. The version 7 OCO-2 retrievals formed the basis of early inverse modeling studies using OCO-2 data; these observations are best equipped to constrain the biospheric carbon balance across only continental or hemispheric regions. By contrast, newer versions of the retrieval algorithm yield a far more detailed constraint, and we are able to constrain CO2 budgets for seven global biome-based regions, particularly during the Northern Hemisphere summer when biospheric CO2 uptake is greatest. Improvements to the OCO-2 observations have had the largest impact on glint-mode observations, and we also find the largest improvements in the terrestrial CO2 flux constraint when we include both nadir and glint data.

employed version 7 of the observations (e.g., Chatterjee et al., 2017; Crowell et al., 2019; Liu et al., 2017; Nassar et al., 2017), but the ACOS team has subsequently updated the observations through version 9 (at the time of writing).
The OCO-2 observations have changed markedly through this process. One of the largest changes occurred with the release of version 8 of the OCO-2 observations in September 2017 (Fig. 1). This update incorporated a multitude of changes to the quality control prescreening process, the forward spectroscopy model, the retrieval algorithm, and the bias correction (O'Dell et al., 2018b). These changes led to widespread improvements in the observations; version 8 has smaller random errors when compared to ground-based observations, a smaller bias between land nadir and land glint observations, and less bias across many northern high-latitude terrestrial regions (Wunch et al., 2017; O'Dell et al., 2018b). These improvements had a particularly large impact on glint-mode observations. For example, a correction to the averaging kernel reduced a 0.3 ppm bias in land glint data relative to land nadir (O'Dell et al., 2018b). Previously, inverse modeling studies using version 7 of the OCO-2 retrieval did not assimilate land glint and land nadir observations simultaneously due to this bias (e.g., Crowell et al., 2019).
Furthermore, version 7 glint observations had biases greater than 1 ppm across the Southern Ocean that have been remedied in version 8. These errors appeared to be due to high-altitude aerosols; the version 8 algorithm therefore includes a new aerosol layer in the upper troposphere and lower stratosphere, which removes many of these biases. Overall, the observations rated as good quality in version 8 are very different from those in version 7; 24% of the observations that were marked as high quality in version 7 have been marked as low quality in version 8, and 34% of the observations marked as high quality in version 8 were marked as low quality in version 7.
More recently, version 9 of the OCO-2 observations was released in October 2018. Improvements in version 9 of the retrieval algorithm yielded smaller changes in the observations (O'Dell et al., 2018a). In particular, this version includes a correction for small-scale biases over land due to topography. Furthermore, the ACOS team relaxed a filter that discards observations collected over dark surfaces, and this change yields more observations over tropical forests (O'Dell et al., 2018a).
In spite of these advances, there are still many opportunities for further improving the retrievals. For example, OCO-2 retrievals appear to show biases across most of the northern tropical oceans (O'Dell et al., 2018b).
These improvements to the observations should also improve the reliability or accuracy of CO2 fluxes estimated using the observations. Several studies indicate that errors in the retrieval can have a substantial impact on the strength of the CO2 flux constraint (e.g., Chevallier et al., 2007; Baker et al., 2010; Crowell et al., 2019; Miller et al., 2018). For example, Miller et al. (2018) explored the detectability of biospheric CO2 fluxes using version 7 of the OCO-2 observations. They found that OCO-2 observations can be used to identify variations in biospheric fluxes within continental or hemispheric regions but that the observations have limited ability to constrain biospheric CO2 fluxes across smaller regions. The authors constructed a series of synthetic data experiments to understand the most important factors limiting the CO2 flux constraint; they concluded that atmospheric transport errors and prior flux errors play a role, but retrieval errors are a particularly salient factor. The OCO-2 science team is also developing an ensemble of inverse modeling estimates of CO2 fluxes, and recent comparisons show results that are broadly parallel to Miller et al. (2018): inverse models provide consistent CO2 flux totals for continents or hemispheres but diverge for smaller regions (e.g., Crowell et al., 2019).
The present study is a follow-up to Miller et al. (2018). We re-examine the conclusions of that study in light of recent improvements in OCO-2 observations of CO2. We also identify opportunities for future improvements to the retrievals.

Overview
Uncertainties in biospheric fluxes are thought to be greater than in other CO2 source types (e.g., National Research Council, 2010; Huntzinger et al., 2012; Le Quéré et al., 2018), and the CO2 signal from biospheric fluxes is often larger than from other source types. Hence, we design a set of top-down experiments to examine whether we can detect variations in biospheric CO2 sources and sinks within different regions of the globe and different months of the year using OCO-2 observations. In the present study, these variations are defined as any spatial or temporal patterns in CO2 fluxes that have been gridded to the resolution of a global atmospheric model: 1° latitude by 1.25° longitude and a 3-hourly time interval.

Detecting variations in CO2 fluxes is a prerequisite for constraining CO2 budgets or flux totals; we must be able to detect variations in CO2 sources and sinks across a region if we are to constrain budgets across any region of smaller size. We begin with two large hemispheric regions and then decrease the size of those regions to create increasingly challenging tests of the
We construct this set of experiments for each of the last three versions of the OCO-2 observations and examine how the results change with the retrieval version. These experiments are identical except for the retrieval version used. Therefore, this setup provides a means to understand how improvements in the observations are improving the constraint on biospheric CO2 fluxes. We examine these questions for each month within the year 2015 to understand how these results vary by season and by region or biome.

Implementation of the top-down experiments
We design a regression framework to determine whether we can detect variations in CO2 fluxes using OCO-2 observations. This regression will try to match CO2 observations from OCO-2 using numerous atmospheric model outputs. Each model output estimates the enhancement in total column CO2 (XCO2) from fluxes in a particular region and a particular month. We generate all of these model outputs of CO2 using the Parameterized Chemistry and Transport Model (PCTM) (Kawa et al., 2004). The model setup used here has a spatial resolution of 1° latitude by 1.25° longitude, and we incorporate CO2 fluxes at a 3-hourly time resolution. The wind fields used to drive PCTM are from the Modern Era Retrospective-Analysis for Research and Applications (MERRA) product (Rienecker et al., 2011). This setup is identical to Miller et al. (2018).
We run many atmospheric model simulations using numerous different biospheric CO2 flux estimates. The regression will try to reproduce OCO-2 observations using a linear combination of these model simulations. For example, in the seven-region experiments, we use seven different geographic regions, seven biospheric CO2 flux estimates, and 16 different months (September 2014-December 2015). We discard results from the first four months as model spin-up. These combinations equate to 784 total atmospheric model outputs. We further run atmospheric model simulations using a spatially and temporally constant flux in each region and each month, and we allow the regression to use these model outputs as well. The SI and Miller et al. (2018) describe the CO2 flux estimates and regression in greater detail.
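The count of model outputs quoted above can be reproduced with a short enumeration. This is only an illustrative sketch: the region and flux-model labels are hypothetical placeholders, and the count of constant-flux outputs (one per region and month) is our reading of the text rather than a number stated in it.

```python
from itertools import product

# Hypothetical labels; the actual regions and flux estimates are described in the SI.
regions = [f"region_{i}" for i in range(1, 8)]
flux_models = [f"flux_model_{i}" for i in range(1, 8)]

# 16 months: September 2014 through December 2015.
months = [(2014, m) for m in range(9, 13)] + [(2015, m) for m in range(1, 13)]

# One atmospheric model output per (region, flux estimate, month).
model_outputs = list(product(regions, flux_models, months))
n_outputs = len(model_outputs)

# Assumed: one constant-flux output per (region, month).
n_constant_outputs = len(list(product(regions, months)))

# The first four months are discarded as spin-up, leaving calendar year 2015.
analysed_months = months[4:]

print(n_outputs, n_constant_outputs, len(analysed_months))
```

Each tuple in `model_outputs` would correspond to one column of the regression's design matrix.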
This approach provides a means to evaluate when and where current satellite observations can constrain variations in CO2 fluxes. At least some of the atmospheric model outputs that are driven by biospheric CO2 flux estimates should help reproduce the OCO-2 observations better than the model outputs that are driven by spatially and temporally constant fluxes. If so, a model with spatially and temporally variable fluxes is better able to reproduce OCO-2 observations than a model with constant fluxes.
This result would imply that OCO-2 observations can be used to detect variations in biospheric CO2 sources and sinks within a given region for a given month. By contrast, suppose that the atmospheric model outputs driven by biospheric CO2 flux estimates do not reproduce the OCO-2 observations any better than the model outputs with constant CO2 fluxes. This result would imply one of several conclusions. First, the observations may not be sensitive to fluxes from the region or month in question. This outcome may occur if the magnitude of fluxes is small in a given region or if there are no OCO-2 observations near that region. Second, errors in the atmospheric model or in the OCO-2 observations may obscure variations in XCO2 that are due to CO2 fluxes. Lastly, the biospheric CO2 flux estimates used in the atmospheric model may not be skillful and may not reflect real-world biospheric CO2 fluxes. However, in this study, we offer up seven biospheric CO2 flux estimates for each region and each month, and at least one of these estimates should correlate with real-world CO2 fluxes to a reasonable extent.
Hence, it is unlikely that this explanation would drive the results. Rather, it is more likely that the observations are not sensitive to fluxes from a given region or that errors in the model-data system are too large.
Note that anthropogenic, biomass burning, ocean, and biospheric fluxes all contribute to XCO2 observed by OCO-2, and we need to account for non-biospheric CO2 fluxes in order to isolate the signal from biospheric fluxes in the regression. We model atmospheric enhancements of XCO2 from anthropogenic emissions using EDGAR v4.
We further implement model selection to evaluate when and where current satellite observations can constrain variations in biospheric CO2 fluxes. Model selection will determine which combination of atmospheric model outputs to include in the regression based upon which best reproduces the OCO-2 observations. If this combination includes at least one biospheric CO2 flux model for a given region and season, we conclude that the observations likely can be used to constrain variations in CO2 fluxes. However, if this combination does not include any biospheric CO2 flux model for a given region and season, we conclude that the observations likely cannot be used to constrain flux variations for that region and season.
We specifically employ a form of model selection known as the Bayesian Information Criterion (BIC), an approach commonly used in regression modeling (e.g., Ramsey and Schafer, 2012, chap. 12) and more recently in atmospheric inverse modeling (e.g., Gourdji et al., 2012; Miller et al., 2013; Shiga et al., 2014; Fang et al., 2014; Fang and Michalak, 2015). To this end, we create different combinations of model outputs and use each combination in the regression. We score each combination based upon how well it reproduces the OCO-2 observations; combinations with a lower weighted sum of squares error receive a better score. Each combination is also scored based upon the total number of model outputs in that combination. Specifically, combinations with a greater number of model outputs receive a larger penalty for complexity, and this penalty prevents combinations that overfit the data from receiving an anomalously good score. The best combination of atmospheric model outputs
(Fig. 3e). In other words, at least one biosphere flux model is found to explain a sufficiently large fraction of the observed variability in XCO2 as to be selected via the BIC model selection procedure for the tropical regions for most months. This result indicates that spatiotemporal variability in CO2 fluxes from within each of these regions is preserved in the OCO-2 observations. This represents a marked improvement over results when using observations from version 7 of the OCO-2 retrieval algorithm (Figs. 3b and 3e, Miller et al., 2018). The results using the newer versions 8 and 9 also show substantial improvements in other regions, including dryland and dry monsoon regions, temperate regions, and high-latitude regions (Figs. 3e and 3h).
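As an illustration of the BIC scoring used in these experiments (a goodness-of-fit term plus a penalty that grows with the number of model outputs), the following minimal sketch selects regressors on synthetic data. It rests on several assumptions that are not taken from this study: it uses the textbook BIC for independent Gaussian errors with an unweighted sum of squares, whereas the study uses a weighted sum of squares; it searches all subsets exhaustively, which is feasible only for a handful of candidates; and the function names and synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def bic(y, X):
    """Textbook BIC for an ordinary least-squares fit: n*ln(RSS/n) + p*ln(n)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)


def select_columns(y, X_always, X_candidates):
    """Return the subset of candidate columns with the lowest BIC.

    X_always holds columns that are always included (e.g. constant-flux outputs);
    X_candidates holds optional columns (e.g. biospheric flux model outputs).
    """
    n_cand = X_candidates.shape[1]
    best_score, best_subset = np.inf, ()
    for k in range(n_cand + 1):
        for subset in combinations(range(n_cand), k):
            cols = np.array(subset, dtype=int)
            X = np.hstack([X_always, X_candidates[:, cols]])
            score = bic(y, X)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score


# Synthetic demonstration: only candidate column 0 carries real signal.
rng = np.random.default_rng(0)
n_obs = 500
constant = np.ones((n_obs, 1))
candidates = rng.normal(size=(n_obs, 3))
y = 2.0 * constant[:, 0] + 1.5 * candidates[:, 0] + rng.normal(scale=0.5, size=n_obs)

chosen, score = select_columns(y, constant, candidates)
print("selected candidate columns:", chosen)
```

In the terms of this study, a region and month would count as "detected" if at least one biospheric flux column for that region and month appears in the selected subset.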
The seven-region model selection experiments are an even more challenging test of current observations. These experiments examine whether we can detect spatiotemporal variations in biospheric fluxes across seven broad, aggregated global biomes.
These experiments produce much better results using versions 8 and 9 of the observations. Specifically, biospheric flux models are selected across tropical and subtropical biomes for at least one month of every season. The same is true across all temperate and high-latitude biomes for a minimum of one month during Northern Hemisphere summer.

These improvements appear greatest across tropical biomes. There is a consistent flux signal from many tropical regions throughout the year, and hence we are able to detect variations in fluxes from tropical regions across different seasons using versions 8 and 9 of the observations. By contrast, the atmospheric signal due to biospheric CO2 fluxes in northern mid- and high-latitudes has the largest absolute magnitude during Northern Hemisphere summer. As a result, we see a large improvement in the flux constraint in mid-latitudes in Northern Hemisphere summer but not at other times of year when the absolute magnitude of CO2 fluxes is smaller. Furthermore, there are far fewer land nadir and land glint observations in northern mid- and high-latitudes in Northern Hemisphere winter relative to summer.
One notable feature of all model selection experiments is the result for dryland and dry monsoon regions (Fig. 2c). At first glance, it may appear surprising that biospheric flux models are selected for so many months in this region, given that some parts of this region are very dry and presumably have small CO2 fluxes. However, several semiarid regions within this classification have a very distinct monsoon that can bring over 500 mm of precipitation per month (e.g., northeastern Brazil, western India, and Pakistan). As a result, there is a large spatial contrast in CO2 fluxes across these regions during Northern Hemisphere spring and summer: large CO2 uptake in places with a spring and summer monsoon and little to no flux in places like the Sahara or the Arabian Peninsula.
Note that the results using version 9 of the observations are not very different from those using version 8. The change in the observations between versions 8 and 9 is only incremental (e.g., Fig. 1b). Version 9 has a lower quality control threshold for surfaces with low albedo, resulting in more observations across tropical rainforests (O'Dell et al., 2018a), and this version includes a topography correction that mostly manifests at small spatial scales. The latter change could be very important for studies that estimate point sources or urban emissions using OCO-2. However, these changes are unlikely to make a large difference in this study given both the large size of the regions examined and the 1° × 1° spatial resolution of the atmospheric model simulations. The SI includes a detailed discussion of the subtle differences between the model selection results using versions 8 and 9 of the observations.

Drivers of the results
Numerous factors affect the accuracy of CO2 fluxes estimated from satellite data. These factors include the accuracy and precision of the observations, the atmospheric transport model, and the prior flux estimate used in the inverse model. Improvements in any of these inverse modeling inputs could improve the constraint on biospheric CO2 fluxes. We find that recent improvements to the retrieval are having a particularly large impact on the strength of the CO2 flux constraint. Furthermore, these improvements are not restricted to a single satellite like OCO-2. Rather, the ACOS retrievals and bias correction (O'Dell et al., 2012, 2018b) will be directly applicable to other NASA carbon monitoring missions, including the recently launched OCO-3 mission (Eldering et al., 2019) and the planned GeoCarb mission (Polonsky et al., 2014). These improvements to the retrieval algorithm have had an effect on both glint and nadir observations from OCO-2 collected in almost every region of the globe. The sheer number of different changes makes it challenging to pinpoint exactly which have had the largest impact on the CO2 flux constraint; there have been numerous updates to the quality control prescreening, the forward spectroscopy model, the retrieval algorithm, and the bias correction. Furthermore, these updates have had multiple effects on the reported CO2 observations, reducing white noise, reducing bias, and changing which observations do or do not pass quality control. O'Dell et al. (2018b) describe these changes in much greater detail.
With that said, a few of these improvements appear to have a particularly salient impact on the results of this study. For example, the largest improvements have generally been to the glint-mode observations. A 0.2 to 0.3 ppm bias between land nadir and land glint observations in version 7 has been remedied in version 8, and version 8 glint observations show smaller biases across many ocean regions. Furthermore, version 8 exhibits less random noise in all types of observations, but that noise reduction is largest in glint observations, both over land and over the oceans (O'Dell et al., 2018b).
Indeed, we also see the largest improvement in the flux experiments conducted in this study when we include glint-mode observations. Figure 4 displays the results of the model selection experiments when the glint data are excluded. The figure shows results using versions 7, 8, and 9 of the observations. The improvement between versions 7 and 8 is much smaller when the glint observations are excluded than when they are included (Fig. 3). Even in terrestrial regions, these glint observations may play a key role in the overall flux constraint. For example, the absolute numbers of nadir and glint observations over land are roughly equal; there are 4.3 × 10^6 land nadir observations with a positive quality control flag for 2015 and 4.3 × 10^6 land glint observations during the same time period.
Note that this study focuses on detecting variations in CO2 fluxes from terrestrial regions in individual months. To that end, certain types of flux estimation problems are beyond the scope of the current study. For example, there is strong evidence that OCO-2 observations are still biased across northern tropical oceans, and reductions in these biases could improve ocean flux estimates derived from OCO-2 (Baker, 2018; O'Dell et al., 2018b). Furthermore, there is always a possibility that the observations have a bias that is correlated across regions larger than those examined in this study. For example, the observations show a small, time-dependent drift from one year to another (O'Dell et al., 2018b). The approach used in this study would be unlikely to detect the impact of those biases.

Conclusions
CO2 observations from the OCO-2 satellite have changed enormously with recent improvements to the retrieval algorithm.
New observations are more self-consistent (e.g., better agreement between glint and nadir data) and compare better against ground-based observations. In some regions, these changes are comparable in magnitude to the atmospheric CO2 enhancement due to biospheric CO2 sources and sinks.
In this study, we specifically examine how these changes to the retrieval algorithm have improved the constraint on biospheric CO2 fluxes, and we find that the improvement is large. Using observations based on version 7 of the retrieval algorithm, we find that biospheric fluxes can only be constrained across continental or hemisphere-size regions, as these observations can rarely be used to detect or constrain variations in CO2 fluxes across smaller regions. By contrast, we find a step-change improvement in the biospheric CO2 flux constraint using updated versions of the OCO-2 observations, based on versions 8 and 9 of the retrieval algorithm. Specifically, these improvements make it possible to detect variations in CO2 fluxes within seven global biome-based regions during many seasons of the year. This improvement is particularly large when both nadir and glint data are included.
This study indicates that improvements to space-based CO2 observations are yielding large improvements in global monitoring of biospheric carbon fluxes. As new CO2 monitoring missions like OCO-3 and GeoCarb launch into orbit, these improvements will have a lasting impact on space-based monitoring of CO2.
8 was a much larger update to the observations than version 9. We average all of the differences between observations onto a grid to make the differences more visually apparent. The results shown here are for observations collected in 2015, the time period analyzed in this study.
In addition, this map only displays grid boxes with more than 250 total observations in 2015.
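The gridding described in this caption (averaging per-sounding differences onto a grid and showing only well-sampled boxes) could be sketched as follows. This is a hedged illustration, not the procedure used to make the figure: the 2° grid spacing, the function name, and the synthetic soundings are assumptions, and only the threshold of more than 250 observations comes from the caption.

```python
import numpy as np


def grid_mean_difference(lat, lon, diff, dlat=2.0, dlon=2.0, min_count=250):
    """Average per-sounding differences onto a lat/lon grid.

    Grid boxes with min_count or fewer soundings are set to NaN so that
    they can be left blank on a map. The grid spacing is an assumed value.
    """
    lat_edges = np.arange(-90.0, 90.0 + dlat, dlat)
    lon_edges = np.arange(-180.0, 180.0 + dlon, dlon)
    bins = [lat_edges, lon_edges]

    # Number of soundings and sum of differences in each grid box.
    counts, _, _ = np.histogram2d(lat, lon, bins=bins)
    sums, _, _ = np.histogram2d(lat, lon, bins=bins, weights=diff)

    mean = np.full(counts.shape, np.nan)
    keep = counts > min_count
    mean[keep] = sums[keep] / counts[keep]
    return mean, counts


# Synthetic soundings: one well-sampled box and one sparsely sampled box.
lat = np.concatenate([np.full(300, 10.5), np.full(100, -30.5)])
lon = np.concatenate([np.full(300, 20.5), np.full(100, 50.5)])
diff = np.concatenate([np.full(300, 0.5), np.full(100, 1.0)])  # ppm

mean, counts = grid_mean_difference(lat, lon, diff)
print(mean[50, 100], counts[50, 100])
print(mean[29, 115], counts[29, 115])
```

With real data, `diff` would hold the difference in XCO2 between two retrieval versions for soundings matched by sounding identifier.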