The potential of OCO-2 data to reduce the uncertainties in CO2 surface fluxes over Australia using a variational assimilation scheme

This paper addresses the question of how much uncertainties in CO2 fluxes over Australia can be reduced by assimilation of total-column carbon dioxide retrievals from the Orbiting Carbon Observatory−2 (OCO-2) satellite instrument. We apply a four-dimensional variational data assimilation system, based around the Community Multiscale Air Quality (CMAQ) transport-dispersion model. We ran a series of observing system simulation experiments to estimate posterior error statistics of optimized monthly mean CO2 fluxes in Australia. Our assimilations were run with a horizontal grid resolution of 81 km 5 using OCO-2 data for 2015. We found that on average, the total Australia flux uncertainty was reduced by up to 40% using only OCO-2 nadir measurements. Using both nadir and glint satellite measurements produces uncertainty reductions up to 80%, which represents 0.55 PgC y−1 for the whole continent. Uncertainty reductions were found to be greatest in the more productive regions of Australia. The choice of the correlation structure in the prior error covariance was found to play a large role in distributing information from the observations. Overall the results suggest that flux inversions at this unusually fine 10 scale will yield useful information on the Australian carbon cycle.

(FTS Level 2-3 data products; the product version is not given in the paper, but is likely to be version 02.60 (Liang, 2019)) were larger than OCO-2. Over 2014-2016, the GOSAT mean bias was -0.62 ppm with a precision of 2.3 ppm compared to OCO-2 biases (OCO-2 Lite File Product version 7), which was 0.27 ppm with a precision of 1.56 ppm. Because a wider detection coverage and higher spatial resolution, OCO-2 realize more accurate estimates of carbon dioxide. However, and despite these differences, both satellites on-orbit have atmospheric CO 2 detection capabilities to be used in regional atmospheric inversions 5 to infer CO 2 surface fluxes.
Since 2013, several studies have used GOSAT retrievals to estimate CO 2 fluxes over the globe using inverse modelling (Basu et al., 2013;Chevallier et al., 2014;Deng et al., 2014;Maksyutov et al., 2013), while just a few have used OCO-2 data (Basu et al., 2018;Crowell et al., 2019). Most of these studies use global models with a relatively coarse spatial and temporal resolution. For instance, the set of global three-dimensional models included in Basu et al. (2018) typically 10 have horizontal resolutions in latitude-longitude grid-cells between 1 • up to 5 • . Coarse-resolution models capture large-scale transport processes but do not take full advantage of high-frequency information collected in the continental interior (Geels et al., 2004). Uncertainties related to the simulation of large-scale transport lead to poorly constrained flux estimates . Several studies (e.g., Geels et al., 2004Geels et al., , 2006Göckede et al., 2010;Broquet et al., 2011;Lauvaux et al., 2012) indicate that errors in the simulation of large-scale atmospheric transport can be reduced if the transport model is run at 15 sufficiently high resolution. Some of these studies (e.g., Broquet et al., 2011) performed a regional-scale variational inversion of the European biogenic CO 2 fluxes on a 50 km resolution. Finer resolution models have the potential to be more successful since they can offer a better representation of surface CO 2 fluxes and variability, as well as a better simulation of the processes driving high-frequency variability of transport (Schuh et al., 2010).
In this study, we present a regional-scale, four-dimensional variational flux inversion system to assimilate OCO-2 retrievals. 20 The study area here is Australia, chosen for the following three reasons. First, the current estimate of Australian CO 2 fluxes is highly uncertain, mainly due to the uncertainties in the net primary productivity (NPP) simulated by biosphere models (Haverd et al., 2013b;Trudinger et al., 2016). In general, uncertainties in these NPP estimates are mainly driven by errors in model parameters (e.g., parameters associated with the leaf maximum carboxylation rate or the amount of chlorophyll content in plants; Norton et al., 2018). Second, Australia has a sparse in-situ CO 2 monitoring network (four stations operating in 25 our study year of 2015), so the broader coverage offered by satellite data may help to constrain fluxes. Third, Australia has reasonable coverage of OCO-2 measurements due to relatively low cloud, and the presence of three Total Carbon Column Observing Network sites in the region provides good calibration/validation for the OCO-2 data in the region. This paper aims to assess the likely uncertainty reduction for CO 2 fluxes over Australia using a series of observing system simulation experiments (OSSEs) and to test our four-dimensional flux inversion scheme. The structure of this paper is as 30 follows. Section 2 describes the flux inversions system, the OSSEs and the datasets used. Section 3 presents the main results found for our ensemble of inversions, such as degree of freedom for signal, percentage of uncertainty flux reduction at grid-cell scale and uncertainty flux reduction aggregated by land cover type over Australia. Section 4 describes seven different sensitivity experiments to test the robustness and the performance of our inversion. In Section 5 we further evaluate our inversion by using real data; essentially a consistency test, this is done by comparing the posterior CO 2 concentrations with OCO-2 data for March 2015. Sections 6 and 7 discuss the sensitivity experiments and summarise our findings.

Methods and Data
The methodology to perform our OSSEs follows Chevallier et al. (2007). This randomization approach is illustrated in Fig. 1 and follows four successive steps. First, we need to specify fluxes (see Section 2.4), boundary conditions and initial conditions 5 as inputs to the forward model (see Section 2.5). These inputs define the "true" field that we attempt to recover in the inversion.
We run the Community Multiscale Air Quality (CMAQ) model forward with these inputs to generate a four-dimensional concentration field. We sample the concentration field with the OCO-2 observation operator to generate perfect observations (see Section 2.3). The perfect observations are perturbed following the observational error statistics to generate the "pseudoobservations" used in the inversion. Second, we perturb the "true" fluxes according to the prior uncertainty to generate the prior 10 fluxes. Third, we perform the Bayesian inversion (see Section 2.1), using the prior fluxes and pseudo-observations. Finally, we repeat the process of adding random noise to generate prior fluxes and pseudo-observations, and then running the flux inversion; these random realisations represent a sampling of the posterior error, taken as the difference between the posterior and true fluxes. It can be shown that this difference is a realisation of a Gaussian distribution with zero mean and covariance given by the true posterior covariance. 15 In this study the OSSEs experiments were performed only for the months of March, June, September and December 2015.
We ran an ensemble of five inversions for each month using different perturbations, generating five samples of the posterior PDF. In the following subsections we describe the main ingredients of this procedure.

Inversion Scheme
The inversion scheme for optimizing CO 2 surface fluxes over Australia involves a Bayesian four-dimensional variational assim-20 ilation system. The system is a generalised minimisation-based inverse-modelling framework, which can be applied to several potential models. We refer to it hereafter as 'py4dvar'. py4dvar finds an optimal estimate of the CO 2 surface fluxes (x a ) that fits both observations (y) and the prior fluxes (x b ) Rayner et al., 2019). Assuming Gaussian PDFs, finding this maximum a posteriori estimate is equivalent to minimising the cost function J(x) shown in Eq. 1 (Rayner et al., 2019).
The first term in Eq. 1 represents the sum of squared differences between the control variable (x) and its prior or background state (x b ). The second term measures the sum-of-squared difference between the model simulation, H(x), and observations (y) during the time window of the assimilation. The term H(x) is the function composition of an atmospheric transport operator and an observation operator. Both terms in Eq. 1 are weighted by their respective error covariance matrices (B and R), and the errors are assumed to be Gaussian and bias-free. As mentioned in the previous paragraph, the minimum of J(x) 30 is found by an iterative process rather than by an analytical expression. The minimization inside py4dvar is performed using the Limited-memory BFGS (L-BFGS-B) algorithm, as implemented in the scipy python module (Byrd et al., 1995). The minimization algorithm L-BFGS-B requires values of the cost function and its gradient, which are calculated using the CMAQ forward model and the adjoint model, as shown in the third step in Fig. 1.
The gradient of the cost function in Eq. 2 is calculated using the adjoint of the CMAQ model (version 4.5.1;Hakami et al., 2007). We can observe that in the second term in Eq. 2, the adjoint model (H(x)) is applied to the vector R −1 (H(x) − y), which is often called the "adjoint forcing", or simply the "forcing", and represents the error-weighted differences between the forward model and the observed concentrations. Applying the adjoint model to the forcing, running backward in time from t i−i to t 0 , allows us to construct the gradient of the cost function, ∇ x J(x).

Choice of Control variables
Our underlying physical variables are the monthly-averaged fluxes at the spatial resolution of CMAQ (≈81 km). We do not split fluxes by day and night, consistent with only using daytime satellite observations, which are not subject to much influence by diurnal cycles in CO 2 fluxes (e.g., Deng et al., 2014;Houweling et al., 2015). Like most previous studies (e.g., Chevallier et al., 2007;Baker et al., 2010;Basu et al., 2013;Crowell et al., 2019) we use spatially correlated prior uncertainties to account for 5 systematic errors in flux estimates. The variables exposed to the minimiser are not the fluxes themselves, but rather multipliers for the principal eigenvectors of B. We truncate the eigen-spectrum at 99% of the total variance; doing this significantly reduces the size of the control vector x (relative to if the control vector was comprised of the fluxes at each grid-cell). This requires a different number of eigenvectors for different months (Table 1). The length of the control variables for our sensitivity experiments are defined in Table 6. Similar to Chevallier et al. (2005a), and because our inversion assimilation window is short, we also include (in the state vector for the inversion) a perturbation to the initial conditions (ICONs) of the CO 2 concentration field. Because we are not interested on the analysis of this field, and in order not to significantly increase the size of the control vector, we added a scaling factor for the ICONs to our control variables x = {i 0 , e 0 , e 1 , ..., e n }, where i 0 is the factor we solve 5 for ICONs, e n is the number of eigen-vectors. The scaling factor was applied to the full three-dimensional concentration field.
Some freedom in the initial condition avoids fluxes being unduly influenced by a mismatch in the initial concentrations. We assumed 1% (≈4 ppm) uncertainties for the scaling factor.

Observations and their Uncertainties
We used OCO-2 level 2 satellite data (Lite file version 9) distributed by the National Aeronautics and Space Administration 10 (NASA) (available for download from https://oco2.gesdisc.eosdis.nasa.gov/data/s4pa/OCO2_DATA/). We used the columnaveraged dry air mole fraction of CO 2 , referred to as XCO 2 . We selected bias-corrected data, as described by Kiel et al. (2019).
We used nadir and glint soundings over land that were flagged as good quality except in some of our sensitivity experiments (described in Section 4), in which we excluded glint mode data. We computed a weighted average for all OCO-2 measurements using a two-step process similar to Crowell et al. (2019). The first step is to average all the soundings into 1-second intervals 15 and the second is to average these 1-second averages into the CMAQ vertical columns (81 km × 81 km) for each satellite pass, where the transit time over the CMAQ grid-cell is about 11 seconds. For the 1-second averaging process, the weighted averaging is defined in Eq. 3.
where w i = 1 σ 2 i is the squared reciprocal of the OCO-2 uncertainties (σ i ). To get the uncertainties of these averaged soundings, we considered 3 different forms of uncertainty calculation (similar to Crowell et al. (2019)). First if we assumed that all errors are entirely correlated in a 1-second span, we can define the uncertainties as shown in Eq. 4.
However, and because the average shown in Eq. 4 is sometimes low, we also considered the standard deviation of the XCO 2 5 measurements (here referred to as the spread, or σ r , of the OCO-2 measurements). In other words, if the spread (σ r ) of the XCO 2 measurements were higher than the XCO 2 uncertainty (σ i ), we used the spread value as shown in Eq. 5. We did this because the spread in OCO-2 measurements may reflect real differences across the field within a 1-second timespan.
Third, we also considered a baseline uncertainty (σ b ), based on an error floor ( ) over land and ocean, as shown in Eq. 6. We 10 did this because sometimes we did not have enough OCO-2 soundings to compute a realistic spread. The values for our baseline uncertainties were taken to be 0.8 and 0.5 ppm over land and ocean, respectively. Finally, and after defining the uncertainties for the 1-second averages, we choose the maximum value between σ s , σ r and σ b .
The second step was to take these 1-second averages and average them within the CMAQ vertical columns using Eq. 7.
where w j = 1 σ 2 j represents the squared reciprocal of the uncertainties average in the 1-second span (σ j ) and J is the number of those 1-second values. The average uncertainty over the CMAQ domain (Eq. 8) was similar to the procedure outlined for 1-second average in Eq. 4. However, we also added a term to represent the contribution of the model uncertainty (σ m ). We assumed that the model had a uncertainty of about 0.5 ppm. The observational error covariance matrix R was assumed to be fossil fuel emissions, fires, land and ocean fluxes (see Section 2.4 for a description of these fluxes). Our py4dvar system takes in a vector x representing perturbations to the assumed emission profile, which is set to all be zeros in the "true case", and converts it into a format accessible to CMAQ model (e.g., copying the monthly average values into the hourly resolution CMAQ model is configured to run with). These perturbations to the emissions (zero values in the "true" case) are then added to the assumed emission profile for CMAQ before the model is run to produce a four-dimensional CO 2 concentration field, as is in 5 step 2 of Fig. 1. Fourth, this modelled CO 2 concentration field is then transformed using the OCO-2 observation space. Once is transformed, we perturbed the "true observations" with Gaussian random noise to generate pseudo-observations as follows.
The first term of Eq. 9, y sim , represents the OCO-2 simulated observations using the "true" fluxes. The second term of Eq. 9 p is a vector with the same size as y sim and contains normally distributed random numbers with mean zero and variance one. 10 Scaling p by the square root of R ensures that the resulting realisation has the assumed error distribution.

Prior CO 2 fluxes and their uncertainties
As is stated in Section 2.5, the CMAQ model needs hourly emissions to run forward in time. We use the atmospheric convention that a negative flux value indicates an uptake by the surface and a positive value means a release of carbon to the atmosphere.
Our total fluxes were comprised of four datasets representing elements of the CO 2 fluxes: terrestrial biospheric exchange, fossilfuel, fires and air-sea exchange. Hourly biosphere CO 2 fluxes were calculated by combining two data sets: the Net Ecosystem 5 Exchange (NEE) at 0.5 • × 0.5 • and daily resolution and the Gross Primary Production (GPP) at 0.5 • × 0.5 • and 3-hourly resolution from the Community Atmosphere Biosphere Land Exchange (CABLE) model (Harverd, 2018).
The post-processing of 3-hourly NEE data involved four steps. First, we calculated daily GPP. Then we used daily GPP to estimate the daily Ecosystem Respiration (ER); in terms of carbon balance, the ER can be calculated as ER = GPP − NEE.
Finally, daily ER was assumed equal throughout the day and subtracted from 3-hourly GPP to obtain 3-hourly NEE. These 3-hourly NEE fluxes were interpolated to hourly resolution. Recall that for our OSSEs, only the uncertainties, not the values themselves, are used. Given that the optimization was performed to optimize monthly fluxes, the uncertainties were computed 5 with monthly resolution. We assumed that the biosphere flux uncertainties were equal to the Net Primary Production (NPP) simulated by CABLE, with a ceiling of 3 gC m −1 day −1 following Chevallier et al. (2010a).
Fossil-fuel CO 2 emissions were obtained from the Fossil Fuel Data Assimilation System (FFDAS) Asefi-Najafabady et al., 2014). For this study, we used the 2015 FFDAS dataset (Gurney, 2018). The FFDAS uncertainty estimates were created by multiplying the FFDAS emissions dataset with a factor of 0.44. This factor was calculated by linear We used biomass-burning carbon emissions, a product based on GFEDv4 and the Carnegie Ames Stanford Approach (CASA) biosphere model (Randerson et al., 1996). Within the CASA model fire carbon losses are calculated for each grid cell and month, based on fire carbon emissions based on burned area from the GFED dataset. We assumed uncertainties for GFEDv4 corresponding to 20% of the biomass burning carbon emissions.

25
Ocean CO 2 fluxes were derived from the Copernicus Atmospheric Monitoring Service (CAMS) version 15r2 (Chevallier, 2016). The CAMS dataset is a global retrieval product, with a horizontal resolution of 3.75 • in longitude and 1.875 • in latitude at 3-hourly temporal resolution. Prior ocean fluxes estimated by CAMS were based on Takahashi et al. (2009). We assumed that the error statistics were uniform 0.2 gC m −2 day −1 over ocean, as in Chevallier et al. (2010a). After defining the emission profiles and their uncertainties, we incorporated spatial correlations into our prior error covariance matrix B. We assume no temporal correlations. This differs from Chevallier et al. (2010a) who used a temporal correlation length of four weeks, though this would only introduce weak correlations among our monthly-averaged fluxes. Following (Basu et al., 2013, section 3.1.1), the spatial correlation between grid-points r 1 and r 2 was defined as: After defining B, we performed an eigen-decomposition, B = W T wW, where W is a matrix of eigen-vectors and w is a diagonal matrix of corresponding eigenvalues. Figure 4a shows the cumulative percentage variance and demonstrates that 20 eigenvectors account for about 60% of the variance in B. We truncate the eigen-spectrum to retain 99% of the overall variance.
The number required varied each month but was at most 400, compared to approximately 6,700 grid-points. The main reason for this strong truncation is the large correlation length relative to the CMAQ grid resolution. We will test and discuss this later.

5
We solve the minimization with a change of variable x b . Given that our control vector x depends on the size of the multipliers of the principal eigenvectors of B, our vector x b was reconstructed (as is given in Eq. 11). This reconstruction includes a new vector q, which is normalized by the square-root of the eigenvalues of B; this transformation involves minimization with respect to q, rather than x p .
This step (often called pre-conditioning) accelerates convergence. It also simplifies the system since, all target variables have 10 unit standard deviation. In our case, where we solve for perturbations around a background state, they also have a true value of zero. Generating our prior flux for the inversion is achieved by defining a vector of normally distributed random numbers with unit standard deviation and zero mean. The process to generate the pseudo prior is represented in Eq. 11.

15
We used the CMAQ modelling system and its adjoint (version 4.5.1; Hakami et al., 2007) to conduct numerical simulation of the atmospheric CO 2 concentration over the Australian region. The CMAQ modelling system is an Eulerian (gridded) mesoscale Chemical Transport Model (CTM), initially created for air quality studies. It has been previously used to characterise the variability of CO 2 at fine spatial and temporal scales . The choice of an older version of the CMAQ modelling system (cf. the latest version, v5.3) relates to the requirement of the model adjoint (needed to calculate the gradient of the cost function in the inversion).
Thus modelled concentrations are determined only by emissions, the atmospheric transport (horizontal and vertical advection 5 and diffusion), and initial and boundary conditions. Initial and boundary conditions were interpolated from atmospheric CO 2 concentration data from the Copernicus Atmospheric Monitoring Service (CAMS) global CO 2 atmospheric flux inversions Chevallier et al. (2010a). These data have a resolution of 3.75 • in longitude and 1.875 • in latitude with 39 vertical layers in the atmosphere; this dataset was also the basis for the oceanic fluxes used in the prior. The CMAQ chemical transport model (or CCTM) also requires 24-hourly three-dimensional emission data (recall that in our py4dvar system we solve for a 10 perturbation around these background CO 2 fluxes). Here our background CO 2 fluxes were generated by adding the four CO 2 flux fields described in Section 2.4: carbon exchange between biosphere and atmosphere, carbon exchange between ocean and atmosphere, fossil-fuel emissions, and biomass burning emissions.
The CMAQ model is an off-line model, and thus requires three-dimensional meteorological fields as inputs for the transport calculations. We simulated meteorological data using the Weather Research and Forecast model (WRF) Advance Research 15 Dynamical Core WRF-ARW (henceforth, WRF) version 3.7.1 (Skamarock et al., 2008). Details on the physics schemes used in our WRF configuration are shown in Table 2 The WRF model was run with a spin-up period of 12 hours. The initial spin-up period stabilizes the model, that is, the inconsistencies between the initial and boundary conditions diminish in this period.
The WRF modelled meteorology was nudged towards the global analysis fields above the boundary layer. The default gridnudging configuration was used; that is, nudging coefficients were assumed to be 10 −4 s −1 for wind and temperature and 10 −5 25 s −1 for moisture, as suggested by Deng and Stauffer (2006). Nudging has been widely used in mesoscale modelling as an effective and efficient method to reduce model errors (Stauffer and Seaman, 1990). It relaxes the model simulations of wind, temperature and moisture towards driving conditions, preventing model drift over a long-term integration.  (Iacono et al., 2008) Long-wave radiation Rapid Radiative Transfer Model (RRTMG) scheme (Iacono et al., 2008) Surface layer Monin-Obukhov (Monin and Obukhov, 1954) Land/water surface The NOAH land-surface model and the urban canopy model (Tewari et al., 2007) Planetary Boundary Layercs (PBL) Mellor-Yamada-Janjic scheme (Janjić, 1994))

Cumulus
The Grell-Devenyi ensemble scheme (Grell and Dévényi, 2002) The conditions and the WRF model's internal physical processes both contribute.

Observation Operator: CMAQ CO 2 simulations and OCO-2 measurements
As is seen in Eq. 1, we need to compare the CMAQ simulated CO 2 concentration with OCO-2 satellite retrievals. As outlined in Section 2.3, we averaged observations to approximate the observed XCO 2 for any CMAQ grid-cell observed by OCO-2.
To compare modelled and observed concentrations, we used the Eq. 12 (Rodgers and Connor, 2003;Connor et al., 2008)) to 10 convolve the simulated CO 2 concentration with the relevant averaging kernels, as follows: where x a is the OCO-2 a priori, h is a vector of pressure weights, h j is the mass of dry air in layer j divided by the mass of dry air in the total column, a CO2 is the averaging kernel of OCO-2, x a is the OCO-2 a priori profile, and x m is the simulated profile from the CMAQ model. In our py4dvar system, the first and second terms in Eq. 12 represent an "offset term". The our OSSE simulation experiments. This is followed by an analysis of the uncertainty reduction categorized by MODIS land coverage. Finally, we present seven sensitivity experiments to determine the robustness and consistency of our inversions.

Convergence Diagnostic
One interesting diagnostic of the convergence is to compare the cost function at the end of the optimization to its expected theoretical value. In a consistent system, the theoretical value of the cost function at its minimum should be close to half 5 the number of assimilated observations, assuming all error statistics are correctly specified (Tarantola, 1987, p. 211). Table 3 shows the mean (across our five realisations) of the cost function J(x) and its gradient norm ∇ x J. For example, with 842 observations, the theoretical value should be 421. We see that the theoretical value is reached to within a few percent for all months. We see a corresponding decrease in the gradient norm by about 99%.  The number of degrees of freedom for signal (DFS) in our OSSEs is another useful diagnostic of the inversion (Rodgers, 2000, Eq. 2.46). The DFS quantifies the number of independent pieces of information that the OCO-2 measurements can provide given the prior information. In our experimental framework, we computed the DFS following (Chevallier et al., 2007, section 3.4.): 15 where x a represents our posterior estimates. Table 3 shows that on average the DFS in the prior for our four months is about 30. This value is consistent with Fig. 4a and b, which shows that only about 20 eigenvalues account for 60% of the variance in our prior error covariance matrix. The inversion cannot add much information to other components, limiting the DFS. Australia is a special case in this respect since most of the continent comprises semi-arid and arid regions. We assumed that land flux uncertainties are driven by NPP, as simulated by CABLE. Thus, the prior uncertainty will be small in arid and semi-arid 20 regions.

Spatial distribution of uncertainty reduction
The uncertainty reduction between the posterior and prior fluxes is a useful way to evaluate the potential of satellite data to constrain CO 2 fluxes. We calculated the percentage uncertainty reduction following (Chevallier et al., 2007, section 3.5.), as follows: where σ a and σ b are the posterior and prior standard deviations, respectively. Figure  mol m −2 s −2 . We also mask areas with negative uncertainty reduction. Such uncertainty increase is simply a result of the small number of realisations. We will now describe the magnitude and spatial patterns in the uncertainty reduction, and in Section (3.4) we will discuss the uncertainty reduction aggregated by land cover class. 10 In March, the largest uncertainty reductions (Fig. 5a) are located in the north of Australia. In this area, the uncertainty reduction is greater than 30%, reaching values up to 80%. We note that the regions with the largest reduction in uncertainty coincide with the locations with high prior uncertainty (Fig. 3). In June 2015 (Fig. 5b), for instance, the largest uncertainty reduction was found in the north, north-east, east and south-east of Australia, where values range between 70−80% and 60−70% respectively. Uncertainty reduction in September (Fig. 5c) are higher compared to June in the south-east of the 15 country, ranging between 70−80%. This is consistent with the fact that September is in the middle of the growing season in this part of Australia and our prior uncertainties are driven by NPP. Also, more satellite soundings are available for this region in September compared to other months. The uncertainty reduction in December (Fig. 5d) decreases in the north of Australia to 20−30%. This is likely due to the fact that relatively few OCO-2 soundings are available in that month (Fig. 2), due to increased cloud coverage during the wet season in northern Australia. This is discussed further in the next section.

Uncertainty reduction over Australia by MODIS land cover classification
To get a better understanding of the constraint on CO 2 surface fluxes provided by OCO-2, we aggregated the prior and posterior fluxes into six categories over Australia: grasses and cereal (GS), shrubs (SH), evergreen needle-leaf forest (ENF), savannah (SAV), evergreen broadleaf forest (EBF), and unvegetated land (UN). We used the MODIS Land Cover Type Product (MCD12C1) Version 6 data product. The distribution is shown in Fig. 6. After aggregating fluxes for each realisation we well our assumed prior uncertainties (we should also note that, due to computational limitation, the uncertainty reduction is based only on these five realizations).
The largest uncertainty reduction in March is over SH (81%). The large uncertainty reduction is likely due to the large number of OCO-2 soundings in this region (464 observations). The next largest uncertainty reductions are over GC (78%) and ENF forest (68%) likely due to the relatively large NPP in these regions (Fig. 3a). 10 June shows less uncertainty reduction for GS (51%) compared to March likely due to the smaller number (one third as many) of OCO-2 soundings (Fig. 2) in southern Australia. similarly, uncertainty reduction over the SH ecotype decreases. Due to the small number of realizations, however, this percentage of reduction might not be representative of this region. For this category we can see that the prior uncertainty with 5 realizations is about 0.1 PgC y −1 whereas with 100 realizations it is about 0.25 PgC y −1 . Uncertainty reduction over SAV is about 31%, similar to the percentage of reduction found in March. Even though 15 we found relatively few soundings over EBF and ENF forest in June, uncertainty reductions for these regions are 47% and 7%, respectively. The reduction over UN areas is about 39%, again demonstrating the potential of OCO-2 data to constrain fluxes.
In September the most significant uncertainty reduction was found over EBF (74%) and GC (68%) compared with all other months, associated with the peak of the growing season in much of Australia. Uncertainty reductions in these categories are much larger due to the increase of OCO-2 soundings in south-eastern Australia (see Fig. 2c). The uncertainty reduction over areas designated as SAV and ENF forest is about 53% and 30% respectively. Over areas classified as SH and UN, we see a weaker uncertainty reduction of 22% and 33%.
Similar to September, in December we found the largest uncertainty reductions over EBF (72%) in line with the structure of the uncertainties seen in southern-east of Australia in (Fig. 3c). The percentage of uncertainty reduction found over GC (77%) 5 may not represent the precise percentage for this category (given the small number of realisations used). For this category, we see that the prior uncertainties of 100 realizations is about 0.17 PgC y −1 , whereas with five realizations it is about 0.28 PgC y −1 . We would expect to have a smaller uncertainty reduction for this category due to scarcity of soundings available in the North and north-eastern Australia for this month, likely due to cloudiness associated with the wet season. Uncertainty reductions found over areas classified as SH, SAV and ENF were 56%, 62%, and 36% respectively. Different to other months, 10 the uncertainty reduction over UN are about (58%). 3.5 Uncertainty reduction in the total Australian CO 2 flux Table 4 shows the standard deviation of the total CO 2 flux uncertainty over Australia for the four months in which inversions were run. Months with the largest uncertainty reductions are found in December (80%), March (76%) and September (70%).
In contrast with these results, the smallest reduction is found in June (31%). The last of these results is not surprising, since June is the month with the smallest number of OCO-2 soundings (for this month we only find 694 observations compared to 5 September and March, with 1002 and 842 soundings, respectively).
Differences in the uncertainty reduction between months not only depend on the number of soundings and the structure of the uncertainty but also other variables (e.g. wind direction). Coastal grid-points present a problem for our inversion when the wind direction comes from the ocean because our system only assimilates data over land). Prevailing winds in this coastal zone restrict the ability of OCO-2 to constrain surface fluxes (Supplementary Figs. S1-S3).

Sensitivity Experiments
To assess the robustness and consistency of the previous results, we performed seven different sensitivity experiments (S1, S2, S3, S4, S5, S6-A, S6-B), which are summarized in Table 5. These experiments follow the same randomisation approach shown in Section 2, but with the following changes: -S1: Test the effect of reducing the correlation lengths in our prior error covariance matrix B. The correlation length was 15 changed from 500 km to 50 km over land, and from 1000 km to 100 km over the ocean. By reducing the correlation length, the number of retained eigenvectors increased from 811 (control experiment) to 4101. The shorter correlation lengths allow a larger selection of possible flux structures, requiring more eigenvalues to capture the possible variance.
-S2: Assess what percentage of uncertainty reduction of the Australian flux is affected by excluding glint land observations from our inversion. Our control cases treat land nadir and glint data as one single dataset because of the small offset 20 between them. The number of observations influences the footprint coverage, and therefore, the number of fluxes we can solve. In this particular experiment, we would expect a smaller uncertainty reduction of Australian flux because the number of observations has been reduced from 842 to 419.
-S3: Evaluate the effect of having uniform uncertainties over land and a simplified structure of B. In this case, we assumed uncertainties of 3 gC day −1 over land with correlation lengths of 5 km over land and 10 km over ocean. This change effectively transforms B into a diagonal matrix.
-S4: Test the impact of adding a bias of 3.3 ppm to the OCO-2 observations. Here, biases were calculated by taking the differences between the raw and bias-corrected XCO 2 values found in the OCO-2 retrieval product. We performed this 5 experiment because some studies (e.g., Chevallier et al., 2007) indicate that just a few tenths of a part per million bias in the observations are enough to prevent the inversions from converging on optimal fluxes.
-S5: Test the impact of introducing a mean absolute bias of 0.21 PgC y −1 to prior fluxes. In this experiment, the prior bias were created using a normal Gaussian random perturbation of the prior uncertainty. For all five realization, biases were introduced as constant component. in the same way that we solve for the surface fluxes, as they are not among the key results (i.e., BCs were treated as nuisance variable). In this case, we gave the optimizer the ability to modify the BCs while it is optimizing surface fluxes.
For this test, we assumed uniform uncertainty of 1 ppm s −1 . This is applied as an additive perturbation to temporally and spatially varying concentration boundary conditions based on the CAMS global CO 2 simulations.

Degrees of Freedom for Signal
20 Table 6 shows the number of retained eigenvalues from B and the DFS for sensitivity experiments S1, S2, S3 and control cases. Experiment S1 shows that merely reducing correlation lengths does not lead to extra information being resolved by the observations. S2 shows that, as expected, subtracting observations from our inversion resolves less information on fluxes.
Experiment S3 (in which we reduce correlation lengths but also increase the uncertainty on many grid points) demonstrates an increase in the number of components resolved by the observations. The comparison of S1 and S3 suggests it is the low 25 uncertainty rather than the smoothness imposed by the uncertainty correlations that limits the DFS.
Land nadir data is defined as (LN), and land nadir and glint data as (LNG). 4.2 Spatial distribution of uncertainty reduction over Australia Figure 8 shows the spatial distribution of the uncertainty reduction at grid-scale over Australia for sensitivity experiments S1, S2 and S3. These figures should be compared to Fig. 5a (control case). Experiment S1 shown in Fig. 8a demonstrates that the correlation length plays a significant role in the uncertainty reduction. A lower correlation length yields a lower reduction of the uncertainties. For example, the error reduction over the productive areas in northern and north-eastern Australia is between 5 0−20% compared to the control experiment's 40−80%. This implies that longer correlation length-scales allow for information to be effectively "transferred" in space, thus pooling data over a wider region and magnifying the benefit from the assimilation.
Experiment S2 (Fig. 8b) illustrates that decreasing the number of observations, also reduces the percentage of reduction per grid-cell. The uncertainty reduction (40−60%) is much weaker than the control experiment. These results complement Table 6, where the DFS decrease from 38.66 (control experiment) to 34.38 (S2).
Experiment S3 (Fig. 8c) shows how the structure and magnitude of the prior uncertainty influence uncertainty reduction.
The uncertainty reductions are distributed almost uniformly across Australia and their values range between 0−20%. Our as-5 sumption of a linear relationship between uncertainty and NPP means much of Australia has negligible impact on the prior uncertainty in the control case. This result shows the importance of that assumption. Assuming equal uncertainty across Australia may have a significant impact on the final total flux estimate, because most of the continent is largely composed of arid and semi-arid land. The small percentage of uncertainty reduction is due to the negligible correlation length assumed in the prior error covariance matrix.  4.3 Uncertainty reduction over Australia by MODIS land cover classification Fig. 9 shows the uncertainty reduction for the sensitivity cases S1, S2, and S3 aggregated by ecotype. There is good consistency between the geographical distribution (Fig. 8) and these spatial aggregates. Thus for case S1, the uncertainty reductions were found to be small compared to the results in the control experiment (Fig. 7a). For example, the sensitivity case S1 in Fig. 9a shows uncertainty reductions over GS and UN are about 30% and 1%, respectively. No uncertainty reductions are observed 5 over SH, SAV, EBF and ENF. Because of an insufficient number of realizations, for these particular categories, we found a negative error reduction. In these land-use classes, we display the posterior to be equal to the prior uncertainty.
Similarly, case S2 (Fig. 9b) displays significantly weaker uncertainty reductions for some of the six land-use classifications compared to the control experiments ( (Fig. 7a). For instance, the fractional uncertainty reductions over GC and SH reach values of about 51% and 57%, respectively. In the control experiment in (Fig. 7a) these values were 78% and 81% respectively. As 10 mentioned in the previous section, the stronger posterior reduction is due to the correlation length in the prior covariance and an increase of the OCO-2 soundings over Australia. Findings in the sensitivity case S3 (Fig. 9c) shows similar results to those found in sensitivity case S1: the smaller the correlation length, the less efficient the inversion.

Uncertainty reduction in the total Australia CO 2 flux uncertainty
Uncertainty reduction of the total Australian CO 2 flux for sensitivity experiments S1, S2 and S3 are shown in Table 7. Experiment S1 shows that the regional flux uncertainty in Australia was only reduced by (∼ 9%) compared to control case (which was 76%). In this test, we can see again the importance of the choices of the correlation length in B. We saw in Table 6 that by decreasing the spatial correlation to 5 km over land, we increase the number of principal components. Given the small 5 number of realizations and an increase in the number of components in the prior, we expect that this estimate of the uncertainty reduction may be less representative using our randomization approach.
Experiment S2 shows an uncertainty reduction over Australia from 73% compared to 76% (control case). This small shift in the percentage of reduction is related to the number of soundings found in the northern region of Australia. By removing glint land data from our observations, we are reducing the coverage of surface flux footprints. 10 Experiment S3 demonstrates the same artefact as S1, though the generally higher prior uncertainties in S3 result in a higher uncertainty reduction for the total Australian flux. In this case, the assimilation reduce the total uncertainty to 34%.

Impact of OCO-2 biases on the posterior fluxes
We mentioned in Section 4 that potential biases in the observations prevent the inversions from converging on optimal fluxes.
The results of Experiment S4 confirms that biases in the observations do indeed affect the resulting posterior fluxes. After adding biases of about 3.3 ppm our inversion produced a posterior flux, which was bias by approximately 5.0 PgC y −1 over Australia. This value indicates that in order to obtain an accuracy of 0.1 PgC y −1 in the total Australian flux, bias in the 5 observation must be reduce roughly to 0.07 ppm. This sensitivity case shows us the importance of minimising biases in the observations, if the goal is to estimate accurately CO 2 fluxes. Figure 10 illustrates the impact of the observational biases on the posterior mean fluxes in each of the 6 MODIS land-use categories. Significant biases are observed over SH (1.7 PgC y −1 ), GS (1.4 PgC y −1 ), and EBF (0.9 PgC y −1 ). For each category, the inversion system only generates positive flux biases, consistent with the direction of the bias in the observations. Our results are mainly due to large biases we prescribed in the observations. 10 Finally, we found that uncertainty of prior and posterior were 0.68 and 0.25 PgC y −1 , respectively. Given the magnitudes of the prior uncertainties (and hence biases in this case) this result is consistent with the control case.

Unbiased Prior CO 2 flux
Results of experiment S5 are illustrated in Figure 11. This figure shows the monthly mean biases (black diamonds) added to our prior true fluxes (assumed to be 0.0 PgC y −1 ) categorized by MODIS ecotype. In this Figure, we can see that after performing the inversion we can recover successfully the mean of our true fluxes (dashed grey line). On average the total biases added to our Australian prior flux was about 0.21 PgC y −1 (using a conversion factor of 2.12 PgC/ppm, this value is equivalent to 5 adding 0.1 ppm bias). After performing the inversion the posterior mean bias was reduced to 0.024 PgC. The distribution of the fluxes across the different land-use classes (centred around zero; Fig. 11) reflects the fact that biases added to our prior were randomly distributed. We added negative biases to GC and SH (-0.12, -0.05 PgC y −1 ) and positive bias to SAV, EBF (-0.05, 0.05 PgC y −1 ). We can see clearly in this figure that the inversion system is able to handle negative and positive biases.

Impact of boundary condition biases on the posterior fluxes
Unlike global flux inversions, regional flux inversions are sensitive to lateral boundary conditions (BCs). To explore how sensitive is our system to biased BCs, we ran two further sensitivity experiments (collectively termed 'S6'). In sensitivity experiment S6-A we increased the BCs by adding 0.5 ppm to each boundary grid cell. Findings of this experiment show that

Solving for the boundary condition in the inversion
Experiment S6-B was designed to see if the inversion could correct for biases in the boundary conditions given additional parameters to optimize. After solving for BCs in the inversion, the biases introduced to BCs in S6-A were corrected. We analyzed the corrections, looking at the bias of the posterior flux for each land-use category. Figure 12 shows that the decrease of biases over GC was significant. In this category biases were reduce from -0.11 to -0.019 PgC y −1 . Similar results were 5 found over SAV, EBF, ENF, where biases were also reduced. Biases over SH does not show much improvement. In this category biases decreased only from -0.30 to -0.20 PgC y −1 . After a wind-rose analysis for 10 selected locations around the coast in west Australia (Supplementary Figs. S1-S3) we found that the small reduction of the biases in this category is explained by the orientation of the wind in March. When winds come from ocean, the inversion loses the ability to correct the wrong BCs. The treatment of the bias in BCs is relatively simple, with a goal of introducing relatively few additional parameters into the control 10 vector. The experimental design assumes that these biases are constant in time and across large areas of the domain. The biases in the BCs were generated with the same framework as was used to solve for them (i.e. fully specified by eight parameters).
In reality, error in BCs will vary in both space and time. Thus, the results here are indicative and but suggest that biases (as opposed to fluctuations) at least can be accounted for in such a system. 5 Comparison between CMAQ simulations and OCO-2 observations 15 One key uncertainty in any OSSE is the realism of the observational uncertainties. One simple test involves performing a limited inversion of data and assessing whether the cost function (Eq. 1) is consistent with the number of observations. Unlike the OSSE, this is not guaranteed; in the 'real-data' inversion, there are likely errors in the atmospheric transport and the initial and boundary conditions. To test this, we performed an inversion for March 2015 using nadir and glint data. As mention in Section 2.2, we added a scaling factor for the initial condition to our target variables for test the inversion. Fig. 13 shows a histogram of residuals between the CMAQ model simulations using optimised fluxes and OCO-2 observations. We can see that the monthly mean bias was reduced from 0.49 to -0.01 ppm, with a decrease in the root mean square error (RMSE) from 1.08 to 0.89 ppm. While these are based on the same data that were assimilated and do not necessarily 5 show that the posterior fluxes are closer to the truth, it does show that our system is self-consistent. The cost function J(x a ) at its minimum is 418.52, close to half the number of observations (842).

Discussion
In this paper, we quantified the potential uncertainty reduction in monthly CO 2 fluxes when assimilating OCO-2 satellite retrievals with a regional-scale model at approximately 80 km grid-resolution. If we compare our results shown in Fig. (5) 10 against, for example, Figure 2 of Chevallier et al. (2007) we see that our grid-scale uncertainty reductions are higher than those of Chevallier et al. (2007) by almost a factor of 2, using nadir and glint data over land. In Chevallier et al. (2007) uncertainty reductions in Australia are about 30−50% over productive areas while in this study they reach 60−80%. One possible explanation for this is the lower observational uncertainty assumed in our study, averaging 0.6 ppm compared with 2 ppm assumed by Chevallier et al. (2007) before OCO-2 was launched. We can also compare our results with those for the 15 in-situ network studied by Ziehn et al. (2014). At the national scale, Ziehn et al. (2014) suggested an uncertainty reduction of 30% while we see 76% for our control case.
Our results must be interpreted with caution because, like all OSSEs, they depend strongly on assumed inputs (such as B and R), which are difficult to characterize. In particular, we have assumed that the CABLE NPP (Haverd et al., 2013a) is a good proxy for biospheric net flux uncertainty, following Chevallier et al. (2010a). Chevallier et al. (2010a) used a different model and a different domain, so these assumptions may require further testing in our model configuration and region of interest.
In future, we could compare CABLE simulations against eddy-covariance CO 2 flux measurements following Chevallier et al. 5 (2012). Characterization of the prior biospheric flux over semi-arid regions in Australia is critical to account for the interannual variability of these ecosystems (Poulter et al., 2014). Recent studies (e.g., Poulter et al., 2014) have suggested that the semi-arid regions in Australia could become an important driver of the carbon cycle in comparison with ecosystems dominated by tropical rainforests.
Sensitivity experiments S1 and S3 show that the uncertainty reduction in CO 2 surface fluxes over Australia is sensitive to a 10 combination of both magnitude and spatial distribution of the uncertainty, as well as the choice of the correlation length-scale.
We saw in case S1, for example, that by reducing the correlation length in B, we do not necessarily increase the number of degrees of freedom (DFS) in our prior compared to the control. These findings suggest that the number of DFS in our prior fluxes depends more on the spatial distribution of error variance than on the assumed correlation length-scale. These results are much clearer in experiments S3, where the distribution of the uncertainty is uniform across Australia. In this case, we see that 15 the number of DFS increases by increasing the magnitude of the uncertainty across Australia. In sensitivity case S2, we saw that by subtracting glint data, our system was able to solve for fewer DFS in the fluxes compared to the control experiment.
Sensitivity experiment S4 shows that the existence of biases in the observations has a significant impact on our posterior flux estimate. Adding biases to our simulated OCO-2 observation prevents our inversion from converging on optimal fluxes. We saw in Section 4.5 that adding biases (corresponding to an average increase of 3.3 ppm) to the observation, the posterior flux 20 is also bias by about 5.0 PgC y −1 . Our results are in agreement with Chevallier et al. (2007Chevallier et al. ( , 2010a, and shows that regional biases in column-averaged CO 2 can significantly bias our posterior fluxes. Similar results are found in experiment S6-A, which looked at biases in boundary conditions. Adding 0.5 ppm to the boundary conditions also has an impact on our posterior fluxes. Increased BCs resulted in negative bias in the posterior fluxes, and to a degree that was consistent with sensitivity case S4. These findings suggests that our regional flux inversion is sensitive to boundary conditions, therefore, in a real inversion 25 controls on boundary conditions should be included them in the state vector, in addition to the surface fluxes.
Results in sensitivity case S5 shows that biased prior fluxes satisfy the theoretical assumption in the variational optimization similar to using an unbiased prior case. We demonstrated that our system is able to handle the impact of possible biases in the CMAQ model that might contaminate the resulting posterior fluxes.
Another direction for future work would be to explore the impact of a finer temporal and horizontal on the resulting fluxes. 30 Model simulations at higher spatio-temporal resolutions have been shown to have better agreement with observations, partly on account of allowing for a better representation of the measurements. (Law et al., 2004;Peylin et al., 2005;Patra et al., 2008).
However, as we saw in Section 2.3, we found it necessary to average OCO-2 soundings before assimilating in the system. To simplify this process, the averaging process removed any 1-second soundings that spanned multiple grid-cells in the CMAQ domain. This is about 7 km in along-track distance. If we use a finer resolution than 80 km, we could remove more soundings and thus weaken our constraint.
We emphasise again that our study quantifies the uncertainty but not the realism of our posterior flux estimates. The assessment of posterior fluxes from assimilation of real data will be the subject of an upcoming paper. This requires comparison with independent concentration data or, if available, flux estimates at comparable scales.

7 Conclusion
We have performed an observing system simulation experiment for the retrieval of CO 2 fluxes over Australia using OCO-2 data and a regional-scale flux inversion system. The main findings indicate that OCO-2 nadir and glint (version 9) data can provide a moderate (≈ 30%) to significant (>70%) constraint on the Australian CO 2 flux uncertainty in 2015 (for most months studied).
We saw that these reductions at a grid-point resolution reached values of about 90%, with the largest uncertainty reductions 10 being observed over biologically productive areas. Small uncertainty reductions are found over arid and semi-arid ecosystem, where we assumed the prior uncertainties where small. These reductions only become significant when aggregating by landuse classifications (e.g. shrubs 20%-80% ). For future work, it is relevant to consider a better characterization of our prior uncertainties in this region to account for the inter-annual variability of the carbon cycle in these semi-arid regions. Sensitivity experiments show that uncertainty reductions are quite sensitive to the assumed prior correlations but less sensitive to the 15 spatial distribution of prior uncertainties. Moreover, we also saw that by excluding glint data from the assimilated observations, we reduce the coverage of the surface flux footprint, and therefore, the uncertainty reduction of the total Australian flux. It seems likely, therefore, that this combination of land and glint data can help quantify the Australian carbon cycle, provided simulations are sufficiently realistic. Finally, we showed that such OSSE experiments are useful to test the potential of the inversion to possible biases in the observation, prior and boundary conditions. Our future work will focus on the application of 20 this assimilation system to estimate CO 2 surface fluxes in Australia as a contribution to the Regional Carbon Cycle Assessment and Processes (RECCAP) project.
Code availability. The py4dvar code was written by Steven Thomas and Peter Rayner and it can be found on GitHub. The code is available upon request from the authors.
Appendix A: Convergence Diagnostic Table A1. Convergence diagnostic of the inversion system using an ensemble of five independent OSSEs for March 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

March, 2015
Realizations  Table A2. Convergence diagnostic of the inversion system using an ensemble of five independent OSSEs for June 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

June, 2015
Realizations  Table A3. Convergence diagnostic of the inversion system using an ensemble of five independent OSSEs for September 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

September, 2015
Realizations  Table A4. Convergence diagnostic of the inversion system using an ensemble of five independent OSSEs for December 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.
December, 2015   Table C1. Convergence diagnostic of sensitivity case (1) after the inversion using an ensemble of five independent OSSEs for March 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

March, 2015
Realizations  Table C2. Convergence diagnostic of sensitivity case (2) after the inversion using an ensemble of five independent OSSEs for Marc 2015 (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

March, 2015
Realizations (∇xJ0 and ∇xJ0 represents the initial cost function and its gradient at the beginning of the optimization, and ∇xJ f and ∇xJ f at the end of the optimization.

March, 2015
Realizations Appendix D: Sensitivity cases: Uncertainty reduction of the total CO 2 Australian flux classified by MODIS ecotype