Greenhouse Gas Network Design Using Backward Lagrangian Particle Dispersion Modelling – Part 2: Sensitivity Analyses and South African Test Case

This is the second part of a two-part paper considering a measurement network design based on a stochastic Lagrangian particle dispersion model (LPDM) developed by Marek Uliasz, in this case for South Africa. A sensitivity analysis was performed for different specifications of the network design parameters which were applied to this South African test case. The LPDM, which can be used to derive the sensitivity matrix used in an atmospheric inversion, was run for each candidate station for the months of July (representative of the Southern Hemisphere winter) and January (summer). The network optimisation procedure was carried out under a standard set of conditions, similar to those applied to the Australian test case in Part 1, for both months and for the combined 2 months, using the incremental optimisation (IO) routine. The optimal network design setup was subtly changed, one parameter at a time, and the optimisation routine was rerun under each set of modified conditions and compared to the original optimal network design. The assessment of the similarity between network solutions showed that changing the height of the surface grid cells, including an uncertainty estimate for the ocean fluxes, or increasing the night-time observation error uncertainty did not result in any significant changes in the positioning of the stations relative to the standard design. However, changing the prior flux error covariance matrix, or increasing the spatial resolution, did. Large aggregation errors were calculated for a number of candidate measurement sites using the resolution of the standard network design. Spatial resolution of the prior fluxes should be kept as close to the resolution of the transport model as the computing system can manage, to mitigate the exclusion of sites which could potentially be beneficial to the network. Including a generic correlation structure in the prior flux error covariance matrix led to pronounced changes in the network solution.
The genetic algorithm (GA) was able to find a marginally better solution than the IO procedure, increasing uncertainty reduction by 0.3 %, but still included the most influential stations from the standard network design. In addition, the computational cost of the GA compared to IO was much higher. Overall the results suggest that a good improvement in knowledge of South African fluxes is available from a feasible atmospheric network, and that the general features of this network are invariable under several reasonable choices in a network design study.


Introduction
Mitigating climate change is one of the great challenges of our time. To further this end, it has become essential to accurately estimate the emission and uptake of CO2 around the globe. Greenhouse gases affect the absorption, scattering and emission of radiation within the atmosphere and at the Earth's surface (Enting, 2002; Denman et al., 2007). Of these gases, CO2 contributes the greatest forcing on the climate (Canadell et al., 2007). Monitoring CO2 sources and sinks is necessary for validating important components of climate models and for determining the best course of action to mitigate climate change. The method of inverse modelling can be used to estimate the magnitude of sources and sinks of CO2 at different temporal and spatial scales (Enting and Mansbridge, 1989; Rayner et al., 1999; Rödenbeck et al., 2003; Chevallier et al., 2010). This method relies on precision measurements of atmospheric CO2 concentrations to refine the prior estimates of the CO2 fluxes. Using the machinery of atmospheric inversion, an optimal network of new sites to add to the existing infrastructure for measurement of atmospheric CO2 concentrations can be derived from a list of potential sites.
Published by Copernicus Publications on behalf of the European Geosciences Union.
A. Nickless et al.: Greenhouse gas network design South Africa
Previous optimal network studies run at the global scale have highlighted southern Africa as a region associated with large uncertainty in its terrestrial CO2 fluxes, requiring further constraint by measurements (Patra and Maksyutov, 2002). Measurements over Africa are much sparser than over other continents. Only the Cape Point Global Atmosphere Watch (GAW) station has a long-term CO2 concentration record, measuring since 1992. This tower is located at Cape Point (34.35° S, 18.49° E) predominantly to record baseline measurements of well-mixed, clean air originating over the Southern Ocean. A study by Whittlestone et al. (2009) demonstrated that it would be difficult to improve estimates of terrestrial CO2 fluxes for southern Africa using the Cape Point station alone. In 2012, an atmospheric observatory was installed near the Gobabeb Training and Research Centre, on the west coast of Namibia (22.55° S, 15.03° E), which continuously measures trace gases, including CO2 (Morgan et al., 2012). To build on this rudimentary network, and to improve estimates of CO2 fluxes at least for South Africa, high-precision instruments for measuring atmospheric CO2 concentrations have been purchased, which are to be installed at sites yet to be determined across South Africa. To maximise the impact of these stations on the estimation of CO2 fluxes across South Africa, an optimal network design can be used to indicate the best placement of new stations with the aim of reducing the uncertainty of the terrestrial CO2 source and sink estimates. A reduction in the uncertainty of the estimated fluxes is only one of many considerations when determining the location of new measurement sites, but an optimal network design with this goal will provide a guide which can be included in the assessment of these new locations.
Part 1 of this paper conducted an optimal network design study for Australia aimed at augmenting its current observation network, and introduced the methodology employed in this study (Ziehn et al., 2014).
An optimal network design requires the theory of atmospheric inversions to generate the posterior error covariance matrix of the CO2 fluxes which would be estimated from a given network of stations. From this the reduction in uncertainty can be determined. The second requirement is an optimisation routine which will select between a list of potential sites (Rayner et al., 1996; Patra and Maksyutov, 2002; Rayner, 2004). Part 1 of this paper sought to reduce the uncertainty of Australian terrestrial fluxes by 50 % and began by considering the addition of new stations to the existing base network (Ziehn et al., 2014). Similarly, the Cape Point and Gobabeb stations make up a base network of CO2 monitoring stations for southern Africa. This optimal network design will add five new measurement stations to our base network to best reduce the uncertainty in flux estimates across the region, under the assumption of continuous, hourly measurements of CO2 concentrations.
The posterior flux error covariance matrix used to derive the uncertainty metric does not require any knowledge of the measured concentrations or of the prior fluxes, only of the prior error covariance matrix of the fluxes, the error covariance matrix of the observations and the sensitivity matrix, which are all determined separately. Basing the cost function of the optimisation procedure on the posterior error covariance matrix of the fluxes under a given network ensures that the uncertainty in the estimated fluxes under the final network solution is reduced. As in Part 1 (Ziehn et al., 2014), the incremental optimisation (IO) procedure was used for the standard optimal network design in this study. We used a regular grid of potential stations for the South African case study. The reason for doing so is that, unlike Australia, South Africa does not have a relatively dense network of meteorological stations suitable for atmospheric monitoring. If we were to base the network on the existing sparse network of stations, we could exclude important sites which may provide better constraint. We have therefore chosen the regular grid, and the sites selected in the optimal network can then be further investigated to determine if there is infrastructure available (such as meteorological stations, communication towers or other research facilities) which could be amenable to atmospheric measurements.
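The incremental procedure referred to above can be illustrated with a minimal greedy sketch. It assumes that the cost is the trace of the posterior flux error covariance matrix and that observation errors are uncorrelated with a single standard deviation; the function names, the toy sensitivities and the site labels are hypothetical and are not taken from Part 1 or from this study.

```python
import numpy as np

def posterior_cov(T, C_c, C_f0):
    """Posterior flux error covariance for sensitivity matrix T (obs x fluxes)."""
    gain = C_f0 @ T.T @ np.linalg.inv(T @ C_f0 @ T.T + C_c)
    return C_f0 - gain @ T @ C_f0

def incremental_optimisation(T_by_site, sigma_obs, C_f0, base, n_new):
    """Greedily add n_new sites to the base network, one station at a time."""
    network = list(base)
    candidates = [s for s in T_by_site if s not in network]
    for _ in range(n_new):
        best_site, best_cost = None, np.inf
        for site in candidates:
            # Stack the sensitivity rows of the trial network.
            T = np.vstack([T_by_site[s] for s in network + [site]])
            C_c = sigma_obs**2 * np.eye(T.shape[0])
            # Assumed cost: total posterior variance (trace).
            cost = np.trace(posterior_cov(T, C_c, C_f0))
            if cost < best_cost:
                best_site, best_cost = site, cost
        network.append(best_site)
        candidates.remove(best_site)
    return network

# Toy problem: four flux regions, each hypothetical site sees exactly one region.
T_by_site = {name: np.eye(4)[[k]] for k, name in enumerate("ABCD")}
C_f0 = np.diag([1.0, 9.0, 4.0, 0.25])   # hypothetical prior variances
network = incremental_optimisation(T_by_site, 1.0, C_f0, base=["A"], n_new=2)
```

In this toy case the procedure first adds the site that observes the region with the largest prior variance, then the next largest, which is the behaviour expected of a greedy search on a problem with no overlap between station footprints.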
As well as providing this first-time optimal network design focusing on CO2 flux estimation over South Africa, the paper presents a sensitivity analysis of several parameters needed in the optimisation routine. For the standard case we used parametrisations which were most commonly implemented in the literature. We then considered alternatives and determined their impact on the network. This analysis is important because, as shown by Rayner et al. (1996), certain changes to the optimisation problem (such as changing the quantity to be optimised, even if very similar in nature to the original) can result in drastically different placement of stations. This would ultimately impact on the implemented network design used in deployment of the new stations. By having alternative network solutions based on parametrisation changes, we can assess how important certain stations are, since these should remain constant from one network solution to the next despite these changes. The comparison also provides insight into which parameters are likely to be important for the estimation of fluxes using the new network of measurement sites.
The inversion procedure requires a sensitivity matrix which calculates the contribution of the different sources to the CO2 concentration at a particular measurement site. We used a stochastic Lagrangian particle dispersion model (LPDM), driven by the global circulation model Conformal-Cubic Atmospheric Model (CCAM) run in stretched-grid regional mode, to determine this matrix. One set of parameters that we considered for the sensitivity analyses related to the specified dimensions of the surface grid box in which the particles provided by the LPDM are counted. This is determined by the spatial resolution of the problem. The next set of parameters we considered relates to the two error covariance matrices which are needed for the calculation of the posterior flux error covariance matrix. We changed how these matrices were parametrised and assessed the consequences for the optimal network design. Finally we implemented an alternative optimisation procedure to IO and considered the optimisation of a different metric of uncertainty in the fluxes. As the alternative optimisation procedure, we used the genetic algorithm (GA), as described by Rayner (2004), which uses a very different optimisation philosophy to the IO method.
This paper proceeds by introducing the inversion methodology, followed by an explanation of the different sensitivity tests. The results are then presented for the South African optimal network design under the standard conditions, followed by a comparison of the sensitivity tests. The conclusions provide insight into the most influential locations identified and discuss courses of action to address the optimal network design parameters highlighted in the study.

Surface flux inversion
The Bayesian synthesis inversion method, first proposed by Tarantola (1987) and used for the network design in this paper, is the most common method used for solving atmospheric inverse problems in the literature (Rayner et al., 1996; Bousquet et al., 1999; Kaminski et al., 1999; Rayner et al., 1999; Gurney et al., 2002; Peylin et al., 2002; Gurney et al., 2003; Law et al., 2003; Baker et al., 2006; Ciais et al., 2010; Enting, 2002). The regional inversion method we implemented is explained in detail in Part 1 (Ziehn et al., 2014). The observed concentration (c) at a measurement station at a given time can be expressed as the sum of different contributions from the surface fluxes ($c_s$), from the boundaries ($c_b$) and from the initial condition ($c_i$). For the purposes of the network design, the initial concentration is ignored since it is assumed that this condition is constrained by the observations. Peylin et al. (2005) found for their European regional inversion that the initial condition had an effect on the flux estimates for only a few days. In a smaller domain, this effect will be even shorter, and therefore it is assumed that the initial condition will contribute very little to the total flux uncertainty.
The linear model relating c to the contribution from the sources ($c_s$ and $c_b$) is as follows:
$$\mathbf{c}_{\mathrm{mod}} = \mathbf{T}\mathbf{f} \tag{1}$$
The vector of the modelled concentrations $\mathbf{c}_{\mathrm{mod}}$ is a result of the contribution from the sources $\mathbf{f}$, described by the transport or sensitivity matrix $\mathbf{T}$. The vector $\mathbf{f}$ can be composed of surface fluxes and boundary concentrations (Lauvaux et al., 2012). The surface fluxes for which our inversion setup would solve are the total CO2 fluxes within a pixel, which we take to be the sum of the biospheric and fossil fuel fluxes. We aim to solve for the total flux since there is not enough information to disentangle these fluxes. In this type of inversion setup, the surface fluxes can be separated into biospheric and fossil fuel fluxes after the inversion run, using additional information regarding either the fossil fuel or biospheric fluxes (Chevallier et al., 2014). The contribution from the boundaries was first assessed to determine if its influence on the observation errors was negligible, in which case the boundary conditions could be excluded from the network design process. An algorithm for assessing the contribution of the boundary concentrations to the observation error covariance matrix is developed in Sect. 2.7.
As described in Part 1, for the network design approach we are only interested in the posterior covariance matrix of the fluxes, since our aim is to obtain a network that reduces the CO2 flux uncertainties (Ziehn et al., 2014). The posterior flux error covariance matrix, $\mathbf{C}_f$, can be calculated as follows (Tarantola, 1987):
$$\mathbf{C}_f = \left(\mathbf{T}^T \mathbf{C}_c^{-1} \mathbf{T} + \mathbf{C}_{f_0}^{-1}\right)^{-1} \tag{2}$$
$$\mathbf{C}_f = \mathbf{C}_{f_0} - \mathbf{C}_{f_0} \mathbf{T}^T \left(\mathbf{T} \mathbf{C}_{f_0} \mathbf{T}^T + \mathbf{C}_c\right)^{-1} \mathbf{T} \mathbf{C}_{f_0} \tag{3}$$
where $\mathbf{C}_c$ is the covariance matrix of the observation errors, and $\mathbf{C}_{f_0}$ is the prior error covariance matrix of the surface fluxes. $\mathbf{C}_f$ is obtained without the vector of observed concentrations $\mathbf{c}$ or the vector of prior fluxes $\mathbf{f}_0$, which means that it is possible to assess the contribution that a hypothetical station can have on the reduction of the flux uncertainty, without the need to generate synthetic data.
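The standard Bayesian posterior covariance can be written in two algebraically equivalent forms, one inverting a matrix of flux dimension and one of observation dimension. The sketch below checks this numerically on a small random problem. All dimensions and values are illustrative assumptions, and the scalar `reduction` (based on the trace) is only one possible uncertainty metric, not necessarily the one used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_flux = 6, 4

T = rng.random((n_obs, n_flux))            # toy sensitivity matrix
C_c = 2.0**2 * np.eye(n_obs)               # uncorrelated 2 ppm observation errors
C_f0 = np.diag(rng.random(n_flux) + 0.5)   # toy diagonal prior flux covariance

# Form inverting a flux-sized matrix.
C_f_a = np.linalg.inv(T.T @ np.linalg.inv(C_c) @ T + np.linalg.inv(C_f0))

# Form inverting an observation-sized matrix.
C_f_b = C_f0 - C_f0 @ T.T @ np.linalg.inv(T @ C_f0 @ T.T + C_c) @ T @ C_f0

# Assumed scalar metric: fractional reduction in total standard deviation.
reduction = 1.0 - np.sqrt(np.trace(C_f_a) / np.trace(C_f0))
```

Neither form uses observed concentrations or prior flux values, which is why a hypothetical station can be evaluated from its sensitivity rows alone. Which form is cheaper depends on whether there are more fluxes or more observations.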

Lagrangian particle dispersion model (LPDM)
To determine which sources and how much of each of these sources a measurement site sees at a given moment, the sensitivity matrix T containing the influence functions is required. For a regional inversion this matrix can be directly obtained by running a Lagrangian particle dispersion model in backward mode. The particles are released from the measurement locations and travel to the surface and the boundaries (Lauvaux et al., 2008; Seibert and Frank, 2004). We used the LPDM developed by Uliasz (1994). In this mode the model simulates the release of a large number of particles from arbitrary emissions sources by tracking the motion of the particles backward in time (Uliasz, 1993, 1994). By running the model in this receptor-orientated mode, the influence functions for a given receptor are calculated, as described in Part 1 (Ziehn et al., 2014).
The LPDM is driven by the three-dimensional fields of mean winds (u, v, w), potential temperature and turbulent kinetic energy (TKE). In the case of the South African network design, these variables are produced by the CSIRO CCAM, a variable-resolution global circulation model run in regional mode. We use the regional mode so that we can resolve the atmospheric transport at a high temporal resolution, which requires that the transport model be run at a high spatial resolution as well (Sarrat et al., 2009). CCAM uses a two time-level semi-implicit semi-Lagrangian method to solve the hydrostatic primitive equations (McGregor and Dix, 2008; McGregor, 2005; McGregor and Dix, 2001). Total-variation-diminishing vertical advection is applied to solve for the advective process in the vertical. CCAM employs a comprehensive set of physical parametrisations, including the Geophysical Fluid Dynamics Laboratory (GFDL) parametrisation for long-wave and short-wave radiation (Schwarzkopf and Fels, 1991) and the liquid and ice-water scheme of Rotstayn (1997) for interactive cloud distributions. A canopy scheme is included, as described by Kowalczyk et al. (1994), having six layers for soil temperatures, six layers for soil moisture (solving the Richards equation) and three layers for snow. The cumulus convection scheme uses mass flux closure and includes both downdrafts and detrainment (McGregor, 2003).
In the simulations performed here CCAM is applied in stretched-grid mode by utilising the Schmidt (1997) transformation. A multiple-nudging strategy was followed. First, a modestly stretched grid providing 60 km resolution over southern and tropical Africa was applied following Engelbrecht et al. (2009), with subsequent downscaling to a strongly stretched grid providing 15 km resolution over southern Africa. Away from the high-resolution region over southern and tropical Africa, CCAM was provided with synoptic-scale forcing of atmospheric circulation, from the 2.5° (about 250 km) resolution National Centers for Environmental Prediction (NCEP) reanalysis data set. This forcing was provided at 6-hourly intervals for the period 1979-2010 using a scale-selective Gaussian filter (Thatcher and McGregor, 2009, 2010). CCAM was set up so that it produced output at an hourly time step and at a 0.15° spatial resolution over South Africa. The domain extended far beyond the South African border, from 10 to 40° S and from 0° W to 60° E. Meteorological inputs for the LPDM were extracted for 2 months in 2010: January for summer and July for winter. For a 4-week period during each of these months, the LPDM was run for each of the hypothetical measurement sites.
During processing of the particle count data from the LPDM, particles that were near the surface were allocated to a surface grid cell and the total count within each of these was obtained to determine the surface influence or sensitivity. These counts depended on the dimensions and position of these surface grid boxes. The particle counts were used to calculate the source-receptor (s-r) relationship, or influence functions, which form the sensitivity matrix T. Here, we followed Seibert and Frank (2004) to derive the elements of that matrix. As described in Part 1 (Ziehn et al., 2014), we modified the approach of Seibert and Frank (2004) to account for the particle counts which were produced by the LPDM as opposed to the mass concentrations which were output by the transport model in their study. The resulting s-r relationship between the measurement site and source i at time interval n, which provides the elements of the matrix T, has the following form, written here up to a constant unit-conversion factor:
$$\frac{\partial \chi}{\partial q_{in}} \propto \frac{\Delta T\, g}{\Delta P}\,\frac{N_{in}}{N_{tot}} \tag{4}$$
where $\chi$ is a volume mixing ratio (receptor) expressed in parts per million (ppm), $q_{in}$ is a mass flux density (source), $N_{in}$ the number of particles in the receptor surface grid from source grid i released at time interval n, $\Delta T$ is the length of the time interval, $\Delta P$ is the pressure difference in the surface layer, g is the gravity of Earth and $N_{tot}$ the total number of particles released during a given time interval.
For the network design we are interested in weekly fluxes of carbon separated into day- and night-time contributions, which means that we have to provide the particle count $N_{in}$ as the sum over 1 week ($\Delta T$ = 1 week, day or night). Therefore, the mass flux density $q_{in}$ in Eq. (4) has units of grams of carbon per square metre per week (gC m$^{-2}$ week$^{-1}$) for the day and similarly for the night.
For the standard network design, the surface layer height is set to 50 m, which corresponds to approximately 595 Pa ($\Delta P$). If we assume well-mixed conditions, then the s-r relationship should be independent of the thickness of the surface layer, as long as the layer is not too deep, as the particle count will be adjusted proportional to the volume of the grid box. Under stable conditions, this may not be the case (Seibert and Frank, 2004). To test if changing the surface grid box height affects the optimal network design, we have included two cases in the sensitivity analysis where the height has been adjusted to 60 m and 75 m. The optimisation routine was run under each of these specifications, holding all other choices as for the standard network design.
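As a rough plausibility check on the quoted pressure thickness, the hydrostatic approximation (pressure difference equals density times gravity times height) can be applied to a shallow layer. The sketch assumes a constant near-surface air density of 1.2 kg m^-3, which is an illustrative value and not one stated in this study. The values it gives for 60 m and 75 m are therefore indicative only.

```python
RHO_AIR = 1.2   # kg m^-3, assumed near-surface air density (not from the paper)
G = 9.81        # m s^-2, standard gravity

def layer_pressure_thickness(height_m, rho=RHO_AIR, g=G):
    """Hydrostatic pressure difference (Pa) across a shallow surface layer."""
    return rho * g * height_m

# Pressure thickness for the standard height and the two sensitivity cases.
thickness = {h: layer_pressure_thickness(h) for h in (50, 60, 75)}
```

With these assumptions the 50 m layer comes out within about 1 % of the 595 Pa quoted above. The small difference is consistent with a slightly higher air density in the model.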
As for most inversion studies, a compromise needs to be reached between the dimensions imposed on the source regions and the computational resources available (Kaminski et al., 2001; Lauvaux et al., 2012). To ensure that the computational time of each of the optimisation runs was feasible, the spatial resolution of the surface flux grid boxes was set so that the domain was divided into 50 by 25 grid boxes (a resolution of approximately 1.2° × 1.2°) for the standard optimal network design. As a sensitivity test, the resolution of the surface grid boxes was adjusted so that there were 72 by 36 grid boxes (a resolution of approximately 0.8° × 0.8°) in one case, and 100 by 50 grid boxes (a resolution of approximately 0.6° × 0.6°) in a second, much closer to the original resolution of the transport model. This change in resolution of the surface grid boxes impacts on the sensitivity matrix, increasing the number of elements in the matrix by a factor of 2 in the medium-resolution case and by a factor of 4 in the high-resolution case. It has further consequences for the prior flux covariance matrix, which needs to accommodate this change in source dimensions, increasing its element count by a factor of 4 for the medium-resolution case and a factor of 16 in the high-resolution case, requiring far more computational resources than the standard case.
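The growth factors quoted above follow from the grid dimensions alone. The short sketch below reproduces them, assuming the 60° by 30° domain given earlier (0 to 60° E, 10 to 40° S). The number of columns of the sensitivity matrix scales with the number of grid boxes, and the prior flux covariance matrix with its square. The dictionary keys are labels chosen for this illustration.

```python
LON_EXTENT, LAT_EXTENT = 60.0, 30.0   # degrees of longitude and latitude

grids = {"standard": (50, 25), "medium": (72, 36), "high": (100, 50)}
n_std = grids["standard"][0] * grids["standard"][1]

summary = {}
for name, (nx, ny) in grids.items():
    n_cells = nx * ny
    summary[name] = {
        "resolution_deg": (LON_EXTENT / nx, LAT_EXTENT / ny),
        "cells": n_cells,
        # Sensitivity matrix columns grow linearly with the number of cells.
        "T_growth": n_cells / n_std,
        # Prior covariance is (cells x cells), so it grows quadratically.
        "C_f0_growth": (n_cells / n_std) ** 2,
    }
```

The medium case gives a factor of about 2.07 for the sensitivity matrix and about 4.3 for the prior covariance matrix, which the text rounds to 2 and 4. The high-resolution case gives exactly 4 and 16.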

Observation error covariance matrix
Observation errors result in the values of $\mathbf{c}_{\mathrm{mod}}$ differing from the observed values in $\mathbf{c}$. Sources of these errors include random and systematic measurement errors, which are usually negligible at an accredited measurement station; transport model errors; and aggregation errors, which are discussed in more detail at the end of this section (Ciais et al., 2010). Baker (2000) estimated the observation error covariance matrix by comparing the monthly observation means at Mauna Loa to a smoothed line and determining the monthly standard deviations. These values ranged between 0.16 and 0.34 ppm, and so in their case a value of 1 ppm was applied for the standard deviation to each region for monthly averaged concentration values, with the assumption that most places would have a higher standard deviation than Mauna Loa. It was also assumed that the measurement sites would be independent of one another with no temporal correlation from month to month, so the matrix was assumed to be diagonal. Wu et al. (2013) fitted the standard deviation terms of the observation error covariance matrix to available data for a mesoscale inversion study and estimated values between 2.9 and 3.6 ppm for hourly concentration measurements.
Since both studies were conducted for regions in the Southern Hemisphere, where intra-station measurement variability is usually lower compared to the Northern Hemisphere, we adopted the same observation error as for the standard case in Part 1, namely 2 ppm for the hourly averaged concentrations used in this study. This value falls within the range of values reported in the literature. The dominant source of observation error represented here is from the transport model. In Part 1 (Ziehn et al., 2014), a sensitivity analysis was conducted by adjusting the error estimate of the observations based on the location of the station. Since there are far fewer existing stations in South Africa from which we can extract data to assess the potential transport model error, we used the same error for all stations. As part of the sensitivity analysis we assessed the impact of increasing the night-time observation error uncertainty to 4 ppm to account for known errors in modelling night-time atmospheric transport. In atmospheric inversions night-time observations are sometimes not considered at all, achieved by drastically increasing the night-time observation error uncertainties (Lauvaux et al., 2012).
The high-resolution test case discussed above allows the opportunity to assess the aggregation error as well. This is the error due to the degradation of the spatial resolution from the original resolution of the transport model to a lower resolution that the inversion can accommodate. When there is heterogeneity in the surface fluxes and inhomogeneous transport, averaging the surface fluxes to a coarser resolution leads to errors occurring in the modelled concentrations due to the measurement not representing the larger pixels over which the transport is modelled (Kaminski et al., 2001; Ciais et al., 2010). The aggregation errors need to be added to the observation errors, as shown by Kaminski et al. (2001) and Tarantola (2005), and must be adjusted if the resolution of the problem is changed. To determine the aggregation error in a feasible manner for each of the potential measurement sites, the 0.6° × 0.6° test case was substituted as the high-resolution case in this calculation, where the grid cells of this case fit exactly into the grid cells of the standard low-resolution case. This allowed us to follow the method outlined in Kaminski et al. (2001), who determined that the aggregation error $\mathbf{C}_{c,m}$ can be calculated as
$$\mathbf{C}_{c,m} = \mathbf{T}\,\mathbf{P}_-\,\mathbf{C}_{f_0}\,\mathbf{P}_-^T\,\mathbf{T}^T \tag{5}$$
where $\mathbf{P}_- = \mathbf{I} - \mathbf{P}_+$ and $\mathbf{P}_+$ is the projection matrix which, if multiplied with the prior flux estimates $\mathbf{f}_0$ of the high-resolution case, produces the low-resolution prior flux estimates in place of the corresponding high-resolution estimates. The solution of $\mathbf{C}_{c,m}$ was obtained for each measurement site, and as a conservative approach the maximum value of the diagonal was assigned as the aggregation error for that measurement site for the standard-resolution case. For the medium- and high-resolution test cases, since aggregation error would certainly exist but presumably get smaller as the resolution approached that of the transport model (Wu et al., 2011), the aggregation error was reduced according to the increase in number of grid cells. Therefore it was reduced by half for the medium-resolution test case, and to a quarter for the high-resolution test case.
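The projection-based aggregation error of Kaminski et al. (2001) can be sketched on a one-dimensional toy domain in which four fine cells nest into two coarse cells. The sketch assumes equal-area fine cells, so that the projection is a plain average within each coarse cell. All sensitivities and prior variances are hypothetical values chosen for illustration.

```python
import numpy as np

n_low, per_low = 2, 2            # 2 coarse cells, each holding 2 fine cells
n_high = n_low * per_low

# P_plus replaces each fine-cell value by the mean over its coarse cell
# (plain averaging assumes equal-area fine cells, an assumption of this sketch).
P_plus = np.kron(np.eye(n_low), np.full((per_low, per_low), 1.0 / per_low))
P_minus = np.eye(n_high) - P_plus

def aggregation_cov(T_high, C_f0_high):
    """Aggregation error covariance T P_- C_f0 P_-^T T^T at high resolution."""
    A = T_high @ P_minus
    return A @ C_f0_high @ A.T

C_f0_high = np.diag([1.0, 4.0, 0.25, 2.25])    # hypothetical prior variances

T_uniform = np.array([[1.0, 1.0, 2.0, 2.0]])   # same sensitivity within each coarse cell
T_varied = np.array([[3.0, 1.0, 2.0, 0.0]])    # sensitivity varies within coarse cells

# Conservative choice described in the text: largest diagonal element per site.
agg_uniform = aggregation_cov(T_uniform, C_f0_high).diagonal().max()
agg_varied = aggregation_cov(T_varied, C_f0_high).diagonal().max()
```

When the sensitivity is constant within each coarse cell the aggregation error vanishes, and it grows as transport becomes more heterogeneous below the coarse grid scale. The diagonal elements are variances, so a square root is needed to express them in concentration units.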

Prior flux error covariance matrix
The elements of the prior flux error covariance matrix need to be constructed from the best available knowledge of sources and sinks at the surface and at the boundaries. Lauvaux et al. (2008) carried out a mesoscale inversion on synthetic data, where their inversion setup included the contributions from the boundaries as part of the sources they wished to solve for. Their approach for obtaining the boundary elements of the prior flux error covariance matrix was to use modelled values of CO2 and adjust them for biases based on observed aircraft and tower data that were available for the 4-day period under assessment. For the prior error covariance matrix of the fluxes, the error was set at 2 gC m$^{-2}$ day$^{-1}$ for the surface fluxes and 4 ppm for the boundary contributions, and they assumed uncorrelated flux errors on the domain (no spatial correlation). This was further developed by Wu et al. (2013), who used available data to fit hyperparameters, which were the variance and correlation lengths of the prior flux and observation error covariance matrices. The approach of Chevallier et al. (2010) to obtain the elements of the prior flux error covariance matrix was to set the standard deviations of the fluxes proportional to the heterotrophic respiration flux of the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE) land-surface model. This is the approach adopted in the case of the South African optimal network design, where we were interested in the sensitivity of weekly fluxes, separated by day and night, to hourly concentration values.
We used a recent carbon assessment study by Scholes et al. (2013) which produced monthly maps of gross primary productivity (GPP), net primary productivity (NPP), heterotrophic respiration (Rh), autotrophic respiration (Ra) and net ecosystem productivity (NEP) for South Africa. Of these products, most confidence lay in the NPP product. Since NEP = NPP − Rh and in a balanced system NEP should be a small flux (Lambers et al., 2008), NPP was used rather than Rh. The biosphere flux uncertainties for a particular month were estimated from nearest(NPP) using a simple relationship, where nearest(NPP) represents the NPP estimated for the nearest South African grid cell. As a realistic estimate, areas outside of South Africa, which had no estimates available for NPP from the carbon assessment product, were assigned the NPP estimate from the closest South African grid cell for a particular month. In this way, pixels to the east of the continent in the Mozambican region had similar flux uncertainties prescribed to those for the northern savannah pixels within South Africa, and those on the west of the continent in Namibia had uncertainties prescribed similar to those for the semi-desert pixels in the Northern Cape Province of South Africa. This type of interpolation was carried out to avoid adding unnecessary aggregation errors at the South African terrestrial borders, which would occur if a blanket estimate for NPP outside of South Africa were used. A blanket estimate would lead to artificially large changes in the flux uncertainties between neighbouring pixels, exaggerating aggregation errors for stations near these borders, and conversely to null changes in uncertainty between non-South African terrestrial pixels. Since Ra and GPP were also available, and NPP = GPP − Ra, daytime NPP and night-time Ra were obtained by assuming that all the GPP took place during the day, and half of the Ra took place during the day and half at night. This meant that the daytime NPP values tended to be larger in magnitude than the night-time Ra values, which is what we would expect for the South African systems. Using this assumption, the monthly estimates for NPP were converted into weekly values, separately for day and night, to give the final uncertainty values used to construct the prior flux error covariance matrix. The daytime NPP and night-time Ra values used as proxies for the NEP uncertainties are plotted for July and January (Fig. 1). In South African systems much more biological activity occurs during the summer months compared to the winter months, with the consequence that the uncertainty during the summer months is considerably larger.
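One way to write out the day and night partition described above is given below. It is a reading of the stated assumptions (all GPP during the day, Ra split equally between day and night) and not a formula quoted from the study. The subscripted symbols are introduced here for illustration only.

```latex
\begin{align}
  \mathrm{NPP}_{\mathrm{day}} &= \mathrm{GPP} - \tfrac{1}{2}\,\mathrm{Ra}, &
  \mathrm{Ra}_{\mathrm{night}} &= \tfrac{1}{2}\,\mathrm{Ra}, \\
  \mathrm{NPP}_{\mathrm{day}} - \mathrm{Ra}_{\mathrm{night}} &= \mathrm{GPP} - \mathrm{Ra} = \mathrm{NPP}.
\end{align}
% Illustrative numbers (arbitrary units): GPP = 10, Ra = 4
% gives NPP_day = 8, Ra_night = 2 and NPP = 6.
```

Under this reading the daytime term exceeds the night-time term whenever GPP is larger than Ra, which matches the statement that daytime NPP values tended to be larger in magnitude than night-time Ra values.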
Since the domain of the network design includes the fossil fuel sources of South Africa, fossil fuel flux uncertainties needed to be derived as well. Previous regional inversions, where the total flux of a source pixel was solved for, had detailed inventory data available for the fossil fuel emissions, and they assumed these were perfectly known (Schuh et al., 2013). Since this information was not available for South Africa, we had to consider errors in the fossil fuel fluxes. As for Part 1 (Ziehn et al., 2014), these uncertainties were derived from the Fossil Fuel Data Assimilation System (FFDAS) (Rayner et al., 2010; Asefi-Najafabady et al., 2014). Ten realisations for the year 2010 were obtained from the FFDAS product at the original resolution of 0.1° × 0.1°. The fluxes were aggregated to our network design resolution of 1.2° × 1.2°, and then the variance was calculated for each grid cell. Since the emissions from fossil fuels are usually localised, such as those at power plant locations, the variability in the fossil fuel emissions between grid cells is quite large. But, as shown by Ziehn et al. (2014), the effect of aggregating the data smooths the fossil fuel emissions over the network design domain, and this leads to a reduction in the variability between the different realisations of the FFDAS. It also leads to the aggregation errors discussed in Sect. 2.2.
The biosphere and fossil fuel products only apply to the land surface. We assumed no correlation in the prior error covariance matrix of the fluxes. This is a necessary assumption since we have no data from which to determine the best correlation lengths. In reality, grid cells with similar biota and under similar climate will have correlated fluxes. Similarly, fluxes from the same source which occur close in time will also be correlated (Chevallier et al., 2010; Wu et al., 2013). Correlation lengths in space and time are difficult to assess but have a large impact on the estimated fluxes (Lauvaux et al., 2012). Independence is assumed, which is a more conservative approach for the optimal network design. Eventual data from the implemented network will then help to resolve the flux correlation estimates during the inversion procedure.

To determine what impact assuming positive correlation lengths in the prior flux error covariance matrix could have on the optimal network design, we used the results from Chevallier et al. (2012) and put together a simple correlation structure. Temporal correlations for the same grid cell were set to 0.5 at 1 week apart (independent for day and night), decaying to 0.3 at 2 weeks apart and 0.1 at 3 weeks apart. Grid cells adjacent to each other had a correlation of 0.3. The interactions between time and space correlations were determined by the Kronecker product of the spatial and temporal correlation matrices (e.g. two grid cells adjacent to each other but 1 week apart would have a correlation of 0.3 × 0.5). Therefore correlation lengths were relatively short.
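As an illustration of the Kronecker construction, the following sketch builds the combined correlation matrix for a hypothetical row of three grid cells over four weeks. The correlation values are those given in the text; the grid layout, the week-major ordering, and the standard deviations are assumptions made for the example.

```python
import numpy as np


def toeplitz_corr(values):
    """Symmetric correlation matrix whose (i, j) entry depends only on |i - j|."""
    n = len(values)
    lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return np.asarray(values, dtype=float)[lag]


# Temporal correlations at lags of 0, 1, 2 and 3 weeks (values from the text).
R_time = toeplitz_corr([1.0, 0.5, 0.3, 0.1])

# Spatial correlations for a hypothetical row of three cells:
# 0.3 between adjacent cells and 0 otherwise.
R_space = toeplitz_corr([1.0, 0.3, 0.0])

# Combined correlation matrix, ordered week-major: index = week * n_cells + cell.
R = np.kron(R_time, R_space)

n_cells = R_space.shape[0]
# Cell 0 in week 0 against the adjacent cell 1 in week 1: 0.5 * 0.3.
adjacent_one_week_apart = R[0 * n_cells + 0, 1 * n_cells + 1]

# Scale by hypothetical prior standard deviations to obtain a covariance matrix.
sigma = np.full(R.shape[0], 2.0)
C_prior = np.outer(sigma, sigma) * R
```

The same construction extends to a two-dimensional grid by replacing `R_space` with an adjacency-based matrix over all surface cells.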
In the network design under the standard case, we kept the uncertainties of the ocean-only grid cells set to 0, since our focus is on reducing the flux uncertainty over land. We are not seriously assuming that we know the ocean fluxes perfectly, but for the purposes of this optimal network design we would prefer if the terrestrial measurements focused on solving for the terrestrial fluxes. Of course, to run a full inversion, knowledge is needed about the ocean fluxes, and this would be obtained through ocean-based measurements. The contributions from the ocean can be divided into the "near field" and "far field". The far-field contributions are contained within the boundary contributions. The near-field contributions are those within our domain.

A sensitivity test was conducted whereby 10 % of the maximum land NEP standard deviation was allocated to the ocean grid cells. This uncertainty represents the uncertainty in the ocean productivity models which would be used to obtain prior estimates of ocean fluxes during an inversion, and is similar to the values allocated by Chevallier et al. (2010). A second case was considered where 10 % of the nearest land NEP uncertainty was allocated to each ocean grid cell, so that the uncertainties of the ocean grid cells would increase as the uncertainties of nearby land fluxes increased. The purpose of this test case was only to demonstrate the effect of implementing a variable ocean uncertainty scheme.

Optimisation
Three optimisation routines have been used for optimal network design in the literature, namely incremental optimisation (IO) (Patra and Maksyutov, 2002), the genetic algorithm (GA) (Rayner, 2004) and simulated annealing (Rayner et al., 1996). We used the IO routine, as used for Part 1 (Ziehn et al., 2014), for the standard network design. This method was previously compared to simulated annealing by Patra and Maksyutov (2002) and found to perform as well or better, with significantly less computational cost.
During the IO procedure we added one station at a time from the candidate list to our base network of two stations and calculated the posterior flux error covariance matrix, C_f. We chose the station that resulted in the smallest uncertainty metric and added it to the network, simultaneously removing it from the candidate list. We then repeated the process until five stations had been added. The IO procedure provides us with a stepwise progression of the optimal network.
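The stepwise procedure can be sketched as a greedy loop. This is a minimal illustration, not the implementation used in the study: the posterior covariance is assumed to take the standard linear-Gaussian form, the metric is assumed to be the square root of the sum of all covariance elements, and the sensitivity matrices, prior covariance and observation variance are synthetic.

```python
import numpy as np


def posterior_cov(prior_cov, T, obs_cov):
    """Posterior flux error covariance for a linear-Gaussian inversion (assumed form)."""
    precision = np.linalg.inv(prior_cov) + T.T @ np.linalg.inv(obs_cov) @ T
    return np.linalg.inv(precision)


def metric(C):
    """Assumed uncertainty metric: square root of the sum of all elements of C."""
    return float(np.sqrt(C.sum()))


def incremental_optimisation(base, candidates, sens, prior_cov, obs_var, n_new):
    """Add, one at a time, the candidate station that minimises the metric."""
    network, pool, history = list(base), list(candidates), []
    for _ in range(n_new):
        scores = {}
        for s in pool:
            # Stack the sensitivity rows of the current network plus the trial station.
            T = np.vstack([sens[k] for k in network + [s]])
            obs_cov = obs_var * np.eye(T.shape[0])
            scores[s] = metric(posterior_cov(prior_cov, T, obs_cov))
        best = min(scores, key=scores.get)
        network.append(best)
        pool.remove(best)
        history.append((best, scores[best]))
    return network, history


rng = np.random.default_rng(0)
n_cells = 6
# Synthetic sensitivities: three observations per station (hypothetical sizes).
sens = {k: rng.random((3, n_cells)) for k in range(10)}
prior_cov = 4.0 * np.eye(n_cells)
network, history = incremental_optimisation([0, 1], range(2, 10), sens, prior_cov, 1.0, 5)
```

The `history` list records the chosen station and metric value at each step, which corresponds to the stepwise progression of the network mentioned above.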
The overall uncertainty in fluxes can be expressed by two different metrics (Rayner et al., 1996): either through obtaining the trace of C_f (J_Ct) or by summing over all the elements of C_f (J_Ce):

where n is the number of elements in the diagonal of C_f. The use of Eq. (7) results in the minimisation of the average variability between surface pixels. Equation (8) is the more accepted metric to calculate uncertainty for network designs, and it results in the minimisation of the uncertainty of the total flux over the full domain. As for Part 1 (Ziehn et al., 2014) and as used by Rayner et al. (1996), we use J_Ce as the uncertainty metric for the standard design. In our case, because the domain of the transport model contains terrestrial regions outside of South Africa, we only include the elements of C_f which are within South Africa in the calculation of the uncertainty metric.
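The difference between the two metrics can be illustrated with a small sketch. The square-root forms below are assumptions, and the exact normalisation of Eqs. (7) and (8) is not reproduced; the covariance matrix and the South African mask are hypothetical.

```python
import numpy as np


def j_trace(C, mask):
    """Trace-based metric over the masked region (assumed form, no normalisation)."""
    sub = C[np.ix_(mask, mask)]
    return float(np.sqrt(np.trace(sub)))


def j_sum(C, mask):
    """Sum-of-all-elements metric over the masked region (assumed form)."""
    sub = C[np.ix_(mask, mask)]
    return float(np.sqrt(sub.sum()))


# Hypothetical posterior covariance for three pixels; the third lies outside South Africa.
C_f = np.array([[4.0, -1.0, 0.0],
                [-1.0, 4.0, 0.0],
                [0.0, 0.0, 9.0]])
inside_sa = np.array([True, True, False])

jt = j_trace(C_f, inside_sa)  # uses only the variances of the masked pixels
js = j_sum(C_f, inside_sa)    # also includes the covariances between them
```

In this example the negative covariance between the two masked pixels lowers the sum-based metric relative to the trace-based one, which is why the two metrics can favour different networks when posterior correlations are large.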
As a sensitivity test, the J_Ct uncertainty metric replaced J_Ce. Minimising either of these metrics should result in an optimal network with reduced overall uncertainty in flux estimates across South Africa, but the results could potentially be quite different, particularly if there are large correlations in the posterior flux error covariance matrix.
We evaluated the different networks in terms of their uncertainty reduction:

where Ĵ_Ce is the optimised uncertainty metric value and J_Ce,base is the value of the uncertainty metric calculated from the posterior error covariance matrix of the fluxes if only the base stations are included.
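As a worked example, the sketch below assumes that the uncertainty reduction is the relative decrease of the metric with respect to the base network, expressed as a percentage. The function name is hypothetical; the two input values are the combined-month totals reported in the results.

```python
def uncertainty_reduction(j_opt, j_base):
    """Percentage reduction of the uncertainty metric relative to the base network (assumed form)."""
    return 100.0 * (1.0 - j_opt / j_base)


# Combined January and July values from the results (gC m-2 week-1).
ur_combined = uncertainty_reduction(j_opt=19.83, j_base=128.43)
```

Under this assumed form the combined-month values give a reduction of about 84.6 %, consistent with the figure reported for the combined network.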
Although IO is expected to be more computationally efficient, optimisation through the GA would also be well suited for this kind of problem, considering that this network design for South Africa is starting with so few existing stations. The GA begins with each of the solutions in the population containing five stations. Therefore all five stations are optimised simultaneously, rather than sequentially. Thus, this method may be more suited to the case where there are multiple deployments, as we have. It is possible under these circumstances that the best solution for a five-station network in terms of reducing the overall uncertainty for South Africa may not include the one station which on its own reduces the uncertainty the most. The GA is highly parallel, and so we can take advantage of high performance computing, but the running time of the GA is still higher in comparison to IO.
The approach used to run the GA during the sensitivity analyses is adopted from Rayner (2004). GA procedures are a class of stochastic optimisation procedures for any numerical algorithm which calculates a score based on a function of inputs. In this case the algorithm calculates a score based on the posterior flux error covariance matrix, given a set of stations. A GA does not optimise based on a single solution, but rather by improving a population of solutions, from which a final solution is selected. New members are added to the population through a process of loss of members which are not sufficiently fit (culling), pairwise combination of previous members (crossover) and random changes to members (mutation). This represents "survival of the fittest" and pairwise reproduction and mutation in biological populations.
In this implementation of the GA, elitism is maintained by keeping the best solution from the previous population, without making any changes through crossover or mutation on this member. The algorithm terminates once a given number of iterations is reached, or once a convergence criterion is met. The solution with the best fitness criterion is then selected from this population, where the fitness F is calculated as

where r is the ordinal ranking of the member and N is the population size, which for our South African case study was taken to be 100 members. A pseudorandom number x is generated and members are then deleted, or culled, if the value of F is less than x. The culling process will remove about 50 % of the population members. These need to be regenerated to get the population back to the required size. Members are selected at random from the remaining population, and, based on new pseudorandom numbers, members are duplicated if their fitness score is above this random number. Sampling is with replacement, so the members with the highest fitness have a good chance of being duplicated more than once. This continues until all the culled members have been replaced and the population size is back to N.
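The culling and regeneration step can be sketched as follows. The fitness function here is an assumed linear function of rank, F = (N − r + 1)/N, chosen only because it removes roughly half of the population on average, as the text describes; it is not necessarily the form used in the study. The scores are random placeholders, and crossover and mutation are omitted.

```python
import random


def cull_and_regenerate(population, scores, rng):
    """One culling and regeneration step; a lower score means a better network."""
    n = len(population)
    ranked = [population[i] for i in sorted(range(n), key=lambda i: scores[i])]
    # Assumed linear rank fitness: F = 1 for the best member (r = 1), 1/n for the worst.
    fitness = [(n - r + 1) / n for r in range(1, n + 1)]

    # Culling: draw one pseudorandom number per member and delete it if F < x.
    survivors = [(m, f) for m, f in zip(ranked, fitness) if f >= rng.random()]

    # Regeneration: sample survivors with replacement and duplicate a sampled
    # member only if its fitness exceeds a fresh pseudorandom number.
    new_population = [m for m, _ in survivors]
    while len(new_population) < n:
        member, f = rng.choice(survivors)
        if f > rng.random():
            new_population.append(member)
    return new_population, len(survivors)


rng = random.Random(42)
stations = list(range(36))
# 100 candidate networks of five stations each, drawn from 36 hypothetical sites.
population = [tuple(sorted(rng.sample(stations, 5))) for _ in range(100)]
scores = [rng.random() for _ in population]  # placeholder scores, not a real metric
new_population, n_survivors = cull_and_regenerate(population, scores, rng)
```

Because the best-ranked member has F = 1 under this assumption, it always survives the cull, which is one simple way to obtain the elitism described above.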
The GA requires a trade-off between the diversity in the solutions, ensuring that the algorithm does not get stuck in local extrema, and strong selection to ensure that the population moves towards the optimum solution. This is achieved by adjusting the mutation rate: high enough to produce diversity in the solutions but low enough to ensure that members with high fitness persist and so ensure a tendency towards the optimum solution. From previous work (Rayner, 2004) a good mutation rate for network design was found to be 0.01.
The population size and number of iterations affect the computation time of the algorithm. A large population size is favourable because this ensures diversity in the solutions. The more iterations that take place, the more solutions the algorithm can assess and the better the chance of finding the global minimum. High values for both of these parameters result in long computation times. In this study the number of iterations was set at 100 for a single-month optimisation, and to 150 for a combined-month optimisation. These values were determined from GA trials carried out on the data prior to deriving the results for this study.

Measurement sites
Hypothetical stations were selected from a regular grid over South Africa, resulting in 36 equally spaced locations (Fig. 3), from which five stations need to be selected. Ultimately, the exact location of the stations will be determined by practical considerations, for example the presence of existing infrastructure, such as communication towers and meteorological stations; available manpower; the relative safety of the instruments; and the accessibility of the sites. The optimal network will be used as a guide as to which locations are ideal. Once the final station sites have been selected, the posterior flux error covariance matrix can be calculated based on these exact tower locations, in order to determine how close to the idealised uncertainty reduction the implemented network will achieve.

Influence from outside the modelled domain
Since the surface sources are expressed as fluxes in carbon, the contribution to the concentration at the measurement site is expressed in the amount of carbon seen at the measurement site from a particular source. In the case of the boundary sources (or contributions from outside of the domain) which are given as concentrations, their contributions to the concentration at the measurement site are expressed as a proportion of their concentration, dependent on their influence at the receptor site. Part 1 (Ziehn et al., 2014) showed that the boundary contribution can then be written as

where M_B is the submatrix of T for the boundary concentrations, c_B. If the elements of M_B are large enough, it may be necessary to include the boundary concentrations in the network design.
For the network design, four boundaries (north, south, east and west) were used, and we calculated the sensitivity of hourly observed concentrations to weekly boundary concentrations. To determine if the influence of the boundary concentrations on the observation errors should be included in the network design, we needed to know whether the uncertainties contributed by the boundary concentrations were significant compared to other contributions. To see this we calculated M_B for each station. Assuming uncertainties of 1 ppm in the boundary concentrations (reasonable for the Southern Hemisphere), this yielded

where C_I is the prior error covariance matrix of boundary concentrations. The diagonal elements of the error covariance matrix of the boundary concentrations, C_b, provided us with the uncertainty contribution of the boundary concentrations to the observations. If they are much smaller than the observation error uncertainty, we can neglect boundary influences in the network design. As the boundary concentrations should be highly correlated, we also considered C_I to have correlation between boundary concentrations, where correlations of 0.5 were allocated between boundary concentrations during the same week, and values of 0.25 between boundary concentrations separated by a week or more.
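The boundary test can be sketched by projecting the boundary covariance into observation space, here assumed to take the usual form C_b = M_B C_I M_B^T. The 1 ppm uncertainty and the correlation values are from the text; the sensitivity matrix `M_B` is synthetic and the four-week, four-boundary layout is an assumption made for the example.

```python
import numpy as np

n_bound, n_weeks = 4, 4
# Week index of each boundary concentration, ordered week-major.
week = np.repeat(np.arange(n_weeks), n_bound)
same_week = week[:, None] == week[None, :]

# Correlation of 0.5 within the same week and 0.25 otherwise (values from the text).
corr = np.where(same_week, 0.5, 0.25)
np.fill_diagonal(corr, 1.0)

sigma = 1.0  # ppm, the assumed boundary concentration uncertainty
C_I_ind = sigma ** 2 * np.eye(n_bound * n_weeks)
C_I_corr = sigma ** 2 * corr

rng = np.random.default_rng(1)
# Synthetic, non-negative sensitivities of six observations to the 16 boundary values.
M_B = rng.random((6, n_bound * n_weeks)) * 1e-3


def boundary_obs_cov(M_B, C_I):
    """Boundary contribution to the observation error covariance (assumed form)."""
    return M_B @ C_I @ M_B.T


sd_ind = np.sqrt(np.diag(boundary_obs_cov(M_B, C_I_ind)))
sd_corr = np.sqrt(np.diag(boundary_obs_cov(M_B, C_I_corr)))
```

Comparing `sd_ind` and `sd_corr` with the observation error standard deviation indicates whether the boundary contribution can be neglected; with non-negative sensitivities the correlated case can only increase the diagonal values.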

Comparison of network solutions
To compare the utility of the optimal networks from each algorithm run, the uncertainty reduction was assessed for each of these networks. The similarity of the networks in terms of the station locations was assessed using a test statistic from the chi-squared complete spatial randomness test, measuring the degree of clustering, where the expected value is based on the null hypothesis that the stations are located randomly over the domain. The intention here was not to perform a statistical test based on the chi-squared distribution, since the network did not constitute a sample nor were there enough stations. Instead, the aim was to calculate an indicator of the degree of clustering of the measurement stations for a particular network solution, referred to as the clustering index, which was also used to compare between two networks.
$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where i and j are the indicators for the latitude and longitude categories respectively, O_ij is the observed number of stations in quadrat ij, and E_ij the expected number of stations assuming the stations are scattered randomly. The domain was divided into quadrats, in this case 16 equally sized quadrats covering the entire domain.
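A minimal sketch of the clustering index follows, assuming it takes the standard chi-squared form, the sum over quadrats of (O − E)²/E, with E equal to the number of stations divided by the number of quadrats. The coordinates and domain bounds are hypothetical. With five stations and 16 quadrats, this assumed form gives 23.8 for a quadrat occupancy of 2, 2 and 1, and 36.6 for an occupancy of 3 and 2, which agrees with the clustering index values quoted in the results, although that agreement does not by itself confirm the exact formula used.

```python
import numpy as np


def clustering_index(x, y, bounds, n_side=4):
    """Chi-squared-style clustering index over n_side x n_side equal quadrats."""
    (xmin, xmax), (ymin, ymax) = bounds
    observed, _, _ = np.histogram2d(
        x, y, bins=n_side, range=[[xmin, xmax], [ymin, ymax]]
    )
    # Expected count per quadrat if stations were scattered randomly.
    expected = len(x) / observed.size
    return float(((observed - expected) ** 2 / expected).sum())


bounds = ((0.0, 4.0), (0.0, 4.0))  # hypothetical domain in arbitrary units

# Five stations occupying three quadrats with counts 2, 2 and 1.
ci_clustered = clustering_index(
    [0.5, 0.6, 1.5, 1.6, 2.5], [0.5, 0.6, 0.5, 0.4, 2.5], bounds
)

# Five stations, each in a different quadrat.
ci_spread = clustering_index(
    [0.5, 1.5, 2.5, 3.5, 0.5], [0.5, 1.5, 2.5, 3.5, 3.5], bounds
)
```

Larger values of the index indicate that stations are concentrated in fewer quadrats.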
A dissimilarity index (DI) was calculated as the sum of the distance to the nearest neighbour in the compared network, over all the members in the pair of assessed networks.
where i and j ∈ [1,2,3,4,5], and x_ij^2 and y_ij^2 are the squared differences between the Cartesian coordinates of the ith station in network 1 and the jth station in network 2. In cases where the two networks compared were the same, the index results in a value of 0. The index increases as the networks become more dissimilar in space. This provides a one-number measure of network similarity that can consistently be used for the network comparisons, provided each solution consists of the same number of stations. The index provides a measure in kilometres of how different two network solutions are. This allows for an objective assessment of how different the positioning of sites is between two network solutions, which may not be obvious to the eye.
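The dissimilarity index can be sketched as below. The interpretation that nearest-neighbour distances are summed in both directions (network 1 to network 2 and network 2 to network 1) is an assumption based on the wording "over all the members in the pair of assessed networks"; the station coordinates are hypothetical Cartesian positions in kilometres.

```python
import numpy as np


def dissimilarity_index(net_a, net_b):
    """Sum of nearest-neighbour distances between two networks, in both directions."""
    a = np.asarray(net_a, dtype=float)
    b = np.asarray(net_b, dtype=float)
    # Pairwise Euclidean distances: d[i, j] between station i of a and station j of b.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).sum() + d.min(axis=0).sum())


# Two hypothetical five-station networks (km); only the last station differs,
# displaced by 3 km east and 4 km north, i.e. by 5 km.
network_1 = [(0, 0), (100, 0), (200, 0), (300, 0), (400, 0)]
network_2 = [(0, 0), (100, 0), (200, 0), (300, 0), (403, 4)]

di_same = dissimilarity_index(network_1, network_1)
di_moved = dissimilarity_index(network_1, network_2)
```

Under this interpretation a single station displaced by 5 km contributes 10 km to the index, because the displacement is counted once from each network's point of view.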

Influence from the boundaries
The particle counts generated during the LPDM runs for each station were summed over the month in order to obtain a footprint of each station. To illustrate this, plots of the influence footprint in January (Fig. 4) are provided, using a logarithmic scale, for Cape Point and three other candidate stations: 28 (near Potchefstroom), 18 (near Mthatha) and 4 (near Port Elizabeth). For both January and July, the influence footprints show that the three candidate stations have more contributions from terrestrial South African sources than Cape Point has. The plots show that the majority of influence for a site is from the sources in the surrounding pixels.
Using the influence functions now available for each station, the test of the influence from the boundaries on the observation errors was conducted. Given the large domain over which the LPDM was run, it was not surprising that the boundaries had minimal influence. Overall, the square root of the maximum diagonal element of C_b for all stations was only 0.012 ppm. The mean of the maximum diagonal elements over all measurement sites was 0.006 ppm with a standard deviation of 0.002 ppm. Even when correlation between the boundary concentrations was included in the covariance matrix of the boundary concentrations, the maximum diagonal element only reached 0.012 ppm, and the maximum diagonal element for a particular station was no more than 40 % higher than for the independent case.

We note that the influence of the boundary conditions may be highly correlated, i.e. that a given boundary condition may influence many observations in a similar way. However, the covariances in C_b are bounded by its variances. These variances are so much smaller than the values of the observation error covariance matrix that the impact of the accompanying covariances is guaranteed to be small. We note also that the assumption of boundary conditions changing on the scale of a week is conservative; using more, somewhat independent boundary concentrations would reduce the impact on C_b yet further.

Aggregation error
Aggregation errors were found to be a significant contributor to the overall observation error covariance matrix. Aggregation errors as high as 17.10 ppm were found for measurement sites in the north-eastern interior, and as low as 0.01 ppm for stations in the south-western interior (Fig. 5). The average aggregation error across sites was 4.70 ppm with a standard deviation of 5.10 ppm. The sites with the largest aggregation errors were generally those closest to large fossil fuel sources. These large values are due to the significant amount of smoothing of the relatively localised fossil fuel fluxes in the lower-resolution case. This results in large heterogeneity between the high-resolution fossil fuel fluxes which contribute to the average fossil fuel flux estimate of the low-resolution case, which are exactly the circumstances that lead to the generation of aggregation error. Sites near the terrestrial or coastal borders also tended to have large aggregation errors. Site-specific aggregation errors were determined, and these errors were added to the diagonal elements of the observation error covariance matrix separately for each site.
When running the LPDM to generate the sensitivity matrix, it is imperative to specify a sufficient number of particles per release, as well as to run the model for at least as long as required, with additional time at the beginning of the run. This is to avoid transport errors, and to avoid exaggerating the aggregation errors. Therefore, the aggregation errors were calculated using the last week of the 4-week sensitivity matrix.
The next sections present the results of the optimal network design, first under the basic parametrisations as used in Part 1 (Ziehn et al., 2014) and then under the sensitivity tests.

Basic network design
The network solution for July was able to achieve a reduction in uncertainty in the total South African flux from 6.42 gC m−2 week−1 under the base network to 3.66 gC m−2 week−1 under the optimal network. The results under the standard conditions used in the basic network design for the month of July reveal that the best set of stations to add to the current network would include two stations near the western coast of the country, stations 0 and 6, including one just north of the city of Cape Town (station 0) (Fig. 6). These stations are located near the areas of highest NEP uncertainties during the winter months. These areas in the Western Cape fall into the fynbos biome, which is under a winter rainfall regime. Therefore productivity during the winter months is expected to be higher in this area (Fig. 1a). In contrast, activity over much of South Africa during the winter months, when water availability is reduced, is expected to be low to almost entirely dormant. Due to the increased uncertainty in NEP in the fynbos regions during this time, as well as the proximity to the city of Cape Town, the optimal network would need a station in this area to reduce the overall uncertainty of South Africa. Two stations are located in the eastern interior of the country (stations 18 and 24), including one on the border of Lesotho, and one station is located in the central interior of the country (station 21), not far from the Zimbabwean border. These stations are located near to areas with high fossil fuel flux uncertainties.

The base network on its own reduced the posterior flux uncertainty by 17.0 %. During the month of July, the best station to add to this network would be station 24, located in the eastern interior of South Africa, just north of Lesotho, which reduced the uncertainty relative to the base network by 12.8 % (Table 1). The second-best station to add is station 0, near the south-west coast of South Africa. This station reduced the uncertainty by an additional 10.5 %. Since the optimal network included a station near Cape Point during July, it supports the conclusions by Whittlestone et al. (2009) that measurements at Cape Point are not sufficient to estimate fluxes for the Western Cape region. The reduction in uncertainty by the addition of the three remaining stations to the network was an additional 19.3 %.
During the winter months, the biospheric fluxes are small, with small uncertainties, whereas the fossil fuel flux uncertainties remain high. Due to the penalty imposed by the aggregation error for measurement sites located near fossil fuel sources, the return on uncertainty reduction during the winter months is low, at only 42.9 %.

In January the total flux uncertainty was much higher compared to July, with a total flux uncertainty of 128 gC m−2 week−1, which was reduced to 27.93 gC m−2 week−1 under the optimal network. The placement of stations changes with respect to July, with the stations all located towards the eastern interior of the country, and no stations positioned on the western side of South Africa (Fig. 6). The stations were located near regions of high summertime NEP uncertainty and in the region where most of the fossil fuel activities in the country are concentrated. In contrast to the winter months, the NEP uncertainty during summer is much higher on the eastern side of the country compared to the mid-interior or the west of the country (Fig. 1c), resulting in a need to concentrate the new measurement sites in this area. The uncertainty reduction attributable to the base network in January is similar to July, at 16.8 %. The best-performing station in the network for January is station 12, located on the eastern coast of South Africa, which further reduces the uncertainty by 40.0 % relative to the base network. The next-best-performing station was station 29, which reduced the uncertainty by an additional 18.0 %. An additional 10.3 % increase in uncertainty reduction was attained from adding the last three stations to the network. The total uncertainty reduction achieved in January is much higher compared to July, at 78.3 %. This is due to the ability of the network to view the larger summer biospheric fluxes in areas where the aggregation error penalty is low, or even despite the aggregation error penalty.
The total flux uncertainty under the base network for the combined months of January and July was calculated to be 128.43 gC m−2 week−1, similar to the month of January. This is reduced to 19.83 gC m−2 week−1 under the optimal network. The network for the combined months has a similar positioning of stations compared to January (Fig. 6), locating most of the stations in the eastern interior, as well as a very similar reduction in uncertainty at 84.6 %. The most important station, as ranked by the IO solution, is station 18, which reduces the uncertainty by 53.3 % relative to the base network. This station is located in a region of both high NEP and fossil fuel flux uncertainty (Figs. 1 and 2). The second-best station to add to the network is station 29, increasing the uncertainty reduction by 24.4 %. This station is located near the area of highest fossil fuel flux uncertainty (Fig. 2). The remaining three stations (stations 11, 22 and 27) add only 6.8 % to the uncertainty reduction.

The network solution is different to January's, in that the stations are more concentrated around the areas of larger fossil fuel flux uncertainty. This is because the NEP uncertainty estimates are much lower in the winter months than in the summer months across South Africa, whereas the fossil fuel flux uncertainties remain consistent during the year. The optimal network for the combined seasons is therefore dominated by the need to reduce these consistently large uncertainties. The results from the combined months show that a substantial reduction in the posterior uncertainty for South Africa is possible by introducing only a few additional stations at key locations.

Sensitivity analysis
The results for the sensitivity analyses run for both months individually as well as the combined months of January and July appear in Fig. 7. During the winter months, there was consistency between the network solutions from the different sensitivity tests. All of the tests were in agreement that stations 0 and 18 should be included: station 0 near the winter NEP uncertainties, and station 18 near an area of large fossil fuel flux uncertainty. The tests assessing surface grid box height, the doubling of night-time observation error uncertainty and the addition of ocean flux uncertainty were identical to the standard network design solution. Both the medium-resolution and the GA network solutions were very near the standard solution, each obtaining the second-smallest DI relative to the standard design, of 879. These tests both favoured two stations which were each one step away from a standard network design station. The solution using the uncertainty metric based on the trace of the posterior flux covariance matrix was similar to these two but favoured a station near the south coast of South Africa, far from the general concentration of stations, near a localised fossil fuel source. The two test cases most different from the standard solution were the high-resolution network solution and the solution from the case considering correlation between the prior fluxes, obtaining a DI of 1747 and 1343 respectively. They favoured stations near the south coast but also located stations in the north-eastern interior, near areas of large fossil fuel uncertainty.
The results from the sensitivity tests for January show a great deal more variability between network solutions compared to July, with DI values of greater than 0 for almost all network solution comparisons. Similarly to July, the network solutions do appear to converge towards three stations, but not the same stations as July. Under January's conditions, only the homogeneous ocean variance test case resulted in an identical solution to the standard case. There is no single station which all network solutions contained.

The high-resolution test and test case considering correlation between prior fluxes obtained DI values of 1121 and 1162 respectively. The solutions from these tests focused on stations around areas of large fossil fuel flux uncertainty in the north-western and north-eastern interior. The solution from the GA resulted in the largest DI value of 1213 when compared to the standard network, and equal to this or larger when compared to all other network solutions. The station in the GA solution responsible for the disagreement with other solutions is station 7, located in the south-western interior, far from the concentration of stations for most network solutions. The remaining four stations from the GA test are located towards the north-western and north-eastern interior parts of the country. As discussed in Sect. 3.3, the three best stations to add to the network according to the IO solution are stations 18, 29 and 11, with station 18 attaining the greatest uncertainty reduction. All of the network solutions for the combined months of January and July have included station 18, and the three most important stations are all in the solution of the GA.
The statistics for the different sensitivity tests for the combined months (Table 3) indicate that the test considering correlation between the prior fluxes obtained the highest uncertainty reduction, followed by the GA test. The GA was able to achieve marginally greater uncertainty reduction, by 0.3 %, compared to the IO standard solution. Most of the test cases were able to achieve between 80 and 85 % uncertainty reduction. The test case utilising the trace uncertainty metric achieved a smaller uncertainty reduction, and the two higher-resolution tests achieved the smallest uncertainty reduction overall. It should be noted that the uncertainty reduction achieved for the trace sensitivity test was calculated using the J_Ct uncertainty metric, due to the use of this metric for the optimisation procedure.

Estimates of the posterior uncertainty for the total flux of South Africa under the base and optimal networks were obtained for each month. Those which differed substantially from the standard network solution were the high- and medium-resolution test cases, and the correlation test case. Under the assumption of positive correlations between the flux errors, the base network results in a higher total flux uncertainty of 205.82 gC m−2 week−1, which is reduced to 27.79 gC m−2 week−1 under the optimal network, now similar to the result of the standard network solution. Under the base network, the additional covariance terms introduced through the correlation structure are poorly resolved, leading to higher total uncertainties. When more stations are added to the network, this is improved. The high- and medium-spatial-resolution test cases gave total flux uncertainties of 271.55 and 190.14 gC m−2 week−1 respectively under the base network. These were then reduced to 82.82 and 44.19 gC m−2 week−1 respectively under the optimal network.

At the spatial resolutions that we have considered in our study, the between-pixel variability in the terrestrial fluxes will increase as the spatial resolution is increased, for both the biospheric and fossil fluxes (Turner et al., 2000). For the fossil fuel fluxes, we create the surface of flux uncertainties using the same procedure for each of the different spatial resolution cases. As explained earlier, for each of the 10 realisations from the FFDAS product, we regrid the 0.1° × 0.1° fossil fuel emissions onto the surface grid we are using. To obtain the uncertainty estimates, the within-pixel variance is calculated for the 10 realisations. The result of carrying this procedure out at higher spatial resolutions is that the variance values are larger compared to lower resolutions, and the between-pixel variability is increased (Asefi-Najafabady et al., 2014). Therefore, the total flux uncertainty derived at high resolution is expected to be larger than for lower resolutions.
Most network solutions tended towards the same amount of clustering of stations, obtaining a clustering index of 23.8. The GA and test case considering correlation had more dispersed networks, and the high-resolution test case had the highest amount of clustering, with a clustering index of 36.6. We would expect the correlation case to spread stations since a given station will reduce uncertainty everywhere within one correlation length. The GA for the combined months took the longest to run, at over 32 h, which is 39 times longer than the running time of the standard IO solution. This was followed by the high-resolution solution, which took 25.2 h, and the two ocean flux uncertainty test cases, which took over 5 h each.

Summary and conclusions
Under a reference set of conditions, an optimal network design was obtained for South Africa for two representative months of the year. The resulting designs reduced the uncertainty of carbon fluxes from South Africa compared to the base network by 43 % in July and 78 % in January. These relatively large reductions in uncertainty are due to the lack of coverage by the current network, which only reduces the uncertainty of fluxes from South Africa by 16 % for both July and January. The concentration of stations by all networks tended towards the central interior, near the North West Province of South Africa and in the eastern parts of the country. These represent the areas with the largest uncertainty in biospheric fluxes, as well as fossil fuel emissions, in the country.
Station 11 is located near the uKhahlamba Drakensberg World Heritage Site. Several remote holiday destinations are found in this area, near the town of Mooi River, and road infrastructure is available. Potentially, facilities at or near these holiday destinations could be utilised in order to conduct atmospheric measurements, particularly if there is a communications tower available. Station 18 is located near the peak of Ben Macdhui. This is near the site of a 1996 atmospheric monitoring campaign, which assessed the ability of transport models to resolve recirculation over and exiting South Africa to the Indian Ocean (Piketh et al., 1999). Station 29 is near the atmospheric monitoring site of the North-West University (South Africa), at Welgegund, about 20 km from the Potchefstroom campus. This site was established in collaboration with the University of Helsinki to measure the impact of aerosols and trace gases on the climate and air quality (Tiitta et al., 2014). Therefore, for at least three of the most influential stations, facilities or previous measurement campaigns exist, indicating that it should be possible to establish long-term monitoring of CO₂ concentrations near these sites.
Table 2. Ranking of the new stations added to the base network under 10 different sensitivity tests for the combined months of July and January. The tests are presented in the following order: surface grid height set at 60 m; surface grid height set at 75 m; trace of the posterior covariance used in the uncertainty metric; uncertainty of the night-time observation errors is doubled; correlation structure is included in the prior covariance of the fluxes; spatial resolution is increased to 0.8°; spatial resolution is increased to 0.6°; ocean sources are assigned 10 % of max NPP variance; ocean sources are assigned 10 % of nearest terrestrial NPP variance; and GA is used for optimisation. The percentage cumulative reduction of uncertainty of the posterior fluxes relative to the base network is provided in brackets.

The sensitivity analysis demonstrated that, for most of the network design parameters considered in this study, the stations found to be most important by the standard network design were always identified in the network design solution. Many of the choices required for the optimal network design, such as the height of the surface grid cells, whether to inflate night-time observation error uncertainties relative to the daytime and the inclusion of ocean flux uncertainty, have a negligible impact on the final network design. Substituting the trace for the sum of the covariance elements also resulted in similar solutions.
The test cases considering higher spatial resolution tended to result in network solutions different from the standard case, largely due to the increase in spatial heterogeneity in prior flux uncertainties compared to the coarser resolution. The spatial resolution of an inversion study impacts network design in several ways. It is the main determinant of the amount of aggregation error attributed to a measurement site, with aggregation error reducing as the resolution increases. As the spatial resolution is degraded, aggregation errors can become large, leading to the exclusion of sites in the case of an optimal network design, even if they are in view of regions of large flux uncertainty. The spatial resolution of the sources also determines the dimensions of the sensitivity matrix and prior flux covariance matrix, which impacts on the computational resources required to run an inversion or network optimisation. Ideally, the highest manageable resolution should be used, as close as possible to the resolution of the transport model and original spatial products used for obtaining the prior fluxes and their covariances. Alternative approaches, such as the use of multi-scale representation of the source region, can be used to mitigate aggregation errors as well (Wu et al., 2011), but these errors should always be considered during an inversion or inversion-based optimal network design exercise.
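The link between spatial resolution and matrix dimensions can be illustrated with a short calculation. The sketch below is only indicative: the 24° × 24° domain, the 100 observations and the function name `inversion_dimensions` are hypothetical and are not the values used in this study, and a single flux per surface grid cell is assumed, whereas a real source vector may also be split by time period and flux type.

```python
def inversion_dimensions(extent_lon, extent_lat, resolution, n_obs):
    """Sizes of the inversion matrices for a regular grid (illustrative only).

    Extents and resolution are in degrees; extents are assumed to be whole
    multiples of the resolution, so round() only absorbs floating-point error.
    """
    n_lon = round(extent_lon / resolution)
    n_lat = round(extent_lat / resolution)
    n_sources = n_lon * n_lat  # one flux per cell (simplifying assumption)
    return {
        "n_sources": n_sources,
        "sensitivity_shape": (n_obs, n_sources),  # rows: observations
        "prior_cov_elements": n_sources ** 2,     # full square matrix
    }


# Hypothetical 24 x 24 degree domain at the three resolutions considered.
for res in (1.2, 0.8, 0.6):
    print(res, inversion_dimensions(24.0, 24.0, res, n_obs=100))
```

Halving the grid spacing from 1.2° to 0.6° quadruples the number of source cells and increases the number of elements in a full prior covariance matrix 16-fold, which is why memory and running time grow quickly at higher resolution.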
The GA was able to find marginally better solutions than the IO method, if run with sufficient population size and number of iterations, but in general did include the most influential stations from the IO solution. The increase in uncertainty reduction was found to be marginal but cost a great deal more in running time before this solution was found. If the resolution of the standard case had been higher, the GA would have taken longer to run, and the current computing system may have had insufficient memory. Moreover, to find a better solution than the IO, the iterations and population size would have had to be set even higher, due to the greater heterogeneity in the prior flux uncertainties in a higher-resolution setup, further increasing the computational costs. An additional advantage of the IO method over the GA method is that an evolution of results is generated, which is useful for practical purposes. By identifying the station which on its own best reduces the uncertainty in the posterior fluxes, it gives the decision makers the location of the site which should be prioritised over others in the network.
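The ranked output of an incremental optimisation can be illustrated with a small greedy selection. This is a sketch under stated assumptions, not the routine used in this study: it assumes a linear Gaussian inversion with uncorrelated observation errors, so that each observation updates the flux covariance through a rank-one (Sherman-Morrison) correction, and it gives each candidate station a single hypothetical row of the sensitivity matrix. All function names, station labels and numbers are illustrative.

```python
def add_observation(cov, h, obs_var):
    """Flux covariance after assimilating one observation.

    `h` is the sensitivity row of the observation and `obs_var` its error
    variance; the update is C - (C h)(C h)^T / (obs_var + h^T C h).
    """
    n = len(cov)
    ch = [sum(cov[i][k] * h[k] for k in range(n)) for i in range(n)]
    denom = obs_var + sum(h[i] * ch[i] for i in range(n))
    return [[cov[i][j] - ch[i] * ch[j] / denom for j in range(n)]
            for i in range(n)]


def add_station(cov, rows, obs_var):
    """Assimilate every observation (sensitivity row) of one station."""
    for h in rows:
        cov = add_observation(cov, h, obs_var)
    return cov


def uncertainty(cov, metric="sum"):
    """Scalar uncertainty metric: sum of all elements, or the trace."""
    if metric == "trace":
        return sum(cov[i][i] for i in range(len(cov)))
    return sum(sum(row) for row in cov)


def incremental_optimisation(prior_cov, candidates, obs_var, n_new, metric="sum"):
    """Add stations one at a time, each time keeping the best candidate."""
    cov = prior_cov
    remaining = dict(candidates)
    ranking = []
    for _ in range(n_new):
        best = min(remaining, key=lambda s: uncertainty(
            add_station(cov, remaining[s], obs_var), metric))
        cov = add_station(cov, remaining.pop(best), obs_var)
        ranking.append((best, uncertainty(cov, metric)))
    return ranking


# Toy problem: three flux regions with a diagonal prior covariance.
prior = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
candidates = {
    "A": [[1.0, 0.0, 0.0]],  # sees the most uncertain region
    "B": [[0.0, 1.0, 0.0]],
    "C": [[0.0, 0.0, 0.5]],  # weaker sensitivity
}
ranking = incremental_optimisation(prior, candidates, obs_var=1.0, n_new=2)
print(ranking)
```

In this toy case station A is selected first because it views the region with the largest prior uncertainty, followed by station B. The order of the returned list is the kind of prioritised ranking that a decision maker can act on, which a single GA solution does not provide directly.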
Even though we accounted for aggregation error, which would have corrected the total flux estimate for the domain, there were still large differences between the total flux uncertainties from the inversion results under different spatial resolutions. This was due to the treatment of the prior uncertainties under the different spatial resolutions. Degrading the spatial resolution results in a loss of information; there-

Figure 1. The daytime net primary productivity (NPP) and night-time autotrophic respiration (Ra) data used as standard deviations of net ecosystem productivity (NEP) at the resolution of 1.2° expressed in gC m⁻² week⁻¹ for July (left) and January (right). Values for the standard deviation are capped at 28 gC m⁻² week⁻¹. The value of the nearest South African pixel (separately for day and night) is assigned to non-South Africa land surface pixels.

Figure 2. The standard deviations of 10 realisations (top) of the Fossil Fuel Data Assimilation System (FFADS) at the original 0.1° resolution in gC m⁻² week⁻¹. The standard deviations of the aggregated fluxes (bottom; 1.2° resolution) show significant smoothing of the fossil fuel fluxes at the lower resolution.

Figure 3. The 36 potential locations of the new stations in the optimal network design. The locations were spaced on a regular grid over the surface of South Africa. The existing Cape Point and Gobabeb GAW stations are marked by the triangles.

Figure 4. The footprint of Cape Point (top left), station 28 (top right), station 18 (bottom left) and station 4 (bottom right) relative to the surface grid cells at a resolution of 1.2°, expressed as the count of particles over the month of January for each surface grid cell.

Figure 5. Map of the aggregation error values (ppm) associated with each measurement station for the month of January.

Figure 6. Map of the optimal stations to add to the existing network to reduce the overall uncertainty of fluxes in South Africa for July, January and the combined months of July and January. The standard network design conditions are 50 m surface grid height, diagonal prior covariance, 2 ppm uncertainty in concentration observations, a 1.2° surface grid resolution and the sum of the posterior covariance matrix elements used to calculate the uncertainty metric for the IO procedure.

Figure 7. Map of the optimal stations to add to the existing network to reduce the overall uncertainty of fluxes in South Africa under the 11 different sensitivity cases for July (top), January (middle) and the combined months of July and January (bottom). The cases include the standard case (Standard), surface grid height set at 60 m (Ht 60 m), surface grid height set at 75 m (Ht 75 m), use of the trace in the uncertainty metric (Trace), doubling of the night-time observation error uncertainty (Night), addition of correlation between elements in the prior covariance matrix (Correl), spatial resolution set at 0.8° (Med Res), spatial resolution set at 0.6° (High Res), uncertainty in the ocean sources set at 10 % of the maximum land NPP (Ocean1), uncertainty in the ocean sources set at 10 % of the nearest land NPP (Ocean2) and use of the GA.

Figure 8. The daytime net primary productivity (NPP) data used as standard deviations of net ecosystem productivity (NEP) for January, expressed in gC m⁻² week⁻¹, at the resolution of 0.8° (a) and at the resolution of 0.6° (b). The Fossil Fuel Data Assimilation System standard deviations, also expressed in gC m⁻² week⁻¹, aggregated over a resolution of 0.8° (c) and over a resolution of 0.6° (d).
Stations 29 (north-eastern interior) and 12 (eastern coast) were agreed on by 10 out of 11 tests, and stations 27 (northern interior) and 11 (south-eastern interior) were agreed on by 9 out of 11 tests. These four stations are influenced by areas of large fossil fuel flux uncertainty, and stations 29 and 12 are near regions of large summer NEP uncertainty. Sensitivity tests with DI values below 1000 when compared to the standard case include the tests considering surface grid box height, doubling of night-time observation error uncertainty, the test considering variable ocean flux uncertainty, the trace uncertainty metric test and the GA test case. These five test cases show strong agreement. The trace uncertainty metric case favoured a station near the central interior. This station was also included in the solutions of the correlation and medium-resolution cases, where these tests obtained DI values of 1225 and 1305 respectively when compared to the standard solution. These tests, as well as the GA and high-resolution test cases, included stations near the south coast, near areas of localised fossil fuel uncertainties.
The sensitivity tests from the combined months showed less variability between solutions compared to January (Fig. 7c). Station 11 was included in all of the network solutions. Station 18 was agreed upon by 10 out of 11 network solutions, and stations 27 and 29 (both in the north-eastern interior) were favoured by 9 out of 11 solutions. The tests considering 60 m surface height, the trace uncertainty metric, doubling of the night-time observation error uncertainty and inclusion of ocean flux uncertainty have identical solutions to the standard network design. The 75 m surface height and medium-resolution test cases obtained relatively low DI values of 468 and 449 respectively when compared to the standard solution (Table

Table 1. Ranking of the new stations added to the base network for two seasons (winter and summer) represented by July and January, as well as the integrated 2 months. The cumulative reduction of uncertainty relative to the base uncertainty is provided in brackets.

Table 3. Network comparison statistics for the combined months of January and July. The sensitivity tests are presented in the same order as for Table 2.