This is the second part of a two-part paper considering a measurement network design based on a stochastic Lagrangian particle dispersion model (LPDM) developed by Marek Uliasz, in this case applied to South Africa. A sensitivity analysis was performed for different specifications of the network design parameters applied to this South African test case. The LPDM, which can be used to derive the sensitivity matrix used in an atmospheric inversion, was run for each candidate station for the months of July (representative of the Southern Hemisphere winter) and January (summer). The network optimisation procedure was carried out under a standard set of conditions, similar to those applied to the Australian test case in Part 1, for each month and for the two months combined, using the incremental optimisation (IO) routine. The network design setup was then modified, one parameter at a time, and the optimisation routine re-run under each set of conditions and compared with the original optimal network design. The assessment of the similarity between network solutions showed that changing the height of the surface grid cells, including an uncertainty estimate for the ocean fluxes, or increasing the night-time observation error uncertainty did not significantly change the positioning of the stations relative to the standard design. Changing the prior flux error covariance matrix or increasing the spatial resolution, however, did.

Large aggregation errors were calculated for a number of candidate measurement sites at the resolution of the standard network design. The spatial resolution of the prior fluxes should be kept as close to the resolution of the transport model as the computing system can manage, to mitigate the exclusion of sites which could potentially be beneficial to the network. Including a generic correlation structure in the prior flux error covariance matrix led to pronounced changes in the network solution. The genetic algorithm (GA) was able to find a marginally better solution than the IO procedure, increasing uncertainty reduction by 0.3 %, but still included the most influential stations from the standard network design. In addition, the computational cost of the GA was much higher than that of IO. Overall, the results suggest that a substantial improvement in knowledge of South African fluxes is available from a feasible atmospheric network, and that the general features of this network are robust to several reasonable choices made in a network design study.

Mitigating climate change is one of the great challenges of our time. To
this end, it has become essential to accurately estimate the emission
and uptake of

Previous optimal network studies run at the global scale have highlighted
southern Africa as a region associated with large uncertainty in its
terrestrial

An optimal network design requires the theory of atmospheric inversions to
generate the posterior error covariance matrix of the

The posterior flux error covariance matrix used to derive the uncertainty
metric does not require any knowledge of the measured concentrations or of
the prior fluxes, only of the prior error covariance matrix of the fluxes,
the error covariance matrix of the observations and the sensitivity matrix,
which are all determined separately. Basing the cost function of the
optimisation procedure on the result of the posterior error covariance matrix
of the fluxes under a given network ensures the uncertainty in the estimated
fluxes under the final network solution is reduced. As in Part 1
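Because the posterior covariance depends only on these three matrices, candidate networks can be ranked before any data are collected. A minimal sketch of this computation (the function name and the use of dense linear algebra are illustrative, not the implementation used in the study):

```python
import numpy as np

def posterior_covariance(C_prior, C_obs, H):
    """Posterior flux error covariance of a Bayesian synthesis inversion:
    C_post = (H^T C_obs^-1 H + C_prior^-1)^-1.
    Requires only the prior flux error covariance, the observation error
    covariance and the sensitivity (Jacobian) matrix H -- no observed
    concentrations and no prior flux values."""
    term = H.T @ np.linalg.inv(C_obs) @ H
    return np.linalg.inv(term + np.linalg.inv(C_prior))
```

Adding a station adds rows to H and C_obs, which can only reduce the posterior variances; this is what makes the posterior covariance a suitable basis for the optimisation cost function.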

As well as providing this first-time optimal network design focusing on

The inversion procedure requires a sensitivity matrix which calculates the
contribution of the different sources to the

This paper proceeds by introducing the inversion methodology, followed by an explanation of the different sensitivity tests. The results are then presented for the South African optimal network design under the standard conditions, followed by a comparison of the sensitivity tests. The conclusions provide insight into the most influential locations identified and discuss courses of action to address the optimal network design parameters highlighted in the study.

The Bayesian synthesis inversion method, first proposed by

The linear relationship used to model the relationship between

The vector of the modelled concentrations

As described in Part 1, for the network design approach we are only
interested in the posterior covariance matrix of the fluxes, since our aim is
to obtain a network that reduces the

where

To determine which sources a measurement site sees at a given moment, and
how much of each, the sensitivity matrix

The LPDM is driven by the three-dimensional fields of mean horizontal winds (

In the simulations performed here CCAM is applied in stretched-grid mode by
utilising the

During processing of the particle count data from the LPDM, particles that were
near the surface were allocated to a surface grid cell and the total count
within each of these was obtained to determine the surface influence or
sensitivity. These counts depended on the dimensions and position of these
surface grid boxes. The particle counts were used to calculate the
source–receptor (
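The particle-counting step described above can be sketched as follows (the array names, the 50 m surface-layer threshold from the standard design, and the use of a plain 2-D histogram are illustrative assumptions):

```python
import numpy as np

def surface_influence(lon, lat, height, lon_edges, lat_edges,
                      surf_height=50.0):
    """Count LPDM particles per surface grid cell.

    Particles below the surface-layer height (50 m in the standard
    design) are allocated to the surface grid cell containing their
    horizontal position; the per-cell totals give the surface
    sensitivity, which depends on the dimensions and position of the
    surface grid boxes."""
    near_surface = height < surf_height
    counts, _, _ = np.histogram2d(lon[near_surface], lat[near_surface],
                                  bins=[lon_edges, lat_edges])
    return counts
```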

For the network design we are interested in weekly fluxes of carbon separated
into day- and night-time contributions, which means that we have to provide
the particle count

For the standard network design, the surface layer height is set to 50 m,
which corresponds to approximately 595 Pa (

As for most inversion studies, a compromise needs to be reached between the
dimensions imposed on the source regions and the computational resources
available

Observation errors result in the values of

Since both studies were conducted for regions in the Southern Hemisphere,
where intra-station measurement variability is usually lower compared to the
Northern Hemisphere, we adopted the same observation errors as for the
standard case in Part 1 of 2

The high-resolution test case discussed above allows the opportunity to
assess the aggregation error as well. This is the error due to the
degradation of the spatial resolution from the original resolution of the
transport model to a lower resolution that the inversion can accommodate.
When the surface fluxes are heterogeneous and the transport is
inhomogeneous, averaging the surface fluxes to a coarser resolution leads
to errors in the modelled concentrations, because the measurements do not
represent the larger pixels over which the transport is modelled
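A toy illustration of this aggregation error, assuming equal-area grid cells and a simple block mapping from fine to coarse cells (names and shapes are illustrative):

```python
import numpy as np

def aggregation_error(H_fine, f_fine, blocks):
    """Difference between concentrations modelled with fine-resolution
    fluxes and with fluxes averaged onto coarser cells.

    `blocks[j]` gives the coarse cell that fine cell j belongs to;
    equal cell areas are assumed for simplicity."""
    c_fine = H_fine @ f_fine
    c_coarse = np.zeros(H_fine.shape[0])
    for k in range(blocks.max() + 1):
        m = blocks == k
        # the coarse cell spreads its mean flux uniformly over its members
        c_coarse += H_fine[:, m].sum(axis=1) * f_fine[m].mean()
    return c_fine - c_coarse
```

Note that the error vanishes when either the fluxes or the sensitivities are homogeneous within each coarse cell, which is why it grows with flux heterogeneity and inhomogeneous transport.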

The elements of the prior flux error covariance matrix need to be constructed
from the best available knowledge of sources and sinks at the surface and at
the boundaries.

The approach of

The daytime net primary productivity (NPP) and night-time
autotrophic respiration (Ra) data used as standard deviations of net
ecosystem productivity (NEP) at the resolution of 1.2

Since the domain of the network design includes the fossil fuel sources of
South Africa, fossil fuel flux uncertainties needed to be derived as well.
Previous regional inversions, where the total flux of a source pixel was
solved for, had detailed inventory data available for the fossil fuel
emissions, and they assumed these were perfectly known

The standard deviations of 10 realisations (top) of the Fossil Fuel
Data Assimilation System (FFDAS) at the original 0.1

For the standard network design, the prior flux error covariance matrix is
estimated as a diagonal matrix, where the diagonal elements are the sum of
the variances of the biospheric fluxes and the fossil fuel fluxes for that
grid cell. The biospheric flux uncertainties were multiplied by the fraction
of the grid cell covered by land, separately for day and night. By
multiplying by the land fractions, we guarantee that the prior uncertainties
for coastal grid cells are scaled accordingly and that ocean-only grid cells
are set to 0, since the NEP and fossil fuel products apply only to the land
surface. We assumed no correlation in the prior error covariance matrix of
the fluxes. This is a necessary assumption since we have no data from which
to determine the best correlation lengths. In reality, grid cells with
similar biota and under similar climate will have correlated fluxes.
Similarly, fluxes from the same source which occur close in time will also be
correlated
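The diagonal construction described above can be sketched as follows (variable names are illustrative; the inputs are per grid cell, separately for the day and night periods):

```python
import numpy as np

def prior_covariance_diag(sigma_nep, sigma_ff, land_frac):
    """Diagonal prior flux error covariance matrix.

    Per grid cell, the variance is the sum of the biospheric (NEP) and
    fossil fuel variances, with the NEP standard deviation scaled by the
    land fraction so that coastal cells are reduced accordingly and
    ocean-only cells (land_frac = 0) receive zero biospheric
    uncertainty."""
    variances = ((np.asarray(sigma_nep) * np.asarray(land_frac)) ** 2
                 + np.asarray(sigma_ff) ** 2)
    return np.diag(variances)
```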

In the network design under the standard case, we kept the uncertainties of
the ocean-only grid cells set to 0, since our focus is on reducing the
flux uncertainty over land. This is not to say that we assume the ocean
fluxes are known perfectly, but for the purposes of this optimal network
design we would prefer that the terrestrial measurements focus on solving for the
terrestrial fluxes. Of course, to run a full inversion, knowledge is needed
about the ocean fluxes, and this would be obtained through ocean-based
measurements. The contributions from the ocean can be divided into the
“near field” and “far field”. The far-field contributions are contained
within the boundary contributions. The near-field contributions are those
within our domain. A sensitivity test was conducted whereby 10 % of the
maximum land NEP standard deviation was allocated to the ocean grid cells.
This value represents the uncertainty in the ocean productivity models
that would be used to obtain prior estimates of ocean fluxes during an
inversion, and is similar to the values allocated by

Three optimisation routines have been used for optimal network design in the
literature, namely IO

During the IO procedure we added one station at a time from the candidate
list to our base network of two stations and calculated
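The greedy loop just described can be sketched as follows (the `cost` callable stands in for the posterior-uncertainty metric computed from the inversion; names are illustrative):

```python
def incremental_optimisation(candidates, base, cost, n_add=5):
    """Greedy network growth: at each step, add the candidate station
    whose inclusion minimises the network cost (here, the posterior
    uncertainty metric), then repeat with the enlarged network."""
    network = list(base)
    pool = list(candidates)
    for _ in range(n_add):
        best = min(pool, key=lambda s: cost(network + [s]))
        network.append(best)
        pool.remove(best)
    return network
```

A useful by-product is the ordering itself: the first station added is the single most valuable one, which is the ranking reported in the results tables.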

The overall uncertainty in fluxes can be expressed by two different metrics

As a sensitivity test, the

We evaluated the different networks in terms of their uncertainty reduction:
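A common form of this metric, assumed here for illustration, is the percentage by which a network's uncertainty falls below that of the base network:

```python
def uncertainty_reduction(u_base, u_network):
    """Percentage reduction of the uncertainty metric achieved by a
    candidate network relative to the base network (a sketch of the
    standard network-design form; the exact expression used in the
    study is given in the text)."""
    return 100.0 * (1.0 - u_network / u_base)
```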

Although IO is expected to be more computationally efficient, optimisation through the GA is also well suited to this kind of problem, considering that the network design for South Africa starts with so few existing stations. The GA begins with each solution in the population containing five stations, so all five stations are optimised simultaneously rather than sequentially. This method may therefore be better suited to the case of multiple deployments, as here: under these circumstances, the best five-station network for reducing the overall uncertainty for South Africa may not include the one station which on its own reduces the uncertainty the most. The GA is highly parallel, so we can take advantage of high-performance computing, but its running time remains higher than that of IO.

The approach used to run the GA during the sensitivity analyses is adopted
from

In this implementation of the GA, elitism is maintained by keeping the best
solution from the previous population, without making any changes through
crossover or mutation on this member. The algorithm converges once a given
number of iterations is reached, or once a convergence criterion is met. The
solution with the best fitness criterion is then selected from this
population, where the fitness F is calculated as

The GA requires a trade-off between the diversity in the solutions, ensuring
that the algorithm does not get stuck in local extrema, and strong selection
to ensure that the population moves towards the optimum solution. This is
achieved by adjusting the mutation rate – high enough to produce diversity in
the solutions but low enough to ensure that members with high fitness
persist and so ensure a tendency towards the optimum solution. From previous
work

The population size and number of iterations affect the computation time of the algorithm. A large population size is favourable because this ensures diversity in the solutions. The more iterations that take place, the more solutions the algorithm can assess and the better the chance of finding the global minimum. High values for both of these parameters result in long computation times. In this study the number of iterations was set at 100 for a single-month optimisation, and to 150 for a combined month optimisation. These values were determined from GA trials carried out on the data prior to deriving the results for this study.
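The GA described above, with elitism, tournament-style selection, crossover over the parents' stations and a swap mutation, can be sketched as follows (operator details such as the tournament size are illustrative assumptions, not the exact implementation used in this study):

```python
import random

def genetic_algorithm(candidates, n_stations, fitness, pop_size=50,
                      n_iter=100, mutation_rate=0.1, seed=0):
    """Evolve populations of candidate networks (sets of stations) and
    return the member with the best fitness."""
    rng = random.Random(seed)
    pop = [rng.sample(candidates, n_stations) for _ in range(pop_size)]
    for _ in range(n_iter):
        ranked = sorted(pop, key=fitness, reverse=True)
        # elitism: the best solution survives without crossover or mutation
        new_pop = [ranked[0][:]]
        while len(new_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # crossover: draw the child from the union of the parents' stations
            pool = sorted(set(p1) | set(p2))
            child = rng.sample(pool, min(n_stations, len(pool)))
            while len(child) < n_stations:  # top up if the parents overlapped
                extra = rng.choice(candidates)
                if extra not in child:
                    child.append(extra)
            # mutation: swap one station for an unused candidate
            unused = [c for c in candidates if c not in child]
            if unused and rng.random() < mutation_rate:
                child[rng.randrange(n_stations)] = rng.choice(unused)
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)
```

The mutation rate controls the diversity/selection trade-off discussed above, and elitism guarantees that the best fitness in the population never decreases between iterations.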

Hypothetical stations were selected from a regular grid over South Africa,
resulting in 36 equally spaced locations (Fig.

Since the surface sources are expressed as fluxes in carbon, the contribution
to the concentration at the measurement site is expressed in the amount of
carbon seen at the measurement site from a particular source. In the case of
the boundary sources (or contributions from outside of the domain) which are
given as concentrations, their contributions to the concentration at the
measurement site are expressed as a proportion of their concentration,
dependent on their influence at the receptor site. Part 1

For the network design, four boundaries (north, south, east and west) were
used, and we calculated the sensitivity of hourly observed concentrations to
weekly boundary concentrations. To determine if the influence of the boundary
concentrations on the observation errors should be included in the network
design, we needed to know whether the uncertainties contributed by the
boundary concentrations were significant compared to other contributions. To
see this we calculated

The 36 potential locations of the new stations in the optimal network design. The locations were spaced on a regular grid over the surface of South Africa. The existing Cape Point and the Gobabeb GAW stations are marked by the triangles.

To compare the utility of the optimal networks from each algorithm run, the
uncertainty reduction was assessed for each of these networks. The similarity
of the networks in terms of the station locations was assessed using a test
statistic from the chi-squared complete spatial randomness test, measuring
the degree of clustering, where the expected value is based on the null
hypothesis that the stations are located randomly over the domain. The
intention was not to perform a statistical test based on the chi-squared
distribution, since the network neither constituted a sample nor contained
enough stations, but to calculate an indicator of the degree of clustering
of the measurement stations in a particular network solution. This
indicator, referred to as the clustering index, was also used to compare
networks.
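The clustering index can be sketched as a quadrat-count chi-squared statistic (the quadrat layout and function signature are illustrative assumptions):

```python
import numpy as np

def clustering_index(x, y, domain, nx=3, ny=3):
    """Chi-squared statistic of the complete spatial randomness test.

    The domain is divided into nx-by-ny quadrats; under the null
    hypothesis of randomly located stations, each quadrat expects an
    equal share of the stations.  Larger values indicate stronger
    clustering."""
    x0, x1, y0, y1 = domain
    counts, _, _ = np.histogram2d(x, y, bins=[nx, ny],
                                  range=[[x0, x1], [y0, y1]])
    expected = len(x) / (nx * ny)
    return ((counts - expected) ** 2 / expected).sum()
```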

A dissimilarity index (DI) was calculated as the sum of the distance to the
nearest neighbour in the compared network, over all the members in the pair
of assessed networks.
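A sketch of the DI for two networks given as station coordinates (Euclidean distance is assumed for illustration; great-circle distance would be the natural choice for real coordinates):

```python
import numpy as np

def dissimilarity_index(net_a, net_b):
    """Sum, over the stations of both networks, of the distance to the
    nearest station in the other network; 0 for identical networks."""
    a = np.asarray(net_a, dtype=float)
    b = np.asarray(net_b, dtype=float)
    # pairwise distance matrix between the two networks
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).sum() + d.min(axis=0).sum()
```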

The particle counts generated during the LPDM runs for each station were
summed over the month in order to obtain a footprint of each station. To
illustrate this, plots of the influence footprint in January (Fig.

Using the influence functions now available for each station, we tested
the influence of the boundaries on the observation errors.
Given the large domain over which the LPDM was run, it was not surprising that
the boundaries had minimal influence. Overall, the square root of the maximum
diagonal element of

The footprint of Cape Point, station 28 (top right), station 18
(bottom left) and station 4 (bottom right) relative to the surface grid
cells at a resolution of 1.2

Aggregation errors were found to be a significant contributor to the overall
observation error covariance matrix. Aggregation errors as high as
17.10 ppm were found for measurement sites in the north-eastern interior, and as
low as 0.01 ppm for stations in the south-western interior (Fig.

Map of the aggregation error values (ppm) associated with each measurement station for the month of January.

When running the LPDM to generate the sensitivity matrix, it is imperative to specify a sufficient number of particles per release, and to run the model for at least as long as required, with additional time at the beginning of the run. This avoids transport errors and avoids exaggerating the aggregation errors. The aggregation errors were therefore calculated using the last week of the 4-week sensitivity matrix.

The next sections present the results of the optimal network design, first
under the basic parametrisations as used in Part 1

The network solution for July was able to achieve a reduction in uncertainty
in the total South African flux from 6.42 gC m

Map of the optimal stations to add to the existing network to reduce
the overall uncertainty of fluxes in South Africa for July, January and the
combined months of July and January. The standard network design conditions
are 50 m surface grid height, diagonal prior covariance, 2 ppm uncertainty
in concentration observations, a 1.2

Ranking of the new stations added to the base network for two seasons (winter and summer) represented by July and January, as well as the integrated 2 months. The cumulative reduction of uncertainty relative to the base uncertainty is provided in brackets.

In January the total flux uncertainty was much higher compared to July, with
a total flux uncertainty of 128 gC m

Map of the optimal stations to add to the existing network to reduce
the overall uncertainty of fluxes in South Africa under the 11 different
sensitivity cases for July (top), January (middle) and the combined months
of July and January (bottom). The cases include the standard case (Standard),
surface grid height set at 60 m (Ht 60 m), surface grid height set at 75 m
(Ht 75 m), use of the trace in the uncertainty metric (Trace), doubling of
the night-time observation error uncertainty (Night), addition of correlation
between elements in the prior covariance matrix (Correl), spatial resolution
set at 0.8

The total flux uncertainty under the base network for the combined months of
January and July was calculated to be 128.43 gC m

The daytime net primary productivity (NPP) data used as standard
deviations of net ecosystem productivity (NEP) at the resolution of
0.8

The results for the sensitivity analyses run for both months individually as well as the
combined months of January and July appear in Fig.

The results from the sensitivity tests for January show a great deal more variability between network solutions than July, with DI values greater than 0 for almost all network solution comparisons. As in July, the network solutions do appear to converge towards three stations, although not the same three as in July. Under January's conditions, only the homogeneous ocean variance test case resulted in an identical solution to the standard case. There is no single station which all network solutions contained. Station 29 (north-eastern interior) and station 12 (eastern coast) were agreed on by 10 out of 11 tests, and stations 27 (northern interior) and 11 (south-eastern interior) by 9 out of 11 tests. These four stations are influenced by areas of large fossil fuel flux uncertainty, and stations 29 and 12 lie near regions of large summer NEP uncertainty. Sensitivity tests with DI values below 1000 when compared to the standard case include the tests considering surface grid box height, doubling of the night-time observation error uncertainty, variable ocean flux uncertainty, the trace uncertainty metric and the GA test case. These five test cases show strong agreement. The trace uncertainty metric case favoured a station near the central interior. This station was also included in the solutions of the correlation and medium-resolution cases, which obtained DI values of 1225 and 1305 respectively when compared to the standard solution. These tests, as well as the GA and high-resolution test cases, included stations near the south coast, near areas of localised fossil fuel uncertainties.

The sensitivity tests from the combined months showed less variability
between solutions compared to January (Fig.

Ranking of the new stations added to the base network under 10
different sensitivity tests for the combined months of July and January. The
tests are presented in the following order: surface grid height set at 60 m;
surface grid height set at 75 m; trace of the posterior covariance used in
the uncertainty metric; uncertainty of the night-time observation errors is
doubled; correlation structure is included in the prior covariance of the
fluxes; spatial resolution is increased to 0.8

Table of network comparison statistics for the combined months of
January and July. The sensitivity tests are presented in the same order as
for Table

Table of dissimilarity indices for the optimal network solutions for
the combined months of January and July. The sensitivity tests are presented
in the same order as for Table

The statistics for the different sensitivity tests for the combined months
(Table

Most network solutions tended towards the same amount of clustering of stations, obtaining a clustering index of 23.8. The GA and test case considering correlation had more dispersed networks, and the high-resolution test case had the highest amount of clustering, with a clustering index of 36.6. We would expect the correlation case to spread stations since a given station will reduce uncertainty everywhere within one correlation length. The GA for the combined months took the longest to run, at over 32 h, which is 39 times longer than the running time of the standard IO solution. This was followed by the high-resolution solution, which took 25.2 h, and the two ocean flux uncertainty test cases, which took over 5 h each.

Under a reference set of conditions, an optimal network design was obtained for South Africa for two representative months of the year. The resulting designs reduced the uncertainty of carbon fluxes from South Africa compared to the base network by 43 % in July and 78 % in January. These relatively large reductions in uncertainty are due to the lack of coverage by the current network, which only reduces the uncertainty of fluxes from South Africa by 16 % for both July and January. The concentration of stations by all networks tended towards the central interior, near the North West Province of South Africa and in the eastern parts of the country. These represent the areas with the largest uncertainty in biospheric fluxes, as well as fossil fuel emissions, in the country.

Station 11 is located near the uKhahlamba Drakensberg World Heritage Site.
Several remote holiday destinations are found in this area, near the town of Mooi
River, and road infrastructure is available. Potentially, facilities at or
near these holiday destinations could be utilised in order to conduct
atmospheric measurements, particularly if there is a communications tower
available. Station 18 is located near the peak of Ben Macdhui. This is near
the site of a 1996 atmospheric monitoring campaign, which assessed the
ability of transport models to resolve recirculation over and exiting South
Africa to the Indian Ocean

The sensitivity analysis demonstrated that, for most of the network design parameters considered in this study, the stations found to be most important under the standard network design were consistently identified in the modified network solutions. Many of the choices required for the optimal network design – such as the height of the surface grid cells, whether to inflate night-time observation error uncertainties relative to the daytime, and the inclusion of ocean flux uncertainty – had a negligible impact on the final network design. Substituting the trace for the sum of the covariance elements also resulted in similar solutions.

The test cases considering higher spatial resolution tended to result in
network solutions different from the standard case, largely due to the
increase in spatial heterogeneity in prior flux uncertainties compared to the
coarser resolution. The spatial resolution of an inversion study impacts
network design in several ways. It is the main determinant of the amount of
aggregation error attributed to a measurement site, with aggregation error
reducing as the resolution increases. As the spatial resolution is degraded,
aggregation errors can become large, leading to the exclusion of sites in the
case of an optimal network design, even if they are in view of regions of
large flux uncertainty. The spatial resolution of the sources also determines
the dimensions of the sensitivity matrix and prior flux covariance matrix,
which impacts on the computational resources required to run an inversion or
network optimisation. Ideally, the highest manageable resolution should be
used, as close as possible to the resolution of the transport model and
original spatial products used for obtaining the prior fluxes and their
covariances. Alternative approaches, such as the use of multi-scale
representation of the source region, can be used to mitigate aggregation
errors as well

The GA was able to find marginally better solutions than the IO method, if run with sufficient population size and number of iterations, but in general did include the most influential stations from the IO solution. The increase in uncertainty reduction was found to be marginal but cost a great deal more in running time before this solution was found. If the resolution of the standard case had been higher, the GA would have taken longer to run, and the current computing system may have had insufficient memory. Moreover, to find a better solution than the IO, the iterations and population size would have had to be set even higher, due to the greater heterogeneity in the prior flux uncertainties in a higher-resolution setup, further increasing the computational costs. An additional advantage of the IO method over the GA method is that an evolution of results is generated, which is useful for practical purposes. By identifying the station which on its own best reduces the uncertainty in the posterior fluxes, it gives the decision makers the location of the site which should be prioritised over others in the network.

Even though we accounted for aggregation error, which would have corrected
the total flux estimate for the domain, there were still large differences
between the total flux uncertainties from the inversion results under
different spatial resolutions. This was due to the treatment of the prior
uncertainties under the different spatial resolutions. Degrading the spatial
resolution results in a loss of information; therefore it is best to run the
inversion at as high a resolution as possible. Favouring optimisation
techniques like IO, which can more easily accommodate high spatial
resolution, over those which could force a reduction in resolution due to
high computational demands, such as the GA, may be unavoidable. Techniques
like simulated annealing and the GA do not guarantee the global optimum, as
demonstrated by

Of the sensitivity tests, including correlation had one of the largest
impacts on the final network result, often differing significantly from the
standard solution. The correlation structure used in this study was generic,
simply assuming that fluxes from nearby grid cells and fluxes at the same
location near in time would be correlated, included for the purpose of
assessing the impact of correlation in the prior fluxes. For a network to be
based on a prior covariance matrix including correlation, there would need to
be confidence that this correlation structure and size of correlations
between fluxes were accurate. This is generally not the case, and the
correlation structure is easier to assess when concentration measurements
are available, which is why many network designs have assumed independence
between prior fluxes

Overall, the results suggest that a substantial improvement in knowledge of South African fluxes is achievable from a feasible atmospheric network and that the general features of this network are robust to various parameterisations of the transport model, prior information and optimisation routine.

Peter Rayner is in receipt of an Australian Professorial Fellowship (DP1096309). This work was supported by parliamentary grant funding from the Council for Scientific and Industrial Research. The authors would like to thank Thomas Lauvaux for his helpful commentary on the implementation and post-processing of the LPDM. Edited by: C. Gerbig