Towards monitoring the CO 2 source–sink distribution over India via inverse modelling: quantifying the ﬁne-scale spatiotemporal variability in the atmospheric CO 2 mole fraction

. Improving the estimates of CO 2 sources and sinks over India through inverse methods calls for a comprehensive atmospheric monitoring system involving atmospheric transport models that make a realistic accounting of atmospheric CO 2 variability along with a good coverage of ground-based monitoring stations. This study investigates the importance of representing ﬁne-scale variability in atmospheric CO 2 in models for the optimal use of observations through inverse modelling. The unresolved variability in atmospheric CO 2 in coarse models is quantiﬁed by using WRF-Chem (Weather Research and Forecasting model coupled with Chemistry) simulations at a spatial resolution of 10 km × 10 km. We show that the representation errors due to unresolved variability in the coarse model with a horizontal resolution of 1 ◦ ( ∼ 100 km) are considerable (median values of 1.5 and 0.4 ppm, parts per million, for the surface and column CO 2 , respectively) compared to the measurement errors. The monthly averaged surface representation error reaches up to ∼ 5 ppm, which is even comparable to half of the magnitude of the seasonal variability or concentration enhancement due to hotspot emissions. Representation error shows a strong dependence on multiple factors such as time of the day, season, terrain heterogeneity, and changes in meteorology and surface ﬂuxes. By employing a ﬁrst-order inverse modelling scheme using pseudo-observations from nine tall-tower sites over India, we show that the net ecosystem exchange (NEE) ﬂux uncertainty solely due to unresolved variability is in the range of 3.1 % to 10.3 % of the total NEE of the region. By estimating the representation error and its impact on ﬂux estimations during different seasons, we emphasize the need to take account of ﬁne-scale CO 2 variability in models over the Indian subcontinent to better understand processes regulating CO 2 sources and sinks. The efﬁcacy of a simple parameterization scheme is further demonstrated to capture these unresolved variations in coarse models.


Introduction
Accurate assessment of the sources and sinks of CO 2 is essential in planning and implementing mitigation strategies for greenhouse gas emissions and associated climate change.However, estimations of CO 2 fluxes contain significant uncertainties, which increase even more with finer spatial scales such as those required for the climate change mitigation policies at regional and national levels (e.g.Ciais et al., 2014;Li et al., 2016;Cervarich et al., 2016).By using atmospheric CO 2 concentration measurements, the CO 2 fluxes can be estimated by a multi-constrained observation-modelling approach, often referred to as a top-down approach or inverse modelling (Enting, 2002).For about 2 decades, these top-down approaches have been widely used to understand the modifications in the carbon cycle through natural and anthropogenic-induced environmental changes (Bousquet et al., 2000;Schimel et al., 2001;Rödenbeck et al., 2003;Patra et al., 2005).In addition to the observations, the inverse modelling system makes use of an atmospheric transport model (forward model), which determines the distribution of CO 2 concentration.Thereby, the inverse optimization approach derives the surface fluxes that are consistent with the measured concentration.The United Nations Framework Convention on Climate Change (UNFCCC) has acknowledged the increasing capability of inverse modelling to systematically monitor greenhouse gas (GHG) concentrations (Bergamaschi et al., 2018).
Most of the inverse modelling systems rely on global atmospheric transport models with a coarse horizontal resolution (often greater than 1 • ) (Rödenbeck et al., 2003;Peters et al., 2007;Rödenbeck et al., 2018a, b;Inness et al., 2019).These global data assimilation systems play an important role in the study of continental or subcontinental fluxes at annual or sub-annual scales.However, the regional estimation of fluxes using global models is hindered by the inability of these transport models to represent the observed CO 2 variability.The observed variability, as seen from the spatial and temporal distribution of atmospheric CO 2 , is highly correlated with the space and timescales of weather systems (Parazoo et al., 2011).This explains the presence of large modeldata mismatches in regions where mesoscale circulation is predominant (Ahmadov et al., 2007).Wind speed, wind direction, and the height of the planetary boundary layer (PBL) are the critical variables that determine the atmospheric CO 2 variability.Strong wind normalizes other small-scale variations in observed concentration due to mixing.In such cases, the CO 2 variability is expected to be dominated by the variations in background concentration; hence, the predictability can be higher during these conditions (Sarrat et al., 2007).The height of the PBL is an essential variable since the atmospheric CO 2 is subjected to rapid mixing up to this altitude.Hence, for a given location with a negative gradient in CO 2 vertical distribution, an overestimation of the PBL height leads to an underestimation of CO 2 concentration, and vice versa (Gerbig et al., 2008).
Another important variable that impacts the CO 2 variability is the heterogeneous topography.Variations in topography influence the transport of the tracers.When the smallscale orographic details are not adequately represented in the models, they can lead to representation errors in CO 2 simulations as large as 3 ppm (parts per million) at scales of 100 km (Tolk et al., 2008;Pillai et al., 2010).Horizontal gradients in CO 2 concentrations can go up to values of 30 ppm within a spatial scale of 200 km, depending on the land surface heterogeneity (van der Molen and Dolman, 2007).Furthermore, variations in land use patterns between neighbouring regions can cause considerable variability in the CO 2 surface fluxes.Thus, a proper representation of land use patterns is also important in terms of simulating CO 2 variability.Previous studies based on airborne measurements reported that transport models need a spatial resolution smaller than 30 km to be able to represent CO 2 spatial variability in the continental boundary layer (Gerbig et al., 2003).Significant efforts have been invested in deriving fluxes by taking into account these fine-scale variations (e.g.Gerbig et al., 2003;Lauvaux et al., 2009a;Carouge et al., 2010;Pillai et al., 2011Pillai et al., , 2012;;Broquet et al., 2013) over North American and Eurasian domains in the past decade.However, there still exists lower confidence in estimates over other regions where there is a lack of both advanced modelling systems at relevant spatiotemporal resolutions and good coverage of ground-based monitoring stations.
In the context of the Indian subcontinent, the inverse-based estimation of fluxes at fine scales is essentially new; hence, many questions remain.A number of monitoring sites measuring atmospheric greenhouse gases have become available in India during the last decade (Tiwari et al., 2011;Lin et al., 2015;Nomura et al., 2021).Aside from the ongoing progress in augmenting observational data streams, it remains challenging to assimilate these data for deducing process-specific information effectively (e.g.McKain et al., 2012;Bréon et al., 2015;Pillai et al., 2016).The limitation of coarse global models in representing observations over the Indian subcontinent is reflected in the analysis made by Patra et al. (2011).
The seasonally reversing South Asian monsoon system is a prominent meteorological phenomenon affecting the Indian subcontinent, which is also expected to influence the terrestrial-atmosphere flux exchanges.Various studies have demonstrated the role of Indian monsoon circulations on regional atmospheric transport by strong southwesterly winds during the summer monsoon (June to September) and by northeasterly winds during the winter monsoon (October to November; e.g.Goswami and Xavier, 2005;Krishnamurthy and Shukla, 2007).Monsoon convection transports the boundary layer air into the upper troposphere.Subsequently, air parcels are slowly uplifted by diabatic heating to higher altitudes (e.g.Vogel et al., 2019).An accurate representation of convective vertical transport is very chal-lenging and an important source of uncertainty in current transport models (Willetts et al., 2017).Note that the Asian summer monsoon anticyclone (ASMA) active during the Indian summer monsoon period plays a key role in uplifting trace gases to the upper troposphere and lower stratosphere (e.g. Park et al., 2007).Moreover, a significant component of flux variations can arise from biospheric fluxes (Schimel et al., 2014), which is influenced by variables such as rainfall, radiation, and temperature (Chen et al., 2019).Several studies showed that the monsoon system substantially impacts vegetation growth, generating distinct spatiotemporal patterns of the biogenic fluxes (e.g.Gadgil, 2003;Valsala and Maksyutov, 2013;Ravi Kumar et al., 2016;Kunchala et al., 2022).It is noteworthy that the cropping patterns over India have a strong dependence on seasons and are mainly determined by dry and wet seasons for nearly 65 % to 70 % of the country's area, except over the northeastern and southwestern (Western Ghats) regions of India.In India, wet-season crops (kharif crops cultivated from June to November), including rice, millets, and maize, mainly depend on monsoon rain.Dry-season crops (rabi crops, e.g.wheat, barley, and mustard, cultivated from November to April) are less waterdependent and primarily rely on irrigation (DAC/MA, 2015).Therefore, employing higher-resolution modelling over the Indian subcontinent is desirable to better account for finescale variations generated by both mesoscale transport processes and surface flux patterns.
This study focuses on accounting for unresolved sub-gridscale variability when employing current generation global models.The assimilation of observations in an inverse framework requires the characterization of these error structures at relevant scales that can be utilized to retrieve the source-sink distribution over India.The main objectives of this paper are to describe and quantify the expected spatiotemporal variability in atmospheric CO 2 that is not resolved by the current generation global models, quantify to what extent these variations cause uncertainty in flux estimations, and assess how these uncertainties can be minimized by modelling the subgrid variations in the global models.Specifically, we address the following questions: (1) how good is the level of agreement among global transport models that are used in current generation inversion systems for predicting atmospheric CO 2 concentrations over the Indian subcontinent?(2) How large are the variations in atmospheric CO 2 that are unresolved by global and regional models, which operate at different spatial scales from 4 • × 4 • to 0.5 • × 0.5 • ?(3) What is the role of seasonal changes in generating different patterns in these sub-grid variations in CO 2 ?(4) How much is the uncertainty in the inverse-based flux estimation caused by these unresolved variations in the coarse models when utilizing a given network of surface observations over the domain?(5) How effectively can we capture the key aspects of the variability and account for it in flux estimations?Information from observations can be better utilized if we improve the atmospheric transport models to resolve the observed variabil-ity as accurately as possible.As a result, the data assimilation system gains significantly (e.g. with increasing weights on observations and performing minimal data filtering) from this for improving the flux estimates.
In this article, we present results based on the analyses of high-resolution simulations at a spatial resolution of 10 km × 10 km for the months of July and November 2017.The year 2017 was characterized by neutral Indian Ocean Dipole conditions over the Indian Ocean with the beginning of a mild La Niña over the Pacific by the end of the year (NOAA/ESRL, 2022a, b).The month of July represents a monsoon period when both biospheric and convective activity are strong.July is also characterized by strong lowpressure system activity over the Bay of Bengal, which results in large rainfall over central India (Krishnamurthy and Ajayamohan, 2010).In contrast, the month of November is more representative of post-monsoon wintertime over the Indian subcontinent.We quantify the expected sub-grid variability using our high-resolution simulations.We have also utilized optimized CO 2 products at global scales to provide a more comprehensive overview of the typical mismatch among the existing model simulations over the Indian subcontinent even at monthly and annual scales.These intermodel mismatches arise due to various reasons such as differences in input datasets (e.g.prior fluxes), transport, and model configuration.Part of these mismatches can also arise due to the inability of coarse-resolution global models to simulate the sub-grid-scale processes which can lead to representation errors; thus, there is uncertainty in inverse estimations.By designing a pseudo-surface observation network over the domain, we investigate the impact of these unresolved variations on the regional flux estimations and assess how a simple parameterization scheme can help to reduce these errors in the global model.To our knowledge, until now, there is no comprehensive published study of this kind over the Indian subcontinent assessing the magnitude and impact of temporal and spatial variability exhibited by atmospheric CO 2 .
The outline of the paper (see Fig. S1 in the Supplement) is as follows: Sect. 2 describes our modelling system, data, and the methods used for reporting intermodel mismatches and quantifying the sub-grid-scale variability CO 2 .In Sect.3, we present our analysis, demonstrating the impact of unresolved sub-grid-scale variability on CO 2 flux estimation over India.Finally, we discuss the implications of our findings in Sect.4, highlighting possible ways forward to yield an improved estimation of CO 2 budgets over India.

WRF-Chem-GHG modelling system
We use the modelling system WRF-Chem-GHG in which the Weather Research and Forecasting model (coupled with Chemistry) version 3.9.1.1 (Skamarock et al., 2008) is coupled with the greenhouse gas module (WRF-Chem-GHG; Beck et al., 2011) that is implemented as part of the WRF-Chem distribution (WRF-Chem; Grell et al., 2005).For simulating the atmospheric transport, the model uses fully compressible Eulerian non-hydrostatic equations on an Arakawa C staggered grid, conserving mass, momentum, and scalars (Skamarock et al., 2008).In the WRF-Chem-GHG (hereafter referred to as WRF-GHG), we use the passive tracer chemistry option to simulate changes in CO 2 mixing ratios associated with surface fluxes and atmospheric transport.We utilize a biospheric model and emission inventory data to simulate atmospheric CO 2 enhancements associated with biogenic and emission fluxes, as described in Sect.2.1.1 and 2.1.2.Table 1 summarizes the model configuration, including physics parameterizations and input data used in this study.
The model domain covers a region spanning from 65 to 100 • E and 5 to 40 • N, configured in a Lambert conformal conic (LCC) projection with 307 × 407 grid points.The spatial resolution of the grid is 10 km × 10 km, and the model time step is 60 s.We have used a model output with a temporal resolution of 1 h for this study.The simulations are performed using 39 vertical levels, with the model top at 50 hPa and 10 levels within the lowest 2 km.WRF-GHG simulations are performed for the entire months of July and November 2017.Implementation of the WRF-GHG system over the Indian subcontinent enables us to customize it according to the domain features and build a state-of-the-art modelling system, which eventually estimates CO 2 fluxes through regional inverse systems.The potential of the WRF-GHG model in simulating the fine-scale spatial variability was also established in previous studies (Ahmadov et al., 2009;Pillai et al., 2011;Park et al., 2018).

Representation of biospheric fluxes
We use the Vegetation Photosynthesis and Respiration Model (VPRM) in the modelling system to calculate net ecosystem exchange (NEE) representing the biospheric fluxes (Mahadevan et al., 2008).VPRM is a diagnostic biosphere model, which utilizes the following remote sensing products: Enhanced Vegetation Index (EVI) and Land Surface Water Index (LSWI) derived from reflectance data of the Moderate Resolution Imaging Spectroradiometer (MODIS), in addition to meteorological data, namely solar radiation and air temperature.In this study, these hourly NEE calculations are performed within WRF-GHG and simultaneously with the meteorology simulations in which NEE is calculated as a sum of gross ecosystem exchange (GEE) and ecosystem respiration (R eco ).VPRM, in this case, uses the meteorological data provided by WRF-GHG.VPRM uses the SYNMAP vegetation classification (using the tile approach; Jung et al., 2006) together with EVI and LSWI from MODIS surface reflectance data at a resolution of 1 km and 8 d.We aggregate these indices specific to different vegetation types onto the LCC projection for the entire domain at the model's spatial resolution.A number of studies have used VPRM for other regions around the world in which derived NEE shows good prediction skills for hourly to monthly timescales (Ahmadov et al., 2009;Pillai et al., 2011;Liu et al., 2018;Park et al., 2018).

Representation of emission fluxes
Anthropogenic CO 2 emission fluxes are prescribed from the Emissions Database for Global Atmospheric Research (EDGAR) dataset, version 6.0, provided at a horizontal resolution of 0.1 Crippa et al., 2021).We disaggregate the available annual emission data into hourly emissions using the temporal distribution CO 2 profiles (Steinbach et al., 2011;Kretschmer et al., 2014).To represent biomass burning emission, we have used data from the Global Fire Assimilation System (GFAS) with a spatial resolution of 0.1 • × 0.1 • and a temporal resolution of 1 d.GFAS is based on satellite data, which provides the fire emissions by assimilating fire radiative power (FRP) observations from MODIS instruments (Kaiser et al., 2012).All these flux data are gridded and projected to WRF-GHG's model domain.

Initial and boundary conditions
Meteorological and chemical initial and boundary conditions are required in WRF-GHG to account for the initial state and inflow or background flow.The initial and lateral boundary conditions for the meteorological variables include horizontal wind components, pressure, specific humidity, and sea surface temperature (SST).These fields and other necessary surface initialization fields are obtained from the ERA5 reanalysis dataset of the European Centre for Medium-Range Weather Forecasts (ECMWF) and extracted at a horizontal resolution of 25 km and a temporal resolution of 1 h (Hersbach et al., 2020).The initial and lateral boundary conditions of CO 2 tracers are obtained from the Copernicus Atmosphere Monitoring Service (CAMS) global greenhouse gas forecast products (currently in development; see Massart et al., 2016;Agustí-Panareda et al., 2019).Namely, we have used the dry-air mole fractions of CO 2 from the CAMS greenhouse gas experiment analysis (gqiq), with a temporal resolution of 6 h, horizontal resolution of 0.5 • × 0.5 • (original resolution 9 km × 9 km), and 137 vertical levels.Note that the CAMS product at 9 km × 9 km resolution is in the developmental and not yet available to the general public (contact anna.agusti-panareda@ecmwf.int for more information).
We have utilized a simulation strategy to update the initial meteorological conditions to take advantage of assimilated meteorological fields from ECMWF.The model is reinitialized each day with ECMWF assimilated data at the model starting time of 12:00 UTC (day + 0) and runs for 30 h until 18:00 UTC of the next day (day + 1).The first 6 h are considered to be the meteorological spin-up, and the remaining 24 h (from day + 0, 18:00 UTC to day + 1, 18:00 UTC) are used for the analysis.The initialization of CO 2 is done at the beginning of the first hour of the model simulation, which is at 00:00 UTC (e.g.Ahmadov et al., 2012;Pillai et al., 2011).

Global model products
We have used optimized products at global scales to examine the differences in the representation of CO 2 variability over the Indian subcontinent at monthly and annual scales.These global model outputs are derived from inverse model simulations, which estimate the source-sink distributions of CO 2 and then generate three-dimensional CO 2 concentration fields that are consistent with the optimized posterior fluxes.Four global inverse modelling products -Carbon-Tracker, CarboScope, LSCE v18r3, and LSCE FT18r1available for the year 2017 are used for our analysis (see Table 2 for more details).The Laboratoire des Sciences du Climat et l'Environnement (LSCE) model version v18r3 (hereafter LSCE) utilizes surface observations, and the model version FT18r1 (hereafter LSCE FT) uses satellite retrievals from the Orbiting Carbon Observatory (OCO-2) for the optimization of CO 2 fluxes (Chevallier et al., 2005(Chevallier et al., , 2010;;Chevallier, 2013).All of these above models differ in terms of the model formulations and configuration (e.g.transport and the employed inversion methodology), observational datasets that were assimilated (e.g.data from surface monitoring stations, aircraft missions, ship cruises, AirCore balloon soundings, and the satellite's total column retrievals), prior datasets, and spatiotemporal resolutions.None of these products used ground-based observations from the Indian subcontinent for their optimization, which raises concerns about the reliability of the optimized flux estimations over the region.Hence, it can be assumed that part of the intermodel differences in predicting the variability is related to the paucity of the CO 2 observations over the region.To represent the daytime, we have used the concentration fields for the local time (LT) ranging from 11:30 to 16:30 LT from all these models for the analysis.

Quantification of spatial variability
For quantifying the spatial variability due to sub-grid-scale processes that cannot be resolved by the coarse-resolution models, we follow the approach as described in Pillai et al. (2010).The term "representation error" indicates the mis-match between the scales of model simulations and observations collected (Pillai et al., 2010;Janjić et al., 2017).In other words, the representation errors arise due to unresolved scales not captured by the model.Here we calculate the representation errors in the coarse-resolution models, which can be resolved by implementing a high-resolution model at 10 km resolution.It is assumed that the high-resolution simulation captures the majority of the sub-grid-scale variability, even though it cannot be expected to resolve all observed variability.Most of the current global model simulations are performed at coarse resolutions of several degrees.But, with the recent advancement in computational capacity and numerical techniques, a horizontal resolution of 1 • × 1 • is quite likely achievable for the global data assimilation systems.For estimating the representation error in a coarse model with a typical spatial resolution of 1 • × 1 • , we have calculated the standard deviation of the CO 2 dry-air mole fraction simulated by the WRF-GHG model within the coarse grid boxes of 1 • × 1 • as follows: where m j .n is the number of 10 km boxes inside the coarser grid cell of 1 • × 1 • , m is the CO 2 dry-air mole fraction corresponding to 10 km boxes, and m is the average within the coarser grid cell.So, the estimated values represent the sub-grid-scale variability within the coarse model grid cell with a horizontal resolution of 1 • × 1 • .The representation errors are calculated at corresponding vertical model levels to represent the impact of surface influence and mesoscale transport adequately, as predicted by the high-resolution model.As mentioned before, we assume that the high-resolution simulations represent the realistic distribution of CO 2 .Furthermore, we assume that the coarseresolution model also has a terrain-following vertical coordinate system and also has the same vertical grid spacing of high-resolution model.As spaceborne instruments measure total columns rather than near-surface concentrations, we extend the analysis to the column-averaged dry-air mole fraction (XCO 2 ) as measured by the satellite instrument, i.e. m represents either CO 2 at a given model level or XCO 2 .In order to assess the dependence of the representation errors in the horizontal resolution of the employed model, we have computed representation errors for multiple resolutions ranging from 0.5 which would encompass the resolutions of both present and nearfuture global inverse modelling systems.
The surface representation errors are calculated using the model simulations from the second model level (mean height is ∼ 200 m from sea level) to avoid the inconsistency that can be generated from inputting emission fluxes at the first model level.Representation errors are calculated separately for daytime (11:30 to 16:30 LT) and nighttime (23:30 to 04:30 LT) to account for the difference in sub-grid-scale process during these times.The representation error presented in Eq. ( 1) varies from one model time step to the next.In order to obtain a typical (average) representation error, we compute the monthly average representation error (σ CO 2 ) using Eq. ( 2).
where T is the total number of simulations in a month during daytime or nighttime.Furthermore, we have calculated the representation error (σ CO 2(mon) ) using Eq. ( 3), which describes the systematic component of representation error and provides important constraints for inversions using both ground-based and satellite observations over India. where M j .n is the number of 10 km boxes inside the coarser grid cell of 1 • × 1 • , M j is the monthly averaged CO 2 dry-air mole fraction at a 10 km spatial scale, and M is the corresponding average within the coarser grid cell of 1 • .The difference between Eqs. ( 1) and ( 3) is that we use monthly averaged CO 2 concentration values in Eq. ( 3) instead of hourly values as in Eq. ( 1).Both July and November are used to understand the differences in the variability during summer and winter.
Due to the paucity of adequate ground-level observations over India, satellite observations play an essential role in the estimation of CO 2 fluxes via inverse modelling (e.g.Philip et al., 2022).Satellite observations can provide the columnaveraged CO 2 (XCO 2 ) concentration with a precision of 1 to 1.5 ppm (O'Dell et al., 2012;Wunch et al., 2017;Liang et al., 2017).In order to utilize these satellite observations, the transport models being used in the inverse estimation must be highly accurate.Since satellite footprints are smaller (∼ 2-20 km 2 ) than the current model grid size (> 100 km), using these measurements for optimization via inverse modelling introduces spatial representation errors and associated uncertainties in the inferred fluxes.Note that the spatial biases of a few 10ths of parts per million in column-averaged CO 2 can potentially alter even the annual subcontinental fluxes in the range of 10ths of a gigaton of carbon fluxes (Chevallier et al., 2007;Miller et al., 2007;Chevallier et al., 2010).To quantify these systematic transport errors when representing satellite measurements in inverse models, we calculate the spatial representation errors for XCO 2 that coarse inverse modelling would suffer from using highly precise and accurate satellite measurements.
We have selected monsoon (July) and post-monsoon (November) periods for our analysis to identify the seasonal differences in the sub-grid variability over India.
In July, many low-pressure systems were active in the monsoon trough region (India Meteorological Department, IMD, weather reports; https://mausam.imd.gov.in, last access: 20 November 2020).In general, tropical cyclones in the Asian monsoon region can cause fast uplift of air masses into the upper troposphere and lower stratosphere (e.g.Li et al., 2021), which may increase the modelling error due to the misrepresentation of the associated convective activity that is only parameterized in global models.The presence of enhanced biospheric activity during July can reduce the CO 2 concentration in the lower troposphere (e.g.Patra et al., 2011).Also, the strong vertical and horizontal mixing due to the monsoon circulation dilutes the CO 2 concentration in the atmosphere during July compared to November.The convective activity associated with the Indian summer monsoon was absent during November; however, the convection caused by synoptic systems such as tropical cyclones was still present.Such a low-pressure system activity was found over the Bay of Bengal and over the Lakshadweep area (≈ 8 • N, 74 • E) from 22 November onwards.One of these low-pressure systems in the Bay of Bengal further developed and intensified as a deep depression and moved to the southeastern Arabian Sea and evolved into a severe cyclonic storm (Ockhi) by 30 November.

Estimation of representation-error-induced flux uncertainty using pseudo-surface measurements
In order to quantify the impact of representation errors in flux estimations when utilizing surface measurements, we have devised the following strategy.We used nine CO 2 surface monitoring sites representing various geographical regions in India (Fig. 1).Not all of these observation stations are currently fully operational or have continuous measurements.
We have performed an observation system simulation experiment (OSSE) using high-resolution CO 2 simulations generated by the WRF-GHG model for each of these stations.We first-order simplification for the inversion, we assume that the footprints of each observation site span a radius of 200 km around the site based on our analysis using the Stochastic Time-Inverted Lagrangian Model (STILT; Lin et al., 2003).
STILT footprints indicate that 50 % of the sensitivity of a site to fluxes over India is located in a region that has about the same area as a circle with a radius of 200 km.For nine stations, this footprint area covers around 35 % of the total area of India.The STILT is driven with ECMWF IFS (Integrated Forecasting System) meteorological fields and the trajectories are calculated based on 100 virtual particles that are released for each time interval and location.The residence time of the particles in the surface layer is weighted by the atmospheric density to derive the footprints of each location.
In our inversion set-up, we have used the hourly biospheric contribution of the atmospheric CO 2 mixing ratios simulated by WRF-GHG over the coarse grid cell of 1 • × 1 • surrounding the location of each measurement site as OSSE observa-tions (m i,j (t)).
where H is the transport operator, and F (λ) is the flux model in which a subset of parameters λ out of total model parameters p will be optimized in the inversion.Here, i (i = 1 to 9) represents the nine observation sites and j (j = 1 to 100) is the number of WRF-GHG pixels inside the coarser grid cell of 1 • × 1 • .The modelled biospheric CO 2 signal (m i ) for the inversion is given by the following: ( The modelled values deviate from the observations by a representation error ε i,j (t).Since the modelled values (m i ) correspond to the mean of the 100 fine grid cells, the simulated values at site i are given as follows: Here, F (λ) is taken as being linearly dependent on λ; hence, it can be expressed as follows: where is the biospheric flux (NEE) distribution over the region.
In the inversion, we retrieve the monthly NEE by utilizing hourly m i,j (t) and m i (t) over a month.For OSSE and uncertainty flux estimation, we use the VPRM-derived NEE fluxes as the "true" fluxes (see Sect. 2.1.1).By using this inverse modelling design, we require the performance of 100 inversions per site, each of which uses a realization of the representation error to estimate the corresponding realization of the resulting uncertainty in the retrieved fluxes.
Both the observation and simulation vector have 6480 (= 9 × 30 × 24) elements for a month having 30 d, and the state vector has nine elements corresponding to scaling factors of fluxes for that month over regions around the nine sites (see Fig. 1).In other words, each site has been assigned with one scaling factor for NEE, and there is a total of nine scaling factors for a given month.We use a unit vector λ as the prior scaling factor.The prior uncertainty is neglected here, as the expected impact of the representation error in the retrieved fluxes is significantly smaller than the typical prior uncertainties assumed in Bayesian inversions (on the order of 50 %-100 % for biospheric fluxes).Hence, neglecting this prior uncertainty does not have a large impact on our results.The inversion retrieves optimized scaling factors λ retr .
We have performed 100 inversions per site, and the scaling factors are retrieved by minimizing the cost function for each observation station as follows: where T is the number of observations for a month.Minimizing these cost functions results in an optimized estimate of scaling factors λ retr , which is a vector of scaling factors with nine elements (λ retr,i ) for each of the 100 inversion cases.By this inverse design, the deviation of posterior fluxes from the true fluxes over India is thus the uncertainty in retrieved fluxes, S rep , that arises solely due to the contribution from the representation error.The standard deviation of the scaling factors from these 100 inversions (σ λ retr ) are used to retrieve flux uncertainty.S rep is obtained as follows: where true is the monthly VPRM biospheric flux (NEE) over the Indian region, and k is the number of pixels (33 141 pixels) over the Indian region.Here, S λ retr has the dimension of Indian region at a 10 km spatial resolution and is defined in such a way that all the grid cells (at 10 km spatial resolution) other than the grid cells within the influence region (200 km radius around the station) of each station is given with zero values (21 335 pixels) and the grids in the influence region of each station (11 806 pixels) is given with the corresponding values of σ λ retr,i .In this way, the approach does not depend on Eqs.
(1) to (3) but shows the impact of difference between m j and m on retrieved fluxes.Any temporal correlations in the representation error are not considered for this experiment.We have performed the inversion separately for daytime and nighttime values to identify the impact of diurnal variations in representation errors in flux uncertainty.Note that, by following the above inversion design and assumptions, there is a high likelihood of underestimating the impact of the modelling error in flux estimations, since we have not considered other sources of uncertainties such as model transport uncertainty and inappropriate prior assumptions.Thus, the quantification of flux uncertainty using this approach may be considered to be the lower bound of the uncertainty (i.e. the minimum flux uncertainty one may expect while estimating fluxes using a model with a grid cell of 1 • × 1 • and nine stations with the representativeness of 200 km).

Agreement among global models
We first analyse the level of agreement among currentgeneration global transport models in simulating CO 2 concentration over the Indian subcontinent.Note that a mere agreement among the coarse models does not guarantee a good model performance over the region due to their plausibly large model errors in common and interdependency in terms of data sources.We restrict this analysis to daytimeonly values, since different processes control the variability in CO 2 concentration at daytime and nighttime, and simulating nighttime variability is more complicated than the daytime (Lauvaux et al., 2009a).For a consistent comparison among global models, all the products are sampled at the same time for the region spanning from 67 to 98 • E and 7 to 38 • N. Figure 2a depicts the annual vertical profiles of CO 2 concentration, showing models' discrepancy in simulating the vertical gradients in concentration values including the boundary layer and the free troposphere.A notable difference is observed in the simulation of the gradient within the boundary layer.The magnitude and the height up to which this positive gradient is observed are different for these models.LSCE (both versions) has the largest positive gradient among these models (∼ 1 ppm).It shows the maximum concentration at around 700 m height and then a decrease in concentration.CarbonTracker also shows this positive gradient in the surface layers up to a height of 900 m.But the gradient is much smaller compared to the other two models.Among these four models, CarboScope does not exhibit this tendency in the lower atmosphere.Its concentration decreases linearly from the surface as the height increases.
The seasonal variability in CO 2 uptake through photosynthesis, release through ecosystem respiration, and vertical transport is seen when analysing the monthly averaged CO 2 concentration profiles over the Indian subcontinent (Figs.2b  and 3).Comparatively lower surface CO 2 concentrations are found during months with an active biosphere (June to October) than the rest of the period, owing to the higher ecosystem productivity over the Northern Hemisphere and particularly over the Indian subcontinent in response to the availability of monsoon rainfall.Also, the presence of strong southwesterly monsoon winds from June to September may result in bringing CO 2 -depleted air from the Southern Hemisphere and thereby lowering the CO 2 concentration over the domain.While comparing the seasonal maximum (May) and minimum (September) of CO 2 concentrations measured at the Mauna Loa Observatory (MLO) located in Hawaii, Fig. 2b shows a temporal shift of around 1 month for exhibiting the seasonal maximum (April) and minimum (August) CO 2 concentrations.This temporal shift is attributed to the differential impacts of anthropogenic and terrestrial ecosystem activities on atmospheric concentration in addition to the longdistance transfer of atmospheric carbon dioxide to a remote location (Nomura et al., 2021).MLO observations are generally representative of global mean CO 2 due to the minimal influence of terrestrial ecosystems and anthropogenic activities at this remote location.The seasonal variation in monthly averaged CO 2 seen over the Indian subcontinent is mostly dominated by terrestrial carbon fluxes, i.e.net ecosystem exchange (NEE) as seen from the VPRM simulations (see Fig. S2) and as, e.g., in Tiwari et al. (2013).
Furthermore, we see a CO 2 vertical profile with a small vertical gradient (∼ 0.5 ppm within an altitude range of ∼ 500 to 4000 m) from June to October (Fig. 3).This is likely linked to the increased convective activities associated with  The considerable intermodel variation in monthly averaged CO 2 concentration profiles, as predicted by different global models, is problematic as it indicates significant uncertainties in flux estimations over India.Part of this discrepancy can come from the coarse-resolution global model's inability to represent transport processes like convection and vertical mixing, in addition to the strength and distribution of anthropogenic sources and ecosystem activities, that operate at fine scales.The extent of this unresolved variability in existing global models is further explored in Sect.3.2.The spatial distribution of CO 2 concentration shows structural differences among these models (see Fig. S3), which indicates a substantial knowledge gap in models for representing atmospheric CO 2 variability over the Indian subcontinent.As a consequence, the country's carbon budget estimations inferred via inverse modelling can be unreliable.

Representation errors in global transport models
The spatiotemporal variability in representation error and the influence of various factors in creating this variability are examined here.The larger the variations that are caused due to sub-grid processes within the grid box of 1 • × 1 • , the larger the representation error.The derived seasonal differences in structural patterns of the sub-grid variability facilitate to (1) quantify what would be typical representation errors associated with incorporating observations from different seasons into atmospheric models, (2) determine what drives the seasonality in sub-grid variability and ultimately, and (3) design a possible parameterization of representation error with a seasonal component in the inverse modelling framework, as well as identify periods or seasons in which the use of this parameterization would be valid to improve our estimations of CO 2 fluxes.Furthermore, the seasonal spatial variability analysis of column averages can provide useful information to gap-fill the satellite-based products over India when large data gaps are present, which can be utilized for applications that do not demand high-precision observations (e.g.Hammerling et al., 2012).

Spatiotemporal patterns
Representation errors in the surface CO 2 concentrations of a global model at a spatial resolution of 1 • × 1 • for July and November are shown in Fig. 4. The representation error at 1 • × 1 • spatial scale reaches values ranging from 0.5 to 5 ppm, which are comparable to the magnitude of variability at hotspot emission regions or half of the seasonal variability in CO 2 over the region (see Fig. 2b).The median representation error is 1.2 ppm at the surface, which is considerably larger than the measurement errors.In the case of high-accuracy in situ measurements, the typical uncertainty for CO 2 measurements is of the order of 0.1 ppm (Andrews et al., 2014;Zellweger et al., 2016).A remarkable feature is the presence of very high representation error over the Northeastern and Western Ghats regions, where biosphere activity is very prominent.The heterogeneous distribution of biosphere fluxes generates significant sub-grid-scale variability that leads to high representation error.Also, we can find high representation error along the foothills of the Himalayas.In addition to the complex terrain, the region over the Ganges basin is characterized by increased anthropogenic activity, which contributes to a larger representation error surrounding in this region.High representational error is also found in the coastal regions, ranging from 2 to 5 ppm (median of 4 ppm), due to the temporal covariance between the coastal meteorology and exchange fluxes.The CO 2 fluxes from coastal regions can be transported over the ocean and accumulated in the shallow boundary layer over the ocean.The shallow boundary layer is a characteristic of the marine atmosphere due to less vertical mixing as compared to land regions.Horizontal CO 2 gradients can also be generated by the influence of highly varying biospheric fluxes under different advection patterns at the boundary between land and ocean.A similar mechanism is applicable to mountain regions, where the temporal covariance of mountain-valley circulation and respired CO 2 fluxes are regulated by atmospheric radiation.The terrain-following coordinates, as used in the model, may also result in spurious tracer concentration gradients over the steep mountain terrain (Beck et al., 2020;Skamarock et al., 2021;Park et al., 2019).Though the mesoscale models are expected to perform better in simulating CO 2 variations over the complex terrain than the coarse models (e.g.Engelen et al., 2002;Gerbig et al., 2003;Ahmadov et al., 2007;Corbin et al., 2008;Lauvaux et al., 2009b;Pillai et al., 2011;Uebel et al., 2017;Agustí-Panareda et al., 2019), they may also suffer from the inadequate representation of complex weather features and associated variability.We can also find individual cells with high representation errors associated with point emission sources such as cities, mining sites, and coalfired power plants at different parts of the domain.The daily variations in surface representation errors are small (within a month), although there exists a clear distinction between daytime and nighttime values (figure not shown).The nighttime representation error is higher (e.g. a median value of 1.5 ppm for surface during November) compared to the daytime representation error (e.g. a median value of 1.1 ppm for surface during November) throughout the analysed domain.This is expected due to the coupling between nocturnal shallow transport and different flux processes accentuating local effects.During the nighttime, photosynthesis is absent, and respiration is the major biospheric activity, leading to an increase in CO 2 concentration in the atmosphere.The large heterogeneity in flux distribution that is mostly from respired CO 2 fluxes, the shallow boundary layer processes, and the weak nocturnal turbulence cause CO 2 to be accumulated lo-cally near the surface with large variations.Compared to July, we find higher representation error in November owing to the wintertime transport with decreased vertical mixing and heterogeneous biospheric uptake (see Fig. 4).
In the case of XCO 2 , the magnitude of sub-grid-scale variability is much smaller than that of surface CO 2 (Fig. 5), but it follows a similar spatial pattern.This confirms the dominance of surface-level processes in causing sub-grid variability in column averages.The sub-grid-scale variability in XCO 2 reaches up to 2 ppm in some parts of the region, especially where there are high variations in topographic features or point emission sources.The estimated column representation errors over these regions are thus capable of causing significant biases in the satellite inferred CO 2 fluxes as regional biases of a few 10ths of parts per million in column-averaged CO 2 can create a bias of a few 10ths of a gigaton of carbon fluxes (Chevallier et al., 2007).Also, the representation error for a large part of the domain is found to be above 0.5 ppm, which is around half of the typical precision of current satellite measurements.Note that the representation error reported here is different from satellite measurement errors (e.g.spectroscopic retrieval error or sampling biases) and tends to be systematic in nature.
Figure 6 shows the statistical distribution of the representation error (σ CO 2 ) sampled over India, during July and November, separated by daytime and nighttime.July shows a median surface representation error of 0.9 and 1.1 ppm during daytime and nighttime, respectively, while November shows a median value of 1.1 and 1.4 ppm for daytime and nighttime, respectively.In July, 95 % of the representation error is less than 2.1 ppm for daytime (3.9 ppm for nighttime), while it is 3 ppm for daytime (4.2 ppm for nighttime) for November.For the column average, median values for representation error are 0.3 and 0.4 ppm for July daytime and November daytime, respectively.
To further reduce the effect of random error that might be introduced by short-term weather phenomena, the representation errors (σ CO 2(mon) ) are calculated from the monthly averaged CO 2 field and are denoted as a systematic error (Fig. 6).Uncorrelated errors are expected to decrease when averaging over a sufficiently long period.As expected, the median values of the systematic representation errors are smaller for all cases, showing the effect of random errors.Especially for November, when the cyclonic event was present, the values of the systematic errors (in the 95th percentile) for the surface CO 2 are considerably lower than total errors, reducing from 3 ppm (daytime) and 4.2 ppm (nighttime) to 2.2 ppm (daytime) and 3 ppm (nighttime).In the case of column CO 2 , this reduction is from 1.1 ppm (daytime) and 0.9 ppm (nighttime) to 0.8 ppm (daytime) and 0.7 ppm (nighttime) in the 95th percentile.In contrast to surface representation error (Fig. 6a), median values of nighttime representation errors are found to be slightly lower than the daytime representation error for the column average (Fig. 6b).To assess the de-pendence of representation error in possible horizontal resolutions of the global models, we have further derived the representation errors for different spatial resolutions between 0.5 • and 4 • .As expected, we see reductions in representation errors for both surface and column-averaged CO 2 with the increasing horizontal resolution of the model (see Figs. 7  and S4).During July, the median surface representation error reduced from 1.6 ppm (2 ppm) to 0.6 ppm (0.7 ppm) during the daytime (nighttime), while increasing the horizontal resolution from 4 • to 0.5 • .This increment in the spatial resolution has also resulted in similar error reductions in November during which the median of surface representation error shows a reduction from 2.4 ppm (2.8 ppm) to 0.7 ppm (0.9 ppm) during daytime (nighttime).In the case of the column-averaged values, the median representation error decreased from 0.7 ppm (0.6 ppm) to 0.25 ppm (0.2 ppm) during the July daytime (nighttime) and from 0.95 ppm (0.9 ppm) to 0.25 ppm (0.2 ppm) during the November daytime (nighttime).The spatial distribution of the representation errors for a model with a horizontal grid resolution of 0.5 • × 0.5 • (e.g.regional models) is provided in Figs.S5 and S6.On average, we find a ∼ 33 % to 36 % decrease in daytime representation errors for both months when increasing the model grid resolution from 1 to 0.5 • .There exists a similar spatial pattern of representation errors for both resolutions of 0.5 and 1 • .Though our results indicate a reduction in the representation error for regional models with a typical resolution of 0.5 • , compared to global models with 1 • spatial resolution, the emission hotspots and point sources are still pronounced with high sub-grid-scale variability, especially during nighttime.The above analyses indicate that the sub-grid variability alone can produce significantly higher errors compared to the measurement errors (e.g.0.1 ppm as per World Meteorological Organization standards for surface measurements), which necessitates a proper treatment of these errors in models for the optimal estimation of CO 2 fluxes.

Vertical distribution
Figure 8 shows the vertical profile of representation error distribution within different altitude bins.We find that the maximum representation error is in the surface layer, and most of the higher values are found to be within the lowest 4-6 km bins.Also, sub-grid-scale variability decreases sharply with increasing altitude.This dominance of the variability in the surface concentration can be explained by surface flux heterogeneity influencing mole fractions in lower atmospheric layers (PBL), as described in van der Molen and Dolman (2007) and Pillai et al. (2010).There is a slight increase in the representation error in the upper tropospheric levels near the 12 to 14 km altitude range.This may be associated with the presence of strong circulations in the upper troposphere and lower stratosphere, such as subtropical westerly jets or the Asian summer monsoon anticyclone (e.g.Chandra et al., 2017).

Influence of terrain heterogeneity and flux variability on representation errors
Here we explore the factors influencing the size and patterns of the representation error in coarse models.For this, statistical relationships between the representation error and possible explanatory variables are examined for both surface and column-averaged CO 2 .Identifying these factors influencing representation errors and quantifying their local effects allow us to further investigate on how these biases in retrieved fluxes can be minimized in global models (see Sect. 3.5).We find a significant influence of terrain heterogeneity on the representation error, which is evident from the spatial maps in Figs. 4 and 5, where the largest sub-grid-scale variations are found in the Himalayan regions.Spatial variations in topography produce mesoscale circulation patterns and corresponding variations in atmospheric CO 2 at fine scales.At the same time, there is a plausible additional error in global model simulations related to the insufficient resolu-tion of vertical grids necessary to account for different surface influences (e.g.mountain vs. valley).This effect of the coarse vertical resolution is excluded in our representation error estimates by preserving the vertical grids used for the high-resolution simulations.To further explore the importance of using the high-resolution topography data to represent the CO 2 variability, we analyse the dependence of terrain variations (as derived from the standard deviation of the terrain height) on the distribution of the representation error.We have estimated the statistical dependence (R 2 ) of the representation error in topographic variability within the corresponding global climate models' grids to understand the relation between them.Topographic variability within a 1 • × 1 • spatial box is estimated as being the standard deviation of topography (m) for all 10 km × 10 km boxes within the larger grid and is denoted as σ topo .Bins are created based on the values of this topographic variability, in which different points from different parts of the domain are binned together on the basis of their standard deviation of topography.Each bin is created with a size of 50 m variation in terrain height.The linear fit is estimated between the average value of topographic variability within a bin and the average value of the representation error in the corresponding points in the bin.Our results show that the terrain heterogeneity alone can explain about 20 %-48 % (R 2 = 0.20 to 0.40) of the surface representation errors over the domain.In a similar way, we have estimated the influence of topographic variability on the representation error in the column-averaged model simulations.It is found that topography alone can explain 45 %-52 % (R 2 = 0.45 to 0.52) of the representation errors in the column-averaged simulations.
Furthermore, we estimate the statistical relationship (R 2 ) between the surface flux heterogeneity and representation error.The surface representation error is strongly linked to the biosphere flux variability, and the relationship between heterogeneity in biospheric surface flux (as derived from the standard deviation of VPRM-derived NEE fluxes, denoted as σ bio ) and representation errors depends on the time of the day and season.During daytime, when there is strong ecosystem activity, the dependence of representation error (σ CO 2 ) on σ bio of surface and column CO 2 is found to be ∼ 75 %-80 % and ∼ 66 %-74 %, respectively.σ bio explains about 62 % of the surface CO 2 variability and 48 % of the column variability during the July nighttime.However, σ CO 2 and σ bio are less correlated (23 % for surface and 19 % for column) during the November nighttime.The diurnal difference in the dependence of representation error in σ bio can be explained by the increased magnitude and spatial variability in daytime biospheric fluxes in the growing season (primarily due to photosynthesis activities) compared to nighttime fluxes.Moreover, poor vertical mixing under the stable nocturnal atmospheric conditions with more advection and drainage flow reduces the influence of surface fluxes on spatial variability in mixing ratios.The dependence of the representation error in the anthropogenic flux heterogeneity (as derived from the standard deviation of EDGAR fluxes, denoted as σ ant ) is found to be negligible, except for the nighttime (13 %-30 %).We find   a smaller influence of seasonality on the relationship between anthropogenic surface flux heterogeneity and representation errors (see Table S1).Similar to the above analysis with σ bio , the combined effect of atmospheric stability and flux heterogeneity can explain the diurnal differences in the relationship between σ ant and σ CO 2 .
In the case of the variability in monthly averages, we see that σ CO 2(mon) is well explained by σ bio during the daytime (see Table S2), as expected.A similar strong correlation can be seen between σ CO 2(mon) and σ bio (23 %-69 %) during nighttime for surface variability in CO 2 , while there exists only a lesser dependence of the nocturnal column CO 2 variability on local fluxes.This shows the decoupling of the mixing ratios in other parts of the column from the surface during the night, due to less vertical mixing, combined with more drainage flow in the nocturnal boundary layer, which reduces the effect of surface flux variability on the column CO 2 variability.
In general, the above analysis underlines the need for using digital elevation models (DEMs) at a high resolution to take into account the terrain-induced mesoscale atmospheric flows adequately in atmospheric transport models.Further-more, the results indicate the importance of utilizing highresolution surface fluxes in atmospheric CO 2 simulations.

Estimation of NEE flux uncertainty due to representation error
By following the assumptions and approach as given in Sect. 2.

Possible treatment of representation error in the global model
The simplest possible way to minimize the uncertainty in the flux estimation using a coarse model is to construct a parameterization model that can account for the representation error using explanatory variables.For this, we create a multivariate model to capture spatial patterns in the representation error.Employing this parameterization in a global model would thus redefine the likelihood of better estimates (improving the state of knowledge) with a variance greater than that of the measurement error in the inverse framework by minimizing the modelling error.The multivariate linear model, with explanatory variables that includes sub-grid variations in terrain (σ topo ), biospheric (σ bio ) and anthropogenic (σ ant ) fluxes, remarkably captures the derived column representation error all over the Indian region during the July daytime, with a R 2 value of 0.96 (Fig. 9).The difference between the modelled and derived representation error is found to be well below 0.5 ppm in most parts of the domain.Similarly, we have modelled the surface representation error using the linear model with these three explanatory variables and found that the proposed model could capture the derived surface representation error well (R 2 = 0.89) with a deviation of less than 1 ppm in most of the regions (see Fig. S7 and Tables S1 and S2).More work is needed to demonstrate the extent of applicability of this method to minimize the flux uncertainties while utilizing actual observations.Nevertheless, the above finding provides a possibility for a parameterization that can be further developed in inverse models or data assimilation systems, which defines the degrees of freedom for describing the posterior states.Applying this parameterization scheme to the specific problem requires a high-resolution map of the terrain and prior information on anthropogenic and biogenic fluxes as the uncertainties in the topography and surface fluxes can significantly impact flux estimation.The caveat of this linear model is that the uncorrelated spatial variability in the prior and true states of the fluxes is ignored in the present form, which cannot be the case for the real inverse problems.This assumption obviously hampers the system in achieving the maximum reduction in uncertainty, and further study is needed to refine this model from a practical perspective.We emphasize, however, that the above parameterization does not require a high-resolution simulation of transport, which has high computational costs.

Conclusion
Given the upcoming availability of atmospheric observations over India, significant effort is required to critically enhance the modelling capabilities to derive carbon budgets over India within the definite confidence intervals and at scales relevant to the ecosystem and countrywide policy-making.The misrepresentation of mesoscale transport phenomena and unresolved flux variations in modelling systems operating on coarse grids hinders the optimal utilization of observations.In this context, the present study quantifies the spatial variability in atmospheric CO 2 mixing ratio over India that is not resolved by the coarse models and assesses their impact on flux estimations.We demonstrate the potential of a simple parameterization scheme to model these unresolved variations in the coarse models for minimizing the uncertainty in retrieved fluxes.
A large spread among existing global model simulations in representing the monthly averaged CO 2 concentration profiles indicates a considerable knowledge gap in the estimations of fluxes even at a monthly scale.It can be argued that a significant part of these differences arises due to the lack of observational constraints over India, which leads to a possible compensatory model artefact over this region in order to match the global mass constraint.At the same time, it is also expected that the spatial variability in the observed atmospheric CO 2 mole fractions can be large, so that these coarse models fail to represent them adequately.For instance, we find that the unresolved variations (representation error) in global models with a spatial resolution of 1 • × 1 • can be ∼ 1.5 ppm on average for the surface CO 2 , which is even larger than the currently reported differences between global models (∼ 1 ppm).Similarly, the average representation error estimated for the column-averaged CO 2 is ∼ 1.1 ppm.These estimated values are larger than the corresponding measurement errors, which cause the inverse optimization to infer a state that is not close to the truth, as is required in the regional CO 2 budget for various applications.
Coastal areas and mountains have particularly high representation errors (≈ 4 ppm for surface CO 2 ).Emission hotspots can also lead to significant CO 2 variability near the surface as large as ≈ 8 ppm.Larger values are typically associated with the nocturnal shallow boundary layer dynamics and the stronger respiration signal with considerable flux variability.These findings are consistent with Pillai et al. (2010), which show that there exist spatial differences in the sub-grid variability for both surface and column CO 2 .Although the magnitude of the sub-grid variability in the  total column is significantly smaller than the variability at the surface, the spatial pattern remains similar for both, owing to the dominance of surface heterogeneity in topography and fluxes.With the underlying assumptions, the total uncertainty in optimized fluxes solely due to the unresolved sub-grid variations is estimated to be 3.1 % to 10.3 % of the total NEE, while utilizing pseudo-data from nine observation stations over India.Increasing the spatial and temporal resolutions of the transport models can generally capture the mesoscale features and associated CO 2 gradients, thereby reducing the representation error.Increasing the model resolution from 1 to 0.5 • has shown an improvement in capturing variability with representation error reduction of 33 % and 36 % for summertime and wintertime, respectively.By showing the existence of the unresolved variability in 0.5 • resolution with a similar spatial pattern of the error as of 1 • spatial resolution, we demonstrate the need for a much finer resolution than 0.5 • for representing the atmospheric CO 2 variability over India.However, merely increasing the resolution without having a realistic representation of terrain heterogeneity and flux (both natural and anthropogenic) variability would not be beneficial.The uncertainties in the highresolution fluxes can worsen the model's skills, whose effect would not be more pronounced at coarser resolutions due to the diffusive nature of fluxes, as seen in Agustí-Panareda et al. (2019).
A parameterization scheme with explanatory variables of sub-grid variations in terrain, biospheric, and anthropogenic fluxes is shown to capture a considerable fraction of expected representation error in the global model.The proposed method is easy to implement in the coarse models, as it does not require computationally expensive transport simulations at high resolution.As we see a significant dependence of the distribution of sub-grid variability on terrain variahttps://doi.org/10.5194/acp-22-15287-2022Atmos.Chem.Phys., 22, 15287-15312, 2022 tions, our results reinforce the requirement for using highresolution DEMs in the atmospheric transport models.The biosphere flux variability explains about 62 % to 80 % of the surface representation errors over the Indian region, indicating the need for using precise high-resolution surface fluxes.Overall, we show that the mesoscale transport mechanisms and flux variability contribute to fine-scale CO 2 variations that the current-generation models cannot resolve.Our findings indicate that the models need to be critically improved to capture mesoscale variations associated with horizontal and vertical transport and fine-scale flux variability to maximize the potential of highly precise and accurate measurements.Our results provide a baseline for overcoming the shortcomings mentioned above and account for the realistic distribution of atmospheric CO 2 to improve the estimation of surface fluxes through inverse modelling.
Author contributions.DP designed the study and performed the model simulations.VT performed the analysis and wrote the paper.VT and DP interpreted the results.CG, MG, AR, and TAM provided significant input to the interpretation and the improvement of the paper.All authors discussed the results and commented on the paper.
Competing interests.At least one of the (co-)authors is a member of the editorial board of Atmospheric Chemistry and Physics.The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer.Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements.This study has been supported by funding from the Max Planck Society allocated to the Max Planck Partner Group at IISERB and the Science and Engineering Research Board (SERB) through an Early Career Research Award (grant no.ECR/2018/001111) to Dhanyalekshmi Pillai.Thara Anna Mathew acknowledges the financial support provided by a SERB grant (Junior Research Fellowship).We acknowledge the support of IISERB's high-performance cluster system, for computations, data analysis, and visualization.The WRF-Chem simulations were done on the high-performance cluster Mistral of the Deutsches Klimarechenzentrum GmbH (DKRZ).We thank the editor and both referees, for their constructive comments.

Figure 1 .
Figure 1.The WRF-GHG model domain used in this study, showing the topography.The CO 2 monitoring sites over India used for the OSSE experiments are marked.Not all these observation stations are currently fully operational.The colour scale is restricted to 5000 m for a better visualization of terrain details over the Indian subcontinent.

Figure 2 .
Figure 2. Comparison of global models over the model domain during daytime (11:30 to 16:30 LT) in 2017.(a) Annually averaged vertical profiles of CO 2 concentration in the lower troposphere.(b) Time series of monthly averaged CO 2 concentration at surface (∼ 100 m above surface).

Figure 3 .
Figure 3.Comparison of average monthly vertical profiles of CO 2 concentration from global atmospheric transport models over the model domain during daytime (11:30 to 16:30 LT) in 2017.Panels show data for the respective months, as indicated at the top of each panel.
V.Thilakan et al.:  Towards monitoring the CO 2 source-sink distribution over India via inverse modelling

Figure 6 .
Figure 6.Variability in derived representation error over India in July and November 2017 (both during daytime and nighttime).Boxes indicate the central 50 %, the bar across the box is the median value, and the whiskers indicate the values between the 5th and 95th percentiles.Individual data points shown are the outliers.(a) Representation error estimated for the surface CO 2 .(b) Representation error estimated for the column-averaged CO 2 .

Figure 7 .
Figure 7. Variability in derived surface representation error over India for different horizontal resolutions.Boxes indicate the central 50 %, the bar across the box is median value, and the whiskers indicate the value between the 5th and 95th percentiles.Individual data points shown are the outliers.(a) Representation error estimated for July daytime.(b) July nighttime.(c) November daytime.(d) November nighttime.

Figure 8 .
Figure 8. Variability in the representation error over India with altitude for July and November 2017.(a) July daytime, (b) July nighttime, (c) July full time, (d) November daytime, (e) November nighttime, and (f) November full time.Median values are plotted with black curves, and the shaded region indicate the 25th to 75th percentiles of data.

Figure 9 .
Figure 9. Monthly averaged values of representation error estimated for column-averaged CO 2 concentration during the July daytime (11:30 to 16:30 LT) in 2017.(a) Representation error derived from WRF-GHG simulations, as explained in Sect.2.3.(b) Representation error calculated from the multivariate linear model, as described in Sect.3.5.(c) Difference between panels (a) and (b).

Financial support .
This research has been supported by the Max-Planck-Gesellschaft (Max Planck Partner Group, IISER Bhopal) and the Science and Engineering Research Board (SERB) through an Early Career Research Award (grant no.ECR/2018/001111).The article processing charges for this open-access publication were covered by the Max Planck Society.Review statement.This paper was edited by Rolf Müller and reviewed by Prabir K. Patra and one anonymous referee.

Table 1 .
WRF-GHG Model setup.NA stands for not available.

Table 2 .
Specifications of different global model products used in this study.

Table 3 .
Flux uncertainty over India calculated from the OSSE experiments using a pseudo-observation network of surface observations.The time filter indicates the time of the data sampled for estimation of the scaling factors.A full day is 24 h in each day, daytime is from 11:30 to 16:30 LT, and nighttime is from 23:30 to 04:30 LT.Note: the fraction of uncertainty to the total NEE is given in parentheses.