Comment on acp-2021-392


The publication by Thilakan et al. investigates some of the challenges of estimating the sources and sinks of CO2 over India from atmospheric observations. It particularly focuses on the problem of small-scale spatial variability of CO2 concentrations not resolved by global-scale inversion systems, which may therefore lead to significant errors in the source/sink estimates.
The publication contains a number of interesting and valuable aspects, but it lacks coherence and a clear line of thought. The overall goal of the publication is rather vague, the individual elements are only loosely connected, and some of the analyses need to be better motivated, explained, and more thoroughly analyzed. I therefore cannot recommend publication at this stage but suggest major revisions.

Main issues:
The overall aim of the paper is not sufficiently clear. On the one hand, the authors emphasize the need to apply high-resolution models (with resolutions of 10 km x 10 km or better), but on the other hand they present a method by which unresolved spatial variability can be accounted for in large-scale models to improve CO2 source/sink estimates. In my view the paper would gain a lot if it focused much more clearly on global coarse-resolution model systems and on how the problems of not resolving the small-scale CO2 variability in these models can be mitigated. Although this goal is nicely formulated at the end of page 3, the focus is lost in many of the other sections and especially in the abstract and the conclusions. Global data assimilation/inverse modeling systems will continue to play an important role. Since the spatial resolution of these systems is continuously increasing, they are more and more applied to study sub-continental or even national-scale fluxes. Current systems (4 of them are presented) typically operate at coarse resolutions of several degrees (i.e. several hundred km), but resolutions of about 1°x1° (i.e. about 100 km) are quite likely achievable in the near future, so that the analysis of spatial variability below 1° as presented in this study is quite relevant. Many of the elements of the paper could be preserved, but important parts of the text need to be revised or rewritten to sharpen the focus of the manuscript. It is quite disturbing that the need for high-resolution model systems is emphasized over and over again, while the main essence of the paper is to present a method that allows accounting for small-scale, unresolved variability in coarse global models.
The setup of the OSSE described in Section 2.4.1 is not clear at all, and therefore it is impossible to interpret the results. In particular, it is unclear how the simulated observations y_OSSE were generated. Why are y_sim and y_OSSE different, if the same transport model was used to generate them? How exactly was the representation error accounted for? None of the equations in this section contain a representation error. Did the error change with time or was it set to a monthly mean value? Was the systematic component accounted for or were all errors treated as random? Were temporal correlations in the representation error considered? A careful setup of an OSSE is critical. Based on the information provided, it is not possible to judge whether this was the case (for further specific questions see also my minor comments below).
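To make the questions above concrete: a description of the pseudo-observation generation could be as short as the following sketch, in which the representation error enters as temporally correlated AR(1) noise added to the model-sampled concentrations. All names and numbers here are hypothetical illustrations on my part, not values taken from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(42)
nt = 24 * 31                         # hourly pseudo-observations for one month
hours = np.arange(nt)
y_sim = 410.0 + 2.0 * np.sin(2 * np.pi * hours / 24)   # toy model-sampled CO2 (ppm)

# representation error as AR(1) noise; sigma and rho are illustrative values
sigma, rho = 1.5, 0.8                # ppm std dev, hour-to-hour correlation
eps = np.empty(nt)
eps[0] = rng.normal(0.0, sigma)
for t in range(1, nt):
    # innovation variance chosen so that eps keeps a stationary std dev of sigma
    eps[t] = rho * eps[t - 1] + rng.normal(0.0, sigma * np.sqrt(1 - rho**2))

y_osse = y_sim + eps                 # pseudo-obs differ from y_sim only via eps
```

Stating the recipe at this level of explicitness (including whether rho is zero, and whether a systematic offset is added) would make the OSSE results interpretable.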
The individual parts of the publication are not sufficiently well connected. For example, the analysis of the differences between the global models in Section 3.1 is interesting by itself, but it is not explained why this is of interest in the context of the overall goal of the paper. Similarly, the discussion of the influence of convective periods in July and of a cyclone in November on the vertical distribution of CO2 and of the representation error is quite interesting, but again there is little discussion of how this relates to the overall scope of the paper.
The analysis of sub-grid scale variability in total column XCO2 as observed by satellites needs to be better motivated. Sub-grid scale variability is an obvious problem when using surface in-situ measurements in a coarse model system, but it is much less obvious for satellite observations. Unlike surface in-situ observations, satellite observations (from an imaging satellite) could be averaged over a whole model grid cell, which would alleviate the problem of not resolving sub-grid scale variability. I therefore disagree with the statements on lines 422 to 424.
The analysis of the factors influencing the representation errors is too limited and not sufficiently systematic. As shown in different parts of the paper, the errors vary with time (e.g. day vs. night) and with meteorology (higher during convective periods), and depend on topography and surface flux variability. It would be useful to analyze the importance of these factors more systematically in a single section. An attempt is made in Section 3.3, but it focuses only on the importance of topography. Another attempt is made in Section 3.5, this time using a multivariate model and total column observations. It is very hard to understand these choices. There should be one single section applying a multivariate model to the representation errors in both near-surface CO2 and in total column CO2. Furthermore, it should be analyzed separately how much these factors influence the systematic part of the representation errors and how much they contribute to the random part.
Minor points:
Abstract, Line 21: The reader doesn't know at this point which coarse models are meant, and therefore one cannot write "THE coarse models". The definite article "the" is wrongly used at many other places in the manuscript. I trust that the manuscript will be checked for grammar before a possible publication.
Abstract, line 22: Typical/average/median values are much more relevant than extreme values.
Abstract, line 23: What is a "sampling error"? Here and at many other places this should be replaced by "measurement error".
Page 2, lines 66-70: Both variations in orography (affecting the flow) and in land use (affecting the fluxes) are important. These two different factors should be more clearly distinguished and described.
P3, L80: replace "from the last decade" by "during the last decade"
P3, L83: replace "these coarse models on representing" by "coarse global models in representing"
P3, L97: Please explain in which way the dry and wet seasons affect the cropping patterns. Is cropping enhanced during the wet or during the dry season? Or does this depend on the type of crop?
P3, L101: replace "The study" by "This study".
P4, L121: replace "of the high-resolution" by "of high-resolution"
P4, L122: replace "is characterized" by "was characterized"
P4, L134: Please reformulate the sentence. Estimating an assessment doesn't make sense.
P4, L141: replace "from the inverse" by "from inverse" (again a wrong use of the definite article) and replace "estimates" by "estimate"
Section 2: Consistent with the emphasis on global models, the global model systems should be described before the WRF-Chem model system.
Section 2: The global models need to be described in more detail. For example, it is not always described which observations were assimilated. Furthermore, what was the driving meteorology in these offline transport models? Could this explain the large differences? Or is it the fact that the models use different convection and PBL turbulence parameterizations? It would be good to summarize the main features of the models (resolution, driving meteorology, parameterizations, emission inputs, biospheric flux models, assimilated observations) in a table.
P5, L154: Why should entropy be conserved? As far as I know, the Skamarock report doesn't mention any conservation of entropy.
P5, L168: replace "is also established" by "was established"
P5, L177: How is SYNMAP mapped onto the 0.1° grid? Was only the dominant land cover type used, or is a tile approach implemented in WRF-GHG, i.e. an approach accounting for all different land cover types within the 0.1° grid cell?
P6, L207: ".. a different simulation strategy ..". Different from what?
P7, L243: I disagree that sub-grid scale variability is "fully resolved" by the high-resolution model. Actually, a model at 10 km resolution is not at all sufficient to resolve mesoscale flows in mountainous terrain. It needs to be explained that the simplifying assumption is made that the high-resolution model captures a major part of the sub-grid scale variability, but that the true variability is likely larger, since even a model at 10 km resolution cannot resolve all variability.
P7, L244: The choice of a resolution of 1°x1° to study sub-grid scale variability is poorly motivated. Why not 2°? Why not 0.5°? Actually, the paper would gain a lot if it studied the variability at multiple resolutions between 0.5° and 4°, which would encompass the resolutions of both present and (near) future global inverse modelling systems.
P7, L246: The equation describes the standard deviation at any given instant in time. Later in the paper, a distinction is made between random and systematic variations. How these separate components are computed needs to be explained in this section, too.
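For instance, the decomposition could be stated explicitly along the following lines. This is a minimal sketch of one common choice (an assumption on my part, not necessarily the authors' definition): the deviation of each fine-grid value from the coarse-cell mean is split into a time-mean (systematic) part and a residual (random) part.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nfine = 720, 100                  # hourly values, 100 fine (0.1°) cells per 1°x1° cell
pattern = np.linspace(0.0, 2.0, nfine)   # persistent sub-grid pattern (ppm), illustrative
co2 = 400.0 + pattern + rng.normal(0.0, 1.0, (nt, nfine))

cell_mean = co2.mean(axis=1)          # what a coarse model represents
rep_err = co2.std(axis=1)             # instantaneous representation error (std dev)

dev = co2 - cell_mean[:, None]        # sub-grid deviations from the cell mean
systematic = dev.mean(axis=0)         # correlated (time-mean) component per fine cell
random_part = dev - systematic        # residual, uncorrelated component
```

Whether the authors use this or another definition, writing it down as a formula in Section 2.3 would remove the ambiguity.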
P7, L255: It is hard to believe that the center of the second layer is at 200 m. The lowest model levels should be much narrower in order to properly capture the diurnal dynamics of the atmospheric boundary layer.
P7, L258: I don't understand how the correlated term was deduced. Please explain clearly, ideally by providing a formula. The correlated (systematic) component seems very important to me, and therefore should be introduced properly.
Section 2.4.1: The title of this section should be changed to "Generation of pseudo-observations" or something similar. As mentioned earlier, the setup of the OSSE is not clear at all. The description needs to be improved significantly.
P8, L285: I don't understand what is meant by "50-90 percentile". I guess one should either use the 50th percentile or the 90th percentile, but why would one use a 50-90% range? Furthermore, it remains unclear whether a "radius of 200 km" was used (i.e. the area within a circle) or really a site-specific area derived from the mean station footprint.
P8, L290: Replace "Through our .. approach" by "Through a .. approach (see Eq. 2)"
P8, L292: Why do you use all hourly values and not only afternoon values here? The results of the inversion critically depend on the choice of observations, and especially on whether the assumed errors are temporally uncorrelated or not. For hourly data, it is very likely that the (spatial representation) errors are temporally correlated. For the OSSE to be meaningful, any spatial and temporal correlations of the errors need to be properly accounted for.
Section 2.4.2: It is not sufficiently clear how the pseudo satellite observations were created. What do you mean by "dense" spatial sampling? At the density of OCO-2? What do you mean by "as frequently as possible"? Every hour of the day? Once a day? These formulations are too vague.
P9, L327: I don't think that retrievals in the short-wave infrared range are sensitive to molecular (i.e. Rayleigh) scattering, but of course they are sensitive to molecular absorption (by CO2, H2O, O2, etc.)
P9, L333: A cloud fraction threshold of 20% is much too high for satellite XCO2 retrievals. Usually the thresholds are in the low percentage range (e.g. 2% cloud fraction), because uncertainties in photon paths increase quickly even when thin cirrus is present.
P10, L336: replace "significantly low" by "too low"
P10, L344: The performance of the models is not assessed in this section. This would require a comparison against observations.
P10, L346: Although quite plausible, it is only a hypothesis that the models have large common model errors. But here it is stated as a fact.
P10, L353: Delete "by different models". This is clear from the context. The analysis of the differences between the global models is interesting and would deserve a bit more discussion. Furthermore, in order to better integrate this section into the paper, it would be important to compare these differences with the magnitude of the representation errors due to sub-grid scale variability. The differences between the models are surprisingly large, especially during the monsoon season. How plausible are the strong vertical gradients of 2-3 ppm below 700 hPa in the LSCE model in July and August? Wouldn't one expect a well-mixed atmospheric boundary layer during the monsoon season in the afternoons?
P10, L363: The CO2 concentrations are lower in Jun-Oct not only due to the active biosphere over India but due to the biosphere over the whole northern hemisphere.
P11, L370: Replace "the significant" by "significant"
P11, L378: None of the current-generation global models used in this study has a resolution of 1°x1°. As mentioned earlier, it would be useful to analyze how the representation error changes with resolution rather than just presenting the results for one rather arbitrarily selected resolution.
P11, L391: replace "high values" by "higher values"
Figure 4 (and Fig. 7) needs to be discussed more thoroughly. One of the remarkable differences between July and November is the much larger representation errors in November along a band extending almost through the whole model domain. It is not clear to me whether this band is along the edge of the Himalayas or along the Ganges river. There are also individual cells with much higher values compared to their neighboring cells. Is this due to anthropogenic sources (cities, industries), due to topography, or due to agriculture?
P11, L402: replace "well mixed vertical gradients" by "weak gradients due to strong vertical mixing". The next sentence could probably be deleted.
P12, L431-433: The sentence does not make sense to me. Why should the computation of a correlated representation error reduce the effect of the random errors?
The distinction between random and systematic components of the representation error seems very important to me, since the influence of random errors can be compensated by large numbers (of observations), whereas the systematic component likely leads to systematic biases in the flux estimates. These aspects deserve much more attention in the paper, and it should be clearly explained how they are calculated (as mentioned earlier).
P13, L450: Replace "sampling errors" by "measurement errors" (here and throughout the manuscript), and replace "significantly high" by "significantly higher".
P13, L454: replace "minimal" by "relatively small"
Figure 9: Figure titles like "Sur-July" or "ColAvg-July" should be replaced by more explicit titles, e.g. "Surface - July" and "Column average - July".
P13, L457: replace "synoptic systems prevailed" by "prevailing synoptic systems"
P13, L461: A resolution of 1°x1° should generally be sufficient to represent synoptic events. It is actually not so clear to me why the cyclone so strongly influenced sub-grid scale variability. Is it because there were many individual convective cells, or narrow frontal lines? The strong increase in the median value is really remarkable, which suggests that more than 50% of the area of India was affected by the event.
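To illustrate why this distinction matters, consider a toy calculation (the numbers are illustrative choices of mine, not values from the manuscript): averaging N observations shrinks a random error of standard deviation sigma roughly as sigma/sqrt(N), while a systematic error of size b survives averaging unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
bias, sigma = 0.3, 2.0               # ppm: systematic and random parts (illustrative)

for n in (10, 1000, 100000):
    errs = bias + rng.normal(0.0, sigma, n)
    # the mean error converges to the bias, not to zero
    print(f"N = {n:6d}: mean error = {errs.mean():+.3f} ppm")
```

With many observations per month, only the systematic component should survive in the flux estimate, which is why it needs to be quantified separately.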
Section 3.2.3: The differences in the vertical profiles of the representation error between July and November should also be discussed in the context of surface versus total column CO2 observations. The large representation errors in the upper troposphere in July are problematic for satellite observations but not for surface observations. On the other hand, there are probably no satellite observations available in July due to cloud cover.
P14, L487: replace "spatial figures" by "spatial maps"
P14, L489: It is not only mesoscale circulations that influence the spatial variability over hilly terrain, but also the simple fact that the lowest model layer is at a higher altitude over a mountain than over a valley. Total columns are also affected by the same effect. Actually, I suspect that this effect is more important than mesoscale circulations.
Section 3.4: As mentioned earlier, it is very difficult to interpret the results presented in this section without knowing how exactly the OSSE was set up. Furthermore, without knowing what fraction of the area of India is covered by the footprints of the nine stations, the reported flux uncertainties for the whole of India of 14.5 to 16.2% in July and 6.3 to 7.5% in November are quite meaningless. What is the typical uncertainty for the individual regions? Why are the uncertainties so high, if the spatial representation errors are primarily random and if there is such a large number of observations constraining the fluxes each month?
P15, L530: replace "take to the account of" by "account for the"
Conclusions section, L580: As in the abstract, it would be more useful to mention typical (median) values rather than just the extreme values.