Thanks to the authors for responding to the prior reviews. The additional checks were useful, but they have raised some issues that deserve further exploration. Specifically, the prior reviews requested that the authors check the geometric approximation using O4 data, consider ground slope, further address the signal-to-noise of the observations, and consider comparisons to satellite observations. Below are aspects of these issues that the authors should explore so that a revised manuscript can be prepared and reconsidered for publication.
Failure of geometric model to retrieve O4 VCD:
The response to reviewers indicates that O4 is not well retrieved by the geometric approximation using the 15° elevation angle viewing geometry. The O4 VCD retrieved using the geometric approximation is often about half (~40 to 60%) of the column calculated from meteorological data. This is worrisome with respect to the quantification of VCDs by the geometric method. The authors have made a useful set of calculations using a radiative transfer model (RTM), which appears to show that this underestimation of O4 is also present in the RTM. However, the authors do not discuss why the geometric approximation should fail for O4 yet succeed for NO2 and HCHO. They should explore the reasons for this difference if they wish to keep using the geometric method for these gases.
One possible speculation could be that O4 at larger heights above the surface is not contributing as much to the SCD as does O4 nearer the surface. The scale height of O4 is about 3.5km (half the scale height of pressure due to O4 amount being proportional to the square of O2). The table in the reply to reviewers indicates that NO2 and HCHO are assumed to have a layer height of 0-1km or 0-2km (I presume above the terrain), so even with a 2km thick layer of these gases, most of the O4 column is above the 2km top of these layers. If the geometric approximation is less sensitive to this higher-altitude part of the O4 column, it might explain why the geometric approximation is failing for O4. This hypothesis could be tested by splitting the O4 distribution in the RTM into a below 2km (AGL) and an above 2km part and running the model on these two parts. Whether this idea proves out or not, the authors should explore and discuss reasons why the geometric approximation for O4 failed yet they want to keep using it for NO2 and HCHO.
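As a rough illustration of how much of the O4 column would sit above the assumed trace-gas layers, the following sketch assumes a purely exponential O4 profile above the terrain with the 3.5 km scale height quoted above. The function name is hypothetical, and a real profile from meteorological data would differ in detail.

```python
import math

def column_fraction_below(z_top_km, scale_height_km):
    """Fraction of an exponentially decaying column that lies below z_top_km (AGL)."""
    return 1.0 - math.exp(-z_top_km / scale_height_km)

# Assumption: O4 scale height of about 3.5 km, as quoted in this review.
O4_SCALE_HEIGHT_KM = 3.5

for z_top in (1.0, 2.0):
    below = column_fraction_below(z_top, O4_SCALE_HEIGHT_KM)
    print(f"0-{z_top:.0f} km: {below:.0%} of the O4 column below, {1 - below:.0%} above")
```

Under this assumption, roughly 56% of the O4 column lies above 2 km AGL, which is the part the proposed split RTM run would isolate.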
An important reason to explore the failure of the geometric method for O4 is that for smaller columns of HCHO, there is a larger contribution of "background" HCHO arising from oxidation of methane to the HCHO VCD. This "background" HCHO is not just in the boundary layer, but extends further aloft because methane is fairly well mixed in the troposphere. Therefore, the actual profile of HCHO might not be the assumed 0-1km or 0-2km layer, which succeeded in the geometric approximation retrieval, but might look more like that of O4, and might therefore be underestimated by the geometric approximation (as O4 is). Figure S1 in the revised supplement may start to hint at increased underestimation of the true column for HCHO as the layer thickness increases from 0-1km to 0-2km. I believe that some satellite retrievals measure the differential VCD compared to a reference sector and then add back this "background" column of HCHO to get a total column. The "background" column typically comes from a global chemical transport model and examination of the column over this high-altitude region could give an estimate of the fraction of the HCHO column that is in the background (and thus potentially under-represented in the geometric retrieval). Figure S2 in the revised supplement indicates that HCHO's mixing ratio profile is much more constant with altitude than NO2, which may indicate that the HCHO concentration profile extends to higher altitudes AGL than does the NO2 profile, potentially indicating that the RTM calculations that assume HCHO is in the 0-1km (AGL) or 0-2km layer are not appropriate. The authors should explore how their geometric method would work for "background HCHO" and use satellite / GCM estimates of the background to determine how much of the columns they are observing may not be in the boundary layer.
In the reply to reviewers, the authors say: "Part of the underestimation is probably related to clouds, but a strong underestimation is also found for measurements for clear skies." It would be useful to give more information on this statement. Specifically, there are times when the O4 VCD is very small (e.g. on July 25, 2021); were these during cloudy periods? Can the authors indicate when there were clouds on their timeseries so that we can understand the effect of those clouds? Although the authors' radiative transfer simulations can help to address questions of largely clear-sky behavior (they have AOD up to 0.2), the simulations do not help in understanding cloudy behavior. It may be that a cloud above the boundary-layer NO2 does not affect the retrieval much, but the authors have not shown that. Can the authors expand the radiative transfer simulations to include a layer cloud aloft? That seems like a situation that should be addressable with their model.
Although the radiative transfer model calculations are useful, details on these calculations are lacking. For example, which radiative transfer model is used? Presumably some aerosol properties (e.g. asymmetry factor, single scattering albedo) are used, but are not stated. The O4 simulations say 0-1000m in their caption box on the figure, which I guess is the aerosol layer thickness because O4 goes a lot higher than that. Please clarify what this height range refers to.
Effect of ground slope:
The authors calculate that errors up to 21% can arise from slope, which is a good number to keep in mind. The authors then go on to average positive and negative slope errors to get a near zero error (1%). However, that calculation assumes an equal mix of uphill and downhill driving, which might not be the case. Later they indicate that only about 1 minute in 8 minutes is observing at 15°, so the slope during that period is what matters, and driving up or down a slope for a minute seems very reasonable in an area that covers ~3km vertical range. I think it would be safer to say that the ground slope may lead to an error of +/-21%, but that over the full loop these errors should at least partially cancel. It is possible that these errors might contribute to the low correlation between the mobile measurements and satellite observations.
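To illustrate how sensitive the geometric approximation is to road tilt, the sketch below assumes that the tilt adds directly to the nominal 15° elevation angle and that the differential air mass factor is 1/sin(elevation) - 1 (zenith reference). Both are my assumptions and not necessarily the authors' calculation. The function names and the tilt values are hypothetical.

```python
import math

def geometric_damf(elevation_deg):
    """Geometric differential AMF relative to a zenith reference (assumed form)."""
    return 1.0 / math.sin(math.radians(elevation_deg)) - 1.0

def relative_vcd_error(nominal_deg, tilt_deg):
    """Relative error of a VCD retrieved with the nominal angle when the
    true elevation angle is nominal_deg + tilt_deg."""
    true_damf = geometric_damf(nominal_deg + tilt_deg)
    assumed_damf = geometric_damf(nominal_deg)
    return true_damf / assumed_damf - 1.0

# Hypothetical tilt values in degrees; positive means the telescope points higher.
for tilt in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"tilt {tilt:+.0f} deg: VCD error {relative_vcd_error(15.0, tilt):+.1%}")
```

In this simple picture, how far the errors cancel over a loop depends on how the 15° observations are distributed between uphill and downhill segments.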
Error estimates:
Text was added to the end of section 3.1 describing error analysis. The authors appear to use two times the median spectral fit error. I presume the fit error is like a standard deviation (sigma), so this is 2*sigma, a reasonable definition of detection limit (DL), but the text should be clearer. These (2-sigma) DLs are 0.24×10^15 molecule cm^-2 for NO2 and 0.74×10^15 molecule cm^-2 for HCHO. These detection limit estimates use the airmass factor at 15° (i.e. the geometric approximation), but no error is added for uncertainties in the geometric approximation. I think that at least 21% error for road tilt and ~20% error from the radiative transfer calculations should be added to this spectroscopic-only error estimate.
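A minimal sketch of such a combined error budget follows, assuming the contributions are independent and can be added in quadrature (my assumption; the authors may prefer another treatment). The function name and the example VCD are hypothetical, and combining a 2-sigma fit term with the percentage terms is only indicative.

```python
import math

def total_uncertainty(vcd, fit_error, relative_errors):
    """Combine an absolute fit error with relative errors in quadrature.

    Assumes all terms are independent; vcd and fit_error share the same units.
    """
    rel_sq = sum(r * r for r in relative_errors)
    return math.sqrt(fit_error ** 2 + rel_sq * vcd ** 2)

# Values in units of 1e15 molecule cm^-2.
no2_fit_error = 0.24       # 2-sigma spectroscopic term quoted for NO2
relative_terms = (0.21, 0.20)  # road tilt and radiative transfer terms
example_vcd = 1.0          # hypothetical NO2 VCD, for illustration only

total = total_uncertainty(example_vcd, no2_fit_error, relative_terms)
print(f"total uncertainty: {total:.2f}e15 molecule cm^-2")
```

Under this assumption, the two percentage terms alone combine to 29% of the VCD.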
In section 4.1 (and the abstract), "background" levels of these gases are described, with a +/- listed (I would have assumed to be an error estimate), yet it is defined differently from the section 3.1 error analysis. In section 4.1, the text says "The uncertainties of the background levels were estimated by the half width at half maximum of Lorentz fitted curves (Fig. 6a)." I think that these are not "uncertainties", but rather the combination of variability in the species combined with analytical uncertainties in the measurements. The Lorentzian half width is used, which I think would be narrower than 1-sigma of a Gaussian fit. Can the authors justify why a Lorentzian is used here rather than a Gaussian? The Gaussian is connected to normal statistical error analysis and seems preferable, although the distributions do look longer tailed than a Gaussian. Overall, the similarity of section 3.1 detection limits and the width of the distributions in Figure 6 would lead me to believe that a significant part of the width of these distributions is instrumental noise. The quoting of the Lorentzian half width in the abstract seems misleading to me as I would have expected that the +/- number listed would be a Gaussian error estimate, possibly even 2-sigma. Please clarify this error discussion and make sure that the definition of the error estimate is included in the abstract.
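For reference when putting the widths on a common basis: for a Gaussian, the half width at half maximum (HWHM) and sigma are related by the fixed factor sqrt(2 ln 2), whereas a Lorentzian has no finite variance, so its HWHM cannot be converted to a sigma in the same way. The short sketch below only covers the Gaussian relation, and the function names are hypothetical.

```python
import math

# For a Gaussian, HWHM = sigma * sqrt(2 ln 2), i.e. about 1.177 sigma.
GAUSSIAN_HWHM_PER_SIGMA = math.sqrt(2.0 * math.log(2.0))

def gaussian_sigma_from_hwhm(hwhm):
    """Convert a Gaussian half width at half maximum to sigma."""
    return hwhm / GAUSSIAN_HWHM_PER_SIGMA

def gaussian_relative_height(x, sigma):
    """Height of a zero-centered Gaussian at x, relative to its peak."""
    return math.exp(-0.5 * (x / sigma) ** 2)

sigma = 1.0
hwhm = sigma * GAUSSIAN_HWHM_PER_SIGMA
print(f"HWHM / sigma = {GAUSSIAN_HWHM_PER_SIGMA:.3f}")
print(f"relative height at HWHM = {gaussian_relative_height(hwhm, sigma):.2f}")
```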
Comparison to satellite:
The text says "Interestingly, there is almost no correlation of the two data sets, if we only use the tropospheric NO2 VCDs within the 1.5 h time difference between mobile MAX-DOAS and TROPOMI at the same grid (referred to ‘ΔT1.5’ in Fig. 15a, corresponding to the red pluses in Fig. 13)." It seems to me that the noise on the measurements, both satellite and ground based, coupled with effects like slope, variable solar geometry, etc. are all going to reduce the correlation between the two data sets, particularly due to the low levels of these pollutants at these high-altitude remote sites. Therefore, I think the weak correlation is to be expected and is a product of the low level of pollution. Although the correlation is poor, the difference on average of the data by both methods (the bias) is a useful result of the study. I believe that the other reviewer also seeks to have greater focus on the bias in measurements than correlation (given noise on both measurements). The discussion of this correlation plot should include reference to the error estimates discussed above and also should discuss errors on TROPOMI measurements.
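The point that noise can remove the correlation while leaving the mean bias recoverable can be illustrated with a small synthetic experiment. Everything in the sketch below is hypothetical (the range of true columns, the noise level, the bias, and the function names) and is not derived from the manuscript's data.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, noise_sd, bias, seed=0):
    """Return (correlation, mean difference) for synthetic paired data.

    True columns span a narrow range (low pollution); both instruments see
    the same truth plus independent Gaussian noise, one with a constant bias.
    """
    rng = random.Random(seed)
    truth = [rng.uniform(0.5, 1.5) for _ in range(n)]
    ground = [t + rng.gauss(0.0, noise_sd) for t in truth]
    satellite = [t + bias + rng.gauss(0.0, noise_sd) for t in truth]
    mean_diff = statistics.fmean(s - g for s, g in zip(satellite, ground))
    return pearson_r(ground, satellite), mean_diff

for noise_sd in (0.0, 0.5):
    r, diff = simulate(n=2000, noise_sd=noise_sd, bias=0.3)
    print(f"noise sd {noise_sd}: r = {r:.2f}, mean difference = {diff:.2f}")
```

When the noise is comparable to the spread of the true columns, the correlation drops sharply, yet the average difference still tracks the imposed bias.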