Reply on RC1

The authors would like to thank the reviewer for their supportive comments and for taking the time to review the manuscript. We have updated the manuscript to take into account the reviewer's comments, both general and specific. Outlined below is a breakdown of the reviewer comments (bold and italic), with the corresponding author response given below each. Line numbers correspond to the revised manuscript.

Specific Comments:

L69: What is the typical aircraft speed?
The typical aircraft speed (science speed) was 74.5 ± 10 m s⁻¹, which has now been added to the revised manuscript (line 73).

Table 1 to give a better sense of diel sampling.
Table 1 has been updated to include additional information on when each flight occurred, including the date, weekday and specific hours.

L114: Is the 3.3% correction referenced here to generate dry mixing ratios from wet? Or is this the correction due to quenching of the chemiluminescence reaction from water vapor? If the former, although the correction is small, the authors should still note that unless using dry mixing ratios to generate NOx fluxes, they may need to apply a Webb density correction to account for heat and water vapor. No mention of a density correction is made later on in the text.
The 3.3% correction has been removed from the manuscript. A more detailed discussion now appears in Section 1.1 of the supplementary material, covering the addition of water vapour to the Fast-AQD-NOx as a stability method to negate changes in chemiluminescence quenching. Section 3.1 (lines 254-259) now contains a detailed assessment of WPL corrections for the measured NOx fluxes, using the method outlined by Hartmann et al. (2018). Fig. S5a in the supplementary material shows the effect of using wet vs dry mole mixing ratios for calculating NOx fluxes. Correcting for WPL increased the measured NOx flux by 1.35% on average.

L133-134: Detection limits of 49 and 78 pptv for NO and NO2 seem unrealistically low for a 9 Hz integration time. Another reference from the paper using the same instrument (Lee et al, 2009) quotes a 2-sigma LOD for NO of 36 pptv at 1Hz. If this is true, adding the Allan-Werle plot to the supplement would be useful.
The 2σ precision has been reanalysed using in-flight zero data from across the campaign. Fig. S2 in the supplementary material shows the density distribution of the zero counts for each flight. The updated 2σ precision was found to be 153 and 249 ppt for NO and NO2, respectively (for a 9 Hz integration time). It should be noted that the PMT temperature of the detector is significantly lower (< -60 °C) than that of the system reported in Lee et al. (2009) (-25 °C), which will give better signal stability.

L145: This is not a strictly correct error estimation. The individual uncertainties should be propagated through all the equations used to calculate NO/NO2 mixing ratio.

L149-150: Which species were measured by the PTR-MS and Picarro? Were they used in this analysis at all?
Data collected by the PTR-MS have already been discussed elsewhere, in papers covering various anthropogenic and biogenic non-methane VOCs (Shaw et al., 2015; Vaughan et al., 2017). Data collected by the Picarro and PTR-MS are not used in this study.
The manuscript text has been updated to clarify this (lines 137-138).

Section 2.2.3: Was there any treatment or consideration of a vertical flux divergence? This is an important point that the authors should address.
Fig. S4b in the supplementary material shows the effect that correcting for vertical flux divergence could have on the calculated NOx fluxes (up to a 50% increase) using the Sorbjan (2006) method. As the boundary layer estimates in this study are from the ERA5 reanalysis dataset, there is potential for large uncertainties compared to in-situ LiDAR measurements. There is also high uncertainty as to the effect of flux storage in highly complex urban terrain, such as street canyons. Future studies are needed to assess these processes in greater detail. The fluxes reported in this study are therefore classified as conservative: any vertical divergence processes have the potential to increase the measured fluxes and the discrepancy between measurement and inventory.

L225-226: "whereas time-frequency EC gives a flux measurement every 400 m along the transect using a 4000 m moving window…" How is the 4000 m moving window applied to the 400 m CWT fluxes? Are the measurements overlapping? This wording is unclear.
Yes, the flux estimates overlap, because the 4000 m moving window is applied at 400 m intervals along the transect. The manuscript text has been updated to clarify this.
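As a rough illustration of the overlap, the sketch below lays out a 4000 m window advanced in 400 m steps. The function names and the 12 km transect length are hypothetical, and the window placement in the actual analysis may differ; under these assumptions, consecutive estimates share 90% of their underlying data.

```python
def window_bounds(transect_length_m, window_m=4000.0, step_m=400.0):
    """Return (start, end) distances in metres for each moving-window estimate."""
    bounds = []
    start = 0.0
    while start + window_m <= transect_length_m:
        bounds.append((start, start + window_m))
        start += step_m
    return bounds


def overlap_fraction(window_m=4000.0, step_m=400.0):
    """Fraction of one window that is shared with the next window."""
    return max(0.0, (window_m - step_m) / window_m)


windows = window_bounds(12000.0)  # hypothetical 12 km transect
shared = overlap_fraction()       # fraction shared by neighbouring windows
```

Because neighbouring estimates are not independent, the number of independent samples along a transect is smaller than the number of reported flux values.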

L245: One of the advantages of the CWT method is that by decomposing the signal into the time domains, the strict criteria for stationarity is not necessary.
Fluxes were flagged at the 100% stationarity mark, but most flight legs that failed the stationarity criterion had already failed other filtering criteria, as discussed in Section 2.2.3. Therefore, we do not believe that this QA/QC method will lead to incorrect data filtering.

L250: It's not explicitly stated, but was data with u*<0.15 m/s discarded?
The manuscript text has been clarified to show that data with u* below 0.15 m s⁻¹ were filtered out. The 0.15 m s⁻¹ threshold was chosen as a metric of developed turbulence, mirroring other urban studies in London.

Section 3.1: Some additional quality metrics for the NOx fluxes would strengthen the results. Examples include a lag-covariance plot, a CWT cross-scalogram, wind and scalar power spectra and co-spectra. Such figures could be in the supplement but would build confidence in the application of the eddy covariance technique to NOx fluxes and would give a visual idea of signal-to-noise.
We have added additional flux quality metrics to the supplementary material (Section 1.2.2). These include lag-covariance plots, a CWT cross-scalogram and the average cospectra for NOx and heat flux. These metrics support the NOx fluxes presented here as a good-quality dataset.

L299-300: The point-by-point errors are significant. How do errors reduce when averaging? Figure 5 shows the standard deviation in shading, but it would be helpful to get a sense of the error when averaging all transects of a given type together. These uncertainties should be reported, even if not displayed in the figure.
Figure 5 has been updated so that the shaded area shows the average flux random error divided by the square root of the number of sample points that went into each mean. This gives a better visualisation of the overall uncertainty of the flux averages. Averaging multiple transects together reduces the temporal variability between legs, providing a more accurate spatial picture. The uncertainties of a single flux measurement are also discussed in Section 3.1 (lines 245-252).
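The shaded uncertainty described here can be sketched as follows. This is an illustration only: the function name is hypothetical, the numbers are not campaign data, and the square-root scaling assumes the random errors of the contributing flux estimates are independent.

```python
import math


def averaged_random_error(random_errors):
    """Mean random error of the individual fluxes divided by sqrt(n)."""
    n = len(random_errors)
    if n == 0:
        raise ValueError("at least one flux estimate is required")
    mean_error = sum(random_errors) / n
    return mean_error / math.sqrt(n)


# Illustrative per-point random errors in arbitrary flux units.
errors = [4.0, 4.0, 4.0, 4.0]
shaded = averaged_random_error(errors)  # mean error of 4.0 reduced by sqrt(4)
```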

L334-335: In general, not much explanation is given in the manuscript about when sampling occurred. At what times are the measurements acquired? How is averaging performed across transects sampled at multiple times of day? Doesn't this dampen any diel variability? How is the timing compared to the emissions inventory? Are you comparing emissions for each transect time independently or to a mean? It would be helpful to add some clarity on these points in the text.
Section 3.1 has been updated to include an additional figure (now Fig. 4) showing the temporal variability of the measured NOx flux in three distinct areas during the campaign (lines 270-284).
Measured fluxes were compared with the annual emission inventory by calculating an individual inventory estimate for each SNAP sector (different emission sources) using the footprint methodology outlined in Section 2.3.1. Each SNAP sector estimate is then weighted using source-specific scaling factors that account for monthly, daily and hourly variations. These scaled estimates are then summed to give a time-of-day inventory estimate that accounts for the location and time of day at which the flux measurement was made.
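A minimal sketch of this scaling step is given below, assuming that each sector's footprint-weighted annual estimate is simply multiplied by its monthly, daily and hourly factors before summing. The function name, sector labels and factor values are hypothetical and are not taken from the NAEI.

```python
def time_of_day_inventory(footprint_emissions, scaling_factors):
    """Sum footprint-weighted annual sector estimates after temporal scaling.

    footprint_emissions: sector -> footprint-weighted annual-mean emission rate
    scaling_factors: sector -> (monthly, daily, hourly) dimensionless factors
    """
    total = 0.0
    for sector, annual_rate in footprint_emissions.items():
        monthly, daily, hourly = scaling_factors[sector]
        total += annual_rate * monthly * daily * hourly
    return total


# Hypothetical values for two sectors (arbitrary emission units).
emissions = {"road_transport": 10.0, "domestic_combustion": 4.0}
factors = {
    "road_transport": (1.0, 1.1, 1.5),
    "domestic_combustion": (0.5, 1.0, 1.0),
}
estimate = time_of_day_inventory(emissions, factors)
```

Keeping the factors per sector means that, for example, a weekday rush-hour flux is compared against an inventory value in which road transport is scaled up while other sectors follow their own temporal profiles.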

Figure 6: What does the shading represent? Is it the uncertainty of the measured/inventory fluxes? Also, please elaborate further on the GAM. There is not much description in the text. Are the NAEI estimates here generated from the flux footprint?
The manuscript text has been updated to clarify that the time-of-day scaled NAEI estimates are calculated from the flux footprint (lines 352-355). The GAMs fitted to each dataset are also discussed further, with the shaded areas representing the 95% confidence interval of the GAM fit.

L410: The description of Figure 8 is unclear, particularly the phrase "median average". Please elaborate on what is being depicted.
The figure caption has been updated to clarify which median average is being shown. The median average is taken across all individual flight transect projections using the ERF approach. Figure 8 has also been updated to specify that NOx emission rates are being shown. Using the method outlined in Metzger et al. (2013), the variability in the BRT model performance was assessed by omitting one flight leg at a time and using the incomplete model to predict the omitted flight leg. The median difference between the complete and incomplete model predictions was 13.7%, which is comparable to the model differences observed for sensible and latent heat flux (11-18%) using the same technique. Figure S7 has been added to the supplementary material, showing the predicted emission map for each flight leg with the measured NOx fluxes overlaid. The majority of flight leg projections scaled Central London emissions comparably to the measured fluxes. The projections also captured key features in the flux observations, such as major road networks and densely populated areas.
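The leave-one-leg-out assessment can be sketched generically as below. This is not the BRT implementation used in the study: the `fit` and `predict` callables are placeholders, and the example uses a deliberately trivial mean-flux model with made-up numbers purely to show the bookkeeping.

```python
from statistics import median


def leave_one_leg_out(legs, fit, predict):
    """Median percentage difference between complete and incomplete models.

    legs: dict of leg id -> list of (driver, flux) pairs
    fit: callable taking a list of (driver, flux) pairs, returning a model
    predict: callable taking (model, driver), returning a predicted flux
    """
    all_points = [p for points in legs.values() for p in points]
    full_model = fit(all_points)
    differences = []
    for leg_id, points in legs.items():
        # Train on every leg except the one being predicted.
        training = [p for other, pts in legs.items() if other != leg_id for p in pts]
        reduced_model = fit(training)
        for driver, _ in points:
            full = predict(full_model, driver)  # assumed non-zero
            reduced = predict(reduced_model, driver)
            differences.append(abs(reduced - full) / abs(full) * 100.0)
    return median(differences)


# Trivial stand-in model: predict the mean training flux everywhere.
def fit_mean(points):
    return sum(flux for _, flux in points) / len(points)


def predict_mean(model, driver):
    return model


example_legs = {"leg_a": [(1.0, 10.0), (2.0, 10.0)], "leg_b": [(1.0, 20.0), (2.0, 20.0)]}
median_difference = leave_one_leg_out(example_legs, fit_mean, predict_mean)
```

Any regression model could be substituted for the stand-in by supplying matching `fit` and `predict` functions.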

L431: It would be helpful to show a figure depicting the comparisons between ERF-reproduced and measured fluxes in the supplement to get a visual idea of how robust the ERF technique is.
Technical Corrections:

Figure 1: Lat/lon coordinates for each transect do not align with those listed in Table 1.
Both Table 1 and Figure 1 have been updated to use the same decimal degree coordinate system.

Table 1: Add typical transect altitude or range of altitudes. Add time of day each transect was sampled.
Table 1 has been updated to include altitude information for each flight.