Comment on acp-2020-1086

General
The paper by Wang et al. focuses on flux measurement uncertainties and sulfur content of ship emissions. The measurements were made with the gradient method for selected trace gas pollutants and supplemented by eddy covariance measurements of CO2 from a 9 m mast located at a coastal site and on a research vessel 2 km SSW out to sea. Quantifying emission rates from moving ships is certainly not trivial, and the number of challenges encountered by the authors is impressive. While the paper reflects a large measurement effort, at least in this version I have reservations about the data interpretation and about whether the presented approach can fully capture ship emissions. The major result of the paper is the set of measured FSCs from the ships, none of which exceeded the EC regulation limit. Overall, I found this paper interesting for its focus on ship emissions, but there are inconsistencies in the data, and the paper has high potential for further analysis and a more coherent presentation of the results.

Major comments
The major question is how well the assumptions of the gradient (GR) and eddy covariance (EC) methods held for this heterogeneous coastal site. If a large short-term episode (e.g. in SO2 or CO2) occupies only a fraction of the flux integration period, the spike would most likely make the period nonstationary regardless of whether the other micrometeorological variables were stationary. The stationarity test should therefore be conducted on each flux tracer, including the CO2 data. The gradient method also requires accurate measurements at two different heights. If the systematic offset between the instruments (SI Figure S1) was not corrected for, it would lead to large errors in the calculated vertical GR fluxes. It is unclear how the data in Fig. S1 were used to correct or cross-calibrate the instruments when the correlation slope differs from 1. In the GR method, the authors rely on the assumption that the eddy diffusivity for heat transfer is the same as that for gas mass transfer (e.g. L. 81). This could lead to large uncertainties, which should be calculated independently for each chemical species. A comparison of GR and EC sensible heat fluxes would have been relatively easy and a good starting point for comparing the two methods. It would have been great to see a more quantitative comparison of the GR and EC methods for CO2 (and for heat). However, from Fig. 9 it is clear that the CO2 fluxes agreed rather poorly: between 08/28 12 PM and 08/29 12 PM, for example, the gradient data are all negative, while the EC data are scattered over a broader range of mostly positive values, often changing sign. The relative difference between the methods for most of the measured period therefore largely exceeds the uncertainties stated in the abstract (25-36 % and 30-60 % for the GR and EC methods, respectively). For this reason, I find the identical median value for the GR and EC CO2 fluxes reported in Table 1 highly suspicious.
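To illustrate the per-tracer stationarity check requested above, here is a minimal sketch of one widely used form of the test: the covariance over the full integration period is compared with the mean of the covariances over shorter sub-intervals. The function names, the choice of six sub-intervals and the 30 % threshold are assumptions made for illustration; the manuscript does not state which test or settings the authors used.

```python
import numpy as np

def stationarity_ratio(w, c, n_segments=6):
    """Relative difference between the mean sub-interval covariance
    and the covariance over the full period (0 = perfectly stationary)."""
    w = np.asarray(w, dtype=float)
    c = np.asarray(c, dtype=float)
    # Trim so the series divides evenly into n_segments sub-intervals.
    n = (len(w) // n_segments) * n_segments
    w, c = w[:n], c[:n]
    cov_full = np.mean((w - w.mean()) * (c - c.mean()))
    # Covariance within each sub-interval, using the sub-interval means.
    seg_covs = [
        np.mean((ws - ws.mean()) * (cs - cs.mean()))
        for ws, cs in zip(np.split(w, n_segments), np.split(c, n_segments))
    ]
    return abs((np.mean(seg_covs) - cov_full) / cov_full)

def is_stationary(w, c, threshold=0.3, n_segments=6):
    """Assumed acceptance criterion: relative difference below the threshold."""
    return stationarity_ratio(w, c, n_segments) < threshold
```

Applied separately to w paired with SO2, CO2 and each other tracer, such a check would show directly whether a short plume within the integration period breaks stationarity for that species.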
I agree with the comment of the other referee that a scatter plot would have shown more clearly how well the two methods agreed. If the agreement is poor for CO2, the question is why, and whether the gradient flux method was valid for SO2 and the other reported trace gases. The flux footprint contribution does not seem to be discussed. The data could give a completely different picture when the ship was outside the footprint (deposition fluxes to the site expected) compared with when the ship was inside the footprint (emission fluxes expected). It is therefore challenging to attribute any enhancement to the ships without knowing what the footprint was and how it was changing. Given a moving point source within a footprint that is likely changing (and not the same at the two heights), I am not convinced that the chosen approach was optimal for quantifying emission rates from ships. There are other methods, such as wavelet analysis, which do not depend on stationarity criteria and could be more appropriate for measuring intermittent or short-term emission episodes (e.g. Steiner et al., 2011; Misztal et al., 2014). I could not find this in the main text or the SI, so I am curious whether the data were subjected to coordinate rotation and how close to zero the average vertical wind speed w was. A small tilt of the sonic anemometer could greatly skew the flux data.
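As a concrete version of the coordinate rotation question, the sketch below applies a standard double rotation to one averaging period: the first rotation aligns the x-axis with the mean horizontal wind (mean v becomes zero), and the second tilts the frame so that the mean vertical wind becomes zero. The function name and return values are illustrative assumptions; whether the authors used double rotation, a planar fit or no rotation is not stated in the manuscript.

```python
import numpy as np

def double_rotation(u, v, w):
    """Rotate sonic wind components so that mean(v) = 0 and mean(w) = 0.
    Returns the rotated (u, v, w) and the tilt angle in degrees."""
    u, v, w = (np.asarray(x, dtype=float) for x in (u, v, w))
    # First rotation (yaw): align x-axis with the mean horizontal wind.
    theta = np.arctan2(v.mean(), u.mean())
    u1 = u * np.cos(theta) + v * np.sin(theta)
    v1 = -u * np.sin(theta) + v * np.cos(theta)
    # Second rotation (pitch): remove the mean vertical wind.
    phi = np.arctan2(w.mean(), u1.mean())
    u2 = u1 * np.cos(phi) + w * np.sin(phi)
    w2 = -u1 * np.sin(phi) + w * np.cos(phi)
    return u2, v1, w2, np.degrees(phi)
```

Reporting the unrotated mean w and the resulting tilt angle for each period would let the reader judge how much a sensor tilt or flow distortion could have affected the fluxes.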
There is no mention of how the lag time was derived for each integration period, or whether a constant value was used. I am particularly concerned about a potentially incorrect lag time because the CO2 flux was changing sign from one period to the next, for example from 28 to 30 Aug (Fig. 9a). It would be great to see what the peak in the covariance function looked like and whether the lag time was stable. The data quality control is not presented clearly. It would be great to know what criteria were used and which data were actually rejected. For instance, Figures 8 and 9 show data for periods when M-O theory was not fulfilled. If it is important to show these low-quality data, they could be shown in grey so that it is clear they were rejected and they do not distract from the potentially good data. The conclusions lack the main take-home messages. Practically the entire section is spent emphasizing high uncertainties and challenges rather than pointing out the main results or findings. Was the goal to say that the methods did not work at all, or that they might work with some improvements? Including the major findings based on the valid data (FSCs?) and further analysis of the remaining data (especially NOx) could significantly improve the manuscript.
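On the lag-time point raised above, a minimal sketch of the usual approach is to compute the cross-covariance between w and the scalar over a window of candidate lags and take the lag at which its absolute value peaks. The function names and the search window are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def cross_covariance(w, c, max_lag):
    """Covariance of w(t) with c(t + lag) for lags from -max_lag to +max_lag samples."""
    w = np.asarray(w, dtype=float) - np.mean(w)
    c = np.asarray(c, dtype=float) - np.mean(c)
    n = len(w)
    lags = np.arange(-max_lag, max_lag + 1)
    cov = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            cov[i] = np.mean(w[:n - lag] * c[lag:])
        else:
            cov[i] = np.mean(w[-lag:] * c[:n + lag])
    return lags, cov

def find_lag(w, c, max_lag):
    """Lag (in samples) at which the absolute cross-covariance is largest."""
    lags, cov = cross_covariance(w, c, max_lag)
    return lags[np.argmax(np.abs(cov))]
```

Plotting the covariance against lag for a few representative periods, and the selected lag against time, would show whether a clear peak exists and whether the lag was stable across the sign changes in Fig. 9a.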

Specific comments
What inlet was used for sampling ultrafine particles?
SI Figure S2 shows how the uncertainty increases closer to the detection limit, which is a nice demonstration. However, it is unclear how the data below the detection limit were treated. I suggest consulting Helsel (1990).
What message was the multipanel Figure S3 meant to convey? Is it suggested that the absolute uncertainties exceeded almost all the data values? It would perhaps be clearer to show the relative uncertainties as shaded areas.
In the uncertainty budget, I would suggest that the authors describe the systematic and random errors as well as the treatment of data below the detection limit. It is unclear whether the data have been corrected for the systematic error.
Eq. 10: the value of the 0.232 multiplier seems somewhat off when using the emission factor from Petzold et al. Was a different EF value used instead?
Figure 1: poor resolution; I could not read the text.
Figure 3, panel a): low-resolution CFD figure; I could not read the legend. It would be useful to add to the text how exactly CFD was used to correct the data and whether it was a constant or a time-dependent correction. Did the CFD calculation correct only for the horizontal wind speed?
Figure 5: these trajectories show long-range transport. Can they be zoomed in to the measurement site?
Figure S1: make the x and y axes consistent and show the 1:1 line. Were these data used to correct the instruments? How?
Table S1: I am surprised that the uncertainty is shown for the nonstationary periods, as I do not think it is meaningful. The nonstationary periods should have been rejected.
What was the frequency distribution of the CO2 fluxes (FFT spectrum)? As the flux data collection was conducted at only 1 Hz (L. 275), were the data corrected for high-frequency losses?
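For the last question, a minimal sketch of an FFT-based cospectrum is given below. It is normalized so that its sum equals the covariance of w and c, which makes it straightforward to see how much of the flux sits near the upper frequency limit. With 1 Hz sampling, nothing above the Nyquist frequency of 0.5 Hz is resolved, so any flux carried by faster eddies is lost unless corrected. The function name and normalization are assumptions for illustration.

```python
import numpy as np

def cospectrum(w, c, fs=1.0):
    """One-sided w-c cospectrum whose sum equals the covariance of w and c.
    fs is the sampling frequency in Hz."""
    w = np.asarray(w, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(w)
    W = np.fft.rfft(w - w.mean())
    C = np.fft.rfft(c - c.mean())
    co = (W * np.conj(C)).real / n**2
    # Fold in the negative frequencies: double every bin except the
    # mean (index 0) and, for even n, the Nyquist bin.
    co[1:] *= 2
    if n % 2 == 0:
        co[-1] /= 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, co
```

A cospectrum that has not decayed towards zero as it approaches 0.5 Hz would indicate that a non-negligible part of the flux was missed at this sampling rate.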