Comment on acp-2021-870

This paper evaluates the ability of the U.S. Navy’s NAAPS-RA model to reproduce observations of extinction profiles and AOT in the vicinity of the Philippines during a recent field campaign. While both anthropogenic and biomass burning aerosols are transported into this region, the high frequency of cloudiness makes using satellite measurements of AOD problematic. The low frequency of AOD retrievals also means that NAAPS-RA has little information with which to constrain the simulated aerosols in this region via data assimilation. Therefore, it is useful to take advantage of airborne HSRL-2 measurements to evaluate the performance of the model, in addition to other measurements that can be used to understand the factors affecting extinction profiles. Evaluation of extinction is complicated, since errors can arise from many sources, as pointed out by the authors. In general, the paper is well written, but the discussion of the evaluation methodology and the interpretation of the results need improvement.

1) The introduction provides some motivation for NAAPS-RA, which includes aerosol-cloud relationships. However, it is not clear how these can be addressed with an offline model. The authors do not describe how the model results are used to compute CCN and/or aerosol-cloud-radiation interactions. The paper focuses on aerosol optical properties, which are important for aerosol-radiation calculations and also affect clouds indirectly by modifying surface heating and heating profiles. It does not, however, make a connection between extinction and aerosol-cloud interactions. The paper does a nice job of quantifying the errors in simulated extinction and AOT, but the evaluation seems disconnected from the introductory material. In addition, are the errors significant for other potential uses? It is not clear "how good is good enough". It would be useful to describe in the conclusion/summary the implications of this work for NAAPS-RA applications.
2) Apparently, other sources of uncertainty in simulated extinction, such as aerosol mass, are left to a subsequent study. I am torn about this approach. While adding that component to this paper would increase its length and complexity, the paper seems incomplete without it. As the paper stands, its main outcome is a straightforward evaluation of the simulated aerosol optical properties (extinction and AOT) from NAAPS-RA for CAMP²Ex. Performing sensitivity calculations with RH only partially explores the possible sources of uncertainty. The reader is therefore left with the perception that many uncertainties and conundrums (e.g., mass concentrations higher than observed while extinction is too low) remain unresolved. There is some comparison with observed aerosol mass, but the analysis in the present study appears insufficient, given the many statements sprinkled through the results sections about material left to future studies. If the authors wish to leave the manuscript in its present form, they should more clearly articulate the purpose of the present study versus that of the subsequent study.
There are also some aspects of the model assumptions that are not discussed. For example, the mass fractions shown in the supplemental material include relatively large sea-salt fractions far above the surface. Does this seem realistic? Aerosol measurements from many other aircraft campaigns could be used to confirm, at least in general, whether this is a reasonable assumption. The authors mention in a couple of places that data assimilation may introduce some uncertainty in the mass fractions.
3) The measurement and modeling comparison is rather complicated. I suggest the authors revisit Sections 2.4-2.9 to explain the methodology and its possible consequences for the results as clearly as possible. Part of this is organization: Section 2.4 provides only a broad description of the strategy, which is described in more detail in Sections 2.5-2.9, and those sections should really be subsections of 2.4. The complexity of the evaluation strategy seems to arise from the flight paths and sampling strategy used. Ideally, two aircraft are needed: one to obtain the HSRL-2 extinction profiles and another to sample aerosol mass coincident with the HSRL-2 measurements, as was done during TCAP (Berg et al., JGR, 2016).
4) One of the conclusions of the paper is that errors in simulated RH were not the primary source of uncertainty in simulated extinction. This is not backed up with sufficient evidence. Other sources of error seem to be left to another study, and the assumptions used in the aerosol water uptake calculations (which may be minimizing the sensitivity to RH) are not fully explored.

Specific Comments:
Line 124: Reid et al. (2021) has not been submitted. Papers that are in preparation or at the submission stage should not be cited. If the paper is published by the time this paper is accepted, it could be included. There seems to be sufficient discussion of the field campaign measurements used in this study.
Line 151: Please include the time period of the campaign here. It is included later in lines 159-160, but it would be useful to include it up front.
Lines 151-159: The objectives of the campaign could apply to many regions of the world. What is needed here are specifics on the value of data collected around the Philippines. Perhaps some of the material in lines 168-170 could be moved here to better motivate the campaign for the reader.
Lines 175-177: MLH and aerosol classifications are products derived from the HSRL-2 measurements, but here they are placed at the same level as the primary measurements (the instrument also measures backscatter, which is not mentioned). Most readers will not know this, so presenting all of this information at the same level is somewhat misleading. Citations for HSRL-2 and the derived products should also be included here.
Lines 173-187: Were black carbon measurements available, e.g., from an SP2? Such measurements would seem useful for identifying anthropogenic and BB plumes, as well as the aging of BB plumes (via the coating of BC particles).
Line 186: Perhaps change "particles" to "cloud droplets and aerosol". Just saying particles might imply to readers that only aerosols are measured.
Line 211: Does NAVGEM include feedbacks between aerosols and meteorology via radiation and clouds? In areas of high aerosol concentrations, such as the biomass burning plumes examined in this study, aerosols can affect the meteorology which would then be used to drive NAAPS-RA.
Line 219: Does the phrase "species-dependent mass scattering" mean that the model treats aerosols as an external mixture? If so, it might be useful to say so explicitly, so that the reader better understands the assumptions in NAAPS-RA. Atmospheric particles are often a complex mixture of different species. Some models treat aerosol optical properties as internal mixtures, which is the other extreme. In reality, aerosol populations are often complex, being more externally mixed in some regions and more internally mixed in others.
Lines 237-239: Aerosol water significantly affects extinction in regions of relatively high RH, but it is not included as a species in NAAPS-RA. Instead, it seems that aerosol water is diagnosed when computing extinction. My concern is how MODIS AOT is used for assimilation. If the data assimilation process adjusts the four species to be close to the observed AOT while neglecting aerosol water, then the NAAPS-RA AOT should always exceed the observed AOT once water uptake is accounted for. I am probably missing something important here that is not described.
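To make the concern concrete, here is a simplified column-integrated sketch. The notation is introduced here for illustration and is not taken from the manuscript: M_i is the column mass loading of species i, β_i its dry mass extinction efficiency, and f_i(RH) ≥ 1 its hygroscopic growth factor for extinction, with RH treated as a single column value for simplicity.

```latex
\tau_{\mathrm{dry}} = \sum_{i=1}^{4} \beta_i M_i \approx \tau_{\mathrm{MODIS}}
\quad\Longrightarrow\quad
\tau_{\mathrm{model}} = \sum_{i=1}^{4} \beta_i \, f_i(\overline{\mathrm{RH}}) \, M_i
\;\ge\; \sum_{i=1}^{4} \beta_i M_i \approx \tau_{\mathrm{MODIS}}
```

Equality holds only when f_i = 1 for every species. If the assimilation instead matches the humidified AOT, the inequality does not apply, and a sentence stating which case holds would resolve the question.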
Lines 338-340: It would seem more appropriate to average the 15-m HSRL-2 range gates within each model vertical grid cell rather than take only the gates closest to the mid-point of the model layer. I assume the model value represents an average over its grid cell, so a coarse vertical grid spacing will not resolve large gradients in extinction. Averaging the HSRL-2 data would therefore seem a better approach, although it may not change the overall conclusions of this study. The same applies to the dropsonde comparison described starting on line 344.
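A minimal sketch of the two sampling choices, assuming HSRL-2 gates on a regular 15-m grid and model layers defined by their interface heights. The function names and the synthetic profile are hypothetical and for illustration only; they are not the authors' processing code.

```python
import numpy as np


def layer_mean(gate_alt, gate_ext, layer_edges):
    """Mean extinction of all valid lidar gates inside each model layer.

    gate_alt    : gate mid-point altitudes (m)
    gate_ext    : extinction at each gate (NaN where missing)
    layer_edges : model layer interface heights (m), ascending, length n_layers + 1
    Returns an array of length n_layers (NaN where a layer has no valid gates).
    """
    alt = np.asarray(gate_alt, dtype=float)
    ext = np.asarray(gate_ext, dtype=float)
    edges = np.asarray(layer_edges, dtype=float)
    out = np.full(edges.size - 1, np.nan)
    for k in range(edges.size - 1):
        inside = (alt >= edges[k]) & (alt < edges[k + 1]) & np.isfinite(ext)
        if inside.any():
            out[k] = ext[inside].mean()
    return out


def nearest_gate(gate_alt, gate_ext, layer_edges):
    """Extinction at the single gate closest to each layer mid-point."""
    alt = np.asarray(gate_alt, dtype=float)
    ext = np.asarray(gate_ext, dtype=float)
    edges = np.asarray(layer_edges, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    idx = np.abs(alt[None, :] - mids[:, None]).argmin(axis=1)
    return ext[idx]


# Synthetic profile: 15-m gates, background 0.05 km^-1, plume of 0.5 km^-1 at 300-400 m
alt = np.arange(7.5, 1000.0, 15.0)
ext = np.where((alt >= 300.0) & (alt < 400.0), 0.5, 0.05)
edges = [0.0, 500.0, 1000.0]
print(layer_mean(alt, ext, edges))    # first layer ~0.145 (plume included), second 0.05
print(nearest_gate(alt, ext, edges))  # both layers 0.05 (plume missed)
```

In this synthetic case, the nearest-gate value misses a 100-m thick plume inside a 500-m model layer, whereas the layer mean captures it. The same idea could be applied horizontally, averaging all HSRL-2 profiles within the 1 x 1 deg box rather than selecting one.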
Line 254: Based on Section 2.4, not using the other HSRL-2 wavelengths seems to be a missed opportunity, since the evaluation focuses only on 550 nm. Is that because NAAPS-RA does not account for the aerosol size distribution? Atmospheric models that account for aerosols in their radiation calculations simulate their effects at all wavelengths, not just one. Or is AOD at 550 nm the primary purpose of NAAPS-RA? Some discussion of why the evaluation focuses on only one wavelength would be useful.
Line 366: I assume that only FCDP measurements outside of clouds are used.
Line 397: This may be an overly broad statement. There is a wide range of aerosol models, and the degree to which aerosols are parameterized varies. NAAPS-RA has a simple treatment, since it predicts only bulk aerosol for four species. Other aerosol models are more explicit, predicting size-resolved mass and number for a larger number of species.
Line 401: I understand the motivation to connect the evaluation with CCN; however, the use of "representative" is misleading in this context. The authors are evaluating extinction, but CCN depends on aerosol size (which is neglected in NAAPS-RA) and hygroscopicity (via the relative mass contributions of each species). As the authors note elsewhere in the manuscript, extinction and AOT can have compensating errors, so relating extinction alone to CCN is problematic.
Lines 400-403: This text is confusing. The authors first state that the performance evaluation focuses on the ML (even though they earlier note three layers used for the evaluation), and then say that evaluation of performance in the PBL is the subject of another paper. Is there a difference between the ML and the PBL, since these terms are often used interchangeably? Is this a second area, in addition to aerosol mass, that is left to another study?
Line 405: The authors mention one HSRL-2 profile is used in the 1 x 1 deg box. Why not horizontally average the extinction profiles within the 1 x 1 deg box? The authors do not show any time-height profiles of HSRL-2 extinction to know whether there are large spatial gradients or not.
Lines 406-407: I am confused by this statement. It sounds like the nephelometer, AMS, and FCDP measurements were usually not available in the 1 x 1 deg box. Is this because the box is chosen based on the dropsonde location, and dropsondes are released at high altitude? If so, are data from lower altitudes (which may fall in a different 1 x 1 deg box) used for the comparisons? This is obviously not ideal, but one has to work with the aircraft measurements available. Ideally, the aircraft would also obtain an aerosol profile in the same column as the dropsonde, but I assume this rarely happened. It would be useful to reiterate the assumptions here, since the comparison methodology is becoming quite complex at this point.
Line 465: The way this sentence is phrased implies that the observed MLH is biased, but I assume the authors compared the model MLH with the MLH from each of the three dropsonde methods and from HSRL-2, and that the bias refers to the model. If this is not a comparison with the model, what does the bias in Table S2 refer to?
Lines 493-499: This paragraph should also note that, while the correlation is reasonable, there is still considerable scatter in Fig. 2, with some differences as large as two orders of magnitude.
Line 514: Does NAAPS-RA include wet scavenging? I do not recall this being mentioned in the model description. A 1 x 1 deg grid spacing is coarse, but I assume the parent meteorological model simulates clouds and precipitation in some way that could be used for wet scavenging.
Line 523: Doesn't Table S2 contain the model bias in MLH?
Lines 523-529: Another explanation might be the assimilation of MODIS AOD and how it is handled in the vertical. Have past evaluations of NAAPS-RA provided any guidance on this? Although there are not many retrievals in this area, aerosols from other regions (which would be subject to assimilation) would presumably be advected over the Philippines.
Line 537: Does this statement refer to vertical variability between 145 and 500 m, or to horizontal variability of extinction in that layer? It is not clear.
Line 552: Change "Biases" to "Flight-averaged biases". Figure 2 has the biases for each profile, but it looks like Fig. 3 averages them for each flight.
Lines 570-578: I wonder whether Figures 4 and 5 could be combined in some way to highlight the differences, which are currently difficult to see. Is it important to differentiate the flights in these plots? If not, the two figures could be combined, showing the AOTs simulated with model and observed RH in different colors. The original figures could then be moved to the supplemental information.
Line 576: I agree that other parameters in the model might be contributing to uncertainties in the simulated AOT; however, I would have expected changing RH to have a much larger effect. Figure 4 indicates that in some cases observed RH was 20% higher than simulated below 500 m, and the differences could be larger at higher altitudes. Since NAAPS-RA uses simplified techniques to represent aerosols, how good is its method of computing aerosol water uptake? Aerosol box models with complex thermodynamic representations are available and could be used to estimate aerosol water uptake and compare the results with the methodology in NAAPS-RA.
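As a rough illustration of why a weak RH sensitivity is surprising, consider a widely used single-parameter growth law for extinction, f(RH) = (1 - RH)^(-gamma). This form is an assumption for illustration only; I do not know whether NAAPS-RA uses it, and the gamma values below are arbitrary rather than fitted to CAMP²Ex data.

```python
def growth_factor(rh, gamma):
    """Assumed extinction enhancement f(RH) = (1 - RH)**(-gamma), RH as a fraction in [0, 1)."""
    if not 0.0 <= rh < 1.0:
        raise ValueError("rh must be a fraction in [0, 1)")
    return (1.0 - rh) ** (-gamma)


def extinction_ratio(rh_obs, rh_model, gamma):
    """Humidified extinction at observed RH divided by that at simulated RH,
    for the same dry aerosol (so the dry terms cancel)."""
    return growth_factor(rh_obs, gamma) / growth_factor(rh_model, gamma)


# A 20-point RH difference near saturation: observed 90 % vs. simulated 70 %
for gamma in (0.2, 0.5):
    print(gamma, round(extinction_ratio(0.90, 0.70, gamma), 2))  # about 1.25 and 1.73
```

Under this assumed law, the ratio is 3**gamma for this RH pair, so even a modest exponent changes humidified extinction by roughly 25-75 %. A near-insensitivity to RH would then point to the water-uptake treatment itself as worth examining.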
Lines 620-627: I appreciate this discussion of the simulated mass concentrations in relation to observations. In line 626 the authors say that mass concentrations need to be increased, but the simulated fine mode mass is similar to that observed and the coarse mode mass is higher than observed. So the authors are saying they would have to introduce another error to fix the current error in extinction. There is a mystery here, and it seems that another study would be needed to understand the true source(s) of error in the extinction calculation. I wonder whether the uncertainties stem from the assumptions used in the simple treatment of hygroscopicity and/or aerosol water.
Line 657: The authors mention the possibility of aerosol mass increasing with height. Why not use the AMS measurements to confirm this? Are some of the contradictions (i.e., simulated aerosol mass larger than observed while simulated extinction is slightly lower than observed) due to the different boxes (Fig. S2) in which aerosol mass and extinction profiles are compared? With smoke plumes, there could be large spatial gradients.
Line 677: Again, it would seem that the AMS could be used to evaluate the ABF and smoke species in NAAPS-RA. At the end of the paragraph, the authors state that more work is needed, so I presume this will be the subject of a future paper?
Lines 781-782: I felt that the authors presented only preliminary speculation about what the possible errors may be. There were no concrete conclusions regarding the specific errors in specific cases, so this paper provides no tangible understanding of them.
Line 802: The conclusions are probably not applicable to the entire modeling community. It seems that the uncertainties are largely applicable to NAAPS-RA, and perhaps to other similar classes of aerosol models such as GOCART.
Lines 805-807: Are there other studies evaluating NAAPS-RA in other locations with other field campaign observations that might have urban and biomass burning sources? If so, it would be useful to compare the present work with results from those locales.