Retrieval of Aerosol Combined with Assimilated Forecast

Abstract. We developed a new aerosol retrieval algorithm combining a numerical aerosol forecast. In the retrieval algorithm, the short-term forecast from an aerosol data assimilation system was used for a priori estimate instead of spatially and temporally constant values. This method was demonstrated using the Advanced Himawari Imager onboard the Japan Meteorological Agency’s geostationary satellite Himawari-8, and the results showed spatially finer distributions than the model forecast and less noisy distributions than the original algorithm. We validated the new algorithm using ground observation data and found that the aerosol parameters detectable by satellite sensors were retrieved more accurately than a priori model forecast by adding satellite information. Moreover, the retrieval accuracy was improved by using the model forecast as compared with using constant a priori estimates. By using the assimilated forecast for a priori estimate, information from previous observations can be propagated to future retrievals, thereby leading to better retrieval accuracy. Observational information from the satellite and aerosol transport by the model is incorporated cyclically to effectively estimate the optimum field of aerosol.



Introduction
Airborne aerosols have impacts on air quality and human health. Aerosols also influence the energy budget of the earth's climate system through the scattering and absorption of solar radiation. The fifth assessment report of the Intergovernmental Panel on Climate Change (IPCC 2014) stated that radiative forcing of the total aerosol effect in the atmosphere, including 25 cloud adjustments due to aerosols, is -0.9 W m −2 . The report also showed, however, that the range of uncertainties in radiative forcing remains large (−1.9 W m −2 to −0.1 W m −2 ). In order to estimate the impact of aerosols on climate systems, it is important to investigate the global distribution of aerosols using satellite observations, by taking advantage of the ability to observe the globe continuously, in addition to aerosol ground observation. 30 https://doi.org/10.5194/acp-2020-356 Preprint. Discussion started: 27 May 2020 c Author(s) 2020. CC BY 4.0 License.
Various aerosol retrieval algorithms from satellite observations have been developed. However, not all aerosol properties can be accurately detected by satellite sensors, as there are more unknown aerosol parameters (e.g., particle size distributions, vertical density distribution, shape, refractive index) than the actual information obtained by satellite sensors. General studies use some assumptions or information about aerosol parameters, and limit the retrieval aerosol parameters. For example, Higurashi and Nakajima (1999), and Fukuda et al. (2013) assumed fixed complex refractive indices (1.5 -0.005i in 35 Higurashi and Nakjima (1999), and 1.503 -7.16 × 10 -8 i for small mode particles and 1.445 -1.00 × 10 -8 i for coarse mode particles based on sulfate and sea spray models, respectively, in Fukuda et al. (2013)), and retrieved the aerosol optical thickness and Ångström exponent. Some studies assumed aerosol particle models according to the location and season. For example, Kaufman et al. (1997), Remer et al. (2005), and Levy et al. (2007) estimated the aerosol optical thickness and fine mode fraction over a dark target using the Moderate Resolution Imaging Spectroradiometer (MODIS), by selecting the fine-40 dominate particle models as a function of geography and season. Hsu et al. (2004) retrieved the optical thickness and type of aerosols over desert regions using blue channels (< 500 nm), where the surface reflectance was relatively low, assuming dust or a mixture of dust and smoke depending on the region and season. Jeong et al. (2016) used a priori information according to the location and season. However, these studies did not take temporal changes into account. Because it is impossible to completely fix aerosol type as a function of geography and season, unrealistic assumptions lead to one of the major causes of 45 retrieval error.
Aerosol data assimilation studies using satellite data have also been developed to obtain better initial conditions for the aerosol transport model. The aerosol data assimilation study was first developed with low Earth orbit (LEO) satellites (Benedetti et al., 2009;Saide et al., 2013;Dai et al., 2014;Rubin et al., 2015;Yumimoto et al., 2015). In recent years, 50 assimilation studies have been conducted using geostationary satellites with large spatial coverage and fine observation frequencies (Saide et al., 2014;Lee et al., 2016;Yumimoto et al., 2016;Yumimoto et al., 2018;Die et al., 2019;Jin et al., 2019).
Due to the development of such assimilation studies, the satellite data have contributed to more accurate aerosol forecasts. 55 However, no such study utilizes the assimilated model forecast of aerosol for a priori estimate of the retrieval. Since not all parameters can be accurately detected by satellite sensors, and unrealistic assumptions of aerosol parameters are a major cause of retrieval errors as mentioned above, adding the model information is expected to improve the retrieval accuracy. Therefore, in this study, we utilize the forecast of an aerosol transport model for a priori estimates of the retrieval. This allows the aerosol information in the aerosol transport model to be used for retrieval. And by using the assimilated forecast, 60 information from previous satellite observations can be propagated to future retrievals through the aerosol transport model. https://doi.org/10.5194/acp-2020-356 Preprint. Discussion started: 27 May 2020 c Author(s) 2020. CC BY 4.0 License.

Methodology
The aerosol retrieval algorithm applied in this study is based on Yoshida et al. (2018). Further, we use an aerosol forecast from the transport model that has been assimilated with previous satellite observations for a priori estimates of the retrieval.
As the retrieval algorithm of Yoshida et al. (2018) can be applied to various imaging satellite sensors, the methodology explained in this section can also be applied to various sensors. However, this study first targets the Himawari-8/AHI with its 70 assimilation system in place. The Himawari-8/AHI has six bands from the visible to near-infrared wavelength ranges, and observes the top of atmosphere radiance at a resolution of 0.5-2.0 km over Asia and Oceania at 10-min. intervals (Bessho et al., 2016). Figure 1 depicts the process of using forecast data for a priori estimates of the retrieval. In the original retrieval process, the 75 Level-2 (L2) aerosol optical thickness at 500 nm ( ), Ångström exponent between 400 and 600 nm (), and single-scattering albedo at 500 nm ( ) are retrieved using Level-1 (L1) AHI-observed radiance every 10 minutes around time T0 as per Yoshida et al. (2018). The Level-3 (L3) and at T0 are then estimated using L2 products in one hour by an hourlycombined algorithm . The hourly-combined algorithm is a method that (1) minimizes cloud contamination using the difference between aerosol and cloud spatiotemporal variability characteristics, and (2) interpolates 80 the aerosol retrievals using 1 h of data and the movement of clouds within the hour.
The L3 at T0 is then assimilated into a global aerosol transport model by the 2D-Var assimilation system (Yumimoto et al., 2018). For the aerosol transport model, we use MASINGAR (Model of Aerosol Species IN the Global AtmospheRe; Tanaka et al., 2003;Tanaka and Chiba, 2005) developed at the Meteorological Research Institute (MRI) of the Japan Meteorological 85 Agency (JMA). MASINGAR covers the major tropospheric aerosol components (i.e., black and organic carbon, mineral dust (10-size bins), sea salt (10-size bins), sulfate aerosols)) and their precursors (e.g., sulfur dioxide (SO2), dimethyl sulfide, terpenes)), and is coupled online with an atmospheric general circulation model (MRI-AGCM3; Yukimoto et al., 2012). The model's grid resolution is set to horizontal Gaussian TL479 (960 x 480 grids, about 0.375 degree) and 40 vertical layers in hybrid sigma-pressure coordinates from the ground to 0.4 hPa. The integration time step is set to 600 seconds. 90 Anthropogenic emissions of SO2, black and organic carbon are taken from the MACCity emission inventory (Granier et al., 2011). Daily biomass burning emission flux is taken from the Global Fire Assimilation System (GFAS, Kaiser et al., 2012) version 1.2 provided by the European Centre of Medium Range Forecast (ECMWF). The horizontal wind components and sea surface temperature are nudged toward the global analyses and forecasts of JMA (GANAL). The forecast from the https://doi.org/10.5194/acp-2020-356 Preprint. Discussion started: 27 May 2020 c Author(s) 2020. CC BY 4.0 License. assimilation system serves as the operational sand and dust forecasting by JMA, the aerosol property model product in the 95 JAXA Himawari Monitor (https://www.eorc.jaxa.jp/ptree/index.html), and a member of the ICAP multi-model ensemble (MME) (Xian et al., 2019). The volume concentration (then ) of each aerosol component at the next time (T1) is then forecasted using the assimilated aerosol transport model.
In the new retrieval process, we retrieved the L2 aerosol properties ( , , and from AHI-observed radiance at T1 using 100 these L4 forecasts for a priori estimates of the retrieval. In this way, the information from previous observations at T0 is used for the next aerosol retrievals at T1 through the aerosol transport model. Figure 6 compares the improved retrieval results with the original retrieval results at T1. The methodology for using the forecast as a priori estimates of the retrieval is detailed as follows: In the retrieval process, 105 the final retrieval parameters ( , , and  are calculated from the set of aerosol parameters ( , external mixing ratio of dry volume concentration for fine particles , and imaginary part of refractive index for fine mode ) defined by Yoshida et al. (2018). We retrieve the aerosol parameters ( , , and ) using an optimal estimation method (Rodgers 2000). The state vector of a set of aerosol parameters , , is derived by minimizing object function J (Eq. (1) (1) where , , is the vector of a prior estimate of , and Se and Sa are the covariance matrices of R and , respectively. The calculations of R, , and are the same as those of Yoshida et al. (2018), but we apply canonical correlation analysis to find the optimal coordinate system, and converted R, , and to the coordinate system whose 115 dimension is reduced to the number of retrieved parameters (three). In the original retrieval process, we used spatially and temporally constant values of , and Sa that are derived from climate analysis, and assumed that the non-diagonal component of covariance matrices was set to 0 (Yoshida et al., 2018).
To introduce more realistic a prior estimate and covariances into the retrieval process, we employ the forecast from the 120 aerosol assimilation system instead of the constants. The model forecast includes the total aerosol optical thickness at 500 nm and 870 nm, and the absorption aerosol optical thickness at 500 nm derived from the modeled volume concentration and extinction cross section of each aerosol component (Yumimoto et al., 2017). We assign a priori estimate as follows: The model's total aerosol optical thickness at 500 nm is used for . is derived from the ratio of total aerosol optical thickness between 500 nm and 870 nm. As the selection of , we use the model's as calculated from the total and absorption 125 aerosol optical thickness at 500 nm.
The assimilation system uses an ensemble method to calculate the background error covariance matrix (Yumimoto et al., 2018). In the method, the ensemble was collected from forecast values within ±2 hours of the targeted hour of the five previous forecasts (Fig. 2). We employ this method to define Sa. The model ensemble enables Sa to include the non-diagonal 130 component and express the error of aerosol transport. However, Sa from model ensemble may become too small when the model does not predict the aerosol event itself. For that reason, in order to estimate total , we add a model absolute error ( to the error estimated from the ensemble ( as follows: , , where , , and are the standard deviations of , , and , respectively, estimated from the ensemble. , , and are the same as those of the model absolute error. is set to or (whichever is larger) as follows: 140 , where is the Root Mean Square Error (RMSE) of the model's from ground observation (0.447 in Fig. 6 (c)), and is the standard deviation of for five years as calculated by the free run model without assimilation. The free run model's spatial resolution is around 1.2 degrees, and the standard deviation is calculated for spring (March, April and May), summer (June, July and August), autumn (September, October and November), and winter (December, January and February) (Fig.  145 3).
(0.110) is calculated from RMSE of the model's  (0.233 in Fig. 6 (f)), which is uniquely determined by . is set to 0.5 because takes a value from 0. to 1, and (which is uniquely determined by ) has no correlation with the ground observation data (Fig. 6 (i)). As the non-diagonal component of cannot currently be calculated from our limited database, we assume that the correlation from Sa and is the same, and calculate the non-diagonal component of Sa as follows: 150 155 3 Results and Discussion

Results of application to Himawari-8
We applied the methodology described in Section 2 to the Himawari-8/AHI. We retrieved , and , and then derived ω and  at 10-min. intervals from the calibrated L1 data subsampled at 0.05° using the method described in Section 2. The channels used for the retrieval are same as those used by Yoshida et al. (2018), which are channels 1 (0.46 m), 2 (0.51 m), 160 3 (0.64 m), 4 (0.86 m), and 5 (1.6 m) over land, and channels 4 and 5 over the ocean. As the number of satellite channels (two) used over the ocean is less than the number of retrieval parameters (three), not all parameters are stably retrieved by satellite data. Therefore, over the ocean, which is the least sensitive to satellite observation, is set to 0 (i.e., non-absorbing aerosol) at this time, because the aerosol over the ocean is generally less absorbing than that over land, and using the model's as over the ocean leads to a worse estimation of (not shown). After obtaining a better model in the future, we 165 will use the model's as over the ocean. Note that over land is properly retrieved from satellite data (i.e., not set to 0) using the model's as a priori estimate, since the number of satellite channels (five) used over land is greater than the number of retrieval parameters (three).
We compare the retrieval results from (b) the new algorithm using for , (c) the new algorithm using only 170 for , and (a) the original algorithm in Figs. 4 and 5. Figure 4 depicts the retrieved , , and at 0200 UTC on May 19, 2016 when aerosols originating from wildfires at a proximity to Lake Baikal in Russia reached Japan. The model's used for retrieval in Fig. 4 Fig. 4 (d). The , , and used for retrieval in Fig. 4 (b) are shown in Fig. 4 (e). The white regions denote the area where retrieval is not executed due to the presence of clouds, etc. The 2-h forecasts starting from 0000 UTC on May 19 were assimilated with L3 merged at 0300, 0600, and 0900 UTC on May 18, 175 and at 0000 UTC on May 19, and then used for a priori estimate (Fig. 4 (d)). Figure 5 is the same as Fig. 4 except for another case at 0500 UTC on May 7, 2017, when Asian dust was observed in Japan. The 5-h forecast starting from 0000 UTC on May 7 (and assimilated at 0300, 0600, and 0900 UTC on May 6, and at 0000 UTC on May 7) is used for a priori estimate ( Fig. 5 (d)). These short-term forecasts with data assimilation are considered relatively realistic compared to long-term forecasts or a free run without assimilation (Yumimoto et al., 2018). If only model ensemble error is used for Sa (Figs. 4 (c), 180 5 (c)), that is, the absolute error is not included in Sa, all retrieved parameters (especially and over land) are highly dependent on a priori estimate (Figs. 4 (d), 5 (d)). However, when using an appropriate Sa containing absolute error, the https://doi.org/10.5194/acp-2020-356 Preprint. Discussion started: 27 May 2020 c Author(s) 2020. CC BY 4.0 License.

(b) and (c) is indicated in
retrieved , , and are updated by satellite data or remain close to a priori estimate depending on the location (Figs. 4 (b), 5 (b)). Specifically, spatially finer distributions than the model forecast are retrieved for cases of both wildfire aerosol ( Fig.   4 (b)) and Asian dust (Fig. 5 (b)) due to the relatively coarser model horizontal resolution. Similar is retrieved over open 185 ocean in Fig. 4 (b) and Fig. 5 (b), and the large (i.e., small particle) and small (i.e., large particle) are successfully retrieved in areas corresponding to wildfire aerosol ( Fig. 4 (b)) and Asian dust (Fig. 5 (b)), respectively. This distribution is also expressed in the forecast model in Fig. 5 (d), but cannot be expressed sufficiently in the forecast model in Fig. 4 (d) because information about the aerosol particle size (e.g.,  is not assimilated into the model. That is, by using an appropriate Sa, both the model and satellite data are used for estimating the aerosol parameters. In addition, the local noise in 190 and is apparently reduced for this algorithm (Figs. 4 (b), 5 (b)) as compared with the original algorithm (Figs. 4 (a), 5 (a)). This will be discussed in Subsection 3.2.

Validation
We conducted a preliminary validation of our method by comparing the retrieved ,  and from the Himawari-8/AHI with those from ground observation known as the Aerosol Robotic Network (AERONET). AERONET's τ and were 195 derived from Level 2.0 quality-assured Version 3 direct sun algorithm data (Giles et al., 2019;O'Neill et al., 2003), and was derived from Version 3 AERONET inversion products (Dubovik and King, 2000a;Dubovik et al., 2000b;Dubovik et al., 2002a;Dubovik et al., 2002b;Dubovik et al., 2006;Sinyuk et al., 2007). AERNET's at 500 nm was calculated from linear interpolation of at 440 nm and 675 nm. In this study, the 60 AERONET sites on the full disk of Himawari-8 were used for the validation. We used the AERONET data averaged over 10 min. of AHI observation time. For our retrieval data, we used 200 τ,  and estimated from AHI L1 radiance data subsampled at 0.05° nearest to the AERONET sites. Initial validation was conducted for the characteristic three months (March 2018, June 2018, and February 2019). Long-term validation will be required in future studies. Figure 6 compares the retrieved ,  and from the AHI with those from AERONET. For the estimations (Fig. 6 (a), (b), 205 (c)), the root mean square error (RMSE), mean bias (MB), and correlation (0.397, -0.172, and 0.635) from this algorithm ( Fig. 6 (b)) are all better than those (0.447, -0.277, and 0.595) from the model forecast (i.e., a priori estimate) in Fig. 6 (c), which means that satellite information is very effective for the retrieval of . In addition, the RMSE and correlation (0.397 and 0.635) in Fig. 6 (b) are better than those (0.533 and 0.615) without the forecast model ( Fig. 6 (a)), which means that the model information is also effective and the improved algorithm shows better performance than the original algorithm. The 210 MB (-0.172) in Fig. 6 (b) is worse than that (-0.011) in Fig. 6 (a), probably because the large outlier in Fig. 6 (a) is improved in Fig. 6 (b). Figure 7 shows an example of the retrieval results of the outlier (red asterisks in Fig. 6 (a), (b), and (c)), and depicts that the outlier of the original algorithm ( Fig. 7(a) For the estimations (Fig. 6 (d), (e), (f)), large variance in the original method is considerably reduced by this method. The RMSE and correlation (0.253 and 0.505) from this algorithm (Fig. 6 (e)) are much better than those (0.461 and 0.225) from the original algorithm without the forecast model (Fig. 6 (d)), which indicates that the new algorithm could improve the precision of estimations by adding more accurate (RMSE of 0.233) information from the model. In addition, the MB (-0.086) from this algorithm (Fig. 6 (e)) is better than that (-0.117) from the model forecast ( Fig. 6 (f)), due to the 220 improvement of negative bias in the large in the model forecast. Thus, the new  can be retrieved with good accuracy by utilizing the relatively accurate model's  and correcting the biasby using the satellite data.
For the estimations (Fig. 6 (g), (h), (i)), the RMSE and correlation (0.038 and 0.519) from this algorithm (Fig. 6 (h)) are better than those (0.053 and -0.009) from the model forecast ( Fig. 6 (i)), which indicates the effectiveness of satellite 225 information for retrieval. In addition, while statistic scores (i.e., RMSD, correlation, MB) show little modification, this algorithm improved the slope and intercept of the regression line by introducing the model forecast. This is probably because the model's is not very consistent with AERONET (RMSE of 0.053), but less biased (-0.001 of MB) due to the possibility that the model's , whose determinants are complex (e.g., different for the same type of aerosol), generally reflects reality, but not enough in individual cases. Note that the current system assimilates only total (total amount of aerosols), and 230 information about the fraction of fine and absorbing aerosols from the retrieval is not fed back to the model forecast. The improved retrieval accuracy of can be expected if the model's becomes more realistic in the future, such as by assimilating the satellite's to the model. Considering the validation results of ,  and , this new algorithm effectively improved the retrieval accuracy using information from both the model and the satellite by setting appropriate Sa and Se.

Worst-case scenario 235
We have shown that the new retrieval algorithm using the forecast of an aerosol transport model improves the retrieval accuracy. However, in order to use this algorithm as an operational system, the effects of the model forecast (a priori estimate) that deviate from reality must be examined, because the model forecast may miss an aerosol event. Therefore, we conducted a sensitivity test to investigate the impact on the retrieval results of using unrealistic forecast as a priori estimate. Figure 8 shows the retrieval results on the same day as in Fig. 4, except for using the forecast on another day (April 27, 240 2018) as a priori estimate of the retrieval (Fig. 8 (d)). If only is used as Sa (Fig. 8 (c)), all parameters (especially and ) are retrieved unrealistically by being dependent on the unrealistic a priori estimate. However, when using an appropriate Sa (Eq. (2)), the retrieved parameters are well-updated by satellite data with less dependence on unrealistic a priori estimate (Fig. 7 (b)). Even in such an extremely worst-case scenario, this new algorithm is apparently not significantly worse than the current algorithm, especially where the model forecast is missing an aerosol event, which may occur in the 245 model forecast for natural aerosols (e.g., mineral dust and smoke from biomass burning).

Summary
We developed a new aerosol retrieval algorithm combining a numerical aerosol forecast. In the retrieval algorithm, the shortterm forecast from an aerosol data assimilation system was used for a priori estimate instead of spatially and temporally constant values. This is the first study that utilizes the assimilated model forecast of aerosol for a priori estimate of the 250 retrieval. We applied this new algorithm to the Himawari-8/AHI and confirmed that the aerosol parameters detectable by satellite sensors were retrieved more accurately (RMSE of 0.397 for and 0.038 for ) than a priori model forecast (RMSE of 0.447 for and 0.053 for ) by adding satellite information. Moreover, the retrieval accuracy was improved (RMSE of 0.397 for and 0.253 for ) by using the model forecast as compared with using constant a priori estimates (RMSE of 0.533 for and 0.461 for ). As a result, retrieval with high accuracy can be performed by effectively using both model and 255 satellite information depending on each covariance. By using the assimilated forecast for a priori estimate, information from previous observations can be propagated to future retrievals, thereby leading to better retrieval accuracy. In this way, satellite observation and model simulation are used synergistically to continuously estimate the optimum field of aerosol. In the future, by applying this methodology to other polar orbit sensors, the observation of geostationary satellites sensors can be utilized for the polar orbit sensors, leading to the combined use of geostationary and polar orbit sensors. 260

Data availability
Himawari-8/AHI aerosol data are available from JAXA Himawari Monitor site: https://www.eorc.jaxa.jp/ptree/index.html. The retrieved data in this study can be requested directly from the lead author (mayum@restec.or.jp).

Author contributions 265
MY developed retrieval code and analyzed the data with significant conceptual input from KY and critical technical support from TN, MK, and HM. KY and TT prepared the assimilated forecast data for test cases and long-time validations, respectively. MY prepared the manuscript with contributions from all co-authors.