Improving air quality forecasting with the assimilation of GOCI AOD retrievals during the KORUS-AQ period

The Korean Geostationary Ocean Color Imager (GOCI) satellite has monitored the East Asian region in high temporal and spatial resolution every day for the last decade, providing unprecedented information on air pollutants over the upstream region of the Korean peninsula. In this study, the GOCI aerosol optical depth (AOD), retrieved at 550 nm wavelength, is assimilated to ameliorate the analysis quality, thereby making systematic improvements on air quality forecasting in South Korea. For successful data assimilation, GOCI retrievals are carefully investigated and processed based on data charac5 teristics. The preprocessed data are then assimilated in the three-dimensional variational data assimilation (3DVAR) technique for the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). During the Korea-United States Air Quality (KORUS-AQ) period (May 2016), the impact of GOCI AOD on the accuracy of air quality forecasting is examined by comparing with those of other observations including Moderate Resolution Imaging Spectroradiometer (MODIS) sensors and fine particulate matter (PM2.5) observations at the surface. Consistent with previous studies, the assimilation of surface PM2.5 10 concentrations alone systematically underestimates surface PM2.5 and its positive impact lasts mainly for about 6 h. When GOCI AOD retrievals are assimilated with surface PM2.5 observations, however, the negative bias is diminished and forecasts are improved up to 24 h, with the most significant contributions to the prediction of heavy pollution events over South Korea.

surface particulate matter -all particles with aerodynamic diameter less than 2.5 µm (PM 2.5 ) or up to 10 µm (PM 10 ) (Schwartz et al. (2012) and Jiang et al. (2013), respectively). GOCI AOD retrievals have been assimilated in several studies to assess their impact on short-term air pollution forecasts in the online coupled forecasting system. Saide et al. (2014) performed the Observing System Experiment (OSE) using the eight bin MOdel for Simulating Aerosol Interactions and Chemistry aerosol model (MOSAIC) (Zaveri et al., 2008) in the 5 WRF-Chem/GSI 3DVAR system. Pang et al. (2018) assimilated AOD retrievals from GOCI and Visible Infrared Imaging Radiometer Suite (VIIRS; Jackson et al. (2013)) to predict surface PM 2.5 concentrations over Eastern China and found that the assimilation of AOD retrievals improved the forecast accuracy but still underestimated heavy pollution events.
This work further extends the assimilation capabilities in the GSI 3DVAR system to best use GOCI AOD retrievals during the KORUS-AQ period with careful investigation of data characteristics. Aiming at improving the operational air quality fore- 10 casting in Korea, which is currently lacking the state-of-the-art analysis system, we are discussing how to effectively assimilate satellite-derived aerosol data and examine its impact on surface PM 2. 5 predictions compared to that of other observations. In the categorical forecasts for different air pollution events, we focus on severe pollution cases describing how air pollutants evolve, coupled with the synoptic weather systems.
A brief overview of the analysis and forecasting systems used in this study is presented in Section 2, followed by cycling 15 experiments with details on observation processing for GOCI retrievals described in Section 3. Results are summarized in Section 4 discussing the observation impact during the cycles and extended forecasts separately. Forecast performances in heavy pollution events are briefly described as well. Finally, conclusions are made in Section 5, along with a discussion on the limitations of this study and suggestions for the future research.
2 The WRF-Chem forecast model and the GSI 3DVAR analysis system 20

WRF-Chem forecast model
The model used in this study is an online-coupled meteorology and chemistry model, WRF-Chem version 3.9.1 . The physics options used in WRF-Chem include the rapid and accurate radiative transfer model for GCM (RRTMG) for long-wave radiation (Iacono et al., 2008), new Goddard shortwave radiation (Chou and Suarez, 1994), the Yonsei University (YSU) planetary boundary layer (PBL) scheme (Hong et al., 2006), the Lin microphysics scheme (Lin et al., 1983), as well as 25 a new Grell 3D cumulus parameterization scheme. These options are chosen based on the operational configuration currently used in the Korean National Institute of Environmental Research (NIER) for their daily air quality forecasting in South Korea.
The Goddard Chemistry Aerosol Radiation and Transport (GOCART;Chin et al. (2002)), developed by the National Aeronautics and Space Administration (NASA), is used as an aerosol scheme. Aerosol direct effects are allowed through the interaction between GOCART and the Goddard shortwave radiation scheme (Fast et al. (2006); Barnard et al. (2010)). 30 The Model for Ozone and Related Chemical Tracers (MOZART) gas phase chemistry (Emmons et al., 2010) is generated with the kinetic preprocessor (KPP) (Damian et al. (2002); Sandu and Sander (2006)), and is used together with the simple GOCART aerosol scheme, known as the MOZCART mechanism (Pfister et al., 2011). The MOZART chemistry in WRF-Chem is designed to run with the Madronich FTUV scheme for photolysis processes (Tie et al., 2003), reading in climatological O3 and O2 overhead columns. It also utilizes the standard WRF-Chem implementation of the Wesley dry deposition scheme (based on Wesely (1989)) allowing for seasonal changes in the dry deposition. The resolved scale wet scavenging is inactivated but convective wet scavenging is applied in the Grell cumulus parameterization. Also, GOCART sea salt emissions and dust emissions with AFWA modifications (LeGrand et al., 2019) are included in this study. 5 Anthropogenic emissions are estimated offline based on the global EDGAR-Hemispheric Transport of Air Pollutants (HTAP) emission inventory (http://edgar.jrc.ec.europa.eu/htap_v2/) that consisted of 0.1 • x 0.1 • gridmaps of CH4, CO, SO2, NOx, NMVOC, NH3, PM10, PM2.5, BC and OC from the year of 2010. The emission data mapped to our model grids has a single level with no vertical variations and is generated from the annual mean with no diurnal variations (e.g. time-invariant). In terms of data range, the maximum (average) value of PM 2.5 in the data, for example, is 3.56 (0.032) µg m −2 s −1 and 2.84 (0.026) 10 µg m −2 s −1 in domain 1 and 2, respectively.
Biogenic emissions are built up using the Model of Emission of Gases and Aerosol from Nature (MEGAN; Version 2) (Guenther et al., 2006) and for biomass burning emissions, daily fire estimates provided by Fire Inventory from NCAR (FINN; Wiedinmyer et al. (2011)) are used with tracer transport allowed. All the wrf files including biomass and biomass burning emissions are processed using the MODIS landuse datasets (Friedl et al., 2002). 15

The GSI 3DVAR analysis system
To assimilate AOD retrievals and surface PM 2.5 observations in the Weather Research and Forecasting-Chemistry (WRF-Chem) model, the NCEP GSI 3DVAR Version 3.5 system is used. As Liu et al. (2011) and Schwartz et al. (2012) described the details of the system for aerosol data assimilation, only a brief explanation follows. Incorporating observations into the threedimensional model state space, a 3DVAR system produces the best estimate to the true state by minimizing the differences 20 between observations and background forecasts (e.g. innovations; (o-b)'s), which is called the "analysis". Given the model state vector (x), the penalty function (or cost function) J(x) is defined as where x b stands for the background state vector (e.g. forecasts from the previous cycle), y an observation vector, and H is an observation operator that projects the model states onto the observation space linearly or nonlinearly to compute the model 25 correspondent to each observation. Background and observation error covariance matrices B and R, respectively, indicate how reliable the background forecast (B in the first term) and the observed information (R in the second term) might be to determine how to properly weight the two disparate resources. By minimizing the cost function (J(x)) with respect to the model state vector x at the analysis time, the variational analysis algorithm produces the analysis that fits best to all the observations assimilated within the assimilation time window. 30 To characterize the forecast error magnitude and its spatial structure, background error covariance B is estimated for each aerosol species using the National Meteorological Center (NMC) method (Parrish and Derber, 1992) based on the differences between 48-and 24-h WRF-Chem forecasts valid at the same time for 30 samples ending at 0000 UTC in May 2016. The current GSI/3DVAR system does not allow cross-correlation between aerosol species or between aerosol and meteorological variables. As this is a 3DVAR analysis with no time information, B only characterizes the spatial correlations in each analysis variable, which determines how to propagate the observed information across the model grids.
Following Liu et al. (2011) and Schwartz et al. (2012), this study also takes the speciated approach where the analysis vectors are comprised of 15 WRF-Chem/GOCART aerosol variables -sulfate, organic carbon (O) and black carbon (B), mineral dust 5 (D) in five particle-size bins (with effective radii of 0. 5, 1.4, 2.4, 4.5, and 8.0 µm), and sea salt (S) in four particle-size bins (with effective radii of 0.3, 1.0, 3.25, and 7.5 µm for dry air), and P as unspeciated aerosol contributions to PM 2.5 -, as opposed to using total aerosol mass of PM 2.5 as the analysis variable in Pagowski et al. (2010). For organic and black carbon, hydrophobic and hydrophilic components are considered (e.g. O 1 , O 2 , B 1 , and B 2 ).
The observation operator H(x) for surface PM 2.5 requires 10 GOCART aerosol variables as where P represents unspeciated aerosol contributions to PM 2.5 ; U denotes sulfate; O 1 and O 2 (B 1 and B 2 ) are hydrophobic and hydrophilic organic (black) carbon, respectively; and D 1 and D 2 (S 1 and S 2 ) are dust (sea salt) aerosols in the smallest and 2nd smallest size bins. This formula originated from the WRF-Chem diagnostics of PM 2.5 for the GOCART aerosol scheme.
PM observations are mass concentrations in µg/m 3 while all the model variables listed within the bracket in the right-hand side 15 are aerosol mixing ratios (µg/kg), dry density ρ d is thus required for the unit conversion in equation 2.
In this study, we assimilate AOD retrievals at 550 nm from both MODIS and GOCI sensors using the same observation operator based on the community radiative transfer model (CRTM; Han et al. (2006); Liu and Weng (2006)) as described in Liu et al. (2011). Although the GOCART aerosol scheme is well known to underestimate surface PM concentrations due to the lack of secondary organic aerosol (SOA) formation, nitrate, and ammonium Volkamer et al. (2006);  2018)), it is widely used in the analysis study because it is the only scheme publicly available for assimilating AOD retrievals from satellite data in the GSI system. Aerosol optical depth (AOD) measures the amount of light extinction by aerosol scattering and absorption in the atmospheric column which depend on the refractive indices and the size distribution of aerosol. In GSI, the CRTM computes the effective radii and the refractive indices of the 14 speciated WRF-Chem/GOCART aerosol species, assuming spherical aerosol particles and lognormal size distributions. Applying single-25 scattering properties of spheres by Mie theory, the mass extinction coefficient is computed as a function of the effective radius for each aerosol species at a certain wavelength (here, 550 nm) at each model level. The mass extinction coefficient (m 2 /g) for each aerosol species multiplied by the aerosol layer mass (g/m 2 ) produces dimensionless AOD for the species at that level.
To represent the entire atmospheric column, model-simulated AOD is then computed as the column integration of AOD for all aerosol species. Using the CRTM as a forward operator, AOD retrievals are assimilated separately or simultaneously with During the month of May 2016, observations are assimilated in the GSI 3DVAR system to produce the analysis that is used as an initial condition for the following WRF-Chem simulations. WRF-Chem forecasts valid at the next analysis time are then used as first-guess (or background) for the next GSI analysis. In this study, the whole process is repeated every 6 h (called "cycled") for the month-long period. Here we describe the analysis and the forecast systems used in the cycling.  (Emmons et al., 2010) that are converted to WRF-Chem species by using the "mozbc" utility (downloaded from https://www2.acom.ucar.edu/wrf-   Figure 1 shows the entire surface observing network that was used to 25 assimilate surface PM 2.5 . Observation sites are concentrated in the urban area where many sites are close enough to be overlapped with each other. The Seoul Metropolitan Area (SMA; centered around 37.5 • N, 127 • E), for example, has hourly reports from total of 41 stations.
As part of data quality control (QC), surface PM 2.5 concentrations higher than 100 µg/m 3 are not assimilated and observations producing innovations ((o-b)'s) that exceed 100 µg/m 3 were also discarded during the analysis step. To accommodate 30 most measurements in China during heavy pollution events, a much higher threshold of 500 µg/m 3 was once applied as the maximum observed value in our test experiment for the same month-long cycles, but it did not lead to any meaningful changes in the forecast performance over South Korea (not shown). Presumably this is because such high values were observed only over China where air pollutants were already overestimated by the emission data based on the 2010 inventory such that the forecast skills over Korea became insensitive to the assimilation of those additional surface observations in China. Therefore, we applied the original threshold of 100 µg/m 3 to all our experiments presented here. 5 Observation error is composed of measurement error ( o ) and the representative error ( r ) caused by the discrete model grid spacing (e.g. where γ is 0.5, ∆x is grid spacing (here, 27 km for domain 1 and 9 km for domain 2) and the scaling factor L is defined as 3 km. Based on this formula, observation error ( pm 2.5 ) ranges from 2.0 to 3.2 µg/m 3 in domain 2, assigining the error of 10 2.48 µg/m 3 to the PM 2.5 observation of 50 µg/m 3 , for example. In this 3DVAR analysis, observation errors are considered to be uncorrelated so that the observation error covariance matrix R becomes diagonal. During the 6-h cycling, all the surface observations within ±1 h window at each analysis time were assimilated without further adjustment of observation error.

AOD retrievals and observation preprocessing
Total AOD retrievals at 550 nm from MODIS sensors onboard Terra and Aqua satellites have been widely used in aerosol stud- 15 ies Reid, 2006, 2010;Lee et al., 2011). But the polar-orbiting satellites produce a very limited dataset temporally (mostly around 06 UTC only) and spatially (with a sparse coverage) over Korea during the KORUS-AQ period. The MODIS AOD level 2 products over both land and ocean "dark" area are available at 10 km x 10 km resolution and thinned over 60-km resolution during the GSI analysis in this study. Following Remer et al. (2005), observation errors are specified as the retrieval errors: (0.03 + 0.05 * AOD) over ocean and (0.05 + 0.15 * AOD) over land. They do not include the representativeness error 20 and are slightly smaller than those for GOCI AOD, as described below.
The GOCI satellite monitors the East Asian region centered on the Korean peninsula (36 • N, 130 • E) covering about 2500 km × 2500 km. GOCI level II data has eight spectral bands from the visible to near-infrared range (412 to 865 nm) with hourly measurements during daytime from 9:00 (00 UTC) to 17:00 local time (08 UTC) at 6 km resolution. As summarized in Choi et al. (2018), a recently updated GOCI Yonsei aerosol retrieval (YAER) Version 2 algorithm targets cloud-and snow-free pixels 25 over land and cloud-and ice-free pixels over ocean in producing the level II data. By adopting the MODIS and VIIRS aerosol retrieval and cloud-masking algorithms, cloud pixels are filtered to avoid cloud contamination, and high reflectance or highly heterogeneous reflectance pixels are also masked to further increase data accuracy and consistency during the retrieval process.
Unlike MODIS retrievals, GOCI AOD has not been extensively used in the data assimilation community. The GSI system takes most observation types in prepbufr, which has already gone through some processing to be prepared for data assimilation, 30 but the preprocessing algorithms are not publicly available. This means that, when a new dataset is assimilated in GSI, users need to investigate the characteristics of the data (such as temporal and spatial distribution) and thereby make the data suitable for assimilation, which is of crucial importance for the analysis quality.
In terms of temporal distribution, most of GOCI level II data are retrieved on 30 min passed each hour in the hourly report.
For example, the actual time for most of the data reported at 00 UTC is centralized around 00:30:00 UTC (hh:mm:ss). In the 3DVAR algorithm, there is no time dimension and all observations are considered to be available at the analysis time. To account for temporal distribution, different weights are often given to observations based on the relative distance between the actual report time and the analysis time during the analysis step. However, taking possible latency in data transfer and retrieval 5 processing into consideration, it is not legitimate to assign weights to the retrievals based on their final report time, without further information. Therefore, considering high temporal and spatial variability of aerosols, the assimilation window is set to ±1 h in order to avoid inconsistent observed information within the window in this study.
Satellite data are known to have a large positive impact on the analysis quality thanks to the high data volume both in time and space, but such high density violates the assumption of uncorrelated observation errors in the analysis algorithm and increases 10 the computation time for the analysis step excessively. Hence, a large volume of satellite retrievals are typically sampled on a regularly spaced grid through the horizontal thinning procedure. In GSI, satellite radiance data can be thinned such that retrievals are randomly sampled at a predefined spacing for each instrument type before getting ingested into the observation operator during the analysis (Rienecker and Coauthors, 2008). This thinning procedure, however, can pick up inconsistent data (near the cloud boundaries, for instance) and is reported as suboptimal (Ochotta et al., 2005;Reale et al., 2018). Therefore, we 15 decided to preprocess GOCI AOD retrievals with superobing where all the data points are averaged within a certain radius. In this study, we superobed GOCI retrievals over each grid box in domain 1 (at 27-km resolution). Figure 2 shows the sample horizontal distribution of GOCI AOD retrievals valid at 06 UTC May 1 2016 before (a) and after (b) preprocessing them, comparing with those thinned over 60 km (c) and 27 km meshes (d) during the GSI analysis, respectively. Some high AOD values in the original dataset (as shown in a), especially on cloud edges, cannot be fully resolved by our 27-km model grids. 20 By averaging all data points over each grid box at 27-km resolution, the superobed data in b) have a better quality control throughout the domain reducing the data volume effectively. A total number of observations marked in the upper right corner of each panel indicates that thinning over the 60 km mesh in c) reduces the number of assimilated observations to 2.5% of that in the original level II data while superobing and thinning over 27-km mesh utilize 8-10 % of the original data representing the whole data coverage fairly well. 25 It might be noteworthy to make two more points related to data processing here. First, superobing was applied as part of preprocessing before the GSI analysis gets started while the thinning was conducted during the analysis step so that the preprocessing could speed up the GSI analysis up to 25 times (by injecting less than 10 % of the original data and turning off the thinning process). This can facilitate the use of satellite retrievals in the operational air quality forecasting. Next, the thinning algorithm in GSI V3.5 resulted in erroneous values in some places, as indicated by the maximum values in c) and d). 30 For the month of May 2016, multiple cases with such extreme fake values were found after the thinning process. This bug may need to be fixed in the GSI or avoided by bounding the values exceeding the original data.
To examine the effect of data processing on the performance of the analysis and the background during the cycles, we compare two cycling experiments -one with the assimilation of the original level II data thinned over 27-km mesh (named "GOCI_orig" in gray) and the other with the assimilation of GOCI retrievals preprocessed over 27-km grids in domain 1 (called 35 "GOCI" in black) -in Fig. 3. As GOCI data are reported from 00 to 08 UTC, only 00 and 06 UTC cycles are shown here in consecutive cycle numbers. The time series of (o-a)'s and (o-b)'s in each experiment show that the preprocessed data slightly fit better to the observations than the thinned data, assimilating more retrievals throughout the period. Because the differences between the two experiments are not significant, for the computational efficiency, we decided to preprocess all the GOCI retrievals and assimilate them turning off the thinning process in GSI for the rest of the experiments shown in this study. 5 Choi et al. (2018) described their improved retrieval algorithm (GOCI YAER V2) with updated cloud-masking and surface reflectance calculations, making a long-term validation against other ground-and satellite-based measurements. In their study, depending on the verifying objects -either ground-based Aerosol Robotic Network (AERONET) or satellite-based retrievals -, they specified uncertainties of GOCI AOD retrievals over land and ocean using two different linear regression formulae. We assign 1 following their error specification with respect to AERONET and 2 based on their expected error against retrieved 10 satellite AOD in GOCI YAER V2.
where τ A stands for GOCI AOD values. In an effort to account for representativeness error, we also tried with 2 increased by 20% everywhere as the third error formula (e.g. 3 = 1.2 × 2 ) and compared all three types of errors in Fig. 4. When these different observation errors were applied to GOCI retrievals in the assimilation, the smallest error ( 2 ) produced slightly better fits to observations specially for the high values (AOD > 2) during the cycles, as expected, but not in a statistically 20 meaningful way (not shown). In fact, it is not straightforward to estimate the representativeness error which is subject to the model resolution (both in horizontal and vertical) and data processing in use. Therefore, in many cases, observation error is specified based on the resulting forecast performance (Ha and Snyder, 2014). But because our forecast skills were not very sensitive to three different error formulae tried here, for the rest of the experiments, 2 is used as observation error for GOCI retrievals. 25 The goal of this study is to examine the relative impact of the GOCI assimilation on the prediction of surface PM 2.5 and ultimately to improve the forecasts for pollution events. Although it is rather easy to render the analysis close to GOCI observations by reducing the observation error, it is not guaranteed that the analysis in a good agreement with AOD retrievals would actually lead to better forecasts in surface PM 2.5 . This is partly because AOD, a column integrated quantity, is not directly associated with surface PM 2.5 and partly because large uncertainties in the forecast model and the emission forcing can dominate forecasts can be largely affected by the quality of the forecast model and the emission data in use, the effectiveness of the AOD assimilation is based on the relationship between the column-integrated AOD and PM 2.5 on the ground. Therefore, it might be worth checking the correlation between GOCI AOD retrievals and surface PM 2.5 observations for the cycling period. Figure 5 depicts a scatter diagram of GOCI AOD retrievals at 550 nm and surface PM 2.5 observations that are co-located in each grid box in domain 1 for the month of May 2016. As shown with the linear regression coefficient of 0.33, the two observation types have low correlations during this period, which is consistent with previous studies (Saide et al., 2014;Pang et al., 2018). Such an indirect relationship between the two observations makes the analysis challenging because it can induce a large error in the 5 observation operator and heavily depend on the model's ability of deriving PM 2.5 from AOD based on the vertical structure of aerosol variables and the conversion from aerosol mass to optical properties.

Results
With a careful design of model configuration and observation processing, the overall impact of assimilating all the available observations ("DA") is illustrated, compared to the baseline experiment without data assimilation ("NODA") in Figure 6. Here, observations marked as black dots show that the air quality gets distinctively aggravated for the last 7 days, related to longrange transport of air pollutants. With data assimilation ("DA"), the analyses at 00Z and the following forecasts (red) make a better agreement with corresponding observations than those without assimilation (gray), especially from day 15 (e.g. after a full spin-up for two weeks). In particular, on May 25-27, forecast error grows quickly even from the good analysis at 00Z, 15 possibly associated with large uncertainties in lateral boundary conditions and the forecast model in use. However, averaged over the entire period, the mean absolute error (mae) indicates that the performance of 0-23-h forecasts at 9-km resolution gets improved by ∼30% through data assimilation.

Observation impact during the cycles
Given that the aerosol assimilation has a positive impact on air quality forecasting, it might be worth isolating the contribution of 20 each observation type to the improvement of the analysis and the following forecasts. We first assimilate individual observation types separately, naming the experiment following each observation type, then we assimilate them all together (called "ALL").  observations (green) results in the smallest PM 2.5 while the GOCI assimilation (blue) produces the largest PM 2.5 throughout the atmosphere in both domains. When the analysis (solid line) is compared to background (dashed), it is revealed that PM 2.5 is predominantly increased over domain 1 with the assimilation of GOCI retrievals. Overall, the aerosol assimilation affects the entire profile of PM 2.5 with the largest impact at the surface. 5 To understand the observation impact in the horizontal distribution, Fig. 9 shows the analysis increments ( Note that we employ the 2010 inventory for our emission data, which does not reflect the emission control started from 2013 15 in China (Zheng et al., 2018). Given that air pollutants in the emission data constitute the majority of the precursors of PM 2.5 pollution, surface PM 2.5 could strongly depend on emissions which might have led to the overestimation in the background (e.g. first-guess). Therefore, the assimilation of surface PM 2.5 tends to counteract the overestimation driven by the emission data over China. On the other hand, over South Korea, the emission data does not seem to be overestimated and the assimilation of surface PM 2.5 leads to increasing surface PM 2.5 most effectively during the cycles. 20 Different from surface particulate matter, AOD in the background is contingent upon the optical properties described in the observation operator (e.g. CRTM) and the vertical structure of aerosols simulated in the column. The influence of GOCI assimilation may indicate the model deficiencies in the two aspects because the model states are pulled toward the observed information during the analysis step, as depicted in the analysis increment. 25 Since the real effect of data assimilation is manifested in the subsequent forecasts, we now examine forecast improvements when initialized by our own analyses. A good analysis is expected to slow down the forecast error growth, leading to better forecasts. In this subsection, forecast errors at the lowest model level are compared between experiments for 24 h with respect to surface observations from various sites in South Korea. As we focus on 9-km simulations over the Korean peninsula, it is hard to anticipate the direct effect of the assimilation beyond 24 h, specially in such a small domain where the weather 30 systems dramatically change from day to day. As shown in Fig. 10, the forecast error is the largest in the baseline experiment ("NODA"), followed by the assimilation of MODIS retrievals alone ("MODIS") in terms of mean absolute error (mae). Note that the analysis in the "PM" experiment is verified against the same surface PM 2.5 observations used in the assimilation.

Observation impact on 24-h forecasts
Therefore, the analysis error is smaller than those in other experiments, but the forecast error grows quickly over the next 24 h. The assimilation of surface PM 2.5 alone generally underestimates the prediction of surface PM 2.5 with the fastest growth of forecast error. On the other hand, the assimilation of AOD retrievals (either GOCI or MODIS) alone does not improve the surface analysis and mostly overestimates surface PM 2.5 for 24 h. This might be ascribed to an imperfection of the forward operator of AOD and the model deficiency in the representation of three-dimensional aerosol species that comprised AOD and PM 2.5 . When assimilated with surface PM 2.5 observations (in "ALL"), however, AOD retrievals effectively reduce the forecast 5 error and suppress the error growth throughout 24 h forecasts.
Recently, heavy pollution events have often taken place over Korea and considerable attention was drawn to the accuracy of the operational air quality forecasting in the country, particularly in surface PM 2.5 . As it has a great social impact to accurately predict exceedance and non-exceedance events in categorical predictions, it is necessary to evaluate the forecast performance  Table 2 and 3. Figure 11 summarizes the evaluation of 24 h forecasts based on the formulae described below.
In all events, the overall accuracy of 0-24-h forecasts is the highest in "ALL" (∼70 %) and the lowest in "NODA" (∼60 %), making about 10 % improvement by assimilation during this KORUS-AQ period. It is noted that the forecast error illustrated in Fig. 10 is dominated by days with clear sky or moderate air quality conditions (about two thirds of the month-long period, as shown in Fig. 6) while the forecast accuracy summarized in Fig. 11 is determined by equally weighting different categorical forecasts with different sample sizes. This implies that the categorical forecast evaluation tends to emphasize the forecast accuracy for pollution events (which has a smaller sample size). As such, Fig. 11a highlights the effect of data assimilation on improving air pollution forecasts. Differences between experiments are much larger in high pollution events (Fig. 11b) and 5 the detection rate (Fig. 11f) where AOD retrievals (both GOCI and MODIS) make the biggest positive contributions. While "NODA" produces poor forecasts consistently in most metrics shown in Fig. 11, the forecast accuracy in "PM" (green) drops very quickly for the first 12 h for all events (a) and pollution events (b), indicating that the assimilation of surface PM 2.5 alone may not be enough to maintain the forecast skills beyond the cycling frequency (e.g. 6 h). It also increasingly underestimates surface PM 2.5 with time, especially after 20 h, and produces more false alarms even though its overestimation rate is the lowest 10 among all experiments. Overall, the AOD assimilation tends to overestimate the prediction of surface PM 2.5 with a relatively large false alarm, but clearly helps enhance the forecast accuracy up to 24 h when assimilated with surface PM 2.5 observations.
Even with low correlations with surface PM 2.5 (as illustrated in Fig. 5), AOD retrievals keep the surface air pollution forecasts from drifting away from the true state, compensating for model deficiencies. This demonstrates that it could be substantially beneficial to monitor a wide range of the surrounding area using the geostationary satellite in the enhancement of air quality 15 forecasts.
In order to verify our forecasts against independent observations, we processed total AOD at 500 nm from the Aerosol Robotic Network (AERONET; https://aeronet.gsfc.nasa.gov/) sites and surface PM 2.5 concentrations measured at three more stations operated by NIER during the KORUS-AQ field campaign (Fig. 12). The level 2 quality level data are used for AERONET AOD observations as cloud-free and quality-assured data. Figure 13 illustrates the time series of hourly AOD produce AOD slightly lower than the observed one as a whole. The rms error and mean bias at total of 16 AERONET sites are summarized in Table 4, indicating that GOCI has the smallest forecast error in AOD nation-wide.
Surface PM 2.5 measurements from three NIER sites were downloaded from https://www-air.larc.nasa.gov/cgi-bin/ArcView/korusaq as raw data with no quality control. They are provided as hourly averages starting from May 9 and compared to our hourly model output for May 9 -31 (Fig. 14). These observations look somewhat noisy, but our forecasts broadly follow them through-30 out the period. Similar to the AOD verification shown in Fig. 13, forecasts from GOCI produce the smallest forecast mean bias among all the experiments in Olympic Park (in a) and Daejeon (in b), predicting the high surface PM 2.5 concentrations between May 24 and 26. But GOCI was worse than other experiments in Ulsan (in c), overestimating surface PM 2.5 , especially during the high pollution days. In the assimilation system, raw data are not considered to be reliable, but this verification is included for the completeness because there was no other instrument that reported surface PM 2.5 concentrations or all the precursors of PM 2.5 concentrations to validate PM 2.5 forecasts on the ground level.

A heavy pollution case
The effect of assimilating different observations is most distinguishable in high pollution events, as demonstrated in Fig. 11.
During the KORUS-AQ period, there were about 5 heavy pollution cases (when surface PM 2.5 > 50 µg/m 3 , as defined in Table   5 2) over South Korea. The longest and the most severe pollution events have occurred on May 25-26, 2016. Fig. 15 illustrates how air pollutants have been transported from China, associated with the strong synoptic weather systems in the region for a few days. As the analysis of our best experiment "ALL" showed, the With only 145 stations reporting high concentrations (e.g. surface PM 2.5 > 50 µg/m 3 ), the observed distribution still shows a sharp gradient between the stations, specially in SMA. Consistent with all the previous figures, the assimilation of surface 25 PM 2.5 alone (in "PM") underpredicts surface PM 2.5 (even more than "NODA") while GOCI overpredicts surface PM 2.5 most among all observation types almost everywhere except for SMA. MODIS retrievals slightly increase the concentrations from NODA (by ∼10 µg/m 3 ), with the spatial distribution almost the same as that of NODA. In the concurrent assimilation of all the observations (in "ALL"), a moderate overestimation is presented everywhere, but higher levels of pollution in SMA are not simulated either. To resolve such a large variability between urban and rural area and to increase the sharpness of the 30 forecast accuracy, the use of higher grid resolutions (such as 3 km), more accurate emission data and more sophisticated aerosol chemistry mechanisms might be indispensable.
GOCI AOD retrievals provide reliable and consistent aerosol information, monitoring air pollutants over the Korean peninsula at high resolution every day. One of the best ways of utilizing such invaluable observations is to inject them into the forecast system through data assimilation and better initialize numerical forecasts. For the successful assimilation of real observations, specially retrievals from satellites, extra attention should be paid to processing the data properly, based on the characteristics. 5 The spatial and temporal representativeness of GOCI retrievals was carefully examined and the corresponding data processing was conducted before assimilation in this study. We averaged all the pixels over each grid box at 27-km resolution (e.g. superobing) instead of thinning them randomly, for instance.
It is worth noting several challenges in the assimilation of AOD retrievals for improving the prediction of surface PM 2.5 concentrations: i) AOD is not directly associated with PM 2.5 concentration on the ground. Although the two datasets can be 10 highly correlated in specific conditions such as cloud-free, low boundary layer heights and low relative humidity, the overall correlation is low (∼0.3) in the present study and it is hard to expect the direct impact on each other. ii) an observation operator for AOD has errors due to the simplification and the limited aerosol specifications in the community radiative transfer model (CRTM). iii) significant model error, which is presumably one of the most critical issues. In the 3DVAR assimilation, in particular, the model estimates of AOD, a column-integrated quantity, are strongly constrained by the model error structure of 15 each aerosol species both horizontally and vertically.
Even with these challenges, however, satellite-based AOD, especially from geostationary satellites like GOCI, can be extremely useful for improving the prediction of air pollution on a daily basis. In the situation where the air quality can be largely affected by long-range transport of air pollutants, such consistent information on the wide upstream area is essential but hard to be obtained otherwise. 20 Using the GSI 3DVAR system coupled with the WRF-Chem forecast model, we assimilated the satellite AOD retrievals as well as surface PM 2.5 observations for the month of May 2016 during the KORUS-AQ period. Compared to the baseline experiment ("NODA"), the simultaneous assimilation of various observations consistently improved the prediction of ground PM 2.5 for 24-h forecasts, reducing systematic error and false alarms. The assimilation of ground PM 2.5 alone improved the analysis during the cycles, reducing the analysis error to almost half the size compared to the experiment without assimilation. 25 However, the forecast error grew very quickly over the next 12 hours, underestimating PM 2.5 at the surface, especially in the heavy pollution events where the forecast accuracy dropped from over 70% to ∼30% only in four hours. Meanwhile, the GOCI AOD retrievals alone tended to overestimate surface PM 2.5 but significantly contributed to improving air quality forecasts up to 24 h when assimilated with surface PM 2.5 observations. The effect of data assimilation is most distinguishable and remarkable for high pollution events. During the month of May 2016, most heavy pollution events were associated with long-30 range transport from China. In such cases, it was particularly beneficial to monitor the wide upstream region using geostationary instruments such as GOCI.
To assess the effect of data assimilation with respect to independent observations, 0-23 h forecasts from different experiments are verified against AOD from AERONET sites and ground PM 2.5 measurements from the sites operated during the KORUS-AQ field campaign. In this verification, the assimilation of GOCI retrievals is the most effective in improving the forecast performance at most sites, especially for high pollution events.
Even with the successful data assimilation, there are several limitations in this study. First, the simple GOCART aerosol scheme is well known for the underestimation of air pollutants due to the lack of the aerosol size distribution and the secondary organic aerosol (SOA) formation. We had to use the scheme for the assimilation of AOD retrievals since the observation 5 operator for AOD was only built for the GOCART scheme in the GSI system. Next, as there is no cross-covariance between aerosol and meteorological variables considered in the background error covariance estimates, the influence of aerosols on meteorological variables was not fully simulated in this study. Without the assimilation of meteorological observations, it was not possible to make an optimal estimate that is fully coupled between chemistry and meteorology although the meteorological information was provided through the first guess and lateral boundary conditions. Finally, the emission inventory used in this 10 study was based on the annual mean of 2010, which did not reflect the actual emissions for the year of 2016, especially over China. The large bias and uncertainties in the emission data was particularly detrimental to the assimilation of surface PM 2.5 alone.
To overcome the systematic underestimation of the GOCART aerosol scheme in the assimilation context, there is an ongoing effort for a new development of an interface for more sophisticated aerosol schemes such as MOSAIC and/or the Modal for 15 Aerosol Dynamics in Europe and the Volatility Basis Set (MADE/VBS; Ackermann et al. (1998), Ahmadov et al. (2012) in the WRFDA system (Barker et al., 2012). This would be advantageous for more realistic forecast behavior in high resolution applications.
The positive impact of data assimilation is generally limited to 24-h forecast because of three major reasons: First, most air pollutants have a short lifetime due to dry and wet deposition and transformations through interactions with solar radiation and 20 clouds. Secondly, pollutant transport and transformations in chemical transport models are strongly driven by external forcing, such as emissions, boundary conditions, and meteorological fields. Lastly, there are large uncertainties in aerosol and gas-phase chemistry parameterized in chemical transport models. Therefore, to extend the period of forecast improvements, emission data needs to be improved and large uncertainties in chemical and meteorological boundary conditions should be minimized.
It has been shown that the estimation of emission inventories as part of the DA procedure can help extend the impact of 25 data assimilation in longer forecasts (Elbern et al., 2007;Kumar et al., 2019). Also, more sophisticated aerosol and chemical mechanisms might be able to improve air quality forecasting by reducing model deficiencies (Chen et al., 2019). A simultaneous assimilation of meteorological observations and measurements of individual chemical species as well as particulate matter would be certainly beneficial in both NWP and air quality forecasting. To better account for high nonlinearities and uncertainties of aerosol forecasting on small scales, more advanced analysis techniques such as ensemble or hybrid data assimilation would 30 be more desirable.
Code and data availability. The WRF-Chem v3.9.1 and the GSI v3.5 codes used in this paper will be available upon request. Input observations and boundary conditions for a sample test period can be also provided upon request. Miyazaki, K., Sekiya, T., Fu, D., Bowman, K. W., Kulawik, S. S., Sudo, K., Walker, T., Kanaya, Y., Takigawa, M., Ogochi, K., Eskes, H., Boersma, K. F., Thompson, A. M., Gaubert, B., Barre, J., and Emmons, L. Ochotta, T., Gebhardt, C., Saupe, D., and Wergen, W.: Adaptive thinning of atmospheric observations in data assimilation with vector quan-          Korea. An average of 0-24 h forecast errors is shown next to each experiment name. The mean absolute error (mae) over the 24-h forecasts is also shown in the lower panel.