Improving air quality forecasting with the assimilation of GOCI aerosol optical depth (AOD) retrievals during the KORUS-AQ period

The Korean Geostationary Ocean Color Imager (GOCI) satellite has monitored the East Asian region in high temporal (e.g., hourly) and spatial resolution (e.g., 6 km) every day for the last decade, providing unprecedented information on air pollutants over the upstream region of the Korean Peninsula. In this study, the GOCI aerosol optical depth (AOD), retrieved at the 550 nm wavelength, is assimilated to enhance the quality of the aerosol analysis, thereby making systematic improvements to air quality forecasting over South Korea. For successful data assimilation, GOCI retrievals are carefully investigated and processed based on data characteristics such as temporal and spatial distribution. The preprocessed data are then assimilated in the threedimensional variational data assimilation (3D-Var) technique for the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). For the Korea–United States Air Quality (KORUS-AQ) period (May 2016), the impact of GOCI AOD on the accuracy of surface PM2.5 prediction is examined by comparing with effects of other observations including Moderate Resolution Imaging Spectroradiometer (MODIS) sensors and surface PM2.5 observations. Consistent with previous studies, the assimilation of surface PM2.5 measurements alone still underestimates surface PM2.5 concentrations in the following forecasts, and the forecast improvements last only for about 6 h. When GOCI AOD retrievals are assimilated with surface PM2.5 observations, however, the negative bias is diminished and forecast skills are improved up to 24 h, with the most significant contributions to the prediction of heavy pollution events over South Korea.


Introduction
With the recent increase in chemical and aerosol observations in the troposphere, chemical data assimilation is expected to play an essential role in improving air quality forecasting, particularly in the real-time environment. Although various data assimilation (or analysis) techniques have been developed for many decades, they were predominantly applied in the context of numerical weather prediction (NWP) (Kalnay, 2003) and have not been extensively exploited for the prediction of air pollution.
Uncertainties in aerosol chemistry, as well as its multiscale interactions with daily changing weather conditions, make it challenging to predict air pollutants accurately (Grell and Baklanov, 2011;Baklanov et al., 2014;Kong et al., 2015;Baklanov et al., 2017). Surface concentrations are directly affected by the transport and dispersion of chemical species through advection, convection, vertical diffusion, and surface fluxes. In general, they are strongly driven by external forcing such as anthropogenic and natural emissions. The latter heavily relies on temperature, humidity, and wind speed in the boundary layer as well as solar radiation and soil moisture. Aerosols in turn affect local meteorology via aerosolmeteorology interaction (by directly scattering and absorbing solar radiation and also as sources of cloud condensation nuclei) at short timescales. For the operational air quality forecasting in South Korea, the Korean National Institute of Environmental Research (NIER) performs chemical simulations at 3 km resolution at present (Chang et al., 2016). For such a high-resolution application and for situations with very high aerosol concentrations, these fast-varying complex mecha-nisms might be better represented through online coupling between chemical and meteorological components. The online coupled forecasting system is particularly suitable for air quality forecasting associated with strong synoptic forcing or long-range transport of air pollutants. Also, finer-scale features may require more frequent coupling of the atmospheric system and only the online coupled system can provide the framework for such applications.
With large uncertainties in chemical modeling and emission data, particularly associated with meteorological components, one of the most effective ways of utilizing aerosol observations is to assimilate them into the forecast model and improve the initialization of aerosol simulations. However, due to the scarcity of three-dimensional chemical observations and the complexity of how to project the observed information (usually in the optical properties) onto the parameterized schemes in the chemical model, aerosol or chemical data assimilation in coupled chemistry and meteorology models has been limited to date (Bocquet et al., 2015). Improving the quality of chemical assimilation will not only improve the prediction of air pollution, but also advance numerical weather prediction (NWP) for precipitation, visibility, and high-impact weather.
An international cooperative air quality field study conducted in Korea, named Korea-United States Air Quality (KORUS-AQ), was a field campaign jointly developed by air quality researchers in the United States and South Korea to improve our understanding of major contributors to poor air quality in Korea for 1 May-12 June 2016. During this early summertime when it is mostly warm and humid, numerous measurements of pollutants were made at multiple platforms in an effort to identify local and transboundary pollution sources contributing to the formation of ozone and fine particulate matter (PM 2.5 ). Although local emissions played a nontrivial role throughout the period, the highest pollution event occurred through the long-range transport from the upwind area on 25-27 May 2016 (Miyazaki et al., 2019). As the transboundary transport cannot be fully measured by surface stations over land, the proper use of satellite data that have a wide spatial coverage would have great potential to improve air quality forecasting for such events.
The Korean Geostationary Ocean Color Imager (GOCI) onboard the Communication, Ocean, and Meteorology Satellite (COMS) provides hourly aerosol optical depth (AOD) retrievals at multiple spectral bands monitoring the East Asian region centered on the Korean Peninsula during daytime (Kim et al., 2017). Since its launch in 2010, the GOCI satellite has been producing AOD retrievals at high spatial and temporal resolution. It has long been demonstrated that the GOCI data are of high accuracy, comparable to the low-orbiting Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) products (Lee et al., 2010;Wang et al., 2013;Xiao et al., 2016;Choi et al., 2018). Liu et al. (2011) first implemented the capability of assimilating AOD retrieved from MODIS satellite sensors (Remer et al., 2005) into the National Centers for Environmental Prediction (NCEP) Gridpoint Statistical Interpolation (GSI; Wu et al., 2002;Kleist et al., 2009) system. Since they confirmed that the AOD assimilation improved aerosol forecasts in a dust storm event that occurred in East Asia, the GSI threedimensional variational data assimilation (3D-Var) system has been widely used for air quality forecasting and extended for additional aerosol observations such as surface particulate matter -all particles with an aerodynamic diameter less than 2.5 µm (PM 2.5 ) or up to 10 µm (PM 10 ) (Schwartz et al., 2012 andJiang et al., 2013, respectively).
GOCI AOD retrievals have been assimilated in several studies to assess their impact on short-term air pollution forecasts in the online coupled forecasting system. Saide et al. (2014) performed the Observing System Experiment (OSE) using the eight-bin MOdel for Simulating Aerosol Interactions and Chemistry aerosol model (MOSAIC) (Zaveri et al., 2008) in the WRF-Chem/GSI 3D-Var system. Pang et al. (2018) assimilated AOD retrievals from GOCI and the Visible Infrared Imaging Radiometer Suite (VIIRS; Jackson et al., 2013) to predict surface PM 2.5 concentrations over eastern China and found that the assimilation of AOD retrievals improved the forecast accuracy but still underestimated heavy pollution events.
This work further extends the assimilation capabilities in the GSI 3D-Var system to best use GOCI AOD retrievals during the KORUS-AQ period with careful investigation of data characteristics. Aiming to improve the operational air quality forecasting in Korea, which is currently lacking a stateof-the-art analysis system, we are discussing how to effectively assimilate satellite-derived aerosol data and examine their impact on surface PM 2.5 predictions compared to other observations. In the categorical forecasts for different air pollution events, we focus on severe pollution cases describing how air pollutants evolve, coupled with the synoptic weather systems.
A brief overview of the analysis and forecasting systems used in this study is presented in Sect. 2, followed by cycling experiments with details on observation processing for GOCI retrievals described in Sect. 3. Results are summarized in Sect. 4, discussing the observation impact during the cycles and extended forecasts separately. Forecast performances in heavy pollution events are briefly described as well. Finally, conclusions are made in Sect. 5, along with a discussion on the limitations of this study and suggestions for future research.
S. Ha et al.: Improving air quality forecasting 6017 2 The WRF-Chem forecast model and the GSI 3D-Var analysis system

WRF-Chem forecast model
The model used in this study is an online coupled meteorology and chemistry model, WRF-Chem version 3.9.1 (Grell et al., 2005). The physics options used in WRF-Chem include the rapid radiative transfer model for general circulation models (RRTMG) for longwave radiation (Iacono et al., 2008), new Goddard shortwave radiation (Chou and Suarez, 1994), the Yonsei University (YSU) planetary boundary layer (PBL) scheme (Hong et al., 2006), the Lin microphysics scheme (Lin et al., 1983), and a new Grell 3D cumulus parameterization scheme. These options are chosen based on the operational configuration currently used in the Korean National Institute of Environmental Research (NIER) for their daily air quality forecasting in South Korea. The Goddard Chemistry Aerosol Radiation and Transport (GO-CART; Chin et al., 2002), developed by the National Aeronautics and Space Administration (NASA), is used as an aerosol scheme. Aerosol direct effects are allowed through the interaction between GOCART and the Goddard shortwave radiation scheme (Fast et al., 2006;Barnard et al., 2010). The Model for Ozone and Related Chemical Tracers (MOZART) gas-phase chemistry (Emmons et al., 2010) is generated with the kinetic preprocessor (KPP) (Damian et al., 2002;Sandu and Sander, 2006) and is used together with the simple GOCART aerosol scheme, known as the MOZCART mechanism (Pfister et al., 2011). The MOZART chemistry in WRF-Chem is designed to run with the Madronich FTUV scheme for photolysis processes (Tie et al., 2003), reading in climatological O 3 and O 2 overhead columns. It also utilizes the standard WRF-Chem implementation of the Wesley dry deposition scheme (based on Wesely, 1989) allowing for seasonal changes in the dry deposition. The resolved-scale wet scavenging is inactivated, but convective wet scavenging is applied in the Grell cumulus parameterization. Also, GOCART sea salt emissions and dust emissions with AFWA modifications (LeGrand et al., 2019) are included in this study.
Anthropogenic emissions are estimated offline based on the global EDGAR-Hemispheric Transport of Air Pollutants (HTAP) emission inventory (http://edgar.jrc.ec.europa. eu/htap_v2/, last access: 1 May 2018) that consists of 0.1 • × 0.1 • grid maps of CH 4 , CO, SO 2 , NO x , NMVOC, NH 3 , PM 10 , PM 2.5 , BC, and OC from the year 2010. The emission data mapped to our model grids have a single level with no vertical variations and are generated from the annual mean with no diurnal variations (e.g., time-invariant). In terms of data range, the maximum (average) value of PM 2.5 in the data, for example, is 3.56 (0.032) and 2.84 (0.026) µg m −2 s −1 in domain 1 and 2, respectively. Biogenic emissions are built up using the Model of Emission of Gases and Aerosol from Nature (MEGAN; version 2) (Guenther et al., 2006), and for biomass burning emissions, daily fire estimates provided by the fire inventory from NCAR (FINN; Wiedinmyer et al., 2011) are used with tracer transport allowed. All the WRF files including biomass and biomass burning emissions are processed using the MODIS land use datasets (Friedl et al., 2002).

2.2
The GSI 3D-Var analysis system

Cost function
To assimilate AOD retrievals and surface PM 2.5 observations in the Weather Research and Forecasting-Chemistry (WRF-Chem) model, the NCEP GSI 3D-Var version 3.5 system is used. As Liu et al. (2011) and Schwartz et al. (2012) described the details of the system for aerosol data assimilation, only a brief explanation follows. Incorporating observations into the three-dimensional model state space, a 3D-Var system produces the best estimate to the true state by minimizing the differences between observations and background forecasts (e.g., innovations; represented by o-b), which is called the "analysis". The analysis is then used to initialize aerosol variables in the forecast model (e.g., WRF-Chem) so that the quality of aerosol forecasts can be largely dependent on the quality of the aerosol analysis produced in the 3D-Var system. Given the model state vector (x), the penalty function (or cost function) J (x) is defined as where x b stands for the background state vector (e.g., forecasts from the previous cycle), and y is an observation vector. Here, the terms of background (b) and forecast (f) are used interchangeably throughout the paper; H is an observation operator that projects the model states onto the observation space linearly or nonlinearly to compute the model correspondent to each observation. Background and observation error covariance matrices B and R, respectively, indicate how reliable the background forecast (B in the first term) and the observed information (R in the second term) might be to determine how to properly weight the two disparate resources. By minimizing the cost function (J (x)) with respect to the model state vector x at the analysis time, the variational analysis algorithm produces the analysis that fits best to all the observations assimilated within the assimilation time window.
To characterize the forecast error magnitude and its spatial structure, background error covariance B is estimated for each aerosol species using the National Meteorological Center (NMC) method (Parrish and Derber, 1992) based on the differences between 48 and 24 h WRF-Chem forecasts valid at the same time for 30 samples ending at 00:00 UTC in May 2016. The current GSI/3D-Var system does not allow cross-correlation between aerosol species or between aerosol and meteorological variables. As this is a 3D-Var analysis with no time information, B only characterizes the spatial correlations in each analysis variable, which determines how to propagate the observed information across the model grids.

Observation operators
Following Liu et al. (2011) and Schwartz et al. (2012), this study also takes the speciated approach whereby the analysis vectors are comprised of 15 WRF-Chem/GOCART aerosol variables -sulfate, organic carbon and black carbon, mineral dust in five particle size bins (with effective radii of 0.5, 1.4, 2.4, 4.5, and 8.0 µm), sea salt in four particle size bins (with effective radii of 0.3, 1.0, 3.25, and 7.5 µm for dry air), and unspeciated aerosol contributions to PM 2.5 -as opposed to using the total aerosol mass of PM 2.5 as the analysis variable in Pagowski et al. (2010).
The observation operator H (x) for surface PM 2.5 requires 10 GOCART aerosol variables as where P represents unspeciated aerosol contributions to PM 2.5 ; U denotes sulfate; O 1 and O 2 (B 1 and B 2 ) are hydrophobic and hydrophilic organic (black) carbon, respectively; and D 1 and D 2 (S 1 and S 2 ) are dust (sea salt) aerosols in the smallest and second-smallest size bins. This formula originated from the WRF-Chem diagnostics of PM 2.5 for the GOCART aerosol scheme. PM observations are mass concentrations (µg m −3 ), while all the model variables listed within the bracket on the right-hand side are aerosol mixing ratios (µg kg −1 ); dry density ρ d is thus required for the unit conversion in Eq. (2). In this study, we assimilate AOD retrievals at 550 nm from both MODIS and GOCI sensors using the same observation operator based on the community radiative transfer model (CRTM; Han et al., 2006;Liu and Weng, 2006) as described in Liu et al. (2011). Although the GOCART aerosol scheme is well known to underestimate surface PM concentrations due to the lack of secondary organic aerosol (SOA) formation, nitrate, and ammonium Volkamer et al., 2006;McKeen et al., 2009;Pang et al., 2018), it is widely used in analysis studies because it is the only scheme publicly available for assimilating AOD retrievals from satellite data in the GSI system. Aerosol optical depth (AOD) measures the amount of light extinction by aerosol scattering and absorption in the atmospheric column, which depends on the refractive indices and the size distribution of aerosol. In GSI, the CRTM computes the effective radii and the refractive indices of the 14 speciated WRF-Chem/GOCART aerosol species, assuming spherical aerosol particles and lognormal size distributions. Applying single-scattering properties of spheres by Mie theory, the mass extinction coefficient is computed as a function of the effective radius for each aerosol species at a certain wavelength (here, 550 nm) at each model level. The mass extinction coefficient (m 2 g −1 ) for each aerosol species multiplied by the aerosol layer mass (g m −2 ) produces dimensionless AOD for the species at that level. To represent the entire atmospheric column, modelsimulated AOD is then computed as the column integration of AOD for all aerosol species. Using the CRTM as a forward operator, AOD retrievals are assimilated separately or simultaneously with PM 2.5 observations from the surface network over East Asia, as described in the following section.

Cycling experiments
During the month of May 2016, observations are assimilated in the GSI 3D-Var system to produce the analysis that is used as an initial condition for the following WRF-Chem simulations. WRF-Chem forecasts valid at the next analysis time are then used as a first guess (or background) for the next GSI analysis. In this study, the whole process is repeated every 6 h (called "cycled") for the month-long period. Here we describe the analysis and the forecast systems used in the cycling.

Model configurations and cycling
All the analyses and the following forecasts are conducted over two one-way nested domains centered on the Korean Peninsula, as shown in Fig. 1. Domain 1 uses 175 × 127 horizontal grids at 27 km resolution, and domain 2 has 97 × 136 grids at 9 km resolution. Both domains have a total of 31 vertical levels up to 50 hPa. The initial and boundary meteorological conditions for domain 1 are provided by the UK Met Office Unified Model (UM-MET) global forecasts operated by the Korean Meteorological Administration (KMA) with a horizontal resolution of ∼ 25 km (0.3515 • × 0.234375 • ) at 26 isobaric levels every 6 h. This configuration was chosen due to the limitation of computational resources, but the use of higher resolutions both in time and space might be desirable to further improve forecast skills in the future. The chemical initial and boundary conditions for domain 1 are taken from the output of the global Model for Ozone and Related Chemical Tracers (MOZART-4) (Emmons et al., 2010) converted to WRF-Chem species by using the "mozbc" utility (downloaded from https://www2. acom.ucar.edu/wrf-chem/wrf-chem-tools-community/, last access: 28 November 2018). Meteorological and chemical fields in domain 1 are initialized from the global forecasts every cycle, while initial and boundary conditions for domain 2 are nested down from domain 1 in a one-way nesting. Aerosol and chemical initial conditions are then overwritten by WRF-Chem forecasts from the previous cycle in each domain. The GSI analysis is consecutively performed in the two domains using the same observations within each domain to update the initial conditions. During the cycles, 24 h forecasts are initialized from the 00:00 UTC analysis every day.

Surface PM 2.5
Hourly surface PM concentrations are provided by the NIER, which collects real-time pollutant observations at 361 South Korean stations from AirKorea (http://www.airkorea.or.kr, last access: 11 September 2018) and those at ∼ 900 Chinese sites from the China National Environmental Monitoring Centre (CNEMC; http://www.cnemc.cn, last access: 31 October 2017). Figure 1 shows the entire surface observing network that was used to assimilate surface PM 2.5 . Observation sites are concentrated in the urban area where many sites are close enough to be overlapped with each other. The Seoul Metropolitan Area (SMA; centered around 37.5 • N, 127 • E), for example, has hourly reports from a total of 41 stations.
As part of data quality control (QC), surface PM 2.5 concentrations higher than 100 µg m −3 are not assimilated, and observations producing innovations (represented as o-b) that exceed 100 µg m −3 were also discarded during the analysis step. To accommodate most measurements in China during heavy pollution events, a much higher threshold of 500 µg m −3 was once applied as the maximum observed value in our test experiment for the same month-long cycles, but it did not lead to any meaningful changes in the forecast performance over South Korea (not shown). Presumably this is because such high values were observed only over China where air pollutants were already overestimated by emission data based on the 2010 inventory such that the forecast skills over Korea became insensitive to the assimilation of those additional surface observations in China. Therefore, we applied the original threshold of 100 µg m −3 to all our experiments presented here.
Observation error is composed of measurement error ( o ) and the representative error ( r ) caused by the discrete model grid spacing (e.g., pm 2.5 = o 2 + r 2 ). Following Elbern et al. (2007) and Schwartz et al. (2012), observation error for surface PM 2.5 increases with the observed value (x o ) as o = 1.5+0.0075×x o . The representative error is formulated as r = γ o x L , where γ is 0.5, x is grid spacing (here, 27 km for domain 1 and 9 km for domain 2), and the scaling factor L is defined as 3 km. Based on this formula, observation error ( pm 2.5 ) ranges from 2.0 to 3.2 µg m −3 in domain 2, assigning the error of 2.48 µg m −3 to the PM 2.5 observation of 50 µg m −3 , for example. In this 3D-Var analysis, observation errors are considered to be uncorrelated so that the observation error covariance matrix R becomes diagonal. During the 6 h cycling, all the surface observations within a ±1 h window at each analysis time were assimilated without further adjustment of observation error.

AOD retrievals and observation preprocessing
Total AOD retrievals at 550 nm from MODIS sensors onboard the Terra and Aqua satellites have been widely used in aerosol studies Reid, 2006, 2010;Lee et al., S. Ha et al.: Improving air quality forecasting 2011). But the polar-orbiting satellites produce a very limited dataset temporally (mostly around 06:00 UTC only) and spatially (with sparse coverage) over Korea during the KORUS-AQ period. The MODIS AOD level 2 products over both the land and ocean "dark" area are available at 10 km × 10 km resolution and thinned over 60 km resolution during the GSI analysis in this study. Following Remer et al. (2005), observation errors are specified as the retrieval errors: (0.03 + 0.05 × AOD) over ocean and (0.05 + 0.15 × AOD) over land. They do not include the representativeness error and are slightly smaller than those for GOCI AOD, as described below.
The GOCI satellite monitors the East Asian region centered on the Korean Peninsula (36 • N, 130 • E), covering about 2500 km × 2500 km. GOCI level II data have eight spectral bands from the visible to near-infrared range (412 to 865 nm) with hourly measurements during daytime from 09:00 (00:00 UTC) to 17:00 local time (08:00 UTC) at 6 km resolution. As summarized in Choi et al. (2018), a recently updated GOCI Yonsei aerosol retrieval (YAER) version 2 algorithm targets cloud-and snow-free pixels over land and cloud-and ice-free pixels over ocean in producing the level II data. By adopting the MODIS and VIIRS aerosol retrieval and cloud-masking algorithms, cloud pixels are filtered to avoid cloud contamination, and high-reflectance or highly heterogeneous reflectance pixels are also masked to further increase data accuracy and consistency during the retrieval process.
Unlike MODIS retrievals, GOCI AOD has not been extensively used in the data assimilation community. The GSI system takes most observation types in PrepBUFR format, which has already gone through some processing to be prepared for data assimilation, but the preprocessing algorithms are not publicly available. This means that when a new dataset is assimilated in GSI, users need to investigate the characteristics of the data (such as temporal and spatial distribution) and thereby make the data suitable for assimilation, which is of crucial importance for the analysis quality.
In terms of temporal distribution, most GOCI level II data are retrieved at 30 min past each hour in the hourly report. For example, the actual time for most of the data reported at 00:00 UTC is centralized around 00:30:00 UTC (hh:mm:ss). In the 3D-Var algorithm, there is no time dimension and all observations are considered to be available at the analysis time. To account for temporal distribution, different weights are often given to observations based on the relative distance between the actual report time and the analysis time during the analysis step. However, taking possible latency in data transfer and retrieval processing into consideration, it is not legitimate to assign weights to the retrievals based on their final report time without further information. Therefore, considering the high temporal and spatial variability of aerosols, the assimilation window is set to ±1 h in order to avoid inconsistent observed information within the window in this study.
Satellite data are known to have a large positive impact on the analysis quality thanks to the high data volume both in time and space, but such high density violates the assumption of uncorrelated observation errors in the analysis algorithm and increases the computation time for the analysis step excessively. Hence, a large volume of satellite retrievals are typically sampled on a regularly spaced grid through the horizontal thinning procedure. In GSI, satellite radiance data can be thinned such that retrievals are randomly sampled at a predefined spacing for each instrument type before being ingested into the observation operator during the analysis (Rienecker et al., 2008). This thinning procedure, however, can pick up inconsistent data (near the cloud boundaries, for instance) and is reported as suboptimal (Ochotta et al., 2005;Reale et al., 2018). Therefore, we decided to preprocess GOCI AOD retrievals with superobing whereby all the data points are averaged within a certain radius. In this study, we superobed GOCI retrievals over each grid box in domain 1 (at 27 km resolution). Figure 2 shows the sample horizontal distribution of GOCI AOD retrievals valid at 06:00 UTC on 1 May 2016 before (a) and after (b) preprocessing them, comparing with those thinned over 60 km (c) and 27 km meshes (d) during the GSI analysis, respectively. Some high AOD values in the original dataset (as shown in a), especially on cloud edges, cannot be fully resolved by our 27 km model grids. By averaging all data points over each grid box at 27 km resolution, the superobed data in (b) have a better quality control throughout the domain, reducing the data volume effectively. The total number of observations marked in the upper right corner of each panel indicates that thinning over the 60 km mesh in (c) reduces the number of assimilated observations to 2.5% of that in the original level II data, while superobing and thinning over the 27 km mesh utilize 8 %-10 % of the original data, representing the whole data coverage fairly well.
It might be noteworthy to make two more points related to data processing here. First, superobing was applied as part of preprocessing before the GSI analysis is started while, the thinning was conducted during the analysis step so that the preprocessing could speed up the GSI analysis up to 25 times (by injecting less than 10 % of the original data and turning off the thinning process). This can facilitate the use of satellite retrievals in operational air quality forecasting. Next, the thinning algorithm in GSI V3.5 resulted in erroneous values in some places, as indicated by the maximum values in (c) and (d). For the month of May 2016, multiple cases with such extreme fake values were found after the thinning process. This bug may need to be fixed in the GSI or avoided by bounding the values exceeding the original data.
To examine the effect of data processing on the performance of the analysis and the background during the cycles, we compare two cycling experiments -one with the assimilation of the original level II data thinned over the 27 km mesh (named GOCI_orig in gray) and the other with the assimilation of GOCI retrievals preprocessed over the 27 km grids in domain 1 (called GOCI in black) -in Fig. 3. As GOCI data are reported from 00:00 to 08:00 UTC, only 00:00 and 06:00 UTC cycles are shown here in consecutive cycle numbers. The time series of (o-a) and (o-b) in each experiment show that the preprocessed data fit slightly better to the observations than the thinned data, assimilating more retrievals throughout the period. Because the differences between the two experiments are not significant, for computational efficiency, we decided to preprocess all the GOCI retrievals and assimilate them with the thinning process turned off in GSI for the rest of the experiments shown in this study. Choi et al. (2018) described their improved retrieval algorithm (GOCI YAER V2) with updated cloud-masking and surface reflectance calculations, making a long-term eval-uation against other ground-and satellite-based measurements. In their study, depending on the verifying objectseither ground-based Aerosol Robotic Network (AERONET) or satellite-based retrievals -they specified the uncertainties of GOCI AOD retrievals over land and ocean using two different linear regression formulae. We assign 1 following their error specification with respect to AERONET and 2 based on their expected error against retrieved satellite AOD Here, τ A stands for GOCI AOD values. In an effort to account for representativeness error, we also tried with 2 increased by 20 % everywhere as the third error formula (e.g., 3 = 1.2 × 2 ) and compared all three types of errors in Fig. 4. When these different observation errors were applied to GOCI retrievals in the assimilation, the smallest error ( 2 ) produced slightly better fits to observations, especially for the high values (AOD > 2) during the cycles, as expected, but not in a statistically meaningful way (not shown). In fact, it is not straightforward to estimate the representativeness error, which is subject to the model resolution (in the both horizontal and vertical) and data processing in use. Therefore, in many cases, observation error is specified based on the resulting forecast performance (Ha and Snyder, 2014). But because our forecast skills were not very sensitive to the three different error formulae tried here, for the rest of the experiments, 2 is used as the observation error for GOCI retrievals.
The goal of this study is to examine the relative impact of the GOCI assimilation on the prediction of surface PM 2.5 and ultimately to improve the forecasts for pollution events. Although it is rather easy to render the analysis close to GOCI observations by reducing the observation error, it is not guaranteed that the analysis in good agreement with AOD retrievals would actually lead to better forecasts in surface PM 2.5 . This is partly because AOD, a column-integrated quantity, is not directly associated with surface PM 2.5 and partly because large uncertainties in the forecast model and the emission forcing can dominate over the analysis error during the model integration. Even if the efficiency of assimilating AOD toward improving surface PM 2.5 forecasts can be largely affected by the quality of the forecast model and the emission data in use, the effectiveness of the AOD assimilation is based on the relationship between the columnintegrated AOD and PM 2.5 on the ground. Therefore, it might be worth checking the correlation between GOCI AOD retrievals and surface PM 2.5 observations for the cycling period. Figure 5 depicts a scatter diagram of GOCI AOD retrievals at 550 nm and surface PM 2.5 observations that are collocated in each grid box in domain 1 for the month of May 2016. As shown with the linear regression coefficient of 0.33, the two observation types have low correlations during this period, which is consistent with previous studies (Saide et al., 2014;Pang et al., 2018). Such an indirect relationship between the two observations makes the analysis challenging because it can induce a large error in the observation operator and heavily depends on the model's ability to derive PM 2.5 from AOD based on the vertical structure of aerosol variables and the conversion from aerosol mass to optical properties.

Results
With a careful design of the model configuration and observation processing, the overall impact of assimilating all the available observations (DA) is illustrated compared to the baseline experiment without data assimilation (NODA) in Fig. 6. Here, the 0-23 h hourly forecasts from all the 00:00 UTC analyses in domain 2 are concatenated for the entire month. Surface PM 2.5 observations marked as black dots show that the air quality becomes distinctively aggravated for the last 7 d, which is related to the long-range transport of air pollutants. With data assimilation (DA), the analyses at 00:00 UTC and the following forecasts (red) are in better agreement with the corresponding observations than those without assimilation (gray), especially from day 15 (e.g., after a full spin-up for 2 weeks). In particular, on 25-27 May, forecast error grows quickly even from the good analysis at 00:00 UTC, possibly associated with large uncertainties in lateral boundary conditions and the forecast model in use. However, averaged over the entire period, the mean absolute error (MAE) indicates that the performance of 0-23 h forecasts at 9 km resolution is improved by ∼ 30 % through data assimilation.

Observation impact during the cycles
Given that the aerosol assimilation has a positive impact on air quality forecasting, it might be worth isolating the contribution of each observation type to the improvement of the analysis and the following forecasts. We first assimilate individual observation types separately, naming the experiment following each observation type, then we assimilate them all together (called ALL). Figure 7 illustrates the vertical profile Figure 6. Time series of surface PM 2.5 simulated with (DA; red) and without assimilation (NODA; gray) in domain 2, representing hourly 0-23 h forecasts from 00:00 UTC every day, as averages over 361 stations over South Korea. Corresponding observations are marked as black dots. The mean absolute error (MAE; |o-f|) averaged over the entire period is shown for each experiment. Here, DA refers to the ALL experiment.
of 10 three-dimensional GOCART aerosol variables that are used to diagnose PM 2.5 in the GOCART scheme, the analysis (solid), and background (e.g., 6 h forecast; dashed) averaged over domain 2. Assuming that cycles may need to spin up meteorology and chemistry at least for 3 d in the regional simulations, all the statistics are computed from day 4 in the rest of the figures. Although the analysis variables only at the lowest model level are used in the observation operator for surface PM 2.5 , the observation impact is detected throughout the atmosphere due to the spatial correlations specified in the background error covariance. Contributions of different observations to each analysis variable vary, with the largest variability in the analysis increments (analysis minus background) displayed in sulfate. Interestingly, a large impact of AOD retrievals is noticed in hydrophilic organic carbon (O 2 ) aloft (e.g., between 12 and 25 levels) and unspeciated aerosol (P ) in the boundary layer. The assimilation of all the observations (ALL) tends to reduce O 2 , dust in both size bins (D 1 and D 2 ), and unspeciated aerosol (P ) in the lower atmosphere. Figure 8 summarizes the effect of different observations on PM 2.5 in both domains. The assimilation of surface PM 2.5 observations (green) results in the smallest PM 2.5 , while the GOCI assimilation (blue) produces the largest PM 2.5 throughout the atmosphere in both domains. When the analysis (solid line) is compared to background (dashed), it is revealed that PM 2.5 is predominantly increased over domain 1 with the assimilation of GOCI retrievals. Overall, the aerosol assimilation affects the entire profile of PM 2.5 , with the largest impact at the surface.
It is noted that the vertical distribution of the model aerosol species is associated with the vertical stratification of the model as well as the vertical distribution of the species in the background error covariance. It might be worth evaluating the vertical structure of individual species simulated in the model with respect to the vertical profiles observed during the KORUS-AQ field campaign (such as NASA DC-8  aircraft) in the future, although all the flight tracks were limited to the vicinity of the Korean Peninsula (Peterson et al., 2019).
To understand the observation impact in the horizontal distribution, Fig. 9 shows the analysis increments (analysis minus background) averaged over the period of 4-31 May 2016. Generally, the assimilation of surface PM 2.5 observations (PM) reduces surface PM 2.5 over most regions in China, while the GOCI assimilation largely increases surface PM 2.5 almost everywhere, consistent with Fig. 8. As MODIS retrievals have a relatively low coverage of the East Asian region for the entire period, they have the smallest impact among all the observation types. When all the observations are assimilated together (in ALL), it combines the effect of surface PM 2.5 and GOCI retrievals, changing the vertical distribution of aerosol species to match the AOD column values and pulling the surface states towards surface PM 2.5 concentrations. While the observing network of surface PM 2.5 is widely distributed over China, the impact of GOCI data is more centralized over Korea, making unequivocal contributions to air quality forecasting in the Korean Peninsula.
Note that we employ the 2010 inventory for our emission data, which does not reflect the emission control started from 2013 in China (Zheng et al., 2018). Given that air pollutants in the emission data constitute the majority of the precursors of PM 2.5 pollution, surface PM 2.5 concentrations could strongly depend on emissions, which might have led to the overestimation in the background (e.g., first guess). Therefore, the assimilation of surface PM 2.5 tends to counteract the overestimation driven by the emission data over China. On the other hand, over South Korea, the emission data do not seem to be overestimated and the assimilation of surface PM 2.5 leads to increasing surface PM 2.5 most effectively during the cycles.
Different from surface particulate matter, AOD in the background is contingent upon the optical properties described in the observation operator (e.g., CRTM) and the vertical structure of aerosols simulated in the column. The influence of the GOCI assimilation may indicate model deficiencies in the two aspects because the model states are pulled toward the observed information during the analysis step, as depicted in the analysis increment.

Observation impact on 24 h forecasts
Since the real effect of data assimilation is manifested in the subsequent forecasts, we now examine forecast improvements when initialized by our own analyses. A good analysis is expected to slow down the forecast error growth, leading to better forecasts. In this subsection, forecast errors at the lowest model level are compared between experiments for 24 h with respect to surface observations from various sites in South Korea. As we focus on 9 km simulations over the Korean Peninsula, it is hard to anticipate the direct effect of the assimilation beyond 24 h, especially in such a small domain in which the weather systems dramatically change from day to day. As shown in Fig. 10, the forecast error is the largest in the baseline experiment (NODA), followed by the assimilation of MODIS retrievals alone (MODIS) in terms of mean absolute error (MAE). Note that the analysis in the PM experiment is verified against the same surface PM 2.5 observations used in the assimilation. Therefore, the analysis error is smaller than those in other experiments, but the forecast error grows quickly over the next 24 h. The assimilation of surface PM 2.5 alone generally underestimates the prediction of surface PM 2.5 , with the fastest growth of forecast error. On the other hand, the assimilation of AOD retrievals (either GOCI or MODIS) alone does not improve the surface analysis and mostly overestimates surface PM 2.5 for 24 h.  This might be ascribed to an imperfection of the forward operator of AOD and the model deficiency in the representation of three-dimensional aerosol species that comprised AOD and PM 2.5 . When assimilated with surface PM 2.5 observations (in ALL), however, AOD retrievals effectively reduce the forecast error and suppress the error growth throughout 24 h forecasts. Recently, heavy pollution events have often taken place over Korea, and considerable attention has been drawn to the accuracy of operational air quality forecasting in the country, particularly in surface PM 2.5 . As accurately predicting exceedance and non-exceedance events in categorical predictions has great social impact, it is necessary to evaluate the forecast accuracy for different categorical events. While Miyazaki et al. (2019) classified the entire KORUS-AQ campaign period into four different phases based on dominant atmospheric circulation patterns, we categorize events for the month of May 2016 based on hourly surface PM 2.5 concentrations, as summarized in Tables 2 and 3. Figure 11 summarizes the evaluation of 24 h forecasts based on the formulae described below.
The air quality forecasting operated by the Korean NIER is currently evaluated in the same way on a daily basis, except for daily mean values.
In all events, the overall accuracy of 0-24 h forecasts is the highest in ALL (∼ 70 %) and the lowest in NODA (∼ 60 %), which is about 10 % improvement through assimilation during this KORUS-AQ period. It is noted that the forecast error illustrated in Fig. 10 is dominated by days with a clear sky or moderate air quality conditions (about two-thirds of the month-long period, as shown in Fig. 6), while the forecast accuracy summarized in Fig. 11 is determined by equally weighting different categorical forecasts with different sample sizes. This implies that the categorical forecast evaluation tends to emphasize the forecast accuracy for pollution events (which has a smaller sample size). As such, Fig. 11a highlights the effect of data assimilation on improving air pollution forecasts. Differences between experiments are much  Figure 11. Time series of the forecast accuracy (%) of the hourly forecasts from the 00:00 UTC initialization for 4-31 May 2016 in domain 2 for categorized events based on hourly surface PM 2.5 concentrations, as defined in Tables 2 and 3. larger in high pollution events (Fig. 11b) and the detection rate (Fig. 11f) to which AOD retrievals (both GOCI and MODIS) make the biggest positive contributions. While NODA produces poor forecasts consistently in most metrics shown in Fig. 11, the forecast accuracy in PM (green) drops very quickly for the first 12 h for all events (a) and pollution events (b), indicating that the assimilation of surface PM 2.5 alone may not be enough to maintain the forecast skills beyond the cycling frequency (e.g., 6 h). It also increasingly underestimates surface PM 2.5 with time, especially after 20 h, and produces more false alarms even though its overestimation rate is the lowest among all experiments. Overall, the AOD assimilation tends to overestimate the prediction of surface PM 2.5 with a relatively large false alarm rate, but it clearly helps enhance the forecast accuracy up to 24 h when assimilated with surface PM 2.5 observations. Even with low correlations with surface PM 2.5 (as illustrated in Fig. 5), AOD retrievals keep the surface air pollution forecasts from drifting away from the true state, compensating for model deficiencies. This demonstrates that it could be substantially beneficial to monitor a wide range of the surrounding area using a geostationary satellite for the enhancement of air quality forecasts.
In order to verify our forecasts against independent observations, we processed total AOD at 500 nm from the Aerosol Robotic Network (AERONET; https://aeronet.gsfc. nasa.gov/, last access: 1 February 2019) sites and surface PM 2.5 concentrations measured at three more stations op-  (Fig. 12). The level 2 data are used for AERONET AOD observations as cloud-free and quality-assured data. Figure 13 illustrates the time series of hourly AOD from our experiments compared to hourly averages of AOD observations from eight AERONET sites (black dots). At all sites, GOCI (blue) produces the largest AOD at most of the high peaks, while PM (green) and NODA (gray) simulate the smallest AOD throughout the period. Regardless of relative AOD values between the experiments, model forecasts are well matched with observations at low AOD values but mostly miss high AOD observations, especially during the high pollution events for 24-27 May. This leads to the negative mean bias (as f-o) in all experiments (shown in the legend), implying that our forecasts produce AOD slightly lower than the observed one as a whole. The RMSE and mean bias at a total of 16 AERONET sites are summarized in Table 4, indicating that GOCI has the smallest forecast error in AOD nationwide. Surface PM 2.5 measurements from three NIER sites were downloaded from https://www-air.larc.nasa.gov/cgi-bin/ ArcView/korusaq (last access: 17 December 2019) as raw data with no quality control. They are provided as hourly averages starting from 9 May and compared to our hourly model output for 9-31 May (Fig. 14). These observations look somewhat noisy, but our forecasts broadly follow them throughout the period. Similar to the AOD verification shown in Fig. 13, forecasts from GOCI produce the smallest forecast mean bias among all the experiments in Olympic Park (a) and Daejeon (b), predicting high surface PM 2.5 concentrations between 24 and 26 May. But GOCI was worse than other experiments in Ulsan (c), overestimating surface PM 2.5 , especially during high pollution days.
In the assimilation system, raw data are not considered to be reliable, but this verification is included for completeness because there was no other instrument that reported surface PM 2.5 concentrations or all the precursors of PM 2.5 concentrations to validate PM 2.5 forecasts on the ground level. Figure 13. Hourly time series of total AOD at 500 nm from 00:00 UTC on 4 May to 23:00 UTC on 31 May at eight different AERONET sites. Model values in different colors represent output every hour beginning at the initial time and ending at the 23rd hour of integration patched together for each 00:00 UTC forecast. The bias (represented as f-o) averaged over the entire period is shown next to each experiment name. AERONET observations represent hourly averages as black dots.

A heavy pollution case
The effect of assimilating different observations is most distinguishable in high pollution events, as demonstrated in Fig. 11. During the KORUS-AQ period, there were about five heavy pollution cases (when surface PM 2.5 > 50 µg m −3 , as defined in Table 2) over South Korea. The longest and the most severe pollution events occurred on 25-26 May 2016. Figure 15 illustrates how air pollutants were transported from China, associated with the strong synoptic weather systems in the region for a few days.
As the analysis of our best experiment ALL showed, the Korean Peninsula was positioned in the downstream region of the upper-level trough at 500 hPa (in the left panel). In the low troposphere, the center of the North Pacific High was situated in the east of Japan, bringing lots of moisture to Korea at 00:00 UTC on 24 May 2016 and blocking the eastward movement of the surface low-pressure system located north of Korea (centered around 46 • N, 125 • E), as shown in Fig. 15d. With the slowly approaching upper-level westerlies, these warm and moist conditions in the low troposphere provided a favorable environment for increasing air pollution in the Korean Peninsula for the next few days. At 00:00 UTC on 25 May, the Shangdong area in China (shown as the largest polluted area to the west of Korea) exceeded 150 µg m −3 in surface PM 2.5 (Fig. 15b). This area has high topography with elevations higher than 3.5 km (in height above ground level; a.g.l.), while most regions in South Korea, especially the Seoul Metropolitan Area (SMA), are elevated near sea level. Therefore, when slow and deep baroclinic systems are approaching the Korean Peninsula like these events, a deep pool of highly polluted air can be advected from China as a whole to substantially degrade the air quality in South Korea at least for a day or two. This long-range transport case produced an hourly maximum surface PM 2.5 observation of 117 µg m −3 over the SMA in Korea at 00:00 UTC on 26 May 2016, as shown in Fig. 16a.
One notable difference between observations (a) and all the model simulations (b-f) in Fig. 16 is that 9 km forecasts driven by 0.1 • × 0.1 • anthropogenic emissions cannot simulate such a high spatial variability across stations. During this heavy pollution event, there were dozens of missing observations, resulting in fewer stations in (a) than all the experiments (b-f). With only 145 stations reporting high concentrations (e.g., surface PM 2.5 > 50 µg m −3 ), the observed distribution still shows a sharp gradient between the stations, especially in the SMA. Consistent with all the previous figures, the assimilation of surface PM 2.5 alone (in PM) underpredicts surface PM 2.5 (even more than NODA), while GOCI overpredicts surface PM 2.5 the most among all observation types almost everywhere except for the SMA. MODIS retrievals slightly increase the concentrations from NODA (by ∼ 10 µg m −3 ), with the spatial distribution almost the same as that of NODA. In the concurrent assimilation of all the observations (in ALL), a moderate overestimation is presented everywhere, but higher levels of pollution in the SMA are not simulated either. To resolve such a large variability between urban and rural areas and to increase the sharpness of the forecast accuracy, the use of higher grid resolutions (such as 3 km), more accurate emission data, and more sophisticated aerosol chemistry mechanisms might be indispensable.

Conclusions and discussion
GOCI AOD retrievals provide reliable and consistent aerosol information, monitoring air pollutants over the Korean Peninsula at high resolution every day. One of the best ways of utilizing such invaluable observations is to inject them into the forecast system through data assimilation and better initialize numerical forecasts. For the successful assimilation of real observations, especially retrievals from satellites, extra attention should be paid to processing the data properly based on the characteristics. The spatial and temporal representativeness of GOCI retrievals was carefully examined and the corresponding data processing was conducted before assimilation in this study. We averaged all the pixels over each grid box at 27 km resolution (e.g., superobing) instead of thinning them randomly, for instance.
It is worth noting several challenges in the assimilation of AOD retrievals for improving the prediction of surface PM 2.5 concentrations: (i) AOD is not directly associated with PM 2.5 concentrations on the ground. Although the two datasets can be highly correlated in specific conditions such as cloud-free, low boundary layer heights, and low relative humidity, the overall correlation is low (∼ 0.3) in the present study and a direct impact on each other cannot be expected. (ii) An observation operator for AOD has errors due to the simplification and limited aerosol specifications in the community radiative transfer model (CRTM). (iii) There is significant model er-  ror, which is presumably one of the most critical issues. In the 3D-Var assimilation, in particular, the model estimates of AOD, a column-integrated quantity, are strongly constrained by the model error structure of each aerosol species both horizontally and vertically.
Even with these challenges, however, satellite-based AOD, especially from geostationary satellites like GOCI, can be extremely useful for improving the prediction of air pollution on a daily basis. In the situation in which air quality can be largely affected by the long-range transport of air pollutants, such consistent information on the wide upstream area is essential but hard to obtain otherwise.
Using the GSI 3D-Var system coupled with the WRF-Chem forecast model, we assimilated satellite AOD retrievals and surface PM 2.5 observations for the month of May 2016 during the KORUS-AQ period. Compared to the baseline experiment (NODA), the simultaneous assimilation of various observations consistently improved the prediction of ground PM 2.5 for 24 h forecasts, reducing systematic error and false alarms. The assimilation of ground PM 2.5 alone improved the analysis during the cycles, reducing the analysis error to almost half the size compared to the experiment without assimilation. However, the forecast error grew very quickly over the next 12 h, underestimating PM 2.5 at the surface, especially in heavy pollution events during which the forecast accuracy dropped from over 70 % to ∼ 30 % in only 4 h. Meanwhile, the GOCI AOD retrievals alone tended to overestimate surface PM 2.5 but significantly contributed to improving air quality forecasts up to 24 h when assimilated with surface PM 2.5 observations. The effect of data assimilation is most distinguishable and remarkable for high pollution events. During the month of May 2016, most heavy pollution events were associated with long-range transport from China. In such cases, it was particularly beneficial to moni-tor the wide upstream region using geostationary instruments such as GOCI.
To assess the effect of data assimilation with respect to independent observations, 0-23 h forecasts from different experiments are verified against AOD from AERONET sites and ground PM 2.5 measurements from the sites operated during the KORUS-AQ field campaign. In this verification, the assimilation of GOCI retrievals is the most effective in improving the forecast performance at most sites, especially for high pollution events.
Even with successful data assimilation, there are several limitations in this study. First, the simple GOCART aerosol scheme is well known for the underestimation of air pollutants due to the lack of aerosol size distribution and secondary organic aerosol (SOA) formation. We had to use the scheme for the assimilation of AOD retrievals since the observation operator for AOD was only built for the GOCART scheme in the GSI system. Next, as there is no cross-covariance between aerosol and meteorological variables considered in the background error covariance estimates, the influence of aerosols on meteorological variables was not fully simulated in this study. Without the assimilation of meteorological observations, it was not possible to make an optimal estimate that is fully coupled between chemistry and meteorology, although the meteorological information was provided through the first guess and lateral boundary conditions. Finally, the emission inventory used in this study was based on the annual mean of 2010, which did not reflect the actual emissions for the year 2016, especially over China. The large bias and uncertainties in the emission data were particularly detrimental to the assimilation of surface PM 2.5 alone.
To overcome the systematic underestimation of the GO-CART aerosol scheme in the assimilation context, there is an ongoing effort for the development of a new interface for more sophisticated aerosol schemes such as MOSAIC and/or the Modal Aerosol Dynamics in Europe and the volatility basis set (MADE-VBS; Ackermann et al., 1998;Ahmadov et al., 2012) in the WRFDA system (Barker et al., 2012). This would be advantageous for more realistic forecast behavior in high-resolution applications.
The positive impact of data assimilation is generally limited to 24 h forecasts for three major reasons: first, most air pollutants have a short lifetime due to dry and wet deposition and transformations through interactions with solar radiation and clouds. Secondly, pollutant transport and transformations in chemical transport models are strongly driven by external forcing, such as emissions, boundary conditions, and meteorological fields. Lastly, there are large uncertainties in aerosol-and gas-phase chemistry parameterized in chemical transport models. Therefore, to extend the period of forecast improvements, emission data need to be improved, and large uncertainties in chemical and meteorological boundary conditions should be minimized. It has been shown that the estimation of emission inventories as part of the DA pro-cedure can help extend the impact of data assimilation in longer forecasts (Elbern et al., 2007;Kumar et al., 2019). Also, more sophisticated aerosol and chemical mechanisms might be able to improve air quality forecasting by reducing model deficiencies (Chen et al., 2019). A simultaneous assimilation of meteorological observations and measurements of individual chemical species as well as particulate matter would certainly be beneficial in both NWP and air quality forecasting. To better account for high nonlinearities and uncertainties of aerosol forecasting on small scales, more advanced analysis techniques such as ensemble or hybrid data assimilation would be more desirable.
Author contributions. ZL helped formulate the study, and WS performed initial test runs. YL and LC provided input datasets, partially funding this study. SH designed and ran the experiments, analyzed the results, and wrote the paper.
Competing interests. The authors declare that they have no conflict of interest.