Interactive comment on “ Impacts of air pollutants from fire and non-fire emissions on the regional air quality in Southeast Asia

the text. Table 3. You can cite the website where the readers can find the information here. Table 5-8. Try to move some of them to the supplement. Too many details will distract the readers. Figure 6. The readers are lost when they find so much information in this figure. Figure 8-10. Yes, the machine learning techniques used here are very fancy, but they are not the key points of this paper. There is no need to display three figures to illustrate your ML results. Abstract. This is a really long abstract. I suggest shortening it.


Introduction
Published by Copernicus Publications on behalf of the European Geosciences Union.
Severe haze in Southeast Asia has attracted the attention of governments and the general public in recent years due to its impact on local economy, air quality, and public health (Miettinen et al., 2011; Kunii et al., 2002; Frankenberg et al., 2005; Crippa et al., 2016). Widespread biomass burning activities are one of the major sources of haze events in Southeast Asia. Our previous study demonstrated that biomass burning aerosols contributed up to 40-60 % of haze events in the major cities of Southeast Asia during 2003-2014 (Lee et al., 2017). On the other hand, biomass burning in Southeast Asia could impact climate through emissions of carbon dioxide (CO 2 ) (van der Werf et al., 2009) and particulate matter; the latter has a substantial impact specifically on regional climate features including the spatiotemporal distribution of precipitation and energy budgets (Wang, 2004, 2007).
Regarding the impact of biomass burning aerosols on public health, a recent study based on a health model in the United States (USA) has estimated the number of deaths resulting from black carbon (BC) to be more than 13,500 in 2010 (Li et al., 2016). Considering that both the ambient concentration of particulate matter and overall population in Southeast Asia are higher than those of the USA, a worse scenario in the region could thus be foreseeable. In fact, a few studies quantifying the consequences of aerosols on human health in Southeast Asia have already suggested taking necessary measures to reduce biomass burning and deforestation in order to prevent related public health issues (Marlier et al., 2013). However, as important as biomass burning pollution may be, it is not the only source of particulate pollution in Southeast Asia. Indeed, aerosols emitted from fossil fuel burning alongside other non-biomass burning human activities, as indicated in our previous study (Lee et al., 2017), also contribute significantly to air quality degradation.
Particulate pollutants from human activities other than biomass burning in Southeast Asia include species both locally produced and brought in from neighboring regions by long-range transport. Fossil fuel emissions in Southeast Asia have increased significantly in recent years, especially in areas where energy demands are growing rapidly in response to economic expansion and demographic trends (IEA, 2015). Therefore, advancing our understanding of the respective contributions of aerosols from fire (i.e., biomass burning) versus non-fire (including fossil fuel combustion, road and industrial dust, land use and land change, etc.) activities to air quality and visibility degradation has become an urgent task for developing effective air pollution mitigation policies in Southeast Asia.
In this study, we aim to examine and quantify the impacts of fire and non-fire aerosols on air quality and visibility degradation over Southeast Asia. Three numerical simulations have been conducted using the Weather Research and Forecasting (WRF) model coupled with a chemistry component (WRF-Chem), which is a sophisticated regional weather-chemistry model, driven respectively by aerosol emissions from (a) fossil fuel burning only, (b) biomass burning only, and (c) both fossil fuel and biomass burning. By comparing the results of these experiments, we examine the corresponding impacts of fossil fuel and biomass burning emissions, both separately and combined, on the air quality and visibility of the region. We also use available in situ measurements to evaluate and correct model results, providing a better base for further improvement, particularly of emissions over the region. Beyond traditional process models such as WRF-Chem, we also experiment with machine learning algorithms to identify suitable conditions for haze based on historical data and hence to forecast the likelihood of the occurrence of such events.
We first describe the methodologies adopted in the study, followed by the results and findings from our assessment of the relative contributions of fire and non-fire aerosols in degrading air quality and visibility over Southeast Asia. We then discuss the uncertainty of current emission inventories alongside the results from an exploratory experiment using machine learning algorithms to forecast the occurrence of haze events in several major cities in Southeast Asia. The last section summarizes and concludes our work.

Methodology
2.1 Observational data

Surface visibility
The observational data of surface visibility from the Global Surface Summary of the Day (GSOD; Smith et al., 2011) are used in our study to identify the days with low visibility due to particulate pollution, i.e., haze events. The GSOD is derived from the Integrated Surface Hourly (ISH) dataset and archived at the US National Climatic Data Center (NCDC). The daily visibility data are available from 1973 onward.

Particulate matter (PM 10 )
The surface concentrations of particulate matter with sizes smaller than 10 µm (PM 10 ; measured in µg m −3 ) in Malaysia are derived from the Air Quality Index (AQI; named Air Pollutant Index, API, in Malaysia) records obtained from the website of the Ministry of Natural Resources and Environment, Department of Environment, Malaysia (http://apims.doe.gov.my/public_v2/home.html, last access: 27 April 2018). When PM 10 is reported as the primary pollutant with a maximum pollutant index, the 24 h PM 10 concentrations are calculated from AQI based on the equations in Table S1 in the supplement (Malaysia, 2000). Data from 51 AQI observation stations are available in Malaysia from October 2005 onward. The AQI is reported twice daily (11:00 and 17:00 local time), and the data reported at 11:00 are used in this study.

Carbon monoxide (CO) and ozone (O 3 )
Surface mole fractions of CO and O 3 are measured by the World Meteorological Organization (WMO) Global Atmosphere Watch (GAW) station in Bukit Kototabang, which is located on the island of Sumatra, Indonesia. Hourly data are archived at the World Data Center for Greenhouse Gases (WDCGG) under the GAW program (http://ds.data.jma.go.jp/gmd/wdcgg/, last access: 27 April 2018).

Crustal matter and residual matter
The Surface PARTiculate mAtter Network (SPARTAN) is a network of ground-based measurements of fine particle concentrations (http://spartan-network.weebly.com/, last access: 27 April 2018) (Snider et al., 2015, 2016). Available data in the SPARTAN network include hourly PM 2.5 concentrations and certain compositional features (Table S2). Crustal matter and residual matter, the latter being mainly organic components, from filtered PM 2.5 samples are used in this study to fill the gap in modeled PM 2.5 created by the missing anthropogenic aerosol in the emission inventory (Philip et al., 2017). The four operational SPARTAN sites in Southeast Asia are Bandung (Indonesia), Hanoi (Vietnam), Manila (Philippines), and Singapore. The chemical components of PM 2.5 in each city are presented in Fig. S1.

The model
WRF-Chem version 3.6.1 is used in this study to simulate trace gases and particulates interactively with the meteorological fields using several treatments for photochemistry and aerosols (Grell et al., 2005). We selected the Regional Acid Deposition Model version 2 (RADM2) photochemical mechanism (Stockwell et al., 1997) coupled with the Modal Aerosol Dynamics Model for Europe (MADE), which includes the Secondary Organic Aerosol Model (SORGAM; Ackermann et al., 1998; Schell et al., 2001), to simulate the evolution of anthropogenic aerosols in Southeast Asia. MADE/SORGAM uses a modal approach (including Aitken, accumulation, and coarse modes) to represent the aerosol size distribution, and predicts mass and number for each aerosol mode. The numerical simulations are performed within a model domain with a horizontal resolution of 36 km, including 432 × 148 horizontal grid points (Fig. 1), and 31 vertically staggered layers based on a terrain-following pressure coordinate system. The Mellor-Yamada-Nakanishi-Niino level 2.5 (MYNN; Nakanishi and Niino, 2009) scheme is chosen as the planetary boundary layer scheme in this study. By using a vertical coordinate that is stretched to have higher resolutions inside the planetary boundary layer, the model has about 4-5 vertical layers inside the planetary boundary layer with a vertical resolution of ∼ 30 m near the surface. The domain covers an area from the Indian Ocean to the Western Pacific Ocean in order to capture the Madden-Julian Oscillation (MJO) pattern. The time step is 180 s for advection and physics calculation. The physics schemes in the simulations include the Morrison (two-moment) microphysics scheme (Morrison et al., 2009), the Rapid Radiative Transfer Model for GCMs (RRTMG) longwave and shortwave radiation schemes (Mlawer et al., 1997; Iacono et al., 2008), the Unified Noah land-surface scheme (Tewari et al., 2004), and the Grell-Freitas ensemble cumulus scheme (Grell and Freitas, 2014). The initial and boundary meteorological conditions are taken from the US National Centers for Environmental Prediction FiNaL (NCEP-FNL) reanalysis data (National Centers for Environmental Prediction, 2000), which have a spatial resolution of 1° and a temporal resolution of 6 h. Sea surface temperatures are updated every 6 h in NCEP-FNL. All simulations used a four-dimensional data assimilation (FDDA) method to nudge NCEP-FNL temperature, water vapor, and zonal and meridional wind speeds above the planetary boundary layer.

Emission inventories
The Regional Emission inventory in ASia (REAS) version 2.1 (Kurokawa et al., 2013) is a regional emission inventory for Asia, including monthly emissions of most major air pollutants, e.g., black carbon (BC), organic carbon (OC), sulfur dioxide (SO 2 ), nitrogen dioxide (NO 2 ), and greenhouse gases between 2000 and 2008. The spatial resolution of REAS is 0.25° × 0.25°, covering East, Southeast, South, and Central Asia and the Asian part of Russia (Russian Far East, eastern and western Siberia, and the Urals). The area coverage of REAS is from 60 to 160° E in longitude and from 10° S to 50° N in latitude, which is smaller than our domain configuration. For this reason, we use the Emissions Database for Global Atmospheric Research (EDGAR) version 3.2 (year 2000 emission; Olivier et al., 2005) and version 4.2 (year 2005 emission; http://edgar.jrc.ec.europa.eu, last access: 27 April 2018) to complement the emissions over areas outside REAS coverage. The emission coverage of REAS and EDGAR in our simulated domain is presented in Fig. 1. We have compared the modeled results using REAS versus EDGAR emission inventories in a set of 1-year paired simulations: the differences between these two model runs are rather limited regarding aerosol-related variables (Table S3). Considering the high spatiotemporal resolution of the REAS emission inventory and the comparison results, we decided to use REAS in this study. In addition, a detailed comparison of REAS with other emission inventories in Southeast Asia was presented by Kurokawa et al. (2013).
The Fire INventory from the US National Center for Atmospheric Research (NCAR) version 1.5 (FINNv1.5; Wiedinmyer et al., 2011) is also used in the study to provide fire-based emissions. FINNv1.5 classifies burning of extratropical forest, tropical forest (including peatland), savanna, and grassland. The daily data are available from 2002 to 2014 with a 1 km spatial resolution. The FINNv1.5 emission inventory also includes the major chemical species (e.g., BC, OC, SO 2 , CO, and NO 2 ) from biomass burning. A modified plume rise algorithm in WRF-Chem, specifically for tropical peat fire, is described in Lee et al. (2017).
Compared with fossil fuel emissions, biomass burning emissions vary more strongly in space and time (Fig. S2). However, regarding long-term impact, both emissions are important to regional air quality in Southeast Asia (Table 1). BC from biomass burning emissions, for example, has significant interannual and inter-seasonal variabilities due to the Southeast Asia monsoon and the El Niño-Southern Oscillation (ENSO; Lee et al., 2017; Reid et al., 2012), but fossil fuel and biomass burning sources contribute equally to total BC emissions (Table 1).

Numerical experiment design
Three numerical simulations are designed to investigate the impacts of fire and non-fire aerosols on regional air quality and visibility in Southeast Asia. Among these three runs, the fossil fuel emissions only (FF) simulation and the biomass burning emissions only (BB) simulation assess the impact of stand-alone non-fire and fire aerosols, respectively. The simulation combining both fossil fuel and biomass burning emissions (FFBB) demonstrates the impacts of both types of aerosols; it is also closer to the real-world case than the two other runs. Based on the available years of emission inventories, each of these runs lasts 7 years (i.e., from 2002 to 2008).

Deriving low-visibility days (LVDs) caused by particulate pollution
According to Visscher (2013), a visibility reading lower than 10 km is considered a moderate to heavy air pollution event by particulate matter. As in Lee et al. (2017), we define a low-visibility day (LVD) as a day when the daily mean surface visibility is lower than or equal to 10 km, excluding misty and foggy days. The modeled visibility is calculated based on the extinction coefficient of the externally mixed aerosols, including BC, OC, sulfate (SO 4 2− ), and nitrate (NO 3 − ), as a function of particle size, by assuming a log-normal size distribution of Aitken and accumulation modes. Note that all these calculations are computed for the wavelength of 550 nm. To make the calculated visibility based on modeled aerosols better match reality, we also consider the hygroscopic growth of OC, sulfate, and nitrate in the calculation based on the modeled relative humidity (Kiehl et al., 2000; Lee et al., 2017).
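The step from an extinction coefficient to an LVD flag can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the widely used Koschmieder relation (visibility = 3.912 / extinction), which the text does not state explicitly, and the function names are hypothetical. The 10 km threshold and the exclusion of fog and mist days follow the definition above.

```python
import math

# Assumption: Koschmieder relation with a 2 % contrast threshold, ln(1/0.02) ~ 3.912.
KOSCHMIEDER_CONSTANT = 3.912


def visibility_km(extinction_per_km: float) -> float:
    """Convert a 550 nm aerosol extinction coefficient (km^-1) to visibility (km)."""
    if extinction_per_km <= 0.0:
        return math.inf  # no extinction means unlimited visibility in this sketch
    return KOSCHMIEDER_CONSTANT / extinction_per_km


def is_low_visibility_day(daily_mean_visibility_km: float,
                          fog_or_mist: bool = False,
                          threshold_km: float = 10.0) -> bool:
    """Flag an LVD: daily mean visibility <= 10 km, excluding fog and mist days."""
    return (not fog_or_mist) and daily_mean_visibility_km <= threshold_km
```

For example, an extinction coefficient of 0.5 km −1 gives a visibility of about 7.8 km, which would count as an LVD unless the day is flagged as foggy or misty.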
Our focus in this study is to first identify LVDs and then to determine whether fire or non-fire aerosols alone, or in combination, could cause the occurrence of these LVDs. As a reference, the observed LVDs are identified and their annual frequency for a given city is derived by using the GSOD visibility data. Then, the modeled LVDs are derived following the same procedure. Using these results and based on the logical chart in Fig. 2, the major particulate source (FF, BB, or FFBB) that caused the occurrence of each observed LVD is determined. Here, Type 1 LVD represents the cases where either fire or non-fire aerosols alone can cause the observed LVD to occur. Type 2 means that non-fire aerosols are the major contributor to the observed LVD. Type 3 means that fire aerosols are the major contributor to the observed LVD. Type 4 represents the cases where the observed LVD is induced by coexisting fire and non-fire aerosols. The observed LVDs that the model cannot capture are classified as Type 5.
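The five-type classification can be written as a short decision function. The sketch below is one reading of the verbal description above; Fig. 2 itself is not reproduced here, so the exact branching order is an assumption, and the function and argument names are hypothetical.

```python
from typing import Optional


def classify_lvd(observed: bool, ff: bool, bb: bool, ffbb: bool) -> Optional[int]:
    """Return the LVD type (1-5) for one day at one city, or None if no LVD was observed.

    observed: an LVD occurred in the GSOD observations
    ff, bb, ffbb: an LVD occurred in the FF, BB and FFBB simulations, respectively
    """
    if not observed:
        return None  # only observed LVDs are classified
    if ff and bb:
        return 1  # either fire or non-fire aerosols alone reproduce the LVD
    if ff:
        return 2  # non-fire aerosols alone reproduce it
    if bb:
        return 3  # fire aerosols alone reproduce it
    if ffbb:
        return 4  # only coexisting fire and non-fire aerosols reproduce it
    return 5  # the model does not capture the observed LVD
```

Counting the returned types over all observed LVDs in a city gives the kind of percentages reported per type in the results.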

Air quality index (AQI)
The AQI is established mainly for the purpose of providing easily understandable information about air pollution to the public. The original derivation of AQI in the USA is based on six pollutants: particulate matter (PM 10 ), fine particulate matter (PM 2.5 ), sulfur dioxide (SO 2 ), carbon monoxide (CO), ozone (O 3 ), and nitrogen dioxide (NO 2 ). Each pollutant is scored on a scale extending from 0 through 500 based on the corresponding breakpoints, and then the highest AQI value is reported to the public. In this study, we focus on the AQI derived from modeled 24 h PM 2.5 and 9 h O 3 . Note that the original AQI is derived by using 8 h O 3 . Due to the 3 h output interval of simulated O 3 , we use the 9 h O 3 level instead in this study. An index $I_p$ for pollutant $p$ is calculated by using a segmented linear function of the pollutant concentration, $C_p$:
$I_p = \frac{I_{Hi} - I_{Lo}}{B_{Hi} - B_{Lo}} (C_p - B_{Lo}) + I_{Lo}$,
where $B_{Hi}$ and $B_{Lo}$ are the upper and lower breakpoints of the concentration category containing $C_p$ in Table S4, and $I_{Hi}$ and $I_{Lo}$ are the AQI values corresponding to $B_{Hi}$ and $B_{Lo}$, respectively. For example, when the 24 h PM 2.5 concentration is 20 µg m −3 , $B_{Lo}$, $B_{Hi}$, $I_{Lo}$, and $I_{Hi}$ are 12.1, 35.4, 51, and 100, respectively. We then use the 24 h PM 2.5 AQI value and the maximum 9 h O 3 AQI value in 1 day to represent the daily AQI for PM 2.5 (AQI (PM2.5) ) and O 3 (AQI (O3) ), respectively.
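The segmented linear index can be illustrated with a few lines of Python. Table S4 is not reproduced in this text, so the breakpoint table below is an assumption: it uses the US EPA 24 h PM 2.5 breakpoints that are consistent with the worked example above (12.1-35.4 µg m −3 mapping to AQI 51-100). The function name is hypothetical.

```python
# Assumed 24 h PM2.5 breakpoints (B_Lo, B_Hi, I_Lo, I_Hi); Table S4 may differ.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def aqi_index(concentration: float, breakpoints=PM25_BREAKPOINTS) -> float:
    """Segmented linear index I_p for a pollutant concentration C_p."""
    if concentration < 0.0:
        raise ValueError("concentration must be non-negative")
    for b_lo, b_hi, i_lo, i_hi in breakpoints:
        # The first category whose upper breakpoint is not exceeded is used, so
        # values falling in the small gaps between categories go to the upper one.
        if concentration <= b_hi:
            return i_lo + (i_hi - i_lo) * (concentration - b_lo) / (b_hi - b_lo)
    raise ValueError("concentration is above the highest breakpoint")
```

With these breakpoints, a 24 h PM 2.5 concentration of 20 µg m −3 gives an index of about 67.6, i.e. within the moderate category.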

Health impact assessment (HIA)
Previous observations have revealed significantly higher PM 2.5 concentrations in the cities of Southeast Asia than those in the USA and Europe (WHO, 2016), implying that the concentration-response functions (CRFs) derived from the latter may not be directly applicable to Southeast Asia. In this study, we adapt the CRFs in Gu and Yim (2016) to estimate the annual number of premature mortalities due to ambient PM 2.5 concentration in the corresponding region. The relative risk (RR) of four causes of death (chronic obstructive pulmonary disease, ischemic heart disease, lung cancer, and stroke) relative to the annual incidence rate has been assessed separately. Such risks are described by a log-linear relationship with the corresponding PM 2.5 concentration level (Burnett et al., 2014). The basic form of the RR formulas is provided as follows:
$RR = 1 + \alpha \{1 - \exp[-\beta (X_j - X_0)^{\delta}]\}$ if $X_j > X_0$, and $RR = 1$ otherwise,
where $X_j$ and $X_0$ are the particulate pollutant concentrations (µg m −3 ) in the target cities and the threshold value below which no additional risk is assumed to exist, respectively.
Here we represent the uncertainty range of the threshold value, between 5.8 µg m −3 and 8.8 µg m −3 , with a triangular distribution, as suggested by the GBD 2010 project (Lim et al., 2013). Epidemiological results are not always available in Southeast Asia. To capture both the climbing and flattening-out phases of CRF curves suitable for the Southeast Asia region, we fit parameters α, β, and δ in the CRFs to the epidemiological samples in East Asian cities based on Gu and Yim (2016) for China, where PM 2.5 concentration has a comparable level to that in Southeast Asia.
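The health impact calculation can be sketched in Python under stated assumptions. The relative risk is assumed to take the saturating form RR = 1 + α{1 − exp[−β(X j − X 0 ) δ ]} of Burnett et al. (2014), written with the parameter names α, β, and δ used above. The triangular mode is assumed to sit at the midpoint of the 5.8-8.8 µg m −3 range, since the text does not give it. The attributable-fraction step, (RR − 1)/RR, is a common convention in health impact assessments and is not confirmed by the text. All function names and any parameter values passed in are illustrative, not fitted values from the study.

```python
import math
import random


def relative_risk(x: float, x0: float, alpha: float, beta: float, delta: float) -> float:
    """Assumed saturating concentration-response form; RR = 1 at or below the threshold x0."""
    if x <= x0:
        return 1.0
    return 1.0 + alpha * (1.0 - math.exp(-beta * (x - x0) ** delta))


def sample_threshold(rng: random.Random) -> float:
    """Draw X0 from a triangular distribution on [5.8, 8.8]; mode at the midpoint (assumption)."""
    return rng.triangular(5.8, 8.8)


def attributable_deaths(rr: float, baseline_rate: float, population: float) -> float:
    """Premature deaths via the attributable fraction (RR - 1) / RR (assumed convention)."""
    return population * baseline_rate * (rr - 1.0) / rr
```

Repeating the calculation over many sampled thresholds gives a spread of mortality estimates that reflects the threshold uncertainty for each cause of death.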

Model evaluation
Multiple ground-based observations are used in this study to evaluate the model's performance, particularly in simulating aerosol and major gaseous chemical species such as ozone and carbon monoxide. PM 2.5 observations in Southeast Asia are very limited. Even in Singapore, observed PM 2.5 data are only available after 2014 for the general public and research community to access. Therefore, PM 10 concentrations derived from AQI in Kuala Lumpur (Malaysia) are used to present the variation of particulate matter during haze and non-haze seasons. Compared with the observations, the model accurately predicted PM 10 concentration, especially during haze seasons (July to October; Fig. 3a); however, it produced a systematic negative bias of 20 µg m −3 in background PM 10 concentration during non-haze periods. This discrepancy between modeled and observed background PM 10 concentration could come from either the relatively coarse resolution of the model or the underestimation of primary aerosol and aerosol precursor emissions, or both. Philip et al. (2017) indicated that most global emission inventories do not include anthropogenic fugitive, combustion, and industrial dust (AFCID) from urban sources, typically including fly ash from coal combustion and industrial processes (e.g., iron and steel production, cement production), resuspension from paved and unpaved roads, mining, quarrying, and agricultural operations, and road, residential, and commercial construction. In their study, they estimated a 2-16 µg m −3 increase in fine particulate matter (PM 2.5 ) concentration across East and South Asia simply by including AFCID emission. We also find that the major component of PM 2.5 particles from the filtered samples of the SPARTAN observational network is residual material, which is mainly organic matter (Snider et al., 2016; Fig. S1). All of these analyses point to the incompleteness of the current emission inventories. In addition to PM 10 data, we have also used observed surface visibility to evaluate model performance. As mentioned in Sect. 2.5, the modeled visibility values are derived from the extinction coefficient of the externally mixed aerosols and simulated fine particulate concentrations. As shown in Fig. 4, the model correctly predicted about 40 % of observed low-visibility events during the fire seasons, while the remaining 60 % were missed, mainly because of the missing AFCID. The details of this are discussed in Sect. 4. Additional uncertainty analysis of modeled LVDs by using a method for dichotomous (with or without LVDs) cases is presented in Sect. S1 of the supplementary material.
On the other hand, the model has overestimated the visibility range for many cases with observed visibility lower than 7 km. Such a result is likely due to the 36 km model resolution used in the study, which could be too coarse to resolve the typical size of air plumes containing high concentrations of fine particulate matter. A detailed discussion of potential uncertainty factors of modeled visibility, including meteorological datasets, fire emission inventories, and the model resolution, can be found in Lee et al. (2017).
The observed CO and O 3 levels from the only WMO GAW station in the region, Bukit Kototabang, Indonesia (West Sumatra), are used to evaluate the model performance in simulating gas-phase chemistry. Fossil fuel and biomass combustion and biogenic emissions are among the major sources of CO in the region, while O 3 production is mainly from photochemical reactions of precursors such as nitrogen oxides, volatile organic compounds, and CO, largely from anthropogenic emissions. Due to the geographic location, the primary source of CO in Bukit Kototabang is biomass burning; hence high CO levels occur during fire seasons (Fig. 3b). The model accurately captured observed CO levels during the simulation. The model-simulated evolution of the volume mixing ratio of O 3 also matches observations very well, though with a positive bias of about 20 ppbv on average (34.8 versus 13.4 ppbv; Fig. 3c). We notice that NO x emission is higher in the REAS emission inventory compared with other emission inventories and studies (Kurokawa et al., 2013). The boundary conditions of WRF-Chem also set the background surface ozone quite high (30 ppbv). Both could lead to the overestimated background ozone in the model.

Fire-and non-fire-caused LVDs in three selected cities
Based on the logical chart shown in Fig. 2, we can use the modeled results to classify observed LVDs into five types of events with different main aerosol sources. In Bangkok, there were about 165 LVDs per year during 2002-2008 based on observations. Modeled results suggest that about 60 % of these LVDs could have been brought about by either fire or non-fire aerosols (the sum of Type 1, Type 2, and Type 3 in Fig. 2; see Table 2). Generally speaking, fire and non-fire aerosols contribute equally towards the haze events occurring [...] 2). Our study shows that non-fire aerosols are capable of causing 28 % of LVDs occurring in Kuala Lumpur, even in the absence of fire aerosols. Once we include the impact of fire aerosols, the model can capture an additional 23 % of LVDs, most of which are Type 4 cases. Overall, fire and non-fire aerosols make similar contributions to observed LVDs in Kuala Lumpur.
In Singapore, there were about 50 LVDs per year during 2002-2008. The contribution of non-fire aerosols to LVDs is about 8 %. Compared with the additional 25 % of LVDs owing to fire aerosols, the contribution of non-fire aerosols to LVDs is small in Singapore. However, the model failed to capture a high percentage of LVD cases in both Kuala Lumpur (49 %) and Singapore (67 %; Type 5; see Table 2). As discussed in Sect. 3.1, missing AFCID in the emission inventory could explain why the model failed to capture the LVDs at these two sites. Further discussion is presented in Sect. 4.

Fire-and non-fire-caused LVDs throughout Southeast Asia
By comparing the annual mean PM 2.5 concentration in 50 Association of Southeast Asian Nations (ASEAN) cities between the three simulations, we identify 13 ASEAN cities that receive more than 70 % of their PM 2.5 concentration from non-fire sources, while in another 10 ASEAN cities, fire aerosols are the major (more than 70 %) component of PM 2.5 (Fig. 5). Note that although fire aerosols are the major component of annual mean PM 2.5 concentration in these 10 ASEAN cities, the influence period of fire aerosols is normally only about 3 to 5 months. The rest of the ASEAN cities are essentially influenced by coexisting fire and non-fire aerosols. Note that the sum of PM 2.5 concentrations in FF and BB is not necessarily equal to the PM 2.5 concentration in FFBB in any given city due to non-linearity in modeled aerosol processes.
The annual mean number of LVDs across the 50 ASEAN cities is 192 days during 2002-2008. Applying the logical chart described in Fig. 2 to each of these ASEAN cities, we find that by considering aerosols emitted from non-fire emissions alone, about 59 % of observed LVDs can be explained, whereas considering fire aerosols explains an additional 13 % of LVDs. Conversely, by considering aerosols emitted from fire alone, about 47 % of observed LVDs can be explained, whereas adding non-fire aerosols explains an additional 25 % of LVDs. About 28 % of observed LVDs remain unexplained. In general, non-fire aerosols appear to be the major contributor to LVDs in these cities.

Impacts of ozone and PM 2.5 on air quality and human health
Similar to PM 2.5 , O 3 also causes public health and air quality issues (Chen et al., 2007). Previously in Sect. 3.1, we discussed that the model systematically overestimated the O 3 volume mixing ratio by 20 ppbv compared with observations. Overestimated 9 h O 3 could lead to a mistakenly derived high AQI (O3) . Nevertheless, the relative differences of AQI (O3) between various model simulations can still provide useful information on the relative contributions of fire and non-fire emissions, either alone or in combination, to air quality and potentially human health. We find that modeled 9 h O 3 in Bangkok from non-fire emissions (FF) alone triggered 19 % of daily AQI (O3) to reach moderate and unhealthy pollution levels during 2002-2008, while fire emissions (BB) alone triggered only 3 % of such situations (Table 3). In comparison, combining fire and non-fire emissions as derived from the simulation of FFBB can cause 33 % of daily AQI (O3) to reach moderate and unhealthy pollution levels. In Kuala Lumpur and Singapore, O 3 is not the major source of air quality degradation; there, fire or non-fire emissions alone can seldom cause O 3 levels to reach even moderate pollution levels. For example, in the FF simulation, only 5 % of daily AQI (O3) readings in Kuala Lumpur and 1 % in Singapore reached moderate pollution levels. Again, the majority of the high AQI (O3) cases result from combining fire and non-fire emissions (FFBB; Table 3). Overall, non-fire emissions alone only cause 6 % of daily AQI (O3) to reach moderate pollution levels in 50 ASEAN cities, whereas about 12 % of moderate and unhealthy pollution cases resulted from the combined effect of fire and non-fire emissions. We find that in Southeast Asia, PM 2.5 actually plays a more important role than O 3 in causing high AQI cases. In Bangkok, PM 2.5 resulted in 37 and 33 % of high daily AQI (PM2.5) cases in FF and BB simulations, respectively (Table 4). Among these, 3 times more cases with daily AQI (PM2.5) reaching unhealthy levels can be attributed to PM 2.5 from BB than to that from FF (Table 4). However, the unhealthy levels caused by fire aerosols alone still occur relatively infrequently in Bangkok, Kuala Lumpur, and Singapore. In Bangkok, a city with a population of 8 million, persistent aerosol emissions from non-fire sources, aided by seasonal fire aerosols, cause almost two-thirds of daily air quality readings to reach moderate or unhealthy pollution levels. Kuala Lumpur and Singapore also have 48 and 22 % of the days during 2002-2008 reaching moderate or unhealthy pollution levels, respectively (Table 4). Examining 24 h PM 2.5 AQI (PM2.5) among 50 ASEAN cities shows that non-fire aerosols alone contribute to moderate to unhealthy pollution levels 2.6 times more often than fire aerosols alone (23 versus 9 %). Compared with the FF simulation, PM 2.5 in FFBB pushes about 10 % more days into the moderate and unhealthy pollution levels (Table 4). This result is consistent with the findings in Sect. 3.3.
We have examined the health impacts due to PM 2.5 in 50 ASEAN cities using the method described in Sect. 2.7, and the results show that the top three cities for premature mortality caused by particulate pollution are Jakarta (Indonesia), Bangkok (Thailand), and Hanoi (Vietnam) with 910, 1080, and 620 premature mortalities per year, respectively (Fig. 6). The premature mortality in Jakarta is mainly due to exposure to PM 2.5 particles emitted from non-fire emissions (95 %), the same situation as in Hanoi (80 %). However, in Bangkok, the health impacts due to fire and non-fire aerosols are equally critical (Figs. S3 and S4). In general, owing to the increasing trend of non-fire emissions during the analysis period, the premature mortalities due to PM 2.5 emitted from non-fire sources increased with time in most ASEAN cities (Fig. S3). Besides this, higher fire aerosol levels in Sumatra and Borneo in 2002, 2004, and 2006 also increase the number of premature mortalities in cities, such as Kuching, which are exposed to particulate matter from these burning events (Figs. 6 and S4).
Additional discussion of the impact of fire and non-fire aerosols on regional climate is presented in Sect. S2 of the supplementary material.

Impact of missing components in the emission inventories on modeled results
In this study, we have noticed that the simulated PM 2.5 concentrations in Singapore are often lower than the observations of the National Environment Agency of Singapore (https://data.gov.sg/dataset/air-pollutant-particulate-matter-pm2-5, last access: 27 April 2018) (6.1 versus 20.3 µg m −3 in annual mean during 2002-2008). Owing to the lower simulated PM 2.5 concentration in Singapore, the model could not capture many observed LVDs (Table 2) and consequently underestimated AQI levels resulting from PM 2.5 . As mentioned before, Philip et al. (2017) have pointed out that global atmospheric models can produce a 2-16 µg m −3 underestimation in fine particulate mass concentration across East and South Asia, and most current global emission inventories indeed either do not include anthropogenic fugitive and industrial dusts or substantially underestimate the quantities of these emissions (Klimont et al., 2016; Janssens-Maenhout et al., 2015). The fugitive dust sources, such as road and construction dust, in most major cities in Southeast Asia are apparently not well represented in the emission inventory used in our study. To correct these systematic underestimates, we have used crustal matter and residual matter from SPARTAN PM 2.5 measurements as the reference to fill in the modeled PM 2.5 for the missing anthropogenic aerosol components.
www.atmos-chem-phys.net/18/6141/2018/ Atmos. Chem. Phys., 18, 6141-6156, 2018
Excluding the high-concentration samples during the fire haze events, the mean concentration of crustal matter and residual matter is 25.8 µg m−3 in Hanoi, 10.4 µg m−3 in Singapore, 18.1 µg m−3 in Bandung, and 9.2 µg m−3 in Manila. We then added these values as additional anthropogenic aerosol components in modeled aerosol abundance to recalculate modeled visibility and AQI (PM2.5). Table 5 shows the calculated percentage of LVDs caused by the various aerosol types defined in Fig. 2 before and after the above correction.
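As a minimal sketch of this correction (not the processing code used in the study), the city-specific crustal and residual matter means quoted above can be added to a modeled daily PM2.5 series before a pollution level is assigned. The function names are hypothetical, and the breakpoints are an assumption: they follow the US EPA 24 h PM2.5 AQI table in force before 2024, which may differ from the definition used in the methods section of the paper.

```python
# Mean crustal + residual matter (µg m-3) quoted in the text,
# with fire-haze samples excluded.
MISSING_PM25 = {"Hanoi": 25.8, "Singapore": 10.4, "Bandung": 18.1, "Manila": 9.2}

# ASSUMPTION: pre-2024 US EPA 24 h PM2.5 breakpoints, given as
# (upper concentration bound in µg m-3, pollution level).
AQI_LEVELS = [
    (12.0, "good"),
    (35.4, "moderate"),
    (55.4, "unhealthy for sensitive groups"),
    (150.4, "unhealthy"),
    (250.4, "very unhealthy"),
    (float("inf"), "hazardous"),
]


def corrected_pm25(city, modeled_daily):
    """Add the city's missing anthropogenic component to each modeled value."""
    offset = MISSING_PM25[city]
    return [value + offset for value in modeled_daily]


def aqi_level(concentration):
    """Return the pollution level for a 24 h PM2.5 concentration (µg m-3)."""
    for upper_bound, level in AQI_LEVELS:
        if concentration <= upper_bound:
            return level
```

Applied to the simulated Singapore annual mean of 6.1 µg m−3, the offset gives 16.5 µg m−3, which moves the level from good to moderate under the assumed breakpoints.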
After adding the missing anthropogenic aerosol component based on in situ measurements, the model reproduces 98 % of observed LVDs in Hanoi (an increase from 79 %). Because the missing anthropogenic aerosols are included in non-fire aerosols, LVDs in Type 1 and Type 2 are heavily weighted in the new result. The results also show that the LVDs in Hanoi are mainly caused by non-fire aerosols and that the contribution of fire aerosols is relatively small. Adding the missing anthropogenic aerosol components also reduced the fraction of missed LVDs from 67 to 20 % in Singapore. Differing from Hanoi, Type 2 and Type 4 LVDs increased after introducing the missing anthropogenic aerosols in Singapore, implying that fire and non-fire aerosols are equally important in causing LVDs there. After applying the correction, non-fire aerosols alone can explain 25 % of LVDs, while coexisting fire and non-fire aerosols can explain 40 % of LVDs in Singapore (Table 5). Note that the mode of the distribution of observed visibility in Singapore is around 11 km. Therefore, when fire occurs in the surrounding
Nevertheless, even after adding the missing anthropogenic aerosols to the non-fire aerosol category, the model still missed 57 % of LVDs in Manila. This is mainly because the model did not capture many fire events in that area, likely due to underestimation of fire emissions in the emission inventory.
Besides LVDs, the missing anthropogenic aerosols also substantially affect the modeled AQI (PM2.5). Table 6 shows the frequency of various AQI (PM2.5) levels calculated with and without the missing anthropogenic aerosol components in Hanoi, Singapore, Bandung, and Manila. After considering the missing anthropogenic aerosol components, modeled air pollution levels in Hanoi and Bandung persistently reach the moderate or unhealthy pollution levels. In Singapore, the modeled frequency of moderate and unhealthy cases also increases from 22 to 66 %, and in Manila from 8 to 36 %. Furthermore, the number of premature mortalities in Singapore and Manila increases significantly from 0 to 230 and 130, respectively (Table 7). These results indicate the importance for models to include anthropogenic fugitive and industrial dusts in order to capture low-visibility events in the region.
Experiment in applying machine learning algorithms to predict the occurrence of PM2.5-caused LVDs
Traditional physical models such as WRF-Chem are developed based on equations describing fluid dynamics, physical processes, and chemical reactions to link these processes on different scales and to predict consequences resulting from circulation and physicochemical process evolutions. However, various parameterizations as well as numerical and input data errors can all lead to uncertainty in model predictions. Specifically, for the task of forecasting the occurrence of haze events (i.e., LVDs), using these models is nearly impossible due to the lack of real-time emission estimates to drive aerosol chemical and physical processes. On the other hand, machine learning algorithms permit interpretation of large quantities of complex historical data based on computer analyses, and this capacity seems promising for deriving suitable conditions for hazes from historical data and hence for forecasting the likelihood of the occurrence of such events.
We hence experiment with supervised learning, which trains or optimizes a machine to produce outcomes from input data (or features) that are as close as possible to known results, i.e., with an accuracy as high as possible. In our experiment, we applied six different machine learning algorithms to reproduce past visibility patterns or to predict haze occurrence: nearest neighbors (Pedregosa et al., 2011), linear support vector machine (SVM; Schölkopf and Smola, 2002), SVM with radial basis function kernel (non-linear SVM; Schölkopf et al., 1997; Quinlan, 1986), decision tree (Quinlan, 1986), random forest (Breiman, 2001), and neural network (Haykin et al., 2009). Through the supervised learning procedure, we have also examined the importance of each input variable. These machines are trained for predicting LVDs at three airports in Singapore reporting to the GSOD, i.e., Changi, Seletar, and Paya Lebar. All of the input data or features are listed in Table S5. Data are available from 2000 to 2015 at Changi and Paya Lebar but only between 2004 and 2015 at Seletar.
We have used several different classifications in the training. The first one uses two classes, corresponding to haze (visibility lower than or equal to 10 km) and non-haze (visibility higher than 10 km) events. Another two-class classification uses 7 km instead of 10 km in identifying the haze events. In addition, a three-class classification has also been tested, which includes two haze classes: visibility lower than 7 km and visibility between 7 and 10 km. The training-testing ratio is set to 60 : 40.
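The three labeling schemes and the 60 : 40 split can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function and scheme names are hypothetical, a value exactly at a breakpoint is assigned to the hazier class, and the split is done by random shuffling because the text does not say whether the split was random or chronological.

```python
import random


def label_visibility(vis_km, scheme):
    """Return the class label for one daily visibility value (km)."""
    if scheme == "two_class_10km":
        return "haze" if vis_km <= 10.0 else "non-haze"
    if scheme == "two_class_7km":
        return "haze" if vis_km <= 7.0 else "non-haze"
    if scheme == "three_class":
        # ASSUMPTION: exactly 7 km counts as severe haze.
        if vis_km <= 7.0:
            return "severe haze"
        return "moderate haze" if vis_km <= 10.0 else "non-haze"
    raise ValueError(f"unknown scheme: {scheme}")


def split_60_40(samples, seed=0):
    """Shuffle the samples and return (training 60 %, testing 40 %)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

A day with 8 km visibility is thus a haze day under the 10 km breakpoint, a non-haze day under the 7 km breakpoint, and a moderate haze day in the three-class scheme.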
In our study, the highest validation accuracy and F1 score (Powers, 2011) for every algorithm appear in the machine for the Changi site, while the difference in accuracy among the algorithms is small (Figs. 7 and S5). However, the accuracy of all the algorithms at Seletar and Paya Lebar drops dramatically, by about 20-30 %, in the two-class classification using 10 km visibility and in the three-class classification. The reason for the best performance at Changi is likely the lowest frequency of haze events at this site (accounting for only 10 % of the total LVDs); in comparison, 37 and 44 % of haze events occurred at Paya Lebar and Seletar during the training time period, respectively. The machines also predict non-haze events with higher accuracy than haze events at Changi. Using severe haze (visibility < 7 km) instead of moderate haze (visibility < 10 km) to label haze events can also increase accuracy (to over 80 %). This could be due to the fact that severe haze events are primarily caused by heavy biomass burning, whose occurrence would be well captured in the satellite hotspot input data.
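The effect of class imbalance on these two scores can be illustrated with a small sketch. The data below are invented for illustration (a hypothetical site where 10 % of days are haze days) and are not results from the study, and the binary F1 shown here may differ from the averaging used for the three-class case.

```python
def accuracy_and_f1(y_true, y_pred, positive="haze"):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Illustrative data only: 10 haze days out of 100.
y_true = ["haze"] * 10 + ["non-haze"] * 90
# A trivial predictor that never forecasts haze.
always_clear = ["non-haze"] * 100
acc, f1 = accuracy_and_f1(y_true, always_clear)
```

Here the trivial predictor reaches an accuracy of 0.9 while its F1 score is 0, which is why a site with few haze events can show high accuracy without the machine necessarily being skillful at predicting haze.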
Besides accuracy and F1 score analysis, we have also used the feature importance function in the scikit-learn random forest package to measure the importance of various features (i.e., Gini importance; Pedregosa et al., 2011). For each feature, the function computes the normalized total reduction of the splitting criterion brought by that feature. The higher the value, the more important the feature is to the forecasting machine. We find that the hotspot counts from three fire regions are ranked consistently among the top three most important features for most machine learning predictions in all three classifications (Figs. 8, S6, and S7). The importance values of the hotspot counts are higher than 0.15. The analysis also suggests that "Month" is among the top five most important features in all machines, followed by wind direction and relative humidity (Fig. 8), implying that besides fire hotspots, the seasonal monsoon wind patterns and wind-related weather conditions (i.e., SRV in Fig. 8) are also important factors in forecasting the occurrence of haze events in Singapore. In addition, relative humidity is a critical variable for visibility (i.e., growth of hygroscopic particles can drastically enhance the light extinction). These results are consistent with previous studies of haze events in Singapore (Reid et al., 2012; Lee et al., 2017). Nevertheless, previous works by Reid et al. (2012) and Lee et al. (2017) also suggested relationships between fire hotspot appearance and certain weather phenomena, particularly precipitation. Therefore, we are surprised that precipitation in the fire regions does not appear to be a significant feature for predicting Singapore haze compared with other features in our current analysis.
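A minimal sketch of how such a ranking can be obtained from the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier` is given below. The data are synthetic and the feature names are hypothetical stand-ins for the inputs listed in Table S5; the haze label is constructed to depend only on the hotspot count, so the outcome illustrates the mechanics and is not a reproduction of Fig. 8.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_days = 500

# Hypothetical feature names; the real inputs are listed in Table S5.
feature_names = ["hotspot_count", "month", "wind_direction", "relative_humidity"]
X = np.column_stack([
    rng.poisson(20, n_days),        # synthetic daily fire hotspot count
    rng.integers(1, 13, n_days),    # month of the year (1-12)
    rng.uniform(0, 360, n_days),    # wind direction (degrees)
    rng.uniform(50, 100, n_days),   # relative humidity (%)
])

# ASSUMPTION (synthetic label): haze occurs whenever the hotspot count is high.
y = (X[:, 0] > 25).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gini importances are normalized so that they sum to 1 across features.
importances = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)
```

Because the importances are normalized, a threshold such as the 0.15 mentioned above is a share of the total impurity reduction rather than an absolute quantity.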

Summary
We have quantified the impacts of fire (emitted from biomass burning) and non-fire (emitted from anthropogenic sources other than biomass burning) aerosols on air quality and visibility degradation over Southeast Asia by using WRF-Chem in three scenarios driven respectively by aerosol emissions from (a) fossil fuel burning only, (b) biomass burning only, and (c) both fossil fuel and biomass burning. These model results reveal that 39 % of observed LVDs in 50 ASEAN cities can be explained by either fossil fuel burning or biomass burning emissions alone when they coexist, a further 20 % by fossil fuel burning alone, a further 8 % by biomass burning alone, and a further 5 % by a combination of fossil fuel burning and biomass burning. The remaining 28 % of observed LVDs remain unexplained, likely due to emission sources that have not been accounted for. Our results show that owing to the economic growth in Southeast Asia, non-fire aerosols have become the major reason for LVDs in most Southeast Asian cities. However, for certain cities including Singapore, LVDs are likely caused by coexisting fire and non-fire aerosols. Hence, both fire and non-fire emissions play important roles in visibility degradation in Southeast Asia.
Furthermore, we also used the air quality index (AQI) derived from modeled 9 h O3 and 24 h PM2.5 to analyze the air quality of 50 ASEAN cities. The results are consistent with the visibility modeling and analysis, indicating that PM2.5 particles, primarily those from non-fire emissions, are the major reason behind high AQI (PM2.5) occurrence in these Southeast Asian cities. In addition to non-fire PM2.5 stand-alone cases, coexisting fire and non-fire PM2.5 jointly caused an increase of 11 % in bad air quality events with moderate or unhealthy pollution levels (23 versus 34 %). The premature mortality among the analyzed ASEAN cities has increased from ∼ 4110 in 2002 to ∼ 6540 in 2008. Bangkok (Thailand), Jakarta (Indonesia), and Hanoi (Vietnam) are the top three cities in our analysis for premature mortality due to air pollution, with 1080, 910, and 620 premature mortalities per year, respectively.
We find that the model's failure to capture 28 % of observed LVDs averaged over 50 ASEAN cities is largely due to the omission of anthropogenic fugitive and industrial sources, as well as road dust from urban sources, from the emission inventories used in this study. Using PM2.5 chemical composition data from the SPARTAN stations in Hanoi, Singapore, Bandung, and Manila to fill in the missing aerosol components from these excluded sources can drastically increase the LVDs captured by the model in these cities, for example, by 47 % in Singapore. The improvement in LVD prediction is especially substantial in the cases of non-fire aerosols alone (Type 2; from 5 to 25 %) and of coexisting fire and non-fire aerosols (Type 4; from 14 to 40 %). Including the missing anthropogenic aerosols in modeled results also increases the occurrence of cases with moderate and unhealthy air pollution levels from 22 to 66 % in Singapore. Our study clearly demonstrates the importance of anthropogenic aerosols along with other fugitive industrial and urban sources in air quality and visibility degradation in certain Southeast Asian cities such as Singapore.
We have also experimented with six different machine learning algorithms to predict the occurrence of LVDs caused by PM2.5. The focus is on forecasting hazes at three surface visibility observation sites in Singapore. We find that the machine learning algorithms can predict severe haze events (visibility < 7 km) with an accuracy greater than 80 % at any of these stations. On the other hand, the accuracy is found to be sensitive to the selection of features, labeling of outcome, and forecast sites.
The current study extends our previous effort (Lee et al., 2017) by using a model including a full chemistry and aerosol package instead of a smoke aerosol module without chemistry. The added model capacity provides a more complete quantitative description of physicochemical processes that allows us to better analyze the contribution of fire versus non-fire aerosols to the regional air quality and visibility degradation. Our results show that the majority of the population in Southeast Asian cities is exposed to air pollution that can be mostly attributed to non-fire aerosols. On the other hand, our analysis also suggests that for certain cities such as Singapore, severe air pollution is likely caused by coexisting fire and non-fire aerosols. All of these further complicate the options for air pollution mitigation.

Figure 1. Model domain used for simulations. The blue region indicates the fossil fuel emission coverage from the Regional Emission inventory in ASia (REAS). The rest of the domain uses the fossil fuel emissions from the Emissions Database for Global Atmospheric Research (EDGAR).

Figure 2. Logical chart for LVDs caused by fire (BB), non-fire (FF), or coexisting fire and non-fire (FF + BB) aerosols. "Obs. LVD" is an LVD identified from observation. Then, the modeled visibility from FF (VIS FF), BB (VIS BB), and FFBB (VIS FFBB) is used to classify the observed LVD into five types. Type 1 LVD represents the cases where either fire or non-fire aerosols alone can cause the observed LVD to occur. Type 2 means that non-fire aerosols are the major contributor to the observed LVD. Type 3 means that fire aerosols are the major contributor to the observed LVD. Type 4 represents the cases where the observed LVD is induced by coexisting fire and non-fire aerosols. The observed LVDs that the model cannot capture are classified as Type 5.
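The decision logic described in the caption of Fig. 2 can be sketched as a small function. This is one reading of the caption rather than the authors' code: the function name is hypothetical, the 10 km threshold is assumed from the haze definition used elsewhere in the text, and Type 1 is interpreted as both single-source simulations falling below the threshold.

```python
def classify_lvd(vis_ff, vis_bb, vis_ffbb, threshold_km=10.0):
    """Classify one observed LVD into Types 1-5 from modeled visibility (km).

    vis_ff, vis_bb, vis_ffbb: modeled visibility from the FF, BB, and
    FFBB simulations on the day of the observed LVD.
    ASSUMPTION: a simulation "causes" the LVD when its visibility is at
    or below threshold_km.
    """
    ff_alone = vis_ff <= threshold_km
    bb_alone = vis_bb <= threshold_km
    if ff_alone and bb_alone:
        return 1  # either source alone is sufficient
    if ff_alone:
        return 2  # non-fire aerosols are the major contributor
    if bb_alone:
        return 3  # fire aerosols are the major contributor
    if vis_ffbb <= threshold_km:
        return 4  # only the coexisting aerosols produce the LVD
    return 5      # the model does not capture the observed LVD
```

For example, a day with modeled visibility of 14 km in FF, 13 km in BB, and 9 km in FFBB falls into Type 4 under these assumptions.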

Figure 3. (a) Time series of daily surface PM10 (µg m−3; AQI derived) from the ground-based observations (black line) and FFBB-simulated results (orange line) in Kuala Lumpur, Malaysia, during October 2005-December 2008. (b) Time series of daily surface CO mixing ratio (ppbv) from the ground-based observations (black line) and FFBB-simulated results (orange line) in Bukit Kototabang, Indonesia, during 2002-2008. (c) Same as (b) but for surface O3.

Figure 4. Comparison of daily visibility between GSOD observation (black line) and FFBB-simulated results (orange line) in Singapore during the fire seasons from 2002 to 2008. A, S, and O on the x axis indicate August, September, and October.

Figure 6. Premature mortality in different years from 2002 to 2008 and cities in the Association of Southeast Asian Nations (ASEAN) due to exposure to PM2.5 in FFBB (95 % confidence intervals). Colors from green to red represent the relative number scale.

Figure 7. The testing accuracy of six machine learning algorithms for two two-class classifications (7 or 10 km visibility as a breakpoint) and one three-class classification for haze prediction at (a) Changi, (b) Paya Lebar, and (c) Seletar.

Figure 8. Feature importance from the two-class classification random forest algorithm at (a) Changi, (b) Paya Lebar, and (c) Seletar. Desired outputs, haze versus non-haze events, are defined by using 10 km visibility as a breakpoint. The full names of each input feature are listed in Table S5.

Table 1. Mean annual emissions of BC, OC, SO2, CO, and NO2 from biomass burning emission (BB) and fossil fuel burning emission (FF) in the simulated domain from 2002 to 2008. Parentheses show the percentage of emission from fire and non-fire sources.

Table 2. The contribution of fire aerosols (BB), non-fire aerosols (FF), or coexisting aerosols to LVDs (based on the logic chart in Fig. 2) in Bangkok, Kuala Lumpur, Singapore, and among 50 Association of Southeast Asian Nations (ASEAN) cities during 2002-2008.

This highlights the importance of fire aerosols in worsening air quality of otherwise moderate haze conditions under the existing suspended non-fire aerosols. Overall, the model missed about 29 % of LVDs of Bangkok during the simulation period. Haze occurs slightly less frequently in Kuala Lumpur than Bangkok. There are about 104 LVDs per year in Kuala Lumpur during 2002-2008. Thirty-six percent of these LVDs are caused by either fire or non-fire aerosols, while 15 % of the LVDs need a combination of both aerosol sources to form haze (Table

The annual mean simulated PM2.5 concentration (µg m−3) in 50 Association of Southeast Asian Nations (ASEAN) cities, derived from FF (red), BB (blue), and FFBB (green) simulations and averaged over the period 2002-2008.

Table 3. The frequency of air pollution levels in Bangkok, Kuala Lumpur, Singapore, and 50 Association of Southeast Asian Nations (ASEAN) cities derived using 9 h ozone (O3) volume mixing ratio in FF, BB, and FFBB during 2002-2008.

Table 4. Same as Table 3 but using 24 h PM2.5 concentration.

Table 5. The old (without missing anthropogenic aerosol components) and new (with missing anthropogenic aerosol components in FF and FFBB) calculated percentage of observed LVDs, categorized according to the type classification explained in Fig. 2.

Table 6. The frequency of various daily air pollution levels in Hanoi, Singapore, Bandung, and Manila derived using 24 h PM2.5 concentration with (new) and without (old) the missing anthropogenic aerosol components in FFBB during 2002-2008.