Monsoonal variations in aerosol optical properties and estimation of aerosol optical depth using ground-based meteorological and air quality data in Peninsular Malaysia

Obtaining continuous aerosol optical depth (AOD) measurements is difficult because of cloud cover. To overcome this problem, an AOD-predicting model is proposed. In this study, the optical properties of aerosols in Penang, Malaysia were analyzed for four monsoonal seasons (northeast monsoon, pre-monsoon, southwest monsoon, and post-monsoon) based on data from the AErosol RObotic NETwork (AERONET) from February 2012 to November 2013. The aerosol distribution patterns in Penang for each monsoonal period were quantitatively identified from scatter plots of the Ångström exponent against the AOD. A new empirical algorithm was proposed to predict the AOD data. Ground-based measurements (i.e., visibility and air pollutant index) were used in the model as predictor data to retrieve the AOD data missing from AERONET because of frequent cloud formation in the equatorial region. The model coefficients were determined through multiple regression analysis using a selected data set from in situ data. The calibrated model has a coefficient of determination, R², of 0.72. The predicted AOD was generated from these calibrated coefficients and compared against the measured data through standard statistical tests, yielding an R² of 0.68 as validation accuracy. The weighted mean absolute percentage error (wMAPE) was less than 0.40 % compared with the real data. The results revealed that the proposed model efficiently predicted the AOD data. The performance of the model was also compared against selected LIDAR data and showed good correspondence. The predicted AOD can enhance measured short- and long-term AOD and provide supplementary information for climatological studies and monitoring aerosol variation.


Introduction
Air quality issues in Asia can be attributed to unavoidable climate change impacts and the negative impact of anthropogenic activities arising from rapid population growth, industrialization and urbanization (IPCC, 2007, 2013). Aerosol optical depth (AOD) derived from remote sensing has potential for assessing air quality. In general, spatial and temporal variations in AOD data are large since they depend on production sources, transport and removal processes that are modulated by local and synoptic meteorological conditions. Many small-scale studies on the optical properties of aerosols have been conducted by Chew et al. (2013), Mishra et al. (2013), and Salinas et al. (2013) using sun and sky scanning radiometers of the AErosol RObotic NETwork (AERONET) (Holben et al., 1998). These methods are limited spatially relative to satellite imagery, and therefore are complementary for comprehensive studies on atmospheric aerosols. Continuous measurements of AOD data are difficult because the atmosphere is frequently cloudy. To better monitor and understand aerosol variation, sufficient measurements and a practical observation paradigm of aerosols are necessary (Hansen et al., 1997; Tripathi et al., 2005; Kaskaoutis et al., 2007; Kaskaoutis and Kambezidis, 2008; Russell et al., 2010).
Southeast Asia (SEA) stands out globally as it hosts some of the most complex meteorological and environmental conditions, making remote sensing difficult both for AERONET and for satellites (Reid et al., 2013). Cloud-cleared data leave gaps in the remote sensing data record, and conversely residual cloud contamination of remotely sensed data poses challenges for scientists studying aerosols (Chew et al., 2011; Campbell et al., 2013). Moreover, anthropogenic biomass burning activities for land preparation and forest clearance have increased dramatically in recent decades (Field et al., 2009). These fire activities result in trans-boundary and long-range transport of aerosols that often affect air quality in both source and surrounding regions (Hyer and Chew, 2010; Reid et al., 2013; Salinas et al., 2013; N.-H. Lin et al., 2014). Those aerosols mix with locally generated aerosols (Engling et al., 2014). Therefore, it is potentially valuable to develop a regional/local model to estimate and monitor AOD.
Development of an empirical model to produce reliable AOD estimates for aerosol monitoring at local scales is novel and necessary for SEA, with potential global applications (Chen et al., 2013; Fan et al., 2013). Several researchers have used models as alternative tools to predict AOD values by using various ground-based meteorological measurements (Wang et al., 2009; Qin et al., 2010; J. Lin et al., 2014). However, this approach has not yet been applied over the Malay Peninsula region of SEA.
Previous studies indicate that AOD is proportional to air quality parameters such as particulate matter (PM) with diameters less than 10 or 2.5 µm (PM10 or PM2.5) (Wang and Christopher, 2003; Cordero et al., 2012; Mielonen et al., 2012; Mogo et al., 2012; Müller et al., 2012) but inversely proportional to visibility (Vis) (Horvath, 1995; Li and Lu, 1997; Peppler et al., 2000; Bäumer et al., 2008; Singh and Dey, 2012), assuming most of the aerosol is at the surface. However, there are studies stating that AOD is not always highly correlated with surface or horizontal measurements, especially when an elevated aerosol layer from transported dust or biomass burning is present (Mahowald et al., 2007; Barladeanu et al., 2012; Chen et al., 2013; Toth et al., 2014).
In this paper, our goal is to build on previous experience to develop an AOD prediction model based on three types of measured data, namely (i) relative humidity (RH), (ii) Vis and (iii) air pollution index (API). These parameters are measured routinely at many ground-based stations. The AOD prediction model based on these routine measurements is necessary to establish a long-term database for (i) climatological studies, (ii) providing continuous atmospheric columnar AOD data, and (iii) monitoring aerosol variation such as diurnal cycles of AOD. It is also important to understand the sources and dominant types of aerosol at the study site, because these factors are not yet well understood at the local scale.
AOD measurements were obtained through the AERONET site located at Universiti Sains Malaysia (USM), with geocoordinates 5.36° N and 100.30° E. All AERONET direct sun data used were Level 2 quality assured (Smirnov et al., 2000). The Vis and API data were taken from the meteorological stations at the Penang International Airport and USM. All data were taken between 2012 and 2013. The aerosol characteristics in Penang are comprehensively analyzed for variation based on changes in seasonal monsoons. A near-real-time AOD model is established based on multiple regression analysis of Vis and API. The accuracy and efficiency of the model are evaluated to assess air quality at Penang.

Methodology and statistical model
The present work was based on previous studies of Tan et al. (2014a, b). They predicted AOD using multiple regression analysis based on meteorological and air quality data. The AOD prediction model has been validated and successfully proven for the southwest monsoon period (June-September 2012) in Penang Island, Malaysia. However, the following issues require reconciliation: (i) under- and overprediction of AOD were not assessed because of the lack of available LIDAR data to monitor the variations in the vertical profile of the aerosol distribution; (ii) the algorithm was insufficiently robust because only a 4-month data set was considered; and (iii) seasonal changes other than the southwest monsoon were not included in their study. The present study uses a 2-year data set (2012, 2013) at Penang to validate the algorithms proposed by Tan et al. (2014a, b).
Penang is an island located in the northwestern region of Peninsular Malaysia that lies within latitudes 5.20 to 5.50° N and longitudes 100.15 to 100.43° E (Fig. 4). The weather is warm and humid year-round. However, two main monsoon seasons exist, the northeast and southwest monsoons. Considering previous analyses of aerosol and air quality (Awang et al., 2000; Krishna Moorthy et al., 2007; Suresh Babu et al., 2007; Kumar and Devara, 2012; Chew et al., 2013; Xian et al., 2013), the monsoon periods in this study were classified as follows: (i) northeast monsoon (December-March), (ii) transition period of northeast to southwest monsoon or pre-monsoon (April-May), (iii) southwest monsoon (June-September), and (iv) transition period of southwest to northeast monsoon or post-monsoon (October-November).
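The seasonal classification above can be written as a simple month lookup. The following is a minimal sketch; the function name `monsoon_season` and the label strings are illustrative choices, not part of the original study.

```python
def monsoon_season(month: int) -> str:
    """Map a calendar month (1-12) to the monsoon period defined in the text."""
    if month in (12, 1, 2, 3):
        return "northeast monsoon"   # December-March
    if month in (4, 5):
        return "pre-monsoon"         # April-May
    if month in (6, 7, 8, 9):
        return "southwest monsoon"   # June-September
    if month in (10, 11):
        return "post-monsoon"        # October-November
    raise ValueError(f"month must be in 1..12, got {month}")
```

Tagging each observation with this label is one way to build the four seasonal data sets and the overall data set used later.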
The AOD and Ångström exponent are analyzed to identify the aerosol types and broadly characterize their properties in Penang during each period. Precipitable water (PW) was used to indicate the total water content of the atmosphere. The seasonal variations in AOD, Ångström exponent, and PW based on frequency distribution patterns are identified. The aerosol types are seasonally discriminated from scatter plots of Ångström exponent against the AOD. Threshold values in the scatter plot for aerosol classification have been previously reported by Smirnov et al. (2002b, 2003), Pace et al. (2006), Kaskaoutis et al. (2007), Toledano et al. (2007), Salinas et al. (2009), and Jalal et al. (2012). The data selection criteria proposed by Tan et al. (2014a) are used in this study. Seasonal back-trajectory frequency plots from the Hybrid Single Particle Lagrangian Integrated Trajectory (HYSPLIT_4) model are used to identify the frequency of occurrence of aerosol source regions and transport pathways (Draxler and Hess, 1998; Wai et al., 2014). AOD, API, and Vis data were selected according to the procedure of Tan et al. (2014a) to generate predicted AOD data.
AOD is computed from the solar transmission measured at 340, 380, 440, 500, 675, 1020, and 1640 nm using the automatic tracking sun and sky scanning radiometers (Holben et al., 1998). These AOD data can be obtained from the AERONET web page (http://aeronet.gsfc.nasa.gov). AERONET data are provided at three levels: Level 1.0 is unscreened, Level 1.5 is cloud screened, and Level 2.0 is cloud screened and quality assured. Only Level 2.0 data were employed in this study (Smirnov et al., 2000). Vis data were retrieved online from Weather Underground (http://www.wunderground.com) or from NOAA satellite (http://www7.ncdc.noaa.gov/CDO/cdo). Only hourly data free from rainfall, thunderstorms, or fog were used to predict AOD. Air quality in Malaysia is reported in terms of API, which can be obtained from the Department of Environment in Malaysia (http://apims.doe.gov.my/apims/). API is calculated from carbon monoxide, ozone, nitrogen dioxide, sulfur dioxide and PM10. Malaysia is mainly polluted by PM10 (DOE, 2010). Therefore only API values that are predominantly due to PM10 are used in this study. API is computed using the technique developed by US-EPA. The Malaysian Department of Environment provides a standardized procedure for calculating API values (DOE, 1997). The conversion between API and PM10 is given in the guideline provided in DOE (1997).
A total of 790 data points from 2012 to 2013 were used. Initially, the data were separated into five sets as follows: (i) northeast monsoon, (ii) pre-monsoon, (iii) southwest monsoon, (iv) post-monsoon and (v) annual data set (overall). The numbers of data points for the northeast monsoon, pre-monsoon, southwest monsoon and post-monsoon were 257, 132, 235, and 166, respectively. In a particular seasonal monsoon period there are n data, [D_1, D_2, D_3, D_4, D_5, ..., D_n], which are arranged sequentially in time. The data for each seasonal monsoon were further divided into two subsets, in the form of [D_1, D_3, D_5, ...] and [D_2, D_4, D_6, ...]. The first data subset was used to calibrate Eq. (1) for AOD at 500 nm, in which RH is the surface relative humidity (Tan et al., 2014a).
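The alternating split into calibration and validation subsets can be sketched as follows. This is a minimal illustration that assumes the records are already sorted in time; the function name `split_alternating` is hypothetical.

```python
def split_alternating(records):
    """Split time-ordered records into two interleaved subsets.

    The first subset holds D1, D3, D5, ... (used for calibration) and the
    second holds D2, D4, D6, ... (used for validation).
    """
    ordered = list(records)
    return ordered[0::2], ordered[1::2]
```

Interleaving keeps both subsets spread over the whole season, so neither subset is dominated by a single episode.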
The root mean square error (RMSE), coefficient of determination (R²), and weighted mean absolute percentage error (wMAPE) between the measured and predicted AOD for each seasonal model were calculated at the 95 % confidence level. The wMAPE parameter was used to quantify the systematic differences between the concentration levels. This parameter is given as follows:

$$\mathrm{wMAPE} = \frac{\sum_i \left| \dfrac{\mathrm{AOD}_{p,i} - \mathrm{AOD}_{m,i}}{\mathrm{AOD}_{m,i}} \right| \mathrm{AOD}_{m,i}}{\sum_i \mathrm{AOD}_{m,i}} \times 100,$$

where the subscript p refers to predicted, m to measured and i to individual measurements. The ability of the proposed model to produce reliable AOD estimates for temporal air quality monitoring can be quantitatively justified or falsified based on the value of the resultant wMAPE.
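The three statistics can be computed as in the sketch below. The R² definition used here (one minus the ratio of residual to total sum of squares) is an assumption, since the text does not state which form was used, and the function names are illustrative.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted AOD."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def wmape(measured, predicted):
    """Weighted MAPE in percent: relative errors weighted by measured AOD."""
    weighted = sum(abs((p - m) / m) * m for m, p in zip(measured, predicted))
    return 100.0 * weighted / sum(measured)
```

Because each relative error is weighted by the measured AOD, small measured values do not dominate the score as they would in an unweighted MAPE.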
Aerosols can be hydrophilic or hydrophobic, and these properties can have a non-trivial impact on the magnitude of the retrieved AOD (Tang, 1996; Song et al., 2007; de Meij et al., 2012; Singh and Dey, 2012; Ramachandran and Srivastava, 2013; Wang et al., 2013; van Beelen et al., 2014). However, discriminating between hydrophilic and hydrophobic aerosols requires additional resources beyond the reach of the present study. Most fine-mode aerosols, such as sulfates (which likely dominate urban industrial aerosol composition), are hydrophilic, such that one would expect RH to exert a significant influence on the measured AOD. Given that Penang is dominated by urban industrial aerosols, one would expect RH to be an important variable in the model. However, our pre-analysis showed that RH does not contribute significantly to AOD prediction. We suggest that the RH, which is very high year-round in Penang, exerts less influence on AOD than we would see in drier climates. If RH were considered as a predictor, its related factors (e.g., aerosol stratification (dust or smoke aloft), convection, and hysteresis in particles) would also have to be taken into account. The contribution of RH to the aerosol properties was integrated in the aerosol model of Srivastava et al. (2012), because the net effects of RH on aerosol and related factors were difficult to quantify. In a similar spirit, the RH contribution is disregarded in the present model, yielding Eq. (2), given as follows:

$$\mathrm{AOD} = a_0 + a_1(\mathrm{Vis}) + a_2(\mathrm{Vis})^2 + a_3(\mathrm{Vis})^3 + a_4(\mathrm{API}) + a_5(\mathrm{API})^2 + a_6(\mathrm{API})^3. \qquad (2)$$

RMSE, R², and wMAPE were calculated for Eq. (2) in each monsoon season. The data for subset 1 (i.e., [D_1, D_3, D_5, ...]) were used for calibration. The data for subset 2 ([D_2, D_4, D_6, ...]) were used for cross-validation. Lee et al. (2012) excluded days when the deviation between the measured and predicted values was greater than the RMSE, or when the estimated AOD slope was negative because of measurement errors and cloud-contaminated AOD. The potential outliers in our model were removed following the approach of Lee et al. (2012). The AOD prediction model, Eq. (2), was thus calibrated using the data from subset 1 that remained after the elimination of outliers. The resultant coefficients of the calibration were then applied to the data for subset 2 for cross-validation, in which the predicted AOD values were compared with the measured data from AERONET.
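The calibration of Eq. (2) amounts to an ordinary least-squares fit of seven coefficients. The sketch below uses NumPy and is only an illustration under stated assumptions: the function names are hypothetical, the outlier step is a single pass that drops points whose absolute residual exceeds the RMSE, and the negative-slope criterion of Lee et al. (2012) is not reproduced.

```python
import numpy as np

def design_matrix(vis, api):
    """Columns 1, Vis, Vis^2, Vis^3, API, API^2, API^3, as in Eq. (2)."""
    vis = np.asarray(vis, dtype=float)
    api = np.asarray(api, dtype=float)
    return np.column_stack(
        [np.ones_like(vis), vis, vis**2, vis**3, api, api**2, api**3]
    )

def fit_aod_model(vis, api, aod):
    """Least-squares estimate of the coefficients a0..a6."""
    X = design_matrix(vis, api)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(aod, dtype=float), rcond=None)
    return coeffs

def predict_aod(coeffs, vis, api):
    """Evaluate Eq. (2) for the given Vis and API values."""
    return design_matrix(vis, api) @ coeffs

def fit_with_outlier_removal(vis, api, aod):
    """Fit once, drop points with |residual| > RMSE, then refit (one pass, assumed)."""
    vis, api, aod = (np.asarray(a, dtype=float) for a in (vis, api, aod))
    coeffs = fit_aod_model(vis, api, aod)
    resid = aod - predict_aod(coeffs, vis, api)
    rmse = np.sqrt(np.mean(resid**2))
    keep = np.abs(resid) <= rmse
    return fit_aod_model(vis[keep], api[keep], aod[keep]), keep
```

In this scheme the coefficients would be fitted on subset 1 and then passed to `predict_aod` with the Vis and API values of subset 2 for cross-validation.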
F. Tan et al.: Aerosol optical properties and estimation of AOD
Equation (2) was then applied to retrieve the AOD when no AOD values (from AERONET) were available. The under- and overpredicted AOD were examined using a Raymetrics LIDAR system if the data were available. Our LIDAR system is co-located with the Cimel sunphotometer on the rooftop of the School of Physics, USM (longitude 100.30° E, latitude 5.36° N). A detailed description of this LIDAR system can be found in Tan et al. (2013, 2014c). The LIDAR signals were pre-analyzed based on the protocol described in Tan et al. (2013, 2014c), which we briefly outline here. First, the background solar radiation is subtracted from the LIDAR signal. Then the analog and photon-counting signals are combined to enhance the near- and far-field signals. The range-corrected signal (RCS) is obtained by multiplying the combined signal by the square of the range. To increase the signal-to-noise ratio, every 10 data files (each file contains data taken for 1 min) are averaged to give a single 10 min averaged profile. Then, the profile is averaged spatially over 10 bins (bins are spaced 7.5 m apart) to obtain a 75 m resolution profile. The RCS is then normalized by calibrating it against the theoretical molecular backscatter according to the USSA976 (US Standard Atmosphere 1976) standard atmosphere.
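The pre-processing steps can be sketched with NumPy as below. This is an illustration only: the background is estimated here as the mean of the farthest range bins, ranges are taken at bin centres, and the analog/photon combination and molecular normalization steps are omitted; all of these choices and the function name `preprocess_lidar` are assumptions, not the protocol of Tan et al. (2013, 2014c).

```python
import numpy as np

BIN_WIDTH_M = 7.5  # range-bin spacing stated in the text

def preprocess_lidar(profiles, background_bins=100, files_per_avg=10, bins_per_avg=10):
    """Background-subtract, range-correct and average raw LIDAR profiles.

    profiles: 2-D array (n_files, n_bins), one row per 1 min file.
    Returns (range_m, rcs) at 10 min and 75 m resolution with the defaults.
    """
    p = np.asarray(profiles, dtype=float)
    n_bins = p.shape[1]
    # 1. Subtract background, estimated from the farthest bins (assumption).
    p = p - p[:, -background_bins:].mean(axis=1, keepdims=True)
    # 2. Range correction: multiply by range squared (bin-centre ranges assumed).
    rng_m = (np.arange(n_bins) + 0.5) * BIN_WIDTH_M
    rcs = p * rng_m**2
    # 3. Temporal averaging over groups of files (10 x 1 min -> 10 min).
    n_t = rcs.shape[0] // files_per_avg
    rcs = rcs[: n_t * files_per_avg].reshape(n_t, files_per_avg, n_bins).mean(axis=1)
    # 4. Spatial averaging over groups of bins (10 x 7.5 m -> 75 m).
    n_r = n_bins // bins_per_avg
    rcs = rcs[:, : n_r * bins_per_avg].reshape(n_t, n_r, bins_per_avg).mean(axis=2)
    rng_out = rng_m[: n_r * bins_per_avg].reshape(n_r, bins_per_avg).mean(axis=1)
    return rng_out, rcs
```

Incomplete groups of files or bins at the end of the record are discarded rather than averaged over fewer samples.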
Often, the raw data are contaminated by the presence of clouds during data acquisition. Such contamination has to be removed before AOD values can be extracted. The LIDAR scattering ratio (defined as the LIDAR signal divided by the molecular backscatter signal) (Wang and Sassen, 2001; Lo et al., 2006) is used to remove the cloud-contaminated data. Here the LIDAR signal refers to the RCS, whereas the molecular backscatter signal refers to the attenuated molecular backscattering. The backscatter coefficients of the aerosol are then determined from the LIDAR data using the method of Fernald (1984).
To strengthen our AOD prediction model, the variability in the retrieved AOD is compared to AOD retrieved from the LIDAR signal. Our LIDAR uses a laser pulse of wavelength 355 nm, whereas the AERONET data are taken at different wavelengths. A conversion is therefore performed to obtain AERONET AOD at 355 nm, as described in Eq. (3), using the Ångström power law (Ångström, 1929). The law is used to estimate the Ångström exponent (α) from the AOD (τ_a) measured at wavelengths λ_1 = 340 nm and λ_2 = 380 nm. In principle, if the AOD at one wavelength and the Ångström exponent are known, the AOD at a different wavelength can be computed within the range of validity of Eq. (3); the AOD at 355 nm was calculated in this way. After the conversions, we repeat the procedure described above to obtain a new set of coefficients at 355 nm for the AOD prediction model.
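The wavelength conversion can be illustrated with the standard form of the Ångström power law, τ(λ) = τ(λ_ref)(λ/λ_ref)^(−α). The sketch below is not a reproduction of the paper's own equations: the choice of 340 nm as the reference wavelength and the function names are assumptions.

```python
import math

def angstrom_exponent(tau1, tau2, lam1, lam2):
    """Angstrom exponent from AOD at two wavelengths (same units for lam1, lam2)."""
    return -math.log(tau1 / tau2) / math.log(lam1 / lam2)

def aod_at_wavelength(tau_ref, lam_ref, lam_target, alpha):
    """Scale AOD from lam_ref to lam_target with the Angstrom power law."""
    return tau_ref * (lam_target / lam_ref) ** (-alpha)

def aod_355(tau_340, tau_380):
    """AOD at 355 nm from AERONET AOD at 340 and 380 nm (340 nm reference assumed)."""
    alpha = angstrom_exponent(tau_340, tau_380, 340.0, 380.0)
    return aod_at_wavelength(tau_340, 340.0, 355.0, alpha)
```

Because 355 nm lies between the two AERONET channels, the conversion is an interpolation, which keeps it within the spectral range over which the exponent was estimated.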
Next, an AOD value is obtained from the LIDAR signal. The LIDAR ratio (L) is a constant defined as the ratio of the aerosol extinction coefficient (α_a) to the backscatter coefficient (β_a) (see Eq. 5). The value of L depends on the particle size distribution, shape and composition. R in Eq. (5) is the range or altitude. α_a can be obtained once β_a and L are known. The value of L has to be assumed for an elastic LIDAR system (Fernald, 1984; He et al., 2006; Lopes et al., 2012). Normally, L ranges from 20 to 40 sr for clean and polluted marine aerosol particles or dust, from 40 to 60 sr for urban aerosols, and from 60 to 80 sr for biomass burning aerosols, as suggested by Chew et al. (2013).
The value of L to be adopted for calculating α_a depends on which aerosol type is dominant in the atmosphere. Arriving at a specific value for L is somewhat arbitrary, and different authors adopt different strategies to fix it. In this study, the following strategy is adopted: the aerosol type is first identified by using a scatter plot of the Ångström exponent against the AOD (from AERONET data). Once the dominant aerosol type is determined, the corresponding L value is set to the mean of the range suggested by Chew et al. (2013) for that particular aerosol type. Specifically, for clean and polluted marine aerosol particles or dust, L = 30 sr; for urban aerosols, L = 50 sr; for biomass burning aerosols, L = 70 sr.
AOD values (τ_a) can be obtained using Eq. (6), where R_max is the maximum height of the aerosol distribution, and R_0 is the height where the overlap function O(R) = 1; in our system R_0 is around 200 m. An inaccurate assumption of L can lead to large errors in the retrieval of α_a and τ_a (He et al., 2006), especially under inhomogeneous atmospheric conditions. Therefore, a 10 % uncertainty in L and a typical 7 % uncertainty in β_a are used to estimate the potential error in α_a at any given R in an atmospheric profile. Finally, all uncertainties in the profile are summed to obtain the uncertainty of the estimated columnar AOD.
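The columnar AOD from LIDAR can be sketched as the integral of α_a = L · β_a between R_0 and R_max. The code below is a minimal illustration that assumes trapezoidal integration, backscatter in m⁻¹ sr⁻¹ and range in metres; the dictionary keys and the function name `lidar_aod` are hypothetical, while the L values are those stated in the text.

```python
# Lidar ratios (sr) adopted in the text for each dominant aerosol type.
LIDAR_RATIO_SR = {
    "marine_or_dust": 30.0,
    "urban": 50.0,
    "biomass_burning": 70.0,
}

def lidar_aod(ranges_m, backscatter, aerosol_type, r0_m=200.0, r_max_m=float("inf")):
    """Integrate alpha_a = L * beta_a from R_0 to R_max (trapezoidal rule, assumed).

    ranges_m: increasing ranges in metres; backscatter: beta_a in m^-1 sr^-1.
    """
    lidar_ratio = LIDAR_RATIO_SR[aerosol_type]
    # Keep only the part of the profile between R_0 (full overlap) and R_max.
    profile = [
        (r, lidar_ratio * b)
        for r, b in zip(ranges_m, backscatter)
        if r0_m <= r <= r_max_m
    ]
    return sum(
        0.5 * (a_lo + a_hi) * (r_hi - r_lo)
        for (r_lo, a_lo), (r_hi, a_hi) in zip(profile, profile[1:])
    )
```

Starting the integral at R_0 means that aerosol below the full-overlap height is not counted, which is one reason LIDAR-derived AOD can differ from the sunphotometer value.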
The LIDAR estimated AOD values obtained were then compared against those predicted by our developed AOD prediction model.

Climatology of Penang, Malaysia
The climatological results derived from AERONET (http://aeronet.gsfc.nasa.gov/new_web/V2/climo_new/USM_Penang_500.html), based on the work of Holben et al. (2001), for USM Penang are tabulated in Table 1. The monthly AOD (referred to as AOD_500, second column) shows that the two lowest values are 0.18 and 0.19 during the inter-monsoon period (October-November and May). During the southwest monsoon period (June-September), smoke emitted locally and from large-scale open burning activities in Sumatra, Indonesia is transported to Malaysia, yielding the highest AOD at approximately 0.31-0.73. In contrast, AOD is 0.21-0.24 during the northeast monsoon period (December-February). Small aerosol particles contribute primarily to the air pollution in Penang, as the average Ångström exponent for wavelengths between 440 and 870 nm (referred to as Ångström 440−870) is higher than 1.1. In addition, the precipitable water values (referred to as PW) were greater than 4.1 cm, which indicates that Penang has a humid atmosphere (Okulov et al., 2002).

Seasonal variations of AOD, Ångström exponent, and PW based on frequency distribution patterns
AERONET parameters are plotted (Fig. 1) to reveal the relative frequency distribution at Penang for each seasonal monsoon. Frequency histograms of AOD_500 and Ångström 440−870 (Fig. 1a-b, respectively) indicate changes in the optical properties of aerosols, whereas Fig. 1c shows the amount of water content in the atmospheric column for each season. These histograms help distinguish aerosol types (Pace et al., 2006; Salinas et al., 2009; Smirnov et al., 2002a, 2011). Our results show that the AOD mainly ranges from 0.2 to 0.4, accounting for approximately 71 % of the total occurrence (Fig. 1a). Figure 1b shows that the Ångström exponent is typically between 1.3 and 1.7, accounting for ∼ 72 % of the total. About 67 % of the total occurrence of PW ranged from 4.5 to 5.0 cm (Fig. 1c).
The maximum AOD frequency was centered near 0.2 for all seasons. The clearest season was the post-monsoon (Fig. 1a). Penang was most polluted in the southwest monsoon, most likely due to active open burning activities in Sumatra. In that season the maximum AOD was approximately 1.4, with three peaks distributed from AOD_500 = 0.1 to AOD_500 = 1.4 (Fig. 1a). The multiple peaks imply the presence of various aerosol populations, because AOD histograms follow lognormal distribution patterns (Salinas et al., 2009). By contrast, a single peak was observed for the clearest season (post-monsoon).
The frequency distributions as a function of Ångström exponent display a trend (Fig. 1b) in which approximately 95 % of the total occurrence falls within the range of 1 to 2. This result implies that the effect of coarse particles (e.g., dust) on the study site was minimal. This statement is supported by Campbell et al. (2013), who showed that dust particles are uncommon in Southeast Asia. However, dust particle concentrations can sometimes be enhanced above the boundary layer. Two noticeable peaks were observed for the Ångström exponent during the northeast monsoon period (blue curve, Fig. 1b). These aerosols originated from the northern part of Southeast Asia, particularly Indochina, and were transported by the monsoon wind and mixed with locally emitted aerosols. Lin et al. (2013) analyzed aerosols in the northern region of Southeast Asia. They found that biomass burning aerosols from Indochina were transported in high- and low-level pathways to the west, and then later shifted to the southwest by northeast monsoons.
Biomass burning aerosols were continuously transported to our study site, as the wind circulation flows toward the southwest according to the monthly mean streamline charts of Lin et al. (2013) from 1979 to 2010. During and before the southwest monsoon, Ångström exponents in Penang ranged between 1.4 and 1.8, indicating the likely presence of biomass burning aerosols (Holben et al., 2001; Gerasopoulos et al., 2003; Toledano et al., 2007). They are likely to originate from local sources and neighboring countries. Indonesia is known to be very active in open burning during this season. Furthermore, southwest monsoonal winds are likely to have transported these biomass burning aerosols to Penang.
The southwest monsoon period is the driest season in Malaysia. PW frequency was approximately 20 % lower than that of the northeast monsoon period with PW < 4.0 (Fig. 1c). Marked variations in the PW frequency were observed during the northeast monsoon period. Almost no data were obtained for PW < 3.5, except during the northeast monsoon period, when about 14 % of the values were below this threshold. The most humid period was the post-monsoon, with PW ranging from 5.0 to 5.5 (approximately 74 % of the total occurrence).

Seasonal discrimination of aerosol types based on the relationship between AOD and Ångström exponent
Aerosol clusters have been developed using relatively simple scatter plots of AOD and Ångström exponent. Similar studies have performed this analysis using AERONET data. These data sets have been applied at different locations, such as the Persian Gulf (Smirnov et al., 2002a); several oceanic regions (Smirnov et al., 2002b); Brazil, Italy, Nauru, and Saudi Arabia (Kaskaoutis et al., 2007); Spain (Toledano et al., 2007); Singapore (Salinas et al., 2009); Kuching (Jalal et al., 2012); and the multi-filter rotating shadowband radiometer in the central Mediterranean (Pace et al., 2006). The scatter plot of Ångström 440−870 against AOD_500 or AOD_440 was used to characterize the aerosol type. The wavelength range of Ångström 440−870 was used because of its nearness to the typical size range of aerosol based on spectral AOD (Eck et al., 1999). The relation between AOD values at 500 nm and Ångström 440−870 is commonly used for aerosol classification in scatter plot diagrams (Kaskaoutis et al., 2007). Optically, 500 nm is an effective visible wavelength suitable for aerosol study (Stone, 2002). Aerosols are classified into five types: dust, maritime, continental/urban/industrial, biomass burning, and mixed aerosols (Ichoku et al., 2004). Mixed aerosols in practice represent an indistinguishable type that cannot be categorized into any of the previous types. To effectively identify the aerosol distribution types at our study site, the results are compared using different threshold criteria (Table 2), as presented in Fig. 2.
The thresholds proposed by Pace et al. (2006) and Kaskaoutis et al. (2007) failed to distinguish the maritime aerosol (MA) and dust aerosol (DA). Instead, they show that mixed-type aerosols (MIXA) are dominant at Penang (50-72 %). Urban and industrial (UIA) and biomass burning (BMA) aerosols are grouped into a single class (28-50 % of the total occurrence). Meanwhile, the threshold suggested by Smirnov et al. (2002b, 2003) failed to identify DA, UIA, and BMA, but efficiently identified MA. As a result, a large amount of MIXA was obtained (> 80 % of the total occurrence). These results reveal the extent of regional uncertainty. Indistinguishable aerosol types at the study site are significant. Salinas et al. (2009) suggested that the determination of DA and BMA does not correspond entirely to the range of threshold used in our study, in which the amount of MIXA (approximately 43 % of the total occurrence) was large. Jalal et al. (2012) efficiently identified aerosol types using an alternative threshold criterion. Using their threshold, we find a low amount of MIXA of approximately 21 %. However, the determination of DA was unsatisfactory. The threshold criteria of Toledano et al. (2007) provided the least MIXA (< 5 %; Fig. 2). All thresholds consistently increased from June to September (Fig. 2c), coinciding with the occurrence of haze. UIA was constantly and highly distributed over Penang. Overall, the thresholds provided by Toledano et al. (2007) were selected for our study.
Based on the criteria suggested by Toledano et al. (2007), the UIA class had the highest frequency of occurrence in the overall study period (Fig. 3). This could be the result of Penang being an urban area. The next highest was the MA class, likely due to its geolocation (i.e., surrounded by the sea). BMA is also one of the major pollutants in Penang, which was produced by active burning in local and neighboring countries. These results are consistent with records from our Department of Meteorological, DOE (2010). The study site was minimally affected by DA, which accounted for less than 5 % in each seasonal monsoon. These results are supported by Campbell et al. (2013), who suggest UIA, MA, and BMA are likely the most common aerosol types in Southeast Asia and the maritime continent.

Seasonal flow patterns of air parcels from the HYSPLIT_4 model for identification of aerosol origins
From 7-day seasonal plots of the back-trajectory frequency sourced from the HYSPLIT_4 model, flow patterns reaching the Penang site were determined (Fig. 4) for each monsoon season, averaged between the ground surface up to 5000 m.
Residence time analysis was performed to generate the frequency plot and determine the time percentage of a specific air parcel in a horizontal grid cell across the domain.
During the northeast monsoon period, air parcels flow southwestward from the northern part of Southeast Asia (Fig. 4a), including Indochina, through the South China Sea to Penang. Aerosols observed during the northeast monsoon period are also locally produced, whereas those observed during the southwest monsoon period are predominantly from the Andaman Sea, Malacca Strait, and Sumatra (site of active open burning).
Figure 1b indicates differences in patterns (bimodal distribution pattern) of the seasonal relative frequency of occurrence for Ångström 440−870 during the northeast monsoon compared with the other monsoon periods. These differences are likely attributable to the mixing of various aerosol sources from the northern (e.g., Indochina, Philippines, Taiwan, and eastern China) and southern (e.g., Malaysia and Indonesia) parts of Southeast Asia (refer to Fig. 4a). Biomass burning aerosols are likely different for northern and southern SEA because of different types of burning processes (Wardoyo, 2007; Lopes et al., 2012; Bougiatioti et al., 2014; Kaskaoutis et al., 2014). As a result, a bimodal pattern was observed only for the northeast monsoon period (Fig. 1b).
Table 2. Threshold values of AOD and Ångström 440−870 for aerosol classification. Abbreviations: MA = maritime, DA = dust, UIA = urban and industrial, BMA = biomass burning, MIXA = mixed-type aerosols. MIXA represents indistinguishable aerosol type that lies beyond the threshold ranges.
Figure 1b reveals that the distribution patterns of Ångström exponent for the post-monsoon and northeast monsoon are similar. Figure 4a and d also indicate similarities between the air flow patterns for these monsoon seasons. Hence, a clear correspondence was observed between Fig. 1b and Fig. 4a and d. The similarity in the patterns of Ångström exponents for the post-monsoon and northeast monsoon may be attributed to the mixture of aerosols from the northern and southern parts of Southeast Asia. Given the classification results (Fig. 3), the occurrence frequency of MA was higher during the post-monsoon and northeast monsoon compared with the southwest and pre-monsoon periods. The large amount of MA originates from the South China Sea and Andaman Sea.
For the pre-monsoon period, aerosols observed at Penang originate from the Malacca Strait, Andaman Sea, the northern and some eastern areas of Sumatra, and the western part of Peninsular Malaysia, especially the local regions marked in yellow (Fig. 4b). During this season, air flow patterns are similar to those during the southwest monsoon (Fig. 4c). However, a small percentage of aerosols are transported from the northern part of Southeast Asia to Penang during the pre-monsoon period. Indonesia is known to be very active in open burning activities during the southwest monsoon. Therefore the BMA observed in Penang during this season is mainly due to local and transboundary aerosol from Indonesia. This phenomenon is reflected in the narrower and sharper curves at larger values of the Ångström exponent in Fig. 1b (detailed explanation in Sect. 3.5). A clear correlation is observed between Fig. 1b and Fig. 4b and c during the pre-monsoon and southwest monsoon.
The dominant aerosol types are UIA and MA (Fig. 3).The yellow portions in Fig. 4e indicate that for Penang, the second largest city in Malaysia and one of the most industrially concentrated cities, UIA is a major aerosol type.MA contribution to the overall aerosol distribution is likely influenced by proximity of the surrounding sea.

Examination of predicted AOD values
The optical properties of aerosol for each monsoonal season are obtained by analyzing the relative frequency of occurrence of AOD_500 and Ångström_440−870 as shown in Fig. 1a and b. We hypothesize that the proposed AOD prediction model should exhibit different accuracies seasonally because the sensitivity for AOD prediction depends on the distribution patterns of the measured AOD; these values were used as inputs to derive the correlation parameters of the model. The sensitivity of AOD prediction is low when the major occurrence frequency is clustered around small AOD values. The insensitivity of aerosol models to clear atmospheric conditions (e.g., when AOD is low) was also previously observed (Zhong et al., 2007).
Model performance for each monsoonal season was tested (Table 3). The pre-monsoon and southwest monsoon periods exhibited R² of 0.65 (RMSE = 0.11) and 0.77 (RMSE = 0.17), respectively. However, for the transition period from post-monsoon to northeast monsoon, R² values were smaller than 0.45 and RMSE ranged from 0.06 to 0.11. The accuracy of AOD prediction is improved for cases with higher aerosol concentrations. This result is in agreement with the hypothesis mentioned in the previous paragraph. The analysis of 22 months of data (the so-called "overall" model) is satisfactory, with R² = 0.72 and RMSE = 0.13. The low value of wMAPE (< 1 %) indicates that the model yielded relatively accurate results for all seasons. Given the criterion that a low wMAPE corresponds to a good prediction, the "overall" data set yields the least biased prediction. Therefore, the "overall" model (which is obtained by training the model using 22 months of data) can be interpreted as an effective and representative model that can predict AOD in every period.
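The three scores reported in Table 3 can be reproduced for any pair of measured and predicted AOD series with a few lines of code. The sketch below is illustrative only: the function name `fit_metrics` is hypothetical, R² is computed as 1 − SS_res/SS_tot, and wMAPE uses a common textbook definition (total absolute error divided by total measured value). The definitions actually used in this study are given in its methods section and may differ, so magnitudes need not match those in Table 3.

```python
import numpy as np

def fit_metrics(measured, predicted):
    """Return (R^2, RMSE, wMAPE in percent) for paired AOD arrays.

    Assumed definitions (not taken from the paper):
      R^2   = 1 - SS_res / SS_tot
      RMSE  = sqrt(mean(residual^2))
      wMAPE = 100 * sum(|residual|) / sum(|measured|)
    """
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    resid = y - yhat
    ss_res = np.sum(resid ** 2)              # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(resid ** 2))
    wmape = 100.0 * np.sum(np.abs(resid)) / np.sum(np.abs(y))
    return r2, rmse, wmape
```

Applying the same function separately to each monsoonal subset and to the full 22-month record would give the per-season and "overall" rows of such a table.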
High correlation was observed between the measured and predicted AOD for the pre-monsoon and southwest monsoon, in which similar air flow patterns occurred (Fig. 4b and c). On the other hand, the prediction accuracy of the AOD model in the post-monsoon and northeast monsoon seasons was moderate. The air flow patterns in Fig. 4a and d, which are associated with the northeast monsoon and post-monsoon respectively, also show similarity. This observation is consistent with Fig. 1b, which displays the relative frequencies of occurrence of Ångström_440−870. When the seasonal curves are scrutinized pair by pair, the post-monsoon and northeast monsoon pair (purple and blue curves) appears broader and flatter, whereas the other two seasons (red and green curves) are sharper and narrower. More quantitatively, the purple and blue curves begin to rise at a relatively fast pace (compared to the red-green pair) at an Ångström exponent of around 1.1 and drop from their peak values at around 1.7, maintaining a relatively flat profile between these two limits. By contrast, the red and green curves begin to rise at a relatively gentle pace (compared to the purple-blue pair) at around 1.1 (red curve) and 1.3 (green curve), and drop from their peak values at around 1.6 (red curve) and 1.7 (green curve). The profile of the red-green pair is relatively narrower and sharper between its rise and drop limits. As a result, a clear correlation between the aerosol optical properties in Fig. 1b and the seasonal wind flow patterns in Fig. 4 is observed.
The broader and flatter curves in the post-monsoon and northeast monsoon indicate that coarser aerosols are more frequently loaded in the atmosphere. This observation is supported by Fig. 3, in which both monsoons show a higher occurrence of MA and less BMA in Penang. In fact, we found that AOD predominantly falls on smaller values if MA is the dominant aerosol type, because a clean atmosphere is dominated by MA. When the atmosphere is dominated by UIA, the AOD values are larger than for MA. Normally, when BMA is the dominant aerosol type, AOD values are large. In other words, if BMA is absent or small, AOD will have a narrower range of distribution. As a result, only a moderate accuracy in the AOD prediction is obtained for the post-monsoon and northeast monsoon (refer to Table 3). By comparing the types of dominant aerosol observed during each monsoon, we observe that the results in Table 3 correlate well with the information from Fig. 3. Table 3 shows higher coefficients of determination for the proposed AOD prediction model in the pre-monsoon and southwest monsoon periods, which can be associated with higher amounts of BMA during these periods. This observation implies that the aerosol types are possibly indirectly correlated with the performance of the AOD prediction model. This result was also noticed by Chen et al. (2013). However, the relationship between the predicted AOD and aerosol type as observed in our model is qualitative and preliminary. Further study is needed. In addition, as mentioned in Lee et al. (2012) and Gupta et al. (2013), the relationship between AOD and particulate matter at the surface also depends on the extent of atmospheric mixing, relative humidity, chemical composition, aerosol size distribution, etc.

Validation of the predicted AOD
In this subsection the procedure to validate the proposed AOD prediction model is presented. To validate the model accuracy, [a_i] was used to generate a set of "predicted AOD" values that are directly compared with the AOD values in the data for subset 2. In this case, [a_i] are the optimized coefficients in Eq. (2); they are obtained from the data for subset 1 of the overall data set. This set of a_i shall be denoted as overall-calibrated [a_i]. The comparison is shown in Fig. 5. The predicted AOD exhibits a high correlation with the measured AOD (R² = 0.68). The temporal characteristics of the predictions between 2012 and 2013 are similar to those of the measured AOD. Table 4 shows the performance of the predicted AOD compared with the measured values in terms of RMSE and wMAPE. The RMSE for the predicted AOD is nearly the same as that for the calibration data (as shown in Table 3). Additionally, the error of the validation data is less than 1.0 % in terms of wMAPE (similar accuracy was obtained for the calibration data).
To examine potential bias, the approach proposed by Lee et al. (2012) was used to remove outliers, defined as data for which the deviation of the predicted AOD was larger than the overall RMSE (0.13). Approximately 21 % of the total data were removed using this method. After filtering, the remaining data were used in the calibration of a_i in Eq. (2) (this set of a_i shall be denoted as overall_POR-calibrated [a_i]). Note that the values of a_i so obtained differ from those obtained using the original data set, because the two sets of a_i are optimized on different data sets. R² of this fitting increased to 0.92, with RMSE = 0.06 and wMAPE = 0.13 %. The values of R², RMSE and wMAPE for the cases with and without outliers removed are shown in Table 3. Thus, by filtering the outliers, R² and RMSE improved, whereas wMAPE increased slightly from 0.04 to 0.13 %, although the error value remained less than 1 %. Subsequently, these a_i coefficients (based on the outliers-removed data set) were used to predict AOD, which was then compared against the measured values in the data for subset 2 for validation.
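The filter-and-recalibrate step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the functional form of Eq. (2) is not reproduced here, so the sketch assumes a model that is linear in its coefficients with an intercept, and the function names (`calibrate`, `predict`, `refit_without_outliers`) are hypothetical. The predictor matrix `X` would hold one column per predictor term (for example API and Vis).

```python
import numpy as np

def calibrate(X, aod):
    """Least-squares coefficients [a_i] for aod ~ a_0 + X @ a_1..p (assumed linear form)."""
    A = np.column_stack([np.ones(len(aod)), X])   # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(A, aod, rcond=None)
    return coeffs

def predict(coeffs, X):
    """Predicted AOD for predictor matrix X using coefficients from calibrate()."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coeffs

def refit_without_outliers(X, aod):
    """Fit once, drop points whose |residual| exceeds the overall RMSE, then refit."""
    X = np.asarray(X, dtype=float)
    aod = np.asarray(aod, dtype=float)
    a_all = calibrate(X, aod)                     # "overall-calibrated" coefficients
    resid = aod - predict(a_all, X)
    rmse = np.sqrt(np.mean(resid ** 2))           # threshold (0.13 in the text)
    keep = np.abs(resid) <= rmse                  # potential outliers are excluded
    a_por = calibrate(X[keep], aod[keep])         # "POR-calibrated" coefficients
    return a_all, a_por, keep
```

Because the second fit sees only points that the first fit already described well, an improved calibration R² is expected by construction; the validation against subset 2 is what indicates whether the removed points were genuine outliers.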
In the process of validation, the accuracies of the two sets of AOD values (one set is predicted using [a_i] with potential outliers removed, the other without) are compared (see Table 4). In terms of R², the AOD predicted using overall_POR-calibrated [a_i] fails to improve when compared with the AOD predicted using overall-calibrated [a_i]. The wMAPE values of the AOD prediction before and after filtering the potential outliers are nearly the same. The two sets of predicted AOD can be visually compared in the time series plot in Fig. 5. This observation implies that the removed data might not be genuine outliers. In fact, the errors were attributed to non-uniformly loaded atmospheric aerosols at different altitudes. We believe that the non-uniform atmospheric mixing caused the high deviations in our predicted results, in line with previous studies (Qiu and Yang, 2000; Toth et al., 2014). The proposed model uses ground-based sources as input. It assumes (1) that the aerosols are well-mixed, and (2) that the air above the planetary boundary layer (PBL) is aerosol free. Any aerosol present above the PBL is not taken into account by the model. If these assumptions are true, the model can be correctly compared with the columnar measurement of the sun photometer. However, in reality, aerosol could be present above the PBL, or not always be well-mixed, giving rise to some uncertainties in the AOD predicted by the model. These uncertainties are quantified in terms of RMSE.
Figure 5 indicates that most of the predicted AOD values are lower than their measured counterparts. Tan et al. (2014c) analyzed the underprediction in these values. They used a LIDAR system to determine the vertical profile of aerosols in Penang and found that the aerosol concentration decreased with height up to the planetary boundary layer (PBL). This layer was less than 2 km during the study period. The large amount of transported aerosols above the boundary layer yielded residual layers (Toth et al., 2014). Significant underestimation of AOD occurred for thick residual layers. By comparing the measured and predicted data in Fig. 5, it is found that only a few small time segments are significantly underpredicted, possibly due to the presence of aerosol residual layers above the PBL. Studies in Cyprus (Retalis et al., 2010) suggest that the extent of atmospheric mixing was relatively homogeneous on scales of a few meters to tens of kilometers. Hence, the predicted results were representative of the large samples. The predicted AOD was underestimated because all measured data were taken from the ground. However, overprediction would be significant if local burning were to occur near the measurement station.
To properly validate the prediction, the LIDAR data must coincide in time with the API, Vis, and level 2 AOD measurements. In our case, the LIDAR data coincided only once, on 12 July 2013 (Fig. 6). Figure 6a shows the vertical profile of the aerosol backscatter coefficient as a function of time (morning to evening). The brown vertical lines represent the instances when both the measured and predicted AOD could be compared with the LIDAR data.
Figure 6b displays the profiles of the aerosol backscatter coefficient obtained at 10:00 and 11:00 a.m. (local time), respectively. Aerosols had accumulated near the ground at 10:00 a.m., and our model indicated that the predicted AOD was overestimated by 0.039. By contrast, most aerosols at 11:00 a.m. were at a higher level; therefore the model predicted value was underestimated by 0.044. The predicted AOD values were therefore acceptable because they exhibited small deviations from the measured AOD. This result was thus valid as long as the aerosols did not considerably differ at altitude levels beneath the planetary boundary layer. The LIDAR data should therefore be considered as an independent validation method for ground-based prediction models.
Aerosols are not always well-mixed in the atmosphere over Penang. Particles transported within the free troposphere are a factor (Toth et al., 2014). If a significant number of elevated aerosol plumes (equivalent to an aerosol residual layer) occur over the region, then a large deviation from the predicted value will be produced. Therefore, it can be inferred that a small group of highly underpredicted results (Fig. 5) may be attributed to a significant layer of high-level transported aerosol.

Applications of the proposed model in the absence of measured AOD data
In this section, we apply our AOD-predicting model in the absence of measured AOD data. For the purpose of AOD prediction, the overall-calibrated coefficients [a_i] are used. The overall_POR-calibrated coefficients [a_i], obtained with potential outliers removed, are not used because the removed data may not be genuine outliers, as discussed in Sect. 3.6.
Our proposed model generates AOD data when those from AERONET are unavailable. The procedure to predict AOD data is as follows. Only the API data for 7:00 a.m., 11:00 a.m., and 5:00 p.m. (local time) were available (http://apims.doe.gov.my) before 24 June 2013. The API data were provided hourly beyond this date. In this study, approximately 5 % of the data were discarded due to fog, rain, or thunderstorms, and only 4493 data points were retained.
Figure 7 shows the predicted results from 2012 to 2013, overlaid on the measured AOD data to simplify the comparison. The variation in the predicted AOD matches that of the measured AOD from AERONET. Hence, as an application of the AOD-predicting model, information missed by the sun photometer (i.e., AERONET) could be reasonably well reproduced. These "retrieved" AOD values can be used in other aerosol studies. For example, the diurnal variability of AOD can be significant, depending on location and dominant aerosol type (Arola et al., 2013). Arola et al. (2013) observed that the measurement-based estimates of aerosol direct radiative forcing (also known as aerosol direct radiative effect) at regional or individual sites are substantially influenced by the diurnal variability of AOD. Pandithurai et al. (2007) found that the diurnal AOD variation depends on meteorological factors such as relative humidity, winds, temperature and convection activities. Our model provides a helpful means to investigate the uniqueness of the diurnal variability of AOD in different seasons of a specific region. The boxes marked in Fig. 7 are the time windows in which AOD measurements are unavailable. An independent method, i.e., LIDAR, is used to estimate AOD in those particular time windows (refer to Fig. 8a). In our case, we set the lidar ratio L = 70 sr, because this window period is commonly associated with biomass burning aerosol (refer to the relative frequency of dominant aerosol types in the southwest monsoon in Fig. 3). Additionally, other studies conducted by Tesche et al. (2011) and Lopes et al. (2012) also suggested L = 70 sr for biomass burning aerosols. Via the procedures mentioned in Sect. 2, and using the obtained aerosol backscatter coefficient and an assumed L, aerosol extinction coefficients were calculated (based on Eq. 5). Integrating over these aerosol extinction coefficients, AOD values were then estimated using Eq. (6).
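The LIDAR-based AOD estimate described above amounts to scaling the backscatter profile by the lidar ratio and integrating over height. The following sketch assumes the standard relations (extinction = L × backscatter, AOD = vertical integral of extinction), which is how Eqs. (5) and (6) are described in the text; the function name `lidar_aod`, the trapezoidal integration, and the unit choices are assumptions made for illustration.

```python
import numpy as np

def lidar_aod(backscatter, height_km, lidar_ratio_sr=70.0):
    """Estimate AOD from an aerosol backscatter profile.

    backscatter    : aerosol backscatter coefficient per level (km^-1 sr^-1)
    height_km      : altitude of each level (km), increasing
    lidar_ratio_sr : assumed extinction-to-backscatter ratio L (sr)
    """
    beta = np.asarray(backscatter, dtype=float)
    z = np.asarray(height_km, dtype=float)
    extinction = lidar_ratio_sr * beta            # km^-1
    # Trapezoidal integration of extinction over height (dimensionless AOD).
    layers = 0.5 * (extinction[1:] + extinction[:-1]) * np.diff(z)
    return float(np.sum(layers))
```

Because the result scales linearly with the assumed L, an uncertainty in the lidar ratio translates directly into the same relative uncertainty in the estimated AOD.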
If the LIDAR signals were affected by cloud, the AOD data calculated from the LIDAR signal were removed. Then, the predicted AOD from our model and that calculated from the LIDAR signal were compared. The result of this comparison is shown in Fig. 8b and c. Figure 8b shows that the correlation between these two sets of data is high, with R² = 0.86 and RMSE = 0.20. Figure 8c also indicates that the predicted AOD values from our model are within the error bars of the estimated AOD from the LIDAR signal. However, the AOD prediction model is less sensitive during clear atmospheric conditions on 13 August (as shown in Fig. 8c).

Comparison with other linear regression models
The proposed model is compared against other AOD-predicting models from the literature. This is done by comparing the AOD values predicted by each model against the measured AOD in the data for subset 1. Table 5 shows the R² values of selected AOD-predicting models calculated using the data for subset 1 (Sect. 2). Retalis et al. (2010) suggest a simple linear regression analysis to predict AOD from the Vis data. Mahowald et al. (2007) suggest a similar linear regression model for AOD prediction, in which the Vis data were converted to surface extinction coefficients b_ext using the Koschmieder equation Vis = K/b_ext, where K (= 3.912) is the Koschmieder constant (Koschmieder, 1924). Two other AOD-predicting models were also compared (Gao and Zha, 2010; Chen et al., 2013).
In these models, linear regression analysis for AOD and PM10 was carried out to predict the surface air quality. The approaches can also be used to retrieve AOD after appropriate conversion procedures. Initially, we converted the API data into PM10 following the guidance on the air pollutant index from DOE (1997). The obtained PM10 values were then input into the linear regression formula to predict AOD. The linear regressions in these models yielded R² ≤ 0.6 with RMSE of approximately 0.16 and above, i.e., a poorer fit than that of our model (R² ≤ 0.72 with RMSE = 0.13). The wMAPE of these models (0.05-0.08 %) was found to be similar to but slightly higher than that of the present model (0.05 %). These figures are reported in Table 5.
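For the visibility-based comparison models, the Koschmieder conversion and a simple one-predictor regression can be sketched as below. This is an illustration rather than the exact procedure of Mahowald et al. (2007) or Retalis et al. (2010): the function names are hypothetical, visibility is assumed to be in kilometres (so b_ext is in km^-1), and the regression is an ordinary least-squares line of AOD on b_ext.

```python
import numpy as np

K = 3.912  # Koschmieder constant, from Vis = K / b_ext

def visibility_to_extinction(vis_km):
    """Surface extinction coefficient b_ext (km^-1) from visibility (km)."""
    vis = np.asarray(vis_km, dtype=float)
    return K / vis

def fit_simple_model(vis_km, aod):
    """Fit AOD = slope * b_ext + intercept by ordinary least squares."""
    b_ext = visibility_to_extinction(vis_km)
    slope, intercept = np.polyfit(b_ext, np.asarray(aod, dtype=float), 1)
    return slope, intercept
```

A single-predictor fit of this kind is the baseline against which the two-predictor (API and Vis) model is compared.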

Conclusions
Seasonal variation in primary aerosol types and their physical characteristics at Penang, Malaysia, was analyzed from February 2012 to November 2013. The aerosol types for a specific monsoonal period were determined by applying threshold criteria to scatter plots, in which the Ångström exponent is plotted against aerosol optical depth (AOD). The threshold criteria from Smirnov et al. (2002b, 2003), Pace et al. (2006), Kaskaoutis et al. (2007), Toledano et al. (2007), Salinas et al. (2009), and Jalal et al. (2012) were used to distinguish the aerosol types. The testing results indicate that the threshold criteria by Toledano et al. (2007) were the most reliable, because of the minimal occurrence of indistinguishable aerosols (referred to as mixed-type aerosols, MIXA).
For the study period, biomass burning aerosols (BMA) abruptly increased during the southwest monsoon period, because of active open burning activities in local areas and neighboring countries. During the northeast monsoon period, the optical properties (e.g., size distribution patterns) of the aerosols were unique. Two noticeable peaks were observed in the occurrence frequency of the Ångström exponents compared with the single peaks for other monsoon seasons. These results were attributed to the mixing of aerosols from local sources with those from the northern part of Southeast Asia, caused by the northeast monsoon winds. Urban and industrial aerosols (UIA) and marine aerosols (MA) were the major aerosols in Penang throughout the year. Dust aerosols (DA) contributed negligibly to the emissions in Penang. The variation in aerosol types for different monsoon seasons clearly yields distinct optical properties.
Previous models have used simple regression analysis between AOD and meteorological parameters to predict the corresponding AOD data. In this study, multiple regression analysis was used in the proposed model. Two predictors (API and Vis) were introduced to increase statistical reliability. To verify the robustness of multiple regression analysis, in contrast to the simple regression approach, AOD data based on previous simple models were retrieved (Mahowald et al., 2007; Gao and Zha, 2010; Retalis et al., 2010; Chen et al., 2013). The R², RMSE and wMAPE values in our calibration model are ≤ 0.72, 0.13, and 0.05 %, respectively. The accuracies are obtained by comparing the predicted AOD values in the current study against the measured AOD in the data for subset 1. These figures are compared with the results of other relevant work, which obtained R² ≤ 0.60 and RMSE of approximately 0.16 and above. The comparison indicates that the quality of our AOD prediction is statistically better than that of those simple models, which makes sense given its tuning to local conditions.
Predicted AOD values from our model were compared with the data derived from a LIDAR system. The values of R² and RMSE (0.86 and 0.20) indicate favorable agreement between our model and LIDAR-derived data at wavelength 355 nm. This adds weight to the robustness of the developed AOD prediction model. Our algorithm predicts AOD data during non-retrieval days caused by the frequent occurrence of clouds in the equatorial region. The proposed model yields reliable near real-time AOD data even though the measured data are available only at limited time points. The predicted AOD data are beneficial for monitoring short- and long-term aerosol behavior and provide supplementary information for climatological studies and monitoring aerosol variation.
The technique proposed in this work nevertheless ought to be further stress-tested for the extent of its feasibility by applying it in more cases using a higher volume of data. This technique is pragmatic and cost effective for such environmental studies.

F. Tan et al.: Aerosol optical properties and estimation of AOD

Figure 1. Seasonal relative frequencies of occurrences of (a) AOD_500, (b) Ångström_440−870, and (c) PW in Penang for February 2012 to November 2013. Each curve was smoothed by using a moving average technique.
Figure 6. (a) Profiles of the aerosol backscatter coefficients (km^−1 sr^−1) recorded on 12 July 2013. No data were acquired from 12 to 2 p.m. The brown lines represent the moments of acquisition by the sun photometer; (b) profiles of the aerosol backscatter coefficient (beta) obtained from 10 to 11 a.m. for the brown lines in (a).

Figure 7. Predicted AOD_500 data plotted against the period from 2012 to 2013 (all available Vis and API data were input into the established model to predict AOD, with 4493 data points). Rectangles 1 and 2 correspond to the data recorded on 24-25 July and 13-14 August 2013, respectively. These data were used for comparison with those obtained from LIDAR (Fig. 8).

Figure 8. (a) Hourly retrieved AOD recorded on 24-25 July and 13-14 August 2013 (the gaps are due to fog, rain, or when the API value is predominantly caused by O3 but not PM10). (b) A scatter plot for AOD_355 predicted from our model versus the AOD calculated from the Raymetrics LIDAR system. (c) Predicted AOD from our model and estimated AOD from LIDAR plotted versus local time and date (the gaps indicate that no data were available at the particular time because the LIDAR system was switched off or because of cloud contamination above the LIDAR system). Error bars for the estimated AOD from LIDAR are shown.
2 and  RMSE (0.86 and 0.20) indicate favorable agreement between our model and LIDAR-derived data at wavelength 355 nm.

Table 3. Calibration results (data for subset 1) for the AOD prediction model (Eq. 2) from 2012 and 2013 data.
Note: POR = potential outliers removed, N = number of data.
, in which both monsoons are showing higher occurrence of MA and lesser BMA in Penang. In fact, we realized that AOD predominantly falls on smaller values if
Figure 5. Predicted and measured AOD at 500 nm for 2012 and 2013 for the validation data set (subset 2, with 395 data points). The plot in black shows the AOD values predicted using a_i obtained from the data set with potential outliers removed, whereas that in red shows those obtained from the data set without removing potential outliers.

Table 4. Validation results (data for subset 2) for the AOD prediction model (Eq. 2) from 2012 and 2013 data.

Table 5. R² values of the AOD predicted by selected linear regression models from the literature. The values of R², RMSE and wMAPE shown in this table are obtained by comparing the predicted AOD values against measured AOD data from subset 1.
Table 5 shows the R 2 values of selected AOD-predicting models calculated using the data for subset 1 by our model (Sect.2).Retalis et al. (2010) suggest a simple linear regression analy-