Interactive comment on “Variations in optical properties of aerosols on monsoon seasonal change and estimation of aerosol optical depth using ground-based meteorological and air quality data”

Tan et al. have studied the optical properties of aerosols in Malaysia and developed a regression model to estimate AOD in cloudy cases. The prediction of AOD is a very interesting topic but, unfortunately, the manuscript is really hard to read. The grammar and the structure of the manuscript need a lot of work before this work can be published. In addition, several figures are hard to read, especially from a printed version of the manuscript.


Introduction
The direct and indirect radiative influences of aerosols remain significant sources of uncertainty in climate change, according to the reports of the Intergovernmental Panel on Climate Change (IPCC, 2007, 2013). The consequences of aerosol-radiation and aerosol-cloud interactions cannot be fully elucidated because of these uncertainties. These interactions are complex and are compounded by the high variability of atmospheric aerosols caused by meteorological and climatic factors (Reid et al., 2012). Trans-boundary and long-range transported aerosols interact with their local counterparts, which modifies the microphysical properties of the aerosols and affects their radiative properties and precipitation processes (Ichoku et al., 2004; Rosenfeld, 2007; Andreae and Rosenfeld, 2008; Lin et al., 2013). The global effects of aerosols on the Earth's climate are only coarsely quantifiable because extensive and reliable measurements are lacking in most regions of the world (Hansen et al., 1997; Tripathi et al., 2005; Kaskaoutis et al., 2007; Kaskaoutis and Kambezidis, 2008; Russell et al., 2010).
The spatial and temporal variations in aerosol optical depth (AOD) are large because of differences in production sources, transport and removal processes, and prevailing meteorological conditions. Given the large uncertainty in aerosol characterization, local analyses are essential for verifying satellite measurements, because aerosol optical properties extracted from remote sensing data have limited accuracy despite their global-scale coverage (Yoram et al., 2002; Levy et al., 2005; Tripathi et al., 2005; Zhong et al., 2007; Gupta et al., 2013). Local studies on the optical properties of aerosols have been conducted using sun photometers and sky radiometers (Holben et al., 1998; Remer et al., 2008; Salinas et al., 2009). However, these methods are spatially limited in contrast to satellite imagery. Therefore, ground- and space-based measurements complement each other in providing reliable and comprehensive studies of atmospheric aerosols.
The accuracy of satellite-derived daily AOD is often assessed by comparison with AOD from the AErosol RObotic NETwork (AERONET), a network of ground-based sun photometers. AERONET is widely used to monitor, investigate, and characterize the optical properties of aerosols (Holben et al., 1998). This network provides a database to correct and validate satellite-based aerosol retrievals. However, cloud-contaminated data must be removed from the AERONET database (Smirnov et al., 2000; Chew et al., 2011; Huang et al., 2011), a process termed cloud screening. Hence, only a limited dataset of level 2 AOD (cloud screened and quality assured) is available. Meanwhile, AOD obtained from satellites, such as MODIS (Retalis et al., 2010), is limited because these satellites are in sun-synchronous orbits. Continuous retrieval of AOD data is difficult because the atmosphere is frequently cloud contaminated. The Southeast Asia region stands out globally as hosting one of the most complex sets of meteorological and environmental conditions (Reid et al., 2013). These factors make aerosol studies in the region challenging (Campbell et al., 2013).
To better monitor and understand aerosol variations, sufficient measurements are necessary in Southeast Asia and the Maritime Continent. Aerosols form a dynamic system influenced by a combination of various factors (Sherwood et al., 2013; Tesfaye et al., 2013). Omar et al. (2005) also indicate that aerosols are diverse and that their properties at any location depend on sources, emission rates, and highly variable removal processes. It is therefore important to develop a regional or local model to estimate and monitor the atmospheric columnar AOD. Several researchers have established models as an alternative tool to predict AOD from various ground-based meteorological measurements (Wang et al., 2009; Qin et al., 2010; Lin et al., 2014). This research is motivated not only by the need to develop a model for estimating atmospheric pollution but also by the need to evaluate the robustness of existing models and to propose new prediction models. Previous work on these topics (Wang et al., 2009; Qin et al., 2010; Barladeanu et al., 2012; Lin et al., 2014) has provided a basis for building a database of the individual models produced in those studies for applications in air quality research. Previous studies indicate that AOD is proportional to air quality indicators such as particulate matter (PM) with diameters less than 10 or 2.5 µm (PM10 or PM2.5) (Wang and Christopher, 2003; Cordero et al., 2012; Mielonen et al., 2012; Mogo et al., 2012; Müller et al., 2012) but inversely proportional to visibility (Vis) (Horvath, 1995; Li and Lu, 1997; Peppler et al., 2000; Bäumer et al., 2008; Singh and Dey, 2012). High concentrations of atmospheric aerosols increase the AOD, scatter light effectively, and reduce Vis. PM10 and PM2.5 physically quantify the concentration of PM at ground level; high PM values imply high aerosol concentrations at the ground surface. Vis and air quality are related to columnar AOD; hence, these parameters should be included in the algorithm to predict AOD through multiple regression analysis. Combining these complementary parameters increases the relative accuracy of the prediction.
In this paper, we attempt to develop an AOD prediction model based on three types of measured data, namely (i) relative humidity (RH), (ii) Vis, and (iii) air pollution index (API). This is important because these parameters are measured routinely at many ground-based stations. An AOD prediction model based on these routine measurements is needed to establish a long-term database for (i) climatological studies, (ii) providing continuous AOD data for atmospheric correction of satellite data, and (iii) monitoring aerosol variation. It is also important to understand the sources and dominant types of aerosols in the study area, because an in-depth understanding of these factors on a local scale is lacking. The AOD measurements were obtained from the AERONET site located at Universiti Sains Malaysia (USM) (5.36° N, 100.30° E). The Vis and API data were taken from the meteorological stations at the Penang international airport and USM. All data were taken between 2012 and 2013. The aerosol characteristics in Penang were comprehensively analyzed based on the seasonal monsoon changes. A near real-time AOD model was established based on multiple regression analysis of Vis and API. The accuracy and efficiency of the model were validated and evaluated to assess atmospheric pollution in Penang.

Methodology and statistical model
The present work is based on the previous studies of Tan et al. (2014a, b), who predicted AOD using multiple regression analysis of meteorological and air quality data. Their AOD prediction model was validated for the southwest monsoon period (June-September 2012) on Penang Island. However, the following issues remain unresolved: (i) under- and overprediction of AOD were not validated because no LIDAR data were available to monitor variations in the vertical profile of the aerosol distribution; (ii) the algorithm was insufficiently robust because only a four-month dataset was considered; and (iii) seasons other than the southwest monsoon were not included in their study. The present study uses a two-year dataset (2012-2013) from Penang to validate the algorithms proposed by Tan et al. (2014a, b).
Penang is an island located in the northwestern region of Peninsular Malaysia and lies within latitudes 5°12′ to 5°30′ N and longitudes 100°09′ to 100°26′ E (Fig. 4). The weather is warm and humid year-round. However, two main monsoon seasons exist, namely, the northeast and southwest monsoons. Following previous analyses of aerosols or air quality (Awang et al., 2000; Krishna Moorthy et al., 2007; Suresh Babu et al., 2007; Kumar and Devara, 2012; Xian et al., 2013), the monsoon periods were classified as follows: (i) northeast monsoon (December-March), (ii) transition period from northeast to southwest monsoon, or pre-monsoon (April-May), (iii) southwest monsoon (June-September), and (iv) transition period from southwest to northeast monsoon, or post-monsoon (October-November).
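The four-period classification above can be written as a simple month lookup. The following is a minimal sketch; the function and dictionary names are hypothetical and are not taken from the manuscript.

```python
# Month-to-period lookup following the four monsoon periods defined in the text.
SEASON_BY_MONTH = {
    12: "northeast monsoon", 1: "northeast monsoon",
    2: "northeast monsoon", 3: "northeast monsoon",
    4: "pre-monsoon", 5: "pre-monsoon",
    6: "southwest monsoon", 7: "southwest monsoon",
    8: "southwest monsoon", 9: "southwest monsoon",
    10: "post-monsoon", 11: "post-monsoon",
}


def monsoon_season(month: int) -> str:
    """Return the monsoon period for a calendar month (1-12)."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be in 1..12, got {month}")
    return SEASON_BY_MONTH[month]
```

Each observation can then be grouped by calling `monsoon_season` on the month of its timestamp.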
The optical properties of aerosols, such as AOD and Angstrom exponent, were analyzed to identify the aerosol characteristics in Penang during each period. The precipitable water (PW) was used to indicate the total water content in the atmosphere. The seasonal variations in AOD, Angstrom exponent, and PW were identified from their frequency distribution patterns. The aerosol types were discriminated seasonally from the scatter plot of AOD against the Angstrom exponent. Threshold values in the scatter plot for aerosol classification have been previously reported by Smirnov et al. (2002b, 2003), Pace et al. (2006), Kaskaoutis et al. (2007), Toledano et al. (2007), Salinas et al. (2009), and Jalal et al. (2012). The data selection criteria proposed by Tan et al. (2014a) were used in this study. The seven-day seasonal plot of the back-trajectory frequency from the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT_4) model was used to identify the original sources of aerosols and their transport pathways. Subsequently, the obtained aerosol characteristics were used to examine the algorithm accuracy among the datasets. AOD, API, and Vis data were selected according to the procedure of Tan et al. (2014a) to generate predicted AOD data. The Vis data were retrieved online from Weather Underground (http://www.wunderground.com) or from NOAA (http://www7.ncdc.noaa.gov/CDO/cdo). Only hourly data free from rainfall, thunderstorms, or fog were used to predict the AOD. Air quality in Malaysia is reported in terms of API, which can be obtained from the Department of Environment in Malaysia (http://apims.doe.gov.my/apims/). API is calculated from carbon monoxide, ozone, nitrogen dioxide, sulfur dioxide, and PM10. The Malaysian Department of Environment provides a standardized procedure for calculating API values (DOE, 1997).
A total of 790 data points from 2012 to 2013 were used. Initially, the datasets were separated into (4+1) sets as follows: (i) December-March, (ii) April-May, (iii) June-September, and (iv) October-November. The fifth or "overall" set comprised the annual data. The numbers of data points for December-March, April-May, June-September, and October-November were 257, 132, 235, and 166, respectively. The data for each seasonal monsoon were further divided into two subsets. For example, consider that the data within a particular seasonal monsoon period take the sequential form $D_1, D_2, D_3, D_4, D_5, \ldots, D_n$, where $n$ is the total number of points. The subsets then take the form $(D_1, D_3, D_5, \ldots)$ and $(D_2, D_4, D_6, \ldots)$. The first data subset was used to calibrate Eq. (1) for AOD at 500 nm:
$$\mathrm{AOD} = a_0 + a_1\,\mathrm{RH} + a_2\,\mathrm{RH}^2 + a_3\,\mathrm{RH}^3 + a_4\,\mathrm{Vis} + a_5\,\mathrm{Vis}^2 + a_6\,\mathrm{Vis}^3 + a_7\,\mathrm{API} + a_8\,\mathrm{API}^2 + a_9\,\mathrm{API}^3 \tag{1}$$
where RH is the relative humidity. This was the original model used by Tan et al. (2014a).
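The alternating split and the predictor terms of Eq. (1) can be illustrated with a short sketch. The helper names below are hypothetical, and the integer sequence merely stands in for a time-ordered list of 257 observations (the December-March count).

```python
def split_alternate(points):
    """Split a time-ordered sequence into (D1, D3, D5, ...) and (D2, D4, D6, ...)."""
    points = list(points)
    return points[0::2], points[1::2]


def eq1_row(rh, vis, api):
    """Predictor row for Eq. (1): 1, RH..RH^3, Vis..Vis^3, API..API^3 (order of a_0..a_9)."""
    return [1.0] + [x ** p for x in (rh, vis, api) for p in (1, 2, 3)]


# Stand-in for 257 time-ordered observations.
calibration, validation = split_alternate(range(1, 258))
print(len(calibration), len(validation))  # 129 128
```

With an odd number of points, the calibration subset receives one more point than the validation subset.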
The root mean square error (RMSE), coefficient of determination ($R^2$), and percent mean relative error (%MRE) between the measured and predicted AOD for each seasonal model were calculated at the 95 % confidence level. The %MRE parameter was used to quantify the systematic differences between the concentration levels. This parameter is given as follows:
$$\%\mathrm{MRE} = \frac{\overline{\mathrm{AOD}}_{\mathrm{predicted}} - \overline{\mathrm{AOD}}_{\mathrm{measured}}}{\overline{\mathrm{AOD}}_{\mathrm{measured}}} \times 100$$
where the overbar denotes the mean. The ability of the proposed model to produce reliable AOD estimates for temporal air monitoring can be quantitatively justified or falsified based on the quality of the resultant %MRE.
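The three statistics can be computed as in the following sketch. The function names are hypothetical. The manuscript does not state which definition of $R^2$ was used; the form 1 - SS_res/SS_tot below is an assumption.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted AOD."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def percent_mre(measured, predicted):
    """Percent mean relative error: difference of the means relative to the measured mean."""
    mean_m = sum(measured) / len(measured)
    mean_p = sum(predicted) / len(predicted)
    return (mean_p - mean_m) / mean_m * 100.0
```

Because %MRE compares the two means, positive and negative errors can cancel; it measures systematic bias rather than scatter, which is why it is reported alongside RMSE.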
Aerosols can be hydrophilic or hydrophobic, and these properties can make a non-trivial contribution to AOD retrieval (Tang, 1996; Song et al., 2007; de Meij et al., 2012; Singh and Dey, 2012; Ramachandran and Srivastava, 2013; Wang et al., 2013; van Beelen et al., 2014). However, discriminating whether the aerosols are hydrophilic or hydrophobic requires additional resources beyond the reach of the present study. Moreover, our pre-analysis showed that RH does not contribute significantly to AOD prediction in the proposed model. If RH were considered as a predictor, its related factors (e.g., aerosol stratification (dust or smoke aloft), convection, and hysteresis in particles) would have to be taken into account. The contribution of RH to the aerosol properties was integrated in the aerosol model (Srivastava et al., 2012) because the net effect of RH on aerosols and related factors was difficult to quantify. The RH contribution can therefore be disregarded in the present model, yielding Eq. (2):
$$\mathrm{AOD} = a_0 + a_1\,\mathrm{Vis} + a_2\,\mathrm{Vis}^2 + a_3\,\mathrm{Vis}^3 + a_4\,\mathrm{API} + a_5\,\mathrm{API}^2 + a_6\,\mathrm{API}^3 \tag{2}$$
The same statistical measures (RMSE, $R^2$, and %MRE) were calculated for Eq. (2) in each monsoon season. The second data subset was used to validate the accuracy of the developed model.
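As an illustration of how Eq. (2) can be calibrated on one subset and checked on the other, the sketch below fits the seven coefficients by ordinary least squares with NumPy. The data are synthetic and noise-free, and the coefficient values in `true_a` are arbitrary placeholders, not results from this study; the function names are hypothetical.

```python
import numpy as np


def design_matrix(vis, api):
    """Columns 1, Vis, Vis^2, Vis^3, API, API^2, API^3, matching a_0..a_6 in Eq. (2)."""
    vis = np.asarray(vis, dtype=float)
    api = np.asarray(api, dtype=float)
    return np.column_stack(
        [np.ones_like(vis), vis, vis**2, vis**3, api, api**2, api**3]
    )


def fit_eq2(vis, api, aod):
    """Least-squares estimate of the Eq. (2) coefficients."""
    coeffs, *_ = np.linalg.lstsq(
        design_matrix(vis, api), np.asarray(aod, dtype=float), rcond=None
    )
    return coeffs


def predict_eq2(coeffs, vis, api):
    """Predicted AOD from fitted coefficients."""
    return design_matrix(vis, api) @ coeffs


# Synthetic, noise-free demonstration data (placeholder values only).
rng = np.random.default_rng(0)
vis = rng.uniform(2.0, 15.0, 200)     # visibility, assumed in km
api = rng.uniform(20.0, 120.0, 200)   # air pollution index
true_a = np.array([0.9, -0.08, 4e-3, -1e-4, 5e-3, -2e-5, 1e-7])
aod = design_matrix(vis, api) @ true_a

# Calibrate on (D1, D3, ...) and validate on (D2, D4, ...).
a_cal = fit_eq2(vis[0::2], api[0::2], aod[0::2])
aod_val = predict_eq2(a_cal, vis[1::2], api[1::2])
max_err = float(np.max(np.abs(aod_val - aod[1::2])))
```

The cubic columns span very different magnitudes, so the individual coefficients can be poorly conditioned even when the predictions are accurate; rescaling Vis and API before fitting is a common precaution.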
Lee et al. (2012) excluded days when the deviation between the measured and predicted values was greater than the RMSE, or when the estimated AOD slope was negative because of measurement errors and cloud-contaminated AOD. Following their findings, the potential outliers in our model were removed using the approach of Lee et al. (2012). The aforementioned procedures were then repeated to calibrate and validate the AOD prediction model using the new dataset (with the potential outliers removed). The predicted AOD was again compared with the measured counterpart from AERONET to determine the accuracy of the generated model.
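The RMSE-based part of this filter can be sketched as follows. This is a simplified point-by-point illustration with a hypothetical function name; it does not implement the negative-slope criterion and is not the exact procedure of Lee et al. (2012).

```python
import math


def remove_outliers(measured, predicted):
    """Return indices of points whose absolute deviation does not exceed the RMSE.

    Simplified sketch: points with |predicted - measured| > RMSE are treated
    as potential outliers and dropped.
    """
    residuals = [p - m for m, p in zip(measured, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    keep = [i for i, r in enumerate(residuals) if abs(r) <= rmse]
    return keep, rmse
```

The retained indices can then be used to rebuild the calibration subset before the model is fitted again.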
Equation (2) was applied to retrieve the AOD for specific days when no AOD values were available. The features of the predicted AOD were compared against those of the measured counterpart. The under- and overpredicted AOD values were examined using a RAYMETRICS LIDAR system. However, this examination could only be performed when LIDAR data were available, and only the data that clearly elucidate the under- and overpredicted AOD were selected. The LIDAR signals were pre-analyzed based on the published works of Tan et al. (2013, 2014c). The aerosol backscatter coefficients were determined from the LIDAR signals using the method of Fernald (1984).

Climatology of Penang, Malaysia
The climatological results derived from AERONET (http://aeronet.gsfc.nasa.gov/new_web/V2/climo_new/USM_Penang_500.html) for USM Penang are tabulated in Table 1. The monthly AOD (referred to as AOD_500, second column) shows that the two lowest AOD values, 0.18 and 0.19, occur during the inter-monsoon periods (October-November and May). During the southwest monsoon period (June-September), smoke emitted locally and by large-scale open burning in Sumatra, Indonesia, was transported to Malaysia and yielded the highest AOD, approximately 0.31-0.73. By contrast, the AOD was 0.21-0.24 during the northeast monsoon period (December-February). Small aerosol particles were the primary contributors to air pollution in Penang, as the average Angstrom exponents (referred to as Angstrom 440-870) were higher than 1.1. The atmosphere was humid, with precipitable water values (referred to as PW) greater than 4.1 (Okulov et al., 2002).

Seasonal variations of AOD, Angstrom exponent, and PW based on frequency distribution patterns
AERONET parameters were plotted (Fig. 1) to reveal the relative frequency distributions at Penang for each seasonal monsoon. Frequency histograms of AOD_500 and Angstrom 440-870 (Fig. 1a-b, respectively) indicate changes in the optical properties of aerosols, whereas Fig. 1c shows the water content in the atmospheric column for each season. These histograms helped distinguish aerosol types (Pace et al., 2006; Salinas et al., 2009; Smirnov et al., 2002a, 2011). Our results show that AOD mainly ranges from 0.2 to 0.4, accounting for approximately 71 % of the total occurrence (Fig. 1a). Fig. 1b shows that the Angstrom exponent is typically between 1.3 and 1.7, corresponding to ~ 72 % of the total occurrence. About 67 % of the total occurrence of PW ranged from 4.5 cm to 5.0 cm (Fig. 1c).
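A relative frequency distribution of this kind can be computed as in the sketch below. The function name, bin edges, and sample values are illustrative placeholders, not the bins or data used for Fig. 1.

```python
from bisect import bisect_right


def relative_frequency(values, edges):
    """Percent of all values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):   # values outside the edges are not binned
            counts[i] += 1
    total = len(values)
    return [100.0 * c / total for c in counts]


# Placeholder AOD values and 0.1-wide bins.
print(relative_frequency([0.15, 0.25, 0.25, 0.35], [0.1, 0.2, 0.3, 0.4]))
# [25.0, 50.0, 25.0]
```

Summing the percentages over the bins between 0.2 and 0.4 gives the kind of occurrence share quoted above for AOD.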
The maximum frequency of AOD was centered near 0.2 for all seasons. The clearest season was October-November (Fig. 1a). Penang was most polluted from June to September, most likely because of active open burning in Sumatra. During this period, the AOD reached approximately 1.4, with three peaks distributed between AOD_500 = 0.1 and AOD_500 = 1.4 (Fig. 1a). The multiple peaks imply the presence of various aerosol populations, because AOD histograms follow log-normal distribution patterns (Salinas et al., 2009). By contrast, a single peak was observed for the clearest season (October-November).
The frequency distributions as a function of Angstrom exponent display a trend (Fig. 1b) in which approximately 95 % of the total occurrence falls within an Angstrom exponent range of 1 to 2. This result implies that the effect of coarse particles (e.g., dust) on the study site was minimal. This statement is supported by Campbell et al. (2013), who found that dust particles are less common in Southeast Asia. However, the concentration of dust particles may occasionally increase above the boundary layer in Southeast Asia. Two noticeable peaks were observed for the Angstrom exponent during the northeast monsoon period (blue curve, Fig. 1b). These aerosols originated from the northern part of Southeast Asia, particularly Indochina, were transported by the monsoon wind, and mixed with locally emitted aerosols. Lin et al. (2013) analyzed the aerosols in the northern region of Southeast Asia. They found that biomass burning aerosols from Indochina were transported westward along high- and low-level pathways and were then shifted to the southwest by the northeast monsoon. According to the monthly mean streamline charts of Lin et al. (2013) for 1979 to 2010, the wind circulation flows toward the southwest, so biomass burning aerosols were continuously transported to our study site. During and before the southwest monsoon, the Angstrom exponents in Penang ranged between 1.4 and 1.8, indicating the likely presence of biomass burning aerosols (Holben et al., 2001; Gerasopoulos et al., 2003; Toledano et al., 2007). These aerosols are likely to originate from local sources and neighboring countries.
Indonesia is known to be very active in open burning during this season. Furthermore, the southwest monsoon wind is likely to have transported these biomass burning aerosols to Penang.
Although the southwest monsoon period is the driest season in Malaysia, its frequency of PW < 4.0 was approximately 21 % lower than that of the northeast monsoon period (Fig. 1c). Marked variations in the PW frequency were observed during the northeast monsoon period. Almost no occurrences of PW < 3.5 were recorded, except during the northeast monsoon period, when about 14 % of the values fell below this threshold. The most humid period was April-May, with PW ranging from 5.0 to 5.5 (approximately 74 % of the total occurrence).

Seasonal discrimination of aerosol types based on the relationship between AOD and Angstrom exponent
Aerosol clusters have been identified using relatively simple scatter plots of AOD against Angstrom exponent. Related studies have used AERONET data at different locations, such as the Persian Gulf (Smirnov et al. [...] et al., 2001; Smirnov et al., 2002b, 2003; Pace et al., 2006; Kaskaoutis et al., 2007; Salinas et al., 2009), to study aerosol turbidity conditions. Optically, 500 nm is an effective visible wavelength suitable for aerosol studies (Stone, 2002). In this study, AOD_440-Angstrom 440-870 and AOD_500-Angstrom 440-870 plots were used.
Aerosols were classified into five types, including dust, maritime, continental/urban/industrial, biomass burning, and mixed aerosols (Ichoku et al., 2004); mixed aerosols in practice represent an indistinguishable type that cannot be categorized into any of the previous types.
To effectively identify the aerosol distribution types in our study sites, the results were compared using different threshold criteria (Table 2).The results are presented in Fig. 2.
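A threshold-based classification on the AOD-Angstrom exponent plane can be sketched as follows. The rule values in `EXAMPLE_RULES` are arbitrary placeholders chosen only to make the example run; they are NOT the thresholds listed in Table 2 or published in any of the cited studies, and the function names are hypothetical.

```python
from collections import Counter

# Placeholder rules for illustration only; NOT the Table 2 thresholds.
# Each rule: (label, AOD min, AOD max, Angstrom min, Angstrom max), half-open ranges.
EXAMPLE_RULES = [
    ("MA", 0.0, 0.2, 0.0, 2.0),
    ("DA", 0.2, 10.0, 0.0, 0.6),
    ("UIA", 0.2, 0.4, 1.0, 2.5),
    ("BMA", 0.4, 10.0, 1.0, 2.5),
]


def classify(aod, alpha, rules=EXAMPLE_RULES):
    """Return the first matching aerosol type, or MIXA if no rule matches."""
    for label, aod_lo, aod_hi, al_lo, al_hi in rules:
        if aod_lo <= aod < aod_hi and al_lo <= alpha < al_hi:
            return label
    return "MIXA"


def class_shares(samples, rules=EXAMPLE_RULES):
    """Percent of (AOD, Angstrom) samples assigned to each aerosol type."""
    counts = Counter(classify(aod, alpha, rules) for aod, alpha in samples)
    return {label: 100.0 * n / len(samples) for label, n in counts.items()}
```

Swapping in a different rule list and comparing the resulting MIXA share mirrors the comparison of threshold criteria described here: a criterion that leaves fewer samples unclassified separates the aerosol types at the site more completely.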
The thresholds proposed by Pace et al. (2006) and Kaskaoutis et al. (2007) failed to identify the maritime aerosol (MA) and dust aerosol (DA) in any season. Instead, they showed that mixed-type aerosols (MIXA) were dominant at Penang (50-72 %). Urban and industrial (UIA) and biomass burning (BMA) aerosols were grouped into a single class (28-50 % of the total occurrence). Meanwhile, the threshold suggested by Smirnov et al. (2002b, 2003) failed to identify DA, UIA, and BMA, but efficiently identified MA. As a result, a large amount of MIXA was obtained (> 80 % of the total occurrence). These results reveal the extent of the uncertainty: a large fraction of the aerosols at the study site could not be assigned to a type.
With the thresholds suggested by Salinas et al. (2009), the determination of DA and BMA did not correspond entirely to the ranges observed in our study, and the amount of MIXA was large (approximately 43 % of the total occurrence). Jalal et al. (2012) efficiently identified aerosol types using an alternative threshold criterion. Using their threshold, we obtained a low amount of MIXA, approximately 21 %. However, the determination of DA was unsatisfactory. The threshold criteria of Toledano et al. (2007) yielded the least MIXA (< 5 %; Fig. 2). All thresholds showed a consistent increase from June to September (Fig. 2c), coinciding with the occurrence of haze. UIA was constantly and widely distributed over Penang. Overall, the thresholds provided by Toledano et al. (2007) were the most suitable for our study.
Based on the criteria suggested by Toledano et al. (2007), the UIA class had the highest frequency of occurrence over the whole study period (Fig. 3). This is probably because Penang is an urban area. The next most frequent class was MA, because of the location of the site (i.e., surrounded by the sea). BMA is also one of the major pollutants in Penang and is produced by active burning locally and in neighboring countries. These results are in accordance with the records from our Meteorological Department and DOE (2010). The study site was minimally affected by coarse particles and DA, which accounted for less than 5 % in each seasonal monsoon. These results are supported by Campbell et al. (2013), who suggest that UIA, MA, and BMA are likely the most common aerosol types in Southeast Asia and the Maritime Continent.

Seasonal flow patterns of air parcel from the HYSPLIT_4 model for identification of aerosol origins
Seven-day seasonal plots of the back-trajectory frequency from the HYSPLIT_4 model were used to obtain the flow patterns reaching the Penang site (Fig. 4) for each monsoon season, averaged from the ground surface up to an altitude of 5000 m. Residence time analysis was performed to generate the frequency plot and to determine the percentage of time that an air parcel spends in each horizontal grid cell across the domain.
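The idea behind a residence time analysis can be sketched generically: trajectory points sampled at equal time steps are binned into horizontal grid cells, and each cell's share of the points approximates the fraction of time spent there. This is a simplified illustration with a hypothetical function name and an assumed 1-degree cell size; it is not the HYSPLIT_4 implementation.

```python
import math
from collections import Counter


def residence_time_percent(endpoints, cell_deg=1.0):
    """Percent of trajectory points falling in each (lat, lon) grid cell.

    endpoints: iterable of (lat, lon) positions sampled at equal time steps.
    cell_deg:  grid cell size in degrees (assumed value, not from the paper).
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in endpoints
    )
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}
```

Cells with high percentages along a corridor indicate a preferred transport pathway toward the receptor site.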
During the northeast monsoon period, air parcels flow southwestward from the northern part of southeast Asia (Fig. 4a), including Indochina, transported through the South China Sea to reach Penang.The aerosols during the northeast monsoon period were also locally produced, whereas those observed during the southwest monsoon period were from the Andaman Sea, Malacca Strait, Sumatra (site of open active burning), and other more local areas.
Fig. 1b shows that the seasonal relative frequency of occurrence of Angstrom 440-870 during the northeast monsoon follows a bimodal distribution pattern that differs from the other monsoon periods. These differences are likely attributable to the mixing of various aerosol sources from the northern (e.g., Indochina, Philippines, Taiwan, and eastern China) and southern (e.g., Malaysia and Indonesia) parts of Southeast Asia (see Fig. 4a). The biomass burning aerosols are likely different in northern and southern Southeast Asia because of the different types of burning processes. As a result, a bimodal pattern in the frequency distribution of Angstrom 440-870 was observed only for the northeast monsoon period (Fig. 1b).
Figure 1b reveals that the distribution patterns of the Angstrom exponent for the post-monsoon and northeast monsoon periods are similar. Figure 4a and d also indicate similar air flow patterns for these periods. Hence, a clear correspondence was observed between Fig. 1b and Fig. 4a and d. The similarity in the Angstrom exponent patterns for the post-monsoon and northeast monsoon periods may be attributed to the mixture of aerosols from the northern and southern parts of Southeast Asia. According to the classification results (Fig. 3), the occurrence frequency of MA was higher during the post-monsoon and northeast monsoon periods than during the southwest monsoon and pre-monsoon periods. The large amount of MA originates from the South China Sea and Andaman Sea.
For the pre-monsoon period, aerosols observed at Penang originated from the Malacca Strait, the Andaman Sea, the northern and some eastern areas of Sumatra, and the western part of Peninsular Malaysia, especially the local regions marked in yellow (Fig. 4b). During this season, the air flow patterns were similar to those during the southwest monsoon (Fig. 4c). However, a small percentage of aerosols were transported from the northern part of Southeast Asia to Penang. A clear correspondence is observed between Fig. 1b and Fig. 4b and c during the pre-monsoon and southwest monsoon periods.
The dominant aerosol types were UIA and MA (Fig. 3). The yellow portions in Fig. 4e are located over Penang, the second largest city in Malaysia and one of its most industrially concentrated cities; UIA is therefore a major aerosol type in this area. The MA contribution to the overall aerosol distribution is likely strongly influenced by the proximity of the surrounding sea.

Examination of predicted AOD values
The optical properties of aerosols for each monsoonal season were obtained by analyzing the relative frequency of occurrence of AOD_500 and Angstrom 440-870. The relative frequency plot of PW also showed that each monsoonal season has a different amount of water in the atmospheric column. We hypothesize that the proposed AOD prediction model should exhibit a different accuracy in each season, because the sensitivity of the AOD prediction depends on the distribution pattern of the measured AOD; these values were used as inputs to derive the correlation parameters of the model. The sensitivity of the AOD prediction is reduced when the major occurrence frequency is clustered around small AOD values. The insensitivity of aerosol models to clear atmospheric conditions was also observed previously (Zhong et al., 2007). Conversely, the model predicted AOD most appropriately when the corresponding input data were clustered around large values.
The model performance for each monsoonal season was tested (Table 3). The pre-monsoon and southwest monsoon periods exhibited $R^2$ values of 0.65 (RMSE = 0.114) and 0.77 (RMSE = 0.172), respectively. However, for the post-monsoon and northeast monsoon periods, $R^2$ < 0.45 and RMSE ranged from 0.06 to 0.11. An increased amount of atmospheric aerosol improved the AOD prediction, and vice versa. This result agrees with the aforementioned hypothesis. Overall, the fit to the 22-month dataset was satisfactory, with $R^2$ = 0.72 and RMSE = 0.133. The low value of %MRE (< 1) indicates that the model yielded accurate results for all seasons.
Given the criterion that a low %MRE corresponds to a good prediction, the "overall" dataset yielded the least biased prediction.
High correlation was observed between the measured and predicted AOD for the pre-monsoon and southwest monsoon periods, in which similar air flow patterns occurred (Fig. 4b and c). Figure 1b displays the relative frequencies of occurrence of Angstrom 440-870. The frequency spectra for the pre-monsoon and southwest monsoon periods also indicated the same patterns for AOD (Fig. 4b and c). The Angstrom exponent frequency spectrum exhibited narrow peaks at 1.6 and 1.7 for the pre-monsoon and southwest monsoon periods, respectively.
The prediction accuracy of the AOD model in the post-monsoon and northeast monsoon periods was moderate, because the local aerosols in Penang were mixed with those from foreign sources as a result of the wind flow pattern during these two seasons (Fig. 4a and d). The correspondence between Fig. 1b and Fig. 4a and d reflects these monsoonal periods. The Angstrom exponent frequency spectrum exhibited a broad region from 1.3 to 1.7 for the post-monsoon and northeast monsoon periods.
By comparing the dominant aerosol types in each monsoon period, we observed that the results in Table 3 are related to the information in Fig. 3. Table 3 shows a higher coefficient of determination for the proposed AOD prediction model during the pre-monsoon and southwest monsoon periods, which can be associated with a higher amount of BMA but lower amounts of UIA and MA. This observation implies that the aerosol types are possibly related to the performance of the AOD prediction model. However, the relationship between the predicted AOD and aerosol type observed in our model is qualitative and preliminary, and further study is needed. In addition, as mentioned by Lee et al. (2012) and Gupta et al. (2013), the relationship between AOD and air quality at the ground surface also depends on environmental factors. Environmental factors that are disregarded in an AOD model may lead to deviations in the predicted values.

Validation of the predicted AOD
Optimized coefficients, $a_i$ (Eq. 2), were obtained from the first subset of the overall dataset.
To validate the model accuracy, $a_i$ were used to predict AOD from the second subset (Fig. 5).
The predicted AOD exhibited high correlation with the measured AOD ($R^2$ = 0.68). In addition, the temporal characteristics of the predictions between 2012 and 2013 were similar to those of the measured AOD.
To examine bias, the approach proposed by Lee et al. (2012) was applied to remove outliers for which the deviation of the predicted AOD was larger than the overall RMSE (0.133).
Approximately 21 % of the total data were removed using this method. The remaining data were used to recalibrate Eq. (2). The $R^2$ of this fit increased significantly to 0.92, with RMSE = 0.059 and %MRE = $1.17 \times 10^{-4}$. After filtering the outliers, $R^2$ and RMSE improved, but %MRE remained at the $10^{-4}$ level.
Subsequently, the new coefficients were used to predict the AOD data of subset 2, which were then compared against the measured counterpart for validation. The prediction failed to improve in terms of $R^2$ between the predicted and measured AOD (compare the red and black lines in Fig. 5). The %MRE increased from 0.33 to 5.99. Therefore, the removed data might not be genuine outliers. The errors were instead attributed to aerosols that are non-uniformly loaded at different altitudes. We believe that this non-uniform atmospheric mixing caused the high deviations in our predicted results, in line with previous studies (Qiu and Yang, 2000).
Because the proposed model is based on ground-based measurements, the aerosols must be well mixed in the atmosphere for the predictions to be consistent with the columnar measurement of the sun photometer. However, the atmosphere is not always well mixed, so the predicted AOD is subject to uncertainties, which were quantified in terms of RMSE.
Figure 5 indicates that most of the predicted AOD values were lower than their measured counterparts. Tan et al. (2014c) analyzed this underprediction. They used a LIDAR system to determine the vertical profile of aerosols in Penang and found that the aerosol concentration decreased with height up to the planetary boundary layer (PBL). This layer was less than 2 km deep during the study period. The large amount of transported aerosols above the boundary layer formed residual layers (Toth et al., 2014). Significant underestimation of AOD occurred for thick residual layers. Only a few points were significantly underpredicted because of the aerosol residual layer above the PBL. Studies in Cyprus (Retalis et al., 2010) suggested that the extent of atmospheric mixing was relatively homogeneous on scales of a few meters to tens of kilometers. Hence, the predicted results were representative of the large samples. The predicted AOD was underestimated because all measured data were taken at the ground. However, overprediction would be significant if local burning occurred near the measurement station.
To properly validate the prediction, the LIDAR data should coincide in time with the API, Vis, and AOD level 2 measurements. In our case, the LIDAR data coincided only once, on 12 July 2013 (Fig. 6). Figure 6a shows the vertical profile of the aerosol backscatter coefficient as a function of time (morning to evening). The brown vertical line represents the time at which both the measured and predicted AOD could be compared with the LIDAR data. Figure 6b illustrates the normalized range-corrected signal (RCS) at different altitudes at 10:00 a.m. and 11:00 a.m. local time. The RCS was normalized against the theoretical molecular backscatter (USSA976 standard atmospheric model) to calibrate the performance of the LIDAR system.
Figure 6c displays the profiles of the aerosol backscatter coefficient obtained at 10:00 and 11:00 a.m. local time. Aerosols had accumulated near the ground at 10:00 a.m., which was consistent with a slightly higher predicted AOD (by about 0.039). By contrast, most aerosols at 11:00 a.m. were at a higher level. This result corresponds with a lower predicted AOD (by approximately 0.044). Therefore, the predicted AOD values were acceptable because they exhibited small deviations from the measured AOD. This result holds as long as the aerosol loading does not differ considerably between altitude levels beneath the planetary boundary layer. The LIDAR data should therefore be considered as an independent validation method for ground-based prediction models. In reality, aerosols are often not well mixed in the atmosphere. Several environmental factors can cause ambiguity in the predictions (Gupta et al., 2013; Lee et al., 2012). Particles propagating within the free troposphere are one such factor and cannot be ignored (Toth et al., 2014) when predicting columnar AOD from near-surface measurements, or vice versa. If a significant number of elevated aerosol plumes (equivalent to an aerosol residual layer) occur over the region, then the predicted value will deviate strongly. Therefore, it can be inferred that the small group of highly underpredicted results (Fig. 5) may be attributed to a large amount of aerosol transported at high levels.
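The effect of an elevated layer on the column AOD can be illustrated with a small numerical sketch. It relies on the standard relation that AOD is the vertical integral of the aerosol extinction coefficient. Everything else is an assumption made for illustration: the constant lidar ratio of 50 sr used to convert backscatter to extinction, the synthetic profiles, and the function names. None of these values come from the LIDAR data discussed above.

```python
import numpy as np

def column_aod(altitude_km, backscatter, lidar_ratio_sr=50.0):
    """Integrate extinction (lidar ratio * backscatter) over altitude.

    backscatter is in km^-1 sr^-1; the constant lidar ratio is an assumption.
    """
    extinction = lidar_ratio_sr * backscatter            # km^-1
    mean_ext = 0.5 * (extinction[1:] + extinction[:-1])  # trapezoidal rule
    return float(np.sum(mean_ext * np.diff(altitude_km)))

def aod_fraction_above(altitude_km, backscatter, height_km):
    """Fraction of the column AOD located at or above a given height."""
    above = altitude_km >= height_km
    total = column_aod(altitude_km, backscatter)
    return column_aod(altitude_km[above], backscatter[above]) / total

z = np.linspace(0.0, 5.0, 51)  # altitude grid in km

# Hypothetical profile 1: aerosol concentrated near the ground.
beta_mixed = 0.004 * np.exp(-z)
# Hypothetical profile 2: same, plus an elevated layer centred at 3 km.
beta_layer = beta_mixed + 0.002 * np.exp(-((z - 3.0) / 0.3) ** 2)

frac_mixed = aod_fraction_above(z, beta_mixed, 2.0)
frac_layer = aod_fraction_above(z, beta_layer, 2.0)
```

With the elevated layer, a larger share of the column AOD sits above 2 km, where near-surface predictors cannot see it. This is the situation in which a ground-based prediction would be expected to fall below the sun photometer value.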

Applications of the proposed model in the absence of measured AOD data
Our proposed model generates AOD data when those from AERONET are unavailable. The procedure for predicting AOD data is as follows. Before 24 June 2013, the API data were available only for 7:00 a.m., 11:00 a.m., and 5:00 p.m. local time (http://apims.doe.gov.my). After this date, the API data were provided hourly. In this study, approximately 5 % of the data were discarded because of fog, rain, or thunderstorms, and 4493 data points were retained. Figure 7 shows the predicted results from 2012 to 2013, overlaid on the measured AOD data to simplify the comparison. The average AOD based on the 4493 predicted data points for the entire study period was 0.31, which is close to that of AERONET (about 0.29).
As an illustration, we examine three separate data windows (28 September, 17 October, and 30-31 October 2013; Fig. 8a-c) to analyze variations in the predicted and measured AOD values. The predicted AOD and the CIMEL sun photometer data are shown as blue and red dotted lines, respectively. The proposed model generated continuous AOD variations from the hourly ground-based measurements. Information not recorded by the sun photometer could be reproduced by the proposed method (Fig. 8). The model coefficients were trained under cloud-free conditions. Hence, hourly AOD data could be generated at any time to compensate for the absence of measured AOD data during cloudy periods. In addition, unlike AERONET, the proposed model can generate both daytime and nighttime data.
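The gap-filling procedure described in this section can be sketched as follows. The coefficients, the assumed linear form of the model, the weather flag, and the sample records are all hypothetical and serve only to show the data flow: records affected by fog, rain, or thunderstorms are discarded, and predictions are used wherever no sun photometer value exists.

```python
import numpy as np

# Hypothetical coefficients (a0, a1, a2) of an assumed linear model
# AOD = a0 + a1*API + a2*Vis; these are not the fitted values from Eq. (2).
COEFFS = np.array([0.05, 0.006, -0.01])

def predict_hourly_aod(api, vis, bad_weather, coeffs=COEFFS):
    """Predict AOD for each hourly record; discard fog/rain/thunderstorm hours."""
    predicted = coeffs[0] + coeffs[1] * api + coeffs[2] * vis
    return np.where(bad_weather, np.nan, predicted)

def fill_gaps(measured, predicted):
    """Use the measured AOD where available and the prediction elsewhere."""
    return np.where(np.isnan(measured), predicted, measured)

# Four hypothetical hourly records; NaN marks hours without photometer data.
api = np.array([40.0, 55.0, 60.0, 70.0])
vis = np.array([10.0, 8.0, 7.0, 3.0])
bad_weather = np.array([False, False, False, True])
measured = np.array([0.20, np.nan, np.nan, np.nan])

predicted = predict_hourly_aod(api, vis, bad_weather)
combined = fill_gaps(measured, predicted)
```

The last record stays empty in the combined series because it was flagged as bad weather, mirroring the roughly 5 % of data discarded in the study.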
The proposed model was independently verified using four selected sets of LIDAR data. We generated AOD predictions for these sets and compared them against the temporal plots of the aerosol backscattering coefficient signal (Fig. 9). The rectangles in Fig. 9a correspond to the window periods of the LIDAR signal (Fig. 9b). The variability in the retrieved AOD for the given window periods (Fig. 9a) corresponds well to the intensity variations in the aerosol backscattering coefficient signal (Fig. 9b). The LIDAR signals support the fidelity of our predicted AOD because low (high) intensities of the aerosol backscattering coefficient signal correspond to low (high) AOD. The high intensities at altitudes of 1-1.5 km (low cloud distributions) are marked by green ovals. Although clouds were present within the selected time windows, the retrieved AOD remained invariant.

Comparison with other linear regression models
The proposed model was compared against other AOD-predicting models in the literature.
Table 4 shows the R² values of selected AOD-predicting models, calculated using the first data subset of our model (Sect. 2). The R² values in Table 4 were compared with those of the overall dataset (Table 3). Retalis et al. (2010) suggest a simple linear regression analysis to predict AOD from the Vis data. Mahowald et al. (2007) suggest a similar linear regression model, in which the Vis data are converted to surface extinction coefficients b_ext using the Koschmieder equation Vis = K/b_ext, where K (= 3.912) is the Koschmieder constant (Koschmieder, 1924). Two other AOD-predicting models were also compared (Gao and Zha, 2010; Chen et al., 2013). In these models, linear regression analysis between AOD and PM10 was carried out to predict the surface air quality. These approaches can also be used to retrieve AOD after appropriate conversion procedures. We first converted the API data into PM10 following the guidance on the air pollutant index from DOE (1997). The obtained PM10 values were then entered into the linear regression formula to predict AOD. The linear regression yielded R² ≤ 0.6 with an RMSE of approximately 0.16 and above, a poorer fit than that of our model (R² ≤ 0.72 with RMSE = 0.13), based on the comparison of the R² values for the "overall" dataset in Table 3 against those in Table 4. This result indicates that the proposed model performs better in terms of R² and RMSE.
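A single-predictor model of the kind attributed to Mahowald et al. (2007) can be sketched as follows: visibility is first converted to a surface extinction coefficient with the Koschmieder equation Vis = K/b_ext (K = 3.912), and AOD is then regressed on b_ext. The function names and the sample data are hypothetical, and the sketch does not reproduce any published coefficients.

```python
import numpy as np

KOSCHMIEDER_K = 3.912  # Koschmieder constant, as quoted in the text

def extinction_from_visibility(vis_km):
    """Surface extinction coefficient b_ext (km^-1) from visibility (km)."""
    return KOSCHMIEDER_K / np.asarray(vis_km, dtype=float)

def simple_linear_fit(x, y):
    """Fit y = slope*x + intercept and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)

# Hypothetical visibility (km) and AOD pairs for illustration only.
vis = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 15.0])
aod = np.array([0.62, 0.41, 0.35, 0.27, 0.22, 0.20])

b_ext = extinction_from_visibility(vis)
slope, intercept, r2 = simple_linear_fit(b_ext, aod)
```

The same `simple_linear_fit` helper could be applied to PM10 instead of b_ext to mimic the other single-predictor models, which makes the R² values directly comparable with those of a multiple-regression fit on the same data.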

Conclusions
Seasonal variation in the primary aerosol types and their characteristics in Penang were analyzed from February 2012 to November 2013. The aerosol types for a specific monsoonal period were determined by applying threshold criteria to the scatter plots of aerosol optical depth (AOD) against the Angstrom exponent. The threshold criteria from Smirnov et al. (2002b, 2003), Pace et al. (2006
Previous models used simple regression analysis between AOD and meteorological parameters to predict the corresponding AOD data. In this study, multiple regression analysis was used in the proposed model. Two predictors (API and Vis) were introduced to increase the statistical reliability. To verify the robustness of multiple regression analysis relative to the simple regression approach, AOD data were also retrieved using previous simple models (Mahowald et al., 2007; Gao and Zha, 2010; Retalis et al., 2010; Chen et al., 2013). The R² and RMSE values of our model are ≤ 0.72 and 0.13, respectively. By comparison, the other relevant models obtained R² ≤ 0.60 and an RMSE of approximately 0.16 and above (see Table 4). The comparison indicates that our AOD prediction is statistically better than those of the simple models.
Our algorithm could properly predict the AOD data during non-retrieval days caused by the frequent occurrence of clouds in the equatorial region. The proposed model yielded reliable, real-time AOD data even though measured data were available only at limited time points. The predicted AOD data are useful for monitoring the short- and long-term behavior of aerosols and provide supplementary information for atmospheric correction.

Figure 4. Seven-day back-trajectory frequency plots from the HYSPLIT_4 model for the a) northeast monsoon, b) pre-monsoon, c) southwest monsoon, d) post-monsoon, and e) overall study period at Penang, which is marked with a five-pointed star.

Figure 6. a) Profiles of the aerosol backscatter coefficients (km⁻¹ sr⁻¹) recorded on 12 July 2013. No data were acquired from 12:00 PM to 2:00 PM. The brown lines represent the acquisition times of the sun photometer; b) normalized range-corrected signals at different altitudes; c) profiles of the aerosol backscatter coefficient (beta) obtained from 10 AM to 11 AM for the brown lines in a).

Figure 7. Predicted AOD_500 data plotted against the period from 2012 to 2013. Rectangles 1 and 2 correspond to the data recorded on 24-25 July and 13-14 August 2013, respectively. These data were used for comparison with those obtained from LIDAR (Fig. 9).
(Eck et al., 1999)); and the Multi-filter Rotating Shadowband Radiometer in Central Mediterranean (Pace et al., 2006). The scatter plot of AOD_500 or AOD_440 against Angstrom 440-870 was used to identify the aerosol type. The wavelength range of Angstrom 440-870 was used because of its nearness to the typical size range of aerosol based on spectral AOD (Eck et al., 1999). The relation between AOD values at 500 nm and Angstrom 440-870 is usually used for aerosol classification in scatter plot diagrams. Many studies used AOD values at 500 nm (Cachorro

Table 3). Retalis et al. (2010) suggest a simple linear regression analysis to predict AOD from the Vis data. Mahowald et al. (2007) suggest a similar linear regression model for AOD prediction, in which the Vis data were converted to surface extinction coefficients b_ext using the Koschmieder equation Vis = K/b_ext, where K (= 3.912) is the Koschmieder constant (Koschmieder

Salinas et al. (2009), and Jalal et al. (2012) determined the aerosol types. The testing results indicated that the threshold criteria by Toledano et al. (2007) were the most reliable because of the minimal occurrence of indistinguishable aerosols (referred to as mixed-type aerosols, MIXA). Over the entire study period, the biomass burning aerosols (BMA) abruptly increased during the southwest monsoon period because of active open burning in local areas and neighboring countries. During the northeast monsoon period, the optical properties (e.g., size distribution patterns) of the aerosols were unique. Two noticeable peaks were observed in the occurrence frequency of the Angstrom exponents, compared with the single peaks for the other monsoon seasons. These results were attributed to the mixing of aerosols from local sources with those from the northern part of Southeast Asia, caused by the northeast monsoon winds. Urban and industrial aerosols (UIA) and marine aerosols (MA) were the major aerosols in Penang throughout the year. Dust aerosols (DA) contributed negligibly to the emissions in Penang. The variation in aerosol types across monsoon seasons yielded distinct optical properties.

The original prototype model of Tan et al. (2014a) predicted the AOD values from the measured air pollution index (API), visibility (Vis), and relative humidity (RH) data through multiple regression analysis. In this study, the algorithm of Tan et al. (2014a) was used and slightly modified by neglecting the RH contribution. Our results suggest that the removal of the RH contribution caused no changes in the predictability of the proposed model. The modified algorithm was quantitatively and qualitatively validated. The retrieved AOD data in the proposed model were in agreement with those measured.

Table 1. Average values of model-related parameters from the database collected from 4

Table 2. Threshold values of AOD and Angstrom 440-870 for aerosol classification. Abbreviations: MA = maritime, DA = dust, UIA = urban and industrial, BMA = biomass burning, MIXA = mixed-type aerosols. MIXA represents the indistinguishable aerosol type that lies beyond the threshold ranges.

Table 4. R² values of the AOD predicted by selected linear regression models from the literature.