A robust calibration approach for PM 10 prediction from MODIS aerosol optical depth

Investigating the human health effects of atmospheric particulate matter (PM) using satellite data is gaining attention because of the wide spatial coverage and temporal advantages of satellite observations. Such epidemiological studies are, however, susceptible to bias errors and have produced poor predictive output in some locations. Current methods calibrate aerosol optical depth (AOD) retrieved from MODIS to predict PM. The most recent satellite-based AOD calibration uses a mixed effects model to predict location-specific PM on a daily basis. This daily AOD calibration falls short in areas with a high probability of persistent cloud cover throughout the year, such as the humid tropical region along the equatorial belt. Pixels contaminated by clouds cause radiometric errors in the MODIS AOD and thus poor predictive power for air quality. In contrast, a periodic assessment is more practical and robust, especially in minimizing this cloud-related contamination. In this paper, a simple yet robust calibration approach based on a monthly AOD period is presented. We adopted a statistical fitting method with an adjustment technique to improve the predictive power of MODIS AOD. The adjustment was based on the long-term (2001–2006) characteristics of the PM 10-AOD residual error. Furthermore, we incorporated the ground PM measurements into the model as a weighting to reduce the bias of the MODIS-derived AOD value. Results indicate that this robust approach with monthly AOD calibration improved the average accuracy of PM 10 retrieval from MODIS data by 50 % compared to widely used calibration methods based on linear regression models, and that it enables further analysis of the spatial patterns of periodic PM exposure.


Introduction
The interest in using earth observation satellites to measure atmospheric aerosols has progressed from climate studies to the more important topic of human health. This is due to a satellite's unique ability to provide a synoptic view over large areas in a uniform, repetitive and quantitative way. Atmospheric aerosols originate from both natural and anthropogenic emission sources. The latter are considered to have major implications for human health, as they are strongly related to mortality and morbidity, as shown by many researchers around the world (Wan Mahiyuddin et al., 2013; Sahani et al., 2011; Bell et al., 2007; Dominici et al., 2006; Franklin et al., 2007; Gent et al., 2003, 2009; Schwartz et al., 1996; Slama et al., 2007; Hu, 2009). Most recent studies have highlighted PM 2.5 as the main contributor to health effects. However, PM 2.5 is a fraction of PM 10 and can be estimated from it with a known constant (Marcazzan et al., 2001). In many developing countries PM 10 is still measured instead of PM 2.5 because of limited resources. For example, in Malaysia, PM 10 is measured and used in the Air Pollution Index (API) to assess regional air quality.
Satellite data can be used as a surrogate to monitor regional air quality because ground monitoring stations are limited and many regions are left unmonitored (Schaap et al., 2009; Engel-Cox et al., 2004b). The most widely used method of predicting PM concentration from satellite data is empirical analysis, in which in situ PM measurements are linearly regressed against the corresponding satellite AOD. To improve the predictive power of the linear regression models, related parameters such as local meteorological and land use information have also been used as inputs for PM prediction (Liu et al., 2009). However, these models generally predict < 60 % of the PM variability (Hoff and Christopher, 2009). The latest model, developed by Lee et al. (2011), uses a mixed effects model to establish a day-specific AOD-PM 2.5 relationship and predicts the site mean PM 2.5 concentrations with R² = 0.62. Furthermore, Lee et al. (2011) hypothesized that the relationship between PM 2.5 and AOD varies daily because of time-varying parameters that influence the AOD-PM relationship, such as PM vertical and diurnal concentration profiles, PM optical properties, and others. Therefore, a linear AOD-PM relationship is of limited use in long-term daily monitoring (Yap et al., 2011). In fact, the assumption of Lee et al. (2011) that the time-varying parameters vary minimally in space on a given day over a specific spatial scale is rarely valid for humid tropical weather over the equatorial regions, where the probability of cloud cover is high and conditions also depend on the surrounding maritime environment. Thus it is more practical and efficient to calibrate the satellite data on a monthly basis.
X. Q. Yap and M. Hashim: Robust calibration for PM 10 prediction from MODIS
The monthly calibrated satellite data are useful in improving the air pollution indicators of Environmental Performance Index (EPI) reporting in this region. The EPI is a system used to evaluate countries based on 22 performance indicators that focus on environmental issues for which governments can be held accountable (Emerson et al., 2012). Atmospheric PM derived from monthly average satellite data is one of the performance indicators used in the EPI evaluation for environmental health. Without robust calibration, the datasets may contain errors resulting from systematic error, local climatic effects such as the monsoon, and site-specific error. These lead to a poor representation of the PM concentration and of the EPI derived from satellite measurements, with the consequence of misinterpretation by policy makers around the world.
In this paper, a robust calibration approach is introduced by incorporating a simple adjustment technique into a mixed effects model that is developed to predict PM concentrations using MODIS AOD monthly average datasets. The MODIS AOD is calibrated by minimizing the inherent systematic and random errors (i.e. sensor and site-specific errors) in order to improve the AOD-PM relationship. The adjustment in the mixed effects model was based on a long-term (2001–2006) analysis of the residual bias of MODIS AOD. In addition, the mixed effects model was adjusted for site errors, which account for time-varying parameters on a monthly basis. To our knowledge from the literature, no similar robust calibration approach for satellite AOD has been reported to date. The results of this study can provide an improved AOD-PM prediction for EPI and PM human health exposure studies as well as for the investigation of PM spatial patterns.

Methodology
Our study is focused on Peninsular Malaysia. The air stations across Peninsular Malaysia use the Met One BAM 1020 instrument to collect in situ PM 10 concentrations on an hourly basis, which are then averaged into daily means. To calibrate the MODIS AOD data for this region, PM 10 was sampled at 34 air stations, as shown in Fig. 1, for a period of six years (2001 to 2006). The corresponding monthly PM 10 values for each of the 34 air stations were averaged from the daily PM 10 concentration measurements. The MODIS AOD data were then calibrated using the in situ PM 10 measurements. Here, the calibration was performed independently for each monitoring site using a multiple regression method to identify the random error to be included in the mixed effects model. This accounts for the spatial variability of the random errors on a monthly basis. After that, a single monthly AOD-PM 10 relationship was established using all the parameters from the 31 monitoring stations. The predicted PM 10 concentration from this method was validated independently at three sites, namely in the northern, central and southern parts of Peninsular Malaysia.
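The aggregation of daily PM 10 values into monthly station means can be sketched as follows. This is a minimal illustration rather than the authors' processing code; the record layout `(site, date, pm10)` and the function name `monthly_means` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily_records):
    """Average daily PM10 values per site and calendar month.

    daily_records: iterable of (site, 'YYYY-MM-DD', pm10) tuples, where
    pm10 is None for days without a valid measurement (hypothetical layout).
    Returns {(site, 'YYYY-MM'): monthly mean}.
    """
    buckets = defaultdict(list)
    for site, date, pm10 in daily_records:
        if pm10 is None:
            continue  # skip days with no valid measurement
        buckets[(site, date[:7])].append(pm10)
    return {key: mean(values) for key, values in buckets.items()}
```

Months with no valid daily value simply do not appear in the result, which mirrors the idea of discarding monthly samples that cannot be paired with an AOD value.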

MODIS derived AOD
MODIS is a space sensor aboard NASA's (National Aeronautics and Space Administration) Terra Earth Observing System (EOS) satellite, launched in December 1999. Operating at an altitude of approximately 700 km, this polar-orbiting satellite is able to provide aerosol data on a daily basis. The Terra satellite crosses the equator at about 10:30 a.m. local time (descending orbit), and MODIS has a scanning swath of 2330 km (cross-track) by 10 km (along-track at nadir). MODIS has a total of 36 wavelength channels suited to a wide range of applications. AOD was retrieved using the second-generation operational algorithm (Collection 5) developed by Levy et al. (2009). In general, seven of the 36 wavelength channels (between 0.47 and 2.12 µm) are used during the AOD retrieval.
According to the MODIS AOD retrieval algorithm (Collection 5) by Levy et al. (2009), three channels at 0.47, 0.66, and 2.12 µm are primarily employed for land aerosol retrievals, while the others are used to screen out cloud, snow cover, and ice cover. The AOD reported by MODIS at a wavelength of 0.55 µm is the result of a simultaneous inversion from these three channels. The accuracy of MODIS AOD data over land is expected to be ΔAOD = ±0.05 ± 0.15 AOD. More details about the retrieval of MODIS satellite aerosol data are reported in Remer et al. (2005) and Levy et al. (2007, 2009, 2010). MODIS AOD values range from −0.05 to 5.0. In this study, the full range of MODIS AOD values was taken into consideration to avoid any bias that might occur during the calibration.
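As a quick illustration of the expected-error envelope quoted above, the sketch below checks whether a retrieved AOD lies within ±(0.05 + 0.15 AOD) of a reference value. The function names, and the idea of comparing against a reference AOD such as a ground sun photometer reading, are assumptions made for the example.

```python
def expected_error(aod_ref):
    """Half-width of the over-land expected-error envelope: 0.05 + 0.15 * AOD."""
    return 0.05 + 0.15 * abs(aod_ref)


def within_envelope(aod_modis, aod_ref):
    """True if the MODIS retrieval falls inside the envelope around aod_ref."""
    return abs(aod_modis - aod_ref) <= expected_error(aod_ref)
```

For a reference AOD of 0.5 the envelope half-width is 0.125, so a retrieval of 0.62 would count as within expectation while 0.70 would not.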
To conduct this study, level 2 MODIS Terra AOD product (MOD04) data were collected for a period of six years (2001 to 2006). However, aerosol data are often missing because of clouds, high surface reflectance (e.g. snow and ice cover), and retrieval errors. Under Malaysian climatic conditions, cloud cover is a serious issue that causes MODIS AOD retrieval to fail over most of the region.
To overcome this problem, an averaging algorithm with a 5 × 5 window was used (Yap et al., 2011). The algorithm assumes that the neighbouring 5 pixels with no AOD retrieval have the same value as the reference pixel with a valid retrieval. This means that if there is no AOD retrieval in a particular area, the nearest retrieval within 5 pixels (50 km) is used. In this way, the number of pixels without AOD information due to cloud cover can be reduced. Where valid AOD retrievals are continuous, a normal averaging scheme is applied that ignores pixels with no AOD retrieval. Averaging multiple pixels is expected to reduce the influence of random errors associated with the AOD retrieval. Furthermore, 5 × 5 window averaging has been widely used in MODIS validation work, where it agrees with the average speed of aerosol air mass transport in the mid-troposphere over the Atlantic (Ichoku et al., 2002; Remer et al., 2005). However, as the average wind speed near the earth's surface is much lower than in the mid-troposphere, a 5 × 5 window is considered appropriate. On the other hand, we found that a 3 × 3 window leaves many voids in the imagery, resulting in poor overall retrieval of MODIS AOD over Peninsular Malaysia.
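The window averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of Yap et al. (2011): the AOD grid is a list of rows, pixels without a retrieval are marked as NaN, and the names `window_mean` and `fill_gaps` are hypothetical.

```python
import math


def window_mean(grid, row, col, half=2):
    """Mean of the valid (non-NaN) AOD values in the window centred on (row, col).

    half=2 gives a 5 x 5 window; the window is clipped at the grid edges.
    Returns NaN when the window contains no valid retrieval.
    """
    values = []
    for r in range(max(0, row - half), min(len(grid), row + half + 1)):
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1)):
            value = grid[r][c]
            if not math.isnan(value):
                values.append(value)
    return sum(values) / len(values) if values else math.nan


def fill_gaps(grid, half=2):
    """Return a copy of grid in which pixels without a retrieval take the window mean."""
    return [
        [value if not math.isnan(value) else window_mean(grid, r, c, half)
         for c, value in enumerate(line)]
        for r, line in enumerate(grid)
    ]
```

Calling `fill_gaps(grid, half=1)` gives the 3 × 3 variant, which leaves more pixels unfilled when cloud gaps are wide.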

Mixed effects model
Recent work by Lee et al. (2011) states that the AOD-PM relationship is influenced by time-varying parameters such as relative humidity, PM vertical and diurnal concentration profiles, and PM optical properties. Lee et al. (2011) therefore developed a mixed effects model that allows for day-to-day variability, under the hypothesis of little spatial variability over the study region. In this study, monthly observations are used instead.
The mixed effects model proposed in this study therefore uses monthly input parameters. Here, we hypothesize that the time-varying parameters exhibit a certain pattern in Peninsular Malaysia as a result of the peninsula's climate. Therefore, the monthly site-specific spatial variability error, which is affected by the time-varying parameters, is statistically estimated from the AOD-PM 10 relationship and plotted to characterize its overall pattern across Peninsular Malaysia. This site-specific spatial variability error is then included in the mixed effects model to predict the PM 10 concentrations of the study region.
The mixed effects model used to predict PM 10 concentration is summarized by Eq. (1), where E(Y)_mn is the estimated PM 10 concentration in month m at site n; AOD_mn is the MODIS AOD value in the grid cell corresponding to month m at site n; α_fix and β_fix are the fixed intercept and slope; ε_mn is the random error for month m and site n; and ε_fix is the fixed error, or adjustment, derived from long-term observations of MODIS AOD. Here, the fixed error represents the average effect of AOD on PM 10 concentrations, as shown in Eq. (2). The α and β in Eq. (2) are the intercept and slope of the relationship between the long-term observations of MODIS AOD and the residual error of the PM 10 concentrations. This equation is obtained from a long-term observation of the residual effects of MODIS AOD on the in situ PM 10 concentration measurements. The relationship between the fixed error, ε_fix, and AOD_mn is statistically significant, with R² = 0.653, intercept α = 0.214 (SE = 0.00379, p < 0.0001) and slope β = 0.653 (SE = 0.0105, p < 0.0001). These observations show a linear pattern of error in the AOD data, where the error is directly proportional to the AOD. Therefore, we added the term ε_fix, derived from this observation, to minimize this error.
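Based on the symbol definitions above, Eqs. (1) and (2) can be sketched in the following additive form. The arrangement of terms in Eq. (1) is an assumption inferred from those definitions and from the statement that ε fix is added as an adjustment, so it should be checked against the published equations; the coefficients in Eq. (2) are the values reported in the text.

```latex
\begin{align}
E(Y)_{mn} &= \alpha_{\mathrm{fix}} + \beta_{\mathrm{fix}}\,\mathrm{AOD}_{mn}
             + \varepsilon_{mn} + \varepsilon_{\mathrm{fix}} \tag{1} \\
\varepsilon_{\mathrm{fix}} &= \alpha + \beta\,\mathrm{AOD}_{mn},
             \qquad \alpha = 0.214,\quad \beta = 0.653 \tag{2}
\end{align}
```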
On the other hand, the random error represents the monthly site bias in the AOD-PM 10 relationship. The site bias may arise because an AOD value in a 10 × 10 km grid cell is the average optical depth in that grid cell, while the PM 10 concentrations measured at a given site may not be representative of the whole grid cell (Lee et al., 2011). In short, it represents the bias due to a site's spatial location and meteorological conditions in relation to its surroundings. The site bias is therefore different for every location. To control for this site bias, we added a site term as a random error to the statistical model (Lee et al., 2011). The bias value was computed from ground measurements and interpolated to represent the approximate ground conditions. From the monthly observations, the spatial pattern of the site bias exhibits three general patterns driven by the meteorological conditions, i.e. the monsoon effect. We therefore averaged the spatial distribution of the site bias (random error) according to the monsoon period.
Atmos. Chem. Phys., 13, 3517–3526, 2013, www.atmos-chem-phys.net/13/3517/2013/
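Averaging the monthly site bias by monsoon period can be sketched as below. The month-to-season calendar is an assumption: the paper states only that the southwest monsoon runs from May to September, so the northeast and intermonsoon assignments, along with the names `SEASON_BY_MONTH` and `seasonal_site_bias`, are illustrative.

```python
from collections import defaultdict

# Assumed monsoon calendar. Only the May-September southwest monsoon is
# stated in the text; the other month assignments are illustrative.
SEASON_BY_MONTH = {
    1: "northeast", 2: "northeast", 3: "northeast", 4: "inter",
    5: "southwest", 6: "southwest", 7: "southwest", 8: "southwest",
    9: "southwest", 10: "inter", 11: "northeast", 12: "northeast",
}


def seasonal_site_bias(records):
    """Average the site bias per site and monsoon period.

    records: iterable of (site, month 1-12, bias) tuples (hypothetical layout).
    Returns {(site, season): mean bias}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, month, bias in records:
        key = (site, SEASON_BY_MONTH[month])
        sums[key] += bias
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```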
Once this parameter has been entered into the mixed effects model, the PM 10 concentration was estimated for the whole study area using the MODIS AOD.
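A minimal sketch of this prediction step is given below, assuming the model is additive in the terms defined in the text (fixed intercept and slope, site bias, and the fixed error as a linear function of AOD). The fixed-error coefficients 0.214 and 0.653 are those reported in the text; `alpha_fix`, `beta_fix` and `site_bias` are hypothetical inputs, since their fitted values are not given here.

```python
ALPHA = 0.214  # intercept of the fixed-error relation reported in the text
BETA = 0.653   # slope of the fixed-error relation reported in the text


def fixed_error(aod):
    """Fixed error (adjustment) as a linear function of the monthly AOD."""
    return ALPHA + BETA * aod


def predict_pm10(aod, alpha_fix, beta_fix, site_bias):
    """Assumed additive form: intercept + slope * AOD + site bias + fixed error."""
    return alpha_fix + beta_fix * aod + site_bias + fixed_error(aod)
```

With placeholder values such as `predict_pm10(0.5, alpha_fix=30.0, beta_fix=40.0, site_bias=-2.0)`, the site bias lowers the estimate where MODIS AOD tends to be overestimated and raises it where AOD is underestimated.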

Model validation
This model is analyzed throughout Peninsular Malaysia by using a cross-validation (CV) method to examine whether the mixed effects model is applicable to our study region.
A total of 31 sampling sites were used to establish the model, and three independent sampling sites were used to validate it. From the 31 sampling sites, a mixed effects model was developed to predict PM 10 in Peninsular Malaysia. Pearson correlation coefficients were used to assess the relationship between the predicted and measured PM 10 concentrations for each site. A high correlation indicates that the MODIS AOD data can be used in human health exposure investigations and can be applied in establishing the EPI for Malaysia. In addition, the root mean square error (RMSE) was used to quantitatively assess the accuracy of the final output. This validation is important for establishing the reliability and spatial accuracy of the predicted PM 10 concentration.
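The two validation metrics can be computed as in the sketch below. The function names are chosen for the example; the formulas are the standard definitions of the Pearson correlation coefficient and the root mean square error.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

The correlation captures how well the predictions track the measurements, while the RMSE reports the typical size of the difference in the units of the data (µg m⁻³ for PM 10).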

PM 10 in Malaysia
In Malaysia, severe cases of air pollution are generally associated with forest fires in neighbouring countries and with the monsoon wind (Hashim et al., 2004). These events usually occur during the southwest monsoon season between May and September, which brings haze from the Sumatra region to the western side of Peninsular Malaysia. Local sources of air pollution include vehicle emissions, power generation, industrial emissions, open burning and forest fires (Afroz et al., 2003; Azmi et al., 2010; Dominick et al., 2012). Furthermore, west Peninsular Malaysia, where the major cities are located, usually has a higher PM 10 concentration than other regions because of anthropogenic activities (Azmi et al., 2010).

Descriptive statistics
The mean concentrations of PM 10 at our study sites are summarized in Table 1. Several sites across Peninsular Malaysia exhibit high mean (SE) PM 10 concentrations, for example Perai, Melaka, Kuala Selangor, Klang, KL, Shah Alam and Manjung. These sites are mainly industrialized regions that are affected by heavy traffic and seasonal haze. Surprisingly, Bukit Rambai, Melaka, has the highest mean (SE) PM 10 concentration at 74.9 (1.81) µg m⁻³, followed by Klang at 72.3 (3.04) µg m⁻³. The exceptionally high PM 10 concentration in Bukit Rambai is mainly a result of local anthropogenic activities, as it is situated in an industrial district, with a secondary impact from seasonal haze (Mahmud and Iza, 2010). The number of monthly sample points (n) per site across Peninsular Malaysia ranged from 35 to 72, with an average of 61. A total of 433 (17.69 %) monthly sample points were discarded because no corresponding MODIS AOD sample was available.

PM 10 prediction
In the mixed effects model, the random errors for all 72 months (2001 to 2006) were generated and are summarized in Table 2. The random error is attributed to site and time-varying errors. It has a seasonal pattern across Peninsular Malaysia, as shown in Fig. 2. From Table 2 and Fig. 2, densely developed sites have a high negative random error. This shows that these regions tend to have an overestimated AOD value. Therefore, it is necessary to include the random error in the mixed effects model to perform the adjustment. The fixed error or adjustment effect, ε_fix, represents the monthly effect of MODIS AOD on PM 10 over the entire study period. It is derived from Eq. (2).
Using the mixed effects model approach, the seasonal pattern of the random error clearly shows the effects of the monsoon wind on our study region. The negative (red) regions denote overestimated MODIS AOD values that need to be reduced. Similarly, the positive (blue) regions indicate an underestimation of the MODIS AOD, so that an upward adjustment is needed. The overestimation of MODIS AOD may be due to clouds left unscreened by the MODIS cloud screening algorithm (Lee et al., 2011). This was also demonstrated in the work of Holben et al. (1998), where level 2 (AERONET) data were compared with MODIS AOD and unscreened clouds caused a positive bias in the predicted particulate matter concentration. Furthermore, bright surface conditions may also increase the error as a result of a poorer visible to infrared (2.12 µm) band relationship (Levy et al., 2009). In addition, the overestimation of the AOD may be related to the natural multiple scattering effect of the atmospheric particulate matter (pollutant). Thus most of the well-developed regions (with bright surfaces) tend to be overestimated. In Fig. 2, the effect of the monsoon wind is clear, as it shifts the overestimated region further inland towards east Peninsular Malaysia. In contrast, underestimated MODIS AOD values were common only during the intermonsoon season and in some rural areas. The intermonsoon is the interval during which the monsoon wind changes direction. During this period, most of the atmospheric particulate matter recorded originated from local anthropogenic activities because of the stagnant wind conditions. Therefore, the underestimated MODIS AOD value in this period could be due to lower pollution levels in that particular area compared to its surroundings, which offsets the observed MODIS AOD value below the apparent value.
The predicted PM 10 concentrations from the mixed effects model are also prone to errors arising from the difference between the AOD retrieval and the in situ measurements of the PM 10 concentrations. This is because the in situ measurements were point measurements, whilst the AOD was based on 10 × 10 km grid cells. However, this error was not taken into account because the in situ measurements were 24 h averages. If the PM 10 concentration surrounding a particular station (within 10 × 10 km) is representative of the 10 × 10 km grid cell, it would most probably have been measured by the monitoring station within 24 h. Thus the 24 h average in situ PM 10 concentration, taken to represent the 10 × 10 km grid cell, would most probably resemble the PM 10 concentration predicted from MODIS AOD. Furthermore, the comparison of a 10 × 10 km grid cell with a point measurement is common practice among researchers such as Chu et al. (2003) and Koelemeijer et al. (2006). However, for a monitoring station close to a pollution source, such as Bukit Rambai, Melaka, the random error appears higher because the point measurement does not represent the 10 × 10 km grid cell. Therefore, it is important to avoid sampling in close proximity to a pollution source when the aim is to compare the measurement with a large grid cell.

Accuracy assessment
In order to examine the accuracy of the predicted monthly PM 10 concentrations, the monthly in situ PM 10 concentrations and the monthly predicted PM 10 concentrations were regressed as shown in Fig. 3. The mixed effects model explained 77 % of the variability in the monthly measured PM 10 concentration over a period of six years (2001 to 2006). From Fig. 3, the PM 10 concentration predicted using the mixed effects model approximates the ground condition [slope = 1; intercept = 2 × 10⁻⁵; n = 1895; p < 0.0001]. Furthermore, validation against three independent ground stations (Johor Bahru, Shah Alam and Pengkalan Chepa, which are situated in the southern, central and northern parts of Peninsular Malaysia, respectively) also shows a promising result when the predicted PM 10 concentrations are regressed against the measured PM 10 concentrations [slope = 1.085; intercept = −5.515; n = 181; p < 0.0001] (Fig. 4). The slope presented in Fig. 4 shows that the predicted PM 10 concentration agrees closely with the in situ PM 10 concentration measurements, and the intercept represents the noise in the predicted PM 10 concentration dataset, which is considerably low.
Furthermore, the ability of the mixed effects model to predict the PM 10 concentration was compared to that of a linear regression model using the Pearson correlation, R, and the root mean square error, RMSE (Table 3). The linear regression model has been widely used by many researchers (Chu et al., 2003; Engel-Cox et al., 2004a, b, 2005, 2006) to establish the AOD-PM 10 or AOD-PM 2.5 relationship, and is therefore regarded as a common and valid methodology for predicting particulate matter of different sizes (10 µm and 2.5 µm in diameter). Since R does not quantitatively reflect the difference between the measured and predicted PM 10 concentrations, RMSE is necessary to better assess both models. In Table 3, the mixed effects model significantly improves the accuracy of the predicted PM 10 compared to the linear regression model. Overall, the long-term Pearson correlation, R, of the predicted PM 10 concentration improved from 0.60 to 0.88 using the mixed effects model. Similarly, the RMSE of the PM 10 concentration predicted by the mixed effects model improved on that of the linear regression model by an average of ±6.18 µg m⁻³ annually. In other words, the accuracy of the mixed effects model was superior, with an improvement of approximately 50 % over the conventional linear regression model. This was further confirmed by an ANOVA test (p value ≈ 1), which suggests that the PM 10 concentration predicted using our method is in high agreement with the in situ measurements. From this performance test, the mixed effects model appears to be a better solution for producing a reliable concentration map for both environmental and health effect studies.
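The slope and intercept quoted for the regressions in Figs. 3 and 4 can be obtained with a simple least-squares fit, sketched below. It is an assumption that ordinary least squares was the fitting method, and the function name `ols_fit` is chosen for the example.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept). A slope near 1 and an intercept near 0
    indicate close agreement between the two series.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```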

Conclusions
To date, there has been an increase in the adoption of satellite AOD data in air pollution, health effects and environmental studies. Awareness of the potential of remote sensing technologies to enhance ground-level particulate matter monitoring networks has further encouraged many government and private agencies to look into their practicality. In Malaysia, the use of satellite-derived parameters as performance indicators in the EPI is one of the highlights that bring these technologies forward. However, the application of satellite data has always been received with skepticism in this region because of cloud cover and low predictive power. The mixed effects model proposed in this paper shows that this calibration method can be reliable in producing a better PM 10 concentration map for this region. When the site-specific random error and the fixed error are taken into account, the accuracy of the satellite data improves significantly. We anticipate that the outcome of this method will be increasingly used in health effects, pollution and environment-related studies. Future satellite technologies are expected to improve spatial and temporal resolutions, resulting in even more accurate retrievals. As the satellite data are readily available, atmospheric pollution such as PM 10 can be monitored and predicted in a cost-effective way. Another focus of our future research will be to study atmospheric particulate matter and other atmospheric trace gases that are harmful to human health.

Fig. 3. Assessment of the monthly in situ PM 10 measurement and monthly predicted PM 10 by a mixed effects model.

Fig. 4. Assessment of the monthly in situ PM 10 measurement and monthly predicted PM 10 from three independent monitoring sites.
SE: standard error. n: number of monthly sample points.

Table 3. Long-term comparison of the linear regression model and the mixed effects model in terms of PM 10-AOD correlation and RMSE (µg m⁻³) of annual MODIS-estimated PM 10 concentration.