Assessing positive matrix factorization model fit: a new method to estimate uncertainty and bias in factor contributions at the measurement time scale

Hemann, J. G.; Brinkman, G. L.; Dutton, S. J.; Hannigan, M. P.; Milford, J. B.; Miller, S. L.

doi:https://doi.org/10.5194/acp-9-497-2009

Articles | Volume 9, issue 2

© Author(s) 2009. This work is distributed under
the Creative Commons Attribution 3.0 License.

22 Jan 2009

Abstract. A Positive Matrix Factorization receptor model for aerosol pollution source apportionment was fit to a synthetic dataset simulating one year of daily measurements of ambient PM_2.5 concentrations, comprising 39 chemical species from nine pollutant sources. A novel method was developed to estimate the uncertainty and bias of the model fit, in terms of factor contributions, at the daily time scale. A circular block bootstrap is used to create replicate datasets, and the same receptor model is then fit to each replicate. Neural networks are trained to classify factors by their chemical profiles, rather than by correlating contribution time series, and this classification is used to align factor orderings across the results from the replicate datasets. Factor contribution uncertainty is assessed from the distribution of results associated with each factor. Bias is assessed by comparing the modeled factors with the input factors used to create the synthetic data. The results indicate that variability in factor contribution estimates does not necessarily encompass model error: contribution estimates can show little variability across results yet still be strongly biased. These findings are likely dependent on characteristics of the data.
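The circular block bootstrap named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `circular_block_bootstrap`, the block length of 7 days, and the random stand-in data are all assumptions made for illustration. Only the dataset shape (365 daily samples by 39 species) follows the abstract.

```python
import numpy as np

def circular_block_bootstrap(data, block_length, rng):
    """Resample the rows of `data` in contiguous blocks that wrap around the end.

    Hypothetical helper for illustration; returns the replicate and the row
    indices used to build it.
    """
    n = data.shape[0]
    n_blocks = -(-n // block_length)  # ceiling division: enough blocks to cover n rows
    # Each block starts at a uniformly random row of the original series.
    starts = rng.integers(0, n, size=n_blocks)
    # Consecutive rows within a block; the modulo makes the series circular.
    idx = (starts[:, None] + np.arange(block_length)[None, :]) % n
    idx = idx.ravel()[:n]  # trim so the replicate has the original length
    return data[idx], idx

rng = np.random.default_rng(0)
# Stand-in data (assumption): 365 daily samples of 39 species concentrations.
X = rng.random((365, 39))
replicate, idx = circular_block_bootstrap(X, block_length=7, rng=rng)
```

Resampling whole blocks rather than single days keeps short-range serial correlation intact within each block, and wrapping around the end of the series gives every day the same chance of being selected. In a workflow like the one the abstract describes, the receptor model would then be refit to each such replicate.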

Received: 03 Jan 2008 – Discussion started: 14 Feb 2008 – Published: 22 Jan 2009