De praeceptis ferendis: good practice in multi-model ensembles
Abstract. Ensembles of air quality models have been shown, both formally and empirically, to outperform single models in many cases. Evidence suggests that ensemble error is reduced when the members are both diverse and accurate. Diversity and accuracy are hence two factors that must be considered when designing ensembles if they are to provide better predictions. Theoretical results such as the bias–variance–covariance decomposition and the accuracy–diversity decomposition are linked, and both support the importance of building an ensemble that incorporates these two elements. Hence, the common practice of unconditionally averaging models without prior manipulation limits the advantages of ensemble averaging. We demonstrate the importance of ensemble accuracy and diversity through an inter-comparison of ensemble products for which a sound mathematical framework exists, and we provide specific recommendations for model selection and weighting in multi-model ensembles. Following proper training, the more sophisticated ensemble averaging techniques showed higher skill across all distribution bins than the simple, unconditional ensemble average.
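As an illustration of the accuracy–diversity decomposition mentioned above, the following minimal sketch checks numerically that, for an equal-weight ensemble mean under squared error, the ensemble error equals the average member error minus the average spread of the members around the ensemble mean. The data are synthetic, and the function name, member biases, and noise levels are hypothetical choices made for this example only; they are not taken from the study.

```python
import random


def accuracy_diversity_terms(member_preds, obs):
    """Return (ensemble_mse, mean_member_mse, mean_diversity).

    member_preds: list of per-member prediction lists, all as long as obs.
    Assumes an equal-weight ensemble mean and squared-error loss.
    """
    m = len(member_preds)
    n = len(obs)
    ens_mse = 0.0
    member_mse = 0.0
    diversity = 0.0
    for t in range(n):
        values = [member_preds[i][t] for i in range(m)]
        ens = sum(values) / m  # equal-weight ensemble mean at time t
        ens_mse += (ens - obs[t]) ** 2
        # accuracy term: average squared error of the individual members
        member_mse += sum((v - obs[t]) ** 2 for v in values) / m
        # diversity term: average squared spread around the ensemble mean
        diversity += sum((v - ens) ** 2 for v in values) / m
    return ens_mse / n, member_mse / n, diversity / n


# Synthetic "observations" and four hypothetical members with different biases.
rng = random.Random(0)
obs = [rng.gauss(40.0, 10.0) for _ in range(200)]
members = [
    [o + bias + rng.gauss(0.0, 5.0) for o in obs]
    for bias in (-4.0, 0.0, 3.0, 6.0)
]

ens_mse, member_mse, diversity = accuracy_diversity_terms(members, obs)
print("ensemble MSE:          ", ens_mse)
print("mean member MSE:       ", member_mse)
print("mean diversity:        ", diversity)
print("member MSE - diversity:", member_mse - diversity)
```

The last two printed values should agree with the first up to rounding. Because the diversity term is non-negative, the equal-weight ensemble mean can never be worse than the average member under squared error, but it can still be worse than the best member, which is one reason member selection and weighting matter.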