Computing Monthly Particulates (PM2.5) Predictions using Machine Learning (prebeta)
DISCLAIMER: The results reported here are preliminary and subject to modification. We disclaim all warranties of any kind, express or implied, including, without limitation, the warranties of merchantability, fitness for a particular purpose and non-infringement.
In the first version of the PM2.5 maps at 1 km spatial resolution, we used only aerosol optical depth (AOD) and meteorological data to predict PM2.5 over Europe. We integrated spatiotemporal information in a tree-based ensemble learning approach called extra trees (ET). ET is a meta-estimator that fits a number of randomized decision trees on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and limit over-fitting. We generated daily maps for 2018 and aggregated them into monthly maps. A simple gap-filling method based on Inverse Distance Weighting (IDW) was used to fill small gaps in the monthly maps: each missing pixel is assigned the average of the valid pixels in the 7 × 7 window centered on it, provided that at least 8 valid pixels are available in that window.
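The gap-filling rule above can be sketched as follows. This is a minimal illustration, not the production code: the text calls the method IDW but also describes it as an average over the 7 × 7 window, so the hypothetical `power` parameter covers both readings (`power=0` gives a plain window mean, `power>0` an inverse-distance weighted mean). Treating missing pixels as `None`, measuring distance in pixel units, and using only original (not newly filled) pixels as donors are all assumptions.

```python
import math
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_gaps(grid: Grid, half_window: int = 3, min_valid: int = 8,
              power: float = 1.0) -> Grid:
    """Fill missing (None) pixels from a (2*half_window+1)^2 window.

    half_window=3 gives the 7 x 7 window; a pixel is filled only if at
    least `min_valid` valid neighbours exist. Assumption: donors are read
    from the original grid, so filled pixels never feed other fills.
    """
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]  # copy; the input grid is not modified
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None:
                continue  # only missing pixels are filled
            weights: List[float] = []
            values: List[float] = []
            for dr in range(-half_window, half_window + 1):
                for dc in range(-half_window, half_window + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        v = grid[rr][cc]
                        if v is not None:
                            # Distance in pixel units; never 0 because the centre is missing.
                            weights.append(1.0 / math.hypot(dr, dc) ** power)
                            values.append(v)
            if len(values) >= min_valid:
                filled[r][c] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return filled
```

For example, `fill_gaps([[1.0, 2.0, 3.0], [4.0, None, 6.0], [7.0, 8.0, 9.0]], power=0)` fills the centre with the mean of its 8 neighbours, 5.0.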
Results of the preliminary model:
Mean Absolute Error (MAE) = 4.220 μg/m³.
Root Mean Squared Error (RMSE) = 6.463 μg/m³.
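For reference, the two error metrics reported above can be computed as shown below. This sketch only illustrates the formulas; the source does not state which validation scheme (for example hold-out or cross-validation) produced the reported values.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    """Both inputs must be non-empty and paired one-to-one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: mean of |observed - predicted|."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of the mean squared difference."""
    _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

RMSE is always at least as large as MAE; because it squares the errors, a noticeably larger RMSE (6.463 vs 4.220 μg/m³ here) is consistent with some predictions having comparatively large errors.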
All data used in this study are publicly available and can be accessed via the links provided in the following table. PM2.5 ground observations from 923 stations across Europe were downloaded from OpenAQ and used as the dependent variable. The independent variables were the AOD green band (0.55 μm) of the daily MODIS MAIAC product from both the Aqua and Terra satellites and meteorological data from the ERA-Interim atmospheric reanalysis. Using the AOD quality assurance (AOD_QA) layer, only best-quality retrievals were kept, by combining two filters (QA.CloudMask = Clear and QA.AdjacencyMask = Clear).
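The two QA filters can be expressed as bit tests on the AOD_QA value. The bit layout used here is an assumption based on our reading of the MCD19A2 user guide (Cloud Mask in bits 0-2 with 001 = Clear, Adjacency Mask in bits 5-7 with 000 = Clear) and should be verified against the official product documentation before use.

```python
# ASSUMED MCD19A2 AOD_QA layout (verify against the MAIAC user guide):
#   bits 0-2: cloud mask      (0b001 = Clear)
#   bits 5-7: adjacency mask  (0b000 = Clear)
CLOUD_MASK_CLEAR = 0b001
ADJACENCY_MASK_CLEAR = 0b000


def cloud_mask(qa: int) -> int:
    """Extract bits 0-2 of the QA word."""
    return qa & 0b111


def adjacency_mask(qa: int) -> int:
    """Extract bits 5-7 of the QA word."""
    return (qa >> 5) & 0b111


def is_best_quality(qa: int) -> bool:
    """True when both QA.CloudMask and QA.AdjacencyMask are Clear."""
    return (cloud_mask(qa) == CLOUD_MASK_CLEAR
            and adjacency_mask(qa) == ADJACENCY_MASK_CLEAR)
```

Other QA fields (for example the land/water/snow bits) are ignored by this filter, matching the two conditions stated in the text.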
All input data were processed and uniformly aggregated to a 1-km grid covering the study area. PM2.5 and auxiliary observations recorded between 10 am and 2 pm local time were averaged to match the MODIS satellite overpass times. After removing unrealistic PM2.5 values, the final dataset contained 69,077 PM2.5-AOD pairs with their associated independent variables.
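The overpass-window averaging for station data can be sketched as below. Assumptions not stated in the source: timestamps are already in local time, the window 10:00 to 14:00 is inclusive at both ends, and "unrealistic" values are approximated by dropping NaN and negative readings plus an optional, hypothetical `max_plausible` cap, since the actual screening criteria are not given.

```python
import math
from collections import defaultdict
from datetime import date, datetime, time
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

WINDOW_START = time(10, 0)  # 10 am local time
WINDOW_END = time(14, 0)    # 2 pm local time (assumed inclusive)


def overpass_means(
    obs: Iterable[Tuple[str, datetime, float]],
    max_plausible: Optional[float] = None,  # hypothetical upper bound
) -> Dict[Tuple[str, date], float]:
    """Average each station's readings inside the daily overpass window."""
    groups: Dict[Tuple[str, date], List[float]] = defaultdict(list)
    for station, ts, value in obs:
        if math.isnan(value) or value < 0:
            continue  # assumed screening of unrealistic values
        if max_plausible is not None and value > max_plausible:
            continue
        if WINDOW_START <= ts.time() <= WINDOW_END:
            groups[(station, ts.date())].append(value)
    return {key: mean(values) for key, values in groups.items()}
```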
MAIAC AOD retrievals contain many gaps due to cloud and snow cover, which is why coverage is incomplete over some countries (for example, Norway and Sweden). To resolve this, we will use other AOD products to fill these gaps.
Subsequent efforts will focus on improving model accuracy and reducing prediction errors by including additional important variables, such as land cover, NDVI, topographic measures, and population data.
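| Category | Variable | Description | Spatial resolution | Unit | Source |
|---|---|---|---|---|---|
| Aerosol Optical Depth | AOD | MAIAC AOD | 1 km | – | MCD19A2 |
| Meteorological data | BLH | Boundary layer height | 9 km | m | ERA5-Land global reanalysis |
| | 10m-V | 10 m v-component of wind | | m/s | |
| | 10m-U | 10 m u-component of wind | | m/s | |
| | TCWV | Total column water vapor | | m | |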
In addition to the PM2.5 monthly maps for 2018, we provide a second product that records, for each pixel, the number of daily PM2.5 predictions used to compute the monthly average. Both products are distributed as Cloud Optimized GeoTIFFs (COGs). The following table lists the specifications of the two products.
| Product | Format | Data type | Unit | Scale factor |
|---|---|---|---|---|
| Pixel Count | COG | UINT8 | Number of pixels | – |
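The relationship between the monthly mean and the per-pixel count product can be sketched as follows. This assumes that a pixel's monthly value is the arithmetic mean over the days with a valid prediction (missing days represented as `None`) and that the count is the number of such days; since a month has at most 31 days, the count fits in UINT8. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def monthly_mean_and_count(daily: List[Grid]) -> Tuple[Grid, List[List[int]]]:
    """Per-pixel mean of daily predictions and number of valid days."""
    rows, cols = len(daily[0]), len(daily[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for day in daily:
        for r in range(rows):
            for c in range(cols):
                value = day[r][c]
                if value is not None:  # skip days without a prediction
                    total[r][c] += value
                    count[r][c] += 1
    # Pixels with no valid day stay missing (None) in the monthly mean.
    means: Grid = [
        [total[r][c] / count[r][c] if count[r][c] else None for c in range(cols)]
        for r in range(rows)
    ]
    return means, count
```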
We selected October for the demonstration download data; the same principle applies to the remaining months by changing the month in the links.