Machine learning models accurately model ozone exposure during wildfire events

Author(s):

Gregory L. Watson, Donatello Telesca, Colleen Reid, Gabriele G. Pfister, Michael Jerrett

Year Published: 2019

Cataloging Information

Hot Topic(s):

Smoke and Air Quality Management

Topic(s):

Fire Effects
Smoke & Air Quality
Fire & Smoke Models
Smoke Emissions

NRFSN number: 20492

FRAMES RCS number: 58423

Record updated: December 16, 2019

Epidemiologists use prediction models to downscale (i.e., interpolate) air pollution exposure where monitoring data is insufficient. This study compares machine learning prediction models for ground-level ozone during wildfires, evaluating the predictive accuracy of ten algorithms on the daily 8-h maximum average ozone during a 2008 wildfire event in northern California. Models were evaluated using a leave-one-location-out cross-validation (LOLO CV) procedure to account for the spatial and temporal dependence of the data and produce more realistic estimates of prediction error. LOLO CV avoids both the well-known overly optimistic bias of k-fold cross-validation on dependent data and the conservative bias of evaluating prediction error over a coarser spatial resolution via leave-k-locations-out CV. Gradient boosting was the most accurate of the ten machine learning algorithms with the lowest LOLO CV estimated root mean square error (0.228) and the highest LOLO CV R² (0.677). Random forest was the second best performing algorithm with an LOLO CV R² of 0.661. The LOLO CV estimates of predictive accuracy were less optimistic than 10-fold CV estimates for all ten models. The difference in estimated accuracy between the 10-fold CV and LOLO CV was greater for more flexible models like gradient boosting and random forest. The order of estimated model accuracy depended on the choice of evaluation metric, indicating that 10-fold CV and LOLO CV may select different models or sets of covariates as optimal, which calls into question the reliability of 10-fold CV for model (or variable) selection. These prediction models are designed for interpolating ozone exposure, and are not suited to inferring the effect of wildfires on ozone or extrapolating to predict ozone in other spatial or temporal domains. This is demonstrated by the inability of the best performing models to accurately predict ozone during 2007 southern California wildfires.
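
The contrast between k-fold CV and LOLO CV described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code, models, or data. The function names (`lolo_splits`, `kfold_splits`, `predict_location_mean`, `cv_rmse`), the five monitor labels, and the synthetic ozone values are all hypothetical, and the "model" is a deliberately simple stand-in that memorizes per-location means. The sketch only shows why holding out whole locations tends to give a less optimistic error estimate than random folds when observations at the same location are dependent.

```python
import math
import random
from collections import defaultdict


def lolo_splits(locations):
    """Yield (train_idx, test_idx), holding out every row from one location per fold."""
    by_location = defaultdict(list)
    for i, loc in enumerate(locations):
        by_location[loc].append(i)
    for held_out, test_idx in by_location.items():
        train_idx = [i for i, loc in enumerate(locations) if loc != held_out]
        yield train_idx, test_idx


def kfold_splits(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) for ordinary k-fold CV on shuffled rows."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    for fold in range(k):
        test_idx = order[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        yield train_idx, test_idx


def predict_location_mean(train_idx, test_idx, locations, y):
    """Toy flexible model (an assumption, not from the study): predict the mean
    of training rows at the same location, or the overall training mean when
    the location was never seen in training."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in train_idx:
        sums[locations[i]] += y[i]
        counts[locations[i]] += 1
    overall = sum(y[i] for i in train_idx) / len(train_idx)
    preds = []
    for i in test_idx:
        loc = locations[i]
        preds.append(sums[loc] / counts[loc] if loc in counts else overall)
    return preds


def cv_rmse(splits, locations, y):
    """Pool squared errors over all folds and return the RMSE."""
    squared_errors = []
    for train_idx, test_idx in splits:
        preds = predict_location_mean(train_idx, test_idx, locations, y)
        squared_errors.extend((y[i] - p) ** 2 for i, p in zip(test_idx, preds))
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (hypothetical): 5 monitors, 20 days each, with a
# site-specific level plus small day-to-day noise.
rng = random.Random(1)
locations = [site for site in "ABCDE" for _ in range(20)]
site_level = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}
y = [site_level[loc] + rng.gauss(0.0, 0.1) for loc in locations]

kfold_rmse = cv_rmse(kfold_splits(len(y), k=10), locations, y)
lolo_rmse = cv_rmse(lolo_splits(locations), locations, y)
print(f"10-fold CV RMSE: {kfold_rmse:.3f}")
print(f"LOLO CV RMSE:    {lolo_rmse:.3f}")
```

In this toy setting the random folds let the model see other days from the same monitor, so its error looks small. Under LOLO CV the held-out monitor is entirely unseen, which is closer to the interpolation task of predicting exposure where no monitor exists.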

Citation

Watson, Gregory L.; Telesca, Donatello; Reid, Colleen E.; Pfister, Gabriele G.; Jerrett, Michael. 2019. Machine learning models accurately model ozone exposure during wildfire events. Environmental Pollution 254(Part A):112792. https://doi.org/10.1016/j.envpol.2019.06.088

Document | Book or Chapter or Journal Article