Statistical analysis of particulate matter data in Doha, Qatar

TAYLOR, Charles C., YOUSIF, Asil E. and MWITONDI, Kassim (2018). Statistical analysis of particulate matter data in Doha, Qatar. WIT Transactions on Ecology and the Environment, 230, 107-118.

Full text: Mwitondi Statistical analysis of particulate matter data in Doha.pdf (Published Version, PDF, 305 kB). All rights reserved.
Official URL: https://www.witpress.com/elibrary/wit-transactions...
Link to published version: https://doi.org/10.2495/air180101

Abstract

Pollution in Doha is measured using passive, active and automatic sampling. In this paper we consider automatically sampled data, in which various pollutants were collected continuously and analysed every hour. At each station the sample is analysed online and in real time, and the data are stored within the analyser or in a separate logger, from which they can be downloaded remotely via modem. The resulting accuracy enables pollution episodes to be analysed in detail and related to traffic flows, meteorology and other variables. Data have been collected hourly for more than 6 years at 3 different locations, with measurements available for pollutants such as ozone, nitrogen oxides, sulphur dioxide, carbon monoxide, total hydrocarbons (THC), methane and particulate matter (PM1.0, PM2.5 and PM10), as well as meteorological data such as humidity, temperature, and wind speed and direction. Despite careful data collection, the resulting data contain long stretches of missing values where the equipment malfunctioned, often as a result of more extreme conditions. Our analysis has two parts. First, we consider ways to “clean” the data by imputing missing values, including values identified as outliers. Second, we specifically consider predicting each particulate fraction (PM1.0, PM2.5 and PM10) 24 hours ahead, using current (and previous) pollution and meteorological data. For this, we use vector autoregressive (VAR) models, compare them with decision trees, and propose variable selection criteria that explicitly adapt to missing data. Our results show that regression tree models without variable transformations perform best, and that attempts to impute missing values are hampered by non-random missingness.
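The two preprocessing steps named in the abstract (imputing gaps in hourly series, then setting up a 24-hour-ahead prediction from current and previous values) can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `interpolate_short_gaps` and `make_supervised`, the use of linear interpolation, the `max_gap` threshold and the complete-case handling of longer outages are all hypothetical choices made for illustration. Only short interior gaps are filled; long outages are left missing, and any row touching a missing value is dropped.

```python
from typing import List, Optional, Tuple

# Hourly series; None marks a missing reading (e.g. an analyser outage).
Series = List[Optional[float]]


def interpolate_short_gaps(x: Series, max_gap: int = 3) -> Series:
    """Hypothetical helper: linearly interpolate interior runs of missing
    values no longer than max_gap hours. Longer runs, and gaps at either
    end of the series, are left as None rather than guessed."""
    out = list(x)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        # Find the end j of this maximal run of missing values [i, j).
        j = i
        while j < n and out[j] is None:
            j += 1
        gap = j - i
        # Fill only if observed values bound the run on both sides.
        if i > 0 and j < n and gap <= max_gap:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
        i = j
    return out


def make_supervised(
    target: Series,
    predictors: List[Series],
    horizon: int = 24,
    lags: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: pair the predictors at hours t, t-1, ..., t-lags+1
    with the target at hour t + horizon. Rows with any missing value are
    dropped (complete-case analysis). All series are assumed to share the
    same hourly time index and length."""
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lags - 1, len(target) - horizon):
        feats = [s[t - lag] for s in predictors for lag in range(lags)]
        yt = target[t + horizon]
        if yt is None or any(f is None for f in feats):
            continue
        X.append(feats)
        y.append(yt)
    return X, y
```

The resulting `X`, `y` pairs could then be passed to any regression model, for example a regression tree. Dropping incomplete rows makes the cost of long outages explicit, but when missingness is non-random, as the abstract reports, the retained hours may not be representative of the full record.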

Item Type: Article
Additional Information: From Crossref via Jisc Publications Router.
Research Institute, Centre or Group - Does NOT include content added after October 2018: Cultural Communication and Computing Research Institute > Communication and Computing Research Centre
Departments - Does NOT include content added after October 2018: Faculty of Science, Technology and Arts > Department of Computing
Identification Number: https://doi.org/10.2495/air180101
Page Range: 107-118
SWORD Depositor: Margaret Boot
Depositing User: Margaret Boot
Date Deposited: 31 Oct 2018 12:09
Last Modified: 18 Mar 2021 08:17
URI: https://shura.shu.ac.uk/id/eprint/23018
