Grid-based Data Analysis of Air Pollution Data

M. Ghanem, Y. Guo, J. Hassard, M. Osmond, M. Richards

In this paper, we discuss the main informatics challenges that arise when a high-throughput sensor network is constructed with a view to addressing real environmental challenges, such as real-time urban air pollution monitoring and mapping. We present a distributed infrastructure based on grid technology and data integration and mining tools, and describe our experience in developing such components for the analysis of air pollution data.

The Discovery Net project (1) is a UK e-Science project funded to develop grid-based knowledge discovery environments. In particular, the project focuses on methods for integrating and analysing data generated by distributed high-throughput devices in areas including environmental science, remote sensing, biochips, high-throughput screening in biochemistry and combinatorial chemistry, and high-throughput sensors in energy and geology. The goal is to develop an advanced, generic computing infrastructure that supports real-time processing, interpretation, integration, visualisation and mining of massive amounts of time-critical data generated by such devices.

One of the main application areas of Discovery Net is the analysis of data generated by GUSTO sensors, which are highly accurate, high-throughput pollution monitors. GUSTO (Generic Ultraviolet Sensors Technologies and Observations) is based on open-path DUVAS™ (differential ultraviolet absorption spectroscopy) technology. The sensors measure and transmit the volume mixing ratios of key urban pollutants at ppb levels in real time, providing exceptional temporal and spatial resolution. They were developed to assess the impact of urban air pollution in densely populated regions.

2. Knowledge Discovery Challenges for Environmental Data

As many cities throughout the world become increasingly congested, concern is growing over the level of urban air pollution and, in particular, its impact on localised health effects such as asthma and bronchitis. The better this relationship is understood, the better the chance of controlling and ultimately minimising such effects. In most of the developed world, legislation already requires local authorities to conduct regular Local Air Quality Reviews of key urban pollutants, such as benzene, SO2, NOx and ozone, produced by industrial activity and/or road transport. To achieve this, however, pollutant concentrations must be monitored accurately, ideally in situ, so that sources can be identified quickly and the atmospheric dynamics of the process can be understood. Furthermore, such data would support real-time environmental decision-making, because hazardous levels could be identified quickly. More extensive datasets also make more precise atmospheric modelling and predictive research possible.

Deploying a sensor network over a target region, such as a heavily industrialised or densely populated area, creates a wealth of data that allows new types of analysis. These include analysing and visualising the spatiotemporal variation of multiple pollutants with respect to one another, and their correlation with third-party data such as weather, health or traffic data. Such analysis can provide valuable clues as to why local health effects relating to respiratory illnesses are observed. These third-party data sets, where available, typically reside in remote databases and are stored in a variety of formats. Analysing them requires the integration of many data analysis components (statistical, clustering, visualisation and data classification tools).
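The kind of correlation analysis described above can be illustrated in a few lines. This is a minimal sketch, not the Discovery Net implementation. It assumes hourly readings keyed by a hypothetical integer hour index. The sketch inner-joins a pollutant series with a weather series on that key, computes the Pearson correlation coefficient, and flags readings above a hypothetical alert threshold (not a regulatory limit). The `Reading` type, the function names and the sample values are all invented for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass(frozen=True)
class Reading:
    """One observation; `hour` is a hypothetical integer hour index."""
    hour: int
    value: float


def align(a: List[Reading], b: List[Reading]) -> Tuple[List[float], List[float]]:
    """Inner-join two series on the hour index, keeping the order of `a`."""
    lookup = {r.hour: r.value for r in b}
    xs = [r.value for r in a if r.hour in lookup]
    ys = [lookup[r.hour] for r in a if r.hour in lookup]
    return xs, ys


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def exceedances(series: List[Reading], threshold: float) -> List[int]:
    """Hours at which a reading exceeds the (hypothetical) alert threshold."""
    return [r.hour for r in series if r.value > threshold]


# Invented sample data: NOx mixing ratio (ppb) and wind speed (m/s).
nox = [Reading(0, 40.0), Reading(1, 35.0), Reading(2, 20.0), Reading(3, 55.0)]
wind = [Reading(0, 2.0), Reading(1, 3.0), Reading(2, 5.0), Reading(4, 1.0)]

xs, ys = align(nox, wind)          # only hours 0, 1 and 2 appear in both series
r = pearson(xs, ys)                # negative in this sample: more wind, less NOx
alerts = exceedances(nox, 50.0)    # [3]
print(f"r = {r:.3f}, alert hours = {alerts}")
```

With real sensor feeds one would also have to handle missing readings, differing sampling intervals and calibration, which this sketch deliberately ignores.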
This type of integrated analysis also adds a new layer of complexity in terms of data management and data analysis, so it is vital that the right infrastructure is in place to exploit the data sets fully. For example, the choice of which data sets and data analysis components to use is typically governed by end-user requirements, and these vary among city planners and local government, health practitioners, environmental organisations and academic research groups. Consequently, the data analysis infrastructure must be versatile enough to cater for the needs of such diverse users.
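One way to achieve this kind of versatility is to build analyses from small, interchangeable components that can be recombined for each user group. The sketch below illustrates that idea only. The component names, the user groups and the choice of processing steps are hypothetical, and the sketch does not describe Discovery Net's actual workflow system.

```python
from typing import Callable, Dict, List

Series = List[float]
Component = Callable[[Series], Series]


def clip_negative(xs: Series) -> Series:
    """Replace negative values (e.g. spurious sensor output) with zero."""
    return [max(0.0, x) for x in xs]


def moving_average(window: int) -> Component:
    """Build a component that smooths a series with a simple moving average."""
    if window < 1:
        raise ValueError("window must be at least 1")

    def run(xs: Series) -> Series:
        return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

    return run


def compose(*components: Component) -> Component:
    """Chain components so that each one's output feeds the next."""
    def pipeline(xs: Series) -> Series:
        for component in components:
            xs = component(xs)
        return xs

    return pipeline


# Hypothetical per-user workflows assembled from the same building blocks.
WORKFLOWS: Dict[str, Component] = {
    "city_planner": compose(clip_negative, moving_average(3)),  # smoothed trend
    "researcher": compose(clip_negative),                       # near-raw data
}

raw = [10.0, -2.0, 14.0, 18.0]
for user, workflow in WORKFLOWS.items():
    print(user, workflow(raw))
```

Keeping each component a plain function from series to series means new user groups can be served by composing existing steps rather than writing new analysis code.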