Clear and Precise Specification of Ecological Data Management Processes and Dataset Provenance

DOI: 10.1109/TASE.2009.2021774

With the availability of powerful computational and communication systems, scientists now readily access large, complicated derived datasets and build on those results to produce, through further processing, yet other derived datasets of interest. The scientific processes used to create such datasets must be clearly documented so that scientists can evaluate their soundness, reproduce the results, and build upon them in responsible and appropriate ways. Here, we present the concept of an analytic web, which defines the scientific processes employed and details the exact application of those processes in creating derived datasets. The work described here is similar to work often referred to as "scientific workflow," but emphasizes the need for a semantically rich, rigorously defined process definition language. We illustrate the information that comprises an analytic web for a scientific process that measures and analyzes the flux of water through a forested watershed. This is a complex and demanding scientific process that illustrates the benefits of using a semantically rich, executable language for defining processes and for supporting automatic creation of process provenance metadata.

Note to Practitioners: The Internet and associated computing capabilities have made it possible for scientists to derive novel datasets through complex processing of existing datasets that may be collected from many locations. But scientists rarely document dataset provenance (the set of processes and a description of how those processes were used) to allow derived datasets to be recreated. Enabling such recreation is an essential part of repeatable science, and thus it is imperative that any dataset generated by scientific computation include provenance metadata, documentation of the precise way in which that dataset was produced. Provenance metadata can help assure that scientists and others understand the value and limitations associated with using that data, but creating provenance metadata is a difficult and time-consuming problem. This paper describes an approach for helping scientists deal with the production and management of their datasets, including the automated generation of provenance metadata. The approach is based on the use of a precisely defined process definition language. The language is relatively clear and easy for scientists to understand, yet it is precise enough to support their control of the application of computing capabilities to the generation of datasets, and is also an aid to the management and understanding of these datasets. This paper illustrates these ideas by providing a case study of a specific problem in ecological dataset production and metadata provenance generation.

Index Terms: Dataset provenance, process definition, scientific workflow.
Journal: IEEE Transactions on Automation Science and Engineering, vol. 7, no. 1, pp. 189-195, 2010