Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance

Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance, 10.1109/SCC.2010.14, John Abraham, Pearl Brazier, Artem

Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance (Citations: 1)
In scientific workflow environments, the reproducibility of scientific discoveries, the interpretation of results, and the diagnosis of problems depend primarily on provenance, which records the history of an in-silico experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this paper, we investigate how the Bigtable-like capabilities of HBase can be leveraged for distributed storage and querying of provenance data represented in RDF. In particular, we architect the ProvBase system, which incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. Using the Third Provenance Challenge queries, we conduct an experimental study to show the feasibility of our approach.
    • ...More recently, distributed technologies that are often used in cloud computing, such as Hadoop and HBase, are being explored for distributed and scalable RDF data management [3], [4]...
    • ...Finally, our previous work [4] presents our initial findings on RDF data management in HBase...
    • ...This paper, when compared to [4], proposes new, more effective HBase database schema design, more efficient algorithms for SPARQL triple and basic graph pattern matching, and an empirical comparison with a distributed relational RDF database...
    • ...Our experimental comparison with [4] (not reported in the paper) showed several orders of magnitude speedup for some queries and substantial improvements in scalability...
    • ...To our best knowledge, this paper and our previous paper [4] are the first published research works on Semantic Web data management in HBase...
    • ...matchTP-T takes a triple pattern and a triple and returns true if they match or false otherwise. Its pseudocode is outlined in [4]...
    • ...The three PC3 SPARQL queries utilized for the experiments can be found in our previous work [4]...

    Craig Franke et al. Distributed Semantic Web Data Management in HBase and MySQL Cluster
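
The excerpts above mention matchTP-T, which takes a triple pattern and a triple and returns true if they match. Its actual pseudocode is in [4] and is not reproduced on this page, so the following is only a minimal Python sketch of how such a check could look. The name match_tp_t, the tuple representation of triples, the "?"-prefix convention for variables, and the example terms are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Tuple

# Assumed representation: a triple and a triple pattern are both
# (subject, predicate, object) tuples of strings.
Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """Assumed convention: a term starting with '?' is a SPARQL-style variable."""
    return term.startswith("?")


def match_tp_t(pattern: Triple, triple: Triple) -> bool:
    """Return True if the triple matches the triple pattern, else False.

    A constant in the pattern must equal the triple's term at the same
    position. A variable matches any term, but a variable that occurs
    more than once must be bound to the same term each time.
    """
    bindings: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # setdefault keeps the first binding and returns it on reuse.
            if bindings.setdefault(p_term, t_term) != t_term:
                return False
        elif p_term != t_term:
            return False
    return True


# Hypothetical provenance triple: a process used an artifact.
example_triple = ("ex:process1", "opm:used", "ex:artifact1")
print(match_tp_t(("?p", "opm:used", "?a"), example_triple))
print(match_tp_t(("?p", "opm:wasGeneratedBy", "?a"), example_triple))
```

A check of this shape would be applied to each candidate triple retrieved from storage; how ProvBase retrieves candidates from HBase is not described in the excerpts and is not modeled here.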
