SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems

SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems,10.1145/1654059.1654070,Yu Hua,Hong Jiang,Yi

Citations: 5
Existing data storage systems based on hierarchical directory tree do not meet scalability and functionality requirements for exponentially growing datasets and increasingly complex metadata queries in large-scale file systems with billions of files and Exabytes of data. This paper proposes a novel decentralized semantic-aware metadata organization, called SmartStore, which exploits metadata semantics of files to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. The decentralized design of SmartStore can improve system scalability and reduce query latency for both complex queries (including range and top-k queries), which is helpful to construct semantic-aware caching, and conventional filename-based point query. The key idea of SmartStore is to limit search scope of a complex metadata query to a minimal number of semantically related groups and avoid or alleviate brute-force search in entire system. Extensive experiments based on real-world traces show that SmartStore significantly improves system scalability and reduces query latency by more than one thousand times faster than current database approaches. To the best of our knowledge, this is the first paper addressing complex queries in large-scale file systems.
Conference: Supercomputing Conference - SC, pp. 1-12, 2009
    • ...In addition, HPC researchers have developed novel methods in support of high performance IO, which include data staging [1], [41], the use of alternative file formats or organizations [8], [34], [29], [30], and better ways to organize and update file metadata [42], [16], [20], [57], [31]...

    Jay F. Lofstead et al. Managing Variability in the IO Performance of Petascale Storage System...

    • ...Partitioning algorithms are a proven way to improve metadata search speeds for large file systems [24], [21], when a monolithic metadata index is too large to fit comfortably into main memory...
    • ...Our claim of increased search performance is not unique: many other systems, such as Spyglass [24] and SmartStore [21], have proposed different methods of partitioning large file systems, each promising improved search performance...
    • ...• A greedy time based algorithm • An interval time based algorithm • User based partitioning • Cosine correlation clustering • Cosine correlation clustering with Latent Semantic Analysis (LSA) – SmartStore [21] • Greedy depth first search partitioning – Spyglass [24]...
    • ...We chose this algorithm because we wanted to replicate the results of SmartStore [21], which attempts to improve performance by clustering data according to the correlation of its metadata, and then using the clusters as the partitions...
    • ...Since there are currently no standard benchmarks for file system search, many systems have adopted the method of generating randomized queries in order to evaluate performance [24], [21]...

    Aleatha Parker-Wood et al. Security Aware Partitioning for efficient file system search

    • ...Our preliminary results based on these and the HP [17], MSN [18], and EECS [19] traces further show that exploiting semantic correlation of multi-dimensional attributes can help prune up to 99.9% search space [20]...
    • ...Due to space limitation, additional performance evaluation results are omitted but can be found in our technical report [20] and work-in-progress report [21]...

    Yu Hua et al. SmartStore: a new metadata organization paradigm with semantic-awarene...

    • ...parallel). Previous work has used Latent Semantic Indexing (LSI) as a policy to group related files [8]...

    Andrew W. Leung et al. Copernicus: A Scalable, High-Performance Semantic File System

    • ...This work-in-progress report proposes a novel decentralized semantic-aware metadata organization paradigm, called SmartStore [1], to efficiently organize file metadata into a semantic R-tree through semantic analysis on file metadata, which enables efficient complex queries including range and top-k queries...

    Yu Hua, Hong Jiang et al. SmartStore: A New Metadata Organization Paradigm with Semantic-Awarene...
