Traverse: Simplified Indexing on Large Map-Reduce-Merge Clusters

Hung-chih Yang, D. Stott Parker. DOI: 10.1007/978-3-642-00887-0_27

The search engines that index the World Wide Web today use access methods based primarily on scanning, sorting, hashing, and partitioning (SSHP) techniques; the MapReduce framework is a prominent example. Unlike a DBMS, this search-engine infrastructure provides few general tools for indexing user datasets. In particular, it offers no order-preserving tree indexes, even though such indexes could be built from the same components. As a result, data processing on these infrastructures scales linearly at best, whereas index-based techniques can scale logarithmically. DBMSs have long used indexes to improve performance, especially on low-selectivity queries and joins, so it is natural to incorporate indexing into search-engine infrastructure. Recently, we proposed Map-Reduce-Merge, an extension of MapReduce that efficiently joins heterogeneous datasets and executes relational algebra operations. Its aim was to extend search-engine infrastructure to permit generic relational operations, broadening the scope of analysis of search-engine content. In this paper we advocate incorporating yet another database primitive, indexing, into search-engine data processing. We explore ways to build tree indexes using Hadoop MapReduce, and we add a new primitive, Traverse, to the Map-Reduce-Merge framework. Traverse can efficiently traverse index files, select data partitions, and limit the number of input partitions for a follow-up map, reduce, or merge step.
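The abstract does not specify Traverse's interface, so the following is only a minimal Python sketch of the idea, assuming the data has already been range-partitioned and sorted (as a MapReduce sort job with a range partitioner would emit). It builds a one-level index of each partition's first and last key, then uses binary search to pick only the partitions that can hold keys in a query range. `PartitionIndex`, `build_partition_index`, and `traverse` are hypothetical names, not the paper's API.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionIndex:
    """Hypothetical one-level root index over range-partitioned, sorted data."""
    first_keys: List[int]     # smallest key in each non-empty partition
    last_keys: List[int]      # largest key in each non-empty partition
    partition_ids: List[int]  # id of the partition each entry describes


def build_partition_index(partitions: List[List[int]]) -> PartitionIndex:
    # Assumption: every key in partition i is smaller than every key in
    # partition i + 1, and each partition is internally sorted.
    first, last, ids = [], [], []
    for pid, keys in enumerate(partitions):
        if keys:  # an empty partition can never match a query
            first.append(keys[0])
            last.append(keys[-1])
            ids.append(pid)
    return PartitionIndex(first, last, ids)


def traverse(index: PartitionIndex, lo: int, hi: int) -> List[int]:
    # A partition overlaps [lo, hi] iff last_key >= lo and first_key <= hi.
    # Both key lists are sorted, so two binary searches bound the run of
    # overlapping partitions in O(log P) comparisons instead of a full scan.
    start = bisect_left(index.last_keys, lo)    # first entry with last_key >= lo
    stop = bisect_right(index.first_keys, hi)   # entries with first_key <= hi
    return index.partition_ids[start:stop]      # empty if start >= stop


if __name__ == "__main__":
    parts = [[1, 4, 7], [10, 12], [], [20, 25, 31], [40]]
    idx = build_partition_index(parts)
    print(traverse(idx, 11, 22))  # [1, 3]
    print(traverse(idx, 8, 9))    # []
```

Only the returned partition ids would then be handed to the follow-up map, reduce, or merge step. A real multi-level tree index would repeat the same binary-search step at each level; this sketch shows just the root level.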