Optimizing joins in a map-reduce environment

Authors: Foto N. Afrati, Jeffrey D. Ullman
DOI: 10.1145/1739041.1739056

Citations: 13
ABSTRACT Implementations of map-reduce are being used to perform many operations on very large data. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop. In particular, we look at strategies for combining the natural join of several relations. The general strategy we employ is to identify certain attributes of the multiway join that are part of the "map-key," an identifier for a particular Reduce process to which the Map processes send tuples. Each attribute of the map-key gets a "share," which is the number of buckets into which its values are hashed, to form a component of the identifier of a Reduce process. Relations have their tuples replicated in limited fashion, the degree of replication depending on the shares for those map-key attributes that are missing from their schema. We study the problem of optimizing the shares, given a fixed product (i.e., a fixed number of Reduce processes). An algorithm for detecting and fixing problems where a variable is mistakenly included in the map-key is given. Then, we consider two important special cases: chain joins and star joins. In each case we are able to determine the map-key and determine the shares that yield the least amount of replication.
Conference: Extending Database Technology (EDBT), pp. 99-110, 2010
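
The share-based strategy described in the abstract can be made concrete with a small single-machine simulation. The sketch below is an illustration only, not the authors' code: it assumes a three-relation chain join R(A,B), S(B,C), T(C,D) with map-key (B,C), and every function name in it (`bucket`, `map_phase`, `reduce_phase`, `naive_join`, `communication_cost`, `real_valued_shares`) is hypothetical. Attribute B gets share b and attribute C gets share c, so there are k = b*c Reduce processes. R lacks C, so each R tuple is replicated to c reducers; T lacks B, so each T tuple is replicated to b reducers; S has both attributes and goes to exactly one reducer.

```python
from collections import defaultdict
from itertools import product
from math import sqrt


def bucket(value, share):
    # Hypothetical hash function: any mapping into range(share) would do.
    return hash(value) % share


def map_phase(R, S, T, b, c):
    """Route tuples to reducers identified by (bucket of B, bucket of C)."""
    reducers = defaultdict(lambda: {"R": [], "S": [], "T": []})
    for a, bv in R:
        # R has no C attribute: replicate across all c buckets of C.
        for j in range(c):
            reducers[(bucket(bv, b), j)]["R"].append((a, bv))
    for bv, cv in S:
        # S has both map-key attributes: exactly one reducer.
        reducers[(bucket(bv, b), bucket(cv, c))]["S"].append((bv, cv))
    for cv, d in T:
        # T has no B attribute: replicate across all b buckets of B.
        for i in range(b):
            reducers[(i, bucket(cv, c))]["T"].append((cv, d))
    return reducers


def reduce_phase(reducers):
    """Each reducer joins only the tuples it received."""
    out = set()
    for parts in reducers.values():
        for (a, b1), (b2, c1), (c2, d) in product(parts["R"], parts["S"], parts["T"]):
            if b1 == b2 and c1 == c2:
                out.add((a, b1, c1, d))
    return out


def naive_join(R, S, T):
    """Reference result computed without any partitioning."""
    return {(a, b1, c1, d)
            for (a, b1), (b2, c1), (c2, d) in product(R, S, T)
            if b1 == b2 and c1 == c2}


def communication_cost(r, s, t, b, c):
    # Number of tuples shipped from Map to Reduce processes.
    return r * c + s + t * b


def real_valued_shares(r, t, k):
    # Minimizes r*c + t*b subject to b*c = k (real-valued solution;
    # actual shares would have to be rounded to integers).
    return sqrt(k * r / t), sqrt(k * t / r)


if __name__ == "__main__":
    R = [(1, 10), (2, 10), (3, 11)]
    S = [(10, 20), (11, 21), (10, 21)]
    T = [(20, 30), (21, 31), (21, 32)]
    reducers = map_phase(R, S, T, b=2, c=2)
    print(sorted(reduce_phase(reducers)))
```

Under these assumptions the number of tuples sent to reducers is r*c + s + t*b for relation sizes r, s, t, which is the quantity the share optimization trades off against the fixed product b*c = k. Integer rounding of the shares and hash skew are ignored here.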