Understanding application-level interoperability: Scaling-out MapReduce over high-performance grids and clouds
Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important with increasing volumes of data and multiple sources of data as well as resource types. The primary aim of this paper is to understand different ways and levels in which application-level interoperability can be provided across distributed infrastructure. Our approach is: (i) Given the simplicity of MapReduce, its widespread usage, and its ability to capture the primary challenges of developing distributed applications, use MapReduce as the underlying exemplar; we develop an interoperable implementation of MapReduce using SAGA — an API to support distributed programming, (ii) Using the canonical wordcount application that uses SAGA-based MapReduce, we investigate its scale-out across clusters, clouds and HPC resources, (iii) Establish the execution of wordcount application using MapReduce and other programming models such as Sphere concurrently. SAGA-based MapReduce in addition to being interoperable across different distributed infrastructures, also provides user-level control of the relative placement of compute and data. We provide performance measures and analysis of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance.