Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms

Daniel Zinn, Quinn Hart, Timothy McPhillips, et al. DOI: 10.1109/CCGrid.2011.74

Scientific workflows are commonplace in eScience applications. Yet the lack of integrated support for data models, including streaming data, structured collections, and files, limits the ability of workflows to support emerging applications in energy informatics that are stream oriented. This is compounded by the absence of Cloud data services that support reliable and performant streams. In this paper, we propose and present a scientific workflow framework that supports streams as first-class data and is optimized for performant and reliable execution across desktop and Cloud platforms. The workflow framework's features and its empirical evaluation on a private Eucalyptus Cloud are presented.

I. INTRODUCTION

Scientific workflows have gained a firm foothold in modeling and orchestrating data-intensive scientific applications by scientists and domain researchers (1). Despite advances in workflow systems, the diversity of data models supported by workflows remains inadequate. Directed acyclic graphs (DAGs), and control and data flows operating on simple value types and files, form the most common programming model available. Workflow systems that support collections or structured objects (2), (3) are the exception rather than the rule. While existing workflow data models are sufficient for a number of legacy applications that were originally orchestrated as scripts operating on files, an emerging class of scientific and engineering applications needs to actively operate on data as it arrives from sensors or instruments, and to react to natural or physical phenomena as they are detected. In addition, these novel data- and compute-intensive applications are well suited to Cloud platforms, whether public or private (4), (5). The elastic resources available on the Cloud fit the non-uniform resource needs of these applications, and the on-demand nature of the Cloud can help meet their low-latency requirements.
However, the native data services offered by many public Clouds (files, queues, and tables) do not yet include high-performance, streaming-friendly services. For example, consider the energy informatics domain, and smart power grids in particular. Data continuously arriving from 1.4 million smart meters in Los Angeles households will soon need to be continuously analyzed in order to detect impending peak power usage in the smart power grid and notify the utility, which can respond either by spinning up additional power sources or by triggering load-curtailment operations to reduce demand (5). This closed-loop cyber-physical application, modeled as a workflow, needs to combine streaming data arriving from sensors with historic data available in file archives, along with structured collections of weather forecast data, to help the large-scale computational model make an energy-use prediction in near real time. A workflow framework that supports this data-model diversity (streaming data, structured collections, and files) and the ability to execute reliably and scalably on elastic computational platforms like the Cloud is currently absent. In this paper, we address this lacuna by proposing a scientific workflow framework that supports the diverse data models required by these emerging scientific applications, and we evaluate its performance and reliability across desktop and Cloud platforms. Specifically, we make the following contributions:
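The closed-loop smart-grid scenario above can be sketched as a small stream-processing loop that combines live meter readings with a historic baseline loaded from an archive. This is a minimal illustration of the pattern, not the paper's framework; the function names, sample readings, and thresholds below are all assumptions for the sake of the sketch.

```python
from collections import deque

def meter_stream():
    """Stand-in for a continuous stream of smart-meter readings (kW)."""
    for reading in [3.1, 3.4, 4.0, 5.2, 6.8, 7.5, 7.9, 6.1]:
        yield reading

def load_historic_baseline():
    """Stand-in for historic demand data read from a file archive (kW)."""
    return 4.0

def detect_peaks(stream, baseline, window=3, factor=1.5):
    """Flag sliding windows whose mean demand exceeds factor * baseline.

    Each flagged window is where the workflow would notify the utility
    to spin up power sources or trigger load curtailment.
    """
    buf = deque(maxlen=window)
    alerts = []
    for reading in stream:
        buf.append(reading)
        if len(buf) == window:
            mean = sum(buf) / window
            if mean > factor * baseline:
                alerts.append(round(mean, 2))
    return alerts

if __name__ == "__main__":
    # Windowed means above 1.5 * 4.0 kW are reported as impending peaks.
    print(detect_peaks(meter_stream(), load_historic_baseline()))
```

A real deployment would replace the in-memory generator with a reliable stream service and the constant baseline with archival and forecast data, which is precisely the gap in Cloud data services that the paper identifies.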
    • ...As observed in [1], cyber-physical applications, modelled as workflows, often need to combine dynamic data from sensors with static data held within file systems...
    • ...user-defined markers, referred to as "landmarks" in [1]...
    • ...Reference nets support different representations [1]: tokens that store remote locations of distributed files, or express structured collections of data (represented as a Java array)...
    • ...Additionally, Reference nets also allow us to express different data models [1]: tokens that reference files, structured collections of data represented as a Java array, and streams of data represented as abstract workflows that are executed in the pipeline...
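The three token kinds mentioned in these excerpts (file references, structured collections, and streams) can be sketched as tagged values that a workflow actor dispatches on. The class names below are illustrative assumptions, not the actual Reference-net token classes.

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class FileToken:
    """Token storing the remote location of a distributed file."""
    url: str

@dataclass
class CollectionToken:
    """Token holding a structured collection (cf. a Java array)."""
    items: List[str]

@dataclass
class StreamToken:
    """Token wrapping a data stream that is consumed incrementally."""
    source: Iterator[int]

def describe(token) -> str:
    """A workflow actor can choose its behavior by token data model."""
    if isinstance(token, FileToken):
        return f"file at {token.url}"
    if isinstance(token, CollectionToken):
        return f"collection of {len(token.items)} items"
    if isinstance(token, StreamToken):
        return f"stream starting at {next(token.source)}"
    raise TypeError("unknown token kind")

if __name__ == "__main__":
    print(describe(FileToken("s3://bucket/archive.dat")))
    print(describe(CollectionToken(["a", "b", "c"])))
    print(describe(StreamToken(iter(range(10)))))
```

The point of the sketch is only that all three data models travel through the workflow as first-class tokens, which is the property the citing paper attributes to [1].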

    Rafael Tolosana-Calasanz et al. Dynamic Workflow Adaptation over Adaptive Infrastructures
