BTS: Resource capacity estimate for time-targeted science workflows
Workflow technologies have become a major vehicle for the easy and efficient development of scientific applications. A critical challenge in integrating workflow technologies with state-of-the-art resource provisioning technologies is determining the right amount of resources required to execute a workflow. This paper introduces an approximation algorithm named BTS (Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time. Experimental results, based on a number of synthetic workflows and several real science workflows, demonstrate that the BTS estimate of resource capacity approaches the theoretical lower bound. The BTS algorithm is scalable: its turnaround time is only tens of seconds, even for huge workflows with thousands of tasks and edges. Moreover, BTS achieves good performance on workflows containing MPI-like parallel tasks. Finally, because the resource estimate produced by BTS is abstract, BTS can be easily integrated with any resource description language and resource provisioning system.
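To make the notion of a lower bound on host count concrete, the following is a minimal sketch of one simple bound, not the BTS algorithm itself and not necessarily the bound used in the experiments. It assumes each task runs on a single host with a known runtime, and that dependencies are given as a map from each task to its predecessors. The function names and the example workflow are hypothetical. The idea is that no schedule can finish before the longest dependency chain completes, and that the total work divided by the finish time, rounded up, is the fewest hosts that could possibly fit that work into the time budget.

```python
import math


def critical_path_length(runtimes, deps):
    """Return the length of the longest dependency chain in the workflow.

    runtimes: dict mapping task name -> runtime (assumed known in advance)
    deps:     dict mapping task name -> list of predecessor task names
    """
    memo = {}

    def earliest_finish(task):
        # Earliest finish = own runtime plus the latest predecessor finish.
        if task not in memo:
            preds = deps.get(task, [])
            memo[task] = runtimes[task] + max(
                (earliest_finish(p) for p in preds), default=0
            )
        return memo[task]

    return max(earliest_finish(t) for t in runtimes)


def host_lower_bound(runtimes, deps, finish_time):
    """Return a simple lower bound on hosts, or None if the deadline is infeasible.

    This is only a bound: a real schedule may need more hosts than this.
    """
    if finish_time <= 0:
        raise ValueError("finish_time must be positive")
    # Even with unlimited hosts, the critical path must fit in the deadline.
    if critical_path_length(runtimes, deps) > finish_time:
        return None
    total_work = sum(runtimes.values())
    return max(1, math.ceil(total_work / finish_time))


# Hypothetical diamond-shaped workflow: a -> {b, c} -> d
runtimes = {"a": 4, "b": 6, "c": 6, "d": 4}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

tight = host_lower_bound(runtimes, deps, finish_time=14)
relaxed = host_lower_bound(runtimes, deps, finish_time=20)
too_short = host_lower_bound(runtimes, deps, finish_time=10)
```

In this example the critical path is 14 time units and the total work is 20, so a finish time of 14 needs at least two hosts, a finish time of 20 can in principle be met by one, and a finish time of 10 cannot be met by any number of hosts. An estimator such as BTS would be judged by how close its host count comes to a bound of this kind.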