Coordinating Computation and I/O in Massively Parallel Sequence Search
With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these run-time irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have bee nw ell studied independently, little attention has been pai dt o the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search .O ur study reveals that the lack of coordination between com putation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load-balancing of computation and high- performance noncontiguous I/O. Index Terms—scheduling, parallel I/O, bioinformatics, parallel genomic sequence search, BLAST !