Collective communication: theory, practice, and experience

DOI: 10.1002/cpe.1206. Journal: Concurrency and Computation: Practice and Experience. Authors: Ernie Chan, Marcel Hei… (Citations: 5)

SUMMARY We discuss the design and high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that have improved performance in most situations compared to those currently supported by public-domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4 processor cluster are included.
Journal: Concurrency and Computation: Practice and Experience, vol. 19, no. 13, pp. 1749-1783, 2007
    • ...With respect to the optimization of MPI collective operations, in [13] Chan et al. discuss thoroughly the design and high-performance implementation of collective communications operations on distributed-memory computer architectures...
    • ...These algorithms are thoroughly described in [13]...

    Guillermo L. Taboada et al. Design of efficient Java message-passing collectives on multi-core clu...

    • ...One particular example is that several application models assume that broadcast or allreduce scale with Θ(S log(P)) (e.g., [3,17]) while, as demonstrated in Section 4, a good MPI implementation would implement a broadcast or allreduce with Θ(S + log(P)) [5,13,21]...
    • ...An optimal broadcast algorithm that fully exploits the bandwidth of the underlying (strong) interconnect could thus be specified as having running time T_BC(S, P) = Θ(log(P)) + βS [13], in contrast to merely efficient broadcast algorithms with the same asymptotic performance T_BC(S, P) = Θ(S + log(P)), but in reality behaving as T_BC(S, P) = Θ(log(P)) + 2βS [5]...

    Torsten Hoefler et al. Toward Performance Models of MPI Implementations for Understanding App...
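The two broadcast cost models contrasted in the excerpts above can be compared numerically. A minimal sketch follows; the latency constant α and per-byte cost β are illustrative assumptions, not values from the paper:

```python
import math

def bcast_pipelined(S, P, alpha=1e-6, beta=1e-9):
    # Optimal broadcast: latency term grows with log(P), but the data
    # crosses the network once, i.e. T = Theta(log P) + beta*S.
    return alpha * math.log2(P) + beta * S

def bcast_scatter_allgather(S, P, alpha=1e-6, beta=1e-9):
    # Same asymptotic class Theta(S + log P), but the payload is moved
    # twice (scatter, then allgather), giving a 2*beta*S bandwidth term.
    return alpha * math.log2(P) + 2 * beta * S
```

For large messages the difference between βS and 2βS dominates, which is exactly the distinction between "optimal" and "merely efficient" algorithms drawn in the quoted excerpt.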

    • ...The design, implementation and runtime selection of efficient collective communication operations have been extensively discussed in the context of native message-passing libraries [4, 7, 17, 19], but not in MPJ...

    Guillermo L. Taboada et al. F-MPJ: scalable Java message-passing communications on parallel system...

    • ...Thus, to optimize a collective operation, recent implementations either use hybrid algorithms [2] depending on the size of the messages to be exchanged, or select the most efficient algorithm from a pool of algorithms using a performance model [10]...
    • ...In [2], it was also reported that collective communication was an active research topic in the 1980s and early 1990s, as distributed-memory architectures with large numbers of processors were first introduced, and that since then no dramatic new developments had been reported...
    • ...small message sizes), message initiation cost tends to dominate the total communication cost; on the other hand, when the amount of data is large, message transmission and handling costs tend to dominate [2]...

    Kayhan M. İmre et al. Efficient and Scalable Routing Algorithms for Collective Communication...
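The hybrid-algorithm approach described in the excerpts above, switching algorithms based on message size, can be sketched as a simple size-based dispatch. The threshold value and the algorithm names here are illustrative assumptions, not the paper's tuned parameters:

```python
def select_bcast_algorithm(message_bytes, threshold=8192):
    # Small messages: startup (latency) cost dominates, so prefer a
    # minimum-spanning-tree style broadcast with O(log P) message starts.
    # Large messages: per-byte (bandwidth) cost dominates, so prefer a
    # scatter followed by allgather, which keeps the bandwidth term low.
    if message_bytes <= threshold:
        return "binomial-tree"
    return "scatter-allgather"
```

Real MPI implementations apply the same idea with empirically tuned crossover points, often per network and per process count.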
