In search for contention-descriptive metrics in HPC cluster environment

Sergey Blagodurov, Alexandra Fedorova. DOI: 10.1145/1958746.1958815

In this paper, we argue that modern HPC cluster environments contain several bottlenecks, both within the multicore cluster nodes and in the cluster interconnects between them. These bottlenecks are resources that may be in high demand by several jobs executing concurrently on the cluster. As a result, the jobs can compete for access to these resources and suffer performance degradation due to contention. We point out that, although contention for shared resources such as the memory hierarchy of the cluster nodes, the cluster interconnects, or the floating-point unit can severely degrade the performance of the cluster workload, state-of-the-art cluster schedulers have no adequate means of addressing it. To fill this gap, we propose a new set of metrics that models shared resource contention and provides fine-grained information about each job's resource utilization and communication patterns. The necessary information can be obtained from performance counters within the cluster nodes and from monitoring of the cluster interconnects between them.
Published in 2011.