Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Authors: David Tam, Reza Azimi, Michael Stumm
DOI: 10.1145/1272996.1273004

Citations: 51
The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip. Mid- and large-scale systems will have multiple processing chips and hence consist of an SMP-CMP-SMT configuration with non-uniform data sharing overheads. Current operating system schedulers are not aware of these new cache organizations, and as a result, distribute threads across processors in a way that causes many unnecessary, long-latency cross-chip cache accesses.

In this paper we describe the design and implementation of a scheme to schedule threads based on sharing patterns detected online using features of standard performance monitoring units (PMUs) available in today's processing units. The primary advantage of using the PMU infrastructure is that it is fine-grained (down to the cache line) and has relatively low overhead. We have implemented our scheme in Linux running on an 8-way Power5 SMP-CMP-SMT multiprocessor. For commercial multithreaded server workloads (VolanoMark, SPECjbb, and RUBiS), we are able to demonstrate reductions in cross-chip cache accesses of up to 70%. These reductions lead to application-reported performance improvements of up to 7%.
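The abstract does not spell out the clustering algorithm, so the following is only an illustrative sketch of the general idea of sharing-aware placement, not the authors' implementation. It assumes that some sampling mechanism has already produced, for each thread, a set of cache-line addresses the thread touched; the names `samples`, `cluster_threads`, and `assign_to_chips` are hypothetical, and Jaccard similarity with a fixed threshold is a stand-in chosen here for simplicity.

```python
def jaccard(a, b):
    """Overlap between two sets of cache-line addresses, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_threads(samples, threshold=0.5):
    """Greedy clustering (assumed policy): a thread joins the first cluster
    whose representative set overlaps its own by at least `threshold`."""
    clusters = []  # list of (representative_set, [thread ids])
    for tid, lines in sorted(samples.items()):
        for rep, members in clusters:
            if jaccard(rep, lines) >= threshold:
                members.append(tid)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append((set(lines), [tid]))
    return [members for _, members in clusters]


def assign_to_chips(clusters, num_chips):
    """Place whole clusters on the least-loaded chip, largest cluster first,
    so that threads sharing data end up behind the same on-chip cache."""
    chips = [[] for _ in range(num_chips)]
    for members in sorted(clusters, key=len, reverse=True):
        target = min(chips, key=len)
        target.extend(members)
    return chips


# Hypothetical sampled data: thread id -> cache-line addresses it accessed.
samples = {
    "t0": {0x1000, 0x1080, 0x1100},
    "t1": {0x1000, 0x1080, 0x1100, 0x1180},
    "t2": {0x9000, 0x9080},
    "t3": {0x9000, 0x9080, 0x9100},
}

clusters = cluster_threads(samples)
placement = assign_to_chips(clusters, num_chips=2)
print(clusters)
print(placement)
```

In this toy input, `t0` and `t1` overlap heavily, as do `t2` and `t3`, so each pair lands on its own chip. A real scheduler would also have to balance load against sharing and re-evaluate placements as access patterns change.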
Conference: EuroSys Conference (EUROSYS), pp. 47-58, 2007